Sharing user search data

Last week I presented at the Lear Antitrust 2.0 conference in Rome. While there was no specific agenda beyond thinking about search and advertising, much of the discussion centred around Google’s travails as EU Competition Authorities investigate its Google Shopping results (yes, really) and its dominance in Android. It was clear from the presentations by the chief economists of Google (Hal Varian) and Microsoft (Preston McAfee) that there was a significant issue at the heart of search.

The general thinking is that Google’s dominance in search comes from its superiority in handling ‘long tail’ queries. That is, for common searches, Google and Bing are roughly the same, but Google performs better as searches become less common. An example of a less common search is apparently “red roses in Birmingham Alabama” which, were it not for the presentations at this conference, would not have been searched for on Bing at all! Why this search term? Well, it was the subject of a SearchEngineLand article some years back.

The big question is whether Google’s large share of searches gives it an insurmountable advantage over Bing and others in competition for consumers. Google argued that it did not — well, sort of — while Microsoft argued that it did. This puts Microsoft in the awkward position of arguing before regulators that Bing is worse while Google argues that Bing is just as good.

Let me try and unpack it a little. It turns out that nowadays, in contrast to a decade ago, search engine results are made better by a process of machine learning. The primary input to this is the combination of a search term and an observation of the websites that users not only click on but stay with. In other words, if you search for something, click on a site and immediately dart back to search, then the machine decides that the URL of that site was not a good outcome for that search term. By contrast, if you search for something, click and stay away from search, the machine decides it is a good outcome. Basically, it is as if search engines are run by Reddit up and down voters.
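To make the up-and-down-vote analogy concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Google or Bing actually work: the ten-second cutoff, the log format and the function names `tally_votes` and `rank_urls` are all my own assumptions.

```python
from collections import defaultdict

# Assumed cutoff (not from any search engine): coming back to the results
# page sooner than this counts as a "down vote" for the clicked URL.
QUICK_RETURN_SECONDS = 10.0


def tally_votes(click_log):
    """Turn a click log into net votes per (search term, URL) pair.

    click_log is an iterable of (search_term, clicked_url, seconds_away)
    tuples, where seconds_away is how long the user stayed away from the
    results page, or None if they never came back.
    """
    votes = defaultdict(int)
    for term, url, seconds_away in click_log:
        stayed = seconds_away is None or seconds_away >= QUICK_RETURN_SECONDS
        votes[(term, url)] += 1 if stayed else -1
    return dict(votes)


def rank_urls(votes, term):
    """Order the URLs seen for one search term by net votes, best first."""
    scored = [(url, score) for (t, url), score in votes.items() if t == term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in scored]
```

Feeding this a log in which users keep darting back from one site but stay with another would push the second site up the ranking for that search term. A real system would presumably weigh many other signals, but the sketch shows why the volume of observed clicks matters: a search term that never appears in the log earns no votes at all.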

Now there is more going on in search than just this, but this particular set of inputs seemed to be at the heart of the matter. Google argued that Bing gets “enough” searches to not be at a significant disadvantage in providing and developing a competitive search engine, while Microsoft argued that this is not the case. Both have a point. In the US, Bing has about 20 percent of the market, and so, for Internet Explorer users, may actually learn as much as Google does from user behaviour. But in the rest of the world, Bing has a much smaller fraction of searches. So if geographical differences matter, and they surely do, and if platforms matter (Mac rather than PC or, more critically, mobile rather than desktop), then Bing would appear to be at a disadvantage. It is, of course, the latter that matters for the current European antitrust investigations. Put simply, there are likely to be many searches that occur on Google that never occur on Bing, and so Google is likely to learn about behaviour at a quicker rate.

We could argue — and apparently some high powered economists would for at least two days — whether even this matters or not. But, for me, there is a larger issue. Let’s take the datum as a quadruple of a search term, a URL list, a clicked URL, and a success/failure of that click. That data is provided by users to search engines. To be sure, users get a return for that, in the form of the URL list and search service. But apart from that URL list, the remaining triple is all provided by the user. And there is a case to be made that it is that data that really matters for search engine performance.
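For readers who think in code, the quadruple can be written down as a small record. This is only a sketch of the data as I have described it; the class and field names are hypothetical and do not correspond to any search engine's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchDatum:
    """One observation: hypothetical field names, for illustration only."""

    search_term: str            # typed by the user
    url_list: Tuple[str, ...]   # returned by the search engine
    clicked_url: str            # chosen by the user
    click_succeeded: bool       # revealed by whether the user stayed away

    def user_contribution(self):
        """The triple that comes from the user rather than the engine."""
        return (self.search_term, self.clicked_url, self.click_succeeded)
```

Only `url_list` originates with the search engine; everything returned by `user_contribution` is supplied by the person doing the searching.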

In a world where we like user data to be openly available, to the extent that this does not impinge on privacy, it is surely unusual that data being contributed in the billions every day is locked into proprietary platforms. While Google may claim that Microsoft is at no disadvantage as a result of this, I have to observe that Google isn’t making this data available to put that claim to the test, so we would have to guess that they don’t really believe it.

So even if the Europeans hit Google for antitrust violations, is it really going to change everything? Google could be up for 10% of its revenue in fines, but is that just a tax for operating in Europe? What one would want to look for is remedies that actually go to the heart of what is driving Google’s dominance. I would not want to suggest that Google give away stuff it has actually innovated on (namely, the algorithms that it has learned to provide good search results), but the data provided by users is another matter.

In an ideal world, this data should be available to all so that many can experiment with different algorithms using that data and we can have real competition in search. It is time we recognised the user contributions at the heart of search and thought about ways in which that contribution can be leveraged beyond closed, proprietary platforms. Google is actually very good about this with regard to the email information users have stored in Gmail, for example. Why not extend it more broadly?

[A disclosure: in the past I have worked for Microsoft Research and have provided advice to Microsoft on intellectual property matters. I also own both Google and Microsoft shares.]
