December 8, 2009

Personalized Search Opacity

Filed under: Internet, code, search — wseltzer @ 6:11 am

Google announced Friday that it would now be “personalizing” all searches, not just those for signed-in users. If your browser has a Google cookie and you haven’t explicitly opted out, your search results will be customized based on your search history.

Danny Sullivan, at Search Engine Land, wonders why more people aren’t paying attention:

On Friday afternoon, Google made the biggest change that has ever happened in search engines, and the world largely yawned. Maybe Google timed its announcement that it was personalizing everyone’s search results just right, so few would notice. Maybe no one really understood how significant the change was. Whatever the reason, it was a huge development and deserves much more attention than it has received so far.

I agree this is a big deal, even if it’s only the next step in a trend begun by customized search for signed-in users years ago. And except for here, I won’t even mention the P-word, “privacy,” because on top of the implications of storing all of a user’s search history, I wonder about the transparency of personalized search. How do we understand what search looks like to the world as it gets sliced up by history, location, and other inferences search providers make about their searchers?

As users, we’ve basically come to terms with the non-transparency of the search algorithms that determine which results to show and how to order them. We use the engine that mostly gets us relevant results (or perhaps, that offers shopping discounts). If we’re dissatisfied with the results Google returns, we can use Yahoo or Bing.

We also have some degree of trust that search isn’t systematically discriminating against particular pages or providers for undisclosed reasons. When Google received copyright takedown demands from the Church of Scientology years ago, prompting it to remove many links to “Operation Clambake,” Google sent the takedowns to Chilling Effects and linked them from its search pages, so searchers could see why the results had apparently become more pro-Scientology. More recently, the search engine has worked with the Berkman Center’s StopBadware to flag malware distribution points and let searchers know why sites have been flagged “harmful.” When a racist image appeared in searches for “Michelle Obama,” Google used an ad to explain why, but did not tweak its algorithms to remove the picture.

How do we verify that this trust is warranted, that page visibility is a relative meritocracy? With open source, we could read the code or delegate that task to others. With a closed platform where we can’t do that, our next best alternative is implicit or explicit comparison of results with others. Investigative journalists might follow a tip-off that liberal media seemed to rank higher than conservative, run comparisons, and ask questions to test the claim and report back; search engine optimizers, motivated to improve their own pages’ rankings, might also alert us to biases that caused unfair demotions — we can believe we’re seeing a reasonable mix of digital camera stores because proprietors would complain if they were omitted. If something “feels wrong” to enough people, chances are it will bubble up through the crowd for verification (or debunking — see the complaints that iTunes’ “shuffle” feature isn’t random, from listeners who confuse randomness with a non-random even distribution). If a search engine failed to disclose payment-induced bias, the FTC might even follow with a complaint.
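The shuffle complaint, incidentally, is easy to quantify: a shuffle that picks each play independently and uniformly will almost certainly produce back-to-back repeats, which listeners then read as “not random.” A quick sketch, using hypothetical numbers (a library of 8 artists, 25 consecutive plays):

```python
# Probability that an independent, uniformly random sequence of plays
# contains at least one back-to-back repeat of the same artist.
# With k artists, each new play matches the previous artist with
# probability 1/k, so the chance of *no* adjacent repeat across n plays
# is ((k - 1) / k) ** (n - 1).

def prob_adjacent_repeat(k: int, n: int) -> float:
    """Chance of hearing the same artist twice in a row at least once."""
    return 1 - ((k - 1) / k) ** (n - 1)

# Hypothetical library: 8 artists, 25 consecutive random plays.
p = prob_adjacent_repeat(8, 25)
print(f"{p:.0%}")  # roughly 96% -- streaks are what randomness looks like
```

So a truly random shuffle repeats an artist in nearly every listening session; the evenly-spread playlist people expect is the engineered, non-random one.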

With personalized search, these crowd-sourced modes of verification will work less well. We won’t know if the biases we encounter in search are also seen by others, or if the store shuffles its end-caps when it sees us walk in. It would be easier for an Evil search provider to subtly tweak results to favor paying clients or ideologies, unnoticed.
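One thing a watchdog could still do is diff a searcher’s personalized result list against the unpersonalized one for the same query (at the time, appending `&pws=0` to a Google results URL was the commonly cited way to see unpersonalized results, though that parameter is Google’s to change or drop). A minimal sketch of just the comparison step, with made-up result lists standing in for fetched pages:

```python
def rank_shifts(personalized, baseline):
    """Map each URL to (personalized rank, baseline rank); None = absent.

    Large rank shifts, or URLs present in one list but missing from the
    other, are the signals a watchdog aggregating many users' results
    would look for.
    """
    pos_p = {url: i for i, url in enumerate(personalized, start=1)}
    pos_b = {url: i for i, url in enumerate(baseline, start=1)}
    return {url: (pos_p.get(url), pos_b.get(url))
            for url in pos_p.keys() | pos_b.keys()}

# Hypothetical top-3 lists for the same query from one searcher.
personalized = ["a.example", "c.example", "b.example"]
baseline = ["a.example", "b.example", "d.example"]

for url, (p, b) in sorted(rank_shifts(personalized, baseline).items()):
    print(url, p, b)
```

The hard part, of course, isn’t the diff; it’s that under personalization no single baseline exists to diff against, which is exactly the opacity at issue.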

Finally, I’m reminded of the “ants” in Cory Doctorow’s excellent Human Readable — an automated adaptive system so complex even its creators can’t debug it or determine its patterns. If someone is paying off the ants, society can’t trace the payments.

When I asked a version of this transparency question of the “real-time search” panel at Supernova, Barney Pell of Bing suggested that users don’t want to know how search works, only that it gets them useful results. Part of my utility function, though, is fairness. I hope we can reconstruct that broader view in a world of ever-more-personalized search.


  1. Good questions. I think the answer is two-fold. First, search personalization is actually quite soft. Second (and more importantly), there’s a link in the top right to see that the results were personalized. From there, you can turn off the personalization so that people can compare results without personalization.

    Comment by Matt Cutts — December 8, 2009 @ 3:31 pm

  2. Thanks Matt. I’m glad it’s relatively easy to see the non-personalized version of the search, and hope enough people will notice and follow the links when they have questions about results.

    Comment by wseltzer — December 8, 2009 @ 3:57 pm

  3. thanks for serving up the opt-out link

    Comment by cast customer — December 9, 2009 @ 10:15 am

  4. Wendy, when you originally asked me the question about whether search engines would offer features to let users adjust how various information was used in search results, I thought you were asking about some form of tunability, and my answer was that I think consumers wouldn’t value those added features. I also think that if a typical user is happy with a search engine’s results, then they won’t be concerned with whether the results were “fair”. But I completely agree that there are groups of users whom I wouldn’t deem typical consumers, including the SEO community, journalists, researchers, and consumer watchdogs, who could be very concerned with the details of search algorithms and results, including issues of fairness and bias. These concerns could potentially be addressed through increased transparency, which becomes more challenging to achieve in the presence of increased personalization.

    I think the feature that Matt points out, to flag that results are personalized and let a user turn them off, is an important feature in this regard. Going beyond that with more detailed explanations of how a result is personalized or what specific factors were taken into account seems difficult, because search engines take account of so many features already.

    Note also that there are already other forms of tuning beyond personalization that make it difficult to compare results. For example, some search engines use your location (e.g. based on IP address) to provide better search results for queries with local intent. This is so natural, and so useful, that it’s not even clear what it would mean to “turn off” the localization, as such results might not make much sense. As intent, context, and personalization become increasingly central in search, I expect that turning these features off would seem as bizarre as removing prescription lenses or wearing clothes that fit everyone. So your points about the difficulty of fairness detection in this new world are spot on.

    Comment by Barney Pell — December 10, 2009 @ 2:15 am
