December 8, 2009

Personalized Search Opacity

Filed under: Internet, code, search — wseltzer @ 6:11 am

Google announced Friday that it would now be “personalizing” all searches, not just those for signed-in users. If your browser has a Google cookie, unless you’ve explicitly opted out, your search results will be customized based on search history.

Danny Sullivan, at Search Engine Land, wonders why more people aren’t paying attention:

On Friday afternoon, Google made the biggest change that has ever happened in search engines, and the world largely yawned. Maybe Google timed its announcement that it was personalizing everyone’s search results just right, so few would notice. Maybe no one really understood how significant the change was. Whatever the reason, it was a huge development and deserves much more attention than it has received so far.

I agree this is a big deal, even if it’s only the next step in a trend begun by customized search for signed-in users years ago. And except for here, I won’t even mention the P-word, “privacy.” Because on top of the implications of storing all of a user’s search history, I wonder about the transparency of personalized search. How do we understand what search looks like to the world as it gets sliced up by history, location, and other inferences search providers make about their searchers?

As users, we’ve basically come to terms with the non-transparency of the search algorithms that determine which results to show and how to order them. We use the engine that mostly gets us relevant results (or perhaps, that offers shopping discounts). If we’re dissatisfied with the results Google returns, we can use Yahoo or Bing.

We also have some degree of trust that search isn’t systematically discriminating against particular pages or providers for undisclosed reasons. When Google received copyright takedown demands from the Church of Scientology years ago, prompting it to remove many links to “Operation Clambake,” Google sent the takedowns to Chilling Effects and linked them from its search pages so searchers could see why the search had apparently become more pro-Scientology in its results. More recently, the search engine has worked with the Berkman Center’s StopBadware to flag malware distribution points and let searchers know why sites have been flagged “harmful.” When a racist image appeared in searches for “Michelle Obama,” Google used an ad to explain why, but did not tweak algorithms to remove the picture.

How do we verify that this trust is warranted, that page visibility is a relative meritocracy? With open source, we could read the code or delegate that task to others. With a closed platform where we can’t do that, our next best alternative is implicit or explicit comparison of results with others. Investigative journalists might follow a tip-off that liberal media seemed to rank higher than conservative, run some comparisons and ask some questions to test it, and report back; search engine optimizers, motivated to improve their own pages’ rankings, might also alert us to biases that caused unfair demotions. We can believe we’re seeing a reasonable mix of digital camera stores because proprietors would complain if they were omitted. If something “feels wrong” to enough people, chances are it will bubble up through the crowd for verification (or debunking: see the complaints that the iTunes “shuffle” feature isn’t random, from listeners who confuse randomness with a non-random even distribution). If a search engine failed to disclose payment-induced bias, the FTC might even follow with a complaint.

With personalized search, these crowd-sourced modes of verification will work less well. We won’t know if the biases we encounter in search are also seen by others, or if the store shuffles its end-caps when it sees us walk in. It would be easier for an Evil search provider to subtly tweak results to favor paying clients or ideologies, unnoticed.
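To make the idea of comparing results with others a little more concrete, here is a minimal Python sketch of how a few people might measure how far their results for the same query diverge. Everything in it is an assumption for illustration: the user names, the example.com-style result lists, and the choice of Jaccard overlap as the measure. It says nothing about how any search engine actually personalizes.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of distinct results that two result lists have in common."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_overlap(results_by_user):
    """Average Jaccard overlap across every pair of users' result lists."""
    pairs = list(combinations(results_by_user.values(), 2))
    if not pairs:
        return 1.0  # fewer than two users: nothing to disagree about
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top results for one query, as seen by three users.
results_by_user = {
    "alice": ["a.example", "b.example", "c.example", "d.example"],
    "bob": ["a.example", "b.example", "c.example", "e.example"],
    "carol": ["a.example", "f.example", "g.example", "h.example"],
}

overlap = mean_pairwise_overlap(results_by_user)
print(f"mean pairwise overlap: {overlap:.2f}")
```

An overlap near 1.0 would mean everyone sees roughly the same page; the lower it falls, the less one person’s sense that something “feels wrong” tells us about what anyone else is seeing. A measure like this ignores ordering, so a rank-aware comparison would be needed to catch subtle demotions.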

Finally, I’m reminded of the “ants” in Cory Doctorow’s excellent Human Readable — an automated adaptive system so complex even its creators can’t debug it or determine its patterns. If someone is paying off the ants, society can’t trace the payments.

When I put a version of this transparency question to the “real-time search” panel at Supernova, Barney Pell of Bing suggested that users don’t want to know how the search works, only that it gets them useful results. Part of my utility function, though, is fairness. I hope we can reconstruct that broader view in a world of ever-more-personalized search.
