July 4, 2008

Privacy Falls into YouTube’s Data Tar Pit

Filed under: Internet, privacy, trade secret — wseltzer @ 3:53 pm

As a big lawsuit grinds forward, its parties engage in discovery, a wide-ranging search for information “reasonably calculated to lead to the discovery of admissible evidence.” (FRCP Rule 26(b)) And so Viacom has calculated that scouring YouTube’s data dumps would help provide evidence for its copyright lawsuit against YouTube.

According to a discovery order released Wednesday, Viacom asked for discovery of YouTube source code and of logs of YouTube video viewership; Google refused both. The dispute came before Judge Stanton, in the Southern District of New York, who ordered the video viewing records — but not the source code — disclosed.

The order shows the difficulty we have protecting personally sensitive information. The court could easily see the economic value of Google’s secret source code for search and video ID, and so it refused to compel disclosure of that “vital asset,” the “product of over a thousand person-years of work.”

But the user privacy concerns proved harder to evaluate. Viacom asked for “all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website,” including users’ viewed videos, login IDs, and IP addresses. Google contended it should not be forced to release these records because of its users’ privacy concerns, an argument the court rejected.

The court erred both in its assessment of the personally identifying nature of these records and in its assessment of the scope of the harm. It makes no sense to discuss whether an IP address is or is not “personally identifying” without considering the context to which it is connected. An IP address may not be a name, but it is often only one search step away from one. Moreover, even “anonymized” records often provide profiles deep enough to be traced back to individuals, as researchers armed with the AOL and Netflix data releases showed.
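For the technically inclined, here is a toy sketch of how such linkage works. The data and names are entirely hypothetical, and this is a deliberately simplified illustration of record linkage, not the AOL or Netflix researchers’ actual method; the point is that matching a pseudonymized log against a few facts known about a person from elsewhere is often enough to single them out.

    # Toy illustration of re-identification by record linkage (hypothetical data).
    # A "de-identified" viewing log is matched against a few videos we happen to
    # know one person watched, e.g. from their public comments elsewhere.
    anonymized_log = {
        "user_8723": {"cat_video_1", "lecture_42", "trailer_x", "howto_knit"},
        "user_1191": {"trailer_x", "news_clip_7", "music_vid_3"},
        "user_5502": {"lecture_42", "news_clip_7", "music_vid_3"},
    }
    known_about_alice = {"cat_video_1", "lecture_42", "howto_knit"}

    def best_match(log, aux):
        """Return the pseudonym whose viewing history overlaps most with the known facts."""
        return max(log, key=lambda pid: len(log[pid] & aux))

    print(best_match(anonymized_log, known_about_alice))  # -> "user_8723"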

Viewers “gave” their IP address and username information to YouTube for the purpose of watching videos. They might have expected the information to be used within Google, but they would not have anticipated that it would be shared with a corporation busily prosecuting copyright infringement. Viewers may not be able to quantify the economic harm, but if communications are chilled by the disclosure of viewing habits, we’re all harmed socially. The court failed to consider these third-party interests in ordering the disclosure.

Trade secret wins, privacy loses. Google has said it will not appeal the order.

Is there hope here for the end users concerned about disclosure of their video viewing habits? First, we see the general privacy problem with “cloud” computing: by conducting our activities at third-party sites, we place a great deal of information about those activities in their hands. We may do so because Google is indispensable, or because it tells us its motto is “don’t be evil.” But discovery demands show that it’s not enough for Google to follow good precepts.

Google, like most companies, indicates that it will share data where “We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request.” Its reputation as a good actor is important, but the company is not going to face contempt charges over user privacy.

I worry that this discovery demand is just the first of a wave, as more litigants recognize the data gold mines that online service providers have been gathering: search terms, blog readership and posting habits, video viewing, and browsing might all “lead to the discovery of admissible evidence.” If the privacy barriers are as low as Judge Stanton indicates, won’t others follow Viacom’s lead? A gold mine for litigants becomes a tar pit for online services’ users.

Economic concerns (the cost of producing data in response to a wave of subpoenas) and reputational concerns (the fear that users will abandon a service that leaves their sensitive data vulnerable) may exert some constraint, but they’re unlikely to be enough to match our privacy expectations.

We need the law to supply protection against unwanted data flows, to declare that personally sensitive information — or the profiles from which identity may be extracted and correlated — deserves consideration at least on a par with “economically valuable secrets.” We need better assurance that the data we provide in the course of communicative activities will be kept in context. There is room for that consideration in the “undue burden” discovery standard, but statutory clarification would help both users and their Internet service providers to negotiate privacy expectations better.

Is there a law? In this particular context, there might actually be law on the viewers’ side. The Video Privacy Protection Act, passed after reporters looked into Judge Bork’s video rental records, gives individuals a cause of action against “a video tape service provider who knowingly discloses, to any person, personally identifiable information concerning any consumer of such provider.” (“Video tape” includes similar audiovisual materials.) Will any third parties intervene to ask that the discovery order be quashed?

Further, Bloomberg notes the concerns of Europeans, whose privacy regime is far more user-protective than that of the United States. Is this one case where “harmonization” can work in favor of individual rights?

May 2, 2007

Selective Disclosure and Privacy

Filed under: trade secret — Wendy @ 1:34 pm

Often, when we’re asked for “identification,” it’s not because the asker needs to know everything about us, but because they need to verify one aspect of identity: that I’m over 21, for example, if I’m trying to buy a drink. But since I don’t have an “over 21” card that the bar can verify connects to me, I’m forced to give them my driver’s license, from which they can also glean and store other data. Online, it doesn’t have to be that way.

Builders of identity-management systems can design in stronger protections for their users’ privacy, giving people a separate virtual “card” for every transaction, with only the necessary data included. Ben Laurie has written a good concise overview, Selective Disclosure, explaining how zero-knowledge proofs let us make verifiable assertions without giving away the store.

I claim that for an identity management system to be both useful
and privacy preserving, there are three properties assertions must
be able to have. They must be:

  • Verifiable
    There’s often no point in making a statement unless the relying
    party has some way of checking it is true. Note that this isn’t
    always a requirement - I don’t have to prove my address is mine
    to Amazon, because it’s up to me where my goods get delivered.
    But I may have to prove I’m over 18 to get alcohol delivered.

  • Minimal
    This is the privacy preserving bit - I want to tell the relying
    party the very least he needs to know. I shouldn’t have to reveal
    my date of birth, just prove I’m over 18 somehow.

  • Unlinkable
    If the relying party or parties, or other actors in the system,
    can, either on their own or in collusion, link together my various
    assertions, then I’ve blown the minimality requirement out of
    the water.

While digital signatures are widely used for verification, the same signature on each item is a privacy-busting linkage. With the help of third parties and selective disclosure proofs, however, we can make assertions that are minimal and don’t leave a trail. We can create digital one-time cards each time we’re asked for a facet of our identities.
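To make that concrete, here is a much-simplified sketch of the one-time card idea in Python. It assumes a trusted issuer and uses the third-party `cryptography` package for Ed25519 signatures; the attribute labels and structure are my own invention for illustration. Unlike the blind or zero-knowledge schemes Laurie describes, the issuer here could still link the tokens it hands out, which is exactly the gap that selective disclosure proofs are designed to close.

    # Simplified "one-time card" sketch (illustrative only; names are hypothetical).
    # Requires the third-party `cryptography` package for Ed25519 signatures.
    import base64, json, os
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    issuer_key = Ed25519PrivateKey.generate()   # held by the issuing authority
    issuer_pub = issuer_key.public_key()        # known to relying parties

    def issue_card(attribute: str, value: bool) -> bytes:
        """Issue a fresh signed card carrying one attribute and a one-time nonce."""
        card = {"attr": attribute, "value": value, "nonce": os.urandom(16).hex()}
        body = json.dumps(card, sort_keys=True).encode()
        sig = issuer_key.sign(body)
        return base64.b64encode(body) + b"." + base64.b64encode(sig)

    def verify_card(token: bytes, attribute: str) -> bool:
        """Relying party checks the signature and learns only the single attribute."""
        body_b64, sig_b64 = token.split(b".")
        body, sig = base64.b64decode(body_b64), base64.b64decode(sig_b64)
        issuer_pub.verify(sig, body)            # raises InvalidSignature if forged
        card = json.loads(body)
        return card["attr"] == attribute and card["value"] is True

    # Each transaction gets its own card: same claim, different bytes, so two
    # relying parties comparing notes cannot link the tokens to each other.
    bar_token = issue_card("over_18", True)
    shop_token = issue_card("over_18", True)
    assert verify_card(bar_token, "over_18") and bar_token != shop_token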

These properties fit well with the legal principle of narrow tailoring. Limiting the identification provided to what is actually required limits spillover effects and opportunities for misuse (“mission creep”). An ID-check law shouldn’t become a source of marketing information; an online purchase needn’t be an entry in a growing retailer profile — unless that’s an explicit choice. We might even be more willing to give accurate information in places like online newspaper sign-ins if we knew that information could never be added to or correlated with profile data elsewhere.

The next hard part, of course, is getting those with whom we do business to accept less information where they’ve been accustomed to getting more by default, but at least if we build the identity technology right, it will be possible.
