Attributor’s Recipe for Inflated Copyright Claims: Not All Copying Is Infringement
Over at CNet, Jennifer Guevin discusses a report from the content-tracking company “Attributor,” asserting that “The next big copyright battle may be fought in the kitchen.”
Attributor collected all the original recipes that appear on Epicurious.com, Allrecipes.com and RachaelRay.com. The software then checked those recipes against what was available elsewhere on the Web, looking for what they call matches–or instances in which two recipes are similar enough to be possibly copyright infringing.
For the purposes of the study, Attributor researchers defined a match as any two recipes in which at least 50 percent of the content was identical. Then they looked more closely at the matches with low percentages of similarity and threw out those they thought couldn’t be considered clear cases of copyright infringement.
Based on the results, Attributor found that copying recipes online is “rampant,” said Rich Pearson, senior marketing director for the company. Attributor found just over 10,000 copies of recipes that originated on the three sites. In more than 60 percent of those cases, the reposted recipes weren’t attributed to their original sources.
What Attributor didn’t note, because its software can’t possibly tell the difference, is that not all copying is infringement. This is particularly true of recipes, where the guts (the proportions of ingredients and the steps for combining them) are a “process” unprotectable by copyright. While the creative description of the crisp crust and tart-sweet filling of the perfect apple pie might contain sufficient expression to claim protection, the steps for making the dessert are in the public domain.
So Attributor might well have found 50% identity of content between a pair of pages because they offered instruction for the same dish, without finding any copying of copyrightable text. Even the surrounding prose might be so minimally expressive as to merge with the ideas. How many ways can one say “bake until crust is golden”? The cooking-instruction industry seems to be stronger than ever, notwithstanding the lack of copyright protection.
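To make the 50 percent point concrete, here is a minimal sketch of a word-overlap check. It is not Attributor's method, which the report does not describe in detail; the functions `tokens`, `shared_fraction`, and `is_match`, the word-set measure, and the two sample recipes are all hypothetical, chosen only to show how two independently worded sets of instructions for the same dish can clear a 50 percent threshold.

```python
import re


def tokens(text):
    """Return lowercase word and number tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shared_fraction(a, b):
    """Fraction of the distinct words in `a` that also appear in `b`.

    Assumption: a crude stand-in for "percent of content identical".
    """
    words_a, words_b = set(tokens(a)), set(tokens(b))
    if not words_a:
        return 0.0
    return len(words_a & words_b) / len(words_a)


def is_match(a, b, threshold=0.5):
    """Flag a pair as a "match" when the overlap meets the threshold."""
    return shared_fraction(a, b) >= threshold


# Hypothetical recipes: same dish, purely functional wording.
recipe_one = "Mix 2 cups flour and 1 cup sugar. Bake until crust is golden."
recipe_two = "Combine 2 cups flour with 1 cup sugar. Bake until crust is golden."

print(is_match(recipe_one, recipe_two))  # flagged, though only unprotectable steps overlap
```

A check like this measures only textual similarity. It has no way to know whether the overlapping words are protectable expression or merely the ingredients and steps, which is the distinction the legal question turns on.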
The failure to distinguish between copying and copyright infringement is likely to plague another new roll-out as well: YouTube’s video identification filter. YouTube notes one side of the problem: “No matter how accurate the tools get, it is important to remember that no technology can tell legal from infringing material without the cooperation of the content owners themselves,” but the other side is that with or without technology, copyright owners often overstate their rights.
If Video ID offers a block every time it recognizes copyrighted music, for example, it may fail to distinguish between parodic remix and infringement; if it blocks video matches, will it distinguish between scientific discussion on An Inconvenient Truth and wholesale copying? Since automated filters will never be able to distinguish between fair use and infringement — a task even judges find difficult — adding them to the YouTube workflow will likely make multimedia parody, criticism, and remix more difficult.

Hi Wendy,
I work for Attributor and helped put together the study. You can get a little more context for our study on the Attributor blog - “A link is worth 1,000 words” - while I can’t say that we found all derivative works, we tried to pull these out of the totals that we shared with Jennifer Guevin.
You are absolutely correct about recipes being a tough category to quantify the amount of copying. Like you, we were skeptical of the initial results, but when we looked at the details, we were surprised to find so many word-for-word copies. Digging deeper, we found that most of these sites failed to link back or give attribution to the source.
I’m sure some of the copying was done unintentionally and without malice - our goal is to give rights holders visibility into how and where their content is being copied, plus a flexible set of remedies such as sending a link request, as we noted in our blog.
I’d be happy to get your feedback and answer any questions you have about our service.
Rich
Comment by Rich Pearson — October 16, 2007 @ 8:25 pm
Thanks Rich,
I’m still not completely on-board. Someone might intentionally copy, word-for-word, a procedural description of the steps to bake a cake, and yet commit no infringement. The initial poster might prefer a link back, but there’s no legal obligation to offer one.
If your tools are simply noting these occurrences, fine, but if you’re suggesting legal threats or implying that there would be legal consequences based on “copying,” I think you’re assuming too much.
–Wendy
Comment by wseltzer — October 17, 2007 @ 7:05 am
Wendy - we seem to be in sync. Attributor’s platform provides the visibility and the content creator decides if or how to act.
Comment by Rich Pearson — October 17, 2007 @ 9:24 am
If you read their blog, Attributor seems to take a stance that Google shouldn’t even surface search results containing “stolen” copyrighted materials as Google sells advertising against its search results. It also seems to say that Google shouldn’t allow Adwords on any page that might contain copyrighted data.
Rich, if you’re still reading this board, could you comment on my interpretation? Attributor’s stance seems legally shaky, but I may have misread your blog’s intent.
Comment by Mike — October 17, 2007 @ 4:35 pm
Mike,
I think you are referring to our Oct 1st blog post - http://attributor.com/blog/?p=21 - and may have misread our intent. Let me explain our view and see if it changes your interpretation.
We believe that when an infringement is found, Google needs to do more than just pull the content from YouTube - they need to also remove the host page from the Google index so that the infringing party can’t continue to make money via Adwords.
To illustrate how automated and widespread content theft has become, I found a few pages that have taken the lead paragraph from Jennifer’s Recipe story and surrounded it with Google AdSense links.
http://www.bellook.com/food/study-finds-copyright-infringement-of-recipes-online-gigalawcom-7.html
http://publisherswiki.com/?p=5231
From the data we’ve seen so far, this is the tip of the iceberg - these pages are automatically created, and done so to rank highly in search engines. When you consider that Google makes money off of every click on these ads, two questions come to mind:
1) How much of Google’s $5b+ AdSense revenue is on the backs of this theft and
2) How much traffic/notoriety/revenue is being siphoned off the content creators?
I hope that helps,
Rich
Comment by Rich Pearson — October 17, 2007 @ 11:39 pm
Rich–
You’re saying the content could still surface in a Google search but they shouldn’t allow the pages to use AdSense?
I’m not saying these pages (bellook at least, publisherswiki is down) aren’t contemptible — I’m just saying I don’t think Google has a *legal* liability here.
Thanks,
Mike
Comment by Mike — October 18, 2007 @ 9:56 am
Mike - I agree, there is no legal liability but it calls into question their “Don’t be evil” mantra :-)
Comment by Rich Pearson — October 18, 2007 @ 12:29 pm