


Arrrgh: Piracy and Fair Use in AI Training

Monday's decision in a copyright suit against Anthropic over the use of copyrighted works to train its LLM came down largely as expected, with Judge William Alsup noting that how the material is acquired is just as important as how it is used.

In his opinion, Judge Alsup acknowledged that Anthropic’s large language model Claude is “spectacularly” transformative. The technology does not simply reproduce the books it was trained on, but instead creates new content that reflects general patterns and language use. Such transformative use would weigh heavily in favor of the fair use defense under traditional copyright principles. But Alsup made clear that transformative use does not excuse theft. Anthropic had downloaded more than seven million books from pirate websites in its effort to build a world-class AI model. Later purchases of licensed copies of those same books, Alsup stressed, did not cure the problem.

On a practical level, and certainly in this case, this part of the decision may prove more important than the fair use finding itself. Courts and commentators have long debated how fair use applies to AI training, and Alsup’s decision makes clear that even if the use itself might be considered fair under copyright law, acquiring the material illegally is not. The source matters.

This has clear implications for AI developers. Companies that rely on freely available online datasets must pay close attention to how that data was originally obtained. Scraping from platforms that host pirated content can create significant legal exposure, even if the final use is innovative and socially beneficial.

It also gives content creators a new foothold in protecting their rights. Many authors and artists have struggled to prove that AI outputs directly infringe their works. This ruling suggests they may not need to do so if they can show that their work was included in a pirated dataset. In that case, they could still be entitled to a finding of infringement and statutory damages (assuming, of course, the works were timely registered) regardless of what the AI produced.

As courts continue to define the limits of fair use in this context, Judge Alsup’s opinion offers a simple but powerful reminder that the rules of copyright do not disappear in the face of new technology. Copying and using materials hosted on a site that itself lacks authorization will remain a violation of the content owners’ rights, and AI developers will need to steer clear of such pirate sites when sourcing training data.

Judge Alsup's opinion is unlikely to be the last word on the legality of using copyrighted works for AI training, with dozens of other copyright lawsuits pending against tech companies by artists, authors, musicians and other content creators.

Tags

perspectives, artificial intelligence, copyright, intellectual property