A notable moment in the ongoing litigation over the use of copyrighted materials as AI training data emerged this week when U.S. District Judge William Alsup suggested a key distinction, noting that even if training an AI on copyrighted materials qualifies as fair use (on which there has yet to be clear precedent), acquiring those materials from pirate sites might still violate the Copyright Act. Judge Alsup tipped his hand at a hearing on Anthropic's motion for summary judgment in Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. filed 2024).
That nuance—how the training data was obtained, not just how it’s used—has broad implications for developers of large language models (LLMs). In the Bartz suit brought by authors against Anthropic, plaintiffs argue that the company trained its Claude model on books downloaded from infringing websites instead of purchasing or licensing them. While Anthropic defends the end use as transformative and lawful, Judge Alsup appeared skeptical that fair use can shield conduct rooted in unlawful acquisition.
This signals a potential fault line in future rulings: courts may increasingly treat data provenance as an independent legal issue, regardless of whether the AI’s outputs infringe. In practice, this means companies developing generative AI systems could face liability not for copying and using data for training, but for how and from where that data was acquired.
That distinction matters. It reframes the debate from abstract concerns over creativity and machine learning to concrete questions of property rights and digital accountability. It also suggests that compliance with copyright law can’t be reduced to the familiar four-factor fair use test applied only at the point of model output. If data acquisition itself is infringing, developers may not get to the fair use question at all. It is also unclear, however, whether avoiding pirated materials would be reasonable or even possible in broad internet-based training, and Judge Alsup did not reach that issue at the hearing.
Thus, while this latest discussion provides a clue to the future, it certainly does not define it. Judge Alsup has yet to rule, and even when he issues his ruling, he will not have the only or final word on this complicated issue.