
By Michaela Gordoni
Anthropic used millions of books to train its AI, enraging authors, but a judge recently ruled in favor of the tech company, calling the training “fair use.”
The tech is “among the most transformative many of us will see in our lifetimes,” wrote U.S. District Judge William Alsup on June 23. He concluded that the AI models did not produce works replicating the authors’ specific books or mimicking their styles.
“The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative,” Alsup wrote. “Like any reader aspiring to be a writer.”
This is a significant case, the first of its kind. Under the ruling, large language models (LLMs) can be trained on copyrighted works, so long as the copies are legally obtained.
Anthropic is happy with the win, but the three authors who sued the company — Andrea Bartz, Charles Graeber and Kirk Wallace Johnson — are not. The authors had claimed Anthropic would make billions off of millions of copyrighted works.
Anthropic told NPR in a statement, “Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, Anthropic’s large language models are trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.”
“This Anthropic decision will likely be cited by all creators of AI models to support the argument that fair use applies to the use of massive datasets to train foundational models,” said Daniel Barsky, an intellectual property lawyer at Holland & Knight.
Anthropic had, however, illegally downloaded some 7 million copyrighted books, and buying copies afterward did not undo the infringement. The company faces a trial on that issue in December and could be liable for very hefty damages, The Hollywood Reporter wrote.
Statutory damages for willful copyright infringement can run as high as $150,000 per work; with 7 million illegally downloaded books at issue, Anthropic’s worst-case exposure exceeds $1 trillion.
On Wednesday, authors lost in another case, this time against Meta. Over a dozen authors sued the company for copyright infringement after it also used pirated books to train its AI.
“The court ruled that AI companies that ‘feed copyright-protected works into their models without getting permission from the copyright holders or paying for them’ are generally violating the law,” said Boies Schiller Flexner LLP, attorneys for the plaintiffs. “Yet despite the undisputed record of Meta’s historically unprecedented pirating of copyrighted works, the court ruled in Meta’s favor. We respectfully disagree with that conclusion.”
For now, AI companies appear able to use copyrighted works for training without penalty, but that could change if a case reaches the Supreme Court.