Meta Platforms · Meta trained Llama AI models on 81.7TB of pirated books from LibGen and shadow libraries with executive approval

Jun 1, 2023

Court filings revealed Meta engineers torrented 81.7 terabytes of copyrighted books from Library Genesis, Z-Library, and Anna's Archive to train Llama models. Internal emails showed Meta director Sony Theakanath confirmed 'GenAI has been approved to use LibGen for Llama 3' after escalation to Mark Zuckerberg, with explicit instruction to never publicly disclose the use. Engineers wrote scripts to strip copyright notices from ebooks. A June 2025 ruling found this piracy was not protected by fair use.

Scoring Impact

Topic	Direction	Relevance	Contribution
Intellectual Property Ethics	-against	primary	-1.00
Overall incident score =			-1.360

Score = avg(topic contributions) × significance (critical ×2) × confidence (0.68)
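The formula above can be checked against the table with a short calculation. This is a minimal sketch, not the site's actual scoring code: the function and parameter names are hypothetical, and the only inputs used are the ones shown on this page (one topic contribution of -1.00, a ×2 multiplier for "critical" significance, and a confidence of 0.68).

```python
from statistics import mean


def incident_score(contributions, significance_multiplier, confidence):
    """Average the topic contributions, then scale by significance and confidence.

    All names here are illustrative assumptions; only the formula itself
    comes from the page.
    """
    return mean(contributions) * significance_multiplier * confidence


# Values taken from this incident: one topic at -1.00, critical (x2), confidence 0.68.
score = incident_score([-1.00], significance_multiplier=2, confidence=0.68)
print(f"{score:.3f}")  # -1.360
```

With a single topic the average is just that topic's contribution, so the result reduces to -1.00 × 2 × 0.68 = -1.360, matching the overall incident score in the table.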

Evidence (2 signals)

Confirms · Legal Action · Jun 25, 2025 · verified

Judge ruled Meta's use of pirated books for AI training was not fair use, but granted fair use for legally acquired works

US District Judge Vince Chhabria sided with Meta on fair use for legally acquired books but denied fair use protection for pirated copies. The judge dismissed as 'nonsense' Meta's claim that the public interest would be 'badly disserved' if the company were prevented from using copyrighted text for free.

CNBC

Confirms · Legal Action · Jan 9, 2025 · verified

Court filings revealed Zuckerberg approved Meta's use of pirated LibGen books for Llama 3 training

Internal emails showed Meta's director of product management confirmed 'GenAI has been approved to use LibGen for Llama 3' after escalation to Zuckerberg, with explicit instruction to never publicly disclose the use. Engineers torrented 81.7TB of pirated books.

TechCrunch, Rolling Stone

Related: Same Topics

David Holz · David Holz admitted Midjourney trained on 'hundreds of millions' of images scraped from internet without consent

Sep 1, 2022

Mira Murati · Gave evasive and contradictory answers about Sora AI video model's training data sources

Mar 13, 2024

Reddit · Reddit licensed user-generated content to Google and OpenAI for $203M to train AI models

Feb 22, 2024

Microsoft · GitHub Copilot faced class-action lawsuit for training on billions of lines of open-source code without license compliance

Nov 3, 2022

Tim Berners-Lee · W3C formally adopted royalty-free patent policy for web standards

May 1, 2003