Meta Platforms—Meta trained Llama AI models on 81.7TB of pirated books from LibGen and shadow libraries with executive approval
Court filings revealed that Meta engineers torrented 81.7 terabytes of copyrighted books from Library Genesis, Z-Library, and Anna's Archive to train Llama models. Internal emails showed that Meta director Sony Theakanath confirmed 'GenAI has been approved to use LibGen for Llama 3' after escalation to Mark Zuckerberg, with an explicit instruction never to disclose the use publicly. Engineers wrote scripts to strip copyright notices from ebooks. A June 2025 ruling found this piracy was not protected by fair use.
Scoring Impact
| Topic | Direction | Relevance | Contribution |
|---|---|---|---|
| Intellectual Property Ethics | -against | primary | -1.00 |
| Overall incident score | | | -1.360 |
Score = avg(topic contributions) × significance (critical ×2) × confidence (0.68)
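The formula above can be checked with a few lines of Python. This is a minimal sketch, not the scorer actually used here: the function name `incident_score` and the `SIGNIFICANCE_MULTIPLIER` table are hypothetical, and only the `critical` multiplier (×2) comes from the formula line. Multipliers for other significance levels are not stated, so they are left out.

```python
from statistics import mean

# Only the "critical" multiplier (x2) is stated in the formula line.
# Other significance levels are unknown and intentionally omitted.
SIGNIFICANCE_MULTIPLIER = {"critical": 2.0}


def incident_score(contributions, significance, confidence):
    """Return avg(topic contributions) x significance multiplier x confidence."""
    if not contributions:
        raise ValueError("at least one topic contribution is required")
    return mean(contributions) * SIGNIFICANCE_MULTIPLIER[significance] * confidence


# Values from the table: one topic contributing -1.00,
# critical significance, confidence 0.68.
score = incident_score([-1.00], "critical", 0.68)
print(f"{score:.3f}")  # -1.360
```

With a single topic the average is just that topic's contribution, so the result is -1.00 × 2 × 0.68 = -1.360, matching the overall incident score in the table.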
Evidence (2 signals)
Judge ruled Meta's use of pirated books for AI training was not fair use, but granted fair use for legally acquired works
US District Judge Vince Chhabria sided with Meta on fair use for legally acquired books but denied fair use protection for pirated copies. The judge described as 'nonsense' Meta's claim that the public interest would be 'badly disserved' if the company were prevented from using copyrighted text for free.
Court filings revealed Zuckerberg approved Meta's use of pirated LibGen books for Llama 3 training
Internal emails showed that Meta's director of product management confirmed 'GenAI has been approved to use LibGen for Llama 3' after escalation to Zuckerberg, with an explicit instruction never to disclose the use publicly. Engineers torrented 81.7TB of pirated books.