In August 2025, Anthropic agreed to pay $1.5 billion to settle the Bartz v. Anthropic class action, the largest copyright settlement in US history. The settlement covered approximately 500,000 copyrighted works at ~$3,000 each. Anthropic also agreed to destroy the two pirated book libraries and derivative copies within 30 days. The settlement only covered past conduct and did not create an ongoing licensing scheme. Judge Alsup granted preliminary approval in September 2025.
In mid-2025, an AI-generated music project called 'The Velvet Sundown' accumulated over 1 million streams on Spotify and received a verified artist badge despite being entirely AI-created, with no human musicians involved. The case highlighted Spotify's inadequate systems for detecting and labeling AI-generated content, raising concerns about AI music displacing human artists and misleading listeners about what they are hearing.
In June 2025, a Guardian investigation exposed how AI-powered bot farms were systematically uploading thousands of AI-generated tracks to Spotify and using automated streaming bots to generate fraudulent royalty payments. The investigation found organized operations generating millions of fake streams, diverting royalty money from legitimate artists. Despite Spotify's claims of anti-fraud measures, the investigation showed the platform's detection systems were largely ineffective against sophisticated AI fraud operations.
In June 2025, Reddit filed a lawsuit in California state court against Anthropic, alleging the AI company made more than 100,000 unauthorized requests to Reddit's servers to collect user posts and comments without permission. The suit alleged that Anthropic circumvented Reddit's robots.txt file and refused to engage in licensing negotiations, unlike Google and OpenAI, which entered formal licensing agreements. The case raised questions about intellectual property rights and data protection for user-generated content.
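robots.txt is an advisory convention, not an access control: a compliant crawler reads the file and checks each URL against its directives before fetching, and "circumventing" it simply means ignoring that answer. A minimal sketch using Python's standard-library parser, with a hypothetical bot name and rules (illustrative only, not Reddit's actual file):

```python
from urllib import robotparser

# Hypothetical robots.txt directives: one named crawler is barred from
# the whole site, everyone else is allowed.
rules = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler asks before fetching each URL.
print(rp.can_fetch("ExampleAIBot", "https://example.com/r/somesub/comments/1"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/r/somesub/comments/1"))  # True
```

Because the protocol is purely cooperative, enforcement against a non-compliant crawler falls back on contract and computer-access law, which is what suits like this one test.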
Universal Music Corp., ABKCO Music Inc., Concord Music Group, and other music publishers sued Anthropic in federal court, alleging Claude reproduces copyrighted song lyrics without proper licensing. On December 30, 2024, Anthropic agreed to a partial injunction while continuing to fight the copyright infringement claims. The case is ongoing in the Northern District of California.
In September 2024, Ryan Dahl, creator of Node.js and Deno, launched a petition to cancel Oracle's trademark on 'JavaScript', gathering over 14,000 signatures. In November 2024, the Deno team filed a formal cancellation petition with the USPTO, arguing that Oracle had not used the trademark in commerce and that its control harms the JavaScript community.
A class action lawsuit (Bartz v. Anthropic) filed in August 2024 alleged Anthropic used over 7 million digital copies of copyrighted books acquired from pirating sites Library Genesis and Pirate Library Mirror to train its Claude language models. In June 2025, Judge Alsup ruled that while using legally acquired books for AI training was fair use, training on pirated copies was not protected.
In June 2024, Adobe updated its terms of service requiring users to agree to give the company access to their content via 'automated and manual methods.' The vague language went viral as creatives feared Adobe would use their work to train its Firefly AI model or access NDA-protected projects. Adobe quickly responded with a blog post calling it a 'misunderstanding' and on June 24, 2024 released updated terms explicitly stating users own their content and Adobe would not train generative AI on customer content except for Adobe Stock submissions.
Google used YouTube video content to train its Gemini and Veo AI models without explicit creator consent or compensation, while simultaneously prohibiting competitors from accessing the same content via YouTube's terms of service. In December 2024 YouTube introduced opt-in settings for third-party AI training but these did not apply to Google's own internal use. In January 2026, Google publicly stated it should not pay for 'freely available' web content used in AI training. The EU opened an investigation in December 2025.
In April 2024, reports revealed that OpenAI had transcribed more than 1 million hours of YouTube videos using its Whisper speech recognition system to create training data for GPT-4, with OpenAI President Greg Brockman personally assisting in the process. Internal staff discussed whether transcribing YouTube videos violated the platform's terms of service, which prohibit scraping and downloading content.
In a March 2024 WSJ interview, OpenAI CTO Mira Murati gave evasive responses about whether Sora was trained on YouTube or Instagram data, saying she was 'not sure' about publicly available data sources. She later confirmed off-camera that Shutterstock data was used. The incident raised concerns about OpenAI's transparency regarding intellectual property and training data provenance.
In early 2024, reports revealed that Automattic had sold or licensed user-generated content from Tumblr and WordPress.com to AI companies including Midjourney and OpenAI for AI model training. Users were not individually notified or given prior opt-out options. A data export tool was later provided, but critics argued it was insufficient given the retroactive nature of the data sharing.
Reddit entered into data licensing agreements worth a reported $203 million over two to three years, including a $60M/year deal with Google and a deal with OpenAI reported at roughly $70M/year, granting access to user-generated content for AI model training. The deals were announced around Reddit's IPO filing in February 2024, raising concerns about monetizing user content without explicit user consent or compensation.
In early 2024, Spotify implemented a new policy requiring tracks to reach at least 1,000 streams per year before generating any royalty payments. The company framed this as an anti-fraud measure, but independent musicians and advocacy groups criticized it as de-monetizing tens of thousands of small artists while redirecting their royalty share to major labels and top-performing acts. The policy particularly impacts niche genres, emerging artists, and musicians in developing countries.
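The redistribution effect critics describe follows from pro-rata pool arithmetic: when sub-threshold tracks forfeit their payouts, the same pool is split over fewer eligible streams. A sketch with entirely hypothetical numbers (the pool size, catalog, and stream counts are invented for illustration and are not Spotify's actual figures):

```python
POOL = 1_000_000.0   # hypothetical royalty pool, in dollars
THRESHOLD = 1_000    # minimum annual streams to earn royalties

# Illustrative catalog: track -> stream count
streams = {"major_hit": 900_000, "mid_indie": 80_000, "niche_a": 600, "niche_b": 400}

total = sum(streams.values())
eligible = {t: s for t, s in streams.items() if s >= THRESHOLD}
eligible_total = sum(eligible.values())

# Pre-policy: every stream earns the same pro-rata share of the pool.
before = {t: POOL * s / total for t, s in streams.items()}

# Post-policy: sub-threshold tracks earn nothing, and the whole pool is
# split over eligible tracks' streams, raising their per-stream rate.
after = {t: (POOL * s / eligible_total if t in eligible else 0.0)
         for t, s in streams.items()}

print(round(before["niche_a"], 2))  # 611.62 -- small but nonzero payout
print(after["niche_a"])             # 0.0 under the threshold policy
```

The pool total is unchanged; the forfeited share flows to tracks above the threshold, which is why the policy reads as a transfer from small artists to large ones rather than a cost saving.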
The New York Times sued OpenAI and Microsoft in December 2023 for using NYT articles to train ChatGPT without permission. The Authors Guild separately sued with 17 authors including John Grisham and George R.R. Martin. By 2025, 51 total copyright lawsuits had been filed against AI companies. In January 2025, a federal judge ordered OpenAI to produce its GPT-4 training dataset to plaintiffs. Canadian and Indian news publishers also filed suits.
Court filings revealed Meta engineers torrented 81.7 terabytes of copyrighted books from Library Genesis, Z-Library, and Anna's Archive to train Llama models. Internal emails showed Meta director Sony Theakanath confirming that 'GenAI has been approved to use LibGen for Llama 3' after escalation to Mark Zuckerberg, with explicit instructions never to publicly disclose the use. Engineers wrote scripts to strip copyright notices from ebooks. In June 2025, Judge Chhabria granted Meta summary judgment on fair use as to the named plaintiffs, citing their failure to show market harm, while cautioning that the ruling did not endorse the practice more broadly.
Adobe trained its Firefly generative AI model on licensed Adobe Stock content and public domain works rather than scraping the web. The company developed Content Credentials, an industry-standard metadata system acting as a 'digital nutrition label' for AI-generated content. Adobe also implemented a creator opt-out preference allowing artists to indicate they don't want their content used for AI training, embedded via Content Credentials. Adobe began paying annual Firefly bonuses to contributing Stock artists in September 2023.
Stability AI built its Stable Diffusion image generation model using the LAION dataset of 5 billion images scraped from the internet without creator consent or licensing. Visual artists filed a class action (Andersen v. Stability AI) in January 2023, and Getty Images sued in February 2023 over infringement of 12 million+ photographs. A leaked Midjourney spreadsheet cataloged 4,700+ artist names whose styles were specifically targeted. Trial is set for September 2026.
In November 2022, developers filed a class-action lawsuit against GitHub, Microsoft, and OpenAI alleging that GitHub Copilot was trained on billions of lines of publicly available code from GitHub repositories without complying with open-source license terms (GPL, MIT, etc.) requiring attribution and copyright notices. The suit sought over $9 billion in statutory damages. By May 2023, a judge dismissed 20 of 22 claims but allowed breach of contract and DMCA claims to proceed. GitHub's FAQ acknowledged that about 1% of suggestions may match training data.
Clearview AI built a facial recognition database of over 60 billion facial images by scraping photographs from social media platforms, news websites, Venmo, and other publicly accessible online sources — all without the knowledge or consent of the people depicted. The database grows by approximately 75 million images daily. Users can upload a photo and receive links to everywhere that face appears online, enabling identification of virtually anyone from a single photograph.