David Holz · David Holz admitted Midjourney trained on 'hundreds of millions' of images scraped from the internet without consent

Sep 1, 2022

In multiple 2022 interviews, Midjourney founder and CEO David Holz openly admitted that the company trained its AI on 'hundreds of millions' of existing artworks and photographs scraped from the internet without consent from creators. Holz stated they 'grab everything they can, dump it in a huge file, and set it on fire to train some huge thing.' When asked about seeking consent, Holz said 'There isn't really a way to get a hundred million images and know where they're coming from.' He argued the process was 'kind of like a search engine', compared it to how humans learn, and claimed that existing law does not specifically address this kind of training. Midjourney later reported $300 million in revenue for 2024 from these models.

Scoring Impact

Topic	Direction	Relevance	Contribution
AI Oversight	-against	secondary	-0.50
Creator Compensation	-against	primary	-1.00
Intellectual Property Ethics	-against	primary	-1.00
Overall incident score =			-1.073

Score = avg(topic contributions) × significance (critical ×2) × confidence (0.64)
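
The formula can be checked against the table with a short calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and the multiplier of 2 and the confidence of 0.64 are taken from the formula line as displayed. With those inputs the result is about -1.067 rather than the listed -1.073. That gap would be consistent with the confidence value being rounded for display (roughly 0.644 before rounding), but this explanation is an assumption, not something the page states.

```python
from statistics import mean

# Topic contributions as listed in the Scoring Impact table.
contributions = {
    "AI Oversight": -0.50,
    "Creator Compensation": -1.00,
    "Intellectual Property Ethics": -1.00,
}

SIGNIFICANCE_MULTIPLIER = 2.0  # "critical ×2" from the formula line
DISPLAYED_CONFIDENCE = 0.64    # as shown on the page; assumed to be rounded
LISTED_SCORE = -1.073          # "Overall incident score" from the table


def incident_score(contribs, significance, confidence):
    """Average the topic contributions, then scale by significance and confidence."""
    return mean(contribs.values()) * significance * confidence


score = incident_score(contributions, SIGNIFICANCE_MULTIPLIER, DISPLAYED_CONFIDENCE)
print(round(score, 3))  # -1.067

# Back out the confidence that would reproduce the listed score exactly.
# This is an inference from the displayed numbers, not a published figure.
implied_confidence = LISTED_SCORE / (mean(contributions.values()) * SIGNIFICANCE_MULTIPLIER)
print(round(implied_confidence, 3))  # 0.644
```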

Evidence (2 signals)

Confirms Statement Sep 1, 2022 documented

David Holz admitted in a Forbes interview that Midjourney trained on images without consent from creators

In a September 2022 interview with Forbes, Midjourney founder David Holz openly admitted that the company based its AI on existing artworks and photographs without any consent from the creators. Holz stated that Midjourney's dataset was built from 'a big scrape of the internet' and that seeking consent for 'a hundred million images' wasn't feasible because 'there isn't really a way' to trace images to owners or authenticate them.

Digital Camera World

Confirms Statement Sep 1, 2022 documented

David Holz described Midjourney's training process as 'grab everything they can, dump it in a huge file, and set it on fire'

In a 2022 interview, Midjourney founder David Holz described his company's training data collection process: employees 'grab everything they can, they dump it in a huge file, and they kind of set it on fire to train some huge thing.' This admission confirmed the company's approach of mass-scraping internet content without permission. Midjourney later reported $300 million in revenue in 2024 from models trained on this data.

PetaPixel
