
David Holz admitted Midjourney trained on 'hundreds of millions' of images scraped from the internet without consent

In multiple 2022 interviews, Midjourney founder and CEO David Holz openly admitted that the company trained its AI on 'hundreds of millions' of existing artworks and photographs scraped from the internet without the creators' consent. Describing the process, Holz said they 'grab everything they can, dump it in a huge file, and set it on fire to train some huge thing.' Asked about seeking consent, he said, 'There isn't really a way to get a hundred million images and know where they're coming from.' He argued the process was 'kind of like a search engine,' compared it to how humans learn, and claimed existing law does not specifically address it. Midjourney later reported $300 million in revenue for 2024 from models built this way.

Scoring Impact

| Topic | Direction | Relevance | Contribution |
|---|---|---|---|
| AI Oversight | against | secondary | -0.50 |
| Creator Compensation | against | primary | -1.00 |
| Intellectual Property Ethics | against | primary | -1.00 |

Overall incident score = -1.073

Score = avg(topic contributions) × significance (critical ×2) × confidence (0.64)
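As a rough check of the formula above, the Python sketch below recomputes the score from the table's contributions. It assumes an unweighted mean over topics and uses the displayed, rounded confidence of 0.64; the function name `incident_score` is hypothetical. With those inputs it yields about -1.067, slightly off the published -1.073, which suggests (an assumption) that the site uses an unrounded confidence of roughly 0.644 or a relevance weighting not shown on this page.

```python
from statistics import mean

# Topic contributions as listed in the Scoring Impact table.
contributions = {
    "AI Oversight": -0.50,
    "Creator Compensation": -1.00,
    "Intellectual Property Ethics": -1.00,
}

SIGNIFICANCE_CRITICAL = 2.0  # "critical ×2" multiplier from the formula
CONFIDENCE = 0.64            # displayed (rounded) confidence; the true value may differ


def incident_score(contribs, significance, confidence):
    """Hypothetical helper: unweighted mean of topic contributions,
    scaled by the significance multiplier and the confidence factor."""
    return mean(contribs.values()) * significance * confidence


score = incident_score(contributions, SIGNIFICANCE_CRITICAL, CONFIDENCE)
print(round(score, 3))
```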

Evidence (2 signals)

Confirms · Statement · Sep 1, 2022 · Documented

David Holz admitted in a Forbes interview that Midjourney trained on images without consent from creators

In a September 2022 interview with Forbes, Midjourney founder David Holz openly admitted that the company built its AI on existing artworks and photographs without any consent from their creators. Holz said Midjourney's dataset came from 'a big scrape of the internet' and that seeking consent for 'a hundred million images' wasn't feasible because 'there isn't really a way' to trace the images to their owners or authenticate them.

Confirms · Statement · Sep 1, 2022 · Documented

David Holz described Midjourney's training process as 'grab everything they can, dump it in a huge file, and set it on fire'

In a 2022 interview, Midjourney founder David Holz described how his company collected training data: employees 'grab everything they can, they dump it in a huge file, and they kind of set it on fire to train some huge thing.' This admission confirmed the company's approach of mass-scraping internet content without permission. Midjourney later reported $300 million in revenue for 2024 from models trained on this data.
