
OpenAI transcribed over 1 million hours of YouTube videos to train GPT-4 despite terms of service concerns

Reports revealed that OpenAI transcribed more than 1 million hours of YouTube videos using its Whisper speech recognition system to create training data for GPT-4. OpenAI President Greg Brockman assisted with the process. Internal staff discussed whether transcribing YouTube videos violated the platform's terms of service, which prohibit scraping and downloading content.

Scoring Impact

Topic                          Direction   Relevance   Contribution
Intellectual Property Ethics   against     primary     -1.00

Overall incident score = -0.858

Score = avg(topic contributions) × significance (high ×1.5) × confidence (0.57)
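
As a rough check of the formula above, the arithmetic can be sketched in a few lines of Python. The function name `incident_score` and the `SIGNIFICANCE_MULTIPLIERS` table are hypothetical and used only for illustration; the page states just the "high" multiplier of 1.5. Using the displayed confidence of 0.57 gives -0.855 rather than the listed -0.858, so the displayed confidence is presumably rounded; a value near 0.572 would reproduce the listed score, but that figure is an assumption, not something the page reports.

```python
from statistics import mean

# Hypothetical lookup table: only the "high" multiplier (1.5) appears on the page.
SIGNIFICANCE_MULTIPLIERS = {"high": 1.5}


def incident_score(contributions, significance, confidence):
    """Score = avg(topic contributions) x significance multiplier x confidence."""
    return mean(contributions) * SIGNIFICANCE_MULTIPLIERS[significance] * confidence


# Values as displayed on the page: one topic at -1.00, high significance, confidence 0.57.
score_displayed = incident_score([-1.00], "high", 0.57)
print(round(score_displayed, 3))  # -0.855

# Assumed unrounded confidence (0.572) that would match the listed -0.858.
score_assumed = incident_score([-1.00], "high", 0.572)
print(round(score_assumed, 3))  # -0.858
```

The gap between -0.855 and -0.858 is small enough to be explained by rounding of the confidence value before display.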

Evidence (1 signal)

Confirms · Statement · Apr 7, 2024 · documented

Reports confirmed OpenAI transcribed 1M+ hours of YouTube videos for GPT-4 training with Greg Brockman's involvement

Reporting revealed OpenAI used its Whisper speech recognition system to transcribe over 1 million hours of YouTube videos for GPT-4 training data. OpenAI President Greg Brockman assisted with the process. Staff internally discussed potential violations of YouTube's terms of service.
