OpenAI transcribed over 1 million hours of YouTube videos to train GPT-4 despite terms-of-service concerns
Reports revealed that OpenAI transcribed more than 1 million hours of YouTube videos using its Whisper speech recognition system to create training data for GPT-4. OpenAI President Greg Brockman assisted with the process. Internal staff discussed whether transcribing YouTube videos violated the platform's terms of service, which prohibit scraping and downloading content.
Scoring Impact
| Topic | Direction | Relevance | Contribution |
|---|---|---|---|
| Intellectual Property Ethics | Against | Primary | -1.00 |
| **Overall incident score** | | | **-0.858** |
Score = avg(topic contributions) × significance multiplier (high = ×1.5) × confidence (0.57)
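The score formula can be sketched in Python as below. This is a minimal illustration, assuming topic contributions are averaged with equal weight and that the significance label maps to a fixed multiplier. The function name `incident_score` and the `SIGNIFICANCE_MULTIPLIER` table are hypothetical; only the "high" = ×1.5 value comes from this record.

```python
from statistics import mean

# Hypothetical lookup table: only the "high" multiplier (x1.5) is given in
# the record; no other significance levels are assumed here.
SIGNIFICANCE_MULTIPLIER = {"high": 1.5}


def incident_score(contributions: list[float], significance: str, confidence: float) -> float:
    """Score = avg(topic contributions) x significance multiplier x confidence."""
    if not contributions:
        raise ValueError("at least one topic contribution is required")
    return mean(contributions) * SIGNIFICANCE_MULTIPLIER[significance] * confidence


# One primary topic (Intellectual Property Ethics) contributing -1.00.
print(round(incident_score([-1.00], "high", 0.57), 3))
```

With the displayed confidence of 0.57 the sketch gives about -0.855. The listed -0.858 is consistent with an unrounded confidence of roughly 0.572 (since -0.858 / 1.5 = -0.572), which suggests the confidence is shown rounded.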
Evidence (1 signal)
Reports confirmed OpenAI transcribed 1M+ hours of YouTube videos for GPT-4 training with Greg Brockman's involvement
Reporting revealed OpenAI used its Whisper speech recognition system to transcribe over 1 million hours of YouTube videos for GPT-4 training data. OpenAI President Greg Brockman assisted with the process. Staff internally discussed potential violations of YouTube's terms of service.