
OpenAI

OpenAI spent 6+ months on GPT-4 safety alignment, achieving an 82% reduction in disallowed content and 40% more factual responses

OpenAI spent more than 6 months working across the organization to make GPT-4 safer and more aligned before its public release. On OpenAI's internal evaluations, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content and 40% more likely to produce factual responses. The company also published its safety research for external review and took a leading role in risk assessments, conducting the only human-participant bio-risk trials.

Scoring Impact

Topic                   Direction  Relevance  Contribution
AI Safety               +toward    primary    +1.00
Corporate Transparency  +toward    secondary  +0.50

Overall incident score = +0.643

Score = avg(topic contributions) × significance (high ×1.5) × confidence (0.57)
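
The formula above can be checked with a short sketch. It assumes the two topic contributions listed for this incident (+1.00 and +0.50), the high-significance multiplier of 1.5, and the displayed confidence of 0.57. The function name `incident_score` and the `SIGNIFICANCE_MULTIPLIER` table are hypothetical, introduced only for illustration; the page does not show how the site itself implements the calculation.

```python
from statistics import mean

# Hypothetical lookup table: only the "high" multiplier (1.5) appears on the page.
SIGNIFICANCE_MULTIPLIER = {"high": 1.5}


def incident_score(contributions, significance, confidence):
    """Return avg(topic contributions) x significance multiplier x confidence."""
    return mean(contributions) * SIGNIFICANCE_MULTIPLIER[significance] * confidence


# Values as displayed on the page: AI Safety +1.00, Corporate Transparency +0.50.
score = incident_score([1.00, 0.50], "high", 0.57)
print(f"{score:+.3f}")  # prints +0.641
```

With the displayed inputs this gives +0.641, slightly below the listed +0.643. The gap is consistent with the confidence having been rounded to two decimals for display (a value near 0.5715 would reproduce +0.643), though that is an assumption rather than something the page states.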

Evidence (1 signal)

Confirms · product_decision · Mar 14, 2023 · documented

OpenAI announced GPT-4 with 82% reduction in disallowed content after 6 months of alignment work
