OpenAI · Spent 6+ months on GPT-4 safety alignment, achieving an 82% reduction in disallowed content and 40% more factual responses

Mar 14, 2023

OpenAI spent more than 6 months working across the organization to make GPT-4 safer and more aligned before its public release. On OpenAI's internal evaluations, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content and 40% more likely to produce factual responses. The company also published safety research for external review and led risk assessments, conducting the only human-participant bio-risk trials.

Scoring Impact

Topic	Direction	Relevance	Contribution
AI Safety	+toward	primary	+1.00
Corporate Transparency	+toward	secondary	+0.50
Overall incident score =			+0.643

Score = avg(topic contributions) × significance (high ×1.5) × confidence (0.57)
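
The formula above can be checked with a short sketch. The function name `incident_score` and its parameter names are hypothetical, chosen here for illustration; the page does not show how the tracker actually computes the score. The inputs are the two topic contributions from the table, the high-significance multiplier of 1.5, and the displayed confidence of 0.57.

```python
def incident_score(contributions, significance_multiplier, confidence):
    """Average the topic contributions, then scale by significance and confidence.

    Hypothetical helper: mirrors the formula as written on the page,
    not the tracker's real implementation.
    """
    if not contributions:
        raise ValueError("at least one topic contribution is required")
    average = sum(contributions) / len(contributions)
    return average * significance_multiplier * confidence


# Values taken from the Scoring Impact table and the formula line.
score = incident_score([1.00, 0.50], significance_multiplier=1.5, confidence=0.57)
print(f"{score:+.3f}")  # prints +0.641
```

With the displayed inputs the result is about +0.641, slightly below the reported +0.643. One plausible explanation, stated here as an assumption, is that the confidence value is rounded for display: a confidence near 0.5716 would reproduce +0.643.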

Evidence (1 signal)

Confirms · product_decision · Mar 14, 2023 · documented

OpenAI announced GPT-4 with an 82% reduction in responses to requests for disallowed content after more than 6 months of alignment work

Related: Same Topics

Arvind Krishna · Announced pause on hiring for 8,000 back-office roles that could be replaced by AI and automation

May 1, 2023

Demis Hassabis · Signed AI extinction risk statement calling for global priority on AI safety

May 30, 2023

Ilya Sutskever · Led OpenAI board's ouster of CEO Sam Altman over safety and governance concerns

Nov 17, 2023

Ilya Sutskever · Founded Safe Superintelligence Inc. (SSI) focused exclusively on AI safety

Jun 19, 2024

Jan Leike · Resigned from OpenAI criticizing company for prioritizing 'shiny products' over AI safety

May 17, 2024