Anthropic—Proactively activated ASL-3 protections with bioweapon classifiers for Claude Opus 4
When tests showed models approaching risk thresholds, Anthropic implemented bioweapon-specific classifiers that block harmful outputs. These classifiers add approximately 5% to total inference costs but are robust against adversarial attacks. They were applied to Opus 4, Sonnet 4.5, Opus 4.1, and Opus 4.5.
Scoring Impact
| Topic | Direction | Relevance | Contribution |
|---|---|---|---|
| AI Safety | +toward | primary | +1.00 |
| **Overall incident score** | | | +0.590 |
Score = avg(topic contributions) × significance (medium = ×1) × confidence (0.59) = 1.00 × 1 × 0.59 = 0.590
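The scoring formula above can be sketched as a short function (the function and parameter names are illustrative, not part of any actual scoring codebase):

```python
# Sketch of the incident-scoring formula:
# score = avg(topic contributions) x significance x confidence.
def incident_score(contributions, significance=1.0, confidence=0.59):
    """Average the per-topic contributions, then scale by the
    significance multiplier (medium = x1) and the confidence factor."""
    avg = sum(contributions) / len(contributions)
    return avg * significance * confidence

# One topic (AI Safety) contributing +1.00, medium significance:
print(round(incident_score([1.00]), 3))  # 0.59
```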
Evidence (1 signal)
Anthropic activated ASL-3 protections with Constitutional Classifiers for Claude Opus 4 on May 22, 2025
On May 22, 2025, Anthropic activated its AI Safety Level 3 (ASL-3) Deployment and Security Standards for the Claude Opus 4 launch. The company implemented Constitutional Classifiers: real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts, that monitor model inputs and outputs and block a narrow class of harmful content. The ASL-3 Deployment Standard specifically limits the risk of misuse for chemical, biological, radiological, and nuclear (CBRN) weapons development. Anthropic determined that it could not clearly rule out ASL-3 risks for Opus 4, given the model's continued improvements in CBRN-related capabilities.
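A real-time classifier guard of the kind described above can be sketched roughly as follows. This is a hypothetical illustration of the general pattern, not Anthropic's actual implementation; the `classify` callable, `threshold` value, and all names are assumptions:

```python
# Hypothetical sketch of a real-time classifier guard: score text with a
# harm classifier and block anything above a threshold. Illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardResult:
    allowed: bool
    harm_score: float

def classifier_guard(text: str,
                     classify: Callable[[str], float],
                     threshold: float = 0.5) -> GuardResult:
    """`classify` is an assumed model returning a harm probability in [0, 1].
    The guard runs on both inputs and outputs; content at or above the
    threshold is blocked."""
    score = classify(text)
    return GuardResult(allowed=score < threshold, harm_score=score)

# Stub classifier used purely for demonstration.
result = classifier_guard("benign prompt", classify=lambda t: 0.01)
print(result.allowed)  # True
```

In practice, such guards sit in the serving path on every request, which is consistent with the classifier cost being expressed as a fraction of inference costs.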