
Anthropic

Proactively activated ASL-3 protections with bioweapon classifiers for Claude Opus 4

When tests showed models approaching risk thresholds, Anthropic implemented bioweapon-specific classifiers that block harmful outputs. These classifiers account for approximately 5% of total inference costs but are robust against adversarial attacks. They have been applied to Opus 4, Sonnet 4.5, Opus 4.1, and Opus 4.5.

Scoring Impact

Topic        Direction    Relevance    Contribution
AI Safety    +toward      primary      +1.00

Overall incident score = +0.590

Score = avg(topic contributions) × significance (medium ×1) × confidence (0.59)
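As a worked check of the formula above, the following minimal sketch reproduces the overall score from the values shown on this page. The function name and the multiplier table are hypothetical; only the "medium" multiplier of 1 is stated here, and other significance levels are not known.

```python
from statistics import mean

# Hypothetical lookup table: only "medium" -> 1.0 is stated on the page.
SIGNIFICANCE_MULTIPLIER = {"medium": 1.0}


def incident_score(contributions, significance, confidence):
    """Average the topic contributions, then scale by significance and confidence."""
    return mean(contributions) * SIGNIFICANCE_MULTIPLIER[significance] * confidence


# One topic (AI Safety, +1.00), medium significance, confidence 0.59.
score = incident_score([1.00], "medium", 0.59)
print(f"{score:+.3f}")  # prints +0.590
```

With a single topic the average is just that topic's contribution, so the score reduces to 1.00 × 1 × 0.59.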

Evidence (1 signal)

Confirms | Policy Change | May 22, 2025 | verified

Anthropic activated ASL-3 protections with Constitutional Classifiers for Claude Opus 4 on May 22, 2025

On May 22, 2025, Anthropic activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards for the launch of Claude Opus 4. It implemented Constitutional Classifiers: real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts, that monitor inputs and outputs and block a narrow class of harmful content. The ASL-3 Deployment Standard limits the risk of misuse specifically for chemical, biological, radiological, and nuclear (CBRN) weapons development. Anthropic determined that clearly ruling out ASL-3 risks was not possible for Opus 4 because of continued improvements in CBRN-related capabilities.

Related: Same Topics