Anthropic—Proactively activated ASL-3 protections with bioweapon classifiers for Claude Opus 4
When tests showed models approaching risk thresholds, Anthropic implemented bioweapon-specific classifiers that block harmful outputs. These classifiers add approximately 5% to total inference costs but are robust against adversarial attacks. They were applied to Opus 4, Sonnet 4.5, Opus 4.1, and Opus 4.5.
Scoring Impact
| Topic | Direction | Relevance | Contribution |
|---|---|---|---|
| AI Safety | +toward | primary | +1.00 |
| **Overall incident score** | | | +0.590 |
Score = avg(topic contributions) × significance (medium = ×1) × confidence (0.59) = 1.00 × 1 × 0.59 = 0.590
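The scoring formula above can be sketched as a short function (the function and parameter names are illustrative, not part of any actual scoring codebase):

```python
# Sketch of the incident-scoring formula:
# score = avg(topic contributions) x significance x confidence.
def incident_score(contributions, significance=1.0, confidence=0.59):
    """Average the per-topic contributions, then scale by the
    significance multiplier (medium = x1) and the confidence factor."""
    avg = sum(contributions) / len(contributions)
    return avg * significance * confidence

# One topic (AI Safety) contributing +1.00, medium significance:
print(round(incident_score([1.00]), 3))  # 0.59
```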
Evidence (1 signal)
Anthropic activated ASL-3 protections with Constitutional Classifiers for Claude Opus 4 on May 22, 2025
On May 22, 2025, Anthropic activated its AI Safety Level 3 (ASL-3) Deployment and Security Standards for the Claude Opus 4 launch. The company implemented Constitutional Classifiers: real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts, that monitor model inputs and outputs and block a narrow class of harmful content. The ASL-3 Deployment Standard specifically limits the risk of misuse for chemical, biological, radiological, and nuclear (CBRN) weapons development. Anthropic determined that it could not clearly rule out ASL-3 risks for Opus 4, given the model's continued improvements in CBRN-related capabilities.
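A real-time classifier guard of the kind described above can be sketched roughly as follows. This is a hypothetical illustration of the general pattern, not Anthropic's actual implementation; the `classify` callable, `threshold` value, and all names are assumptions:

```python
# Hypothetical sketch of a real-time classifier guard: score text with a
# harm classifier and block anything above a threshold. Illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardResult:
    allowed: bool
    harm_score: float

def classifier_guard(text: str,
                     classify: Callable[[str], float],
                     threshold: float = 0.5) -> GuardResult:
    """`classify` is an assumed model returning a harm probability in [0, 1].
    The guard runs on both inputs and outputs; content at or above the
    threshold is blocked."""
    score = classify(text)
    return GuardResult(allowed=score < threshold, harm_score=score)

# Stub classifier used purely for demonstration.
result = classifier_guard("benign prompt", classify=lambda t: 0.01)
print(result.allowed)  # True
```

In practice, such guards sit in the serving path on every request, which is consistent with the classifier cost being expressed as a fraction of inference costs.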