Published: June 15, 2026
On June 9, 2026, Anthropic did something it had never done before: it handed the public a model from its top-secret "Mythos" tier — the class of models that, until now, only cyber-defense partners and a handful of biology researchers were allowed to touch. The public-safe version is called Claude Fable 5, and it doesn't sit in the Opus family. It sits above it.
[Source: Anthropic Official Blog]
Let's start with the numbers that matter. Fable 5 doesn't just edge out the competition — it laps them.
On SWE-bench Verified, the 500-problem software engineering benchmark, Claude Fable 5 leads every model ever tested with a score of 95.0%. That figure comes from an independent, verified leaderboard, not just a vendor claim.
On the harder SWE-Bench Pro (actively-maintained repos with multi-file diffs and no ground-truth leakage), Fable 5 posts 80.3% — 11.1 points ahead of Opus 4.8's 69.2% and over 20 points ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).
[Source: Vellum AI Benchmark Analysis]
Anthropic's own FrontierCode evaluation tests whether models can pass difficult coding tasks while meeting production-codebase standards. On the hardest Diamond split, Fable 5 hits 29.3% — more than double Opus 4.8's 13.4% and far ahead of GPT-5.5's 5.7%.
[Source: Anthropic System Card]
| Benchmark | Fable 5 | Opus 4.8 | GPT-5.5 |
|---|---|---|---|
| OSWorld-Verified | 85.0% | 83.4% | 78.7% |
| GDPval-AA (Elo) | 1,932 | 1,890 | 1,769 |
| Terminal-Bench 2.1 | 84.3%* | 74.6% | 83.4%† |
| Legal Agent Benchmark | 13.3% | 10.4% | 2.1% |
*Fable 5's Terminal-Bench score was affected by a 20.9% fallback rate to Opus 4.8.

†GPT-5.5's score uses OpenAI's proprietary Codex CLI harness.
[Sources: Vellum, llm-stats.com]
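To get a feel for how much the Terminal-Bench footnote could matter, here is a back-of-envelope sketch. It assumes the observed score is a simple weighted mix of two groups of tasks, and that the 20.9% of tasks that fell back to Opus 4.8 were solved at Opus 4.8's overall rate of 74.6%. Neither assumption comes from the sources, and the function name is hypothetical, so treat the result as an illustration rather than a reported number.

```python
def implied_primary_score(observed: float, fallback_rate: float, fallback_score: float) -> float:
    """Solve observed = (1 - r) * primary + r * fallback for primary.

    All scores are percentages; fallback_rate is a fraction in [0, 1).
    """
    if not 0.0 <= fallback_rate < 1.0:
        raise ValueError("fallback_rate must be in [0, 1)")
    return (observed - fallback_rate * fallback_score) / (1.0 - fallback_rate)


# Inputs taken from the table and footnote above.
# ASSUMPTION: fallback tasks score at Opus 4.8's overall 74.6%.
estimate = implied_primary_score(observed=84.3, fallback_rate=0.209, fallback_score=74.6)
print(f"Implied score on non-fallback tasks: {estimate:.1f}%")
```

Under those assumptions the implied score on tasks that stayed on Fable 5 comes out a couple of points above the published 84.3%. If the fallback tasks were harder or easier than average, the real gap could be larger or smaller.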
The benchmark that best tells the story isn't a benchmark at all. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a single day — work that Stripe estimated would have taken a full team over two months by hand.
[Source: Anthropic Official Blog]
Fable 5 is priced at $10 per million input tokens and $50 per million output tokens — double the Opus 4.8 rate ($5/$25) but less than half the earlier Mythos Preview price ($25/$125).
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Fable 5 | $10.00 | $50.00 |
| Claude Opus 4.8 | $5.00 | $25.00 |
| Claude Mythos Preview | $25.00 | $125.00 |
Free period alert: Through June 22, Fable 5 is included at no extra cost on Pro, Max, Team, and seat-based Enterprise plans. From June 23, it shifts to usage credits.
[Sources: TechCrunch, CNBC, NBC News]
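The per-token prices quoted above translate into workload costs with simple arithmetic. The sketch below uses the three price pairs from the pricing table. The function name and the sample workload (2M input tokens, 400k output tokens) are made up for illustration, and the calculation ignores any discounts, caching, or plan credits.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Mythos Preview": (25.00, 125.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 400k output tokens.
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 2_000_000, 400_000):.2f}")
```

For this sample workload the estimate is $40.00 on Fable 5, $20.00 on Opus 4.8, and $100.00 on Mythos Preview, which matches the 2x and 0.4x ratios in the text.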
Here's where this launch gets genuinely interesting — and a little weird.
Fable 5 shares identical weights with Claude Mythos 5, a restricted-access model offered to government-backed cyber defense partners through Project Glasswing. The difference? Fable 5 comes with safety classifiers that watch for three categories of high-risk requests: cybersecurity, biology, and chemistry.
When a request trips a classifier, Fable 5 usually doesn't refuse outright. Instead, it routes the query to Claude Opus 4.8 and tells the user that it has done so. Anthropic reports this happens in fewer than 5% of sessions, meaning over 95% of sessions run entirely on Fable 5's full Mythos-class capability.
The trade-off is clear: on cybersecurity benchmarks, the unrestricted Mythos 5 scores 78.0% on ExploitBench (nearly double Opus 4.8's 40.0%). In the publicly available Fable 5, those queries are routed to Opus 4.8, so results land closer to Opus 4.8's performance.
[Sources: WIRED, Vellum, Anthropic]
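The fallback behaviour described above can be pictured as a classifier gate in front of two models. The sketch below is purely illustrative and is not Anthropic's implementation: the model identifiers, the `RoutingDecision` type, the keyword list, and the notice text are all hypothetical, and a real safety classifier would be far more sophisticated than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PRIMARY_MODEL = "claude-fable-5"    # hypothetical identifier
FALLBACK_MODEL = "claude-opus-4.8"  # hypothetical identifier


@dataclass
class RoutingDecision:
    model: str
    fell_back: bool
    notice: Optional[str]  # message shown to the user when a fallback occurs


def keyword_classifier(prompt: str) -> bool:
    """Toy stand-in for a safety classifier: flags a few illustrative terms."""
    flagged_terms = ("exploit", "pathogen", "nerve agent")
    text = prompt.lower()
    return any(term in text for term in flagged_terms)


def route(prompt: str,
          is_high_risk: Callable[[str], bool] = keyword_classifier) -> RoutingDecision:
    """Send flagged prompts to the fallback model and record a user notice."""
    if is_high_risk(prompt):
        return RoutingDecision(
            model=FALLBACK_MODEL,
            fell_back=True,
            notice="This request was answered by the fallback model.",
        )
    return RoutingDecision(model=PRIMARY_MODEL, fell_back=False, notice=None)


print(route("Refactor this Ruby module to use keyword arguments"))
print(route("Write an exploit for this buffer overflow"))
```

Passing the classifier in as a parameter keeps the routing logic testable: a team running its own evaluation could swap in a different predicate and measure how often its prompts would trigger a fallback.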
Former OpenAI researcher and AI thought leader Andrej Karpathy shared his take on launch day:
"The benchmarks are great and it's SOTA on everything by margin... qualitatively also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model 'gets it' and it will just go..."
He also flagged the safeguards as "configured to be a little too trigger happy" — something Anthropic acknowledges and is actively tuning.
[Source: TrueFoundry Blog]
As a flex, Anthropic showed that Fable 5 can beat Pokémon FireRed from start to finish using only raw game screenshots — no maps, no navigation aids, no helper harness. Earlier Claude models needed a complex scaffolding system to play at all. Fable 5 did it with vision alone.
In practical terms: one CTO reported apps "that took a hundred prompts a year ago now get one-shotted."
[Source: Anthropic Official Blog]
For teams running autonomous coding agents on hard engineering problems: This is worth evaluating immediately. The 11.1-point lead over Opus 4.8 on SWE-Bench Pro and the Stripe migration story both suggest the model can take on work that earlier models could not.
For teams running regulated workloads: Opus 4.8 may still be the safer default. The safeguard fallback on cyber, bio, and chemistry queries means your most sensitive prompts may not get the full Mythos-class treatment. Run your own evaluation before committing.
For everyone else: The free window through June 22 is basically an invitation to stress-test Fable 5 on your hardest problems. Don't waste it.