Claude Fable 5: The Mythos-Class Model That Finally Went Public — And It's a Beast

Published: June 15, 2026

On June 9, 2026, Anthropic did something it had never done before: it handed the public a model from its top-secret "Mythos" tier — the class of models that, until now, only cyber-defense partners and a handful of biology researchers were allowed to touch. The public-safe version is called Claude Fable 5, and it doesn't sit in the Opus family. It sits above it.

[Source: Anthropic Official Blog]

The Benchmarks: A Tier Above Everything Else

Let's start with the numbers that matter. Fable 5 doesn't just edge out the competition — it laps them.

SWE-Bench Verified: 95.0%

On the verified 500-problem software engineering benchmark, Claude Fable 5 leads every model ever tested with a score of 95.0%. For context:

Claude Mythos Preview: 93.9%
Claude Opus 4.8: 88.6%
Claude Opus 4.7: 87.6%

That's a verified, independent leaderboard — not just a vendor claim.

SWE-Bench Pro: 80.3%

On the harder SWE-Bench Pro (actively-maintained repos with multi-file diffs and no ground-truth leakage), Fable 5 posts 80.3% — 11.1 points ahead of Opus 4.8's 69.2% and over 20 points ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).

[Source: Vellum AI Benchmark Analysis]
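The point gaps quoted for SWE-Bench Pro are simple to reproduce. The short Python sketch below recomputes them from the four scores cited above; the dictionary and the helper name `point_gaps` are illustrative and not part of any benchmark tooling.

```python
# SWE-Bench Pro scores (percent) as quoted in the paragraph above.
SCORES = {
    "Claude Fable 5": 80.3,
    "Claude Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}


def point_gaps(scores, leader):
    """Return the leader's lead over every other model, in percentage points."""
    top = scores[leader]
    # Round to one decimal so floating-point noise does not leak into the result.
    return {name: round(top - s, 1) for name, s in scores.items() if name != leader}


gaps = point_gaps(SCORES, "Claude Fable 5")
for name, gap in gaps.items():
    print(f"Fable 5 leads {name} by {gap} points")
```

The output matches the figures in the text: 11.1 points over Opus 4.8, 21.7 over GPT-5.5, and 26.1 over Gemini 3.1 Pro.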

FrontierCode Diamond: 29.3%

Anthropic's own FrontierCode evaluation tests whether models can pass difficult coding tasks while meeting production-codebase standards. On the hardest Diamond split, Fable 5 hits 29.3% — more than double Opus 4.8's 13.4% and far ahead of GPT-5.5's 5.7%.

[Source: Anthropic System Card]

Other Headline Benchmarks

Benchmark	Fable 5	Opus 4.8	GPT-5.5
OSWorld-Verified	85.0%	83.4%	78.7%
GDPval-AA (Elo)	1,932	1,890	1,769
Terminal-Bench 2.1	84.3%*	74.6%	83.4%†
Legal Agent Benchmark	13.3%	10.4%	2.1%

*Fable 5 score on Terminal-Bench impacted by 20.9% fallback rate to Opus 4.8
†GPT-5.5 score uses OpenAI's proprietary Codex CLI harness

[Sources: Vellum, llm-stats.com]

The Stripe Story: One Day vs. Two Months

The benchmark that best tells the story isn't a benchmark at all. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a single day — work that Stripe estimated would have taken a full team over two months by hand.

[Source: Anthropic Official Blog]

Pricing: $10/$50 Per Million Tokens

Fable 5 is priced at $10 per million input tokens and $50 per million output tokens — double the Opus 4.8 rate ($5/$25) but less than half the earlier Mythos Preview price ($25/$125).

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Fable 5	$10.00	$50.00
Claude Opus 4.8	$5.00	$25.00
Claude Mythos Preview	$25.00	$125.00
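
To see what these rates mean for a real bill, here is a minimal Python sketch that turns the per-million-token prices in the table into a cost estimate. The function name `estimate_cost` and the sample workload (2 million input tokens, 500,000 output tokens) are assumptions chosen for illustration, not figures from Anthropic.

```python
# Per-million-token prices in USD, copied from the pricing table above:
# (input rate, output rate).
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Mythos Preview": (25.00, 125.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a workload at the listed per-million rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
```

For that sample workload the estimate comes to $45.00 on Fable 5, $22.50 on Opus 4.8, and $112.50 on Mythos Preview, which mirrors the "double Opus, less than half Mythos Preview" relationship described above.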

Free period alert: Through June 22, Fable 5 is included at no extra cost on Pro, Max, Team, and seat-based Enterprise plans. Starting June 23, it shifts to usage credits.

[Sources: TechCrunch, CNBC, NBC News]

The Safety Catch: One Model, Two Personalities

Here's where this launch gets genuinely interesting — and a little weird.

Fable 5 shares identical weights with Claude Mythos 5, a restricted-access model for government-backed cyber defense partners through Project Glasswing. The difference? Fable 5 comes with safety classifiers that watch for three categories of high-risk requests:

Cybersecurity — vulnerability research, exploit generation
Biology & Chemistry — bioweapon-adjacent queries
Model Distillation — using Fable to build rival models

When a request trips a classifier, Fable 5 usually doesn't refuse outright. Instead, it routes the query to Claude Opus 4.8 and tells the user that it has done so. Anthropic reports this happens in fewer than 5% of sessions, meaning over 95% of sessions run entirely on Fable 5's full Mythos-class capability.

The trade-off is clear: on cybersecurity benchmarks, the unrestricted Mythos 5 scores 78.0% on ExploitBench (nearly double Opus 4.8's 40.0%). In the publicly available Fable 5, those queries fall back to Opus 4.8, so results land closer to Opus 4.8's performance.

[Sources: WIRED, Vellum, Anthropic]
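
The classify-then-fall-back behavior described in this section can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the keyword lists stand in for real safety classifiers, the model identifiers `claude-fable-5` and `claude-opus-4.8` are placeholder strings, and the `route` function is hypothetical. It is not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for the three classifier categories.
# A production classifier would be a trained model, not a substring match.
RISK_KEYWORDS = {
    "cybersecurity": ("exploit", "vulnerability"),
    "biology_chemistry": ("pathogen", "nerve agent"),
    "distillation": ("distill", "train a rival model"),
}

PRIMARY_MODEL = "claude-fable-5"    # placeholder identifier
FALLBACK_MODEL = "claude-opus-4.8"  # placeholder identifier


@dataclass
class Routing:
    model: str
    flagged_category: Optional[str]
    user_notice: Optional[str]


def route(prompt: str) -> Routing:
    """Pick a model for the prompt and, on fallback, build a notice for the user."""
    text = prompt.lower()
    for category, keywords in RISK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            notice = f"This request was handled by {FALLBACK_MODEL} ({category})."
            return Routing(FALLBACK_MODEL, category, notice)
    # No classifier tripped: the request stays on the primary model.
    return Routing(PRIMARY_MODEL, None, None)


print(route("Refactor this Ruby module to use keyword arguments"))
print(route("Write an exploit for this buffer overflow"))
```

The point of the sketch is the shape of the decision: unflagged requests stay on the primary model, while flagged ones are answered by the fallback model and carry a notice, so the switch is visible to the user.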

Andrej Karpathy's Verdict

Former OpenAI researcher and AI thought leader Andrej Karpathy shared his take on launch day:

"The benchmarks are great and it's SOTA on everything by margin... qualitatively also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model 'gets it' and it will just go..."

He also flagged the safeguards as "configured to be a little too trigger happy" — something Anthropic acknowledges and is actively tuning.

[Source: TrueFoundry Blog]

Vision: Beating Pokémon With Screenshots Alone

As a flex, Anthropic showed that Fable 5 can beat Pokémon FireRed from start to finish using only raw game screenshots — no maps, no navigation aids, no helper harness. Earlier Claude models needed a complex scaffolding system to play at all. Fable 5 did it with vision alone.

In practical terms, one CTO reported that apps "that took a hundred prompts a year ago now get one-shotted."

[Source: Anthropic Official Blog]

The Bottom Line

For teams running autonomous coding agents on hard engineering problems: This is worth evaluating immediately. The 11.1-point lead over Opus 4.8 on SWE-Bench Pro and the Stripe migration story both suggest that this model unlocks work that earlier models could not handle.

For teams running regulated workloads: Opus 4.8 may still be the safer default. The safeguard fallback on cyber, bio, and chemistry queries means your most sensitive prompts may not get the full Mythos-class treatment. Run your own evaluation before committing.

For everyone else: The free window through June 22 is basically an invitation to stress-test Fable 5 on your hardest problems. Don't waste it.