Released June 1, 2026: MiniMax M3 is the first open-weight model to combine frontier-level coding, a 1M-token context window, and native multimodal understanding in a single architecture. And at roughly 16-17× cheaper than Claude Opus 4.7 at launch-discount pricing, it's shaking up the AI landscape.
On June 1, 2026, MiniMax officially released M3, calling it "the first and only open-weight model" to bring together three capabilities that, until now, were exclusive to closed-source frontier models: frontier-level coding and agentic performance, a 1M-token context window, and native multimodal input (text, image, and video).
According to MiniMax's official launch post, M3 "reaches frontier-level performance on specialized tasks such as coding and agentic work" using a brand-new attention architecture called MSA (MiniMax Sparse Attention) — proposed entirely by their research team. The model also supports image and video input natively and can operate a desktop computer.
This three-pillar combination has been table stakes for closed-source models like Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro. M3 is the first open-weight entry into that tier.
At the architectural heart of M3 is MSA (MiniMax Sparse Attention) — a block-sparse attention mechanism built on top of Grouped Query Attention (GQA). Its purpose is straightforward: escape the quadratic computational cost of full attention that makes scaling context beyond 128K tokens impractical.
MSA operates in two stages:
Unlike compressed latent KV approaches (like MLA), MSA works on real, uncompressed key/value tensors at block granularity, preserving attention expressiveness while dramatically reducing compute. MiniMax's own kernel optimization adopts a "KV outer gather Q" approach: KV blocks form the outer loop, and the queries that hit each block are gathered and processed together. MiniMax says this yields more than 4× the arithmetic intensity of open-source alternatives like Flash-Sparse-Attention and flash-moba.
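To make the block-sparse idea concrete, here is a minimal pure-Python sketch. It is not MiniMax's implementation: the mean-key block scoring, the function names, and the omission of causal masking, GQA, and multiple heads are all assumptions made for illustration. What it takes from the description is that selected blocks are attended with their real, uncompressed keys and values, and that the attention loop runs over KV blocks on the outside, gathering the queries that hit each block.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(queries, keys, values):
    """Dense softmax attention, used only as a reference for comparison."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [dot(q, k) * scale for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(len(values[0]))])
    return out


def select_blocks(q, keys, block_size, top_k):
    """Assumed selection rule: score each KV block by its mean key, keep top_k."""
    n_blocks = len(keys) // block_size  # assumes len(keys) % block_size == 0
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / block_size for col in zip(*block)]
        scores.append((dot(q, mean_key), b))
    return sorted(b for _, b in sorted(scores, reverse=True)[:top_k])


def block_sparse_attention(queries, keys, values, block_size, top_k):
    n_blocks = len(keys) // block_size
    scale = 1.0 / math.sqrt(len(keys[0]))
    picks = [select_blocks(q, keys, block_size, top_k) for q in queries]
    # Invert the selection: for every KV block, which queries hit it?
    hits = [[i for i, p in enumerate(picks) if b in p] for b in range(n_blocks)]

    # Online-softmax state per query, so blocks can be visited in any order.
    run_max = [-math.inf] * len(queries)
    run_sum = [0.0] * len(queries)
    acc = [[0.0] * len(values[0]) for _ in queries]

    for b in range(n_blocks):                      # KV blocks are the outer loop
        for j in range(b * block_size, (b + 1) * block_size):
            for i in hits[b]:                      # gathered queries for this block
                s = dot(queries[i], keys[j]) * scale
                new_max = max(run_max[i], s)
                rescale = math.exp(run_max[i] - new_max)
                w = math.exp(s - new_max)
                run_sum[i] = run_sum[i] * rescale + w
                acc[i] = [a * rescale + w * v for a, v in zip(acc[i], values[j])]
                run_max[i] = new_max
    return [[a / run_sum[i] for a in acc[i]] for i in range(len(queries))]
```

With `top_k` equal to the number of blocks, the sketch reproduces dense softmax attention, which makes a handy sanity check; smaller `top_k` values trade coverage for fewer score computations.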
At a context length of 1 million tokens, MiniMax reports:
| Metric | Improvement |
|---|---|
| Per-token compute vs M2 | 1/20th of previous generation |
| Prefilling stage speedup | More than 9× |
| Decoding stage speedup | More than 15× |
| Effective context coverage | Significantly higher than DSA and MoBA |
Crucially, across multiple ablations, MSA matched full attention on the vast majority of capabilities — meaning the sparsity doesn't come at the cost of quality.
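The quadratic-versus-sparse scaling argument can be checked with back-of-the-envelope arithmetic. The sketch below counts query-key score computations only, for non-causal attention. The `block_size` and `top_k` values are hypothetical, not MiniMax's published configuration, so the ratio it prints should not be expected to match the reported speedups.

```python
def full_attention_scores(n_tokens):
    """Every query scores against every key: n^2 score computations."""
    return n_tokens * n_tokens


def block_sparse_scores(n_tokens, block_size, top_k):
    """Each query scores one summary per block, then the real keys of top_k blocks."""
    n_blocks = -(-n_tokens // block_size)          # ceiling division
    per_query = n_blocks + min(top_k, n_blocks) * block_size
    return n_tokens * per_query


if __name__ == "__main__":
    n = 1_000_000
    dense = full_attention_scores(n)
    sparse = block_sparse_scores(n, block_size=128, top_k=16)  # illustrative values
    print(f"dense:  {dense:.3e} scores")
    print(f"sparse: {sparse:.3e} scores")
    print(f"ratio:  {dense / sparse:.1f}x")
```

The dense count grows with the square of the context length, while the sparse count grows roughly linearly once `top_k * block_size` is fixed, which is the motivation for sparsity at long context.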
MiniMax reports that M3 reaches frontier performance on a suite of internationally recognized coding and agentic benchmarks:
| Benchmark | M3 Score | What It Measures |
|---|---|---|
| SWE-Bench Pro | 59.0% | Real-world software engineering fixes |
| Terminal-Bench 2.1 | 66.0% | Terminal command execution and agentic tasks |
| SWE-fficiency | 34.8% | Efficient code change granularity |
| KernelBench Hard | 28.8% | Low-level CUDA/kernel optimization |
| MCP Atlas | 74.2% | Tool-use via Model Context Protocol |
These scores place M3 in direct competition with models like GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 on specific software engineering tasks. On SWE-Bench Pro, MiniMax claims M3 beats GPT-5.5 and Gemini 3.1 Pro, approaching Opus 4.7's performance.
Additionally, MiniMax published a follow-up article on June 9, 2026, titled "MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Evolutionary Search." In it, MiniMax reports that with the MaxProof framework, M3 exceeded the human gold-medal threshold on both the IMO 2025 and USAMO 2026 olympiad benchmarks.
Beyond benchmarks, MiniMax showcased three autonomous, long-horizon tasks that demonstrate M3's combined capabilities:
M3 was given an ICLR 2025 Outstanding Paper Award-winning paper — "Learning Dynamics of LLM Finetuning" — and asked to reproduce it autonomously. Over nearly 12 hours, M3:
Multimodal capabilities were required to read curves, data, and formulas in the paper. The 1M context window allowed the paper, code, and experiment logs to fit simultaneously. And the agentic coding capability made the multi-hour autonomous execution possible.
FP8 matrix multiplication (GEMM) on NVIDIA Hopper architecture is notoriously difficult to optimize — typically requiring one to two weeks of work from an experienced engineering team. M3 was given only a task description and a broken Triton skeleton with no reference implementation.
Over 24 hours of continuous execution, M3:
Notably, M3's best solution appeared on submission #145 — meaning it persisted through multiple performance plateaus where other models (except Opus 4.7) gave up around submission #30.
M3 was given four base models that had only completed pretraining (no downstream capabilities) and tasked with autonomously completing data synthesis, training, evaluation, and iteration within 12 hours. The task covered 5 skills: mathematical reasoning (AIME2025), tool calling (BFCL), scientific reasoning (GPQA Main), arithmetic (GSM8K), and code generation (HumanEval).
M3 scored 0.37 on PostTrainBench — below Opus 4.7 (0.42) and GPT-5.5 (0.39), but clearly ahead of all other models tested.
M3's pricing strategy is aggressively competitive. Here's the breakdown from MiniMax's official pricing page:
| Tier | Input (per 1M tokens) | Output (per 1M tokens) | Prompt Cache Read |
|---|---|---|---|
| ≤512K input (Standard) | $0.60 | $2.40 | $0.12 |
| ≤512K input (Launch Discount) | $0.30 | $1.20 | $0.06 |
| >512K input (Standard) | $1.20 | $4.80 | $0.24 |
| >512K input (Launch Discount) | $0.60 | $2.40 | $0.12 |
For context: Claude Opus 4.7 is priced at approximately $5 per million input tokens. At the launch-discount rate of $0.30, M3 is about 17× cheaper on input ($5.00 / $0.30 ≈ 16.7), and roughly 16× cheaper on output.
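The tiered pricing is easy to turn into a per-request estimate. The sketch below copies the prices from the table above. How MiniMax selects the tier (here: by total input tokens per request, with 512K taken as 512,000) and how cached tokens are billed (here: at the cache-read rate instead of the input rate) are assumptions, and `request_cost` is a hypothetical helper, not part of any MiniMax SDK.

```python
# USD per 1M tokens, copied from the pricing table.
PRICES = {
    ("short", "standard"): {"input": 0.60, "output": 2.40, "cache_read": 0.12},
    ("short", "launch"):   {"input": 0.30, "output": 1.20, "cache_read": 0.06},
    ("long", "standard"):  {"input": 1.20, "output": 4.80, "cache_read": 0.24},
    ("long", "launch"):    {"input": 0.60, "output": 2.40, "cache_read": 0.12},
}

TIER_BOUNDARY = 512_000  # assumption: "512K" means 512,000 tokens


def request_cost(input_tokens, output_tokens, cached_tokens=0, discount="launch"):
    """Estimated USD cost of one request under the assumed billing rules."""
    tier = "short" if input_tokens <= TIER_BOUNDARY else "long"
    price = PRICES[(tier, discount)]
    fresh_tokens = input_tokens - cached_tokens   # cached tokens are part of the input
    total = (fresh_tokens * price["input"]
             + cached_tokens * price["cache_read"]
             + output_tokens * price["output"])
    return total / 1_000_000


if __name__ == "__main__":
    # A 100K-token prompt with a 10K-token answer at launch pricing.
    print(f"${request_cost(100_000, 10_000):.4f}")
    # Input price ratio against the ~$5 per 1M figure quoted for Claude Opus 4.7.
    print(f"{5.00 / PRICES[('short', 'launch')]['input']:.1f}x")
```

Crossing the 512K boundary doubles every rate, so under these assumptions a request just above the threshold costs about twice as much per token as one just below it.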
MiniMax also offers a Token Plan for individuals and small teams:
| Plan | Monthly Cost | Tokens/Month |
|---|---|---|
| Plus | $20 | Up to ~1.7B tokens |
| Max | $50 | Up to ~5.1B tokens |
| Ultra | $120 | Up to ~9.8B tokens |
On MiniMax's landing page, they compare the $20 Plus plan directly: "$20 = 10× Claude Pro. Same price, 10× the throughput."
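Dividing each plan's price by its token cap gives a floor on the effective per-token price, since the caps are stated as "up to." A small sketch using only the figures from the Token Plan table (the helper name is hypothetical):

```python
# (monthly cost in USD, approximate monthly token cap) from the Token Plan table.
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million_tokens(plan):
    """Lowest possible effective price, reached only if the full cap is used."""
    cost, token_cap = PLANS[plan]
    return cost / (token_cap / 1_000_000)


if __name__ == "__main__":
    for name in PLANS:
        print(f"{name}: ${usd_per_million_tokens(name):.4f} per 1M tokens at full usage")
```

Actual cost per token will be higher for anyone who does not use the whole monthly allowance.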
The model is already available via Hugging Face (MiniMaxAI/MiniMax-M3).
Perhaps the most significant aspect of M3 is its open-weight release. The weights are listed on Hugging Face at MiniMaxAI/MiniMax-M3, and MiniMax promises the weights and technical report on GitHub within approximately 10 days of launch.
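For readers who want to inspect the release before committing to a full download, here is a minimal sketch using the `huggingface_hub` package. It assumes the package is installed, that the repository is publicly readable, and that it contains JSON and Markdown metadata files; none of that is confirmed here. It fetches only small metadata files, not the weights.

```python
REPO_ID = "MiniMaxAI/MiniMax-M3"  # repository ID as given in the article


def fetch_model_metadata(local_dir="minimax-m3-meta"):
    """Download only config-style files (JSON, Markdown), skipping weight shards.

    Returns the local folder path reported by huggingface_hub.
    """
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=["*.json", "*.md"],  # assumption: metadata uses these extensions
        local_dir=local_dir,
    )


if __name__ == "__main__":
    path = fetch_model_metadata()
    print(f"Metadata downloaded to {path}")
```

Reading the model card and any license file this way is a cheap first step before pulling the full weights.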
This is a big deal for several reasons:
A note of caution, however: "open-weight" does not necessarily mean "open source." The license terms were reportedly not published at launch, so the model's open-source status depends on the final license. According to the Hugging Face page, the model card includes the standard MiniMax license structure.
Here's how M3 positions against the current frontier:
| Dimension | MiniMax M3 | Claude Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Context | 1M | 1M | ~1.05M | 2M+ |
| SWE-Bench Pro | 59.0% | Higher than M3 (per MiniMax) | Lower than M3 (per MiniMax) | Lower than M3 (per MiniMax) |
| Multimodal | Native (text+image+video) | Text+Image | Text+Image+Audio | Native (all) |
| Open-Weight | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Price/1M input | $0.30 (launch) | ~$5.00 | ~$3.00 | ~$1.50 |
| Desktop Control | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Deployment | Self-host or API | API only | API only | API only |
M3's differentiation is clear: it's the only model in this tier that can be self-hosted while delivering competitive benchmark scores. It trades absolute top performance (Opus 4.7 still leads on some metrics) for an open deployment model at a fraction of the cost.
MiniMax M3 represents a genuine inflection point in the open-weight AI landscape. For the first time, developers and teams have access to a model that:
The real-world demos (the paper reproduction, the 24-hour FP8 GEMM kernel optimization, and the model training task) are what separate M3 from the hype. These aren't cherry-picked benchmarks; they're genuine stress tests of long-horizon autonomous capability.
Is it the best model in every category? No. Opus 4.7 still leads on PostTrainBench and likely on general reasoning quality. GPT-5.5 has deeper ecosystem integration. Gemini has Google's infrastructure muscle.
But M3 is the most accessible frontier-level model available today. For teams that need production-grade coding agents, long-context document analysis, or multimodal workflows without the per-token pricing anxiety of closed APIs — M3 is the model to beat.
And that's the real story: the open-weight frontier just caught up.
Published June 13, 2026