Model comparison guide
Last updated: March 2026
This guide is updated within days of major model releases. The version in the book was accurate at the time of publication; this page always reflects the current landscape. Every stat links to its source.
Claude Opus 4.6
by Anthropic · Released February 2026
Complex reasoning, extended analysis, long documents, agentic coding, research
Pricing
$15 input / $75 output per million tokens. Included in Claude Pro ($20/mo).
Context window
200K tokens
Strengths
- + Frontier reasoning capability
- + METR 50% time horizon: ~14.5 hours
- + Excellent long-form writing
- + Agentic tool use and coding
Limitations
- − Slower than smaller models
- − Higher cost per token
- − Can be verbose
Claude Sonnet 4.6
by Anthropic · Released February 2026
Everyday tasks, coding, analysis, balance of speed and capability
Pricing
$3 input / $15 output per million tokens. Included in Claude Pro ($20/mo).
Context window
200K tokens
Strengths
- + Fast and capable
- + Excellent at coding
- + Strong instruction following
- + Best value for daily use
Limitations
- − Less depth than Opus on complex tasks
- − Shorter generation limits
GPT-5.4
by OpenAI
General purpose, multimodal tasks, reasoning, creative work
Pricing
Included in ChatGPT Plus ($20/mo) and ChatGPT Pro ($200/mo).
Context window
200K tokens
Strengths
- + Strong multimodal capability
- + Advanced reasoning
- + Wide ecosystem and integrations
- + Image generation built in
Limitations
- − Can hallucinate confidently
- − Expensive at API scale
- − Pro tier needed for full capability
GPT-o3
by OpenAI
Advanced reasoning, mathematics, research, complex problem solving
Pricing
$10 input / $40 output per million tokens. Available on ChatGPT Pro ($200/mo).
Context window
200K tokens
Strengths
- + Exceptional reasoning
- + Strong at maths and science
- + Chain-of-thought built in
- + METR 50% time horizon: ~75-90 min
Limitations
- − Slow (thinks before responding)
- − Expensive
- − Can overthink simple tasks
Gemini 3.1 Pro
by Google · Released February 2026
Multimodal analysis, real-time information, code generation, long context
Pricing
Included in Gemini Advanced ($20/mo). Competitive API pricing.
Context window
1M tokens
Strengths
- + Largest context window (1M tokens)
- + ARC-AGI-2 score: 77.1%
- + Real-time web access
- + Strong multimodal
Limitations
- − Newer ecosystem than OpenAI/Anthropic
- − Output quality can vary
- − Less established for long-form writing
Gemini 2.5 Flash
by Google · Released March 2025
Fast tasks, summarisation, classification, high-volume processing
Pricing
$0.15 input / $0.60 output per million tokens. Free tier available.
Context window
1M tokens
Strengths
- + Very fast
- + Extremely cheap
- + Large context window
- + Good for batch processing
Limitations
- − Less capable on complex tasks
- − Weaker reasoning
- − Less nuanced writing
Quick comparison
| Model | Provider | Context | Speed | Cost |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 200K | Moderate | $$$ |
| Claude Sonnet 4.6 | Anthropic | 200K | Fast | $$ |
| GPT-5.4 | OpenAI | 200K | Fast | $$ |
| GPT-o3 | OpenAI | 200K | Slow | $$$$ |
| Gemini 3.1 Pro | Google | 1M | Fast | $$ |
| Gemini 2.5 Flash | Google | 1M | Very fast | $ |
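The context column is easier to apply with a quick check of whether a given document fits. The following is a minimal sketch, not a precise tool: the window sizes come from the table, while the 4-characters-per-token ratio, the `reserve_for_output` default, and the function names are illustrative assumptions. Real tokenizers vary by model and language.

```python
# Context windows (in tokens) as listed in the comparison table.
CONTEXT_WINDOWS = {
    "Claude Opus 4.6": 200_000,
    "Claude Sonnet 4.6": 200_000,
    "GPT-5.4": 200_000,
    "GPT-o3": 200_000,
    "Gemini 3.1 Pro": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
}

# Assumption: roughly 4 characters per token for English prose.
# Actual token counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def models_that_fit(text: str, reserve_for_output: int = 8_000) -> list[str]:
    """Return models whose listed window can hold the text plus room for a reply."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]
```

Under these assumptions, a document of about one million characters would exceed the 200K windows and leave only the two Gemini models. For anything close to a limit, count tokens with the provider's own tooling rather than relying on this estimate.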
How to choose
For complex reasoning, extended research, or documents longer than 10,000 words: Claude Opus 4.6 or GPT-o3. These models reason more deeply and handle nuance better, but they cost more and respond more slowly.
For everyday work, coding, email drafting, and analysis: Claude Sonnet 4.6 or GPT-5.4. These are the workhorses. Fast, capable, affordable. Start here for most tasks.
For tasks involving images, audio, or video: GPT-5.4 or Gemini 3.1 Pro. Both handle multimodal input well. Gemini has the edge on real-time web access and the largest context window.
For high-volume processing or cost-sensitive tasks: Gemini 2.5 Flash. By far the cheapest, with a massive 1M token context window. Ideal for summarisation, classification, and batch work.
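To see how large the cost gap is at volume, here is a rough sketch that applies the per-million-token prices listed in this guide to a batch job. The workload figures (10,000 documents, 2,000 input and 200 output tokens each) and the `job_cost` helper are hypothetical, chosen only for illustration. GPT-5.4 and Gemini 3.1 Pro are left out because the guide gives no per-token prices for them.

```python
# Per-million-token API prices in USD (input, output), as listed in this guide.
PRICES = {
    "Claude Opus 4.6": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-o3": (10.00, 40.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for the given token totals."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 documents, 2,000 input and 200 output tokens each.
DOCS, INPUT_PER_DOC, OUTPUT_PER_DOC = 10_000, 2_000, 200

for model in PRICES:
    total = job_cost(model, DOCS * INPUT_PER_DOC, DOCS * OUTPUT_PER_DOC)
    print(f"{model}: ${total:,.2f}")
```

For this assumed workload (20M input tokens, 2M output tokens) the estimate comes to about $450 on Claude Opus 4.6 and about $4.20 on Gemini 2.5 Flash. Prices change often, so check the provider pricing pages before budgeting.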
Always test on your specific task. Model performance varies dramatically across different types of work. What works best for coding might not be optimal for writing or analysis.
Sources
- Anthropic model pricing: anthropic.com/pricing
- OpenAI model pricing: openai.com/api/pricing
- Google AI model pricing: ai.google.dev/pricing
- Chatbot Arena leaderboard: lmarena.ai