Model comparison guide

There is no best model. There is only the best model for this task, at this cost, right now. This guide pairs with chapter two of Artificial Leverage and offers a plain-language map of the leading systems and how to weigh them. The landscape moves every quarter, so treat the specifics as a snapshot and the framework as the durable part.

Five ways to judge any model

Reasoning depth

How well it holds a complex chain of thought without losing the thread.

Instruction following

How closely it sticks to what you actually asked for.

Context capacity

How much you can load into one conversation before it forgets.

Cost efficiency

What the output costs at the volume you actually use.

Domain speciality

Where it is genuinely stronger: code, analysis, research, or writing.
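One way to make the five criteria concrete is a weighted score: rate each candidate from 1 to 5 on every criterion, weight the criteria by how much they matter for the task, and compare the totals. The sketch below is illustrative only. The criterion keys, the weights, and the two candidates with their ratings are all hypothetical placeholders, not measurements of any real model.

```python
# The five criteria from this guide, as short keys (the key names are assumptions).
CRITERIA = ("reasoning", "instructions", "context", "cost", "domain")

def weighted_score(ratings, weights):
    """Combine 1-5 ratings into a single weighted average."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight

# Hypothetical weights for a high-volume summarisation task where cost dominates.
weights = {"reasoning": 1, "instructions": 3, "context": 2, "cost": 4, "domain": 1}

# Placeholder ratings for two unnamed candidates; NOT real benchmark data.
candidates = {
    "candidate_a": {"reasoning": 5, "instructions": 4, "context": 4, "cost": 2, "domain": 4},
    "candidate_b": {"reasoning": 3, "instructions": 4, "context": 3, "cost": 5, "domain": 3},
}

# Rank candidates from best to worst fit for this particular task.
ranked = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```

Changing the weights changes the winner, which is the point: the ranking belongs to the task, not to the model.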

The current flagships

Model	Maker	Context window	Cost (input / output, USD per 1M tokens)	Open weights
Claude Opus 4.6	Anthropic	1M tokens	$5 / $25	No
GPT-5.2	OpenAI	200K tokens	Varies	No
Gemini 3.1 Pro	Google	1M tokens	$2 / $12	No
DeepSeek V3.2	DeepSeek	164K tokens	$0.26 / $0.38	Yes
Llama 4 Maverick	Meta	1M tokens	Varies	Yes

Specifications come from the site capability tracker, which draws on METR, official model cards, and Artificial Analysis. Costs and context windows change often; check the live tools below for the current numbers.
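To see what the per-token prices mean at real volume, multiply them by a month of usage. The sketch below uses the three price pairs listed in the table above (the models marked "Varies" are left out) and a made-up workload of 200M input tokens and 40M output tokens per month. The workload figures and the function name are assumptions for illustration, and the prices are a snapshot that will drift.

```python
# Prices copied from the table in this guide: USD per 1M tokens (input, output).
# Snapshot values only; models listed as "Varies" are omitted.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "DeepSeek V3.2": (0.26, 0.38),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one month's token volume for a model."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out

# Hypothetical workload: 200M input tokens and 40M output tokens per month.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")
```

Because output tokens are priced higher than input tokens in every listed pair, the input-to-output ratio of a workload affects the comparison as much as the total volume does.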

What each one is best for

Claude Opus 4.6

Long-horizon agentic work, deep analysis, and writing. The longest measured autonomous task horizon.

GPT-5.2

Frontier reasoning and software engineering, with strong tool use.

Gemini 3.1 Pro

Research synthesis and multimodal work across a very large context window.

DeepSeek V3.2

Cost-sensitive workloads and self-hosting. Open weights at a fraction of frontier pricing.

Llama 4 Maverick

On-premise and private deployments where open weights and control matter most.

Go deeper with the live tools

This guide is a starting point. For an interactive walk-through of which model fits a specific task, or for live benchmark data that updates as new models ship, use the tools.

Open the model selector

See the live capability tracker