Outpaced series · Part 1 of 10

Something Big Is Happening

The February 2026 releases that changed the trajectory.

AI Agents · Epoch Change · Feb 2026

On 11 February 2026, AI researcher and entrepreneur Matt Shumer posted a single observation to X that crystallised what many in the field had been sensing for weeks.

The post went viral. Not because it was hyperbolic, but because it wasn't: while most people outside the AI industry had no idea anything was happening, insiders shared the same uneasy sense of foreboding.

February 2026 was not a month of incremental progress. It was the month that frontier AI crossed a threshold. Every major laboratory, and several open-source teams, released something that would have been front-page news twelve months earlier. Taken individually, each announcement was significant. Taken together, they represented something more troubling: evidence that the acceleration curve the industry had been warning about was not slowing down. It was steepening.

The month that changed the trajectory

To understand why February 2026 matters, you need to see the releases in sequence. Not as isolated product launches, but as a pattern.

29 January 2026

METR publishes AI capability doubling time: 89 days

Independent researchers at METR (Model Evaluation and Threat Research) release a rigorous analysis showing frontier AI capabilities are doubling every 89 days. Not computational power. Capability. The ability to perform real-world tasks that previously required human expertise.

Source: METR

3 February 2026

Alibaba releases Qwen3-Coder-Next as open-weight

An 80-billion-parameter model that activates only 3 billion parameters at inference. Scores 70.6% on SWE-Bench Verified, matching models 12 times its active size. Released under Apache 2.0. The signal: frontier coding capability is now available free, to anyone, on consumer hardware.

Source: Qwen Team

5 February 2026

Anthropic releases Claude Opus 4.6 with agent teams

One-million-token context window. Agent teams that coordinate multiple AI instances in parallel. Scores 76% on MRCR v2, a needle-in-a-haystack benchmark, compared to 18.5% for its predecessor. This is not a chatbot update. It is the infrastructure for autonomous AI workflows.

Source: Anthropic

5 February 2026

OpenAI launches GPT-5.3-Codex: the model that helped build itself

The same day as Opus 4.6. OpenAI's most capable agentic coding model, 25% faster than its predecessor. The headline: GPT-5.3-Codex is the first model 'instrumental in creating itself.' Early versions debugged their own training, managed their own deployment, and diagnosed their own evaluations. OpenAI's system card rates it 'High' capability in cybersecurity for the first time.

Source: OpenAI

7 February 2026

xAI releases Grok 3 with reasoning mode

Elon Musk's xAI releases Grok 3, competitive on coding and mathematics benchmarks, integrated with real-time X platform data. A company that did not exist three years ago is producing frontier models.

Source: xAI

14 February 2026

Google DeepMind ships Gemini 3.1 Pro

Its ARC-AGI-2 score lands at 77.1%, more than double its predecessor's result. ARC-AGI-2 is designed specifically to test novel reasoning: problems the model has never seen before. Doubling the score in a single generation is not normal.

Source: Google DeepMind

Mid-February 2026

DeepSeek prepares R2 and V4 for release

China's DeepSeek, the company that shocked the industry with R1 in January 2025, signals the imminent release of two new models: R2 (reasoning, multimodal, 100+ languages) and V4 (next-generation foundation model). The competitive pressure from Chinese open-source labs is relentless.

Source: DeepSeek

24 February 2026

Anthropic raises $30 billion at $380 billion valuation

The largest private funding round in history. Not a technology announcement, but a signal of institutional conviction. The capital markets are pricing AGI timelines in years, not decades.

Source: Anthropic

Eight events in twenty-six days. From three countries. Spanning proprietary labs, open-source teams, and capital markets. Each one demonstrating a different dimension of capability growth. And this is only the public record: the internal and proprietary work behind these releases has not been disclosed, and is unlikely to be any less advanced than what was published.

The AI that helped build itself

Of all the announcements in February 2026, one deserves particular attention. Not because it is the most capable model. Because of what it represents.

OpenAI's GPT-5.3-Codex is the first model that the company describes as "instrumental in creating itself." Early versions of the model were used to debug their own training runs, manage their own deployment infrastructure, and diagnose their own evaluation results. When engineers encountered edge cases during alpha testing, they used the model to identify context-rendering bugs and to root-cause low cache-hit rates. Researchers had it analyse its own performance improvements across session logs.

OpenAI's system card is careful to note that this does not constitute full recursive self-improvement. The model does not redesign its own architecture or set its own training objectives. But the direction of travel is unmistakable. The gap between "a tool used by engineers" and "a system that participates in its own development" has narrowed to a semantic distinction.

This matters for Part 3 of this series, which examines what happens when the recursive loop accelerates. For now, note the date: 5 February 2026. The same day Anthropic released agent teams in Opus 4.6. Two companies, one day, both crossing the threshold from tool to autonomous agent.

"GPT-5.3-Codex is the first model that was instrumental in creating itself."

OpenAI, 5 February 2026

The open-source front

The proprietary labs were not the only story. February 2026 also demonstrated that the gap between closed and open models is collapsing.

Alibaba's Qwen3-Coder-Next, released 3 February under an Apache 2.0 licence, achieved 70.6% on SWE-Bench Verified with just 3 billion active parameters. For context, DeepSeek V3.2 requires 37 billion active parameters to score 70.2% on the same benchmark: roughly twelve times the per-token compute for slightly lower performance. A model that runs on consumer hardware is now matching models that require enterprise infrastructure.
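
How a model can store 80 billion parameters yet touch only 3 billion per token is worth a brief aside. The standard technique is a sparse mixture-of-experts layer: a lightweight router scores a pool of expert sub-networks and sends each token through only the top few. The sketch below is a generic, toy-scale illustration of that routing principle; it is not Qwen3-Coder-Next's actual architecture, and every size and name in it is invented.

```python
import numpy as np

# Toy sparse mixture-of-experts layer: many parameters stored, few used per token.
# Generic illustration only -- NOT Qwen3-Coder-Next's actual architecture.
N_EXPERTS, TOP_K, DIM = 64, 2, 512          # invented toy sizes

rng = np.random.default_rng(0)
experts = rng.normal(0.0, 0.02, size=(N_EXPERTS, DIM, DIM))  # expert weight matrices
router = rng.normal(0.0, 0.02, size=(DIM, N_EXPERTS))        # routing weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts; the rest stay idle."""
    logits = x @ router                       # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]         # indices of the k best-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over the chosen experts only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=DIM))
# Stored expert parameters: 64 * 512 * 512; touched per token: 2 * 512 * 512.
# That 2-of-64 ratio is the same principle as activating 3B of an 80B model.
```

The commercial point follows directly: per-token inference cost tracks active parameters, not stored parameters, which is how a 3-billion-active model can undercut a 37-billion-active one on cost while matching its benchmark score.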

Meanwhile DeepSeek, the Hangzhou-based lab behind R1, was preparing two successor models. DeepSeek R2 (reasoning, multimodal, over 100 languages) and DeepSeek V4 (next-generation foundation model) were both signalled for imminent release. The competitive pressure from Chinese open-source labs shows no sign of slowing.

The implication for national policy is stark. Even if Western governments could regulate proprietary AI labs, they cannot regulate open-weight models released under permissive licences from teams in Hangzhou. The capability is diffusing globally, in the open, faster than any governance framework can contain it.

Why this matters

The significance of February 2026 is not any single release. It is the compression. The gap between major capability jumps is shrinking. In 2023, a meaningful advance might arrive once per quarter. In 2024, once per month. By February 2026, meaningful advances were arriving every few days.

This is what exponential acceleration looks like from the inside. Not a sudden explosion, but a gradual tightening of the interval between milestones until the pace becomes impossible for institutions to match.
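
For readers who want the arithmetic rather than the metaphor: a constant 89-day doubling time compounds brutally fast. The snippet below is illustrative only. It assumes METR's figure simply holds over the horizon shown, which is a strong assumption; the 89 days is an empirical fit, not a law.

```python
# Compounding implied by an 89-day capability doubling time, if it held constant.
# Illustrative arithmetic only; 89 days is METR's reported empirical estimate.
DOUBLING_DAYS = 89

def growth_factor(days: int) -> float:
    """Capability multiple after `days` under a constant doubling time."""
    return 2 ** (days / DOUBLING_DAYS)

for label, days in [("6 months", 182), ("1 year", 365), ("2 years", 730)]:
    print(f"{label}: ~{growth_factor(days):.0f}x")
# Prints roughly: 6 months ~4x, 1 year ~17x, 2 years ~295x
```

At that rate, a five-year planning cycle spans roughly twenty doublings.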

The Australian Government's national AI strategy, published in 2024, operates on a multi-year implementation cycle. The technology it seeks to govern is operating on a multi-week cycle. This mismatch is not a policy failure in the conventional sense. It is a structural incompatibility between how institutions plan and how the technology moves.

"Institutions are planning on 5-year cycles. The technology is moving on 3-month cycles. The gap between these two timelines is where the damage will be done."

The infrastructure signal

Look beyond the models for a moment. Look at the money.

By February 2026, the combined AI infrastructure commitment from the major technology companies exceeded $700 billion for 2026 alone. Microsoft, Google, Amazon, Meta, and others are building data centres, securing energy contracts, and stockpiling semiconductors at a pace that has no parallel in the history of corporate investment.

These are not speculative bets. These companies have access to internal benchmarks and capability assessments that the public does not. Their capital allocation decisions are the clearest signal available about what they expect to happen next. And what they expect, evidently, is that the models being trained today will justify hundreds of billions in infrastructure within the next 24 months.

Anthropic's $30 billion round at a $380 billion valuation is particularly instructive. The company was founded in 2021. In less than five years, investors have priced it above the market capitalisation of companies like Lockheed Martin, Goldman Sachs, and BHP. That valuation is not based on current revenue. It is based on the capability trajectory of the models being developed in its labs.

What this series will cover

Outpaced is a 10-part investigation built on a single premise: the acceleration documented in February 2026 is not a one-off event. It is the visible surface of a structural shift that will reshape employment, taxation, national power, and daily life in Australia within the next decade.

Each subsequent instalment will examine one dimension of this shift. Part 2 examines the METR measurement in detail: what the 89-day doubling time means when translated to everyday occupations. Part 3 looks at the recursive loop, and what it means that GPT-5.3-Codex was instrumental in creating itself. Parts 4 through 9 trace the consequences across the Australian economy, energy systems, semiconductor supply chains, robotics, logistics, and space infrastructure. Part 10 asks the question nobody wants to answer: what does this mean for Australian families between now and 2035?

Every claim will be sourced. Every statistic will link to primary data. Where estimates are used, they will be clearly labelled.

This is not prediction. It is pattern recognition. The trends documented here are already underway. The only question is speed.

Follow the investigation

Part 2, "The Clock," examines what the 89-day doubling time means when translated to everyday jobs. Subscribe to get it when it publishes.
