
Small Models, Big Context

The frontier is quietly shifting from parameter count to context length. What a model can hold in mind may matter more than how much it knows.

by Dr. Priya Nair, Machine Learning · June 24, 2026 · 8 min read

For two years the headline number was parameter count. The quieter revolution has been the context window (the span of text, code, or transcript a model can attend to at once), which has grown from a paragraph to a small library.

A modest model with a million-token window behaves differently than its parameter count suggests. It can read an entire codebase, a quarter's worth of correspondence, or a patient's full chart before answering, substituting retrieval-in-context for knowledge baked into weights.

The trade is real: in standard self-attention every token is compared with every other, so cost grows roughly with the square of sequence length, and most of those tokens are noise for any given question. The research frontier is now about which tokens earn their place: learned compression, hierarchical caches, and routing that decides what to keep.
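To put rough numbers on that cost, here is a back-of-envelope sketch. It assumes plain, full self-attention and 2-byte score entries; the function names are illustrative and not drawn from any particular library.

```python
def attention_score_entries(seq_len: int) -> int:
    """Query-key pairs scored by one full self-attention head: n * n."""
    return seq_len * seq_len


def score_matrix_gib(seq_len: int, bytes_per_entry: int = 2) -> float:
    """GiB needed to hold one head's full score matrix.

    Assumption: every entry is stored at once, at 2 bytes each by default.
    """
    return attention_score_entries(seq_len) * bytes_per_entry / 2**30


# Compare a short prompt, a long document, and a million-token window.
for n in (1_000, 100_000, 1_000_000):
    pairs = attention_score_entries(n)
    gib = score_matrix_gib(n)
    print(f"{n:>9,} tokens: {pairs:.1e} pairs, {gib:,.3f} GiB per head")
```

Going from a thousand tokens to a million multiplies the number of scored pairs by a million, not a thousand. Practical systems avoid storing the whole matrix at once, but under full attention the number of comparisons still grows this way, which is why deciding what to keep matters so much.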

If the last era asked how much a model could memorize, this one asks how much it can hold in working memory at the moment of decision. The answer is reshaping what 'small' even means.

