AI & ML
The Alignment Tax, Measured
Safety training was supposed to cost capability. New evaluations suggest the bill is smaller — and stranger — than anyone expected.
by Dr. Priya Nair, Machine Learning · June 11, 2026 · 7 min read
The long-standing assumption was that making a model safer also made it duller: every refusal, every hedge, every guardrail shaving a little off raw capability. The 'alignment tax' was treated as an unfortunate but unavoidable cost.
Careful evaluation complicates the story. On most benchmarks, well-aligned models lose almost nothing; on some they improve, because the same training that teaches restraint also teaches the model to follow instructions precisely.
Where the tax does appear, it is narrow and revealing: it falls on tasks that reward the very behaviors safety training suppresses, such as unfiltered speculation and confident extrapolation past the evidence.
The finding reframes the debate. Alignment is not a uniform drag on intelligence but a redistribution of it, sharpening some faculties while dulling others. The question is whether we are trimming the right ones.