First Principles: A Clean & Minimal Research Journal

AI & ML

The Alignment Tax, Measured

Safety training was supposed to cost capability. New evaluations suggest the bill is smaller — and stranger — than anyone expected.

by Dr. Priya Nair, Machine Learning · June 11, 2026 · 7 min read

It was long assumed that making a model safer made it duller: every refusal, every hedge, every guardrail shaving a little off raw capability. The 'alignment tax' was treated as an unfortunate but unavoidable cost.

Careful evaluation complicates the story. On most benchmarks, well-aligned models lose almost nothing; on some they improve, because the same training that teaches restraint also teaches the model to follow instructions precisely.

Where the tax does appear, it is narrow and revealing: it falls on tasks that reward the very behaviors safety training suppresses, such as unfiltered speculation and confident extrapolation past the evidence.

The finding reframes the debate. Alignment is not a uniform drag on intelligence but a redistribution of it, sharpening some faculties while dulling others. The question is whether we are trimming the right ones.
