Terrain mark
Confidence ridge
Where fluent structure rises faster than evidence. The question is whether the response shows its footing.

Behavior atlas
WikiLM uses terrain language because model responses shift with prompt pressure, context, source quality, and requested format. A map does not claim the mountain is the same in every season. It gives a reader landmarks for careful movement.
Terrain mark
Where safety, policy, ambiguity, or missing context blocks the path. The useful detail is the stated reason.
Terrain mark
Where a model shortens source material and quietly drops qualifiers, minority cases, or sequence.
Terrain mark
Where a correction improves the answer, changes the task, or exposes what the model could not track.
A term like hallucination can be necessary, but it can also hide several different behaviors: unsupported invention, source confusion, stale memory, exaggerated synthesis, or a confident bridge between facts. The atlas tries to keep these distinctions visible. It describes what the model did, what the prompt asked, and which part of the response carried the risk.
Keeping these distinctions visible makes the site practical for editors, researchers, product teams, and students. Instead of filing every failure under one category, a note can point to the exact terrain: the answer compressed away a condition, refused without explaining a path forward, or improved only after the evidence requirement became explicit. Those distinctions are small, but they change how people design prompts, evaluate outputs, and cite model-assisted work.
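For readers who keep their own notes, the shape of an atlas entry described above could be sketched roughly as follows. This is only an illustration, not a schema the atlas publishes: the names `Behavior`, `AtlasNote`, `model_did`, `prompt_asked`, and `risky_part` are hypothetical, and the example note is invented for demonstration.

```python
from dataclasses import dataclass
from enum import Enum


class Behavior(Enum):
    """The behaviors the text says a term like hallucination can hide."""
    UNSUPPORTED_INVENTION = "unsupported invention"
    SOURCE_CONFUSION = "source confusion"
    STALE_MEMORY = "stale memory"
    EXAGGERATED_SYNTHESIS = "exaggerated synthesis"
    CONFIDENT_BRIDGE = "confident bridge between facts"


@dataclass(frozen=True)
class AtlasNote:
    """Hypothetical note record; field names are assumptions for illustration."""
    behavior: Behavior   # which distinct behavior was observed
    model_did: str       # what the model did
    prompt_asked: str    # what the prompt asked
    risky_part: str      # which part of the response carried the risk

    def summary(self) -> str:
        # One-line label that names the behavior and where the risk sat.
        return f"{self.behavior.value}: {self.risky_part}"


# Invented example note, not taken from the atlas.
note = AtlasNote(
    behavior=Behavior.SOURCE_CONFUSION,
    model_did="Attributed a quotation to the wrong report.",
    prompt_asked="Summarize the two attached reports.",
    risky_part="the attribution in the second paragraph",
)
print(note.summary())
```

Using an enumeration rather than a free-text label is one way to keep the behaviors from collapsing back into a single category; a team could just as reasonably use tags or a controlled vocabulary in a spreadsheet.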