hallucination-benchmark
2/10/2026
An open evaluation dataset for hallucination detection methods.
212 question-response pairs across 7 knowledge domains. Each question is paired with a grounded response and a human-crafted confabulation: invented institutions, fabricated mechanisms, and fictional terminology written by humans rather than generated by an LLM.
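A minimal sketch of how such a paired dataset could be loaded; the file name, JSON layout, and field names below are illustrative assumptions, not the repository's actual schema.

```python
import json

# Hypothetical record layout: each item pairs one question with a grounded
# response and a human-crafted confabulation, tagged by knowledge domain.
# The file name and field names here are assumptions for illustration.
with open("pairs.json") as f:
    pairs = json.load(f)

for item in pairs:
    question = item["question"]           # the shared prompt
    grounded = item["grounded"]           # factually supported response
    confabulated = item["confabulation"]  # human-written fabrication
    domain = item["domain"]               # one of the 7 knowledge domains
```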
Key findings: DGI achieves 0.958 AUROC on LLM-generated confabulations. A single global direction achieves 0.96 AUROC on human-crafted data. Cross-domain detection collapses to chance.
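To make the single-global-direction result concrete, here is a hedged sketch of that style of evaluation: fit one direction over model activations, score each response by its projection onto that direction, and compute AUROC with scikit-learn. The difference-of-means fit and all names below are assumptions for illustration; the paper's actual DGI method may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fit_direction(grounded_acts, confab_acts):
    """Fit one global direction as the difference of class means.
    This fitting choice is an illustrative assumption, not the paper's method."""
    d = confab_acts.mean(axis=0) - grounded_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def auroc_for_direction(direction, grounded_acts, confab_acts):
    """Score every response by its projection onto the direction, then
    compute AUROC for separating confabulations from grounded responses."""
    scores = np.concatenate([grounded_acts @ direction, confab_acts @ direction])
    labels = np.concatenate([np.zeros(len(grounded_acts)), np.ones(len(confab_acts))])
    return roc_auc_score(labels, scores)

# Toy usage: random vectors stand in for model hidden states.
rng = np.random.default_rng(0)
g = rng.normal(0.0, 1.0, size=(212, 64))
c = rng.normal(0.5, 1.0, size=(212, 64))
d = fit_direction(g, c)
print(f"AUROC: {auroc_for_direction(d, g, c):.3f}")
```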
GitHub repository · Paper (PDF)