hallucination-benchmark
2/10/2026
An open evaluation dataset for hallucination detection methods.
212 question-response pairs across 7 knowledge domains. Each question is paired with a grounded response and a human-crafted confabulation: invented institutions, fabricated mechanisms, and fictional terminology written by humans rather than generated by an LLM.
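A minimal sketch of how such a paired dataset could be loaded; the file name, JSON layout, and field names below are illustrative assumptions, not the repository's actual schema.

```python
import json

# Hypothetical record layout: each item pairs one question with a grounded
# response and a human-crafted confabulation, tagged by knowledge domain.
# The file name and field names here are assumptions for illustration.
with open("pairs.json") as f:
    pairs = json.load(f)

for item in pairs:
    question = item["question"]           # the shared prompt
    grounded = item["grounded"]           # factually supported response
    confabulated = item["confabulation"]  # human-written fabrication
    domain = item["domain"]               # one of the 7 knowledge domains
```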
Key findings: DGI achieves 0.958 AUROC on LLM-generated confabulations. A single global direction achieves 0.96 AUROC on human-crafted data. Cross-domain detection collapses to chance.
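To make the single-global-direction result concrete, here is a hedged sketch of that style of evaluation: fit one direction over model activations, score each response by its projection onto that direction, and compute AUROC with scikit-learn. The difference-of-means fit and all names below are assumptions for illustration; the paper's actual DGI method may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fit_direction(grounded_acts, confab_acts):
    """Fit one global direction as the difference of class means.
    This fitting choice is an illustrative assumption, not the paper's method."""
    d = confab_acts.mean(axis=0) - grounded_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def auroc_for_direction(direction, grounded_acts, confab_acts):
    """Score every response by its projection onto the direction, then
    compute AUROC for separating confabulations from grounded responses."""
    scores = np.concatenate([grounded_acts @ direction, confab_acts @ direction])
    labels = np.concatenate([np.zeros(len(grounded_acts)), np.ones(len(confab_acts))])
    return roc_auc_score(labels, scores)

# Toy usage: random vectors stand in for model hidden states.
rng = np.random.default_rng(0)
g = rng.normal(0.0, 1.0, size=(212, 64))
c = rng.normal(0.5, 1.0, size=(212, 64))
d = fit_direction(g, c)
print(f"AUROC: {auroc_for_direction(d, g, c):.3f}")
```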
GitHub repository · Paper (PDF)