Evaluation dashboard

Every answer is evaluated: the golden set runs through grounded QA and is scored on citation coverage, faithfulness, and keyword recall, including a negative case that must refuse with NOT_FOUND.
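As a rough illustration of the scoring described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the dashboard's actual implementation: the names `GoldenCase`, `keyword_recall`, `citation_coverage`, and `score_case` are hypothetical, citations are assumed to appear as bracketed numbers such as `[1]`, and a refusal is assumed to be the exact string NOT_FOUND. Faithfulness is left out because it typically needs a judge model or an entailment check rather than string matching.

```python
import re
from dataclasses import dataclass, field

NOT_FOUND = "NOT_FOUND"  # refusal token named in the dashboard text


@dataclass
class GoldenCase:
    """Hypothetical golden-set entry; field names are illustrative."""
    question: str
    expected_keywords: list[str] = field(default_factory=list)
    expect_refusal: bool = False  # True for the negative case


def keyword_recall(answer: str, keywords: list[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    if not keywords:
        return 1.0
    text = answer.lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    return hits / len(keywords)


def citation_coverage(answer: str) -> float:
    """Fraction of sentences carrying at least one [n] citation marker.

    Assumes markers sit before the sentence-ending punctuation.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[\d+\]", s))
    return cited / len(sentences)


def score_case(case: GoldenCase, answer: str) -> dict:
    """Score one answer; the negative case only checks for the refusal token."""
    if case.expect_refusal:
        return {"refusal_correct": answer.strip() == NOT_FOUND}
    return {
        "keyword_recall": keyword_recall(answer, case.expected_keywords),
        "citation_coverage": citation_coverage(answer),
    }
```

In this sketch, a case with `expect_refusal=True` passes only when the answer is exactly NOT_FOUND; any other case gets a score between 0 and 1 for each metric.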

No eval runs yet. Click Run evals to score the golden set.