Standard retrieval-augmented generation (RAG) quality metrics: answer correctness, faithfulness, answer relevancy, context precision, and context recall.
Methodology
Fifty questions were generated from the complete Sherlock Holmes corpus across four question types: inference, multi-hop, cross-story, and analytical. Answers were evaluated using RAGAS 0.4.x with GPT-4o as the judge model. Each method ingests the same chunked corpus, then answers all questions. RAGAS scores are computed per question and then aggregated per method.
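The aggregation step can be sketched in plain Python. This is a minimal illustration, not the benchmark's actual code: it assumes the per-question scores have already been exported as one record per (method, question), and that aggregation is an unweighted mean, which the write-up does not state. The function name `aggregate_scores`, the record keys, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The five metrics named in this benchmark; the snake_case key names are an assumption.
METRICS = (
    "answer_correctness",
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
)


def aggregate_scores(rows):
    """Average per-question metric scores for each method.

    rows: iterable of dicts with a "method" key plus one float per metric.
    Missing or None metric values are skipped rather than counted as zero.
    Returns {method: {metric: mean_score}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for metric in METRICS:
            value = row.get(metric)
            if value is not None:
                buckets[row["method"]][metric].append(value)
    return {
        method: {metric: mean(values) for metric, values in per_metric.items()}
        for method, per_metric in buckets.items()
    }


if __name__ == "__main__":
    # Placeholder records for illustration only; these are NOT benchmark results.
    sample = [
        {"method": "method_a", "question_type": "inference", "faithfulness": 1.0},
        {"method": "method_a", "question_type": "multi-hop", "faithfulness": 0.5},
    ]
    print(aggregate_scores(sample))
```

Skipping missing values instead of treating them as zero is a design choice worth stating alongside any reported numbers, since a judge model can occasionally fail to return a score for a question.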
Neocortex achieves the highest Answer Relevancy score by a wide margin (0.97 vs 0.88) and is competitive on Context Precision. The graph-based retrieval keeps the returned context highly relevant to the query, even when the answer requires cross-story reasoning.