RAGAS

What It Measures

Standard retrieval-augmented generation (RAG) quality, measured by five metrics: answer correctness, faithfulness, answer relevancy, context precision, and context recall.
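To make two of these metrics concrete: RAGAS defines faithfulness as the share of claims in the answer that the retrieved context supports, and context recall as the share of claims in the reference answer that can be attributed to the retrieved context. The sketch below shows only that final ratio. It assumes the judge model has already produced one true/false verdict per claim; the helper `supported_fraction` and the verdict lists are hypothetical and are not part of the RAGAS API.

```python
def supported_fraction(verdicts: list[bool]) -> float:
    """Return the fraction of claims judged as supported (0.0 if there are none)."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


# Hypothetical judge verdicts for a single question (illustrative only).
# One entry per claim extracted from the generated answer:
answer_claims_supported = [True, True, False, True]
# One entry per claim extracted from the reference answer:
reference_claims_retrieved = [True, False, True]

faithfulness = supported_fraction(answer_claims_supported)
context_recall = supported_fraction(reference_claims_retrieved)

print(f"faithfulness={faithfulness:.2f}, context_recall={context_recall:.2f}")
```

In the real pipeline the claim extraction and the verdicts come from the judge model, so the scores inherit whatever errors the judge makes.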

Methodology

The benchmark uses 50 questions generated from the complete Sherlock Holmes corpus, spanning four question types: inference, multi-hop, cross-story, and analytical. Answers are evaluated with RAGAS 0.4.x, using GPT-4o as the judge model. Each method ingests the same chunked corpus and then answers all 50 questions. RAGAS scores are computed per question and then aggregated.
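The aggregation step can be sketched as follows. This is a minimal illustration that assumes aggregation means an unweighted mean over questions, which the methodology does not state explicitly. The record layout, the `aggregate` helper, and the score values are hypothetical placeholders, not benchmark data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-question scores (placeholder values, not benchmark results).
records = [
    {"question_type": "inference", "faithfulness": 1.0},
    {"question_type": "inference", "faithfulness": 0.5},
    {"question_type": "multi-hop", "faithfulness": 0.75},
    {"question_type": "cross-story", "faithfulness": 1.0},
    {"question_type": "analytical", "faithfulness": 0.75},
]


def aggregate(rows, metric):
    """Return (overall mean, mean per question type) for one metric.

    Assumption: every question carries equal weight.
    """
    overall = mean(row[metric] for row in rows)
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["question_type"]].append(row[metric])
    return overall, {qtype: mean(scores) for qtype, scores in by_type.items()}


overall, per_type = aggregate(records, "faithfulness")
print(f"overall: {overall:.2f}")
for qtype, score in per_type.items():
    print(f"{qtype}: {score:.2f}")
```

Reporting the per-type breakdown alongside the overall mean shows whether a method's score is driven by one question type, such as cross-story questions.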

Methods Compared

neocortex_v1, fastgraphrag, gemini_vdb, mem0, supermemory

Results

| Metric | Neocortex | Best Competitor Score | Best Competitor |
| --- | --- | --- | --- |
| Answer Relevancy | 0.97 | 0.88 | supermemory |
| Context Precision | 0.80 | 0.78 | supermemory |
| Faithfulness | 0.97 | 0.79 | gemini_vdb |
| Answer Correctness | 0.78 | 0.59 | gemini_vdb |
| Context Recall | 0.78 | 0.70 | gemini_vdb |
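The margins behind these results can be recomputed directly from the reported scores. The sketch below derives the absolute gap and the relative gap (as a percentage of the competitor's score) for each metric. The `results` dictionary copies the values above; the `margins` helper is an illustrative name, not part of any benchmark tooling.

```python
# Scores copied from the results above: (Neocortex, best competitor score, competitor).
results = {
    "Answer Relevancy": (0.97, 0.88, "supermemory"),
    "Context Precision": (0.80, 0.78, "supermemory"),
    "Faithfulness": (0.97, 0.79, "gemini_vdb"),
    "Answer Correctness": (0.78, 0.59, "gemini_vdb"),
    "Context Recall": (0.78, 0.70, "gemini_vdb"),
}


def margins(neocortex: float, competitor: float) -> tuple[float, float]:
    """Return (absolute gap, relative gap in percent of the competitor score)."""
    absolute = round(neocortex - competitor, 2)
    relative = round(100 * (neocortex - competitor) / competitor, 1)
    return absolute, relative


gaps = {metric: margins(neo, comp) for metric, (neo, comp, _) in results.items()}

for metric, (absolute, relative) in gaps.items():
    competitor = results[metric][2]
    print(f"{metric}: +{absolute:.2f} ({relative:.1f}%) over {competitor}")
```

The smallest absolute gap is on Context Precision (0.02) and the largest is on Answer Correctness (0.19). With 50 questions, small gaps should be read cautiously, since no confidence intervals are reported.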

Analysis

Neocortex achieves the highest Answer Relevancy score (0.97 vs 0.88 for supermemory) and narrowly leads on Context Precision (0.80 vs 0.78). The largest gaps are on Faithfulness (0.97 vs 0.79) and Answer Correctness (0.78 vs 0.59), both against gemini_vdb. The graph-based retrieval ensures that returned context is highly relevant to the query, even when the answer requires cross-story reasoning.


