Architecture
t2s-metrics is organized as a modular evaluation pipeline.
Main flow
1. JsonlEval iterates query cases from JSONL input.
2. Experiment creates context and evaluation engine.
3. EvaluationEngine runs metrics per case.
4. MeanAggregator computes summary values.
5. Export utilities write JSON result files.
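The flow above can be sketched in a few lines of plain Python. This is a minimal illustration of the read, evaluate, aggregate, export sequence, not the t2s-metrics API: the function names (iter_cases, exact_match, run_pipeline) and the case fields ("gold", "predicted") are hypothetical and chosen only for this example.

```python
import json
from statistics import mean


def iter_cases(lines):
    """Yield one query case (a dict) per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def exact_match(case):
    """Toy metric (hypothetical): 1.0 if the predicted query equals the gold query."""
    return 1.0 if case["predicted"].strip() == case["gold"].strip() else 0.0


def run_pipeline(lines, metrics):
    """Run every metric on every case, then average each metric over all cases."""
    per_case = []
    for case in iter_cases(lines):
        per_case.append({name: fn(case) for name, fn in metrics.items()})
    summary = {name: mean(row[name] for row in per_case) for name in metrics}
    return {"cases": per_case, "summary": summary}


# Two in-memory JSONL lines stand in for an input file.
jsonl = [
    '{"gold": "SELECT ?s WHERE { ?s ?p ?o }", "predicted": "SELECT ?s WHERE { ?s ?p ?o }"}',
    '{"gold": "ASK { ?s ?p ?o }", "predicted": "SELECT * WHERE { ?s ?p ?o }"}',
]
result = run_pipeline(jsonl, {"exact_match": exact_match})

# Export step: serialize the result as JSON (printed here instead of written to disk).
print(json.dumps(result["summary"]))
```

Keeping per-case scores alongside the summary mirrors the split between the per-case engine step and the mean aggregation step, so the exported JSON can carry both.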
Core packages
- t2smetrics/core: context, engine, experiment orchestration, export
- t2smetrics/metrics: metric definitions and registry
- t2smetrics/execution: local and endpoint query execution backends
- t2smetrics/llm: optional LLM backend for judge-style metrics
- t2smetrics/representation: SPARQL preprocessing/tokenization utilities
Runtime constraints
Metrics can declare execution or LLM requirements.
The engine enforces requirements before computing each metric.
ndcg is skipped when order_matters is false in the input case.
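One way to picture these constraints is a small pre-check that runs before each metric. The sketch below is an assumption-laden illustration, not the engine's actual code: MetricSpec, skip_reason, and the metric names exact_match and result_f1 are hypothetical. It also assumes that an unmet requirement leads to a skip (the real engine may raise instead) and that a missing order_matters field is treated as false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical metric declaration with its runtime requirements."""
    name: str
    needs_execution: bool = False
    needs_llm: bool = False


def skip_reason(metric, case, has_executor, has_llm):
    """Return why a metric cannot run on this case, or None if it can."""
    if metric.needs_execution and not has_executor:
        return "no execution backend configured"
    if metric.needs_llm and not has_llm:
        return "no LLM backend configured"
    # Assumption: a case without the order_matters field counts as false.
    if metric.name == "ndcg" and not case.get("order_matters", False):
        return "order_matters is false"
    return None


metrics = [
    MetricSpec("exact_match"),                      # hypothetical, no requirements
    MetricSpec("result_f1", needs_execution=True),  # hypothetical, needs query execution
    MetricSpec("ndcg"),                             # named in the text above
]
case = {"order_matters": False}

runnable = [
    m.name
    for m in metrics
    if skip_reason(m, case, has_executor=True, has_llm=False) is None
]
print(runnable)
```

Returning a reason string rather than a boolean makes it easy to record in the results why a metric has no score for a given case.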