# Architecture

`t2s-metrics` is organized as a modular evaluation pipeline.

## Main flow

1. `JsonlEval` iterates query cases from JSONL input.
2. `Experiment` creates context and evaluation engine.
3. `EvaluationEngine` runs metrics per case.
4. `MeanAggregator` computes summary values.
5. Export utilities write JSON result files.

## Core packages

- `t2smetrics/core`: context, engine, experiment orchestration, export
- `t2smetrics/metrics`: metric definitions and registry
- `t2smetrics/execution`: local and endpoint query execution backends
- `t2smetrics/llm`: optional LLM backend for judge-style metrics
- `t2smetrics/representation`: SPARQL preprocessing/tokenization utilities

## Runtime constraints

- Metrics can declare execution or LLM requirements.
- The engine enforces requirements before computing each metric.
- `ndcg` is skipped when `order_matters` is false in the input case.
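
The runtime constraints can be illustrated with a small, self-contained sketch. It is not the actual `t2smetrics` API: `MetricSpec`, `Capabilities`, `skip_reason`, `evaluate_case`, and `mean_aggregate` are hypothetical names introduced only for this example. It also assumes that an unmet requirement causes the metric to be skipped (the real engine may raise an error instead), that a missing `order_matters` field is treated as false, and that means are taken only over cases where the metric was actually computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Capabilities:
    """What the current run can provide (hypothetical structure)."""
    has_execution: bool = False
    has_llm: bool = False


@dataclass
class MetricSpec:
    """A metric plus the requirements it declares (hypothetical structure)."""
    name: str
    compute: Callable[[dict], float]
    requires_execution: bool = False
    requires_llm: bool = False


def skip_reason(metric: MetricSpec, case: dict, caps: Capabilities) -> Optional[str]:
    """Return why a metric cannot run for this case, or None if it can."""
    if metric.requires_execution and not caps.has_execution:
        return "no execution backend configured"
    if metric.requires_llm and not caps.has_llm:
        return "no LLM backend configured"
    # Assumption: a missing `order_matters` field counts as false.
    if metric.name == "ndcg" and not case.get("order_matters", False):
        return "order_matters is false"
    return None


def evaluate_case(metrics: List[MetricSpec], case: dict, caps: Capabilities) -> Dict[str, float]:
    """Check requirements first, then compute only the metrics that pass."""
    results: Dict[str, float] = {}
    for metric in metrics:
        if skip_reason(metric, case, caps) is None:
            results[metric.name] = metric.compute(case)
    return results


def mean_aggregate(per_case: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the cases in which it was computed."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for scores in per_case:
        for name, value in scores.items():
            totals[name] = totals.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}
```

In this sketch the requirement check is kept separate from the metric's `compute` function, so a metric never has to know which backends are configured. Whether skipped metrics should be excluded from the mean or counted as zero is a design choice that the sketch does not settle for the real `MeanAggregator`.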