Benchmarking
This section covers how to evaluate OpenSage/SAGE-X agents and measure performance across tasks.
What you'll find here
- How to run built-in evaluations (entry point + workflow)
- How to add a new evaluation/benchmark to the repo
- Recommended reporting and reproducibility practices
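As an illustration of the reporting and reproducibility item above, one common practice is to store a small manifest next to each set of results. The sketch below is not part of OpenSage: the `build_run_manifest` helper, its field names, and the example benchmark and model names are hypothetical, and it uses only the Python standard library.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def build_run_manifest(benchmark: str, seed: int, config: dict) -> dict:
    """Collect details needed to rerun and compare an evaluation (hypothetical schema)."""
    # Hash a canonical JSON form so that identical configs always give the same digest,
    # regardless of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "benchmark": benchmark,
        "seed": seed,
        "config": config,
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    manifest = build_run_manifest(
        "example-benchmark",
        seed=0,
        config={"max_steps": 50, "model": "example-model"},
    )
    print(json.dumps(manifest, indent=2))
```

Saving a manifest like this alongside the scores makes it easier to tell whether two reported results came from the same configuration.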
Entry points
- Evaluation Entry: Batch evaluation workflow and lifecycle.
- Adding Evaluations: How to integrate a new benchmark.