Benchmarking

This section covers how to evaluate OpenSage/SAGE-X agents and measure performance across tasks.

What you'll find here

How to run built-in evaluations (entry point and workflow)
How to add a new evaluation/benchmark to the repo
Recommended reporting and reproducibility practices

Entry points

Evaluation Entry: Batch evaluation workflow and lifecycle.
Adding Evaluations: How to integrate a new benchmark.