Benchmarking

This section covers how to evaluate OpenSage/SAGE-X agents and measure their performance across tasks.

What you'll find here

  • How to run built-in evaluations (entry point + workflow)
  • How to add a new evaluation/benchmark to the repo
  • Recommended reporting and reproducibility practices

Entry points