improved

Improved Agent Evaluations Experience

We’ve redesigned agent evaluations to make testing AI agents faster, clearer, and easier to adopt. The update adds goal-oriented metric groups, example datasets, clearer error messages, and deep trace integration, so you can identify issues, optimize agent behavior, and build safer, more reliable, production-ready systems.
