Teams can now validate any model or inference router configuration on their own data before production. Run structured LLM-as-a-Judge evaluations across catalog models, fine-tuned models, BYOM imports, and router setups without stitching together a separate evaluation stack.