Improvement

June 6, 2025 | 1 minute read

You can now run model evaluations with the Models CLI

You can now run prompt evaluations from the command line using the new gh models eval command. This evaluates prompts defined in a .prompt.yml file using the same built-in evaluators available in the GitHub Models UI, including string match, similarity to expected outputs, custom LLM-as-a-judge evaluators, and more.
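As a rough illustration of what such a file might contain, the sketch below writes a small prompt file from the shell. The field names (`messages`, `testData`, `evaluators`, `uses: github/similarity`) reflect the `.prompt.yml` format as best understood here, and the prompt text, model ID, and test case are made up for the example, so check the GitHub Models documentation before relying on any of them.

```bash
# Write a hypothetical prompt file for evaluation.
# ASSUMPTION: keys and evaluator names follow the .prompt.yml format as
# documented for GitHub Models; the content itself is illustrative only.
cat > my_prompt.prompt.yml <<'EOF'
name: Text Summarizer
description: Summarizes input text concisely
model: openai/gpt-4o-mini
messages:
  - role: system
    content: You are a text summarizer.
  - role: user
    content: |
      Summarize the given text, beginning with "Summary -":
      {{input}}
testData:
  - input: The quick brown fox jumped over the lazy dog.
    expected: Summary - A fox jumped over a dog.
evaluators:
  # String match evaluator: checks the output prefix.
  - name: Starts with the summary prefix
    string:
      startsWith: 'Summary -'
  # Built-in similarity evaluator: compares output to "expected".
  - name: Similarity to expected output
    uses: github/similarity
EOF
```

Each entry under `testData` is one test case, and every evaluator is applied to the model output for that case.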

This makes it easier to test model quality early and often, right from your terminal or CI workflow.

```bash
gh models eval my_prompt.prompt.yml
```

You’ll get a summary of test results for each case, including model output and evaluation scores.

For programmatic use, you can output results in JSON format:

```bash
gh models eval my_prompt.prompt.yml --json
```

The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
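One way a pipeline might consume that output is sketched below: run the evaluation, pull a failure count out of the JSON, and fail the step if it is non-zero. The `failedTests` field name is an assumption about the summary statistics, and `failed_count` and `check_eval` are hypothetical helper names, so inspect the real output before adapting this. The sketch uses `grep` to avoid extra dependencies; a JSON-aware tool such as `jq` would be more robust.

```bash
#!/usr/bin/env bash
# Sketch: fail a CI step when an evaluation reports failed tests.
# ASSUMPTION: the JSON contains a numeric "failedTests" field in its summary.

# Print the first "failedTests" count found in the JSON on stdin.
failed_count() {
  grep -o '"failedTests"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Run the evaluation for one prompt file; return non-zero on any failure.
check_eval() {
  local prompt_file="$1" results failed
  results="$(gh models eval "$prompt_file" --json)" || return 1
  failed="$(printf '%s' "$results" | failed_count)"
  if [ -z "$failed" ]; then
    echo "could not find failedTests in the output" >&2
    return 2
  fi
  echo "failed tests: $failed"
  [ "$failed" -eq 0 ]
}
```

A step could then call `check_eval my_prompt.prompt.yml` and let the return code decide whether the job passes.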

This new release also improves compatibility with the existing GitHub Actions integration for models, making automated evaluations simpler to run as part of your Actions workflow. For example, you can run evaluations automatically in Actions whenever your .prompt.yml file changes.
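The script below is a minimal sketch of what a workflow step could run for that case; it is not the official workflow. It assumes the Models CLI extension is already installed and authenticated on the runner, that the list of changed files is supplied on stdin (for example from `git diff --name-only`), and that `gh models eval` exits non-zero when an evaluation fails. The function name `eval_changed_prompts` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a script that a workflow step could run.
# ASSUMPTIONS: the Models CLI extension is installed, gh is authenticated,
# and `gh models eval` returns a non-zero status when an evaluation fails.

# Evaluate every *.prompt.yml path read from stdin.
# Returns non-zero if any evaluation command fails.
eval_changed_prompts() {
  local path status=0
  while IFS= read -r path; do
    case "$path" in
      *.prompt.yml)
        echo "Evaluating $path"
        # Redirect stdin so gh cannot consume the remaining file list.
        gh models eval "$path" < /dev/null || status=1
        ;;
    esac
  done
  return "$status"
}
```

In a workflow triggered by pushes that touch `**/*.prompt.yml`, a step might feed this function the changed paths, for instance with `git diff --name-only "$BASE" "$HEAD" | eval_changed_prompts`, where `BASE` and `HEAD` are placeholder commit references.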

Start building AI apps with GitHub Models today

GitHub Models and all our AI development tooling are available now to all GitHub users in public preview. This includes prompt editing and lightweight evaluations. Try our tools out by enabling them in your repository or organization, or learn more in our documentation.

Help us shape what’s next

The Models CLI is open source on GitHub. Check out the code, file issues, or contribute!

We’re just getting started, and your feedback helps guide our roadmap. Join the community discussion to share your thoughts and connect with other developers building the future of AI on GitHub.
