You can now run model evaluations with the Models CLI

You can now run prompt evaluations from the command line using the new gh models eval command. The command evaluates the prompts defined in a .prompt.yml file with the same built-in evaluators available in the GitHub Models UI, including string match, similarity to expected outputs, custom LLM-as-a-judge evaluators, and more.

This makes it easier to test model quality early and often, right from your terminal or CI workflow.

```bash
gh models eval my_prompt.prompt.yml
```

You’ll get a summary of test results for each case, including model output and evaluation scores.

For programmatic use, you can output results in JSON format:

```bash
gh models eval my_prompt.prompt.yml --json
```

The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
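As a minimal sketch of handing that output to other tools, the script below saves the JSON for one prompt file and checks that it parses before anything downstream reads it. The helper name save_eval_json, the GH_BIN override, and the use of python3 -m json.tool as a validator are illustrative assumptions. The sketch relies only on the output being valid JSON and assumes no particular field names.

```bash
#!/usr/bin/env bash
# Minimal sketch: save `gh models eval --json` output for one prompt file and
# check that it is well-formed JSON before other tools consume it.
set -euo pipefail

# Command used to invoke the GitHub CLI. GH_BIN is a hypothetical override
# (defaulting to `gh`) so the function can be exercised with a stub.
GH_BIN="${GH_BIN:-gh}"

# save_eval_json PROMPT_FILE OUT_FILE  (hypothetical helper name)
# Writes the JSON results to OUT_FILE and returns non-zero if the evaluation
# command fails or its output does not parse as JSON.
save_eval_json() {
  local prompt_file="$1" out_file="$2"
  "$GH_BIN" models eval "$prompt_file" --json > "$out_file" &&
    python3 -m json.tool "$out_file" > /dev/null
}
```

In a CI step this might be called as save_eval_json my_prompt.prompt.yml eval-results.json, which leaves the file for a later step or an artifact upload.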

This release also improves compatibility with the existing GitHub Actions integration for GitHub Models, which makes automated evaluations simpler to run as part of your Actions workflows. For example, you can run evaluations automatically whenever your .prompt.yml file changes.
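Here is a minimal shell sketch of that idea. It assumes a CI job lists the files changed relative to a base ref and evaluates only the changed .prompt.yml files. The helper names, the GH_BIN override, and the base ref you pass in (for example origin/main) are assumptions. In a GitHub Actions run: step, gh also needs a token in the GH_TOKEN environment variable, and the gh-models extension must be installed (gh extension install github/gh-models).

```bash
#!/usr/bin/env bash
# Minimal sketch: evaluate only the .prompt.yml files that changed, e.g. in CI.
set -euo pipefail

# Hypothetical override for the CLI command (defaults to `gh`), useful for stubbing.
GH_BIN="${GH_BIN:-gh}"

# eval_prompt_files (hypothetical helper): read file paths on stdin, run
# `gh models eval` on each path ending in .prompt.yml, skip everything else,
# and return non-zero if any evaluation command failed.
eval_prompt_files() {
  local file status=0
  while IFS= read -r file; do
    case "$file" in
      *.prompt.yml) ;;   # evaluate prompt files
      *) continue ;;     # ignore unrelated changes
    esac
    echo "Evaluating $file"
    # </dev/null keeps the command from consuming the loop's file list on stdin.
    "$GH_BIN" models eval "$file" </dev/null || status=1
  done
  return "$status"
}

# eval_changed_prompts BASE_REF (hypothetical helper): evaluate .prompt.yml
# files added or modified since BASE_REF; --diff-filter=d skips deleted files.
eval_changed_prompts() {
  git diff --name-only --diff-filter=d "$1...HEAD" | eval_prompt_files
}
```

A step might run eval_changed_prompts origin/main. The three-dot diff needs the base branch's history, so a shallow checkout may have to fetch more of it (for example with fetch-depth: 0 on actions/checkout).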

Start building AI apps with GitHub Models today

GitHub Models and all our AI development tooling are available now to all GitHub users in public preview. This includes prompt editing and lightweight evaluations. Try our tools out by enabling them in your repository or organization, or learn more in our documentation.

Help us shape what’s next

The Models CLI is open source on GitHub. Check out the code, file issues, or contribute!

We’re just getting started, and your feedback helps guide our roadmap. Join the community discussion to share your thoughts and connect with other developers building the future of AI on GitHub.
