Evaluate skill quality using scenarios
This page covers scenario-based evals for skills and plugins. If you want to run evals against your codebase using real commit diffs, see Evaluate your codebase instead.
Tessl lets you run end-to-end task evaluations for your skills directly from the CLI. You generate a set of scenarios, run an agent against them, and see how well it performs — with and without your skill injected. This workflow is designed for fast, repeatable iteration as you develop and refine a skill, without building your own eval harness.
TL;DR
Generate scenarios against your skill, then test how the agent does with and without your skill present. For example:
You have a skill that says how to communicate with a system.
Generate a set of scenarios using Tessl against that skill, or write your own, on how to communicate with that system.
Compare how the agent performs on each scenario with and without the skill to determine how effective the skill is.
There are different tools, and even different evaluation types, in the Tessl toolkit for improving your skills. Refer to the Overview for more information on reviews, the different kinds of evals, and the skills available.
What you can do with scenario-based evals
With self-serve evaluations, you can:
Generate evaluation scenarios automatically using Tessl's scenario generation skill
Generate scenarios directly from the CLI with tessl scenario generate
Define custom evaluation scenarios manually
Run end-to-end evaluations from the CLI
View and track evaluation results in the Tessl web UI
Prerequisites
Logged into Tessl.
Tessl installed (latest version).
A skill packaged in a plugin available locally.
Your project directory linked to a Tessl project (see Step 0).
Step 0: Link your project to Tessl
Your project directory must be linked to a Tessl project before you can generate or run evaluations.
Link your project directory to a Tessl project:
If the project already exists, use tessl project link instead. See Manage projects from the CLI for the full setup.
If the directory is already linked to a Tessl project, skip this step.
Step 1: Generate evaluation scenarios
There are three ways to create scenarios:
Option A: CLI (quickest — requires an existing plugin)
For plugin-based generation, the workspace is read from .tessl-plugin/plugin.json. Don't pass --workspace. Make sure .tessl-plugin/plugin.json points at a workspace you have publisher rights on before running (see Step 3).
If you created the plugin via tessl skill import without specifying a workspace, .tessl-plugin/plugin.json will default to "name": "local/<skill>" and tessl scenario generate will fail with 404 Workspace not found. Update the workspace in .tessl-plugin/plugin.json (see Step 3) before running generate.
Generation runs server-side. Check progress with:
Once generation is complete, download the scenarios to disk. Run this command from the folder where .tessl-plugin resides so the scenarios are downloaded there, or specify the output folder using --output:
tessl scenario download places the evals/ directory relative to your current working directory, but tessl eval run <path/to/plugin> looks for evals/ inside the plugin's directory. If you ran the download from your project root or some other location, move the folder before running evals:
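The move described above can be sketched as a small shell helper. This is a minimal sketch, not part of the Tessl CLI: the function name move_evals_into_plugin and the example paths are hypothetical, and it assumes the plugin directory is the one that contains .tessl-plugin/plugin.json.

```shell
#!/bin/sh
# Hypothetical helper: move a downloaded evals/ folder into the plugin directory.
# Usage: move_evals_into_plugin <dir-containing-evals> <plugin-dir>
move_evals_into_plugin() {
  src_dir="$1"
  plugin_dir="$2"

  # The plugin directory is assumed to be the one holding .tessl-plugin/plugin.json.
  if [ ! -f "$plugin_dir/.tessl-plugin/plugin.json" ]; then
    echo "no .tessl-plugin/plugin.json in $plugin_dir" >&2
    return 1
  fi
  if [ ! -d "$src_dir/evals" ]; then
    echo "no evals/ directory in $src_dir" >&2
    return 1
  fi
  # Refuse to overwrite scenarios that are already in place.
  if [ -e "$plugin_dir/evals" ]; then
    echo "$plugin_dir/evals already exists; merge by hand" >&2
    return 1
  fi
  mv "$src_dir/evals" "$plugin_dir/evals"
}

# Example with hypothetical paths:
# move_evals_into_plugin . plugins/my-skill
```

The helper refuses to overwrite an existing evals/ folder, so running it twice does not clobber scenarios you have already edited.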
You can also generate scenarios from repository commits instead of a plugin — see tessl scenario generate for full options.
Option B: Agent-assisted (recommended if starting from a standalone skill file)
This approach uses a Tessl-provided skill to handle converting a standalone skill to a plugin and generating scenarios in one guided flow.
First, install the scenario creator skill in your project:
Then prompt your agent (e.g. Claude):
Where <my_skill> is the name or path to your skill. This will:
Verify Tessl installation
Convert your skill to a plugin (if it is not already in one)
Generate an initial set of scenarios
Option C: Write scenarios by hand
Create the directory structure manually. Each scenario folder must be exactly one level below ./evals. Each scenario needs task.md and criteria.json at minimum. Add an optional scenario.json to bundle input files or environment setup with your scenario.
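Before running evals on hand-written scenarios, it can help to confirm that the layout matches the rules above. The following is a minimal sketch of such a check. The function name check_scenarios is hypothetical and not a Tessl command, and it only checks that the two required files exist, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical helper: verify that every folder directly under evals/
# contains the two required files, task.md and criteria.json.
# Usage: check_scenarios <path-to-evals-dir>
check_scenarios() {
  evals_dir="$1"
  status=0
  found=0
  for scenario in "$evals_dir"/*/; do
    # Skip the unexpanded glob when there are no subfolders.
    [ -d "$scenario" ] || continue
    found=1
    for required in task.md criteria.json; do
      if [ ! -f "$scenario$required" ]; then
        echo "missing: $scenario$required" >&2
        status=1
      fi
    done
  done
  if [ "$found" -eq 0 ]; then
    echo "no scenario folders under $evals_dir" >&2
    status=1
  fi
  return "$status"
}

# Example with a hypothetical path:
# check_scenarios plugins/my-skill/evals
```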
Eval the skills you actually ship, not just simple ones. If your skill depends on a real codebase, sample data, or custom environment setup, scenario.json lets you bundle that context with your scenario:
fixtures: named external content the platform installs before the agent runs. Use a commit fixture to point at a real codebase snapshot, or a directory fixture for local content.
include: local paths from your scenario directory copied verbatim into the working directory. A resources/ directory is included automatically by convention.
setup: shell scripts run after fixtures and includes are installed, for steps like installing dependencies or seeding a database.
See the scenario.json reference for the full schema.
Step 2: (Optional) Review and edit scenarios
Each scenario contains:
task.md: the task brief shown to the agent
criteria.json: the scoring rubric
Tessl auto-generates these, but for best results review them yourself. You're the ultimate authority on what your skill is intended to do and what success looks like.
Automatic scenario generation creates criteria that reflect the instructions in the skill. Review these to check they reflect the outcomes you actually want the skill to achieve.
Step 3: Check .tessl-plugin/plugin.json
Open .tessl-plugin/plugin.json in your skill folder. Make sure the workspace name is set to a workspace you have publisher rights on and that you have chosen a plugin name.
If the workspace named in .tessl-plugin/plugin.json is one you cannot access, or one where you lack the required permissions, you will not be able to run the evals.
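One quick way to catch the most common problem, a manifest still pointing at the default local workspace, is a grep over the manifest. This is a rough sketch under stated assumptions: the function name check_plugin_workspace is hypothetical, and it relies only on the "name": "local/<skill>" default described on this page. It does not verify that you have publisher rights on the workspace.

```shell
#!/bin/sh
# Hypothetical helper: fail if plugin.json still uses the default local/ workspace.
# Usage: check_plugin_workspace <plugin-dir>
check_plugin_workspace() {
  manifest="$1/.tessl-plugin/plugin.json"
  if [ ! -f "$manifest" ]; then
    echo "not found: $manifest" >&2
    return 1
  fi
  # A "name" value starting with local/ is the default from an import
  # without a workspace, which this page says fails with 404.
  if grep -Eq '"name"[[:space:]]*:[[:space:]]*"local/' "$manifest"; then
    echo "$manifest still uses the local/ workspace; update it first" >&2
    return 1
  fi
  return 0
}

# Example with a hypothetical path:
# check_plugin_workspace plugins/my-skill
```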
Step 4: Run the evaluation
If a project does not exist, you will be asked to associate the skill with a Tessl project so runs are kept together on the web interface.
The tessl eval run command expects one of the following layouts:
Point to the plugin root, where the .tessl-plugin folder, the evals folder, and the skill (file or folder) are located. The .tessl-plugin/plugin.json file points to where SKILL.md is located. For example, this docsreviewer skill has that structure:

Point the command to a folder that has both SKILL.md and /evals. For example, this docsreviewer skill demonstrates that variant:

In the general case, you will pass the path to the root of your plugin:
If you followed the structure above, an evaluation will be executed. However if the eval folder is not in the pointed location, or SKILL.md not w
For example, if your plugin lives at plugins/my-skill/.tessl-plugin/plugin.json, your scenarios must be at plugins/my-skill/evals/. If you used Option A above and downloaded scenarios to your project root, move them first:
You can attach a label to a run to help identify it later in tessl eval list or the Tessl web UI:
By default, evals run using Claude Sonnet. You can specify a different Claude model using --agent:
Supported Claude models:
claude-sonnet-4-6 (default)
claude-opus-4-6 (most capable)
claude-sonnet-4-5
claude-opus-4-5
claude-haiku-4-5 (fastest, lowest cost)
This is useful if you want to evaluate your skill against the specific model you use in production, or to compare how your skill performs across different Claude models. Note that plugin evals support one --agent per run — to compare models, run the command once per model and compare the results.
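Running once per model can be scripted with a loop. The sketch below only prints the commands rather than executing them, because the exact value format accepted by --agent is an assumption here: it passes the model names exactly as listed on this page, so check the CLI help before running them. The function name print_model_runs and the plugin path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one `tessl eval run` command per model.
# Usage: print_model_runs <plugin-dir>
print_model_runs() {
  plugin_dir="$1"
  for model in claude-haiku-4-5 claude-sonnet-4-6 claude-opus-4-6; do
    # Assumption: --agent takes the model name as written in the list above.
    echo "tessl eval run $plugin_dir --agent $model"
  done
}

# Dry run with a hypothetical plugin path; review the output before executing it.
print_model_runs plugins/my-skill
```

Printing first keeps the sketch safe to run anywhere. Once the commands look right, they can be pasted into the terminal one at a time.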
Benchmarking across models made easier
To run evals across Haiku, Sonnet, and Opus in one guided flow — with automatic side-by-side comparison and gap diagnosis — use the review-model-performance skill:
Then ask your agent: "Run model comparison evals"
You'll receive a URL in the terminal output/CLI to monitor progress and view results in the Tessl web UI.

The id for the next step can be found in the URL (i.e. 019c4791-9eec-7458-b28a-6c94405a3d38)
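If you capture the URL in a script, the id can be pulled out with a pattern match. This is a minimal sketch that assumes the id is the UUID-shaped segment of the URL, as in the example above. The function name extract_run_id and the URL in the usage comment are hypothetical, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: print the first UUID-shaped segment found in a string.
# Usage: extract_run_id <url>
extract_run_id() {
  printf '%s\n' "$1" |
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
    head -n 1
}

# Example with a made-up URL; only the id itself comes from this page:
# run_id=$(extract_run_id "https://example.invalid/runs/019c4791-9eec-7458-b28a-6c94405a3d38")
# tessl eval view "$run_id"
```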
Step 5: Review your results
Eval runs can take time. Use any of these to check status:
Visit the URL from Step 4: direct link to results
tessl eval view <id>: view a specific eval run
tessl eval view --last: view the last eval run with its ID and status
tessl eval list: list all eval runs with IDs and status
tessl eval view <id> --json: structured details on the eval run
tessl eval retry <id>: retry a failed eval run
If you lose the id for your eval run, simply run tessl eval list to easily find it again.
Example output from tessl eval list:

It is not uncommon for an attempt to fail to solve the eval scenario. You can either adjust the scenario or retry it using tessl eval retry <id>.
Step 6: Publish your plugin (Optional)
Since your skill was converted to a plugin, you can now manage it at the plugin level using the CLI.
To publish your plugin to the Tessl registry, run the command below. A new eval runs only if you have not run one previously or if the content of your plugin has changed since the last eval run:
tessl plugin publish
To publish without running a new eval:
tessl plugin publish --skip-evals
Note: plugins created through this flow are published as private by default. To make your plugin public, set "private": false in .tessl-plugin/plugin.json.
For more plugin management options, run:
tessl plugin --help