# Evaluate skill quality using scenarios

{% hint style="info" %}
This page covers scenario-based evals for **skills and plugins**. If you want to run evals against your codebase using real commit diffs, see [Evaluate your codebase](/improving-your-skills/evaluating-your-codebase.md) instead.
{% endhint %}

Tessl lets you run end-to-end task evaluations for your skills directly from the CLI. You generate a set of scenarios, run an agent against them, and see how well it performs — with and without your skill injected. This workflow is designed for fast, repeatable iteration as you develop and refine a skill, without building your own eval harness.

## TL;DR

Generate scenarios against your skill, then test how the agent does with and without your skill present. For example:

* You have a skill that says how to communicate with a system.
* Generate a set of scenarios using Tessl against that skill, or write your own, on how to communicate with that system.
* Check whether the agent can complete the scenario's task with and without the skill to determine how effective the skill is.

{% hint style="info" %}
The Tessl toolkit includes different tools, and even different evaluation types, for improving your skills. Refer to the [Overview](/improving-your-skills/overview-improving-skills-and-plugins.md) for more information on reviews, the different kinds of evals, and the skills available.
{% endhint %}

### What you can do with scenario-based evals

With self-serve evaluations, you can:

* Generate evaluation scenarios automatically using Tessl's [scenario generation skill](https://tessl.io/registry/tessl-labs/tessl-skill-eval-scenarios)
* Generate scenarios directly from the CLI with [`tessl scenario generate`](/reference/cli-commands.md#tessl-scenario-generate)
* Define custom evaluation scenarios manually
* Run end-to-end evaluations from the CLI
* View and track evaluation results in the Tessl web UI

### Prerequisites

* Logged into Tessl.
* Tessl [installed](/introduction-to-tessl/installation.md) (latest version).
* Access to a [workspace](/reference/workspaces.md) (you must be at least a [Publisher](/administrators/roles.md)).
* A skill packaged in a plugin available locally.
* Your project directory linked to a Tessl project (see Step 0).

***

### Step 0: Link your project to Tessl

Your project directory must be linked to a Tessl project before you can generate or run evaluations.

Link your project directory to a Tessl project:

```sh
$ tessl project create <project-name> --workspace <name-or-id>
```

If the project already exists, use `tessl project link` instead. See [Manage projects from the CLI](/projects/manage-projects-from-the-cli.md) for the full setup.

If the directory is already linked to a Tessl project, skip this step.

***

### Step 1: Generate evaluation scenarios

There are three ways to create scenarios:

#### Option A: CLI (quickest — requires an existing plugin)

```sh
$ tessl scenario generate <path/to/plugin> --count=5
```

For plugin-based generation, the workspace is read from `.tessl-plugin/plugin.json`. Don't pass `--workspace`. Make sure `.tessl-plugin/plugin.json` points at a workspace you have [publisher](/administrators/roles.md) rights on before running (see Step 3).

{% hint style="warning" %}
If you created the plugin via `tessl skill import` without specifying a workspace, `.tessl-plugin/plugin.json` will default to `"name": "local/<skill>"` and `tessl scenario generate` will fail with `404 Workspace not found`. Update the workspace in `.tessl-plugin/plugin.json` (see Step 3) before running generate.
{% endhint %}

Generation runs server-side. Check progress with:

```sh
tessl scenario list --mine
```

Once generation completes, download the scenarios to disk. Run this command from the folder where `.tessl-plugin` resides so the scenarios are downloaded there, or specify the output folder using `--output`:

```sh
tessl scenario download --last
```

{% hint style="warning" %}
`tessl scenario download` places the `evals/` directory relative to your **current working directory**, but `tessl eval run <path/to/plugin>` looks for `evals/` inside the **plugin's directory**. If you ran the download from your project root or some other location, move the folder before running evals:

```sh
mv ./evals/ <path/to/plugin-dir>/evals/
```

{% endhint %}

You can also generate scenarios from repository commits instead of a plugin — see [`tessl scenario generate`](/reference/cli-commands.md#tessl-scenario-generate) for full options.

#### Option B: Agent-assisted (recommended if starting from a standalone skill file)

This approach uses a Tessl-provided skill to handle converting a standalone skill to a plugin and generating scenarios in one guided flow.

First, install the scenario creator skill in your project:

```sh
tessl install tessl-labs/tessl-skill-eval-scenarios
```

Then prompt your agent (e.g. Claude):

```
"Create eval scenarios for <my_skill>"
```

Where `<my_skill>` is the name or path to your skill. This will:

1. Verify Tessl installation
2. Convert your skill to a plugin (if it is not already in one)
3. Generate an initial set of scenarios

#### Option C: Write scenarios by hand

Create the directory structure manually. Each scenario folder must sit exactly one level below `./evals`, and each scenario needs `task.md` and `criteria.json` at minimum. Add an optional `scenario.json` to bundle input files or environment setup with your scenario:

```
<your-plugin-name>/
├── .tessl-plugin/
│     └── plugin.json
├── skills/<your-skill-name>/SKILL.md
└── evals/
      ├── scenario-1/
      │    ├── task.md
      │    ├── criteria.json
      │    └── scenario.json        # optional: fixtures, includes, setup scripts
      └── scenario-2/
```

Eval the skills you actually ship, not just simple ones. If your skill depends on a real codebase, sample data, or custom environment setup, `scenario.json` lets you bundle that context with your scenario:

* **`fixtures`** — named external content the platform installs before the agent runs. Use a `commit` fixture to point at a real codebase snapshot, or a `directory` fixture for local content.
* **`include`** — local paths from your scenario directory copied verbatim into the working directory. A `resources/` directory is included automatically by convention.
* **`setup`** — shell scripts run after fixtures and includes are installed, for steps like installing dependencies or seeding a database.

See the [scenario.json reference](/reference/configuration.md#scenariojson) for the full schema.
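
Before running evals on hand-written scenarios, it can help to confirm the layout described above. The following is a minimal sketch in plain `sh`, not part of the Tessl CLI: the function name `check_scenarios` is hypothetical, and it only checks that each folder one level below `evals/` contains `task.md` and `criteria.json`. It does not validate file contents or `scenario.json`.

```sh
# check_scenarios PLUGIN_DIR   (hypothetical helper, not a Tessl command)
# Prints one line per problem and returns non-zero if any scenario folder
# under PLUGIN_DIR/evals is missing task.md or criteria.json.
check_scenarios() {
    plugin_dir=$1
    status=0

    if [ ! -d "$plugin_dir/evals" ]; then
        echo "missing: $plugin_dir/evals"
        return 1
    fi

    # Scenario folders sit exactly one level below evals/.
    for scenario in "$plugin_dir"/evals/*/; do
        # If the glob matched nothing, the pattern is left unexpanded.
        if [ ! -d "$scenario" ]; then
            echo "no scenario folders in $plugin_dir/evals"
            return 1
        fi
        for required in task.md criteria.json; do
            if [ ! -f "$scenario$required" ]; then
                echo "missing: $scenario$required"
                status=1
            fi
        done
    done
    return $status
}
```

For example, `check_scenarios plugins/my-skill` prints nothing and returns success when every scenario folder has both required files.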

***

### Step 2: (Optional) Review and edit scenarios

Each scenario contains:

* `task.md` — the task brief shown to the agent
* `criteria.json` — the scoring rubric

Tessl auto-generates these, but for best results review them yourself. You're the ultimate authority on what your skill is intended to do and what success looks like.

{% hint style="info" %}
Automatic scenario generation creates criteria that reflect the instructions in the skill. Review these to check they reflect the outcomes you actually want the skill to achieve.
{% endhint %}

***

### Step 3: Check .tessl-plugin/plugin.json

Open `.tessl-plugin/plugin.json` in your skill folder and ensure the workspace name is updated to a workspace you have [publisher](/administrators/roles.md) rights on, and you have chosen a plugin name.

```json
// Before
{ "name": "placeholder/repo-flow-mapper", ... }

// After
{ "name": "mycompany/repo-flow-mapper", ... }
```

Note that if the workspace name in `.tessl-plugin/plugin.json` is one you cannot access, or one where you lack the correct permissions, you will not be able to run the evals.
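
To catch a leftover `local/` default before running anything, you could read the workspace prefix out of the `name` field. This is a rough sketch under stated assumptions: the helper names `plugin_workspace` and `check_plugin_workspace` are hypothetical, and the `sed` expression is a line-based match rather than a real JSON parser, so it assumes the top-level `"name"` key is the first one in the file and sits on a single line with its value.

```sh
# plugin_workspace PLUGIN_JSON   (hypothetical helper)
# Prints the workspace part of the "name" field: the text before the first "/".
plugin_workspace() {
    # Take the first "name": "..." pair found in the file.
    name=$(sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    [ -n "$name" ] || return 1
    echo "${name%%/*}"
}

# check_plugin_workspace PLUGIN_JSON   (hypothetical helper)
# Fails when the workspace is still the "local" default.
check_plugin_workspace() {
    ws=$(plugin_workspace "$1") || { echo "no name field in $1"; return 1; }
    if [ "$ws" = "local" ]; then
        echo "workspace is 'local': update the name in $1 before running evals"
        return 1
    fi
    echo "workspace: $ws"
}
```

Running `check_plugin_workspace .tessl-plugin/plugin.json` on the "After" example above would print `workspace: mycompany`. It cannot tell you whether you actually hold publisher rights on that workspace.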

***

### Step 4: Run the evaluation

If a project does not exist, you will be asked to associate the skill to a Tessl project so runs are kept together on the web interface.

* See [Managing projects from the CLI](/projects/manage-projects-from-the-cli.md)

The `tessl eval run` command expects one of the following layouts:

* Point to the plugin root, where the `.tessl-plugin` folder, `evals`, and the skill (file or folder) are located. The `.tessl-plugin/plugin.json` points to where the SKILL.md is located. For example, this docsreviewer skill has that structure:

<figure><img src="/files/LuFDfyqJYg2SZv4lfQQW" alt=""><figcaption></figcaption></figure>

* Point the command to a folder that has both SKILL.md and `/evals`. For example, this docsreviewer skill demonstrates that variant:

<figure><img src="/files/NwZZXfhfiq5JaFOq2buN" alt=""><figcaption></figcaption></figure>

In the general case, you will pass the path to the root of your plugin:

```sh
tessl eval run .
```

If you followed one of the structures above, the evaluation runs. The command needs to find both the `evals` folder and SKILL.md from the location you point to.

For example, if your plugin lives at `plugins/my-skill/.tessl-plugin/plugin.json`, your scenarios must be at `plugins/my-skill/evals/`. If you used Option A above and downloaded scenarios to your project root, move them first:

```sh
mv ./evals/ plugins/my-skill/evals/
tessl eval run plugins/my-skill/.tessl-plugin/plugin.json
```

You can attach a label to a run to help identify it later in `tessl eval list` or the Tessl web UI:

```sh
$ tessl eval run <path/to/plugin> --label "testing prompt changes"
```

By default, evals run using Claude Sonnet. You can specify a different Claude model using `--agent`:

```sh
tessl eval run <path/to/plugin> --agent=claude:claude-sonnet-4-6
tessl eval run <path/to/plugin> --agent=claude:claude-opus-4-6
tessl eval run <path/to/plugin> --agent=claude:claude-haiku-4-5
```

Supported Claude models:

| Model               | Notes                |
| ------------------- | -------------------- |
| `claude-sonnet-4-6` | Default              |
| `claude-opus-4-6`   | Most capable         |
| `claude-sonnet-4-5` |                      |
| `claude-opus-4-5`   |                      |
| `claude-haiku-4-5`  | Fastest, lowest cost |

This is useful if you want to evaluate your skill against the specific model you use in production, or to compare how your skill performs across different Claude models. Note that plugin evals support one `--agent` per run — to compare models, run the command once per model and compare the results.
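
Since each run takes a single `--agent`, a small loop is one way to run the same plugin once per model. The sketch below is illustrative rather than an official workflow: `run_each_model` and the `TESSL` override variable are hypothetical names, the label text is arbitrary, and combining `--agent` with `--label` in one invocation is an assumption based on the two flags shown separately above. It assumes `sh` or `bash`, where an unquoted variable is split into words.

```sh
# run_each_model PLUGIN_PATH   (hypothetical helper)
# Runs one eval per model, labelling each run with the model name.
# TESSL is the command to invoke; set TESSL="echo tessl" for a dry run
# that prints the commands instead of executing them.
run_each_model() {
    plugin_path=$1
    for model in claude-haiku-4-5 claude-sonnet-4-6 claude-opus-4-6; do
        ${TESSL:-tessl} eval run "$plugin_path" \
            --agent="claude:$model" \
            --label "compare $model" || return 1
    done
}
```

Call it as `run_each_model <path/to/plugin>`, then compare the labelled runs in `tessl eval list` or the web UI.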

{% hint style="info" %}
**Benchmarking across models made easier**

To run evals across Haiku, Sonnet, and Opus in one guided flow — with automatic side-by-side comparison and gap diagnosis — use the [`review-model-performance`](https://tessl.io/registry/tessl-labs/review-model-performance) skill:

```sh
$ tessl install tessl-labs/review-model-performance
```

Then ask your agent: `"Run model comparison evals"`
{% endhint %}

The terminal output includes a URL where you can monitor progress and view results in the Tessl web UI.

<figure><img src="/files/tyh0FpqZ0VMbrI6OTU8u" alt=""><figcaption></figcaption></figure>

The ID for the next step can be found in the URL (e.g. `019c4791-9eec-7458-b28a-6c94405a3d38`).
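
If you want to pull the ID out of the URL in a script, a pattern match on the UUID shape is one option. This sketch assumes the run ID is the last UUID-shaped segment in the URL, which is a guess about the URL format and not something this page confirms. `eval_id_from_url` is a hypothetical helper name.

```sh
# eval_id_from_url URL   (hypothetical helper)
# Prints the last UUID-shaped segment found in the URL, or nothing if none.
eval_id_from_url() {
    printf '%s\n' "$1" \
        | grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | tail -n 1
}
```

You could then pass the result to the documented view command, for example `tessl eval view "$(eval_id_from_url "<url>")"`.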

***

### Step 5: Review your results

Eval runs can take time. Use any of these to check status:

| Command                       | Description                            |
| ----------------------------- | -------------------------------------- |
| Visit the URL from Step 4     | Direct link to results                 |
| `tessl eval view <id>`        | View a specific eval run               |
| `tessl eval view --last`      | View the most recent eval run          |
| `tessl eval list`             | List all eval runs with IDs and status |
| `tessl eval view <id> --json` | Structured details on the eval run     |
| `tessl eval retry <id>`       | Retry a failed eval run                |

If you lose the id for your eval run, simply run `tessl eval list` to easily find it again.

**Example output from `tessl eval list`:**

<figure><img src="/files/Jm7WLranclqD17ZbebBC" alt=""><figcaption></figcaption></figure>

It's not uncommon for an attempt to fail to solve an eval scenario. You can either adjust the scenario or retry the run using `tessl eval retry <id>`.

***

### Step 6: Publish your plugin (Optional)

Since your skill was converted to a plugin, you can now manage it at the plugin level using the CLI.

To publish your plugin to the Tessl registry, run the command below. A new eval runs only if you have not run one before or if the content of your plugin has changed since the last eval run:

`tessl plugin publish`

To publish without running a new eval:

`tessl plugin publish --skip-evals`

**Note:** plugins created through this flow are published as `private` by default. To make your plugin public, set `"private": false` in `.tessl-plugin/plugin.json`.

For more plugin management options, run:

* `tessl plugin --help`


