Overview - Improving skills and plugins
Tessl is evolving, and one of its main pillars is giving you the visibility and tools to ensure your skills actually work. Tessl provides several capabilities for assessing your skills and plugins and validating their quality, actionability, and outcomes. Think of Tessl as a toolkit.
Users often use the terms "review", "eval", and "evaluations" interchangeably, while Tessl itself distinguishes lint, review, and several types of evals. In the end, these are all tools with different capabilities, each designed to help make your skills better.
Let's review these capabilities and how they work. Click each link to see documentation and examples:
Lint & Review Skills
Uses an LLM to judge whether a skill conforms to best practices, such as the Skill standard. For example, it checks whether the Description field follows best practices so that the skill will activate, something that is often set incorrectly. You can use the Optimize option to take the recommendations, preview the changes, and then accept them so they are applied automatically.
Available locally on the CLI.
Automatically generated on publish and displayed with the skill in the registry.
Displayed as "Quality" in the web interface when viewing a skill.
Scenario based evals
Tessl lets you run end-to-end task evaluations for your skills directly from the CLI. You generate a set of scenarios, run an agent against them, and see how well it performs both with and without your skill injected. This workflow is designed for fast, repeatable iteration as you develop and refine a skill, without building your own eval harness.
Available locally on the CLI.
Auto-generate scenarios or create your own.
Scenarios, if present on publish, will appear under the Evals tab in the web interface when viewing a skill.
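The with-and-without comparison described above can be pictured with a small sketch. This is not Tessl's implementation or output format: the `ScenarioResult` type, the `summarize` helper, the scenario names, and the scores are all hypothetical, and the sketch only assumes that each scenario yields one numeric score per run.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScenarioResult:
    """Scores for one scenario (hypothetical structure, not a Tessl type)."""
    scenario: str
    baseline_score: float    # agent run without the skill, from 0.0 to 1.0
    with_skill_score: float  # agent run with the skill injected, from 0.0 to 1.0


def summarize(results: list[ScenarioResult]) -> dict:
    """Compare average scores with and without the skill."""
    baseline = mean(r.baseline_score for r in results)
    with_skill = mean(r.with_skill_score for r in results)
    # Scenarios where the skill made the agent score lower than the baseline.
    regressions = [
        r.scenario for r in results if r.with_skill_score < r.baseline_score
    ]
    return {
        "baseline": round(baseline, 2),
        "with_skill": round(with_skill, 2),
        "uplift": round(with_skill - baseline, 2),
        "regressions": regressions,
    }


# Made-up scores, purely for illustration.
results = [
    ScenarioResult("parse-config", 0.50, 0.90),
    ScenarioResult("add-endpoint", 0.75, 0.75),
    ScenarioResult("write-migration", 0.50, 0.25),
]
summary = summarize(results)
print(summary)
```

Looking at per-scenario regressions alongside the average matters because a skill can raise the overall score while still making one kind of task worse.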
Evaluate codebase agent readiness
Measures how well your context files (skills, rules, documentation) enable an AI agent to complete real tasks on your codebase. It covers scenario definition, running agents with different setups, testing variations, and comparing results.
Available locally on the CLI.
Generate scenarios based on a commit or create your own.
Scenarios, if present on publish, will appear under the Evals tab in the web interface when viewing a skill.
An official skill that performs agent session analysis to help you optimize your skills. Discover the friction points your agent is running into and improve performance using Tessl's logging tools.
Evals vs Reviews
The Lint & Review Skills feature reviews skills against best practices, whereas Evaluations generate scenarios and then validate the quality of the skill by testing whether agents perform better on those scenarios with the skill than without it.
Use Evaluations and Reviews together to build a better plugin.
It's also important to note that there are two main categories of evaluations: synthetic tests based on what the skill says it does (aka "Scenario based evals"), and attempts to reproduce a real commit with context provided (aka "Evaluate codebase agent readiness").

