Review a skill against best practices

How to review and optimize skills to ensure they follow best practices

Skills provide procedural knowledge and specific workflows that agents load when relevant. This guide explains how to review skills against best practices. The next page covers the automated optimize option, which you can use to address issues before deploying the skill to your team.

TL;DR

Compare your skill against best practices. For example:

Examine the description in the skill, determine whether the wording is effective, and explain how it affects activation.
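You may want a quick local sanity check on a description before running a full review. The sketch below pulls a single-line `description` value out of SKILL.md YAML frontmatter. It then flags three rough problems that the sample review further down also raises: no explicit "Use when..." trigger hint, a description that leads with *when* but not *what*, and a very short description. The function names and heuristics are hypothetical and are not how Tessl's validator or LLM judge work. The sketch also assumes the description fits on one line, so YAML block scalars such as `>` are not handled.

```python
import re
from typing import List, Optional


def extract_description(skill_md: str) -> Optional[str]:
    """Return a single-line `description` value from SKILL.md YAML frontmatter, if present."""
    # Frontmatter is the block between the opening '---' line and the next '---' line.
    match = re.match(r"^---[ \t]*\n(.*?)\n---[ \t]*(?:\n|$)", skill_md, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        if line.startswith("description:"):
            # Drop the key, surrounding whitespace, and optional quotes.
            return line[len("description:"):].strip().strip("'\"")
    return None


def quick_description_check(description: str) -> List[str]:
    """Rough, hypothetical heuristics inspired by the review criteria (not Tessl's logic)."""
    warnings = []
    if not re.search(r"\buse (?:this skill )?when\b", description, re.IGNORECASE):
        warnings.append("no explicit 'Use when...' trigger hint")
    if re.match(r"(?:use\s+)?when\b", description.strip(), re.IGNORECASE):
        warnings.append("leads with 'when' and may not say what the skill does")
    if len(description.split()) < 10:
        warnings.append("very short; may lack concrete actions and trigger terms")
    return warnings


# Hypothetical SKILL.md content, for illustration only.
skill_md = "---\nname: demo-skill\ndescription: Use when testing or debugging API endpoints.\n---\n# Demo\n"
print(quick_description_check(extract_description(skill_md)))
```

A description that leads with concrete actions and then adds a "Use when..." clause passes all three checks. This is the "what, then when" structure the judge suggests in the sample output.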

Why review skills?

Skills encode team knowledge and workflows. Skill reviews help you:

Assess whether skills conform to the skills standard
Validate skill content quality (how likely the skill is to help) before deploying to your team
Validate skill description quality (how likely the skill is to activate) before deploying to your team

Viewing skill reviews

In the Tessl Registry, skill reviews show multiple scores.

Example: React development skill review

Review Score: Overall quality assessment (0-100%)

Weighted average of the three sub-components below

Validation Checks: Validates that the skill follows the criteria at https://agentskills.io/specification

Checks covering line count, frontmatter, schema, license and metadata
Deterministic pass/warning/fail grading

Implementation Score: LLM-as-a-judge review of the SKILL.md body (reported as Content in CLI output), graded on:

Conciseness
Actionability
Workflow clarity
Progressive disclosure

Activation Score: LLM-as-a-judge review of the description (reported as Description in CLI output), assessing how likely agents are to use the skill, graded on:

Specificity
Completeness
Trigger Term Quality
Distinctiveness Conflict Risk

Each skill review includes detailed validation results showing what passed, what needs improvement, and specific recommendations.
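The Review Score is described as a weighted average of the three sub-scores, but this page does not document the weights. The sketch below shows only the arithmetic of a weighted average, with the weights supplied by the caller. The function name, the component keys, and the example numbers are all hypothetical and do not reflect how Tessl weights its components.

```python
from typing import Dict


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine 0-100 sub-scores into one 0-100 score using relative weights."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Each component contributes in proportion to its share of the total weight.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical inputs: these sub-scores and equal weights are for illustration only.
sub_scores = {"validation": 100.0, "implementation": 80.0, "activation": 60.0}
equal_weights = {"validation": 1.0, "implementation": 1.0, "activation": 1.0}
print(weighted_average(sub_scores, equal_weights))
```

With equal weights this reduces to a plain mean. Raising one component's weight pulls the overall score toward that component, so a weak description can drag the Review Score down more or less depending on the real weights.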

What scores mean:

90%+ Review Score: Skill conforms well to best practices
70-89% Review Score: Good skill; may need minor improvements
Below 70% Review Score: Likely needs work before deployment

Use these scores to choose quality skills and identify what to fix in your own skills before publishing.
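To apply these bands in a script (for example, to flag skills that need work before publishing), the thresholds above map directly onto a small function. The function name is hypothetical. One assumption: fractional scores between 89 and 90 are treated as part of the 70-89% band, since the listed ranges leave that gap open.

```python
def score_band(review_score: float) -> str:
    """Map a 0-100 Review Score to the interpretation bands described in this guide."""
    if not 0 <= review_score <= 100:
        raise ValueError("review_score must be between 0 and 100")
    if review_score >= 90:
        return "conforms well to best practices"
    if review_score >= 70:
        # Assumption: scores such as 89.5 fall into this band.
        return "good; may need minor improvements"
    return "likely needs work before deployment"


for score in (95, 82, 50):
    print(score, score_band(score))
```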

Automatic review on publish

When you publish a skill to the registry using tessl skill publish, skill reviews run automatically:

# Publish skill with automatic skill review
tessl skill publish ./<Path to skill>

What happens automatically:

Skill is linted for format and structure
Skill review is performed
Review scores are calculated and displayed in the registry

Reviewing skills locally

Before publishing skills, validate them locally:

# Validate skill format and structure
tessl skill lint ./<tile.json folder>

# Get detailed quality review
tessl skill review ./<path to SKILL.md folder>

Fix any issues locally, then publish; the registry will show the updated review scores. To fix review issues automatically, use the optimize option described on the next page.
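The lint, review, then publish order above can be scripted so that publishing only happens after the earlier steps succeed. This sketch uses only the three `tessl skill` commands shown in this guide. The function names and example paths are hypothetical. It assumes `tessl` exits with a non-zero status when a step fails; this page does not say whether review warnings or low scores count as failures. By default it only prints the plan (`dry_run=True`), so it runs without `tessl` installed.

```python
import shlex
import subprocess
from typing import List


def prepublish_commands(lint_path: str, review_path: str, publish_path: str) -> List[List[str]]:
    """The lint -> review -> publish sequence from this guide, as argument lists."""
    return [
        ["tessl", "skill", "lint", lint_path],        # folder containing tile.json
        ["tessl", "skill", "review", review_path],    # folder containing SKILL.md
        ["tessl", "skill", "publish", publish_path],  # path to the skill
    ]


def run_prepublish(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; when dry_run is False, run them in order and stop at the first failure."""
    shown = []
    for cmd in commands:
        line = shlex.join(cmd)
        print("$", line)
        shown.append(line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failing step prevents the later ones (including publish) from running.
            subprocess.run(cmd, check=True)
    return shown


# Hypothetical paths; prints the plan without calling tessl.
run_prepublish(prepublish_commands("./my-tile", "./my-tile/skills/my-skill", "./my-tile/skills/my-skill"))
```

Passing `dry_run=False` runs the steps for real. If you also want to block publishing below a score threshold, the review output has to be inspected as well, because an exit code alone may not reflect the score.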

Here's a sample output of tessl skill review in the CLI:

$ tessl skill review skills/debug-api-endpoints/SKILL.md

Validation Checks

  ✔ skill_md_line_count - SKILL.md line count is 152 (<= 500)
  ✔ frontmatter_valid - YAML frontmatter is valid
  ✔ name_field - 'name' field is valid: 'debug-api-endpoints'
  ✔ description_field - 'description' field is valid (59 chars)
  ✔ description_voice - 'description' uses third person voice
  ⚠ description_trigger_hint - Description may be missing an explicit 'when to use' trigger hint (e.g., 'Use when...')
  ✔ compatibility_field - 'compatibility' field not present (optional)
  ✔ allowed_tools_field - 'allowed-tools' field not present (optional)
  ⚠ metadata_version - 'metadata' field is not a dictionary
  ✔ metadata_field - 'metadata' field not present (optional)
  ⚠ license_field - 'license' field is missing
  ✔ frontmatter_unknown_keys - No unknown frontmatter keys found
  ✔ body_present - SKILL.md body is present
  ✔ body_examples - Examples detected (code fence or 'Example' wording)
  ✔ body_output_format - Output/return/format terms detected
  ✔ body_steps - Step-by-step structure detected (ordered list)

Overall: PASSED (0 errors, 3 warnings)

Judge Evaluation

  Description: 22%
    specificity: 1/4 - The description uses vague language ('test or debug API endpoints systematically') without listing any concrete actions like 'send requests', 'validate responses', 'check status codes', or 'inspect headers'.
    trigger_term_quality: 2/4 - Contains some relevant keywords ('test', 'debug', 'API endpoints') that users might say, but misses common variations like 'REST', 'HTTP requests', 'curl', 'postman', 'API calls', or specific verbs like 'hit an endpoint'.
    completeness: 1/4 - The description only addresses 'when' (framed as a trigger) but completely lacks the 'what' - it never explains what capabilities or actions the skill provides. This inverts the typical problem but still fails completeness.
    distinctiveness_conflict_risk: 2/4 - The API testing domain is somewhat specific, but 'test or debug' is broad enough to potentially conflict with general debugging skills, testing frameworks, or other API-related skills without clear differentiation.

    Assessment: This description is structured as a 'when' clause only, completely omitting what the skill actually does. While it identifies a reasonable use case (API testing/debugging), it lacks concrete actions, comprehensive trigger terms, and sufficient detail to distinguish it from other testing or API-related skills.

    Suggestions:
      - Add specific capabilities the skill provides, e.g., 'Send HTTP requests, inspect response headers and bodies, validate status codes, test authentication flows'
      - Expand trigger terms to include natural variations: 'REST API', 'HTTP requests', 'curl', 'API calls', 'endpoints', 'request/response'
      - Restructure to lead with 'what' then 'when': 'Tests API endpoints by sending HTTP requests and validating responses. Use when debugging REST APIs, testing endpoints, or inspecting HTTP request/response cycles.'

  Content: 77%
    conciseness: 2/4 - The content is reasonably efficient but includes some unnecessary explanation that Claude would already know (e.g., explaining what 401/403/404 mean, basic concepts like 'XSS attempts blocked'). The structure is good but could be tightened.
    actionability: 3/4 - Provides concrete, executable curl commands with expected outputs for each testing phase. The examples are copy-paste ready and cover specific scenarios with clear expected results.
    workflow_clarity: 3/4 - Clear 4-step sequential workflow with explicit ordering (Infrastructure → Security → Validation → Functionality). The troubleshooting section provides a decision tree for debugging, and the 'Test in order' best practice reinforces the validation checkpoint approach.
    progressive_disclosure: 2/4 - Content is well-organized with clear sections and headers, but it's a monolithic document that could benefit from splitting detailed test cases into separate reference files. For a skill of this length (~150 lines), some content like the detailed troubleshooting tips could be externalized.

    Assessment: This is a solid, actionable skill with a clear systematic workflow and executable examples. The main weakness is verbosity - it explains concepts Claude already knows (HTTP status codes, basic security concepts) and could be more concise. The structure is excellent but the document length suggests some content could be split into reference files.

    Suggestions:
      - Remove explanatory text for concepts Claude knows (e.g., 'Request without Authorization header' after 'No token returns 401' is redundant)
      - Consider extracting the detailed test cases for each step into a separate TESTS.md reference file, keeping SKILL.md as a concise overview
      - Consolidate the 'What to check' code blocks - some curl examples are repetitive and could be reduced to one representative example per step

Average Score: 50%
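If you save review output like the sample above (for example, in CI), you can pull out the key numbers with a few regular expressions. This sketch assumes the text layout shown in the sample: `⚠` before warning checks, an `Overall:` summary line, and `Description:` / `Content:` percentage lines. That layout is not documented here as a stable format, and the function name is hypothetical. It also computes the plain mean of the two judge percentages. The sample's Average Score of 50% is consistent with (22 + 77) / 2 = 49.5 rounded, but the CLI may average unrounded values.

```python
import re
from statistics import mean
from typing import Dict, List, Optional


def summarize_review(output: str) -> Dict[str, object]:
    """Summarize saved `tessl skill review` text output; the format is assumed from the sample."""
    # Warning lines look like "  ⚠ license_field - 'license' field is missing".
    warning_checks: List[str] = [
        line.split()[1]
        for line in output.splitlines()
        if line.strip().startswith("⚠") and len(line.split()) > 1
    ]
    overall = re.search(r"Overall:\s*(\w+)\s*\((\d+) errors?, (\d+) warnings?\)", output)
    # Judge sections look like "  Description: 22%" and "  Content: 77%".
    judge = {
        name.lower(): int(pct)
        for name, pct in re.findall(r"^\s*(Description|Content):\s*(\d+)%", output, re.MULTILINE)
    }
    judge_mean: Optional[float] = mean(list(judge.values())) if judge else None
    return {
        "status": overall.group(1) if overall else None,
        "errors": int(overall.group(2)) if overall else None,
        "warnings": int(overall.group(3)) if overall else None,
        "warning_checks": warning_checks,
        "judge": judge,
        # Plain mean of the rounded judge percentages; the CLI may average unrounded values.
        "judge_mean": judge_mean,
    }
```

Run against the sample above, this reports three warning checks (`description_trigger_hint`, `metadata_version`, `license_field`) and a judge mean of 49.5.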

Next steps

Optimize a skill using best practices - Automate fixing issues in your skill
Evaluate skill quality using scenarios - Evaluate skill effect on agent performance
Evaluating documentation - Measure documentation effectiveness
Distributing via registry - Publishing skills through the registry
Creating skills - Write better skills
Publishing skills - Share reviewed skills


