A Python CLI tool for evaluating agent skills through static analysis, trigger testing, and trace analysis.
- SKILL.md Parsing: Parse YAML frontmatter and markdown body from skill definitions (see the sketch after this list)
- 19 Static Checks: Comprehensive coverage across 4 dimensions:
  - Structure: File existence, folder organization, frontmatter validation, standard fields
  - Naming: Format, directory matching
  - Description: Required, non-empty, max length
  - Content: Examples, line budget, reference depth
- Trigger Testing: Test skill activation with 4 trigger types (explicit, implicit, contextual, negative)
- Quality Scoring: Weighted 0-100 score based on check results
- Multiple Output Formats: Console (rich formatting) and JSON
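For orientation, a minimal SKILL.md has the shape sketched below: YAML frontmatter followed by a markdown body. The fields shown are an assumption inferred from the checks above (a name matching the skill directory, a non-empty description); run sklab list-checks for the authoritative list.

```markdown
---
name: my-skill
description: Summarize CSV files and report basic statistics on request.
---

# my-skill

Instructions for the agent, including concrete usage examples.
```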
# From PyPI
pip install skill-lab
# From source
pip install -e .
# With development dependencies
pip install -e ".[dev]"
# Evaluate a skill (path defaults to current directory)
sklab evaluate ./my-skill
sklab evaluate # Uses current directory
# Quick validation (pass/fail)
sklab validate ./my-skill
sklab validate # Uses current directory
# List available checks
sklab list-checks
# Console output (default)
sklab evaluate ./my-skill
# JSON output
sklab evaluate ./my-skill --format json
# Save to file
sklab evaluate ./my-skill --output report.json
# Verbose (show all checks, not just failures)
sklab evaluate ./my-skill --verbose
# Spec-only (skip quality suggestions)
sklab evaluate ./my-skill --spec-only
# Returns exit code 0 if valid, 1 if invalid
sklab validate ./my-skill
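Because validate signals the result through its exit code, it slots directly into CI; a minimal sketch of a pipeline step:

```bash
# Fail the build if the skill does not pass validation
# (sklab validate exits 0 when valid, 1 when invalid)
if ! sklab validate ./my-skill; then
  echo "skill validation failed" >&2
  exit 1
fi
```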
# List all checks
sklab list-checks
# Filter by dimension
sklab list-checks --dimension structure
# Show only spec-required checks
sklab list-checks --spec-only
Test whether skills activate correctly with real LLM execution:
# Run trigger tests (path defaults to current directory)
sklab trigger ./my-skill
sklab trigger # Uses current directory
# Filter by trigger type
sklab trigger --type explicit
sklab trigger --type negative
Prerequisites: Trigger testing requires:
- Claude CLI: Install via
npm install -g @anthropic-ai/claude-code
Note: Codex CLI support is coming in v0.3.0.
Test Definition (tests/triggers.yaml):
skill: my-skill
test_cases:
  - id: explicit-1
    name: "Direct invocation to do something"
    type: explicit
    prompt: "$my-skill do something"
    expected: trigger
  - id: negative-1
    name: "Unrelated question (should not trigger)"
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
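The same shape covers the other two trigger types; an illustrative implicit case is sketched below (the prompt wording is hypothetical, and contextual cases follow the same pattern):

```yaml
  - id: implicit-1
    name: "Task described without naming the skill"
    type: implicit
    prompt: "please handle this task, described in plain words"
    expected: trigger
```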
{
  "skill_path": "/path/to/skill",
  "skill_name": "my-skill",
  "timestamp": "2026-01-25T14:30:00Z",
  "duration_ms": 45.3,
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 19,
  "checks_passed": 17,
  "checks_failed": 2,
  "results": [...],
  "summary": {
    "by_severity": {...},
    "by_dimension": {...}
  }
}
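Since the report is plain JSON, it is easy to post-process with standard tools; for example, pulling out headline fields with jq (assuming jq is installed):

```bash
sklab evaluate ./my-skill --output report.json
# Extract the overall score and the number of failed checks
jq '{score: .quality_score, failed: .checks_failed}' report.json
```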
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=skill_lab
# Type checking
mypy src/
# Linting
ruff check src/
# Format code
ruff format src/
MIT