## Start from real expertise
A common pitfall in skill creation is asking an LLM to generate a skill without providing domain-specific context — relying solely on the LLM’s general training knowledge. The result is vague, generic procedures (“handle errors appropriately,” “follow best practices for authentication”) rather than the specific API patterns, edge cases, and project conventions that make a skill valuable.
Effective skills are grounded in real expertise. The key is feeding domain-specific context into the creation process.
### Extract from a hands-on task
Complete a real task in conversation with an agent, providing context, corrections, and preferences along the way. Then extract the reusable pattern into a skill. Pay attention to:
- Steps that worked — the sequence of actions that led to success
- Corrections you made — places where you steered the agent’s approach (e.g., “use library X instead of Y,” “check for edge case Z”)
- Input/output formats — what the data looked like going in and coming out
- Context you provided — project-specific facts, conventions, or constraints the agent didn’t already know
### Synthesize from existing project artifacts
When you have a body of existing knowledge, you can feed it into an LLM and ask it to synthesize a skill. A data-pipeline skill synthesized from your team’s actual incident reports and runbooks will outperform one synthesized from a generic “data engineering best practices” article, because it captures your schemas, failure modes, and recovery procedures. The key is project-specific material, not generic references.
Good source material includes:
- Internal documentation, runbooks, and style guides
- API specifications, schemas, and configuration files
- Code review comments and issue trackers (captures recurring concerns and reviewer expectations)
- Version control history, especially patches and fixes (reveals patterns through what actually changed)
- Real-world failure cases and their resolutions
### Refine with real execution
The first draft of a skill usually needs refinement. Run the skill against real tasks, then feed the results — all of them, not just failures — back into the creation process. Ask: what triggered false positives? What was missed? What could be cut?
Even a single pass of execute-then-revise noticeably improves quality, and complex domains often benefit from several.
Read agent execution traces, not just final outputs. If the agent wastes time on unproductive steps, common causes include instructions that are too vague (the agent tries several approaches before finding one that works), instructions that don’t apply to the current task (the agent follows them anyway), or too many options presented without a clear default.
For a more structured approach to iteration, including test cases, assertions, and grading, see Evaluating skill output quality.
## Spending context wisely
Once a skill activates, its full SKILL.md body loads into the agent’s context window alongside conversation history, system context, and other active skills. Every token in your skill competes for the agent’s attention with everything else in that window.
### Add what the agent lacks, omit what it knows
Focus on what the agent wouldn’t know without your skill: project-specific conventions, domain-specific procedures, non-obvious edge cases, and the particular tools or APIs to use. You don’t need to explain what a PDF is, how HTTP works, or what a database migration does.
```markdown
<!-- Too verbose — the agent already knows what PDFs are -->
## Extract PDF text

PDF (Portable Document Format) is a common file format that contains
text, images, and other content. To extract text from a PDF, you'll need to
use a library. pdfplumber is recommended because it handles most cases well.
```
````markdown
<!-- Better — jumps straight to what the agent wouldn't know on its own -->
## Extract PDF text

Use pdfplumber for text extraction. For scanned documents, fall back to
pdf2image with pytesseract.

```python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
````
Ask yourself about each piece of content: “Would the agent get this wrong without this instruction?” If the answer is no, cut it. If you’re unsure, test it. And if the agent already handles the entire task well without the skill, the skill may not be adding value. See Evaluating skill output quality for how to test this systematically.
### Design coherent units
Deciding what a skill should cover is like deciding what a function should do: you want it to encapsulate a coherent unit of work that composes well with other skills. Skills scoped too narrowly force multiple skills to load for a single task, risking overhead and conflicting instructions. Skills scoped too broadly become hard to activate precisely. A skill for querying a database and formatting the results may be one coherent unit, while a skill that also covers database administration is probably trying to do too much.
### Aim for moderate detail
Overly comprehensive skills can hurt more than they help — the agent struggles to extract what’s relevant and may pursue unproductive paths triggered by instructions that don’t apply to the current task. Concise, stepwise guidance with a working example tends to outperform exhaustive documentation. When you find yourself covering every edge case, consider whether most are better handled by the agent’s own judgment.
### Structure large skills with progressive disclosure
The specification recommends keeping SKILL.md under 500 lines and 5,000 tokens — just the core instructions the agent needs on every run. When a skill legitimately needs more content, move detailed reference material to separate files in references/ or similar directories.
The key is telling the agent when to load each file. “Read references/api-errors.md if the API returns a non-200 status code” is more useful than a generic “see references/ for details.” This lets the agent load context on demand rather than up front, which is how progressive disclosure is designed to work.
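As a concrete illustration, a SKILL.md section might pair each reference file with an explicit load condition (the file names and conditions below are hypothetical):

```markdown
## Handling API errors

For routine calls, retry once with exponential backoff.
If the API still returns a non-200 status code, read
`references/api-errors.md` for per-code recovery steps.
For pagination edge cases, read `references/pagination.md`.
```

Each pointer tells the agent both where the detail lives and when it is worth the tokens to load it.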
## Calibrating control
Not every part of a skill needs the same level of prescriptiveness. Match the specificity of your instructions to the fragility of the task.
### Match specificity to fragility
Give the agent freedom when multiple approaches are valid and the task tolerates variation. For flexible instructions, explaining why can be more effective than rigid directives — an agent that understands the purpose behind an instruction makes better context-dependent decisions. A code review skill can list what to look for without dictating exactly how to perform each check:
```markdown
## Code review process

1. Check all database queries for SQL injection (use parameterized queries)
2. Verify authentication checks on every endpoint
3. Look for race conditions in concurrent code paths
4. Confirm error messages don't leak internal details
```
Be prescriptive when operations are fragile, consistency matters, or a specific sequence must be followed:
````markdown
## Database migration

Run exactly this sequence:

```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.
````
Most skills have a mix. Calibrate each part independently.
When multiple tools or approaches could work, pick a default and mention alternatives briefly rather than presenting them as equal options.
````markdown
<!-- Too many options -->
You can use pypdf, pdfplumber, PyMuPDF, or pdf2image...

<!-- Clear default with escape hatch -->
Use pdfplumber for text extraction:

```python
import pdfplumber
```

For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
````
### Favor procedures over declarations
A skill should teach the agent how to approach a class of problems, not what to produce for a specific instance. Compare:
```markdown
<!-- Specific answer — only useful for this exact task -->
Join the `orders` table to `customers` on `customer_id`, filter where
`region = 'EMEA'`, and sum the `amount` column.

<!-- Reusable method — works for any analytical query -->
1. Read the schema from `references/schema.yaml` to find relevant tables
2. Join tables using the `_id` foreign key convention
3. Apply any filters from the user's request as WHERE clauses
4. Aggregate numeric columns as needed and format as a markdown table
```
This doesn’t mean skills can’t include specific details — output format templates (see Templates for output format), constraints like “never output PII,” and tool-specific instructions are all valuable. The point is that the approach should generalize even when individual details are specific.
## Patterns for effective instructions
These are reusable techniques for structuring skill content. Not every skill needs all of them — use the ones that fit your task.
### Templates for output format
When you need the agent to produce output in a specific format, provide a template. This is more reliable than describing the format in prose, because agents pattern-match well against concrete structures. Short templates can live inline in SKILL.md; for longer templates, or templates only needed in certain cases, store them in assets/ and reference them from SKILL.md so they only load when needed.
````markdown
## Report structure

Use this template, adapting sections as needed for the specific analysis:

```markdown
# [Analysis Title]

## Executive summary
[One-paragraph overview of key findings]

## Key findings
- Finding 1 with supporting data
- Finding 2 with supporting data

## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
```
````
### Checklists for multi-step workflows
An explicit checklist helps the agent track progress and avoid skipping steps, especially when steps have dependencies or validation gates.
```markdown
## Form processing workflow

Progress:
- [ ] Step 1: Analyze the form (run `scripts/analyze_form.py`)
- [ ] Step 2: Create field mapping (edit `fields.json`)
- [ ] Step 3: Validate mapping (run `scripts/validate_fields.py`)
- [ ] Step 4: Fill the form (run `scripts/fill_form.py`)
- [ ] Step 5: Verify output (run `scripts/verify_output.py`)
```
### Validation loops
Instruct the agent to validate its own work before moving on. The pattern is: do the work, run a validator (a script, a reference checklist, or a self-check), fix any issues, and repeat until validation passes.
```markdown
## Editing workflow

1. Make your edits
2. Run validation: `python scripts/validate.py output/`
3. If validation fails:
   - Review the error message
   - Fix the issues
   - Run validation again
4. Only proceed when validation passes
```
A reference document can also serve as the “validator” — instruct the agent to check its work against the reference before finalizing.
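As an illustration, such a validator can be an ordinary script that prints actionable messages and exits non-zero on failure. A minimal sketch, assuming the skill expects a `report.json` in the output directory (the specific checks are placeholders, not a real skill's requirements):

```python
import json
import sys
from pathlib import Path


def validate(output_dir: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    report = output_dir / "report.json"
    if not report.exists():
        errors.append(f"missing expected file: {report}")
        return errors
    try:
        data = json.loads(report.read_text())
    except json.JSONDecodeError as exc:
        errors.append(f"{report} is not valid JSON: {exc}")
        return errors
    # Check required fields so the error names exactly what to fix.
    for field in ("title", "findings"):
        if field not in data:
            errors.append(f"{report} is missing required field '{field}'")
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = validate(Path(sys.argv[1]))
    for problem in problems:
        print(f"FAIL: {problem}")
    sys.exit(1 if problems else 0)
```

The agent runs the script, reads the FAIL lines, fixes the output, and re-runs until the exit code is 0.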
### Plan-validate-execute
For batch or destructive operations, have the agent create an intermediate plan in a structured format, validate it against a source of truth, and only then execute.
```markdown
## PDF form filling

1. Extract form fields: `python scripts/analyze_form.py input.pdf` → `form_fields.json`
   (lists every field name, type, and whether it's required)
2. Create `field_values.json` mapping each field name to its intended value
3. Validate: `python scripts/validate_fields.py form_fields.json field_values.json`
   (checks that every field name exists in the form, types are compatible, and
   required fields aren't missing)
4. If validation fails, revise `field_values.json` and re-validate
5. Fill the form: `python scripts/fill_form.py input.pdf field_values.json output.pdf`
```
The key ingredient is step 3: a validation script that checks the plan (field_values.json) against the source of truth (form_fields.json). Errors like “Field ‘signature_date’ not found — available fields: customer_name, order_total, signature_date_signed” give the agent enough information to self-correct.
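The core of such a validator is a comparison of the plan against the source of truth. A minimal sketch, assuming `form_fields` maps each real field name to a small spec dict with a `required` flag (that schema shape is an assumption for illustration):

```python
import difflib


def validate_plan(form_fields: dict, field_values: dict) -> list[str]:
    """Check a plan (field_values) against the source of truth (form_fields).

    Returns human-readable errors; an empty list means the plan is safe to run.
    """
    errors = []
    for name in field_values:
        if name not in form_fields:
            # Suggest near-miss names so the agent can self-correct.
            close = difflib.get_close_matches(name, list(form_fields), n=3)
            hint = f" (did you mean: {', '.join(close)}?)" if close else ""
            errors.append(f"Field '{name}' not found{hint}")
    for name, spec in form_fields.items():
        if spec.get("required") and name not in field_values:
            errors.append(f"Required field '{name}' has no value")
    return errors
```

`difflib.get_close_matches` supplies the "did you mean" hints that let the agent correct a near-miss field name without re-extracting the form.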
### Bundling reusable scripts
When iterating on a skill, compare the agent’s execution traces across test cases. If you notice the agent independently reinventing the same logic each run — building charts, parsing a specific format, validating output — that’s a signal to write a tested script once and bundle it in scripts/.
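A bundled script doesn't need to be elaborate: a small CLI with parseable output, clear error messages, and meaningful exit codes is enough for the agent to use it reliably. A hypothetical skeleton (the summarizing logic is a placeholder for whatever the agent kept reinventing):

```python
import argparse
import json
import sys
from pathlib import Path


def summarize(path: Path) -> dict:
    """Placeholder for the reusable logic worth writing and testing once."""
    records = json.loads(path.read_text())
    return {"count": len(records)}


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Summarize a JSON records file.")
    parser.add_argument("input", type=Path)
    args = parser.parse_args(argv)
    if not args.input.exists():
        # A specific error message gives the agent something to act on.
        print(f"error: {args.input} does not exist", file=sys.stderr)
        return 1
    print(json.dumps(summarize(args.input)))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Machine-readable output (here JSON on stdout) and a non-zero exit on failure let the agent chain the script into the workflows described above.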
For more on designing and bundling scripts, see Using scripts in skills.
## Next steps
Once you have a working skill, two guides can help you refine it further: