scripts/ directory. This guide covers one-off commands, self-contained scripts with their own dependencies, and how to design script interfaces for agentic use.
One-off commands
When an existing package already does what you need, you can reference it directly in your SKILL.md instructions without a scripts/ directory. Many ecosystems provide tools that auto-resolve dependencies at runtime.
- uvx
- pipx
- npx
- bunx
- deno run
- go run
- Pin versions (e.g., npx eslint@9.0.0) so the command behaves the same over time.
- State prerequisites in your SKILL.md (e.g., “Requires Node.js 18+”) rather than assuming the agent’s environment has them. For runtime-level requirements, use the compatibility frontmatter field.
- Move complex commands into scripts. A one-off command works well when you’re invoking a tool with a few flags. When a command grows complex enough that it’s hard to get right on the first try, a tested script in scripts/ is more reliable.
Referencing scripts from SKILL.md
Use relative paths from the skill directory root to reference bundled files. The agent resolves these paths automatically — no absolute paths needed.
List available scripts in your SKILL.md so the agent knows they exist.
The same relative-path convention works in support files like references/*.md: script execution paths (in code blocks) are relative to the skill directory root, because the agent runs commands from there.
Self-contained scripts
When you need reusable logic, bundle a script in scripts/ that declares its own dependencies inline. The agent can run the script with a single command, with no separate manifest file or install step required.
Several languages support inline dependency declarations:
- Python
- Deno
- Bun
- Ruby
PEP 723 defines a standard format for inline script metadata. Declare dependencies in a TOML block inside # /// markers (the example script here is scripts/extract.py).
Run with uv (recommended): uv run scripts/extract.py
uv run creates an isolated environment, installs the declared dependencies, and runs the script. pipx (pipx run scripts/extract.py) also supports PEP 723.
- Pin versions with PEP 508 specifiers: "beautifulsoup4>=4.12,<5".
- Use requires-python to constrain the Python version.
- Use uv lock --script to create a lockfile for full reproducibility.
Designing scripts for agentic use
When an agent runs your script, it reads stdout and stderr to decide what to do next. A few design choices make scripts dramatically easier for agents to use.
Avoid interactive prompts
This is a hard requirement of the agent execution environment. Agents operate in non-interactive shells: they cannot respond to TTY prompts, password dialogs, or confirmation menus. A script that blocks on interactive input will hang indefinitely. Accept all input via command-line flags, environment variables, or stdin.
Document usage with --help
--help output is the primary way an agent learns your script’s interface. Include a brief description, available flags, and usage examples.
Write helpful error messages
When an agent gets an error, the message directly shapes its next attempt. An opaque “Error: invalid input” wastes a turn. Instead, say what went wrong, what was expected, and what to try.
Use structured output
Prefer structured formats — JSON, CSV, TSV — over free-form text. Structured formats can be consumed by both the agent and standard tools (jq, cut, awk), making your script composable in pipelines.
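The two previous points can be sketched together. In the hypothetical helper below, a bad --format value produces an error that names the received value, the accepted values, and a suggested retry, while valid requests yield JSON, CSV, or TSV. Function names, the flag name, and the exit code are illustrative assumptions; sending data to stdout and the error to stderr is a common convention assumed here.

```python
import csv
import io
import json
import sys

VALID_FORMATS = ("json", "csv", "tsv")


def format_error(flag: str, received: str, expected: tuple[str, ...]) -> str:
    """Say what went wrong, what was expected, and what to try next."""
    return (
        f"Error: invalid value {received!r} for {flag}.\n"
        f"  Expected one of: {', '.join(expected)}\n"
        f"  Try: {flag} {expected[0]}"
    )


def render(rows: list[dict], fmt: str) -> str:
    """Serialize rows in a structured format that jq, cut, or awk can consume."""
    if fmt not in VALID_FORMATS:
        raise ValueError(format_error("--format", fmt, VALID_FORMATS))
    if fmt == "json":
        return json.dumps(rows, indent=2)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0]) if rows else [],
        delimiter="," if fmt == "csv" else "\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run(rows: list[dict], fmt: str) -> int:
    """Print data to stdout, or the error to stderr with a non-zero exit code."""
    try:
        output = render(rows, fmt)
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 2
    print(output, end="" if output.endswith("\n") else "\n")
    return 0
```

Because the data goes to stdout on its own, the output can be piped straight into another tool without first filtering out log lines.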
Further considerations
- Idempotency. Agents may retry commands. “Create if not exists” is safer than “create and fail on duplicate.”
- Input constraints. Reject ambiguous input with a clear error rather than guessing. Use enums and closed sets where possible.
- Dry-run support. For destructive or stateful operations, a --dry-run flag lets the agent preview what will happen.
- Meaningful exit codes. Use distinct exit codes for different failure types (not found, invalid arguments, auth failure) and document them in your --help output so the agent knows what each code means.
- Safe defaults. Consider whether destructive operations should require explicit confirmation flags (--confirm, --force) or other safeguards appropriate to the risk level.
- Predictable output size. Many agent harnesses automatically truncate tool output beyond a threshold (e.g., 10-30K characters), potentially losing critical information. If your script might produce large output, default to a summary or a reasonable limit, and support flags like --offset so the agent can request more information when needed. Alternatively, if output is large and not amenable to pagination, require agents to pass an --output flag that specifies either an output file or - to explicitly opt in to stdout.