Agent Skills Need Package Managers

May 17, 2026

Agent skills now have the same failure mode JavaScript had before lockfiles: everyone copies useful files into local config, then every copy drifts.

The signal arrived from multiple directions in the same week. Microsoft’s Agent Package Manager documentation describes agent context as something teams should declare in a manifest, resolve into a lockfile, and distribute into the directories each AI client already reads.¹ Sx describes the same category from another direction: a package manager for AI coding assistants that can share skills, rules, agents, commands, hooks, MCP servers, and plugin bundles across teams and tools.²

The category matters because Codex, Claude, Cursor, Copilot, Gemini, OpenCode, and similar tools no longer run on prompts alone. They run on process files, skill definitions, command files, MCP server declarations, hook scripts, policy files, and plugin manifests. Those files now shape the agent’s behavior before the first token of task-specific work appears.

TL;DR

Agent skills need package managers because agent context has become part of the software supply chain. A useful skill is not only prose. It can pull in scripts, MCP servers, hooks, commands, and agents, and it lands in a specific installation scope. Teams need a manifest, a lockfile, content scanning, source policy, review gates, and rollback for those assets.

The right question is no longer “where do I paste this skill?” The right question is “what version did we install, where did it come from, who approved it, which clients received it, what can it execute, and how do we revert it?”

Package managers will not make agent work safe by themselves. They make the dependency graph visible enough to govern.

Key Takeaways

For engineering teams:
- Treat agent skills, MCP servers, hooks, commands, prompts, and plugins as dependencies.
- Commit lockfiles, review updates, and run install/audit checks before a new package reaches a shared project.

For security reviewers:
- Separate build-time package integrity from runtime safety. A clean install does not prove a hook or MCP server behaves safely after the agent reads it.
- Require source allow lists, pinned commits, hidden-character scans, and secret-indirection rules before trusting shared agent context.

For agent-tool builders:
- Package the minimum coherent capability, not the whole private workflow.
- Build for scoped installation, update review, and rollback from the first public release.

What Changed?

OpenAI’s Codex Academy page now gives a plain split: plugins connect Codex to external tools and sources of information, while skills teach Codex a team’s specific process.³ Anthropic’s plugin documentation uses a broader packaging frame: plugins bundle MCP connectors, skills, slash commands, and sub-agents into a reusable capability package.⁴

Those definitions create an operational problem. A team does not install “advice” anymore. A team installs files that can change which tools an agent sees, which workflows users invoke, which background checks run, and which instructions load into context.

Claude Code’s plugin reference shows the shape directly. A plugin can include skills, commands, agents, hooks, MCP server declarations, monitors, binaries, settings, and a manifest.⁵ Its CLI supports install scopes such as user, project, and local; its version resolution can come from plugin metadata, marketplace metadata, or a git commit SHA.⁶

That looks like a dependency system because it is a dependency system.

Why Copy-Paste Breaks

Copy-paste works for one developer trying one skill. It fails for a team.

The first failure is drift. One repo has yesterday’s skill. Another repo has the branch version. A third developer edits a local copy because a sentence annoyed the model. Nobody knows which version produced last week’s good result.

The second failure is scope. A design-review skill belongs in design-heavy repositories. A database-migration skill may belong only in backend services. A secret-scanning hook belongs almost everywhere. Global installation bloats context and increases accidental activation. Per-project copy-paste buries useful work in whichever repository happened to receive the copy.

The third failure is trust. A skill file can include procedural instructions. A plugin can include hooks. An MCP server can connect to data and tools. A slash command can trigger a multi-step workflow. A package manager cannot decide whether the workflow deserves trust, but it can force the installer to answer where the files came from and which version entered the tree.

The fourth failure is rollback. When a new skill weakens an agent’s judgment, the team needs one revertible dependency change. Manual copies turn rollback into archaeology.
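
The trust failure is the easiest one to narrow mechanically. The sketch below shows one way a source allow list could work: it accepts a package URL only when the host and owner appear on an approved list. The function name, the allow-list shape, and the example-org owner are all assumptions made for illustration; none of them come from a real package manager.

```python
from urllib.parse import urlparse

# Hypothetical allow list: (host, owner) pairs the team has approved.
ALLOWED_SOURCES = frozenset({("github.com", "example-org")})


def source_allowed(url: str, allowed: frozenset = ALLOWED_SOURCES) -> bool:
    """Return True only for HTTPS URLs whose host and owner are approved."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # The first path segment is treated as the owner (an assumption that
    # fits GitHub-style URLs such as https://host/owner/repo).
    segments = [part for part in parsed.path.split("/") if part]
    owner = segments[0].lower() if segments else ""
    host = parsed.hostname or ""
    return (host, owner) in allowed
```

Matching on the parsed hostname rather than a string prefix keeps a lookalike host such as github.com.evil.test from passing. The check answers only where the files came from, not whether they deserve trust.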

What A Package Manager Adds

Microsoft APM frames the package-manager shape explicitly. apm.yml declares dependencies. apm.lock.yaml pins resolved packages so two developers can install byte-identical context. APM writes into the locations each client already reads, such as .github/, .claude/, .cursor/, .codex/, .gemini/, .opencode/, .windsurf/, and AGENTS.md; it does not invent a new runtime.¹

Its quickstart shows the practical artifact set: apm.yml, apm.lock.yaml, a gitignored apm_modules/ cache, client-neutral skills, and target-specific output files. The same page says APM resolves transitive dependencies, scans package content for hidden Unicode, and records exact commits plus content hashes in the lockfile.⁷
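
A hidden-Unicode scan is simple enough to sketch. The version below is not APM's scanner; it is a minimal illustration that flags every character in the Unicode "format" category (Cf), which covers zero-width spaces, joiners, and bidirectional overrides. The function name and the choice of category are assumptions.

```python
import unicodedata


def find_hidden_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each invisible format character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" marks characters that render as nothing but can
            # still change how a model or a reviewer reads the text.
            if unicodedata.category(ch) == "Cf":
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings
```

A scan like this catches only characters a human reviewer cannot see. It says nothing about visible instructions that are simply bad.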

The dependency workflow looks familiar:

Old software dependency question	Agent package equivalent
Which library version did we install?	Which skill/plugin/MCP version did we install?
What does the lockfile pin?	Which commit, content hash, and deployed files entered the agent setup?
Which packages can run code?	Which hooks, binaries, commands, and MCP servers can execute?
Which dependency is allowed in production?	Which sources, scopes, primitives, and transports can reach shared projects?
How do we roll back?	Revert the package manifest or lockfile and reinstall compiled context.

The Microsoft docs also spell out the lockfile discipline: commit the generated lockfile, never hand-edit it, and inspect it to answer which version the team actually runs.⁸

That discipline matters more for agents than it did for many earlier config files. Agent context changes behavior probabilistically. A one-line instruction can alter what the model refuses, which tool it prefers, whether it stops for evidence, or whether it treats a release as done.
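
Content hashes make drift checkable. The sketch below assumes a simplified lock record, a plain mapping from deployed file path to SHA-256 digest, which is not the real apm.lock.yaml format. It reports every deployed file that is missing or no longer matches what was pinned.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(root: Path, locked: dict[str, str]) -> dict[str, str]:
    """Compare deployed files under root with locked digests.

    `locked` is a hypothetical {relative_path: sha256_hex} mapping.
    Returns {relative_path: "missing" | "modified"} for each mismatch.
    """
    drift = {}
    for rel_path, expected in sorted(locked.items()):
        target = root / rel_path
        if not target.is_file():
            drift[rel_path] = "missing"
        elif sha256_of(target) != expected:
            drift[rel_path] = "modified"
    return drift
```

An empty result means the deployed context still matches the lock record. Anything else points at the local edit that copy-paste would have hidden.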

Sx Shows The Same Pressure

Sx starts from a different product surface but lands in the same category. Its README calls sx a package manager for AI coding assistants and says it manages skills, MCP configs, commands, and related assets.² It supports install scopes across organizations, repositories, paths, teams, users, and bot identities.⁹

The scoping detail matters. Good agent context should not load everywhere. A package manager should answer: who receives the asset, in which repo, under which path, and for which bot or human identity?
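
Those four questions map onto a small matching function. The sketch below is not how Sx resolves scope; the Scope fields and the applies function are hypothetical, chosen only to show that each unset field widens the audience and each set field narrows it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical install scope; a field left as None matches everything."""
    repo: Optional[str] = None
    path_prefix: Optional[str] = None
    identity: Optional[str] = None


def applies(scope: Scope, repo: str, path: str, identity: str) -> bool:
    """Return True when the asset should load for this repo, path, and identity."""
    if scope.repo is not None and scope.repo != repo:
        return False
    if scope.path_prefix is not None:
        prefix = scope.path_prefix.rstrip("/")
        # Match whole path segments so "services/db" does not
        # accidentally cover "services/db-legacy".
        if path != prefix and not path.startswith(prefix + "/"):
            return False
    if scope.identity is not None and scope.identity != identity:
        return False
    return True
```

For example, Scope(repo="acme/backend", path_prefix="services/db") would load a migration skill under that one directory and nowhere else.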

Sx also treats audit and usage as first-class surfaces. Its README lists sx stats for adoption data and sx audit for recent team or install mutations.⁹ That points toward the next layer: agent packages need not only distribution, but also usage evidence. A skill that nobody invokes is dead weight. A skill that everyone invokes but repeatedly repairs needs revision. A hook that blocks useful work needs a change request, not quiet deletion.

The strongest Sx idea is not the marketplace. The strongest idea is scoped distribution plus observed adoption.

What Package Managers Cannot Prove

A package manager can make the dependency graph visible. It cannot make every package worthy.

Microsoft’s security documentation states the boundary clearly. APM defends the build-time supply chain for prompts, instructions, skills, hooks, and MCP server declarations. It targets reproducibility, integrity, provenance, and pre-deploy content safety.¹⁰ The same page says APM does not sandbox MCP servers at runtime, does not perform malware analysis on dependency code, does not sign packages, and does not inspect what the agent does after reading context.¹¹

That boundary should shape adoption.

Do not treat an install success as a trust decision. Treat install success as a reason to continue review. The review still needs to inspect visible instructions, executable hooks, MCP transports, environment-variable handling, update policy, and the actual job the package claims to perform.

The rule is simple: package managers make agent context governable, not inherently good.

The Minimum Standard

Teams do not need to wait for one ecosystem winner before improving their process. They can start with six rules.

1. Inventory every agent asset. List skills, commands, hooks, MCP servers, agents, plugin bundles, prompt files, and project instructions. If the team cannot inventory the assets, it cannot govern them.

2. Split personal, project, and organization scope. Personal experiments should not become project defaults. Project standards should not become global context. Organization packages should carry explicit ownership.

3. Pin versions before sharing. Use tags or commit SHAs for shared packages. Floating branches belong in experiments, not release workflows.

4. Commit the lockfile. Reproducibility requires the resolved tree, not only the manifest intent.

5. Review runtime surfaces separately. Hooks, binaries, shell commands, and MCP servers deserve stricter review than plain instructional skills. They can execute or connect, so they carry higher risk.

6. Make rollback boring. A bad package update should revert through one dependency change plus one reinstall command. If rollback requires remembering copied files, the system is not ready.
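
Rule 3 can be checked before anything is shared. The sketch below assumes a simplified manifest, a mapping from package name to version ref, and classifies each ref as a commit, a tag, or a floating ref. The patterns are assumptions: a full 40-character hex SHA counts as a commit, a semver-shaped string counts as a tag, and everything else is treated as floating.

```python
import re

# Assumed patterns; real ecosystems may accept other ref forms.
SHA_RE = re.compile(r"[0-9a-f]{40}")
TAG_RE = re.compile(r"v?\d+\.\d+\.\d+")


def classify_ref(ref: str) -> str:
    """Classify a version ref as "commit", "tag", or "floating"."""
    if SHA_RE.fullmatch(ref):
        return "commit"
    if TAG_RE.fullmatch(ref):
        return "tag"
    return "floating"


def unpinned(deps: dict[str, str]) -> list[str]:
    """Return the names of dependencies that point at floating refs."""
    return sorted(name for name, ref in deps.items() if classify_ref(ref) == "floating")
```

Tags can be moved after publication, so a stricter policy might accept commits only. The lockfile remains the record of what was actually resolved.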

A Practical Adoption Map

Start small.

Package one harmless skill first: a writing rubric, test checklist, or review format. Install it into one repo. Confirm the right client sees it. Confirm the lockfile pins it. Confirm uninstall works.

Next, package a command that people already invoke manually. Avoid hooks and MCP servers until the team understands the install and rollback path.

Then package an MCP server declaration, but keep credentials out of the package. Use environment-variable references and a separate secret store. The package should describe the runtime dependency, not carry the secret.
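
The secret-indirection rule lends itself to a mechanical check. The sketch below assumes a declaration shaped like a dictionary with an env mapping and assumes that references use a ${NAME} syntax; real clients differ on both points. It flags every environment entry that carries a literal value instead of a reference.

```python
import re

# Assumed reference syntax: ${UPPER_SNAKE_CASE}.
ENV_REF = re.compile(r"\$\{[A-Z_][A-Z0-9_]*\}")


def literal_env_values(declaration: dict) -> list[str]:
    """Return env keys whose value is not a pure ${NAME} reference."""
    env = declaration.get("env", {})
    return sorted(
        key for key, value in env.items()
        if not ENV_REF.fullmatch(str(value))
    )
```

The check is deliberately strict: it also flags harmless literals such as a log level. A team could add an explicit exemption list rather than loosen the pattern.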

Hooks come last. A hook can enforce quality at the right moment, but it can also block work, hide brittle assumptions, or execute scripts under the wrong trust model. Ship hooks only after the team has source policy, review ownership, and rollback.

That sequence respects the risk gradient:

Package type	Default risk	First review question
Plain skill	Low	Does it improve work without bloating context?
Prompt or slash command	Medium	Does it trigger the right workflow and preserve user control?
Agent persona	Medium	Does it narrow scope or create confusion with the main agent?
MCP server	High	What data and actions can it expose?
Hook or executable	High	What can it run, when does it run, and how does it fail?
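
One way to apply the table is to rate a bundle by its riskiest component. The sketch below uses the table's tiers with hypothetical short labels for each package type, and it treats any unknown type as high risk on the assumption that an unclassified primitive should get the strictest review.

```python
# Tiers taken from the table above; the short labels are hypothetical.
RISK_BY_TYPE = {
    "skill": "low",
    "command": "medium",
    "agent": "medium",
    "mcp_server": "high",
    "hook": "high",
}
TIER_ORDER = ["low", "medium", "high"]


def bundle_risk(contents: list[str]) -> str:
    """Return the highest risk tier among a bundle's component types."""
    # Unknown component types default to "high" rather than slipping through.
    tiers = [RISK_BY_TYPE.get(kind, "high") for kind in contents]
    return max(tiers, key=TIER_ORDER.index, default="low")
```

A bundle described as "a skill" that also ships a hook rates high, which is the review it should receive.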

The Review Packet

Before a shared agent package enters a project, require one review packet. Keep it boring.

Field	Required answer
Source	Repository, owner, version ref, and lockfile entry
Contents	Skills, prompts, commands, hooks, agents, MCP servers, binaries, and settings
Scope	User, project, local, organization, team, path, or bot
Runtime surface	Files only, tool access, shell execution, network access, or external data access
Secrets	Environment-variable references only, with no literal credentials
Policy	Allowed source, allowed primitive type, allowed transport, and review owner
Verification	Install dry run, content scan, route/client discovery, and rollback test
Exit plan	Exact uninstall, prune, or revert command

That packet prevents the worst failure: a team saying “we installed a skill” when it actually installed a plugin, an MCP server, two hooks, and a command nobody reviewed.
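
The packet can be enforced as a required-fields check. The sketch below mirrors the eight fields in the table as a dataclass and lists whichever ones a submitter left blank. The class and function names are hypothetical, and free-text fields are a simplification.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewPacket:
    """One field per row of the review packet table; all start empty."""
    source: str = ""
    contents: str = ""
    scope: str = ""
    runtime_surface: str = ""
    secrets: str = ""
    policy: str = ""
    verification: str = ""
    exit_plan: str = ""


def missing_fields(packet: ReviewPacket) -> list[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [
        field.name for field in fields(packet)
        if not getattr(packet, field.name).strip()
    ]
```

A merge gate could refuse any package update whose packet returns a non-empty list. The check proves the questions were answered, not that the answers are good.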

The Taste Layer Still Matters

Agent packages will invite quantity. A team can install 40 skills because installation feels cheap. Cheap context still has a cost.

Every added skill competes for attention. Every command adds a choice. Every hook adds a possible block. Every MCP server increases the action surface. The package manager solves distribution, not judgment.

The right standard stays small and sharp: install what improves the work, remove what bloats the agent, pin what survives review, and watch what people actually use.

That is the Steve test for agent packages. Do not publish the maximum bundle. Publish the coherent one.

Quick Summary

Agent skills need package managers because agent context now behaves like dependency code. A skill can carry process. A plugin can carry commands, hooks, MCP servers, and agents. A package can change the behavior of every developer’s agent setup.

The package manager’s job is not to make those assets good. Its job is to declare them, pin them, distribute them, audit them, and make rollback possible. The team’s job is still harder: decide which assets deserve to exist.

FAQ

Are agent skills really dependencies?

Yes. A shared skill changes how an agent performs a task. A plugin can also add commands, hooks, MCP servers, and agent definitions. Those files influence behavior across machines, so teams should track them with the same seriousness they apply to code dependencies.

Does a package manager replace plugin review?

No. A package manager records source, version, hash, scope, and installed files. Review still needs to inspect what the package says, what it can execute, which MCP servers it declares, and whether the capability belongs in the project.

Should teams package private workflows?

Teams should package repeatable jobs-to-be-done, not private operating details. A public package can ship a general review gate, migration checklist, or documentation workflow. It should not ship private prompts, sensitive file paths, credentials, internal source maps, or proprietary scoring internals.

What should a team package first?

Start with a low-risk skill that already works manually. Avoid MCP servers and hooks until the team has a manifest, a lockfile, source policy, install review, and rollback path.

What is the best package-manager feature for agent work?

The lockfile is the load-bearing feature. Discovery helps, and install commands feel good, but reproducible agent context requires exact source refs, content hashes, and a record of deployed files.

References

1. Microsoft, “What is APM?”, Agent Package Manager documentation, last updated May 11, 2026. Primary source for APM as a dependency manager for AI agent context, the apm.yml / apm.lock.yaml mental model, managed primitives, target outputs including .codex/ and AGENTS.md, and the three promises of manifest portability, security checks, and policy governance.
2. Sleuth, “sleuth-io/sx”, GitHub repository, accessed May 17, 2026. Primary source for Sx describing itself as a package manager for AI coding assistants, the managed asset categories, supported clients, install scopes, audit/stat commands, and latest release metadata.
3. OpenAI Academy, “Plugins and skills”, April 23, 2026. Primary source for the Codex distinction between plugins as tool/data connectors and skills as team process playbooks.
4. Anthropic, “Plugins overview”, Claude documentation, accessed May 17, 2026. Primary source for Claude plugins as reusable packages bundling MCP connectors, skills, slash commands, and sub-agents.
5. Anthropic, “Plugins reference”, Claude Code documentation, accessed May 17, 2026. Primary source for Claude Code plugin components including skills, commands, agents, hooks, MCP servers, monitors, binaries, settings, and manifests.
6. Anthropic, “Plugins reference”, Claude Code documentation, accessed May 17, 2026. Source for plugin install scopes, plugin dependency pruning, component inventory, projected token cost, and version resolution behavior.
7. Microsoft, “Quickstart”, Agent Package Manager documentation, last updated May 11, 2026. Source for the install flow, generated apm.yml, apm.lock.yaml, apm_modules/, target output files, transitive dependency resolution, hidden Unicode scanning, and policy preflight.
8. Microsoft, “Manage dependencies”, Agent Package Manager documentation, last updated May 11, 2026. Source for dependency reference forms, pinning, branch versus tag/SHA behavior, lockfile contents, and lockfile rules.
9. Sleuth, “sx README”, GitHub repository, accessed May 17, 2026. Source for Sx installation scopes, cloud relay, stats, audit, supported clients, and asset types.
10. Microsoft, “Security and Supply Chain”, Agent Package Manager documentation, last updated May 11, 2026. Source for APM’s build-time threat model: reproducibility, integrity, provenance, and pre-deploy content safety.
11. Microsoft, “Security and Supply Chain”, Agent Package Manager documentation, last updated May 11, 2026. Source for stated non-goals: no runtime sandboxing for MCP servers, no malware analysis, no package signing, no visible prompt-injection defense, and no post-read agent behavior inspection.