The Downsides of Agentic Skills

May 17, 2026

Skills are seductive. Drop a folder in your repo, write a description, and now the agent “knows” how to do a thing. The pitch is real, but if you’re putting them into a production project, there are important tradeoffs that people often leave out.

Context isn’t free

Every skill that matches your task gets loaded into context. SKILL.md files are usually (and desirably) small individually, but as your library grows, the eligible set grows with it. You pay in tokens, latency, and (more subtly) attention budget. The model has to read everything you gave it. The more skills compete for relevance, the more likely the wrong one wins.
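To make the cost concrete, here is a minimal back-of-the-envelope sketch. It assumes roughly four characters per token, which is a common rule of thumb and not a real tokenizer, and the skill names, sizes, and budget are hypothetical stand-ins for real SKILL.md contents.

```python
# Rough context-cost estimate for a set of eligible skills.
# Assumption: ~4 characters per token (a rule of thumb, not a real tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; swap in a real tokenizer for accuracy."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def context_cost(skills: dict[str, str], budget_tokens: int) -> dict:
    """Sum the estimated tokens of every eligible skill and compare to a budget."""
    per_skill = {name: estimate_tokens(body) for name, body in skills.items()}
    total = sum(per_skill.values())
    return {
        "per_skill": per_skill,
        "total": total,
        "share_of_budget": total / budget_tokens,
    }


# Hypothetical skill bodies; in practice you would read the SKILL.md files.
skills = {
    "docx": "x" * 2000,
    "release-notes": "x" * 1200,
    "db-migrations": "x" * 4000,
}

report = context_cost(skills, budget_tokens=9000)
print(report["total"], f"{report['share_of_budget']:.0%}")
```

Even a crude number like this is useful as a trend: if the total keeps climbing as the library grows, you are paying more on every matching task.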

Triggering is fuzzy

Skills activate based on description matching, not deterministic rules. A subtle reword in the description changes when one fires. Two skills with overlapping triggers can both fire, neither fire, or fire in the wrong order. There’s no compiler to catch this. You find out in production.

I have experienced this myself several times. A Skill would suddenly stop firing for no apparent reason, and I would have to point the agent to it very explicitly.
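There is no compiler, but you can at least lint for the obvious cases. The sketch below flags pairs of skill descriptions that share a lot of vocabulary, using plain word-set (Jaccard) overlap. This is only a lexical heuristic and says nothing about how an agent actually matches descriptions; the skill names, descriptions, and the 0.3 threshold are all made up for illustration.

```python
import re
from itertools import combinations


def tokens(description: str) -> set[str]:
    """Lowercased word set of a description."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_pairs(descriptions: dict[str, str], threshold: float = 0.3):
    """Return (skill, skill, score) for every pair at or above the threshold."""
    pairs = []
    for (n1, d1), (n2, d2) in combinations(sorted(descriptions.items()), 2):
        score = jaccard(tokens(d1), tokens(d2))
        if score >= threshold:
            pairs.append((n1, n2, score))
    return pairs


# Hypothetical descriptions, not taken from any real skill library.
descriptions = {
    "docx": "create or edit word documents",
    "report": "create word documents for weekly reports",
    "sql": "write and review database migrations",
}

for first, second, score in overlapping_pairs(descriptions):
    print(f"{first} <-> {second}: {score:.3f}")
```

A check like this will not catch semantic overlap between differently worded descriptions, so treat a clean run as weak evidence at best.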

Non-determinism leaks into your workflow

A traditional library does the same thing every call. A skill is a hint the agent may or may not act on, in an order it picks. Same input, potentially different path. That’s fine for exploration. It’s painful for anything you need to reason about, test, or audit. Haven’t you ever noticed a difference in behaviour from one run of a Skill to another? It happens. And it makes sense, since a Skill is just a guideline. The agent uses it as context and then decides how to act on it. So, by definition, skills are not deterministic.

Testing gets harder

You can unit-test a function. How do you unit-test “the agent should use the docx skill when the user asks for a Word doc”? You write evals. Evals are slow, expensive, and probabilistic. CI cycles can balloon. Regressions can hide for weeks because your eval suite passed at 94% and it always passes at around 94%.
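Some quick arithmetic shows why a steady 94% can hide a regression. The sketch below computes a normal-approximation 95% interval for a pass rate. It assumes a suite of 100 independent eval cases, which is my assumption and not a number from any real project, and the normal approximation is itself crude for rates this close to 100%.

```python
import math


def pass_rate_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for a pass rate (z=1.96 gives ~95%)."""
    p = passed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def within_noise(observed_rate: float, passed: int, total: int) -> bool:
    """True if an observed rate falls inside the baseline's interval."""
    low, high = pass_rate_interval(passed, total)
    return low <= observed_rate <= high


# Assumed baseline: 94 of 100 eval cases pass.
low, high = pass_rate_interval(94, 100)
print(f"{low:.3f} to {high:.3f}")

# A run at 91% is indistinguishable from the baseline at this suite size.
print(within_noise(0.91, 94, 100))
```

Under these assumptions the interval runs from roughly 89% to 99%, so a real drop of a few points looks like an ordinary run. Tightening it means more eval cases, which is exactly the slow and expensive part.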

It’s a new kind of tech debt

Adding a skill is often easier than fixing the underlying flow. If you hit an awkward API, it’s probably easier to wrap it in a skill. Same for a confusing internal tool or library. Six months later you have a layer of skills papering over the real problems, and refactoring the real code now means rewriting all the skills that depend on its quirks.

The permission surface grows quietly

Each skill can pull in shell access, file system reach, network calls. Every one is a trust boundary. Most teams audit their dependencies. Few have a story for auditing what a skill is allowed to do, what it ran, and what it touched. Once skills can call other skills, the blast radius compounds. This is an inherent problem of agentic coding too, not only skills. But skills can make it even more opaque.
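The compounding part is easy to underestimate, so here is a minimal sketch of it. It assumes you keep your own manifest of what each skill is allowed to do and which skills it can invoke. That manifest, the permission labels, and the skill names are all hypothetical; I am not describing a format that any agent tooling provides.

```python
def effective_permissions(
    skill: str,
    declared: dict[str, set[str]],
    calls: dict[str, list[str]],
) -> set[str]:
    """Union of a skill's own permissions and those of every skill reachable from it."""
    seen: set[str] = set()
    perms: set[str] = set()
    stack = [skill]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles between skills
        seen.add(current)
        perms |= declared.get(current, set())
        stack.extend(calls.get(current, []))
    return perms


# Hypothetical manifest: what each skill declares on its own.
declared = {
    "release-notes": {"fs:read"},
    "git-helper": {"shell"},
    "publish": {"network"},
}
# Hypothetical call graph: which skills can invoke which.
calls = {
    "release-notes": ["git-helper"],
    "git-helper": ["publish"],
}

print(sorted(effective_permissions("release-notes", declared, calls)))
```

On paper the release-notes skill only reads files. Through the skills it can reach, it effectively has shell and network access as well, and that is the number you would want to audit.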

You own the maintenance

Skills age. Underlying APIs change, model behaviors drift, your own internal tools evolve. Unlike a library version bump, there’s no clean changelog telling you what broke. You’ll discover the breakage when the agent quietly does the wrong thing. This one is particularly dangerous, so you’ll need a workflow for continuously auditing your skills. Without one, this scenario will eventually happen.

Onboarding gets weirder

New teammates used to read the codebase. Now they also need to know which skills exist, when they trigger, how they compose, and which ones are load-bearing. That knowledge lives in descriptions and behavior, not in stack traces. It easily becomes tribal knowledge and a significant mental overhead.

When skills make sense anyway

None of this means don’t use them, but you should be very aware of their risks and implications. Skills are a good fit when the task is genuinely fuzzy and benefits from model judgment, when the workflow is exploratory, when the cost of a wrong invocation is low, and when you actually run evals and checks over the resulting code.

They’re a bad fit when you need deterministic, auditable, testable behavior, which is most production code paths. Skills are neither re-executable commands nor function calls. They are just a guideline for the agent, nothing else.

But then again, this is not a problem with skills alone, but with the whole agentic coding concept. In a world where code is generated in a non-deterministic way, evals and automated checks become more important than ever.

The honest framing: skills are a powerful tool with a real maintenance and reasoning cost. Do not treat them like a free abstraction. Decide deliberately what belongs in a skill versus a plain function the agent can call.

Effective Android
