Skills Are the New Gold
How a shared folder of markdown files became the contract between our agent and our data team, and why we now think it's the most valuable thing we ship.

We were building two features in parallel: the AI Segment Builder and the AI Campaign Builder. Both let a merchant ask plain-English questions about their customers and get back something they can act on. "Customers in Italy who bought fragrances in the last 90 days but haven't ordered since." That kind of thing.
I was on the agent side. The data team was on the warehouse side. Both of us were moving. Neither of us was finished.
This is the part of building production AI features that nobody warns you about. The textbook version says: the data is there, the agent goes and queries it. The real version is that the data is being built at the same time the agent is being built. New marts land every week. Column names shift. The join key on the new customer-category table changed twice in a month. The data team didn't know what the agent wanted yet. The agent (me) didn't know what the data team was going to ship.
If you've tried to coordinate a parallel build like this, you know the failure modes. Weekly syncs that go stale by Wednesday. A Notion page nobody updates. The agent's prompt full of # TODO: confirm with data team comments. Or the worst version: one team waits for the other to "be ready," and the project that was supposed to ship in six weeks ships in fourteen.
We needed something that could move at the speed of both teams. We needed a contract.
This post is about what we ended up using as that contract, a shared folder of markdown files, and why we now think that folder is the most valuable artifact in our entire AI stack.
The shared folder
The trick, once we figured it out, was almost embarrassingly simple.
We made a directory inside brain0 called skills/. Inside it, the data team got their own subfolder, redshift-schema/, and we agreed on one rule: whatever the data team writes in there is what the agent will know. Tables, column names, join patterns, gotchas, the canonical mart for a given question, the legacy table you should not use anymore. All of it. In markdown. In the repo. Reviewed in PRs.
The pattern has a name. Anthropic calls it Agent Skills. Our agent framework exposes the same primitive, and the OpenAI ecosystem is feeling its way toward something similar.
Most engineers have first met skills through their coding agents. They're what powers Claude Code, Cursor, Codex: a SKILL.md file in the repo that teaches the assistant how a particular codebase wants to be edited, which scripts to prefer, which patterns to avoid. That's where most of the public writing about skills lives today.
We're doing something less common with the same primitive. We're using it to power a production, user-facing agent. Not an assistant helping an engineer write code, but an agent answering a merchant's question about their own customers and running real SQL against a real warehouse to do it. Same shape, different surface, and that move from "skills for coding agents" to "skills for production agents" is the part of this story I think matters most.
Everyone is converging on this primitive because it's the right shape for putting an LLM on top of a knowledge-rich system. Once you've felt the alternative, the convergence makes sense.
The shape is a directory:
skills/redshift-schema/
├── SKILL.md
└── references/
├── tables_audience.md
├── tables_products.md
├── patterns_segments_by_category.md
├── patterns_audience_rfm.md
└── learnings.md
What makes it work isn't the layout. It's the staging. The agent doesn't load the whole skill at once. It loads it in three tiers, what Anthropic calls progressive disclosure:
- Frontmatter, always in the system prompt. About 50 tokens. Just enough to tell the agent the skill exists and when to reach for it.
SKILL.mdbody, loaded when the agent decides this skill is relevant to the current task. Contains the index of references and the cross-cutting rules.- Reference files, pulled lazily, one at a time, only when the current sub-task needs them. The agent can fan out and load several in parallel inside the same turn.
The context window stays small. The knowledge surface can grow without bound. The data team can write as much as they want, and the agent only pays the token cost for what it actually reads. That's the whole unlock.
Wiring it into our codebase was a single argument on the agent constructor:
agent = Agent(
name="Segment Builder",
model=AwsClaude(id=SEGMENT_CHAT_MODEL),
tools=tools,
instructions=_load_prompt(),
skills=Skills(loaders=[LocalSkills(str(SKILLS_DIR))]),
)
That's it. The rest of the work, the work that actually mattered, was about what went inside the directory, and about the workflow that grew up around editing it.
When the data team shipped a new mart, they shipped a markdown file with it in the same PR. When they renamed a column, the markdown got updated in the same commit. When they discovered that agg_store_customer was the right canonical audience table and agg_contacts_enriched was now legacy, the markdown said so, and the agent picked that up on its next run.
The agent stopped guessing about the warehouse. The data team stopped fielding "what column should I use for X?" questions from me. Both teams kept moving in parallel, and the contract between us was a thing in the repo: concrete, versioned, reviewable.
What we didn't appreciate at first is that this isn't just a documentation pattern. It's the agent's runtime knowledge layer. When the data team updates the markdown, the agent literally gets smarter the next time it runs. There's no deploy, no prompt rewrite, no engineering ticket. The data team ships a markdown PR, and the agent inherits the capability the moment it lands.
What the data team wrote in there
The SKILL.md itself is small. It's a table of contents over the references/ files, plus the rules that apply to every Redshift query. One line in particular did more work than any prompt engineering we'd ever done:
- Do NOT run `show_tables()` or `describe_table()` — use the references in this skill directly.
The agent has the schema-discovery tools in its toolbelt (the Redshift toolset ships with them). We didn't take them away. We just told the agent to trust the contract. What the data team writes in the skill is more reliable than what you can introspect at runtime, because the data team knows the difference between the column exists and the column means what you think it means.
The reference files are written like onboarding docs for a smart new hire. tables_audience.md lists agg_store_customer with its columns and explains that it replaced agg_contacts_enriched, and which legacy bridge joins still flow through the old table. patterns_segments_by_category.md shows the exact join chain for "customers who bought in category X," with a working SQL template hardened against the edge cases we hit in production.
Then there's learnings.md, which turned out to be the most valuable single piece of our agent infrastructure. It's a running corpus of every SQL pitfall we've caught:
- Do NOT use PERCENTILE_CONT with other DISTINCT aggregates — Redshift doesn't support it.
- All PERCENTILE_CONT calls in a single SELECT must share the same ORDER BY.
- `organization_id` is VARCHAR, not ObjectId — always use string comparison.
- Filter `source_name != 'Matrixify App'` on fact_store_orders to exclude bulk imports.
- When using CASE in GROUP BY, use the column position number (GROUP BY 1).
Every line is a real bug, distilled into one sentence. The agent reads it before writing SQL. We've turned production post-mortems into runtime constraints without changing a line of Python. Once a bug lives in learnings.md, we don't see it again.
One contract, three products
The clearest sign the pattern was working showed up the first time the data team shipped a new mart on top of it. They built marts.agg_customer_category, a pre-aggregated table of lifetime value per customer per LLM-classified product category. Before skills, exposing that table to our agents would have been a brain0 PR: Python, prompts, tests, deploy, three days of round-trip. With the shared skills directory in place, it was a markdown PR.
The data team wrote patterns_segments_by_category.md, added a row to the index in SKILL.md, and merged. From the moment that file landed, our AI Segment Builder could answer "customers who buy fragrances" against the new mart. So could the AI Campaign Builder. So could the autonomous brand-research pipeline we use internally.
Three products picked up the same capability with no Python changes, because they all read from the same directory:
# conversational_segment_agent.py
SKILLS_DIR = Path(__file__).parent.parent / "deep_agent" / "skills"
# segment_designer.py
SKILLS_DIR = Path(__file__).parent.parent / "deep_agent" / "skills"
# redshift_agent.py
SKILLS_DIR = Path(__file__).parent.parent / "skills"
Same path. Single source of truth. The data team, the people who actually know what every column in the warehouse means, became the primary contributors to our AI features. They write the markdown. The agents inherit the capability. Engineering isn't in the loop for every new mart, every new join, every new analytical recipe.
That's the shape of how we ship AI features now. The parallel-build problem we started with (agent team and data team moving at different speeds, no shared contract) solved itself the moment the contract became a thing both teams could edit in the same repo.
Skills are the new gold
This is the part I keep coming back to.
Models get cheaper and better, fast. The floor of capability rises every quarter, the cost per token drops, and the agent we ship on Claude Opus 4.6 today is one we'll ship later on something cheaper. What carries over isn't the model.
It's the knowledge layer. The skills directory.
We think about ours now the way we think about our codebase. It's institutional memory, the kind you can't recreate without going through the work that produced it. Every line in learnings.md is a bug we will never debug again. Every reference file is a piece of tribal warehouse knowledge made explicit and queryable. Every contributor, whether engineer, data scientist, or analyst, adds to a corpus that makes our agents better without making them slower or harder to reason about.
That corpus is the thing competitors can't copy by swapping models. It isn't a prompt that leaks in a screenshot. It isn't a framework choice that gets posted on Hacker News. It's a year of accumulated, hard-won knowledge about a specific domain, in a format a state-of-the-art model can consume on demand.
So here's what I keep telling anyone who'll listen. The gap between a mediocre production AI agent and a great one isn't really about the model anymore. Every team has the same Claude. Every team has access to the same frameworks. The difference, increasingly, is what's in your skills directory, and how many people in your company are qualified to add to it.
If you're building production agents right now, the most valuable thing you can do this quarter is probably not picking a better model. It's sitting down with the people in your company who know your domain best, opening an empty directory, and writing down what they know. Wire it in as a skill. Forbid the agent from going around it. Let it grow.
The teams that win the next wave of AI products probably won't be the ones with the cleverest prompts. They'll be the ones whose skills directory is already three years deep by the time everyone else starts theirs.
That's the gold. Worth starting now.
Sources & further reading
- Equipping agents for the real world with Agent Skills — Anthropic
- Agent Skills overview — Claude API Docs
- Skill authoring best practices — Claude API Docs
- Claude Agent Skills: A First Principles Deep Dive — Lee Hanchung
- ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration (arXiv, 2025)