What it actually means to be AI-native
We rewired the whole company around AI agents over twelve months — engineering, product, sales, every employee. Here's the boring infrastructure work that made it work, and the surprising place the bottleneck ended up.

I recently gave a talk at the AWS CTO Circle called "What does it actually mean to be AI-native?" I went in half expecting to talk about models and benchmarks. I came out of it having mostly talked about a monorepo migration, a markdown file called CLAUDE.md, and the slightly disorienting realization that engineering is no longer our bottleneck.
This post is the longer version of that talk. It's a story about how, over twelve months, we rewired TextYess around AI agents. Not by adopting a tool. By changing the shape of how the whole company operates. Engineering changed first, then product, then sales, then everyone.
I should say up front: we're still figuring this out. None of what follows is a stable end state. But we're already operating in a way that wasn't possible a year ago, and I think the shape of how that happened is the part most teams I talk to are missing.
The shape of TextYess
A bit of context first. We build the AI-first CRM for eCommerce brands. The unit of work is a conversation, not a click. Our agents handle pre-sales and support across WhatsApp, voice, onsite chat, and email. Beneath them sits a layer of deep agents that hold 360° context on the store and can propose campaigns and automations autonomously. Around them sit the automations and the shared inbox where humans can take over, plus the tooling that analyzes every conversation qualitatively and quantitatively.
Conversations are our data, not page events, not clicks. The product is an AI system. We don't bolt agents onto the side of an existing CRM. The agents are the CRM.
That shapes everything downstream. When I say "AI-native" I don't mean "we use AI." I mean: the AI is the load-bearing part of what we ship, and that fact has to be reflected in how we build it.
Chapter 1: The foundation we built first
A year ago, our engineers were shipping about eight PRs each per week. That sounds healthy on paper. In practice, eight was the ceiling we were hitting with AI coding assistants turned on. Cursor, Claude Code, the lot. We had the tools, we had the models, and we were still capped at eight.
The problem wasn't the agent. The problem was what the agent could see.
We had four separate repos: the API, the frontend, the worker fleet, and a shared package. Reasonable enough. We'd grown into that structure because separate deploys felt cleaner, and because nobody had ever really questioned it. But our coding agents could only see one repo at a time. If a feature touched the frontend and the API (and most of them do), the agent was working with half a map.
Same prompt, different repo, different result. We were spending more time correcting agents than writing code ourselves. Engineers had to context-switch between agent sessions, copy-paste between windows, manually relay what the worker contract looked like when the frontend needed to call it. The agent's blind spot was our coordination tax.
So we merged everything into one monorepo.
What surprised us is how cheap that was. We'd been bracing for a multi-month project; we'd talked about it in the abstract for over a year and kept not doing it because "we'd never finish in time." Once we sat down to do it, the whole thing took one to two weeks. The barrier we'd been afraid of was almost entirely imaginary. Git history, CI pipelines, deploy targets: tedious, but nothing we couldn't get through.
The merge itself wasn't the thing. The thing was the nested context files we put at the root and in each top-level directory.
/CLAUDE.md
/api/CLAUDE.md
/frontend/CLAUDE.md
/workers/CLAUDE.md
/shared/CLAUDE.md
These explain to any agent (Claude Code, Cursor, Devin) what each part of the codebase is for, what patterns it follows, what not to do. The root file sets the cross-cutting rules. The nested files override and specialize.
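To make "the root file sets the rules, the nested files specialize" concrete, here is a minimal sketch of how the applicable context files for a given source file could be resolved, from the repo root down to the file's own directory. The helper name `context_chain` and the example path `api/routes/orders.py` are made up for illustration, and each tool (Claude Code, Cursor, Devin) has its own discovery rules, so treat this as a picture of the hierarchy rather than a description of how any of them loads files.

```python
from pathlib import PurePosixPath

# The five context files from the layout above, as repo-relative paths.
REPO_CONTEXT = {
    "CLAUDE.md",
    "api/CLAUDE.md",
    "frontend/CLAUDE.md",
    "workers/CLAUDE.md",
    "shared/CLAUDE.md",
}


def context_chain(file_path: str, known_context_files: set[str]) -> list[str]:
    """Return the context files that apply to file_path, root first.

    Later entries are more specific, so a reader (or agent) applies them
    last and lets them override the cross-cutting rules from the root.
    """
    directories = PurePosixPath(file_path).parts[:-1]  # drop the file name
    chain = []
    # depth 0 is the repo root, depth 1 is the first directory, and so on.
    for depth in range(len(directories) + 1):
        candidate = "/".join((*directories[:depth], "CLAUDE.md"))
        if candidate in known_context_files:
            chain.append(candidate)
    return chain


# Hypothetical file path, used only for illustration.
print(context_chain("api/routes/orders.py", REPO_CONTEXT))
# ['CLAUDE.md', 'api/CLAUDE.md']
```

Ordering the chain root-first is the design choice that matters here: the most specific file is read last, which is what lets a nested file specialize a rule set at the root.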
The deeper we invested in this hierarchy, the better agents performed. This is the unglamorous work that nobody puts on a slide, and it did more for us than anything else we shipped that quarter. The monorepo migration was the headline. The CLAUDE.md tree was the unlock.
After both were in place, our agents could finally see the whole picture. The same prompt produced consistent results. The coordination tax went to zero. And we could start thinking about what to do with the headroom.
Chapter 2: The three-tier engineering stack
Once agents could see everything, the question became which agent should do what.
I'm skeptical of teams that pick one coding agent and use it for everything. Coding agents are tools, and like any tool they have a shape. The ones that exist today are differentiated enough that you want the right one for each kind of work. We landed on three tiers.
Tier 1: small issues and bugs, fully autonomous
This is the one that surprised me most.
When customer support catches a bug, they don't open a ticket for engineering. They open a triage issue in Linear with a description of what's broken. Cursor picks it up automatically, reads the issue, looks at the repo, takes a swing at the fix, opens a PR. Devin reviews it. A human approves, merges, deploys.
No engineer is in the loop until the approval. The whole pipeline from "CS noticed something is broken" to "code change waiting for a human to merge" runs without anyone touching it.
The trick is the boundary. Tier 1 only covers things that are bounded: a copy fix, a missing null check, a typo in an enum, a date format that's wrong on one screen. The kind of bug where the description and the diff are both small. We're not asking agents to refactor the auth layer on their own. We're asking them to do the thing a junior engineer would do on their first day, at a pace and parallelism no junior engineer can match.
A bounded category plus an autonomous pipeline is what makes it work. Take either piece out and the whole thing collapses.
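One way to picture the boundary is as two explicit gates: one before the agent starts, one after it proposes a diff. The sketch below is an assumption-heavy illustration, not our actual pipeline: the category names, the size limits, and the step names are all hypothetical, and nothing here calls Linear, Cursor, or Devin.

```python
from dataclasses import dataclass

# Hypothetical category names, loosely based on the examples in the text.
BOUNDED_CATEGORIES = {"copy_fix", "missing_null_check", "enum_typo", "date_format"}

# Hypothetical limits for "the description and the diff are both small".
MAX_DESCRIPTION_CHARS = 600
MAX_CHANGED_LINES = 30


@dataclass
class TriageIssue:
    category: str
    description: str


def eligible_for_tier1(issue: TriageIssue) -> bool:
    """Gate before the agent starts: bounded category, short description."""
    return (
        issue.category in BOUNDED_CATEGORIES
        and len(issue.description) <= MAX_DESCRIPTION_CHARS
    )


def next_step(issue: TriageIssue, changed_lines: int) -> str:
    """Gate after the agent proposes a fix: only small diffs continue."""
    if eligible_for_tier1(issue) and changed_lines <= MAX_CHANGED_LINES:
        return "agent_review_then_human_approval"
    return "hand_to_engineer"


typo = TriageIssue("enum_typo", "Order status shows 'SHIPED' in the inbox.")
auth = TriageIssue("auth_refactor", "Rework session handling.")

print(next_step(typo, changed_lines=2))    # agent_review_then_human_approval
print(next_step(auth, changed_lines=2))    # hand_to_engineer
print(next_step(typo, changed_lines=400))  # hand_to_engineer
```

The second gate is the one that is easy to forget: an issue can look bounded from its description and still produce a sprawling diff, and that is the point where it should fall back to an engineer.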
Tier 2: large features, human in the loop
For real features, we use Claude Code with custom skills and parallel sub-agents.
The pattern is roughly this: an engineer scopes the work, kicks off Claude Code with a prompt that points at the relevant skills, and Claude Code fans out into sub-agents that handle pieces of the work in parallel. The agent comes back with, give or take, 85% of the feature done. The engineer takes that, refines it, fills the gaps the agent didn't quite get, and ships.
That 85% number is rough. Some sessions land closer to 95%, some closer to 60. The point is that the engineer is not writing from scratch and is not micromanaging line by line. They're a level up: directing the work, making the calls the agent can't make, and reviewing the diff with their attention tuned to "is this right?" rather than "did I remember the syntax?"
The skills are what make this scale. We've written about them before. They're a shared folder of markdown files that turns institutional knowledge into something an agent can read at runtime. When a sub-agent needs to write SQL against our warehouse, it reads the warehouse skill. When it needs to add a screen to the merchant dashboard, it reads the dashboard skill. The same skills are shared across products and across the company, and they accumulate over time.
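As a rough sketch of "markdown files an agent can read at runtime", the snippet below loads named skills from a folder and prepends them to a task prompt. It assumes a flat layout of one `<name>.md` file per skill; the function names, the `warehouse.md` contents, and the example task are all invented for illustration, and real agent tooling may structure and load skills differently.

```python
import tempfile
from pathlib import Path


def load_skill(skills_dir: Path, name: str) -> str:
    """Read one skill file, assuming a flat <name>.md layout (an assumption)."""
    skill_file = skills_dir / f"{name}.md"
    if not skill_file.is_file():
        raise FileNotFoundError(f"no skill named {name!r} in {skills_dir}")
    return skill_file.read_text(encoding="utf-8")


def build_prompt(task: str, skills_dir: Path, skill_names: list[str]) -> str:
    """Put the relevant skills in front of the task so the agent reads them first."""
    sections = [load_skill(skills_dir, name) for name in skill_names]
    return "\n\n".join([*sections, f"Task: {task}"])


# Self-contained demo with a throwaway skills folder and made-up content.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "warehouse.md").write_text(
        "# Warehouse skill\nAlways filter by store_id.", encoding="utf-8"
    )
    print(build_prompt("Count conversations per store.", skills, ["warehouse"]))
```

Failing loudly on a missing skill is deliberate in this sketch: a sub-agent that silently proceeds without the warehouse rules is the inconsistent behavior the skills folder exists to prevent.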
Tier 3: code review, quality gate
Every PR — whoever wrote it, agent or human or both — goes through Devin Review first.
Devin Review does the first pass. It flags bugs, warnings, missing tests, unused imports, weird assumptions. We let it iterate once or twice with the author before a human even opens the PR. By the time a human reviewer shows up, the noise is gone. They're looking at substantive choices, not stylistic mistakes.
I almost left this tier out of the talk because it sounded boring. It's probably the one that matters most. Reviewing a clean PR takes a quarter of the time. The human's attention also lands in the right place: on the design choices, not on "you forgot to handle the empty array case."
What it looks like in practice
The numbers are the part of the talk people remember, so here they are.
| Metric | Before | After | Change |
|---|---|---|---|
| PRs per engineer per week | 8 | 18 | +125% throughput |
| Bug resolution time | 2h | 30m | 4x faster |
| Human review time per PR | 60m | 15m | 4x faster (AI pre-filtered) |
I want to be careful with how I read these. PRs per engineer is not the same as features shipped, and a thirty-minute bug resolution doesn't mean we have zero bugs. A 4x speedup on a single metric is the easy thing to brag about and the easy thing to misread.
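For anyone who wants to check the arithmetic behind the table, the three rows reduce to one percentage increase and two ratios. The helper names below are just for illustration; the inputs are the before and after values from the table, with two hours written as 120 minutes.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative growth, as a percentage of the starting value."""
    return (after - before) / before * 100


def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the 'after' duration is."""
    return before_minutes / after_minutes


print(percent_increase(8, 18))  # 125.0 -> "+125%" PRs per engineer per week
print(speedup(120, 30))         # 4.0   -> bug resolution, 2h down to 30m
print(speedup(60, 15))          # 4.0   -> human review time, 60m down to 15m
```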
The qualitative version probably matters more. When agents do the bugs and a chunk of the routine feature work, engineers stop spending their day in the agent's seat and start spending it in the product designer's seat. The question that eats the most hours isn't "how do I implement this?" anymore. It's "what should we actually be building, and why?"
That shift is the part I want to come back to.
Beyond engineering: AI runs across the whole company
The unusual thing we did, and the thing I wish someone else had told me twelve months earlier, is that we didn't stop at the engineering org.
Engineering felt the productivity jump first because that's where the tools were most mature. But the underlying pieces are general: agents with context, skills, MCP servers, scheduled runs. They work everywhere. They just need somebody to wire them up.
So we wired them up.
On the product side, an internal agent we call OpenClaw runs every week over our usage data. It flags anomalies, surfaces how people are actually using the product versus how we thought they would, and reports back. The product team doesn't run a weekly metrics review meeting in the old shape anymore. The agent runs it for them, and the meeting turns into a conversation about what the agent found.
Sales runs a similar loop on a daily beat. Every evening, an agent processes the call transcripts from every BDR and AE conversation that happened that day. It finds patterns. It flags mistakes ("you offered a discount you weren't supposed to offer"). It delivers coaching notes to each rep individually, with quotes from their own calls. Our head of sales used to do this manually for two or three reps when she had a calm week. Now every rep gets it every day.
And every TextYess employee, including the people in operations, finance, and support, has Claude plus a set of MCP servers: Granola, Notion, Linear, Attio. Each person has a personal daily sync agent and a private vault. The agent does the work that used to live in the gap between meetings: catching up on what happened yesterday, pulling threads forward, flagging the thing you said you'd follow up on and didn't.
None of these is dramatic on its own. Each one is a small productivity gain, a small visibility unlock, a small "oh, I didn't have to do that anymore." The compounding is the whole story. What an AI-native org feels like from the inside isn't one big magic moment. It's a lot of small things, running in parallel, every day.
The shift: the bottleneck moved
This is the part I didn't see coming, and the part I think matters most.
A year ago, engineering was the constraint. Product had a backlog full of ideas. Engineering had a queue of bugs and tech debt that ate most of the week. Features waited weeks not because they weren't important, but because the team that would build them was busy with everything else. Prioritization, in that world, was almost trivial: pick the most urgent thing, do it, move on. There weren't any real trade-offs to make when the bottleneck was that tight.
Now: engineering capacity outpaces product. Bugs get fixed automatically. Routine features get a Claude Code session and an afternoon. The backlog clears faster than product can fill it.
Which means the hard question changed. "Can we build it?" used to be the question. The answer was almost always "yes, but not for six weeks." Now the answer is "yes, in about a day," and the question that takes its place is "should we build it?" Customer insight, strategic prioritization, knowing what to build and why: that's the bottleneck now.
I find this exciting and a little disorienting at the same time. "Can ship complex things fast" used to be the most rewarded skill in engineering, and it's being commoditized in front of us. "Can decide what's worth shipping" is the skill on the way up, and it was never really a separate craft from the doing. The people who are good at it tend to be the same people who got good at doing. The open question for me is whether we know how to hire, promote, and structure teams around that fact yet, and I don't think we do.
We're still in it.
Twelve months ago
I'll close with roughly what I said at the end of the talk.
We're still figuring a lot of this out. The three-tier stack will look different in six months. The skills directory will keep growing. OpenClaw will get replaced by something better. Some of what I described above will probably look naive by the time anyone reads it back.
But we're already operating in a way that wasn't possible twelve months ago, and if you asked me what "AI-native" means in practice, that's the answer. It doesn't mean using AI. It means reshaping how the company runs so the AI can do work that would otherwise need a human.
The interesting thing is that the work that gets you there is mostly the boring kind. A monorepo so the agent can see the whole codebase. A CLAUDE.md hierarchy so it knows where it is. A skills directory so the things only your senior engineers know become legible to a model. A Linear-to-Cursor pipeline so the autonomous tier exists at all. None of this is an AI breakthrough, and most of it could have been built two years ago by anyone who took agents seriously enough to invest in their context. We didn't, and most people I talk to still don't.
We're a year in and nowhere near done. We are, at least, a year ahead of the version of us that didn't start.
Happy to go deep on any of this.