Embracing the Software 3.0 Era
This is the English version of a previously published article.
What Is the Software 3.0 Era?
In June 2025, Andrej Karpathy gave a talk at Y Combinator AI Startup School. He broke software's evolution into three stages.
Software 1.0: What we've done for decades. Writing explicit logic in Python, Java, or C++. Branching with if-else, looping with for, abstracting with functions. Telling the computer exactly how to do things—in code.
Software 2.0: Kicked off with the deep learning boom in the 2010s. You stop writing rules by hand. Collect data, train a model, and the neural network weights become the program. Tesla Autopilot, for instance, replaced huge chunks of C++ with neural networks.
Software 3.0: Where we are now. You tell an LLM what you want, in plain language. The prompt is the program.
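The gap between 1.0 and 3.0 fits in a few lines. Below is a sketch with invented names: the same classification task, once as explicit rules and once as a prompt handed to an LLM.

```java
// Illustrative contrast (all names and rules invented for this sketch).
public class SentimentDemo {
    // Software 1.0: explicit rules; every branch must be written by hand.
    public static String classify10(String review) {
        String r = review.toLowerCase();
        if (r.contains("great") || r.contains("love")) return "positive";
        if (r.contains("terrible") || r.contains("broken")) return "negative";
        return "neutral"; // anything unanticipated falls through
    }

    // Software 3.0: the "program" is a natural-language prompt.
    public static String classify30Prompt(String review) {
        return "Classify this review as positive, negative, or neutral:\n" + review;
    }

    public static void main(String[] args) {
        System.out.println(classify10("I love this keyboard"));
        System.out.println(classify30Prompt("I love this keyboard"));
    }
}
```

In 1.0, the edge cases you didn't write don't exist. In 3.0, the model fills them in from the intent you expressed.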

As Karpathy puts it: "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.
📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School
Harness: Making LLMs Actually Useful
But the reality is messier.
You can't just tell ChatGPT, "Fix the bug in our service," and expect a patch to ship. LLMs are powerful—but on their own, they can't read your codebase, run commands, or touch a database.
That's where the idea of a harness comes in.
A harness is the gear you put on a horse: it's what lets a human actually put the animal's power to use. No matter how fast or strong the horse, without a harness that power goes nowhere.

Same goes for LLMs. Raw capability isn't enough. You need tools and environments that fill the gaps and connect them to real work.
Claude Code Is Also a Harness
Claude Code is Anthropic's CLI-based coding agent. At its core, it's essentially a harness for Claude.
Here's what it provides:
- File access and command execution, so the model can read your codebase and run builds
- Slash commands as entry points for workflows
- Sub-agents, each with its own independent context
- Skills that package single-purpose capabilities
- MCP connections to external systems like databases and APIs
- CLAUDE.md for persistent project context

Together, these turn Claude from a model into an agent that can actually ship things.
But look at this structure for a second. Doesn't it feel familiar?
Seeing It Through Software 1.0 Eyes
MCP, skills, sub-agents, slash commands...
New terms pile up fast, and with them, cognitive load. But look closely at this structure, and you might notice something: it maps surprisingly well onto layered architecture—something most of us have been working with for years. At least as a starting point.

Breaking Down Each Layer
Slash command = Controller
Like Spring's @RestController or Express's router.get(), a slash command is the entry point for user requests. Type /review and the review workflow kicks off. Type /refactor and refactoring begins.
```
# User input
/review PR-1234

# Internally
→ Triggers the review workflow
→ Executes the appropriate sub-agent and skill combination
```
Sub-agent = Service Layer
Just as a Service layer coordinates multiple repositories and domain objects, a sub-agent orchestrates multiple skills to complete a workflow. Each sub-agent maintains its own independent context—think of it as a self-contained unit of work, separate from others.
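As a rough sketch of that mapping (all class and skill names below are invented), a sub-agent looks like a Service that calls its skills in order while holding its own state:

```java
// Sketch of the analogy: a sub-agent as a Service that orchestrates
// single-purpose skills and keeps its own working context.
import java.util.ArrayList;
import java.util.List;

public class ReviewSubAgent {
    // This agent's private context, separate from other sub-agents
    private final List<String> context = new ArrayList<>();

    public String run(String pullRequest) {
        context.add("reviewing " + pullRequest);
        String style = styleCheckSkill(pullRequest); // skill: one clear job
        String tests = testGapSkill(pullRequest);    // skill: one clear job
        return pullRequest + " → " + style + "; " + tests;
    }

    // Stand-ins for skills; a real one would be a SKILL.md, not a method
    private String styleCheckSkill(String pr) { return "style OK"; }
    private String testGapSkill(String pr)   { return "2 tests missing"; }

    public static void main(String[] args) {
        System.out.println(new ReviewSubAgent().run("PR-1234"));
    }
}
```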
Skills = Domain-level Component (SRP)
A skill is a single-purpose unit that follows the Single Responsibility Principle. "Review code," "Generate tests," "Write docs"—one clear job per skill. Just as classes shouldn't bloat into monoliths, a skill should do exactly one thing, and do it well.
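A single-purpose skill might look like this. The structure below is a hypothetical sketch: the `name` and `description` frontmatter fields follow the SKILL.md convention, but the content is invented for illustration.

```markdown
---
name: generate-tests
description: Generate unit tests for a given source file using vitest
---

# Generate Tests

1. Read the target source file.
2. Identify exported functions without test coverage.
3. Write vitest test cases in a matching `.test.ts` file.
```

One job, clearly named, nothing else. If a second responsibility creeps in, that's the signal to split.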
MCP = Infrastructure / Adapter
Think of MCP as the layer that manages connections to external systems—databases, APIs, and similar outside interfaces. Much like the Repository or Adapter pattern, it's an abstraction boundary: internal logic doesn't need to know how the outside world is implemented.
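The Adapter boundary that MCP plays can be sketched like this (names invented): callers depend on an interface, and how the external system is reached stays hidden behind it.

```java
// Sketch of the Adapter idea behind MCP: internal logic sees only
// the abstraction, never the transport.
interface TicketSource {
    String fetchTicket(String id);
}

// One concrete adapter; a real one would speak HTTP to a tracker's API.
class FakeJiraAdapter implements TicketSource {
    public String fetchTicket(String id) {
        return id + ": Fix login bug";
    }
}

public class AdapterDemo {
    public static void main(String[] args) {
        TicketSource source = new FakeJiraAdapter(); // swap implementations freely
        System.out.println(source.fetchTicket("JIRA-1234"));
    }
}
```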
CLAUDE.md = the project's constitution
Think of CLAUDE.md as the project's stable foundation—the norms and principles that don't change often: tech stack, coding conventions, build commands. Less a dependency manifest, more a shared understanding of how this project works.
```markdown
# Example CLAUDE.md

## Tech Stack
- TypeScript + React 18
- Node.js 20+
- pnpm

## Coding Conventions
- Use functional components only
- Write tests using vitest

## Build Commands
- `pnpm build` — Production build
- `pnpm test` — Run all tests
```
One thing worth noting: if you find yourself editing CLAUDE.md often, that content probably doesn't belong there. Dynamic details—current task, today's priorities—should come in through conversation or be injected into the sub-agent's context directly.
Anti-patterns Apply Here Too
The anti-patterns from layered architecture carry over to agent design with surprising fidelity. The names even ring a bell.
For example, a skill that bypasses MCP and makes raw curl calls directly is the agent version of a Service that bypasses the Repository and queries the database by hand: it breaks when the API changes.

Code smells apply too:
- Feature Envy: A skill excessively references another skill's data
- Duplication: Identical prompts copy-pasted across multiple skills
- Long Method: One sub-agent sequentially calling 10 skills
The Crucial Difference: What the Metaphor Misses
The layered architecture analogy holds up well. But there's one thing it doesn't quite capture.
Think about a traditional service layer. What happens when inventory runs out mid-order? You throw an OutOfStockException, or fall back to a back-order policy. Payment fails? Retry, or return an error.
Every branch has to be predefined.
```java
// Traditional service layer
public Order processOrder(OrderRequest request) {
    if (inventory.check(request.getItemId()) < request.getQuantity()) {
        throw new OutOfStockException();              // Predefined exception
        // or: return backOrderPolicy.apply(request); // Predefined policy
    }
    // ...
}
```
But in real development, you hit moments like:
"This edge case... I need to check with the PM." "This scenario isn't in the spec. What do I do?"
In traditional architecture, there's no way for the code to pause and ask. It throws an error, makes an arbitrary call, or logs it and moves on.
Agents Can Ask Questions
Agents are different. With Human-in-the-Loop (HITL), there's another option.
```
Request → Agent → Processing...
        ↓
  🤔 Uncertain situation
        ↓
"Would you prefer A or B?"
        ↓
User: "Let's go with A"
        ↓
Continues → Done
```
With tools like AskUserQuestion, an agent can delegate judgment mid-execution.
Exceptions become questions.
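The shift is easy to see in code. In this sketch (invented names), the `ask` callback stands in for an ask-the-user tool; it is an illustration, not a real Claude Code API.

```java
// Sketch: the exception branch becomes a question, and execution resumes
// with the user's answer.
import java.util.function.Function;

public class HitlDemo {
    public static String processOrder(int stock, int wanted,
                                      Function<String, String> ask) {
        if (stock >= wanted) return "shipped";
        // A traditional service would throw OutOfStockException here.
        String answer = ask.apply("Only " + stock + " left. Back-order or cancel?");
        return answer.equals("back-order") ? "back-ordered" : "cancelled";
    }

    public static void main(String[] args) {
        // The "user" answers mid-execution and the flow continues.
        System.out.println(processOrder(0, 2, question -> "back-order"));
    }
}
```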
When to Ask, When to Just Do It
HITL is great—but an agent that asks every two seconds is just annoying.
Ask when:
- The action is hard to reverse (deletions, deployments, external API calls)
- There are multiple valid paths and no clear winner
- The stakes are high
Just do it when:
- The task is safely repeatable
- A convention already covers it
- It's easy to undo
A great agent is one that knows when to ask.
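The two lists above fold neatly into a single predicate. This is an illustrative sketch of the heuristic, not a real agent API:

```java
// "When to ask" as code: a direct translation of the lists above.
public class AskPolicy {
    public static boolean shouldAsk(boolean hardToReverse, boolean highStakes,
                                    boolean multipleValidPaths,
                                    boolean coveredByConvention) {
        if (hardToReverse || highStakes) return true; // deletions, deployments
        if (coveredByConvention) return false;        // CLAUDE.md already decides
        return multipleValidPaths;                    // otherwise: just do it
    }

    public static void main(String[] args) {
        System.out.println(shouldAsk(true, false, false, false));  // deletion
        System.out.println(shouldAsk(false, false, false, true));  // convention
    }
}
```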
The Path from 1.0 to 3.0
The Software 3.0 era is here. But that doesn't make everything we've learned obsolete.
What to Leave Behind
- The compulsion to write every piece of logic explicitly
- The urge to predefine every conceivable edge case
- Seeing LLMs as little more than "smart autocomplete"
What to Carry Forward
- Layer separation, SRP, abstraction
- Dependency management, interface design
- Testability and debugging strategies
- Code reviews and iterative improvement
The tools have changed. The principles of good design—cohesion, coupling, abstraction—haven't.
When designing an MCP, think Adapter Pattern. When writing a skill, think SRP. When building a sub-agent, think Service Layer.
The architectural thinking you've built up is a solid foundation for building agents well.
Limitations: What the Metaphor Hides
The layered architecture analogy is a useful lens—but like any analogy, it papers over a few real-world gotchas worth keeping in mind.
Tokens Are the New RAM
On traditional servers, you watch RAM. With agents, you watch tokens.
- Context window = working memory
- Token usage = memory footprint
CLAUDE.md, skills, conversation history, MCP responses—it all competes for space in the context window. 200K tokens sounds like a lot, until you're working with a large codebase.
Just as you guard against OOM crashes, you should anticipate token explosions. Before writing "analyze all test files" in CLAUDE.md, picture what that means across 50 test files. You don't need exact counts—a rough sense of files and line count is enough.
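Getting that rough sense can itself be a script. The sketch below uses the common bytes-divided-by-four rule of thumb as a ballpark token estimate; the paths and suffix are illustrative, and this is not a real tokenizer.

```java
// Ballpark token budget: total file size / 4 (~4 chars per token).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class TokenBudget {
    public static long estimateTokens(Path root, String suffix) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            long bytes = files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(suffix))
                .mapToLong(p -> {
                    try { return Files.size(p); } catch (IOException e) { return 0L; }
                })
                .sum();
            return bytes / 4; // rough estimate, good enough for budgeting
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. everything "analyze all test files" would pull into context
        System.out.println(estimateTokens(Path.of("."), ".test.ts"));
    }
}
```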
A useful trick: ask Claude, "If you ran this workflow, which files would you expect to read?" If the list is longer than expected, it's a signal to narrow your instructions or break the task into steps.
Another way to save tokens: extract deterministic logic into scripts.
```
# Anti-pattern: LLM re-interprets the convention every time
"Create a branch name in the format feature/JIRA-{ticket}-{description}.
Description should be kebab-case. If it's in Korean, translate to English..."

# Better: a script encapsulates the convention
./scripts/create-branch.sh JIRA-1234 "login feature"
→ feature/JIRA-1234-login-feature
```
From the LLM's perspective, it just runs the script and uses the output. No need to interpret the convention, no tokens wasted re-deriving it. If a task doesn't require reasoning, offload it to a script.
The Skill Separation Dilemma: Class Explosion and the Law of Demeter
In traditional architecture, blindly applying SRP leads to Class Explosion—hundreds of tiny files that are individually correct but collectively hard to navigate.
Skills have a similar problem. In practice, you often pay an upfront context cost for discoverability—names and descriptions loaded so Claude knows what's available—and you pay more when skills are actually invoked. With 20 skills, that overhead adds up.
```
# Anti-pattern: Skill Explosion
.claude/skills/
├── review-naming/
│   └── SKILL.md
├── review-types/
│   └── SKILL.md
├── review-complexity/
│   └── SKILL.md
├── review-security/
│   └── SKILL.md
└── ... (15 more)
```
It's the agent equivalent of this:
```java
// Class Explosion anti-pattern
class NamingValidator { ... }
class TypeValidator { ... }
class ComplexityValidator { ... }
class SecurityValidator { ... }
// ... 15 more

// Calling side
new NamingValidator().validate(code);
new TypeValidator().validate(code);
// You have to remember which Validator to reach for every time
```
Think about the Law of Demeter: "Don't talk to strangers." Objects should only interact with their immediate neighbors.
Applied to skills: SKILL.md should be the entry point. Delegate the heavy content to references/.
```
# Recommended: Progressive Disclosure structure
.claude/skills/
└── code-review/
    ├── SKILL.md                  # "Review my code" → load only this
    ├── references/               # Load only when needed
    │   ├── naming-rules.md       # "What are the naming conventions?" → load then
    │   ├── security-checklist.md
    │   └── performance-guide.md
    └── scripts/
        └── lint-check.sh
```
This mirrors the Facade pattern:
```java
// Facade: single entry point, internal delegation
class CodeReviewer {
    private NamingRules namingRules;     // Loaded when needed
    private SecurityChecklist security;  // Loaded when needed

    public Review review(Code code) {
        // Use only what the situation calls for
        if (needsNamingCheck) namingRules.check(code);
        if (needsSecurityCheck) security.check(code);
    }
}
```
Claude works the same way. SKILL.md acts as the Facade. The files in references/ only enter the context when Claude actually needs them.
Finding the balance:
- The core workflow lives in SKILL.md, the single entry point.
- Detailed rules, checklists, and guides go into references/, loaded on demand.
- Deterministic logic moves out to scripts/ or MCP.

Practical Tips: The Setup & Config Pattern
Enough theory. What does this look like in practice?
Slash commands let you blend HITL with automation naturally. Compare it to a familiar CLI pattern:
```
# Traditional CLI
npm init          # Generate initial structure
npm config set    # Adjust settings later

# Agent commands
/setup            # Analyze repo → generate structure
/config           # Adjust existing settings
```
HITL shines brightest during setup:
```
/setup
→ Detected: TypeScript + React, pnpm
→ Found both vitest and jest as testing frameworks.
  Which should be the default? [vitest / jest]
> vitest
→ CLAUDE.md created
```
The agent handles what's obvious automatically—and asks only when something's genuinely ambiguous. You don't predefine every option upfront. You just let it pause at the forks.
The open-source claude-hud plugin demonstrates this pattern cleanly:
```
# 1. Install plugin
/plugin install claude-hud

# 2. Configure for the repo — this is the setup
/claude-hud:setup
```
What /claude-hud:setup does:
- Detects the current environment (terminal type, Claude Code version)
- Auto-configures the statusline
- Registers the required hooks
The core principle: minimize manual configuration, and only interrupt the user when their input is genuinely needed.
Closing Thoughts
Development in the Software 3.0 era is shifting—from writing code to assembling and directing it.
But the principles behind that assembly aren't foreign. They're the same ones we've been working with.
If MCP, skills, sub-agents, and slash commands feel unfamiliar, try mapping them onto the layered architecture you already know. New technology, viewed through the lens of good engineering principles, tends to make a lot more sense.
One more thing worth holding onto: applications can now ask questions. Rather than trying to spec out every edge case upfront, it's worth considering a different approach—build systems that handle ambiguity by simply asking.
Start building by refactoring your mindset.
All images in this article were created using generative AI.
References
- Andrej Karpathy: Software Is Changing (Again) — Y Combinator
- claude-hud — Claude Code plugin example
- Claude Code Official Documentation
