Stepping into the Software 3.0 Era
This is the English version of a previously published article.
What is Software 3.0?
In June 2025, Andrej Karpathy gave a compelling talk at Y Combinator AI Startup School, laying out three stages of software evolution.
Software 1.0 is the approach we’ve relied on for decades. You code explicit logic in languages like Python, Java or C++, branch with if-else, iterate with for loops and abstract through functions. It was about spelling out every detail of the "how" in code.
Software 2.0 emerged with the rise of deep learning in the 2010s. You no longer write rules by hand. Instead, you collect data, train a model, and the neural network weights become the program. Think of Tesla Autopilot, where neural networks replaced huge swaths of C++ code.
Software 3.0 is where we are now. Instead of writing code, you tell an LLM “what” you want in natural language and the prompt becomes the program.
As Karpathy puts it: "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.
📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School
Harness: What Makes an LLM Useful
But reality tells a different story.
Simply telling ChatGPT to "fix the bug in our service" doesn’t magically solve the problem. While LLMs are powerful, they can’t read files, call APIs, or access databases on their own.
This is where the concept of a harness comes in.
Originally, a harness refers to the gear strapped to a horse that lets humans make use of a horse's strength and speed. A horse without a harness is just a horse.

The same goes for LLMs. On their own, they're hard to control or put to use. You need the right tools and infrastructure to work around their limitations and connect them to real-world tasks.
Claude Code: A Harness for Claude
Claude Code is Anthropic's CLI-based coding agent. In essence, it’s a harness for Claude.
Here’s what Claude Code provides:
- Tools for reading and editing files and for running shell commands
- Connections to external systems (APIs, databases) through MCP
- Context management: CLAUDE.md, skills, conversation history
- Permission checks that keep the user in the loop
Together, these act as a harness to transform Claude from an LLM engine into an agent that can actually do real work. Sound familiar?
Understanding Claude Code through Software 1.0
MCP, skills, sub-agents, slash commands…
New terminology piles up fast, quickly creating cognitive overload. But look past the labels, and the structure is strikingly similar to the layered architecture we've been using for years.

A Closer Look at Each Layer
Slash command = Controller
Slash commands are the entry point for user requests, just like @RestController in Spring or router.get() in Express. /review triggers the review workflow and /refactor triggers the refactoring workflow.
# User input
/review PR-1234
# Internally
→ Triggers the review workflow
→ Executes the appropriate sub-agent and skill combination

Sub-agent = Service Layer
Think of sub-agents like a service layer. Instead of coordinating repositories and domain objects, they coordinate skills and combine them to complete a workflow. Each sub-agent maintains its own context, allowing them to operate independently like separate threads.
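The controller/service analogy above can be sketched in plain Python. This is a hypothetical illustration, not Claude Code internals: the command table plays the slash-command (controller) role, and ReviewAgent plays the sub-agent (service) role coordinating two stand-in "skills".

```python
def review_code(pr_id: str) -> str:
    return f"reviewed {pr_id}"          # stand-in for a "review" skill

def generate_tests(pr_id: str) -> str:
    return f"tests for {pr_id}"         # stand-in for a "test" skill

class ReviewAgent:
    """Sub-agent: coordinates skills, like a service layer coordinates repositories."""
    def run(self, pr_id: str) -> list[str]:
        return [review_code(pr_id), generate_tests(pr_id)]

# Slash commands map user input to a workflow, like a router/controller.
COMMANDS = {"/review": lambda arg: ReviewAgent().run(arg)}

def dispatch(line: str) -> list[str]:
    cmd, arg = line.split(maxsplit=1)
    return COMMANDS[cmd](arg)

print(dispatch("/review PR-1234"))
# → ['reviewed PR-1234', 'tests for PR-1234']
```

The point of the layering is the same as in a web backend: the entry point stays thin, and the orchestration logic lives one level down.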
Skills = Domain-level Component (SRP)
Skills follow the Single Responsibility Principle (SRP), where each one does exactly one thing. "Review code." "Generate tests." "Write docs." Just as bloated classes are a red flag, a skill that tries to do too much is a problem.
MCP = Infrastructure / Adapter
MCP (Model Context Protocol) is the bridge to the outside world, such as databases, APIs, and the filesystem. Like the repository or adapter pattern, it provides a layer of abstraction so internal logic stays independent of external implementations.
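The adapter idea behind this can be shown with the classic port/adapter pattern. This is an illustration of the pattern only, not the actual MCP wire protocol; TicketPort and JiraAdapter are made-up names.

```python
from abc import ABC, abstractmethod

class TicketPort(ABC):
    """The abstraction the agent's internal logic depends on."""
    @abstractmethod
    def fetch(self, ticket_id: str) -> dict: ...

class JiraAdapter(TicketPort):
    """External detail: could be swapped for GitHub Issues, a DB, etc."""
    def fetch(self, ticket_id: str) -> dict:
        # A real adapter would call an external API here.
        return {"id": ticket_id, "source": "jira"}

def summarize(port: TicketPort, ticket_id: str) -> str:
    # Internal logic only knows the port, never the concrete backend.
    return f"{port.fetch(ticket_id)['id']} loaded"

print(summarize(JiraAdapter(), "JIRA-1234"))  # → JIRA-1234 loaded
```

Swapping Jira for another backend means writing a new adapter; nothing upstream changes. That is exactly the decoupling MCP gives an agent.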
CLAUDE.md = package.json
CLAUDE.md plays the same role as package.json or pom.xml. It’s about defining the things that rarely change, such as tech stack, coding conventions, and build commands.
# Example CLAUDE.md
## Tech Stack
- TypeScript + React 18
- Node.js 20+
- pnpm
## Coding Conventions
- Use functional components only
- Write tests using vitest
## Build Commands
- `pnpm build` — Production build
- `pnpm test` — Run all tests

Note: If you find yourself constantly editing CLAUDE.md, that content probably doesn’t belong there. Try passing dynamic information (current issues, today’s priorities, etc.) through conversation or a sub-agent context instead.
The Same Anti-patterns Apply
The same anti-patterns from layered architecture show up in agent design. These names will look familiar.
- Leaky Abstraction: bypassing MCP with raw curl calls, so any API change breaks everything

The same goes for code smells.
- Feature Envy: One skill over-referencing another’s data
- Duplication: Similar prompts scattered across multiple skills
- Long Method: One sub-agent chaining ten skill calls in a row
The Key Difference: What Layered Architecture Doesn’t Explain
You can use layered architecture to understand most of agent design, but there is still one thing it doesn’t explain.
Picture a traditional service layer. When stock runs out mid-order, you throw an OutOfStockException or apply a backorder policy. When payment fails, you retry or return an error.
Everything has to be anticipated ahead of time.
// Traditional service layer
public Order processOrder(OrderRequest request) {
    if (inventory.check(request.getItemId()) < request.getQuantity()) {
        throw new OutOfStockException(); // Predefined exception
        // ...or, instead of throwing, apply a predefined policy:
        // return backOrderPolicy.apply(request);
    }
    // ...
}

But development in real life is different, and we end up asking questions like:
"Wait, I need to loop in the PM on this one." "This wasn’t in the spec. Now what?"
In traditional architecture, there's no way to stop the code. You have to either throw an exception, make an arbitrary call, or log it and move on.
Agents Ask Questions
But agents are different. They support Human-in-the-Loop (HITL).
Request → Agent → Processing...
↓
🤔 Uncertain situation
↓
"Would you prefer A or B?"
↓
User: "Let's go with A"
↓
Continues → Done

With tools like AskUserQuestion, agents can simply pause to ask questions in the middle of a task.
Just like that, exceptions become questions.
When to Ask, When to Act
HITL being available doesn't mean the agent should ask every time. That would make it an annoying tool.
When to ask:
- The action is hard to reverse (deletions, deployments, external API calls)
- There are multiple options but no clear right answer
- The decision carries significant cost or risk
When to act:
- The task is safe and repeatable
- There is an already established convention
- The action is easy to undo
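The two checklists above can be condensed into a single gate function. A hedged sketch: the field names (reversible, risky, ambiguous, has_convention) are assumptions invented for illustration, not part of any real agent API.

```python
def should_ask(action: dict) -> bool:
    """Decide whether an agent should pause and ask, per the rules above."""
    if not action.get("reversible", True):
        return True   # hard to undo (deletion, deployment) → ask
    if action.get("risky", False):
        return True   # significant cost or risk → ask
    if action.get("ambiguous", False) and not action.get("has_convention", False):
        return True   # multiple options, no established convention → ask
    return False      # safe, repeatable, conventional → just act

assert should_ask({"reversible": False}) is True          # e.g. a production deploy
assert should_ask({"ambiguous": True}) is True            # nothing to lean on
assert should_ask({"ambiguous": True, "has_convention": True}) is False
assert should_ask({"reversible": True}) is False          # safe and repeatable
```

Encoding the policy this explicitly also makes it reviewable: the team can argue about one function instead of scattered prompt phrasing.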
A good agent knows when to ask. A great one knows “when not to.”
Development from 1.0 to 3.0
The dawn of Software 3.0 doesn't render everything we've learned before obsolete.
Leave Behind
- Feeling compelled to spell out every piece of logic
- Trying to predefine all edge cases
- Using LLMs only as a “smart autocomplete” tool
Take with you
- Layer separation, single responsibility principle, abstraction
- Dependency management, interface design
- Testability, debugging strategy
- Code review, incremental improvements
The tools have changed, but the underlying principles of good design (cohesion, coupling, abstraction) have not.
Think back to the adapter pattern when designing an MCP integration. Think back to SRP when creating a skill. Think back to service layers when designing a sub-agent.
Your architecture knowledge is the foundation for building great agents.
But Not Everything Transfers
That said, there are some things that can’t be fully explained just with layered architecture. Here are some points to keep in mind.
Tokens Are the New Memory
In traditional server development, RAM was the constraint. With agents, it's tokens.
Context Window = working memory
Token usage = memory usage
CLAUDE.md, Skills, conversation history, and MCP responses—all of it piles into the context window. 200K tokens sounds generous until you're working across a large codebase, and suddenly it's gone.
Just as you’d try to prevent OOM errors in a server, token bloat is something you can anticipate. Before writing "analyze all test files" in CLAUDE.md, picture what that instruction looks like across 50 test files. You don’t need an exact token count; a rough sense of file count and line volume is enough.
After writing your instructions, ask Claude directly: "What files would you end up reading if I ran this workflow?" If the scope is larger than expected, that's your signal to either tighten the instructions or break the work into phases.
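A back-of-envelope estimate is enough for this sanity check. The sketch below uses the common ~4 characters per token heuristic; it is an approximation, not a real tokenizer.

```python
def estimate_tokens(file_sizes_bytes: list[int]) -> int:
    """Rough token estimate: ~4 characters per token."""
    return sum(file_sizes_bytes) // 4

# "Analyze all test files" across 50 test files of ~8 KB each:
files = [8_000] * 50
print(estimate_tokens(files))  # → 100000
```

Half of a 200K context window consumed before any conversation happens, which is exactly the signal to tighten the instruction or phase the work.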
Another way to save on tokens is to offload deterministic logic into scripts.
# Anti-pattern: LLM re-interprets the convention every time
"Create a branch name in the format feature/JIRA-{ticket}-{description}.
Description should be kebab-case. If it's in Korean, translate to English..."
# Better: a script encapsulates the convention
./scripts/create-branch.sh JIRA-1234 "login feature"
→ feature/JIRA-1234-login-feature

An LLM just runs the script and works with the output. No need to parse conventions or waste tokens re-deriving what a script can compute in milliseconds. If a task doesn't require judgment, make a tool to take care of it.
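Such a convention script could be sketched in a few lines of Python (hypothetical; the article's create-branch.sh is not shown, and the Korean-translation step is omitted). The point is that the kebab-case rule lives in deterministic code, so the LLM never re-derives it.

```python
import re

def branch_name(ticket: str, description: str) -> str:
    """Encode the feature/JIRA-{ticket}-{description} convention deterministically."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"feature/{ticket}-{slug}"

print(branch_name("JIRA-1234", "login feature"))
# → feature/JIRA-1234-login-feature
```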
The Skill Separation Dilemma: Class Explosion and the Law of Demeter
Following SRP blindly in traditional architecture leads to class explosion. Hundreds of tiny classes sprawl across the codebase, and just mapping the relationships between them becomes a cognitive burden in itself.
The same applies to Skills. At startup, Claude loads every Skill's metadata (name and description) into the system prompt. Twenty Skills means twenty descriptions permanently occupying context.
# Anti-pattern: Skill Explosion
.claude/skills/
├── review-naming/
│ └── SKILL.md
├── review-types/
│ └── SKILL.md
├── review-complexity/
│ └── SKILL.md
├── review-security/
│ └── SKILL.md
└── ... (15 more)

The above is the equivalent of writing something like this:
// Class Explosion anti-pattern
class NamingValidator { ... }
class TypeValidator { ... }
class ComplexityValidator { ... }
class SecurityValidator { ... }
// ... 15 more
// Calling side
new NamingValidator().validate(code);
new TypeValidator().validate(code);
// You have to remember which Validator to reach for every time

Think of the Law of Demeter: "Don't talk to strangers." An object should only know about its immediate neighbors. Applied to skill design, this means SKILL.md provides the entry point, and detailed knowledge is delegated to references/.
# Recommended: Progressive Disclosure structure
.claude/skills/
└── code-review/
├── SKILL.md # "Review my code" → load only this
├── references/ # Load only when needed
│ ├── naming-rules.md # "What are the naming conventions?" → load then
│ ├── security-checklist.md
│ └── performance-guide.md
└── scripts/
└── lint-check.sh

This is similar to the Facade pattern:
// Facade: single entry point, internal delegation
class CodeReviewer {
private NamingRules namingRules; // Loaded when needed
private SecurityChecklist security; // Loaded when needed
public Review review(Code code) {
// Use only what the situation calls for
if (needsNamingCheck) namingRules.check(code);
if (needsSecurityCheck) security.check(code);
}
}

Claude works in much the same way. SKILL.md acts as the facade, and files inside references/ are only loaded into context when Claude deems it necessary.
Finding a balance:
- Knowledge that needs judgment or nuance → a references/ file
- Deterministic, repeatable logic → scripts/ or MCP

Practical Tips: Setup & Config Patterns
But how does this actually work in practice?
Slash commands let you easily blend HITL with automation. Compare it to a CLI pattern you already know:
# Traditional CLI
npm init # Generate initial structure
npm config set # Adjust settings later
# Agent commands
/setup # Analyze repo → generate structure
/config # Adjust existing settings

Where HITL really earns its place is in the setup process:
/setup
→ Detected: TypeScript + React, pnpm
→ Found both vitest and jest as testing frameworks.
Which should be the default? [vitest / jest]
> vitest
→ CLAUDE.md created

The agent detects the environment automatically but flags ambiguous parts with a question. You don't need to predefine everything. Let the agent handle the clear-cut cases and step in only when things are uncertain.
The claude-hud plugin, an open source project, demonstrates this pattern well:
# 1. Install plugin
/plugin install claude-hud
# 2. Configure for the repo — this is the setup
/claude-hud:setup

What /claude-hud:setup does:
- Detects the current environment (terminal type, Claude Code version, etc.)
- Automatically configures statusline settings
- Registers the necessary hooks

The agent asks questions only when it needs to, keeping manual input to a minimum.
Closing Thoughts
Development in the Software 3.0 era is shifting from writing code to assembling and instructing it.
However, at its core, the principles of code assembly remain largely the same.
MCP, skills, sub-agents, slash commands—if these still feel foreign, map them onto the layered architecture you already know. The same engineering principles hold, and the design patterns are already there, waiting to be recognized.
One more shift worth noting is that applications can now ask questions. Instead of trying to predefine everything from the start, let the agent question ambiguity.
Start building by refactoring your mindset.
*All images used in this article have been created with generative AI.
References
- Andrej Karpathy: Software Is Changing (Again) — Y Combinator
- claude-hud — Claude Code plugin example
- Claude Code Official Documentation
