Embracing the Software 3.0 Era

Yongseong Kim · Toss Payments Node.js Developer
March 17, 2026

This is the English version of a previously published article.

What Is the Software 3.0 Era?

In June 2025, Andrej Karpathy gave a talk at Y Combinator AI Startup School. He broke software's evolution into three stages.

Software 1.0: What we've done for decades. Writing explicit logic in Python, Java, or C++. Branching with if-else, looping with for, abstracting with functions. Telling the computer exactly how to do things—in code.

Software 2.0: Kicked off with the deep learning boom in the 2010s. You stop writing rules by hand. Collect data, train a model, and the neural network weights become the program. Tesla Autopilot, for instance, replaced huge chunks of C++ with neural networks.

Software 3.0: Where we are now. You tell an LLM what you want, in plain language. The prompt is the program.

As Karpathy puts it: "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.

📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School


Harness: Making LLMs Actually Useful

But the reality is messier.

You can't just tell ChatGPT, "Fix the bug in our service," and expect a patch to ship. LLMs are powerful—but on their own, they can't read your codebase, run commands, or touch a database.

That's where the idea of a harness comes in.

A harness is the gear you put on a horse: the equipment that lets humans actually use the animal's power. No matter how fast or strong the horse, without a harness that power goes nowhere.

Same goes for LLMs. Raw capability isn't enough. You need tools and environments that fill the gaps and connect them to real work.

| LLM Limitation | Harness Role |
| --- | --- |
| Context window limits | Memory management |
| Hallucination | Fact grounding, RAG |
| Lack of domain knowledge | Knowledge base |
| No state management | Session management, orchestration |
| No access to external systems | Tooling, MCP |

Claude Code Is Also a Harness

Claude Code is Anthropic's CLI-based coding agent. At its core, it's essentially a harness for Claude.

Here's what it provides:

| Feature | Role |
| --- | --- |
| File system access | Lets Claude read and write code |
| Terminal execution | Lets Claude run commands |
| MCP (Model Context Protocol) | Connects to external systems |
| Sub-agent | Splits and handles complex tasks |
| Slash command | Routes user intent |
| Skills | Reusable functional units |
| Hooks | Event-driven automation |

Together, these turn Claude from a model into an agent that can actually ship things.

But look at this structure for a second. Doesn't it feel familiar?


Seeing It Through Software 1.0 Eyes

MCP, skills, sub-agents, slash commands...

New terms pile up fast, and with them, cognitive load. But look closely at this structure, and you might notice something: it maps surprisingly well onto layered architecture—something most of us have been working with for years. At least as a starting point.

Breaking Down Each Layer

Slash command = Controller

Like Spring's @RestController or Express's router.get(), a slash command is the entry point for user requests. Type /review and the review workflow kicks off. Type /refactor and refactoring begins.

```
# User input
/review PR-1234

# Internally
Triggers the review workflow
Executes the appropriate sub-agent and skill combination
```
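The routing analogy can be sketched in a few lines of TypeScript. Everything here is hypothetical (the `routes` map, the workflow names, the return strings); it only illustrates controller-style dispatch, not Claude Code's actual internals.

```typescript
// Hypothetical sketch: a slash-command dispatcher that routes user input
// to a named workflow, the way a controller routes HTTP requests.
type Workflow = (args: string[]) => string;

const routes = new Map<string, Workflow>();

// Registration mirrors router.get("/review", handler)
routes.set("/review", (args) => `Running review workflow for ${args[0]}`);
routes.set("/refactor", (args) => `Running refactor workflow for ${args[0]}`);

function dispatch(input: string): string {
  const [command, ...args] = input.trim().split(/\s+/);
  const workflow = routes.get(command);
  if (!workflow) return `Unknown command: ${command}`;
  return workflow(args);
}

// dispatch("/review PR-1234") routes to the review workflow
```

The command string is the route key; everything after it is the request payload.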

Sub-agent = Service Layer

Just as a Service layer coordinates multiple repositories and domain objects, a sub-agent orchestrates multiple skills to complete a workflow. Each sub-agent maintains its own independent context—think of it as a self-contained unit of work, separate from others.
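As a minimal TypeScript sketch, with made-up `lintSkill` and `testSkill` stand-ins: the point is the orchestration shape and the locally held context, not a real sub-agent implementation.

```typescript
// Hypothetical sketch: a sub-agent as a service layer that orchestrates
// several single-purpose "skills", keeping its own working context.
type Skill = (input: string) => string;

const lintSkill: Skill = (code) => `${code} [linted]`;
const testSkill: Skill = (code) => `${code} [tested]`;

function reviewSubAgent(code: string): string {
  const localContext: string[] = []; // independent context, not shared with peers
  let result = code;
  for (const skill of [lintSkill, testSkill]) {
    result = skill(result);
    localContext.push(result);       // each intermediate step recorded locally
  }
  return result;
}
```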

Skills = Domain-level Component (SRP)

A skill is a single-purpose unit that follows the Single Responsibility Principle. "Review code," "Generate tests," "Write docs"—one clear job per skill. Just as classes shouldn't bloat into monoliths, a skill should do exactly one thing, and do it well.

MCP = Infrastructure / Adapter

Think of MCP as the layer that manages connections to external systems—databases, APIs, and similar outside interfaces. Much like the Repository or Adapter pattern, it's an abstraction boundary: internal logic doesn't need to know how the outside world is implemented.
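The adapter boundary can be sketched like this. The `TicketSource` interface and in-memory implementation are invented for illustration; MCP's real protocol is far richer, but the dependency direction is the same.

```typescript
// Hypothetical sketch: internal logic talks to an interface; the concrete
// transport (DB, HTTP API, MCP server, ...) is swappable behind it.
interface TicketSource {
  fetchTitle(id: string): string;
}

// One adapter per external system; internals never see curl/SQL details.
class InMemoryTicketSource implements TicketSource {
  private data = new Map([["JIRA-1234", "Fix login bug"]]);
  fetchTitle(id: string): string {
    return this.data.get(id) ?? "(unknown ticket)";
  }
}

// Domain logic depends only on the abstraction
function summarizeTicket(source: TicketSource, id: string): string {
  return `${id}: ${source.fetchTitle(id)}`;
}
```

Swapping the adapter for a real API client would leave `summarizeTicket` untouched.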

CLAUDE.md = the project's constitution

Think of CLAUDE.md as the project's stable foundation—the norms and principles that don't change often: tech stack, coding conventions, build commands. Less a dependency manifest, more a shared understanding of how this project works.

```markdown
# Example CLAUDE.md

## Tech Stack
- TypeScript + React 18
- Node.js 20+
- pnpm

## Coding Conventions
- Use functional components only
- Write tests using vitest

## Build Commands
- `pnpm build`: Production build
- `pnpm test`: Run all tests
```
One thing worth noting: if you find yourself editing CLAUDE.md often, that content probably doesn't belong there. Dynamic details—current task, today's priorities—should come in through conversation or be injected into the sub-agent's context directly.

Anti-patterns Apply Here Too

The anti-patterns from layered architecture carry over to agent design with surprising fidelity. The names even ring a bell.

| Traditional Anti-pattern | Agent Version | Symptom |
| --- | --- | --- |
| God Class | God skill | One 3,000-line skill handling everything |
| Spaghetti Code | Spaghetti CLAUDE.md | All instructions dumped together with no structure |
| Tight Coupling | Hardcoding without MCP | Direct curl calls; breaks when the API changes |
| Leaky Abstraction | Sub-agent knows MCP internals | Abstraction boundaries collapse; reusability is lost |
| Circular Dependency | Circular skill calls | A→B→C→A, risking infinite loops |

Code smells apply too:

  • Feature Envy: A skill excessively references another skill's data
  • Duplication: Identical prompts copy-pasted across multiple skills
  • Long Method: One sub-agent sequentially calling 10 skills

The Crucial Difference: What the Metaphor Misses

The layered architecture analogy holds up well. But there's one thing it doesn't quite capture.

Think about a traditional service layer. What happens when inventory runs out mid-order? You throw an OutOfStockException, or fall back to a back-order policy. Payment fails? Retry, or return an error.

Every branch has to be predefined.

```java
// Traditional service layer
public Order processOrder(OrderRequest request) {
    if (inventory.check(request.getItemId()) < request.getQuantity()) {
        throw new OutOfStockException();                  // Predefined exception
        // or: return backOrderPolicy.apply(request);     // Predefined policy
    }
    // ...
}
```

But in real development, you hit moments like:

"This edge case... I need to check with the PM." "This scenario isn't in the spec. What do I do?"

In traditional architecture, there's no way for the code to pause and ask. It throws an error, makes an arbitrary call, or logs it and moves on.

Agents Can Ask Questions

Agents are different. With Human-in-the-Loop (HITL), there's another option.

```
Request → Agent → Processing...
                       ↓
                🤔 Uncertain situation
                       ↓
                "Would you prefer A or B?"
                       ↓
User: "Let's go with A"
                       ↓
                Continues → Done
```

With tools like UserAskQuestion, an agent can delegate judgment mid-execution.

Exceptions become questions.

| Traditional | HITL |
| --- | --- |
| All edge cases must be predefined | When unsure, just ask |
| Exception → error or default | Exception → request user judgment |
| All-or-nothing automation | Partial automation works fine |
| Mistakes require rollback | Catch mistakes before they happen |
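As a sketch, here is what "exceptions become questions" looks like when the out-of-stock branch from the earlier service-layer example delegates to a callback instead of throwing. The `AskUser` type and option strings are illustrative, not the real Claude Code tool API.

```typescript
// Hypothetical sketch: instead of throwing on an unspecified case,
// the agent pauses and asks a provided callback for a decision.
type AskUser = (question: string, options: string[]) => string;

function processOrder(stock: number, quantity: number, ask: AskUser): string {
  if (stock < quantity) {
    // Traditional code would throw OutOfStockException here.
    const choice = ask("Stock is short. Back-order or cancel?", [
      "back-order",
      "cancel",
    ]);
    return choice === "back-order" ? "back-ordered" : "cancelled";
  }
  return "ordered";
}
```

In tests, `ask` can be a stub; in production, it would surface the question to a human.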

When to Ask, When to Just Do It

HITL is great—but an agent that asks every two seconds is just annoying.

Ask when:

  • The action is hard to reverse (deletions, deployments, external API calls)
  • There are multiple valid paths and no clear winner
  • The stakes are high

Just do it when:

  • The task is safely repeatable
  • A convention already covers it
  • It's easy to undo

A great agent is one that knows when to ask.
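The criteria above can be summed up as a tiny decision function. The field names, ordering, and thresholds are my own illustration, not anything Claude Code exposes.

```typescript
// Hypothetical sketch of the ask-vs-act decision as a pure function.
interface PlannedAction {
  reversible: boolean;           // easy to undo?
  coveredByConvention: boolean;  // an existing rule already decides it
  candidateCount: number;        // how many equally valid paths exist
  highStakes: boolean;           // deletions, deployments, external calls
}

function shouldAsk(action: PlannedAction): boolean {
  if (action.highStakes || !action.reversible) return true; // hard to reverse → ask
  if (action.coveredByConvention) return false;             // convention decides → just do it
  return action.candidateCount > 1;                         // no clear winner → ask
}
```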


The Path from 1.0 to 3.0

The Software 3.0 era is here. But that doesn't make everything we've learned obsolete.

What to Leave Behind

  • The compulsion to write every piece of logic explicitly
  • The urge to predefine every conceivable edge case
  • Seeing LLMs as little more than "smart autocomplete"

What to Carry Forward

  • Layer separation, SRP, abstraction
  • Dependency management, interface design
  • Testability and debugging strategies
  • Code reviews and iterative improvement

The tools have changed. The principles of good design—cohesion, coupling, abstraction—haven't.

When designing an MCP, think Adapter Pattern. When writing a skill, think SRP. When building a sub-agent, think Service Layer.

The architectural thinking you've built up is a solid foundation for building agents well.


Limitations: What the Metaphor Hides

The layered architecture analogy is a useful lens—but like any analogy, it papers over a few real-world gotchas worth keeping in mind.

Tokens Are the New RAM

On traditional servers, you watch RAM. With agents, you watch tokens.

Context Window = Working Memory
Token Usage = Memory Footprint

CLAUDE.md, skills, conversation history, MCP responses—it all competes for space in the context window. 200K tokens sounds like a lot, until you're working with a large codebase.

| Element | Rough Tokens | Note |
| --- | --- | --- |
| CLAUDE.md (well-structured) | 500–2,000 | Per project |
| Single skill | 300–1,500 | Token cost when included in context |
| Conversation history | Cumulative | Grows throughout the session |
| MCP response (e.g. DB query) | Variable | Watch for large payloads |

Just as you guard against OOM crashes, you should anticipate token explosions. Before writing "analyze all test files" in CLAUDE.md, picture what that means across 50 test files. You don't need exact counts—a rough sense of files and line count is enough.

A useful trick: ask Claude, "If you ran this workflow, which files would you expect to read?" If the list is longer than expected, it's a signal to narrow your instructions or break the task into steps.
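For that rough sense, the common "~4 characters per token" rule of thumb for English text is enough to spot obvious blow-ups. A sketch, with the caveat that real tokenizers vary by model and language:

```typescript
// Back-of-the-envelope token estimate; NOT an exact tokenizer.
const CHARS_PER_TOKEN = 4; // rough heuristic for English text

function estimateTokens(fileContents: string[]): number {
  const totalChars = fileContents.reduce((sum, f) => sum + f.length, 0);
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

// "Analyze all test files" across 50 files of ~4 KB each:
const files = Array.from({ length: 50 }, () => "x".repeat(4000));
// estimateTokens(files) → 50,000 tokens before any conversation history
```

Fifty thousand tokens is a quarter of a 200K window gone before the conversation even starts, which is exactly the kind of blow-up worth catching early.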

Another way to save tokens: extract deterministic logic into scripts.

```sh
# Anti-pattern: LLM re-interprets the convention every time
"Create a branch name in the format feature/JIRA-{ticket}-{description}.
Description should be kebab-case. If it's in Korean, translate to English..."

# Better: a script encapsulates the convention
./scripts/create-branch.sh JIRA-1234 "login feature"
feature/JIRA-1234-login-feature
```

From the LLM's perspective, it just runs the script and uses the output. No need to interpret the convention, no tokens wasted re-deriving it. If a task doesn't require reasoning, offload it to a script.
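The article doesn't show the script's internals, but the kebab-case part of the convention might look like this. This is a hypothetical TypeScript version; the Korean-to-English translation step genuinely needs an LLM and is left out of the deterministic script.

```typescript
// Hypothetical sketch of the branch-naming convention the script encodes.
function createBranchName(ticket: string, description: string): string {
  const kebab = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of spaces/punctuation → single hyphen
    .replace(/^-|-$/g, "");      // strip stray leading/trailing hyphens
  return `feature/${ticket}-${kebab}`;
}

// createBranchName("JIRA-1234", "login feature")
// → "feature/JIRA-1234-login-feature"
```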

The Skill Separation Dilemma: Class Explosion and the Law of Demeter

In traditional architecture, blindly applying SRP leads to Class Explosion—hundreds of tiny files that are individually correct but collectively hard to navigate.

Skills have a similar problem. In practice, you often pay an upfront context cost for discoverability—names and descriptions loaded so Claude knows what's available—and you pay more when skills are actually invoked. With 20 skills, that overhead adds up.

```
# Anti-pattern: Skill Explosion
.claude/skills/
├── review-naming/
│   └── SKILL.md
├── review-types/
│   └── SKILL.md
├── review-complexity/
│   └── SKILL.md
├── review-security/
│   └── SKILL.md
└── ... (15 more)
```

It's the agent equivalent of this:

```java
// Class Explosion anti-pattern
class NamingValidator { ... }
class TypeValidator { ... }
class ComplexityValidator { ... }
class SecurityValidator { ... }
// ... 15 more

// Calling side
new NamingValidator().validate(code);
new TypeValidator().validate(code);
// You have to remember which Validator to reach for every time
```

Think about the Law of Demeter: "Don't talk to strangers." Objects should only interact with their immediate neighbors.

Applied to skills: SKILL.md should be the entry point. Delegate the heavy content to references/.

```
# Recommended: Progressive Disclosure structure
.claude/skills/
└── code-review/
    ├── SKILL.md                  # "Review my code" → load only this
    ├── references/               # Load only when needed
    │   ├── naming-rules.md       # "What are the naming conventions?" → load then
    │   ├── security-checklist.md
    │   └── performance-guide.md
    └── scripts/
        └── lint-check.sh
```

This mirrors the Facade pattern:

```java
// Facade: single entry point, internal delegation
class CodeReviewer {
    private NamingRules namingRules;      // Loaded when needed
    private SecurityChecklist security;   // Loaded when needed

    public Review review(Code code) {
        // Use only what the situation calls for
        if (needsNamingCheck) namingRules.check(code);
        if (needsSecurityCheck) security.check(code);
    }
}
```

Claude works the same way. SKILL.md acts as the Facade. The files in references/ only enter the context when Claude actually needs them.

Finding the balance:

| Situation | Traditional Architecture | Skill Design |
| --- | --- | --- |
| Independent workflow | Separate Service class | Separate skill |
| Detailed rules within same domain | Private method / inner class | Files in references/ |
| Reusable utility | Common module | scripts/ or MCP |

Practical Tips: The Setup & Config Pattern

Enough theory. What does this look like in practice?

Slash commands let you blend HITL with automation naturally. Compare it to a familiar CLI pattern:

```sh
# Traditional CLI
npm init          # Generate initial structure
npm config set    # Adjust settings later

# Agent commands
/setup            # Analyze repo, generate structure
/config           # Adjust existing settings
```

HITL shines brightest during setup:

```
/setup

Detected: TypeScript + React, pnpm
Found both vitest and jest as testing frameworks.
  Which should be the default? [vitest / jest]

> vitest

CLAUDE.md created
```

The agent handles what's obvious automatically—and asks only when something's genuinely ambiguous. You don't predefine every option upfront. You just let it pause at the forks.

The open-source claude-hud plugin demonstrates this pattern cleanly:

```sh
# 1. Install plugin
/plugin install claude-hud

# 2. Configure for the repo (this is the setup step)
/claude-hud:setup
```

What /claude-hud:setup does:

  • Detects the current environment (terminal type, Claude Code version)
  • Auto-configures the statusline
  • Registers the required hooks

The core principle: minimize manual configuration, and only interrupt the user when their input is genuinely needed.


Closing Thoughts

Development in the Software 3.0 era is shifting—from writing code to assembling and directing it.

But the principles behind that assembly aren't foreign. They're the same ones we've been working with.

If MCP, skills, sub-agents, and slash commands feel unfamiliar, try mapping them onto the layered architecture you already know. New technology, viewed through the lens of good engineering principles, tends to make a lot more sense.

One more thing worth holding onto: applications can now ask questions. Rather than trying to spec out every edge case upfront, it's worth considering a different approach—build systems that handle ambiguity by simply asking.

Start building by refactoring your mindset.

All images in this article were created using generative AI.

