Stepping into the Software 3.0 Era

김용성 · Toss Payments Node.js Developer
March 17, 2026

This is the English version of a previously published article.

What is Software 3.0?

In June 2025, Andrej Karpathy gave a compelling talk at Y Combinator AI Startup School, laying out three stages of software evolution.

Software 1.0 is the approach we’ve relied on for decades. You code explicit logic in languages like Python, Java or C++, branch with if-else, iterate with for loops and abstract through functions. It was about spelling out every detail of the "how" in code.

Software 2.0 emerged with the rise of deep learning in the 2010s. You no longer write rules by hand. Instead, you collect data, train a model, and the neural network weights become the program. Think of Tesla Autopilot, where neural networks replaced huge swaths of C++ code.

Software 3.0 is where we are now. Instead of writing code, you tell an LLM “what” you want in natural language and the prompt becomes the program.

As Karpathy puts it, "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.

📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School


Harness: What Makes LLMs Useful

But reality tells a different story.

Simply telling ChatGPT to "fix the bug in our service" doesn’t magically solve the problem. While LLMs are powerful, they can’t read files, call APIs, or access databases on their own.

This is where the concept of a harness comes in.

Originally, a harness refers to the gear strapped to a horse that lets humans make use of a horse's strength and speed. A horse without a harness is just a horse.

The same goes for LLMs. On their own, they're hard to control or put to use. You need the right tools and infrastructure to work around their limitations and connect them to real-world tasks.

| LLM Limitation | Role of the Harness |
| --- | --- |
| Context window limit | Memory management |
| Hallucination | Fact grounding, RAG |
| Lack of domain knowledge | Knowledge base |
| Unable to manage state | Sessions, orchestration |
| Unable to access external systems | Tools, MCP |

Claude Code: A Harness for Claude

Claude Code is Anthropic's CLI-based coding agent. In essence, it’s a harness for Claude.

Here’s what Claude Code provides:

| Feature | Role |
| --- | --- |
| File system access | Lets the LLM read and write code |
| Terminal execution | Lets the LLM run commands |
| MCP (Model Context Protocol) | Connects to external systems |
| Sub-agents | Break down complex tasks |
| Slash commands | Route user intent |
| Skills | Reusable functional units |
| Hooks | Event-driven automation |

Together, these act as a harness to transform Claude from an LLM engine into an agent that can actually do real work. Sound familiar?


Understanding Claude Code through Software 1.0

MCP, skills, sub-agents, slash commands…

New terminology piles up fast, quickly creating cognitive overload. But look past the labels, and the structure is strikingly similar to the layered architecture we've been using for years.

A Closer Look at Each Layer

Slash command = Controller

Slash commands are the entry point for user requests, just like @RestController in Spring or router.get() in Express. /review triggers the review workflow and /refactor triggers the refactoring workflow.

# User input
/review PR-1234

# Internally
Triggers the review workflow
Executes the appropriate sub-agent and skill combination
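The analogy above can be sketched in a few lines of TypeScript. This is an illustration only, not Claude Code's actual implementation; the workflow names and the `dispatch` function are hypothetical.

```typescript
// A slash command plays the same role as an Express route: match the
// command name, then hand off to the right workflow.
type Workflow = (target: string) => string;

const workflows: Record<string, Workflow> = {
  review: (target) => `running review workflow for ${target}`,
  refactor: (target) => `running refactor workflow for ${target}`,
};

// "/review PR-1234" -> strip the slash, split command from argument, dispatch
function dispatch(input: string): string {
  const [command, target] = input.replace(/^\//, "").split(" ");
  const workflow = workflows[command];
  if (!workflow) throw new Error(`Unknown command: /${command}`);
  return workflow(target);
}

console.log(dispatch("/review PR-1234"));
// → running review workflow for PR-1234
```

Like a router, the controller layer stays thin: it only decides *where* the request goes, never *how* the work gets done.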

Sub-agent = Service Layer

Think of sub-agents like a service layer. Instead of coordinating repositories and domain objects, they coordinate skills and combine them to complete a workflow. Each sub-agent maintains its own context, allowing them to operate independently like separate threads.
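A minimal sketch of that service-layer analogy, with hypothetical skill names; real sub-agents are prompt-driven, so this only models the shape of the coordination:

```typescript
// Skills are single-purpose units the sub-agent composes,
// the way a service method composes repository calls.
type Skill = (input: string) => string;

const reviewCode: Skill = (code) => `review(${code})`;
const generateTests: Skill = (code) => `tests(${code})`;

// The sub-agent owns its own context, like a service method
// owning a transaction: isolated from other agents.
function reviewSubAgent(code: string): string[] {
  const context: string[] = [];
  context.push(reviewCode(code));    // step 1: review
  context.push(generateTests(code)); // step 2: generate tests
  return context;
}
```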

Skills = Domain-level Component (SRP)

Skills follow the Single Responsibility Principle (SRP), where each one does exactly one thing. "Review code." "Generate tests." "Write docs." Just as bloated classes are a red flag, a skill that tries to do too much is a problem.

MCP = Infrastructure / Adapter

MCP (Model Context Protocol) is the bridge to the outside world, such as databases, APIs, and the filesystem. Like the repository or adapter pattern, it provides a layer of abstraction so internal logic stays independent of external implementations.
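The adapter idea in miniature. This is not the actual MCP SDK API; the interface and class names are invented to show the boundary the text describes:

```typescript
// The abstraction the agent-side logic sees. Nothing here
// mentions Jira, HTTP, or any concrete external system.
interface TicketSource {
  fetchTitle(id: string): string;
}

// One concrete adapter; in a real setup this would call the
// external API through an MCP server.
class JiraAdapter implements TicketSource {
  fetchTitle(id: string): string {
    return `JIRA title for ${id}`;
  }
}

// Internal logic depends only on the interface, so swapping
// Jira for Linear means swapping the adapter, nothing else.
function summarize(source: TicketSource, id: string): string {
  return `Summary of "${source.fetchTitle(id)}"`;
}
```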

CLAUDE.md = package.json

CLAUDE.md plays the same role as package.json or pom.xml. It’s about defining the things that rarely change, such as tech stack, coding conventions, and build commands.

# Example CLAUDE.md

## Tech Stack
- TypeScript + React 18
- Node.js 20+
- pnpm

## Coding Conventions
- Use functional components only
- Write tests using vitest

## Build Commands
- `pnpm build` Production build
- `pnpm test` Run all tests

Note: If you find yourself constantly editing CLAUDE.md, that content probably doesn’t belong there. Try passing dynamic information (current issues, today’s priorities, etc.) through conversation or a sub-agent context instead.


The Same Anti-patterns Apply

The same anti-patterns from layered architecture show up in agent design. These names will look familiar.

| Traditional | Agent | Symptom |
| --- | --- | --- |
| God Class | God Skill | One 300-line skill that handles everything |
| Spaghetti Code | Spaghetti CLAUDE.md | All instructions mixed together without structure |
| Tight Coupling | Hardcoding without MCP | Direct curl calls; any API change breaks everything |
| Leaky Abstraction | Sub-agent knows MCP internals | Abstraction boundary collapses; nothing is reusable |
| Circular Dependency | Circular skill calls | A→B→C→A, risk of an infinite loop |

The same goes for code smells.

  • Feature Envy: One skill over-referencing another’s data
  • Duplication: Similar prompts scattered across multiple skills
  • Long Method: One sub-agent chaining ten skill calls in a row

The Key Difference: What Layered Architecture Doesn’t Explain

Layered architecture explains most of agent design, but there is one thing it doesn't. Picture a traditional service layer. When stock runs out mid-order, you throw an OutOfStockException or apply a backorder policy. When payment fails, you retry or return an error.

Everything has to be anticipated ahead of time.

// Traditional service layer
public Order processOrder(OrderRequest request) {
    if (inventory.check(request.getItemId()) < request.getQuantity()) {
        throw new OutOfStockException();              // Predefined exception
        // or: return backOrderPolicy.apply(request); // Predefined policy
    }
    // ...
}

But development in real life is different, and we end up asking questions like:

"Wait, I need to loop in the PM on this one." "This wasn’t in the spec. Now what?"

In traditional architecture, there's no way to stop the code. You have to either throw an exception, make an arbitrary call, or log it and move on.

Agents Ask Questions

But agents are different. They support Human-in-the-Loop(HITL).

Request → Agent → Processing...
                      ↓
                 🤔 Uncertain situation
                      ↓
                 "Would you prefer A or B?"
                      ↓
User: "Let's go with A"
                      ↓
                 Continues → Done

With tools like AskUserQuestion, agents can simply pause mid-task to ask questions.

Just like that, exceptions become questions.
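Here is the earlier service-layer example restated in the HITL style. A minimal sketch: `askUser` is a hypothetical stand-in for an ask-the-user tool, injected as a callback so the code stays testable.

```typescript
// The question tool: given a prompt and options, returns the user's choice.
type AskUser = (question: string, options: string[]) => string;

function processOrder(stock: number, quantity: number, askUser: AskUser): string {
  if (stock < quantity) {
    // Instead of throwing OutOfStockException, pause and ask.
    const choice = askUser(
      "Stock is short. Backorder or cancel?",
      ["backorder", "cancel"],
    );
    return choice === "backorder" ? "order backordered" : "order cancelled";
  }
  return "order placed"; // the clear-cut case needs no question
}
```

The exceptional path no longer needs a predefined policy; the human supplies the policy at the moment it's needed.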

| Traditional | HITL |
| --- | --- |
| All cases must be predefined | When unsure, ask |
| Exceptions → error or default value | Exceptions → ask for user input |
| 100% or 0% automation | Partial automation |
| Mistakes require rollback | Confirm before making a mistake |

When to Ask, When to Act

HITL being available doesn't mean the agent should ask every time. That would make it an annoying tool.

When to ask:

  • The action is hard to reverse (deletions, deployments, external API calls)
  • There are multiple options but no clear right answer
  • The decision carries significant cost or risk

When to act:

  • The task is safe and repeatable
  • There is an already established convention
  • The action is easy to undo

A good agent knows when to ask. A great one knows “when not to.”
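The two bullet lists above boil down to a small decision rule. A sketch under stated assumptions: the fields and their priority order are one reasonable reading of the lists, not an official heuristic.

```typescript
// Traits of a candidate action, mirroring the bullets above.
interface Action {
  reversible: boolean;    // easy to undo?
  hasConvention: boolean; // established convention exists?
  risky: boolean;         // significant cost or risk?
}

function shouldAsk(action: Action): boolean {
  if (action.risky) return true;       // significant cost or risk → ask
  if (!action.reversible) return true; // deletions, deployments → ask
  return !action.hasConvention;        // safe + conventional → just act
}
```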


Development from 1.0 to 3.0

The dawn of Software 3.0 doesn't render everything we've learned before obsolete.

Leave Behind

  • Feeling compelled to spell out every piece of logic explicitly
  • Trying to predefine all edge cases
  • Using LLM only as a “smart auto-complete” tool

Take with you

  • Layer separation, single responsibility principle, abstraction
  • Dependency management, interface design
  • Testability, debugging strategy
  • Code review, incremental improvements

The tools have changed, but the underlying principles of good design (cohesion, coupling, abstraction) have not.

Think back to the adapter pattern when designing an MCP integration. Think back to SRP when creating a skill. Think back to service layers when designing a sub-agent.

Your architecture knowledge is the foundation for building great agents.


But Not Everything Transfers

That said, there are some things that can’t be fully explained just with layered architecture. Here are some points to keep in mind.

Token is the New Memory

In traditional server development, RAM was the constraint. With agents, it's tokens.

Context Window = working memory

Token usage = memory usage

CLAUDE.md, Skills, conversation history, and MCP responses—all of it piles into the context window. 200K tokens sounds generous until you're working across a large codebase, and suddenly it's gone.

| Element | Token Cost | Note |
| --- | --- | --- |
| CLAUDE.md (well-structured) | 500–2,000 | Per project |
| One skill | 300–1,500 | Every time it loads |
| Conversation history | Cumulative | Throughout the session |
| MCP responses (DB queries, etc.) | Variable | Watch out for large responses |

Just as you’d try to prevent OOM, token bloat is something you can anticipate. Before writing "analyze all test files" in CLAUDE.md, picture what that instruction looks like across 50 test files. You don't need an exact token count; a rough sense of file count and line volume is enough.

After writing your instructions, ask Claude directly: "What files would you end up reading if I ran this workflow?" If the scope is larger than expected, that's your signal to either tighten the instructions or break the work into phases.
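That rough sense can even be a one-liner. This is back-of-envelope only: the ~4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer.

```typescript
// Rough token estimate: total characters / 4.
// Good enough to sanity-check an instruction's scope, nothing more.
function estimateTokens(texts: string[]): number {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return Math.ceil(chars / 4);
}

// 50 test files at ~2,000 characters each → roughly 25,000 tokens,
// i.e. an eighth of a 200K window gone before any conversation starts.
const files = Array.from({ length: 50 }, () => "x".repeat(2000));
console.log(estimateTokens(files)); // → 25000
```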

Another way to save on tokens is to offload deterministic logic into scripts.

# Anti-pattern: LLM re-interprets the convention every time
"Create a branch name in the format feature/JIRA-{ticket}-{description}.
Description should be kebab-case. If it's in Korean, translate to English..."

# Better: a script encapsulates the convention
./scripts/create-branch.sh JIRA-1234 "login feature"
feature/JIRA-1234-login-feature

An LLM just runs the script and works with the output. No need to parse conventions or waste tokens re-deriving what a script can compute in milliseconds. If a task doesn't require judgment, make a tool to take care of it.
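For illustration, here is one way the branch-naming convention from the example could be encoded deterministically. The kebab-case and `feature/JIRA-…` rules come from the example above; everything else (the function name, the exact regex) is an assumption.

```typescript
// Encode the convention once; the LLM never has to re-derive it.
function branchName(ticket: string, description: string): string {
  const kebab = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics → single hyphen
    .replace(/^-|-$/g, "");      // trim stray leading/trailing hyphens
  return `feature/${ticket}-${kebab}`;
}

console.log(branchName("JIRA-1234", "login feature"));
// → feature/JIRA-1234-login-feature
```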

The Skill Separation Dilemma: Class Explosion and the Law of Demeter

Following SRP blindly in traditional architecture leads to class explosion. Hundreds of tiny classes sprawl across the codebase, and just mapping the relationships between them becomes a cognitive burden in itself.

The same applies to Skills. At startup, Claude loads every Skill's metadata (name and description) into the system prompt. Twenty Skills means twenty descriptions permanently occupying context.

# Anti-pattern: Skill Explosion
.claude/skills/
├── review-naming/
│   └── SKILL.md
├── review-types/
│   └── SKILL.md
├── review-complexity/
│   └── SKILL.md
├── review-security/
│   └── SKILL.md
└── ... (15 more)

The above is the equivalent of writing something like this:

// Class Explosion anti-pattern
class NamingValidator { ... }
class TypeValidator { ... }
class ComplexityValidator { ... }
class SecurityValidator { ... }
// ... 15 more

// Calling side
new NamingValidator().validate(code);
new TypeValidator().validate(code);
// You have to remember which Validator to reach for every time

Think of the Law of Demeter: "Don't talk to strangers." An object should only know about its immediate neighbors. Applied to skill design, it would mean that SKILL.md provides the entry point, and detailed knowledge is delegated to references/.

# Recommended: Progressive Disclosure structure
.claude/skills/
└── code-review/
    ├── SKILL.md                  # "Review my code" → load only this
    ├── references/               # Load only when needed
    │   ├── naming-rules.md       # "What are the naming conventions?" → load then
    │   ├── security-checklist.md
    │   └── performance-guide.md
    └── scripts/
        └── lint-check.sh

This is similar to a facade pattern:

// Facade: single entry point, internal delegation
class CodeReviewer {
    private NamingRules namingRules;      // Loaded when needed
    private SecurityChecklist security;   // Loaded when needed

    public Review review(Code code) {
        // Use only what the situation calls for
        if (needsNamingCheck) namingRules.check(code);
        if (needsSecurityCheck) security.check(code);
    }
}

Claude works much in the same way. SKILL.md acts as the facade, and files inside references/ are only loaded to context when Claude deems it necessary.

Finding a balance:

| Situation | Traditional Architecture | Skill Design |
| --- | --- | --- |
| Independent workflow | Separate service class | Separate skill |
| Domain-specific rules | Private method / inner class | references/ file |
| Reusable utility | Common module | scripts/ or MCP |

Practical Tips: Setup & Config Patterns

But how does this actually work in practice?

Slash commands let you easily blend HITL with automation. Compare it to a CLI pattern you already know:

# Traditional CLI
npm init          # Generate initial structure
npm config set    # Adjust settings later

# Agent commands
/setup            # Analyze repo, generate structure
/config           # Adjust existing settings

Where HITL really earns its place is in the setup process:

/setup

Detected: TypeScript + React, pnpm
Found both vitest and jest as testing frameworks.
  Which should be the default? [vitest / jest]

> vitest

CLAUDE.md created

The agent detects the environment automatically, but flags ambiguous parts with a question. You don't need to predefine everything. Let the agent handle the clear-cut cases and step in only when things are uncertain.
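The detect-then-ask pattern from the /setup example, as a sketch. `pickTestFramework` and `askUser` are hypothetical names; detection here just inspects a dependency list.

```typescript
type AskUser = (question: string, options: string[]) => string;

// Clear-cut case: exactly one framework found → act without asking.
// Ambiguous case: both (or neither) found → step in with a question.
function pickTestFramework(deps: string[], askUser: AskUser): string {
  const found = ["vitest", "jest"].filter((f) => deps.includes(f));
  if (found.length === 1) return found[0];
  return askUser(
    "Which should be the default?",
    found.length > 0 ? found : ["vitest", "jest"],
  );
}
```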

The claude-hud plugin, an open source project, demonstrates this pattern well:

# 1. Install plugin
/plugin install claude-hud

# 2. Configure it for the current repo
/claude-hud:setup

What /claude-hud:setup does:

  • Detects the current environment (terminal type, Claude Code version, etc.)
  • Automatically configures statusline settings
  • Registers the necessary hooks

The agent asks questions only when it needs to, keeping manual input to a minimum.


Closing Thoughts

Development in the Software 3.0 era is shifting from writing code to assembling and instructing it.

However, at its core, the principles of code assembly remain largely the same.

MCP, skills, sub-agents, slash commands—if these still feel foreign, map them onto the layered architecture you already know. The same engineering principles hold, and the design patterns are already there, waiting to be recognized.

One more shift worth noting is that applications can now ask questions. Instead of trying to predefine everything from the start, let the agent question ambiguity.

Start building by refactoring your mindset.

*All images used in this article have been created with generative AI.

