Stepping into the Software 3.0 Era

김용성 · Toss Payments Node.js Developer
March 17, 2026

This is the English version of a previously published article.

What is Software 3.0?

In June 2025, Andrej Karpathy gave a compelling talk at Y Combinator AI Startup School, laying out three stages of software evolution.

Software 1.0 is the approach we’ve relied on for decades. You code explicit logic in languages like Python, Java or C++, branch with if-else, iterate with for loops and abstract through functions. It was about spelling out every detail of the "how" in code.

Software 2.0 emerged with the rise of deep learning in the 2010s. You no longer write rules by hand. Instead, you collect data, train a model, and the neural network weights become the program. Think of Tesla Autopilot, where neural networks replaced huge swaths of C++ code.

Software 3.0 is where we are now. Instead of writing code, you tell an LLM “what” you want in natural language and the prompt becomes the program.

As Karpathy puts it, "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.

📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School


Harness: What Makes LLMs Useful

But reality tells a different story.

Simply telling ChatGPT to "fix the bug in our service" doesn’t magically solve the problem. While LLMs are powerful, they can’t read files, call APIs, or access databases on their own.

This is where the concept of a harness comes in.

Originally, a harness refers to the gear strapped to a horse that lets humans make use of a horse's strength and speed. A horse without a harness is just a horse.

The same goes for LLMs. On their own, they're hard to control or put to use. You need the right tools and infrastructure to work around their limitations and connect them to real-world tasks.

| LLM Limitation | Role of the Harness |
| --- | --- |
| Context window limit | Memory management |
| Hallucination | Fact grounding, RAG |
| Lack of domain knowledge | Knowledge base |
| Unable to manage state | Sessions, orchestration |
| Unable to access external systems | Tools, MCP |

Claude Code: A Harness for Claude

Claude Code is Anthropic's CLI-based coding agent. In essence, it’s a harness for Claude.

Here’s what Claude Code provides:

| Feature | Role |
| --- | --- |
| File system access | Lets the LLM read and write code |
| Terminal execution | Lets the LLM run commands |
| MCP (Model Context Protocol) | Connects to external systems |
| Sub-agents | Break down complex tasks |
| Slash commands | Route user intent |
| Skills | Reusable functional units |
| Hooks | Event-driven automation |

Together, these act as a harness to transform Claude from an LLM engine into an agent that can actually do real work. Sound familiar?


Understanding Claude Code through Software 1.0

MCP, skills, sub-agents, slash commands…

New terminology piles up fast, quickly creating cognitive overload. But look past the labels, and the structure is strikingly similar to the layered architecture we've been using for years.

A Closer Look at Each Layer

Slash command = Controller

Slash commands are the entry point for user requests, just like @RestController in Spring or router.get() in Express. /review triggers the review workflow and /refactor triggers the refactoring workflow.

# User input
/review PR-1234

# Internally
Triggers the review workflow
Executes the appropriate sub-agent and skill combination
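The analogy above can be sketched in a few lines of TypeScript. This is an illustration only, not Claude Code's actual implementation; the workflow names and the `dispatch` function are hypothetical.

```typescript
// A slash command plays the same role as an Express route: match the
// command name, then hand off to the right workflow.
type Workflow = (target: string) => string;

const workflows: Record<string, Workflow> = {
  review: (target) => `running review workflow for ${target}`,
  refactor: (target) => `running refactor workflow for ${target}`,
};

// "/review PR-1234" -> strip the slash, split command from argument, dispatch
function dispatch(input: string): string {
  const [command, target] = input.replace(/^\//, "").split(" ");
  const workflow = workflows[command];
  if (!workflow) throw new Error(`Unknown command: /${command}`);
  return workflow(target);
}

console.log(dispatch("/review PR-1234"));
// → running review workflow for PR-1234
```

Like a router, the controller layer stays thin: it only decides *where* the request goes, never *how* the work gets done.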

Sub-agent = Service Layer

Think of sub-agents like a service layer. Instead of coordinating repositories and domain objects, they coordinate skills and combine them to complete a workflow. Each sub-agent maintains its own context, allowing them to operate independently like separate threads.
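A minimal sketch of that service-layer analogy, with hypothetical skill names; real sub-agents are prompt-driven, so this only models the shape of the coordination:

```typescript
// Skills are single-purpose units the sub-agent composes,
// the way a service method composes repository calls.
type Skill = (input: string) => string;

const reviewCode: Skill = (code) => `review(${code})`;
const generateTests: Skill = (code) => `tests(${code})`;

// The sub-agent owns its own context, like a service method
// owning a transaction: isolated from other agents.
function reviewSubAgent(code: string): string[] {
  const context: string[] = [];
  context.push(reviewCode(code));    // step 1: review
  context.push(generateTests(code)); // step 2: generate tests
  return context;
}
```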

Skills = Domain-level Component (SRP)

Skills follow the Single Responsibility Principle (SRP), where each one does exactly one thing. "Review code." "Generate tests." "Write docs." Just as bloated classes are a red flag, a skill that tries to do too much is a problem.

MCP = Infrastructure / Adapter

MCP (Model Context Protocol) is the bridge to the outside world, such as databases, APIs, and the filesystem. Like the repository or adapter pattern, it provides a layer of abstraction so internal logic stays independent of external implementations.
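The adapter idea in miniature. This is not the actual MCP SDK API; the interface and class names are invented to show the boundary the text describes:

```typescript
// The abstraction the agent-side logic sees. Nothing here
// mentions Jira, HTTP, or any concrete external system.
interface TicketSource {
  fetchTitle(id: string): string;
}

// One concrete adapter; in a real setup this would call the
// external API through an MCP server.
class JiraAdapter implements TicketSource {
  fetchTitle(id: string): string {
    return `JIRA title for ${id}`;
  }
}

// Internal logic depends only on the interface, so swapping
// Jira for Linear means swapping the adapter, nothing else.
function summarize(source: TicketSource, id: string): string {
  return `Summary of "${source.fetchTitle(id)}"`;
}
```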

CLAUDE.md = package.json

CLAUDE.md plays the same role as package.json or pom.xml. It’s about defining the things that rarely change, such as tech stack, coding conventions, and build commands.

# Example CLAUDE.md

## Tech Stack
- TypeScript + React 18
- Node.js 20+
- pnpm

## Coding Conventions
- Use functional components only
- Write tests using vitest

## Build Commands
- `pnpm build` Production build
- `pnpm test` Run all tests

Note: If you find yourself constantly editing CLAUDE.md, that content probably doesn’t belong there. Try passing dynamic information (current issues, today’s priorities, etc.) through conversation or a sub-agent context instead.


The Same Anti-patterns Apply

The same anti-patterns from layered architecture show up in agent design. These names will look familiar.

| Traditional | Agent | Symptom |
| --- | --- | --- |
| God Class | God Skill | One 300-line skill that handles everything |
| Spaghetti Code | Spaghetti CLAUDE.md | All instructions mixed together without structure |
| Tight Coupling | Hardcoding without MCP | Direct curl calls; any API change breaks everything |
| Leaky Abstraction | Sub-agent knows MCP internals | Abstraction boundary collapses; nothing is reusable |
| Circular Dependency | Circular skill calls | A→B→C→A, risk of an infinite loop |

The same goes for code smells.

  • Feature Envy: One skill over-referencing another’s data
  • Duplication: Similar prompts scattered across multiple skills
  • Long Method: One sub-agent chaining ten skill calls in a row

The Key Difference: What Layered Architecture Doesn’t Explain

Layered architecture explains most of agent design, but there is one thing it doesn't. Picture a traditional service layer. When stock runs out mid-order, you throw an OutOfStockException or apply a backorder policy. When payment fails, you retry or return an error.

Everything has to be anticipated ahead of time.

// Traditional service layer
public Order processOrder(OrderRequest request) {
    if (inventory.check(request.getItemId()) < request.getQuantity()) {
        throw new OutOfStockException();              // Predefined exception
        // or: return backOrderPolicy.apply(request); // Predefined policy
    }
    // ...
}

But development in real life is different, and we end up asking questions like:

"Wait, I need to loop in the PM on this one." "This wasn’t in the spec. Now what?"

In traditional architecture, there's no way to stop the code. You have to either throw an exception, make an arbitrary call, or log it and move on.

Agents Ask Questions

But agents are different. They support Human-in-the-Loop(HITL).

Request → Agent → Processing...
                      ↓
                 🤔 Uncertain situation
                      ↓
                 "Would you prefer A or B?"
                      ↓
User: "Let's go with A"
                      ↓
                 Continues → Done

With tools like AskUserQuestion, agents can simply pause mid-task to ask questions.

Just like that, exceptions become questions.
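Here is the earlier service-layer example restated in the HITL style. A minimal sketch: `askUser` is a hypothetical stand-in for an ask-the-user tool, injected as a callback so the code stays testable.

```typescript
// The question tool: given a prompt and options, returns the user's choice.
type AskUser = (question: string, options: string[]) => string;

function processOrder(stock: number, quantity: number, askUser: AskUser): string {
  if (stock < quantity) {
    // Instead of throwing OutOfStockException, pause and ask.
    const choice = askUser(
      "Stock is short. Backorder or cancel?",
      ["backorder", "cancel"],
    );
    return choice === "backorder" ? "order backordered" : "order cancelled";
  }
  return "order placed"; // the clear-cut case needs no question
}
```

The exceptional path no longer needs a predefined policy; the human supplies the policy at the moment it's needed.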

| Traditional | HITL |
| --- | --- |
| All cases must be predefined | When unsure, ask |
| Exceptions → error or default value | Exceptions → ask for user input |
| 100% or 0% automation | Partial automation |
| Mistakes require rollback | Confirm before making a mistake |

When to Ask, When to Act

HITL being available doesn't mean the agent should ask every time. That would make it an annoying tool.

When to ask:

  • The action is hard to reverse (deletions, deployments, external API calls)
  • There are multiple options but no clear right answer
  • The decision carries significant cost or risk

When to act:

  • The task is safe and repeatable
  • There is an already established convention
  • The action is easy to undo

A good agent knows when to ask. A great one knows “when not to.”
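The two bullet lists above boil down to a small decision rule. A sketch under stated assumptions: the fields and their priority order are one reasonable reading of the lists, not an official heuristic.

```typescript
// Traits of a candidate action, mirroring the bullets above.
interface Action {
  reversible: boolean;    // easy to undo?
  hasConvention: boolean; // established convention exists?
  risky: boolean;         // significant cost or risk?
}

function shouldAsk(action: Action): boolean {
  if (action.risky) return true;       // significant cost or risk → ask
  if (!action.reversible) return true; // deletions, deployments → ask
  return !action.hasConvention;        // safe + conventional → just act
}
```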


Development from 1.0 to 3.0

The dawn of Software 3.0 doesn't render everything we've learned before obsolete.

Leave Behind

  • Feeling compelled to spell out every piece of logic explicitly
  • Trying to predefine all edge cases
  • Using LLM only as a “smart auto-complete” tool

Take with you

  • Layer separation, single responsibility principle, abstraction
  • Dependency management, interface design
  • Testability, debugging strategy
  • Code review, incremental improvements

The tools have changed, but the underlying principles of good design (cohesion, coupling, abstraction) have not.

Think back to the adapter pattern when designing an MCP integration. Think back to SRP when creating a skill. Think back to service layers when designing a sub-agent.

Your architecture knowledge is the foundation for building great agents.


But Not Everything Transfers

That said, there are some things that can’t be fully explained just with layered architecture. Here are some points to keep in mind.

Token is the New Memory

In traditional server development, RAM was the constraint. With agents, it's tokens.

Context Window = working memory

Token usage = memory usage

CLAUDE.md, Skills, conversation history, and MCP responses—all of it piles into the context window. 200K tokens sounds generous until you're working across a large codebase, and suddenly it's gone.

| Element | Token Cost | Note |
| --- | --- | --- |
| CLAUDE.md (well-structured) | 500–2,000 | Per project |
| One skill | 300–1,500 | Every time it loads |
| Conversation history | Cumulative | Throughout the session |
| MCP responses (DB queries, etc.) | Variable | Watch out for large responses |

Just as you’d try to prevent OOM, token bloat is something you can anticipate. Before writing "analyze all test files" in CLAUDE.md, picture what that instruction looks like across 50 test files. You don't need an exact token count; a rough sense of file count and line volume is enough.

After writing your instructions, ask Claude directly: "What files would you end up reading if I ran this workflow?" If the scope is larger than expected, that's your signal to either tighten the instructions or break the work into phases.
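That rough sense can even be a one-liner. This is back-of-envelope only: the ~4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer.

```typescript
// Rough token estimate: total characters / 4.
// Good enough to sanity-check an instruction's scope, nothing more.
function estimateTokens(texts: string[]): number {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return Math.ceil(chars / 4);
}

// 50 test files at ~2,000 characters each → roughly 25,000 tokens,
// i.e. an eighth of a 200K window gone before any conversation starts.
const files = Array.from({ length: 50 }, () => "x".repeat(2000));
console.log(estimateTokens(files)); // → 25000
```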

Another way to save on tokens is to offload deterministic logic into scripts.

# Anti-pattern: LLM re-interprets the convention every time
"Create a branch name in the format feature/JIRA-{ticket}-{description}.
Description should be kebab-case. If it's in Korean, translate to English..."

# Better: a script encapsulates the convention
./scripts/create-branch.sh JIRA-1234 "login feature"
feature/JIRA-1234-login-feature

An LLM just runs the script and works with the output. No need to parse conventions or waste tokens re-deriving what a script can compute in milliseconds. If a task doesn't require judgment, make a tool to take care of it.
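For illustration, here is one way the branch-naming convention from the example could be encoded deterministically. The kebab-case and `feature/JIRA-…` rules come from the example above; everything else (the function name, the exact regex) is an assumption.

```typescript
// Encode the convention once; the LLM never has to re-derive it.
function branchName(ticket: string, description: string): string {
  const kebab = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics → single hyphen
    .replace(/^-|-$/g, "");      // trim stray leading/trailing hyphens
  return `feature/${ticket}-${kebab}`;
}

console.log(branchName("JIRA-1234", "login feature"));
// → feature/JIRA-1234-login-feature
```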

The Skill Separation Dilemma: Class Explosion and the Law of Demeter

Following SRP blindly in traditional architecture leads to class explosion. Hundreds of tiny classes sprawl across the codebase, and just mapping the relationships between them becomes a cognitive burden in itself.

The same applies to Skills. At startup, Claude loads every Skill's metadata (name and description) into the system prompt. Twenty Skills means twenty descriptions permanently occupying context.

# Anti-pattern: Skill Explosion
.claude/skills/
├── review-naming/
│   └── SKILL.md
├── review-types/
│   └── SKILL.md
├── review-complexity/
│   └── SKILL.md
├── review-security/
│   └── SKILL.md
└── ... (15 more)

The above is the equivalent of writing something like this:

// Class Explosion anti-pattern
class NamingValidator { ... }
class TypeValidator { ... }
class ComplexityValidator { ... }
class SecurityValidator { ... }
// ... 15 more

// Calling side
new NamingValidator().validate(code);
new TypeValidator().validate(code);
// You have to remember which Validator to reach for every time

Think of the Law of Demeter: "Don't talk to strangers." An object should only know about its immediate neighbors. Applied to skill design, it would mean that SKILL.md provides the entry point, and detailed knowledge is delegated to references/.

# Recommended: Progressive Disclosure structure
.claude/skills/
└── code-review/
    ├── SKILL.md                  # "Review my code" → load only this
    ├── references/               # Load only when needed
    │   ├── naming-rules.md       # "What are the naming conventions?" → load then
    │   ├── security-checklist.md
    │   └── performance-guide.md
    └── scripts/
        └── lint-check.sh

This is similar to a facade pattern:

// Facade: single entry point, internal delegation
class CodeReviewer {
    private NamingRules namingRules;      // Loaded when needed
    private SecurityChecklist security;   // Loaded when needed

    public Review review(Code code) {
        // Use only what the situation calls for
        if (needsNamingCheck) namingRules.check(code);
        if (needsSecurityCheck) security.check(code);
    }
}

Claude works much in the same way. SKILL.md acts as the facade, and files inside references/ are only loaded to context when Claude deems it necessary.

Finding a balance:

| Situation | Traditional Architecture | Skill Design |
| --- | --- | --- |
| Independent workflow | Separate service class | Separate skill |
| Domain-specific rules | Private method / inner class | references/ file |
| Reusable utility | Common module | scripts/ or MCP |

Practical Tips: Setup & Config Patterns

But how does this actually work in practice?

Slash commands let you easily blend HITL with automation. Compare it to a CLI pattern you already know:

# Traditional CLI
npm init          # Generate initial structure
npm config set    # Adjust settings later

# Agent commands
/setup            # Analyze repo, generate structure
/config           # Adjust existing settings

Where HITL really earns its place is in the setup process:

/setup

Detected: TypeScript + React, pnpm
Found both vitest and jest as testing frameworks.
  Which should be the default? [vitest / jest]

> vitest

CLAUDE.md created

The agent detects the environment automatically, but flags ambiguous parts with a question. You don't need to predefine everything. Let the agent handle the clear-cut cases and step in only when things are uncertain.
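The detect-then-ask pattern from the /setup example, as a sketch. `pickTestFramework` and `askUser` are hypothetical names; detection here just inspects a dependency list.

```typescript
type AskUser = (question: string, options: string[]) => string;

// Clear-cut case: exactly one framework found → act without asking.
// Ambiguous case: both (or neither) found → step in with a question.
function pickTestFramework(deps: string[], askUser: AskUser): string {
  const found = ["vitest", "jest"].filter((f) => deps.includes(f));
  if (found.length === 1) return found[0];
  return askUser(
    "Which should be the default?",
    found.length > 0 ? found : ["vitest", "jest"],
  );
}
```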

The claude-hud plugin, an open source project, demonstrates this pattern well:

# 1. Install plugin
/plugin install claude-hud

# 2. Configure it for the current repo
/claude-hud:setup

What /claude-hud:setup does:

  • Detects the current environment (terminal type, Claude Code version, etc.)
  • Automatically configures statusline settings
  • Registers the necessary hooks

The agent asks questions only when it needs to, keeping manual input to a minimum.


Closing Thoughts

Development in the Software 3.0 era is shifting from writing code to assembling and instructing it.

However, at its core, the principles of code assembly remain largely the same.

MCP, skills, sub-agents, slash commands—if these still feel foreign, map them onto the layered architecture you already know. The same engineering principles hold, and the design patterns are already there, waiting to be recognized.

One more shift worth noting is that applications can now ask questions. Instead of trying to predefine everything from the start, let the agent question ambiguity.

Start building by refactoring your mindset.

*All images used in this article have been created with generative AI.

