ai agents · autonomy · experiments · business operations

The Karpathy Loop: What Happens When AI Starts Running Its Own Experiments

What Andrej Karpathy's autonomous experiment loop actually means, why Shopify's overnight replication matters, and how the same pattern could reshape business operations.

March 23, 2026 · 6 min read · By Andres · Updated March 23, 2026

Everyone talks about AI like it's a tool you pick up and use — like a calculator, but smarter. Nobody tells you what happens when the tool picks itself up and starts using itself.

TL;DR: Andrej Karpathy described a pattern where AI agents run their own experiments autonomously — hypothesize, test, evaluate, iterate. Shopify replicated it overnight. The same loop applies to business operations: market research, content testing, pricing experiments. The constraint is not capability but oversight architecture.

Last week, Andrej Karpathy — one of the people who built the AI systems you're using right now — ran an experiment that quietly changed the conversation about what AI agents can actually do. And if you're not an AI researcher, you probably missed it. So let me explain why it matters.

What Actually Happened

Karpathy pointed an AI coding agent at a specific problem: optimize the training process for a small language model. Then he walked away.

Two days later, the agent had run 700 experiments on its own. No human reviewed each one. No human approved each step. The agent designed the experiments, ran them, analyzed the results, and used what it learned to design the next round. Over and over and over — 700 times.

The result: 20 optimizations that delivered an 11% speedup in how fast the model could be trained. That's not a rounding error. In the world of AI training, where companies spend millions on compute, an 11% improvement is real money.

Karpathy's comment: "the final boss battle for AI labs." Meaning — this is the capability that changes how AI research itself gets done.

Then Shopify's CEO Did It Overnight

Here's where it gets interesting. Tobias Lütke — the CEO of Shopify — saw what Karpathy published, pointed the same kind of agent at his own company's internal data, and let it run overnight.

By morning: 37 experiments completed. 19% performance improvement on whatever he was optimizing.

One night. No team. No sprint planning. No standup meeting. An AI agent ran 37 experiments while the CEO slept, and delivered a result that would have taken a human team days or weeks to produce.

Why This Is Different From Everything Before

So here's why this matters if you're not a researcher or a developer.

Up until now, the AI tools most people use work like a conversation. You ask a question, you get an answer. You give an instruction, you get an output. It's a back-and-forth — you're always in the loop. The AI does one thing, then waits for you.

The Karpathy Loop is different. The AI doesn't wait. It runs an experiment, looks at the result, decides what to try next based on what it learned, and runs the next experiment. Then the next one. Then the next one. Seven hundred times in two days, with no human checking in between.

This is the difference between a tool and a worker. A tool does what you tell it, once. A worker takes a goal, figures out the steps, and keeps going until the job is done. What Karpathy demonstrated is an AI agent that works like a worker — not a fancy chatbot that works like a tool.
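The loop itself is simple enough to sketch. Here's a minimal, self-contained Python version of the hypothesize-test-evaluate-iterate pattern — the `propose` and `run_experiment` functions are hypothetical stand-ins (a toy objective with a known optimum), not Karpathy's actual setup:

```python
import random

def run_experiment(params):
    """Stand-in for a real experiment: returns a measured score.
    Here, a noisy objective whose optimum sits near lr = 0.01."""
    return -abs(params["lr"] - 0.01) + random.uniform(-0.001, 0.001)

def propose(best_params):
    """Stand-in for the agent's hypothesis step: perturb the current best."""
    return {"lr": max(1e-5, best_params["lr"] * random.uniform(0.5, 2.0))}

def autonomous_loop(initial, budget=700):
    """Try something, measure it, decide what to try next, repeat."""
    best_params, best_score = initial, run_experiment(initial)
    for _ in range(budget):
        candidate = propose(best_params)   # hypothesize
        score = run_experiment(candidate)  # test
        if score > best_score:             # evaluate
            best_params, best_score = candidate, score  # iterate from the winner
    return best_params, best_score

params, score = autonomous_loop({"lr": 0.1})
print(params, score)
```

The point of the sketch: no step in the loop waits for a human. Swap the toy objective for a real training run (or a real business metric) and the structure is unchanged — only the cost of each iteration changes.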

"But I'm Not Training AI Models"

Right. And that's the part most of the coverage misses.

The specific task — optimizing AI model training — is narrow and technical. Most people will never do it. But the pattern is universal. Think of it this way: Karpathy showed that the engine works. The engine doesn't care whether it's optimizing model training, testing marketing headlines, running pricing experiments, or iterating on product designs. The loop is the same — try something, measure the result, decide what to try next, repeat.

Here's what I want you to think about. Right now, when you want to test something in your business — a new email subject line, a different pricing tier, a change to your onboarding flow — you design the test, run it, wait for results, analyze them, and decide what to do next. That cycle takes days or weeks. And you probably run one or two tests at a time because you're one person.

Now imagine pointing an agent at that same problem and telling it: "Run as many tests as you need. Here's what success looks like. Go."

That's the Karpathy Loop applied to your world. Not 700 experiments on AI training. 700 experiments on whatever your business needs optimized. While you sleep.

We're not there yet for most business applications. The infrastructure, the integrations, the data pipelines — there's real work between "Karpathy did it" and "you can do it for your email campaigns." But the gap is closing fast. Karpathy did this with tools that exist today, not a research prototype. Shopify's CEO replicated it overnight with production data.

What This Means Going Forward

Karpathy said "all LLM frontier labs will do this" — meaning every major AI company will use autonomous agent loops to improve their own models. AI improving AI. That's the immediate impact on the industry.

But the broader signal is the one worth paying attention to. The question is no longer "can AI do useful work without human supervision?" Karpathy just answered that. The question now is: "what work should you hand to an autonomous loop, and what should you keep?"

Here's how I'd think about it:

  1. Tasks with clear success metrics are first. If you can define "better" as a number — faster, cheaper, higher conversion rate — an autonomous loop can optimize it. If "better" requires taste, judgment, or context a machine doesn't have, it stays with you.

  2. Volume is the advantage, not intelligence. The agent that ran 700 experiments wasn't smarter than Karpathy. It was faster. It could test more variations in two days than a human could test in months. If your bottleneck is "I can only test one thing at a time," this is the pattern that breaks that bottleneck.

  3. The overnight test is the entry point. You don't need to rebuild your business around autonomous agents tomorrow. But the next time you have a question that could be answered by running 50 variations of something — ask yourself whether an agent could run them overnight while you sleep. That's the Karpathy Loop in its simplest form.
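The overnight test from point 3 can be sketched in a few lines — assuming, as point 1 requires, that "better" is a single number. Everything here is illustrative: the `conversion_rate` function fakes a metric that a real test would measure, and the subject lines are generated placeholders:

```python
def conversion_rate(variant):
    """Stand-in metric: in practice this number comes from a real A/B test.
    Fake rule for the demo: longer subject lines convert slightly worse."""
    return round(0.30 - 0.002 * len(variant), 4)

def overnight_test(variants):
    """Score every variant against one clear metric and rank the results."""
    results = [(conversion_rate(v), v) for v in variants]
    results.sort(reverse=True)  # best first
    return results

# 50 variations, tested in one pass instead of one at a time
subjects = [f"Offer ends in {n} days" for n in range(1, 51)]
best_rate, best_subject = overnight_test(subjects)[0]
print(best_subject, best_rate)
```

Notice what the agent contributes here: not intelligence, volume. Scoring 50 variants is the same operation as scoring one; the human bottleneck of "one test at a time" simply disappears.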

The AI research world just demonstrated that autonomous agent loops produce real results on real problems. The conversation has moved from "will this work?" to "what else can it work on?"
