Persistent Sandboxes Change the Shape of AI Coding Agents

Vercel made Sandbox persistence generally available on May 26, and the interesting part is not the API surface. It is the architectural pressure behind it.

For the last year, most AI coding demos have quietly depended on disposable environments. Spin up a container, clone a repo, run a command, throw the machine away. That is fine for a narrow benchmark or a one-shot code generation task. It starts to break down when the agent is doing real engineering work: installing dependencies, discovering the shape of a monorepo, running tests repeatedly, generating artifacts, debugging a dev server, and coming back after a pause.

Persistent sandboxes are a recognition that agent work has state. Not just chat history. Filesystem state, package-manager state, build caches, generated files, logs, tool results, and enough identity to resume the same execution environment later.

That sounds small until you build around it.

The old disposable model was useful, but leaky

A disposable sandbox gives you a clean boundary. That is its strength. You can run untrusted code, limit blast radius, and avoid carrying accidental state between jobs. For many workloads, that is exactly what you want.

But coding agents are rarely pure functions. A useful agent often spends a meaningful part of its time making the environment usable before it can solve the actual problem. It installs packages, warms caches, starts services, runs migrations against a test database, and learns which test command is real instead of aspirational.

If the environment disappears after every session, the agent pays that tax again and again. Worse, the product design tends to compensate by stuffing more context into prompts. The model gets a larger transcript, but the machine loses the concrete state that made the work reproducible.

That is the wrong trade. A transcript can explain what happened. It cannot replace a filesystem that still contains the failed test output, the generated fixture, or the package tree the agent actually used.

What changed

With the GA release, Vercel Sandboxes now save and restore filesystem state between sessions by default. A sandbox can have a durable name, so product code can create it, stop it, retrieve it later, and continue working from the latest snapshot. Vercel also added practical lifecycle pieces around that model: `getOrCreate`, `fork`, `delete`, resume hooks, tags, and richer `stop()` metadata for snapshot, CPU, and transfer information.

That matters because it moves the abstraction from "run this command in an isolated box" toward "manage this named execution workspace."

Those are different products to build.

The important part is identity. Once the environment has a stable name, the rest of your system can reason about it. You can show it in an admin UI, clean it up with a retention policy, tag it by tenant, and resume it when a user comes back. The sandbox stops being a hidden command runner and becomes a resource your application owns.
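One way to make that identity concrete is to derive the sandbox name from identifiers the product already owns and keep a record of it next to your other resources. The sketch below is a minimal in-memory version. `WorkspaceRegistry`, `sandboxNameFor`, and the record fields are hypothetical names for illustration, not part of the Vercel SDK, and a real system would back this with a database table and call the provider where this code only records intent.

```typescript
// Hypothetical record shape; these names are illustrative, not Vercel SDK types.
type WorkspaceRecord = {
  sandboxName: string
  tenantId: string
  repoId: string
  tags: { [key: string]: string }
  createdAt: string // ISO 8601 timestamp
}

// Turn an arbitrary identifier into a lowercase, dash-separated slug.
function slugify(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
}

// Derive a stable name so the same (tenant, repo, purpose) always maps
// to the same workspace.
function sandboxNameFor(tenantId: string, repoId: string, purpose: string): string {
  return ["agent", slugify(tenantId), slugify(repoId), slugify(purpose)].join("-")
}

// In-memory stand-in for the table where the application tracks its sandboxes.
class WorkspaceRegistry {
  private records: { [sandboxName: string]: WorkspaceRecord } = {}

  // Return the existing record for this name, or register a new one.
  getOrCreate(tenantId: string, repoId: string, purpose: string, now: Date): WorkspaceRecord {
    const sandboxName = sandboxNameFor(tenantId, repoId, purpose)
    const existing = this.records[sandboxName]
    if (existing) return existing
    const record: WorkspaceRecord = {
      sandboxName,
      tenantId,
      repoId,
      tags: { tenant: tenantId, purpose },
      createdAt: now.toISOString(),
    }
    this.records[sandboxName] = record
    return record
  }

  // Everything a tenant owns, for an admin UI or a cleanup job.
  listByTenant(tenantId: string): WorkspaceRecord[] {
    return Object.keys(this.records)
      .map((key) => this.records[key])
      .filter((record) => record.tenantId === tenantId)
  }

  // Forget a workspace; returns false if the name was never registered.
  remove(sandboxName: string): boolean {
    if (!this.records[sandboxName]) return false
    delete this.records[sandboxName]
    return true
  }
}
```

Slugging is lossy, so two different identifiers can collapse to the same name. If that matters in your system, include a stable ID in the name instead of a display string.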

Persistence is not memory

This is where teams will get sloppy if they are not careful.

A persistent sandbox is not agent memory in the product sense. It should not become an unbounded attic for every thought, secret, dependency, and failed experiment an agent has ever touched. Filesystem persistence is concrete operational state. Memory is selected knowledge that should influence future behavior.

Those need different rules.

For example, keeping `node_modules`, a generated SQLite database, a failed test artifact, and a build cache may be reasonable inside a sandbox snapshot. Keeping a vague note like "the auth system is weird" as durable product memory is worse than useless unless it is scoped and tied to evidence.

The reverse is also true. A concise repository memory such as "API routes live under `src/server/routes`, not `app/api`" does not need to live as a random text file inside a sandbox.

Treat sandbox persistence as resume state. Treat memory as reusable judgment. Mixing them creates systems that are hard to inspect and harder to trust.
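A rough way to keep the two apart is to give them different types and different admission rules. The sketch below is illustrative only: `ResumeState`, `MemoryNote`, and `validateMemoryNote` are hypothetical names, and the rule it enforces (a memory note must be scoped and carry evidence) is one possible policy rather than a standard.

```typescript
// Operational state: lives in the sandbox snapshot and belongs to one workspace.
type ResumeState = {
  kind: "resume"
  sandboxName: string
  paths: string[] // e.g. node_modules, build caches, failed test output
}

// Product memory: selected knowledge, stored outside the sandbox.
type MemoryNote = {
  kind: "memory"
  scope: { repoId: string; pathPrefix?: string }
  claim: string
  evidence: string[] // file paths, commands, or run ids that support the claim
}

// Return the reasons a note should be rejected; an empty list means it passes.
function validateMemoryNote(note: MemoryNote): string[] {
  const problems: string[] = []
  if (note.claim.trim() === "") problems.push("claim is empty")
  if (note.scope.repoId.trim() === "") problems.push("claim is not scoped to a repository")
  if (note.evidence.length === 0) problems.push("claim has no supporting evidence")
  return problems
}
```

Under this rule, "the auth system is weird" is rejected because nothing backs it up, while the routing note passes once it points at the directory it describes.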

The cost model becomes part of the design

Vercel made persistence the default, but automatic snapshots consume storage that is billed separately from compute. That detail should shape how you build.

Persistent by default is good for long-running or repeatedly resumed work. It is not automatically good for every agent task. If a job only needs a clean environment to lint one patch and return a result, persistence may be waste. If a job handles sensitive customer data or creates huge artifacts, persistence can quietly become a cleanup problem.

I would separate workloads into at least three classes:

- ephemeral checks, where the sandbox should be non-persistent and discarded
- resumable investigations, where persistence is useful for hours or days
- long-lived workspaces, where persistence is intentional and governed by retention, ownership, and audit rules

That classification should be explicit in code. Do not let every agent run inherit the same default because the SDK made the easy path convenient.
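As a sketch of what explicit could mean, the three classes can map to a policy table that every run has to go through. The retention numbers and the `requiresOwner` flag below are placeholder assumptions, not recommendations, and `planRun` is a hypothetical helper that only computes the plan; it does not call any sandbox API.

```typescript
type PersistenceClass = "ephemeral" | "resumable" | "workspace"

type PersistencePolicy = {
  persist: boolean
  retentionHours: number
  requiresOwner: boolean
}

// Placeholder values: tune these to your own cost and risk limits.
// Typing the table as Record<PersistenceClass, ...> makes a missing class a compile error.
const PERSISTENCE_POLICIES: Record<PersistenceClass, PersistencePolicy> = {
  ephemeral: { persist: false, retentionHours: 0, requiresOwner: false },
  resumable: { persist: true, retentionHours: 72, requiresOwner: true },
  workspace: { persist: true, retentionHours: 24 * 30, requiresOwner: true },
}

type RunPlan = {
  persist: boolean
  retentionUntil: string | null // ISO 8601 timestamp, or null when nothing is kept
}

// The caller must name a class; there is deliberately no default argument.
function planRun(cls: PersistenceClass, owner: string | null, now: Date): RunPlan {
  const policy = PERSISTENCE_POLICIES[cls]
  if (policy.requiresOwner && !owner) {
    throw new Error(`${cls} runs need an owner`)
  }
  if (!policy.persist) {
    return { persist: false, retentionUntil: null }
  }
  const expires = new Date(now.getTime() + policy.retentionHours * 60 * 60 * 1000)
  return { persist: true, retentionUntil: expires.toISOString() }
}
```

The design choice worth copying is the missing default: a lint job has to say `"ephemeral"` out loud, and a long-lived workspace cannot be created without someone to hold accountable for it.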

Forking is the sleeper feature

The addition of `Sandbox.fork()` may end up being as important as persistence itself.

Agents often need to explore without committing to a path. One branch tries the minimal fix. Another branch upgrades a dependency. Another branch rewrites a component. In a persistent model with forking, a warmed environment can become the base for controlled experiments.

The risk is that product teams expose forking as magic instead of policy. Forks need names, lineage, cleanup, and a rule for which artifacts are promoted back to the main run. Otherwise you get a pile of expensive snapshots and no confidence about which one produced the final diff.
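The bookkeeping side of that policy can be small. The sketch below tracks names, lineage, and promotion for a set of forks. It is a hypothetical data model that assumes nothing about how `Sandbox.fork()` is called: `ForkRecord`, `planForks`, `promoteFork`, and the `--fork-N` naming scheme are all invented for illustration.

```typescript
type ForkRecord = {
  sandboxName: string
  parentName: string // lineage: which warmed environment this fork came from
  hypothesis: string // why the fork exists, in one line
  state: "exploring" | "promoted" | "discarded"
}

// Give every fork a predictable name and a recorded reason before it is created.
function planForks(parentName: string, hypotheses: string[]): ForkRecord[] {
  return hypotheses.map((hypothesis, index): ForkRecord => ({
    sandboxName: `${parentName}--fork-${index + 1}`,
    parentName,
    hypothesis,
    state: "exploring",
  }))
}

// Promote exactly one fork and mark every sibling for cleanup.
// Returns new records instead of mutating the input.
function promoteFork(
  forks: ForkRecord[],
  winnerName: string,
): { forks: ForkRecord[]; toDelete: string[] } {
  const hasWinner = forks.some((fork) => fork.sandboxName === winnerName)
  if (!hasWinner) throw new Error(`unknown fork: ${winnerName}`)

  const updated = forks.map((fork): ForkRecord => ({
    ...fork,
    state: fork.sandboxName === winnerName ? "promoted" : "discarded",
  }))
  const toDelete = updated
    .filter((fork) => fork.state === "discarded")
    .map((fork) => fork.sandboxName)
  return { forks: updated, toDelete }
}
```

With records like these, the final diff can always be traced to one promoted fork, and the `toDelete` list is what a cleanup job would hand to the provider.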

What a production workflow should look like

If I were building this into a real coding-agent product, I would not start with the model prompt. I would start with the run lifecycle.

Something like this:

type AgentRun = {
  id: string
  tenantId: string
  repoId: string
  sandboxName: string
  persistence: "ephemeral" | "resumable" | "workspace"
  status:
    | "queued"
    | "running"
    | "waiting_for_approval"
    | "ready_for_review"
    | "failed"
    | "expired"
  retentionUntil: string
  lastSnapshotId?: string
}

That gives the sandbox a place in the system instead of treating it as a hidden implementation detail. Then I would add a few boring rules:

- every persistent sandbox has an owner, purpose, and expiration
- every run records the commands that changed filesystem state
- secrets are injected at execution time, not written into durable files
- large artifacts are either promoted intentionally or deleted before snapshot
- non-persistent mode is used for simple checks by default
- resumed sessions validate repository state before continuing
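Some of these rules can be checked mechanically. The sketch below covers only the first one (owner, purpose, expiration) plus a retention sweep. `GovernedRun` is a trimmed, hypothetical variant of the run record: the `owner` and `purpose` fields are additions for illustration, and `retentionUntil` is assumed to hold an ISO 8601 timestamp.

```typescript
type Persistence = "ephemeral" | "resumable" | "workspace"

type GovernedRun = {
  id: string
  sandboxName: string
  persistence: Persistence
  retentionUntil: string // assumed to be an ISO 8601 timestamp
  owner?: string // hypothetical field, not in the original AgentRun
  purpose?: string // hypothetical field, not in the original AgentRun
}

// List the governance rules a run breaks; an empty list means it is compliant.
function checkRun(run: GovernedRun, now: Date): string[] {
  const violations: string[] = []
  // Ephemeral runs keep nothing, so the ownership rules do not apply.
  if (run.persistence === "ephemeral") return violations

  if (!run.owner) violations.push("persistent sandbox has no owner")
  if (!run.purpose) violations.push("persistent sandbox has no purpose")

  const expiresAt = Date.parse(run.retentionUntil)
  if (isNaN(expiresAt)) {
    violations.push("retentionUntil is not a valid timestamp")
  } else if (expiresAt <= now.getTime()) {
    violations.push("sandbox is past its retention date")
  }
  return violations
}

// Names of persistent sandboxes whose retention has lapsed, for a cleanup job.
function sandboxesToDelete(runs: GovernedRun[], now: Date): string[] {
  return runs
    .filter((run) => run.persistence !== "ephemeral")
    .filter((run) => {
      const expiresAt = Date.parse(run.retentionUntil)
      return !isNaN(expiresAt) && expiresAt <= now.getTime()
    })
    .map((run) => run.sandboxName)
}
```

A check like this could run when a sandbox is created and again on a schedule. The remaining rules (command logs, secret handling, artifact promotion, repository validation on resume) need hooks in the execution path and are not covered here.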

None of that is glamorous. It is the difference between a demo that works twice and a system a team can operate.

The practical takeaway

Persistent sandboxes are not exciting because they save an `npm install`. They are exciting because they give agent work a durable place to happen.

That changes the product boundary. The agent is no longer just a model call with tools. It is a worker operating inside a named environment with state, cost, risk, and lifecycle. Once you see it that way, many design decisions become clearer: what should persist, what should be thrown away, what needs approval, what needs audit, and what must be cleaned up.

The sensible response is not to make every sandbox persistent forever. It is to treat persistence as an architectural primitive. Use it for investigations, resumable development tasks, and multi-step workflows where the environment itself contains useful progress. Turn it off for throwaway checks. Fork it when exploration is cheaper than arguing with a plan. Delete it when its job is done.

That is the shape AI coding tools are moving toward: not smarter autocomplete, but recoverable engineering workspaces. Persistence is one of the pieces that makes that shift real.