Sandbox

Every Chat and App stage runs inside a sandbox: an isolated computer the agent borrows for one run, then discards.

Why this matters in one sentence. The sandbox is what makes it safe to say "yes, the agent can pip install whatever it needs, run shell commands, write files, and try things that don't work", because the blast radius is one container that gets destroyed when the run ends.

The three providers

Aitroop abstracts the runtime behind a single ISandbox interface, then ships three implementations. Which one a given run uses depends on workspace settings, plan, and the work being done. The agent's code never knows the difference, but the operational characteristics differ a lot.

Provider	Where it runs	Use it when	Limits
`host` (local)	On the Aitroop server machine itself, under a per-user temp directory.	Self-hosted dev installs, smoke tests, and very lightweight stages where spinning up a remote VM would be overkill.	No real isolation between users; host mode is intended for single-tenant / dev only.
`e2b` (default, cloud)	A fresh Firecracker microVM provisioned by E2B on demand.	Most chats and most App runs. Boots in <1 second, lasts up to 7 days, supports the full `claude-code` template.	Per-sandbox RAM/CPU set by your plan. Inactive sandboxes are paused after 5 minutes of idle.
`daytona` (heavy)	A managed dev container on Daytona.	Long-running work, larger codebases, runs that need persistent caches between stages.	Slower cold start than E2B. Better for stages that already amortize the boot cost.

What the agent can do inside

The sandbox interface exposes two surfaces, commands and files, and a few lifecycle hooks:

Commands

sandbox.commands.run("python script.py")
sandbox.commands.run({ cmd: "node", args: ["build.js"] })
// returns: { exitCode, stdout, stderr }

Shell commands run as the sandbox user (not root) in projectDir.
You can install packages with pip install, npm install, or apt-get. On E2B and Daytona these work out of the box; on host, installs are limited by the host process's own permissions.
Standard out and standard error are streamed back to the run log; the agent sees them once the command finishes.
Timeout per command is bounded by the stage's timeout_ms.
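
Because a command reports failure through exitCode rather than by throwing, callers usually want to check it. The following is a minimal sketch, not part of the Aitroop SDK: runOrThrow, CommandRunner, and fakeSandbox are hypothetical names, and the sketch assumes commands.run returns a promise (as the awaited calls elsewhere on this page suggest).

```typescript
// Shape of the documented return value: { exitCode, stdout, stderr }.
interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Minimal slice of the sandbox surface used here (an assumption, not the real ISandbox).
interface CommandRunner {
  commands: { run(cmd: string): Promise<CommandResult> };
}

// Hypothetical helper: run a command and fail loudly on a non-zero exit code.
async function runOrThrow(sandbox: CommandRunner, cmd: string): Promise<string> {
  const result = await sandbox.commands.run(cmd);
  if (result.exitCode !== 0) {
    throw new Error(`"${cmd}" exited with ${result.exitCode}: ${result.stderr.trim()}`);
  }
  return result.stdout;
}

// Stand-in sandbox so the sketch runs without a provider.
const fakeSandbox: CommandRunner = {
  commands: {
    async run(cmd: string): Promise<CommandResult> {
      return cmd.indexOf("python") === 0
        ? { exitCode: 0, stdout: "ok\n", stderr: "" }
        : { exitCode: 127, stdout: "", stderr: "command not found\n" };
    },
  },
};
```

With the stand-in above, runOrThrow(fakeSandbox, "python script.py") resolves to the command's stdout, while any other command rejects with an error that carries the exit code and stderr.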

Files

sandbox.files.write(path, contents)
sandbox.files.read(path) // utf-8 by default
sandbox.files.list(dir)
sandbox.files.stat(path)
sandbox.files.remove(path)
sandbox.files.rename(from, to)
sandbox.files.mkdir(path)
sandbox.files.copy(from, to)

Files live under projectDir (typically /home/user/project on E2B). The directory is preserved across commands within the same run but wiped when the sandbox is destroyed.

Lifecycle hooks

sandbox.pause(): voluntarily release the sandbox. The next call boots a fresh one.
sandbox.keepAlive(timeoutMs): extend the idle timeout for a long-running stage.

Timeouts you should know

Three timers govern every run:

Constant	Default	What it controls
`SANDBOX_IDLE_TIMEOUT_MS`	5 min	How long an unused sandbox sticks around before being paused.
`AGENT_RUN_TIMEOUT_MS`	7 days	Hard ceiling on a single run, regardless of stage timeouts.
`stage.timeout_ms`	3 min	Per-stage wall-clock budget. Override in the App definition.

Most failures you'll see are stage timeouts, not sandbox timeouts. If a research stage exceeds 3 minutes, the stage fails, even though the sandbox would have happily run for hours. For heavy research, set timeout_ms: 600000 (10 min) in the App definition.
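
To make the interaction between the timers concrete, here is a small sketch of how the budget for one stage could be resolved. Only timeout_ms and the two defaults come from the table above; StageSketch, its name field, and effectiveTimeoutMs are illustrative, and capping by the run ceiling is one reading of "regardless of stage timeouts".

```typescript
const DEFAULT_STAGE_TIMEOUT_MS = 3 * 60 * 1000;       // 3 min default from the table
const AGENT_RUN_TIMEOUT_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, hard ceiling on a run

// Hypothetical stage shape; only timeout_ms is documented.
interface StageSketch {
  name: string;
  timeout_ms?: number;
}

// The stage's own override wins over the default, but can never exceed the run ceiling.
function effectiveTimeoutMs(stage: StageSketch): number {
  const requested = stage.timeout_ms ?? DEFAULT_STAGE_TIMEOUT_MS;
  return Math.min(requested, AGENT_RUN_TIMEOUT_MS);
}

const heavyResearch: StageSketch = { name: "research", timeout_ms: 10 * 60 * 1000 };
const quickSummary: StageSketch = { name: "summary" };
```

Here heavyResearch resolves to 600000 ms (10 min) and quickSummary falls back to the 180000 ms default.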

What persists, what doesn't

Within a single run

All stages of a single App execution share the same sandbox. So:

Files written by Stage 1 are readable by Stage 2.
Packages pip installed by Stage 1 are available to Stage 2.
Environment variables set in Stage 1 persist through to Stage 2.

This is what makes "write a report in Stage 1, then convert it to PDF in Stage 2" just work: Stage 2 finds the report exactly where Stage 1 left it.

Across runs

Nothing persists between runs. Each execution gets a brand-new sandbox, even for the same App. If your App needs to remember something between runs (a list of already-seen items, a counter, a cache), store it externally:

As an Artifact, then read it back from a Connect (Drive, S3, Notion) on the next run.
Via a webhook to your own state store.
By chaining App outputs into App inputs through a Schedule.
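
As an illustration of the "already-seen items" case, the sketch below keeps the seen list in an external store and filters each run's input against it. StateStore, InMemoryStore, and processNewItems are hypothetical; in a real App the store would be backed by a Connect or your own service rather than memory.

```typescript
// Stand-in for an external store (Drive, S3, Notion, or your own service).
interface StateStore {
  load(key: string): string | undefined;
  save(key: string, value: string): void;
}

// In-memory implementation so the sketch runs on its own.
class InMemoryStore implements StateStore {
  private data: Record<string, string> = {};
  load(key: string): string | undefined {
    return this.data[key];
  }
  save(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Returns only the items not seen in earlier runs, then records them as seen.
function processNewItems(store: StateStore, key: string, items: string[]): string[] {
  const seen = JSON.parse(store.load(key) ?? "[]") as string[];
  const fresh: string[] = [];
  for (const item of items) {
    if (seen.indexOf(item) === -1) {
      seen.push(item);
      fresh.push(item);
    }
  }
  store.save(key, JSON.stringify(seen));
  return fresh;
}
```

Two consecutive calls against the same store behave like two runs: the first returns everything, the second returns only what is new.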

Across chats

Same rule: every chat gets its own sandbox. Two chats opened side by side are completely separate filesystems. The agent in one cannot see what the agent in the other is doing.

Networking and Connects

Sandboxes have full outbound internet access. The agent can fetch URLs, hit public APIs, install packages from npm/pypi, and so on.

For Connects (OAuth integrations), Aitroop makes credentials available at run start without writing tokens to the sandbox filesystem: calls are routed through an internal proxy that never exposes the raw token. If the agent runs gh repo list or calls the Gmail API, the request goes through a proxy that signs it with your OAuth token. The token is never visible inside the sandbox.

What this means for security. Even if a stage's script_code is malicious, it cannot exfiltrate your Connect tokens; they're not in the filesystem or env vars at all. The worst case is that the script uses the Connect to do something visible (send an email, create a record), which shows up in your provider's audit log just like any other action.

Choosing a provider

If you're on the hosted plan

You don't choose. The platform picks. For 99% of work it picks e2b. For specific stage types (for example, a stage that needs to run a full IDE workload), it may pick daytona.

If you're self-hosting

Set the provider in config.yaml or as an env var:

# Use the local host machine (single-tenant only)
SANDBOX_PROVIDER=host

# Use E2B (recommended; needs E2B_API_KEY)
SANDBOX_PROVIDER=e2b
E2B_API_KEY=e2b_...
E2B_TEMPLATE=claude-code

# Use Daytona (needs DAYTONA_API_KEY)
SANDBOX_PROVIDER=daytona
DAYTONA_API_KEY=...
DAYTONA_TARGET=...

Mixed provider routing ("host for cheap stuff, E2B for heavy stuff") is on the roadmap but not yet a first-class config knob. For now, pick one provider per deployment.

What you see in the UI

The sandbox is mostly invisible (that's the point), but two surfaces let you peek in:

The run log

During an App run, the right pane shows every tool call the agent makes. Each shell command and file operation is logged with its arguments, exit code, and (truncated) output:

$ bash -c "pip install pandas"
→ exit 0 · 8.2s
  Successfully installed pandas-2.3.1 numpy-2.1.0

$ python analyze.py
→ exit 0 · 4.1s
  Wrote 1,247 rows to /home/user/project/output.csv

The file tree (for code-heavy runs)

When a stage produces a code Artifact, the preview pane shows the full project tree as it existed in the sandbox at end-of-run. You can click any file to view it, then download or share.

Lifecycle: how a sandbox is acquired and held

Aitroop doesn't spawn a sandbox per request. Each user has at most one warm sandbox, recorded in the sb_container row keyed by their internal username. The same sandbox is reused across consecutive chats and App runs; that's what keeps cold-start cost off the common path. Lifecycle state lives on that one row:

`sb_container` column	Meaning
`username`	Primary key. The user's stable internal identifier.
`sandbox_id`	Provider-side identifier (E2B sandbox ID, Daytona workspace ID, host PID-prefix). Empty during creation.
`status`	`ready` when usable. `creating` while a worker is provisioning a fresh one.
`paused`	`true` after the provider auto-paused the sandbox for idleness. `false` once it's resumed and warm.
`type`	Which provider owns it (`e2b` / `daytona` / `local`).
`lock_version`	Optimistic-lock counter. Bumped on every mutation; concurrent writers with a stale version are rejected.
`total_usage_seconds`	Cumulative wall-clock time the sandbox has been active; the billable metric.
`last_activity_at`	Last successful SDK touch. Drives the 5-minute idle-pause timer.

Concurrent acquire with optimistic locking

A sandbox might be touched simultaneously for two reasons: a chat reply and a scheduled App starting at the same instant, or two server instances handling traffic for the same user. The platform serialises this with compare-and-swap (CAS) on lock_version:

The caller reads the current row, including lock_version.
It tries to claim the lock with an UPDATE conditioned on the version it just saw, incrementing it atomically.
Success: that caller owns the sandbox; everyone else retries the read after ~100 ms.
Failure: another caller got there first. Re-read and try again.

A worker that crashed mid-acquire would leave status = 'creating' forever. To prevent indefinite stuck state, any row sitting in creating for more than ~25 s is considered stale: the next acquirer resets it, bumps the version, and proceeds. This is the stale-lock recovery path: silent in normal operation, visible in logs as a one-line warning when it fires.
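
The claim and recovery steps can be sketched in memory as follows. This is an illustration of the technique, not Aitroop's code: tryClaim stands in for the conditional UPDATE, status_since_ms is a hypothetical column used to measure how long a row has been in creating, and what "reset" does beyond bumping the version is an assumption.

```typescript
interface SbContainerRow {
  username: string;
  status: "ready" | "creating";
  lock_version: number;
  status_since_ms: number; // hypothetical: when `status` last changed
}

const STALE_CREATING_MS = 25000; // "~25 s" from the text

// Stand-in for: UPDATE ... SET lock_version = lock_version + 1 WHERE lock_version = :seen
function tryClaim(row: SbContainerRow, seenVersion: number): boolean {
  if (row.lock_version !== seenVersion) return false; // stale read: someone else won
  row.lock_version += 1;
  return true;
}

function isStaleCreating(row: SbContainerRow, nowMs: number): boolean {
  return row.status === "creating" && nowMs - row.status_since_ms > STALE_CREATING_MS;
}

// Stale-lock recovery: the next acquirer takes over the stuck row and bumps the version.
function recoverIfStale(row: SbContainerRow, nowMs: number): boolean {
  if (!isStaleCreating(row, nowMs)) return false;
  row.status_since_ms = nowMs; // assumption: the acquirer restarts provisioning
  row.lock_version += 1;
  return true;
}
```

If two callers both read lock_version 7, only the first tryClaim succeeds; the second sees a mismatch, re-reads version 8, and tries again.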

Pause / resume

Every SDK call to the underlying provider resets the 5-minute idle timer. If 5 minutes elapse with no touch, the provider freezes the sandbox: the row stays, paused flips to true, and the container stops accruing billable time. The next acquisition wakes it back up; cold resume on E2B takes ~1 s. To keep the sandbox warm during long agent thinking (no SDK calls between LLM turns), the runner emits a keepAlive ping every 30 s. That's why this cluster of constants lines up the way it does:

SANDBOX_KEEPALIVE_INTERVAL_MS = 30 s
SANDBOX_IDLE_TIMEOUT_MS      = 5 min (10x the keepalive)
AGENT_RUN_TIMEOUT_MS         = 7 days (hard ceiling on a single run)
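
The relationship between the first two constants is plain arithmetic, sketched here with the documented values. isIdle is a hypothetical helper showing how last_activity_at could drive the pause decision; the provider's actual check may differ.

```typescript
const SANDBOX_KEEPALIVE_INTERVAL_MS = 30 * 1000;  // 30 s
const SANDBOX_IDLE_TIMEOUT_MS = 5 * 60 * 1000;    // 5 min

// How many keepalive pings fit in one idle window: 300000 / 30000.
const pingsPerIdleWindow = SANDBOX_IDLE_TIMEOUT_MS / SANDBOX_KEEPALIVE_INTERVAL_MS;

// Hypothetical idle check based on the last successful SDK touch.
function isIdle(lastActivityAtMs: number, nowMs: number): boolean {
  return nowMs - lastActivityAtMs >= SANDBOX_IDLE_TIMEOUT_MS;
}
```

With these values pingsPerIdleWindow is 10, so several consecutive pings would have to be missed before the idle timer could fire.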

Dead sandboxes and `withSandbox`

Sandboxes can die outside the platform's view: a provider deletes the underlying VM, the host process gets OOM-killed. The runner notices this by classifying the resulting error as sandbox_not_found (see runner error classification). Most server code uses a wrapper helper:

provider.withSandbox(userId, userEmail, async (sb) => {
  await sb.commands.run('pip install pandas');
  await sb.files.write('script.py', code);
});

If the inner function throws and the error matches "sandbox is gone", the wrapper evicts the in-memory cache, calls createOrResume for a fresh sandbox, and runs the function one more time. One retry; that's it. A second failure propagates out. Other transient errors (network, busy) propagate immediately; they're not the wrapper's job to handle.
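
The retry-once behaviour can be sketched as below. This is an assumption-laden illustration rather than the real helper: withSandboxSketch takes createOrResume and evictCache as parameters instead of a provider object, and isSandboxGone and sandboxError are hypothetical stand-ins for the runner's error classification.

```typescript
interface SandboxHandle {
  id: string;
}

// Hypothetical error factory carrying a classification code.
function sandboxError(code: string): Error & { code: string } {
  const err = new Error(code) as Error & { code: string };
  err.code = code;
  return err;
}

// Stand-in for the runner's "sandbox is gone" classification.
function isSandboxGone(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: string }).code === "sandbox_not_found";
}

async function withSandboxSketch<T>(
  createOrResume: () => Promise<SandboxHandle>,
  evictCache: () => void,
  fn: (sb: SandboxHandle) => Promise<T>,
): Promise<T> {
  const sb = await createOrResume();
  try {
    return await fn(sb);
  } catch (err) {
    if (!isSandboxGone(err)) throw err; // network, busy, etc. propagate immediately
    evictCache();                       // drop the dead handle
    const fresh = await createOrResume();
    return fn(fresh);                   // one retry; a second failure propagates
  }
}
```

The second call to fn is deliberately outside any try/catch, which is what limits the wrapper to a single retry.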

Host-mode secret stripping

Host mode is the developer-only provider where the "sandbox" is a process on the Aitroop server itself. Without protection, any code the agent runs would inherit the server's environment, including database URLs, JWT signing keys, and S3 credentials. The host provider therefore strips a fixed set of variables from the child environment before spawn: DATABASE_URL, JWT_SECRET, E2B_API_KEY, DAYTONA_API_KEY, OAuth client secrets, S3 access/secret keys, and anything Composio-side. The agent gets a clean environment; the server keeps its secrets.
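
A minimal sketch of that filtering step, assuming a simple denylist. The four exact names come from the list above; the patterns for OAuth client secrets, S3 keys, and Composio variables are guesses for illustration, since the actual variable names are not given here. cleanChildEnv is a hypothetical function name.

```typescript
// Exact names taken from the text.
const STRIPPED_NAMES = ["DATABASE_URL", "JWT_SECRET", "E2B_API_KEY", "DAYTONA_API_KEY"];

// Hypothetical patterns for the categories the text names without listing variables.
const STRIPPED_PATTERNS = [/^COMPOSIO_/, /CLIENT_SECRET$/, /^S3_.*KEY$/];

// Build the environment handed to the child process, minus server secrets.
function cleanChildEnv(env: Record<string, string | undefined>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const name of Object.keys(env)) {
    const value = env[name];
    if (value === undefined) continue;
    if (STRIPPED_NAMES.indexOf(name) !== -1) continue;
    if (STRIPPED_PATTERNS.some((pattern) => pattern.test(name))) continue;
    clean[name] = value;
  }
  return clean;
}
```

The result would be passed as the env option when spawning the child process, so the agent's commands inherit PATH, HOME, and similar variables but none of the listed secrets.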

Sandbox usage and cost

Each sandbox second is metered. The billing table (buy_balance) tracks total deposited vs. total spent in USD; per-run breakdowns appear in Settings → Usage.

Sandbox time: only counts while the sandbox is actively running. Paused (idle > 5 min) sandboxes don't cost anything.
LLM tokens: billed separately per stage, model-dependent. Sonnet-4.6 is the default.
Storage: Artifacts in S3/R2 are billed by retained GB-month. Free up to your plan's limit.

The sb_container table tracks each sandbox instance with total_usage_seconds, and sb_usage_log records every duration with a reason: agent run, file ops, etc.
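
As a rough illustration of how logged durations roll up, the sketch below totals seconds per reason and applies a per-second rate. The column names duration_seconds and reason, and the rate used in the example, are placeholders; actual pricing and schema are not stated on this page.

```typescript
// Hypothetical shape of one sb_usage_log row.
interface UsageLogEntry {
  duration_seconds: number;
  reason: string;
}

// Total metered seconds grouped by the logged reason.
function secondsByReason(log: UsageLogEntry[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const entry of log) {
    totals[entry.reason] = (totals[entry.reason] ?? 0) + entry.duration_seconds;
  }
  return totals;
}

// Sandbox cost for a set of log rows at a given (placeholder) per-second rate.
function sandboxCostUsd(log: UsageLogEntry[], usdPerSecond: number): number {
  let totalSeconds = 0;
  for (const entry of log) totalSeconds += entry.duration_seconds;
  return totalSeconds * usdPerSecond;
}

const sampleLog: UsageLogEntry[] = [
  { duration_seconds: 120, reason: "agent run" },
  { duration_seconds: 30, reason: "file ops" },
  { duration_seconds: 60, reason: "agent run" },
];
```

For sampleLog, the breakdown is 180 seconds of agent run and 30 seconds of file ops, 210 metered seconds in total.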

FAQ & troubleshooting

My stage failed with "command timed out". What gives?

Possible causes:

The stage's timeout_ms is too small for the work (default 3 min). Bump it.
The command itself hung: a process waiting for stdin that never comes, a curl against a non-responding host, an npm install behind a corporate proxy.
The sandbox idle timer fired between commands. If a stage takes >5 min between agent decisions (rare), the sandbox pauses. The next command auto-resumes but adds latency.

Fix: open the failed run, click Debug in chat, and inspect the run log to find which command never returned. Most of the time the cure is bumping timeout_ms on that one stage, e.g. "timeout_ms": 600000 for a 10-minute budget.

Why doesn't my Stage 2 see the file Stage 1 wrote?

Likely cause: Stage 1 wrote to an absolute path outside projectDir (e.g. /tmp/foo.csv) and Stage 2 looked under projectDir. Or the App is split across two executions (each gets its own sandbox).

Fix: always write under projectDir (the agent's working directory). The path is exposed to the agent as $AITROOP_PROJECT_DIR at run start.
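
A quick way to catch this class of bug is to check a path against projectDir before writing. The helper below is a hypothetical sketch for POSIX-style paths; it treats relative paths as relative to projectDir and rejects paths that escape it through "..".

```typescript
// Collapse ".", "..", and repeated slashes in an absolute POSIX path.
function normalizeSegments(absPath: string): string[] {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return out;
}

// True when `path` (absolute, or relative to projectDir) resolves inside projectDir.
function isUnderProjectDir(projectDir: string, path: string): boolean {
  const root = normalizeSegments(projectDir);
  const absolute = path.charAt(0) === "/" ? path : projectDir + "/" + path;
  const target = normalizeSegments(absolute);
  return root.every((seg, i) => target[i] === seg);
}
```

With projectDir set to /home/user/project, output.csv and /home/user/project/data/a.csv pass the check, while /tmp/foo.csv and /home/user/project/../secrets do not.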

Can I SSH into the sandbox to poke around?

Not directly. The sandbox is closed by design. If you need to inspect what happened, use Debug in chat on the failed run: the chat opens with the same sandbox snapshot still mounted, and you can run any command interactively.

How do I install a system package (apt-get)?

On e2b and daytona, the agent has sudo for package installs. Tell it in the stage goal: "first run apt-get install -y ffmpeg, then…". On host, the agent has whatever permissions the host process has, usually no sudo, so apt-get won't work.

Why is my E2B sandbox slower today than yesterday?

Cold starts vary by region and time of day. Most of the time it's <1 second. If you see a 10-second cold start, the agent is waiting for a fresh VM provision. This is normal and resolves on its own. If it happens systematically, check the status banner in Settings → Workspace.

Can two stages run in parallel?

No. Stages run strictly in order: Stage n doesn't start until Stage n-1 has finished and its artifact is saved. If you need parallelism inside a stage, the agent can spawn subprocesses with commands.run, but that's its choice, not yours.

What happens to my files when the sandbox is destroyed?

They're gone. Anything you want to keep must be saved as an Artifact (which gets stored in S3/R2 under app_artifact.s3_key) before the run finishes. Artifacts survive sandbox destruction.

Is the sandbox a real VM or a container?

Depends on the provider. E2B uses Firecracker microVMs: full kernel isolation. Daytona uses dev containers. Host mode is just a process under your user account on the server. Functionally the agent code is the same in all three; security-wise, E2B is the strongest.
