Core concepts

The technical anatomy of an Aitroop App. This page is the reference companion to the Quickstart: every field, every type, every option. About 15 minutes to read end-to-end. You don't have to memorize it — bookmark it and come back when a concept shows up.

The 10-second model. An App is a recipe (JSON). An Execution is a single time the recipe is cooked. A Stage is one step in the recipe. An Artifact is what comes out of a stage. A Skill is a tool the agent reaches for; a Connect is the OAuth bridge to your other accounts. Everything else is detail.

The App definition

An App is a JSON document persisted in app_def. You don't write the JSON yourself — the agent does — but knowing what it contains makes everything else clear. The top-level shape:

{
  "id": string, // auto-generated, "app_..."
  "name": string, // "Company Brief"
  "description": string, // one-sentence summary
  "icon": string, // emoji, e.g. "🏢"
  "tags": string[], // ["research", "b2b"]
  "status": "draft" | "published" | "archived",
  "version": number, // monotonic; bumped on each successful save
  "input_schema": AppInput[], // the form
  "stages": AppStage[], // what to do, in order
  "artifact_schema": AppArtifactDef[], // what to produce
  "resources": { skills: string[], connects: string[] },
  "examples": AppExample[] // optional pre-filled samples
}

The formula in shorthand:

App = AppInput[] + Stage[] + Artifact[] + resources

AppInput — the form fields

Each entry in input_schema is one form field your users fill in. The full type:

{
  "id": string, // snake_case, e.g. "company_name"
  "label": string, // shown above the field
  "type": AppInputType, // see table below
  "required": boolean, // default false; blocks Run when blank
  "placeholder": string, // optional hint shown inside the field
  "default": unknown, // optional pre-filled value
  "dynamic_default": string, // enum value such as "today"; resolved at run time
  "description": string, // help text shown below
  "options": { label, value }[], // required for select/multiselect/radio
  "show_if": AppInputConditionalRule, // conditional visibility
  "min" | "max" | "step": number // for number/slider
}

The 11 input types

The server validates type against a fixed set; submitting an unknown type returns 400. The full list:

| Type | UI | Use it when | Extra fields |
| --- | --- | --- | --- |
| text | Single-line text box | Short identifiers: names, URLs, IDs. | |
| textarea | Multi-line text box | Long instructions, descriptions, content to process. | |
| number | Numeric input | Counts, thresholds, prices. | min, max, step |
| slider | Numeric slider | Bounded ranges where a feel matters more than precision. | min, max, step |
| select | Single-choice dropdown | 3–10 fixed options, only one allowed. | options (required) |
| multiselect | Multi-choice dropdown | 3–10 fixed options, multiple allowed. | options (required) |
| radio | Radio button group | 2–4 options where showing all of them at once helps. | options (required) |
| boolean | Toggle / checkbox | Yes/no flags, optional features. | |
| date | Date picker | "As of date", deadlines, anchor points. | |
| daterange | Two date pickers | Windowed reports: "from X to Y". | |
| file | File upload button | User uploads a fresh document each run (CSV, PDF, image). | |
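The type check and the options requirement from the table can be sketched as a small validator. This is a minimal illustration, not the server's actual code: the function name `validate_input` and the wording of the messages are hypothetical, and only the 11 type names and the "options required" rule come from this page.

```python
# The 11 type names come from the table above; everything else is illustrative.
INPUT_TYPES = {
    "text", "textarea", "number", "slider", "select", "multiselect",
    "radio", "boolean", "date", "daterange", "file",
}
NEEDS_OPTIONS = {"select", "multiselect", "radio"}


def validate_input(field: dict) -> list[str]:
    """Return a list of problems; an empty list means the field looks valid."""
    problems = []
    field_type = field.get("type")
    if field_type not in INPUT_TYPES:
        # The real server answers an unknown type with HTTP 400.
        problems.append(f"unknown type: {field_type!r}")
    if field_type in NEEDS_OPTIONS and not field.get("options"):
        problems.append(f"{field_type} requires a non-empty options list")
    return problems
```

A caller would treat any non-empty result as a rejected field, which mirrors the 400 response described above.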

Dynamic defaults

dynamic_default is a fixed enum, not a template language. Picking one of these values fills the field at run time without a static default. Useful for scheduled runs that need a moving anchor.

| Value | Resolves to |
| --- | --- |
| today | Today's date in the workspace time zone. |
| yesterday | The day before today. |
| last7days | The 7-day window ending today (used with daterange). |
| last30days | The 30-day window ending today. |
| thisMonth | 1st of the current month through today. |
| lastMonth | 1st through last day of the previous month. |
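To make the windows concrete, here is a rough sketch of how the six values could be resolved. The function name `resolve_dynamic_default` is hypothetical, `today` is passed in so the workspace time zone is the caller's concern, and treating "the 7-day window ending today" as inclusive of today is an assumption.

```python
from datetime import date, timedelta


def resolve_dynamic_default(value: str, today: date):
    """Resolve a dynamic_default enum value relative to `today`.

    Single dates come back as a date; windows come back as (start, end).
    Treating windows as inclusive of today is an assumption.
    """
    first_of_month = today.replace(day=1)
    if value == "today":
        return today
    if value == "yesterday":
        return today - timedelta(days=1)
    if value == "last7days":
        return (today - timedelta(days=6), today)
    if value == "last30days":
        return (today - timedelta(days=29), today)
    if value == "thisMonth":
        return (first_of_month, today)
    if value == "lastMonth":
        # Step back one day from the 1st to land in the previous month.
        last_of_previous = first_of_month - timedelta(days=1)
        return (last_of_previous.replace(day=1), last_of_previous)
    raise ValueError(f"unknown dynamic_default: {value!r}")
```

Because the set is a fixed enum, anything outside the six values is rejected rather than interpreted.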

Example: a 3-input form

[
  { "id": "company_name", "label": "Company name",
    "type": "text", "required": true,
    "placeholder": "Acme Corp" },

  { "id": "tone", "label": "Tone",
    "type": "select", "default": "executive",
    "options": [
      { "label": "Executive", "value": "executive" },
      { "label": "Engineering", "value": "eng" },
      { "label": "Sales", "value": "sales" }
    ] },

  { "id": "include_competitors", "label": "Include competitors?",
    "type": "boolean", "default": true }
]

Designing good forms. Default everything you can. Mark the absolute minimum as required. Use dynamic_default for time-relative inputs (today, last7days) so scheduled runs don't need any input at all.

AppStage — what gets done

Each entry in stages is one step in the workflow. Stages run in order; each can read the artifacts produced by earlier stages.

{
  "id": string, // snake_case, e.g. "research"
  "name": string, // display name
  "stage_type": "agent" | "script" | "human", // default "agent"
  "goal": string, // the instruction with {{refs}}
  "system_prompt": string, // optional extra context
  "model": string, // default "claude-sonnet-4-6"
  "timeout_ms": number, // default 180000 (3 min)
  "skills": string[], // stage-local skill list (overrides App-level)
  "connects": string[], // stage-local connect list
  "artifact_defs": AppArtifactDef[],// what this stage produces

  // for stage_type == "script":
  "script_lang": "python" | "node",
  "script_code": string,

  // for stage_type == "human":
  "human_message": string // shown to the approver
}

Comparing the three stage types

| | agent (default) | script | human |
| --- | --- | --- | --- |
| Runs what | Claude with full reasoning + tool use | Python or Node code, no LLM | Waits for a person |
| Use for | Research, drafting, summarization, anything ambiguous | Exact transforms: filter CSV, dedupe, format dates | Approval gates, "pick the best N" |
| Latency | 20 s – 10 min | < 1 s typical | However long the human takes |
| Cost driver | LLM tokens + sandbox seconds | Sandbox seconds only | Zero compute cost |
| Deterministic? | No (LLM stochasticity) | Yes: same input, same output | No (human variability) |

Writing the goal field

The goal is the agent's instruction. Before the stage runs, the executor walks the goal with the regex /{{([^}]+)}}/g and replaces each {{key}} with execution.input[key]. Unmatched references are left literal — they aren't an error, but they also won't be substituted, so spell the input ids carefully.

| Reference | Resolves to |
| --- | --- |
| {{company_name}} | The value the user typed into the company_name input. |
| {{include_competitors}} | The stringified boolean ("true" / "false"). |
| {{date_range}} | A daterange input renders as its JSON form when stringified. |

What the goal template does not support. There is no {{today}}, no {{user.email}}, no {{#if x}}...{{/if}}, and, critically, no {{ stages.X.Y }}. The expander only looks at form values. For time-relative anchors, use dynamic_default on an input field instead. For prior-stage outputs, see Multi-stage Apps below; the executor injects them into the system prompt automatically.
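The expansion rule described in this section is small enough to sketch in a few lines. This is an illustration under stated assumptions, not the executor's code: `expand_goal` and `stringify` are hypothetical names, and the stringification of non-string values (lowercase booleans, JSON for everything else) is inferred from the table above.

```python
import json
import re

REF = re.compile(r"\{\{([^}]+)\}\}")  # Python spelling of /{{([^}]+)}}/g


def stringify(value) -> str:
    """Approximate how a form value might be stringified (an assumption)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return value
    return json.dumps(value)


def expand_goal(goal: str, form_input: dict) -> str:
    """Replace each {{key}} with form_input[key]; leave unknown keys literal."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key in form_input:
            return stringify(form_input[key])
        return match.group(0)  # unmatched reference stays as typed
    return REF.sub(substitute, goal)
```

Note that the key is looked up exactly as written between the braces, so a reference like {{ stages.X.Y }} simply finds no matching input and stays literal.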

A real example:

Research {{company_name}} ({{company_url}}) and write a comprehensive
competitive analysis covering:

1. Business model and revenue streams
2. Key products/services
3. Target customers and ICP
4. Recent news and strategic moves
5. Strengths, weaknesses, opportunities, threats

If {{include_competitors}} is true, also include
the top 3–5 direct competitors with a one-paragraph profile each.

Produce a Markdown report titled "{{company_name}} Competitive
Analysis" under a ## Competitive Analysis heading.

A good goal is:

  • Specific — names the inputs to use and what to do with them.
  • Output-aware — names the artifact ID and format.
  • Actionable — imperative verbs ("Research", "Write", "Analyze").
  • Structured — a 10-line goal with numbered sections almost always beats a 2-line one.

AppArtifactDef — what each stage produces

{
  "id": string, // snake_case, e.g. "final_report"
  "title": string, // display name
  "format": ArtifactFormat, // see table below
  "language": string, // for format="code", e.g. "python"
  "description": string
}

Seven supported formats:

| format | Used for | Preview |
| --- | --- | --- |
| markdown | Reports, briefings, drafts | Rendered with GFM, tables, code fences, Mermaid. |
| code | Source code, single file or a tree | Syntax-highlighted, with a file tree for multi-file output. |
| html | Rendered pages, dashboards, slides | Sandboxed iframe; scripts allowed but isolated. |
| json | Structured data for downstream systems | Collapsible tree with type info. |
| csv | Tabular data: leads, transactions | Sortable, filterable spreadsheet view. |
| image | Generated images, charts, diagrams | Zoom & pan. PNG / JPEG / SVG / WebP. |
| file | Anything else: PDF, ZIP, audio, video | Type-aware: PDF inline, audio player, ZIP listing. |

All artifacts are stored in S3 (or R2 if you self-host). Each artifact's object key is recorded in app_artifact.s3_key, along with its MIME type and byte size. Artifacts are kept for the lifetime of the App; after an App is deleted, its archive remains recoverable for 30 days.

Resources — Skills and Connects the App needs

"resources": {
  "skills": ["web-search", "pdf-extract"],
  "connects": [
    { "type": "specific", "id": "hubspot", "is_required": true },
    { "type": "category", "id": "email", "is_required": true,
      "label": "Any inbox we can send from" }
  ]
}

Skills are capability modules the agent can call (web search, PDF extraction, image generation) — see Skills. Connects are the OAuth bridges the App needs. Aitroop tracks Connects in a two-level taxonomy:

  • type: "specific" — pin a single provider by its connect_definition.id (e.g. github, hubspot, gmail). Use this when only one integration will do.
  • type: "category" — accept any provider in a connect_category (e.g. email, storage, crm). The App is satisfied if the user has authorized at least one provider in that category. Use this when the App is provider-agnostic ("any email", "any CRM").

The legacy form "connects": ["hubspot", "github"] still works — each string is treated as { type: "specific", id: «string», is_required: true } when the App runs. The App Builder writes the rich form for new apps.
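The legacy-to-rich conversion and the two satisfaction rules can be sketched as follows. The function names are hypothetical, and the `authorized` mapping (provider id to category id) is an assumed shape for "what the user has connected", not a documented structure.

```python
def normalize_connects(connects: list) -> list[dict]:
    """Expand legacy string entries into the rich object form."""
    normalized = []
    for entry in connects:
        if isinstance(entry, str):
            entry = {"type": "specific", "id": entry, "is_required": True}
        normalized.append(entry)
    return normalized


def is_satisfied(connect: dict, authorized: dict[str, str]) -> bool:
    """Check one connect against the user's authorized providers.

    `authorized` maps provider id -> category id; this shape is hypothetical.
    """
    if connect["type"] == "specific":
        return connect["id"] in authorized
    # A category connect is satisfied by any authorized provider in it.
    return connect["id"] in authorized.values()
```

For example, a user who has only authorized gmail satisfies a category connect for email but not a specific connect for hubspot.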

Resources can be declared App-wide (as above) or per-stage (under stage.skills / stage.connects). Per-stage scoping is the way to make a multi-stage App safer — give stage 1 only web-search, stage 2 only the CRM Connect, and so on. See Connects for the full provider catalog and the satisfaction model.

Executions — when an App actually runs

An Execution is one trip through an App's stages. It's the thing with a status, a stage log, a duration, and a final set of artifacts.

{
  "id": "exec_...",
  "app_id": "app_...",
  "status": "pending" | "running" | "completed" | "failed" | "cancelled",
  "input": { ...form values },
  "is_test": boolean,
  "duration_ms": number,
  "start_from_stage_index": number, // for resumes
  "prior_execution_id": string | null // when resuming
}

Every stage in the execution gets its own app_stage_log row with status, duration, the session ID (so you can open it as a chat), the expanded goal, and any error.

The status enums differ across the three levels:

| Field | Possible values |
| --- | --- |
| app_execution.status | pending · running · completed · failed · cancelled |
| app_stage_log.status | pending · running · waiting · completed · failed · skipped |
| app_human_gate.status | pending · approved · rejected |

Stage logs go to waiting while a human stage holds the run; the parent execution stays in running. They also go to skipped when an execution is resumed with start_from_stage_index > 0 — the earlier stages aren't re-run. See Executions for the full lifecycle, debug flows, and SSE streaming.
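As a small illustration of the resume rule, the sketch below computes the starting status of each stage log for a given start_from_stage_index. The function name is hypothetical, and the idea that not-yet-run stages begin as pending is an assumption; only the skipped behavior for earlier stages is stated above.

```python
def initial_stage_statuses(stage_ids: list[str], start_from_stage_index: int = 0) -> dict[str, str]:
    """Stages before the resume index are skipped; the rest wait to run."""
    return {
        stage_id: "skipped" if index < start_from_stage_index else "pending"
        for index, stage_id in enumerate(stage_ids)
    }
```

With four stages and a resume index of 2, the first two logs are skipped and the last two are still to run.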

Sandbox — where the agent does its work

Every chat and every App stage runs inside a sandbox: an isolated runtime with its own filesystem, its own processes, and outbound network. The agent can pip install, run shell commands, write files, fetch URLs — and when the run ends, the sandbox is destroyed.

Three backend providers under one ISandbox interface, plus a router that picks between them:

  • host — runs directly on the Aitroop server process. Self-hosted dev installs only; secrets like DATABASE_URL, JWT_SECRET, E2B_API_KEY, and S3 credentials are stripped from the child environment.
  • e2b — Firecracker microVMs from E2B. Default for hosted Aitroop.
  • daytona — managed dev containers from Daytona. For heavy / longer-running stages.
  • router — selector that dispatches to one of the three above per user (used to route privileged users to host).

Every sandbox exposes the same path layout, regardless of provider, rooted at the sandbox user's home directory:

~/.aitroop/ ← Aitroop working dir
~/.aitroop/run-agent-config.json ← per-run config
~/.aitroop/run-agent ← binary (uploaded in remote mode)
~/.aitroop/skills/{name}/{version}/ ← immutable skill cache (S3-mirrored)
~/.claude/skills/{name} ← symlink → cache (always-on core skills)
~/.aitroop/app_skills/{appId}/.claude/skills/ ← --add-dir target (per-app skills)
{projectDir} ← where the agent reads/writes files

All stages in a single execution share the same sandbox, so stage 2 can read what stage 1 wrote. Different executions get different sandboxes — nothing leaks across runs. The constants you'll hit in practice: SANDBOX_IDLE_TIMEOUT_MS = 5 min, SANDBOX_KEEPALIVE_INTERVAL_MS = 30 s, and an upper-bound AGENT_RUN_TIMEOUT_MS = 7 days. See Sandbox for the full operational model.

Multi-stage Apps

When a workflow naturally has steps that feed each other, define multiple stages. Stages run sequentially. Each stage sees the prior stages' artifacts in two places — neither of which uses goal templating:

  1. The shared sandbox filesystem. All stages in one execution run inside the same sandbox (Sandbox). Files Stage 1 writes are still there for Stage 2 to read.
  2. An auto-injected ## Previous Stage Outputs section in the system prompt. Before stage N runs, the executor loads every artifact produced by stages 0..N-1, renders each as ### {{title}}\n{{content}}, and prepends the block to the system prompt. The agent reads them as plain context, so no manual reference is needed in the goal.

App: Cold Outreach Pipeline

"stages": [
  { "id": "find", "stage_type": "agent",
    "goal": "Find companies matching {{icp}}...",
    "artifact_defs": [{ "id": "leads", "title": "Leads", "format": "csv" }] },

  { "id": "enrich", "stage_type": "agent",
    "goal": "Read the Leads CSV from the previous stage and add a verified email and recent news per row...",
    "artifact_defs": [{ "id": "enriched", "title": "Enriched Leads", "format": "csv" }] },

  { "id": "review", "stage_type": "human",
    "human_message": "Review the enriched leads. Remove rows you don't want to contact." },

  { "id": "draft", "stage_type": "agent",
    "goal": "Draft personalized outreach for the approved leads...",
    "artifact_defs": [{ "id": "drafts", "title": "Drafts", "format": "markdown" }] }
]

Notice the human stage between two agent stages: that's the human-in-the-loop pattern. Execution pauses there, the platform creates a row in app_human_gate with status pending, and the stage log transitions to waiting. The next stage runs only after the gate is set to approved; if the gate is rejected, the run stays paused.
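The auto-injected Previous Stage Outputs block described earlier in this section can be approximated like this. It is a sketch only: `build_system_prompt` is a hypothetical name, artifacts are assumed to be dicts with title and content keys, and the blank lines between sections are a guess at the exact spacing.

```python
def build_system_prompt(system_prompt: str, prior_artifacts: list[dict]) -> str:
    """Prepend a '## Previous Stage Outputs' block to the system prompt."""
    if not prior_artifacts:
        return system_prompt  # stage 0 has nothing to inject
    sections = [f"### {a['title']}\n{a['content']}" for a in prior_artifacts]
    block = "## Previous Stage Outputs\n\n" + "\n\n".join(sections)
    return f"{block}\n\n{system_prompt}" if system_prompt else block
```

In the pipeline above, the enrich stage would receive the Leads CSV this way, which is why its goal can say "from the previous stage" without any template reference.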

Sessions and the message queue

Every conversation with the agent — whether it's a free-form chat, an App stage, or a tick of a scheduled task — happens inside a session. A session is a stable container with its own Claude conversation ID, its own ordered message log, and its own status-tracked queue. Three flavours are persisted in app_session.session_type:

| session_type | Created by | Lifetime | Where it shows up |
| --- | --- | --- | --- |
| chat | The user, via "+ New Chat". | Permanent until archived/deleted. | Sidebar, search, history. |
| app_stage | The App executor when an agent stage begins. | Tied to the parent execution. Stored forever as audit, browsable via Debug in chat. | The execution detail page; each stage log carries a session_id. |
| scheduled | The first tick of a send_message scheduled task. | Reused for every subsequent tick of the same task (continuity by design). | A single "[Scheduled] {name}" session linked from the task row. |

Inside each session, every user turn is a row in app_message. Rows aren't "fire-and-forget" — they go through a small state machine because the agent worker pool can be multi-instance and a single session must serialize its dispatches:

pending  →  dispatching  →  sent
            dispatching  →  failed   (after 5 failed attempts)
            dispatching  →  pending  (reset by the recovery poller after 900 s)

The columns that matter on app_message:

  • position — integer per session, monotonically increasing. Determines order within equal-priority rows. The DAO assigns this atomically when the row is inserted.
  • priority — a 64-bit sort key. Higher priority dispatches first; default is wall-clock time so two messages without explicit priority run in arrival order.
  • attempts — increments on each dispatch try. After 5 failures the row goes to failed and the user-visible message shows the error.
  • dispatched_at — set when a worker claims the row; combined with a 15-minute staleness threshold drives the recovery poller.
  • mode — free-form tag for what kind of message this is (used to route formatting in the UI).
  • last_error — last failure reason, kept across retries.

Workers claim work with SELECT … FOR UPDATE SKIP LOCKED ordered by priority DESC, position ASC, so two instances will never pick the same row. A partial unique index — uq_app_message_inflight over session_id WHERE status = 'dispatching' AND role = 'user' — enforces "at most one in-flight user message per session" at the database level. If a row stays in dispatching for more than 900 s (worker crash, node lost), a recovery poller resets it to pending so another worker can pick it up.
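The queue rules above can be modelled in memory to see how they interact. This is a toy model, not the real DAO: there is no database or locking here, the names are hypothetical, and two details are assumptions (attempts is incremented at claim time, and a failed try returns the row to pending until the fifth failure).

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 5     # "After 5 failures the row goes to failed"
STALE_AFTER_S = 900  # the 15-minute staleness threshold


@dataclass
class Message:
    session_id: str
    position: int
    priority: int
    status: str = "pending"
    attempts: int = 0
    dispatched_at: Optional[float] = None
    last_error: Optional[str] = None


def claim_next(queue: list[Message], now: float) -> Optional[Message]:
    """Claim the best pending row whose session has nothing in flight."""
    busy = {m.session_id for m in queue if m.status == "dispatching"}
    ready = [m for m in queue if m.status == "pending" and m.session_id not in busy]
    if not ready:
        return None
    # Mirrors ORDER BY priority DESC, position ASC.
    row = min(ready, key=lambda m: (-m.priority, m.position))
    row.status, row.dispatched_at = "dispatching", now
    row.attempts += 1
    return row


def finish(row: Message, error: Optional[str] = None) -> None:
    """Mark a dispatch as sent, or retry/fail it depending on attempts."""
    if error is None:
        row.status = "sent"
        return
    row.last_error = error
    row.status = "failed" if row.attempts >= MAX_ATTEMPTS else "pending"


def recover_stale(queue: list[Message], now: float) -> int:
    """Reset rows stuck in dispatching for more than STALE_AFTER_S seconds."""
    stale = [m for m in queue
             if m.status == "dispatching" and now - m.dispatched_at > STALE_AFTER_S]
    for row in stale:
        row.status = "pending"
    return len(stale)
```

The `busy` set plays the role of the partial unique index: a second message in the same session cannot be claimed while the first is still dispatching.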

How dispatch actually moves

The mechanism is two layers: an in-process QueueController and a Postgres NOTIFY channel called queue_dispatch. When a new message is queued, the writing instance broadcasts a queue_updated event locally and emits pg_notify('queue_dispatch', '{sessionId}|{originInstanceId}'). Other instances listening on the same channel ignore their own echoes and pick up the work — so dispatch is event-driven across the cluster, with a 60 s poll as a safety net for missed notifications.
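The echo-suppression step on the queue_dispatch channel comes down to parsing the payload and comparing instance IDs. A minimal sketch, assuming neither ID contains the "|" separator; both function names are hypothetical.

```python
from typing import Optional


def format_payload(session_id: str, origin_instance_id: str) -> str:
    """Build the '{sessionId}|{originInstanceId}' payload."""
    return f"{session_id}|{origin_instance_id}"


def session_to_dispatch(payload: str, my_instance_id: str) -> Optional[str]:
    """Return the session to dispatch, or None if this is our own echo."""
    session_id, _, origin = payload.partition("|")
    if origin == my_instance_id:
        return None  # the writer already handled this locally
    return session_id
```

The writing instance skips its own notification because it has already broadcast queue_updated in-process.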

The shared-session flag

Normally each App stage gets its own session. When RunAppRequest.shared_session is true, the executor creates one session for the whole execution and reuses its session_id across every agent stage. The visible difference: Debug in chat on the execution opens a single chat that contains every stage's messages in order, instead of one chat per stage. Useful when you want to read the run as a narrative; rarely needed otherwise.

The ask_user pattern

Agents occasionally need to ask the user something mid-run (an ambiguity, a missing piece of input). Emitting an ask_user tool call pauses the agent's turn, surfaces the question in the chat UI, and waits for a reply. The platform routes the reply back as the tool result on the next tick, so the agent's "I asked, here's the answer" is a normal conversational step from the model's point of view. This is distinct from human stages, which gate the whole pipeline at the execution level; ask_user is purely intra-session.

The agent runner protocol

Every agent stage and every chat turn ultimately runs through a Go binary called run-agent. The server hands the binary a JSON config and reads its stdout line-by-line. Each line is one event — a stable JSONL protocol you'll see referenced in logs, in SSE payloads, and in the chat UI's reasoning trace.

Forwarded events

These are emitted to the UI as they arrive. Field names are camelCase on the wire to JS; the runner's snake_case is auto-converted (tool_id → toolId, is_error → isError, etc.).

| Event | Meaning |
| --- | --- |
| text_delta | The agent is writing visible text. Streamed token by token. |
| thinking_delta | Hidden reasoning content. Shown in the trace pane only. |
| tool_start | Agent is calling a tool. Carries toolId, toolName, arguments. |
| tool_result | Tool returned. Carries toolId, output, and isError. |
| subtask_start / subtask_progress / subtask_done | The agent spawned a nested task (Task tool / sub-agent). |
| ask_user | Agent needs a question answered before continuing (see above). |
| context_usage | Periodic token usage telemetry (in/out/cache hits). |
| session_info | Carries the underlying Claude conversation ID for this turn; used for resume. |
| system | A platform-level message (e.g. permission prompt, retry). |
| error | Agent-side error; turn fails. |
| done | Turn completed cleanly. |

Internal events (never forwarded)

Three events with __runner_*__ names are consumed by the server only — they're bookkeeping, not user-visible state:

| Event | Purpose |
| --- | --- |
| __runner_heartbeat__ | Keep-alive ping. Tells the supervisor the subprocess is alive even when no tokens are flowing. |
| __runner_done__ | The subprocess exited cleanly. Lets the supervisor distinguish clean termination from crash. |
| __runner_error__ | Subprocess crashed; carries the captured stderr. Triggers session-expired / sandbox-not-found classification (see below). |
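Putting the two event tables together, a line-by-line reader might split internal events from forwarded ones and convert field names on the way out. This is a sketch under assumptions: the JSON key that carries the event name is taken to be "type", which this page does not confirm, only top-level keys are converted, and the function names are hypothetical.

```python
import json
import re


def to_camel(name: str) -> str:
    """tool_id -> toolId, is_error -> isError."""
    return re.sub(r"_([a-z])", lambda m: m.group(1).upper(), name)


def route_line(line: str):
    """Parse one stdout line; return ("internal" | "forward", event)."""
    event = json.loads(line)
    kind = event.get("type", "")  # the "type" key is an assumption
    if kind.startswith("__runner_") and kind.endswith("__"):
        return "internal", event  # bookkeeping, never sent to the UI
    return "forward", {to_camel(key): value for key, value in event.items()}
```

Event names themselves are values, not keys, so tool_result stays tool_result while its tool_id field becomes toolId.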

Error classification

Not every runner failure is the same. The server classifies the captured error message into one of five buckets — the bucket determines the recovery strategy:

| Class | Trigger | Recovery |
| --- | --- | --- |
| session_expired | The underlying Claude conversation was retired or rotated. | Start a fresh Claude conversation; keep the Aitroop session row. |
| sandbox_not_found | The sandbox container died or was reaped. | Evict the cached sandbox and create a fresh one; retry the turn once. |
| network | Transport errors: ECONNRESET, ETIMEDOUT. | Retry after backoff. |
| connection | gRPC / MCP protocol-level error. | Surface the error; user retries explicitly. |
| unknown | Anything else. | Mark the turn failed with the verbatim error. |
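A classifier of this kind is usually a first-match scan over substrings. The sketch below shows the shape only: ECONNRESET and ETIMEDOUT come from the table, while every other match string is invented for illustration and should not be read as what the server actually looks for.

```python
# Only ECONNRESET and ETIMEDOUT come from the table; other needles are made up.
RULES = [
    ("session_expired", ("session expired", "conversation not found")),
    ("sandbox_not_found", ("sandbox not found",)),
    ("network", ("ECONNRESET", "ETIMEDOUT")),
    ("connection", ("grpc", "mcp")),
]


def classify_error(message: str) -> str:
    """Return the first bucket whose needle appears in the message."""
    lowered = message.lower()
    for bucket, needles in RULES:
        if any(needle.lower() in lowered for needle in needles):
            return bucket
    return "unknown"
```

Rule order matters in a scan like this: more specific buckets are listed first so a generic match does not shadow them.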

What's in the run config

The server builds a RunAgentConfig per turn and writes it to ~/.aitroop/run-agent-config.json inside the sandbox before invoking the binary. The salient fields:

  • prompt — the actual user message (or the stage's expanded goal).
  • options.model — Claude model ID; defaults to claude-sonnet-4-6 if the stage didn't pin one.
  • options.resume — when present, resume an existing Claude conversation. The server pulls this from the session's agent_session_id.
  • options.allowedTools — the curated tool set the runner exposes to the model (typically 8 names: Read, Edit, Bash, Glob, Grep, Write, Task, plus the Aitroop MCP set).
  • options.permissionMode — fixed to 'bypassPermissions'; the platform enforces permissions at the sandbox boundary, not interactively.
  • options.mcpServers — the Aitroop in-house MCP server plus any extras the App / stage requested.
  • options.env — anything in process.env prefixed with SANDBOX_ENV_ is forwarded with the prefix stripped, plus AT_USER_TOKEN (the user's JWT) is injected so authenticated server calls work from inside the sandbox.
  • options.oauthToken / proxy mode — either an Anthropic OAuth token or, when a proxy is configured, an ANTHROPIC_BASE_URL pointing at the proxy with the user's JWT used as bearer. The platform snapshots this once per call so all retries within a turn use the same credentials.
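The options.env rule from the list above, sketched as a pure function. The name `build_sandbox_env` is hypothetical, and the server's environment is passed in as a plain dict so the example does not touch the real process environment.

```python
PREFIX = "SANDBOX_ENV_"


def build_sandbox_env(server_env: dict[str, str], user_jwt: str) -> dict[str, str]:
    """Forward SANDBOX_ENV_* variables with the prefix stripped, plus the JWT."""
    env = {
        name[len(PREFIX):]: value
        for name, value in server_env.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }
    env["AT_USER_TOKEN"] = user_jwt  # lets sandboxed code call the server
    return env
```

Variables without the prefix, such as DATABASE_URL, never reach the sandbox through this path.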

How Apps actually get created

You don't write the App JSON by hand. The agent does it. The flow:

  1. You tell the agent in chat: "Create an app that…"
  2. The agent activates the aitroop-app-create Skill (the App Builder).
  3. The App Builder asks 1–5 clarifying questions if needed.
  4. It designs the full AppDef (inputs, stages, artifacts, resources).
  5. It validates the design — every ID unique, every {{ref}} resolves, goals are specific.
  6. It calls POST /api/apps to save.
  7. It reports the App ID and offers to refine.
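The mechanical parts of the validation step (unique IDs and resolvable references) could look like the sketch below. `validate_app_def` is a hypothetical name and the message wording is invented; the "goals are specific" check is a judgment call the App Builder makes and is not attempted here.

```python
import re

REF = re.compile(r"\{\{([^}]+)\}\}")


def validate_app_def(app_def: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    input_ids = [field["id"] for field in app_def.get("input_schema", [])]
    stage_ids = [stage["id"] for stage in app_def.get("stages", [])]
    for kind, ids in (("input", input_ids), ("stage", stage_ids)):
        for dup in sorted({i for i in ids if ids.count(i) > 1}):
            problems.append(f"duplicate {kind} id: {dup}")
    for stage in app_def.get("stages", []):
        for key in REF.findall(stage.get("goal", "")):
            if key not in input_ids:
                # Goal references can only resolve against form inputs.
                problems.append(
                    f"stage {stage['id']}: unresolved reference " + "{{" + key + "}}"
                )
    return problems
```

Catching an unresolved reference at save time matters because the executor leaves it literal at run time instead of raising an error.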

To edit an existing App, you say something like "Update my Company Brief app to also include hiring data" and the App Builder fetches the current version, modifies it, and calls PUT /api/apps/{id}.

Common mistakes to avoid

  • Vague goals — "Do research" is too vague. Specify sources, depth, output format.
  • Missing artifacts — an App with no artifact_defs produces no visible output.
  • Too many required inputs — users abandon Apps with long forms. Default everything you can.
  • Undersized timeout — heavy research stages need 5–10 min. Bump timeout_ms to 600000.
  • Wrong input type — use textarea for long text, text for short identifiers, select when there are 3–6 fixed options.
  • Over-decomposing — splitting a one-thought task into 4 stages. Use multi-stage only when the work naturally has separable steps.

Quick glossary

| Term | Meaning |
| --- | --- |
| App | A saved workflow JSON document with form, stages, and artifacts. |
| AppInput | One form field. 11 supported types; see the input-types table. |
| Stage | One step in the workflow. Types: agent, script, human. |
| Goal | The agent's instruction string inside a stage. References inputs with {{id}}. |
| Artifact | The deliverable a stage produces. Typed (markdown, csv, etc). |
| Skill | A capability the agent can call (web-search, pdf-extract, etc). |
| Connect | OAuth integration to an external service (google, github, hubspot). |
| Resources | The Skills and Connects an App declares it needs. |
| Execution | One run of an App. Has a status, stage logs, and artifacts. |
| Stage log | The per-stage record of an execution: status, duration, error, session. |
| Human gate | A paused execution waiting for human approve / reject / edit. |
| Sandbox | Isolated runtime each chat or App run executes in. Three providers: host / e2b / daytona. |
| Session | The conversation container holding a sequence of messages. Types: chat / app_stage / scheduled. |
| Message | One row in app_message. Lives in the dispatch queue with a state machine and a priority. |
| Runner | The run-agent binary that actually drives a Claude turn. Emits JSONL events back to the server. |
| Workspace | Account container: holds Apps, Connects, run history. |
| Org | Team layer on top of users. Shared Apps, shared billing, member roles. |