
Code Mode — 1 tool, 57 capabilities

One tool. Runs sandboxed JavaScript. Calls every Ganglia tool through typed wrappers. Chains N tool calls server-side so intermediate results never re-enter your agent’s context window.

A typical agent workflow — “find dead functions that have no tests” — is a 50-step sequence:

  1. Call code_dead → 50 functions come back
  2. Call code_test_for for each of the 50 → 50 more results in context
  3. Filter manually → final answer

Each step forces the LLM to re-read the entire conversation. Schema tokens + accumulated results grow ~quadratically with the number of tool calls. On real workflows this burns 100,000+ tokens that never needed to be there.
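The growth is easy to model. A back-of-the-envelope sketch (the token counts here are invented for illustration, not measurements): if every turn re-reads the tool schemas plus all prior results, total context scales with the square of the number of calls.

```javascript
// Illustrative model of context growth across sequential tool calls.
// schemaTokens and tokensPerResult are made-up numbers for the sketch.
function contextCost(nCalls, { schemaTokens = 6000, tokensPerResult = 300 } = {}) {
  let total = 0;
  for (let turn = 1; turn <= nCalls; turn++) {
    // Each turn the model re-reads the schemas plus every prior result.
    total += schemaTokens + (turn - 1) * tokensPerResult;
  }
  return total;
}

// For 50 calls, the re-read results alone contribute
// 300 * (49 * 50 / 2) = 367,500 tokens on top of the schema cost.
console.log(contextCost(50));
```

The per-result term is an arithmetic series, which is where the ~quadratic behavior comes from.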

gl_run({ code: `
  const dead = gl.code.dead({});
  const coverage = dead.map(fn => gl.code.test_for({ name: fn.name }));
  return dead.filter((_, i) => coverage[i].tests.length === 0);
` })
  • One turn. Agent writes code once, gets final answer back.
  • Server runs the chain internally. 50 intermediate results never touch the context.
  • Only the filtered output returns. The 47 functions that had tests are silently discarded server-side.
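The pattern is reproducible outside the sandbox with mock tools. In this sketch, gl is a stand-in object with invented data, not the real wrapper: the point is that the map and filter run next to the data, and only the final array crosses the boundary.

```javascript
// Mock stand-ins for gl.code.dead / gl.code.test_for (illustrative data only).
const gl = {
  code: {
    dead: () => [{ name: "parse_v1" }, { name: "old_render" }, { name: "fmt_date" }],
    test_for: ({ name }) => ({ tests: name === "fmt_date" ? ["fmt_date_test"] : [] }),
  },
};

// The same chain as the example above: N internal calls, one result out.
const dead = gl.code.dead({});
const coverage = dead.map(fn => gl.code.test_for({ name: fn.name }));
const untested = dead.filter((_, i) => coverage[i].tests.length === 0);

console.log(untested); // only the functions with no tests survive
```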

Running scripts/benchmark_code_mode.py on a 462-file Rust project with 5 realistic workflows:

Metric                      Individual tools   Code Mode    Reduction
Schema load (one-time)      6,376 tokens       575 tokens   91%
Wire tokens per workflow    4,014              2,316        1.7×
Effective LLM context       170,424            9,191        18.5×

The third row is what your agent actually pays for — tokens that enter the model’s context window across all turns. On longer workflows (10+ tool calls) the ratio exceeds 30×.

Code runs in QuickJS with hard limits:

  • 256 MB memory — can’t allocate forever
  • 30 s timeout — can’t loop forever
  • No filesystem, no network, no process — the only escape hatch is gl.* wrappers back into the host
  • No recursion — gl_run can’t call itself
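QuickJS hosts typically enforce the timeout with an interrupt callback the engine polls between bytecodes. The same idea can be sketched cooperatively in plain JavaScript — makeBudget and checkBudget are hypothetical helpers for illustration, not part of the gl runtime, which interrupts at the engine level:

```javascript
// Cooperative deadline guard: a long-running loop calls checkBudget()
// and gets cut off once the wall-clock budget is spent.
function makeBudget(ms) {
  const deadline = Date.now() + ms;
  return function checkBudget() {
    if (Date.now() > deadline) throw new Error(`timeout: exceeded ${ms} ms`);
  };
}

const checkBudget = makeBudget(30_000); // mirrors the 30 s limit
let work = 0;
for (let i = 0; i < 1e6; i++) {
  checkBudget(); // in the real sandbox this check lives inside the engine
  work += i;
}
```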

The footprint stays lean: the QuickJS runtime is embedded in the gl binary (~500 KB), has no external dependencies, and starts in milliseconds.

All wrappers return the tool’s output synchronously. No await.

const gl = {
  code: { grep, get, dead, hotspots, impact, callers, callees,
          semantic_search, similar, /* … 45 more */ },
  doc: { index, list, toc, get, query },
  smart: { read, grep, diff, context },
  deliberation: { start, opinion, status, result },
  call: (name, args) => /* escape hatch for any registered MCP tool */,
};
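The call escape hatch is, at its core, a name-based dispatch over the tool registry. A minimal stand-alone sketch — the registry contents and the specific return shapes are invented for illustration:

```javascript
// Hypothetical registry of MCP tools, keyed by their wire names.
const registry = new Map([
  ["code_grep", (args) => ({ matches: [`grep:${args.pattern}`] })],
  ["doc_list", () => ({ docs: [] })],
]);

// gl.call(name, args): dispatch to any registered tool by wire name.
function call(name, args = {}) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool(args);
}

console.log(call("code_grep", { pattern: "foo" })); // { matches: ["grep:foo"] }
```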

The full typed API is available as an MCP resource — ganglia://types/api.d.ts. Claude Code fetches and caches it automatically if your client supports resources.

Use gl_run when:

  • Your workflow would take 3+ sequential tool calls
  • You want to filter / transform / aggregate results before they hit the LLM
  • You’re writing a multi-step analysis (“find X and show me Y for each”)

Use individual tools when:

  • It’s a one-shot lookup (code_get, code_grep)
  • You need to see the intermediate result to decide what to do next
  • You’re exploring interactively

Guessed the wrong method name? The Proxy wrapper helps:

> gl.code.rgep({pattern: "foo"})
Error: Unknown method gl.code.rgep — available: annotate, build, callees, callers,
..., grep, hotspots, impact, ..., search, ...
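That error comes from a Proxy trap on property access: unknown names fail with the list of what is available instead of a bare "undefined is not a function". A self-contained sketch of the idea, with an abbreviated, invented method set:

```javascript
// Wrap a namespace so unknown method names fail loudly and helpfully.
function strictNamespace(label, methods) {
  return new Proxy(methods, {
    get(target, prop) {
      if (typeof prop === "symbol") return target[prop]; // leave internals alone
      if (prop in target) return target[prop];
      const available = Object.keys(target).sort().join(", ");
      throw new Error(`Unknown method ${label}.${prop} — available: ${available}`);
    },
  });
}

// Abbreviated method set for illustration.
const code = strictNamespace("gl.code", {
  grep: ({ pattern }) => `grep ${pattern}`,
  callers: () => [],
});

try {
  code.rgep({ pattern: "foo" }); // the typo from the example above
} catch (e) {
  console.log(e.message); // names the bad call and lists callers, grep
}
```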

Or when calling via gl.call(name):

> gl.call("code_rgep", {...})
Error: Unknown tool: code_rgep (did you mean `code_grep`?)
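A "did you mean" suggestion like this can be produced with an edit-distance pass over the registered names. A sketch of that approach, not the actual implementation:

```javascript
// Levenshtein distance between two strings (classic dynamic programming).
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Suggest the closest registered name, if any is close enough.
function didYouMean(name, registered, maxDist = 2) {
  let best = null, bestDist = maxDist + 1;
  for (const candidate of registered) {
    const dist = editDistance(name, candidate);
    if (dist < bestDist) { best = candidate; bestDist = dist; }
  }
  return best;
}

console.log(didYouMean("code_rgep", ["code_grep", "code_get", "doc_list"]));
// "code_grep" — two substitutions away
```

The maxDist cutoff keeps the tool from suggesting wildly unrelated names for genuinely unknown input.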

Can multiple wrappers be composed in a single script? Yes. That is the whole point.

gl_run({ code: `
  const hits = gl.code.semantic_search({
    query: "payment retry with exponential backoff",
    top_k: 5
  });
  return hits.map(h => ({
    name: h.name,
    callers: gl.code.callers({ name: h.name }),
    impact: gl.code.impact({ name: h.name, depth: 2 })
  }));
` })

One call. Three layers of retrieval (semantic → graph → blast radius). Returns a structured digest. Your agent’s context doesn’t blow up.