
Code Mode — 1 tool, 57 capabilities

One tool. Runs sandboxed JavaScript. Calls every Ganglia tool through typed wrappers. Chains N tool calls server-side so intermediate results never re-enter your agent’s context window.

A typical agent workflow — “find dead functions that have no tests” — is a 50-step sequence:

  1. Call code_dead → 50 functions come back
  2. Call code_test_for for each of the 50 → 50 more results in context
  3. Filter manually → final answer

Each step forces the LLM to re-read the entire conversation. Schema tokens + accumulated results grow ~quadratically with the number of tool calls. On real workflows this burns 100,000+ tokens that never needed to be there.
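The growth is easy to model. A back-of-the-envelope sketch (the token counts here are invented for illustration, not measurements): if every turn re-reads the tool schemas plus all prior results, total context scales with the square of the number of calls.

```javascript
// Illustrative model of context growth across sequential tool calls.
// schemaTokens and tokensPerResult are made-up numbers for the sketch.
function contextCost(nCalls, { schemaTokens = 6000, tokensPerResult = 300 } = {}) {
  let total = 0;
  for (let turn = 1; turn <= nCalls; turn++) {
    // Each turn the model re-reads the schemas plus every prior result.
    total += schemaTokens + (turn - 1) * tokensPerResult;
  }
  return total;
}

// For 50 calls, the re-read results alone contribute
// 300 * (49 * 50 / 2) = 367,500 tokens on top of the schema cost.
console.log(contextCost(50));
```

The per-result term is an arithmetic series, which is where the ~quadratic behavior comes from.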

gl_run({ code: `
  const dead = gl.code.dead({});
  const coverage = dead.map(fn => gl.code.test_for({ name: fn.name }));
  return dead.filter((_, i) => coverage[i].tests.length === 0);
` })
  • One turn. Agent writes code once, gets final answer back.
  • Server runs the chain internally. 50 intermediate results never touch the context.
  • Only the filtered output returns. The 47 functions that had tests are silently discarded server-side.
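The pattern is reproducible outside the sandbox with mock tools. In this sketch, gl is a stand-in object with invented data, not the real wrapper: the point is that the map and filter run next to the data, and only the final array crosses the boundary.

```javascript
// Mock stand-ins for gl.code.dead / gl.code.test_for (illustrative data only).
const gl = {
  code: {
    dead: () => [{ name: "parse_v1" }, { name: "old_render" }, { name: "fmt_date" }],
    test_for: ({ name }) => ({ tests: name === "fmt_date" ? ["fmt_date_test"] : [] }),
  },
};

// The same chain as the example above: N internal calls, one result out.
const dead = gl.code.dead({});
const coverage = dead.map(fn => gl.code.test_for({ name: fn.name }));
const untested = dead.filter((_, i) => coverage[i].tests.length === 0);

console.log(untested); // only the functions with no tests survive
```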

Running scripts/benchmark_code_mode.py on a 462-file Rust project with 5 realistic workflows:

Metric                      Individual tools   Code Mode    Reduction
Schema load (one-time)      6,376 tokens       575 tokens   91%
Wire tokens per workflow    4,014              2,316        1.7×
Effective LLM context       170,424            9,191        18.5×

The third row is what your agent actually pays for — tokens that enter the model’s context window across all turns. On longer workflows (10+ tool calls) the ratio exceeds 30×.

Code runs in QuickJS with hard limits:

  • 256 MB memory — can’t allocate forever
  • 30 s timeout — can’t loop forever
  • No filesystem, no network, no process — the only escape hatch is gl.* wrappers back into the host
  • No recursion — gl_run can’t call itself
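QuickJS hosts typically enforce the timeout with an interrupt callback the engine polls between bytecodes. The same idea can be sketched cooperatively in plain JavaScript — makeBudget and checkBudget are hypothetical helpers for illustration, not part of the gl runtime, which interrupts at the engine level:

```javascript
// Cooperative deadline guard: a long-running loop calls checkBudget()
// and gets cut off once the wall-clock budget is spent.
function makeBudget(ms) {
  const deadline = Date.now() + ms;
  return function checkBudget() {
    if (Date.now() > deadline) throw new Error(`timeout: exceeded ${ms} ms`);
  };
}

const checkBudget = makeBudget(30_000); // mirrors the 30 s limit
let work = 0;
for (let i = 0; i < 1e6; i++) {
  checkBudget(); // in the real sandbox this check lives inside the engine
  work += i;
}
```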

The footprint stays lean: the QuickJS runtime is embedded in the gl binary (~500 KB), has no external dependencies, and starts in milliseconds.

All wrappers return the tool’s output synchronously. No await.

const gl = {
  code: { grep, get, dead, hotspots, impact, callers, callees,
          semantic_search, similar, /* … 45 more */ },
  doc: { index, list, toc, get, query },
  smart: { read, grep, diff, context },
  deliberation: { start, opinion, status, result },
  call: (name, args) => /* escape hatch for any registered MCP tool */,
};
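The call escape hatch is, at its core, a name-based dispatch over the tool registry. A minimal stand-alone sketch — the registry contents and the specific return shapes are invented for illustration:

```javascript
// Hypothetical registry of MCP tools, keyed by their wire names.
const registry = new Map([
  ["code_grep", (args) => ({ matches: [`grep:${args.pattern}`] })],
  ["doc_list", () => ({ docs: [] })],
]);

// gl.call(name, args): dispatch to any registered tool by wire name.
function call(name, args = {}) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool(args);
}

console.log(call("code_grep", { pattern: "foo" })); // { matches: ["grep:foo"] }
```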

The full typed API is available as an MCP resource — ganglia://types/api.d.ts. Claude Code fetches and caches it automatically if your client supports resources.

Use gl_run when:

  • Your workflow would take 3+ sequential tool calls
  • You want to filter / transform / aggregate results before they hit the LLM
  • You’re writing a multi-step analysis (“find X and show me Y for each”)

Use individual tools when:

  • It’s a one-shot lookup (code_get, code_grep)
  • You need to see the intermediate result to decide what to do next
  • You’re exploring interactively

Guessed the wrong method name? The Proxy wrapper helps:

> gl.code.rgep({pattern: "foo"})
Error: Unknown method gl.code.rgep — available: annotate, build, callees, callers,
..., grep, hotspots, impact, ..., search, ...
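That error comes from a Proxy trap on property access: unknown names fail with the list of what is available instead of a bare "undefined is not a function". A self-contained sketch of the idea, with an abbreviated, invented method set:

```javascript
// Wrap a namespace so unknown method names fail loudly and helpfully.
function strictNamespace(label, methods) {
  return new Proxy(methods, {
    get(target, prop) {
      if (typeof prop === "symbol") return target[prop]; // leave internals alone
      if (prop in target) return target[prop];
      const available = Object.keys(target).sort().join(", ");
      throw new Error(`Unknown method ${label}.${prop} — available: ${available}`);
    },
  });
}

// Abbreviated method set for illustration.
const code = strictNamespace("gl.code", {
  grep: ({ pattern }) => `grep ${pattern}`,
  callers: () => [],
});

try {
  code.rgep({ pattern: "foo" }); // the typo from the example above
} catch (e) {
  console.log(e.message); // names the bad call and lists callers, grep
}
```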

Or when calling via gl.call(name):

> gl.call("code_rgep", {...})
Error: Unknown tool: code_rgep (did you mean `code_grep`?)
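A "did you mean" suggestion like this can be produced with an edit-distance pass over the registered names. A sketch of that approach, not the actual implementation:

```javascript
// Levenshtein distance between two strings (classic dynamic programming).
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Suggest the closest registered name, if any is close enough.
function didYouMean(name, registered, maxDist = 2) {
  let best = null, bestDist = maxDist + 1;
  for (const candidate of registered) {
    const dist = editDistance(name, candidate);
    if (dist < bestDist) { best = candidate; bestDist = dist; }
  }
  return best;
}

console.log(didYouMean("code_rgep", ["code_grep", "code_get", "doc_list"]));
// "code_grep" — two substitutions away
```

The maxDist cutoff keeps the tool from suggesting wildly unrelated names for genuinely unknown input.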

Can multiple wrappers be composed in a single script? Yes. That is the whole point.

gl_run({ code: `
  const hits = gl.code.semantic_search({
    query: "payment retry with exponential backoff",
    top_k: 5
  });
  return hits.map(h => ({
    name: h.name,
    callers: gl.code.callers({ name: h.name }),
    impact: gl.code.impact({ name: h.name, depth: 2 })
  }));
` })

One call. Three layers of retrieval (semantic → graph → blast radius). Returns a structured digest. Your agent’s context doesn’t blow up.