
Semantic search (RAG)

Regex search finds text. Structural search follows the graph. Neither finds code by meaning — a query like “payment retry with exponential backoff” misses functions named attempts_remaining or stripe_backoff even though they do exactly that.

Semantic search fixes this by embedding every symbol with a small language model and indexing the vectors. At query time it embeds the natural-language query and returns the top-K nearest symbols by cosine similarity.
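The ranking step is plain vector math. A minimal JavaScript sketch of the query side, using a toy 3-dimensional index with made-up symbol names (real embeddings are hundreds of dimensions, and the actual ranking happens inside gl, not in user code):

```javascript
// Score two vectors by cosine similarity.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank every indexed symbol against the query vector, keep the best k.
function topK(queryVec, index, k) {
  return index
    .map(({ name, vec }) => ({ name, score: cosine(queryVec, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy index: two symbols with hand-written 3-dim "embeddings".
const index = [
  { name: "authenticate_user", vec: [0.9, 0.1, 0.0] },
  { name: "render_sidebar",    vec: [0.0, 0.2, 0.9] },
];
const hits = topK([1, 0, 0], index, 1);
```

Brute force like this is exactly what the backends do today (see the storage notes below); the clever part is what gets embedded, not how it is scored.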

code_embed_index # build or refresh vectors (idempotent)
code_semantic_search(query) # natural language → top-K symbols
code_similar(name) # find symbols like this one

All three are wired into Code Mode — gl.code.semantic_search({query, top_k}) works inside gl_run.

Ollama is the default provider. If it’s running, nothing else is needed:

# One-time: install Ollama + pull the embedding model
ollama pull nomic-embed-text
# Watch your project — embeddings auto-build in the background
gl watch .

You’ll see in the log:

Building semantic search index in background...
Semantic index ready: 47 embedded, 0 unchanged (model: nomic-embed-text)
Supported embedding providers:

  • ollama (default): nomic-embed-text, 768 dims.
    Env: OLLAMA_URL=http://localhost:11434, GL_OLLAMA_EMBED_MODEL=nomic-embed-text
  • openai: text-embedding-3-small, 1536 dims.
    Env: GL_EMBED_PROVIDER=openai, OPENAI_API_KEY=sk-...
  • fastembed: BGE-small-en-v1.5 (ONNX), 384 dims.
    Feature-gated: cargo build --features local-embeddings

Switch at runtime by setting GL_EMBED_PROVIDER. Vectors are tagged with their model — switching triggers a background re-embed.
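For example, to point a shell session at the OpenAI provider (the key value is a placeholder, as in the provider list above):

```shell
# Select the OpenAI embedding provider for this shell. Vectors tagged with
# the previous model stay in the store; the next index pass re-embeds in
# the background under the new model tag.
export GL_EMBED_PROVIDER=openai
export OPENAI_API_KEY=sk-...
```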

We don’t embed raw code bodies. Bodies are noisy — full of boilerplate, error handling, imports. Instead we compose a short, dense “fingerprint” per symbol that captures intent:

function authenticate_user(email, password) -> Result<User, AuthError>
  docstring: Validates credentials and returns the user if valid.
  calls: hash_password, lookup_user, issue_session
  callers: login_handler, reset_password
  file: src/auth/service.rs

The fingerprint is blake3-hashed so code_embed_index can skip symbols whose fingerprint hasn’t changed. Running on every gl watch cycle is effectively free after the first pass.

// Natural language
gl.code.semantic_search({ query: "validate user credentials", top_k: 5 })
// → authenticate_user (0.654 cosine), hash_password (0.514), ...

// File filter
gl.code.semantic_search({
  query: "route handler",
  file_filter: "src/api/"
})

// Similarity from an anchor
gl.code.similar({ name: "retry_with_backoff" })
// → exponential_delay, circuit_breaker_reset, ...

Semantic search alone returns symbols by meaning. Stack it with graph tools for “meaning + structure”:

gl_run({ code: `
  // Find payment-related code
  const hits = gl.code.semantic_search({
    query: "payment processing and retries",
    top_k: 10
  });

  // Get blast radius for each one
  return hits.map(h => ({
    name: h.name,
    file: h.file,
    impact: gl.code.impact({ name: h.name, depth: 2 })
  }));
` })

Vectors are stored in the project’s graph backend:

  • Cozo (default): embedding relation with raw little-endian f32 bytes. Brute-force cosine in Rust (~50 ms for 10k symbols).
  • FalkorDB: :Embedding nodes with JSON-serialized vectors. Same brute-force approach.

HNSW indexing is on the roadmap for v2 — it’ll matter once we have 100k+ symbols per project.
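The Cozo storage detail, raw little-endian f32 bytes scored by brute force, can be mimicked in a few lines. A sketch only: gl does this in Rust, and the Buffer below stands in for bytes read out of the embedding relation.

```javascript
// Decode a vector stored as raw little-endian f32 bytes.
function decodeF32LE(buf) {
  const out = new Float32Array(buf.length / 4);
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  for (let i = 0; i < out.length; i++) out[i] = view.getFloat32(i * 4, true);
  return out;
}

// Brute-force cosine over decoded vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Round-trip a 3-dim vector [1, 0, 0] through the byte encoding.
const bytes = Buffer.alloc(12);
bytes.writeFloatLE(1, 0);
const vec = decodeF32LE(bytes);
```

A flat scan of 4-byte floats keeps the storage format trivial and backend-agnostic, which is why both Cozo and FalkorDB share the same approach until HNSW lands.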

If you don’t want gl watch to build embeddings automatically:

export GL_EMBED_AUTO=false

You can then build them manually when needed:

code_embed_index({})

Common issues:
  • “Embedding provider unavailable” — Ollama isn’t running. Run ollama serve in another terminal, then ollama pull nomic-embed-text.
  • “No embeddings yet for model X” — run code_embed_index({}) once. Or enable auto-index.
  • Queries return nothing relevant — try code_embed_index({force: true}) to rebuild from scratch, then re-query. The fingerprint compose logic improves over releases.