# code_embed_index

Populates the vector index that `code_semantic_search` and `code_similar` read from.

```
code_embed_index({})                   # incremental
code_embed_index({ force: true })      # wipe + re-embed everything
code_embed_index({ batch_size: 64 })   # provider-dependent batching
```
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `batch_size` | integer | 32 | Texts per provider call |
| `force` | boolean | false | Re-embed every symbol even if its fingerprint hash is unchanged |
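The `batch_size` behavior above amounts to slicing the pending texts into provider-sized chunks. A minimal sketch (the function name is hypothetical, not part of the tool's API):

```python
def batches(texts: list[str], batch_size: int = 32):
    """Yield successive slices of at most batch_size texts.

    The default of 32 matches the table above; a provider call is
    made per slice, so larger batches mean fewer round trips.
    """
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]
```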
A successful run reports something like:

```
Embedded 47 symbols with nomic-embed-text; 12 unchanged, 0 failed
```

Or, if nothing has changed since the last run:

```
Already up to date — 59 symbols indexed (59 unchanged)
```

Each symbol’s “fingerprint” (kind + signature + docstring + calls + callers + file) is blake3-hashed. If the hash matches what’s stored, we skip re-embedding. Running repeatedly is effectively free after the first pass.
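The skip logic above can be sketched as follows. This is an illustration, not the tool's implementation: the field names are hypothetical, and `hashlib.blake2b` stands in for blake3, which is not in the Python standard library.

```python
import hashlib


def fingerprint(symbol: dict) -> str:
    # Hash the fields the docs list: kind + signature + docstring
    # + calls + callers + file. blake2b stands in for blake3 here.
    parts = [
        symbol["kind"],
        symbol["signature"],
        symbol["docstring"],
        ",".join(symbol["calls"]),
        ",".join(symbol["callers"]),
        symbol["file"],
    ]
    return hashlib.blake2b("\x00".join(parts).encode()).hexdigest()


def needs_embedding(symbol: dict, stored: dict, force: bool = False) -> bool:
    # Re-embed when forced, when the symbol is new, or when any
    # fingerprint field changed since the stored hash was written.
    return force or stored.get(symbol["name"]) != fingerprint(symbol)
```

Because the fingerprint covers the call graph as well as the text, renaming a callee re-embeds its callers too, even though their own source is untouched.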

By default, `gl watch` runs this automatically in the background after the initial graph build completes. Look for this in your log:

```
Building semantic search index in background...
Semantic index ready: 47 embedded, 0 unchanged (model: nomic-embed-text)
```

Disable the automatic run with `GL_EMBED_AUTO=false`.

Set `GL_EMBED_PROVIDER` to pick an embedding provider:

  • `ollama` (default) — configured via `OLLAMA_URL`, `GL_OLLAMA_EMBED_MODEL`
  • `openai` — configured via `OPENAI_API_KEY`, `GL_OPENAI_EMBED_MODEL`
  • `fastembed` — feature-gated; compile with `cargo build --features local-embeddings`

Switching providers automatically wipes the old vectors: different providers emit embeddings of different dimensions, and vectors of mixed dimensions can't share one index.
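The wipe-on-switch rule reduces to a dimension check against the stored index metadata. A sketch under stated assumptions: the dimension values below are hypothetical (real values depend on the configured model), and `PROVIDER_DIMS`, `select_provider`, and `must_wipe` are illustrative names, not the tool's API.

```python
import os

# Hypothetical per-provider embedding dimensions; the real value
# depends on which model each provider is configured to use.
PROVIDER_DIMS = {"ollama": 768, "openai": 1536, "fastembed": 384}


def select_provider() -> str:
    # Mirrors the env var named above; ollama is the documented default.
    return os.environ.get("GL_EMBED_PROVIDER", "ollama")


def must_wipe(index_meta: dict, provider: str) -> bool:
    # A fresh index (no stored dim) needs no wipe; otherwise wipe
    # whenever the stored dimension doesn't match the new provider's.
    return index_meta.get("dim") not in (None, PROVIDER_DIMS[provider])
```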