# code_embed_index

Populates the vector index that `code_semantic_search` and `code_similar` read from.

```
code_embed_index({})                   # incremental
code_embed_index({ force: true })      # wipe + re-embed everything
code_embed_index({ batch_size: 64 })   # provider-dependent batching
```
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `batch_size` | integer | 32 | Texts per provider call |
| `force` | boolean | false | Re-embed every symbol even if its fingerprint hash is unchanged |
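The `batch_size` behavior above amounts to slicing the pending texts into provider-sized chunks. A minimal sketch (the function name is hypothetical, not part of the tool's API):

```python
def batches(texts: list[str], batch_size: int = 32):
    """Yield successive slices of at most batch_size texts.

    The default of 32 matches the table above; a provider call is
    made per slice, so larger batches mean fewer round trips.
    """
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]
```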
A successful run reports something like:

```
Embedded 47 symbols with nomic-embed-text; 12 unchanged, 0 failed
```

Or, if nothing has changed since the last run:

```
Already up to date — 59 symbols indexed (59 unchanged)
```

Each symbol’s “fingerprint” (kind + signature + docstring + calls + callers + file) is blake3-hashed. If the hash matches what’s stored, we skip re-embedding. Running repeatedly is effectively free after the first pass.
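The skip logic above can be sketched as follows. This is an illustration, not the tool's implementation: the field names are hypothetical, and `hashlib.blake2b` stands in for blake3, which is not in the Python standard library.

```python
import hashlib


def fingerprint(symbol: dict) -> str:
    # Hash the fields the docs list: kind + signature + docstring
    # + calls + callers + file. blake2b stands in for blake3 here.
    parts = [
        symbol["kind"],
        symbol["signature"],
        symbol["docstring"],
        ",".join(symbol["calls"]),
        ",".join(symbol["callers"]),
        symbol["file"],
    ]
    return hashlib.blake2b("\x00".join(parts).encode()).hexdigest()


def needs_embedding(symbol: dict, stored: dict, force: bool = False) -> bool:
    # Re-embed when forced, when the symbol is new, or when any
    # fingerprint field changed since the stored hash was written.
    return force or stored.get(symbol["name"]) != fingerprint(symbol)
```

Because the fingerprint covers the call graph as well as the text, renaming a callee re-embeds its callers too, even though their own source is untouched.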

By default, `gl watch` runs this automatically in the background after the initial graph build completes. Look for this in your log:

```
Building semantic search index in background...
Semantic index ready: 47 embedded, 0 unchanged (model: nomic-embed-text)
```

Disable the automatic run with `GL_EMBED_AUTO=false`.

Set `GL_EMBED_PROVIDER` to pick an embedding provider:

  • `ollama` (default) — configured via `OLLAMA_URL`, `GL_OLLAMA_EMBED_MODEL`
  • `openai` — configured via `OPENAI_API_KEY`, `GL_OPENAI_EMBED_MODEL`
  • `fastembed` — feature-gated; compile with `cargo build --features local-embeddings`

Switching providers automatically wipes the old vectors: different providers emit embeddings of different dimensions, and vectors of mixed dimensions can't share one index.
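The wipe-on-switch rule reduces to a dimension check against the stored index metadata. A sketch under stated assumptions: the dimension values below are hypothetical (real values depend on the configured model), and `PROVIDER_DIMS`, `select_provider`, and `must_wipe` are illustrative names, not the tool's API.

```python
import os

# Hypothetical per-provider embedding dimensions; the real value
# depends on which model each provider is configured to use.
PROVIDER_DIMS = {"ollama": 768, "openai": 1536, "fastembed": 384}


def select_provider() -> str:
    # Mirrors the env var named above; ollama is the documented default.
    return os.environ.get("GL_EMBED_PROVIDER", "ollama")


def must_wipe(index_meta: dict, provider: str) -> bool:
    # A fresh index (no stored dim) needs no wipe; otherwise wipe
    # whenever the stored dimension doesn't match the new provider's.
    return index_meta.get("dim") not in (None, PROVIDER_DIMS[provider])
```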