obsidian:~/vault$ search --hybrid obsidian-ai-infrastructure

Obsidian as AI Infrastructure: The Definitive Technical Reference

The complete system for turning an Obsidian vault into a queryable AI knowledge base. Vault design, hybrid retrieval, MCP integration, hooks, workflows, and operational patterns.

words: 13702 read_time: 69m updated: 2026-03-01 00:00

Obsidian is not a note-taking app. It is a local-first, plaintext, graph-structured markdown corpus that becomes an AI context reservoir when you add retrieval infrastructure. 16,894 files. 49,746 chunks. 23ms queries. Zero API calls. One 83 MB SQLite file. This guide covers the complete system: from vault architecture to hybrid retrieval to MCP integration to operational workflows.


Key Takeaways

Context engineering, not note-taking. The value of an Obsidian vault for AI is not the notes themselves but the retrieval layer that makes them queryable. A 16,000-file vault without retrieval is a write-only database. A 200-file vault with hybrid search and MCP integration is an AI knowledge base. The retrieval infrastructure is the product. The notes are the raw material.

Hybrid retrieval beats pure keyword or pure semantic search. BM25 catches exact identifiers and function names. Vector search catches synonyms and conceptual matches across different terminology. Reciprocal Rank Fusion (RRF) merges both without requiring score calibration. Neither method alone covers both failure modes. Research on MS MARCO passage ranking confirms the pattern: hybrid retrieval consistently outperforms either method in isolation.1 The hybrid retriever deep dive covers the RRF math, worked examples with real numbers, failure mode analysis, and an interactive fusion calculator.
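The fusion step itself is small enough to show inline. A minimal RRF sketch with k=60 (the commonly used default); the document IDs and rankings below are illustrative, not from a real index:

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, best first; no score calibration needed.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["oauth.md", "tokens.md", "sessions.md"]       # keyword ranking
vector_hits = ["tokens.md", "auth-flows.md", "oauth.md"]   # semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
```

Documents that appear in both lists ("tokens.md", "oauth.md") accumulate score from each and rise above documents that appear in only one — which is exactly why RRF covers both failure modes without comparing raw BM25 and cosine scores.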

MCP gives AI tools direct vault access. Model Context Protocol (MCP) servers expose the retriever as a tool that Claude Code, Codex CLI, Cursor, and other AI tools can call directly. The agent queries the vault, receives ranked results with source attribution, and uses the context without loading entire files. The MCP server is a thin wrapper around the retrieval engine.

Local-first means zero API costs and full privacy. The entire stack runs on a single machine: SQLite for storage, Model2Vec for embeddings, FTS5 for keyword search, sqlite-vec for vector KNN. No cloud services, no API calls, no network dependency. Personal notes never leave the machine. The full re-embed of 49,746 chunks would cost roughly $0.30 at OpenAI API prices, but the real costs are latency, privacy exposure, and the network dependency for a system that should work offline.2

Incremental indexing keeps the system current in under 10 seconds. File modification time comparison detects changes. Only modified files are re-chunked and re-embedded. A full reindex takes about four minutes on Apple M-series hardware. Incremental updates on a typical day’s edits run in under ten seconds. The system stays current without manual intervention.
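The change-detection logic is a few lines. A sketch of the mtime comparison, assuming the last-indexed mtimes have been loaded from the index database into a dict (the real storage schema varies):

```python
import os

def files_needing_reindex(paths, stored_mtimes):
    """Return the files whose on-disk mtime differs from the last-indexed mtime.

    stored_mtimes: {path: mtime} as recorded at last index time — a stand-in
    for the real index metadata table."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if stored_mtimes.get(path) != mtime:
            changed.append(path)
    return changed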

The architecture scales from 200 to 20,000+ notes. The same three-layer design (intake, retrieval, integration) works at any vault size. Start with BM25-only search over a small vault. Add vector search when keyword collisions become a problem. Add RRF fusion when you need both exact and semantic matches. Each layer is independently useful and independently removable.


How to Use This Guide

This guide covers the complete system. Your starting point depends on where you are:

| You are… | Start here | Then explore |
| --- | --- | --- |
| New to Obsidian + AI | Why Obsidian for AI Infrastructure, Quick Start | Vault Architecture, MCP Server Architecture |
| Existing vault, want AI access | MCP Server Architecture, Claude Code Integration | Embedding Models, Full-Text Search |
| Building a retrieval system | The Complete Retrieval Pipeline, Reciprocal Rank Fusion | Performance Tuning, Troubleshooting |
| Team or enterprise context | Decision Framework, Knowledge Graph Patterns | Developer Workflow Recipes, Migration Guide |

Sections marked Contract include implementation details, configuration blocks, and failure modes. Sections marked Narrative focus on concepts, architecture decisions, and the reasoning behind design choices. Sections marked Recipe provide step-by-step workflows.


Why Obsidian for AI Infrastructure

The thesis of this guide: Obsidian vaults are the best substrate for personal AI knowledge bases because they are local-first, plaintext, graph-structured, and the user controls every layer of the stack.

What Obsidian gives AI that alternatives do not

Plaintext markdown files. Every note is a .md file on your filesystem. No proprietary format, no database export, no API required to read the content. Any tool that reads files can read your vault. grep, ripgrep, Python’s pathlib, SQLite FTS5 — they all work directly on the source files. When you build a retrieval system, you are indexing files, not API responses. The index is always consistent with the source because the source is the file system.

Local-first architecture. The vault lives on your machine. No server, no cloud sync dependency, no API rate limits, no terms of service governing how you process your own content. You can embed, index, chunk, and search your notes without any external service. This matters for AI infrastructure because the retrieval pipeline runs as fast as your disk allows, not as fast as an API endpoint responds. It also matters for privacy: personal notes containing credentials, health data, financial information, and private reflections never leave your machine.

Graph structure through wiki-links. Obsidian’s [[wiki-link]] syntax creates a directed graph across notes. A note about OAuth implementation links to notes about token rotation, session management, and API security. The graph structure encodes human-curated relationships between concepts. Vector embeddings capture semantic similarity, but wiki-links capture intentional connections that the author made while thinking about the topic. The graph is a signal that embeddings cannot replicate.
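Extracting this graph signal is a one-regex job. A sketch of pulling wiki-link targets out of a note — a hypothetical helper, not Obsidian's own parser — handling the [[Target]], [[Target|alias]], and [[Target#Section]] forms:

```python
import re

# Capture the target; drop optional "#Section" anchors and "|alias" display text.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_links(markdown_text):
    """Return the note titles this note links to."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(markdown_text)]
```

Run over every note, this produces the edge list of the vault graph, which the retriever can use for link-based boosting or neighborhood expansion.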

Plugin ecosystem. Obsidian has 1,800+ community plugins. Dataview queries your vault like a database. Templater generates notes from templates with JavaScript logic. Git integration syncs your vault to a repository. Linter enforces formatting consistency. These plugins add structure to the vault without changing the underlying plaintext format. The retrieval system indexes the output of these plugins, not the plugins themselves.

5 million+ users. Obsidian has a large active community producing templates, workflows, plugins, and documentation. When you encounter a problem with vault organization or plugin configuration, someone has likely documented a solution. The community also produces Obsidian-adjacent tools: MCP servers, indexing scripts, publishing pipelines, and API wrappers.

What a filesystem alone does not give you

A directory of markdown files has the plaintext advantage but lacks three things that Obsidian adds:

  1. Bidirectional links. Obsidian tracks backlinks automatically. When you link from Note A to Note B, Note B shows that Note A references it. The graph panel visualizes connection clusters. This bidirectional awareness is metadata that a raw filesystem does not provide.

  2. Live preview with plugin rendering. Dataview queries, Mermaid diagrams, and callout blocks render in real-time. The writing experience is richer than a text editor while the storage format remains plaintext. You write and organize in a rich environment; the retrieval system indexes the raw markdown.

  3. Community infrastructure. Plugin discovery, theme marketplace, sync service (optional), publish service (optional), and a documentation ecosystem. You can replicate any individual feature with standalone tools, but Obsidian packages them into a coherent workflow.

What Obsidian does NOT do (and what you build)

Obsidian does not include retrieval infrastructure. It has basic search (full-text, filename, tag) but no embedding pipeline, no vector search, no fusion ranking, no MCP server, no credential filtering, no chunking strategy, and no integration hooks for external AI tools. This guide covers the infrastructure you build on top of Obsidian. The vault is the substrate. The retrieval pipeline, the MCP server, and the integration hooks are the infrastructure.

The architecture described here is markdown-first, not Obsidian-exclusive. If you use Logseq, Foam, Dendron, or a plain directory of markdown files, the retrieval pipeline works identically. The chunker reads .md files. The embedder processes text strings. The indexer writes to SQLite. None of these components depend on Obsidian-specific features. Obsidian’s contribution is the writing and organizational environment that produces the markdown files the retriever indexes.


Quick Start: First AI-Connected Vault

This section gets a vault connected to an AI tool in five minutes. You will install Obsidian, create a vault, install an MCP server, and run your first query. The quick start uses a community MCP server for immediate results. Later sections cover building a custom retrieval pipeline for production use.

Prerequisites

  • macOS, Linux, or Windows
  • Node.js 18+ (for MCP server)
  • Claude Code, Codex CLI, or Cursor installed

Step 1: Create a vault

Download Obsidian from obsidian.md and create a new vault. Choose a location you will remember — the MCP server needs the absolute path.

# Example vault location
~/Documents/knowledge-base/

Add a few notes to give the retriever something to work with. Even 10-20 notes are enough to see results. Each note should be a .md file with a meaningful title and at least one paragraph of content.

Step 2: Install an MCP server

The obsidian-mcp-server community package provides immediate vault access. Install it:

npm install -g obsidian-mcp-server

Step 3: Configure your AI tool

Claude Code — add to ~/.claude/settings.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-server",
      "args": ["--vault", "/absolute/path/to/your/vault"]
    }
  }
}

Codex CLI — add to .codex/config.toml:

[mcp_servers.obsidian]
command = "obsidian-mcp-server"
args = ["--vault", "/absolute/path/to/your/vault"]

Cursor — add to .cursor/mcp.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-server",
      "args": ["--vault", "/absolute/path/to/your/vault"]
    }
  }
}

Step 4: Run your first query

Open your AI tool and ask a question that your vault notes can answer:

Search my Obsidian vault for notes about [topic you wrote about]

The AI tool calls the MCP server, which searches your vault and returns matching content. You should see results with file paths and relevant excerpts.

What you just built

You connected a local knowledge base to an AI tool through a standard protocol. The MCP server reads your vault files, performs basic search, and returns results. This is the minimal viable version.

What this quick start does NOT give you:

  • Hybrid retrieval (BM25 + vector search + RRF fusion)
  • Embedding-based semantic search
  • Credential filtering
  • Incremental indexing
  • Hook-based automatic context injection

The rest of this guide covers building each of these capabilities. The quick start proves the concept. The full pipeline delivers production-quality retrieval.


Decision Framework: Obsidian vs Alternatives

Not every use case needs Obsidian. This section maps when Obsidian is the right substrate, when it is overkill, and when something else fits better.

Decision Tree

START: What is your primary content type?

├─ Structured data (tables, records, schemas)
   Use a database. SQLite, PostgreSQL, or a spreadsheet.
   Obsidian is for prose, not tabular data.

├─ Ephemeral context (current project, temporary notes)
   Use CLAUDE.md / AGENTS.md in the project repo.
   These travel with the code and reset per project.

├─ Team wiki (shared documentation, onboarding)
   Evaluate Notion, Confluence, or a shared git repo.
   Obsidian vaults are personal-first. Team sync is possible
    but not native.

└─ Growing personal knowledge corpus
   
   ├─ < 50 notes
      A folder of markdown files + grep is sufficient.
      Obsidian adds value mainly through the link graph,
       which needs density to be useful.
   
   ├─ 50 - 500 notes
      Obsidian adds value. Wiki-links create a navigable graph.
      BM25-only search (FTS5) is sufficient at this scale.
      Skip vector search and RRF until keyword collisions appear.
   
   ├─ 500 - 5,000 notes
      Full hybrid retrieval becomes valuable. Keyword collisions
       increase. Semantic search catches queries that BM25 misses.
      Add vector search + RRF fusion at this scale.
   
   └─ 5,000+ notes
       Full pipeline is essential. BM25-only returns too much noise.
       Credential filtering becomes critical (more notes = more
        accidentally pasted secrets).
       Incremental indexing matters (full reindex takes minutes).
       MCP integration pays dividends on every AI interaction.

Comparison Matrix

| Criterion | Obsidian | Notion | Apple Notes | Plain Filesystem | CLAUDE.md |
| --- | --- | --- | --- | --- | --- |
| Local-first | Yes | No (cloud) | Partial (iCloud) | Yes | Yes |
| Plaintext | Yes (markdown) | No (blocks) | No (proprietary) | Yes | Yes |
| Graph structure | Yes (wiki-links) | Partial (mentions) | No | No | No |
| AI indexable | Direct file access | API required | Export required | Direct file access | Already in context |
| Plugin ecosystem | 1,800+ plugins | Integrations | None | N/A | N/A |
| Offline capable | Full | Read-only cached | Partial | Full | Full |
| Scales to 10K+ notes | Yes | Yes (with API) | Degrades | Yes | No (single file) |
| Cost | Free (core) | $10/mo+ | Free | Free | Free |

When Obsidian is overkill

  • Single-project context. If the AI only needs context about the current codebase, put it in CLAUDE.md, AGENTS.md, or project-level documentation. These files travel with the repo and are automatically loaded.
  • Structured data. If the content is tables, records, or schemas, use a database. Obsidian notes are prose-first. Dataview can query frontmatter fields, but a real database handles structured queries better.
  • Temporary research. If the notes will be discarded after the project ends, a scratch directory with markdown files is simpler. Do not build retrieval infrastructure for ephemeral content.

When Obsidian is the right choice

  • Accumulating knowledge over months or years. The value compounds as the corpus grows. A 200-note vault queried daily for six months provides more value than a 5,000-note vault queried once.
  • Multiple domains in one corpus. A vault containing notes on programming, architecture, security, design, and personal projects benefits from cross-domain retrieval that a project-specific CLAUDE.md cannot provide.
  • Privacy-sensitive content. Local-first means the retrieval pipeline never sends content to external services. The vault contains whatever you put in it, including content you would not upload to a cloud service.

Mental Model: Three Layers

The system has three layers that operate independently but compound when combined. Each layer has a different concern and a different failure mode.

┌─────────────────────────────────────────────────────┐
│                  INTEGRATION LAYER                  │
│  MCP servers, hooks, skills, context injection      │
│  Concern: delivering context to AI tools            │
│  Failure: wrong context, too much context, stale    │
└──────────────────────┬──────────────────────────────┘
                       │ query + ranked results
┌──────────────────────┴──────────────────────────────┐
│                   RETRIEVAL LAYER                   │
│  BM25, vector KNN, RRF fusion, token budget         │
│  Concern: finding the right content for any query   │
│ Failure: wrong ranking, missed results, slow queries│
└──────────────────────┬──────────────────────────────┘
                       │ chunked, embedded, indexed
┌──────────────────────┴──────────────────────────────┐
│                    INTAKE LAYER                     │
│  Note creation, signal triage, vault organization   │
│  Concern: what enters the vault and how it's stored │
│  Failure: noise, duplicates, missing structure      │
└─────────────────────────────────────────────────────┘

Intake determines what enters the vault. Without curation, the vault accumulates noise: screenshots of tweets, copy-pasted articles with no annotation, half-finished thoughts with no context. The intake layer is responsible for quality control at the point of entry. A scoring pipeline, tagging convention, or manual review process — any mechanism that ensures the vault contains content worth retrieving.

Retrieval makes the vault queryable. This is the engine: chunking notes into search units, embedding chunks into vector space, indexing for keyword and semantic search, fusing results with RRF. The retrieval layer transforms a directory of files into a queryable knowledge base. Without this layer, the vault is navigable through manual browsing and basic search but not programmatically accessible to AI tools.

Integration connects the retrieval layer to AI tools. An MCP server exposes retrieval as a callable tool. Hooks inject context automatically. Skills capture new knowledge back into the vault. The integration layer is the interface between the knowledge base and the AI agents that consume it.

The layers are decoupled by design. The intake scoring pipeline knows nothing about embeddings. The retriever knows nothing about signal routing rules. The MCP server knows nothing about how notes were created. This decoupling means you can improve any layer independently. Replace the embedding model without changing the intake pipeline. Add a new MCP capability without modifying the retriever. Change the signal scoring heuristics without touching the index.


Vault Architecture for AI Consumption

A vault optimized for AI retrieval follows different conventions than a vault optimized for personal browsing. This section covers folder structure, note schema, frontmatter conventions, and the specific patterns that improve retrieval quality.

Folder Structure

Use numbered prefixes for top-level folders to create a predictable organizational hierarchy. The numbers do not imply priority — they group related domains and make the structure scannable.

vault/
├── 00-inbox/              # Unsorted captures, pending triage
├── 01-projects/           # Active project notes
├── 02-areas/              # Ongoing areas of responsibility
├── 03-resources/          # Reference material by topic
│   ├── programming/
│   ├── security/
│   ├── ai-engineering/
│   ├── design/
│   └── devops/
├── 04-archive/            # Completed projects, old references
├── 05-signals/            # Scored signal intake
│   ├── ai-tooling/
│   ├── security/
│   ├── systems/
│   └── ...12 domain folders
├── 06-daily/              # Daily notes (if used)
├── 07-templates/          # Note templates (excluded from index)
├── 08-attachments/        # Images, PDFs (excluded from index)
├── .obsidian/             # Obsidian config (excluded from index)
└── .indexignore           # Paths to exclude from retrieval index

Folders that should be indexed: Everything containing markdown prose — projects, areas, resources, signals, daily notes.

Folders that should be excluded from indexing: Templates (they contain placeholder variables, not content), attachments (binary files), Obsidian configuration, and any folder containing sensitive content you do not want in the retrieval index.

The .indexignore file

Create a .indexignore file at the vault root to explicitly exclude paths from the retrieval index. The syntax matches .gitignore:

# Obsidian internal
.obsidian/

# Templates contain placeholders, not content
07-templates/

# Binary attachments
08-attachments/

# Personal health/medical notes
02-areas/health/

# Financial records
02-areas/finance/personal/

# Career documents (resumes, salary data)
02-areas/career/private/

The indexer reads this file before scanning and skips matching paths entirely. Files in excluded paths are never chunked, never embedded, and never appear in search results.
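A sketch of that skip logic, using a simplified subset of .gitignore semantics — directory prefixes plus fnmatch globs. Real .gitignore matching has more rules (negation, anchoring) than this:

```python
import fnmatch

def load_ignore_patterns(ignore_file_text):
    """Parse .indexignore text: skip blank lines and # comments."""
    patterns = []
    for line in ignore_file_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns

def is_ignored(rel_path, patterns):
    """True if the vault-relative path falls under any ignored pattern."""
    for pat in patterns:
        # Directory prefix match: ".obsidian" excludes ".obsidian/app.json"
        if rel_path == pat or rel_path.startswith(pat + "/"):
            return True
        # Glob match for patterns like "*.excalidraw.md"
        if fnmatch.fnmatch(rel_path, pat):
            return True
    return False
```

The indexer applies this check to every candidate path before chunking, so excluded content never reaches the embedder or the FTS index.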

Note Schema

Every note should have YAML frontmatter. The retriever uses frontmatter fields for filtering and context enrichment:

---
title: "OAuth Token Rotation Patterns"
type: note           # note | signal | project | moc | daily
domain: security     # primary domain for routing
tags:
  - authentication
  - oauth
  - token-management
created: 2026-01-15
updated: 2026-02-28
source: ""           # URL if captured from external source
status: active       # active | archived | draft
---

Required fields for retrieval:

  • title — Used in search result display and heading context for BM25
  • type — Enables type-filtered queries (“show me only MOCs” or “only signals”)
  • tags — Indexed in FTS5 heading context with 0.3 weight, providing keyword matches even when the body uses different terminology

Optional but valuable fields:

  • domain — Enables domain-scoped queries (“search security notes only”)
  • source — Attribution for captured content; the retriever can include source URLs in results
  • status — Allows excluding archived or draft notes from active search
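A sketch of the frontmatter split the indexer performs before chunking. A production pipeline would hand the YAML block to a real parser (e.g. PyYAML); this stdlib-only version handles only the delimiters and flat key: value fields:

```python
def split_frontmatter(text):
    """Split a note into (frontmatter_lines, body) at the --- delimiters."""
    if not text.startswith("---\n"):
        return [], text
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:  # opening delimiter with no closing one: treat as plain body
        return [], text
    return head.splitlines(), body.lstrip("\n")

def scalar_fields(frontmatter_lines):
    """Pull flat `key: value` fields (title, type, domain, status, ...).

    Indented lines and list items (tag entries) are skipped here."""
    fields = {}
    for line in frontmatter_lines:
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields
```

The extracted fields feed retrieval filtering (type, domain, status) and result display (title); the body goes on to the chunker.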

Chunking Conventions

The retriever chunks at H2 (##) heading boundaries. This means your note structure directly affects retrieval granularity:

Good for retrieval:

## Token Rotation Strategy

The rotation interval depends on the threat model...

## Implementation with refresh_token

The OAuth 2.0 refresh token flow requires...

## Error Handling: Expired Tokens

When a token expires mid-request...

Three H2 sections produce three independently searchable chunks. Each chunk has enough context for the embedding to capture its meaning. A query about “expired token handling” matches the third chunk specifically.

Poor for retrieval:

# OAuth Notes

Token rotation depends on threat model. The OAuth 2.0 refresh
token flow requires storing the refresh token securely. When a
token expires mid-request, the client should retry after refresh.
The rotation interval is typically 15-30 minutes for access tokens
and 7-30 days for refresh tokens...

One long section with no H2 headings produces one large chunk. The embedding averages across all topics in the section. A query about any subtopic matches the entire note equally.

Rule of thumb: If a section covers more than one concept, split it into H2 subsections. The chunker handles the rest.
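The chunker itself is a short loop. A simplified sketch of H2-boundary chunking — the real pipeline also attaches heading context and enforces size limits:

```python
def chunk_by_h2(markdown_text):
    """Split a note at H2 (##) boundaries into (heading, text) chunks.

    Text before the first H2 (title, intro) becomes a preamble chunk."""
    chunks = []
    current_heading, current_lines = "(preamble)", []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if any(l.strip() for l in current_lines):
                chunks.append((current_heading, "\n".join(current_lines).strip()))
            current_heading, current_lines = line[3:].strip(), []
        else:
            current_lines.append(line)
    if any(l.strip() for l in current_lines):
        chunks.append((current_heading, "\n".join(current_lines).strip()))
    return chunks
```

Applied to the "good" example above, this yields three chunks with headings "Token Rotation Strategy", "Implementation with refresh_token", and "Error Handling: Expired Tokens"; applied to the "poor" example, one undifferentiated chunk.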

What Not to Put in Notes

Content that degrades retrieval quality:

  • Raw copy-pastes of entire articles without annotation. The retriever indexes the original article’s keywords, diluting your vault with content you did not write. Add a summary, extract key points, or link to the source URL instead.
  • Screenshots without text description. The retriever indexes markdown text. An image without alt text or surrounding description is invisible to both BM25 and vector search.
  • Credential strings. API keys, tokens, passwords, connection strings. Even with credential filtering, the safest approach is to never paste secrets into notes. Reference them by name (“the Cloudflare API token in ~/.env”) instead.
  • Auto-generated content without curation. If a tool generates a note (meeting transcript, Readwise highlights, RSS import), review and annotate it before it enters the permanent vault. Uncurated auto-imports add volume without adding retrievable value.

Plugin Ecosystem for AI Workflows

Obsidian plugins that improve vault quality for AI retrieval fall into three categories: structural (enforce consistency), querying (expose metadata), and sync (keep the vault current).

Essential Plugins

Dataview. Queries your vault like a database using frontmatter fields. Create dynamic indexes: “all notes tagged security updated in the last 30 days” or “all project notes with status active.” Dataview does not directly help retrieval, but it helps you identify gaps in your vault’s coverage and find notes that need updating.

TABLE type, domain, updated
FROM "03-resources"
WHERE status = "active"
SORT updated DESC
LIMIT 20

Templater. Creates notes from templates with dynamic fields. Ensure every new note starts with correct frontmatter by using a template that pre-fills created, type, and domain fields. Consistent frontmatter improves retrieval filtering.

<%* /* New Resource Note Template */ %>
---
title: "<% tp.file.cursor() %>"
type: note
domain: <% tp.system.suggester(["programming", "security", "ai-engineering", "design", "devops"], ["programming", "security", "ai-engineering", "design", "devops"]) %>
tags: []
created: <% tp.date.now("YYYY-MM-DD") %>
updated: <% tp.date.now("YYYY-MM-DD") %>
source: ""
status: active
---

## Key Points

## Details

## References

Linter. Enforces formatting rules across the vault. Consistent heading hierarchy (H1 for title, H2 for sections, H3 for subsections) ensures the chunker produces predictable results. Linter rules that matter for retrieval:

  • Heading increment: enforce sequential heading levels (no jumping from H1 to H3)
  • YAML title: match the filename
  • Trailing spaces: remove (avoids FTS5 tokenization artifacts)
  • Consecutive blank lines: limit to 1 (cleaner chunks)

Git integration. Version control for your vault. Track changes over time, sync between machines, and recover from accidental deletions. Git also provides mtime data that the indexer uses for incremental change detection.

Plugins That Help Indexing

Smart Connections. An Obsidian plugin that provides AI-powered semantic search within Obsidian itself. It creates its own embedding index. While the retrieval system in this guide is external to Obsidian (runs as a Python pipeline), Smart Connections is useful for exploring semantic relationships while writing. The two systems index the same content but serve different use cases: Smart Connections for in-editor discovery, the external retriever for AI tool integration.

Metadata Menu. Provides structured frontmatter editing with autocomplete for field values. Reduces typos in type, domain, and tags fields. Consistent metadata improves retrieval filtering accuracy.

Plugins That Hurt Indexing

Excalidraw. Stores drawings as JSON embedded in markdown files. The JSON is syntactically valid markdown but produces garbage when chunked and embedded. Exclude Excalidraw files from the index via .indexignore or filter by file extension.

Kanban. Stores board state as specially-formatted markdown. The format is designed for Kanban rendering, not for prose retrieval. The chunker produces fragments of card titles and metadata that do not embed well. Exclude Kanban boards from the index.

Calendar. Creates daily notes with minimal content (often just a date header). Empty or near-empty notes produce low-quality chunks. If you use daily notes, write substantive content in them or exclude the daily notes folder from the index.
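A sketch of an extension-based filter the indexer can apply before chunking. The .excalidraw.md suffix is Excalidraw's default naming; adjust the rules to your own plugin settings:

```python
def is_indexable(path):
    """Skip non-prose files before chunking (rules are illustrative)."""
    if not path.endswith(".md"):
        return False            # binary attachments, JSON, images
    if path.endswith(".excalidraw.md"):
        return False            # Excalidraw stores drawing JSON inside markdown
    return True
```

Kanban boards and near-empty daily notes are harder to detect by extension alone; for those, path-based exclusion via .indexignore is the more reliable mechanism.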

Plugin Configuration That Matters

File recovery → Enabled. Protects against accidental note deletion. Not directly related to retrieval but critical for a knowledge base you depend on.

Strict line breaks → Disabled. Markdown-standard line breaks (double newline for paragraph) produce cleaner chunks than Obsidian’s strict mode (single newline for <br>).

Default new file location → Designated folder. Route new files to 00-inbox/ so uncategorized notes do not pollute domain folders. The inbox is a staging area; files move to domain folders after triage.

Wiki-link format → Shortest path when possible. Shorter link targets are easier for the retriever to resolve when indexing link structure.


Embedding Models: Choosing and Configuring

The embedding model converts text chunks into numerical vectors for semantic search. The model choice determines retrieval quality, index size, embedding speed, and runtime dependencies. This section explains why Model2Vec’s potion-base-8M is the default choice and when to choose alternatives.

Why Model2Vec potion-base-8M

Model: minishlab/potion-base-8M
Parameters: 7.6 million
Dimensions: 256
Size: ~30 MB
Dependencies: model2vec (numpy only, no PyTorch)
Inference: CPU-only, static word embeddings (no attention layers)

Model2Vec distills a sentence transformer’s knowledge into static token embeddings. Instead of running attention layers over the input (as BERT, MiniLM, and other transformer models do), Model2Vec produces vectors through weighted averaging of pre-computed token embeddings.3 The practical consequence: embedding speed is 50-500x faster than transformer-based models, because inference is a table lookup and an average rather than a forward pass through attention layers.

On the MTEB benchmark suite, potion-base-8M achieves 89% of all-MiniLM-L6-v2’s performance (50.03 vs 56.09 average).4 The 11% quality gap is the trade-off for the speed and simplicity advantages. For short markdown chunks (average 200-400 words in a typical vault), the quality difference is less pronounced than on longer documents because both models converge on similar representations for short, focused text.
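A toy illustration of why static embeddings are this fast: inference is a vocabulary lookup plus an average, with no attention pass over the sequence. The three-dimensional vocabulary below is invented for illustration, and Model2Vec's actual averaging is weighted (and its vectors are 256-dimensional), not a plain mean:

```python
# Hypothetical 3-dim token vocabulary, standing in for the real 256-dim one.
TOKEN_VECTORS = {
    "token":    [0.9, 0.1, 0.0],
    "rotation": [0.7, 0.3, 0.1],
    "oauth":    [0.8, 0.2, 0.2],
}

def embed(text):
    """Text embedding = mean of its known token vectors. No attention, no GPU."""
    vecs = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vecs:
        return [0.0] * 3
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]
```

Because each token's vector is pre-computed, cost scales linearly with token count and nothing depends on sequence length interactions — the property that makes a full-vault reindex take minutes instead of hours.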

Configuration

# embedder.py
DEFAULT_MODEL = "minishlab/potion-base-8M"
EMBEDDING_DIM = 256

class Model2VecEmbedder:
    def __init__(self, model_name=DEFAULT_MODEL):
        self._model_name = model_name
        self._model = None

    def _ensure_model(self):
        if self._model is not None:
            return
        _activate_venv()  # Add isolated venv to sys.path
        from model2vec import StaticModel
        self._model = StaticModel.from_pretrained(self._model_name)

    def embed_batch(self, texts):
        self._ensure_model()
        vecs = self._model.encode(texts)
        return [v.tolist() for v in vecs]

Lazy loading. The model loads on first use, not at import time. Importing the embedder module costs nothing when the retriever operates in BM25-only fallback mode (e.g., when the embedding venv is not installed).

Isolated virtual environment. The model runs in a dedicated venv (e.g., ~/.claude/venvs/memory/) to avoid dependency conflicts with the rest of the toolchain. The _activate_venv() function adds the venv’s site-packages to sys.path at runtime.

# Create isolated venv
python3 -m venv ~/.claude/venvs/memory
~/.claude/venvs/memory/bin/pip install model2vec

Batch processing. The embedder processes texts in batches of 64 to amortize Model2Vec’s overhead. The indexer feeds chunks to embed_batch() rather than embedding one chunk at a time.
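The batching helper is trivial but worth showing, since it is the difference between 49,746 individual model invocations and roughly 780 batched calls:

```python
def batched(items, batch_size=64):
    """Yield fixed-size batches; the indexer feeds each one to embed_batch()."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Usage: `for batch in batched(chunk_texts): vectors.extend(embedder.embed_batch(batch))`.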

When to Choose Alternatives

| Model | Dim | Size | Speed | Quality (MTEB) | Best for |
| --- | --- | --- | --- | --- | --- |
| potion-base-8M | 256 | 30 MB | 500x | 50.03 | Default: local, fast, no GPU |
| all-MiniLM-L6-v2 | 384 | 80 MB | 1x | 56.09 | Higher quality, still local |
| nomic-embed-text-v1.5 | 768 | 270 MB | 0.5x | 62.28 | Best local quality |
| text-embedding-3-small | 1536 | API | N/A | 62.30 | API-based, highest quality |

Choose all-MiniLM-L6-v2 when retrieval quality matters more than speed and you have PyTorch installed. The 384-dimensional vectors increase the SQLite database size by ~50% compared to 256-dim vectors. Embedding speed drops from <1 minute to ~10 minutes for a full reindex of 15,000 files on M-series hardware.

Choose nomic-embed-text-v1.5 when you need the best possible local retrieval quality and accept slower indexing. The 768-dimensional vectors roughly triple the database size. Requires PyTorch and a modern CPU or GPU.

Choose text-embedding-3-small when network latency and privacy are acceptable trade-offs. The API produces the highest-quality embeddings but introduces a cloud dependency, per-token cost ($0.02/million tokens), and sends your content to OpenAI’s servers.

Stay with potion-base-8M in all other cases. The speed advantage is critical for iterative indexing (reindex during development), the numpy-only dependency avoids PyTorch installation complexity, and the 256-dimensional vectors keep the database compact.

Model Hash Tracking

The indexer stores a hash derived from the model name and vocabulary size. If you change the embedding model, the indexer detects the mismatch on the next incremental run and triggers a full reindex automatically.

import hashlib

def _compute_model_hash(self):
    """Hash model name + vocab size for compatibility tracking."""
    key = f"{self._model_name}:{self._model.vocab_size}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

This prevents mixing vectors from different models in the same database, which would produce nonsensical cosine similarity scores.

Failure Modes

Model download failure. The first run downloads the model from Hugging Face. If the download fails (network issue, corporate firewall), the retriever falls back to BM25-only mode. The model is cached locally after the first download.

Dimension mismatch. If you switch models without clearing the database, the stored vectors have a different dimension than new embeddings. The indexer detects this via the model hash and triggers a full reindex. If the hash check fails (custom model without proper hash), sqlite-vec will error on KNN queries with mismatched dimensions.

Memory pressure on large vaults. Embedding 50,000+ chunks in a single batch can consume significant memory. The indexer processes in batches of 64 to limit peak memory usage. If memory is still an issue, reduce the batch size.


Full-Text Search with FTS5

SQLite’s FTS5 extension provides full-text search with BM25 ranking. FTS5 is the keyword search component of the hybrid retrieval pipeline. This section covers the FTS5 configuration, when BM25 excels, and its specific failure modes.

FTS5 Virtual Table

CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text,
    section,
    heading_context,
    content=chunks,
    content_rowid=id
);

Content-sync mode. The content=chunks parameter tells FTS5 to reference the chunks table directly rather than storing a duplicate copy of the text. This halves the storage requirement but means FTS5 must be manually synced when chunks are inserted, updated, or deleted.

Columns. Three columns are indexed:

  • chunk_text — The primary content of each chunk (BM25 weight: 1.0)
  • section — The H2 heading text (BM25 weight: 0.5)
  • heading_context — Note title, tags, and metadata (BM25 weight: 0.3)

BM25 Ranking

BM25 ranks documents by term frequency, inverse document frequency, and document length normalization. The bm25() auxiliary function in FTS5 accepts per-column weights:

SELECT
    c.id, c.file_path, c.section, c.chunk_text,
    bm25(chunks_fts, 1.0, 0.5, 0.3) AS score
FROM chunks_fts
JOIN chunks c ON chunks_fts.rowid = c.id
WHERE chunks_fts MATCH ?
ORDER BY score
LIMIT 30;

The column weights (1.0, 0.5, 0.3) mean:

  • A keyword match in chunk_text contributes the most to the score
  • A match in section (heading) contributes half as much
  • A match in heading_context (title, tags) contributes 30% as much

Note that FTS5's bm25() function returns more-negative values for better matches, which is why the query orders by score ascending.

These weights are tunable. If your vault has descriptive headings that strongly predict content quality, increase the section weight. If your tags are comprehensive and accurate, increase the heading_context weight.

When BM25 Wins

BM25 excels at queries containing exact identifiers:

  • Function names: _rrf_fuse, embed_batch, get_stale_files
  • CLI flags: --incremental, --vault, --model
  • Configuration keys: bm25_weight, max_tokens, batch_size
  • Error messages: SQLITE_LOCKED, ConnectionRefusedError
  • Specific terms of art: PostToolUse, PreToolUse, AGENTS.md

For these queries, BM25 finds the exact match immediately. Vector search would return semantically related content but might rank the exact match lower than a conceptual discussion.

When BM25 Fails

BM25 fails at queries that use different terminology than the stored content:

  • Query: “how to handle authentication failures” → Vault contains notes about “login error recovery” and “session expiration handling.” BM25 does not match because the keywords differ.
  • Query: “what is the best way to manage state” → Vault contains notes about “Redux store patterns” and “context providers.” BM25 misses because “state management” is expressed through specific technology names.

BM25 also fails with keyword collision at scale. In a 15,000-file vault, a search for “configuration” matches hundreds of notes because nearly every project note mentions configuration. The results are technically correct but practically useless — the ranking cannot determine which “configuration” note is relevant to the current query.

FTS5 Tokenizer

FTS5 uses the unicode61 tokenizer by default, which handles ASCII and Unicode text. For vaults with significant CJK (Chinese, Japanese, Korean) content, consider the trigram tokenizer:

-- For CJK-heavy vaults
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text, section, heading_context,
    content=chunks, content_rowid=id,
    tokenize='trigram'
);

The default unicode61 tokenizer splits on word boundaries, which works poorly for languages without spaces between words. The trigram tokenizer splits every three characters, enabling substring matching at the cost of index size (roughly 3x larger).

Maintenance

FTS5 requires explicit sync when the underlying chunks table changes:

# After inserting chunks
cursor.execute("""
    INSERT INTO chunks_fts(chunks_fts)
    VALUES('rebuild')
""")

The rebuild command reconstructs the FTS5 index from the content table. Run it after bulk inserts (full reindex) but not after individual incremental updates — for those, use INSERT INTO chunks_fts(rowid, chunk_text, section, heading_context) to sync individual rows.
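A sketch of that row-level sync, runnable against an in-memory database (illustrative values; requires an SQLite build with FTS5, which most Python distributions include):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        file_path TEXT, section TEXT, chunk_text TEXT,
        heading_context TEXT DEFAULT ''
    );
    CREATE VIRTUAL TABLE chunks_fts USING fts5(
        chunk_text, section, heading_context,
        content=chunks, content_rowid=id
    );
""")

# Insert a chunk, then mirror it into the FTS index under the same rowid.
db.execute(
    "INSERT INTO chunks (file_path, section, chunk_text) VALUES (?, ?, ?)",
    ("notes/rrf.md", "Fusion", "Reciprocal Rank Fusion merges ranked lists."),
)
rowid = db.execute("SELECT last_insert_rowid()").fetchone()[0]
db.execute(
    "INSERT INTO chunks_fts (rowid, chunk_text, section, heading_context) "
    "SELECT id, chunk_text, section, heading_context FROM chunks WHERE id = ?",
    (rowid,),
)

hits = db.execute(
    "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH 'fusion'"
).fetchall()
```

Skipping this mirror step is the classic content-sync bug: the chunk exists in chunks but never appears in BM25 results.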


Vector Search with sqlite-vec

The sqlite-vec extension brings vector KNN (K-Nearest Neighbors) search into SQLite. This section covers the sqlite-vec configuration, the embedding pipeline from note to searchable vector, and the specific query patterns.

sqlite-vec Virtual Table

CREATE VIRTUAL TABLE chunk_vecs USING vec0(
    id INTEGER PRIMARY KEY,
    embedding float[256]
);

The vec0 module stores 256-dimensional float vectors as packed binary data. The id column maps 1:1 to the chunks table, enabling joins between vector results and chunk metadata.

Embedding Pipeline

The pipeline flows from note to searchable vector:

Note (.md file)
   Chunker: split at H2 boundaries
     Chunks (30-2000 chars each)
       Credential filter: scrub secrets
         Embedder: Model2Vec encode
           Vectors (256-dim float arrays)
             sqlite-vec: store as packed binary
               Ready for KNN queries
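The chunking step above can be sketched as a regex split at H2 boundaries, using the 30-2000 character bounds from the diagram (helper name is illustrative):

```python
import re

def chunk_by_h2(markdown, min_chars=30, max_chars=2000):
    """Split a note at H2 headings; drop tiny fragments, cap long sections."""
    sections = re.split(r"(?m)^(?=## )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        if len(section) < min_chars:
            continue  # too small to embed usefully
        # Hard-cap oversized sections so no chunk exceeds max_chars
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```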

Vector Serialization

Python’s struct module serializes float vectors for sqlite-vec storage:

import struct

def _serialize_vector(vec):
    """Pack float list into binary for sqlite-vec."""
    return struct.pack(f"{len(vec)}f", *vec)

def _deserialize_vector(blob, dim=256):
    """Unpack binary blob to float list."""
    return list(struct.unpack(f"{dim}f", blob))

KNN Query

A vector search query embeds the input query, then finds the K nearest chunks by cosine distance:

def _vector_search(self, query_text, limit=30):
    query_vec = self.embedder.embed_batch([query_text])[0]
    packed = _serialize_vector(query_vec)

    results = self.db.execute("""
        SELECT
            cv.id,
            cv.distance,
            c.file_path,
            c.section,
            c.chunk_text
        FROM chunk_vecs cv
        JOIN chunks c ON cv.id = c.id
        WHERE embedding MATCH ?
            AND k = ?
        ORDER BY distance
    """, [packed, limit]).fetchall()

    return results

The MATCH operator in sqlite-vec performs a KNN scan; current vec0 releases use brute-force exact search rather than an approximate index, which is fast enough at this scale. The k constraint controls how many results to return. The distance column contains the cosine distance (0 = identical, 2 = opposite); note that vec0 defaults to L2 distance unless the embedding column is declared with distance_metric=cosine.

When Vector Search Wins

Vector search excels at queries where the concept matters more than the specific words:

  • Query: “how to handle authentication failures” → Finds notes about “login error recovery” (same semantic space, different keywords)
  • Query: “what patterns exist for caching” → Finds notes about “memoization,” “Redis TTL strategies,” and “HTTP cache headers” (related concepts, diverse terminology)
  • Query: “approaches to testing asynchronous code” → Finds notes about “pytest-asyncio fixtures,” “mock event loops,” and “async test patterns” (same concept expressed through implementation details)

When Vector Search Fails

Vector search struggles with exact identifiers:

  • Query: _rrf_fuse → Returns notes about “fusion algorithms” and “rank merging” but may rank the actual function definition lower than conceptual discussions
  • Query: PostToolUse → Returns notes about “tool lifecycle hooks” and “post-execution handlers” rather than the specific hook name

Vector search also struggles with structured data. JSON configuration files, YAML blocks, and code snippets produce embeddings that capture structural patterns rather than semantic meaning. A JSON file with "review": true embeds differently than a prose discussion of code review.

Graceful Degradation

If sqlite-vec fails to load (missing extension, incompatible platform, corrupted library), the retriever falls back to BM25-only search:

class VectorIndex:
    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self._vec_available = False
        try:
            self.db.enable_load_extension(True)
            self.db.load_extension("vec0")
            self._vec_available = True
        except Exception:
            pass  # BM25-only mode

    @property
    def vec_available(self):
        return self._vec_available

The retriever checks vec_available before attempting vector queries. When disabled, all searches use BM25 only, and the RRF fusion step is skipped.


Reciprocal Rank Fusion (RRF)

RRF merges two ranked lists without requiring score calibration. This section covers the algorithm, a worked query trace, tuning the k parameter, and why RRF is chosen over alternatives. For an interactive calculator with editable ranks, scenario presets, and a visual architecture explorer, see the hybrid retriever deep dive.

The Algorithm

RRF assigns each document a score based only on its rank position in each list:

score(d) = Σ (weight_i / (k + rank_i))

Where:

  • k is a smoothing constant (60, following Cormack et al.1)
  • rank_i is the document’s 1-based rank in result list i
  • weight_i is an optional per-list multiplier (default 1.0)

Documents that rank well in multiple lists receive higher fused scores. Documents that appear in only one list receive a score from that single source.

Why RRF Over Alternatives

Weighted linear combination requires calibrating BM25 scores against cosine distances. BM25 scores are unbounded and scale with corpus size. Cosine distances are bounded [0, 2]. Combining them requires normalization, and the normalization parameters are dataset-dependent. RRF uses only rank positions, which are always integers starting at 1 regardless of the scoring method.

Learned fusion models require labeled training data — query-document relevance pairs. For a personal knowledge base, this training data does not exist. You would need to manually judge hundreds of query-document pairs to train a useful model. RRF works without any training data.

Condorcet voting methods (Borda count, Schulze method) are theoretically elegant but more complex to implement and tune. The original RRF paper demonstrated that RRF outperforms Condorcet methods on TREC evaluation data.1

Fusion in Practice

Query: “how does the review aggregator handle disagreements”

BM25 ranks review-aggregator.py at position 3 (exact keyword matches on “review,” “aggregator,” “disagreements”) but places two config files higher (they match “review” more prominently). Vector search ranks the same chunk at position 1 (semantic match on conflict resolution). After RRF fusion:

| Chunk | BM25 | Vec | Fused Score |
|---|---|---|---|
| review-aggregator.py “Disagreement Resolution” | #3 | #1 | 0.0323 |
| code-review-patterns.md “Multi-Reviewer” | #4 | #2 | 0.0318 |
| deliberation-config.json “Review Weights” | #1 | — | 0.0164 |
Chunks that rank well in both lists surface to the top. Chunks that only appear in one list get a single-source score and drop below dual-ranked results. The actual disagreement resolution logic wins because both methods found it — BM25 through keywords, vector search through semantics.

For the full step-by-step trace with per-rank RRF math, try different k values in the interactive RRF calculator.
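The fused scores in the table check out by hand with k = 60:

```python
K = 60

def rrf(bm25_rank, vec_rank, k=K):
    """Sum reciprocal-rank contributions; None marks absence from a list."""
    return sum(1.0 / (k + r) for r in (bm25_rank, vec_rank) if r is not None)

# review-aggregator.py: BM25 #3, vector #1
assert round(rrf(3, 1), 4) == 0.0323    # 1/63 + 1/61
# deliberation-config.json: BM25 #1 only
assert round(rrf(1, None), 4) == 0.0164  # 1/61
```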

Implementation

RRF_K = 60

def _rrf_fuse(self, bm25_results, vec_results,
              bm25_weight=1.0, vec_weight=1.0):
    """Fuse BM25 and vector results using Reciprocal Rank Fusion."""
    scores = {}

    for rank, r in enumerate(bm25_results, start=1):
        cid = r["id"]
        if cid not in scores:
            scores[cid] = {
                "rrf_score": 0.0,
                "file_path": r["file_path"],
                "section": r["section"],
                "chunk_text": r["chunk_text"],
                "bm25_rank": None,
                "vec_rank": None,
            }
        scores[cid]["rrf_score"] += bm25_weight / (self._rrf_k + rank)
        scores[cid]["bm25_rank"] = rank

    for rank, r in enumerate(vec_results, start=1):
        cid = r["id"]
        if cid not in scores:
            scores[cid] = {
                "rrf_score": 0.0,
                "file_path": r["file_path"],
                "section": r["section"],
                "chunk_text": r["chunk_text"],
                "bm25_rank": None,
                "vec_rank": None,
            }
        scores[cid]["rrf_score"] += vec_weight / (self._rrf_k + rank)
        scores[cid]["vec_rank"] = rank

    fused = sorted(
        scores.values(),
        key=lambda x: x["rrf_score"],
        reverse=True,
    )
    return fused

Tuning k

The k constant controls how much weight is given to top-ranked results versus lower-ranked results:

  • Lower k (e.g., 10): Top-ranked results dominate. Rank 1 scores 1/11 = 0.091, rank 10 scores 1/20 = 0.050 (1.8x difference). Good when you trust the individual rankers to get the top result right.
  • Default k (60): Balanced. Rank 1 scores 1/61 = 0.0164, rank 10 scores 1/70 = 0.0143 (1.15x difference). Rank differences are compressed, giving more weight to appearing in multiple lists.
  • Higher k (e.g., 200): Appearing in both lists matters much more than rank position. Rank 1 scores 1/201, rank 10 scores 1/210 — nearly identical. Use when the individual rankers produce noisy rankings but cross-list agreement is reliable.

Start with k=60. The original RRF paper found this value robust across diverse TREC datasets. Tune only after measuring failure cases on your own query distribution.
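The compression effect is easy to quantify: the ratio of a rank-1 contribution to a rank-10 contribution is (k + 10) / (k + 1):

```python
def rank_ratio(k, top=1, low=10):
    """How much more a rank-`top` hit contributes than a rank-`low` hit."""
    return (1 / (k + top)) / (1 / (k + low))

print(round(rank_ratio(10), 2))   # 1.82
print(round(rank_ratio(60), 2))   # 1.15
print(round(rank_ratio(200), 2))  # 1.04
```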

Tie-Breaking

When two chunks have identical RRF scores (rare but possible with the same rank in one list and no appearance in the other), break ties by:

  1. Prefer chunks that appear in both lists over chunks that appear in only one
  2. Among chunks in both lists, prefer the one with the lower combined rank
  3. Among chunks in only one list, prefer the one with the lower rank in that list
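These three rules compress into a single sort key. A sketch assuming the record fields produced by _rrf_fuse (the scores below are hypothetical tie values):

```python
def tie_break_key(r):
    """Order by RRF score desc, then both-lists first, then lower ranks."""
    in_both = r["bm25_rank"] is not None and r["vec_rank"] is not None
    combined_rank = (r["bm25_rank"] or 0) + (r["vec_rank"] or 0)
    return (-r["rrf_score"], not in_both, combined_rank)

# A hypothetical tie: one chunk in both lists, one in BM25 only
tied = [
    {"rrf_score": 0.0164, "bm25_rank": 1, "vec_rank": None},
    {"rrf_score": 0.0164, "bm25_rank": 2, "vec_rank": 10},
]
ranked = sorted(tied, key=tie_break_key)  # both-lists chunk sorts first
```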

The Complete Retrieval Pipeline

This section traces a query from input to output through the entire pipeline: BM25 search, vector search, RRF fusion, token budget truncation, and context assembly.

End-to-End Flow

User query: "PostToolUse hook for context compression"
  │
  ├─ BM25 Search (FTS5)
  │    → MATCH "PostToolUse hook context compression"
  │    → Top 30 results ranked by BM25 score
  │    → 12ms
  │
  ├─ Vector Search (sqlite-vec)
  │    → Embed query with Model2Vec
  │    → KNN k=30 on chunk_vecs
  │    → Top 30 results ranked by cosine distance
  │    → 8ms
  │
  └─ RRF Fusion
       → Merge 60 candidates (may overlap)
       → Score by rank position
       → Top 10 results
       → 3ms
       │
       └─ Token Budget
            → Truncate to max_tokens (default 4000)
            → Estimate at 4 chars per token
            → Return results with metadata
            → <1ms

Total latency: ~23ms for a 49,746-chunk database on Apple M3 Pro hardware.

The Search API

class HybridRetriever:
    def search(self, query, limit=10, max_tokens=4000,
               bm25_weight=1.0, vec_weight=1.0):
        """
        Search the vault using hybrid BM25 + vector retrieval.

        Args:
            query: Search query text
            limit: Maximum results to return
            max_tokens: Token budget for total result text
            bm25_weight: Weight for BM25 results in RRF
            vec_weight: Weight for vector results in RRF

        Returns:
            List of SearchResult with file_path, section,
            chunk_text, rrf_score, bm25_rank, vec_rank
        """
        # BM25 search
        bm25_results = self._bm25_search(query, limit=30)

        # Vector search (if available)
        if self.index.vec_available:
            vec_results = self._vector_search(query, limit=30)
            fused = self._rrf_fuse(
                bm25_results, vec_results,
                bm25_weight, vec_weight,
            )
        else:
            fused = bm25_results  # BM25-only fallback

        # Token budget truncation
        results = []
        token_count = 0
        for r in fused[:limit]:
            chunk_tokens = len(r["chunk_text"]) // 4
            if token_count + chunk_tokens > max_tokens:
                break
            results.append(r)
            token_count += chunk_tokens

        return results

Token Budget Truncation

The max_tokens parameter prevents the retriever from returning more context than the AI tool can use. The estimate uses 4 characters per token (a reasonable approximation for English prose). Results are truncated greedily: add results in ranked order until the budget is exhausted.

This is a conservative strategy. A more sophisticated approach would consider per-result quality scores and prefer shorter, higher-quality results over longer, lower-quality results. The greedy approach is simpler and works well in practice because RRF ranking already orders results by relevance.

Database Schema (Complete)

-- Chunk content and metadata
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    section TEXT NOT NULL,
    chunk_text TEXT NOT NULL,
    heading_context TEXT DEFAULT '',
    mtime_ns INTEGER NOT NULL,
    embedded_at REAL NOT NULL
);

CREATE INDEX idx_chunks_file ON chunks(file_path);
CREATE INDEX idx_chunks_mtime ON chunks(mtime_ns);

-- FTS5 for BM25 search (content-synced to chunks table)
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text, section, heading_context,
    content=chunks, content_rowid=id
);

-- sqlite-vec for vector KNN search
CREATE VIRTUAL TABLE chunk_vecs USING vec0(
    id INTEGER PRIMARY KEY,
    embedding float[256]
);

-- Model metadata for compatibility tracking
CREATE TABLE model_meta (
    key TEXT PRIMARY KEY,
    value TEXT
);

Graceful Degradation Path

Full pipeline:       BM25 + Vector + RRF    Best results
No sqlite-vec:       BM25 only              Good results (no semantic)
No model download:   BM25 only              Good results (no semantic)
No FTS5:             Vector only            Decent results (no keyword)
No database:         Error                  Prompt user to run indexer

The retriever checks capabilities at initialization and adapts its query strategy. A missing component degrades quality but does not cause errors. The only hard failure is a missing database file.

Production Stats

Measured on a vault of 16,894 files, 49,746 chunks, 83 MB SQLite database, Apple M3 Pro:

| Metric | Value |
|---|---|
| Total files | 16,894 |
| Total chunks | 49,746 |
| Database size | 83 MB |
| BM25 query latency (p50) | 12ms |
| Vector query latency (p50) | 8ms |
| RRF fusion latency | 3ms |
| End-to-end search latency (p50) | 23ms |
| Full reindex time | ~4 minutes |
| Incremental reindex time | <10 seconds |
| Embedding model | potion-base-8M (256-dim) |
| BM25 candidate pool | 30 |
| Vector candidate pool | 30 |
| Default result limit | 10 |
| Default token budget | 4,000 tokens |

Content Hashing and Change Detection

The indexer needs to know which files have changed since the last index run. This section covers the change detection mechanism and the hashing strategy.

File Modification Time Comparison

The indexer stores mtime_ns (file modification time in nanoseconds) for every chunk in the chunks table. On an incremental run, the indexer:

  1. Scans the vault for all .md files in allowed folders
  2. Reads the mtime_ns for each file from the filesystem
  3. Compares against the stored mtime_ns in the database
  4. Identifies three categories:
     • New files: path exists in filesystem but not in database
     • Changed files: path exists in both but mtime_ns differs
     • Deleted files: path exists in database but not in filesystem

def get_stale_files(self, vault_mtimes):
    """Find files whose mtime changed or are new."""
    stored = dict(self.db.execute(
        "SELECT DISTINCT file_path, mtime_ns FROM chunks"
    ).fetchall())

    stale = []
    for path, mtime in vault_mtimes.items():
        if path not in stored or stored[path] != mtime:
            stale.append(path)
    return stale

def get_deleted_files(self, vault_paths):
    """Find files in database that no longer exist in vault."""
    stored_paths = set(r[0] for r in self.db.execute(
        "SELECT DISTINCT file_path FROM chunks"
    ).fetchall())
    return stored_paths - set(vault_paths)

Why mtime, Not Content Hash

Content hashing (SHA-256 of file contents) would be more reliable than mtime comparison — it would avoid re-indexing files that were touched without actually changing, and it would catch edits made by tools that preserve the modification time. However, hashing requires reading every file on every incremental run. For 16,894 files, reading file contents takes 2-3 seconds. Reading mtimes from the filesystem takes <100ms.

The trade-off: mtime comparison occasionally triggers unnecessary re-indexing of touched-but-unchanged files (false positives), and in principle it can miss an edit that preserves mtime_ns, though with nanosecond resolution this is vanishingly rare. False positives cost a few extra embedding calls per run. The speed difference (100ms vs 3 seconds) makes mtime the pragmatic choice for a system that runs on every AI interaction.

Handling Deletions

When a file is deleted from the vault, the indexer removes all its chunks from the database:

def remove_file(self, file_path):
    """Remove all chunks and vectors for a file."""
    chunk_ids = [r[0] for r in self.db.execute(
        "SELECT id FROM chunks WHERE file_path = ?",
        [file_path],
    ).fetchall()]

    for cid in chunk_ids:
        self.db.execute(
            "DELETE FROM chunk_vecs WHERE id = ?", [cid]
        )
    self.db.execute(
        "DELETE FROM chunks WHERE file_path = ?",
        [file_path],
    )

FTS5 content-sync tables require explicit deletion via INSERT INTO chunks_fts(chunks_fts, rowid, ...) VALUES('delete', ?, ...) for each removed row. The indexer handles this as part of the file removal process.
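A runnable sketch of that deletion sync (illustrative schema subset; FTS5's 'delete' command requires passing the old column values alongside the rowid):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, chunk_text TEXT,
                         section TEXT, heading_context TEXT DEFAULT '');
    CREATE VIRTUAL TABLE chunks_fts USING fts5(
        chunk_text, section, heading_context,
        content=chunks, content_rowid=id);
    INSERT INTO chunks (id, chunk_text, section) VALUES
        (1, 'stale chunk about hooks', 'Hooks');
    INSERT INTO chunks_fts (rowid, chunk_text, section, heading_context)
        SELECT id, chunk_text, section, heading_context FROM chunks;
""")

def remove_chunk_from_fts(db, rowid):
    """FTS5 'delete' needs the old column values, so fetch them first."""
    old = db.execute(
        "SELECT chunk_text, section, heading_context FROM chunks WHERE id = ?",
        (rowid,),
    ).fetchone()
    db.execute(
        "INSERT INTO chunks_fts (chunks_fts, rowid, chunk_text, section, "
        "heading_context) VALUES ('delete', ?, ?, ?, ?)",
        (rowid, *old),
    )
    db.execute("DELETE FROM chunks WHERE id = ?", (rowid,))

remove_chunk_from_fts(db, 1)
```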


Incremental vs Full Reindex

The indexer supports two modes: incremental (fast, daily use) and full (slow, occasional). This section covers when to use each, the idempotency guarantees, and corruption recovery.

Incremental Reindex

When to use: Daily indexing after editing notes. The default mode.

What it does:

  1. Scan vault for file changes (mtime comparison)
  2. Delete chunks for deleted files
  3. Re-chunk and re-embed changed files
  4. Insert new chunks for new files
  5. Sync FTS5 index

Typical duration: <10 seconds for a day’s edits on a 16,000-file vault.

python index_vault.py --incremental

Full Reindex

When to use:

  • After changing the embedding model (model hash mismatch detected)
  • After schema migration (new columns, changed indexes)
  • After database corruption (integrity check fails)
  • When incremental indexing produces unexpected results

What it does:

  1. Drop all existing data (chunks, vectors, FTS5 entries)
  2. Scan entire vault
  3. Chunk all files
  4. Embed all chunks
  5. Build FTS5 index from scratch

Typical duration: ~4 minutes for 16,894 files on Apple M3 Pro.

python index_vault.py --full

Idempotency

Both modes are idempotent: running the same command twice produces the same result. The indexer deletes existing chunks for a file before inserting new ones, so a re-run of incremental indexing on an already-current database produces zero changes. A re-run of full indexing produces an identical database.

Corruption Recovery

If the SQLite database becomes corrupted (power loss during write, disk error, killed process mid-transaction):

# Check integrity
sqlite3 vectors.db "PRAGMA integrity_check;"

# If corruption detected, full reindex rebuilds from source files
python index_vault.py --full

The source of truth is always the vault files, not the database. The database is a derived artifact that can be rebuilt at any time. This is a critical design property: you never need to back up the database.

The --incremental Flag

When the indexer runs with --incremental:

  1. Model hash check. Compare stored model hash against current model. If different, automatically switch to full reindex mode and warn the user.
  2. File scan. Walk allowed folders, collect file paths and mtimes.
  3. Change detection. Compare against stored data.
  4. Batch processing. Re-chunk and re-embed changed files in batches of 64.
  5. Progress reporting. Print count of processed files and elapsed time.
  6. Graceful shutdown. Handle SIGINT by finishing the current file before stopping.
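The graceful shutdown in step 6 can be sketched with a flag set by the signal handler and checked between files, never mid-file (class name illustrative):

```python
import signal

class GracefulIndexer:
    """Finish the in-flight file on SIGINT, then stop cleanly."""

    def __init__(self):
        self._stop_requested = False
        signal.signal(signal.SIGINT, self._request_stop)

    def _request_stop(self, signum, frame):
        self._stop_requested = True  # checked after each file completes

    def run(self, files, process_file):
        processed = 0
        for path in files:
            process_file(path)  # current file always completes
            processed += 1
            if self._stop_requested:
                break
        return processed
```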

Credential Filtering and Data Boundaries

Personal notes contain secrets: API keys, bearer tokens, database connection strings, private keys pasted during debugging sessions. The credential filter prevents these from entering the retrieval index.

The Problem

A note about debugging an OAuth integration might contain:

The token was: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
I used this curl command:
  curl -H "Authorization: Bearer sk-ant-api03-abc123..."

Without filtering, both the JWT and the API key would be chunked, embedded, and stored in the database. A search for “authentication” would return the chunk containing real secrets. Worse, if the retriever feeds results to an AI tool through MCP, the secrets appear in the AI’s context window and potentially in the tool’s logs.

Pattern-Based Filtering

The credential filter runs on every chunk before storage, matching 25 vendor-specific patterns plus generic patterns:

Vendor-Specific Patterns:

| Pattern | Example | Regex |
|---|---|---|
| OpenAI API key | sk-... | sk-[a-zA-Z0-9_-]{20,} |
| Anthropic API key | sk-ant-api03-... | sk-ant-api\d{2}-[a-zA-Z0-9_-]{20,} |
| GitHub PAT | ghp_... | gh[ps]_[a-zA-Z0-9]{36,} |
| AWS Access Key | AKIA... | AKIA[0-9A-Z]{16} |
| Stripe key | sk_live_... | [sr]k_(live\|test)_[a-zA-Z0-9]{24,} |
| Cloudflare token | ... | Various patterns |

Generic Patterns:

| Pattern | Detection |
|---|---|
| JWT tokens | eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+ |
| Bearer tokens | Bearer\s+[a-zA-Z0-9_\-\.]+ |
| Private keys | -----BEGIN (RSA\|EC\|OPENSSH) PRIVATE KEY----- |
| High-entropy base64 | Strings with >4.5 bits/char entropy, 40+ chars |
| Password assignments | password\s*[:=]\s*["'][^"']+["'] |
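The high-entropy check can be sketched with Shannon entropy over the string's own character distribution (thresholds from the table; helper names illustrative):

```python
import math
from collections import Counter

def bits_per_char(s):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_high_entropy(token, min_len=40, threshold=4.5):
    return len(token) >= min_len and bits_per_char(token) > threshold
```

A long string drawn evenly from the 64-symbol base64 alphabet approaches 6 bits/char, while English prose typically sits around 4 bits/char or below, which is why 4.5 separates the two.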

Filter Implementation

from dataclasses import dataclass, field

# CREDENTIAL_PATTERNS is a module-level list of objects carrying a
# .name string and a compiled .regex (re.Pattern).

@dataclass
class ScanResult:
    is_clean: bool = True
    match_count: int = 0
    patterns: list = field(default_factory=list)

def clean_content(text):
    """Scrub credentials from text before indexing."""
    result = ScanResult()

    for pattern in CREDENTIAL_PATTERNS:
        matches = pattern.regex.findall(text)
        if matches:
            text = pattern.regex.sub(
                f"[REDACTED:{pattern.name}]", text
            )
            result.is_clean = False
            result.match_count += len(matches)
            result.patterns.append(pattern.name)

    return text, result

Key design choices:

  1. Filter before embedding. The cleaned text is what gets embedded. The vector representation never encodes credential patterns. A query for “API key” returns notes that discuss API key management, not notes that contain actual keys.

  2. Replace, not remove. The [REDACTED:pattern-name] token preserves the semantic context of the surrounding text. The embedding captures that “something credential-like was here” without encoding the credential itself.

  3. Log patterns, not values. The filter logs which patterns matched (e.g., “Scrubbed 2 credential(s) from oauth-debug.md [jwt, bearer-token]”) but never logs the credential value.

Path-Based Exclusion

The .indexignore file provides coarse-grained exclusion by path. The credential filter provides fine-grained scrubbing within indexed files. Both are necessary:

  • .indexignore for entire folders you know contain sensitive content (health notes, financial records, career documents)
  • Credential filter for secrets accidentally embedded in otherwise-indexable content

Data Classification

For vaults containing diverse content, consider classifying notes by sensitivity:

| Level | Examples | Index? | Filter? |
|---|---|---|---|
| Public | Blog drafts, technical notes | Yes | Yes |
| Internal | Project plans, architecture decisions | Yes | Yes |
| Sensitive | Salary data, health records | No (.indexignore) | N/A |
| Restricted | Credentials, private keys | No (.indexignore) | N/A |

MCP Server Architecture

Model Context Protocol (MCP) servers expose the retriever as a tool that AI agents can call. This section covers the server design, capability surface, and permission boundaries.

Protocol Choice: STDIO vs HTTP

MCP supports two transport modes:

STDIO — The AI tool spawns the MCP server as a child process and communicates over stdin/stdout. This is the standard mode for local tools. Claude Code, Codex CLI, and Cursor all support STDIO MCP servers.

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/path/to/vault",
        "DB_PATH": "/path/to/vectors.db"
      }
    }
  }
}

HTTP — The MCP server runs as a standalone HTTP service. Useful for remote access, multi-client setups, or team configurations where the vault is on a shared server.

{
  "mcpServers": {
    "obsidian": {
      "url": "http://localhost:3333/mcp"
    }
  }
}

Recommendation: Use STDIO for personal vaults. It is simpler, more secure (no network exposure), and the server lifecycle is managed by the AI tool. Use HTTP only when multiple tools or multiple machines need concurrent access to the same vault.

Capability Design

The MCP server should expose a minimal set of tools:

search — The primary tool. Runs hybrid retrieval and returns ranked results.

{
  "name": "obsidian_search",
  "description": "Search the Obsidian vault using hybrid BM25 + vector retrieval",
  "parameters": {
    "query": { "type": "string", "description": "Search query" },
    "limit": { "type": "integer", "default": 5 },
    "max_tokens": { "type": "integer", "default": 2000 }
  }
}

read_note — Read the full content of a specific note by path. Useful when the agent wants to see the complete context of a search result.

{
  "name": "obsidian_read_note",
  "description": "Read the full content of a note by file path",
  "parameters": {
    "file_path": { "type": "string", "description": "Relative path within vault" }
  }
}

list_notes — List notes matching a filter (by folder, tag, type, or date range). Useful for exploration when the agent does not have a specific query.

{
  "name": "obsidian_list_notes",
  "description": "List notes matching filters",
  "parameters": {
    "folder": { "type": "string", "description": "Folder path within vault" },
    "tag": { "type": "string", "description": "Tag to filter by" },
    "limit": { "type": "integer", "default": 20 }
  }
}

get_context — A convenience tool that runs a search and formats the results as a context block suitable for injection into a conversation.

{
  "name": "obsidian_get_context",
  "description": "Get formatted context from vault for a topic",
  "parameters": {
    "topic": { "type": "string", "description": "Topic to get context for" },
    "max_tokens": { "type": "integer", "default": 2000 }
  }
}

Permission Boundaries

The MCP server should enforce strict boundaries:

  1. Read-only. The server reads the vault and the index database. It does not create, modify, or delete notes. Write operations (capturing new notes) are handled by separate hooks or skills, not the MCP server.

  2. Vault-scoped. The server only reads files within the configured vault path. Path traversal attempts (../../etc/passwd) must be rejected.

  3. Credential-filtered output. Even if the database contains pre-filtered content, apply credential filtering on output as a defense-in-depth measure.

  4. Token-limited responses. Enforce max_tokens on all tool responses to prevent the AI tool from receiving excessively large context blocks.
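
The vault-scope check in particular is easy to get wrong with naive string-prefix tests. A minimal sketch of a traversal-safe resolver, assuming the placeholder vault path from the config examples; `safe_resolve` is a hypothetical helper name, and `Path.is_relative_to` requires Python 3.9+:

```python
from pathlib import Path

VAULT_PATH = Path("/absolute/path/to/vault").resolve()

def safe_resolve(relative_path: str) -> Path:
    """Resolve a caller-supplied path and reject anything outside the vault."""
    # Joining an absolute input replaces the base entirely, and resolve()
    # collapses ../ segments, so the containment check covers both cases.
    candidate = (VAULT_PATH / relative_path).resolve()
    if not candidate.is_relative_to(VAULT_PATH):
        raise ValueError(f"Path escapes vault: {relative_path}")
    return candidate
```

Every `read_note` and `list_notes` handler should route file access through a check like this before touching disk.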

Error Handling

MCP tools should return structured error messages that help the AI tool recover:

def search(self, query, limit=5, max_tokens=2000):
    if not self.db_path.exists():
        return {
            "error": "Index database not found. Run the indexer first.",
            "suggestion": "python index_vault.py --full"
        }

    results = self.retriever.search(query, limit, max_tokens)

    if not results:
        return {
            "results": [],
            "message": f"No results found for '{query}'. Try broader terms."
        }

    return {
        "results": [
            {
                "file_path": r["file_path"],
                "section": r["section"],
                "text": r["chunk_text"],
                "score": round(r["rrf_score"], 4),
            }
            for r in results
        ],
        "count": len(results),
        "query": query,
    }

Claude Code Integration

Claude Code is the primary consumer of the Obsidian retrieval system. This section covers MCP configuration, hook integration, and the obsidian_bridge.py pattern.

MCP Configuration

Add the Obsidian MCP server to ~/.claude/settings.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/absolute/path/to/vault",
        "DB_PATH": "/absolute/path/to/vectors.db"
      }
    }
  }
}

After adding the configuration, restart Claude Code. The MCP server will start as a child process. Verify it is running:

> What tools do you have from the obsidian MCP server?

Claude Code should list the available tools (obsidian_search, obsidian_read_note, etc.).

Hook Integration

Hooks extend Claude Code’s behavior at defined lifecycle points. Two hooks are relevant for Obsidian integration:

PreToolUse hook — Queries the vault before the agent processes a tool call. Injects relevant context automatically.

#!/bin/bash
# ~/.claude/hooks/pre-tool-use/obsidian-context.sh
# Automatically inject vault context before tool execution

TOOL_NAME="$1"
PROMPT="$2"

# Only inject context for code-related tools
case "$TOOL_NAME" in
    Edit|Write|Bash)
        # Query the vault
        CONTEXT=$(python /path/to/retriever.py search "$PROMPT" --limit 3 --max-tokens 1500)
        if [ -n "$CONTEXT" ]; then
            echo "---"
            echo "Relevant vault context:"
            echo "$CONTEXT"
            echo "---"
        fi
        ;;
esac

PostToolUse hook — Captures significant tool outputs back to the vault for future retrieval.

#!/bin/bash
# ~/.claude/hooks/post-tool-use/capture-insight.sh
# Capture significant outputs to vault (selective)

TOOL_NAME="$1"
OUTPUT="$2"

# Only capture substantial outputs
if [ ${#OUTPUT} -gt 500 ]; then
    python /path/to/capture.py --text "$OUTPUT" --source "claude-code-$TOOL_NAME"
fi

The obsidian_bridge.py Pattern

A bridge module provides a Python API that hooks and skills can call:

# obsidian_bridge.py
from retriever import HybridRetriever

_retriever = None

def get_retriever():
    global _retriever
    if _retriever is None:
        _retriever = HybridRetriever(
            db_path="/path/to/vectors.db",
            vault_path="/path/to/vault",
        )
    return _retriever

def search_vault(query, limit=5, max_tokens=2000):
    """Search vault and return formatted context."""
    retriever = get_retriever()
    results = retriever.search(query, limit, max_tokens)

    if not results:
        return ""

    lines = ["## Vault Context\n"]
    for r in results:
        lines.append(f"**{r['file_path']}** — {r['section']}")
        lines.append(f"> {r['chunk_text'][:500]}")
        lines.append("")

    return "\n".join(lines)

The /capture Skill

A Claude Code skill for capturing insights back to the vault:

/capture "OAuth token rotation requires both access and refresh token invalidation"
  --domain security
  --tags oauth,tokens

The skill creates a new note in 00-inbox/ with proper frontmatter and triggers an incremental reindex so the new note is immediately searchable.
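
A capture implementation needs little more than a slug, frontmatter, and a reindex call. A sketch under the vault layout described earlier; the function name, slug scheme, and the `index_vault.py --incremental` invocation are illustrative rather than the skill's actual code:

```python
import re
import subprocess
from datetime import date
from pathlib import Path

def capture(text, domain="inbox", tags=(), vault="vault", reindex=True):
    """Write a captured insight to 00-inbox/ with frontmatter, then reindex."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")[:40].rstrip("-")
    note = Path(vault) / "00-inbox" / f"{date.today().isoformat()}-{slug}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text("\n".join([
        "---",
        "type: capture",
        f"domain: {domain}",
        f"tags: [{', '.join(tags)}]",
        f"created: {date.today().isoformat()}",
        "---",
        "",
        text,
        "",
    ]))
    if reindex:
        # Hypothetical indexer CLI; check=False tolerates a failing indexer run
        subprocess.run(["python", "index_vault.py", "--incremental"], check=False)
    return note
```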

Context Window Management

The integration should be mindful of Claude Code’s context window:

  • Limit injected context to 1,500-2,000 tokens per query. More than this competes with the agent’s working memory.
  • Include source attribution. Always include the file path and section heading so the agent can reference the source.
  • Truncate chunk text. Long chunks should be truncated with ... rather than omitted entirely. The first 300-500 characters usually contain the key information.
  • Do not inject on every tool call. The PreToolUse hook should selectively inject context based on the tool being called. Read operations do not need vault context. Write and Edit operations benefit from it.
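
These rules fold into a single formatting helper. A sketch using the rough 4-characters-per-token heuristic; `format_context` and its budget arithmetic are illustrative, with field names borrowed from the bridge example:

```python
def format_context(results, max_tokens=2000, chars_per_token=4):
    """Assemble vault context under a rough token budget (chars/4 heuristic)."""
    budget = max_tokens * chars_per_token
    lines, used = [], 0
    for r in results:
        body = r["chunk_text"][:500]
        if len(r["chunk_text"]) > 500:
            body += "..."  # truncate rather than omit
        # Source attribution first, truncated chunk text as a quote
        block = f"**{r['file_path']}** ({r['section']})\n> {body}\n"
        if used + len(block) > budget:
            break  # stop before overrunning the budget
        lines.append(block)
        used += len(block)
    return "\n".join(lines)
```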

Codex CLI Integration

Codex CLI connects to MCP servers through config.toml. The integration pattern differs from Claude Code in configuration syntax and instruction delivery.

MCP Configuration

Add to .codex/config.toml or ~/.codex/config.toml:

[mcp_servers.obsidian]
command = "python"
args = ["/path/to/obsidian_mcp.py"]

[mcp_servers.obsidian.env]
VAULT_PATH = "/absolute/path/to/vault"
DB_PATH = "/absolute/path/to/vectors.db"

AGENTS.md Patterns

Codex CLI reads AGENTS.md for project-level instructions. Include vault search guidance:

## Available Tools

### Obsidian Vault (MCP: obsidian)
Use the `obsidian_search` tool to find relevant context from the knowledge base.
Search the vault when you need:
- Background on a concept or pattern
- Prior decisions or rationale
- Reference material for implementation

Example queries:
- "authentication patterns in FastAPI"
- "how does the review aggregator work"
- "sqlite-vec configuration"

Differences from Claude Code

| Feature | Claude Code | Codex CLI |
| --- | --- | --- |
| MCP config | settings.json | config.toml |
| Hooks | ~/.claude/hooks/ | Not supported |
| Skills | ~/.claude/skills/ | Not supported |
| Instruction file | CLAUDE.md | AGENTS.md |
| Approval modes | --dangerously-skip-permissions | suggest / auto-edit / full-auto |

Key difference: Codex CLI does not support hooks. The automatic context injection pattern (PreToolUse hook) is not available. Instead, include explicit instructions in AGENTS.md telling the agent to search the vault before starting work.


Cursor and Other Tools

Cursor and other AI tools that support MCP can connect to the same Obsidian MCP server. This section covers configuration for common tools.

Cursor

Add to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/absolute/path/to/vault",
        "DB_PATH": "/absolute/path/to/vectors.db"
      }
    }
  }
}

Cursor’s .cursorrules file can include instructions to use the vault:

When working on implementation tasks, search the Obsidian vault
for relevant context before writing code. Use the obsidian_search
tool with descriptive queries about the concept you're implementing.

Compatibility Matrix

| Tool | MCP Support | Transport | Config Location |
| --- | --- | --- | --- |
| Claude Code | Full | STDIO | ~/.claude/settings.json |
| Codex CLI | Full | STDIO | .codex/config.toml |
| Cursor | Full | STDIO | .cursor/mcp.json |
| Windsurf | Full | STDIO | .windsurf/mcp.json |
| Continue.dev | Partial | HTTP | ~/.continue/config.json |
| Zed | In progress | STDIO | Settings UI |

Fallback for Non-MCP Tools

For tools that do not support MCP, the retriever can be wrapped as a CLI:

# Search from command line
python retriever_cli.py search "query text" --limit 5

# Output formatted for copy-paste into any tool
python retriever_cli.py context "query text" --format markdown

The CLI outputs structured text that can be manually pasted into any AI tool’s input. This is less elegant than MCP integration but works universally.
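
The CLI surface can be declared with argparse so both subcommands share flags. A sketch that mirrors the usage shown above; the actual retriever wiring is omitted:

```python
import argparse

def build_parser():
    """CLI surface for the retriever: 'search' and 'context' subcommands."""
    parser = argparse.ArgumentParser(prog="retriever_cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [("search", "ranked results"),
                            ("context", "copy-paste context block")]:
        p = sub.add_parser(name, help=help_text)
        p.add_argument("query")
        p.add_argument("--limit", type=int, default=5)
        p.add_argument("--max-tokens", type=int, default=2000)
        p.add_argument("--format", choices=["text", "markdown"], default="text")
    return parser
```

A `main()` would dispatch on `args.command`, call the retriever, and print either raw results or a markdown context block.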


Prompt Caching from Structured Notes

Structured notes in the vault can serve as reusable context blocks that reduce token usage across AI interactions. This section covers cache key design and token budget management.

The Pattern

Instead of searching for context on every interaction, pre-build context blocks from well-structured vault notes and cache them:

# cache_keys.py
CONTEXT_BLOCKS = {
    "auth-patterns": {
        "vault_query": "authentication patterns implementation",
        "max_tokens": 1500,
        "ttl_hours": 24,  # Rebuild daily
    },
    "api-conventions": {
        "vault_query": "API design conventions REST patterns",
        "max_tokens": 1000,
        "ttl_hours": 168,  # Rebuild weekly
    },
    "project-architecture": {
        "vault_query": "current project architecture decisions",
        "max_tokens": 2000,
        "ttl_hours": 12,  # Rebuild twice daily
    },
}
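
TTL-based rebuilding needs only a timestamp stored next to each block. A minimal sketch, assuming the CONTEXT_BLOCKS specs above and a `search_fn` callable that wraps the retriever; the cache directory is illustrative:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path(".cache/context_blocks")

def get_block(name, spec, search_fn):
    """Return a cached context block, rebuilding it when its TTL expires."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{name}.json"
    if path.exists():
        cached = json.loads(path.read_text())
        if time.time() - cached["built_at"] < spec["ttl_hours"] * 3600:
            return cached["text"]  # still fresh
    # Expired or missing: re-query the vault and persist the new block
    text = search_fn(spec["vault_query"], spec["max_tokens"])
    path.write_text(json.dumps({"built_at": time.time(), "text": text}))
    return text
```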

Cache Invalidation

Cache invalidation is based on two signals:

  1. TTL expiry. Each context block has a time-to-live. When the TTL expires, the block is rebuilt by re-querying the vault.
  2. Vault change detection. When the indexer detects changes to files that contributed to a cached context block, the block is invalidated immediately.

Token Budget Management

A session starts with a total context budget. Cached blocks consume part of that budget:

Total context budget:    8,000 tokens
├─ System prompt:        1,500 tokens
├─ Cached blocks:        3,000 tokens (pre-loaded)
├─ Dynamic search:       2,000 tokens (on-demand)
└─ Conversation:         1,500 tokens (remaining)

The cached blocks load at session start. Dynamic search results fill the remaining budget on a per-query basis. This hybrid approach gives the agent a baseline of frequently-needed context while preserving budget for specific queries.

Before/After Token Usage

Without caching: Every relevant query triggers a vault search, returning 1,500-2,000 tokens of context. Over 10 queries in a session, the agent consumes 15,000-20,000 tokens of vault context.

With caching: Three pre-built context blocks consume 4,500 tokens total. Additional searches add 1,500-2,000 tokens per unique query. Over 10 queries where 6 are covered by cached blocks, the agent consumes 4,500 + (4 * 1,500) = 10,500 tokens — roughly half the uncached usage.


PostToolUse Hooks for Context Compression

Tool outputs can be verbose: stack traces, file listings, test results. A PostToolUse hook can compress these outputs before they consume context window space.

The Problem

A Bash tool call that runs tests might return:

PASSED tests/test_auth.py::test_login_success
PASSED tests/test_auth.py::test_login_failure
PASSED tests/test_auth.py::test_token_refresh
PASSED tests/test_auth.py::test_session_expiry
... (200 more lines)
FAILED tests/test_api.py::test_rate_limit_exceeded

The full output is 5,000 tokens, but the signal is in 2 lines: 200 passed, 1 failed.

Hook Implementation

#!/bin/bash
# ~/.claude/hooks/post-tool-use/compress-output.sh
# Compress verbose tool outputs to preserve context window

TOOL_NAME="$1"
OUTPUT="$2"
OUTPUT_LEN=${#OUTPUT}

# Only compress large outputs
if [ "$OUTPUT_LEN" -lt 2000 ]; then
    exit 0  # Pass through unchanged
fi

case "$TOOL_NAME" in
    Bash)
        # Compress test output
        if echo "$OUTPUT" | grep -q "PASSED\|FAILED"; then
            PASSED=$(echo "$OUTPUT" | grep -c "PASSED")
            FAILED=$(echo "$OUTPUT" | grep -c "FAILED")
            FAILURES=$(echo "$OUTPUT" | grep "FAILED")
            echo "Tests: $PASSED passed, $FAILED failed"
            if [ "$FAILED" -gt 0 ]; then
                echo "Failures:"
                echo "$FAILURES"
            fi
        fi
        ;;
esac

Recursive Trigger Prevention

A compression hook that emits output could trigger itself if not guarded:

# Guard against recursive invocation
if [ -n "$COMPRESS_HOOK_ACTIVE" ]; then
    exit 0
fi
export COMPRESS_HOOK_ACTIVE=1

Compression Heuristics

| Output Type | Detection | Compression Strategy |
| --- | --- | --- |
| Test results | PASSED / FAILED keywords | Count pass/fail, show failures only |
| File listings | ls or find in command | Truncate to first 20 entries + count |
| Stack traces | Traceback keyword | Keep first and last frame + error message |
| Git status | modified: / new file: | Summarize counts by status |
| Build output | warning: / error: | Strip info lines, keep warnings/errors |
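
The stack-trace heuristic reduces to a few lines. A sketch of the first-frame/last-frame policy, written in Python rather than shell for clarity; `compress_traceback` is a hypothetical helper:

```python
def compress_traceback(output):
    """Keep the Traceback header, the first and last frames, and the error line."""
    lines = output.splitlines()
    frames = [l for l in lines if l.lstrip().startswith('File "')]
    if "Traceback" not in output or len(frames) <= 2:
        return output  # nothing worth compressing
    return "\n".join([
        "Traceback (most recent call last):",
        frames[0],
        f"  ... {len(frames) - 2} frames omitted ...",
        frames[-1],
        lines[-1],  # the final exception message
    ])
```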

Signal Intake and Triage Pipeline

The intake layer determines what enters the vault. Without curation, the vault accumulates noise. This section covers the scoring pipeline that routes signals to domain folders.

Sources

Signals come from multiple channels:

  • RSS feeds: Technical blogs, security advisories, release notes
  • Bookmarks: Browser bookmarks saved via Obsidian Web Clipper or bookmarklet
  • Newsletters: Key excerpts from email newsletters
  • Manual capture: Notes written during reading, conversations, or research
  • Tool output: Significant AI tool outputs captured via hooks

Scoring Dimensions

Each signal is scored on four dimensions (0.0 to 1.0 each):

| Dimension | Question | Low Score (0.0-0.3) | High Score (0.7-1.0) |
| --- | --- | --- | --- |
| Relevance | Does this relate to my active domains? | Tangential, outside scope | Directly relevant to active work |
| Actionability | Can I use this information? | Pure theory, no application | Specific technique or pattern I can apply |
| Depth | How substantive is the content? | Headlines, shallow summary | Detailed analysis with examples |
| Authority | How credible is the source? | Anonymous blog, unverified | Primary source, peer-reviewed, recognized expert |

Composite Score and Routing

composite = (relevance * 0.35) + (actionability * 0.25) +
            (depth * 0.25) + (authority * 0.15)

| Score Range | Action |
| --- | --- |
| 0.55+ | Auto-route to domain folder |
| 0.40 - 0.55 | Queue for manual review |
| < 0.40 | Drop (do not store) |
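
The weights and thresholds above translate directly into code; the function name and return shape are illustrative:

```python
def route(scores):
    """Composite score and routing decision using the published weights."""
    weights = {"relevance": 0.35, "actionability": 0.25,
               "depth": 0.25, "authority": 0.15}
    composite = sum(scores[dim] * w for dim, w in weights.items())
    if composite >= 0.55:
        return composite, "auto-route"
    if composite >= 0.40:
        return composite, "review-queue"
    return composite, "drop"
```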

Domain Routing

Signals scoring above 0.55 route to one of 12 domain folders based on keyword matching and topic classification:

05-signals/
├── ai-tooling/        # Claude, LLMs, AI development tools
├── security/          # Vulnerabilities, auth, cryptography
├── systems/           # Architecture, distributed systems
├── programming/       # Languages, patterns, algorithms
├── web/               # Frontend, backends, APIs
├── data/              # Databases, data engineering
├── devops/            # CI/CD, containers, infrastructure
├── design/            # UI/UX, product design
├── mobile/            # iOS, Android, cross-platform
├── career/            # Industry trends, hiring, growth
├── research/          # Academic papers, whitepapers
└── other/             # Signals that don't fit a domain

Production Stats

Over 14 months of operation:

| Metric | Value |
| --- | --- |
| Total signals processed | 7,771 |
| Auto-routed (>0.55) | 4,832 (62%) |
| Queued for review (0.40-0.55) | 1,543 (20%) |
| Dropped (<0.40) | 1,396 (18%) |
| Active domain folders | 12 |
| Average signals per day | ~18 |

Knowledge Graph Patterns

Obsidian’s wiki-link graph encodes relationships between notes. This section covers link semantics, graph traversal for context expansion, and anti-patterns that degrade graph quality.

Every wiki-link creates a directed edge in the graph. Obsidian tracks both forward links and backlinks:

  • Forward link: Note A contains [[Note B]] → A links to B
  • Backlink: Note B shows that Note A references it

The graph encodes different types of relationships depending on context:

| Link Pattern | Semantic | Example |
| --- | --- | --- |
| Inline link | “Is related to” | “See [[OAuth Token Rotation]] for details” |
| Header link | “Has subtopic” | “## Related\n- [[Token Rotation]]\n- [[Session Management]]” |
| Tag-like link | “Is categorized as” | “[[type/reference]]” |
| MOC link | “Is part of” | A Map of Content note listing related notes |

Maps of Content (MOCs)

MOCs are index notes that organize related notes into a navigable structure:

---
title: "Authentication & Security MOC"
type: moc
domain: security
---

## Core Concepts
- [[OAuth 2.0 Overview]]
- [[JWT Token Anatomy]]
- [[Session Management Patterns]]

## Implementation Patterns
- [[OAuth Token Rotation]]
- [[Refresh Token Security]]
- [[PKCE Flow Implementation]]

## Failure Modes
- [[Token Expiry Handling]]
- [[Session Fixation Prevention]]
- [[CSRF Defense Strategies]]

MOCs benefit retrieval in two ways:

  1. Direct match. A search for “authentication overview” matches the MOC itself, providing the agent with a curated list of related notes.
  2. Context expansion. After finding a specific note, the retriever can check if the note appears in any MOCs and include the MOC’s structure in the results, giving the agent a map of the broader topic.

Graph Traversal for Context Expansion

A future enhancement to the retriever: after finding top results, expand the context by following links:

def expand_context(results, depth=1):
    """Follow wiki-links from top results to find related context."""
    seen = set()
    expanded_results = []
    for result in results:
        # Parse wiki-links from chunk text
        links = extract_wiki_links(result["chunk_text"])
        for link_target in links:
            # Resolve link to file path
            target_path = resolve_wiki_link(link_target)
            if target_path and target_path not in seen:
                seen.add(target_path)
                # Include target's most relevant chunk
                target_chunks = get_chunks_for_file(target_path)
                # ... rank target_chunks, append the best one to expanded_results
    return results + expanded_results

This is not implemented in the current retriever but represents a natural extension of the graph structure.
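
The `extract_wiki_links` helper that the sketch above assumes can be a single regex tolerating aliases ([[Target|alias]]) and heading links ([[Target#Section]]); this pattern is an approximation, not a full Obsidian link parser:

```python
import re

# Matches [[Target]], [[Target|alias]], [[Target#Heading]], [[Target#Heading|alias]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_wiki_links(text):
    """Return link targets, stripped of aliases and heading anchors."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(text)]
```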

Anti-Patterns

Orphan clusters. Groups of notes that link to each other but have no connections to the rest of the vault. The graph panel in Obsidian makes these visible as disconnected islands. Orphan clusters indicate missing MOCs or missing cross-domain links.

Tag sprawl. Using tags inconsistently or creating too many fine-grained tags. A vault with 500 unique tags across 5,000 notes averages only 10 notes per tag, with many tags attached to a single note; tags that sparse are useless for filtering. Consolidate to 20-50 high-level tags that map to your domain folders.

Link-heavy, content-light notes. Notes that consist entirely of wiki-links with no prose. These notes index poorly because the chunker has no text to embed. Add at least a paragraph of context explaining why the linked notes are related.

Bidirectional links for everything. Not every reference needs to be a wiki-link. Mentioning “OAuth” in passing does not require [[OAuth 2.0 Overview]]. Reserve wiki-links for intentional, navigable relationships where clicking the link would provide useful context.


Developer Workflow Recipes

Practical workflows that combine vault retrieval with daily development tasks.

Morning Context Load

Start the day by loading relevant context:

Search my vault for notes about [current project] updated in the last week

The retriever returns recent notes about your active project, giving you a quick refresher on where you left off. More effective than re-reading yesterday’s commit messages.

Research Capture During Coding

While implementing a feature, capture insights without leaving the editor:

/capture "FastAPI dependency injection with async generators requires yield,
not return. The generator is the dependency lifecycle."
  --domain programming
  --tags fastapi,dependency-injection

The captured insight is immediately indexed and available for future retrieval. Over months, these micro-captures build a corpus of implementation-specific knowledge.

Project Kickoff

When starting a new project or feature:

  1. Search the vault: “What do I know about [technology/pattern]?”
  2. Review the top 5 results for prior decisions and gotchas
  3. Check if a MOC exists for the domain; if not, create one
  4. Search for failure modes: “problems with [technology]”

Debugging

When encountering an error or unexpected behavior:

Search my vault for [error message or symptom]

Prior debugging notes often contain the root cause and fix. This is particularly valuable for recurring issues across projects — the vault remembers what you forget.

Code Review Preparation

Before reviewing a PR:

Search my vault for patterns and conventions about [module being changed]

The vault returns prior decisions, architectural constraints, and coding standards relevant to the code under review. The review is informed by institutional knowledge, not just the diff.


Performance Tuning

This section covers optimization strategies for different vault sizes and usage patterns.

Index Size Management

| Vault Size | Chunks | DB Size | Full Reindex | Incremental |
| --- | --- | --- | --- | --- |
| 500 notes | ~1,500 | 3 MB | 15 seconds | <1 second |
| 2,000 notes | ~6,000 | 12 MB | 45 seconds | 2 seconds |
| 5,000 notes | ~15,000 | 30 MB | 2 minutes | 4 seconds |
| 15,000 notes | ~50,000 | 83 MB | 4 minutes | <10 seconds |
| 50,000 notes | ~150,000 | 250 MB | 15 minutes | 30 seconds |

At 50,000+ notes, consider:

  • Increasing the batch size from 64 to 128 for faster embedding
  • Using WAL mode (default) for concurrent access
  • Running the full reindex during off-hours

Query Optimization

WAL mode. SQLite’s Write-Ahead Logging mode enables concurrent reads while the indexer writes:

db.execute("PRAGMA journal_mode=WAL")

This is critical when the MCP server handles queries while the indexer runs an incremental update.

Connection pooling. The MCP server should reuse database connections rather than opening a new connection per query. A single long-lived connection with WAL mode supports concurrent reads.

# MCP server initialization
db = sqlite3.connect(DB_PATH, check_same_thread=False)
db.execute("PRAGMA journal_mode=WAL")
db.execute("PRAGMA mmap_size=268435456")  # 256 MB mmap

Memory-mapped I/O. The mmap_size pragma tells SQLite to use memory-mapped I/O for the database file. For an 83 MB database, mapping the entire file into memory eliminates most disk reads.

FTS5 optimization. After a full reindex, run:

INSERT INTO chunks_fts(chunks_fts) VALUES('optimize');

This merges FTS5’s internal b-tree segments, reducing query latency for subsequent searches.

Scaling Benchmarks

Measured on Apple M3 Pro, 36 GB RAM, NVMe SSD:

| Operation | 500 notes | 5K notes | 15K notes | 50K notes |
| --- | --- | --- | --- | --- |
| BM25 query | 2ms | 5ms | 12ms | 25ms |
| Vector query | 1ms | 3ms | 8ms | 20ms |
| RRF fusion | <1ms | <1ms | 3ms | 5ms |
| Full search | 3ms | 8ms | 23ms | 50ms |

All benchmarks include database access, query execution, and result formatting. Interprocess overhead for MCP STDIO communication adds another 1-2ms.


Troubleshooting

Index Drift

Symptom: Search returns stale results or misses recently added notes.

Cause: The incremental indexer did not run after adding notes, or a file’s mtime was not updated (e.g., synced from another machine with preserved timestamps).

Fix: Run a full reindex: python index_vault.py --full
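
To catch drift that mtime misses, such as sync-preserved timestamps, the indexer can compare content hashes instead. A sketch, assuming the index stores one hash per file recorded at index time:

```python
import hashlib
from pathlib import Path

def needs_reindex(path, stored_hash):
    """True when the file's content hash differs from the indexed hash."""
    current = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return current != stored_hash
```

Hashing every file on each incremental run is slower than an mtime check, so a common compromise is mtime first, hash only when mtime looks suspicious.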

Embedding Model Swap

Symptom: After changing the embedding model, vector search returns nonsensical results.

Cause: Old vectors (from the previous model) are being compared against new query vectors. The dimensions or vector space semantics are incompatible.

Fix: The indexer should detect the model hash mismatch and trigger a full reindex automatically. If it does not, manually clear the database and reindex:

rm vectors.db
python index_vault.py --full

FTS5 Maintenance

Symptom: FTS5 queries return incorrect or incomplete results after many incremental updates.

Cause: FTS5 internal segments may become fragmented after many small updates.

Fix: Rebuild and optimize:

INSERT INTO chunks_fts(chunks_fts) VALUES('rebuild');
INSERT INTO chunks_fts(chunks_fts) VALUES('optimize');

MCP Timeout

Symptom: AI tool reports that the MCP server timed out.

Cause: The first query triggers model loading (lazy initialization), which takes 2-5 seconds. The AI tool’s default MCP timeout may be shorter.

Fix: Pre-warm the model on server startup:

# In MCP server initialization
retriever = HybridRetriever(db_path, vault_path)
retriever.search("warmup", limit=1)  # Trigger model load

SQLite File Locks

Symptom: SQLITE_BUSY or SQLITE_LOCKED errors.

Cause: Multiple processes writing to the database simultaneously. WAL mode allows concurrent reads but only one writer.

Fix: Ensure only one process (the indexer) writes to the database. The MCP server and hooks should only read. If you need concurrent writes, use WAL mode and set a busy timeout:

db.execute("PRAGMA busy_timeout=5000")  # Wait up to 5 seconds

sqlite-vec Not Loading

Symptom: Vector search is disabled; retriever runs in BM25-only mode.

Cause: The sqlite-vec extension is not installed, not found in the library path, or incompatible with the SQLite version.

Fix:

# Install via pip
pip install sqlite-vec

# Or compile from source
git clone https://github.com/asg017/sqlite-vec
cd sqlite-vec && make

Verify the extension loads:

import sqlite3
import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
print("sqlite-vec", db.execute("SELECT vec_version()").fetchone()[0], "loaded successfully")

Large Vault Memory Issues

Symptom: Out-of-memory errors during full reindex of a large vault (50,000+ notes).

Cause: Embedding batch size is too large, or all file contents are loaded into memory simultaneously.

Fix: Reduce the batch size and process files incrementally:

BATCH_SIZE = 32  # Reduce from 64

Also ensure the indexer processes files one at a time (reading, chunking, and embedding each file before moving to the next) rather than loading all files into memory.
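
The one-file-at-a-time pattern is naturally a generator. A sketch with a fixed-size character chunker standing in for the real heading-aware chunking logic:

```python
from pathlib import Path

def iter_file_chunks(vault_path, chunk_chars=1200):
    """Yield (path, chunk) pairs one file at a time so only one file is resident."""
    for path in sorted(Path(vault_path).rglob("*.md")):
        text = path.read_text(errors="ignore")
        for start in range(0, len(text), chunk_chars):
            yield path, text[start:start + chunk_chars]
        # text goes out of scope before the next file is read
```

The embedding loop then consumes this generator in batches of BATCH_SIZE, so peak memory is bounded by one file plus one batch of vectors.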


Migration Guide

From Apple Notes

  1. Export Apple Notes via the “Export All” option (macOS) or use a migration tool like apple-notes-liberator
  2. Convert HTML exports to markdown using markdownify or pandoc
  3. Move the converted files to your vault’s 00-inbox/ folder
  4. Review and add frontmatter to each note
  5. Move notes to appropriate domain folders

From Notion

  1. Export from Notion: Settings → Export → Markdown & CSV
  2. Unzip the export into your vault’s 00-inbox/ folder
  3. Fix Notion-specific markdown artifacts:
     • Checklists exported as - [ ] are already standard markdown and need no changes
     • Property tables exported as HTML should be converted to YAML frontmatter
     • Images embedded as relative paths should be copied to your attachments folder
  4. Add standard frontmatter (type, domain, tags)
  5. Replace Notion page links with Obsidian wiki-links

From Google Docs

  1. Use Google Takeout to export all documents
  2. Convert .docx files to markdown: pandoc -f docx -t markdown input.docx -o output.md
  3. Batch convert: for f in *.docx; do pandoc -f docx -t markdown "$f" -o "${f%.docx}.md"; done
  4. Move to vault, add frontmatter, organize into folders

From Plain Markdown (No Obsidian)

If you already have a directory of markdown files:

  1. Open the directory as an Obsidian vault (Obsidian → Open Vault → Open folder)
  2. Add .obsidian/ to .gitignore if the directory is version-controlled
  3. Create frontmatter templates and apply to existing files
  4. Start linking notes with [[wiki-links]] as you read and organize
  5. Run the indexer immediately — the retrieval system works on day one

From Another Retrieval System

If you are migrating from a different embedding/search system:

  1. Do not try to migrate vectors. Different models produce incompatible vector spaces. Run a full reindex with the new model.
  2. Migrate the content, not the index. The vault files are the source of truth. The index is a derived artifact.
  3. Verify after migration. Run 10-20 queries you know the answers to and verify the results match your expectations.

Changelog

| Date | Change |
| --- | --- |
| 2026-03-01 | Initial release |

References


  1. Cormack, G.V., Clarke, C.L.A., and Buettcher, S. Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods. SIGIR, 2009. Introduces RRF with k=60 as a parameter-free method for combining ranked lists. 

  2. OpenAI Embeddings Pricing. text-embedding-3-small: $0.02 per million tokens. Estimated vault cost per full reindex: ~$0.30. 

  3. van Dongen, T. et al. Model2Vec: Turn any Sentence Transformer into a Small Fast Model. arXiv, 2025. Describes the distillation approach producing static embeddings from sentence transformers. 

  4. MTEB: Massive Text Embedding Benchmark. potion-base-8M scores 50.03 average vs 56.09 for all-MiniLM-L6-v2 (89% retention). 

  5. SQLite FTS5 Extension. FTS5 provides full-text search with BM25 ranking and configurable column weights. 

  6. sqlite-vec: A vector search SQLite extension. Provides vec0 virtual tables for KNN vector search within SQLite. 

  7. Robertson, S. and Zaragoza, H. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 2009. 

  8. Karpukhin, V. et al. Dense Passage Retrieval for Open-Domain Question Answering. EMNLP, 2020. Dense representations outperform BM25 by 9-19% on open-domain QA. 

  9. Reimers, N. and Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP, 2019. Foundational work on dense semantic similarity. 

  10. Luan, Y. et al. Sparse, Dense, and Attentional Representations for Text Retrieval. TACL, 2021. Hybrid retrieval consistently outperforms single-modality approaches on MS MARCO. 

  11. SQLite Write-Ahead Logging. WAL mode for concurrent reads with a single writer. 

  12. Gao, Y. et al. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv, 2024. Survey of RAG architectures and chunking strategies. 

  13. Thakur, N. et al. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS, 2021. 

  14. Model2Vec: Distill a Small Fast Model from any Sentence Transformer. Minish Lab, 2024. 

  15. Obsidian Documentation. Official documentation for Obsidian. 

  16. Model Context Protocol Specification. The MCP standard for connecting AI tools to data sources. 

  17. Author’s production data. 16,894 files, 49,746 chunks, 83.56 MB SQLite database, 7,771 signals processed across 14 months. Query latency measured via time.perf_counter(). 

VAULT obsidian-ai-infrastructure.md INDEXED