obsidian:~/vault$ search --hybrid obsidian-ai-infrastructure

Obsidian as AI Infrastructure: The Definitive Technical Reference

The complete system for turning an Obsidian vault into a queryable AI knowledge base. Vault design, hybrid retrieval, MCP integration, hooks, workflows, and operational patterns.

words: 13702 read_time: 69m updated: 2026-03-01 00:00

Obsidian is not a note-taking app. It is a local-first, plaintext, graph-structured markdown corpus that becomes an AI context reservoir when you add retrieval infrastructure. 16,894 files. 49,746 chunks. 23ms queries. Zero API calls. One 83 MB SQLite file. This guide covers the complete system: from vault architecture to hybrid retrieval to MCP integration to operational workflows.


Key Takeaways

Context engineering, not note-taking. The value of an Obsidian vault for AI is not the notes themselves but the retrieval layer that makes them queryable. A 16,000-file vault without retrieval is a write-only database. A 200-file vault with hybrid search and MCP integration is an AI knowledge base. The retrieval infrastructure is the product. The notes are the raw material.

Hybrid retrieval beats pure keyword or pure semantic search. BM25 catches exact identifiers and function names. Vector search catches synonyms and conceptual matches across different terminology. Reciprocal Rank Fusion (RRF) merges both without requiring score calibration. Neither method alone covers both failure modes. Research on MS MARCO passage ranking confirms the pattern: hybrid retrieval consistently outperforms either method in isolation.1 The hybrid retriever deep dive covers the RRF math, worked examples with real numbers, failure mode analysis, and an interactive fusion calculator.
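The fusion step itself is small enough to show inline. A minimal RRF sketch with k=60 (the commonly used default); the document IDs and rankings below are illustrative, not from a real index:

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, best first; no score calibration needed.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["oauth.md", "tokens.md", "sessions.md"]       # keyword ranking
vector_hits = ["tokens.md", "auth-flows.md", "oauth.md"]   # semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
```

Documents that appear in both lists ("tokens.md", "oauth.md") accumulate score from each and rise above documents that appear in only one — which is exactly why RRF covers both failure modes without comparing raw BM25 and cosine scores.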

MCP gives AI tools direct vault access. Model Context Protocol (MCP) servers expose the retriever as a tool that Claude Code, Codex CLI, Cursor, and other AI tools can call directly. The agent queries the vault, receives ranked results with source attribution, and uses the context without loading entire files. The MCP server is a thin wrapper around the retrieval engine.

Local-first means zero API costs and full privacy. The entire stack runs on a single machine: SQLite for storage, Model2Vec for embeddings, FTS5 for keyword search, sqlite-vec for vector KNN. No cloud services, no API calls, no network dependency. Personal notes never leave the machine. The full re-embed of 49,746 chunks would cost roughly $0.30 at OpenAI API prices, but the real costs are latency, privacy exposure, and the network dependency for a system that should work offline.2

Incremental indexing keeps the system current in under 10 seconds. File modification time comparison detects changes. Only modified files are re-chunked and re-embedded. A full reindex takes about four minutes on Apple M-series hardware. Incremental updates on a typical day’s edits run in under ten seconds. The system stays current without manual intervention.
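The change-detection logic is a few lines. A sketch of the mtime comparison, assuming the last-indexed mtimes have been loaded from the index database into a dict (the real storage schema varies):

```python
import os

def files_needing_reindex(paths, stored_mtimes):
    """Return the files whose on-disk mtime differs from the last-indexed mtime.

    stored_mtimes: {path: mtime} as recorded at last index time — a stand-in
    for the real index metadata table."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if stored_mtimes.get(path) != mtime:
            changed.append(path)
    return changed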

The architecture scales from 200 to 20,000+ notes. The same three-layer design (intake, retrieval, integration) works at any vault size. Start with BM25-only search over a small vault. Add vector search when keyword collisions become a problem. Add RRF fusion when you need both exact and semantic matches. Each layer is independently useful and independently removable.


How to Use This Guide

This guide covers the complete system. Your starting point depends on where you are:

| You are… | Start here | Then explore |
| --- | --- | --- |
| New to Obsidian + AI | Why Obsidian for AI Infrastructure, Quick Start | Vault Architecture, MCP Server Architecture |
| Existing vault, want AI access | MCP Server Architecture, Claude Code Integration | Embedding Models, Full-Text Search |
| Building a retrieval system | The Complete Retrieval Pipeline, Reciprocal Rank Fusion | Performance Tuning, Troubleshooting |
| Team or enterprise context | Decision Framework, Knowledge Graph Patterns | Developer Workflow Recipes, Migration Guide |

Sections marked Contract include implementation details, configuration blocks, and failure modes. Sections marked Narrative focus on concepts, architecture decisions, and the reasoning behind design choices. Sections marked Recipe provide step-by-step workflows.


Why Obsidian for AI Infrastructure

The thesis of this guide: Obsidian vaults are the best substrate for personal AI knowledge bases because they are local-first, plaintext, graph-structured, and the user controls every layer of the stack.

What Obsidian gives AI that alternatives do not

Plaintext markdown files. Every note is a .md file on your filesystem. No proprietary format, no database export, no API required to read the content. Any tool that reads files can read your vault. grep, ripgrep, Python’s pathlib, SQLite FTS5 — they all work directly on the source files. When you build a retrieval system, you are indexing files, not API responses. The index is always consistent with the source because the source is the file system.

Local-first architecture. The vault lives on your machine. No server, no cloud sync dependency, no API rate limits, no terms of service governing how you process your own content. You can embed, index, chunk, and search your notes without any external service. This matters for AI infrastructure because the retrieval pipeline runs as fast as your disk allows, not as fast as an API endpoint responds. It also matters for privacy: personal notes containing credentials, health data, financial information, and private reflections never leave your machine.

Graph structure through wiki-links. Obsidian’s [[wiki-link]] syntax creates a directed graph across notes. A note about OAuth implementation links to notes about token rotation, session management, and API security. The graph structure encodes human-curated relationships between concepts. Vector embeddings capture semantic similarity, but wiki-links capture intentional connections that the author made while thinking about the topic. The graph is a signal that embeddings cannot replicate.
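Extracting this graph signal is a one-regex job. A sketch of pulling wiki-link targets out of a note — a hypothetical helper, not Obsidian's own parser — handling the [[Target]], [[Target|alias]], and [[Target#Section]] forms:

```python
import re

# Capture the target; drop optional "#Section" anchors and "|alias" display text.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_links(markdown_text):
    """Return the note titles this note links to."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(markdown_text)]
```

Run over every note, this produces the edge list of the vault graph, which the retriever can use for link-based boosting or neighborhood expansion.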

Plugin ecosystem. Obsidian has 1,800+ community plugins. Dataview queries your vault like a database. Templater generates notes from templates with JavaScript logic. Git integration syncs your vault to a repository. Linter enforces formatting consistency. These plugins add structure to the vault without changing the underlying plaintext format. The retrieval system indexes the output of these plugins, not the plugins themselves.

5 million+ users. Obsidian has a large active community producing templates, workflows, plugins, and documentation. When you encounter a problem with vault organization or plugin configuration, someone has likely documented a solution. The community also produces Obsidian-adjacent tools: MCP servers, indexing scripts, publishing pipelines, and API wrappers.

What a filesystem alone does not give you

A directory of markdown files has the plaintext advantage but lacks three things that Obsidian adds:

  1. Bidirectional links. Obsidian tracks backlinks automatically. When you link from Note A to Note B, Note B shows that Note A references it. The graph panel visualizes connection clusters. This bidirectional awareness is metadata that a raw filesystem does not provide.

  2. Live preview with plugin rendering. Dataview queries, Mermaid diagrams, and callout blocks render in real-time. The writing experience is richer than a text editor while the storage format remains plaintext. You write and organize in a rich environment; the retrieval system indexes the raw markdown.

  3. Community infrastructure. Plugin discovery, theme marketplace, sync service (optional), publish service (optional), and a documentation ecosystem. You can replicate any individual feature with standalone tools, but Obsidian packages them into a coherent workflow.

What Obsidian does NOT do (and what you build)

Obsidian does not include retrieval infrastructure. It has basic search (full-text, filename, tag) but no embedding pipeline, no vector search, no fusion ranking, no MCP server, no credential filtering, no chunking strategy, and no integration hooks for external AI tools. This guide covers the infrastructure you build on top of Obsidian. The vault is the substrate. The retrieval pipeline, the MCP server, and the integration hooks are the infrastructure.

The architecture described here is markdown-first, not Obsidian-exclusive. If you use Logseq, Foam, Dendron, or a plain directory of markdown files, the retrieval pipeline works identically. The chunker reads .md files. The embedder processes text strings. The indexer writes to SQLite. None of these components depend on Obsidian-specific features. Obsidian’s contribution is the writing and organizational environment that produces the markdown files the retriever indexes.


Quick Start: First AI-Connected Vault

This section gets a vault connected to an AI tool in five minutes. You will install Obsidian, create a vault, install an MCP server, and run your first query. The quick start uses a community MCP server for immediate results. Later sections cover building a custom retrieval pipeline for production use.

Prerequisites

  • macOS, Linux, or Windows
  • Node.js 18+ (for MCP server)
  • Claude Code, Codex CLI, or Cursor installed

Step 1: Create a vault

Download Obsidian from obsidian.md and create a new vault. Choose a location you will remember — the MCP server needs the absolute path.

# Example vault location
~/Documents/knowledge-base/

Add a few notes to give the retriever something to work with. Even 10-20 notes are enough to see results. Each note should be a .md file with a meaningful title and at least one paragraph of content.

Step 2: Install an MCP server

The obsidian-mcp-server community package provides immediate vault access. Install it:

npm install -g obsidian-mcp-server

Step 3: Configure your AI tool

Claude Code — add to ~/.claude/settings.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-server",
      "args": ["--vault", "/absolute/path/to/your/vault"]
    }
  }
}

Codex CLI — add to .codex/config.toml:

[mcp_servers.obsidian]
command = "obsidian-mcp-server"
args = ["--vault", "/absolute/path/to/your/vault"]

Cursor — add to .cursor/mcp.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-server",
      "args": ["--vault", "/absolute/path/to/your/vault"]
    }
  }
}

Step 4: Run your first query

Open your AI tool and ask a question that your vault notes can answer:

Search my Obsidian vault for notes about [topic you wrote about]

The AI tool calls the MCP server, which searches your vault and returns matching content. You should see results with file paths and relevant excerpts.

What you just built

You connected a local knowledge base to an AI tool through a standard protocol. The MCP server reads your vault files, performs basic search, and returns results. This is the minimal viable version.

What this quick start does NOT give you:

  • Hybrid retrieval (BM25 + vector search + RRF fusion)
  • Embedding-based semantic search
  • Credential filtering
  • Incremental indexing
  • Hook-based automatic context injection

The rest of this guide covers building each of these capabilities. The quick start proves the concept. The full pipeline delivers production-quality retrieval.


Decision Framework: Obsidian vs Alternatives

Not every use case needs Obsidian. This section maps when Obsidian is the right substrate, when it is overkill, and when something else fits better.

Decision Tree

START: What is your primary content type?

├─ Structured data (tables, records, schemas)
   Use a database. SQLite, PostgreSQL, or a spreadsheet.
   Obsidian is for prose, not tabular data.

├─ Ephemeral context (current project, temporary notes)
   Use CLAUDE.md / AGENTS.md in the project repo.
   These travel with the code and reset per project.

├─ Team wiki (shared documentation, onboarding)
   Evaluate Notion, Confluence, or a shared git repo.
   Obsidian vaults are personal-first. Team sync is possible
    but not native.

└─ Growing personal knowledge corpus
   
   ├─ < 50 notes
      A folder of markdown files + grep is sufficient.
      Obsidian adds value mainly through the link graph,
       which needs density to be useful.
   
   ├─ 50 - 500 notes
      Obsidian adds value. Wiki-links create a navigable graph.
      BM25-only search (FTS5) is sufficient at this scale.
      Skip vector search and RRF until keyword collisions appear.
   
   ├─ 500 - 5,000 notes
      Full hybrid retrieval becomes valuable. Keyword collisions
       increase. Semantic search catches queries that BM25 misses.
      Add vector search + RRF fusion at this scale.
   
   └─ 5,000+ notes
       Full pipeline is essential. BM25-only returns too much noise.
       Credential filtering becomes critical (more notes = more
        accidentally pasted secrets).
       Incremental indexing matters (full reindex takes minutes).
       MCP integration pays dividends on every AI interaction.

Comparison Matrix

| Criterion | Obsidian | Notion | Apple Notes | Plain Filesystem | CLAUDE.md |
| --- | --- | --- | --- | --- | --- |
| Local-first | Yes | No (cloud) | Partial (iCloud) | Yes | Yes |
| Plaintext | Yes (markdown) | No (blocks) | No (proprietary) | Yes | Yes |
| Graph structure | Yes (wiki-links) | Partial (mentions) | No | No | No |
| AI indexable | Direct file access | API required | Export required | Direct file access | Already in context |
| Plugin ecosystem | 1,800+ plugins | Integrations | None | N/A | N/A |
| Offline capable | Full | Read-only cached | Partial | Full | Full |
| Scales to 10K+ notes | Yes | Yes (with API) | Degrades | Yes | No (single file) |
| Cost | Free (core) | $10/mo+ | Free | Free | Free |

When Obsidian is overkill

  • Single-project context. If the AI only needs context about the current codebase, put it in CLAUDE.md, AGENTS.md, or project-level documentation. These files travel with the repo and are automatically loaded.
  • Structured data. If the content is tables, records, or schemas, use a database. Obsidian notes are prose-first. Dataview can query frontmatter fields, but a real database handles structured queries better.
  • Temporary research. If the notes will be discarded after the project ends, a scratch directory with markdown files is simpler. Do not build retrieval infrastructure for ephemeral content.

When Obsidian is the right choice

  • Accumulating knowledge over months or years. The value compounds as the corpus grows. A 200-note vault queried daily for six months provides more value than a 5,000-note vault queried once.
  • Multiple domains in one corpus. A vault containing notes on programming, architecture, security, design, and personal projects benefits from cross-domain retrieval that a project-specific CLAUDE.md cannot provide.
  • Privacy-sensitive content. Local-first means the retrieval pipeline never sends content to external services. The vault contains whatever you put in it, including content you would not upload to a cloud service.

Mental Model: Three Layers

The system has three layers that operate independently but compound when combined. Each layer has a different concern and a different failure mode.

┌─────────────────────────────────────────────────────┐
│                  INTEGRATION LAYER                  │
│  MCP servers, hooks, skills, context injection      │
│  Concern: delivering context to AI tools            │
│  Failure: wrong context, too much context, stale    │
└──────────────────────┬──────────────────────────────┘
                       │ query + ranked results
┌──────────────────────┴──────────────────────────────┐
│                   RETRIEVAL LAYER                   │
│  BM25, vector KNN, RRF fusion, token budget         │
│  Concern: finding the right content for any query   │
│ Failure: wrong ranking, missed results, slow queries│
└──────────────────────┬──────────────────────────────┘
                       │ chunked, embedded, indexed
┌──────────────────────┴──────────────────────────────┐
│                    INTAKE LAYER                     │
│  Note creation, signal triage, vault organization   │
│  Concern: what enters the vault and how it's stored │
│  Failure: noise, duplicates, missing structure      │
└─────────────────────────────────────────────────────┘

Intake determines what enters the vault. Without curation, the vault accumulates noise: screenshots of tweets, copy-pasted articles with no annotation, half-finished thoughts with no context. The intake layer is responsible for quality control at the point of entry. A scoring pipeline, tagging convention, or manual review process — any mechanism that ensures the vault contains content worth retrieving.

Retrieval makes the vault queryable. This is the engine: chunking notes into search units, embedding chunks into vector space, indexing for keyword and semantic search, fusing results with RRF. The retrieval layer transforms a directory of files into a queryable knowledge base. Without this layer, the vault is navigable through manual browsing and basic search but not programmatically accessible to AI tools.

Integration connects the retrieval layer to AI tools. An MCP server exposes retrieval as a callable tool. Hooks inject context automatically. Skills capture new knowledge back into the vault. The integration layer is the interface between the knowledge base and the AI agents that consume it.

The layers are decoupled by design. The intake scoring pipeline knows nothing about embeddings. The retriever knows nothing about signal routing rules. The MCP server knows nothing about how notes were created. This decoupling means you can improve any layer independently. Replace the embedding model without changing the intake pipeline. Add a new MCP capability without modifying the retriever. Change the signal scoring heuristics without touching the index.


Vault Architecture for AI Consumption

A vault optimized for AI retrieval follows different conventions than a vault optimized for personal browsing. This section covers folder structure, note schema, frontmatter conventions, and the specific patterns that improve retrieval quality.

Folder Structure

Use numbered prefixes for top-level folders to create a predictable organizational hierarchy. The numbers do not imply priority — they group related domains and make the structure scannable.

vault/
├── 00-inbox/              # Unsorted captures, pending triage
├── 01-projects/           # Active project notes
├── 02-areas/              # Ongoing areas of responsibility
├── 03-resources/          # Reference material by topic
│   ├── programming/
│   ├── security/
│   ├── ai-engineering/
│   ├── design/
│   └── devops/
├── 04-archive/            # Completed projects, old references
├── 05-signals/            # Scored signal intake
│   ├── ai-tooling/
│   ├── security/
│   ├── systems/
│   └── ...12 domain folders
├── 06-daily/              # Daily notes (if used)
├── 07-templates/          # Note templates (excluded from index)
├── 08-attachments/        # Images, PDFs (excluded from index)
├── .obsidian/             # Obsidian config (excluded from index)
└── .indexignore           # Paths to exclude from retrieval index

Folders that should be indexed: Everything containing markdown prose — projects, areas, resources, signals, daily notes.

Folders that should be excluded from indexing: Templates (they contain placeholder variables, not content), attachments (binary files), Obsidian configuration, and any folder containing sensitive content you do not want in the retrieval index.

The .indexignore file

Create a .indexignore file at the vault root to explicitly exclude paths from the retrieval index. The syntax matches .gitignore:

# Obsidian internal
.obsidian/

# Templates contain placeholders, not content
07-templates/

# Binary attachments
08-attachments/

# Personal health/medical notes
02-areas/health/

# Financial records
02-areas/finance/personal/

# Career documents (resumes, salary data)
02-areas/career/private/

The indexer reads this file before scanning and skips matching paths entirely. Files in excluded paths are never chunked, never embedded, and never appear in search results.
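A sketch of that skip logic, using a simplified subset of .gitignore semantics — directory prefixes plus fnmatch globs. Real .gitignore matching has more rules (negation, anchoring) than this:

```python
import fnmatch

def load_ignore_patterns(ignore_file_text):
    """Parse .indexignore text: skip blank lines and # comments."""
    patterns = []
    for line in ignore_file_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns

def is_ignored(rel_path, patterns):
    """True if the vault-relative path falls under any ignored pattern."""
    for pat in patterns:
        # Directory prefix match: ".obsidian" excludes ".obsidian/app.json"
        if rel_path == pat or rel_path.startswith(pat + "/"):
            return True
        # Glob match for patterns like "*.excalidraw.md"
        if fnmatch.fnmatch(rel_path, pat):
            return True
    return False
```

The indexer applies this check to every candidate path before chunking, so excluded content never reaches the embedder or the FTS index.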

Note Schema

Every note should have YAML frontmatter. The retriever uses frontmatter fields for filtering and context enrichment:

---
title: "OAuth Token Rotation Patterns"
type: note           # note | signal | project | moc | daily
domain: security     # primary domain for routing
tags:
  - authentication
  - oauth
  - token-management
created: 2026-01-15
updated: 2026-02-28
source: ""           # URL if captured from external source
status: active       # active | archived | draft
---

Required fields for retrieval:

  • title — Used in search result display and heading context for BM25
  • type — Enables type-filtered queries (“show me only MOCs” or “only signals”)
  • tags — Indexed in FTS5 heading context with 0.3 weight, providing keyword matches even when the body uses different terminology

Optional but valuable fields:

  • domain — Enables domain-scoped queries (“search security notes only”)
  • source — Attribution for captured content; the retriever can include source URLs in results
  • status — Allows excluding archived or draft notes from active search
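A sketch of the frontmatter split the indexer performs before chunking. A production pipeline would hand the YAML block to a real parser (e.g. PyYAML); this stdlib-only version handles only the delimiters and flat key: value fields:

```python
def split_frontmatter(text):
    """Split a note into (frontmatter_lines, body) at the --- delimiters."""
    if not text.startswith("---\n"):
        return [], text
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:  # opening delimiter with no closing one: treat as plain body
        return [], text
    return head.splitlines(), body.lstrip("\n")

def scalar_fields(frontmatter_lines):
    """Pull flat `key: value` fields (title, type, domain, status, ...).

    Indented lines and list items (tag entries) are skipped here."""
    fields = {}
    for line in frontmatter_lines:
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields
```

The extracted fields feed retrieval filtering (type, domain, status) and result display (title); the body goes on to the chunker.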

Chunking Conventions

The retriever chunks at H2 (##) heading boundaries. This means your note structure directly affects retrieval granularity:

Good for retrieval:

## Token Rotation Strategy

The rotation interval depends on the threat model...

## Implementation with refresh_token

The OAuth 2.0 refresh token flow requires...

## Error Handling: Expired Tokens

When a token expires mid-request...

Three H2 sections produce three independently searchable chunks. Each chunk has enough context for the embedding to capture its meaning. A query about “expired token handling” matches the third chunk specifically.

Poor for retrieval:

# OAuth Notes

Token rotation depends on threat model. The OAuth 2.0 refresh
token flow requires storing the refresh token securely. When a
token expires mid-request, the client should retry after refresh.
The rotation interval is typically 15-30 minutes for access tokens
and 7-30 days for refresh tokens...

One long section with no H2 headings produces one large chunk. The embedding averages across all topics in the section. A query about any subtopic matches the entire note equally.

Rule of thumb: If a section covers more than one concept, split it into H2 subsections. The chunker handles the rest.
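The chunker itself is a short loop. A simplified sketch of H2-boundary chunking — the real pipeline also attaches heading context and enforces size limits:

```python
def chunk_by_h2(markdown_text):
    """Split a note at H2 (##) boundaries into (heading, text) chunks.

    Text before the first H2 (title, intro) becomes a preamble chunk."""
    chunks = []
    current_heading, current_lines = "(preamble)", []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if any(l.strip() for l in current_lines):
                chunks.append((current_heading, "\n".join(current_lines).strip()))
            current_heading, current_lines = line[3:].strip(), []
        else:
            current_lines.append(line)
    if any(l.strip() for l in current_lines):
        chunks.append((current_heading, "\n".join(current_lines).strip()))
    return chunks
```

Applied to the "good" example above, this yields three chunks with headings "Token Rotation Strategy", "Implementation with refresh_token", and "Error Handling: Expired Tokens"; applied to the "poor" example, one undifferentiated chunk.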

What Not to Put in Notes

Content that degrades retrieval quality:

  • Raw copy-pastes of entire articles without annotation. The retriever indexes the original article’s keywords, diluting your vault with content you did not write. Add a summary, extract key points, or link to the source URL instead.
  • Screenshots without text description. The retriever indexes markdown text. An image without alt text or surrounding description is invisible to both BM25 and vector search.
  • Credential strings. API keys, tokens, passwords, connection strings. Even with credential filtering, the safest approach is to never paste secrets into notes. Reference them by name (“the Cloudflare API token in ~/.env”) instead.
  • Auto-generated content without curation. If a tool generates a note (meeting transcript, Readwise highlights, RSS import), review and annotate it before it enters the permanent vault. Uncurated auto-imports add volume without adding retrievable value.

Plugin Ecosystem for AI Workflows

Obsidian plugins that improve vault quality for AI retrieval fall into three categories: structural (enforce consistency), querying (expose metadata), and sync (keep the vault current).

Essential Plugins

Dataview. Queries your vault like a database using frontmatter fields. Create dynamic indexes: “all notes tagged security updated in the last 30 days” or “all project notes with status active.” Dataview does not directly help retrieval, but it helps you identify gaps in your vault’s coverage and find notes that need updating.

TABLE type, domain, updated
FROM "03-resources"
WHERE status = "active"
SORT updated DESC
LIMIT 20

Templater. Creates notes from templates with dynamic fields. Ensure every new note starts with correct frontmatter by using a template that pre-fills created, type, and domain fields. Consistent frontmatter improves retrieval filtering.

<%* /* New Resource Note Template */ %>
---
title: "<% tp.file.cursor() %>"
type: note
domain: <% tp.system.suggester(["programming", "security", "ai-engineering", "design", "devops"], ["programming", "security", "ai-engineering", "design", "devops"]) %>
tags: []
created: <% tp.date.now("YYYY-MM-DD") %>
updated: <% tp.date.now("YYYY-MM-DD") %>
source: ""
status: active
---

## Key Points

## Details

## References

Linter. Enforces formatting rules across the vault. Consistent heading hierarchy (H1 for title, H2 for sections, H3 for subsections) ensures the chunker produces predictable results. Linter rules that matter for retrieval:

  • Heading increment: enforce sequential heading levels (no jumping from H1 to H3)
  • YAML title: match the filename
  • Trailing spaces: remove (avoids FTS5 tokenization artifacts)
  • Consecutive blank lines: limit to 1 (cleaner chunks)

Git integration. Version control for your vault. Track changes over time, sync between machines, and recover from accidental deletions. Git also provides mtime data that the indexer uses for incremental change detection.

Plugins That Help Indexing

Smart Connections. An Obsidian plugin that provides AI-powered semantic search within Obsidian itself. It creates its own embedding index. While the retrieval system in this guide is external to Obsidian (runs as a Python pipeline), Smart Connections is useful for exploring semantic relationships while writing. The two systems index the same content but serve different use cases: Smart Connections for in-editor discovery, the external retriever for AI tool integration.

Metadata Menu. Provides structured frontmatter editing with autocomplete for field values. Reduces typos in type, domain, and tags fields. Consistent metadata improves retrieval filtering accuracy.

Plugins That Hurt Indexing

Excalidraw. Stores drawings as JSON embedded in markdown files. The JSON is syntactically valid markdown but produces garbage when chunked and embedded. Exclude Excalidraw files from the index via .indexignore or filter by file extension.

Kanban. Stores board state as specially-formatted markdown. The format is designed for Kanban rendering, not for prose retrieval. The chunker produces fragments of card titles and metadata that do not embed well. Exclude Kanban boards from the index.

Calendar. Creates daily notes with minimal content (often just a date header). Empty or near-empty notes produce low-quality chunks. If you use daily notes, write substantive content in them or exclude the daily notes folder from the index.
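A sketch of an extension-based filter the indexer can apply before chunking. The .excalidraw.md suffix is Excalidraw's default naming; adjust the rules to your own plugin settings:

```python
def is_indexable(path):
    """Skip non-prose files before chunking (rules are illustrative)."""
    if not path.endswith(".md"):
        return False            # binary attachments, JSON, images
    if path.endswith(".excalidraw.md"):
        return False            # Excalidraw stores drawing JSON inside markdown
    return True
```

Kanban boards and near-empty daily notes are harder to detect by extension alone; for those, path-based exclusion via .indexignore is the more reliable mechanism.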

Plugin Configuration That Matters

File recovery → Enabled. Protects against accidental note deletion. Not directly related to retrieval but critical for a knowledge base you depend on.

Strict line breaks → Disabled. Markdown-standard line breaks (double newline for paragraph) produce cleaner chunks than Obsidian’s strict mode (single newline for <br>).

Default new file location → Designated folder. Route new files to 00-inbox/ so uncategorized notes do not pollute domain folders. The inbox is a staging area; files move to domain folders after triage.

Wiki-link format → Shortest path when possible. Shorter link targets are easier for the retriever to resolve when indexing link structure.


Embedding Models: Choosing and Configuring

The embedding model converts text chunks into numerical vectors for semantic search. The model choice determines retrieval quality, index size, embedding speed, and runtime dependencies. This section explains why Model2Vec’s potion-base-8M is the default choice and when to choose alternatives.

Why Model2Vec potion-base-8M

Model: minishlab/potion-base-8M
Parameters: 7.6 million
Dimensions: 256
Size: ~30 MB
Dependencies: model2vec (numpy only, no PyTorch)
Inference: CPU-only, static word embeddings (no attention layers)

Model2Vec distills a sentence transformer’s knowledge into static token embeddings. Instead of running attention layers over the input (as BERT, MiniLM, and other transformer models do), Model2Vec produces vectors through weighted averaging of pre-computed token embeddings.3 The practical consequence: embedding speed is 50-500x faster than transformer-based models, because inference is a table lookup and an average rather than a forward pass through attention layers.

On the MTEB benchmark suite, potion-base-8M achieves 89% of all-MiniLM-L6-v2’s performance (50.03 vs 56.09 average).4 The 11% quality gap is the trade-off for the speed and simplicity advantages. For short markdown chunks (average 200-400 words in a typical vault), the quality difference is less pronounced than on longer documents because both models converge on similar representations for short, focused text.
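A toy illustration of why static embeddings are this fast: inference is a vocabulary lookup plus an average, with no attention pass over the sequence. The three-dimensional vocabulary below is invented for illustration, and Model2Vec's actual averaging is weighted (and its vectors are 256-dimensional), not a plain mean:

```python
# Hypothetical 3-dim token vocabulary, standing in for the real 256-dim one.
TOKEN_VECTORS = {
    "token":    [0.9, 0.1, 0.0],
    "rotation": [0.7, 0.3, 0.1],
    "oauth":    [0.8, 0.2, 0.2],
}

def embed(text):
    """Text embedding = mean of its known token vectors. No attention, no GPU."""
    vecs = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vecs:
        return [0.0] * 3
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]
```

Because each token's vector is pre-computed, cost scales linearly with token count and nothing depends on sequence length interactions — the property that makes a full-vault reindex take minutes instead of hours.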

Configuration

# embedder.py
DEFAULT_MODEL = "minishlab/potion-base-8M"
EMBEDDING_DIM = 256

class Model2VecEmbedder:
    def __init__(self, model_name=DEFAULT_MODEL):
        self._model_name = model_name
        self._model = None

    def _ensure_model(self):
        if self._model is not None:
            return
        _activate_venv()  # Add isolated venv to sys.path
        from model2vec import StaticModel
        self._model = StaticModel.from_pretrained(self._model_name)

    def embed_batch(self, texts):
        self._ensure_model()
        vecs = self._model.encode(texts)
        return [v.tolist() for v in vecs]

Lazy loading. The model loads on first use, not at import time. Importing the embedder module costs nothing when the retriever operates in BM25-only fallback mode (e.g., when the embedding venv is not installed).

Isolated virtual environment. The model runs in a dedicated venv (e.g., ~/.claude/venvs/memory/) to avoid dependency conflicts with the rest of the toolchain. The _activate_venv() function adds the venv’s site-packages to sys.path at runtime.

# Create isolated venv
python3 -m venv ~/.claude/venvs/memory
~/.claude/venvs/memory/bin/pip install model2vec

Batch processing. The embedder processes texts in batches of 64 to amortize Model2Vec’s overhead. The indexer feeds chunks to embed_batch() rather than embedding one chunk at a time.
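The batching helper is trivial but worth showing, since it is the difference between 49,746 individual model invocations and roughly 780 batched calls:

```python
def batched(items, batch_size=64):
    """Yield fixed-size batches; the indexer feeds each one to embed_batch()."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Usage: `for batch in batched(chunk_texts): vectors.extend(embedder.embed_batch(batch))`.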

When to Choose Alternatives

| Model | Dim | Size | Speed | Quality (MTEB) | Best for |
| --- | --- | --- | --- | --- | --- |
| potion-base-8M | 256 | 30 MB | 500x | 50.03 | Default: local, fast, no GPU |
| all-MiniLM-L6-v2 | 384 | 80 MB | 1x | 56.09 | Higher quality, still local |
| nomic-embed-text-v1.5 | 768 | 270 MB | 0.5x | 62.28 | Best local quality |
| text-embedding-3-small | 1536 | API | N/A | 62.30 | API-based, highest quality |

Choose all-MiniLM-L6-v2 when retrieval quality matters more than speed and you have PyTorch installed. The 384-dimensional vectors increase the SQLite database size by ~50% compared to 256-dim vectors. Embedding speed drops from <1 minute to ~10 minutes for a full reindex of 15,000 files on M-series hardware.

Choose nomic-embed-text-v1.5 when you need the best possible local retrieval quality and accept slower indexing. The 768-dimensional vectors roughly triple the database size. Requires PyTorch and a modern CPU or GPU.

Choose text-embedding-3-small when network latency and privacy are acceptable trade-offs. The API produces the highest-quality embeddings but introduces a cloud dependency, per-token cost ($0.02/million tokens), and sends your content to OpenAI’s servers.

Stay with potion-base-8M in all other cases. The speed advantage is critical for iterative indexing (reindex during development), the numpy-only dependency avoids PyTorch installation complexity, and the 256-dimensional vectors keep the database compact.

Model Hash Tracking

The indexer stores a hash derived from the model name and vocabulary size. If you change the embedding model, the indexer detects the mismatch on the next incremental run and triggers a full reindex automatically.

import hashlib

def _compute_model_hash(self):
    """Hash model name + vocab size for compatibility tracking."""
    key = f"{self._model_name}:{self._model.vocab_size}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

This prevents mixing vectors from different models in the same database, which would produce nonsensical cosine similarity scores.

Failure Modes

Model download failure. The first run downloads the model from Hugging Face. If the download fails (network issue, corporate firewall), the retriever falls back to BM25-only mode. The model is cached locally after the first download.

Dimension mismatch. If you switch models without clearing the database, the stored vectors have a different dimension than new embeddings. The indexer detects this via the model hash and triggers a full reindex. If the hash check fails (custom model without proper hash), sqlite-vec will error on KNN queries with mismatched dimensions.

Memory pressure on large vaults. Embedding 50,000+ chunks in a single batch can consume significant memory. The indexer processes in batches of 64 to limit peak memory usage. If memory is still an issue, reduce the batch size.


Full-Text Search with FTS5

SQLite’s FTS5 extension provides full-text search with BM25 ranking. FTS5 is the keyword search component of the hybrid retrieval pipeline. This section covers the FTS5 configuration, when BM25 excels, and its specific failure modes.

FTS5 Virtual Table

CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text,
    section,
    heading_context,
    content=chunks,
    content_rowid=id
);

Content-sync mode. The content=chunks parameter tells FTS5 to reference the chunks table directly rather than storing a duplicate copy of the text. This halves the storage requirement but means FTS5 must be manually synced when chunks are inserted, updated, or deleted.

Columns. Three columns are indexed:

  • chunk_text — The primary content of each chunk (BM25 weight: 1.0)
  • section — The H2 heading text (BM25 weight: 0.5)
  • heading_context — Note title, tags, and metadata (BM25 weight: 0.3)

BM25 Ranking

BM25 ranks documents by term frequency, inverse document frequency, and document length normalization. The bm25() auxiliary function in FTS5 accepts per-column weights:

SELECT
    c.id, c.file_path, c.section, c.chunk_text,
    bm25(chunks_fts, 1.0, 0.5, 0.3) AS score
FROM chunks_fts
JOIN chunks c ON chunks_fts.rowid = c.id
WHERE chunks_fts MATCH ?
ORDER BY score
LIMIT 30;

The column weights (1.0, 0.5, 0.3) mean:

  • A keyword match in chunk_text contributes the most to the score
  • A match in section (heading) contributes half as much
  • A match in heading_context (title, tags) contributes 30% as much

Note that FTS5's bm25() function returns more-negative values for better matches, which is why the query orders by score ascending.

These weights are tunable. If your vault has descriptive headings that strongly predict content quality, increase the section weight. If your tags are comprehensive and accurate, increase the heading_context weight.

When BM25 Wins

BM25 excels at queries containing exact identifiers:

  • Function names: _rrf_fuse, embed_batch, get_stale_files
  • CLI flags: --incremental, --vault, --model
  • Configuration keys: bm25_weight, max_tokens, batch_size
  • Error messages: SQLITE_LOCKED, ConnectionRefusedError
  • Specific terms of art: PostToolUse, PreToolUse, AGENTS.md

For these queries, BM25 finds the exact match immediately. Vector search would return semantically related content but might rank the exact match lower than a conceptual discussion.

When BM25 Fails

BM25 fails at queries that use different terminology than the stored content:

  • Query: “how to handle authentication failures” → Vault contains notes about “login error recovery” and “session expiration handling.” BM25 does not match because the keywords differ.
  • Query: “what is the best way to manage state” → Vault contains notes about “Redux store patterns” and “context providers.” BM25 misses because “state management” is expressed through specific technology names.

BM25 also fails with keyword collision at scale. In a 15,000-file vault, a search for “configuration” matches hundreds of notes because nearly every project note mentions configuration. The results are technically correct but practically useless — the ranking cannot determine which “configuration” note is relevant to the current query.

FTS5 Tokenizer

FTS5 uses the unicode61 tokenizer by default, which handles ASCII and Unicode text. For vaults with significant CJK (Chinese, Japanese, Korean) content, consider the trigram tokenizer:

-- For CJK-heavy vaults
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text, section, heading_context,
    content=chunks, content_rowid=id,
    tokenize='trigram'
);

The default unicode61 tokenizer splits on word boundaries, which works poorly for languages without spaces between words. The trigram tokenizer splits every three characters, enabling substring matching at the cost of index size (roughly 3x larger).

Maintenance

FTS5 requires explicit sync when the underlying chunks table changes:

# After inserting chunks
cursor.execute("""
    INSERT INTO chunks_fts(chunks_fts)
    VALUES('rebuild')
""")

The rebuild command reconstructs the FTS5 index from the content table. Run it after bulk inserts (full reindex) but not after individual incremental updates — for those, use INSERT INTO chunks_fts(rowid, chunk_text, section, heading_context) to sync individual rows.
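A sketch of that row-level sync, runnable against an in-memory database (illustrative values; requires an SQLite build with FTS5, which most Python distributions include):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        file_path TEXT, section TEXT, chunk_text TEXT,
        heading_context TEXT DEFAULT ''
    );
    CREATE VIRTUAL TABLE chunks_fts USING fts5(
        chunk_text, section, heading_context,
        content=chunks, content_rowid=id
    );
""")

# Insert a chunk, then mirror it into the FTS index under the same rowid.
db.execute(
    "INSERT INTO chunks (file_path, section, chunk_text) VALUES (?, ?, ?)",
    ("notes/rrf.md", "Fusion", "Reciprocal Rank Fusion merges ranked lists."),
)
rowid = db.execute("SELECT last_insert_rowid()").fetchone()[0]
db.execute(
    "INSERT INTO chunks_fts (rowid, chunk_text, section, heading_context) "
    "SELECT id, chunk_text, section, heading_context FROM chunks WHERE id = ?",
    (rowid,),
)

hits = db.execute(
    "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH 'fusion'"
).fetchall()
```

Skipping this mirror step is the classic content-sync bug: the chunk exists in chunks but never appears in BM25 results.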


Vector Search with sqlite-vec

The sqlite-vec extension brings vector KNN (K-Nearest Neighbors) search into SQLite. This section covers the sqlite-vec configuration, the embedding pipeline from note to searchable vector, and the specific query patterns.

sqlite-vec Virtual Table

CREATE VIRTUAL TABLE chunk_vecs USING vec0(
    id INTEGER PRIMARY KEY,
    embedding float[256]
);

The vec0 module stores 256-dimensional float vectors as packed binary data. The id column maps 1:1 to the chunks table, enabling joins between vector results and chunk metadata.

Embedding Pipeline

The pipeline flows from note to searchable vector:

Note (.md file)
   Chunker: split at H2 boundaries
     Chunks (30-2000 chars each)
       Credential filter: scrub secrets
         Embedder: Model2Vec encode
           Vectors (256-dim float arrays)
             sqlite-vec: store as packed binary
               Ready for KNN queries
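The chunking step above can be sketched as a regex split at H2 boundaries, using the 30-2000 character bounds from the diagram (helper name is illustrative):

```python
import re

def chunk_by_h2(markdown, min_chars=30, max_chars=2000):
    """Split a note at H2 headings; drop tiny fragments, cap long sections."""
    sections = re.split(r"(?m)^(?=## )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        if len(section) < min_chars:
            continue  # too small to embed usefully
        # Hard-cap oversized sections so no chunk exceeds max_chars
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```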

Vector Serialization

Python’s struct module serializes float vectors for sqlite-vec storage:

import struct

def _serialize_vector(vec):
    """Pack float list into binary for sqlite-vec."""
    return struct.pack(f"{len(vec)}f", *vec)

def _deserialize_vector(blob, dim=256):
    """Unpack binary blob to float list."""
    return list(struct.unpack(f"{dim}f", blob))

KNN Query

A vector search query embeds the input query, then finds the K nearest chunks by cosine distance:

def _vector_search(self, query_text, limit=30):
    query_vec = self.embedder.embed_batch([query_text])[0]
    packed = _serialize_vector(query_vec)

    results = self.db.execute("""
        SELECT
            cv.id,
            cv.distance,
            c.file_path,
            c.section,
            c.chunk_text
        FROM chunk_vecs cv
        JOIN chunks c ON cv.id = c.id
        WHERE embedding MATCH ?
            AND k = ?
        ORDER BY distance
    """, [packed, limit]).fetchall()

    return results

The MATCH operator in sqlite-vec performs a KNN scan; current vec0 releases use brute-force exact search rather than an approximate index, which is fast enough at this scale. The k constraint controls how many results to return. The distance column contains the cosine distance (0 = identical, 2 = opposite); note that vec0 defaults to L2 distance unless the embedding column is declared with distance_metric=cosine.

When Vector Search Wins

Vector search excels at queries where the concept matters more than the specific words:

  • Query: “how to handle authentication failures” → Finds notes about “login error recovery” (same semantic space, different keywords)
  • Query: “what patterns exist for caching” → Finds notes about “memoization,” “Redis TTL strategies,” and “HTTP cache headers” (related concepts, diverse terminology)
  • Query: “approaches to testing asynchronous code” → Finds notes about “pytest-asyncio fixtures,” “mock event loops,” and “async test patterns” (same concept expressed through implementation details)

When Vector Search Fails

Vector search struggles with exact identifiers:

  • Query: _rrf_fuse → Returns notes about “fusion algorithms” and “rank merging” but may rank the actual function definition lower than conceptual discussions
  • Query: PostToolUse → Returns notes about “tool lifecycle hooks” and “post-execution handlers” rather than the specific hook name

Vector search also struggles with structured data. JSON configuration files, YAML blocks, and code snippets produce embeddings that capture structural patterns rather than semantic meaning. A JSON file with "review": true embeds differently than a prose discussion of code review.

Graceful Degradation

If sqlite-vec fails to load (missing extension, incompatible platform, corrupted library), the retriever falls back to BM25-only search:

class VectorIndex:
    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self._vec_available = False
        try:
            self.db.enable_load_extension(True)
            self.db.load_extension("vec0")
            self._vec_available = True
        except Exception:
            pass  # BM25-only mode

    @property
    def vec_available(self):
        return self._vec_available

The retriever checks vec_available before attempting vector queries. When disabled, all searches use BM25 only, and the RRF fusion step is skipped.


Reciprocal Rank Fusion (RRF)

RRF merges two ranked lists without requiring score calibration. This section covers the algorithm, a worked query trace, tuning the k parameter, and why RRF is chosen over alternatives. For an interactive calculator with editable ranks, scenario presets, and a visual architecture explorer, see the hybrid retriever deep dive.

The Algorithm

RRF assigns each document a score based only on its rank position in each list:

score(d) = Σ (weight_i / (k + rank_i))

Where:

  • k is a smoothing constant (60, following Cormack et al.1)
  • rank_i is the document’s 1-based rank in result list i
  • weight_i is an optional per-list multiplier (default 1.0)

Documents that rank well in multiple lists receive higher fused scores. Documents that appear in only one list receive a score from that single source.

Why RRF Over Alternatives

Weighted linear combination requires calibrating BM25 scores against cosine distances. BM25 scores are unbounded and scale with corpus size. Cosine distances are bounded [0, 2]. Combining them requires normalization, and the normalization parameters are dataset-dependent. RRF uses only rank positions, which are always integers starting at 1 regardless of the scoring method.

Learned fusion models require labeled training data — query-document relevance pairs. For a personal knowledge base, this training data does not exist. You would need to manually judge hundreds of query-document pairs to train a useful model. RRF works without any training data.

Condorcet voting methods (Borda count, Schulze method) are theoretically elegant but more complex to implement and tune. The original RRF paper demonstrated that RRF outperforms Condorcet methods on TREC evaluation data.1

Fusion in Practice

Query: “how does the review aggregator handle disagreements”

BM25 ranks review-aggregator.py at position 3 (exact keyword matches on “review,” “aggregator,” “disagreements”) but places two config files higher (they match “review” more prominently). Vector search ranks the same chunk at position 1 (semantic match on conflict resolution). After RRF fusion:

| Chunk | BM25 | Vec | Fused Score |
|---|---|---|---|
| review-aggregator.py “Disagreement Resolution” | #3 | #1 | 0.0323 |
| code-review-patterns.md “Multi-Reviewer” | #4 | #2 | 0.0318 |
| deliberation-config.json “Review Weights” | #1 | — | 0.0164 |
Chunks that rank well in both lists surface to the top. Chunks that only appear in one list get a single-source score and drop below dual-ranked results. The actual disagreement resolution logic wins because both methods found it — BM25 through keywords, vector search through semantics.

For the full step-by-step trace with per-rank RRF math, try different k values in the interactive RRF calculator.
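The fused scores in the table check out by hand with k = 60:

```python
K = 60

def rrf(bm25_rank, vec_rank, k=K):
    """Sum reciprocal-rank contributions; None marks absence from a list."""
    return sum(1.0 / (k + r) for r in (bm25_rank, vec_rank) if r is not None)

# review-aggregator.py: BM25 #3, vector #1
assert round(rrf(3, 1), 4) == 0.0323    # 1/63 + 1/61
# deliberation-config.json: BM25 #1 only
assert round(rrf(1, None), 4) == 0.0164  # 1/61
```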

Implementation

RRF_K = 60

def _rrf_fuse(self, bm25_results, vec_results,
              bm25_weight=1.0, vec_weight=1.0):
    """Fuse BM25 and vector results using Reciprocal Rank Fusion."""
    scores = {}

    for rank, r in enumerate(bm25_results, start=1):
        cid = r["id"]
        if cid not in scores:
            scores[cid] = {
                "rrf_score": 0.0,
                "file_path": r["file_path"],
                "section": r["section"],
                "chunk_text": r["chunk_text"],
                "bm25_rank": None,
                "vec_rank": None,
            }
        scores[cid]["rrf_score"] += bm25_weight / (self._rrf_k + rank)
        scores[cid]["bm25_rank"] = rank

    for rank, r in enumerate(vec_results, start=1):
        cid = r["id"]
        if cid not in scores:
            scores[cid] = {
                "rrf_score": 0.0,
                "file_path": r["file_path"],
                "section": r["section"],
                "chunk_text": r["chunk_text"],
                "bm25_rank": None,
                "vec_rank": None,
            }
        scores[cid]["rrf_score"] += vec_weight / (self._rrf_k + rank)
        scores[cid]["vec_rank"] = rank

    fused = sorted(
        scores.values(),
        key=lambda x: x["rrf_score"],
        reverse=True,
    )
    return fused

Tuning k

The k constant controls how much weight is given to top-ranked results versus lower-ranked results:

  • Lower k (e.g., 10): Top-ranked results dominate. Rank 1 scores 1/11 = 0.091, rank 10 scores 1/20 = 0.050 (1.8x difference). Good when you trust the individual rankers to get the top result right.
  • Default k (60): Balanced. Rank 1 scores 1/61 = 0.0164, rank 10 scores 1/70 = 0.0143 (1.15x difference). Rank differences are compressed, giving more weight to appearing in multiple lists.
  • Higher k (e.g., 200): Appearing in both lists matters much more than rank position. Rank 1 scores 1/201, rank 10 scores 1/210 — nearly identical. Use when the individual rankers produce noisy rankings but cross-list agreement is reliable.

Start with k=60. The original RRF paper found this value robust across diverse TREC datasets. Tune only after measuring failure cases on your own query distribution.
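The compression effect is easy to quantify: the ratio of a rank-1 contribution to a rank-10 contribution is (k + 10) / (k + 1):

```python
def rank_ratio(k, top=1, low=10):
    """How much more a rank-`top` hit contributes than a rank-`low` hit."""
    return (1 / (k + top)) / (1 / (k + low))

print(round(rank_ratio(10), 2))   # 1.82
print(round(rank_ratio(60), 2))   # 1.15
print(round(rank_ratio(200), 2))  # 1.04
```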

Tie-Breaking

When two chunks have identical RRF scores (rare but possible with the same rank in one list and no appearance in the other), break ties by:

  1. Prefer chunks that appear in both lists over chunks that appear in only one
  2. Among chunks in both lists, prefer the one with the lower combined rank
  3. Among chunks in only one list, prefer the one with the lower rank in that list
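These three rules compress into a single sort key. A sketch assuming the record fields produced by _rrf_fuse (the scores below are hypothetical tie values):

```python
def tie_break_key(r):
    """Order by RRF score desc, then both-lists first, then lower ranks."""
    in_both = r["bm25_rank"] is not None and r["vec_rank"] is not None
    combined_rank = (r["bm25_rank"] or 0) + (r["vec_rank"] or 0)
    return (-r["rrf_score"], not in_both, combined_rank)

# A hypothetical tie: one chunk in both lists, one in BM25 only
tied = [
    {"rrf_score": 0.0164, "bm25_rank": 1, "vec_rank": None},
    {"rrf_score": 0.0164, "bm25_rank": 2, "vec_rank": 10},
]
ranked = sorted(tied, key=tie_break_key)  # both-lists chunk sorts first
```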

The Complete Retrieval Pipeline

This section traces a query from input to output through the entire pipeline: BM25 search, vector search, RRF fusion, token budget truncation, and context assembly.

End-to-End Flow

User query: "PostToolUse hook for context compression"
  │
  ├─ BM25 Search (FTS5)
  │    → MATCH "PostToolUse hook context compression"
  │    → Top 30 results ranked by BM25 score
  │    → 12ms
  │
  ├─ Vector Search (sqlite-vec)
  │    → Embed query with Model2Vec
  │    → KNN k=30 on chunk_vecs
  │    → Top 30 results ranked by cosine distance
  │    → 8ms
  │
  └─ RRF Fusion
       → Merge 60 candidates (may overlap)
       → Score by rank position
       → Top 10 results
       → 3ms
       │
       └─ Token Budget
            → Truncate to max_tokens (default 4000)
            → Estimate at 4 chars per token
            → Return results with metadata
            → <1ms

Total latency: ~23ms for a 49,746-chunk database on Apple M3 Pro hardware.

The Search API

class HybridRetriever:
    def search(self, query, limit=10, max_tokens=4000,
               bm25_weight=1.0, vec_weight=1.0):
        """
        Search the vault using hybrid BM25 + vector retrieval.

        Args:
            query: Search query text
            limit: Maximum results to return
            max_tokens: Token budget for total result text
            bm25_weight: Weight for BM25 results in RRF
            vec_weight: Weight for vector results in RRF

        Returns:
            List of SearchResult with file_path, section,
            chunk_text, rrf_score, bm25_rank, vec_rank
        """
        # BM25 search
        bm25_results = self._bm25_search(query, limit=30)

        # Vector search (if available)
        if self.index.vec_available:
            vec_results = self._vector_search(query, limit=30)
            fused = self._rrf_fuse(
                bm25_results, vec_results,
                bm25_weight, vec_weight,
            )
        else:
            fused = bm25_results  # BM25-only fallback

        # Token budget truncation
        results = []
        token_count = 0
        for r in fused[:limit]:
            chunk_tokens = len(r["chunk_text"]) // 4
            if token_count + chunk_tokens > max_tokens:
                break
            results.append(r)
            token_count += chunk_tokens

        return results

Token Budget Truncation

The max_tokens parameter prevents the retriever from returning more context than the AI tool can use. The estimate uses 4 characters per token (a reasonable approximation for English prose). Results are truncated greedily: add results in ranked order until the budget is exhausted.

This is a conservative strategy. A more sophisticated approach would consider per-result quality scores and prefer shorter, higher-quality results over longer, lower-quality results. The greedy approach is simpler and works well in practice because RRF ranking already orders results by relevance.

Database Schema (Complete)

-- Chunk content and metadata
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    section TEXT NOT NULL,
    chunk_text TEXT NOT NULL,
    heading_context TEXT DEFAULT '',
    mtime_ns INTEGER NOT NULL,
    embedded_at REAL NOT NULL
);

CREATE INDEX idx_chunks_file ON chunks(file_path);
CREATE INDEX idx_chunks_mtime ON chunks(mtime_ns);

-- FTS5 for BM25 search (content-synced to chunks table)
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    chunk_text, section, heading_context,
    content=chunks, content_rowid=id
);

-- sqlite-vec for vector KNN search
CREATE VIRTUAL TABLE chunk_vecs USING vec0(
    id INTEGER PRIMARY KEY,
    embedding float[256]
);

-- Model metadata for compatibility tracking
CREATE TABLE model_meta (
    key TEXT PRIMARY KEY,
    value TEXT
);

Graceful Degradation Path

Full pipeline:       BM25 + Vector + RRF    Best results
No sqlite-vec:       BM25 only              Good results (no semantic)
No model download:   BM25 only              Good results (no semantic)
No FTS5:             Vector only            Decent results (no keyword)
No database:         Error                  Prompt user to run indexer

The retriever checks capabilities at initialization and adapts its query strategy. A missing component degrades quality but does not cause errors. The only hard failure is a missing database file.

Production Stats

Measured on a vault of 16,894 files, 49,746 chunks, 83 MB SQLite database, Apple M3 Pro:

| Metric | Value |
|---|---|
| Total files | 16,894 |
| Total chunks | 49,746 |
| Database size | 83 MB |
| BM25 query latency (p50) | 12ms |
| Vector query latency (p50) | 8ms |
| RRF fusion latency | 3ms |
| End-to-end search latency (p50) | 23ms |
| Full reindex time | ~4 minutes |
| Incremental reindex time | <10 seconds |
| Embedding model | potion-base-8M (256-dim) |
| BM25 candidate pool | 30 |
| Vector candidate pool | 30 |
| Default result limit | 10 |
| Default token budget | 4,000 tokens |

Content Hashing and Change Detection

The indexer needs to know which files have changed since the last index run. This section covers the change detection mechanism and the hashing strategy.

File Modification Time Comparison

The indexer stores mtime_ns (file modification time in nanoseconds) for every chunk in the chunks table. On an incremental run, the indexer:

  1. Scans the vault for all .md files in allowed folders
  2. Reads the mtime_ns for each file from the filesystem
  3. Compares against the stored mtime_ns in the database
  4. Identifies three categories:
     • New files: path exists in filesystem but not in database
     • Changed files: path exists in both but mtime_ns differs
     • Deleted files: path exists in database but not in filesystem

def get_stale_files(self, vault_mtimes):
    """Find files whose mtime changed or are new."""
    stored = dict(self.db.execute(
        "SELECT DISTINCT file_path, mtime_ns FROM chunks"
    ).fetchall())

    stale = []
    for path, mtime in vault_mtimes.items():
        if path not in stored or stored[path] != mtime:
            stale.append(path)
    return stale

def get_deleted_files(self, vault_paths):
    """Find files in database that no longer exist in vault."""
    stored_paths = set(r[0] for r in self.db.execute(
        "SELECT DISTINCT file_path FROM chunks"
    ).fetchall())
    return stored_paths - set(vault_paths)

Why mtime, Not Content Hash

Content hashing (SHA-256 of file contents) would be more reliable than mtime comparison — it would avoid re-indexing files that were touched without actually changing, and it would catch edits made by tools that preserve the modification time. However, hashing requires reading every file on every incremental run. For 16,894 files, reading file contents takes 2-3 seconds. Reading mtimes from the filesystem takes <100ms.

The trade-off: mtime comparison occasionally triggers unnecessary re-indexing of touched-but-unchanged files (false positives), and in principle it can miss an edit that preserves mtime_ns, though with nanosecond resolution this is vanishingly rare. False positives cost a few extra embedding calls per run. The speed difference (100ms vs 3 seconds) makes mtime the pragmatic choice for a system that runs on every AI interaction.

Handling Deletions

When a file is deleted from the vault, the indexer removes all its chunks from the database:

def remove_file(self, file_path):
    """Remove all chunks and vectors for a file."""
    chunk_ids = [r[0] for r in self.db.execute(
        "SELECT id FROM chunks WHERE file_path = ?",
        [file_path],
    ).fetchall()]

    for cid in chunk_ids:
        self.db.execute(
            "DELETE FROM chunk_vecs WHERE id = ?", [cid]
        )
    self.db.execute(
        "DELETE FROM chunks WHERE file_path = ?",
        [file_path],
    )

FTS5 content-sync tables require explicit deletion via INSERT INTO chunks_fts(chunks_fts, rowid, ...) VALUES('delete', ?, ...) for each removed row. The indexer handles this as part of the file removal process.
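A runnable sketch of that deletion sync (illustrative schema subset; FTS5's 'delete' command requires passing the old column values alongside the rowid):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, chunk_text TEXT,
                         section TEXT, heading_context TEXT DEFAULT '');
    CREATE VIRTUAL TABLE chunks_fts USING fts5(
        chunk_text, section, heading_context,
        content=chunks, content_rowid=id);
    INSERT INTO chunks (id, chunk_text, section) VALUES
        (1, 'stale chunk about hooks', 'Hooks');
    INSERT INTO chunks_fts (rowid, chunk_text, section, heading_context)
        SELECT id, chunk_text, section, heading_context FROM chunks;
""")

def remove_chunk_from_fts(db, rowid):
    """FTS5 'delete' needs the old column values, so fetch them first."""
    old = db.execute(
        "SELECT chunk_text, section, heading_context FROM chunks WHERE id = ?",
        (rowid,),
    ).fetchone()
    db.execute(
        "INSERT INTO chunks_fts (chunks_fts, rowid, chunk_text, section, "
        "heading_context) VALUES ('delete', ?, ?, ?, ?)",
        (rowid, *old),
    )
    db.execute("DELETE FROM chunks WHERE id = ?", (rowid,))

remove_chunk_from_fts(db, 1)
```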


Incremental vs Full Reindex

The indexer supports two modes: incremental (fast, daily use) and full (slow, occasional). This section covers when to use each, the idempotency guarantees, and corruption recovery.

Incremental Reindex

When to use: Daily indexing after editing notes. The default mode.

What it does:

  1. Scan vault for file changes (mtime comparison)
  2. Delete chunks for deleted files
  3. Re-chunk and re-embed changed files
  4. Insert new chunks for new files
  5. Sync FTS5 index

Typical duration: <10 seconds for a day’s edits on a 16,000-file vault.

python index_vault.py --incremental

Full Reindex

When to use:

  • After changing the embedding model (model hash mismatch detected)
  • After schema migration (new columns, changed indexes)
  • After database corruption (integrity check fails)
  • When incremental indexing produces unexpected results

What it does:

  1. Drop all existing data (chunks, vectors, FTS5 entries)
  2. Scan entire vault
  3. Chunk all files
  4. Embed all chunks
  5. Build FTS5 index from scratch

Typical duration: ~4 minutes for 16,894 files on Apple M3 Pro.

python index_vault.py --full

Idempotency

Both modes are idempotent: running the same command twice produces the same result. The indexer deletes existing chunks for a file before inserting new ones, so a re-run of incremental indexing on an already-current database produces zero changes. A re-run of full indexing produces an identical database.

Corruption Recovery

If the SQLite database becomes corrupted (power loss during write, disk error, killed process mid-transaction):

# Check integrity
sqlite3 vectors.db "PRAGMA integrity_check;"

# If corruption detected, full reindex rebuilds from source files
python index_vault.py --full

The source of truth is always the vault files, not the database. The database is a derived artifact that can be rebuilt at any time. This is a critical design property: you never need to back up the database.

The --incremental Flag

When the indexer runs with --incremental:

  1. Model hash check. Compare stored model hash against current model. If different, automatically switch to full reindex mode and warn the user.
  2. File scan. Walk allowed folders, collect file paths and mtimes.
  3. Change detection. Compare against stored data.
  4. Batch processing. Re-chunk and re-embed changed files in batches of 64.
  5. Progress reporting. Print count of processed files and elapsed time.
  6. Graceful shutdown. Handle SIGINT by finishing the current file before stopping.
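The graceful shutdown in step 6 can be sketched with a flag set by the signal handler and checked between files, never mid-file (class name illustrative):

```python
import signal

class GracefulIndexer:
    """Finish the in-flight file on SIGINT, then stop cleanly."""

    def __init__(self):
        self._stop_requested = False
        signal.signal(signal.SIGINT, self._request_stop)

    def _request_stop(self, signum, frame):
        self._stop_requested = True  # checked after each file completes

    def run(self, files, process_file):
        processed = 0
        for path in files:
            process_file(path)  # current file always completes
            processed += 1
            if self._stop_requested:
                break
        return processed
```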

Credential Filtering and Data Boundaries

Personal notes contain secrets: API keys, bearer tokens, database connection strings, private keys pasted during debugging sessions. The credential filter prevents these from entering the retrieval index.

The Problem

A note about debugging an OAuth integration might contain:

The token was: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
I used this curl command:
  curl -H "Authorization: Bearer sk-ant-api03-abc123..."

Without filtering, both the JWT and the API key would be chunked, embedded, and stored in the database. A search for “authentication” would return the chunk containing real secrets. Worse, if the retriever feeds results to an AI tool through MCP, the secrets appear in the AI’s context window and potentially in the tool’s logs.

Pattern-Based Filtering

The credential filter runs on every chunk before storage, matching 25 vendor-specific patterns plus generic patterns:

Vendor-Specific Patterns:

| Pattern | Example | Regex |
|---|---|---|
| OpenAI API key | sk-... | sk-[a-zA-Z0-9_-]{20,} |
| Anthropic API key | sk-ant-api03-... | sk-ant-api\d{2}-[a-zA-Z0-9_-]{20,} |
| GitHub PAT | ghp_... | gh[ps]_[a-zA-Z0-9]{36,} |
| AWS Access Key | AKIA... | AKIA[0-9A-Z]{16} |
| Stripe key | sk_live_... | [sr]k_(live\|test)_[a-zA-Z0-9]{24,} |
| Cloudflare token | ... | Various patterns |

Generic Patterns:

| Pattern | Detection |
|---|---|
| JWT tokens | eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+ |
| Bearer tokens | Bearer\s+[a-zA-Z0-9_\-\.]+ |
| Private keys | -----BEGIN (RSA\|EC\|OPENSSH) PRIVATE KEY----- |
| High-entropy base64 | Strings with >4.5 bits/char entropy, 40+ chars |
| Password assignments | password\s*[:=]\s*["'][^"']+["'] |
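The high-entropy check can be sketched with Shannon entropy over the string's own character distribution (thresholds from the table; helper names illustrative):

```python
import math
from collections import Counter

def bits_per_char(s):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_high_entropy(token, min_len=40, threshold=4.5):
    return len(token) >= min_len and bits_per_char(token) > threshold
```

A long string drawn evenly from the 64-symbol base64 alphabet approaches 6 bits/char, while English prose typically sits around 4 bits/char or below, which is why 4.5 separates the two.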

Filter Implementation

from dataclasses import dataclass, field

# CREDENTIAL_PATTERNS is a module-level list of objects carrying a
# .name string and a compiled .regex (re.Pattern).

@dataclass
class ScanResult:
    is_clean: bool = True
    match_count: int = 0
    patterns: list = field(default_factory=list)

def clean_content(text):
    """Scrub credentials from text before indexing."""
    result = ScanResult()

    for pattern in CREDENTIAL_PATTERNS:
        matches = pattern.regex.findall(text)
        if matches:
            text = pattern.regex.sub(
                f"[REDACTED:{pattern.name}]", text
            )
            result.is_clean = False
            result.match_count += len(matches)
            result.patterns.append(pattern.name)

    return text, result

Key design choices:

  1. Filter before embedding. The cleaned text is what gets embedded. The vector representation never encodes credential patterns. A query for “API key” returns notes that discuss API key management, not notes that contain actual keys.

  2. Replace, not remove. The [REDACTED:pattern-name] token preserves the semantic context of the surrounding text. The embedding captures that “something credential-like was here” without encoding the credential itself.

  3. Log patterns, not values. The filter logs which patterns matched (e.g., “Scrubbed 2 credential(s) from oauth-debug.md [jwt, bearer-token]”) but never logs the credential value.

Path-Based Exclusion

The .indexignore file provides coarse-grained exclusion by path. The credential filter provides fine-grained scrubbing within indexed files. Both are necessary:

  • .indexignore for entire folders you know contain sensitive content (health notes, financial records, career documents)
  • Credential filter for secrets accidentally embedded in otherwise-indexable content

Data Classification

For vaults containing diverse content, consider classifying notes by sensitivity:

| Level | Examples | Index? | Filter? |
|---|---|---|---|
| Public | Blog drafts, technical notes | Yes | Yes |
| Internal | Project plans, architecture decisions | Yes | Yes |
| Sensitive | Salary data, health records | No (.indexignore) | N/A |
| Restricted | Credentials, private keys | No (.indexignore) | N/A |

MCP Server Architecture

Model Context Protocol (MCP) servers expose the retriever as a tool that AI agents can call. This section covers the server design, capability surface, and permission boundaries.

Protocol Choice: STDIO vs HTTP

MCP supports two transport modes:

STDIO — The AI tool spawns the MCP server as a child process and communicates over stdin/stdout. This is the standard mode for local tools. Claude Code, Codex CLI, and Cursor all support STDIO MCP servers.

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/path/to/vault",
        "DB_PATH": "/path/to/vectors.db"
      }
    }
  }
}

HTTP — The MCP server runs as a standalone HTTP service. Useful for remote access, multi-client setups, or team configurations where the vault is on a shared server.

{
  "mcpServers": {
    "obsidian": {
      "url": "http://localhost:3333/mcp"
    }
  }
}

Recommendation: Use STDIO for personal vaults. It is simpler, more secure (no network exposure), and the server lifecycle is managed by the AI tool. Use HTTP only when multiple tools or multiple machines need concurrent access to the same vault.

Capability Design

The MCP server should expose a minimal set of tools:

search — The primary tool. Runs hybrid retrieval and returns ranked results.

{
  "name": "obsidian_search",
  "description": "Search the Obsidian vault using hybrid BM25 + vector retrieval",
  "parameters": {
    "query": { "type": "string", "description": "Search query" },
    "limit": { "type": "integer", "default": 5 },
    "max_tokens": { "type": "integer", "default": 2000 }
  }
}

read_note — Read the full content of a specific note by path. Useful when the agent wants to see the complete context of a search result.

{
  "name": "obsidian_read_note",
  "description": "Read the full content of a note by file path",
  "parameters": {
    "file_path": { "type": "string", "description": "Relative path within vault" }
  }
}

list_notes — List notes matching a filter (by folder, tag, type, or date range). Useful for exploration when the agent does not have a specific query.

{
  "name": "obsidian_list_notes",
  "description": "List notes matching filters",
  "parameters": {
    "folder": { "type": "string", "description": "Folder path within vault" },
    "tag": { "type": "string", "description": "Tag to filter by" },
    "limit": { "type": "integer", "default": 20 }
  }
}

get_context — A convenience tool that runs a search and formats the results as a context block suitable for injection into a conversation.

{
  "name": "obsidian_get_context",
  "description": "Get formatted context from vault for a topic",
  "parameters": {
    "topic": { "type": "string", "description": "Topic to get context for" },
    "max_tokens": { "type": "integer", "default": 2000 }
  }
}

Permission Boundaries

The MCP server should enforce strict boundaries:

  1. Read-only. The server reads the vault and the index database. It does not create, modify, or delete notes. Write operations (capturing new notes) are handled by separate hooks or skills, not the MCP server.

  2. Vault-scoped. The server only reads files within the configured vault path. Path traversal attempts (../../etc/passwd) must be rejected.

  3. Credential-filtered output. Even if the database contains pre-filtered content, apply credential filtering on output as a defense-in-depth measure.

  4. Token-limited responses. Enforce max_tokens on all tool responses to prevent the AI tool from receiving excessively large context blocks.
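
The vault-scope check in particular is easy to get wrong with naive string-prefix tests. A minimal sketch of a traversal-safe resolver, assuming the placeholder vault path from the config examples; `safe_resolve` is a hypothetical helper name, and `Path.is_relative_to` requires Python 3.9+:

```python
from pathlib import Path

VAULT_PATH = Path("/absolute/path/to/vault").resolve()

def safe_resolve(relative_path: str) -> Path:
    """Resolve a caller-supplied path and reject anything outside the vault."""
    # Joining an absolute input replaces the base entirely, and resolve()
    # collapses ../ segments, so the containment check covers both cases.
    candidate = (VAULT_PATH / relative_path).resolve()
    if not candidate.is_relative_to(VAULT_PATH):
        raise ValueError(f"Path escapes vault: {relative_path}")
    return candidate
```

Every `read_note` and `list_notes` handler should route file access through a check like this before touching disk.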

Error Handling

MCP tools should return structured error messages that help the AI tool recover:

def search(self, query, limit=5, max_tokens=2000):
    if not self.db_path.exists():
        return {
            "error": "Index database not found. Run the indexer first.",
            "suggestion": "python index_vault.py --full"
        }

    results = self.retriever.search(query, limit, max_tokens)

    if not results:
        return {
            "results": [],
            "message": f"No results found for '{query}'. Try broader terms."
        }

    return {
        "results": [
            {
                "file_path": r["file_path"],
                "section": r["section"],
                "text": r["chunk_text"],
                "score": round(r["rrf_score"], 4),
            }
            for r in results
        ],
        "count": len(results),
        "query": query,
    }

Claude Code Integration

Claude Code is the primary consumer of the Obsidian retrieval system. This section covers MCP configuration, hook integration, and the obsidian_bridge.py pattern.

MCP Configuration

Add the Obsidian MCP server to ~/.claude/settings.json:

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/absolute/path/to/vault",
        "DB_PATH": "/absolute/path/to/vectors.db"
      }
    }
  }
}

After adding the configuration, restart Claude Code. The MCP server will start as a child process. Verify it is running:

> What tools do you have from the obsidian MCP server?

Claude Code should list the available tools (obsidian_search, obsidian_read_note, etc.).

Hook Integration

Hooks extend Claude Code’s behavior at defined lifecycle points. Two hooks are relevant for Obsidian integration:

PreToolUse hook — Queries the vault before the agent processes a tool call. Injects relevant context automatically.

#!/bin/bash
# ~/.claude/hooks/pre-tool-use/obsidian-context.sh
# Automatically inject vault context before tool execution

TOOL_NAME="$1"
PROMPT="$2"

# Only inject context for code-related tools
case "$TOOL_NAME" in
    Edit|Write|Bash)
        # Query the vault
        CONTEXT=$(python /path/to/retriever.py search "$PROMPT" --limit 3 --max-tokens 1500)
        if [ -n "$CONTEXT" ]; then
            echo "---"
            echo "Relevant vault context:"
            echo "$CONTEXT"
            echo "---"
        fi
        ;;
esac

PostToolUse hook — Captures significant tool outputs back to the vault for future retrieval.

#!/bin/bash
# ~/.claude/hooks/post-tool-use/capture-insight.sh
# Capture significant outputs to vault (selective)

TOOL_NAME="$1"
OUTPUT="$2"

# Only capture substantial outputs
if [ ${#OUTPUT} -gt 500 ]; then
    python /path/to/capture.py --text "$OUTPUT" --source "claude-code-$TOOL_NAME"
fi

The obsidian_bridge.py Pattern

A bridge module provides a Python API that hooks and skills can call:

# obsidian_bridge.py
from retriever import HybridRetriever

_retriever = None

def get_retriever():
    global _retriever
    if _retriever is None:
        _retriever = HybridRetriever(
            db_path="/path/to/vectors.db",
            vault_path="/path/to/vault",
        )
    return _retriever

def search_vault(query, limit=5, max_tokens=2000):
    """Search vault and return formatted context."""
    retriever = get_retriever()
    results = retriever.search(query, limit, max_tokens)

    if not results:
        return ""

    lines = ["## Vault Context\n"]
    for r in results:
        lines.append(f"**{r['file_path']}** — {r['section']}")
        lines.append(f"> {r['chunk_text'][:500]}")
        lines.append("")

    return "\n".join(lines)

The /capture Skill

A Claude Code skill for capturing insights back to the vault:

/capture "OAuth token rotation requires both access and refresh token invalidation"
  --domain security
  --tags oauth,tokens

The skill creates a new note in 00-inbox/ with proper frontmatter and triggers an incremental reindex so the new note is immediately searchable.
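
A capture implementation needs little more than a slug, frontmatter, and a reindex call. A sketch under the vault layout described earlier; the function name, slug scheme, and the `index_vault.py --incremental` invocation are illustrative rather than the skill's actual code:

```python
import re
import subprocess
from datetime import date
from pathlib import Path

def capture(text, domain="inbox", tags=(), vault="vault", reindex=True):
    """Write a captured insight to 00-inbox/ with frontmatter, then reindex."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")[:40].rstrip("-")
    note = Path(vault) / "00-inbox" / f"{date.today().isoformat()}-{slug}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text("\n".join([
        "---",
        "type: capture",
        f"domain: {domain}",
        f"tags: [{', '.join(tags)}]",
        f"created: {date.today().isoformat()}",
        "---",
        "",
        text,
        "",
    ]))
    if reindex:
        # Hypothetical indexer CLI; check=False tolerates a failing indexer run
        subprocess.run(["python", "index_vault.py", "--incremental"], check=False)
    return note
```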

Context Window Management

The integration should be mindful of Claude Code’s context window:

  • Limit injected context to 1,500-2,000 tokens per query. More than this competes with the agent’s working memory.
  • Include source attribution. Always include the file path and section heading so the agent can reference the source.
  • Truncate chunk text. Long chunks should be truncated with ... rather than omitted entirely. The first 300-500 characters usually contain the key information.
  • Do not inject on every tool call. The PreToolUse hook should selectively inject context based on the tool being called. Read operations do not need vault context. Write and Edit operations benefit from it.
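
These rules fold into a single formatting helper. A sketch using the rough 4-characters-per-token heuristic; `format_context` and its budget arithmetic are illustrative, with field names borrowed from the bridge example:

```python
def format_context(results, max_tokens=2000, chars_per_token=4):
    """Assemble vault context under a rough token budget (chars/4 heuristic)."""
    budget = max_tokens * chars_per_token
    lines, used = [], 0
    for r in results:
        body = r["chunk_text"][:500]
        if len(r["chunk_text"]) > 500:
            body += "..."  # truncate rather than omit
        # Source attribution first, truncated chunk text as a quote
        block = f"**{r['file_path']}** ({r['section']})\n> {body}\n"
        if used + len(block) > budget:
            break  # stop before overrunning the budget
        lines.append(block)
        used += len(block)
    return "\n".join(lines)
```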

Codex CLI Integration

Codex CLI connects to MCP servers through config.toml. The integration pattern differs from Claude Code in configuration syntax and instruction delivery.

MCP Configuration

Add to .codex/config.toml or ~/.codex/config.toml:

[mcp_servers.obsidian]
command = "python"
args = ["/path/to/obsidian_mcp.py"]

[mcp_servers.obsidian.env]
VAULT_PATH = "/absolute/path/to/vault"
DB_PATH = "/absolute/path/to/vectors.db"

AGENTS.md Patterns

Codex CLI reads AGENTS.md for project-level instructions. Include vault search guidance:

## Available Tools

### Obsidian Vault (MCP: obsidian)
Use the `obsidian_search` tool to find relevant context from the knowledge base.
Search the vault when you need:
- Background on a concept or pattern
- Prior decisions or rationale
- Reference material for implementation

Example queries:
- "authentication patterns in FastAPI"
- "how does the review aggregator work"
- "sqlite-vec configuration"

Differences from Claude Code

| Feature | Claude Code | Codex CLI |
| --- | --- | --- |
| MCP config | settings.json | config.toml |
| Hooks | ~/.claude/hooks/ | Not supported |
| Skills | ~/.claude/skills/ | Not supported |
| Instruction file | CLAUDE.md | AGENTS.md |
| Approval modes | --dangerously-skip-permissions | suggest / auto-edit / full-auto |

Key difference: Codex CLI does not support hooks. The automatic context injection pattern (PreToolUse hook) is not available. Instead, include explicit instructions in AGENTS.md telling the agent to search the vault before starting work.


Cursor and Other Tools

Cursor and other AI tools that support MCP can connect to the same Obsidian MCP server. This section covers configuration for common tools.

Cursor

Add to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "obsidian": {
      "command": "python",
      "args": ["/path/to/obsidian_mcp.py"],
      "env": {
        "VAULT_PATH": "/absolute/path/to/vault",
        "DB_PATH": "/absolute/path/to/vectors.db"
      }
    }
  }
}

Cursor’s .cursorrules file can include instructions to use the vault:

When working on implementation tasks, search the Obsidian vault
for relevant context before writing code. Use the obsidian_search
tool with descriptive queries about the concept you're implementing.

Compatibility Matrix

| Tool | MCP Support | Transport | Config Location |
| --- | --- | --- | --- |
| Claude Code | Full | STDIO | ~/.claude/settings.json |
| Codex CLI | Full | STDIO | .codex/config.toml |
| Cursor | Full | STDIO | .cursor/mcp.json |
| Windsurf | Full | STDIO | .windsurf/mcp.json |
| Continue.dev | Partial | HTTP | ~/.continue/config.json |
| Zed | In progress | STDIO | Settings UI |

Fallback for Non-MCP Tools

For tools that do not support MCP, the retriever can be wrapped as a CLI:

# Search from command line
python retriever_cli.py search "query text" --limit 5

# Output formatted for copy-paste into any tool
python retriever_cli.py context "query text" --format markdown

The CLI outputs structured text that can be manually pasted into any AI tool’s input. This is less elegant than MCP integration but works universally.
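
The CLI surface can be declared with argparse so both subcommands share flags. A sketch that mirrors the usage shown above; the actual retriever wiring is omitted:

```python
import argparse

def build_parser():
    """CLI surface for the retriever: 'search' and 'context' subcommands."""
    parser = argparse.ArgumentParser(prog="retriever_cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [("search", "ranked results"),
                            ("context", "copy-paste context block")]:
        p = sub.add_parser(name, help=help_text)
        p.add_argument("query")
        p.add_argument("--limit", type=int, default=5)
        p.add_argument("--max-tokens", type=int, default=2000)
        p.add_argument("--format", choices=["text", "markdown"], default="text")
    return parser
```

A `main()` would dispatch on `args.command`, call the retriever, and print either raw results or a markdown context block.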


Prompt Caching from Structured Notes

Structured notes in the vault can serve as reusable context blocks that reduce token usage across AI interactions. This section covers cache key design and token budget management.

The Pattern

Instead of searching for context on every interaction, pre-build context blocks from well-structured vault notes and cache them:

# cache_keys.py
CONTEXT_BLOCKS = {
    "auth-patterns": {
        "vault_query": "authentication patterns implementation",
        "max_tokens": 1500,
        "ttl_hours": 24,  # Rebuild daily
    },
    "api-conventions": {
        "vault_query": "API design conventions REST patterns",
        "max_tokens": 1000,
        "ttl_hours": 168,  # Rebuild weekly
    },
    "project-architecture": {
        "vault_query": "current project architecture decisions",
        "max_tokens": 2000,
        "ttl_hours": 12,  # Rebuild twice daily
    },
}
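
TTL-based rebuilding needs only a timestamp stored next to each block. A minimal sketch, assuming the CONTEXT_BLOCKS specs above and a `search_fn` callable that wraps the retriever; the cache directory is illustrative:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path(".cache/context_blocks")

def get_block(name, spec, search_fn):
    """Return a cached context block, rebuilding it when its TTL expires."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{name}.json"
    if path.exists():
        cached = json.loads(path.read_text())
        if time.time() - cached["built_at"] < spec["ttl_hours"] * 3600:
            return cached["text"]  # still fresh
    # Expired or missing: re-query the vault and persist the new block
    text = search_fn(spec["vault_query"], spec["max_tokens"])
    path.write_text(json.dumps({"built_at": time.time(), "text": text}))
    return text
```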

Cache Invalidation

Cache invalidation is based on two signals:

  1. TTL expiry. Each context block has a time-to-live. When the TTL expires, the block is rebuilt by re-querying the vault.
  2. Vault change detection. When the indexer detects changes to files that contributed to a cached context block, the block is invalidated immediately.

Token Budget Management

A session starts with a total context budget. Cached blocks consume part of that budget:

Total context budget:    8,000 tokens
├─ System prompt:        1,500 tokens
├─ Cached blocks:        3,000 tokens (pre-loaded)
├─ Dynamic search:       2,000 tokens (on-demand)
└─ Conversation:         1,500 tokens (remaining)

The cached blocks load at session start. Dynamic search results fill the remaining budget on a per-query basis. This hybrid approach gives the agent a baseline of frequently-needed context while preserving budget for specific queries.

Before/After Token Usage

Without caching: Every relevant query triggers a vault search, returning 1,500-2,000 tokens of context. Over 10 queries in a session, the agent consumes 15,000-20,000 tokens of vault context.

With caching: Three pre-built context blocks consume 4,500 tokens total. Additional searches add 1,500-2,000 tokens per unique query. Over 10 queries where 6 are covered by cached blocks, the agent consumes 4,500 + (4 * 1,500) = 10,500 tokens — roughly half the uncached usage.


PostToolUse Hooks for Context Compression

Tool outputs can be verbose: stack traces, file listings, test results. A PostToolUse hook can compress these outputs before they consume context window space.

The Problem

A Bash tool call that runs tests might return:

PASSED tests/test_auth.py::test_login_success
PASSED tests/test_auth.py::test_login_failure
PASSED tests/test_auth.py::test_token_refresh
PASSED tests/test_auth.py::test_session_expiry
... (200 more lines)
FAILED tests/test_api.py::test_rate_limit_exceeded

The full output is 5,000 tokens, but the signal is in 2 lines: 200 passed, 1 failed.

Hook Implementation

#!/bin/bash
# ~/.claude/hooks/post-tool-use/compress-output.sh
# Compress verbose tool outputs to preserve context window

TOOL_NAME="$1"
OUTPUT="$2"
OUTPUT_LEN=${#OUTPUT}

# Only compress large outputs
if [ "$OUTPUT_LEN" -lt 2000 ]; then
    exit 0  # Pass through unchanged
fi

case "$TOOL_NAME" in
    Bash)
        # Compress test output
        if echo "$OUTPUT" | grep -q "PASSED\|FAILED"; then
            PASSED=$(echo "$OUTPUT" | grep -c "PASSED")
            FAILED=$(echo "$OUTPUT" | grep -c "FAILED")
            FAILURES=$(echo "$OUTPUT" | grep "FAILED")
            echo "Tests: $PASSED passed, $FAILED failed"
            if [ "$FAILED" -gt 0 ]; then
                echo "Failures:"
                echo "$FAILURES"
            fi
        fi
        ;;
esac

Recursive Trigger Prevention

A compression hook that emits output could trigger itself if not guarded:

# Guard against recursive invocation
if [ -n "$COMPRESS_HOOK_ACTIVE" ]; then
    exit 0
fi
export COMPRESS_HOOK_ACTIVE=1

Compression Heuristics

| Output Type | Detection | Compression Strategy |
| --- | --- | --- |
| Test results | PASSED / FAILED keywords | Count pass/fail, show failures only |
| File listings | ls or find in command | Truncate to first 20 entries + count |
| Stack traces | Traceback keyword | Keep first and last frame + error message |
| Git status | modified: / new file: | Summarize counts by status |
| Build output | warning: / error: | Strip info lines, keep warnings/errors |
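
The stack-trace heuristic reduces to a few lines. A sketch of the first-frame/last-frame policy, written in Python rather than shell for clarity; `compress_traceback` is a hypothetical helper:

```python
def compress_traceback(output):
    """Keep the Traceback header, the first and last frames, and the error line."""
    lines = output.splitlines()
    frames = [l for l in lines if l.lstrip().startswith('File "')]
    if "Traceback" not in output or len(frames) <= 2:
        return output  # nothing worth compressing
    return "\n".join([
        "Traceback (most recent call last):",
        frames[0],
        f"  ... {len(frames) - 2} frames omitted ...",
        frames[-1],
        lines[-1],  # the final exception message
    ])
```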

Signal Intake and Triage Pipeline

The intake layer determines what enters the vault. Without curation, the vault accumulates noise. This section covers the scoring pipeline that routes signals to domain folders.

Sources

Signals come from multiple channels:

  • RSS feeds: Technical blogs, security advisories, release notes
  • Bookmarks: Browser bookmarks saved via Obsidian Web Clipper or bookmarklet
  • Newsletters: Key excerpts from email newsletters
  • Manual capture: Notes written during reading, conversations, or research
  • Tool output: Significant AI tool outputs captured via hooks

Scoring Dimensions

Each signal is scored on four dimensions (0.0 to 1.0 each):

| Dimension | Question | Low Score (0.0-0.3) | High Score (0.7-1.0) |
| --- | --- | --- | --- |
| Relevance | Does this relate to my active domains? | Tangential, outside scope | Directly relevant to active work |
| Actionability | Can I use this information? | Pure theory, no application | Specific technique or pattern I can apply |
| Depth | How substantive is the content? | Headlines, shallow summary | Detailed analysis with examples |
| Authority | How credible is the source? | Anonymous blog, unverified | Primary source, peer-reviewed, recognized expert |

Composite Score and Routing

composite = (relevance * 0.35) + (actionability * 0.25) +
            (depth * 0.25) + (authority * 0.15)

| Score Range | Action |
| --- | --- |
| 0.55+ | Auto-route to domain folder |
| 0.40 - 0.55 | Queue for manual review |
| < 0.40 | Drop (do not store) |
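
The weights and thresholds above translate directly into code; the function name and return shape are illustrative:

```python
def route(scores):
    """Composite score and routing decision using the published weights."""
    weights = {"relevance": 0.35, "actionability": 0.25,
               "depth": 0.25, "authority": 0.15}
    composite = sum(scores[dim] * w for dim, w in weights.items())
    if composite >= 0.55:
        return composite, "auto-route"
    if composite >= 0.40:
        return composite, "review-queue"
    return composite, "drop"
```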

Domain Routing

Signals scoring above 0.55 route to one of 12 domain folders based on keyword matching and topic classification:

05-signals/
├── ai-tooling/        # Claude, LLMs, AI development tools
├── security/          # Vulnerabilities, auth, cryptography
├── systems/           # Architecture, distributed systems
├── programming/       # Languages, patterns, algorithms
├── web/               # Frontend, backends, APIs
├── data/              # Databases, data engineering
├── devops/            # CI/CD, containers, infrastructure
├── design/            # UI/UX, product design
├── mobile/            # iOS, Android, cross-platform
├── career/            # Industry trends, hiring, growth
├── research/          # Academic papers, whitepapers
└── other/             # Signals that don't fit a domain

Production Stats

Over 14 months of operation:

| Metric | Value |
| --- | --- |
| Total signals processed | 7,771 |
| Auto-routed (>0.55) | 4,832 (62%) |
| Queued for review (0.40-0.55) | 1,543 (20%) |
| Dropped (<0.40) | 1,396 (18%) |
| Active domain folders | 12 |
| Average signals per day | ~18 |

Knowledge Graph Patterns

Obsidian’s wiki-link graph encodes relationships between notes. This section covers link semantics, graph traversal for context expansion, and anti-patterns that degrade graph quality.

Every wiki-link creates a directed edge in the graph. Obsidian tracks both forward links and backlinks:

  • Forward link: Note A contains [[Note B]] → A links to B
  • Backlink: Note B shows that Note A references it

The graph encodes different types of relationships depending on context:

| Link Pattern | Semantic | Example |
| --- | --- | --- |
| Inline link | “Is related to” | “See [[OAuth Token Rotation]] for details” |
| Header link | “Has subtopic” | “## Related\n- [[Token Rotation]]\n- [[Session Management]]” |
| Tag-like link | “Is categorized as” | “[[type/reference]]” |
| MOC link | “Is part of” | A Map of Content note listing related notes |

Maps of Content (MOCs)

MOCs are index notes that organize related notes into a navigable structure:

---
title: "Authentication & Security MOC"
type: moc
domain: security
---

## Core Concepts
- [[OAuth 2.0 Overview]]
- [[JWT Token Anatomy]]
- [[Session Management Patterns]]

## Implementation Patterns
- [[OAuth Token Rotation]]
- [[Refresh Token Security]]
- [[PKCE Flow Implementation]]

## Failure Modes
- [[Token Expiry Handling]]
- [[Session Fixation Prevention]]
- [[CSRF Defense Strategies]]

MOCs benefit retrieval in two ways:

  1. Direct match. A search for “authentication overview” matches the MOC itself, providing the agent with a curated list of related notes.
  2. Context expansion. After finding a specific note, the retriever can check if the note appears in any MOCs and include the MOC’s structure in the results, giving the agent a map of the broader topic.

Graph Traversal for Context Expansion

A future enhancement to the retriever: after finding top results, expand the context by following links:

def expand_context(results, depth=1):
    """Follow wiki-links from top results to find related context."""
    seen = set()
    expanded_results = []
    for result in results:
        # Parse wiki-links from chunk text
        links = extract_wiki_links(result["chunk_text"])
        for link_target in links:
            # Resolve link to file path
            target_path = resolve_wiki_link(link_target)
            if target_path and target_path not in seen:
                seen.add(target_path)
                # Include target's most relevant chunk
                target_chunks = get_chunks_for_file(target_path)
                # ... rank target_chunks, append the best one to expanded_results
    return results + expanded_results

This is not implemented in the current retriever but represents a natural extension of the graph structure.
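
The `extract_wiki_links` helper that the sketch above assumes can be a single regex tolerating aliases ([[Target|alias]]) and heading links ([[Target#Section]]); this pattern is an approximation, not a full Obsidian link parser:

```python
import re

# Matches [[Target]], [[Target|alias]], [[Target#Heading]], [[Target#Heading|alias]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_wiki_links(text):
    """Return link targets, stripped of aliases and heading anchors."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(text)]
```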

Anti-Patterns

Orphan clusters. Groups of notes that link to each other but have no connections to the rest of the vault. The graph panel in Obsidian makes these visible as disconnected islands. Orphan clusters indicate missing MOCs or missing cross-domain links.

Tag sprawl. Using tags inconsistently or creating too many fine-grained tags. A vault with 500 unique tags across 5,000 notes averages only 10 notes per tag, with many tags attached to a single note; tags that sparse are useless for filtering. Consolidate to 20-50 high-level tags that map to your domain folders.

Link-heavy, content-light notes. Notes that consist entirely of wiki-links with no prose. These notes index poorly because the chunker has no text to embed. Add at least a paragraph of context explaining why the linked notes are related.

Bidirectional links for everything. Not every reference needs to be a wiki-link. Mentioning “OAuth” in passing does not require [[OAuth 2.0 Overview]]. Reserve wiki-links for intentional, navigable relationships where clicking the link would provide useful context.


Developer Workflow Recipes

Practical workflows that combine vault retrieval with daily development tasks.

Morning Context Load

Start the day by loading relevant context:

Search my vault for notes about [current project] updated in the last week

The retriever returns recent notes about your active project, giving you a quick refresher on where you left off. More effective than re-reading yesterday’s commit messages.

Research Capture During Coding

While implementing a feature, capture insights without leaving the editor:

/capture "FastAPI dependency injection with async generators requires yield,
not return. The generator is the dependency lifecycle."
  --domain programming
  --tags fastapi,dependency-injection

The captured insight is immediately indexed and available for future retrieval. Over months, these micro-captures build a corpus of implementation-specific knowledge.

Project Kickoff

When starting a new project or feature:

  1. Search the vault: “What do I know about [technology/pattern]?”
  2. Review the top 5 results for prior decisions and gotchas
  3. Check if a MOC exists for the domain; if not, create one
  4. Search for failure modes: “problems with [technology]”

Debugging

When encountering an error or unexpected behavior:

Search my vault for [error message or symptom]

Prior debugging notes often contain the root cause and fix. This is particularly valuable for recurring issues across projects — the vault remembers what you forget.

Code Review Preparation

Before reviewing a PR:

Search my vault for patterns and conventions about [module being changed]

The vault returns prior decisions, architectural constraints, and coding standards relevant to the code under review. The review is informed by institutional knowledge, not just the diff.


Performance Tuning

This section covers optimization strategies for different vault sizes and usage patterns.

Index Size Management

| Vault Size | Chunks | DB Size | Full Reindex | Incremental |
| --- | --- | --- | --- | --- |
| 500 notes | ~1,500 | 3 MB | 15 seconds | <1 second |
| 2,000 notes | ~6,000 | 12 MB | 45 seconds | 2 seconds |
| 5,000 notes | ~15,000 | 30 MB | 2 minutes | 4 seconds |
| 15,000 notes | ~50,000 | 83 MB | 4 minutes | <10 seconds |
| 50,000 notes | ~150,000 | 250 MB | 15 minutes | 30 seconds |

At 50,000+ notes, consider:

  • Increasing the batch size from 64 to 128 for faster embedding
  • Using WAL mode (default) for concurrent access
  • Running the full reindex during off-hours

Query Optimization

WAL mode. SQLite’s Write-Ahead Logging mode enables concurrent reads while the indexer writes:

db.execute("PRAGMA journal_mode=WAL")

This is critical when the MCP server handles queries while the indexer runs an incremental update.

Connection pooling. The MCP server should reuse database connections rather than opening a new connection per query. A single long-lived connection with WAL mode supports concurrent reads.

# MCP server initialization
db = sqlite3.connect(DB_PATH, check_same_thread=False)
db.execute("PRAGMA journal_mode=WAL")
db.execute("PRAGMA mmap_size=268435456")  # 256 MB mmap

Memory-mapped I/O. The mmap_size pragma tells SQLite to use memory-mapped I/O for the database file. For an 83 MB database, mapping the entire file into memory eliminates most disk reads.

FTS5 optimization. After a full reindex, run:

INSERT INTO chunks_fts(chunks_fts) VALUES('optimize');

This merges FTS5’s internal b-tree segments, reducing query latency for subsequent searches.

Scaling Benchmarks

Measured on Apple M3 Pro, 36 GB RAM, NVMe SSD:

| Operation | 500 notes | 5K notes | 15K notes | 50K notes |
| --- | --- | --- | --- | --- |
| BM25 query | 2ms | 5ms | 12ms | 25ms |
| Vector query | 1ms | 3ms | 8ms | 20ms |
| RRF fusion | <1ms | <1ms | 3ms | 5ms |
| Full search | 3ms | 8ms | 23ms | 50ms |

All benchmarks include database access, query execution, and result formatting. Interprocess overhead for MCP STDIO communication adds another 1-2ms.


Troubleshooting

Index Drift

Symptom: Search returns stale results or misses recently added notes.

Cause: The incremental indexer did not run after adding notes, or a file’s mtime was not updated (e.g., synced from another machine with preserved timestamps).

Fix: Run a full reindex: python index_vault.py --full
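
To catch drift that mtime misses, such as sync-preserved timestamps, the indexer can compare content hashes instead. A sketch, assuming the index stores one hash per file recorded at index time:

```python
import hashlib
from pathlib import Path

def needs_reindex(path, stored_hash):
    """True when the file's content hash differs from the indexed hash."""
    current = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return current != stored_hash
```

Hashing every file on each incremental run is slower than an mtime check, so a common compromise is mtime first, hash only when mtime looks suspicious.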

Embedding Model Swap

Symptom: After changing the embedding model, vector search returns nonsensical results.

Cause: Old vectors (from the previous model) are being compared against new query vectors. The dimensions or vector space semantics are incompatible.

Fix: The indexer should detect the model hash mismatch and trigger a full reindex automatically. If it does not, manually clear the database and reindex:

rm vectors.db
python index_vault.py --full

FTS5 Maintenance

Symptom: FTS5 queries return incorrect or incomplete results after many incremental updates.

Cause: FTS5 internal segments may become fragmented after many small updates.

Fix: Rebuild and optimize:

INSERT INTO chunks_fts(chunks_fts) VALUES('rebuild');
INSERT INTO chunks_fts(chunks_fts) VALUES('optimize');

MCP Timeout

Symptom: AI tool reports that the MCP server timed out.

Cause: The first query triggers model loading (lazy initialization), which takes 2-5 seconds. The AI tool’s default MCP timeout may be shorter.

Fix: Pre-warm the model on server startup:

# In MCP server initialization
retriever = HybridRetriever(db_path, vault_path)
retriever.search("warmup", limit=1)  # Trigger model load

SQLite File Locks

Symptom: SQLITE_BUSY or SQLITE_LOCKED errors.

Cause: Multiple processes writing to the database simultaneously. WAL mode allows concurrent reads but only one writer.

Fix: Ensure only one process (the indexer) writes to the database. The MCP server and hooks should only read. If you need concurrent writes, use WAL mode and set a busy timeout:

db.execute("PRAGMA busy_timeout=5000")  # Wait up to 5 seconds

sqlite-vec Not Loading

Symptom: Vector search is disabled; retriever runs in BM25-only mode.

Cause: The sqlite-vec extension is not installed, not found in the library path, or incompatible with the SQLite version.

Fix:

# Install via pip
pip install sqlite-vec

# Or compile from source
git clone https://github.com/asg017/sqlite-vec
cd sqlite-vec && make

Verify the extension loads:

import sqlite3
import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
print("sqlite-vec", db.execute("SELECT vec_version()").fetchone()[0], "loaded successfully")

Large Vault Memory Issues

Symptom: Out-of-memory errors during full reindex of a large vault (50,000+ notes).

Cause: Embedding batch size is too large, or all file contents are loaded into memory simultaneously.

Fix: Reduce the batch size and process files incrementally:

BATCH_SIZE = 32  # Reduce from 64

Also ensure the indexer processes files one at a time (reading, chunking, and embedding each file before moving to the next) rather than loading all files into memory.
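
The one-file-at-a-time pattern is naturally a generator. A sketch with a fixed-size character chunker standing in for the real heading-aware chunking logic:

```python
from pathlib import Path

def iter_file_chunks(vault_path, chunk_chars=1200):
    """Yield (path, chunk) pairs one file at a time so only one file is resident."""
    for path in sorted(Path(vault_path).rglob("*.md")):
        text = path.read_text(errors="ignore")
        for start in range(0, len(text), chunk_chars):
            yield path, text[start:start + chunk_chars]
        # text goes out of scope before the next file is read
```

The embedding loop then consumes this generator in batches of BATCH_SIZE, so peak memory is bounded by one file plus one batch of vectors.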


Migration Guide

From Apple Notes

  1. Export Apple Notes via the “Export All” option (macOS) or use a migration tool like apple-notes-liberator
  2. Convert HTML exports to markdown using markdownify or pandoc
  3. Move the converted files to your vault’s 00-inbox/ folder
  4. Review and add frontmatter to each note
  5. Move notes to appropriate domain folders

From Notion

  1. Export from Notion: Settings → Export → Markdown & CSV
  2. Unzip the export into your vault’s 00-inbox/ folder
  3. Fix Notion-specific markdown artifacts:
     • Checklists exported as - [ ] are already standard markdown and need no changes
     • Property tables exported as HTML should be converted to YAML frontmatter
     • Images embedded as relative paths should be copied to your attachments folder
  4. Add standard frontmatter (type, domain, tags)
  5. Replace Notion page links with Obsidian wiki-links

From Google Docs

  1. Use Google Takeout to export all documents
  2. Convert .docx files to markdown: pandoc -f docx -t markdown input.docx -o output.md
  3. Batch convert: for f in *.docx; do pandoc -f docx -t markdown "$f" -o "${f%.docx}.md"; done
  4. Move to vault, add frontmatter, organize into folders

From Plain Markdown (No Obsidian)

If you already have a directory of markdown files:

  1. Open the directory as an Obsidian vault (Obsidian → Open Vault → Open folder)
  2. Add .obsidian/ to .gitignore if the directory is version-controlled
  3. Create frontmatter templates and apply to existing files
  4. Start linking notes with [[wiki-links]] as you read and organize
  5. Run the indexer immediately — the retrieval system works on day one

From Another Retrieval System

If you are migrating from a different embedding/search system:

  1. Do not try to migrate vectors. Different models produce incompatible vector spaces. Run a full reindex with the new model.
  2. Migrate the content, not the index. The vault files are the source of truth. The index is a derived artifact.
  3. Verify after migration. Run 10-20 queries you know the answers to and verify the results match your expectations.

Changelog

| Date | Change |
| --- | --- |
| 2026-03-01 | Initial release |

References


  1. Cormack, G.V., Clarke, C.L.A., and Buettcher, S. Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods. SIGIR, 2009. Introduces RRF with k=60 as a parameter-free method for combining ranked lists. 

  2. OpenAI Embeddings Pricing. text-embedding-3-small: $0.02 per million tokens. Estimated vault cost per full reindex: ~$0.30. 

  3. van Dongen, T. et al. Model2Vec: Turn any Sentence Transformer into a Small Fast Model. arXiv, 2025. Describes the distillation approach producing static embeddings from sentence transformers. 

  4. MTEB: Massive Text Embedding Benchmark. potion-base-8M scores 50.03 average vs 56.09 for all-MiniLM-L6-v2 (89% retention). 

  5. SQLite FTS5 Extension. FTS5 provides full-text search with BM25 ranking and configurable column weights. 

  6. sqlite-vec: A vector search SQLite extension. Provides vec0 virtual tables for KNN vector search within SQLite. 

  7. Robertson, S. and Zaragoza, H. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 2009. 

  8. Karpukhin, V. et al. Dense Passage Retrieval for Open-Domain Question Answering. EMNLP, 2020. Dense representations outperform BM25 by 9-19% on open-domain QA. 

  9. Reimers, N. and Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP, 2019. Foundational work on dense semantic similarity. 

  10. Luan, Y. et al. Sparse, Dense, and Attentional Representations for Text Retrieval. TACL, 2021. Hybrid retrieval consistently outperforms single-modality approaches on MS MARCO. 

  11. SQLite Write-Ahead Logging. WAL mode for concurrent reads with a single writer. 

  12. Gao, Y. et al. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv, 2024. Survey of RAG architectures and chunking strategies. 

  13. Thakur, N. et al. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS, 2021. 

  14. Model2Vec: Distill a Small Fast Model from any Sentence Transformer. Minish Lab, 2024. 

  15. Obsidian Documentation. Official documentation for Obsidian. 

  16. Model Context Protocol Specification. The MCP standard for connecting AI tools to data sources. 

  17. Author’s production data. 16,894 files, 49,746 chunks, 83.56 MB SQLite database, 7,771 signals processed across 14 months. Query latency measured via time.perf_counter(). 

VAULT obsidian-ai-infrastructure.md INDEXED