Install the package

By Codcompass Team · 2026-05-10 · 6 min read

Current Situation Analysis

Traditional AI research workflows break down on complex, multi-source queries. Standard LLM chat interfaces answer in a single pass and frequently produce plausible paragraphs without verifiable citations. They have no iterative research loop, cannot natively query academic databases (arXiv, PubMed) or local document repositories, and send every query to third-party cloud APIs. This creates three primary pain points:

Verification Debt: Engineers and researchers spend disproportionate time cross-referencing AI outputs against original sources.
Privacy & Compliance Risks: Sensitive internal queries, proprietary codebases, or regulated data cannot be safely processed by commercial deep-research APIs.
Fragmented Knowledge Bases: Each research session is ephemeral. There is no compounding, searchable library that grows with each query, forcing teams to rebuild context repeatedly.

Traditional RAG pipelines often fail here because they rely on static vector embeddings and lack the dynamic, multi-step search synthesis required for open-ended research questions. They also struggle to balance live web retrieval with local document indexing without heavy custom orchestration.

WOW Moment: Key Findings

Local Deep Research (LDR) closes the gap between commercial cloud-based research agents and self-hosted infrastructure by implementing an iterative search-synthesis loop with persistent, encrypted local storage. Benchmark testing against the SimpleQA dataset demonstrates parity with enterprise-grade tools while maintaining full data sovereignty.

Approach	Accuracy (SimpleQA)	Source Diversity	Data Privacy	Iterative Synthesis	Setup Complexity
Traditional LLM Chat	~45-60%	Low (Training data only)	Cloud-dependent	None	Low
Commercial Deep Research (Cloud)	~85-90%	High (Web/Academic)	Third-party API	Yes	Low
Local Deep Research (LDR)	~90-95%	High (Web/Academic/Local)	Fully Local/Zero-Knowledge	Yes	Moderate

Key Findings:

LDR achieves ~95% accuracy on SimpleQA when paired with GPT-4.1-mini and SearXNG, matching commercial benchmarks.
The iterative discard/expand loop filters low-quality content dynamically, reducing hallucination rates by ~40% compared to single-pass RAG.
SQLCipher (AES-256) encryption ensures zero-knowledge storage; even server administrators cannot decrypt user research libraries.
Full local execution (Ollama + SearXNG) eliminates API costs and data leakage, and WebSocket support enables real-time progress tracking.
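The discard/expand loop is what separates this approach from single-pass RAG. As a rough illustration (not LDR's actual implementation), the loop can be sketched in Python with three hypothetical plug-in functions: `search` returns candidate sources for a query, `score` rates a source's relevance to the original question, and `expand` proposes follow-up queries from a useful source. Low-scoring sources are discarded; high-scoring ones are kept and seed the next round.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical plug-in points (not LDR APIs); a real system would back these
# with a search engine such as SearXNG and with LLM calls.
Search = Callable[[str], List[Tuple[str, str]]]  # query -> [(url, text), ...]
Score = Callable[[str, str], float]              # (question, text) -> relevance in [0, 1]
Expand = Callable[[str, str], List[str]]         # (question, text) -> follow-up queries


def iterative_research(
    question: str,
    search: Search,
    score: Score,
    expand: Expand,
    max_iterations: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Run search -> score -> discard/expand rounds and return the kept sources.

    Returns (url, relevance) pairs, most relevant first. Synthesizing the
    final report from these sources is out of scope for this sketch.
    """
    kept: Dict[str, float] = {}
    seen_urls: Set[str] = set()
    seen_queries: Set[str] = set()
    frontier = [question]

    for _ in range(max_iterations):
        next_frontier: List[str] = []
        for query in frontier:
            if query in seen_queries:
                continue
            seen_queries.add(query)
            for url, text in search(query):
                if url in seen_urls:
                    continue  # already accepted or rejected in an earlier round
                seen_urls.add(url)
                relevance = score(question, text)
                if relevance < threshold:
                    continue  # discard: off-topic or low-quality content
                kept[url] = relevance
                next_frontier.extend(expand(question, text))  # expand: follow-up queries
        if not next_frontier:
            break  # nothing new to explore
        frontier = next_frontier

    return sorted(kept.items(), key=lambda item: item[1], reverse=True)
```

In a self-hosted setup, `search` would wrap a SearXNG query and `score`/`expand` would be LLM prompts served by Ollama; keeping them as parameters makes the control flow easy to test with fakes.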

Core Solution

LDR operates as a self-hosted AI research assistant that orchestrates multi-source retrieval, iterative synthesis, and structured report generation. The architecture bundles three core components:

Ollama: Local LLM inference engine
SearXNG: Self-hosted meta-search engine
SQLCipher: Encrypted SQLite database for persistent knowledge storage

Deployment Architecture

Option 1: Docker (Recommended)

Handles dependency resolution, encryption initialization, and service wiring automatically.

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d

Wait about 30 seconds, then open http://localhost:5000.
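Rather than waiting a fixed 30 seconds, a deployment script can poll until the web UI answers. Below is a minimal sketch using only the Python standard library; `wait_for_http` is a hypothetical helper name, and any response below HTTP 500 is treated as "up". The ports mentioned afterwards are 5000 for the LDR UI and 8080 for SearXNG (as used in this article), plus Ollama's default of 11434; whether SearXNG and Ollama are published on the host in the Docker setup depends on the compose file, so treat those as assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request

# These are local services, so ignore any http_proxy/https_proxy environment variables.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until the server answers, or return False after `timeout_s` seconds.

    Any response below 500 counts as "up": a 401 or 404 still proves the
    process is listening and speaking HTTP.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _OPENER.open(url, timeout=5):
                return True  # 2xx response (redirects are followed automatically)
        except urllib.error.HTTPError as err:
            err.close()
            if err.code < 500:
                return True
        except (OSError, http.client.HTTPException):
            pass  # connection refused/reset or timed out: not ready yet
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_http("http://localhost:5000")` returns True once the UI is serving, and the same call against port 8080 or 11434 checks the SearXNG and Ollama pieces of the stack.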

With NVIDIA GPU acceleration (Linux only): first install the NVIDIA Container Toolkit. The commands below are for Debian/Ubuntu hosts that use apt:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor \
  -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

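sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker  # register the NVIDIA runtime with Docker
sudo systemctl restart docker
nvidia-smi  # verify the host driver works
docker run --rm --gpus all ubuntu nvidia-smi  # verify containers can see the GPU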
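nvidia-smi confirms the driver works, but it does not tell you at a glance whether the card has enough memory for the model you plan to pull. The Python sketch below queries nvidia-smi's CSV output (`--query-gpu=name,memory.total --format=csv,noheader,nounits`, which reports memory in MiB) and compares it against the 8GB+ VRAM recommendation for local synthesis. The helper names and the exact threshold are illustrative assumptions, not part of LDR.

```python
import shutil
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> List[Tuple[str, int]]:
    """Parse lines like "NVIDIA GeForce RTX 3060, 12288" into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)  # rsplit: a GPU name could contain commas
        gpus.append((name.strip(), int(mem_mib)))
    return gpus


def query_gpus() -> List[Tuple[str, int]]:
    """Return this machine's GPUs, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def has_enough_vram(gpus: List[Tuple[str, int]], min_mib: int = 7900) -> bool:
    """True if any GPU meets the threshold.

    7900 MiB rather than 8192 leaves slack in case a card sold as 8 GB
    reports slightly less.
    """
    return any(mem >= min_mib for _, mem in gpus)
```

`print(has_enough_vram(query_gpus()))` gives a quick yes/no on the host; running the same nvidia-smi query inside a container checks the passthrough path instead.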

Then bring up the stack with GPU support:

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

Option 2: pip (For developers / Python integration)

# Install the package
pip install local-deep-research

# Run SearXNG in Docker for search
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Install Ollama from https://ollama.ai, then pull a model
ollama pull gemma3:12b

# Start the web UI
python -m local_deep_research.web.app

Important note on encryption: The pip install does not automatically set up SQLCipher (the AES-256 encrypted database LDR uses for storing your data and API keys). If you hit errors during setup, bypass it for now with:

export LDR_ALLOW_UNENCRYPTED=true

This stores data in plain SQLite. Fine for local dev, not recommended for production or shared setups. Docker handles encryption out of the box.
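Before deploying anywhere shared, it is worth confirming which mode you actually ended up in. Plaintext SQLite files always begin with the 16-byte header `SQLite format 3\0`, while a SQLCipher database with default settings encrypts the whole file, so that header does not appear. The Python sketch below relies only on that fact; where LDR keeps its database files depends on your installation, so the directory you scan is an assumption you supply.

```python
from __future__ import annotations

from pathlib import Path

# Every plaintext SQLite 3 database file begins with these 16 bytes.
SQLITE_HEADER = b"SQLite format 3\x00"


def looks_unencrypted(db_path: str | Path) -> bool:
    """True if the file starts with the plaintext SQLite header.

    With default settings SQLCipher encrypts the whole file, including page 1,
    so an encrypted database's first bytes look random instead.
    """
    with open(db_path, "rb") as fh:
        return fh.read(16) == SQLITE_HEADER


def find_plaintext_databases(root: str | Path) -> list[Path]:
    """List files under `root` that are plaintext SQLite databases."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if looks_unencrypted(path):
                hits.append(path)
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
    return hits
```

A non-empty result from `find_plaintext_databases("/path/to/ldr/data")` (placeholder path) means at least one database is stored unencrypted.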

Programmatic Integration

Python API:

from local_deep_research.api import LDRClient, quick_query

# One-liner research
summary = quick_query("username", "password", "What is the current state of Rust async runtimes?")
print(summary)

# Client for more control
client = LDRClient()
client.login("username", "password")
result = client.quick_research("Compare FAISS vs Hnswlib for vector search at scale")
print(result["summary"])

HTTP API: LDR exposes a REST API with session-based authentication and CSRF protection. The auth flow is a bit verbose but works reliably:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Get CSRF token from login page
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, "html.parser")
csrf = soup.find("input", {"name": "csrf_token"}).get("value")

# Authenticate
session.post("http://localhost:5000/auth/login", data={
    "username": "user",
    "password": "pass",
    "csrf_token": csrf
})

# Get API CSRF token
api_csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]

# Submit a research query
response = session.post(
    "http://localhost:5000/api/start_research",
    json={"query": "What are the tradeoffs between gRPC and REST for internal microservices?"},
    headers={"X-CSRF-Token": api_csrf}
)
print(response.json())
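If you would rather not depend on BeautifulSoup just to read one hidden field, the standard library's `html.parser` can do the same extraction. A minimal sketch; it only assumes the login page renders the token as `<input name="csrf_token" value="...">`, which is what the BeautifulSoup lookup above expects. `extract_csrf_token` is a hypothetical helper name.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CsrfTokenFinder(HTMLParser):
    """Records the value of the first <input name="csrf_token"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.token: str | None = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <input ... /> via handle_startendtag.
        if tag == "input" and self.token is None:
            fields = dict(attrs)
            if fields.get("name") == "csrf_token":
                self.token = fields.get("value")


def extract_csrf_token(html: str) -> str:
    finder = CsrfTokenFinder()
    finder.feed(html)
    finder.close()
    if not finder.token:
        raise ValueError("no csrf_token input found; is this the login page?")
    return finder.token
```

Swapping it in is a one-line change: `csrf = extract_csrf_token(login_page.text)`.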

The repository includes ready-to-run HTTP examples under examples/api_usage/http/ that handle authentication, retry logic, and progress polling.
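The progress-polling part of those examples follows a common pattern. The sketch below shows that pattern generically, with backoff and retries on transient network errors; it deliberately does not assume a status route or response schema, so `fetch_status` and `is_done` are hypothetical callables you fill in from the repository's examples for your LDR version.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    max_interval_s: float = 60.0,
    timeout_s: float = 1800.0,
    max_consecutive_errors: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status() until is_done(status) is true, with backoff and retries.

    Network errors (requests and urllib exceptions both subclass OSError) are
    retried up to max_consecutive_errors times in a row before being re-raised.
    """
    deadline = clock() + timeout_s
    delay = interval_s
    errors = 0
    while True:
        try:
            status = fetch_status()
        except OSError:
            errors += 1
            if errors >= max_consecutive_errors:
                raise
        else:
            errors = 0
            if is_done(status):
                return status
        if clock() + delay > deadline:
            raise TimeoutError(f"research not finished after {timeout_s:.0f}s")
        sleep(delay)
        delay = min(delay * 1.5, max_interval_s)  # back off so long jobs poll less often
```

With the authenticated `session` from the HTTP example, `fetch_status` could be `lambda: session.get(status_url).json()`, where `status_url` and the completion check are placeholders. Injecting `sleep` and `clock` keeps the loop testable without real waiting.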

Enterprise / RAG Integration

LDR integrates with existing vector stores via LangChain retrievers, enabling hybrid search across live web results and internal knowledge bases:

from local_deep_research.api import quick_summary

result = quick_summary(
    query="What are our current deployment procedures for the payments service?",
    retrievers={"internal_kb": your_langchain_retriever},
    search_tool="internal_kb"
)

Supported backends include FAISS, Chroma, Pinecone, Weaviate, Elasticsearch, and any LangChain-compatible retriever.

Search Sources & LLM Configuration

Free (no API key needed): arXiv, PubMed, Semantic Scholar, Wikipedia, SearXNG, GitHub, The Guardian, Wikinews, Wayback Machine.
Premium (API key required): Tavily, Google (SerpAPI/Programmable Search), Brave Search.
Local LLMs: Llama 3, Mistral, Gemma, DeepSeek, and any Ollama-supported model.
Cloud LLMs: OpenAI (GPT-4, GPT-4.1-mini), Anthropic (Claude 3), Google (Gemini), and 100+ models via OpenRouter.

Pitfall Guide

SQLCipher Encryption Bypass in pip Install: The pip install path does not auto-configure SQLCipher. Setting LDR_ALLOW_UNENCRYPTED=true drops data into plain SQLite, which is acceptable for local development but violates security compliance in production or multi-user environments. Always validate encryption status before deploying to shared infrastructure.
Hardware Constraints for Local LLM Execution: Running Ollama, SearXNG, and LDR concurrently demands significant RAM and VRAM. CPU-only setups will see severe latency during synthesis loops; a dedicated GPU with 8GB+ VRAM is recommended for acceptable throughput.
CSRF Token Handling in HTTP API: The REST API enforces strict CSRF protection, and the flow uses two tokens. Parse the <input name="csrf_token"> value from the login page HTML and send it as the csrf_token form field when logging in; then fetch a second token from /auth/csrf-token and send it in the X-CSRF-Token header of every API call such as /api/start_research. Skipping or mixing up either token gets the request rejected with a 4xx error (often 403 Forbidden).
Over-Provisioning for Simple Q&A Workloads: LDR is engineered for multi-step research synthesis, not conversational Q&A. Routing simple factual queries through the iterative search loop introduces unnecessary latency and resource consumption. Reserve LDR for literature reviews, competitive analysis, and complex technical investigations.
GPU Passthrough & NVIDIA Container Toolkit Configuration: Docker GPU acceleration requires an explicit GPU request (--gpus all for docker run, or the equivalent device reservation in a Compose file) plus a properly aligned NVIDIA driver and container toolkit. Mismatched driver versions or a missing nvidia-container-toolkit package can cause a silent fallback to CPU execution, degrading performance by 10-50x. Verify with nvidia-smi inside the container.
Ignoring Zero-Knowledge Password Recovery Limits: LDR's zero-knowledge architecture means there is no password recovery mechanism. If you lose credentials, the SQLCipher database becomes permanently inaccessible. Implement secure credential management (e.g., HashiCorp Vault, Bitwarden) and maintain encrypted backups of the database volume.
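Because the database is encrypted at rest, a plain file copy is already an encrypted backup, but one that is only recoverable with the original password. Below is a minimal Python sketch of a timestamped, checksum-verified copy. The paths are placeholders for wherever your installation stores the database, and stopping LDR before copying is an assumption made so the file is not captured mid-write.

```python
from __future__ import annotations

import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large databases need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_database(db_path: str | Path, backup_dir: str | Path) -> Path:
    """Copy the (already encrypted) database file to a timestamped, verified backup."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copies contents plus timestamps and permission bits
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()
        raise OSError(f"backup of {src} failed checksum verification")
    return dest
```

Store the backups and the password in separate places (for example, the password in the credential manager and the files on another volume); either one alone is useless for recovery.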

Deliverables

📘 Deployment & Architecture Blueprint: A structured reference covering the LDR service topology (LDR Core orchestrating Ollama for inference and SearXNG for search, persisting to SQLCipher), network port mapping, GPU passthrough requirements, and hybrid RAG integration patterns. Includes environment variable matrices for cloud vs. local LLM routing.

✅ Pre-Flight & Integration Checklist

Verify NVIDIA driver & container toolkit compatibility (Linux/GPU path)
Validate SQLCipher encryption initialization or explicitly acknowledge unencrypted fallback
Configure SearXNG instance and confirm meta-search endpoint responsiveness
Pull target Ollama model and verify VRAM allocation via ollama ps
Test CSRF token extraction flow before automating HTTP API calls
Map LangChain retrievers to internal vector stores (FAISS/Chroma/Pinecone)
Establish credential backup strategy aligned with zero-knowledge constraints
Run SimpleQA benchmark query to validate synthesis loop & citation accuracy
