
By Codcompass Team·2026-05-10·6 min read

Local Deep Research (LDR): Self-Hosted AI Research Assistant

Current Situation Analysis

Modern AI workflows for technical writing, literature reviews, and competitive analysis face critical failure modes when relying on traditional single-turn LLMs or cloud-based research tools. The primary pain points include:

Hallucination & Lack of Citations: Standard chat interfaces generate plausible but unverified paragraphs without source attribution, making them unsuitable for rigorous research.
Data Sovereignty Violations: Cloud APIs inherently route queries and context windows through third-party infrastructure, violating compliance requirements for sensitive or proprietary domains.
Fragmented Knowledge Accumulation: Manual research or basic RAG pipelines do not automatically curate, index, and compound findings into a searchable local library over time.
Architectural Overhead: Building iterative search-synthesis loops with multi-source retrieval (arXiv, PubMed, web, local docs) requires complex orchestration, custom retrievers, and state management that most teams lack the bandwidth to maintain.

Traditional methods fail because they treat research as a single inference step rather than an iterative, source-validated workflow. LDR addresses this by decoupling the research loop from the model provider, enforcing local-first data handling, and automating the synthesis-to-citation pipeline.

WOW Moment: Key Findings

The reported benchmark figures and the architecture suggest that LDR narrows the gap between commercial deep research platforms and self-hosted, privacy-preserving systems. It combines an iterative search-synthesis loop, encrypted local storage, and a compounding local knowledge base, so the research workflow itself does not depend on a cloud platform. Note that the ~95% SimpleQA figure in the table was obtained with GPT-4.1-mini, a cloud-hosted model, so results with a local model may differ.

Approach	Accuracy (SimpleQA)	Source Citation Rate	Data Privacy Model	Knowledge Base Accumulation
Traditional LLM (ChatGPT/Claude)	~70-75%	Low (None/Implicit)	Cloud-Only	None
Commercial Deep Research Tools	~85-90%	High	Cloud-Proprietary	Limited/Platform-Locked
Local Deep Research (LDR)	~95% (GPT-4.1-mini)	High (Explicit/Verifiable)	Zero-Knowledge / Fully Local	Compounding / Searchable

Key Findings:

Iterative sub-query decomposition + multi-source retrieval (SearXNG, arXiv, PubMed, local docs) significantly reduces hallucination rates compared to single-pass generation.

SQLCipher AES-256 encryption with zero-knowledge design ensures that even infrastructure administrators cannot access research data or API keys.
Local knowledge base indexing enables future queries to leverage historically collected sources, creating a compounding research asset.

Core Solution

LDR runs a fixed research loop: query ingestion → strategy selection → sub-query decomposition → multi-source retrieval → iterative synthesis → citation-backed report generation → local library indexing. The sequence of stages is fixed even though the model output at each stage is not. The architecture supports both local and cloud LLMs via Ollama or direct API integration, while SearXNG handles privacy-preserving meta-search.
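The loop described above can be pictured with a small, self-contained sketch. This is not LDR's implementation: the names `Report`, `research`, `decompose`, `follow_up`, and the retriever signature are all hypothetical, and the LLM calls are replaced by plain callables so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical retriever type: maps a sub-query to (source, text) pairs.
Retriever = Callable[[str], Iterable[tuple[str, str]]]


@dataclass
class Report:
    query: str
    # Each finding is (sub_query, source, text).
    findings: list[tuple[str, str, str]] = field(default_factory=list)

    def citations(self) -> list[str]:
        # Deduplicate sources while preserving first-seen order.
        return list(dict.fromkeys(source for _, source, _ in self.findings))


def research(
    query: str,
    decompose: Callable[[str], list[str]],
    retrievers: list[Retriever],
    follow_up: Callable[[Report], list[str]] = lambda report: [],
    max_rounds: int = 3,
) -> Report:
    """Run decomposition, retrieval, and follow-up rounds (illustrative only)."""
    report = Report(query)
    pending = decompose(query)  # in a real system an LLM would produce these
    for _ in range(max_rounds):
        if not pending:
            break
        for sub_query in pending:
            for retrieve in retrievers:
                for source, text in retrieve(sub_query):
                    report.findings.append((sub_query, source, text))
        # Ask for follow-up questions based on what has been gathered so far.
        pending = follow_up(report)
    return report
```

Every finding keeps its source next to its text, which is what makes a citation-backed report possible at the end of the loop.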

Installation & Deployment

Option 1: Docker (Recommended)

Handles dependencies, encryption, and service wiring automatically.

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d

Wait about 30 seconds, then open http://localhost:5000.
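Rather than waiting a fixed 30 seconds, a script can poll the port until the web UI answers. The sketch below uses only the standard library; `is_up` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server at `url` sends back any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or not listening yet


def wait_until_ready(url, attempts=30, delay=2.0, probe=is_up, sleep=time.sleep):
    """Probe `url` up to `attempts` times; return the attempt that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None


# Example (performs real network requests against the local stack):
# wait_until_ready("http://localhost:5000")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; any HTTP status at all means the container is accepting connections.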

With NVIDIA GPU acceleration (Linux only):

First install the NVIDIA Container Toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor \
  -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install nvidia-container-toolkit -y
sudo systemctl restart docker
nvidia-smi  # verify it worked

Then bring up the stack with GPU support:

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

The Docker Compose setup bundles Ollama (local LLM runner) and SearXNG (self-hosted meta-search engine) together with LDR. Everything runs locally.

Option 2: pip (For developers / Python integration)

# Install the package
pip install local-deep-research

# Run SearXNG in Docker for search
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Install Ollama from https://ollama.ai, then pull a model
ollama pull gemma3:12b

# Start the web UI
python -m local_deep_research.web.app

Important note on encryption: The pip install does not automatically set up SQLCipher (the AES-256 encrypted database LDR uses for storing your data and API keys). If you hit errors during setup, bypass it for now with:

export LDR_ALLOW_UNENCRYPTED=true

This stores data in plain SQLite. Fine for local dev, not recommended for production or shared setups. Docker handles encryption out of the box.
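Because the bypass is a single environment variable, it is easy to leave enabled by accident. A deployment script could guard against that with a check along these lines. The helper names are hypothetical, and the sketch assumes that only the literal value `true` (case-insensitive) enables the bypass; LDR's own parsing of the variable may accept other values.

```python
import os


def unencrypted_mode(env=None) -> bool:
    """Return True if LDR_ALLOW_UNENCRYPTED is set to 'true' (assumed parsing)."""
    env = os.environ if env is None else env
    return env.get("LDR_ALLOW_UNENCRYPTED", "").strip().lower() == "true"


def require_encryption(env=None) -> None:
    """Raise if the plain-SQLite bypass is active; intended for production startup scripts."""
    if unencrypted_mode(env):
        raise RuntimeError(
            "LDR_ALLOW_UNENCRYPTED=true: data and API keys would be stored in plain SQLite"
        )
```

Calling `require_encryption()` at the top of a production launch script turns a silent misconfiguration into an immediate failure.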

API Integration

Python API

from local_deep_research.api import LDRClient, quick_query

# One-liner research
summary = quick_query("username", "password", "What is the current state of Rust async runtimes?")
print(summary)

# Client for more control
client = LDRClient()
client.login("username", "password")
result = client.quick_research("Compare FAISS vs Hnswlib for vector search at scale")
print(result["summary"])

HTTP API

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Get CSRF token from login page
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, "html.parser")
csrf = soup.find("input", {"name": "csrf_token"}).get("value")

# Authenticate
session.post("http://localhost:5000/auth/login", data={
    "username": "user",
    "password": "pass",
    "csrf_token": csrf
})

# Get API CSRF token
api_csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]

# Submit a research query
response = session.post(
    "http://localhost:5000/api/start_research",
    json={"query": "What are the tradeoffs between gRPC and REST for internal microservices?"},
    headers={"X-CSRF-Token": api_csrf}
)
print(response.json())

The repository includes ready-to-run HTTP examples under examples/api_usage/http/ that handle authentication, retry logic, and progress polling.
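The repository examples are not reproduced here. As a rough idea of what retry logic around an HTTP call can look like, here is a generic exponential-backoff helper. It is a sketch under stated assumptions, not the code from `examples/api_usage/http/`; `with_retries` and its parameters are hypothetical names.

```python
import time


def with_retries(
    call,
    attempts=4,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Run `call()`; on a listed exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# With the `requests` session from the HTTP example, the exception types would be
# requests' own, since they do not derive from the built-in ConnectionError:
#   with_retries(lambda: session.get("http://localhost:5000/auth/login"),
#                retry_on=(requests.ConnectionError, requests.Timeout))
```

Only transient failures are retried; anything else propagates immediately, so authentication errors and bad requests are not masked.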

Enterprise / RAG Integration

LDR integrates with existing vector stores via LangChain retrievers, enabling hybrid search across live web results and internal knowledge bases.

from local_deep_research.api import quick_summary

result = quick_summary(
    query="What are our current deployment procedures for the payments service?",
    retrievers={"internal_kb": your_langchain_retriever},
    search_tool="internal_kb"
)

LDR supports FAISS, Chroma, Pinecone, Weaviate, Elasticsearch, and anything LangChain-compatible. This is where the tool gets interesting for teams: you can combine live web search with your own internal documents in a single research pass.

Pitfall Guide

SQLCipher Encryption Bypass in pip Install: The LDR_ALLOW_UNENCRYPTED=true environment variable forces plain SQLite storage. While useful for local debugging, it exposes API keys and research data in plaintext. Always use Docker for production or manually configure SQLCipher when deploying via pip.
Hardware Resource Contention: Running Ollama, SearXNG, and the LDR application stack simultaneously demands significant RAM and CPU/GPU resources. Systems with <16GB RAM or without GPU acceleration will experience severe latency during iterative synthesis and local model inference.
Misaligned Use Cases: LDR is engineered for deep, multi-source research workflows. Using it for simple Q&A or quick fact-checking introduces unnecessary orchestration overhead. Reserve it for literature reviews, technical analysis, and compounding knowledge discovery.
Zero-Knowledge Password Recovery Limitation: The security model intentionally omits password recovery mechanisms to maintain zero-knowledge architecture. If credentials are lost, the encrypted SQLCipher database becomes permanently inaccessible. Implement secure credential management (e.g., hardware keys, password managers) before deployment.
GPU Passthrough Configuration Gaps: NVIDIA acceleration is Linux-only and requires the NVIDIA Container Toolkit, a Docker daemon restart after installing it, and passing docker-compose.gpu.override.yml as a second -f file to docker compose. A missed step can leave inference running on the CPU without an obvious error, which slows research loops considerably.
Data Leakage via Search Engines: Selecting a local LLM does not guarantee zero network traffic. SearXNG and premium search APIs (Tavily, Google, Brave) still route queries externally. Review your search source configuration to ensure compliance with internal data handling policies.
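The last pitfall lends itself to a simple audit: list the search sources a deployment has enabled and flag those that send queries off the machine. The sketch below assumes a hypothetical configuration in which sources are identified by short names; the names in `EXTERNAL_SOURCES` come from the sources mentioned in this article, not from LDR's actual configuration keys.

```python
# Sources that forward queries to third-party services. SearXNG is included
# because, even when self-hosted, it relays queries to upstream search engines.
EXTERNAL_SOURCES = {"searxng", "tavily", "google", "brave", "arxiv", "pubmed"}


def external_sources(enabled):
    """Return the enabled source names that route queries externally, sorted."""
    return sorted(name for name in enabled if name.lower() in EXTERNAL_SOURCES)


if __name__ == "__main__":
    # Hypothetical list of enabled sources for one deployment.
    enabled = ["internal_kb", "SearXNG", "tavily"]
    for name in external_sources(enabled):
        print(f"warning: '{name}' sends queries outside the local machine")
```

A retriever backed by an internal vector store, such as `internal_kb` in the earlier example, is not flagged, which matches the distinction the pitfall draws between local models and local search.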

Deliverables

Deployment Blueprint: Architecture diagram detailing the Ollama + SearXNG + LDR container orchestration, SQLCipher encryption flow, and LangChain retriever integration paths for hybrid RAG.
Pre-Flight Checklist: Hardware requirements (CPU/GPU/RAM), dependency validation (Docker, NVIDIA Toolkit, Ollama models), security configuration (CSRF tokens, API keys, encryption status), and network port mapping verification.
Configuration Templates: Production-ready docker-compose.yml, GPU override manifests, .env templates for API key rotation, and LangChain retriever configuration snippets for FAISS/Chroma/Pinecone integration.
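The port-mapping item of the pre-flight checklist can be automated with a few lines of standard-library Python. This is a sketch: ports 5000 and 8080 are the ones used earlier in this article, while 11434 is assumed to be Ollama's default port and is not stated in the article. With the Docker Compose setup, some services may not be published on the host at all, so a closed port is not necessarily an error.

```python
import socket

# Service name -> host port. 11434 is an assumption (Ollama's usual default).
SERVICES = {"LDR web UI": 5000, "SearXNG": 8080, "Ollama": 11434}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def preflight(services=SERVICES, host="127.0.0.1", check=port_open):
    """Map each service name to whether its port accepts connections."""
    return {name: check(host, port) for name, port in services.items()}


# Example (opens real TCP connections to localhost):
# for name, ok in preflight().items():
#     print(f"{name}: {'listening' if ok else 'not reachable'}")
```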
