side the same transactional boundary as your application data, you stop building reconciliation logic and start shipping features.
Core Solution
Implementing semantic search with pgvector requires a disciplined approach to schema design, index selection, embedding lifecycle management, and query optimization. The following implementation uses TypeScript with the pg driver, but the architecture applies equally to Drizzle, Prisma, or direct SQL clients.
Step 1: Install Extension and Define Schema
pgvector must be installed as a Postgres extension. Once active, you can use the vector data type with a fixed dimension.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- Must match your embedding model's output dimension
model_version TEXT NOT NULL DEFAULT 'openai-text-embedding-3-small',
created_at TIMESTAMPTZ DEFAULT NOW()
);
Step 2: Generate and Store Embeddings
Embeddings should be generated asynchronously or during write-time depending on your latency budget. The following TypeScript module handles batch insertion with proper parameterization.
import { Pool } from 'pg';
import { createClient } from '@supabase/supabase-js'; // or your preferred embedding SDK
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function generateEmbedding(text: string): Promise<number[]> {
// Replace with OpenAI, Cohere, or local model (e.g., ONNX/TensorRT)
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
input: text,
model: 'text-embedding-3-small',
dimensions: 1536,
}),
});
const data = await response.json();
return data.data[0].embedding;
}
export async function insertDocument(content: string, title: string): Promise<string> {
const embedding = await generateEmbedding(content);
const query = `
INSERT INTO documents (title, content, embedding, model_version)
VALUES ($1, $2, $3, $4)
RETURNING id;
`;
const res = await pool.query(query, [title, content, `[${embedding.join(',')}]`, 'openai-text-embedding-3-small']);
return res.rows[0].id;
}
Step 3: Create an HNSW Index
HNSW (Hierarchical Navigable Small World) is the default recommendation for production semantic search. It trades memory for faster queries and higher recall.
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops);
Step 4: Query with Semantic Similarity
Postgres provides three distance operators: <=> (cosine), <-> (L2/Euclidean), <#> (inner product). Cosine is standard for normalized embeddings.
export async function semanticSearch(queryText: string, limit: number = 5): Promise<any[]> {
const queryEmbedding = await generateEmbedding(queryText);
const vectorStr = `[${queryEmbedding.join(',')}]`;
const query = `
SELECT id, title, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT $2;
`;
const res = await pool.query(query, [vectorStr, limit]);
return res.rows;
}
Architecture Decisions and Rationale
- Index Selection: HNSW is preferred for workloads under 10M vectors where recall and latency matter. IVFFlat (
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)) is better for memory-constrained environments or datasets exceeding 50M vectors, but requires tuning lists and probes.
- Distance Metric: Cosine (
<=>) is standard for LLM embeddings because they are typically normalized to unit length. If your model outputs unnormalized vectors or you're using raw dot products, switch to inner product (<#>). L2 (<->) is rarely optimal for semantic search but useful for clustering tasks.
- Batch Processing: Single-row inserts trigger index updates per transaction, causing lock contention. Batch inserts (500-2000 rows) reduce WAL pressure and allow Postgres to build the index incrementally.
- Embedding Versioning: Storing
model_version enables gradual rollout and rollback. When switching embedding models, you can run a background job to re-embed only rows where model_version != current_model, avoiding full table scans.
- Connection Pooling: Vector queries are CPU-bound during distance calculations. Use
pgbouncer or pgpool with transaction-mode pooling to prevent connection exhaustion during traffic spikes.
Pitfall Guide
1. Mismatched Distance Metric and Embedding Normalization
Using cosine distance on unnormalized vectors produces incorrect similarity scores. Conversely, using inner product on normalized vectors wastes compute. Always verify your embedding model's output distribution. If the model documentation states "normalized to unit length," use <=>. If not, normalize in application code or switch to <#>.
2. Ignoring HNSW Index Parameters
Default HNSW settings (m = 16, ef_construction = 64) are safe but suboptimal for production. m controls graph connectivity (higher = better recall, more memory). ef_construction affects build time and index quality. ef_search (set via SET hnsw.ef_search = 100;) directly impacts query accuracy vs speed. Failing to tune these results in either slow queries or degraded recall. Profile with your actual data distribution before locking parameters.
3. Dimension Mismatch Between Model and Column
The vector(n) type enforces strict dimensionality. If your embedding model outputs 1024 dimensions but the column is defined as vector(1536), Postgres throws a runtime error. Conversely, truncating vectors to fit a smaller column destroys semantic meaning. Always parameterize the dimension in your schema migration and validate against the model's output length during CI.
4. Skipping VACUUM ANALYZE After Bulk Operations
pgvector indexes rely on Postgres' visibility map and statistics. Bulk inserts or deletes leave dead tuples that bloat index size and degrade planner accuracy. Run VACUUM ANALYZE documents; after loading >10k rows or deleting >20% of the table. Autovacuum handles routine maintenance, but initial loads and major schema changes require manual intervention.
5. Over-Indexing Without Memory Monitoring
HNSW stores the entire graph in RAM. A 1M vector dataset with 1536 dimensions and m=32 consumes ~2.5GB of RAM for the index alone. Adding multiple vector columns or high m values can exhaust server memory, triggering swap and catastrophic latency spikes. Monitor pg_total_relation_size() and set shared_buffers appropriately. Use pg_prewarm to load indexes into memory after restarts.
6. Treating Embeddings as Immutable
Embedding models evolve. Assuming vectors never change leads to silent accuracy decay. Implement a versioning strategy: store model_version, batch-reindex outdated rows, and maintain a fallback query path that unions results from multiple versions during migration. Never run UPDATE documents SET embedding = ... without a concurrent index rebuild strategy.
7. Querying Without Filtering Strategy
Running pure vector search on unfiltered tables returns semantically similar but contextually irrelevant results. Always combine vector similarity with relational predicates: WHERE category = 'technical' AND created_at > '2024-01-01'. Postgres' query planner efficiently pushes filters before vector comparison, reducing the candidate set and improving both latency and relevance.
Best Practices from Production:
- Use connection pooling with
statement_timeout to prevent runaway queries.
- Cache frequent semantic queries with Redis when query patterns stabilize.
- Monitor
pg_stat_user_indexes for index bloat and scan ratios.
- Run
EXPLAIN ANALYZE on vector queries to verify index usage; full scans indicate missing or invalid indexes.
- Version your embedding models and automate re-embedding pipelines with
pg_cron or Temporal.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| <10M vectors, strict ACID required, existing Postgres infra | pgvector with HNSW | Eliminates dual-write complexity, maintains transactions, leverages existing backups | -60% infra cost vs managed vector DB |
| >50M vectors, memory-constrained, batch-only writes | pgvector with IVFFlat + high lists | Lower memory footprint, predictable build times, acceptable recall for offline workloads | +15% latency, -40% RAM usage |
| Multi-tenant SaaS, row-level security, complex joins | pgvector + RLS policies | Vector results inherit relational permissions natively; no sync layer needed | Neutral infra, -80% compliance engineering time |
| Rapid prototyping, <100k vectors, experimental models | pgvector + default HNSW | Zero config, fast iteration, easy rollback if model changes | Minimal, scales linearly with data |
Configuration Template
-- schema.sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE semantic_documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
title TEXT NOT NULL,
body TEXT NOT NULL,
embedding vector(1536) NOT NULL,
model_version TEXT DEFAULT 'openai-text-embedding-3-small',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX semantic_docs_embedding_idx ON semantic_documents
USING hnsw (embedding vector_cosine_ops) WITH (m = 32, ef_construction = 128);
CREATE INDEX semantic_docs_tenant_idx ON semantic_documents (tenant_id);
-- Enable RLS if multi-tenant
ALTER TABLE semantic_documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON semantic_documents
USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
// vectorSearch.ts
import { Pool } from 'pg';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
export async function searchSemantic(
tenantId: string,
query: string,
queryEmbedding: number[],
limit: number = 10
) {
const vectorStr = `[${queryEmbedding.join(',')}]`;
// Set session-level HNSW parameter for query accuracy
await pool.query('SET hnsw.ef_search = 100;');
const res = await pool.query(
`SELECT id, title, body, 1 - (embedding <=> $1::vector) AS similarity
FROM semantic_documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3`,
[vectorStr, tenantId, limit]
);
return res.rows;
}
Quick Start Guide
- Launch pgvector-enabled Postgres: Run
docker run -d --name pgvector -e POSTGRES_PASSWORD=postgres -p 5432:5432 pgvector/pgvector:pg16
- Initialize Schema: Connect via
psql, run CREATE EXTENSION vector;, and execute the configuration template SQL.
- Generate Embeddings: Use your preferred SDK (OpenAI, Cohere, or local ONNX) to convert text to 1536-dimension arrays.
- Insert & Index: Batch-insert 1000 documents using the TypeScript template. Postgres automatically updates the HNSW index.
- Query: Call
searchSemantic() with a query string and its embedding. Verify results return within 30-60ms p99 on a 4 vCPU instance.