her recall at lower latency than IVFFlat for datasets under 10M vectors. IVFFlat remains viable only when RAM is severely constrained.
2. Distance Metric: Cosine similarity (<=>) is standard for normalized embeddings. Use L2 (<->) only when magnitude carries semantic weight. Inner product (<#>) applies to unnormalized models.
3. Dimension Limits: pgvector supports up to 2,000 dimensions. Most modern embedding models (text-embedding-3-small, nomic-embed, etc.) output 512β1536 dimensions, fitting comfortably within limits.
4. Hybrid Search: Vector similarity alone fails on exact matches, typos, and structured filtering. Combine HNSW with PostgreSQL full-text search (tsvector) or LIKE/ILIKE for production-grade recall.
Step-by-Step Implementation
1. Enable the extension and define the schema
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768),
created_at TIMESTAMPTZ DEFAULT NOW(),
user_id UUID NOT NULL
);
2. Build the HNSW index
-- vector_cosine_ops matches the distance operator used in queries
CREATE INDEX idx_documents_embedding
ON documents
USING hnsw (embedding vector_cosine_ops);
3. TypeScript ingestion and query layer
import { Pool } from 'pg';
import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function upsertDocument(title: string, content: string, userId: string) {
// Generate embedding
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: `${title} ${content}`,
});
// Single transactional upsert
const client = await pool.connect();
try {
await client.query('BEGIN');
await client.query(
`INSERT INTO documents (title, content, embedding, user_id)
VALUES ($1, $2, $3, $4)
ON CONFLICT (id) DO UPDATE
SET title = $1, content = $2, embedding = $3
RETURNING id`,
[title, content, `[${embedding.join(',')}]`, userId]
);
await client.query('COMMIT');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release();
}
}
async function semanticSearch(
queryText: string,
userId: string,
limit: number = 5
) {
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: queryText,
});
const result = await pool.query(
`SELECT id, title, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE user_id = $2
ORDER BY embedding <=> $1
LIMIT $3`,
[`[${embedding.join(',')}]`, userId, limit]
);
return result.rows;
}
4. Query-time tuning
HNSW uses ef_search to control the candidate pool size during traversal. The default is 40. Increase it for higher recall at the cost of latency:
SET hnsw.ef_search = 100;
Apply this per-connection or via ALTER DATABASE for workloads requiring >95% recall.
Pitfall Guide
1. Using L2 distance for normalized embeddings
Cosine and L2 are mathematically equivalent only when vectors are unit-normalized. Most embedding APIs return normalized vectors. Using L2 (<->) on normalized data adds unnecessary CPU cycles and can distort ranking when magnitudes vary. Stick to <=> and vector_cosine_ops unless your pipeline explicitly preserves magnitude semantics.
2. Ignoring ef_search tuning
The default ef_search = 40 prioritizes speed over recall. In production semantic search, this yields 80β85% recall on benchmark datasets. Raising it to 100β200 typically pushes recall above 95% with <5ms additional latency. Failure to tune this parameter is the #1 cause of "inaccurate" search results in pgvector deployments.
3. Under-provisioning memory for HNSW
HNSW builds an in-memory graph structure. Index builds and high-concurrency queries consume RAM proportional to m (neighbors per node, default 16) and ef_construction. If maintenance_work_mem is too low, PostgreSQL spills to disk, causing index builds to take hours instead of minutes. Set maintenance_work_mem = '1GB' or higher before building indexes on >1M rows.
4. Skipping index maintenance after bulk updates
HNSW indexes do not auto-optimize like B-tree indexes. Bulk INSERT or UPDATE operations fragment the graph, increasing traversal depth and latency. Run REINDEX INDEX idx_documents_embedding; after loading >10% of the table, or schedule periodic reindexing during maintenance windows.
5. Relying solely on vector similarity
Vectors capture semantic proximity, not exact string matching, keyword presence, or structural constraints. A query for "API v2 migration guide" may return results about "API version 2" but miss documents explicitly titled "v2 migration". Combine HNSW with tsvector full-text search or ILIKE filters. Use UNION or JOIN to merge ranked results, or implement weighted scoring in application logic.
6. Assuming pgvector replaces full-text search
pgvector does not implement tokenization, stemming, or stop-word removal. It stores fixed-length float arrays. Full-text search and vector search solve different problems. Production systems use both: tsvector for exact/keyword matching, vector for conceptual similarity. Treat them as complementary layers, not alternatives.
Best Practices from Production
- Batch embeddings: Generate embeddings in chunks of 100β500 to respect API rate limits and reduce network overhead.
- Use connection pooling: pgvector queries are CPU-bound during traversal. Pool connections to avoid connection handshake overhead under load.
- Monitor index health: Query
pg_stat_user_indexes for idx_scan, idx_tup_read, and idx_tup_fetch. Sudden drops indicate fragmentation or query plan degradation.
- Set
hnsw.ef_construction during index creation: Higher values (default 64) improve index quality at build time. Use CREATE INDEX ... WITH (ef_construction = 128) for critical datasets.
- Implement circuit breakers: If embedding API latency spikes, degrade gracefully by falling back to text-only search or returning cached results.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| <500k vectors, strict ACID, existing Postgres | pgvector HNSW | Single-system transactionality, zero sync overhead, lower infra cost | -40% vs external vector DB |
| >10M vectors, sub-10ms latency SLA | External vector DB + pgvector metadata | HNSW memory footprint scales non-linearly; specialized stores optimize for massive graphs | +25% infra, but meets latency SLA |
| Multi-tenant SaaS with row-level security | pgvector with RLS + HNSW | Postgres RLS applies to vector tables natively; external stores require app-layer filtering | No additional cost, simplifies auth |
| Budget-constrained startup | pgvector on existing RDS/Supabase | Eliminates separate vector store subscription, reduces CI/CD complexity | -$100β$300/month |
| Hybrid search mandatory (exact + semantic) | pgvector + tsvector | Native SQL joins and UNION ranking avoid cross-system aggregation latency | Neutral, leverages existing Postgres features |
Configuration Template
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Table definition
CREATE TABLE items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
body TEXT NOT NULL,
embedding vector(768),
status TEXT DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index with tuned construction parameters
CREATE INDEX idx_items_embedding
ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
-- Query-time recall tuning (apply per session or globally)
SET hnsw.ef_search = 100;
-- Hybrid search example
SELECT id, title,
1 - (embedding <=> '[0.1,0.2,...]') AS vec_score,
ts_rank(to_tsvector('english', body), plainto_tsquery('migration')) AS txt_score
FROM items
WHERE status = 'active'
ORDER BY (vec_score * 0.7 + txt_score * 0.3) DESC
LIMIT 10;
Quick Start Guide
- Install the extension: Connect to your PostgreSQL instance and run
CREATE EXTENSION IF NOT EXISTS vector;. If using a managed provider, ensure the version supports pgvector (PostgreSQL 14+).
- Add the vector column: Alter your target table or create a new one with
embedding vector(768) (match your model's dimensionality).
- Generate and insert embeddings: Use your preferred SDK to generate embeddings, format them as
[x1,x2,...,xn], and insert via parameterized queries. Wrap in a transaction if updating relational data simultaneously.
- Build the HNSW index: Execute
CREATE INDEX ON table USING hnsw (embedding vector_cosine_ops);. Set maintenance_work_mem to 1GB+ beforehand for datasets >100k rows.
- Query with distance operators: Run
SELECT * FROM table ORDER BY embedding <=> '[query_vector]' LIMIT 10;. Tune hnsw.ef_search in your session for recall/latency balance.