ology and stored in the Graph DB. Text chunks are embedded and stored in the Vector DB.
2. Storage Layer:
* Graph DB: Stores nodes, edges, and properties. Supports Cypher/Gremlin queries.
* Vector DB: Stores chunk embeddings for semantic fallback.
3. Retrieval Layer:
* Graph Retrieval: Executes structured queries based on query decomposition.
* Vector Retrieval: Executes semantic search for unstructured nuances.
* Fusion: Combines results, prioritizing graph data for factual claims.
4. Generation Layer: LLM receives fused context and generates the response.
Step-by-Step Implementation
1. Define the Ontology
Start with a lightweight schema. Over-engineering the ontology is a common failure point. Define core entity types and relationship predicates.
interface Ontology {
  entityTypes: string[];
  predicates: string[];
  constraints: {
    subject: string;
    predicate: string;
    object: string;
  }[];
}

const ontology: Ontology = {
  entityTypes: ["Person", "Company", "Project", "Regulation"],
  predicates: ["WORKS_FOR", "OWNS", "COMPLIES_WITH", "PART_OF"],
  constraints: [
    { subject: "Person", predicate: "WORKS_FOR", object: "Company" },
    { subject: "Company", predicate: "COMPLIES_WITH", object: "Regulation" }
  ]
};
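The constraints list can drive a simple type check at ingestion time. The following is a minimal sketch, assuming each extracted triple also carries the entity types of its subject and object; the `TypedTriple` shape and the `satisfiesOntology` helper are hypothetical and not part of the schema above.

```typescript
// Hypothetical shapes for illustration; Constraint mirrors the ontology's constraint entries.
interface Constraint {
  subject: string;
  predicate: string;
  object: string;
}

interface TypedTriple {
  subjectType: string;
  predicate: string;
  objectType: string;
}

const constraints: Constraint[] = [
  { subject: "Person", predicate: "WORKS_FOR", object: "Company" },
  { subject: "Company", predicate: "COMPLIES_WITH", object: "Regulation" },
];

// A triple passes if its predicate has no declared constraint,
// or if at least one constraint for that predicate matches both types.
function satisfiesOntology(t: TypedTriple, rules: Constraint[]): boolean {
  const relevant = rules.filter((c) => c.predicate === t.predicate);
  if (relevant.length === 0) return true;
  return relevant.some(
    (c) => c.subject === t.subjectType && c.object === t.objectType
  );
}

const ok = satisfiesOntology(
  { subjectType: "Person", predicate: "WORKS_FOR", objectType: "Company" },
  constraints
); // true

const rejected = satisfiesOntology(
  { subjectType: "Company", predicate: "WORKS_FOR", objectType: "Person" },
  constraints
); // false: subject and object types are reversed
```

Treating unconstrained predicates as valid keeps the schema lightweight; a stricter deployment could reject them instead.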
2. Extraction Pipeline with Validation
Use the LLM to extract triples, but validate against the ontology before insertion.
import { z } from "zod";

const TripleSchema = z.object({
  subject: z.string(),
  // z.enum needs a non-empty tuple type; this cast avoids `as any`
  predicate: z.enum(ontology.predicates as [string, ...string[]]),
  object: z.string(),
  sourceChunkId: z.string(),
  confidence: z.number().min(0).max(1)
});

type Triple = z.infer<typeof TripleSchema>;

// LLMClient is assumed to expose generate(prompt) returning the raw model text
async function extractTriples(
  text: string,
  llmClient: LLMClient
): Promise<Triple[]> {
  // Prompt engineering: constrain output to a JSON array
  const prompt = `
    Extract entities and relationships from the text.
    Valid predicates: ${ontology.predicates.join(", ")}.
    Return a JSON array of triples.
    Text: ${text}
  `;
  const rawOutput = await llmClient.generate(prompt);

  // Parse defensively: the model may return malformed JSON or a non-array value
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawOutput);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  // Keep only triples that pass schema validation
  const validatedTriples = parsed
    .map((t: unknown) => TripleSchema.safeParse(t))
    .filter((r): r is z.SafeParseSuccess<Triple> => r.success)
    .map(r => r.data);

  // Filter by confidence threshold
  return validatedTriples.filter(t => t.confidence > 0.85);
}
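Model output is often not bare JSON: the array may be wrapped in a code fence or surrounded by a sentence of prose. The following is a sketch of a more tolerant parsing step that could run before schema validation. The `parseJsonArray` helper is an assumption for illustration, not part of the pipeline above.

```typescript
// Hypothetical helper: pull the outermost JSON array out of raw LLM output.
// Returns an empty array instead of throwing when nothing parseable is found.
function parseJsonArray(raw: string): unknown[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    // Text between the brackets was not valid JSON
    return [];
  }
}
```

This slices from the first `[` to the last `]`, so it fails (and returns an empty array) when the surrounding prose itself contains square brackets. Logging the discarded output would help when tuning the prompt.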
3. Graph Query Integration
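Implement a function that queries the graph. The example uses a fixed, parameterized keyword match as a stand-in; in a full system the query would be built from an LLM-generated query plan.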
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  process.env.NEO4J_URI!,
  neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASS!)
);

interface GraphContext {
  entities: { id: string; type: string; properties: Record<string, any> }[];
  relationships: { source: string; target: string; type: string }[];
}

async function queryGraph(query: string): Promise<GraphContext> {
  // In production, use an LLM to translate natural language to Cypher
  // Here we demonstrate a parameterized query pattern for safety
  const cypher = `
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS $keyword OR m.name CONTAINS $keyword
    RETURN n, r, m
    LIMIT 50
  `;
  const session = driver.session();
  try {
    const result = await session.run(cypher, { keyword: query });
    const entities = new Map<string, any>();
    const relationships: { source: string; target: string; type: string }[] = [];

    result.records.forEach(record => {
      const source = record.get("n");
      const rel = record.get("r");
      const target = record.get("m");

      entities.set(source.identity.toString(), {
        id: source.identity.toString(),
        type: source.labels[0],
        properties: source.properties
      });
      entities.set(target.identity.toString(), {
        id: target.identity.toString(),
        type: target.labels[0],
        properties: target.properties
      });
      relationships.push({
        source: source.identity.toString(),
        target: target.identity.toString(),
        type: rel.type
      });
    });

    return {
      entities: Array.from(entities.values()),
      relationships
    };
  } finally {
    await session.close();
  }
}
4. Hybrid Retrieval Orchestrator
Combine graph and vector retrieval.
async function retrieveContext(
  query: string,
  strategy: "graph" | "vector" | "hybrid"
): Promise<string> {
  let graphContext = "";
  let vectorContext = "";

  if (strategy === "graph" || strategy === "hybrid") {
    const graphData = await queryGraph(query);
    // Serialize graph data into a format LLM can consume
    graphContext = formatGraphForLLM(graphData);
  }

  if (strategy === "vector" || strategy === "hybrid") {
    // Vector retrieval logic
    vectorContext = await vectorSearch(query);
  }

  // Fusion strategy: Prioritize graph for entities, vector for context
  if (strategy === "hybrid") {
    return `
### Structured Data
${graphContext}
### Semantic Context
${vectorContext}
`;
  }
  return graphContext || vectorContext;
}

function formatGraphForLLM(data: GraphContext): string {
  // Convert graph structure to text description or JSON for context injection
  return JSON.stringify(data, null, 2);
}
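Pretty-printed JSON spends much of the context on punctuation and repeated keys. As an alternative text description, the sketch below renders one line per relationship. It assumes nodes carry a `name` property and falls back to the node id otherwise; `formatGraphAsTriples` is a hypothetical helper, not part of the orchestrator above.

```typescript
interface GraphContext {
  entities: { id: string; type: string; properties: Record<string, unknown> }[];
  relationships: { source: string; target: string; type: string }[];
}

// Render each relationship as one line, e.g. (Alice:Person) -[WORKS_FOR]-> (Acme:Company).
function formatGraphAsTriples(data: GraphContext): string {
  // Build a display label per node id
  const labelById = new Map<string, string>();
  data.entities.forEach((e) => {
    const raw = e.properties["name"];
    const name = typeof raw === "string" ? raw : e.id;
    labelById.set(e.id, `(${name}:${e.type})`);
  });

  return data.relationships
    .map((r) => {
      // Fall back to the bare id if an endpoint is missing from the entity list
      const source = labelById.get(r.source) ?? `(${r.source})`;
      const target = labelById.get(r.target) ?? `(${r.target})`;
      return `${source} -[${r.type}]-> ${target}`;
    })
    .join("\n");
}
```

Node properties other than the name are dropped in this form, so it suits relationship questions better than attribute lookups.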
Architecture Decisions
- Graph Database Selection: Use Neo4j or Amazon Neptune for mature tooling and Cypher support (Neptune through openCypher). For massive scale with low-latency requirements, consider TigerGraph; to keep graph data inside PostgreSQL, consider a graph extension such as Apache AGE.
- Extraction Model: Choose the extraction model separately (e.g., Llama-3-70B or GPT-4o) rather than reusing the generation model by default. Extraction requires high precision and strict adherence to the schema; generation requires fluency. Separating the two lets you tune cost and quality for each.
- Schema Evolution: Implement versioned ontologies. As new entity types emerge, the schema must update without breaking existing queries. Use a migration strategy similar to database schema migrations.
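The schema-evolution point can be sketched as a migration applied to both the ontology and the stored triples. Everything here is an assumption for illustration: the `VersionedOntology`, `StoredTriple`, and `Migration` shapes, and the choice to model a migration as a predicate rename.

```typescript
// Hypothetical shapes: a versioned ontology and a migration that renames predicates.
interface VersionedOntology {
  version: number;
  predicates: string[];
}

interface StoredTriple {
  subject: string;
  predicate: string;
  object: string;
}

interface Migration {
  from: number;
  to: number;
  renamePredicates: Map<string, string>; // old name -> new name
}

// Refuse to apply a migration to the wrong starting version.
function migrateOntology(o: VersionedOntology, m: Migration): VersionedOntology {
  if (o.version !== m.from) {
    throw new Error(`Expected ontology v${m.from}, got v${o.version}`);
  }
  return {
    version: m.to,
    predicates: o.predicates.map((p) => m.renamePredicates.get(p) ?? p),
  };
}

// Rewrite stored triples so existing data matches the new predicate names.
function migrateTriples(triples: StoredTriple[], m: Migration): StoredTriple[] {
  return triples.map((t) => ({
    ...t,
    predicate: m.renamePredicates.get(t.predicate) ?? t.predicate,
  }));
}
```

Running both functions in the same deployment step keeps queries written against the new version from missing data stored under the old names.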
Pitfall Guide
1. The Embedding Trap
Mistake: Embedding graph nodes and edges into vectors and ignoring the graph structure during retrieval.
Explanation: This destroys relational integrity. If you embed "Alice -> WORKS_FOR -> Acme", a similarity search may return "Alice -> FRIEND_OF -> Bob" simply because both mention Alice, and it cannot answer "Who does Alice work for?" deterministically.
Best Practice: Always query the graph explicitly for relationship traversal. Use vectors only for semantic fuzzy matching or when the graph lacks the specific entity.
2. Over-Normalization of Ontology
Mistake: Creating a rigid, highly normalized schema that requires complex joins for simple queries.
Explanation: LLMs struggle to generate correct complex graph queries against highly normalized schemas. This increases latency and error rates.
Best Practice: Denormalize where appropriate. Store computed properties on nodes (e.g., total_contract_value) rather than forcing the LLM to aggregate edges. Keep the ontology flat for LLM consumption.
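As a sketch of the denormalization idea, a computed property such as total_contract_value can be derived once at ingestion time. The `ContractEdge` shape and `computeTotals` helper are assumptions for illustration; the article does not define a contract model.

```typescript
// Hypothetical edge shape: a company linked to a contract with a monetary value.
interface ContractEdge {
  companyId: string;
  contractId: string;
  value: number;
}

// Precompute the total per company so a query can read one node property
// instead of asking the LLM to generate an aggregation over edges.
function computeTotals(edges: ContractEdge[]): Map<string, number> {
  const totals = new Map<string, number>();
  edges.forEach((e) => {
    totals.set(e.companyId, (totals.get(e.companyId) ?? 0) + e.value);
  });
  return totals;
}
```

The trade-off is that the stored total must be recomputed whenever a contract edge is added, changed, or removed.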
3. Stale Graph Data
Mistake: Treating the KG as a static dump after initial ingestion.
Explanation: A KG drifts from its source systems as facts change. The LLM then retrieves outdated relationships and presents them as current, which leads to factual errors.
Best Practice: Implement incremental ingestion pipelines. Use change data capture (CDC) from source systems. Schedule periodic re-validation of relationships using the LLM to detect drift.
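Incremental ingestion can be sketched as applying a stream of change events to the edge set. This is a minimal illustration that uses an in-memory set as a stand-in for the graph store; the `ChangeEvent` shape and `applyChanges` function are assumptions, not a specific CDC tool's format.

```typescript
// Hypothetical CDC event: an upsert or delete of one relationship in a source system.
interface ChangeEvent {
  op: "upsert" | "delete";
  subject: string;
  predicate: string;
  object: string;
}

// Stable string key for an edge
const edgeKey = (e: { subject: string; predicate: string; object: string }): string =>
  JSON.stringify([e.subject, e.predicate, e.object]);

// Apply events in order to a copy of the edge set.
// Upserts are idempotent, so replaying the same event twice is harmless.
function applyChanges(edges: Set<string>, events: ChangeEvent[]): Set<string> {
  const next = new Set<string>();
  edges.forEach((k) => next.add(k));
  events.forEach((ev) => {
    if (ev.op === "upsert") next.add(edgeKey(ev));
    else next.delete(edgeKey(ev));
  });
  return next;
}
```

Idempotent application matters because many change streams deliver events at least once rather than exactly once.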
4. Unconstrained LLM Graph Query Generation
Mistake: Allowing the LLM to generate arbitrary Cypher/Gremlin queries without validation.
Explanation: This poses security risks (query injection) and performance risks (full graph scans, cartesian products).
Best Practice: Use parameterized queries. Implement a query validator that checks query complexity and restricts dangerous operations. Use a "Query Router" LLM that outputs a structured plan, which is then executed by a deterministic query builder.
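The router-plus-builder pattern can be sketched as follows. The `QueryPlan` shape and `buildCypher` function are hypothetical; the allowlists reuse the entity types and predicates from the ontology example, and the hop limit of 3 is an assumed cap.

```typescript
// Hypothetical structured plan emitted by the router model.
interface QueryPlan {
  startLabel: string;
  startName: string;
  relationship: string;
  maxHops: number;
}

const ALLOWED_LABELS = ["Person", "Company", "Project", "Regulation"];
const ALLOWED_RELATIONSHIPS = ["WORKS_FOR", "OWNS", "COMPLIES_WITH", "PART_OF"];
const MAX_HOPS = 3; // assumed cap on traversal depth

// Labels and relationship types are interpolated into the query text,
// so they are checked against allowlists first.
// User-supplied values (the entity name) always travel as parameters.
function buildCypher(plan: QueryPlan): { cypher: string; params: { name: string } } {
  if (ALLOWED_LABELS.indexOf(plan.startLabel) === -1) {
    throw new Error(`Label not allowed: ${plan.startLabel}`);
  }
  if (ALLOWED_RELATIONSHIPS.indexOf(plan.relationship) === -1) {
    throw new Error(`Relationship not allowed: ${plan.relationship}`);
  }
  if (!Number.isInteger(plan.maxHops) || plan.maxHops < 1 || plan.maxHops > MAX_HOPS) {
    throw new Error(`maxHops must be an integer between 1 and ${MAX_HOPS}`);
  }
  const cypher =
    `MATCH (n:${plan.startLabel} {name: $name})` +
    `-[:${plan.relationship}*1..${plan.maxHops}]->(m) ` +
    `RETURN n, m LIMIT 50`;
  return { cypher, params: { name: plan.startName } };
}
```

Because the LLM only fills in a constrained plan, a malformed or malicious plan fails validation before any query text is produced.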
5. Unverified LLM Extraction
Mistake: Assuming LLM extraction is 100% accurate.
Explanation: LLMs can hallucinate relationships that do not exist in the source text. These false triples propagate through the graph.
Best Practice: Implement a verification step. Cross-reference extracted triples against the source text snippet. Use a smaller, faster model to validate triples generated by the larger model. Set confidence thresholds and quarantine low-confidence triples for human review.
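A cheap first pass at this verification step is a literal grounding check combined with a confidence threshold. The sketch below is an assumption-laden illustration: `CandidateTriple`, `isGrounded`, and `triage` are hypothetical names, and the 0.85 default simply mirrors the threshold used in the extraction example.

```typescript
interface CandidateTriple {
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
}

interface Triage {
  accepted: CandidateTriple[];
  quarantined: CandidateTriple[];
}

// Grounding check: both entity names must literally appear in the source chunk.
// This catches invented entities, not invented relationships between real ones.
function isGrounded(t: CandidateTriple, sourceText: string): boolean {
  const haystack = sourceText.toLowerCase();
  return (
    haystack.indexOf(t.subject.toLowerCase()) !== -1 &&
    haystack.indexOf(t.object.toLowerCase()) !== -1
  );
}

// Accept grounded, high-confidence triples; quarantine the rest for review.
function triage(triples: CandidateTriple[], sourceText: string, threshold = 0.85): Triage {
  const result: Triage = { accepted: [], quarantined: [] };
  triples.forEach((t) => {
    const ok = isGrounded(t, sourceText) && t.confidence > threshold;
    (ok ? result.accepted : result.quarantined).push(t);
  });
  return result;
}
```

Literal matching misses aliases and paraphrases, so a second-model check would still be needed for the triples this pass accepts.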
6. Ignoring Temporal Context
Mistake: Storing relationships without timestamps.
Explanation: Relationships change over time. "Company A acquired Company B" is true only after a specific date. Without temporal data, the graph provides incorrect answers for historical queries.
Best Practice: Attach valid-from and valid-to properties to relationships. Use temporal graph databases or implement time-aware querying patterns.
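A time-aware query pattern can be sketched as a filter over edges that carry a validity interval. The `TemporalEdge` shape and `edgesValidAt` helper are assumptions for illustration; the property names follow the valid-from and valid-to convention described above.

```typescript
// Hypothetical edge shape with a validity interval; validTo is null while the fact is current.
interface TemporalEdge {
  source: string;
  type: string;
  target: string;
  validFrom: string; // ISO 8601 date, e.g. "2021-03-01"
  validTo: string | null;
}

// Keep edges whose half-open interval [validFrom, validTo) contains the given date.
// ISO 8601 dates in the same YYYY-MM-DD format compare correctly as plain strings.
function edgesValidAt(edges: TemporalEdge[], asOf: string): TemporalEdge[] {
  return edges.filter(
    (e) => e.validFrom <= asOf && (e.validTo === null || asOf < e.validTo)
  );
}
```

The half-open interval means that when one employer relationship ends and the next begins on the same date, only the new one matches that date.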
7. Context Window Bloat
Mistake: Injecting the entire subgraph into the LLM context.
Explanation: Large subgraphs can exceed context limits or overwhelm the LLM with irrelevant details, degrading generation quality.
Best Practice: Implement subgraph pruning. Only retrieve k-hop neighborhoods relevant to the query entities. Summarize large communities before injection. Use graph algorithms to identify the most central or relevant nodes.
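The k-hop pruning step can be sketched as a bounded breadth-first expansion from the query entities. This is an in-memory illustration with hypothetical names (`Edge`, `kHopNeighborhood`); in practice the same bound would usually be expressed in the graph query itself.

```typescript
interface Edge {
  source: string;
  target: string;
}

// Breadth-first expansion from the seed entities, treating edges as undirected.
// Stops after maxHops levels or once maxNodes nodes have been collected.
function kHopNeighborhood(
  edges: Edge[],
  seeds: string[],
  maxHops: number,
  maxNodes: number
): string[] {
  // Build an adjacency list
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a);
    if (list) list.push(b);
    else neighbors.set(a, [b]);
  };
  edges.forEach((e) => {
    link(e.source, e.target);
    link(e.target, e.source);
  });

  const seen = new Set<string>();
  const visited: string[] = [];
  let frontier: string[] = [];
  const visit = (node: string, queue: string[]): void => {
    if (!seen.has(node) && visited.length < maxNodes) {
      seen.add(node);
      visited.push(node);
      queue.push(node);
    }
  };

  seeds.forEach((s) => visit(s, frontier));
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    frontier.forEach((node) => {
      (neighbors.get(node) ?? []).forEach((n) => visit(n, next));
    });
    frontier = next;
  }
  return visited; // nodes in order of discovery, nearest first
}
```

Because nodes are returned nearest first, the node cap drops the most distant neighbors before closer ones.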
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple FAQ, low complexity | Vector RAG | KG overhead not justified; semantic search sufficient. | Low |
| Multi-hop reasoning required | KG-Augmented RAG | Graph traversal enables relationship reasoning. | Medium |
| Global summarization needed | GraphRAG | Community detection provides holistic insights. | High |
| Real-time updates critical | Event-Driven KG + Vector | CDC ensures freshness; hybrid retrieval balances speed/accuracy. | Medium |
| Strict compliance/audit | KG-First with LLM | Deterministic graph queries provide audit trails. | High |
Configuration Template
# kg-llm-config.yaml
graph:
  type: neo4j
  uri: ${NEO4J_URI}
  credentials:
    user: ${NEO4J_USER}
    password: ${NEO4J_PASS}
  query_limit: 50
  max_hops: 3
extraction:
  model: gpt-4o
  temperature: 0.1
  confidence_threshold: 0.85
  validation:
    enabled: true
    schema_version: "v1.2"
retrieval:
  strategy: hybrid
  vector_db:
    type: pinecone
    index: ${VECTOR_INDEX}
    top_k: 5
  graph:
    enabled: true
    pruning: true
    max_nodes: 100
orchestration:
  router_model: llama-3-8b
  fallback: vector_only
  cache:
    enabled: true
    ttl: 3600
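YAML parsers generally do not expand `${...}` placeholders on their own, so the template needs a substitution pass before parsing. The following is a minimal sketch; the `expandEnv` helper is hypothetical, and the environment is passed in as a plain object so the function stays testable.

```typescript
// Replace ${VAR} placeholders with values from an environment map.
// Throws on a missing variable rather than leaving a placeholder in place.
function expandEnv(
  template: string,
  env: Record<string, string | undefined>
): string {
  return template.replace(/\$\{([A-Z0-9_]+)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
  });
}
```

In a Node.js process, `process.env` can be passed as the second argument. Failing fast on a missing variable avoids connecting with a literal placeholder as the URI or password.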
Quick Start Guide
- Initialize Graph Database:
Deploy a Neo4j instance (local or cloud). Create constraints and indexes for entity names.
CREATE CONSTRAINT entity_id IF NOT EXISTS FOR (n:Entity) REQUIRE n.id IS UNIQUE;
- Run Extraction on Sample Data:
Use the provided TypeScript extraction function on a small dataset. Verify triples are stored correctly.
ts-node src/ingest.ts --input ./data/sample.json --ontology ./config/ontology.yaml
- Test Hybrid Query:
Execute a test query that requires relationship traversal.
ts-node src/query.ts --query "Find all projects associated with suppliers in Region X" --strategy hybrid
- Integrate into RAG Chain:
Wrap the retrieval function in your LangChain/LlamaIndex pipeline. Configure the prompt to utilize structured data.
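// LangChain JS: createRetrievalChain takes a retriever and a combineDocsChain
const combineDocsChain = await createStuffDocumentsChain({
  llm: generationModel,
  prompt: graphEnhancedPrompt // must include {context} and {input}
});
const chain = await createRetrievalChain({
  retriever: hybridRetriever,
  combineDocsChain
});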
- Monitor and Iterate:
Review extraction logs for low-confidence triples. Adjust ontology and thresholds based on initial results. Deploy monitoring dashboards for latency and accuracy.