ultiple thereof. Misalignment forces partial segment reads, negating the benefits of caching. The optimal granularity balances cache hit probability against memory overhead.
2. Partial Recomputation: Only the "hot" segment at the trailing edge of the window is recomputed. All "cold" segments are fetched from the cache. This minimizes scan volume and CPU usage.
3. Aggregation Decomposability: The strategy works best with aggregations that can be merged from partial results (e.g., SUM, COUNT, AVG, MAX, MIN). Non-decomposable aggregations like APPROX_COUNT_DISTINCT require careful handling, as merging partial distinct counts is non-trivial.
Implementation Example
The following TypeScript example demonstrates how to construct interval-aware query segments and manage cache keys. This logic can be integrated into a query builder or middleware layer that interacts with Druid.
interface TimeSegment {
id: string;
start: Date;
end: Date;
isCacheable: boolean;
}
interface QueryWindow {
startTime: Date;
endTime: Date;
granularity: number; // milliseconds
}
/**
* Decomposes a rolling window into cacheable segments.
* Segments older than the current time are marked cacheable.
* The segment containing the current time is marked for recomputation.
*/
export function decomposeWindow(window: QueryWindow): TimeSegment[] {
const segments: TimeSegment[] = [];
let currentStart = new Date(window.startTime);
while (currentStart < window.endTime) {
const currentEnd = new Date(currentStart.getTime() + window.granularity);
// Determine if this segment is fully in the past
// In production, use a strict cutoff relative to ingestion latency
const isPast = currentEnd.getTime() < Date.now() - INGESTION_LAG_MS;
segments.push({
id: generateSegmentId(currentStart, window.granularity),
start: currentStart,
end: currentEnd,
isCacheable: isPast
});
currentStart = currentEnd;
}
return segments;
}
/**
* Generates a deterministic cache key for a segment.
* Includes aggregation type to prevent key collisions.
*/
function generateSegmentId(segmentStart: Date, granularityMs: number): string {
const timestamp = segmentStart.getTime();
return `druid_cache_seg_${timestamp}_${granularityMs}`;
}
/**
* Constructs the Druid query context with interval chunking enabled.
*/
export function buildDruidQuery(segments: TimeSegment[], aggregations: any[]): any {
const cacheableIntervals = segments
.filter(s => s.isCacheable)
.map(s => ({ start: s.start.toISOString(), end: s.end.toISOString() }));
return {
queryType: "timeseries",
dataSource: "metrics_stream",
intervals: segments.map(s => ({ start: s.start.toISOString(), end: s.end.toISOString() })),
aggregations: aggregations,
context: {
// Enable interval-aware caching in Druid
useIntervalChunking: true,
// Pass cacheable intervals to optimize cache lookup
cacheableIntervals: cacheableIntervals,
// Set appropriate TTL based on segment staleness
cacheTtl: 3600
}
};
}
Rationale:
decomposeWindow: Separates the query into discrete units. This allows the cache layer to store results for [T-24h, T-23h] and reuse them when the window shifts to [T-23h, T-22h].
isCacheable Flag: Prevents caching of segments that may still receive late-arriving data. This is critical for data integrity.
useIntervalChunking: This Druid context parameter activates the interval-aware caching logic, instructing the engine to evaluate cache hits at the segment level rather than the full query level.
Pitfall Guide
Implementing interval-aware caching requires careful configuration to avoid subtle failures. The following pitfalls are common in production environments.
-
Granularity Drift
- Explanation: Cache segments do not align with Druid's ingestion segments. This causes the engine to read partial data segments, increasing I/O and reducing cache efficiency.
- Fix: Align cache segment boundaries with the ingestion granularity (e.g., hourly segments for hourly ingestion). Validate alignment using Druid's segment metadata APIs.
-
Late-Arriving Data Corruption
- Explanation: A segment is cached as "complete," but data arrives after the cutoff time, leading to stale results in the cache.
- Fix: Implement a safety buffer (
INGESTION_LAG_MS) before marking segments as cacheable. Monitor late-arrival rates and adjust the buffer dynamically if necessary.
-
Memory Pressure from Fragmentation
- Explanation: Using overly small segments creates a high number of cache entries, increasing memory overhead and eviction churn.
- Fix: Choose a segment size that balances hit rate with entry count. Monitor cache memory usage and adjust granularity. Larger segments reduce entry count but may lower hit rates for short windows.
-
Non-Decomposable Aggregations
- Explanation: Queries using
APPROX_COUNT_DISTINCT or custom JavaScript aggregations may not benefit from interval chunking, as partial results cannot be merged accurately.
- Fix: Profile query patterns. Disable interval chunking for queries containing non-decomposable aggregations. Use HyperLogLog or ThetaSketch for distinct counts to enable partial merging.
-
Timezone and DST Inconsistencies
- Explanation: Segment boundaries calculated in local time can shift during Daylight Saving Time transitions, causing cache misses or duplicate computations.
- Fix: Always use UTC for segment calculations and cache keys. Convert to local time only at the presentation layer.
-
Cache Invalidation Storms
- Explanation: A bulk data reload or schema change invalidates a large portion of the cache simultaneously, causing a spike in query load.
- Fix: Implement staggered invalidation or cache warming strategies. Use versioned cache keys to allow gradual migration during schema changes.
-
Ignoring Query Complexity
- Explanation: Applying interval caching to queries with high cardinality groupings or complex filters may yield low hit rates, as the cache key becomes too specific.
- Fix: Analyze query patterns. Interval caching is most effective for time-series aggregations with consistent groupings. For ad-hoc queries, consider alternative optimization strategies.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-frequency rolling dashboards | Interval-Aware Caching | Maximizes cache reuse for sliding windows; reduces redundant scans. | Lowers compute costs; improves user experience. |
| Ad-hoc exploratory queries | Standard Caching or No Cache | Low repetition rate makes interval chunking ineffective. | No additional overhead; avoids cache pollution. |
| Queries with non-decomposable aggregations | Disable Interval Chunking | Partial results cannot be merged accurately. | Prevents incorrect results; maintains data integrity. |
| High late-arrival rates | Increase Safety Buffer | Reduces risk of caching incomplete data. | Slight increase in recomputation; ensures accuracy. |
| Memory-constrained clusters | Larger Segment Granularity | Reduces number of cache entries and memory overhead. | May lower hit rate; trade-off based on resource limits. |
Configuration Template
The following JSON configuration snippet demonstrates how to enable interval-aware caching in Apache Druid. This can be applied via query context or broker configuration.
{
"druid": {
"server": {
"http": {
"defaultQueryContext": {
"useIntervalChunking": true,
"cacheTtl": 3600,
"cacheableIntervals": "auto"
}
}
},
"cache": {
"type": "caffeine",
"useIntervalChunking": true,
"maxSize": "1GB",
"ttl": "1h"
}
}
}
Notes:
useIntervalChunking: Enables the interval-aware logic.
cacheableIntervals: Set to auto to let Druid determine cacheable intervals based on time, or provide explicit intervals.
cacheTtl: Time-to-live for cache entries. Align with segment staleness policies.
maxSize: Adjust based on available memory and query patterns.
Quick Start Guide
- Enable Interval Chunking: Add
"useIntervalChunking": true to the query context of your rolling window queries.
- Verify Segment Alignment: Check Druid segment metadata to ensure your cache granularity matches ingestion intervals.
- Monitor Cache Metrics: Use Druid's metrics endpoints to observe
query/cache/hit and query/cache/miss ratios.
- Adjust Safety Buffer: If late arrivals are detected, increase the lag buffer in your query builder logic.
- Validate Results: Compare query results with and without interval chunking to ensure accuracy.
By implementing interval-aware caching, organizations can significantly reduce query load and improve latency for real-time analytics workloads in Apache Druid. This strategy leverages temporal overlap to maximize cache efficiency, enabling scalable and responsive dashboards without proportional infrastructure expansion.