ing Notion integrations against the 10,000-result ceiling requires three architectural shifts: explicit metadata validation, deterministic partitioning, and baseline cross-validation. The following implementation demonstrates a production-ready approach using TypeScript.
Step 1: Define Strict Response Interfaces
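First, establish a type contract that models request_status explicitly, so the pagination loop treats it as an integrity signal rather than an extension it can ignore. The field is declared optional because the loop must still tolerate responses that omit it.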
```typescript
// Envelope shared by paginated list responses, extended with the
// request_status metadata that signals truncation.
interface NotionPaginationEnvelope<T> {
  object: 'list';
  results: T[];
  next_cursor: string | null;
  has_more: boolean;
  request_status?: {
    type: 'complete' | 'incomplete';
    incomplete_reason?: string;
  };
}

// Raised when a query stops at the result ceiling instead of the true end.
class TruncationError extends Error {
  constructor(receivedCount: number, reason: string) {
    super(`Notion query truncated: ${reason}. Received ${receivedCount} records; exceeds 10k pagination ceiling.`);
    this.name = 'TruncationError';
  }
}
```
Step 2: Build an Integrity-Checked Pagination Loop
Replace cursor-only termination with explicit status inspection. The loop must halt and raise an exception when request_status.type === 'incomplete'.
```typescript
import { Client } from '@notionhq/client';
import type { QueryDatabaseResponse } from '@notionhq/client';

const notion = new Client({ auth: process.env.NOTION_INTEGRATION_TOKEN });

// The SDK types `filter` as a strict union. Derive that type from the client
// so the loose Record accepted below can be cast at the call site.
type QueryFilter = Parameters<typeof notion.databases.query>[0]['filter'];

async function fetchDatasetWithIntegrityCheck(
  databaseId: string,
  filter?: Record<string, unknown>,
  pageSize = 100
): Promise<QueryDatabaseResponse['results']> {
  const accumulated: QueryDatabaseResponse['results'] = [];
  let cursor: string | undefined;

  while (true) {
    const response = (await notion.databases.query({
      database_id: databaseId,
      filter: filter as QueryFilter,
      start_cursor: cursor,
      page_size: pageSize,
    })) as NotionPaginationEnvelope<QueryDatabaseResponse['results'][number]>;

    accumulated.push(...response.results);

    // Integrity checkpoint: reject truncated payloads immediately
    if (response.request_status?.type === 'incomplete') {
      throw new TruncationError(
        accumulated.length,
        response.request_status.incomplete_reason ?? 'unknown'
      );
    }

    // Stop on the last page. Also stop if has_more is set without a cursor,
    // which would otherwise restart the query from the first page.
    if (!response.has_more || response.next_cursor === null) break;
    cursor = response.next_cursor;
  }

  return accumulated;
}
```
Step 3: Implement Deterministic Partitioning
When a dataset legitimately exceeds 10,000 records, partitioning is mandatory. The API limit applies per query, not per database. Partition on properties whose values split the rows into non-overlapping segments that together cover the whole database and are roughly even in size.
```typescript
type PartitionStrategy = 'date_range' | 'status_bucket' | 'alphabetical_slice';

interface PartitionConfig {
  strategy: PartitionStrategy;
  property: string;
  segments: Array<{ filter: Record<string, unknown>; label: string }>;
}

async function fetchPartitionedDataset(
  databaseId: string,
  config: PartitionConfig
): Promise<QueryDatabaseResponse['results']> {
  const masterCollection: QueryDatabaseResponse['results'] = [];

  // Segments are fetched sequentially so a failure names exactly one segment.
  for (const segment of config.segments) {
    try {
      const segmentData = await fetchDatasetWithIntegrityCheck(
        databaseId,
        segment.filter,
        100
      );
      masterCollection.push(...segmentData);
    } catch (err) {
      if (err instanceof TruncationError) {
        // Sub-partition further or alert operations
        console.error(`Partition "${segment.label}" exceeded ceiling. Refine filter or split segment.`);
      }
      // Every error is rethrown; truncation is only logged with its segment first.
      throw err;
    }
  }

  return masterCollection;
}
```
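One way to produce the segments for a date_range strategy is to cut a time span into half-open windows, so that no row can fall into two segments. The sketch below is illustrative only: buildDateRangeSegments, the Segment shape, and the window size are hypothetical, and the filter objects assume Notion's timestamp-filter syntax (created_time with on_or_after / before conditions). The chosen start and end must cover every row in the database, otherwise rows outside the span are silently skipped.

```typescript
// Hypothetical helper: splits [start, end) into half-open created_time windows.
interface Segment {
  label: string;
  filter: Record<string, unknown>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function buildDateRangeSegments(start: Date, end: Date, windowDays: number): Segment[] {
  if (windowDays <= 0) throw new RangeError('windowDays must be positive');

  const segments: Segment[] = [];
  const stop = end.getTime();
  let from = start.getTime();

  while (from < stop) {
    // Clamp the final window so it never extends past `end`.
    const to = Math.min(from + windowDays * DAY_MS, stop);
    const fromIso = new Date(from).toISOString();
    const toIso = new Date(to).toISOString();

    segments.push({
      label: `${fromIso}..${toIso}`,
      // Assumed filter shape: inclusive lower bound, exclusive upper bound.
      filter: {
        and: [
          { timestamp: 'created_time', created_time: { on_or_after: fromIso } },
          { timestamp: 'created_time', created_time: { before: toIso } },
        ],
      },
    });
    from = to; // The next window starts exactly where this one ended.
  }
  return segments;
}

// Example: a 30-day span cut into 10-day windows yields three segments.
const januarySegments = buildDateRangeSegments(
  new Date('2024-01-01T00:00:00Z'),
  new Date('2024-01-31T00:00:00Z'),
  10
);
console.log(januarySegments.map((s) => s.label));
```

Because each window's exclusive upper bound is the next window's inclusive lower bound, the segments neither overlap nor leave gaps. If a window still truncates, halving windowDays for that span is the natural refinement.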
Architecture Decisions & Rationale
- Explicit Failure Over Silent Degradation: Throwing TruncationError forces observability. Sync jobs should fail loudly rather than write incomplete data. This aligns with fail-fast principles in data engineering.
- Partitioning Over Page Size Tuning: Increasing page_size does not bypass the 10,000-result ceiling. Partitioning by date ranges, status values, or alphabetical boundaries distributes the load across multiple independent queries, each staying under the limit.
- Type-Safe Metadata Enforcement: Casting the response to NotionPaginationEnvelope exposes request_status to the type checker, so the integrity check compiles even when SDK types lag behind API updates. The cast validates nothing at runtime, which is why the loop still inspects the field explicitly on every page.
- Segment-Level Error Isolation: Wrapping partition fetches in try/catch allows granular failure reporting. If one segment truncates, you know exactly which filter boundary needs refinement rather than debugging a monolithic sync failure.
Pitfall Guide
1. The Cursor Exhaustion Fallacy
Explanation: Assuming has_more === false guarantees complete data retrieval. The API now uses this flag to signal both natural end-of-results and hard-cap termination.
Fix: Always inspect request_status.type before treating a loop termination as successful completion.
2. Trusting Auto-Pagination Helpers
Explanation: Helper methods like iteratePaginatedAPI abstract response inspection. They follow the legacy contract and silently stop at the ceiling.
Fix: Replace auto-pagination helpers with custom loops that explicitly validate metadata, or wrap the helper in a middleware that checks the final response envelope.
3. Partition Size Miscalculation
Explanation: Creating date or status partitions that still contain >10,000 matching rows. The ceiling applies to each individual query, not the aggregate.
Fix: Estimate row distribution using Notion UI counts or secondary metadata APIs. Cap each partition at ~8,000 records to maintain a safety buffer.
4. Treating Truncation as Transient
Explanation: Retrying a truncated query with the same backoff logic used for 429 or 500 responses. Truncation is a deterministic limit, not a network or rate-limit issue, so the retry returns the same truncated result.
Fix: Implement a circuit breaker that switches to partitioning mode when TruncationError is caught. Never retry a truncated query without filter refinement.
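The distinction between transient and deterministic failures can be sketched as a small classifier that a retry wrapper consults. Everything named here is hypothetical (classifyFailure, RetryDecision, withRetry), TruncationError is redeclared in simplified form only to keep the snippet self-contained, and reading a numeric status property off the error is an assumption about how the HTTP client reports failures.

```typescript
// Simplified stand-in for the TruncationError defined earlier in the article.
class TruncationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TruncationError';
  }
}

type RetryDecision = 'retry' | 'repartition' | 'fail';

// Decide what to do with a failed query.
function classifyFailure(err: unknown): RetryDecision {
  // Truncation is deterministic: the same query would truncate again.
  if (err instanceof TruncationError) return 'repartition';

  // Assumption: transport errors carry a numeric HTTP `status` property.
  const status =
    typeof err === 'object' && err !== null && 'status' in err
      ? (err as { status?: unknown }).status
      : undefined;

  if (typeof status === 'number' && (status === 429 || status >= 500)) return 'retry';
  return 'fail';
}

// Retries only failures classified as transient, with exponential backoff.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  backoffMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Truncation and other non-transient errors propagate immediately.
      if (classifyFailure(err) !== 'retry' || attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** (attempt - 1)));
    }
  }
}
```

A caller that catches the rethrown TruncationError can then switch to a partitioned fetch instead of issuing the same query again.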
5. Relying on Schema Validation Alone
Explanation: Schema validators that permit unknown fields will pass truncated responses as valid. Strict validators may reject them but won't explain the data loss.
Fix: Explicitly assert request_status in validation pipelines. Treat type: "incomplete" as a terminal state, not an optional extension.
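A hand-rolled runtime guard shows what such an assertion might look like without depending on a validation library. This is a sketch under stated assumptions: readRequestStatus and assertComplete are hypothetical names, and a response that omits request_status is treated as not truncated, matching the optional-chaining check used in the pagination loop.

```typescript
interface RequestStatus {
  type: 'complete' | 'incomplete';
  incomplete_reason?: string;
}

// Extracts and validates request_status from an untyped payload.
// Returns undefined when the field is absent (assumed to mean "not truncated").
function readRequestStatus(payload: unknown): RequestStatus | undefined {
  if (typeof payload !== 'object' || payload === null) {
    throw new TypeError('Response payload is not an object');
  }
  const status = (payload as { request_status?: unknown }).request_status;
  if (status === undefined) return undefined;
  if (typeof status !== 'object' || status === null) {
    throw new TypeError('request_status is malformed');
  }

  const { type, incomplete_reason } = status as Record<string, unknown>;
  if (type === 'complete' || type === 'incomplete') {
    const result: RequestStatus = { type };
    if (typeof incomplete_reason === 'string') result.incomplete_reason = incomplete_reason;
    return result;
  }
  // An unrecognized type is surfaced rather than silently accepted.
  throw new TypeError(`Unexpected request_status.type: ${String(type)}`);
}

// Terminal check: an incomplete response stops the pipeline.
function assertComplete(payload: unknown): void {
  const status = readRequestStatus(payload);
  if (status?.type === 'incomplete') {
    throw new Error(`Truncated response: ${status.incomplete_reason ?? 'unknown'}`);
  }
}
```

The same two rules (reject unknown type values, treat incomplete as terminal) carry over directly if the pipeline uses a schema library instead.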
6. Missing Baseline Validation
Explanation: Sync jobs report "N records synced" without verifying against a source-of-truth count. 10,000 looks correct until a user notices missing entries.
Fix: Implement a pre-sync baseline check. Obtain the expected row count from a source independent of the sync itself, such as a count read from the Notion UI or a cached total from the last verified run. Alert if synced_count < expected_count * 0.95.
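The threshold comparison itself is a few lines of arithmetic. In this minimal sketch, checkBaseline and BaselineResult are hypothetical names, and the expected count is assumed to come from whatever baseline source the pipeline trusts.

```typescript
interface BaselineResult {
  ok: boolean;
  divergencePct: number; // shortfall of synced rows relative to the baseline
}

// Flags a sync whose row count falls short of the baseline by more than
// alertThresholdPct percent (5 corresponds to synced < expected * 0.95).
function checkBaseline(
  syncedCount: number,
  expectedCount: number,
  alertThresholdPct = 5
): BaselineResult {
  if (syncedCount < 0 || expectedCount < 0) {
    throw new RangeError('Counts must be non-negative');
  }
  // Nothing expected means nothing can be missing.
  if (expectedCount === 0) return { ok: true, divergencePct: 0 };

  // Only a shortfall counts as divergence; extra rows are not data loss.
  const shortfall = Math.max(0, expectedCount - syncedCount);
  const divergencePct = (shortfall * 100) / expectedCount;
  return { ok: divergencePct <= alertThresholdPct, divergencePct };
}

// A sync that stopped at the ceiling against a 12,500-row baseline.
console.log(checkBaseline(10_000, 12_500));
```

A result with ok set to false is the signal to alert and to withhold the "sync succeeded" status.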
7. Hardcoded Full-Database Syncs
Explanation: Running daily full exports regardless of dataset growth. As databases cross 10k, syncs silently truncate without triggering alerts.
Fix: Transition to incremental syncs using last_edited_time filters. Only fetch rows modified since the last successful checkpoint. This keeps each query under the ceiling as long as fewer than 10,000 rows change between checkpoints, and it reduces compute overhead.
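An incremental query reduces to building one timestamp filter from the stored checkpoint and merging the results by row id. The helper names below (buildIncrementalFilter, mergeById) are hypothetical, the filter shape assumes Notion's last_edited_time timestamp filter, and the one-minute overlap is an added safety margin rather than something the article prescribes; the merge step makes the resulting duplicates harmless.

```typescript
interface LastEditedFilter {
  timestamp: 'last_edited_time';
  last_edited_time: { on_or_after: string };
}

// Builds a filter for rows edited since the checkpoint, minus a small overlap
// so edits that landed right around the checkpoint are not missed.
function buildIncrementalFilter(lastCheckpoint: Date, overlapMs = 60_000): LastEditedFilter {
  const since = new Date(lastCheckpoint.getTime() - overlapMs).toISOString();
  return {
    timestamp: 'last_edited_time',
    last_edited_time: { on_or_after: since },
  };
}

// Upserts a batch into the local store keyed by row id, so rows fetched twice
// because of the overlap window overwrite themselves instead of duplicating.
function mergeById<T extends { id: string }>(store: Map<string, T>, batch: T[]): Map<string, T> {
  for (const row of batch) store.set(row.id, row);
  return store;
}

const filter = buildIncrementalFilter(new Date('2024-06-01T12:00:00Z'));
console.log(filter.last_edited_time.on_or_after);
```

The checkpoint should only advance after a run completes without a truncation error, so a failed run is re-covered by the next one.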
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Database < 8,000 rows | Standard integrity-checked loop | Stays safely under ceiling; partitioning adds unnecessary complexity | Low (minimal compute) |
| Database 8,000–25,000 rows | Status/date partitioning | Distributes load across independent queries; each stays under 10k | Medium (increased API calls) |
| Database > 25,000 rows | Incremental sync with last_edited_time | Avoids full scans; only fetches deltas; naturally respects limits | Low-Medium (efficient bandwidth) |
| Real-time dashboard | Event-driven webhooks + partial sync | Pushes updates instead of polling; reduces sync frequency | High (initial webhook setup) |
| One-time migration | Partitioned export with parallel workers | Maximizes throughput while respecting per-query limits | Medium (worker infrastructure) |
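The three size-based rows of the matrix can be encoded as a small selector, which is useful when a pipeline should pick its mode from an estimated row count. This is only a sketch: chooseStrategy and SyncStrategy are hypothetical names, the thresholds are copied from the table, and the dashboard and migration rows are left out because they depend on the use case rather than on size.

```typescript
type SyncStrategy = 'integrity_checked_loop' | 'partitioned' | 'incremental';

// Maps an estimated row count to the approach recommended in the matrix.
function chooseStrategy(estimatedRows: number): SyncStrategy {
  if (!Number.isFinite(estimatedRows) || estimatedRows < 0) {
    throw new RangeError('estimatedRows must be a non-negative number');
  }
  if (estimatedRows < 8_000) return 'integrity_checked_loop'; // safely under the ceiling
  if (estimatedRows <= 25_000) return 'partitioned'; // split by status or date
  return 'incremental'; // fetch only deltas via last_edited_time
}

console.log(chooseStrategy(12_000));
```

Whatever the selector returns, the integrity check on request_status still applies to every individual query.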
Configuration Template
```typescript
// sync.config.ts
export interface SyncPipelineConfig {
  databaseId: string;
  partitionStrategy: 'none' | 'status' | 'date_range' | 'incremental';
  partitionProperty?: string;
  maxPartitionSize: number;
  pageSize: number;
  retryPolicy: {
    maxAttempts: number;
    backoffMs: number;
    retryOnTruncation: boolean;
  };
  telemetry: {
    enableBaselineCheck: boolean;
    alertThreshold: number; // percentage divergence
    metricNamespace: string;
  };
}

export const defaultConfig: SyncPipelineConfig = {
  databaseId: process.env.NOTION_DATABASE_ID!,
  partitionStrategy: 'date_range',
  partitionProperty: 'created_time',
  maxPartitionSize: 8000,
  pageSize: 100,
  retryPolicy: {
    maxAttempts: 3,
    backoffMs: 1000,
    retryOnTruncation: false, // truncation requires filter refinement, not retry
  },
  telemetry: {
    enableBaselineCheck: true,
    alertThreshold: 5,
    metricNamespace: 'notion.sync.integrity',
  },
};
```
Quick Start Guide
- Install dependencies: npm install @notionhq/client zod (Zod recommended for runtime validation)
- Replace existing pagination loops: Swap while (res.has_more) with the integrity-checked pattern that validates request_status.type
- Add partition configuration: Define PartitionConfig segments based on your database's high-cardinality properties (status, date, category)
- Deploy baseline validation: Add a pre-sync check that compares accumulated.length against Notion's reported row count or a cached baseline
- Enable alerting: Configure your monitoring system to trigger on TruncationError or synced_count < expected_count * 0.95
This approach transforms a silent data loss vector into a deterministic, observable pipeline. By treating API metadata as authoritative rather than optional, you eliminate the gap between structural validity and logical completeness.