Notion's API Now Caps Pagination at 10,000 Results — Your 'Fetch All Rows' Sync Is Silently Truncating

By Codcompass Team·2026-05-13·8 min read

Silent Data Truncation in Paginated APIs: Hardening Notion Integrations Against the 10k Ceiling

Current Situation Analysis

Modern data pipelines rely heavily on third-party API pagination contracts. Teams build synchronization jobs, warehouse loaders, and reporting dashboards around a predictable pattern: iterate through pages until the cursor exhausts, then mark the job complete. This assumption held true across most REST-based APIs until vendors began introducing hard result ceilings to manage compute load and prevent runaway queries.

Notion's early-2026 API update introduced a strict 10,000-result maximum pagination depth across all query and list endpoints. When a logical query crosses this threshold, the API does not return a 429 Too Many Requests or a 500 Internal Server Error. Instead, it returns a 200 OK with has_more: false and next_cursor: null, signaling loop termination. The only indicator of truncation is a newly added request_status object containing type: "incomplete" and incomplete_reason: "query_result_limit_reached".
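
To make the difference concrete, the sketch below compares the two terminal responses described above. The field names are taken from that description; `ListEnvelope` and `isTruncated` are illustrative names, not SDK exports.

```typescript
// Minimal shape of a list response, limited to the fields described above.
// `ListEnvelope` and `isTruncated` are hypothetical names used for illustration.
interface ListEnvelope {
  has_more: boolean;
  next_cursor: string | null;
  request_status?: { type: string; incomplete_reason?: string };
}

// True when the response reports that the result set was cut short.
function isTruncated(res: ListEnvelope): boolean {
  return res.request_status?.type === 'incomplete';
}

// Both payloads end a cursor loop; only the metadata tells them apart.
const naturalEnd: ListEnvelope = {
  has_more: false,
  next_cursor: null,
  request_status: { type: 'complete' },
};
const cappedEnd: ListEnvelope = {
  has_more: false,
  next_cursor: null,
  request_status: { type: 'incomplete', incomplete_reason: 'query_result_limit_reached' },
};

console.log(isTruncated(naturalEnd)); // false
console.log(isTruncated(cappedEnd)); // true
```

A loop that only reads `has_more` and `next_cursor` cannot distinguish these two payloads, which is why the rest of this article treats `request_status` as the deciding signal.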

This change creates a covert data degradation pattern. Existing integrations that rely exclusively on cursor exhaustion will terminate cleanly, log a successful sync, and write exactly 10,000 records to downstream systems. Because 10,000 is a plausible dataset size, monitoring systems rarely flag it. Schema validators pass the response because request_status is an additive field. HTTP status checks pass because the payload is structurally valid. The failure mode lives entirely in the gap between API contract evolution and consumer validation logic.

The problem is systematically overlooked because:

SDK auto-pagination helpers abstract away raw response inspection
Pagination loops are typically written once and rarely revisited
Additive metadata fields are treated as optional rather than authoritative
Data completeness is rarely validated against a source-of-truth baseline

For organizations running database-to-warehouse syncs, backup exports, or migration scripts against Notion workspaces that have accumulated years of entries, this update transforms previously reliable pipelines into silent data loss vectors.

WOW Moment: Key Findings

The shift from cursor-based termination to metadata-driven completeness verification fundamentally changes how integration reliability is measured. Below is a comparative analysis of legacy pagination handling versus metadata-aware integrity checking:

Approach	Data Completeness Rate	Failure Visibility	Downstream Corruption Risk
Legacy Cursor Loop	~99.8% (0% for rows beyond the 10k ceiling)	Silent (no exceptions thrown)	High (plausible but missing records)
Metadata-Aware Handler	100% (with partitioning) or Fails Fast	Explicit (throws/alerts on truncation)	Low (prevents silent corruption)

This finding matters because it exposes a critical blind spot in API consumer design: structural validity does not guarantee logical completeness. When vendors introduce hard limits, they shift the burden of completeness verification from the transport layer (HTTP status) to the application layer (response metadata). Teams that treat request_status as a mandatory integrity checkpoint eliminate silent truncation entirely. This enables proactive data governance, accurate sync metrics, and reliable downstream analytics without requiring manual audits or user-reported discrepancies.

Core Solution

Hardening Notion integrations against the 10,000-result ceiling requires three architectural shifts: explicit metadata validation, deterministic partitioning, and baseline cross-validation. The following implementation demonstrates a production-ready approach using TypeScript.

Step 1: Define Strict Response Interfaces

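First, establish a type contract that surfaces request_status as an integrity field rather than an ignorable extension. The field is declared optional because responses and SDK types that predate the change may omit it, but consumers must inspect it whenever it is present.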

interface NotionPaginationEnvelope<T> {
  object: 'list';
  results: T[];
  next_cursor: string | null;
  has_more: boolean;
  request_status?: {
    type: 'complete' | 'incomplete';
    incomplete_reason?: string;
  };
}

class TruncationError extends Error {
  constructor(receivedCount: number, reason: string) {
    super(`Notion query truncated: ${reason}. Received ${receivedCount} records; exceeds 10k pagination ceiling.`);
    this.name = 'TruncationError';
  }
}

Step 2: Build an Integrity-Checked Pagination Loop

Replace cursor-only termination with explicit status inspection. The loop must halt and raise an exception when request_status.type === 'incomplete'.

import { Client } from '@notionhq/client';
import type { QueryDatabaseResponse } from '@notionhq/client';

const notion = new Client({ auth: process.env.NOTION_INTEGRATION_TOKEN });

async function fetchDatasetWithIntegrityCheck(
  databaseId: string,
  filter?: Record<string, unknown>,
  pageSize = 100
): Promise<QueryDatabaseResponse['results']> {
  const accumulated: QueryDatabaseResponse['results'] = [];
  let cursor: string | undefined;

  while (true) {
    const response = await notion.databases.query({
      database_id: databaseId,
      filter,
      start_cursor: cursor,
      page_size: pageSize,
    }) as NotionPaginationEnvelope<QueryDatabaseResponse['results'][number]>;

    accumulated.push(...response.results);

    // Integrity checkpoint: reject truncated payloads immediately
    if (response.request_status?.type === 'incomplete') {
      throw new TruncationError(
        accumulated.length,
        response.request_status.incomplete_reason ?? 'unknown'
      );
    }

    if (!response.has_more) break;
    cursor = response.next_cursor ?? undefined;
  }

  return accumulated;
}
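
One way to exercise this control flow without calling Notion is to inject the page fetcher and drive it with an in-memory fake. The sketch below assumes that approach; `Page`, `FetchPage`, `IncompleteResultError`, `collectAll`, and `fakeEndpoint` are all hypothetical names, and the fake only imitates the ceiling behavior as described in this article.

```typescript
// Illustrative names throughout; nothing here is part of @notionhq/client.
interface Page<T> {
  results: T[];
  has_more: boolean;
  next_cursor: string | null;
  request_status?: { type: 'complete' | 'incomplete'; incomplete_reason?: string };
}

type FetchPage<T> = (cursor: string | undefined) => Promise<Page<T>>;

class IncompleteResultError extends Error {
  readonly received: number;
  constructor(received: number, reason: string) {
    super(`Result set truncated (${reason}) after ${received} records`);
    this.name = 'IncompleteResultError';
    this.received = received;
  }
}

// Same control flow as the loop above, with the transport injected.
async function collectAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const rows: T[] = [];
  let cursor: string | undefined;
  while (true) {
    const page = await fetchPage(cursor);
    rows.push(...page.results);
    if (page.request_status?.type === 'incomplete') {
      throw new IncompleteResultError(
        rows.length,
        page.request_status.incomplete_reason ?? 'unknown'
      );
    }
    if (!page.has_more) return rows;
    cursor = page.next_cursor ?? undefined;
  }
}

// In-memory fake: serves `total` numbered rows, 100 per page, and imitates the
// ceiling by ending at `cap` rows with an "incomplete" status.
function fakeEndpoint(total: number, cap = 10000): FetchPage<number> {
  return async (cursor) => {
    const start = cursor ? Number(cursor) : 0;
    const end = Math.min(start + 100, total, cap);
    const results = Array.from({ length: end - start }, (_, i) => start + i);
    if (end === cap && total > cap) {
      return {
        results,
        has_more: false,
        next_cursor: null,
        request_status: { type: 'incomplete', incomplete_reason: 'query_result_limit_reached' },
      };
    }
    const hasMore = end < total;
    return {
      results,
      has_more: hasMore,
      next_cursor: hasMore ? String(end) : null,
      request_status: { type: 'complete' },
    };
  };
}
```

With this in place, `collectAll(fakeEndpoint(250))` resolves with 250 rows, while `collectAll(fakeEndpoint(12000))` rejects with an `IncompleteResultError` after 10,000 rows instead of returning a plausible-looking partial result.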

Step 3: Implement Deterministic Partitioning

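When a dataset legitimately exceeds 10,000 records, partitioning is mandatory. The API limit applies per query, not per database. Partition on a property whose values divide the rows into reasonably even groups, each below the ceiling. Date ranges usually split more evenly than a low-cardinality field such as status, whose largest bucket can itself exceed 10,000 rows.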

type PartitionStrategy = 'date_range' | 'status_bucket' | 'alphabetical_slice';

interface PartitionConfig {
  strategy: PartitionStrategy;
  property: string;
  segments: Array<{ filter: Record<string, unknown>; label: string }>;
}

async function fetchPartitionedDataset(
  databaseId: string,
  config: PartitionConfig
): Promise<QueryDatabaseResponse['results']> {
  const masterCollection: QueryDatabaseResponse['results'] = [];

  for (const segment of config.segments) {
    try {
      const segmentData = await fetchDatasetWithIntegrityCheck(
        databaseId,
        segment.filter,
        100
      );
      masterCollection.push(...segmentData);
    } catch (err) {
      if (err instanceof TruncationError) {
        // Name the offending segment so its filter can be refined or split
        console.error(`Partition "${segment.label}" exceeded ceiling. Refine filter or split segment.`);
      }
      // Always propagate: a partial collection must never be returned
      throw err;
    }
  }

  return masterCollection;
}
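
The segments passed to `fetchPartitionedDataset` have to come from somewhere. As a minimal sketch for the `date_range` strategy, the hypothetical helper below (`monthlySegments` is not part of any SDK) emits one half-open segment per calendar month. The filter shape follows Notion's timestamp filter format as commonly documented; it is worth confirming against the current API reference before relying on it.

```typescript
// Hypothetical helper: one segment per calendar month that overlaps [from, to).
// Half-open ranges (on_or_after start, before end) avoid gaps and double counting.
interface Segment {
  label: string;
  filter: Record<string, unknown>;
}

function monthlySegments(from: Date, to: Date): Segment[] {
  const segments: Segment[] = [];
  // Snap the first boundary to the start of the month, in UTC.
  let start = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1));
  while (start < to) {
    // Date.UTC rolls month 12 over into January of the next year.
    const end = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + 1, 1));
    segments.push({
      label: start.toISOString().slice(0, 7), // e.g. "2024-03"
      filter: {
        and: [
          { timestamp: 'created_time', created_time: { on_or_after: start.toISOString() } },
          { timestamp: 'created_time', created_time: { before: end.toISOString() } },
        ],
      },
    });
    start = end;
  }
  return segments;
}
```

The result can be dropped into the `segments` field of a `PartitionConfig`. Months that still exceed the ceiling will surface as a `TruncationError` for that label, which signals that a finer split (weekly or daily) is needed for that period.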

Architecture Decisions & Rationale

Explicit Failure Over Silent Degradation: Throwing TruncationError forces observability. Sync jobs should fail loudly rather than write incomplete data. This aligns with fail-fast principles in data engineering.
Partitioning Over Page Size Tuning: Increasing page_size does not bypass the 10,000-result ceiling. Partitioning by date ranges, status values, or alphabetical boundaries distributes the load across multiple independent queries, each staying under the limit.
Type-Safe Metadata Enforcement: Casting the response to NotionPaginationEnvelope makes request_status visible to the type checker even when SDK types lag behind API updates. The cast validates nothing at runtime and cannot force callers to read the field, so the explicit check inside the loop remains the actual safeguard.
Segment-Level Error Isolation: Wrapping partition fetches in try/catch allows granular failure reporting. If one segment truncates, you know exactly which filter boundary needs refinement rather than debugging a monolithic sync failure.

Pitfall Guide

1. The Cursor Exhaustion Fallacy

Explanation: Assuming has_more === false guarantees complete data retrieval. The API now uses this flag to signal both natural end-of-results and hard-cap termination. Fix: Always inspect request_status.type before treating a loop termination as successful completion.

2. SDK Auto-Pagination Blindness

Explanation: Helper methods like iteratePaginatedAPI abstract response inspection. They follow the legacy contract and silently stop at the ceiling. Fix: Replace auto-pagination helpers with custom loops that explicitly validate metadata, or wrap the helper in a middleware that checks the final response envelope.

3. Partition Size Miscalculation

Explanation: Creating date or status partitions that still contain >10,000 matching rows. The ceiling applies to each individual query, not the aggregate. Fix: Estimate row distribution using Notion UI counts or secondary metadata APIs. Cap each partition at ~8,000 records to maintain a safety buffer.

4. Treating Truncation as Transient

Explanation: Routing a truncated response through the same retry logic used for 429 or 500 errors. Truncation is a deterministic limit, not a network or rate-limit issue, so the retry returns the same 10,000 rows. Fix: Implement a circuit breaker that switches to partitioning mode when TruncationError is caught. Never retry a truncated query without filter refinement.
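
One possible shape for "switch to partitioning" is to bisect the time range whenever a query reports truncation. The sketch below assumes a caller-supplied `fetchRange` that queries a half-open time window and throws on truncation; `RangeTruncated`, `RangeFetcher`, and `fetchAdaptive` are hypothetical names.

```typescript
// Illustrative sketch; all names here are hypothetical.
class RangeTruncated extends Error {
  constructor(message = 'range truncated') {
    super(message);
    this.name = 'RangeTruncated';
  }
}

function isRangeTruncated(err: unknown): boolean {
  return err instanceof Error && err.name === 'RangeTruncated';
}

// Fetches every row whose timestamp falls in [startMs, endMs).
// Expected to throw RangeTruncated when the query hit the result ceiling.
type RangeFetcher<T> = (startMs: number, endMs: number) => Promise<T[]>;

async function fetchAdaptive<T>(
  fetchRange: RangeFetcher<T>,
  startMs: number,
  endMs: number,
  minSpanMs = 60000
): Promise<T[]> {
  try {
    return await fetchRange(startMs, endMs);
  } catch (err) {
    // Anything other than truncation belongs to the normal retry policy.
    if (!isRangeTruncated(err)) throw err;
    // The window cannot be split any further: surface the failure.
    if (endMs - startMs <= minSpanMs) throw err;
    // Refine the filter by halving the window, then fetch each half in turn.
    const mid = startMs + Math.floor((endMs - startMs) / 2);
    const left = await fetchAdaptive(fetchRange, startMs, mid, minSpanMs);
    const right = await fetchAdaptive(fetchRange, mid, endMs, minSpanMs);
    return [...left, ...right];
  }
}
```

The truncated query is never repeated unchanged: each recursive call narrows the filter, and the `minSpanMs` floor stops the recursion when a single window is still too dense to split by time alone.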

5. Ignoring Additive Metadata Fields

Explanation: Schema validators that permit unknown fields will pass truncated responses as valid. Strict validators may reject them but won't explain the data loss. Fix: Explicitly assert request_status in validation pipelines. Treat type: "incomplete" as a terminal state, not an optional extension.

6. Missing Baseline Validation

Explanation: Sync jobs report "N records synced" without verifying against a source-of-truth count. 10,000 looks correct until a user notices missing entries. Fix: Implement a pre-sync baseline check. Query Notion's database metadata or use a secondary lightweight endpoint to fetch expected row counts. Alert if synced_count < expected_count * 0.95.
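
The threshold rule above can be expressed as a small pure function. This is a sketch only: `checkCompleteness` and `CompletenessReport` are hypothetical names, and how `expected` is obtained (UI count, cached baseline, or another source) is left to the caller.

```typescript
// Hypothetical helper implementing the rule: alert if synced < expected * 0.95.
interface CompletenessReport {
  synced: number;
  expected: number;
  ok: boolean;
}

function checkCompleteness(
  synced: number,
  expected: number,
  minRatio = 0.95
): CompletenessReport {
  if (synced < 0 || expected < 0) {
    throw new RangeError('counts must be non-negative');
  }
  // Multiplying instead of dividing avoids a special case for expected === 0.
  return { synced, expected, ok: synced >= expected * minRatio };
}

// A sync that stopped at the ceiling against a 14,200-row baseline fails the check.
console.log(checkCompleteness(10000, 14200).ok); // false
// A small shortfall within tolerance passes.
console.log(checkCompleteness(9600, 10000).ok); // true
```

The 5% tolerance absorbs rows created or deleted between the baseline read and the sync. A tighter ratio is reasonable for slow-changing databases.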

7. Hardcoded Full-Database Syncs

Explanation: Running daily full exports regardless of dataset growth. As databases cross 10k, syncs silently truncate without triggering alerts. Fix: Transition to incremental syncs using last_edited_time filters. Only fetch rows modified since the last successful checkpoint. This usually keeps queries under the ceiling and reduces compute overhead, but keep the integrity check in place: the initial backfill or a bulk edit can still match more than 10,000 rows.
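
A minimal sketch of the checkpoint logic follows, assuming the checkpoint is stored as an ISO 8601 string between runs. `incrementalFilter` and `nextCheckpoint` are hypothetical names, and the filter shape follows Notion's timestamp filter format as commonly documented, so verify it against the current API reference.

```typescript
// Hypothetical helpers for checkpoint-based incremental sync.

// Builds the filter for rows edited at or after the stored checkpoint.
// Returns undefined on the first run, when no checkpoint exists yet and a
// full (partitioned) backfill is required instead.
function incrementalFilter(checkpointIso: string | null): Record<string, unknown> | undefined {
  if (checkpointIso === null) return undefined;
  return {
    timestamp: 'last_edited_time',
    last_edited_time: { on_or_after: checkpointIso },
  };
}

// The next checkpoint is the moment this run STARTED, not when it finished,
// so edits made while the run was in progress are picked up by the next run.
function nextCheckpoint(runStartedAt: Date): string {
  return runStartedAt.toISOString();
}
```

Because `on_or_after` re-fetches rows that sit exactly on the boundary, downstream writes should be idempotent upserts keyed on the page ID. Persist the new checkpoint only after the run completes without a truncation error.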

Production Bundle

Action Checklist

Audit codebase for has_more, next_cursor, iteratePaginatedAPI, and start_cursor patterns
Update response interfaces to include request_status as a required integrity field
Replace cursor-only termination loops with explicit request_status.type checks
Implement partitioning strategy for databases exceeding 8,000 estimated rows
Add TruncationError handling with alerting and partition refinement logic
Deploy baseline row-count validation against Notion UI or metadata endpoints
Configure monitoring dashboards to track synced_count vs expected_count divergence
Test partition boundaries with synthetic datasets containing 12,000+ mock records

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Database < 8,000 rows	Standard integrity-checked loop	Stays safely under ceiling; partitioning adds unnecessary complexity	Low (minimal compute)
Database 8,000–25,000 rows	Status/date partitioning	Distributes load across independent queries; each stays under 10k	Medium (increased API calls)
Database > 25,000 rows	Incremental sync with `last_edited_time`	Avoids full scans; only fetches deltas; naturally respects limits	Low-Medium (efficient bandwidth)
Real-time dashboard	Event-driven webhooks + partial sync	Pushes updates instead of polling; reduces sync frequency	High (initial webhook setup)
One-time migration	Partitioned export with parallel workers	Maximizes throughput while respecting per-query limits	Medium (worker infrastructure)
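
The row-count rows of the matrix can be encoded directly so the choice is made in code rather than by convention. This is a sketch using the thresholds from the table; `chooseStrategy` and `Strategy` are hypothetical names, and the webhook and migration scenarios are not covered because they do not depend on row count.

```typescript
// Hypothetical encoding of the row-count rows of the decision matrix.
type Strategy = 'none' | 'partitioned' | 'incremental';

function chooseStrategy(estimatedRows: number): Strategy {
  if (estimatedRows < 8000) return 'none'; // plain integrity-checked loop
  if (estimatedRows <= 25000) return 'partitioned'; // status/date partitioning
  return 'incremental'; // last_edited_time deltas
}

console.log(chooseStrategy(3500)); // none
console.log(chooseStrategy(12000)); // partitioned
console.log(chooseStrategy(40000)); // incremental
```

Since the estimate itself can be stale, the integrity check should stay enabled regardless of which strategy is selected.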

Configuration Template

// sync.config.ts
export interface SyncPipelineConfig {
  databaseId: string;
  partitionStrategy: 'none' | 'status' | 'date_range' | 'incremental';
  partitionProperty?: string;
  maxPartitionSize: number;
  pageSize: number;
  retryPolicy: {
    maxAttempts: number;
    backoffMs: number;
    retryOnTruncation: boolean;
  };
  telemetry: {
    enableBaselineCheck: boolean;
    alertThreshold: number; // percentage divergence
    metricNamespace: string;
  };
}

export const defaultConfig: SyncPipelineConfig = {
  databaseId: process.env.NOTION_DATABASE_ID!,
  partitionStrategy: 'date_range',
  partitionProperty: 'created_time',
  maxPartitionSize: 8000,
  pageSize: 100,
  retryPolicy: {
    maxAttempts: 3,
    backoffMs: 1000,
    retryOnTruncation: false, // truncation requires filter refinement, not retry
  },
  telemetry: {
    enableBaselineCheck: true,
    alertThreshold: 5,
    metricNamespace: 'notion.sync.integrity',
  },
};

Quick Start Guide

Install dependencies: npm install @notionhq/client zod (Zod recommended for runtime validation)
Replace existing pagination loops: Swap while (res.has_more) with the integrity-checked pattern that validates request_status.type
Add partition configuration: Define PartitionConfig segments based on your database's high-cardinality properties (status, date, category)
Deploy baseline validation: Record the expected row count before the sync (Notion's reported row count or a cached baseline), then compare accumulated.length against it once the sync finishes
Enable alerting: Configure your monitoring system to trigger on TruncationError or synced_count < expected_count * 0.95

This approach transforms a silent data loss vector into a deterministic, observable pipeline. By treating API metadata as authoritative rather than optional, you eliminate the gap between structural validity and logical completeness.
