ingestion function should remain provider-agnostic in its core logic. It receives the raw request context, extracts provider-specific metadata from headers, flattens critical fields for indexing, and persists the complete payload for auditability.
async function processIncomingWebhook(context: ExecutionContext) {
  const requestHeaders = context.request.headers ?? {};
  const rawBody = context.request.body;

  // Provider-specific metadata arrives in headers, not in the payload.
  const eventType = requestHeaders['x-github-event'] ?? 'unclassified';
  const deliveryToken = requestHeaders['x-github-delivery'] ?? null;

  // Flatten the fields used for filtering. Optional chaining tolerates
  // events that omit repository or sender, as well as an empty body.
  const repositoryName = rawBody?.repository?.full_name ?? 'unknown';
  const actorHandle = rawBody?.sender?.login ?? null;
  const actionType = rawBody?.action ?? null;

  // Persist the flattened fields alongside the untouched payload.
  const persistedRecord = await context.storage.createEntry('provider-events', {
    classification: eventType,
    deliveryToken,
    repository: repositoryName,
    actor: actorHandle,
    action: actionType,
    ingestionTimestamp: new Date().toISOString(),
    originalPayload: rawBody
  });

  context.logger.info('Webhook persisted', {
    deliveryToken,
    eventType,
    repository: repositoryName,
    recordId: persistedRecord.id
  });

  return { status: 'accepted', recordId: persistedRecord.id };
}
Architecture Rationale:
- context.request.body contains the unmodified POST payload. No middleware should parse or mutate it before verification.
- Headers are extracted explicitly. GitHub places the event classification in x-github-event and a unique delivery identifier in x-github-delivery. These fields are flattened to the top level to enable efficient filtering without scanning nested JSON.
- The complete payload is stored under originalPayload to preserve audit trails and support future schema evolution.
- An ingestionTimestamp is added server-side to track arrival time, which differs from provider-generated timestamps and helps detect network latency or replay attempts.
Create a dedicated trigger bound to the ingestion function. Assign a clean path segment to isolate the endpoint from other integrations.
| Configuration Field | Value |
|---|---|
| Trigger Name | github-event-listener |
| Bound Function | processIncomingWebhook |
| Trigger Type | HTTP Endpoint |
| Route Path | github |
The runtime generates a public endpoint:
https://api.runtime.io/data/workspace/{workspace-id}/api/v1/http-trigger/github
Enable cryptographic validation at the trigger level. GitHub uses HMAC-SHA256 with a simple prefix format. Configure the verification engine to match this specification:
| Verification Setting | Value |
|---|---|
| Enable Signature Check | true |
| Signing Secret | {your-github-webhook-secret} |
| Header Source | x-hub-signature-256 |
| HMAC Algorithm | sha256 |
| Digest Encoding | hex |
| Extraction Pattern | sha256=(.+) |
| Secret Encoding | raw |
The extraction pattern strips the sha256= prefix, leaving only the hexadecimal digest for comparison. The verification engine computes the HMAC of the raw request body using the configured secret and compares it against the extracted digest using constant-time comparison to prevent timing attacks.
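The steps described above can be sketched in a few lines of Node.js. This is only an illustration of what such a verification engine does, not the platform's implementation: `verifyGithubSignature` is a hypothetical helper, and its regex is deliberately stricter than the `sha256=(.+)` pattern in the table because it also checks that the digest is 64 hex characters.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Hypothetical helper: mirrors the verification steps for a GitHub-style header.
function verifyGithubSignature(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  secret: string
): boolean {
  // Strip the "sha256=" prefix and keep only the hexadecimal digest.
  const hexDigest = /^sha256=([0-9a-f]{64})$/i.exec(signatureHeader ?? '')?.[1];
  if (hexDigest === undefined) return false;

  const received = Buffer.from(hexDigest, 'hex');
  // HMAC-SHA256 over the raw, unparsed request body.
  const expected = createHmac('sha256', secret).update(rawBody).digest();

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The body must be the exact bytes received on the wire. Re-serializing parsed JSON before hashing will usually change whitespace or key order and cause valid deliveries to fail.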
Why per-trigger configuration? Stripe requires timestamp extraction (t=) and versioned hash parsing (v1=). Shopify base64-encodes its digest. Twilio signs the full request URL together with the POST parameters rather than the raw body alone. Centralizing verification logic forces conditional branching that increases attack surface and maintenance cost. Isolating rules per trigger keeps each provider's verification correct without code changes.
Step 4: Programmatic Querying
Once events are persisted, they can be queried using the platform's data client. Flattened fields enable direct filtering, while the raw payload remains accessible for deep inspection.
import { DataClient } from '@platform-sdk/core';

const client = new DataClient({
  workspaceId: process.env.WORKSPACE_ID,
  credentials: {
    clientId: process.env.CLIENT_ID,
    clientSecret: process.env.CLIENT_SECRET
  }
});

// Filter by event classification
const pushEvents = await client.fetchRecords('provider-events', {
  filter: { 'data.classification': 'push' }
});

// Scope to a specific repository
const repoActivity = await client.fetchRecords('provider-events', {
  filter: { 'data.repository': 'acme-corp/frontend' }
});

// Combine classification and actor
const userPullRequests = await client.fetchRecords('provider-events', {
  filter: {
    'data.classification': 'pull_request',
    'data.actor': 'octocat'
  }
});
Architecture Rationale: The SDK abstracts pagination and query compilation. Flattened fields (classification, repository, actor) are automatically indexed during schema discovery, enabling sub-second query latency. The originalPayload field remains unindexed by default to preserve storage efficiency, but can be queried via full-text or JSON path operators when needed.
Pitfall Guide
1. Ignoring Delivery ID Deduplication
Explanation: Providers resend events on timeout or manual retry. Without tracking delivery identifiers, the pipeline processes identical payloads multiple times, corrupting metrics and triggering duplicate side effects.
Fix: Extract the provider's delivery ID from headers, store it as a unique constraint, and reject incoming requests with matching tokens within a configurable window.
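A minimal sketch of the windowed rejection logic, assuming a single process and an in-memory map. `DeliveryDeduplicator` is a hypothetical name, and a real pipeline would rely on the unique constraint in storage so that the check survives restarts and works across instances.

```typescript
// Hypothetical in-memory guard: illustrates the window check only.
class DeliveryDeduplicator {
  private readonly seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true the first time a delivery ID is seen inside the window.
  shouldProcess(deliveryId: string, now: number = Date.now()): boolean {
    // Drop entries that have aged out of the window.
    this.seen.forEach((seenAt, id) => {
      if (now - seenAt >= this.windowMs) this.seen.delete(id);
    });

    if (this.seen.has(deliveryId)) return false;
    this.seen.set(deliveryId, now);
    return true;
  }
}
```

In the ingestion function, the value read from x-github-delivery would be passed as `deliveryId`, and a `false` result would short-circuit before anything is persisted.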
2. Hardcoding Rigid Schemas for Heterogeneous Payloads
Explanation: Forcing a strict table structure onto webhook events causes validation failures when providers add optional fields or change payload shapes during API version upgrades.
Fix: Use schemaless storage for the primary collection. Flatten frequently queried fields to the top level, and run schema discovery periodically to promote stable fields to indexed columns.
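One way to keep flattening declarative is to map column names to dotted paths in the payload. The sketch below assumes that approach. `readPath` and `flattenEvent` are hypothetical helpers, not part of any platform SDK, and missing paths resolve to null rather than failing.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walks a dotted path such as "repository.full_name"; returns null when absent.
function readPath(payload: Json, path: string): Json {
  let current: Json = payload;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object' || Array.isArray(current)) {
      return null;
    }
    current = current[key] ?? null;
  }
  return current;
}

// Builds a record with flattened columns plus the untouched payload.
function flattenEvent(payload: Json, fields: { [column: string]: string }): { [column: string]: Json } {
  const record: { [column: string]: Json } = { originalPayload: payload };
  for (const column of Object.keys(fields)) {
    record[column] = readPath(payload, fields[column] ?? '');
  }
  return record;
}
```

Because unknown or newly added provider fields are simply ignored by the mapping and still preserved under originalPayload, a payload shape change does not break ingestion.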
3. Mishandling Signature Prefixes & Encoding
Explanation: GitHub prefixes its digest with sha256=, Stripe uses v1=, and Shopify base64-encodes its HMAC. Applying a single extraction regex or encoding assumption causes verification failures.
Fix: Configure extraction patterns and encoding per trigger. Validate the header format before computation, and log mismatched prefixes for debugging without exposing secrets.
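To illustrate why one regex cannot cover every provider, the sketch below keeps a separate extraction rule per provider. The `DigestRule` shape and `extractDigest` function are assumptions made for this example. The header formats reflect the three cases named above: a sha256= prefix with hex, a v1= field with hex, and a bare base64 value.

```typescript
import { Buffer } from 'node:buffer';

type DigestRule = {
  pattern: RegExp; // capture group 1 holds the encoded digest
  encoding: 'hex' | 'base64';
};

// Illustrative rules only; a real trigger would load these from its own config.
const digestRules: { [provider: string]: DigestRule } = {
  github: { pattern: /^sha256=([0-9a-f]+)$/i, encoding: 'hex' },
  // Takes the first v1 entry of a "t=...,v1=..." header.
  stripe: { pattern: /(?:^|,)v1=([0-9a-f]+)/i, encoding: 'hex' },
  shopify: { pattern: /^([A-Za-z0-9+/]+=*)$/, encoding: 'base64' }
};

// Returns the decoded digest bytes, or null when the header does not match.
function extractDigest(provider: string, headerValue: string): Buffer | null {
  const rule = digestRules[provider];
  if (rule === undefined) return null;

  const encoded = rule.pattern.exec(headerValue)?.[1];
  if (encoded === undefined) return null;

  return Buffer.from(encoded, rule.encoding);
}
```

A null result is the place to log a format mismatch. Logging the provider name and header length is usually enough for debugging and avoids writing signature material to logs.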
4. Skipping Content-Type Validation
Explanation: Accepting application/x-www-form-urlencoded or text/plain payloads when expecting JSON opens the pipeline to parsing errors or injection attempts.
Fix: Reject requests where Content-Type does not match application/json. Fail fast with a 415 Unsupported Media Type response to prevent unnecessary processing.
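A small sketch of the fail-fast check. `checkJsonContentType` and its return shape are hypothetical. The one subtlety it shows is that the header may carry parameters such as a charset, so comparing the whole header value to application/json would reject valid requests.

```typescript
type ContentTypeCheck =
  | { ok: true }
  | { ok: false; status: number; reason: string };

// Hypothetical helper: accepts application/json with or without parameters.
function checkJsonContentType(headerValue: string | undefined): ContentTypeCheck {
  // Keep only the media type, dropping parameters such as "; charset=utf-8".
  const mediaType = /^\s*([^;\s]+)/.exec(headerValue ?? '')?.[1]?.toLowerCase() ?? '';

  if (mediaType === 'application/json') return { ok: true };

  return {
    ok: false,
    status: 415, // Unsupported Media Type
    reason: `expected application/json, received "${mediaType || 'none'}"`
  };
}
```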
5. Overlooking Replay Protection Timestamps
Explanation: Some providers include timestamps in their signature headers. Processing events older than a defined window increases exposure to replay attacks.
Fix: Extract the timestamp from the header, compare it against the current server time, and reject payloads exceeding the maximum age threshold (typically 5 minutes).
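The freshness check can be sketched as below for a header that carries a Unix timestamp in a t= field, as Stripe's does. `isFresh` and `MAX_AGE_SECONDS` are hypothetical names. The sketch also rejects timestamps too far in the future, which is a choice made here rather than something the text above requires.

```typescript
const MAX_AGE_SECONDS = 5 * 60;

// Hypothetical helper: parses "t=<unix seconds>,v1=<digest>" style headers.
function isFresh(
  signatureHeader: string,
  nowSeconds: number,
  maxAgeSeconds: number = MAX_AGE_SECONDS
): boolean {
  const rawTimestamp = /(?:^|,)t=(\d+)(?:,|$)/.exec(signatureHeader)?.[1];
  if (rawTimestamp === undefined) return false;

  // Positive drift means the event is old; negative means clock skew.
  const drift = nowSeconds - Number(rawTimestamp);
  return Math.abs(drift) <= maxAgeSeconds;
}
```

This check only has value when the timestamp is covered by the signature. Otherwise an attacker replaying a captured request could simply rewrite the timestamp.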
6. Storing Raw Payloads Without Indexing Strategy
Explanation: Persisting large JSON blobs without a query strategy leads to full-collection scans, degrading performance as event volume grows.
Fix: Flatten the fields you filter on most often (classification, repository, action) to the top level. Use schema discovery to auto-index stable paths. Keep raw payloads in a separate, unindexed column for audit purposes.
7. Assuming All Providers Use HMAC
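Explanation: Certain SaaS platforms and legacy systems use URL-embedded tokens, bearer tokens, or IP allowlists instead of cryptographic signatures. Others, such as Twilio, do sign requests but compute the HMAC over the request URL and form parameters rather than the raw body, so a body-only HMAC check fails for them as well.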
Fix: Design the trigger configuration to support multiple verification modes. Disable HMAC checks when the provider uses alternative authentication, and enforce IP filtering or token validation instead.
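One way to model multiple verification modes is a discriminated union, so that each trigger declares exactly one mode and the compiler checks that every mode is handled. Everything in this sketch (`VerificationMode`, `IncomingRequest`, `authorize`) is hypothetical, and the HMAC case delegates to a caller-supplied function rather than reimplementing it.

```typescript
import { timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

type IncomingRequest = {
  headers: { [name: string]: string | undefined };
  remoteIp: string;
};

type VerificationMode =
  | { kind: 'hmac'; verify: (req: IncomingRequest) => boolean }
  | { kind: 'bearer'; token: string }
  | { kind: 'ipAllowlist'; allowed: string[] };

// Dispatches to the single check configured for this trigger.
function authorize(mode: VerificationMode, req: IncomingRequest): boolean {
  switch (mode.kind) {
    case 'hmac':
      return mode.verify(req);
    case 'bearer': {
      // Compare in constant time, as with signature digests.
      const presented = Buffer.from(req.headers['authorization'] ?? '');
      const expected = Buffer.from(`Bearer ${mode.token}`);
      return presented.length === expected.length && timingSafeEqual(presented, expected);
    }
    case 'ipAllowlist':
      return mode.allowed.indexOf(req.remoteIp) !== -1;
  }
}
```

Behind a proxy or load balancer, `remoteIp` would have to come from a trusted forwarding header, which this sketch does not address.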
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single provider, predictable payload shape | Rigid schema + dedicated function | Simplifies querying and enforces data contracts | Low storage, higher migration cost on API changes |
| Multi-provider ingestion, varying payload structures | Schemaless collection + flattened top-level fields | Absorbs structural variance without breaking the pipeline | Moderate storage, near-zero migration cost |
| High-volume event streaming (>10k/min) | Partitioned schemaless storage + async indexing | Prevents write contention and maintains query performance | Higher infrastructure cost, linear scalability |
| Compliance/audit requirements | Raw payload retention + immutable delivery logs | Preserves cryptographic proof and payload history | Increased storage cost, negligible compute impact |
Configuration Template
# trigger-config.yaml
trigger:
  name: github-event-listener
  type: HTTP_ENDPOINT
  path: /github
  function: processIncomingWebhook
security:
  signature_verification:
    enabled: true
    header_source: x-hub-signature-256
    algorithm: sha256
    digest_encoding: hex
    extraction_pattern: "sha256=(.+)"
    secret_encoding: raw
    secret_ref: env:GITHUB_WEBHOOK_SECRET
storage:
  collection: provider-events
  schema_mode: schemaless
  flattened_fields:
    - classification
    - deliveryToken
    - repository
    - actor
    - action
  raw_payload_field: originalPayload
observability:
  log_level: info
  metrics:
    - event_ingestion_count
    - signature_verification_failures
    - duplicate_rejection_count
Quick Start Guide
- Create the storage collection: Initialize a schemaless collection named provider-events in your workspace console.
- Deploy the ingestion function: Paste the processIncomingWebhook implementation into your function registry and bind it to the collection.
- Configure the HTTP trigger: Set the route path, enable signature verification, and input your provider's signing secret and extraction pattern.
- Register the endpoint: Add the generated public URL to your provider's webhook settings, matching the content type and secret configuration.
- Validate ingestion: Trigger a test event, verify the 202 Accepted response, and query the collection using flattened fields to confirm successful persistence.