without session loss.
import { createClient } from 'redis';
import session from 'express-session';
import RedisStore from 'connect-redis';

const redisClient = createClient({
  url: process.env.REDIS_URL,
  // Back off by 50 ms per retry, capped at 2 seconds
  socket: { reconnectStrategy: (retries) => Math.min(retries * 50, 2000) }
});
await redisClient.connect();

const sessionMiddleware = session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.APP_SESSION_KEY,
  resave: false,            // do not rewrite unmodified sessions
  saveUninitialized: false, // do not store empty sessions
  cookie: {
    secure: true,
    httpOnly: true,
    sameSite: 'none', // cross-site cookie; browsers require secure: true with this value
    maxAge: 24 * 60 * 60 * 1000 // 24 hours in milliseconds
  }
});

export default sessionMiddleware;
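Rationale: Setting resave: false and saveUninitialized: false avoids rewriting unmodified sessions and storing empty ones, which reduces Redis writes during high traffic. The reconnectStrategy makes the client retry with a delay that grows by 50 ms per attempt and is capped at 2 seconds, so the session store recovers on its own after transient Redis network blips.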
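To make the backoff concrete, the short sketch below evaluates the same formula used in the reconnectStrategy for a few retry counts. The function name reconnectDelayMs is hypothetical and exists only for illustration.

```typescript
// Same formula as the reconnectStrategy in the session setup:
// linear backoff of 50 ms per retry, capped at 2000 ms.
function reconnectDelayMs(retries: number): number {
  return Math.min(retries * 50, 2000);
}

// Delays for retry counts 1, 10, 40 and 100.
const sampleDelays = [1, 10, 40, 100].map(reconnectDelayMs);
console.log(sampleDelays); // [ 50, 500, 2000, 2000 ]
```

The cap matters more than the slope: after 40 retries the client settles into one attempt every 2 seconds instead of backing off indefinitely.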
3. Dependency-Aware Health Endpoints
Load balancers route traffic based on health status. A superficial health check that only verifies the HTTP server is running will route traffic to instances that have lost database connections or cache access. This causes requests to hang or fail silently.
Implementation: Expose a readiness endpoint that validates critical dependencies.
import { Router } from 'express';
import { dbPool } from '../infrastructure/database';
import { cacheClient } from '../infrastructure/cache';

const router = Router();

router.get('/status/ready', async (req, res) => {
  // Probe both dependencies concurrently; allSettled never rejects, so one
  // failing probe does not hide the result of the other.
  const [database, cache] = await Promise.allSettled([
    dbPool.query('SELECT 1'),
    cacheClient.ping()
  ]);

  const checks = {
    database: database.status === 'fulfilled',
    cache: cache.status === 'fulfilled'
  };
  const isHealthy = checks.database && checks.cache;

  // 200 only when every dependency responded; 503 tells the load balancer to skip this instance
  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'ready' : 'degraded',
    checks
  });
});

export default router;
Rationale: Checking both the database and cache ensures the instance can perform actual work. Promise.allSettled runs the probes concurrently and reports each dependency separately, so a 503 response shows which one failed. Load balancers should poll this endpoint every 10 seconds and mark instances as down after two consecutive failures.
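The "two consecutive failures" rule is applied by the load balancer, not by the app, but a small sketch may help show what it does with the readiness responses. The ReadinessTracker class and its method names are hypothetical; real load balancers implement this internally and expose it through configuration.

```typescript
interface InstanceState {
  consecutiveFailures: number;
  up: boolean;
}

// Hypothetical model of a load balancer's view of each instance.
class ReadinessTracker {
  private readonly instances = new Map<string, InstanceState>();
  private readonly failureThreshold: number;

  constructor(failureThreshold: number = 2) {
    this.failureThreshold = failureThreshold;
  }

  // Record one poll result and return whether the instance stays in rotation.
  record(instanceId: string, statusCode: number): boolean {
    const state = this.instances.get(instanceId) ?? { consecutiveFailures: 0, up: true };
    if (statusCode === 200) {
      // Any healthy response resets the failure streak.
      state.consecutiveFailures = 0;
      state.up = true;
    } else {
      state.consecutiveFailures += 1;
      if (state.consecutiveFailures >= this.failureThreshold) {
        state.up = false;
      }
    }
    this.instances.set(instanceId, state);
    return state.up;
  }
}

const tracker = new ReadinessTracker(2);
const afterOk = tracker.record('node-01', 200);          // true
const afterOneFailure = tracker.record('node-01', 503);  // true: a single failure is tolerated
const afterTwoFailures = tracker.record('node-01', 503); // false: removed from rotation
const afterRecovery = tracker.record('node-01', 200);    // true: back in rotation
```

Requiring two failures in a row trades a slightly slower reaction (up to 20 seconds at a 10-second interval) for fewer false removals caused by a single slow probe.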
4. Nginx Configuration for Webhook Pools
For webhook workers, Nginx provides robust load balancing with connection management features. The configuration must use least connections, manage upstream failures gracefully, and reuse connections to reduce latency.
upstream webhook_processors {
    least_conn;
    server node-01.prod.internal:3000 max_fails=3 fail_timeout=30s;
    server node-02.prod.internal:3000 max_fails=3 fail_timeout=30s;
    server node-03.prod.internal:3000 max_fails=3 fail_timeout=30s;
    keepalive 64;
}

server {
    listen 80;

    location /webhooks/ {
        proxy_pass http://webhook_processors;
        proxy_read_timeout 15s;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Rationale:
- least_conn: Routes each request to the instance with the fewest active connections.
- max_fails=3 fail_timeout=30s: Marks an upstream server as unavailable after 3 failed attempts within 30 seconds and keeps it out of rotation for the next 30 seconds, preventing traffic from hitting a degraded node.
- keepalive 64: Keeps up to 64 idle connections to upstream servers open per worker process. Reusing them avoids a TCP handshake for every webhook request, which reduces latency during bursts.
- proxy_http_version 1.1 and Connection "": Required to enable keepalive connections with upstream servers.
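The selection and failure-tracking rules described in this rationale can be sketched in a few lines. This is a simplified model for intuition, not Nginx's implementation: it ignores server weights, and the Upstream type and both function names are hypothetical.

```typescript
interface Upstream {
  name: string;
  activeConnections: number;
  failTimestamps: number[]; // times (ms) of recent failed attempts
  unavailableUntil: number; // ms timestamp; 0 means available
}

const MAX_FAILS = 3;
const FAIL_TIMEOUT_MS = 30000;

// Count a failed attempt; after MAX_FAILS failures inside the window,
// take the server out of rotation for FAIL_TIMEOUT_MS.
function recordFailure(server: Upstream, now: number): void {
  server.failTimestamps = server.failTimestamps.filter((t) => now - t < FAIL_TIMEOUT_MS);
  server.failTimestamps.push(now);
  if (server.failTimestamps.length >= MAX_FAILS) {
    server.unavailableUntil = now + FAIL_TIMEOUT_MS;
    server.failTimestamps = [];
  }
}

// Pick the available server with the fewest active connections.
function pickLeastConn(servers: Upstream[], now: number): Upstream | undefined {
  let best: Upstream | undefined;
  for (const s of servers) {
    if (s.unavailableUntil > now) continue; // still marked as failed
    if (best === undefined || s.activeConnections < best.activeConnections) best = s;
  }
  return best;
}

const node01: Upstream = { name: 'node-01', activeConnections: 4, failTimestamps: [], unavailableUntil: 0 };
const node02: Upstream = { name: 'node-02', activeConnections: 1, failTimestamps: [], unavailableUntil: 0 };
const node03: Upstream = { name: 'node-03', activeConnections: 2, failTimestamps: [], unavailableUntil: 0 };
const pool = [node01, node02, node03];

const beforeFailures = pickLeastConn(pool, 0);   // node-02 (1 connection)
recordFailure(node02, 1000);
recordFailure(node02, 2000);
recordFailure(node02, 3000);                     // third failure: out until t = 33000
const duringOutage = pickLeastConn(pool, 4000);  // node-03 (next fewest connections)
const afterTimeout = pickLeastConn(pool, 33000); // node-02 is eligible again
```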
5. Circuit Breaking for External Dependencies
Shopify apps depend on external APIs (Admin API, fulfillment services). When these APIs experience latency spikes or outages, in-flight requests pile up waiting for responses and can exhaust your workers' connections and memory. Circuit breakers prevent this by failing fast when error rates exceed a threshold.
Implementation: Use a circuit breaker library like opossum to wrap API calls.
import CircuitBreaker from 'opossum';
import { fetchShopifyData } from '../services/shopify-api';
import { getCachedData } from '../services/cache-layer';

const shopifyBreaker = new CircuitBreaker(fetchShopifyData, {
  timeout: 4500,
  errorThresholdPercentage: 60,
  resetTimeout: 30000,
  volumeThreshold: 10,
  cache: false // opossum's option is named `cache`; successful results are not memoized
});

// The fallback receives the same arguments that were passed to fire()
shopifyBreaker.fallback(async (shopId: string, endpoint: string) => {
  const cached = await getCachedData(shopId, endpoint);
  if (cached) return cached;
  throw new Error('Service unavailable and no cache hit');
});

export async function getShopData(shopId: string) {
  return shopifyBreaker.fire(shopId, '/products');
}
Rationale:
- timeout: 4500: Fails the call if the API takes longer than 4.5 seconds.
- errorThresholdPercentage: 60: Opens the circuit when 60% of requests fail.
- volumeThreshold: 10: Requires at least 10 requests before evaluating the error threshold, preventing premature opening during low traffic.
- resetTimeout: 30000: Allows a test request after 30 seconds to check if the service has recovered.
- Fallback: Returns cached data when the circuit is open, maintaining partial functionality during outages.
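How these options interact is easier to see as a state machine. The sketch below is a deliberately simplified model, not opossum's implementation: it counts calls since the circuit last closed instead of using a rolling window, and the BreakerSketch class and its method names are hypothetical. The clock is injectable so the behaviour can be shown without waiting 30 seconds.

```typescript
type BreakerState = 'closed' | 'open' | 'halfOpen';

interface BreakerOptions {
  errorThresholdPercentage: number;
  volumeThreshold: number;
  resetTimeout: number; // milliseconds
}

class BreakerSketch {
  private state: BreakerState = 'closed';
  private successes = 0;
  private failures = 0;
  private openedAt = 0;
  private readonly options: BreakerOptions;
  private readonly now: () => number;

  constructor(options: BreakerOptions, now: () => number = () => Date.now()) {
    this.options = options;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Should the caller attempt the real request, or go straight to the fallback?
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.options.resetTimeout) {
      this.state = 'halfOpen'; // reset timeout elapsed: let a probe request through
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    if (this.state === 'halfOpen') {
      // Probe succeeded: close the circuit and start counting afresh.
      this.state = 'closed';
      this.successes = 0;
      this.failures = 0;
    } else {
      this.successes += 1;
    }
  }

  recordFailure(): void {
    if (this.state === 'halfOpen') {
      this.open(); // probe failed: stay open for another resetTimeout
      return;
    }
    this.failures += 1;
    const total = this.successes + this.failures;
    const errorPercentage = (this.failures / total) * 100;
    if (total >= this.options.volumeThreshold &&
        errorPercentage >= this.options.errorThresholdPercentage) {
      this.open();
    }
  }

  private open(): void {
    this.state = 'open';
    this.openedAt = this.now();
  }
}

let fakeNow = 0;
const breaker = new BreakerSketch(
  { errorThresholdPercentage: 60, volumeThreshold: 10, resetTimeout: 30000 },
  () => fakeNow
);
for (let i = 0; i < 4; i++) breaker.recordSuccess();
for (let i = 0; i < 5; i++) breaker.recordFailure();
const stateAtNineCalls = breaker.getState();     // 'closed': below the volume threshold
breaker.recordFailure();                         // 10 calls, 6 failures = 60%
const stateAtTenCalls = breaker.getState();      // 'open'
const allowedWhileOpen = breaker.allowRequest(); // false: fail fast
fakeNow = 30000;
const allowedAfterReset = breaker.allowRequest(); // true: half-open probe
breaker.recordSuccess();
const stateAfterProbe = breaker.getState();      // 'closed'
```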
Pitfall Guide
1. Sticky Sessions on Webhook Workers
Explanation: Webhook requests originate from Shopify's servers, so IP Hash pins deliveries by Shopify's source address rather than by load, and cookie-based stickiness has nothing to attach to. A few instances can end up receiving most of the traffic. If one of them becomes overloaded, webhooks queue up, causing delays and potential Shopify retries.
Fix: Use Least Connections for webhook pools. Webhooks should be stateless and idempotent; no instance affinity is required.
2. Shallow Health Checks
Explanation: Health endpoints that only return 200 OK without checking dependencies cause the load balancer to route traffic to instances that cannot process requests. This results in high error rates and timeout spikes.
Fix: Implement dependency checks for database, cache, and critical external services. Return 503 if any dependency is unavailable.
3. Missing Upstream Keepalive
Explanation: Without keepalive in Nginx, a new TCP connection is established for every request to upstream servers. This adds significant latency and consumes file descriptors, limiting throughput during bursts.
Fix: Configure keepalive in the upstream block and set proxy_http_version 1.1 with an empty Connection header to enable connection reuse.
4. Circuit Breaker Misconfiguration
Explanation: Setting the error threshold too low causes the circuit to open during transient spikes, cutting off traffic unnecessarily. Setting it too high allows cascading failures to exhaust resources.
Fix: Tune thresholds based on historical error rates. A 50-60% error threshold with a volume threshold of 10+ requests provides a balance between resilience and availability.
5. In-Memory Job Queues
Explanation: Storing job queues in process memory means jobs are lost if an instance restarts or if the load balancer routes subsequent requests elsewhere. This breaks async workflows.
Fix: Use a distributed queue system like Redis Streams, BullMQ, or a message broker. Ensure jobs are persisted and can be processed by any instance.
6. Thundering Herd on Recovery
Explanation: When a failed instance recovers and is added back to the pool, or when a circuit breaker closes, a sudden influx of traffic can overwhelm the recovering service.
Fix: Implement gradual recovery. Nginx's slow_start server parameter (available in NGINX Plus) or application-level warm-up logic can ramp up traffic to recovering instances. Circuit breakers should use a half-open state to probe recovery before fully restoring traffic.
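For application-level warm-up, one simple option is to scale a recovering instance's share of traffic linearly over a ramp window. The sketch below assumes a linear ramp; both function names and the 30-second window are hypothetical choices for illustration.

```typescript
// Fraction of its normal traffic share a recovering instance should receive,
// ramping linearly from 0 to 1 over rampMs.
function warmupFactor(recoveredAtMs: number, nowMs: number, rampMs: number): number {
  if (rampMs <= 0) return 1; // no ramp configured: full traffic immediately
  const elapsed = nowMs - recoveredAtMs;
  if (elapsed <= 0) return 0;
  return Math.min(elapsed / rampMs, 1);
}

// Weight to use for routing decisions while the instance warms up.
function effectiveWeight(baseWeight: number, recoveredAtMs: number, nowMs: number, rampMs: number): number {
  return baseWeight * warmupFactor(recoveredAtMs, nowMs, rampMs);
}

const halfway = effectiveWeight(10, 0, 15000, 30000);  // 5: half the ramp has elapsed
const finished = effectiveWeight(10, 0, 60000, 30000); // 10: ramp complete, full weight
```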
7. Ignoring Idempotency in Webhooks
Explanation: Shopify retries webhooks if it does not receive a 200 response within the timeout window. If your app processes the same webhook multiple times without idempotency checks, it can cause duplicate actions, data corruption, or API rate limit violations.
Fix: Implement idempotency keys based on the webhook event ID. Check for processed events in a store before executing business logic. Return 200 immediately for duplicate events.
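A minimal sketch of the duplicate check follows. It uses an in-memory Set purely as a stand-in: with several instances behind a load balancer, the processed-event store must be shared (for example Redis) and its keys should expire. The handleWebhook function is hypothetical, and where the event ID comes from (header or payload field) is left as an assumption.

```typescript
// Stand-in for a shared store; see the caveat in the lead-in.
const processedEvents = new Set<string>();

interface WebhookOutcome {
  status: number;
  duplicate: boolean;
}

function handleWebhook(eventId: string, applyBusinessLogic: () => void): WebhookOutcome {
  if (processedEvents.has(eventId)) {
    // Already handled: acknowledge so the sender stops retrying, but do no work.
    return { status: 200, duplicate: true };
  }
  processedEvents.add(eventId);
  try {
    applyBusinessLogic();
  } catch (error) {
    // Forget the ID so a later retry can process the event again.
    processedEvents.delete(eventId);
    throw error;
  }
  return { status: 200, duplicate: false };
}

const counter = { runs: 0 };
const firstDelivery = handleWebhook('evt-123', () => { counter.runs += 1; });
const retriedDelivery = handleWebhook('evt-123', () => { counter.runs += 1; });
console.log(counter.runs, firstDelivery.duplicate, retriedDelivery.duplicate); // 1 false true
```

Recording the ID before running the business logic closes the window in which two concurrent deliveries of the same event could both pass the check; with a shared store, that step would need to be an atomic set-if-absent operation.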
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-Volume Webhooks | Least Connections + Redis Queue | Handles variable job durations; prevents queue buildup. | Moderate (Redis/Queue infra) |
| Stateless API Scaling | Round Robin + Stateless Design | Simple distribution; maximizes throughput for uniform requests. | Low |
| OAuth / WebSockets | IP Hash / Sticky Sessions | Maintains session continuity required for handshakes. | Low |
| Mixed Instance Fleet | Weighted Round Robin | Distributes traffic proportional to instance capacity. | Low |
| Zero-Downtime Deploys | Blue-Green with Traffic Split | Allows gradual rollout and instant rollback. | Moderate (Double capacity during deploy) |
| External API Volatility | Circuit Breaker + Cache Fallback | Prevents cascading failures; maintains partial availability. | Low |
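For the "Mixed Instance Fleet" row, the sketch below illustrates weighted round robin using the smooth variant, which interleaves picks instead of sending bursts to the heaviest server. The WeightedServer type, the pickWeighted function, and the fleet names and weights are hypothetical.

```typescript
interface WeightedServer {
  name: string;
  weight: number;
  current: number; // running score, starts at 0
}

// Smooth weighted round robin: raise every score by its weight,
// pick the highest score, then lower the winner by the total weight.
function pickWeighted(servers: WeightedServer[]): WeightedServer | undefined {
  let total = 0;
  let best: WeightedServer | undefined;
  for (const s of servers) {
    s.current += s.weight;
    total += s.weight;
    if (best === undefined || s.current > best.current) best = s;
  }
  if (best !== undefined) best.current -= total;
  return best;
}

const fleet: WeightedServer[] = [
  { name: 'large', weight: 5, current: 0 },
  { name: 'small-a', weight: 1, current: 0 },
  { name: 'small-b', weight: 1, current: 0 },
];

const order: string[] = [];
for (let i = 0; i < 7; i++) {
  const chosen = pickWeighted(fleet);
  if (chosen !== undefined) order.push(chosen.name);
}
console.log(order.join(' '));
// large large small-a large small-b large large
```

Over any 7 consecutive picks the large instance receives 5 requests and each small instance 1, matching the 5:1:1 weights.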
Configuration Template
Nginx Load Balancer Configuration for Shopify App:
# Upstream for Webhook Workers (Variable Duration)
upstream webhook_workers {
    least_conn;
    server worker-01.internal:3000 max_fails=3 fail_timeout=30s;
    server worker-02.internal:3000 max_fails=3 fail_timeout=30s;
    server worker-03.internal:3000 max_fails=3 fail_timeout=30s;
    keepalive 64;
}

# Upstream for API Workers (Uniform Duration)
upstream api_workers {
    server api-01.internal:3000;
    server api-02.internal:3000;
    server api-03.internal:3000;
    keepalive 32;
}

server {
    listen 80;
    server_name app.example.com;

    # Webhook Endpoint
    location /webhooks/ {
        proxy_pass http://webhook_workers;
        proxy_read_timeout 15s;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # API Endpoint
    location /api/ {
        proxy_pass http://api_workers;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    # Health Check Endpoint (Internal)
    location /status/ready {
        proxy_pass http://api_workers;
        access_log off;
    }
}
Quick Start Guide
- Externalize State: Configure Redis and update your session middleware to use a Redis store. Ensure all instances share the same Redis cluster.
- Add Health Endpoint: Implement a /status/ready endpoint in your application that checks database and cache connectivity. Return 200 only if all dependencies are healthy.
- Update Nginx Config: Replace your Nginx upstream blocks with the template above. Use least_conn for webhook workers and keepalive for connection reuse.
- Deploy Circuit Breakers: Install opossum and wrap your Shopify API calls with circuit breaker logic. Configure timeouts and fallbacks.
- Verify Routing: Test the load balancer by sending requests to webhook and API endpoints. Verify that health checks correctly remove unhealthy instances and that circuit breakers trigger on API failures.