ervice boundary.
- Client-Side Flags: For UI variations and A/B testing where latency is critical. Flags are evaluated locally using a pre-fetched configuration bundle.
Rationale: Server-side evaluation prevents flag state manipulation by clients and ensures consistent behavior across microservices. Client-side evaluation reduces latency for UI rendering but requires careful cache invalidation strategies.
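A minimal sketch of the client-side path, assuming the pre-fetched bundle is a plain object carrying a version number. The `FlagBundle` shape and the `LocalFlagStore` class are hypothetical and not part of any SDK named in this article. Evaluation is a local lookup, and older bundles are rejected by version so that a late-arriving fetch cannot roll flag state backward.

```typescript
// Hypothetical shape of a pre-fetched flag bundle (an assumption for illustration).
interface FlagBundle {
  version: number;                 // monotonically increasing config version
  flags: Record<string, boolean>;  // flag key -> resolved value for this client
}

class LocalFlagStore {
  private bundle: FlagBundle = { version: 0, flags: {} };

  // Replace the cached bundle only if the incoming one is strictly newer.
  update(next: FlagBundle): boolean {
    if (next.version <= this.bundle.version) return false;
    this.bundle = next;
    return true;
  }

  // Evaluate locally with no network call; unknown keys fall back to the default.
  isEnabled(key: string, defaultValue = false): boolean {
    return this.bundle.flags[key] ?? defaultValue;
  }
}

const store = new LocalFlagStore();
store.update({ version: 3, flags: { new_nav: true } });
const showNewNav = store.isEnabled("new_nav");
```

The version check is one simple invalidation strategy; a real client would also need a trigger (polling or a push channel) to fetch newer bundles.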
2. Canary Release Implementation
Canary releases route a percentage of traffic to the new version while monitoring for anomalies. This requires integration with the API gateway or service mesh.
TypeScript Implementation: Canary Router Middleware
This middleware intercepts requests and routes them based on a canary percentage and user segmentation.
    import { Request, Response, NextFunction } from 'express';
    import { FeatureFlagClient } from '@codcompass/flags-sdk';

    interface CanaryConfig {
      flagKey: string;
      canaryPercentage: number;
      targetServiceUrl: string;
      fallbackServiceUrl: string;
    }

    export class CanaryRouter {
      private flagClient: FeatureFlagClient;

      constructor(flagClient: FeatureFlagClient) {
        this.flagClient = flagClient;
      }

      public route = async (req: Request, res: Response, next: NextFunction) => {
        try {
          // req.user is assumed to be attached (and typed) by upstream auth middleware.
          const userId = req.user?.id || req.ip;
          // Express exposes request-scoped data on res.locals; req.locals does not exist.
          const config: CanaryConfig = res.locals.canaryConfig;

          // Evaluate flag with user context for sticky bucketing
          const isCanary = await this.flagClient.getBoolVariation(
            config.flagKey,
            { key: userId, email: req.user?.email },
            false
          );

          // Rewrite the URL for a downstream proxy middleware and tag the request.
          const baseUrl = isCanary ? config.targetServiceUrl : config.fallbackServiceUrl;
          req.url = baseUrl + req.url;
          req.headers['x-canary'] = String(isCanary);

          next();
        } catch (err) {
          // Forward async failures to the Express error handler instead of leaving the request hanging.
          next(err);
        }
      };
    }
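The middleware above delegates sticky bucketing to the flag client. For readers who want to see what such bucketing typically looks like, here is a minimal sketch, assuming a hash-based scheme. The function names are hypothetical, and FNV-1a is chosen here only because it is short enough to inline; the SDK's actual algorithm is not documented in this article.

```typescript
// FNV-1a (32-bit), applied to UTF-16 code units. Non-cryptographic, but stable across runs.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map (flagKey, userId) to a stable bucket in [0, 10000).
// Including the flag key keeps a user's bucket independent across flags.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 10_000;
}

// A user is in the canary when their bucket falls below the percentage threshold.
// Raising the percentage only ever adds users; nobody already in the canary is removed.
function isInCanary(flagKey: string, userId: string, canaryPercentage: number): boolean {
  return bucketFor(flagKey, userId) < canaryPercentage * 100;
}
```

Because the bucket depends only on the flag key and user ID, the same user gets the same answer on every request and on every service instance, without shared state.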
3. Automated Rollback with SLO Enforcement
Rollbacks must be triggered automatically when Service Level Objectives (SLOs) are breached. This requires integration between the launch controller and the observability platform.
TypeScript Implementation: Launch Controller with SLO Guard
    import { MetricsClient } from '@codcompass/observability';
    import { FlagManager } from '@codcompass/flags-sdk';

    export class LaunchController {
      private metrics: MetricsClient;
      private flags: FlagManager;

      constructor(metrics: MetricsClient, flags: FlagManager) {
        this.metrics = metrics;
        this.flags = flags;
      }

      public async executeLaunch(
        flagKey: string,
        rolloutPercentage: number,
        sloThresholds: { errorRate: number; p99Latency: number }
      ): Promise<void> {
        // 1. Enable flag for rollout
        await this.flags.updateVariation(flagKey, { percentage: rolloutPercentage });

        // 2. Monitor SLOs for 5 minutes
        const monitoringWindow = 300_000; // 5 minutes
        const checkInterval = 10_000; // 10 seconds
        const startTime = Date.now();

        while (Date.now() - startTime < monitoringWindow) {
          const currentErrorRate = await this.metrics.getMetric('http_error_rate_5xx');
          const currentP99 = await this.metrics.getMetric('http_request_duration_p99');

          // 3. Check SLO breach
          if (
            currentErrorRate > sloThresholds.errorRate ||
            currentP99 > sloThresholds.p99Latency
          ) {
            console.error(`SLO BREACH DETECTED. Error: ${currentErrorRate}, P99: ${currentP99}`);

            // 4. Automated Rollback
            await this.flags.updateVariation(flagKey, { percentage: 0 });
            await this.metrics.emitEvent('launch_rollback_triggered', {
              flagKey,
              reason: 'slo_breach',
              errorRate: currentErrorRate,
              p99: currentP99
            });
            throw new Error('Launch aborted due to SLO breach');
          }

          await new Promise(resolve => setTimeout(resolve, checkInterval));
        }

        // 5. Launch successful
        await this.metrics.emitEvent('launch_completed', { flagKey, percentage: rolloutPercentage });
      }
    }
Architecture Decisions and Rationale
- Flag Storage: Store flag configurations in a distributed key-value store (e.g., Redis Cluster) with edge caching. This ensures low-latency evaluation even under high load during launch spikes.
- Context Enrichment: All flag evaluations must include rich context (user ID, tenant ID, geographic region, device type). This enables targeted rollouts (e.g., "roll out to internal users first," "exclude enterprise tenants until validation").
- Circuit Breaking: Implement circuit breakers around new feature dependencies. If a new feature calls an external API, the circuit breaker should trip immediately upon detecting latency spikes, preventing cascading failures.
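- Database Strategy: Schema changes must be backward-compatible. Use the expand/contract pattern: add the new schema elements first, migrate readers and writers, and remove the old elements only after the launch is stable. The new code must handle both old and new schema states during the launch window. Never block on schema migrations during a product launch.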
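The circuit-breaking decision above can be sketched as a small state machine. This is an illustrative outline only, assuming a simple consecutive-failure threshold and a fixed reset timeout; the class name, parameters, and the injected clock are hypothetical. What counts as a "failure" (an error, or a call that exceeds its latency budget) is left to the caller.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private failureThreshold: number;
  private resetTimeoutMs: number;
  private now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, so the breaker can be tested without waiting
  }

  // Returns true if a call to the dependency may proceed.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // after the timeout, allow trial calls through
    }
    return this.state !== "open";
  }

  // A successful call closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial call, or too many consecutive failures, opens the breaker.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A caller would check `allowRequest()` before invoking the new dependency, serve a fallback when it returns false, and report each outcome with `recordSuccess()` or `recordFailure()`.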
Pitfall Guide
1. Flag Debt Accumulation
Mistake: Leaving feature flags in the codebase indefinitely after launch.
Impact: Code complexity increases, testing matrix explodes, and performance degrades due to excessive conditional logic.
Best Practice: Implement a "Flag Lifecycle" policy. Every flag must have an expiration date. Use automated tooling to scan for stale flags and generate cleanup tickets. Integrate flag cleanup into the Definition of Done.
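As an illustration of the automated scan, here is a minimal sketch that assumes each flag record carries an ISO 8601 expiration date. The `FlagRecord` shape and the flag keys in the usage lines are hypothetical; in practice the records would come from the flag management system's inventory.

```typescript
// Hypothetical flag inventory record (an assumption for illustration).
interface FlagRecord {
  key: string;
  owner: string;
  expiresAt: string; // ISO 8601 timestamp
}

// Return every flag whose expiration date is at or before `now`.
function findStaleFlags(flags: FlagRecord[], now: Date): FlagRecord[] {
  return flags.filter(f => new Date(f.expiresAt).getTime() <= now.getTime());
}

const inventory: FlagRecord[] = [
  { key: "checkout_redesign_enabled", owner: "team-payments", expiresAt: "2024-06-30T00:00:00Z" },
  { key: "legacy_banner", owner: "team-growth", expiresAt: "2024-01-15T00:00:00Z" },
];

// Each stale flag would become one cleanup ticket assigned to its owner.
const stale = findStaleFlags(inventory, new Date("2024-05-20T10:00:00Z"));
```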
2. Inadequate Flag Testing
Mistake: Testing feature flags only in production or relying on manual toggling.
Impact: Flags may fail to evaluate correctly under load, or flag configurations may be corrupted during deployment.
Best Practice: Include flag evaluation in integration tests. Mock flag providers to test all variations. Run load tests with flags enabled to verify evaluation latency and throughput.
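A minimal sketch of a mocked flag provider, assuming the code under test depends on a narrow interface rather than on the SDK class directly. The `FlagProvider` interface mirrors the `getBoolVariation` call used elsewhere in this article, but `InMemoryFlagProvider` and `chooseCheckout` are hypothetical names introduced for illustration.

```typescript
// Narrow interface the application code depends on (assumed, modeled on getBoolVariation).
interface FlagProvider {
  getBoolVariation(key: string, context: { key: string }, defaultValue: boolean): Promise<boolean>;
}

// Test double: flag values are set directly, with no network or dashboard involved.
class InMemoryFlagProvider implements FlagProvider {
  private values = new Map<string, boolean>();

  set(key: string, value: boolean): void {
    this.values.set(key, value);
  }

  async getBoolVariation(key: string, _context: { key: string }, defaultValue: boolean): Promise<boolean> {
    return this.values.get(key) ?? defaultValue;
  }
}

// Code under test: picks a checkout variant based on the flag.
async function chooseCheckout(flags: FlagProvider, userId: string): Promise<"new" | "legacy"> {
  const enabled = await flags.getBoolVariation("checkout_redesign_enabled", { key: userId }, false);
  return enabled ? "new" : "legacy";
}
```

An integration test can then call `set` for each variation (on, off, and unset) and assert on the result of `chooseCheckout`, which covers the default path as well as both explicit states.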
3. Cache Invalidation Failures
Mistake: Not invalidating caches when flag states change.
Impact: Users see stale feature states. For example, a user might continue seeing the old UI after a rollout, or worse, see a broken hybrid state.
Best Practice: Implement cache invalidation hooks in the flag management system. When a flag changes, emit a Pub/Sub event to invalidate relevant cache keys. Use versioned cache keys tied to flag configurations.
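One way to read "versioned cache keys tied to flag configurations" is to fold each relevant flag's version into the key, so that a flag change makes old entries unreachable even if an invalidation event is lost. The sketch below assumes the flag system exposes a version number per flag; `FlagState` and `cacheKey` are hypothetical names.

```typescript
// Assumed: the flag system exposes a version that changes whenever the flag's config changes.
interface FlagState {
  key: string;
  version: number;
}

// Build a cache key that embeds the versions of every flag affecting the cached resource.
// Flags are sorted by key so the result does not depend on argument order.
function cacheKey(resource: string, flags: FlagState[]): string {
  const suffix = [...flags]
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .map(f => `${f.key}@${f.version}`)
    .join(",");
  return `${resource}|${suffix}`;
}

const before = cacheKey("checkout:page", [{ key: "checkout_redesign_enabled", version: 7 }]);
const after = cacheKey("checkout:page", [{ key: "checkout_redesign_enabled", version: 8 }]);
// `before` and `after` differ, so entries cached under the old flag state are never served again.
```

Old entries still occupy memory until they expire, so this complements the Pub/Sub invalidation rather than replacing it.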
4. Missing Business Metrics in Observability
Mistake: Monitoring only technical metrics (CPU, latency) during launch.
Impact: Technical success does not guarantee product success. A feature might be stable but fail to drive engagement or cause a drop in conversion.
Best Practice: Define business SLIs (Service Level Indicators) for every launch. Track metrics like checkout_completion_rate, feature_adoption_rate, and user_retention. Correlate these with technical metrics in a single dashboard.
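A business SLI check can be as small as comparing the conversion rate of the flagged cohort against a control cohort. The sketch below assumes the drop is measured relative to the control rate, in percent; whether a given SLO means a relative or an absolute drop is a decision each team has to make explicitly. All names here are hypothetical.

```typescript
interface CohortStats {
  sessions: number;
  conversions: number;
}

function conversionRate(c: CohortStats): number {
  return c.sessions === 0 ? 0 : c.conversions / c.sessions;
}

// Relative drop of treatment vs. control, in percent (positive means treatment is worse).
function conversionDropPercent(control: CohortStats, treatment: CohortStats): number {
  const base = conversionRate(control);
  if (base === 0) return 0; // no baseline to compare against
  return ((base - conversionRate(treatment)) / base) * 100;
}

// True when the drop exceeds the allowed budget and the launch should be halted or rolled back.
function breachesBusinessSlo(control: CohortStats, treatment: CohortStats, maxDropPercent: number): boolean {
  return conversionDropPercent(control, treatment) > maxDropPercent;
}
```

With small cohorts, a raw comparison like this is noisy; a production check would normally wait for a minimum sample size or apply a significance test before acting.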
5. Over-Engineering Flag Logic
Mistake: Creating complex nested flag conditions or flag-dependent feature interactions.
Impact: Unpredictable behavior and debugging nightmares. Flag combinations can create exponential state spaces.
Best Practice: Keep flag logic flat. Avoid flag dependencies. If a feature requires multiple flags, use a configuration object rather than nested evaluations. Document flag interactions explicitly.
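To illustrate the configuration-object approach, the sketch below resolves all relevant flags once into a plain object, so that feature code branches on that object instead of nesting flag evaluations. The `CheckoutConfig` shape and the `express_pay_enabled` flag are hypothetical.

```typescript
interface CheckoutConfig {
  layout: "legacy" | "redesign";
  expressPay: boolean;
}

// Resolve every flag once, at the edge of the request, into one plain object.
function resolveCheckoutConfig(flags: Record<string, boolean>): CheckoutConfig {
  const redesign = flags["checkout_redesign_enabled"] ?? false;
  const expressPayFlag = flags["express_pay_enabled"] ?? false;
  return {
    layout: redesign ? "redesign" : "legacy",
    // The one flag interaction is written down here, in a single place:
    // express pay only exists inside the redesigned layout.
    expressPay: redesign && expressPayFlag,
  };
}

// Feature code reads the resolved config and never evaluates flags itself.
function describeCheckout(config: CheckoutConfig): string {
  return config.expressPay ? `${config.layout}+express` : config.layout;
}
```

Because the resolver is a pure function of the flag values, every combination can be enumerated in a unit test.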
6. Ignoring Mobile App Release Constraints
Mistake: Treating mobile app launches like web deployments.
Impact: Mobile apps cannot be rolled back instantly. App store review processes delay updates.
Best Practice: For mobile, use remote configuration to gate features. The app binary must contain the feature code, but the feature is disabled by default. Remote config toggles the feature on for specific user segments. Plan for "staged rollouts" via app store percentage releases combined with remote config.
7. Silent Flag Failures
Mistake: The flag provider goes down, and the app fails open or closed without alerting.
Impact: Without a deterministic fallback, behavior during a provider outage is undefined. Failing open might expose unfinished features; failing closed might block users.
Best Practice: Define fallback values for all flags. Implement circuit breakers around flag evaluation calls. Alert immediately if flag evaluation failure rates exceed a threshold. Use local caching to survive provider outages.
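The fallback chain described above (live value, then last known value, then a declared default) can be sketched as follows. This is a simplified, synchronous outline under stated assumptions: `FlagReader` stands in for whatever call reaches the provider and is assumed to throw when the provider is unreachable, and `ResilientFlags` is a hypothetical wrapper, not an SDK class.

```typescript
// Assumed: reading a flag throws when the provider is unreachable.
type FlagReader = (key: string) => boolean;

class ResilientFlags {
  private lastKnown = new Map<string, boolean>();
  private failures = 0;
  private reader: FlagReader;
  private defaults: Record<string, boolean>;

  constructor(reader: FlagReader, defaults: Record<string, boolean>) {
    this.reader = reader;
    this.defaults = defaults;
  }

  // Order of preference: live value, last known good value, declared default, then false.
  evaluate(key: string): boolean {
    try {
      const value = this.reader(key);
      this.lastKnown.set(key, value);
      return value;
    } catch {
      this.failures += 1; // export this counter so alerting can fire on a threshold
      return this.lastKnown.get(key) ?? this.defaults[key] ?? false;
    }
  }

  failureCount(): number {
    return this.failures;
  }
}
```

The final `false` means an undeclared flag fails closed; teams that prefer a different behavior should make that default explicit per flag.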
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Internal Tool Launch | 100% Rollout with Feature Flag | Low risk, internal users can provide immediate feedback. Flag allows instant kill switch. | Low. Minimal infrastructure overhead. |
| Public API v2 | Canary Release (1% → 5% → 20% → 100%) | API changes can break clients. Canary allows monitoring error rates and latency on a small subset. | Medium. Requires gateway routing and monitoring setup. |
| Mobile App Feature | Remote Config + Staged Store Rollout | App stores limit rollback speed. Remote config enables instant disablement. Staged rollout limits exposure. | High. Requires app update cycle and remote config infrastructure. |
| Database Migration | Expand/Contract Pattern | Ensures zero downtime. Old and new code coexist during transition. | Medium. Requires careful schema design and dual-write logic. |
| Marketing Campaign | Percentage Rollout with A/B Testing | Validates conversion impact. Allows comparison against control group. | Low. Standard A/B testing infrastructure. |
Configuration Template
Launch Configuration YAML
This template defines the launch parameters, SLOs, and rollout strategy. Use this as input for your launch automation pipeline.
    launch:
      id: "checkout-flow-redesign-v2"
      timestamp: "2024-05-20T10:00:00Z"
      owner: "team-payments"

      feature_flags:
        - key: "checkout_redesign_enabled"
          type: "server_side"
          default: false
          targeting:
            - rule: "internal_users"
              variation: true
            - rule: "percentage"
              value: 0
              increment: 10
              interval: "30m"

      slos:
        technical:
          error_rate_5xx: 0.5
          p99_latency_ms: 200
        business:
          conversion_rate_drop_percent: 2.0

      rollback:
        auto_trigger: true
        condition: "slo_breach"
        action: "disable_flag"
        notification:
          channels: ["#launch-alerts", "slack-payments"]

      observability:
        dashboard: "launch-checkout-v2"
        metrics:
          - "http_request_duration"
          - "checkout_completion_rate"
          - "feature_flag_evaluation_latency"
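The `percentage` targeting rule in the template can be expanded into a concrete ramp schedule. The sketch below is one possible reading, assuming the rollout starts at `value`, adds `increment` percentage points every `interval`, and stops at 100; the template does not define these semantics, and the type and function names are hypothetical. The interval is taken in minutes to avoid parsing duration strings.

```typescript
interface RampRule {
  value: number;           // starting percentage
  increment: number;       // percentage points added per step
  intervalMinutes: number; // time between steps
}

interface RampStep {
  atMinute: number;   // offset from launch start
  percentage: number; // rollout percentage from this point on
}

// Expand a ramp rule into the full list of steps, capped at 100%.
function rampSchedule(rule: RampRule): RampStep[] {
  if (rule.increment <= 0) throw new Error("increment must be positive");
  const steps: RampStep[] = [];
  let percentage = rule.value;
  let minute = 0;
  while (true) {
    steps.push({ atMinute: minute, percentage });
    if (percentage >= 100) break;
    percentage = Math.min(100, percentage + rule.increment);
    minute += rule.intervalMinutes;
  }
  return steps;
}

// Values from the template: start at 0, add 10 points every 30 minutes.
const schedule = rampSchedule({ value: 0, increment: 10, intervalMinutes: 30 });
```

Under this reading, the template's rule reaches full rollout five hours after launch start, provided no SLO breach pauses or reverses the ramp.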
Quick Start Guide
- Install Feature Flag SDK:
    npm install @codcompass/flags-sdk
- Initialize Client in Application:
    import { FlagClient } from '@codcompass/flags-sdk';

    const flagClient = new FlagClient({
      apiKey: process.env.FLAG_API_KEY,
      environment: process.env.NODE_ENV,
      cache: true
    });
    await flagClient.initialize();
- Wrap Feature Code:
    const isNewCheckout = await flagClient.getBoolVariation(
      'checkout_redesign_enabled',
      { key: user.id },
      false
    );
    if (isNewCheckout) {
      return renderNewCheckout(user);
    }
    return renderLegacyCheckout(user);
- Deploy and Toggle:
Deploy the code. Use the flag management dashboard to enable the flag for internal users first, then gradually increase the percentage while monitoring SLOs.
- Verify and Clean:
Once the launch is stable and metrics meet SLOs, remove the flag logic and dead code. Merge the cleanup PR.