e elimination of configuration drift.
Core Solution
Implementing a robust data governance framework requires embedding controls into the software delivery lifecycle. This section outlines a technical implementation using Policy-as-Code, Row-Level Security, and Automated Lineage.
Step-by-Step Implementation
1. Define Policies as Code
Move policy definitions from spreadsheets to version-controlled configuration files. Use a policy engine like Open Policy Agent (OPA) or native database policy languages.
TypeScript Policy Definition SDK:
Define policies using a typed interface to ensure consistency and enable IDE validation.
// governance/policies/user-data.ts
import { DataPolicy, ResourceType, Action, Transformation } from '@codcompass/governance-sdk';

export const piiMaskingPolicy: DataPolicy = {
  id: 'POL-001',
  resource: {
    type: ResourceType.TABLE,
    name: 'public.users',
    database: 'prod_analytics'
  },
  action: Action.READ,
  subject: {
    role: 'data_analyst'
  },
  enforcement: {
    column: 'email',
    transformation: Transformation.MASK_EMAIL,
    condition: 'NOT is_service_account()'
  },
  metadata: {
    owner: 'team-data-platform',
    compliance: ['GDPR', 'CCPA'],
    reviewCycle: 'QUARTERLY'
  }
};

export const rlsAccessPolicy: DataPolicy = {
  id: 'POL-002',
  resource: {
    type: ResourceType.TABLE,
    name: 'public.transactions',
    database: 'prod_warehouse'
  },
  action: Action.READ,
  subject: {
    role: 'finance_team'
  },
  enforcement: {
    // SQL string literals take single quotes; double quotes would name an identifier
    rowFilter: "region = current_setting('app.current_region')",
    columnExclusions: ['ssn', 'credit_card_hash']
  }
};
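To show how a policy object like POL-002 might reach the database, here is a minimal sketch that renders a row-filter policy into PostgreSQL statements. The RowFilterPolicy interface and the toRlsSql and quoteIdent helpers are hypothetical stand-ins written for this example; the actual types and any compiler shipped with @codcompass/governance-sdk are not shown in this article and may differ.

```typescript
// Local stand-in for the SDK policy type; an assumption made for this sketch.
interface RowFilterPolicy {
  id: string;
  table: string;     // schema-qualified, e.g. "public.transactions"
  role: string;      // database role the policy applies to
  rowFilter: string; // SQL predicate taken from the reviewed policy file
}

// Quote an identifier for PostgreSQL: wrap it in double quotes and
// double any embedded double quote.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Render the statements that enable RLS and attach one SELECT policy.
// The row filter is interpolated as-is, so it must come from trusted,
// version-controlled policy files and never from user input.
function toRlsSql(policy: RowFilterPolicy): string[] {
  const table = policy.table.split(".").map(quoteIdent).join(".");
  const policyName = quoteIdent(policy.id.toLowerCase().replace(/[^a-z0-9]+/g, "_"));
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${policyName} ON ${table} FOR SELECT TO ${quoteIdent(policy.role)} USING (${policy.rowFilter});`,
  ];
}

const statements = toRlsSql({
  id: "POL-002",
  table: "public.transactions",
  role: "finance_team",
  rowFilter: "region = current_setting('app.current_region')",
});
console.log(statements.join("\n"));
```

Generating the SQL from the policy file keeps Git as the single source of truth: the statements applied to the database can be diffed against what the policy says they should be.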
2. Implement Row-Level and Column-Level Security
Leverage database-native features for runtime enforcement. This ensures data protection even if application logic is bypassed.
PostgreSQL RLS Implementation:
-- Enable RLS on critical tables
-- (table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set)
ALTER TABLE public.transactions ENABLE ROW LEVEL SECURITY;
ALTER TABLE public.users ENABLE ROW LEVEL SECURITY;

-- Create policy based on application context
CREATE POLICY tenant_isolation ON public.transactions
    FOR SELECT
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- Create policy for PII access
-- pg_has_role checks whether the current user is a member of the named role
CREATE POLICY pii_restriction ON public.users
    FOR SELECT
    USING (
        CASE
            WHEN pg_has_role('admin', 'MEMBER') THEN true
            WHEN pg_has_role('support', 'MEMBER') THEN created_at > NOW() - INTERVAL '90 days'
            ELSE false
        END
    );
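The tenant_isolation policy reads app.tenant_id from the session, so the application has to set it before querying. A minimal sketch of one way to do that follows, using PostgreSQL's set_config with its third argument set to true so the value lasts only for the current transaction. The SqlClient interface and withTenant helper are assumptions made for this example; the demo uses a recording stub rather than a real connection.

```typescript
// Minimal client shape assumed for this sketch: any driver that exposes a
// parameterized query method could be adapted to it.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

// Run `work` inside a transaction whose app.tenant_id is set for that
// transaction only, so the value cannot leak to the next pooled request.
async function withTenant<T>(
  client: SqlClient,
  tenantId: string,
  work: (c: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}

// Demo with a stub that records the statements it receives.
const issued: string[] = [];
const stub: SqlClient = {
  query: async (text) => {
    issued.push(text);
    return { rows: [] };
  },
};

const demo = withTenant(stub, "00000000-0000-0000-0000-000000000001", async (c) => {
  const res = await c.query("SELECT * FROM public.transactions");
  return res.rows.length;
});
demo.then((count) => console.log(issued, count));
```

Scoping the setting to the transaction matters with connection pools: a session-level setting would stay on the connection and could expose one tenant's rows to the next request that reuses it.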
3. CI/CD Integration with OPA
Validate infrastructure and schema changes against governance policies before deployment.
OPA Rego Policy for Schema Validation:
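# policies/data_schema.rego
# conftest evaluates package "main" unless --namespace or --all-namespaces is set.
# Rules use pre-1.0 Rego syntax; OPA 1.0+ expects `deny contains msg if { ... }`.
# Each rule assumes the input describes one resource under input.resource.
package main

deny[msg] {
    input.resource.type == "aws_db_instance"
    # "not" also catches a missing attribute, which AWS treats as unencrypted
    not input.resource.storage_encrypted
    msg := "DB instances must have storage encryption enabled"
}

# Fires for every table in the input; add a sensitivity condition to limit it to PII tables
deny[msg] {
    input.resource.type == "postgresql_table"
    not input.resource.row_level_security
    msg := "Tables containing PII must have Row Level Security enabled"
}

deny[msg] {
    input.resource.type == "aws_db_instance"
    input.resource.publicly_accessible == true
    msg := "Publicly accessible databases are prohibited"
}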
GitHub Action Workflow Snippet:
- name: Validate Data Policies
  # Assumes an earlier step has installed conftest on the runner.
  # Warnings do not fail the job unless --fail-on-warn is added.
  run: conftest test terraform/ --policy policies/
4. Automated Lineage and Cataloging
Deploy an automated metadata management solution. Use OpenLineage to capture lineage events from compute engines.
Architecture Decision:
- Tooling: DataHub or Amundsen for the catalog; OpenLineage for event streaming.
- Rationale: OpenLineage provides a vendor-neutral standard for lineage, preventing lock-in. DataHub offers robust integration with modern stacks and supports policy enforcement hooks.
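- Implementation: Enable the OpenLineage integrations for Airflow, dbt, and Spark. Each job run then emits events naming its input and output datasets, and the catalog assembles lineage from those events instead of from hand-maintained documentation.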
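To make the lineage idea concrete, the sketch below builds a dataset graph from run events and lists everything downstream of a PII table. The RunEvent type covers only a subset of the OpenLineage run-event fields (eventType, job, inputs, outputs); the job names, dataset names, and the buildEdges and downstreamOf helpers are hypothetical, and a catalog such as DataHub would normally do this work for you.

```typescript
// Subset of the OpenLineage run-event shape; run IDs, timestamps, and
// facets are left out to keep the sketch short.
interface DatasetRef { namespace: string; name: string; }
interface RunEvent {
  eventType: "START" | "COMPLETE" | "FAIL";
  job: { namespace: string; name: string };
  inputs: DatasetRef[];
  outputs: DatasetRef[];
}

const key = (d: DatasetRef): string => `${d.namespace}/${d.name}`;

// Build dataset -> direct downstream datasets, using completed runs only.
function buildEdges(events: RunEvent[]): Record<string, string[]> {
  const edges: Record<string, string[]> = {};
  for (const ev of events) {
    if (ev.eventType !== "COMPLETE") continue;
    for (const input of ev.inputs) {
      const targets = edges[key(input)] ?? (edges[key(input)] = []);
      for (const output of ev.outputs) {
        if (targets.indexOf(key(output)) === -1) targets.push(key(output));
      }
    }
  }
  return edges;
}

// Breadth-first walk: every dataset reachable from `source`, sorted by name.
function downstreamOf(source: string, edges: Record<string, string[]>): string[] {
  const seen: string[] = [];
  const queue: string[] = [source];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of edges[current] ?? []) {
      if (seen.indexOf(next) === -1) {
        seen.push(next);
        queue.push(next);
      }
    }
  }
  return seen.sort();
}

const ns = "prod_analytics"; // hypothetical namespace
const events: RunEvent[] = [
  { eventType: "COMPLETE", job: { namespace: "dbt", name: "stg_users" },
    inputs: [{ namespace: ns, name: "public.users" }],
    outputs: [{ namespace: ns, name: "staging.stg_users" }] },
  { eventType: "COMPLETE", job: { namespace: "spark", name: "user_activity" },
    inputs: [{ namespace: ns, name: "staging.stg_users" }],
    outputs: [{ namespace: ns, name: "marts.user_activity" }] },
  { eventType: "FAIL", job: { namespace: "airflow", name: "scratch_export" },
    inputs: [{ namespace: ns, name: "public.users" }],
    outputs: [{ namespace: ns, name: "scratch.tmp_users" }] },
];

const piiDownstream = downstreamOf(`${ns}/public.users`, buildEdges(events));
console.log(piiDownstream);
```

Every dataset in the result inherits the PII obligations of public.users, which is the list a masking policy needs to cover.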
Architecture Decisions
- Shift-Left Governance: Policies are evaluated in CI, not just at runtime. This prevents non-compliant infrastructure from reaching production.
- Centralized Policy, Distributed Enforcement: Policy definitions are stored centrally in Git, but enforcement happens at the data plane (database, compute engine) to minimize latency.
- Immutable Audit Logs: All policy changes and access events are written to an append-only log (e.g., S3 with Object Lock) for forensic analysis.
Pitfall Guide
Common Mistakes
- Governance Without Lineage: Implementing access controls without lineage creates blind spots. If you cannot trace where data flows, you cannot guarantee PII is masked in downstream datasets.
- Fix: Integrate lineage capture into every data transformation job.
- Over-Reliance on Network Security: Assuming VPC peering or private subnets protect data is a critical error. Lateral movement within the network can expose unencrypted data.
- Fix: Encrypt data at rest and in transit; enforce RLS regardless of network topology.
- Static Policies in Dynamic Environments: Hardcoding roles and permissions in SQL scripts leads to drift as teams scale.
- Fix: Use IaC (Terraform) and policy engines to manage access dynamically based on identity provider groups.
- Ignoring Non-Production Data: Development and staging environments often contain production clones with real PII. This is a major compliance violation.
- Fix: Implement automated data masking or synthetic data generation for non-production refreshes.
- Governance as a Gatekeeper: Requiring manual approvals for every data access request stalls development.
- Fix: Implement self-service access with automated policy validation. Pre-approve roles that meet policy criteria.
- Lack of Data Quality SLAs: Governance includes quality. Accepting data with nulls, duplicates, or schema violations undermines trust.
- Fix: Define quality checks (e.g., Great Expectations) as part of the pipeline contract. Block pipelines on SLA breaches.
- Siloed Tooling: Using separate tools for security, quality, and cataloging creates operational overhead.
- Fix: Consolidate on platforms that offer unified governance capabilities or ensure tight integration via APIs.
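As an illustration of the non-production masking fix above, here is a minimal sketch of a per-column masking pass that could run during a staging refresh. The maskEmail and maskRow helpers and the sample row are hypothetical. Keeping the first character and the domain is a simplification that still leaks some information, so stricter environments may prefer synthetic data.

```typescript
// Mask the local part of an email but keep the domain, so format checks
// and per-domain aggregates still behave as they do in production.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) return "***"; // not a usable address; hide it entirely
  return email.charAt(0) + "***@" + email.slice(at + 1);
}

type Masker = (value: string) => string;

// Apply per-column maskers to a row, leaving other columns untouched.
function maskRow(
  row: Record<string, string>,
  maskers: Record<string, Masker>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const column of Object.keys(row)) {
    const masker = maskers[column];
    const value = row[column] ?? "";
    out[column] = masker ? masker(value) : value;
  }
  return out;
}

const masked = maskRow(
  { user_id: "42", email: "alice@example.com", country: "DE" },
  { email: maskEmail },
);
console.log(masked);
```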
Best Practices
- Tag Critical Data: Apply sensitivity labels (e.g., confidential, pii) to all data assets. Policies should reference labels, not specific columns, to reduce maintenance.
- Regular Access Reviews: Automate quarterly access reviews using the catalog. Flag dormant accounts and excessive privileges.
- Policy Testing: Write unit tests for policies. Ensure that policy changes do not inadvertently block legitimate access patterns.
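The tagging and policy-testing practices above fit together: when rules refer to labels, a small resolver decides the effect per column, and a table of expected outcomes acts as the unit test. The sketch below assumes a hypothetical label set, rule shape, and resolve function; none of these come from a specific catalog or SDK.

```typescript
type Label = "pii" | "confidential" | "public";
type Effect = "allow" | "mask" | "deny";

interface ColumnMeta { name: string; labels: Label[]; }

// A rule keyed on a label rather than on a column name.
interface LabelRule { label: Label; role: string; effect: "mask" | "deny"; }

// Decide, per column, what a role gets. Deny outranks mask when several
// labels on the same column match.
function resolve(columns: ColumnMeta[], rules: LabelRule[], role: string): Record<string, Effect> {
  const decision: Record<string, Effect> = {};
  for (const col of columns) {
    let effect: Effect = "allow";
    for (const rule of rules) {
      if (rule.role !== role || col.labels.indexOf(rule.label) === -1) continue;
      if (rule.effect === "deny" || effect === "allow") effect = rule.effect;
    }
    decision[col.name] = effect;
  }
  return decision;
}

const columns: ColumnMeta[] = [
  { name: "email", labels: ["pii"] },
  { name: "ssn", labels: ["pii", "confidential"] },
  { name: "country", labels: ["public"] },
];
const rules: LabelRule[] = [
  { label: "pii", role: "data_analyst", effect: "mask" },
  { label: "confidential", role: "data_analyst", effect: "deny" },
];

// Table-driven checks: each row is one access pattern that must keep working.
const cases: { role: string; column: string; want: Effect }[] = [
  { role: "data_analyst", column: "email", want: "mask" },
  { role: "data_analyst", column: "ssn", want: "deny" },
  { role: "data_analyst", column: "country", want: "allow" },
  { role: "admin", column: "ssn", want: "allow" },
];
for (const c of cases) {
  const got = resolve(columns, rules, c.role)[c.column];
  console.log(`${c.role}/${c.column}: ${got === c.want ? "ok" : "FAIL, got " + got}`);
}
```

Adding a new PII column then only requires tagging it; the rules and their tests stay unchanged.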
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / MVP | Native DB RBAC + Basic Catalog | Low operational overhead; sufficient for small teams and limited data scope. | Low |
| Regulated Enterprise | Policy-as-Code + OPA + DataHub | Mandatory auditability; granular control; automated compliance reporting. | High initial, Low risk |
| Multi-Cloud Data Mesh | Federated Governance + OpenLineage | Decentralized ownership with unified standards; prevents siloed governance. | Medium |
| AI/ML Workloads | Feature Store Governance + Model Lineage | Ensures data consistency between training and inference; tracks model drift. | Medium |
Configuration Template
Terraform Module for Governance-Ready Postgres:
module "governed_postgres" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 5.0"

  identifier = "prod-analytics"
  engine     = "postgres"
  # Sizing, credentials, and networking inputs are omitted here.

  # Encryption and Security
  storage_encrypted   = true
  kms_key_id          = module.kms.key_arn
  publicly_accessible = false

  # Governance Tags
  tags = {
    GovernanceTier = "Critical"
    DataOwner      = "team-data-platform"
    Compliance     = "GDPR,CCPA"
  }

  # Parameter Group
  # RLS is built into PostgreSQL and is enabled per table with ALTER TABLE;
  # it needs no extension or parameter. The parameter below instead
  # requires TLS for client connections.
  family = "postgres14"
  parameters = [
    {
      name  = "rds.force_ssl"
      value = "1"
    }
  ]
}
Data Quality Contract (YAML):
# governance/contracts/users.yaml
dataset: public.users
quality:
  - name: no_null_emails
    check: "email IS NOT NULL"
    severity: critical
  - name: valid_email_format
    check: "email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'"
    severity: warning
  - name: unique_user_ids
    check: "COUNT(user_id) = COUNT(DISTINCT user_id)"
    severity: critical
governance:
  sensitivity: pii
  retention: 7_years
  access:
    - role: data_analyst
      action: read
      mask: email
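A contract like this needs something to act on it. The sketch below shows one possible gate: it runs in-memory equivalents of the three checks and blocks the pipeline only when a critical check fails. The runChecks and gate functions and the row shape are hypothetical; in practice the SQL checks would run in the warehouse or through a tool such as Great Expectations, and only the gating logic would look like this.

```typescript
type Severity = "critical" | "warning";
interface CheckResult { name: string; severity: Severity; passed: boolean; }
interface GateDecision { block: boolean; failures: string[]; warnings: string[]; }
interface UserRow { user_id: number; email: string | null; }

const EMAIL_RE = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

// In-memory stand-ins for the contract's SQL checks. Null emails are left
// to no_null_emails, so the format check skips them.
function runChecks(rows: UserRow[]): CheckResult[] {
  const ids = rows.map((r) => r.user_id);
  const distinct = ids.filter((id, i) => ids.indexOf(id) === i);
  return [
    { name: "no_null_emails", severity: "critical",
      passed: rows.every((r) => r.email !== null) },
    { name: "valid_email_format", severity: "warning",
      passed: rows.every((r) => r.email === null || EMAIL_RE.test(r.email)) },
    { name: "unique_user_ids", severity: "critical",
      passed: ids.length === distinct.length },
  ];
}

// Block on any failed critical check; surface failed warnings without blocking.
function gate(results: CheckResult[]): GateDecision {
  const failed = results.filter((r) => !r.passed);
  const failures = failed.filter((r) => r.severity === "critical").map((r) => r.name);
  const warnings = failed.filter((r) => r.severity === "warning").map((r) => r.name);
  return { block: failures.length > 0, failures, warnings };
}

const softFailure = gate(runChecks([
  { user_id: 1, email: "a@example.com" },
  { user_id: 2, email: "not-an-email" },
]));
const hardFailure = gate(runChecks([
  { user_id: 1, email: "a@example.com" },
  { user_id: 1, email: null },
]));
console.log(softFailure, hardFailure);
```

Separating severity from the check itself lets teams tighten a warning into a blocker by editing the contract, not the pipeline code.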
Quick Start Guide
- Install Policy Engine: Run brew install conftest, or install conftest through your package manager.
- Create First Policy: Write a simple Rego policy in policies/encryption.rego to check for unencrypted storage.
- Validate Infrastructure: Run conftest test terraform/ --policy policies/ against your Terraform files and fix any violations. The --policy flag is needed because conftest looks in policy/ by default.
- Enable RLS: Execute ALTER TABLE <table> ENABLE ROW LEVEL SECURITY; on a test database, then define a basic policy.
- Verify Enforcement: Test access with different roles to confirm RLS blocks unauthorized queries.
This framework provides the technical foundation for enterprise-grade data governance. By treating governance as code and automating enforcement, organizations achieve compliance without sacrificing engineering velocity.