Enterprise Data Platform Consolidation: From 50 Systems to One Source of Truth
Most enterprises have 50+ data systems and no single source of truth. The consolidation playbook we run — without forcing a multi-year migration.
Most enterprises don’t have a data problem. They have a data systems problem. A typical Fortune 500 company we audit has 50–200 distinct data stores: 12 different CRMs from acquisitions, 5 ERPs across regions, dozens of departmental SQL Servers, hundreds of Excel files passing as “the system of record” for specific processes, and a data warehouse that everyone agrees is the source of truth except for the 30 reports that bypass it. Nobody trusts any single number because every number has a different lineage.
The instinct is to declare “we’ll consolidate everything into one platform” and start a 3-year migration. That instinct kills more enterprise data programs than any technical decision. The migrations are too long, the disruption too painful, the political resistance too entrenched.
There’s a better way: consolidate the layer that matters while leaving the operational systems alone. Here’s the playbook we use.
The pattern that works: federated truth, centralized analytics
Stop trying to consolidate operational systems. CRMs, ERPs, billing systems, HR systems — these run business processes. Replacing them is a multi-year project that distracts from the actual goal (one source of truth for analytics and decisions).
Instead:
- Leave the operational systems where they are. Salesforce stays. SAP stays. The acquired CRMs and regional ERPs stay (or get migrated on the business unit’s own timeline).
- Build the consolidated analytical layer. One analytical platform (a warehouse or a lakehouse) that ingests from all the operational systems via CDC or scheduled ELT.
- Govern that one layer. Single business definitions, single semantic model, single access controls, single audit trail.
The operational systems remain federated; the analytics become unified. This is what every “consolidated data platform” success story actually looks like. The failures are the ones that tried to consolidate operations.
The 5 layers we build
A unified enterprise data platform has five layers. Building them in the right order avoids the “we have lots of pipes but no value” problem most consolidations hit.
Layer 1: ingestion (where data comes from)
Connect every operational system that produces business-meaningful data. Tools depend on the system:
- Postgres / MySQL / SQL Server / Oracle: CDC via Debezium → Kafka, or AWS DMS, or Fivetran for low-volume tables
- SaaS (Salesforce, HubSpot, Workday, NetSuite, etc.): Fivetran, Airbyte, or custom connectors depending on volume and budget
- File-based / legacy: SFTP polling + Python scripts (often the only way)
- Event streams: Kafka if your enterprise already runs it; a managed service such as Amazon Kinesis or Confluent Cloud if not
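To make the CDC path concrete, here is a minimal Python sketch of how change events can be folded into a current-state table. It assumes events shaped like the payload of a Debezium change event (an `op` code of `c`, `u`, `d`, or `r`, plus `before` and `after` row images). The `apply_change_events` helper, the `id` key, and the sample rows are hypothetical, and a real pipeline would consume events from Kafka rather than from a list.

```python
from typing import Any

Row = dict[str, Any]


def apply_change_events(events: list[dict[str, Any]], key: str = "id") -> dict[Any, Row]:
    """Fold an ordered list of change events into the latest row per key."""
    state: dict[Any, Row] = {}
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the new image
            row = event["after"]
            state[row[key]] = row
        elif op == "d":  # delete: the key comes from the old image
            state.pop(event["before"][key], None)
        else:
            raise ValueError(f"unknown op code: {op!r}")
    return state


# Hypothetical events for a `customers` table, in commit order.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Acme", "tier": "basic"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Globex", "tier": "basic"}},
    {"op": "u", "before": {"id": 1, "name": "Acme", "tier": "basic"},
     "after": {"id": 1, "name": "Acme", "tier": "gold"}},
    {"op": "d", "before": {"id": 2, "name": "Globex", "tier": "basic"}, "after": None},
]

current = apply_change_events(events)
print(current)  # {1: {'id': 1, 'name': 'Acme', 'tier': 'gold'}}
```

The point of the sketch is that the source database is never queried directly: the analytical copy is rebuilt purely from the ordered stream of changes, which is why the operational system can stay where it is.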
The 80/20 rule: 80% of business-meaningful data comes from 20% of the systems. Start with those. The long tail can wait.
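One rough way to apply the 80/20 rule is to score each source system and pick the smallest set that reaches the target coverage. The sketch below assumes a hypothetical score per system (for example, the number of reports that depend on it); the system names, the scores, and the `pick_priority_systems` helper are illustrative, not measurements.

```python
def pick_priority_systems(scores: dict[str, float], coverage: float = 0.8) -> list[str]:
    """Return the fewest systems, largest first, whose combined score reaches `coverage`."""
    total = sum(scores.values())
    picked: list[str] = []
    running = 0.0
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        picked.append(name)
        running += score
    return picked


# Hypothetical scores: how many reports depend on each source system.
report_dependencies = {
    "salesforce": 400,
    "sap": 300,
    "billing_db": 150,
    "hr_system": 50,
    "legacy_sftp": 40,
    "dept_sql_1": 30,
    "dept_sql_2": 20,
    "excel_exports": 10,
}

first_wave = pick_priority_systems(report_dependencies)
print(first_wave)  # ['salesforce', 'sap', 'billing_db']
```

In this made-up inventory, three of eight systems cover 85% of the dependent reports; the other five form the long tail that can wait.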
Layer 2: raw storage (the lake)
Everything ingested lands in a data lake (S3, GCS, ADLS) in an open table format (Iceberg or Delta; see our data versioning piece). This is your historical record. Schema-on-read, no transformation yet.
Why open formats: every analytical tool can read them, you’re not locked into one query engine, time-travel is built in.
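As a small illustration of "schema-on-read, no transformation yet", the sketch below keeps raw records exactly as received and applies types only when a reader asks for them. It is a plain-Python illustration of the idea, not the Iceberg or Delta API; the `ORDER_SCHEMA` mapping, the `read_with_schema` helper, and the sample records are assumptions made for the example.

```python
import json
from typing import Any, Callable

# Raw zone: records are stored exactly as received, one JSON document per line.
raw_lines = [
    '{"order_id": "1001", "amount": "49.90", "currency": "USD"}',
    '{"order_id": "1002", "amount": "15", "currency": "EUR", "coupon": "SPRING"}',
    '{"order_id": "1003", "currency": "USD"}',
]

# The schema lives with the reader, not with the storage: field name -> cast function.
ORDER_SCHEMA: dict[str, Callable[[Any], Any]] = {
    "order_id": int,
    "amount": float,
    "currency": str,
}


def read_with_schema(lines: list[str], schema: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """Parse raw JSON lines and project them onto `schema` at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Missing fields become None. Extra fields (such as `coupon`) remain in the
        # raw zone untouched; this reader simply does not project them.
        rows.append({
            field: cast(record[field]) if record.get(field) is not None else None
            for field, cast in schema.items()
        })
    return rows


orders = read_with_schema(raw_lines, ORDER_SCHEMA)
print(orders[0])  # {'order_id': 1001, 'amount': 49.9, 'currency': 'USD'}
```

Because nothing is discarded on the way in, a later reader can adopt a different schema (for instance, one that includes `coupon`) without re-ingesting from the source.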
Layer 3: transformation (silver + gold)
Raw → cleaned & joined (silver) → business-ready models (gold). Almost always dbt for the SQL layer; sometimes Spark for compute-heavy transforms.
The hard part isn’t the technology. It’s getting business stakeholders to agree on definitions: what’s a “customer,” what’s “revenue,” what’s an “active account.” That’s where most projects stall.
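The raw → silver → gold flow, and the value of one agreed definition, can be sketched in a few lines. In practice this logic would usually live in dbt SQL models; the Python below is a stand-in for illustration. The 90-day "active customer" rule, the column names, and both helper functions are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical agreed definition: a customer is "active" if they placed an order
# in the last 90 days. One definition, owned in one place.
ACTIVE_WINDOW_DAYS = 90


def to_silver(raw_orders: list[dict]) -> list[dict]:
    """Raw -> silver: normalise ids and types, drop rows without a customer id."""
    silver = []
    for row in raw_orders:
        customer_id = (row.get("customer_id") or "").strip().upper()
        if not customer_id:
            continue
        silver.append({
            "customer_id": customer_id,
            "order_date": date.fromisoformat(row["order_date"]),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold_active_customers(silver_orders: list[dict], as_of: date) -> list[str]:
    """Silver -> gold: apply the single shared 'active customer' definition."""
    cutoff = as_of - timedelta(days=ACTIVE_WINDOW_DAYS)
    return sorted({
        row["customer_id"]
        for row in silver_orders
        if cutoff <= row["order_date"] <= as_of
    })


raw_orders = [
    {"customer_id": " c-1 ", "order_date": "2026-01-10", "amount": "120.00"},
    {"customer_id": "C-2", "order_date": "2025-06-01", "amount": "80"},
    {"customer_id": "", "order_date": "2026-01-12", "amount": "5"},
    {"customer_id": "c-1", "order_date": "2025-12-20", "amount": "40"},
]

active = to_gold_active_customers(to_silver(raw_orders), as_of=date(2026, 1, 31))
print(active)  # ['C-1']
```

Every downstream report that needs "active customers" reads the gold output instead of re-deriving it, so changing the window is a one-line change rather than a negotiation across five teams.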
Layer 4: serving (warehouse + real-time)
Depending on the workload:
- Analytics + BI at warehouse scale: Snowflake, Databricks, or BigQuery for the heavy lifting
- Real-time / user-facing analytics: ClickHouse (ClickHouse Cloud or self-hosted) for sub-second query needs
- Operational reverse-ETL: tools like Hightouch or Census to push curated data back into operational systems (Salesforce, marketing tools) so they get the unified view too
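The core of a reverse-ETL flow is a diff: compare the curated rows with what the operational system already holds and write only what changed. The sketch below shows that step under stated assumptions; it is not the Hightouch or Census API, and the `plan_reverse_etl_sync` helper, the `account_id` key, and the `health_score` field are hypothetical.

```python
def plan_reverse_etl_sync(gold_rows: list[dict], destination_rows: list[dict],
                          key: str = "account_id") -> dict[str, list[dict]]:
    """Compare curated rows with the destination and return only the writes needed."""
    dest_by_key = {row[key]: row for row in destination_rows}
    to_create, to_update = [], []
    for row in gold_rows:
        existing = dest_by_key.get(row[key])
        if existing is None:
            to_create.append(row)   # not in the operational system yet
        elif existing != row:
            to_update.append(row)   # present, but the curated values differ
    return {"create": to_create, "update": to_update}


# Curated account scores from the gold layer (hypothetical).
gold = [
    {"account_id": "A1", "health_score": 82},
    {"account_id": "A2", "health_score": 40},
    {"account_id": "A3", "health_score": 67},
]
# What the CRM currently holds for the same fields (hypothetical).
crm = [
    {"account_id": "A1", "health_score": 82},
    {"account_id": "A2", "health_score": 55},
]

plan = plan_reverse_etl_sync(gold, crm)
print(plan)
# {'create': [{'account_id': 'A3', 'health_score': 67}],
#  'update': [{'account_id': 'A2', 'health_score': 40}]}
```

Sending only the delta keeps API usage against the operational system low, which matters when the destination enforces rate limits.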
Layer 5: governance (across all layers)
The piece that holds it all together:
- Catalog: Unity Catalog, AWS Glue Data Catalog, or open-source DataHub, with every dataset registered along with its owner, description, schema, and lineage
- Access controls: column-level and row-level where compliance demands; role-based at minimum
- Audit logging: every query, every download, every export — for regulated industries this is non-negotiable
- Data quality monitoring: Great Expectations, Soda, or Monte Carlo — alerts when production data shapes change
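To show what "alerts when production data shapes change" can mean in practice, here is a minimal shape check written in plain Python. It does not use the Great Expectations, Soda, or Monte Carlo APIs; the `check_shape` helper, the 5% null-rate threshold, and the sample batch are assumptions for illustration.

```python
def check_shape(rows: list[dict], expected_columns: set[str],
                max_null_rate: float = 0.05) -> list[str]:
    """Return human-readable alerts; an empty list means the batch looks as expected."""
    if not rows:
        return ["batch is empty"]
    alerts = []
    seen = set().union(*(row.keys() for row in rows))
    for col in sorted(expected_columns - seen):
        alerts.append(f"missing column: {col}")
    for col in sorted(seen - expected_columns):
        alerts.append(f"unexpected column: {col}")
    for col in sorted(expected_columns & seen):
        null_rate = sum(1 for row in rows if row.get(col) is None) / len(rows)
        if null_rate > max_null_rate:
            alerts.append(f"null rate {null_rate:.0%} in column: {col}")
    return alerts


# Hypothetical batch from a `customers` feed.
batch = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": None, "plan": "free"},
    {"id": 3, "email": None, "plan": "pro", "region": "EU"},
    {"id": 4, "email": "d@example.com", "plan": "free"},
]

alerts = check_shape(batch, expected_columns={"id", "email", "plan", "signup_date"})
for alert in alerts:
    print(alert)
# missing column: signup_date
# unexpected column: region
# null rate 50% in column: email
```

A check like this would typically run after each load and before the gold models are refreshed, so a changed upstream schema stops at the gate instead of reaching a dashboard.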
Governance isn’t a phase 2 thing. Build it during phase 1 or it gets bolted on badly forever.
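Column-level access control and audit logging are cheap to wire in from the start if every read goes through one function. The sketch below is a toy version of that idea, not the access model of Unity Catalog or any specific warehouse; the `COLUMN_POLICY` mapping, the role names, and the `read_dataset` helper are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policy: which roles may see each sensitive column in clear text.
COLUMN_POLICY = {
    "email": {"support", "compliance"},
    "salary": {"compliance"},
}

audit_log: list[dict] = []  # in a real platform this would be an append-only store


def read_dataset(rows: list[dict], user: str, role: str, dataset: str) -> list[dict]:
    """Return rows with restricted columns masked for `role`, and record the access."""
    masked_columns = sorted(
        col for col, allowed_roles in COLUMN_POLICY.items() if role not in allowed_roles
    )
    result = [
        {col: ("***" if col in masked_columns else value) for col, value in row.items()}
        for row in rows
    ]
    # Every read is logged, whether or not anything was masked.
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "dataset": dataset,
        "masked_columns": masked_columns,
        "row_count": len(result),
    })
    return result


employees = [{"name": "Ada", "email": "ada@example.com", "salary": 120000}]

visible = read_dataset(employees, user="bob", role="support", dataset="hr.employees")
print(visible)  # [{'name': 'Ada', 'email': 'ada@example.com', 'salary': '***'}]
print(audit_log[-1]["masked_columns"])  # ['salary']
```

Retrofitting this later means finding every direct query path first, which is the "bolted on badly" outcome the paragraph above warns about.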
The sequencing that ships
The temptation: try to build all five layers across the entire enterprise at once. The pattern that works:
Quarter 1: Pick three high-value domains (e.g., customer master, financial reporting, operational metrics). Build layers 1-3 for those domains only. Ship a small set of high-impact dashboards or reports off the new platform.
Quarter 2: Add 2-3 more domains. Build out layer 4 (warehouse + real-time as needed). Add the first reverse-ETL flows.
Quarter 3: Governance layer matures. Catalog comprehensive. Access controls rolled out. Data quality monitoring on production datasets.
Quarters 4-8: Add remaining domains. By month 18, the platform covers 80% of business-meaningful data, the legacy ad-hoc reporting pipelines have started to atrophy (because the unified platform gives better answers faster), and consolidation is mostly done.
This is sequenced delivery, not a big-bang migration. Stakeholders see value every quarter. Political capital accumulates. The 80% solution is usually 100% of what’s needed.
The 4 anti-patterns that kill enterprise consolidations
Anti-pattern 1: replatform-and-then-rebuild. Migrate to the new warehouse, then start building business value. Six months of migration with nothing to show. Stakeholders lose faith. Fix: build new pipelines in parallel; show value on the new platform before retiring the old.
Anti-pattern 2: technical purity over pragmatism. “We won’t let any team use the old warehouse anymore.” Forces every team’s hand at once; creates resistance. Fix: let teams migrate at their own pace. The new platform’s better experience pulls them over.
Anti-pattern 3: ignoring the business semantics work. Build great pipes that surface the same definitional confusion. Five teams’ “active customer” definitions all flow through the new warehouse, and the teams are still arguing. Fix: governance + semantic modeling alongside the pipes, not after.
Anti-pattern 4: trying to consolidate operational systems. “We’ll migrate all 12 ERPs to one.” Multi-year project, business resistance, the old systems still need to run during migration anyway. Fix: leave operational systems federated; consolidate analytics only.
Tools we deploy by default
For new enterprise consolidation engagements:
- Ingestion: Debezium → Kafka for CDC, Fivetran or Airbyte for SaaS sources
- Lake: S3/GCS with Iceberg or Delta
- Transformation: dbt with advanced patterns — incremental models, snapshots, contracts, mesh
- Warehouse: Snowflake or Databricks per client preference; sometimes BigQuery for GCP-native
- Real-time serving: ClickHouse for user-facing analytics
- Orchestration: Dagster or Airflow per team familiarity
- Catalog + governance: Unity Catalog if Databricks; Atlan or DataHub otherwise
- Reverse-ETL: Hightouch or Census
The specific tool choices matter less than the architecture pattern: federated operational systems, consolidated analytical layer, governance from day one.
The thing nobody tells you
The hardest part of enterprise data consolidation isn’t technical. It’s politics. The legacy systems have owners who built their careers on them. The “we don’t trust the new platform” backlash is real. The third-party vendor whose connector you’re replacing has a long lunch with your CFO.
The platform that survives these is the one that delivers visible value early and often, and earns its territory rather than declaring it.
The pattern of patterns
Enterprise data consolidation in 2026 is a 12–24 month organizational program with technology in the supporting role. The technology stack matters; the sequencing, governance, and political navigation matter more.
The enterprises that consolidate successfully aren’t the ones with the most modern stack. They’re the ones who picked the right scope (analytics, not operations), delivered value every quarter, built governance in early, and let business units migrate on their own timeline rather than forcing it.
Enterprise data consolidation is a long game played one quarter at a time. If you’re scoping a multi-year platform consolidation and want a sane sequence, our data engineering service has shipped this for enterprises across multiple verticals. Tell us about the org.