The Modern Data Stack is an Operational Engine, Not a Library
Data Platforms should drive Operational Automation, not just dashboards. Why ClickHouse + Airflow + dbt is our default high-performance engine.
Most data teams measure success in dashboards shipped. They shouldn’t. Dashboards are the read-only output of a system that should be doing far more interesting work.
The warehouse-as-archive antipattern
The dominant pattern of the last decade: pump every event into Snowflake, model it in dbt, expose it in Looker, hand it to the BI team. Done.
This is the warehouse-as-archive antipattern. The data sits there. Analysts query it. Reports come out weekly. Nobody on the operational side of the business ever feels the warehouse move under them.
The warehouse is full of value, and almost none of it is captured. Because reading data is the easy part. The hard part — and the part that compounds — is making the data actually do something.
Closing the loop: data → decision → action
The pattern that turns a Data Platform into an operational engine looks like this (a minimal Airflow sketch follows the list):
1. Ingest events from product, marketing, and operational systems into a warehouse or lakehouse.
2. Transform in dbt into modeled tables. Test every model.
3. Score / compute the operational signal in the warehouse — model output, threshold check, anomaly score.
4. Sync the signal back to the operational tool where the work happens.
5. Trigger the automated action — alert, email, ticket, workflow.
6. Measure whether the action moved the metric you cared about.
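Here’s the loop as one Airflow DAG, as a minimal sketch (Airflow 2.x); the DAG id, schedule, commands, and task callables are illustrative placeholders, not a reference implementation.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def score_signals():
    """Step 3: compute the operational signal over the modeled tables."""
    ...


def sync_to_crm():
    """Step 4: write scores back to the tool where the work happens."""
    ...


def trigger_actions():
    """Step 5: fire the alert / ticket / workflow for flagged rows."""
    ...


def measure_impact():
    """Step 6: log whether the action moved the metric. Don't skip this."""
    ...


with DAG(
    dag_id="operational_loop",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+ argument name
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
):
    ingest = BashOperator(task_id="ingest", bash_command="run-ingest")  # step 1 (placeholder command)
    transform = BashOperator(task_id="dbt_build", bash_command="dbt build")  # step 2
    score = PythonOperator(task_id="score", python_callable=score_signals)
    sync = PythonOperator(task_id="sync", python_callable=sync_to_crm)
    act = PythonOperator(task_id="trigger", python_callable=trigger_actions)
    measure = PythonOperator(task_id="measure", python_callable=measure_impact)

    ingest >> transform >> score >> sync >> act >> measure
```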
Step 6 is the one most teams skip. Without it you have Operational Automation, not a feedback loop, and the system slowly drifts from useful to wrong.
Examples we’ve shipped
- A customer-health score computed in dbt, synced to Salesforce as a field (write-back sketched after this list). CSMs see “at risk” before churn, not after.
- A lead-quality score from a model running over the warehouse, written back to HubSpot. Sales works the right leads first.
- A patient no-show predictor wired into a Hospital Management System that triggers a reminder pipeline 48 hours before high-risk appointments. Recovered hours of clinician time per week.
- A student dropout-risk model wired into a School ERP that surfaces intervention candidates to the counsellor’s dashboard before grades fall.
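The write-back is usually the smallest part of the build. A minimal sketch of the first example’s Salesforce sync, using the third-party simple-salesforce client; the credentials, record ID, and the Health_Score__c / At_Risk__c custom fields are hypothetical.

```python
from simple_salesforce import Salesforce

sf = Salesforce(
    username="ops@example.com",
    password="...",           # pull real credentials from a secrets manager
    security_token="...",
)

# In production these rows come from the warehouse scoring query (step 3).
scored_accounts = [
    {"sf_id": "001xx000003DGbQAAW", "health_score": 0.31, "at_risk": True},
]

for row in scored_accounts:
    # One field update per account; CSMs see the score where they already work.
    sf.Account.update(row["sf_id"], {
        "Health_Score__c": row["health_score"],
        "At_Risk__c": row["at_risk"],
    })
```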
In every case the warehouse stopped being an archive and started being an operational source of truth. That’s the inflection point where Data Platforms pay back their entire investment.
Our default high-performance stack
For most Data Platform work, three tools punch above their weight:
ClickHouse for the read side
Sub-second analytical queries over billions of rows on hardware that costs a fraction of equivalent Snowflake spend. When latency matters — and for an operational system, it always does — ClickHouse wins. We use it for cohort analysis, real-time anomaly detection, and product analytics shipped inside customer-facing UIs, including Hospital Management dashboards and School ERP reports.
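What that looks like from application code, in a minimal sketch using the clickhouse-connect client; the host, events table, and column names are illustrative, not a real schema.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal", port=8123)

# Weekly retention cohorts straight off the raw events table; queries of
# this shape are exactly where ClickHouse stays sub-second at scale.
result = client.query("""
    SELECT
        toStartOfWeek(first_seen) AS cohort_week,
        dateDiff('week', first_seen, event_time) AS week_n,
        uniqExact(user_id) AS active_users
    FROM events
    INNER JOIN (
        SELECT user_id, min(event_time) AS first_seen
        FROM events
        GROUP BY user_id
    ) AS firsts USING (user_id)
    GROUP BY cohort_week, week_n
    ORDER BY cohort_week, week_n
""")

for cohort_week, week_n, active_users in result.result_rows:
    print(cohort_week, week_n, active_users)
```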
Airflow for orchestration
Yes, it’s old. Yes, the DAG syntax is awkward. But it has the broadest operator catalog, the most battle-tested retry semantics, and a community that has fixed every weird edge case you’ll hit at scale. Newer orchestrators (Dagster, Prefect) are excellent; Airflow is the safe choice that lets you focus on the pipeline, not the orchestrator.
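Those retry semantics are mostly task arguments you set once. A minimal sketch; the DAG id, task, and values are illustrative.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(dag_id="retry_demo", start_date=datetime(2024, 1, 1), schedule=None):
    resilient_sync = PythonOperator(
        task_id="sync_scores",
        python_callable=lambda: None,             # placeholder callable
        retries=5,                                # re-run transient failures
        retry_delay=timedelta(minutes=2),         # base wait between attempts
        retry_exponential_backoff=True,           # 2, 4, 8 ... minutes
        max_retry_delay=timedelta(minutes=30),    # cap the backoff
        execution_timeout=timedelta(minutes=10),  # kill hung runs
    )
```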
dbt for the model layer
Tests on every model. Docs generated from the model layer. Semantic definitions that finance, ops, and product can agree on. dbt’s modeling discipline is what stops a warehouse from becoming a write-only swamp.
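In the orchestrator, “tests on every model” can be a single gated step via dbt’s programmatic entry point (dbt-core 1.5+); the selector here is illustrative.

```python
from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# `dbt build` runs models, tests, snapshots, and seeds in DAG order;
# --fail-fast stops at the first failure so bad data never ships downstream.
res = dbt.invoke(["build", "--select", "marts.operational", "--fail-fast"])

if not res.success:
    raise RuntimeError("dbt build failed; refusing to sync stale scores")
```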
The combination — ClickHouse for fast queries, Airflow for orchestrated jobs, dbt for the model layer between them — is our default operational engine. It stays lean, scales, and is still cheap at 10× the data volume.
Performance first, always
Operational data only works if it’s fresh and fast. A churn signal that arrives a week late is a postmortem, not an intervention. A model score that takes 30 seconds to compute can’t power a customer-facing feature, no matter how accurate.
Build for latency from the start:
- Set freshness SLAs per dataset and alert when missed (see the sketch after this list).
- Profile every dbt model; the slowest 5% are usually 95% of the cost.
- Pick the right materialisation — `view` is rarely the answer; `incremental` is rarely wrong.
- Right-size your warehouse. The cheapest cluster that meets your latency SLA is the right cluster.
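The freshness bullet is the easiest to automate. A minimal sketch, assuming each dataset has an event_time column and reusing the clickhouse-connect client; the table names, thresholds, and alert hook are placeholders.

```python
import clickhouse_connect

# Freshness SLA per dataset, in minutes.
SLAS = {"events": 15, "customer_health": 60}

client = clickhouse_connect.get_client(host="clickhouse.internal")

stale = []
for table, minutes in SLAS.items():
    # Compare in SQL so Python never has to reason about timezones.
    breached = client.query(
        f"SELECT max(event_time) < now() - INTERVAL {minutes} MINUTE FROM {table}"
    ).result_rows[0][0]
    if breached:
        stale.append(table)

if stale:
    # Swap in real alerting here (PagerDuty, Slack, email).
    print(f"FRESHNESS SLA MISSED: {', '.join(stale)}")
```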
Data Platforms that don’t ship as operational engines end up as expensive reporting backends. The teams that get the most out of the modern data stack are the ones who realised, somewhere along the way, that the warehouse was supposed to do things, not just show them.
Your warehouse is full of value. Most of it is sitting still. If you want a Data Platform that actually drives Operational Automation, we’ll help you ship the loop. One focused engagement. Working automation by the end.