dbt Beyond the Basics: Incremental Models, Snapshots, Contracts, and Mesh
dbt scales when you stop writing every model as a full refresh. The advanced patterns we use in production — and the ones we deliberately avoid.
dbt is one of those rare tools that’s easy to start with and hard to use well at scale. Most teams adopt it for the model-as-SQL ergonomics, and within a year they’re running 300+ models that take an hour to refresh because everything is materialized='table'. The advanced features — incrementals, snapshots, contracts, mesh — are what take dbt from “nice DSL for SQL” to “the backbone of an operational data platform.”
Here are the patterns we deploy by default, and the ones we deliberately don’t.
Incremental models, done right#
The single biggest performance win in dbt is moving large fact tables from table to incremental materialization. Run the full table once, then only process new/changed rows on subsequent runs.
The right shape:
{{
config(
materialized='incremental',
unique_key='order_id',
on_schema_change='append_new_columns',
incremental_strategy='merge'
)
}}
select
order_id,
customer_id,
amount,
updated_at
from {{ source('raw', 'orders') }}
{% if is_incremental() %}
where updated_at > (select coalesce(max(updated_at), '1900-01-01') from {{ this }})
{% endif %}
Three things that matter here:
unique_keywithmergestrategy. Don’t use the defaultappendfor anything that can change. Late-arriving updates will dupe.- The
is_incremental()guard. Full refresh on first run, incremental after. on_schema_change='append_new_columns'. When source adds a column, dbt adds it to the destination without manual intervention. Better than the defaultignore.
What we avoid: complex incremental logic that filters on application timestamps that can be backdated. If a row’s updated_at can be set to a value older than the previous run’s high-water mark, your incremental misses it forever. Add a dbt_loaded_at column or use a CDC source.
Snapshots: track changes you wish you’d captured#
Snapshots are dbt’s answer to slowly-changing dimensions. They write history to a table — for every row, you get a dbt_valid_from / dbt_valid_to range.
When we use them: any dimension that changes meaningfully over time and that downstream analysis cares about historical values — customer pricing tier, employee department, product status. Without snapshots, “what was this customer’s plan when they churned?” becomes unanswerable.
{% snapshot customer_plan_history %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['plan_tier', 'status', 'pricing_modifier']
)
}}
select * from {{ ref('customer_dim') }}
{% endsnapshot %}
We use check strategy (track changes on specific columns) over timestamp strategy (track changes via updated_at) — check is more robust to broken source timestamps.
What we avoid: snapshotting fact tables. Snapshots are for dimensions. A snapshot of a 10B-row events table is an operational nightmare.
Contracts: catch schema-breaking changes before they break dashboards#
Contracts (introduced in dbt 1.5) let you declare a model’s output schema as part of its config. dbt validates the model’s actual output matches at run time.
models:
- name: dim_customers
config:
contract:
enforced: true
columns:
- name: customer_id
data_type: bigint
constraints:
- type: not_null
- type: primary_key
- name: email
data_type: varchar(255)
- name: created_at
data_type: timestamp
If a refactor accidentally drops a column or changes a type, the model run fails — instead of silently breaking five downstream dashboards.
We add contracts to models that are: (a) consumed by other models in different projects (dbt-mesh), or (b) consumed by BI tools / reverse ETL with hardcoded column expectations. Internal staging models don’t need contracts.
dbt Mesh: multi-project dbt at scale#
Once your project hits 200+ models and multiple teams own different parts, the monolith starts to creak. dbt-mesh (1.6+) lets you split into multiple projects with explicit ref() across project boundaries.
The pattern:
- Platform project owns shared dimensions:
dim_customers,dim_products,dim_dates. Marks them asaccess: public. - Domain projects (marketing, finance, ops) consume the platform’s public models via
{{ ref('platform', 'dim_customers') }}. - Versioning on the platform models means consumers pin to a version and aren’t broken by upstream changes.
This is the right shape once you have multiple data teams. For a single team with a small project, mesh is overkill — keep it as a monolith.
What we deliberately don’t use#
ephemeral materialization. Looks elegant in theory (“a model that doesn’t materialize, just inlines into consumers”). In practice: makes debugging painful, sometimes causes the consuming query to exceed the warehouse’s compile limits. Use view or table instead.
Heavy use of macros. A macro for every repeated pattern looks DRY but obscures what’s actually running. Generally we limit macros to: cross-database compatibility ({{ dbt_utils.generate_surrogate_key(...) }}), date spines, and one-off transformations that genuinely repeat 5+ times. A macro used twice is just hiding the SQL.
Custom tests for everything. dbt’s built-in tests (not_null, unique, accepted_values, relationships) cover 80%. dbt_utils and dbt_expectations add another 15%. Custom tests are for the last 5% where domain-specific business rules need codifying.
pre_hook / post_hook for arbitrary SQL. Tempting for “before running this model, vacuum this table.” Becomes a hidden dependency. Pull it out into a macro that’s explicitly called.
var() and env_var() over-use. Some configurability is good. A model with 12 vars affecting its behavior is a feature flag system masquerading as a SQL file.
The orchestration question#
dbt Cloud or your own scheduler? We default to:
- Small projects (≤50 models, single team): dbt Cloud. The IDE, scheduler, and lineage UI pay off immediately.
- Medium projects: dbt Core via Airflow / Dagster (our orchestrator piece) on your own infrastructure. Cheaper at scale; integrates with the rest of your data platform.
- Large multi-team projects: dbt Core + Dagster for the strongest dev experience + mesh-aware lineage.
Dagster’s dbt integration treats every dbt model as a first-class asset, which is genuinely better than Cloud’s scheduler for complex setups.
What we deploy by default#
For a new operational data platform:
- dbt Core with a structured project layout (
staging/,intermediate/,marts/). - Incremental materialization on any fact table over 10M rows.
- Snapshots on key dimensions where history matters.
- Contracts on models that cross project or team boundaries.
- dbt tests in CI on every PR — failing tests block merge.
- Dagster (or sometimes Airflow if the org standardizes there) for orchestration.
- Mesh introduced only when there’s a real multi-team boundary to enforce.
For hospital management systems and banking platforms we ship, the dbt project is the operational analytical layer — driving dashboards, alerts, and downstream automation. See our piece on the data stack as an operational engine.
The pattern of patterns#
dbt rewards teams that treat it like a real software project: tests in CI, version control discipline, layer separation, deliberate use of advanced features. It punishes teams that treat it as “a place to dump SQL queries.”
The advanced features (incrementals, snapshots, contracts, mesh) aren’t there to be used everywhere. They’re there for the specific situations where they earn their keep. The teams that ship dbt well aren’t the ones with the most macros — they’re the ones who know which features to skip. For where dbt sits in a larger multi-system stack, see our enterprise data platform consolidation playbook.
dbt is a software project, not a SQL drawer. If you’re building a data platform and want a second opinion on the dbt structure, our data engineering team deploys dbt routinely. Tell us about the corpus.