dbt Advanced Patterns in Production

dbt is one of those rare tools that’s easy to start with and hard to use well at scale. Most teams adopt it for the model-as-SQL ergonomics, and within a year they’re running 300+ models that take an hour to refresh because everything is materialized='table'. The advanced features — incrementals, snapshots, contracts, mesh — are what take dbt from “nice DSL for SQL” to “the backbone of an operational data platform.”

Here are the patterns we deploy by default, and the ones we deliberately don’t.

Incremental models, done right#

The single biggest performance win in dbt is moving large fact tables from table to incremental materialization. Run the full table once, then only process new/changed rows on subsequent runs.

The right shape:

{{
  config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='append_new_columns',
    incremental_strategy='merge'
  )
}}

select
  order_id,
  customer_id,
  amount,
  updated_at
from {{ source('raw', 'orders') }}

{% if is_incremental() %}
  where updated_at > (select coalesce(max(updated_at), '1900-01-01') from {{ this }})
{% endif %}

Three things that matter here:

unique_key with merge strategy. Don’t use the default append for anything that can change. Late-arriving updates will dupe.
The is_incremental() guard. Full refresh on first run, incremental after.
on_schema_change='append_new_columns'. When source adds a column, dbt adds it to the destination without manual intervention. Better than the default ignore.

What we avoid: complex incremental logic that filters on application timestamps that can be backdated. If a row’s updated_at can be set to a value older than the previous run’s high-water mark, your incremental misses it forever. Add a dbt_loaded_at column or use a CDC source.

Snapshots: track changes you wish you’d captured#

Snapshots are dbt’s answer to slowly-changing dimensions. They write history to a table — for every row, you get a dbt_valid_from / dbt_valid_to range.

When we use them: any dimension that changes meaningfully over time and that downstream analysis cares about historical values — customer pricing tier, employee department, product status. Without snapshots, “what was this customer’s plan when they churned?” becomes unanswerable.

{% snapshot customer_plan_history %}
{{
  config(
    target_database='analytics',
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['plan_tier', 'status', 'pricing_modifier']
  )
}}
select * from {{ ref('customer_dim') }}
{% endsnapshot %}

We use check strategy (track changes on specific columns) over timestamp strategy (track changes via updated_at) — check is more robust to broken source timestamps.

What we avoid: snapshotting fact tables. Snapshots are for dimensions. A snapshot of a 10B-row events table is an operational nightmare.

Contracts: catch schema-breaking changes before they break dashboards#

Contracts (introduced in dbt 1.5) let you declare a model’s output schema as part of its config. dbt validates the model’s actual output matches at run time.

models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: bigint
        constraints:
          - type: not_null
          - type: primary_key
      - name: email
        data_type: varchar(255)
      - name: created_at
        data_type: timestamp

If a refactor accidentally drops a column or changes a type, the model run fails — instead of silently breaking five downstream dashboards.

We add contracts to models that are: (a) consumed by other models in different projects (dbt-mesh), or (b) consumed by BI tools / reverse ETL with hardcoded column expectations. Internal staging models don’t need contracts.

dbt Mesh: multi-project dbt at scale#

Once your project hits 200+ models and multiple teams own different parts, the monolith starts to creak. dbt-mesh (1.6+) lets you split into multiple projects with explicit ref() across project boundaries.

The pattern:

Platform project owns shared dimensions: dim_customers, dim_products, dim_dates. Marks them as access: public.
Domain projects (marketing, finance, ops) consume the platform’s public models via {{ ref('platform', 'dim_customers') }}.
Versioning on the platform models means consumers pin to a version and aren’t broken by upstream changes.

This is the right shape once you have multiple data teams. For a single team with a small project, mesh is overkill — keep it as a monolith.

What we deliberately don’t use#

ephemeral materialization. Looks elegant in theory (“a model that doesn’t materialize, just inlines into consumers”). In practice: makes debugging painful, sometimes causes the consuming query to exceed the warehouse’s compile limits. Use view or table instead.

Heavy use of macros. A macro for every repeated pattern looks DRY but obscures what’s actually running. Generally we limit macros to: cross-database compatibility ({{ dbt_utils.generate_surrogate_key(...) }}), date spines, and one-off transformations that genuinely repeat 5+ times. A macro used twice is just hiding the SQL.

Custom tests for everything. dbt’s built-in tests (not_null, unique, accepted_values, relationships) cover 80%. dbt_utils and dbt_expectations add another 15%. Custom tests are for the last 5% where domain-specific business rules need codifying.

pre_hook / post_hook for arbitrary SQL. Tempting for “before running this model, vacuum this table.” Becomes a hidden dependency. Pull it out into a macro that’s explicitly called.

var() and env_var() over-use. Some configurability is good. A model with 12 vars affecting its behavior is a feature flag system masquerading as a SQL file.

The orchestration question#

dbt Cloud or your own scheduler? We default to:

Small projects (≤50 models, single team): dbt Cloud. The IDE, scheduler, and lineage UI pay off immediately.
Medium projects: dbt Core via Airflow / Dagster (our orchestrator piece) on your own infrastructure. Cheaper at scale; integrates with the rest of your data platform.
Large multi-team projects: dbt Core + Dagster for the strongest dev experience + mesh-aware lineage.

Dagster’s dbt integration treats every dbt model as a first-class asset, which is genuinely better than Cloud’s scheduler for complex setups.

What we deploy by default#

For a new operational data platform:

dbt Core with a structured project layout (staging/, intermediate/, marts/).
Incremental materialization on any fact table over 10M rows.
Snapshots on key dimensions where history matters.
Contracts on models that cross project or team boundaries.
dbt tests in CI on every PR — failing tests block merge.
Dagster (or sometimes Airflow if the org standardizes there) for orchestration.
Mesh introduced only when there’s a real multi-team boundary to enforce.

For hospital management systems and banking platforms we ship, the dbt project is the operational analytical layer — driving dashboards, alerts, and downstream automation. See our piece on the data stack as an operational engine.

The pattern of patterns#

dbt rewards teams that treat it like a real software project: tests in CI, version control discipline, layer separation, deliberate use of advanced features. It punishes teams that treat it as “a place to dump SQL queries.”

The advanced features (incrementals, snapshots, contracts, mesh) aren’t there to be used everywhere. They’re there for the specific situations where they earn their keep. The teams that ship dbt well aren’t the ones with the most macros — they’re the ones who know which features to skip. For where dbt sits in a larger multi-system stack, see our enterprise data platform consolidation playbook.

dbt is a software project, not a SQL drawer. If you’re building a data platform and want a second opinion on the dbt structure, our data engineering team deploys dbt routinely. Tell us about the corpus.

Incremental models, done right#

Snapshots: track changes you wish you’d captured#

Contracts: catch schema-breaking changes before they break dashboards#

dbt Mesh: multi-project dbt at scale#

What we deliberately don’t use#

The orchestration question#

What we deploy by default#

The pattern of patterns#

Related posts.

Enterprise Data Platform Consolidation: From 50 Systems to One Source of Truth

Operational Automation That Ships: Boring Tools Win

The AI IPO Race Is an Engineering Problem, Not a Stock Tip