Feature Engineering for Tabular ML in 2026

LLMs absorbed a lot of oxygen. Meanwhile, tabular ML still drives credit decisions, fraud screening, inventory forecasting, churn prediction, and most of the boring high-value problems in the enterprise. XGBoost, LightGBM, and CatBoost didn’t get worse because GPT-5 launched.

Feature engineering on this stack is where most teams leave 5–15 points of AUC on the table. These are the patterns that still earn their keep in 2026.

Aggregations that respect time

Counting “transactions in the last 30 days” sounds trivial. Doing it without leakage is harder than most teams realize. Common mistakes:

Using future data because the join didn’t filter on the prediction time
Mixing labeled and unlabeled aggregation windows
Off-by-one errors at time-zone or daylight-saving boundaries
Recomputing aggregates with code that differs from training to serving

A feature store (Feast or Tecton) helps here. Even without one, write the aggregation code once and call it from both training and serving.
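As a rough illustration of a leakage-safe window count, the sketch below counts each customer's events in the 30 days strictly before the prediction time. It uses pandas; the function name, the column names (`customer_id`, `event_time`, `prediction_time`), and the sample data are all hypothetical. The merge pairs every prediction row with every event of the same customer, which is fine for a demo but would need a more memory-friendly approach at production volume.

```python
import pandas as pd


def count_prior_events(events, cutoffs, window="30D",
                       key="customer_id", ts="event_time", at="prediction_time"):
    """Count each entity's events in [prediction_time - window, prediction_time)."""
    window = pd.Timedelta(window)
    rows = cutoffs[[key, at]].copy()
    rows["_row"] = range(len(rows))  # stable id for each prediction row
    # Pair every prediction row with all events of the same entity.
    pairs = rows.merge(events[[key, ts]], on=key, how="left")
    # Strictly before the prediction time, so nothing from the future leaks in.
    in_window = (pairs[ts] < pairs[at]) & (pairs[ts] >= pairs[at] - window)
    counts = in_window.groupby(pairs["_row"]).sum()
    return pd.Series(counts.to_numpy(), index=cutoffs.index, name="txn_count_30d")


# Hypothetical data: customer "a" has an event exactly at one prediction time,
# customer "b" only has a future event, customer "c" has no history at all.
events = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b"],
    "event_time": pd.to_datetime(
        ["2026-01-01", "2026-01-20", "2026-02-05", "2026-01-15"]),
})
cutoffs = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c"],
    "prediction_time": pd.to_datetime(
        ["2026-01-25", "2026-02-05", "2026-01-10", "2026-01-25"]),
})
counts = count_prior_events(events, cutoffs)
print(counts.tolist())  # [2, 1, 0, 0]
```

Calling one function like this from both the training pipeline and the serving path is the simplest way to keep the two from drifting apart.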

Encodings beyond one-hot

For high-cardinality categoricals (customer ID, ZIP, merchant ID), one-hot is usually wrong. Better patterns:

Target encoding with smoothing. Replace each category with the smoothed average target. Strong signal, but prone to leakage if you forget to compute it out-of-fold. CatBoost handles this natively with ordered target statistics; for XGBoost/LightGBM, do it yourself with care.
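One way to do this by hand is sketched below with pandas and NumPy. The function name, the fold assignment, and the smoothing weight of 10 are illustrative assumptions, not a recommended default. Each row is encoded using statistics from the other folds only, and small categories are pulled toward the overall mean.

```python
import numpy as np
import pandas as pd


def oof_target_encode(cat, y, n_splits=5, smoothing=10.0, seed=0):
    """Out-of-fold smoothed mean-target encoding for one categorical column."""
    cat = pd.Series(cat).reset_index(drop=True)
    y = pd.Series(y, dtype=float).reset_index(drop=True)
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(cat)) % n_splits  # random, near-equal folds
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)
    for k in range(n_splits):
        held_out = fold == k
        prior = y[~held_out].mean()  # global mean of the other folds
        stats = y[~held_out].groupby(cat[~held_out]).agg(["sum", "count"])
        # Shrink each category mean toward the prior; rare categories shrink most.
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Categories unseen in the other folds fall back to the prior.
        encoded[held_out] = cat[held_out].map(smoothed).fillna(prior).to_numpy()
    return encoded


# Hypothetical data: merchant "z" appears once, with a positive label.
merchants = ["a"] * 10 + ["z"]
labels = [0] * 10 + [1]
enc = oof_target_encode(merchants, labels)
print(enc.iloc[-1])  # 0.0: the singleton's own label never reaches its encoding
```

A naive per-category mean would encode "z" as 1.0, which is the label itself. At serving time the mapping would be refit on the full training set, since there is no label to leak.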

Frequency encoding. Replace each category with how often it appears in the training data. Cheap, often surprisingly useful.
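A minimal pandas sketch, assuming the counts are fit on training data and reused unchanged at serving time (the helper names and sample values are hypothetical):

```python
import pandas as pd


def fit_frequency_map(train_col):
    """Count how often each category occurs in the training column."""
    return train_col.value_counts()


def apply_frequency_map(col, freq):
    """Map categories to their training counts; unseen categories get 0."""
    return col.map(freq).fillna(0).astype(int)


train_merchants = pd.Series(["m1", "m1", "m2"])
serving_merchants = pd.Series(["m1", "m3"])
freq = fit_frequency_map(train_merchants)
print(apply_frequency_map(serving_merchants, freq).tolist())  # [2, 0]
```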

Embedding from a small NN. Train a tiny network to embed the category into a dense vector; use the vector as input to your tree model. Higher ceiling, more pipeline complexity.

Time features that aren’t naive

“Day of week” is a feature. “Is this a payday for this customer cohort” is a better feature. “Hours since last transaction” is often more useful than “transaction timestamp.”

Useful time derivations:

Time since previous event (per entity)
Time of day in entity’s local timezone
Calendar features (holiday, payday, fiscal close)
Velocity features (rate of change)

A timestamp by itself is a wasted column.
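Two of these derivations are sketched below with pandas: hours since the same entity's previous event, and hour of day in the entity's local timezone. The column names are hypothetical, and the sketch assumes timestamps are stored as timezone-aware UTC with an IANA timezone name available per row.

```python
import pandas as pd


def add_time_features(df, key="customer_id", ts="event_time", tz_col="timezone"):
    """Add per-entity recency and local-hour features to an event table."""
    out = df.sort_values([key, ts]).copy()
    # Hours since the same entity's previous event; NaN for its first event.
    prev = out.groupby(key)[ts].shift()
    out["hours_since_prev"] = (out[ts] - prev).dt.total_seconds() / 3600
    # Hour of day in the entity's own timezone, converted row by row.
    out["local_hour"] = [t.tz_convert(z).hour for t, z in zip(out[ts], out[tz_col])]
    return out.sort_index()  # restore the caller's row order


events = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "event_time": pd.to_datetime(
        ["2026-03-01 12:00", "2026-03-01 18:30", "2026-03-01 23:00"], utc=True),
    "timezone": ["America/New_York", "America/New_York", "Asia/Tokyo"],
})
feats = add_time_features(events)
print(feats["local_hour"].tolist())  # [7, 13, 8]
```

The row-by-row conversion is slow on large tables; grouping rows by timezone and converting each group at once would be the obvious next step.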

Interaction features

Tree models can learn interactions, but they often need a hint at low signal levels. Manually constructed interactions that often help:

amount / customer_avg_amount (ratio to entity-specific baseline)
is_new_merchant_for_this_customer (boolean derived from history)
device_x_country_pair_seen_before (combined-key lookup)

Build the interactions that domain experts use mentally. Don’t build a Cartesian product of every column.
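The first two interactions above could be built roughly as follows. This is a sketch with hypothetical column names, and it assumes event timestamps are distinct per customer so the sort order is unambiguous. Both features use only rows that precede the current one, so the baseline never includes the transaction being scored.

```python
import pandas as pd


def add_interactions(df):
    """Add a ratio-to-baseline feature and a first-seen-merchant flag."""
    out = df.sort_values("event_time").copy()
    by_customer = out.groupby("customer_id")["amount"]
    # Mean of the customer's earlier amounts only (shift drops the current row).
    prior_avg = by_customer.transform(lambda s: s.shift().expanding().mean())
    out["amount_ratio"] = out["amount"] / prior_avg  # NaN on a customer's first event
    # True the first time this (customer, merchant) pair appears in the history.
    pair_rank = out.groupby(["customer_id", "merchant_id"]).cumcount()
    out["is_new_merchant"] = pair_rank == 0
    return out.sort_index()


txns = pd.DataFrame({
    "customer_id": ["a", "a", "a"],
    "merchant_id": ["m1", "m1", "m2"],
    "amount": [10.0, 30.0, 40.0],
    "event_time": pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-03"]),
})
inter = add_interactions(txns)
print(inter["amount_ratio"].tolist()[1:])  # [3.0, 2.0]
print(inter["is_new_merchant"].tolist())   # [True, False, True]
```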

Leakage audits

Before declaring a model ready:

Compute feature-target correlations on the training set; flag anything suspiciously high
For top-correlated features, manually trace the data lineage end to end
Run the same prediction code at training time and serving time; compare the outputs on the same row

Most “amazing eval scores” we audit turn out to be one leaked feature. Catch them in the audit, not in the postmortem.
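The first audit step can be as small as the sketch below. The 0.9 threshold, the function name, and the sample columns are assumptions for illustration; a flag is only a prompt to trace lineage, not proof of leakage, and non-numeric features need a separate check.

```python
import pandas as pd


def flag_suspicious_features(X, y, threshold=0.9):
    """Return numeric features whose absolute correlation with y exceeds threshold."""
    corr = X.select_dtypes("number").corrwith(y).abs()
    corr = corr.sort_values(ascending=False)
    return corr[corr > threshold]


# Hypothetical training set: "refund_issued" is recorded after the outcome
# and mirrors the label exactly, which is the classic leaked column.
y = pd.Series([0, 1, 0, 1, 1, 0])
X = pd.DataFrame({
    "amount": [10, 20, 30, 15, 25, 5],
    "refund_issued": [0, 1, 0, 1, 1, 0],
})
flagged = flag_suspicious_features(X, y)
print(list(flagged.index))  # ['refund_issued']
```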

When to stop

A model with 50 features that scores 0.86 AUC and a model with 500 features that scores 0.87 are not equivalent. The second is harder to debug, slower to retrain, more expensive to serve, and more prone to drift. Each feature has a maintenance cost. Cut aggressively.

We often ship models with 20–80 features. Beyond that the marginal contribution rarely justifies the operational burden.

What’s actually different in 2026

Two shifts worth noting:

LLM-generated feature suggestions. Tools that read your schema and propose features can save discovery time. Audit every suggestion: they are often plausible but redundant or leaky.

Foundation tabular models (TabPFN-V2, others). These compete with tree models on small datasets and have non-trivial cost-quality tradeoffs. For most enterprise tabular work, boosting still wins on the production envelope.

What we ship by default

For tabular ML engagements via our DevOps practice:

Aggregation code shared between training and serving
Feature store for high-cardinality encodings
Leakage audit before any deployment
Feature catalog with owner, source, refresh cadence
Model with the smallest feature set that hits the quality bar
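A feature catalog does not need special tooling to start. The sketch below is one possible shape, with hypothetical field values and a hypothetical helper that lists model features missing a catalog entry; the three recorded fields mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """One catalog entry: who owns the feature, where it comes from, how fresh it is."""
    name: str
    owner: str
    source: str
    refresh_cadence: str


CATALOG = [
    FeatureSpec("txn_count_30d", "risk-data", "payments.transactions", "hourly"),
    FeatureSpec("amount_ratio", "risk-data", "payments.transactions", "hourly"),
]


def missing_from_catalog(model_features, catalog):
    """Return model features that have no catalog entry, sorted by name."""
    known = {spec.name for spec in catalog}
    return sorted(set(model_features) - known)


print(missing_from_catalog(["txn_count_30d", "local_hour"], CATALOG))  # ['local_hour']
```

Running a check like this in CI keeps undocumented features from reaching a deployed model.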

Feature engineering isn’t outdated. It’s still the highest-leverage work on tabular projects.
