Feature Engineering for Tabular Models in 2026
Foundation models didn't kill tabular ML. The feature engineering patterns that still move metrics on structured business data.
LLMs absorbed most of the oxygen. Meanwhile, tabular ML still drives credit decisions, fraud screening, inventory forecasting, churn prediction, and most of the boring, high-value problems in the enterprise. XGBoost, LightGBM, and CatBoost didn’t get worse because GPT-5 launched.
Feature engineering on this stack is where most teams leave 5–15 points of AUC on the table. Here are the patterns that still earn their keep in 2026.
Aggregations that respect time
Counting “transactions in the last 30 days” sounds trivial. Doing it without leakage is harder than most teams realize. Common mistakes:
- Using future data because the join didn’t filter on the prediction time
- Mixing labeled and unlabeled aggregation windows
- Off-by-one errors at time-zone or daylight-saving-time boundaries
- Computing aggregates with different code paths for training and for serving
A feature store (Feast or Tecton) helps here. Even without one, write the aggregation code once and call it from both training and serving.
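Below is a minimal plain-Python sketch of a leakage-safe window count. It assumes each transaction is a timezone-aware timestamp tagged with an entity ID. `Txn` and `count_in_window` are hypothetical names and do not come from Feast, Tecton, or any other library. The function drops every event at or after the prediction time, and it does its comparisons in UTC so that a daylight-saving transition cannot stretch or shrink the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Txn:
    entity_id: str
    ts: datetime  # must be timezone-aware


def count_in_window(txns: list[Txn], entity_id: str, as_of: datetime,
                    window: timedelta = timedelta(days=30)) -> int:
    """Count an entity's transactions in the half-open interval [as_of - window, as_of).

    Anything at or after `as_of` is excluded, so a training row never sees its
    own future. Call it with the row's prediction time in training and with
    the current time in serving.
    """
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    end = as_of.astimezone(timezone.utc)
    start = end - window  # fixed-length window in UTC, unaffected by local DST shifts
    return sum(
        1 for t in txns
        if t.entity_id == entity_id and start <= t.ts.astimezone(timezone.utc) < end
    )
```

During training, call the function once per labeled row and pass that row's prediction timestamp as `as_of`. During serving, call the same function with the current time. The linear scan is only for illustration; a production version would read pre-bucketed counts instead.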
Encodings beyond one-hot
For high-cardinality categoricals (customer ID, ZIP, merchant ID), one-hot is usually wrong. Better patterns:
Target encoding with smoothing. Replace each category with a smoothed mean of the target for that category. The signal is strong, but it leaks unless you compute it out-of-fold. CatBoost does this natively; with XGBoost or LightGBM, you have to do it yourself, carefully.
Frequency encoding. Just the count. Cheap, often surprisingly useful.
Embedding from a small NN. Train a tiny network to embed the category into a dense vector; use the vector as input to your tree model. Higher ceiling, more pipeline complexity.
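The sketch below shows out-of-fold smoothed target encoding and frequency encoding in plain Python. The smoothing rule, (sum of targets + m * prior) / (count + m), is one common formulation. Two settings are arbitrary assumptions: the default `m=10` and round-robin fold assignment (in real pipelines, shuffle or split by time). This does not reproduce CatBoost's built-in ordered target statistics, which work differently.

```python
from collections import Counter, defaultdict


def smoothed_means(cats: list[str], ys: list[float], m: float) -> tuple[dict[str, float], float]:
    """Map category -> (sum_y + m * prior) / (n + m); also return the prior (global mean)."""
    prior = sum(ys) / len(ys)
    sums: dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}, prior


def oof_target_encode(cats: list[str], ys: list[float], n_folds: int = 5, m: float = 10.0) -> list[float]:
    """Encode each row using statistics from the other folds only, so its own label never leaks in."""
    if n_folds < 2:
        raise ValueError("need at least two folds")
    n = len(cats)
    folds = [i % n_folds for i in range(n)]  # round-robin: a simplification
    out = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if folds[i] != k]
        if not train:
            continue  # more folds than rows; nothing to fit on
        enc, prior = smoothed_means([cats[i] for i in train], [ys[i] for i in train], m)
        for i in range(n):
            if folds[i] == k:
                out[i] = enc.get(cats[i], prior)  # unseen category falls back to the prior
    return out


def frequency_encode(cats: list[str]) -> list[int]:
    """Replace each category with how often it occurs."""
    counts = Counter(cats)
    return [counts[c] for c in cats]
```

For serving, fit `smoothed_means` once on the full training set. Then look categories up in the returned map, falling back to the prior for categories it has never seen.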
Time features that aren’t naive
“Day of week” is a feature. “Is this a payday for this customer cohort” is a better feature. “Hours since last transaction” is often more useful than “transaction timestamp.”
Useful time derivations:
- Time since previous event (per entity)
- Time of day in entity’s local timezone
- Calendar features (holiday, payday, fiscal close)
- Velocity features (rate of change)
A timestamp by itself is a wasted column.
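Here is a minimal sketch of the first two derivations above: time since the entity's previous event, and local time of day. It assumes timezone-aware timestamps and a per-entity time zone supplied by the caller. `Event` and `time_features` are hypothetical names. Calendar and velocity features are left out because they depend on business calendars and window choices the post doesn't specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class Event:
    entity_id: str
    ts: datetime  # timezone-aware timestamp


def time_features(events: list[Event], entity_tz: dict[str, tzinfo]) -> list[dict]:
    """Derive per-event time features, ordered by (entity_id, ts).

    hours_since_prev: gap to the same entity's previous event (None for its first).
    local_hour / local_weekday: wall-clock time in the entity's own time zone.
    """
    rows = []
    last_ts: dict[str, datetime] = {}
    for ev in sorted(events, key=lambda e: (e.entity_id, e.ts)):
        prev = last_ts.get(ev.entity_id)
        local = ev.ts.astimezone(entity_tz[ev.entity_id])
        rows.append({
            "entity_id": ev.entity_id,
            "hours_since_prev": None if prev is None else (ev.ts - prev).total_seconds() / 3600,
            "local_hour": local.hour,
            "local_weekday": local.weekday(),  # Monday == 0
        })
        last_ts[ev.entity_id] = ev.ts
    return rows


# Usage: one customer in a fixed UTC-5 zone, two events 6.5 hours apart (input unsorted).
est = timezone(timedelta(hours=-5))
rows = time_features(
    [
        Event("c1", datetime(2026, 3, 2, 20, 30, tzinfo=timezone.utc)),
        Event("c1", datetime(2026, 3, 2, 14, 0, tzinfo=timezone.utc)),
    ],
    {"c1": est},
)
# rows[0]["local_hour"] -> 9 (14:00 UTC is 09:00 at UTC-5); rows[1]["hours_since_prev"] -> 6.5
```

If `entity_tz` holds `zoneinfo.ZoneInfo` objects (Python 3.9+), the standard library handles daylight saving time. A fixed-offset `timezone`, as in the usage lines, is only correct for zones that never observe DST.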
Interaction features
Tree models can learn interactions on their own, but when the signal is weak they often need a hint. Hand-built interactions that tend to help:
- amount / customer_avg_amount (ratio to entity-specific baseline)
- is_new_merchant_for_this_customer (boolean derived from history)
- device_x_country_pair_seen_before (combined-key lookup)
Build the interactions that domain experts use mentally. Don’t build a Cartesian product of every column.
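The three interactions above can be sketched in plain Python. `Payment` and `interaction_features` are hypothetical names. Two scoping choices are assumptions: the customer baseline is a simple mean of that customer's prior amounts, and the device/country lookup spans all customers rather than one. `history` must contain only payments that happened before the one being scored; otherwise these features leak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    customer_id: str
    merchant_id: str
    device_id: str
    country: str
    amount: float


def interaction_features(history: list[Payment], txn: Payment) -> dict:
    """Domain-driven interaction features for `txn`, built from strictly earlier payments."""
    past = [h for h in history if h.customer_id == txn.customer_id]
    avg = sum(h.amount for h in past) / len(past) if past else None
    seen_pairs = {(h.device_id, h.country) for h in history}  # global lookup, not per customer
    return {
        # Ratio to the customer's own baseline; None when no usable baseline exists.
        "amount_to_customer_avg": txn.amount / avg if avg else None,
        "is_new_merchant_for_this_customer": txn.merchant_id not in {h.merchant_id for h in past},
        "device_x_country_pair_seen_before": (txn.device_id, txn.country) in seen_pairs,
    }
```

Each feature mirrors a question an analyst would ask: is this amount unusual for this customer, has this customer paid this merchant before, and has this device-country combination appeared before.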
Leakage audits
Before declaring a model ready:
- Compute feature-target correlations on the training set; flag anything suspiciously high
- For top-correlated features, manually trace the data lineage end to end
- Push the same row through both the training pipeline and the serving path; compare the feature values and predictions
Most “amazing eval scores” we audit turn out to be one leaked feature. Catch them in the audit, not in the postmortem.
When to stop
A model with 50 features that scores 0.86 AUC and a model with 500 features that scores 0.87 are not equivalent. The second is harder to debug, slower to retrain, more expensive to serve, and more prone to drift. Each feature has a maintenance cost. Cut aggressively.
We often ship models with 20–80 features. Beyond that, the marginal contribution of each additional feature rarely justifies its operational burden.
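Greedy backward elimination is one way to cut aggressively. It is a generic technique, not a description of the authors' own process. The sketch assumes `score_fn` is a caller-supplied, hypothetical function that retrains the model and returns a validation metric such as AUC for a given feature list. The `max_drop=0.002` tolerance, measured against the full-feature score, is an arbitrary assumption.

```python
from typing import Callable


def backward_eliminate(features: list[str], score_fn: Callable[[list[str]], float],
                       max_drop: float = 0.002, min_features: int = 1) -> list[str]:
    """Repeatedly drop the least useful feature while the total score loss stays within max_drop."""
    current = list(features)
    baseline = score_fn(current)  # score with every feature
    while len(current) > min_features:
        best_score, best_feature = None, None
        for f in current:
            s = score_fn([g for g in current if g != f])  # score without feature f
            if best_score is None or s > best_score:
                best_score, best_feature = s, f
        if baseline - best_score > max_drop:
            break  # the cheapest removal now costs more than the tolerance
        current.remove(best_feature)
    return current
```

Every pass retrains once per remaining feature, so cost grows roughly quadratically with feature count. On large sets, a cheaper pre-filter such as dropping near-zero-importance features first is a common companion. The loop selects features on the validation set, so confirm the final list on a separate holdout.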
What’s actually different in 2026
Two shifts worth noting:
LLM-generated feature suggestions. Tools that read your schema and propose features can save discovery time. Audit every suggestion: many are plausible but redundant or leaky.
Foundation tabular models (TabPFN v2 and others). These compete with tree models on small datasets, but their cost-quality tradeoffs are non-trivial. For most enterprise tabular work, gradient boosting still wins once production constraints such as latency, serving cost, and retraining cadence are factored in.
What we ship by default
For tabular ML engagements via our DevOps practice:
- Aggregation code shared between training and serving
- Feature store for high-cardinality encodings
- Leakage audit before any deployment
- Feature catalog with owner, source, refresh cadence
- Model with the smallest feature set that hits the quality bar
Feature engineering isn’t outdated. It’s still the highest-leverage work on tabular projects.