Weather Data Pipelines: ECMWF, NOAA, NWP Into Production

Operational weather data pipelines feed energy, agriculture, insurance, and logistics decisions daily.

Operational weather forecasts drive billions of dollars of decisions daily for power grid operators, airlines, agriculture, insurers, and retailers. Most teams that consume weather data treat it as a feed; the teams that build production pipelines extract meaningfully more value. The pipeline architecture is non-trivial.

Here is what a credible operational weather data pipeline looks like.

The data sources

ECMWF (European Centre for Medium-Range Weather Forecasts). The gold standard for medium-range forecasts. IFS (Integrated Forecasting System) is the deterministic flagship; its ensemble (ENS, historically called EPS) quantifies uncertainty. Access is through ECMWF’s own licence tiers or through commercial vendors.

NOAA GFS (Global Forecast System). Free, US-government model. Often paired with NAM, HRRR for regional/short-range forecasts.

Regional models. ICON (DWD, Germany), Met Office UM (UK), JMA models (Japan), etc. Specific accuracy benefits in each region.

Reanalysis datasets. ERA5 (ECMWF) is the workhorse for historical analysis. MERRA-2 (NASA) is the main US-produced alternative.

Observations. METAR (airports), surface stations, satellite (GOES, MetOp, Himawari), radiosondes, radar (NEXRAD in US, others elsewhere).

Lightning detection. Vaisala GLD360, ENTLN, others.

Specialty. Hurricane track guidance, severe weather outlooks, climate indices (NAO, ENSO, etc.).

Most operational systems need 3–6 of these. The data engineering is the work.

The formats

GRIB1 / GRIB2 dominate for model output. NetCDF is common in research. BUFR for observations.
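
Because both GRIB editions show up in practice, an ingest step often needs to tell them apart before handing a file to a decoder. The following is a minimal sketch using only the standard library; the function name `sniff_grib` is hypothetical, and it reads only the indicator section at the start of a message (edition number and total length). It is not a decoder and ignores edge cases such as oversized GRIB1 messages.

```python
import struct


def sniff_grib(header: bytes) -> dict:
    """Read edition and total length from the first 16 bytes of a GRIB message."""
    if len(header) < 16 or header[:4] != b"GRIB":
        raise ValueError("not a GRIB message")
    edition = header[7]  # octet 8 holds the edition number in GRIB1 and GRIB2
    if edition == 1:
        # GRIB1: octets 5-7 are a 3-byte big-endian total message length
        length = int.from_bytes(header[4:7], "big")
        return {"edition": 1, "length": length}
    if edition == 2:
        # GRIB2: octet 7 is the discipline, octets 9-16 an 8-byte big-endian length
        discipline = header[6]
        (length,) = struct.unpack(">Q", header[8:16])
        return {"edition": 2, "discipline": discipline, "length": length}
    raise ValueError(f"unsupported GRIB edition {edition}")
```

In a pipeline this kind of check would typically run on the first bytes of each downloaded file, so that corrupt or truncated downloads are rejected before the heavier decoding tools are invoked.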

Each format has its own pain points:

  • GRIB2 is operationally efficient but requires specialized tools (wgrib2, cfgrib)
  • NetCDF is rich but heavy
  • BUFR is operational-meteorology-native; rarely encountered outside that world

A production pipeline often ingests GRIB2, normalizes to NetCDF or Parquet, indexes spatially and temporally for query.
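
The normalization step can be sketched as a lookup from source variable names to pipeline-standard names and units. This is an illustration under stated assumptions: the short names `2t`, `10u`, `10v`, and `tp` follow ECMWF conventions (with `tp` in metres), while the target names, the `VarSpec` type, and `normalize_record` are hypothetical. Other sources use different names and units and would need their own table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VarSpec:
    name: str                          # pipeline-standard variable name (hypothetical)
    unit: str                          # unit after conversion
    convert: Callable[[float], float]  # source unit -> standard unit


# Assumed source short names mapped to illustrative standard names and units
STANDARD_VARS = {
    "2t": VarSpec("temperature_2m", "degC", lambda k: k - 273.15),  # kelvin -> Celsius
    "10u": VarSpec("wind_u_10m", "m/s", lambda v: v),
    "10v": VarSpec("wind_v_10m", "m/s", lambda v: v),
    "tp": VarSpec("precip_total", "mm", lambda m: m * 1000.0),      # metres -> millimetres
}


def normalize_record(short_name: str, value: float, lon: float) -> dict:
    """Rename one value, convert its unit, and wrap longitude to [-180, 180)."""
    spec = STANDARD_VARS[short_name]
    lon180 = ((lon + 180.0) % 360.0) - 180.0  # 0..360 grids become -180..180
    return {
        "variable": spec.name,
        "unit": spec.unit,
        "value": spec.convert(value),
        "lon": lon180,
    }
```

Keeping the conversions in one table means downstream consumers never see source-specific units, whichever output format (NetCDF, Parquet) the records end up in.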

The pipeline architecture

The operational weather pipelines we’ve built via our data engineering practice share these layers:

  • Ingestion layer. Pulls model output at the cadence each source publishes (hourly for short-range, 6-hourly for medium-range). Catches missed updates.
  • Normalization. GRIB → Parquet/Zarr/NetCDF; standardized variables and units; consistent coordinate systems.
  • Spatial indexing. Hierarchical resolution (H3, S2, or quad-tree); fast point/region lookup.
  • Storage tiers. Hot (last 30 days), warm (last year), cold (archive). Cost-managed.
  • Query API. Time-series for a point, gridded data for a region, ensemble statistics.
  • Downstream consumers. Specific applications (energy demand, agriculture, logistics) consume from the API, not raw model output.
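
Two of the layers above, catching missed updates and assigning storage tiers, reduce to small bookkeeping functions. The sketch below assumes a source that publishes cycles at fixed UTC hours (for example 00/06/12/18) and uses the 30-day and one-year tier boundaries from the list; all function names are hypothetical, and `step_hours` is assumed to divide 24.

```python
from datetime import datetime, timedelta, timezone


def expected_cycles(start: datetime, end: datetime, step_hours: int = 6) -> list:
    """Cycle init times in [start, end], aligned to multiples of step_hours UTC."""
    first = start.replace(minute=0, second=0, microsecond=0)
    # Step forward hour by hour until we hit an aligned cycle not before `start`
    while first.hour % step_hours != 0 or first < start:
        first += timedelta(hours=1)
    cycles = []
    t = first
    while t <= end:
        cycles.append(t)
        t += timedelta(hours=step_hours)
    return cycles


def missing_cycles(expected: list, received: list) -> list:
    """Expected cycles that never arrived, in chronological order."""
    got = set(received)
    return [c for c in expected if c not in got]


def storage_tier(valid_time: datetime, now: datetime) -> str:
    """Map data age to a tier: hot (<= 30 days), warm (<= 365 days), else cold."""
    age = now - valid_time
    if age <= timedelta(days=30):
        return "hot"
    if age <= timedelta(days=365):
        return "warm"
    return "cold"
```

A scheduler could call `missing_cycles` after each expected publication time and trigger a re-download for anything it returns, while a periodic job uses `storage_tier` to decide what to migrate.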

Where ML earns its place

Post-processing of raw model output. Statistical downscaling, MOS (Model Output Statistics) to correct model biases. Improves accuracy at specific sites.
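
As a simplified illustration of the MOS idea, the sketch below fits a single-predictor linear correction from past forecast/observation pairs at one site and applies it to a new forecast. Operational MOS typically uses multiple predictors and separate fits per lead time and season; the function names and the one-variable form here are assumptions made for brevity.

```python
def fit_linear_mos(forecasts, observations):
    """Least-squares fit of observation = slope * forecast + intercept."""
    n = len(forecasts)
    if n != len(observations) or n < 2:
        raise ValueError("need at least two matched forecast/observation pairs")
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    if sxx == 0:
        raise ValueError("forecasts have no variance; cannot fit a slope")
    sxy = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observations))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_f
    return slope, intercept


def apply_mos(forecast, slope, intercept):
    """Correct a raw model forecast with the fitted coefficients."""
    return slope * forecast + intercept
```

For example, if a model consistently runs cold and under-disperses at a site, the fitted intercept absorbs the offset and the slope rescales the forecast range.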

Ensemble post-processing. Calibrated probabilistic forecasts from raw ensembles. Better uncertainty quantification.
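
Two building blocks for this kind of work can be sketched with the standard library: a raw exceedance probability from ensemble members, and the CRPS (continuous ranked probability score), a common score for checking whether a calibration step actually improved a probabilistic forecast. The function names are hypothetical, and the calibration method itself is out of scope here; this only shows the quantities one would compute before and after calibrating.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members strictly above the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


def ensemble_crps(members, observation):
    """CRPS of an ensemble treated as an empirical distribution (lower is better).

    Uses the form: mean|x_i - y| - 0.5 * mean|x_i - x_j| over all member pairs.
    """
    n = len(members)
    term_obs = sum(abs(m - observation) for m in members) / n
    term_spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return term_obs - 0.5 * term_spread
```

For a single-member "ensemble" the score collapses to the absolute error, which makes CRPS directly comparable between deterministic and ensemble forecasts.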

Specialty forecasts. Fog, wind shear, icing, severe weather probability. Targeted ML on top of physical model outputs.

Foundation weather models. GraphCast, Pangu-Weather, FourCastNet, AIFS — see our AI weather forecasting notes. Real revolution; the pipeline serves these alongside classical models.

What we ship for weather-dependent industries

For energy, agriculture, insurance, and logistics clients via our data engineering practice:

  • Multi-source weather ingestion (ECMWF + NOAA + regional)
  • Site-specific post-processed forecasts
  • Probabilistic forecast generation
  • Integration with the client’s operational decisions
  • Historical-pattern analysis for risk and planning

The licensing reality

Operational weather data has licensing constraints worth knowing:

  • ECMWF requires licensing for operational use (free for research; not for production)
  • NOAA data is public-domain but high-resolution products may have restrictions
  • Commercial vendors (DTN, IBM/The Weather Company, AccuWeather, Tomorrow.io, formerly ClimaCell) bundle and re-license

For most production users, a commercial vendor relationship simplifies licensing dramatically. For high-volume users, direct ECMWF access pays back.

The 2026 maturity

Operational weather pipelines are mature; the data engineering is the work. The 2026 shift is AI weather models becoming first-class citizens alongside the classical NWP models.

For weather-dependent industries, building or improving the data pipeline is one of the highest-ROI infrastructure investments available.


Operational weather data is a data-engineering problem with high downstream leverage. Our team builds weather data pipelines for energy, agriculture, insurance, and logistics. Tell us about the use case.