Real Estate Operations Data Platforms: From Yardi to a Portfolio Dashboard

Real estate operators are drowning in Yardi exports and Excel pro formas. This post covers the data platform pattern that gives operators portfolio-level truth.

A real estate operator with 50 buildings has 50 sources of operational truth — and a CFO who wants one. The data is somewhere: Yardi for property management, separate building-management systems for HVAC and energy, MRI or RealPage at the larger end, Argus for valuation, custom Excel pro formas for development pipelines, Procore for active construction projects, sometimes a tenant CRM in HubSpot or Salesforce. Each system has its own data model, its own export pattern, its own user base. The portfolio-level “how are we doing this month” question takes 2 weeks and an Excel hero to answer.

This post describes the data platform pattern we deploy for real estate operators, whether they own 20 buildings or 2,000.

What real estate operators are actually asking for

Across REITs, developers, and family-office operators we’ve worked with, the question shape is consistent. They want answers to:

  • Same-store NOI growth by region, by asset class, this quarter vs last
  • Occupancy and lease pipeline across the portfolio, with renewal risk flagged
  • Capital project status: what’s in design, in construction, in lease-up
  • Operating cost variance: actual vs budget by category, by property, with anomaly detection
  • Energy + sustainability metrics: kWh, water, emissions by building, with trends
  • Tenant health signals: payment patterns, complaints, renewal likelihood

None of these are exotic asks. All of them are hard to answer because the data lives in 6+ systems that don’t talk to each other.

The pattern: portfolio data platform on top of the systems you have

The mistake real estate operators make is trying to consolidate the operational systems. “Let’s move everyone off Yardi to MRI” turns into a painful multi-year migration and a $10M+ project that doesn’t actually solve the analytics problem. The operational system migration solves an operational pain (Yardi UX, MRI feature gap, whatever) but leaves the cross-system analytics gap untouched.

The pattern that works (same principle as enterprise data platform consolidation):

Leave the operational systems where they are. Build the analytical layer that unifies them.

What that looks like at the architectural level:

  1. Ingestion: pull from Yardi (or MRI / RealPage), Argus, Procore, BMS, CRM, custom Excel
  2. Lake / warehouse: land everything in S3 + Iceberg, model in a warehouse (Snowflake or Databricks)
  3. Transformation: dbt to model property-level + portfolio-level facts, semantic layer for business definitions (NOI, occupancy, etc.)
  4. Serve: BI tool (Power BI / Tableau / Looker) for executives + property managers
  5. Real-time slices: ClickHouse for property-manager dashboards that need sub-second response
  6. Governance: data catalog, role-based access, audit logs

This is the same pattern that works for enterprise data consolidation, applied to real estate operational systems.

The systems and how to integrate them

Yardi (and MRI, RealPage)

Yardi is the dominant property management platform. Most operators have Yardi Voyager (the core PM platform) or one of its modules (Investment Suite, Construction Manager, etc.).

Integration options:

  • Yardi Web Services: SOAP-based API. Functional but old. Rate-limited. Some operators have access; some don’t.
  • Direct database access: works if you self-host Yardi or have a hosting agreement that permits DB access. Faster, more flexible. Requires careful schema reverse-engineering.
  • Scheduled exports: every operator’s fallback. Yardi exports to CSV / Excel on a schedule; you ingest from SFTP. Reliable but lagged.

MRI and RealPage have similar shapes: APIs that are functional, often with the same “we recommend going through our reporting product instead” friction. The pattern we use most: nightly scheduled exports + selective API calls for real-time data that matters (occupancy changes, work orders).
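The nightly export path can be sketched as a small parser that separates loadable rows from rejects before anything reaches the warehouse. This is a minimal illustration, not Yardi's actual export layout: the column names (`property_code`, `gl_account`, `period`, `amount`) and the `LedgerRow` type are hypothetical, and real exports vary by report and configuration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class LedgerRow:
    property_code: str
    gl_account: str
    period: date      # first day of the accounting month
    amount: Decimal


def parse_export(text: str) -> tuple[list[LedgerRow], list[str]]:
    """Parse one CSV export; return (clean rows, rejection messages)."""
    rows: list[LedgerRow] = []
    rejects: list[str] = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            year, month = raw["period"].strip().split("-")
            rows.append(LedgerRow(
                property_code=raw["property_code"].strip().upper(),
                gl_account=raw["gl_account"].strip(),
                period=date(int(year), int(month), 1),
                # Strip thousands separators before converting.
                amount=Decimal(raw["amount"].replace(",", "")),
            ))
        except (KeyError, ValueError, ArithmeticError, AttributeError) as exc:
            # Keep bad rows out of staging, but record why they failed.
            rejects.append(f"line {line_no}: {exc!r}")
    return rows, rejects


# Hypothetical two-row export: one clean row, one with a bad amount.
SAMPLE_EXPORT = (
    "property_code,gl_account,period,amount\n"
    'p001,4000,2026-01,"125,000.00"\n'
    "P002,4000,2026-01,not-a-number\n"
)
rows, rejects = parse_export(SAMPLE_EXPORT)
print(len(rows), "loaded;", len(rejects), "rejected")
```

Keeping the rejects as a separate output, rather than failing the whole file, lets the nightly load proceed while the bad rows go back to property accounting.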

Argus (valuation modeling)

Argus is the standard for commercial real estate valuation. Most teams have Argus Enterprise (the SaaS version) or Argus Standalone.

Integration: Argus Enterprise has APIs; the standalone version relies mostly on file-based exports. The valuation model outputs (cash flow projections, sensitivity analyses, hold scenarios) typically get extracted to Excel and then ingested. Modeling the property-level cash flow as a fact table joined with the operational data is where the portfolio-level “what-if” analysis becomes possible.
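To make the join concrete, here is a rough sketch of projected cash flow matched against operational actuals at the property-month grain. The `NoiVariance` type, the `(property_code, period)` key, and the sample figures are all assumptions made for illustration; in practice this would be a dbt model rather than Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NoiVariance:
    property_code: str
    period: str            # "YYYY-MM"
    projected_noi: float   # from the valuation model extract
    actual_noi: float      # from the property management system

    @property
    def variance(self) -> float:
        return self.actual_noi - self.projected_noi

    @property
    def variance_pct(self) -> Optional[float]:
        if self.projected_noi == 0:
            return None  # percentage is undefined against a zero projection
        return self.variance / abs(self.projected_noi)


def join_projection_to_actuals(projected: dict, actuals: dict) -> list:
    """Inner-join two {(property_code, period): noi} mappings."""
    shared_keys = sorted(projected.keys() & actuals.keys())
    return [
        NoiVariance(code, period, projected[(code, period)], actuals[(code, period)])
        for code, period in shared_keys
    ]


# Hypothetical figures: P002 has a projection but no actuals loaded yet.
projected = {("P001", "2026-01"): 100_000.0, ("P002", "2026-01"): 80_000.0}
actuals = {("P001", "2026-01"): 92_000.0}

variances = join_projection_to_actuals(projected, actuals)
for v in variances:
    print(v.property_code, v.period, v.variance, v.variance_pct)
```

An inner join silently drops properties missing from either side; a production model would more likely use a full outer join so that missing actuals show up as a gap rather than disappearing.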

Building management systems (HVAC, energy, security)

Modern BMS (Siemens Desigo, Honeywell, Schneider EcoStruxure, Distech) expose data via BACnet, MQTT, or vendor APIs. The newer cloud-connected systems are easier; older systems need on-prem gateways.

Integration pattern: deploy a Kafka or MQTT broker in the building / building cluster, stream meter and operational data, ingest into the data platform. Energy/sustainability dashboards are the most common downstream use case.
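Downstream of the broker, the first transformation is usually a roll-up of raw meter readings into something a dashboard can use. The sketch below assumes each message is a JSON payload with hypothetical fields `building_id`, `ts` (ISO 8601), and `kwh` (consumption for that interval). It iterates over a plain list, so no Kafka or MQTT client is involved.

```python
import json
from collections import defaultdict
from datetime import datetime


def hourly_kwh(messages):
    """Sum interval kWh readings into (building_id, hour) buckets."""
    totals = defaultdict(float)
    for payload in messages:
        reading = json.loads(payload)
        ts = datetime.fromisoformat(reading["ts"])
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(reading["building_id"], hour)] += float(reading["kwh"])
    return dict(totals)


# Hypothetical payloads, shaped as a consumer callback might receive them.
messages = [
    '{"building_id": "B1", "ts": "2026-01-15T09:15:00+00:00", "kwh": 12.5}',
    '{"building_id": "B1", "ts": "2026-01-15T09:45:00+00:00", "kwh": 11.0}',
    '{"building_id": "B1", "ts": "2026-01-15T10:05:00+00:00", "kwh": 9.0}',
    '{"building_id": "B2", "ts": "2026-01-15T09:30:00+00:00", "kwh": 4.0}',
]

totals = hourly_kwh(messages)
for (building, hour), kwh in sorted(totals.items()):
    print(building, hour.isoformat(), kwh)
```

If the meters report cumulative register values instead of interval consumption, the aggregation would need to difference consecutive readings first; this sketch assumes interval values.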

Procore / Autodesk (for the construction pipeline)

For operators with active development or major capital projects, Procore data flows into the platform exactly as for general contractors: project status, budget, schedule, change orders. The cross-system question is “how is our development pipeline performing, and what’s the cap-ex outlook for the operating portfolio.”

Tenant CRM (HubSpot, Salesforce)

For operators with active leasing teams, the CRM holds lease pipeline data. Lease prospect → tour → application → lease signed flows live here. Integration is usually one of the easier ones — modern CRMs have great APIs.

Custom Excel (the elephant in the room)

Every real estate operator has 20-50 Excel files that hold business-critical data:

  • Development pro formas (cash flow models for projects in design)
  • Lease abstracts (manual extraction of lease terms before the lease hits Yardi)
  • Capital allocation models
  • Acquisition underwriting models

These don’t go away. The data platform has to ingest from them — either via SharePoint sync + parsing, or by replacing them with a structured app where the volume justifies it.

The dbt model structure

For the analytical layer, our default dbt project structure for real estate:

models/
  staging/             # raw → cleaned per source system
    stg_yardi__*       # Yardi staging models
    stg_argus__*
    stg_bms__*
    stg_crm__*
    stg_excel__*
  intermediate/         # joined / enriched concepts
    int_property_monthly_financial
    int_lease_pipeline
    int_capital_project_status
  marts/               # business-facing models
    fct_portfolio_noi    # the main fact table for portfolio NOI
    fct_property_monthly
    fct_lease_lifecycle
    fct_capex_status
    dim_property         # property dimension with all attributes
    dim_market           # market / submarket
    dim_tenant

The semantic layer (LookML, dbt Semantic Layer, or Cube) sits on top and exposes the business metrics (NOI, occupancy, same-store NOI growth) so BI users don’t redefine them in every dashboard.
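To show what "one definition" means in practice, here is a sketch of the three metrics as plain functions. The definitions are common ones but still assumptions: NOI as operating revenue minus operating expenses, occupancy as leased over rentable square footage, and "same-store" as properties present in both periods. Operators often add stabilization or ownership-duration rules, which belong in the semantic layer too.

```python
def noi(operating_revenue: float, operating_expenses: float) -> float:
    """Net operating income: excludes debt service, capex, and depreciation."""
    return operating_revenue - operating_expenses


def occupancy(leased_sqft: float, rentable_sqft: float) -> float:
    """Share of rentable area under lease, as a fraction between 0 and 1."""
    if rentable_sqft <= 0:
        raise ValueError("rentable_sqft must be positive")
    return leased_sqft / rentable_sqft


def same_store_noi_growth(prior: dict[str, float], current: dict[str, float]) -> float:
    """NOI growth over properties that appear in BOTH periods."""
    same_store = prior.keys() & current.keys()
    prior_total = sum(prior[p] for p in same_store)
    current_total = sum(current[p] for p in same_store)
    if prior_total == 0:
        raise ValueError("prior-period same-store NOI is zero")
    return (current_total - prior_total) / prior_total


# Hypothetical quarterly NOI by property. P003 was sold and P004 acquired,
# so both fall outside the same-store set.
prior_q = {"P001": 100.0, "P002": 200.0, "P003": 50.0}
current_q = {"P001": 110.0, "P002": 205.0, "P004": 80.0}

growth = same_store_noi_growth(prior_q, current_q)
print(f"same-store NOI growth: {growth:.1%}")
```

The point of the example is the set intersection: comparing raw portfolio totals here (395 vs 350) would report growth that is really just an acquisition.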

Real-time vs batch

Most real estate analytics are fine in batch (nightly refresh). The exceptions:

  • Work order status for facilities managers — needs minute-level freshness
  • Building energy + alarm signals from BMS — needs second-level for active monitoring
  • Acquisition pipeline during active deal periods — needs same-day refresh

The pattern: warehouse for the main analytics layer (nightly), ClickHouse for the real-time slices, BMS streaming directly to a time-series store (TimescaleDB, InfluxDB, or ClickHouse) for the operational dashboards.
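One way to keep these tiers honest is to write the freshness expectation down per dataset and check it. The sketch below is illustrative only: the dataset names and the specific thresholds (5 minutes, 30 seconds, 24 hours, 36 hours) are assumptions chosen to match the tiers above, not recommended values.

```python
from datetime import datetime, timedelta

# Hypothetical freshness targets per dataset, one per tier.
FRESHNESS_SLA = {
    "work_orders": timedelta(minutes=5),            # minute-level
    "bms_alarms": timedelta(seconds=30),            # second-level
    "acquisition_pipeline": timedelta(hours=24),    # same-day
    "portfolio_noi": timedelta(hours=36),           # nightly batch, with slack
}


def stale_datasets(last_loaded: dict, now: datetime) -> list:
    """Return dataset names that were never loaded or are older than their SLA."""
    return sorted(
        name
        for name, sla in FRESHNESS_SLA.items()
        if name not in last_loaded or now - last_loaded[name] > sla
    )


now = datetime(2026, 1, 15, 12, 0, 0)
last_loaded = {
    "work_orders": datetime(2026, 1, 15, 11, 58, 0),           # 2 min old
    "bms_alarms": datetime(2026, 1, 15, 11, 59, 0),            # 60 s old
    "acquisition_pipeline": datetime(2026, 1, 14, 10, 0, 0),   # 26 h old
    "portfolio_noi": datetime(2026, 1, 15, 2, 0, 0),           # 10 h old
}

print(stale_datasets(last_loaded, now))
```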

AI augmentation that actually works for real estate

The use cases we’ve shipped where AI pays for itself within the first year:

  • Lease abstraction: an LLM (for example Claude via Bedrock, or GPT-5) reads PDFs of leases, extracts structured terms (rent schedules, options, expense recoveries), and pre-fills Yardi. 30-minute lease abstracts become 30-second pre-fills + 5-minute human review.
  • Maintenance work order triage: tenant complaint emails get classified by urgency, routed to the right vendor, and auto-summarized for the facilities team.
  • Acquisition due diligence: bulk PDF review of leases, environmental reports, title docs — AI flags exceptions for human review instead of attorney teams reading everything.
  • Tenant communication drafting: rent increase notices, renewal proposals, default letters — drafted by AI, human-reviewed before send.
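For lease abstraction, the human-review step works better when the reviewer is pointed at specific problems. Here is a minimal sketch of a rule-based check run on the extracted terms before they are pre-filled. The field names and rules are hypothetical, and the model call itself is out of scope: `extracted` stands in for whatever structured output the extraction step returns.

```python
from datetime import date

# Hypothetical minimum set of fields a usable abstract must contain.
REQUIRED_FIELDS = (
    "tenant_name",
    "commencement_date",
    "expiration_date",
    "base_rent_monthly",
)


def review_flags(extracted: dict) -> list:
    """Return human-readable issues a reviewer should look at first."""
    flags = [
        f"missing: {field}"
        for field in REQUIRED_FIELDS
        if extracted.get(field) in (None, "")
    ]
    start = extracted.get("commencement_date")
    end = extracted.get("expiration_date")
    if isinstance(start, date) and isinstance(end, date) and end <= start:
        flags.append("expiration_date is not after commencement_date")
    rent = extracted.get("base_rent_monthly")
    if isinstance(rent, (int, float)) and rent <= 0:
        flags.append("base_rent_monthly is not positive")
    return flags


clean = {
    "tenant_name": "Acme Corp",
    "commencement_date": date(2026, 2, 1),
    "expiration_date": date(2031, 1, 31),
    "base_rent_monthly": 18_500.0,
}
suspect = {
    "tenant_name": "",
    "commencement_date": date(2026, 2, 1),
    "expiration_date": date(2025, 1, 31),
    "base_rent_monthly": 0,
}

print(review_flags(clean))
print(review_flags(suspect))
```

Checks like these do not replace the human review; they order it, so the five minutes go to the fields most likely to be wrong.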

The economics: a portfolio doing 100+ lease abstracts a year and 1,000+ maintenance tickets sees ROI within 3-6 months on these.

What we deploy by default

For new real estate data platform engagements (our data engineering service leads this):

  • Discovery (2-4 weeks): catalog every system, every data flow, every reporting pain point
  • Data warehouse setup: Snowflake or Databricks (operator choice) + S3 lake
  • Ingestion: Yardi (API or DB), Argus (file), BMS (Kafka), Procore (API), CRM (API), Excel (SharePoint)
  • dbt models: staging → intermediate → marts as outlined above
  • BI: Power BI or Tableau dashboards for executive + property-manager personas
  • Semantic layer: dbt Semantic Layer or LookML
  • Real-time: ClickHouse for property-manager and facilities-manager dashboards
  • AI augmentation: lease abstraction + work-order triage as the typical first two AI use cases

Typical timeline: 6 months to the first executive dashboard with real numbers; 12 months to the full platform with AI augmentation.

The thing operators underestimate

The hard part of real estate data platform work isn’t technical. It’s data quality at the source.

A typical Yardi instance has 5-10% of properties with incomplete or inconsistent data (square footage off, wrong owner attribution, expense codes mismapped, GL accounts that don’t match other properties of the same type). The data platform surfaces all of these — and the property accounting team’s first reaction is “the dashboard is wrong” before realizing the dashboard is right and the underlying data is wrong.
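The checks that surface these problems are simple to express. Below is a sketch, under assumed field names (`rentable_sqft`, `owner_entity`, `asset_type`, `gl_accounts`), that flags missing attributes and GL accounts used by fewer than half of a property's same-type peers. The half-of-peers threshold and the three-property minimum are arbitrary illustrations, not tuned rules.

```python
from collections import Counter, defaultdict


def quality_issues(properties: list) -> list:
    """Return (property_code, issue) pairs for source-data problems."""
    peers_by_type = Counter()
    account_usage = defaultdict(Counter)
    for p in properties:
        peers_by_type[p["asset_type"]] += 1
        account_usage[p["asset_type"]].update(set(p["gl_accounts"]))

    issues = []
    for p in properties:
        code, asset_type = p["property_code"], p["asset_type"]
        if not p.get("rentable_sqft") or p["rentable_sqft"] <= 0:
            issues.append((code, "missing or non-positive rentable_sqft"))
        if not p.get("owner_entity"):
            issues.append((code, "missing owner_entity"))
        peers = peers_by_type[asset_type]
        for account in sorted(set(p["gl_accounts"])):
            # Flag accounts used by fewer than half of same-type properties.
            if peers >= 3 and account_usage[asset_type][account] * 2 < peers:
                issues.append((code, f"unusual GL account {account} for {asset_type}"))
    return issues


# Hypothetical property master rows.
properties = [
    {"property_code": "P001", "asset_type": "office", "rentable_sqft": 100_000,
     "owner_entity": "Fund I", "gl_accounts": ["4000", "5100"]},
    {"property_code": "P002", "asset_type": "office", "rentable_sqft": 0,
     "owner_entity": "Fund I", "gl_accounts": ["4000", "5100"]},
    {"property_code": "P003", "asset_type": "office", "rentable_sqft": 50_000,
     "owner_entity": "", "gl_accounts": ["4000", "5900"]},
]

for code, issue in quality_issues(properties):
    print(code, "-", issue)
```

In a dbt project the same rules would more naturally live as tests on the staging models, so failures appear in the build rather than in a dashboard.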

This is good (you finally see the inconsistencies), but it’s also a hard organizational moment: the platform exposes data quality debt that’s been accumulating for years. Plan for a data-cleanup workstream alongside the platform build, not after.

The pattern of patterns

Real estate operations data platforms in 2026 follow the same architectural pattern as enterprise data consolidation: leave the operational systems where they are, build the analytical layer that unifies them, govern from day one. The specific source systems (Yardi vs MRI vs RealPage) matter less than the consistent shape of the integration work.

The operators who get the most value out of these platforms are the ones who commit to using the platform’s numbers for executive reporting from day one — even when the property-level numbers don’t agree with the old Excel reports. That’s where the source-system data quality cleanup actually happens.


Real estate data platforms are the operator’s path from ‘how are we doing this month’ taking 2 weeks to taking 2 minutes. If you’re scoping a portfolio analytics layer on top of Yardi, MRI, or RealPage, our data engineering team has shipped this for operators of all sizes. Tell us about the portfolio.