Quarterly Engineering Reviews: Metrics That Actually Matter

Most engineering reviews report velocity and bug count. The metrics that actually inform leadership decisions look very different.

Quarterly engineering reviews tend toward two failure modes: vanity metrics that flatter the team without informing anyone, or doom-list metrics that frustrate everyone without producing decisions. The reviews that inform leadership decisions look different from both: fewer metrics, sharper framing, and more attention to the questions leaders are trying to answer.

This post walks through what we recommend for engineering directors and VPs preparing quarterly reviews.

The questions leadership actually has

Before picking metrics, identify the questions. The typical executive audience wants to know:

  • Are we shipping the right things? Strategic alignment between engineering work and company objectives.
  • Are we shipping at a sustainable pace? Velocity over time, accounting for team size and complexity.
  • Are we building durable systems? Reliability, cost trajectory, technical debt.
  • Are we developing the team? Hiring, retention, leveling, learning.
  • What’s at risk? Specific concerns that could derail the next quarter or year.

Metrics that don’t address one of these questions are noise. Including them in the review dilutes the signal and trains executives not to read engineering reports carefully.

The DORA metrics, used carefully

The four DORA metrics — deployment frequency, lead time for changes, change failure rate, mean time to recovery — are the strongest operational baseline. They measure something real, they’re hard to game without actually improving the underlying capability, and they correlate with the business outcomes leadership cares about.

The discipline is in measurement and interpretation. Deployment frequency without context is meaningless. A team that deploys 50 times a week to 200 microservices is in a very different position from one that deploys once a quarter to a regulated payments product. Lead time for changes has to account for the work the team actually controls — if your code spends two weeks in compliance review, your lead time isn’t really an engineering metric.

We typically report DORA quarterly with a trailing 90-day window, broken out by team or product. The trend matters more than the absolute number.
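
As a rough illustration of the trailing-window calculation, the sketch below summarizes the four metrics from a list of deployment records. The `Deployment` record, its field names, and the `dora_summary` helper are hypothetical and not the schema of any particular tool. Using the median for lead time is one reasonable choice, not a requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt to whatever your CI/CD system exports.
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta):
    return delta.total_seconds() / 3600


def dora_summary(deploys, as_of, window_days=90):
    """Summarize the four DORA metrics over a trailing window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    recent = [d for d in deploys if start < d.deployed_at <= as_of]
    if not recent:
        return None
    failures = [d for d in recent if d.failed]
    recoveries = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_week": len(recent) / (window_days / 7),
        "median_lead_time_hours": median(
            hours(d.deployed_at - d.committed_at) for d in recent
        ),
        "change_failure_rate": len(failures) / len(recent),
        "mean_recovery_hours": mean(recoveries) if recoveries else None,
    }
```

Running this once per team or product, and again for the previous quarter's `as_of` date, gives the two points needed to show a trend rather than a single number.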

The metrics worth adding

Beyond DORA, four metrics are worth tracking quarterly.

Engineering capacity utilization. What percentage of engineering hours went to planned work, incident response, support, and meetings? Most teams underestimate the non-build percentage and over-promise as a result.
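
A minimal sketch of the breakdown, assuming hours are already tagged by category in a time-tracking or ticketing export. The category names and the numbers are illustrative only, not real data.

```python
def capacity_breakdown(hours_by_category):
    """Return each category's share of total logged hours, as a percentage."""
    total = sum(hours_by_category.values())
    if total == 0:
        return {}
    return {
        category: round(100 * hours / total, 1)
        for category, hours in hours_by_category.items()
    }


# Illustrative quarter for one team (hypothetical numbers).
quarter_hours = {"planned": 2600, "incidents": 400, "support": 600, "meetings": 400}
shares = capacity_breakdown(quarter_hours)
non_build_share = round(100 - shares["planned"], 1)
print(shares, non_build_share)
```

The single number worth putting on the slide is usually the non-build share, since that is the part planning tends to ignore.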

On-call load. How many pages does each engineer receive per week, and how evenly is that load distributed across the team? Concentrated on-call load is a leading indicator of burnout and attrition.
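
One way to make concentration visible is to report pages per engineer per week alongside the share carried by the most-paged person. The sketch below assumes a flat list with one entry per page, naming the engineer who received it. The function name and the sample roster are hypothetical.

```python
from collections import Counter


def oncall_load(pages, roster, weeks):
    """Return (pages per week per engineer, most-paged engineer, their share).

    pages:  one entry per page, naming the engineer who was paged
    roster: everyone in the rotation, including people who were never paged
    """
    counts = Counter(pages)
    per_week = {name: counts[name] / weeks for name in roster}
    total = sum(counts[name] for name in roster)
    top = max(roster, key=lambda name: counts[name])
    top_share = counts[top] / total if total else 0.0
    return per_week, top, top_share


# Illustrative 13-week quarter (hypothetical names and counts).
pages = ["ana"] * 26 + ["ben"] * 13 + ["chi"] * 13
per_week, top, top_share = oncall_load(pages, ["ana", "ben", "chi", "dev"], weeks=13)
print(per_week, top, top_share)
```

In this made-up example the team average is one page per engineer per week, which looks fine, while one person carries half of all pages.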

Cost per unit of work. Cloud costs divided by something the team controls — transactions processed, customers served, features shipped. This isn’t always meaningful, but when it is, it surfaces inefficiency that headline metrics miss.
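
The arithmetic is simple, but it helps to see why the unit figure can tell a different story from the headline bill. The sketch below uses transactions as the unit. The function name and all figures are hypothetical.

```python
def cost_per_thousand(cloud_cost, units):
    """Cloud spend per 1,000 units of work; None when there is no volume."""
    if units <= 0:
        return None
    return cloud_cost * 1000 / units


# Illustrative quarters: (label, cloud spend, transactions processed).
quarters = [("Q1", 300_000, 12_000_000), ("Q2", 330_000, 15_000_000)]
unit_costs = {label: cost_per_thousand(cost, txns) for label, cost, txns in quarters}

spend_change = (330_000 - 300_000) / 300_000
unit_change = (unit_costs["Q2"] - unit_costs["Q1"]) / unit_costs["Q1"]
print(unit_costs, spend_change, unit_change)
```

Here total spend rises by 10% while cost per thousand transactions falls by 12%, so a chart of the raw bill alone would suggest the opposite of what happened.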

Hiring and retention. Net headcount change, regrettable attrition rate, and time-to-productivity for new hires. These are slow signals but they predict the team’s future capacity.
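
Definitions of attrition rate vary between organizations. The sketch below assumes one common convention, regrettable departures divided by average headcount over the quarter, and is not the only valid definition. The function and parameter names are hypothetical.

```python
def retention_summary(start_headcount, hires, departures, regrettable_departures):
    """Net headcount change and quarterly regrettable attrition rate."""
    end_headcount = start_headcount + hires - departures
    average_headcount = (start_headcount + end_headcount) / 2
    rate = regrettable_departures / average_headcount if average_headcount else 0.0
    return {
        "net_headcount_change": hires - departures,
        "regrettable_attrition_rate": rate,
    }


# Illustrative quarter (hypothetical numbers).
summary = retention_summary(
    start_headcount=38, hires=6, departures=2, regrettable_departures=1
)
print(summary)
```

Whichever definition you pick, keep it fixed from quarter to quarter so the trend stays comparable.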

The metrics to retire

Some metrics persist in engineering reviews despite producing more harm than insight.

Velocity in story points. Story points are calibrated within a team; comparing them across teams or quarters is meaningless. Reporting velocity rewards inflated estimates rather than predictable delivery.

Lines of code or commit count. Neither is a reliable signal of engineering quality, even at the individual level. Reporting them at the team level encourages exactly the behaviors you don’t want, such as padded changes and artificially split commits.

Bug count without severity. A team with 50 open low-priority cosmetic bugs is in much better shape than one with 5 open data-loss bugs. Aggregate counts mislead.
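
If bug counts stay in the review, a severity breakdown is a small change to the query. The sketch below assumes each open bug carries a severity label. The label names and their ordering are hypothetical and should match your tracker's own scheme.

```python
from collections import Counter

# Hypothetical severity labels, most severe first.
SEVERITY_ORDER = ["data-loss", "high", "medium", "low"]


def bugs_by_severity(open_bug_severities):
    """Count open bugs per severity, listed from most to least severe."""
    counts = Counter(open_bug_severities)
    return [(severity, counts[severity]) for severity in SEVERITY_ORDER]


# The two teams from the example above.
team_a = bugs_by_severity(["low"] * 50)
team_b = bugs_by_severity(["data-loss"] * 5)
print(team_a)
print(team_b)
```

Listing the most severe bucket first puts the five data-loss bugs at the top of the table, where an aggregate count of 5 versus 50 would have hidden them.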

Test coverage percentage. High coverage doesn’t mean good tests; low coverage doesn’t necessarily mean inadequate tests. The metric is too easy to game and too loosely connected to quality.

The format that works

Through multiple client engagements, we’ve converged on a quarterly review structure that’s typically 5-7 slides.

  • Slide 1: Strategic alignment — what we shipped, what we didn’t, why.
  • Slide 2: DORA metrics with trend lines.
  • Slide 3: Reliability and incidents — what broke, what we learned.
  • Slide 4: Cost trajectory and infrastructure changes.
  • Slide 5: Team health — capacity, on-call, hiring, attrition.
  • Slide 6: Risks and asks — what could derail next quarter, what we need from leadership.
  • Slide 7: Next quarter’s priorities — three to five things, ranked.

Anything longer typically dilutes the signal. The exec audience reads the first two slides carefully and skims the rest; the review’s value is in the questions it raises rather than the slides themselves.

Where pdpspectra fits

We work with engineering leadership on metrics, reviews, and the broader engineering operating model. The architecture practice does this work alongside platform engineering.

Related reading: the developer productivity metrics post, the platform engineering post, and the SRE error budgets post.


Engineering reviews should produce decisions. Talk to our team about your engineering operating model.