Fivetran vs Airbyte vs Custom ELT

The classic data-engineering Buy-vs-Build conversation has a clean modern shape: do you pay Fivetran, run Airbyte, or write your own connectors? Each has a legitimate case. Each has a failure mode that justifies the others’ existence.

We’ve made this call across hospital data interoperability projects, banking analytics, and supply chain platforms — and we’ve migrated in all directions (Fivetran → Airbyte, custom → Fivetran, Airbyte → custom). Here’s how the decision actually plays out.

The three options

Fivetran is managed ELT. You point it at a source (Salesforce, Stripe, Postgres CDC, etc.), point it at a destination (Snowflake, BigQuery), and Fivetran moves the data with maintained connectors. You pay per Monthly Active Row.
Airbyte is open-source ELT with a similar connector model. You either self-host it (free, but you operate it) or use Airbyte Cloud (managed, with its own usage-based pricing).
Custom ELT is what your data engineers write — a Python or Go service per source, owned by you, deployed however you deploy other services.

What you’re actually paying for

Fivetran sells you four things:

Connector maintenance. When Stripe ships a new API version, Fivetran’s team updates the connector. You don’t notice.
Schema evolution. When the source adds a column, Fivetran detects it and adds it to your destination.
Reliability. Failed syncs retry. Alerts fire. The SLA is real.
Time. You don’t have an engineer building or maintaining connectors.

Self-hosted Airbyte gives you most of the schema evolution and reliability for free, and you pay in:

Operational time. Worker pool, metadata DB, periodic version upgrades.
Connector quality. Airbyte has 300+ connectors; quality varies wildly. The popular ones (Postgres, Salesforce, Stripe) are solid. The long-tail ones are sometimes “works in the happy path.”
The occasional connector PR. When you hit a bug in a connector for a source nobody else uses much, you may end up writing the fix.

Airbyte Cloud is closer to Fivetran’s value prop with Airbyte’s connector base. Pricing has gotten closer to Fivetran’s over time.

Custom ELT is you owning everything — connector code, schema handling, reliability, monitoring. The upside: total control, no vendor cost.

Where Fivetran wins decisively

Long-tail SaaS sources. Salesforce, HubSpot, Stripe, NetSuite, Workday, Marketo, Jira. The connectors are mature, maintained, and reliable. Building one of these yourself is a multi-month project; using Fivetran is a half-day.

Schema evolution on weird sources. When a Salesforce admin adds a custom field, Fivetran catches it. When a NetSuite SuiteScript changes a calculation, Fivetran handles it. Doing this yourself for 20+ sources is a full-time job.

Compliance reviews. Fivetran already has SOC 2 reports and HIPAA and GDPR compliance documentation in place. That is easier than convincing your security team that your custom service handles the same data correctly.

Teams that don’t have data engineers. If your “data team” is one analyst, Fivetran is the only viable option. Hiring a data engineer to maintain custom connectors costs more than the Fivetran bill for years.

Where Fivetran hurts

The bill at scale. Fivetran charges per Monthly Active Row (MAR), and the pricing model changed significantly through 2024-2025, including its Free Plan tiers. It gets expensive when you have high-volume sources: Postgres CDC on a high-write transactional database can land you a $50k+/year bill for one source. At that point, custom CDC starts looking cheap.
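A rough way to see why volume matters: Fivetran counts a row as active once per month when it is inserted, updated, or deleted, keyed by primary key. The sketch below is a simplified estimator under that definition, not Fivetran's billing logic, and the `(table, pk, timestamp)` event format is our assumption. It shows that repeated updates to the same row are cheap, while tables whose writes keep touching new rows drive the count up.

```python
from collections import defaultdict
from datetime import datetime


def monthly_active_rows(change_events):
    """Estimate MAR: distinct (table, primary key) pairs changed per calendar month.

    Simplified model, not Fivetran's billing engine. Each event is a
    hypothetical (table, pk, timestamp) tuple for an insert, update, or delete.
    """
    active = defaultdict(set)
    for table, pk, ts in change_events:
        active[(ts.year, ts.month)].add((table, pk))
    # Many changes to the same row in the same month still count once.
    return {month: len(rows) for month, rows in active.items()}


# 10,000 updates spread across only 10 rows in January -> 10 MAR, not 10,000.
events = [("orders", i % 10, datetime(2025, 1, 1 + i % 28)) for i in range(10_000)]
print(monthly_active_rows(events))  # {(2025, 1): 10}
```

An append-only events table behaves the opposite way: every insert is a new primary key, so MAR tracks write volume almost one-to-one.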

Loss of control over latency. Fivetran’s sync frequencies are tied to your plan tier. If you need 5-minute freshness on a source that Fivetran defaults to hourly, you’re upgrading your plan.

Custom sources. Your internal API, your weird vendor that doesn’t have a Fivetran connector, your CSV-from-FTP pattern. Fivetran is great when there’s a connector; less great when you need to build one. (They support custom connectors but the experience is rougher than the maintained ones.)

Transformation philosophy. Fivetran has historically wanted to be ELT — extract and load, then you do transforms downstream with dbt. They’ve added Transformations but it’s not their strength. If your team wants in-pipeline transformations, you’ll fight the tool.

Where Airbyte wins

Cost at scale. Self-hosted Airbyte’s cost is your infrastructure cost. For a high-volume Postgres CDC source that would be $5k/month on Fivetran, Airbyte might cost $200/month in EC2.
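That gap only favors self-hosting while the ops time stays below a break-even point. A back-of-envelope sketch using the $5k and $200 figures above and a hypothetical $100/hour loaded engineering rate (substitute your own):

```python
def breakeven_ops_hours(managed_monthly, selfhost_infra_monthly, loaded_hourly_rate):
    """Monthly engineer-hours at which self-hosting stops being cheaper than managed."""
    if loaded_hourly_rate <= 0:
        raise ValueError("loaded_hourly_rate must be positive")
    savings = max(managed_monthly - selfhost_infra_monthly, 0)
    return savings / loaded_hourly_rate


# $5k/month managed vs $200/month in EC2 (figures from the text).
# The $100/hour loaded rate is a hypothetical placeholder.
hours = breakeven_ops_hours(5_000, 200, 100)
print(f"{hours:.0f} engineer-hours/month")  # 48 engineer-hours/month
```

At those numbers, self-hosting pays off as long as Airbyte ops stays under about 48 engineer-hours a month.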

Custom connectors. Airbyte’s connector development kit (CDK) is well-documented. Writing a custom connector for your internal API is a 1-2 day project.

Data residency. Self-hosted Airbyte means all data stays in your network. Good for healthcare, finance, and government workloads where SaaS data movement is a non-starter.

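Open-source license. Most connectors and the CDK are MIT-licensed; the core platform has been under the Elastic License 2.0 (ELv2) since 2021, which is source-available rather than OSI-approved open source. Either way, there's no vendor lock-in beyond the data model itself, and you could fork the project for internal use tomorrow if you needed to.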

Where Airbyte hurts

Operational surface. Airbyte is a real distributed system: orchestrator, worker pool, metadata database, scheduler. Plan for the ops time.

Connector reliability variance. The top 30 connectors are excellent. The next 200 vary from “good” to “unmaintained for 18 months.” Test your specific source before committing.

Upgrade friction. Major version upgrades have historically been bumpy. Pin versions, test upgrades in staging, and don’t be the team that runs latest.

Smaller community than Fivetran’s user base. When you hit a weird issue, fewer Stack Overflow answers exist.

Where custom wins

One or two high-volume sources you care about. Postgres CDC, Kafka stream consumption, your internal product database. These are well-understood patterns; the code is small; you control everything. Custom for one source can be 200 lines of Python or Go.
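To make "200 lines" concrete, here is the core of a cursor-based incremental sync, sketched with `sqlite3` standing in for both source and destination. The `orders` table, its columns, and the assumption that `updated_at` is a monotonically increasing ISO-8601 string are all hypothetical; a real connector adds retries, logging, and cursor persistence around this loop.

```python
import sqlite3


def sync_incremental(source, dest, cursor_value):
    """Copy rows changed since cursor_value from source.orders into dest.orders.

    Hypothetical schema: orders(id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT),
    where updated_at is an ISO-8601 string that increases on every write.
    Returns the new high-water mark to persist for the next run.
    """
    rows = source.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (cursor_value,),
    ).fetchall()
    # Upsert so that re-running the same batch is idempotent.
    dest.executemany(
        "INSERT INTO orders (id, status, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET status = excluded.status, "
        "updated_at = excluded.updated_at",
        rows,
    )
    dest.commit()
    return rows[-1][2] if rows else cursor_value


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2025-01-01T00:00:00"), (2, "new", "2025-01-01T00:05:00")],
)
cursor = sync_incremental(src, dst, "")
print(cursor)  # 2025-01-01T00:05:00
```

One known weakness of a strict `updated_at > cursor` filter is that a row committed late with an equal timestamp can be skipped; a common mitigation is to re-read a small overlap window and let the idempotent upsert absorb the duplicates.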

Sources with bespoke logic. When extraction requires meaningful transformation logic that doesn’t fit ELT — joining across multiple endpoints, deduplication with custom keys, schema flattening — custom code is more honest than wrestling Airbyte’s CDK.
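A sketch of two of those bespoke steps in plain Python: flattening nested payloads and deduplicating on a composite business key. The field names (`account`, `invoice_number`, `rev`, `customer`) are hypothetical, and "keep the highest revision" is only one possible dedup rule.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts: {"customer": {"id": 7}} -> {"customer_id": 7}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def dedupe_latest(records, key_fields, version_field):
    """Keep the record with the highest version_field for each composite key."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in latest or rec[version_field] > latest[key][version_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical invoices where the same invoice number can appear per account.
raw = [
    {"account": "acme", "invoice_number": 1001, "rev": 1, "customer": {"id": 7, "tier": "gold"}},
    {"account": "acme", "invoice_number": 1001, "rev": 2, "customer": {"id": 7, "tier": "platinum"}},
    {"account": "globex", "invoice_number": 1001, "rev": 1, "customer": {"id": 9, "tier": "silver"}},
]
rows = dedupe_latest([flatten(r) for r in raw], ("account", "invoice_number"), "rev")
print(len(rows), rows[0]["customer_tier"])  # 2 platinum
```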

Cost-extreme cases. When the Fivetran or Airbyte bill is more than the salary of the person who’d write the custom code. Rare but real.

Hard latency SLAs. Sub-minute freshness on a specific source. Fivetran and Airbyte can get close on the right plan or deployment, but custom code lets you tune everything.
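Much of the latency tuning in custom code comes down to how you batch writes. A minimal sketch of a micro-batcher that flushes on either batch size or the age of the oldest buffered record; `flush_fn` stands in for whatever writes to your destination, and the default thresholds are placeholders, not recommendations:

```python
import time


class MicroBatcher:
    """Buffer records; flush when the batch is full or the oldest record is too old.

    max_records and max_age_s trade throughput against freshness.
    flush_fn is a hypothetical destination writer that receives a list of records.
    """

    def __init__(self, flush_fn, max_records=500, max_age_s=5.0, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_records = max_records  # placeholder default
        self.max_age_s = max_age_s      # placeholder default
        self.clock = clock
        self.buffer = []
        self.first_at = None

    def add(self, record):
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def tick(self):
        """Call periodically (e.g. from the consumer poll loop) to enforce max_age_s."""
        if self.buffer and self.clock() - self.first_at >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_at = None
            self.flush_fn(batch)


batches = []
now = [0.0]  # fake clock for the example
b = MicroBatcher(batches.append, max_records=3, max_age_s=1.0, clock=lambda: now[0])
for r in range(4):
    b.add(r)  # the third record triggers a size-based flush
now[0] = 1.5
b.tick()  # record 3 has waited 1.5 s, so an age-based flush fires
print(batches)  # [[0, 1, 2], [3]]
```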

Where custom hurts

Long-tail SaaS. Building a Salesforce connector from scratch is months of work. Don’t do it.

The maintenance treadmill. Every API change, every schema migration, every flaky third party — your team owns it. Multiply by N sources.

Reinventing schema evolution. Detecting added columns, type changes, dropped fields — all of this needs to be designed and tested. Airbyte and Fivetran already solved it.
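The detection half of schema evolution is small. A sketch that diffs `{column: type}` maps pulled from the source and destination catalogs; comparing types as plain strings is a simplifying assumption, since real code has to map source types onto warehouse types first:

```python
def diff_schema(source_cols, dest_cols):
    """Classify drift between two {column: type} maps."""
    added = sorted(set(source_cols) - set(dest_cols))
    dropped = sorted(set(dest_cols) - set(source_cols))
    # Each entry is (column, destination type, source type).
    changed = sorted(
        (col, dest_cols[col], source_cols[col])
        for col in set(source_cols) & set(dest_cols)
        if source_cols[col] != dest_cols[col]
    )
    return {"added": added, "dropped": dropped, "type_changed": changed}


source = {"id": "bigint", "email": "text", "amount": "numeric", "region": "text"}
dest = {"id": "bigint", "email": "text", "amount": "integer", "legacy_flag": "boolean"}
print(diff_schema(source, dest))
# {'added': ['region'], 'dropped': ['legacy_flag'], 'type_changed': [('amount', 'integer', 'numeric')]}
```

Deciding what to do with each category (add the column, widen the type, keep or soft-delete a dropped column) is the policy work that takes the real time.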

The decision tree we use

Q: How many sources?
├─ 1-3 → Custom (especially if high-volume)
└─ 4+ → ELT tool

Q: What kinds of sources?
├─ Mostly long-tail SaaS → Fivetran
├─ Mostly databases / event streams → either; custom for high-volume
└─ Mix → Airbyte or Fivetran

Q: Data residency requirements?
├─ Must stay in-network → Airbyte self-hosted or custom
└─ Cloud SaaS is fine → Fivetran or Airbyte Cloud

Q: Team capacity?
├─ No data engineers → Fivetran
├─ Small data team → Airbyte Cloud or Fivetran
└─ Mature data team → any; lean Airbyte self-host for cost

Q: Budget?
├─ $50k+/yr is fine → Fivetran is the easy answer
└─ Need to optimize → Airbyte self-host + custom for hot sources

Patterns we’ve seen work

Hybrid: Fivetran for long-tail + custom for hot sources. Most of our larger clients land here. Fivetran handles Salesforce / Stripe / HubSpot. A small set of custom Python connectors handles the 2-3 high-volume internal sources. Best of both worlds, at a reasonable cost.

Airbyte self-host for everything, custom for hot sources. Cost-sensitive teams with engineering capacity. Operational overhead is real but manageable.

Pure Fivetran. Teams that value time over money. Works great, you just write a check.

Pure custom. Rare and usually wrong. We’ve seen it work for teams with very few sources (under 5) and strict data-residency requirements. Most teams that go custom-for-everything regret it by month 18.

What we deploy by default

For a new operational data platform — like a hospital management system backend or a logistics analytics platform — our default is:

Postgres CDC via Debezium → Kafka for the operational database (custom, well-understood pattern).
Airbyte (self-hosted on the same Kubernetes cluster as the rest of the platform) for SaaS sources and secondary databases.
Custom connectors for 1-2 sources that have weird shapes, hard latency requirements, or volumes that make Airbyte’s overhead unattractive.
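For the Debezium piece, the connector is registered through Kafka Connect's REST API (`POST /connectors`). A sketch using Debezium 2.x property names (`topic.prefix` replaced `database.server.name` in 2.0); the connector name, hostnames, credentials, and table list are placeholders:

```python
import json
import urllib.request


def debezium_postgres_connector(name, host, dbname, user, password, tables):
    """Build a Kafka Connect payload for the Debezium Postgres connector (2.x keys).

    name should be lowercase letters, digits, and underscores, because it is
    reused in the replication slot name.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",  # logical decoding plugin built into Postgres 10+
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": name,  # topics are named <prefix>.<schema>.<table>
            "table.include.list": ",".join(tables),
            "slot.name": f"{name}_slot",
            "publication.name": f"{name}_pub",
        },
    }


def register(connect_url, payload):
    """POST the connector definition to a Kafka Connect worker."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = debezium_postgres_connector(
    "appdb", "postgres.internal", "app", "debezium", "change-me",
    ["public.orders", "public.payments"],
)
# register("http://kafka-connect:8083", payload)  # hypothetical Connect URL
print(payload["config"]["table.include.list"])  # public.orders,public.payments
```

Inline credentials are for illustration only; in practice they belong in a Kafka Connect config provider or your secrets manager.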

We move to Fivetran when the client has an existing Fivetran contract, no appetite for Airbyte ops, or a budget that makes the convenience cost-effective.

The hidden cost of “buy”

The Fivetran cost on paper is the bill. The real cost includes:

Vendor risk. Fivetran’s pricing model has changed several times. Plan for it changing again.
Schema drift handling. Fivetran detects new columns; you still need to update downstream dbt models that depend on them. The detection is the easy part.
The “Fivetran says it ran” moment. Sometimes Fivetran reports success but data didn’t actually arrive correctly. Reconciliation tooling is your job.
Cost growth. As your business grows, MAR grows. The bill grows non-linearly with success.
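Reconciliation can start simply. A sketch that compares per-table row counts from the source and destination; how you collect the counts is up to you, and the table names and numbers are made up. Take counts over the same closed window on both sides, since the source keeps changing while you count:

```python
def reconcile(source_counts, dest_counts, tolerance=0.0):
    """Flag tables whose destination row count drifts from the source.

    source_counts / dest_counts: {table: row_count} over the same time window.
    tolerance is a fraction of the source count (0.001 = 0.1%).
    """
    problems = {}
    for table, expected in source_counts.items():
        actual = dest_counts.get(table)
        if actual is None:
            problems[table] = "missing in destination"
        elif abs(actual - expected) > tolerance * expected:
            problems[table] = f"expected {expected}, got {actual}"
    return problems


src_counts = {"orders": 120_000, "customers": 8_400, "refunds": 310}
dst_counts = {"orders": 119_998, "customers": 8_400}
print(reconcile(src_counts, dst_counts))
# {'orders': 'expected 120000, got 119998', 'refunds': 'missing in destination'}
```

Counts catch dropped or duplicated rows but not rows that arrived with wrong values; for that, teams typically add per-column sums or checksums over the same window.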

The hidden cost of custom is the engineering treadmill. The hidden cost of Airbyte is the ops surface. There’s no free option.

The pattern of patterns

Pick the tool that minimizes the kind of work your team is bad at. If your team is bad at writing connectors, buy. If your team is bad at sustained ops, buy managed. If your team is genuinely good at both and the bill matters, build.

The wrong answer is “always one tool for everything.” Most mature data platforms use a mix — managed ELT for long-tail SaaS, self-hosted for cost-sensitive volumes, custom for the handful of sources where it actually matters. For where the ELT layer sits inside a larger multi-system consolidation, see our enterprise data platform consolidation playbook.

The ELT tool is a Buy-vs-Build question with hidden costs on both sides. If you’re sizing the call for a new data platform, our data engineering team has migrated in every direction and has scars to prove it. Tell us about your sources.