AI for Rare Disease Diagnosis

A rare disease is not rare to the person who has one. Collectively, roughly 30 million Americans — about one in ten people — live with one of more than 7,000 recognized rare diseases, and the path to a diagnosis is notoriously brutal: an average of five to seven years and, by some accounts, up to a dozen specialists before a name is put to the condition. That stretch has a name of its own — the diagnostic odyssey — and it is the single problem AI is best positioned to attack in clinical genetics.

The reason is structural. The odyssey is long not because the answer is unknowable but because the answer is buried. The causal variant usually already sits in the patient’s sequencing data; the matching syndrome is usually already described in the literature. The failure is one of search and synthesis across too much data and too little time. That is a problem machines are good at — provided you respect what makes rare disease genuinely hard.

Two signals: the phenotype and the genome

Diagnosis in rare disease is a join between what the patient looks like clinically (the phenotype) and what their genome says. AI has made real progress on both sides, and the interesting systems combine them.

Next-generation phenotyping

On the phenotype side, the workhorse is next-generation phenotyping — extracting structured clinical signal, often from a facial photograph, because many genetic syndromes produce recognizable patterns of facial morphology. Face2Gene uses deep learning on facial morphology to suggest syndromic diagnoses, functioning as a digital aid to the dysmorphologist rather than a replacement for one.

The more architecturally interesting tool is GestaltMatcher, published in Nature Genetics. It encodes patient portraits with a deep convolutional network into a “Clinical Face Phenotype Space,” where distance between cases measures syndromic similarity. The design solves the central data problem of rare disease head-on: because it compares cases in an embedding space rather than classifying into a fixed set of trained labels, it can match a patient to others with the same molecular diagnosis even when that disorder was not in the training set. That property — generalizing to disorders you have never seen enough of to train on — is exactly what rare disease demands. It also lets undiagnosed patients across institutions, including networks like the NIH Undiagnosed Diseases Network, be matched to each other, which is how genuinely new phenotypes get delineated.
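The matching idea can be illustrated with a minimal sketch. It is not GestaltMatcher's code: the three-dimensional embeddings, case IDs, and disorder labels below are invented for illustration, and cosine distance is an assumed choice of metric. The point is only that a disorder is "added" by inserting a case into the gallery, with no retraining step.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_cases(query, gallery, k=3):
    """Rank gallery cases by distance to the query embedding, closest first."""
    scored = [
        (cosine_distance(query, embedding), case_id, label)
        for case_id, (embedding, label) in gallery.items()
    ]
    scored.sort(key=lambda item: item[0])
    return scored[:k]


# Hypothetical toy embeddings; a real encoder would emit far more dimensions.
gallery = {
    "case-A": ([0.9, 0.1, 0.0], "disorder-1"),
    "case-B": ([0.8, 0.2, 0.1], "disorder-1"),
    "case-C": ([0.0, 0.1, 0.9], "disorder-2"),
}

# A disorder the encoder never saw is supported by adding one solved case.
gallery["case-D"] = ([0.1, 0.9, 0.1], "disorder-3")

query = [0.15, 0.85, 0.05]
matches = nearest_cases(query, gallery, k=2)
for distance, case_id, label in matches:
    print(f"{case_id} ({label}): distance {distance:.3f}")
```

Because the output is a ranked list of similar cases rather than a class label, the same lookup also works when none of the gallery cases has a diagnosis yet, which is the situation in patient-matching networks.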


Variant prioritization

On the genome side, the problem is the opposite of scarcity: a whole-exome sequence yields tens of thousands of variants, and typically only one (or a small handful) is causal. Sorting that haystack is variant prioritization, and the established approach is phenotype-driven.

Exomiser is the open-source standard here. It takes the patient’s variants plus their phenotype encoded as Human Phenotype Ontology (HPO) terms and ranks candidates by integrating allele frequency, predicted pathogenicity, and gene-to-phenotype associations. The HPO encoding is the quiet hero: it turns a clinician’s free-text observations into a structured set of ontology terms that an algorithm can reason over. Recent benchmarking work has produced concrete recommendations for tuning Exomiser and Genomiser (its companion for non-coding variants): how the choice of pathogenicity predictor, the quality and quantity of phenotype terms, and the inclusion of family variant data each move the diagnostic yield. The lesson for engineers: the tool itself is open and stable; the leverage is in the inputs and configuration.
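To make the shape of phenotype-driven ranking concrete, here is a toy sketch. It is not Exomiser's scoring: the gene names, the 0.001 frequency cutoff, the Jaccard overlap, and the equal weighting are all assumptions chosen for readability. It only shows the three inputs named above being combined into one ranked list.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_frequency: float  # population frequency, 0..1
    pathogenicity: float     # predicted pathogenicity, 0..1


def phenotype_score(patient_terms, gene_terms):
    """Jaccard overlap between the patient's HPO terms and the gene's known terms."""
    if not patient_terms or not gene_terms:
        return 0.0
    return len(patient_terms & gene_terms) / len(patient_terms | gene_terms)


def rank(variants, patient_terms, gene_to_terms, max_af=0.001):
    """Drop common variants, then sort by an equal-weight combined score."""
    scored = []
    for v in variants:
        if v.allele_frequency > max_af:
            continue  # too common to explain a rare disorder (assumed cutoff)
        pheno = phenotype_score(patient_terms, gene_to_terms.get(v.gene, set()))
        combined = 0.5 * v.pathogenicity + 0.5 * pheno
        scored.append((round(combined, 3), v.gene))
    return sorted(scored, reverse=True)


# Hypothetical genes and associations, for illustration only.
patient_terms = {"HP:0001250", "HP:0000252", "HP:0001263"}
gene_to_terms = {
    "GENE_A": {"HP:0001250", "HP:0000252"},
    "GENE_B": {"HP:0000316"},
    "GENE_C": {"HP:0001250", "HP:0000252", "HP:0001263"},
}
variants = [
    Variant("GENE_A", 0.00001, 0.90),
    Variant("GENE_B", 0.0, 0.95),
    Variant("GENE_C", 0.05, 0.99),  # perfect phenotype fit, but far too common
]

ranked = rank(variants, patient_terms, gene_to_terms)
for score, gene in ranked:
    print(gene, score)
```

In this toy data GENE_B has the highest predicted pathogenicity but no phenotype overlap, so it falls below GENE_A. That is the sense in which the phenotype terms, not the model, carry much of the leverage.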

The frontier is agentic. Deep Agentic Variant Prioritisation (DAVP) frames prioritization as a hierarchical system of AI agents that evaluate variants patient-by-patient, aiming for expert-level genetic diagnosis at scale. Work that combines genotype and phenotype signals to lift overall diagnostic yield reinforces the recurring theme: neither modality alone is enough, and the gain comes from fusing them.

The data-scarcity problem nobody can engineer away

Here is the uncomfortable core. The defining characteristic of rare disease — that each disorder has very few patients — is fundamentally hostile to the way machine learning works. A standard supervised classifier needs many labeled examples per class. For an ultra-rare disorder you may have a handful of cases worldwide. You cannot train your way out of that.

This is why the best rare-disease AI does not look like conventional classification. It uses three strategies instead.

Similarity over classification. GestaltMatcher’s embedding-space approach is the template: learn a general representation, then compare distances rather than predict from a fixed label set. New disorders cost nothing to “add” because there is nothing to retrain.

Knowledge over data. Exomiser leans on curated biological knowledge — ontologies, gene-disease databases, model-organism phenotypes — to compensate for the absence of training examples. The intelligence is largely encoded in structured prior knowledge, not learned from scratch.

Pooling over isolation. A single hospital will never see enough cases. Federated matching networks let institutions compare cases without centralizing raw patient data, which is both a privacy requirement and the only way to assemble enough signal.

Any vendor promising a rare-disease model trained end-to-end on a proprietary dataset is either working on a handful of the more common “rare” conditions or misrepresenting what is possible.


Why the odyssey persists even with good tools

It is worth being precise about why patients still wait years when the tools above exist. Part of it is access: comprehensive sequencing is not ordered early, and next-generation phenotyping is not in routine primary-care workflows. Part of it is fragmentation: a patient’s records sit across multiple institutions, so no single clinician ever sees the full picture that would trigger a genetics referral. And part of it is that a negative result is not the end: many patients reach the end of standard testing without an answer, and their data has to be periodically re-analyzed as the literature grows. A review of AI for rare-disease diagnosis and therapy frames this re-analysis loop as a structural opportunity: a variant that was “of uncertain significance” two years ago may be reclassified today, and an automated pipeline can re-run prioritization against the current knowledge base without relying on a clinician to remember. That turns diagnosis from a one-shot event into a standing process, which is exactly the kind of operational automation that suits a machine and exhausts a human.
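The re-analysis loop can be sketched as a small scheduled job. Everything here is hypothetical: the `Case` record, the integer knowledge-base version, and the classification strings are stand-ins for whatever a real pipeline and variant database would provide.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    variant_ids: list
    solved: bool = False
    last_kb_version: int = 0  # knowledge-base version used in the last analysis


ACTIONABLE = ("pathogenic", "likely_pathogenic")


def reanalyze(cases, knowledge_base, kb_version):
    """Re-check unsolved cases whose last analysis predates the current knowledge base."""
    flagged = []
    for case in cases:
        if case.solved or case.last_kb_version >= kb_version:
            continue  # nothing new to learn for this case
        hits = [v for v in case.variant_ids if knowledge_base.get(v) in ACTIONABLE]
        case.last_kb_version = kb_version
        if hits:
            # Flag for geneticist review; this does not mark the case as solved.
            flagged.append((case.case_id, hits))
    return flagged


# Hypothetical knowledge-base snapshots: var-1 is reclassified between versions.
kb_v1 = {"var-1": "uncertain_significance"}
kb_v2 = {"var-1": "likely_pathogenic"}

cases = [Case("case-001", ["var-1", "var-2"])]

first_pass = reanalyze(cases, kb_v1, kb_version=1)
second_pass = reanalyze(cases, kb_v2, kb_version=2)
repeat_pass = reanalyze(cases, kb_v2, kb_version=2)
print(first_pass, second_pass, repeat_pass)
```

Tracking the knowledge-base version per case is the design choice that makes the job cheap to run on a schedule: cases already analyzed against the current snapshot are skipped.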

What it takes to deploy this in a real health system

The published tools work. Getting them in front of clinicians at the right moment is the unglamorous part, and it is where most of the engineering effort goes.

Phenotype capture has to be structured at the source. Exomiser is only as good as its HPO terms, and HPO terms only exist if someone captures them. In most settings the clinical observations live as free text in the hospital management system. Turning that narrative into clean HPO codes at the point of care, without adding clinician burden, is a natural-language and workflow problem, and it gates everything downstream. A school ERP captures structured attendance because the form demands it; clinical phenotyping needs the same discipline applied to dysmorphology and history.
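As a rough illustration of the free-text-to-HPO step, the sketch below uses a tiny hand-written lexicon and a naive negation check. It is an assumption-laden toy, not a clinical NLP system: a production pipeline would draw its vocabulary from the full ontology and handle synonyms, abbreviations, and context far more carefully.

```python
import re

# Tiny illustrative lexicon mapping phrases to HPO term IDs.
LEXICON = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
    "hypertelorism": "HP:0000316",
}

NEGATION_CUES = re.compile(r"\b(no|not|denies|without)\b")


def extract_hpo(note):
    """Return HPO IDs for lexicon phrases that are not negated in their sentence."""
    found = []
    for sentence in re.split(r"[.;\n]", note.lower()):
        for phrase, code in LEXICON.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if not match:
                continue
            # Naive rule: a negation cue earlier in the sentence negates the finding.
            if NEGATION_CUES.search(sentence[: match.start()]):
                continue
            if code not in found:
                found.append(code)
    return found


note = (
    "Infant with microcephaly and global developmental delay. "
    "Recurrent seizures since 6 months. No hypertelorism."
)
terms = extract_hpo(note)
print(terms)
```

The negation check matters as much as the lookup: recording an explicitly excluded finding as present would push the ranking toward the wrong genes.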

The pipeline must reconcile genomic and clinical data. Sequencing comes back from a lab as a VCF; the phenotype lives in the record; the family history lives somewhere else again. Joining them on a stable patient identity, with consent and provenance tracked, is the same data-platform problem that shows up everywhere in clinical AI, and it is the actual bottleneck, not the ranking algorithm.
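A minimal sketch of that join, assuming in-memory dictionaries keyed by a stable patient ID. The function names, record layout, and provenance labels are hypothetical; only the first five VCF columns (CHROM, POS, ID, REF, ALT) are read.

```python
def parse_vcf_line(line):
    """Parse the first five fixed VCF columns: CHROM, POS, ID, REF, ALT."""
    chrom, pos, _variant_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def assemble_case(patient_id, vcf_lines, phenotypes, family_history, consent):
    """Join the three sources for one patient, refusing if consent is absent."""
    if not consent.get(patient_id, False):
        raise PermissionError(f"no recorded consent for {patient_id}")
    if patient_id not in vcf_lines or patient_id not in phenotypes:
        raise LookupError(f"incomplete data for {patient_id}")
    return {
        "patient_id": patient_id,
        "variants": [
            parse_vcf_line(line)
            for line in vcf_lines[patient_id]
            if not line.startswith("#")  # skip VCF header lines
        ],
        "hpo_terms": phenotypes[patient_id],
        # Family history is optional; record its absence rather than guessing.
        "family_history": family_history.get(patient_id),
        "provenance": {"variants": "lab VCF", "hpo_terms": "clinical record"},
    }


# Hypothetical sources, each keyed by the same patient identifier.
vcf_lines = {"pt-1": ["#CHROM\tPOS\tID\tREF\tALT", "1\t12345\t.\tA\tG\t50\tPASS\t."]}
phenotypes = {"pt-1": ["HP:0001250"], "pt-2": ["HP:0000252"]}
family_history = {}
consent = {"pt-1": True, "pt-2": False}

case = assemble_case("pt-1", vcf_lines, phenotypes, family_history, consent)
print(case["variants"], case["hpo_terms"])
```

Checking consent before touching any data, and failing loudly on a missing source, keeps an incomplete or unauthorized record from reaching the ranking step at all.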

Outputs are a ranked differential, not a verdict. Every responsible tool here produces candidates for a clinical geneticist to confirm, typically with orthogonal validation. The framing is decision-support and triage: get the right cases to the right specialist faster, and shorten the list they have to consider. Autonomous diagnosis is neither the goal nor permitted — a prioritized variant is a hypothesis that still requires expert interpretation and, often, functional confirmation.

Regulation and equity are not afterthoughts. Phenotyping models built largely on one population can underperform on others, and facial-analysis tools carry obvious sensitivity. Validation across ancestries, careful consent, and conservative claims are conditions of doing this work responsibly, not optional polish.

The honest position

AI will not “solve” rare disease, and the odyssey will not vanish. But the compression is real and already happening: phenotype-driven prioritization and embedding-based matching can take a case from years of serial specialist referrals to a ranked shortlist a geneticist can act on quickly. The wins come from respecting the data-scarcity constraint — similarity over classification, knowledge over brute-force training, pooled signal over institutional silos — and from doing the integration work that gets structured phenotypes and clean genomic data into the same pipeline.

The algorithms are largely open and validated. The differentiator is the implementation discipline around them: capturing phenotype at the source, reconciling it with the genome, and putting a defensible, ranked differential in front of the clinician who still makes the call. Shave even a year or two off a seven-year odyssey, across thousands of patients, and the impact is measured in lives, not benchmarks.


