Reference Genomes Are Not Enough: Graph Genomics


Jonathan Alles

EVOBYTE Digital Biology

Introduction: when a single reference hides the signal

If your pipeline still maps reads to a single linear reference like GRCh38 with a vanilla aligner, you’re leaving measurable accuracy on the table. A single “model genome” flattens diversity into one path, which quietly penalizes reads that carry alternate alleles, collapses complex haplotypes, and distorts allele balance in regions that matter for disease. Over the last few years, graph genomics and pangenome references have matured from intriguing research ideas into practical building blocks that reduce reference bias and lift both small-variant and structural-variant accuracy across populations. The Human Pangenome Reference Consortium’s (HPRC) draft pangenome showed this direction clearly, adding large amounts of polymorphic sequence and revealing variation that simply isn’t present in a single reference.

Why linear references bias variant calling

A linear reference is a single path through the genome. It works beautifully for conserved regions, yet it breaks down in segmental duplications, repeats, and loci with many haplotypes such as HLA or KIR. When an individual’s sequence deviates from the reference, read mappers accumulate soft clips and mismatches; reads can be mis-placed or down-weighted; alleles on the reference path get favored. That skews allele fractions, reduces sensitivity for insertions and complex indels, and creates false negatives in underrepresented ancestries. Moving to a pangenome replaces that single path with a structure that explicitly encodes alternatives, so reads don’t have to “pretend” the haplotype isn’t there. Comparative evaluations of pangenome-graph construction methods have consistently found improved mapping and genotyping in such difficult regions compared with linear references.

Meet the pangenome and its graph: the data structures behind the gains

A pangenome is a reference built from many genomes rather than one. In practice, modern pangenomes use a variation graph: nodes store sequence segments; edges connect possible adjacencies; and paths trace known haplotypes. This lets common and rare alleles co-exist in the reference, preserving coordinate systems while making variation explicit. Projects such as the HPRC construct human pangenomes by aligning telomere-to-telomere assemblies using pipelines like Minigraph-Cactus and related graph builders. The result is a reference that represents major haplotypes and complex loci, not just a single consensus. Ongoing HPRC releases continue to expand coverage and diversity, and the pangenome graphs they distribute provide ready-to-use inputs for downstream mapping and calling.
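To make the node/edge/path structure concrete, here is a minimal sketch of a variation graph in Python. The node sequences, IDs, and haplotype names are invented for illustration; real graphs hold millions of nodes and are handled by tools like vg, not hand-built dictionaries. The toy graph encodes one SNP (A/G) and one insertion, and reconstructs each haplotype by walking its path:

```python
# Toy variation graph: a SNP (A/G) and a 3-bp insertion carried by one haplotype.
# Node IDs and sequences are made up for illustration.
nodes = {
    1: "ACGT",   # shared flank
    2: "A",      # reference allele
    3: "G",      # alternate allele
    4: "TTC",    # shared segment
    5: "GGA",    # insertion present only on hap2
    6: "CCAT",   # shared flank
}

# Edges record possible adjacencies; both alleles stay in the reference.
edges = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)}

# Haplotype paths are ordered node walks; each known haplotype is one path.
paths = {
    "hap1": [1, 2, 4, 6],      # reference-like: A allele, no insertion
    "hap2": [1, 3, 4, 5, 6],   # G allele plus the insertion
}

def walk(path_name):
    """Reconstruct a haplotype by concatenating node sequences along its path."""
    ids = paths[path_name]
    # Sanity-check that consecutive nodes are actually connected.
    assert all((a, b) in edges for a, b in zip(ids, ids[1:]))
    return "".join(nodes[i] for i in ids)

print(walk("hap1"))  # ACGTATTCCCAT
print(walk("hap2"))  # ACGTGTTCGGACCAT
```

Note how both alleles and the insertion live in the same structure: a read from either haplotype has an exact path to align to, which is precisely what a linear reference cannot offer.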

Because a pangenome is a graph, it also influences file formats and indexes. Graph-aware systems commonly use GFA/GBZ for compact storage, plus specialized haplotype indexes to navigate paths efficiently during alignment. Downstream, you can still emit VCF/BCF for compatibility with today’s interpretation tools, but the alignment and genotyping step benefits from the richer reference. Comparative studies of graph representations, including pggb and Minigraph-Cactus, show how graph topology and parameterization affect read mapping, variant genotyping, and even RNA-seq placement in polymorphic exons.
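For a feel of what these files look like, here is a hand-written GFA fragment (not real HPRC output) encoding the same kind of structure: S lines are segments (nodes), L lines are links (edges), and P lines are named paths (haplotypes):

```
# Segments carry node sequences; links are edges; P lines are haplotype paths.
S	1	ACGT
S	2	A
S	3	G
S	4	TTC
L	1	+	2	+	0M
L	1	+	3	+	0M
L	2	+	4	+	0M
L	3	+	4	+	0M
P	hap1	1+,2+,4+	*
P	hap2	1+,3+,4+	*
```

GBZ stores the same information in a compressed, haplotype-indexed binary form suited to alignment at scale.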

How pangenome-aware mapping and calling reduce reference bias

Graph-aware mappers route reads through the correct haplotype instead of forcing them onto a single path. The vg toolkit’s Giraffe mapper is a good example: it achieves linear-mapper–like speed while improving alignment accuracy for both short and long reads on large human pangenome graphs. In head-to-head comparisons, Giraffe delivers similar or better variant-calling accuracy than best-in-class linear and graph mappers, particularly in complex regions that routinely trip up linear alignment. HPRC analyses themselves use Giraffe to align benchmark samples to Minigraph-Cactus graphs, underscoring that this is not just a toy: it is how current pangenome work gets done.

Here is a minimal end-to-end example that maps paired-end reads to a pangenome graph and produces variant calls using vg’s native pack/call. It keeps familiar inputs and outputs (FASTQ in, VCF out) while swapping a graph-aware aligner into your pipeline:

# 1) Build a Giraffe-ready index from a linear reference + cohort VCFs
vg autoindex -w giraffe -r GRCh38.fa -v cohort.vcf.gz -p graph

# 2) Map reads to the pangenome graph
vg giraffe -Z graph.gbz -d graph.dist -m graph.min \
  -f sample_R1.fq.gz -f sample_R2.fq.gz > sample.gam

# 3) Create a coverage pack and call variants
vg pack -x graph.gbz -g sample.gam -o sample.pack
vg call -k sample.pack -s SAMPLE_ID graph.gbz > sample.graph.vcf

Beyond mapping, graph-aware genotypers exploit the same structure. GraphTyper2 encodes SV breakpoints alongside small variants in a directed acyclic graph and jointly genotypes them at population scale. For high-throughput genotyping without remapping every sample, PanGenie leverages k-mer–based inference on pangenome graphs to rapidly infer genotypes across many variant classes, making it attractive for large cohorts and GWAS-scale studies. Together, these tools enable graph-native workflows that are both fast and accurate.
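The k-mer intuition behind tools like PanGenie is simple enough to sketch. The toy below is not PanGenie’s actual algorithm (which adds population priors and genome-wide k-mer uniqueness); it only illustrates the core move: k-mers unique to one local haplotype vote for that allele, so genotypes can be inferred from read k-mer counts without alignment. All sequences and reads here are invented:

```python
from collections import Counter

def kmers(seq, k):
    """All k-length substrings of a sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def genotype_site(ref_hap, alt_hap, reads, k=5):
    """Toy allele-support counter: k-mers unique to one local haplotype
    vote for that allele. Real tools add priors and uniqueness filters;
    this shows only the core idea of alignment-free genotyping."""
    ref_only = kmers(ref_hap, k) - kmers(alt_hap, k)
    alt_only = kmers(alt_hap, k) - kmers(ref_hap, k)
    support = Counter()
    for read in reads:
        rk = kmers(read, k)
        support["ref"] += len(rk & ref_only)
        support["alt"] += len(rk & alt_only)
    r, a = support["ref"], support["alt"]
    if r and a:
        return "0/1"
    return "1/1" if a else "0/0"

# Local haplotypes around a SNP (C vs T); sequences are invented.
ref = "ACGTACGCATGA"
alt = "ACGTATGCATGA"                  # differs at one position
reads_het = ["GTACGCAT", "GTATGCAT"]  # one read supporting each allele
print(genotype_site(ref, alt, reads_het))  # 0/1
```

Because no per-sample remapping is needed, this style of inference scales to cohorts: the expensive step (building the graph and its k-mer tables) is paid once.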

Population studies, rare disease, and structural variants: where graphs shine

The immediate win from graph references is fewer misalignments in polymorphic sequence, which means cleaner allele balances and fewer missed insertions and complex indels. That shows up in trio consistency, de novo detection, and Mendelian error rates. More importantly, it translates into real discoveries. Recent work in rare disease used pangenome graphs built from high-quality personal assemblies to detect and genotype structural variants, including a pathogenic event in KMT2E that standard linear-reference pipelines had missed. When locus complexity and population diversity collide, the pangenome’s explicit alternatives become the difference between “no call” and a diagnosis.

For population genetics, pangenome genotyping scales to tens of thousands of genomes, keeping structural variants in the picture rather than treating them as afterthoughts. GraphTyper2 was applied to cohorts on the order of fifty thousand genomes while jointly genotyping millions of small variants and hundreds of thousands of SVs—using short-read data. That is crucial for equitable analyses because it lets studies capture non-reference haplotypes and ancestry-specific alleles without requiring long-read sequencing for every sample. In parallel, PanGenie demonstrated that pangenome-based inference can cover a wide spectrum of variant classes at speeds suitable for cohort-scale workflows, further lowering the barrier to adoption.

These gains also extend to agrigenomics and microbes, but the same pattern holds in humans: once the reference encodes real haplotype diversity, mapping quality and variant recall improve in regions that previously produced the most headaches. Systematic comparisons of graph construction and indexing approaches reinforce that these are not marginal improvements in corner cases; they are consistent uplifts concentrated where linear references are most brittle.

Practical adoption: upgrading your pipeline without breaking everything

You don’t have to rebuild your entire secondary analysis stack overnight to benefit from graphs. A pragmatic approach is to start at loci where reference bias is notorious. MHC, pharmacogenes with common star alleles, immunoglobulin loci, and segmental duplications are prime candidates. Swap in a graph-aware mapper for those intervals, cross-check against GIAB-like truth sets where available, and track metrics you already know—sensitivity in low-mappability regions, indel recall, Mendelian consistency, and allele balance. Because graph mapping still produces standard BAM/CRAM and VCF, downstream annotation and interpretation fit right in.
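Of the metrics above, Mendelian consistency is the easiest to compute yourself and a good hard outcome for an A/B pilot. A minimal sketch, assuming genotypes have already been extracted from a trio VCF as unordered allele pairs (real pipelines would parse the VCF with a library such as pysam; the call set below is invented):

```python
def mendelian_consistent(child, mother, father):
    """A child genotype is consistent if one allele can come from each parent.
    Genotypes are unordered allele tuples, e.g. (0, 1) for a het call."""
    c1, c2 = child
    return ((c1 in mother and c2 in father) or
            (c2 in mother and c1 in father))

def trio_error_rate(trio_calls):
    """Fraction of sites where the trio violates Mendelian inheritance."""
    errors = sum(not mendelian_consistent(c, m, f) for c, m, f in trio_calls)
    return errors / len(trio_calls)

# Toy call set: (child, mother, father) genotypes at four sites.
calls = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # error: mother cannot transmit a 1 allele
    ((0, 1), (0, 1), (0, 1)),  # consistent
]
print(trio_error_rate(calls))  # 0.25
```

Run the same computation on linear-pipeline and graph-pipeline call sets over the same trio, restricted to the loci you care about, and the comparison is direct.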

If you’re ready to scale beyond hotspots, lean on community resources. HPRC provides Minigraph-Cactus graphs and indexes that you can feed directly into vg Giraffe, saving you from building everything from scratch. Building your own graph from a linear reference plus phased population VCFs is also straightforward and often gives a strong first approximation to a full pangenome; vg’s autoindex does most of the heavy lifting. For joint genotyping across cohorts, evaluate GraphTyper2 and PanGenie to keep throughput high while bringing structural variants into scope.

From an engineering perspective, expect two differences versus linear pipelines. First, index assets are larger and include haplotype information, so you’ll want to cache them centrally and version them just like you version references and known-sites VCFs. Second, compute patterns shift toward more contiguous memory and I/O for graph navigation. In practice, modern graph mappers are designed to be competitive with linear tools on runtime, so your wall-clock impact is often modest, especially if you start with targeted regions. Evaluations of Giraffe emphasize speed parity with linear mappers while yielding better mapping accuracy; your mileage will vary by coverage, read length, and graph content, which is why piloting on your own datasets is worth the day of work.
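Versioning those index assets can be as simple as a checksum manifest committed alongside your pipeline config. A minimal sketch (the asset filenames, release tag, and manifest layout are all assumptions, not a standard):

```python
import hashlib
import json
import pathlib

def sha256_of(path, chunk=1 << 20):
    """Stream the file so multi-gigabyte index assets never load into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        while block := fh.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(asset_paths, release, out="graph_assets.manifest.json"):
    """Record a checksum and release tag per asset, just as you would
    version references and known-sites VCFs."""
    manifest = {
        "release": release,
        "assets": {p: sha256_of(p) for p in asset_paths},
    }
    pathlib.Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest

# Demo with a placeholder file; in practice list graph.gbz, graph.dist,
# graph.min, and any haplotype indexes your mapper needs.
pathlib.Path("graph.gbz").write_bytes(b"demo")
m = write_manifest(["graph.gbz"], release="hprc-v1.1")
print(m["release"])
```

Verifying the manifest at pipeline start-up catches the classic failure mode of a graph and its indexes drifting out of sync across compute nodes.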

Finally, bring your quality controls along for the ride. Replicate a few samples with both linear and graph pipelines and compare hard outcomes rather than just mapping stats. Look for reductions in strand bias and MAPQ0 pileups near SV breakpoints; check that previously suspicious low-complexity intervals clean up; and confirm that clinically relevant loci stabilize. As HPRC releases expand and mature, you can update graph assets on a cadence similar to major reference upgrades you already manage today.
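The linear-versus-graph comparison itself is a small set operation once calls are normalized to a common coordinate system. A sketch, assuming both call sets have been keyed as (chrom, pos, ref, alt) tuples; the coordinates and variants below are invented:

```python
def concordance(linear_calls, graph_calls):
    """Compare two call sets keyed by (chrom, pos, ref, alt) and
    return shared, linear-only, and graph-only counts."""
    a, b = set(linear_calls), set(graph_calls)
    return {
        "shared": len(a & b),
        "linear_only": len(a - b),
        "graph_only": len(b - a),
    }

# Toy call sets: the graph pipeline recovers an insertion the linear one missed.
linear = {("chr6", 29_945_112, "A", "G"),
          ("chr6", 29_945_890, "C", "T")}
graph = {("chr6", 29_945_112, "A", "G"),
         ("chr6", 29_945_890, "C", "T"),
         ("chr6", 29_946_433, "T", "TAGG")}  # insertion, invented coordinates

print(concordance(linear, graph))  # {'shared': 2, 'linear_only': 0, 'graph_only': 1}
```

The discordant sets are where to spend review time: graph-only calls in known-difficult intervals are the expected wins, while linear-only calls deserve scrutiny as likely reference-bias artifacts.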

Summary / Takeaways

Linear references made NGS analysis possible at scale, but they also hard-wire bias into read mapping and variant calling—especially in the regions and ancestries where we most need accuracy. Pangenomes and variation graphs fix that by placing alternate haplotypes into the reference itself. The result is better alignment, higher recall for insertions and complex indels, more robust structural-variant genotyping, and fewer artifacts in population studies.

The tooling has caught up. Graph-aware mappers like vg Giraffe are fast enough for production and improve accuracy where linear methods struggle. Graph-native genotypers such as GraphTyper2 and PanGenie bring cohort-scale throughput to small variants and SVs alike. And with community graphs from HPRC, you can start practical pilots today without re-architecting downstream interpretation.

If you’re choosing one next step, pick a locus your lab cares about and run a small A/B test: linear versus graph. Measure variant recall and Mendelian consistency, then decide how far to scale. Chances are, once you see those stubborn regions open up, you won’t want to go back.
