SpliceVarDB: Validated splicing variants for splice-site

Jonathan Alles

EVOBYTE Digital Biology

Introduction

If you work with rare disease genomes or cancer panels, you’ve probably stared at a variant that every predictor flags as “splice‑disruptive,” only to wonder whether it actually alters RNA in a relevant tissue. This is where experimentally validated evidence becomes the difference between a hunch and a diagnosis. SpliceVarDB brings that evidence together at scale, so you can move from guesswork to grounded decisions faster.

What SpliceVarDB contains—and why it’s different

SpliceVarDB consolidates more than 50,000 human variants, spanning over 8,000 genes, whose effects on splicing were assayed experimentally, harmonized from 500+ publications into a single resource. Each variant is classified on a spliceogenicity scale (splice‑altering, not splice‑altering, or low‑frequency splice‑altering) according to the strength of evidence from assays such as RT‑PCR, RNA‑seq, and minigene tests. Notably, over half of the splice‑altering entries lie outside the canonical GT/AG splice‑site dinucleotides, and a meaningful fraction are deep intronic, a common blind spot for many pipelines. This curation makes SpliceVarDB more than a catalog: it is a high‑quality truth set designed for clinical curation and tool benchmarking.
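
To make the distinction between canonical and non-canonical positions concrete, here is a minimal sketch that bins a coding-DNA (c.) HGVS substitution by its intronic offset from the nearest exon boundary. The function name `splice_region`, its category labels, and the 100 bp deep-intronic cut-off are illustrative assumptions, not SpliceVarDB's own location scheme, and the parser handles only simple single-position notations.

```python
import re

# Hypothetical cut-off: intronic offsets at or beyond this many bases are
# treated as "deep intronic". SpliceVarDB's own location categories may differ.
DEEP_INTRONIC_MIN = 100

# Matches simple c. positions such as c.123+5, c.456-2, c.-45+3 or c.789.
_POS = re.compile(r"^c\.(\*?-?\d+)([+-]\d+)?")

def splice_region(hgvs_c: str) -> str:
    """Roughly bin a c. HGVS variant by its distance from the exon edge."""
    m = _POS.match(hgvs_c)
    if m is None:
        return "unparsed"
    offset = m.group(2)
    if offset is None:
        return "exonic"  # no intronic offset: the position lies in an exon/UTR
    distance = abs(int(offset))
    if distance <= 2:
        return "canonical"  # +1/+2 (GT donor) or -1/-2 (AG acceptor)
    if distance < DEEP_INTRONIC_MIN:
        return "near-splice-site intronic"
    return "deep intronic"
```

For example, `splice_region("c.123+5G>A")` returns `"near-splice-site intronic"`, while `splice_region("c.456-2A>G")` returns `"canonical"` because offsets of ±1 and ±2 cover the GT donor and AG acceptor dinucleotides.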

Why experimentally validated splice‑site variants matter

In silico models are excellent triage tools, but splicing is context‑dependent: tissue‑specific expression, nonsense‑mediated decay (NMD), and regulatory elements can mask or magnify a variant's impact. That is why ACMG/AMP interpretation leans on functional assays. Predictions can support pathogenicity (PP3), but splice disruption demonstrated in a wet‑lab system can supply stronger evidence (PS3) and may be enough to reclassify a VUS into a reportable call. In practice, patient‑derived RNA (RT‑PCR or RNA‑seq) is the most direct readout when a relevant tissue is accessible; minigene constructs help when it isn't, but because they test the variant outside its full genomic context, they still require careful interpretation. The bottom line is simple: validated RNA effects sharpen diagnoses, guide therapy decisions, and calibrate the very algorithms we rely on.
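
The split between predictive and functional evidence can be sketched as a small helper that lists which splicing-related codes a curator might review. This is a simplified illustration, not an implementation of ACMG/AMP or ClinGen rules: the `SpliceEvidence` structure and `codes_to_consider` function are hypothetical, and the 0.2 SpliceAI cut-off is an assumption to check against current ClinGen SVI guidance.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative cut-off (an assumption here): ClinGen SVI splicing guidance has
# proposed SpliceAI >= 0.2 for PP3; check the current recommendations.
PP3_MIN_SPLICEAI = 0.2

@dataclass
class SpliceEvidence:
    spliceai_max: Optional[float]       # highest SpliceAI delta score, if computed
    rna_shows_aberrant: Optional[bool]  # None = no RNA assay; True/False = observed

def codes_to_consider(ev: SpliceEvidence) -> List[str]:
    """List splicing-related ACMG/AMP codes worth reviewing; not a final call."""
    codes = []
    if ev.spliceai_max is not None and ev.spliceai_max >= PP3_MIN_SPLICEAI:
        codes.append("PP3")  # computational prediction: supporting level
    if ev.rna_shows_aberrant:
        codes.append("PS3")  # functional evidence; strength depends on the assay
    return codes
```

Applied strength, permitted code combinations, and assay-specific caveats still need expert review; the sketch only shows that an RNA result adds a different kind of evidence than a prediction score.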

How SpliceVarDB improves your splicing analysis pipeline

Because entries are tied to HGVS notation and assay outcomes, SpliceVarDB reduces redundant lab work by revealing when a variant, or a nearby one, has already been tested. It also anchors benchmarking: you can evaluate models like SpliceAI or emerging deep‑learning approaches against a large, diverse, and public ground truth rather than small, gene‑specific sets. And since many validated hits sit in non‑canonical regions, SpliceVarDB helps you catch variants that create deep intronic pseudoexons or disrupt branchpoints, which standard exon‑centric filters can miss. The database was explicitly built to shorten interpretation cycles and to catalyze better predictors, not to replace them.
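
A benchmark against SpliceVarDB labels reduces to a confusion matrix at a chosen score threshold. The sketch below assumes a pandas DataFrame with hypothetical `CLASS` and `SCORE` columns (for instance, a SpliceAI maximum delta score joined onto the table); the label strings, the 0.2 default threshold, and the choice to leave low-frequency entries out as ambiguous are all assumptions to adapt.

```python
import pandas as pd

# Hypothetical label strings; match them to the values in your export.
ALTERING = "splice-altering"
NOT_ALTERING = "not splice-altering"

def benchmark(df: pd.DataFrame, threshold: float = 0.2) -> dict:
    """Sensitivity/specificity of SCORE >= threshold against CLASS labels."""
    # Keep only unambiguous labels and rows where the predictor produced a score.
    labelled = df[df["CLASS"].isin([ALTERING, NOT_ALTERING])].dropna(subset=["SCORE"])
    truth = labelled["CLASS"].eq(ALTERING)
    called = labelled["SCORE"].ge(threshold)
    tp = int((truth & called).sum())
    fn = int((truth & ~called).sum())
    tn = int((~truth & ~called).sum())
    fp = int((~truth & called).sum())
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "n": len(labelled),
    }
```

Sweeping `threshold` over a range of values and plotting the resulting pairs gives a simple ROC-style view of where a predictor trades sensitivity for specificity.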

A quick, practical example: triaging a VCF with SpliceVarDB

Imagine you’ve downloaded the SpliceVarDB table and want to flag variants in your cohort before launching lab work. You can do this with a lightweight join, then bubble up candidates with prior functional evidence.

Example (Python with pandas, assuming a TSV export from SpliceVarDB with the columns shown in the comments):

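import pandas as pd

# Column names are assumptions; rename them to match your actual export.
vcf = pd.read_csv("cohort.snvs.tsv", sep="\t", dtype={"CHROM": str})  # CHROM, POS, REF, ALT, SAMPLE, ...
svdb = pd.read_csv("SpliceVarDB.hg38.tsv", sep="\t", dtype={"CHROM": str})  # CHROM, POS, REF, ALT, CLASS, ASSAY, PMID

# Harmonize contig names ("chr1" vs "1") so the join keys agree.
for df in (vcf, svdb):
    df["CHROM"] = df["CHROM"].str.replace(r"^chr", "", regex=True)

key_cols = ["CHROM", "POS", "REF", "ALT"]
# Left join keeps every cohort variant; dropping exact duplicates avoids repeated rows.
annot = vcf.merge(svdb[key_cols + ["CLASS", "ASSAY", "PMID"]].drop_duplicates(), on=key_cols, how="left")
hits = annot[annot["CLASS"].notna()].sort_values(["SAMPLE", "CLASS"])
hits.to_csv("cohort.splicevardb_hits.tsv", sep="\t", index=False)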

This simple pass highlights which variants already have functional support and what the assay observed, so your team can focus wet‑lab time on the truly unknown variants while using validated events to calibrate thresholds.
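
For variants with no exact match, the check for already-tested nearby variants can be sketched as a windowed lookup over per-chromosome sorted positions using Python's `bisect` module. The `(chrom, pos, label)` record layout, the function names, and the 10 bp default window are hypothetical; a neighbor's assay result is context for prioritization, not evidence about the query variant itself.

```python
import bisect
from collections import defaultdict

def build_index(records):
    """Group (chrom, pos, label) tuples into per-chromosome sorted arrays."""
    grouped = defaultdict(list)
    for chrom, pos, label in records:
        grouped[chrom].append((pos, label))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        # Keep a parallel list of positions so bisect can search it directly.
        index[chrom] = ([p for p, _ in entries], entries)
    return index

def nearby_tested(index, chrom, pos, window=10):
    """Return validated (pos, label) entries within +/- window bp of pos."""
    if chrom not in index:
        return []
    positions, entries = index[chrom]
    lo = bisect.bisect_left(positions, pos - window)
    hi = bisect.bisect_right(positions, pos + window)
    return entries[lo:hi]
```

Sorting once in `build_index` keeps each query at O(log n) plus the size of the result, which matters when screening a whole cohort against tens of thousands of entries.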

Where it fits with other splicing resources

SpliceVarDB complements specialized datasets rather than competing with them. MutSpliceDB anchors its entries in RNA‑seq BAM evidence, which is great for visual confirmation and cancer‑focused exploration. DBASS3 catalogs aberrant 3′ splice sites activated by mutations, including cryptic acceptors, which helps when you need mechanistic context around acceptor usage. Together, these resources let you move from genome to transcript to mechanism without reinventing the wheel.

Summary / Takeaways

Validated splicing evidence is the fastest route from suspicious DNA to actionable RNA. By centralizing tens of thousands of experimentally tested variants and standardizing how we describe spliceogenicity, SpliceVarDB turns scattered literature into a practical asset for diagnostics, research, and model development. Add it near the top of your pipeline: screen with predictors, cross‑reference SpliceVarDB, then reserve bench time for the variants that still need an answer. What would you discover if your next VUS already had RNA data waiting?
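
The ordering suggested above (predictors, SpliceVarDB, then the bench) can be sketched as a three-way triage. The variant dictionaries with `chrom`, `pos`, `ref`, `alt`, and `score` keys, the `validated` lookup table, and the 0.2 cut-off are hypothetical stand-ins for your own pipeline's data.

```python
def triage(variants, validated, predictor_min=0.2):
    """Split variants into known (validated), bench (predicted), low_priority."""
    buckets = {"known": [], "bench": [], "low_priority": []}
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        if key in validated:
            # Prior functional evidence exists: report it instead of re-testing.
            buckets["known"].append((v, validated[key]))
        elif v.get("score") is not None and v["score"] >= predictor_min:
            # Predicted disruptive but untested: candidate for wet-lab work.
            buckets["bench"].append(v)
        else:
            buckets["low_priority"].append(v)
    return buckets
```

One deliberate choice: the sketch consults the validated table before applying the score cut-off, so a deep intronic variant that a predictor underrates but that SpliceVarDB lists is still surfaced rather than filtered out.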