Immuno Receptor Sequencing: TCR/BCR Basics & NGS Methods

Jonathan Alles

EVOBYTE Digital Biology

Introduction

If you could eavesdrop on the immune system, you’d hear a constant conversation between receptors and antigens. That dialogue is written in the sequences of T cell receptors (TCRs) and B cell receptors (BCRs). Immuno receptor sequencing—often called AIRR-seq for adaptive immune receptor repertoire sequencing—lets us read those messages at scale. In practice, that means tracking clonotypes over time, mapping vaccine responses, discovering antibodies, and even monitoring residual disease.

This guide gives you a working overview. We’ll quickly ground the biology of TCRs and BCRs, walk through next-generation sequencing (NGS) methods across bulk and single-cell, compare short and long reads, and highlight the bioinformatics challenges that make immune repertoire data different from “standard” RNA/DNA sequencing. Finally, we’ll connect the dots to drug discovery, immuno-oncology biomarkers, and vaccines, and close with what’s trending and what still isn’t solved.

The receptors at a glance: TCR, BCR, and V(D)J

Every unique TCR or BCR is assembled in developing lymphocytes by V(D)J recombination. Variable (V), Diversity (D), and Joining (J) gene segments recombine, and random insertions/deletions at the junctions further diversify the result. BCRs undergo additional somatic hypermutation (SHM) during affinity maturation, rewriting letters in the V region as B cells evolve under selection. The hypervariable complementarity-determining region 3 (CDR3) sits at the center of antigen recognition, so most repertoire analyses zoom in on the V/D/J calls and CDR3s.

A few practical reminders help frame analysis:
– Chains differ. TCRs are typically αβ or γδ; α and γ use VJ (no D), while β and δ use VDJ. BCRs pair a heavy chain (IGH, VDJ) with a light chain (IGK/IGL, VJ).
– Pairing matters. Antigen specificity arises from paired chains, not one alone. Bulk profiling loses pairing information; single-cell methods preserve it.
– Reference catalogs are specialized. IMGT provides the de facto reference for germline V, D, and J genes and standardized numbering schemes used during annotation.

Knowing which segments were used and how the junction was edited lets you cluster clonotypes, build B cell lineage trees, and quantify overall repertoire diversity and clonality.
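To make the junction editing concrete, here is a minimal Python sketch of a productivity check on a junction nucleotide sequence. It assumes the junction has already been extracted by an annotation tool and uses the standard genetic code. It tests only the two most basic criteria, an in-frame length and no stop codon. The function names are illustrative, and real annotators apply further rules across the whole V region.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def junction_is_productive(junction):
    """Junction-level check only: in-frame and free of stop codons."""
    if not junction or len(junction) % 3 != 0:
        return False  # random insertions/deletions shifted the reading frame
    return "*" not in translate(junction)


junction = "TGTGCCAGCAGCTTAGGACAGGAAACCCAGTACTTC"
print(translate(junction))                    # CASSLGQETQYF
print(junction_is_productive(junction))       # True
print(junction_is_productive(junction[:-1]))  # False (frameshift)
```

Because junctional insertions and deletions are random, many rearrangements fail this check, which is why pipelines report productivity alongside the V/D/J calls.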

NGS methods for immune receptor sequencing: bulk, single-cell, short and long reads

Wet-lab choices drive what you can read downstream. Two factors dominate experimental design: how you enrich receptor transcripts and whether you profile cells in bulk or one-by-one.

At the library bench, you’ll often see one of two enrichment strategies. Multiplex PCR uses pools of primers targeted to V and J regions to capture rearranged receptors. It’s sensitive and efficient but can introduce amplification bias if primer coverage is uneven. Alternatively, 5’ RACE primes reverse transcription in the constant region and uses a template-switching oligo to add a universal adapter at the 5’ end of the cDNA, so amplification needs no V-gene primers; this reduces primer bias across V genes and captures full variable regions. Many assays also incorporate unique molecular identifiers (UMIs) so you can collapse PCR duplicates and correct errors during analysis.
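As a rough illustration of what UMI collapsing means, the sketch below groups reads by UMI and takes a per-position majority vote. It assumes UMIs have already been extracted and that reads under one UMI cover the same region. The function name and the `min_reads=2` cutoff are illustrative choices, and production tools additionally correct errors in the UMIs themselves and weigh base qualities.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=2):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # a single read cannot out-vote its own errors
        # Keep reads of the most common length so positions line up.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        seqs = [s for s in seqs if len(s) == length]
        # Majority vote at each position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("AACGT", "TGTGCCAGC"),
    ("AACGT", "TGTGCCAGC"),
    ("AACGT", "TGTGACAGC"),  # one PCR or sequencing error
    ("GGTCA", "TGTGCCTGG"),  # singleton UMI, dropped
]
print(umi_consensus(reads))  # {'AACGT': 'TGTGCCAGC'}
```

Counting consensus sequences instead of raw reads is what turns read depth into an estimate of starting molecules.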

Bulk AIRR-seq profiles thousands to millions of receptors at once from RNA or gDNA, offering depth and cost-efficiency. You get robust repertoire statistics but lose heavy–light or alpha–beta pairing. Single-cell V(D)J, by contrast, partitions cells into droplets or wells, barcodes their cDNAs, and assembles paired, full-length receptor contigs per cell. Modern single-cell immune profiling can also layer in gene expression, surface proteins, and even antigen specificity readouts using DNA-barcoded MHC multimers.

Read length is the second big lever. Short-read sequencing (e.g., paired-end Illumina) remains the workhorse. It’s accurate, cheap per base, and well supported by analysis tools. With careful design, you can assemble full V(D)J regions and CDR3s from short reads. Long-read sequencing (PacBio HiFi, Oxford Nanopore) captures full-length receptor transcripts in one go, clarifying isotypes, splice isoforms, and SHM patterns across complete variable regions. Recent long-read protocols use consensus strategies and UMIs to reach high accuracy, which is valuable for resolving difficult regions and phasing heavy–light pairs when physically linked. The trade-offs are familiar: long reads simplify assembly and isoform detection but typically come with higher cost per read and lower throughput.

In practice, many groups mix and match. Bulk short reads provide statistical power and longitudinal tracking, while single-cell short reads add pairing and phenotype. Long reads then resolve special cases: complex isoforms, unusual recombinations, and full-length validation of candidate therapeutics.

A pragmatic bioinformatics pipeline

Analyzing immune receptor data looks deceptively like standard RNA-seq or amplicon pipelines until you hit the immunology-specific steps. A practical path often runs like this:

– Demultiplex and trim. Respect platform barcodes and extract UMIs early so they stay linked to reads and cell barcodes.
– Assemble and correct. For targeted amplicons, merge pairs and perform UMI-based consensus to tame PCR and sequencing errors. In single-cell data, assemble contigs per cell, then filter for productive chains.
– Annotate V(D)J. Align to an immunogenetics-aware reference such as IMGT to call V, D, J genes/alleles, CDR3 boundaries, and productivity. Tools like IgBLAST and MiXCR are broadly adopted for this step; both support AIRR Community formats that make downstream interoperability easier.
– Collapse clonotypes. Define clonotypes with sensible rules: identical CDR3 amino acids plus shared V and J usage is a common starting point. For BCRs, many analyses instead group sequences that share a V gene, J gene, and junction length and whose junction nucleotides fall within a distance threshold, which tolerates SHM, and then infer lineages.
– Quantify and visualize. Compute diversity indices, track clonal expansions, map class switching for BCRs, and integrate single-cell phenotypes. For TCRs, you may cluster sequences to infer putative antigen specificity groups or compare repertoires across conditions.
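The collapse and quantify steps above can be sketched in a few lines of Python. This example assumes records keyed by AIRR-style field names (`v_call`, `j_call`, `junction_aa`, `duplicate_count`) and defines clonality as one minus normalized Shannon entropy, which is one common convention among several. The function names are illustrative.

```python
import math
from collections import Counter


def gene_name(call):
    """First gene of a call, allele removed: 'TRBV7-9*01,TRBV7-9*03' -> 'TRBV7-9'."""
    return call.split(",")[0].split("*")[0]


def collapse_clonotypes(records):
    """Sum counts per (V gene, J gene, CDR3 amino acid) clonotype key."""
    counts = Counter()
    for rec in records:
        key = (gene_name(rec["v_call"]), gene_name(rec["j_call"]), rec["junction_aa"])
        counts[key] += int(rec.get("duplicate_count") or 1)
    return counts


def shannon_entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def clonality(counts):
    """1 - normalized Shannon entropy: 0 for an even repertoire, near 1 if one clone dominates."""
    if len(counts) < 2:
        return 1.0  # convention chosen here for a single clonotype
    return 1.0 - shannon_entropy(counts) / math.log(len(counts))


records = [
    {"v_call": "TRBV7-9*01", "j_call": "TRBJ2-5*01",
     "junction_aa": "CASSLGQETQYF", "duplicate_count": 6},
    {"v_call": "TRBV7-9*03", "j_call": "TRBJ2-5*01",
     "junction_aa": "CASSLGQETQYF", "duplicate_count": 2},
    {"v_call": "TRBV12-3*01", "j_call": "TRBJ1-1*01",
     "junction_aa": "CASSFGTEAFF", "duplicate_count": 2},
]
clonotypes = collapse_clonotypes(records)
print(clonotypes.most_common())
print(round(clonality(clonotypes), 3))
```

The first two records differ only by allele and merge into one clonotype. Whether to collapse at gene or allele level is a design choice worth stating in your methods.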

Where do things break? Three classes of issues recur.

First, reference limitations and annotation ambiguity. Germline databases are incomplete and still being revised; missing or misassigned alleles can produce incorrect V gene calls and skew downstream lineage inference. For B cells, SHM complicates alignment; treating every mutated sequence as a distinct clonotype can inflate diversity and fragment true clonal families. Per-sample germline inference tools help, but they add another modeling layer you need to validate.
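One way to keep SHM from fragmenting clonal families is to cluster junctions by similarity, not exact identity. The sketch below does single-linkage clustering on normalized Hamming distance within groups that share V gene, J gene, and junction length. The 0.15 threshold is purely illustrative and should be tuned per dataset, and the function names are assumptions, not any specific tool’s API.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_bcr_clones(records, threshold=0.15):
    """Return clusters (lists of record indexes) of SHM-related junctions."""
    # Only compare sequences with the same V gene, J gene and junction length.
    buckets = defaultdict(list)
    for i, rec in enumerate(records):
        v_gene = rec["v_call"].split(",")[0].split("*")[0]
        j_gene = rec["j_call"].split(",")[0].split("*")[0]
        buckets[(v_gene, j_gene, len(rec["junction"]))].append(i)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in buckets.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                a, b = records[i]["junction"], records[j]["junction"]
                if hamming(a, b) <= threshold * len(a):
                    parent[find(i)] = find(j)  # single linkage: one close pair merges

    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


records = [
    {"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAAAGATTACTGG"},
    {"v_call": "IGHV3-23*04", "j_call": "IGHJ4*02", "junction": "TGTGCGAAAGACTACTGG"},
    {"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGGGCCCTGG"},
    {"v_call": "IGHV1-69*01", "j_call": "IGHJ6*02", "junction": "TGTGCGAAAGATTACTGG"},
]
print(cluster_bcr_clones(records))  # [[0, 1], [2], [3]]
```

The pairwise comparison is quadratic within each bucket, which is fine for a sketch but slow for large repertoires.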

Second, amplification and sampling biases. Multiplex PCR can distort gene usage; 5’ RACE reduces that but does not eliminate it. Tissue source and input quantity matter as much as sequencing depth for capturing rare clonotypes. Replicates and UMIs are the best antidotes to over-interpreting noisy expansions.

Third, chain pairing and multi-mapping. Bulk data can’t tell you which heavy chain pairs with which light chain, or which TCR alpha with which beta, so specificity modeling from bulk alone is inherently limited. Single-cell data restore pairing, yet dropouts and multiplets still occur, and you must decide how to handle cells with more than two chains.
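A simple way to make the “more than two chains” decision explicit is to label each cell barcode by its productive chain counts before any filtering. The sketch below does this for αβ T cells, assuming contig records with AIRR-style `cell_id`, `locus`, and `productive` fields. The label names and rules are illustrative policy choices, not a standard.

```python
from collections import Counter, defaultdict


def classify_tcr_cells(contigs):
    """Label each cell barcode by its number of productive TRA and TRB contigs."""
    chain_counts = defaultdict(Counter)
    for contig in contigs:
        if contig["productive"]:
            chain_counts[contig["cell_id"]][contig["locus"]] += 1

    labels = {}
    for cell_id, counts in chain_counts.items():
        n_alpha, n_beta = counts["TRA"], counts["TRB"]
        if n_alpha == 1 and n_beta == 1:
            labels[cell_id] = "paired"
        elif n_alpha == 2 and n_beta == 1:
            # Some T cells genuinely carry two alpha chains; keep but flag.
            labels[cell_id] = "dual_alpha"
        elif n_alpha == 0 or n_beta == 0:
            labels[cell_id] = "incomplete"  # likely dropout of one chain
        else:
            labels[cell_id] = "possible_multiplet"
    return labels  # cells without any productive contig are not listed


contigs = [
    {"cell_id": "c1", "locus": "TRA", "productive": True},
    {"cell_id": "c1", "locus": "TRB", "productive": True},
    {"cell_id": "c2", "locus": "TRB", "productive": True},
    {"cell_id": "c2", "locus": "TRA", "productive": False},
    {"cell_id": "c3", "locus": "TRA", "productive": True},
    {"cell_id": "c3", "locus": "TRB", "productive": True},
    {"cell_id": "c3", "locus": "TRB", "productive": True},
]
print(classify_tcr_cells(contigs))
```

Reporting how many cells fall into each label is often more informative than silently dropping everything that is not a clean pair.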

Finally, interoperability and provenance aren’t luxuries. The AIRR Community standards (MiAIRR for study reporting and the AIRR Data Model for files/APIs) exist precisely because everyone benefits when repertoire data and metadata can be reanalyzed and reused.
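In practice, the payoff of a shared format is that reading annotated rearrangements takes only the standard library. The sketch below parses a toy tab-separated table that uses a subset of AIRR Rearrangement column names (a compliant file carries more required columns) and keeps productive rows. The sample rows are made up for illustration.

```python
import csv
import io

# Toy table with a subset of AIRR Rearrangement columns; values are invented.
AIRR_TSV = "\n".join("\t".join(row) for row in [
    ["sequence_id", "v_call", "j_call", "junction_aa", "productive"],
    ["seq1", "TRBV7-9*01", "TRBJ2-5*01", "CASSLGQETQYF", "T"],
    ["seq2", "TRBV12-3*01", "TRBJ1-1*01", "CASS*GTEAFF", "F"],
    ["seq3", "IGHV3-23*01", "IGHJ4*02", "CAKDYW", "T"],
]) + "\n"


def read_productive(handle):
    """Yield rows whose 'productive' column is true (AIRR TSV encodes it as T/F)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["productive"].strip().upper() in ("T", "TRUE"):
            yield row


for row in read_productive(io.StringIO(AIRR_TSV)):
    print(row["sequence_id"], row["v_call"], row["junction_aa"])
```

The same loop works on a file handle opened from disk, and the AIRR Community also maintains reference libraries that add schema validation on top.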

Why AIRR-seq is changing drug discovery, immuno-oncology, and vaccines

The most direct impact in the clinic has come from BCR-based measurable residual disease (MRD) testing in hematologic malignancies, where ultra-deep receptor sequencing can detect a single malignant cell among hundreds of thousands to a million cells, provided enough cells are sampled. That sensitivity lets clinicians track molecular response earlier than imaging or cytology and adjust therapy sooner.
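A quick back-of-the-envelope calculation shows why input quantity limits this kind of sensitivity. The sketch below treats sampling as independent draws and assumes every sampled cell is sequenced without loss, which real assays do not achieve, so the numbers are an upper bound. The function names are illustrative.

```python
import math


def detection_probability(clone_fraction, cells_sampled):
    """P(at least one malignant cell is sampled) = 1 - (1 - f)^N."""
    if not 0.0 < clone_fraction < 1.0:
        raise ValueError("clone_fraction must be between 0 and 1 (exclusive)")
    # log1p/expm1 keep precision when the fraction is tiny.
    return -math.expm1(cells_sampled * math.log1p(-clone_fraction))


def cells_needed(clone_fraction, confidence=0.95):
    """Smallest N for which the detection probability reaches `confidence`."""
    if not 0.0 < clone_fraction < 1.0:
        raise ValueError("clone_fraction must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-clone_fraction))


# A clone present at one cell per million:
print(round(detection_probability(1e-6, 1_000_000), 3))  # 0.632
print(cells_needed(1e-6, confidence=0.95))               # roughly 3 million cells
```

Under this idealized model, sampling one million cells catches a one-in-a-million clone only about 63% of the time, so claimed sensitivity should always be read alongside the number of cells actually assayed.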

Beyond MRD, immuno-oncology teams mine TCR repertoires to understand why some patients respond to checkpoint blockade and others don’t. Expanding tumor-infiltrating clonotypes, repertoire contraction after therapy, or the appearance of new clones can serve as early markers of response or resistance. None of these features is a universal biomarker, but together they give a richer picture of treatment dynamics than bulk tumor genomics alone.
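These repertoire dynamics can be summarized with simple descriptive comparisons between two timepoints. The sketch below assumes clonotype counts keyed by CDR3 amino acid sequence and flags expanded and newly appearing clones plus a Jaccard overlap. The twofold cutoff and the function names are illustrative, and no statistical testing is attempted here.

```python
def frequencies(counts):
    """Convert raw clonotype counts to within-sample frequencies."""
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def compare_timepoints(pre_counts, post_counts, min_fold=2.0):
    """Descriptive comparison of two non-empty repertoires from one patient."""
    pre, post = frequencies(pre_counts), frequencies(post_counts)
    shared = set(pre) & set(post)
    return {
        # Clones present at both timepoints whose frequency rose by >= min_fold.
        "expanded": sorted(c for c in shared if post[c] >= min_fold * pre[c]),
        # Clones seen only after therapy (could also reflect sampling depth).
        "novel": sorted(set(post) - set(pre)),
        # Fraction of all observed clonotypes that are shared.
        "jaccard": len(shared) / len(set(pre) | set(post)),
    }


pre = {"CASSLGQETQYF": 10, "CASSPDRGYEQYF": 80, "CASRTGNTEAFF": 10}
post = {"CASSLGQETQYF": 60, "CASSPDRGYEQYF": 30, "CASSQETQYF": 10}
print(compare_timepoints(pre, post))
```

Because rare clones are easily missed, “novel” calls in particular deserve replicates or deeper sequencing before they are interpreted biologically.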

On the discovery side, BCR sequencing complements classical antibody screening. After immunization or infection, convergent “public” motifs and shared V–J combinations point to epitope-focused solutions across individuals. Linking repertoires to paired single-cell expression and antigen-binding assays accelerates triage from thousands of candidates to a handful of high-likelihood binders. In vaccine development, the same logic helps quantify whether a candidate immunogen is eliciting the intended lineages, and how quickly affinity matures across boosts.

Meanwhile, antigen-specific TCR modeling is a frontier where machine learning meets immunology. Methods that cluster TCRs by CDR3 similarity, infer “motifs,” or directly learn TCR–peptide–HLA binding rules are rapidly improving, especially when trained on tetramer-sorted or single-cell labeled data. Today, these models are most reliable within well-characterized antigen families; generalization to unseen epitopes remains the hard part. Still, they already help prioritize candidates for validation and reduce wet-lab screening burden.

Trends and limitations to watch

The field is moving fast on three fronts.

Full-length, high-accuracy long reads are maturing. With consensus strategies and UMIs, long-read AIRR-seq can now deliver near full-length, highly accurate immunoglobulin and TCR transcripts. This makes isotype detection, splice-aware annotation, and SHM profiling more straightforward, and it simplifies validation of therapeutic sequences. Costs and throughput continue to improve, but short reads still dominate for large cohorts and time-course designs.

Single-cell multi-omics is becoming the default for mechanistic studies. The ability to jointly measure a cell’s V(D)J sequences, transcriptome, surface proteins, and even antigen specificity has transformed repertoire interpretation. You can now link clonal expansions to cell states, cytokine programs, and tissue localization rather than inferring biology from sequence alone. The price is complexity: batch effects, cross-assay normalization, and careful experimental design matter even more.

Standards and data commons are paying off. AIRR-compliant formats and APIs have made it easier to share, reprocess, and meta-analyze repertoires from different labs. For practitioners, this means you can mix best-in-class aligners, clonotyping algorithms, and visualization tools without handcrafting glue code for every dataset—provided you respect the standard from the moment you design the study.

As for limitations, a few aren’t going away soon. Germline reference incompleteness and allelic diversity continue to complicate annotation; per-sample inference helps but isn’t perfect. Bulk data still can’t restore chain pairing, so specificity inferences from bulk alone should be framed as hypotheses. And while deep learning has raised the ceiling for TCR specificity prediction, robust performance on truly novel epitopes remains an open research challenge.

Summary / Takeaways

Immuno receptor sequencing opens a quantitative window into how adaptive immunity works in real time. If you’re planning a study, start with the decision tree: bulk or single-cell, multiplex PCR or 5’ RACE, short or long reads. Each choice trades off bias, pairing, cost, and depth. In analysis, treat V(D)J annotation, SHM-aware clonotyping, and UMI-based error correction as first-class citizens, not afterthoughts. Adopt AIRR Community standards early so your data are reusable.

For translational teams, the value proposition is already clear. Repertoires can track therapy response, guide cancer immunotherapy, accelerate antibody discovery, and benchmark vaccine candidates against desired lineages. The near future will lean on full-length long reads for clarity, single-cell multi-omics for context, and machine learning for smarter triage—while we continue to chip away at reference gaps, pairing constraints, and the perennial challenge of generalizing antigen specificity predictions.