Single‑Cell Genomics 101: introduction to technologies and practical workflows

By EVOBYTE, your partner in bioinformatics

Introduction

Single‑cell genomics lets us read out molecular signals one cell at a time, instead of averaging them away in bulk measurements. That simple shift unlocks a lot: rare cell types pop into view, cell states become trackable, and tissue heterogeneity turns into something you can quantify. In this Part 1 primer, we’ll ground the core ideas, contrast single‑cell with spatial transcriptomics, walk through a memorable PBMC (blood) example, and map the main analysis frameworks you’ll actually use day to day. We’ll close with a teaser for Part 2 on data formats, preprocessing, and QC.

Keywords to watch for:

  • scRNA‑seq (single‑cell RNA‑seq)
  • scATAC‑seq (single‑cell chromatin accessibility)
  • CITE‑seq (RNA+protein)
  • PBMC (peripheral blood mononuclear cells)
  • UMI (unique molecular identifier)
  • UMAP (a 2‑D embedding)
  • AnnData/H5AD (common Python data structure/file)
  • SingleCellExperiment (R/Bioconductor class)
  • Seurat (R)
  • Scanpy (Python)
  • spatial transcriptomics (ST)

These terms are now standard in discovery biology and translational R&D because they tie cell‑level resolution to interpretable, reproducible workflows that scale.

The promise of single‑cell genomics and core technologies

At its heart, scRNA‑seq measures gene expression per cell. Modern droplet systems encapsulate single cells with barcoded beads so that every captured RNA molecule is tagged with a cell barcode (identifying its cell of origin) and a UMI (identifying the individual molecule). That barcoding lets you collapse PCR duplicates for robust quantification and enables large experiments across tissues and conditions. Commercial platforms combine cell partitioning, barcoding, and library prep into streamlined workflows that routinely capture tens of thousands of cells and support multi‑omic readouts (e.g., scATAC‑seq or RNA+protein via CITE‑seq).
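To make the counting idea concrete, here is a minimal sketch of UMI-based quantification. It assumes reads have already been assigned a cell barcode, a UMI, and a gene; the function name `count_umis` and the toy barcodes are illustrative, not part of any platform's pipeline. Production tools also correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Reads that
    share all three values are treated as PCR duplicates of one molecule.
    """
    seen = defaultdict(set)  # (cell_barcode, gene) -> set of distinct UMIs
    for cell_barcode, umi, gene in reads:
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy reads with made-up barcodes and UMIs, for illustration only
reads = [
    ("AAAC", "GGTT", "CD3E"),
    ("AAAC", "GGTT", "CD3E"),   # PCR duplicate: same cell, UMI, and gene
    ("AAAC", "CCAT", "CD3E"),   # a second CD3E molecule in the same cell
    ("TTTG", "GGTT", "MS4A1"),  # same UMI in a different cell: counted separately
]
counts = count_umis(reads)
print(counts)
```

Three CD3E reads in cell AAAC collapse to two molecules, because two of them carry the same UMI.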

Why it matters: single‑cell readouts change the questions you can ask. Instead of asking “Is this pathway up?” you can ask “Which rare subpopulation drives resistance?” Or “How do cell states shift after treatment?” These are the kinds of questions driving target discovery, biomarker programs, and patient stratification.

Single‑cell versus spatial transcriptomics: complementary lenses

Single‑cell assays excel at deep, unbiased discovery, but they lose where each cell lived in the tissue. Spatial transcriptomics (ST) preserves that context, measuring gene expression in situ—either by sequencing captures on barcoded arrays or by imaging‑based chemistries that read out predefined gene panels. In practice, labs pair the two: use scRNA‑seq to define high‑resolution cell states, then map those states back into tissue with ST to see neighborhoods and niches.

Trade‑offs are real. Spatial platforms vary in resolution, sensitivity, and throughput; some capture transcriptomes across spots that may contain multiple cells, while imaging‑based methods reach nearer single‑cell resolution but on targeted gene sets. The upshot for analysis teams is to plan for deconvolution, integration, and cross‑modal QC when fusing single‑cell and spatial data.
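As a rough illustration of what spot deconvolution means, the sketch below estimates cell-type proportions for one spot from reference expression signatures. It is a deliberately crude stand-in: ordinary least squares followed by clipping and renormalization, whereas dedicated tools typically use non-negative or probabilistic models. The function name, the signature values, and the 70/30 mixture are all made up for the example.

```python
import numpy as np


def estimate_spot_proportions(signatures, spot_expression):
    """Estimate cell-type proportions for a single spot.

    signatures: (n_genes, n_cell_types) mean expression per cell type,
        e.g. derived from a matched scRNA-seq reference
    spot_expression: (n_genes,) expression measured in the spot
    """
    weights, *_ = np.linalg.lstsq(signatures, spot_expression, rcond=None)
    weights = np.clip(weights, 0.0, None)  # proportions cannot be negative
    total = weights.sum()
    return weights / total if total > 0 else weights


# Made-up reference signatures: 4 genes x 2 cell types (T cell, B cell)
signatures = np.array([
    [10.0, 0.0],   # CD3E
    [8.0, 1.0],    # IL7R
    [0.0, 12.0],   # MS4A1
    [1.0, 9.0],    # CD79A
])

# Synthetic spot built as a noiseless 70/30 mixture of the two signatures
spot = 0.7 * signatures[:, 0] + 0.3 * signatures[:, 1]
proportions = estimate_spot_proportions(signatures, spot)
print(proportions.round(2))
```

On real spots the fit will not be exact, so treat the estimated proportions as something to sanity-check against histology and known marker genes.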

A quick example: decoding blood with PBMCs

If you’re new to the field, the canonical “hello world” dataset is 3k PBMCs from a healthy donor. You’ll see major immune compartments—T cells, B cells, NK cells, monocytes—separate cleanly after basic preprocessing, normalization, dimensionality reduction (PCA → UMAP), and graph‑based clustering. The same logic applies to patient samples; the biology just gets richer.

Two minimal snippets show how similar the workflows feel in R (Seurat) and Python (Scanpy):

R (Seurat):

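library(Seurat)

# Load the 10x HDF5 count matrix (the path is a placeholder) and wrap it in a Seurat object
counts <- Read10X_h5("pbmc3k.h5")
pbmc <- CreateSeuratObject(counts = counts)
# Normalize, select variable genes, scale, and run PCA
pbmc <- NormalizeData(pbmc) |> FindVariableFeatures() |> ScaleData() |> RunPCA()
# Build the neighbor graph, cluster, and embed using the first 10 PCs
pbmc <- FindNeighbors(pbmc, dims = 1:10) |> FindClusters(resolution = 0.5) |> RunUMAP(dims = 1:10)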

Python (Scanpy):

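import scanpy as sc
adata = sc.datasets.pbmc3k()  # downloads the raw 3k PBMC counts
sc.pp.normalize_total(adata)  # depth-normalize each cell
sc.pp.log1p(adata)  # log-transform the normalized counts
sc.pp.highly_variable_genes(adata, n_top_genes=2000)  # flag the 2,000 most variable genes
sc.pp.scale(adata)  # zero mean, unit variance per gene
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=10)  # neighbor graph on the first 10 PCs
sc.tl.leiden(adata, resolution=0.5)  # graph-based clustering (needs the leidenalg package)
sc.tl.umap(adata)  # 2-D embedding for visualization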

These produce clusters you can annotate with immune marker genes and compare across conditions or donors. (satijalab.org)
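One simple way to picture marker-based annotation is the sketch below: score each cluster by the average expression of a few well-known PBMC marker genes and pick the best-scoring label. The `annotate_cluster` helper and the per-cluster expression values are hypothetical, and real annotation usually combines differential expression, reference mapping, and manual review.

```python
# A few commonly used PBMC marker genes per cell type (not exhaustive)
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
    "NK cell": ["NKG7", "GNLY"],
    "Monocyte": ["CD14", "LYZ"],
}


def annotate_cluster(mean_expression, markers=MARKERS):
    """Return the label whose marker genes have the highest average expression.

    mean_expression: dict mapping gene name -> mean expression in the cluster.
    Genes missing from the dict are treated as not expressed.
    """
    def score(genes):
        return sum(mean_expression.get(g, 0.0) for g in genes) / len(genes)

    return max(markers, key=lambda label: score(markers[label]))


# Made-up per-cluster mean expression values, for illustration only
cluster_means = {
    "0": {"CD3D": 2.1, "CD3E": 1.8, "NKG7": 0.2, "LYZ": 0.1},
    "1": {"CD14": 1.9, "LYZ": 3.2, "CD3E": 0.1},
    "2": {"MS4A1": 2.4, "CD79A": 2.0, "LYZ": 0.3},
}
labels = {cluster: annotate_cluster(means) for cluster, means in cluster_means.items()}
print(labels)
```

A scheme this simple breaks down when cell types share markers (NKG7, for example, is also expressed in cytotoxic T cells), so it is best treated as a first pass.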

Workflows and frameworks you’ll actually use

Most teams standardize on one of three mature ecosystems:

  • Seurat (R): batteries‑included, great community vignettes, strong integration tools. Ideal if your team is R‑first or already uses Bioconductor.
  • Scanpy (Python): builds on AnnData/H5AD, scales well, easy to compose with the broader PyData/ML stack.
  • Bioconductor (R) via the OSCA book: emphasizes robust data structures (SingleCellExperiment), method transparency, and reproducibility across packages.

Across these, the standard analysis arc is consistent: load raw counts, QC (mitochondrial fraction, doublets, ambient RNA), normalize and variance‑stabilize, choose highly variable genes, reduce dimensions, build a neighbor graph, cluster, embed (UMAP), and annotate. For spatial data, add steps for spot deconvolution and alignment to histology. Reviews and living “best practices” resources can help you pick defaults and avoid gotchas as datasets scale.
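The first QC step in that arc can be sketched in a few lines of NumPy. The example computes each cell's mitochondrial fraction and a keep/drop mask; the `qc_mask` name and both thresholds are arbitrary placeholders rather than recommended defaults, and it assumes human gene symbols, where mitochondrial genes start with "MT-".

```python
import numpy as np


def qc_mask(counts, gene_names, max_mito=0.25, min_counts=10, prefix="MT-"):
    """Return (mito_fraction, keep) for a cells x genes count matrix.

    max_mito and min_counts are illustrative thresholds; tune them per
    tissue and protocol.
    """
    is_mito = np.array([name.startswith(prefix) for name in gene_names])
    totals = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Avoid division by zero for empty barcodes: their fraction stays 0
    fraction = np.divide(mito, totals, out=np.zeros(len(totals)), where=totals > 0)
    keep = (totals >= min_counts) & (fraction <= max_mito)
    return fraction, keep


gene_names = ["CD3E", "LYZ", "MT-CO1", "MT-ND1"]
counts = np.array([
    [50, 30, 10, 10],  # healthy-looking cell: 20% mitochondrial
    [5, 5, 40, 50],    # likely damaged cell: 90% mitochondrial
    [0, 0, 0, 0],      # empty barcode
])
fraction, keep = qc_mask(counts, gene_names)
print(fraction, keep)
```

In Scanpy or Seurat the same quantities come from built-in QC helpers; the point here is only to show what the metric measures.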

Industry relevance of the keywords you’ll see in docs and papers:

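  • UMIs let you collapse PCR duplicates, so counts reflect captured molecules rather than amplification depth. That makes quantification easier to compare across batches and sites, which is essential for regulated studies.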
  • AnnData/H5AD and SingleCellExperiment formalize metadata and matrices, which simplifies handoffs between bioinformatics and data science teams.
  • UMAP/t‑SNE enable fast triage and exploratory QC in dashboards before you commit to deeper modeling.

Summary / Takeaways

Single‑cell genomics gives you cellular resolution; spatial transcriptomics returns the map. Used together, they reveal not just what cells exist, but where they act and interact. Operationally, you’ll be productive fastest by picking a primary framework (Seurat, Scanpy, or Bioconductor) and standardizing a few opinionated pipelines around it. In Part 2, we’ll get hands‑on with data formats (H5AD, SingleCellExperiment, loom), practical preprocessing (filtering, normalization, batch correction), and QC patterns you can automate for routine studies. If you could only automate one QC today, what would it be: doublet detection, ambient RNA removal, or annotation drift checks?

Further Reading

  • 10x Genomics: Chromium Single Cell Platform (assays and multi‑omic options). (10xgenomics.com)
  • Seurat Guided Clustering Tutorial (PBMC 3k). (satijalab.org)
  • Scanpy PBMC3k tutorial (preprocessing → clustering → markers). (scanpy-tutorials.readthedocs.io)
  • Orchestrating Single‑Cell Analysis (OSCA) with Bioconductor (open book). (bioconductor.org)
  • The expanding vistas of spatial transcriptomics (review, Nature Biotechnology).
