By EVOBYTE, your partner in bioinformatics
Introduction
Walk into any histology lab and you’ll see beautiful stained slides. Now imagine pressing one of those slides against a high‑density surface that silently assigns a tiny postal code to every RNA molecule leaving the tissue. After sequencing, you don’t just get “what” was expressed—you also get “where.” That is the promise of sequencing‑based spatial transcriptomics. It’s transforming how we study tumor microenvironments, developing organs, and immune niches because it adds precise spatial context to the rich gene expression profiles we already know how to analyze.
This post kicks off a practical series on spatial transcriptomics data analysis. In Part 1, we’ll stay grounded and actionable. We’ll define what “sequencing‑based” means, contrast it with imaging‑based approaches without getting lost in jargon, introduce the major platforms you’re likely to encounter (10x Genomics Visium and Curio Seeker), show you what the output files look like, and end with a gentle on‑ramp to analysis in Python (Scanpy/Squidpy) and R (Seurat). Along the way, we’ll keep the vocabulary simple and the examples close to what you’ll actually run on your laptop or cluster.
Before we dive in, it’s worth noting that while “spatial transcriptomics” has become a broad umbrella, the field traces back to a 2016 paper (Ståhl et al., Science) that coined the term and demonstrated barcoded arrays to read out spatially registered RNA‑seq from tissue sections. That historical thread still explains why today’s datasets feel like RNA‑seq with coordinates attached.
What sequencing‑based spatial transcriptomics actually is (and how it differs from imaging)
At its core, sequencing‑based spatial transcriptomics captures messenger RNA from a tissue section onto a surface where each capture location carries a known barcode. During capture and cDNA synthesis, that positional barcode is incorporated into each cDNA molecule. After next‑generation sequencing (NGS), you align reads to a reference, collapse reads that share the same barcode, unique molecular identifier (UMI), and gene into a single molecule count, and obtain a gene‑by‑location matrix. Each “location” is either a spot on a patterned array or a bead on a dense monolayer. In short: it’s RNA‑seq, but every count comes with X–Y coordinates.
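To make the UMI-collapsing step concrete, here is a minimal pure-Python sketch that turns aligned reads into a gene-by-location matrix. It assumes each read has already been reduced to a (spatial barcode, UMI, gene) tuple; the `build_count_matrix` helper and the example barcodes and UMIs are illustrative, and production pipelines such as Space Ranger also correct sequencing errors in barcodes and UMIs before counting.

```python
from collections import defaultdict


def build_count_matrix(records):
    """Collapse aligned reads into a gene-by-location UMI count matrix.

    `records` holds one (spatial_barcode, umi, gene) tuple per aligned read.
    Reads sharing all three values come from the same captured molecule
    (PCR duplicates), so they are counted once.
    """
    molecules = set(records)  # unique (barcode, UMI, gene) = unique molecules
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in sorted(molecules):  # sorted for a stable output order
        counts[gene][barcode] += 1
    # Plain nested dicts: matrix[gene][barcode] -> UMI count
    return {gene: dict(per_location) for gene, per_location in counts.items()}


# Illustrative reads; real barcodes and UMIs are parsed from the sequencing reads
reads = [
    ("AAACAAGTATCTCCCA", "UMI1", "COL1A1"),
    ("AAACAAGTATCTCCCA", "UMI1", "COL1A1"),  # duplicate read, collapsed
    ("AAACAAGTATCTCCCA", "UMI2", "COL1A1"),
    ("AAACACCAATAACTGC", "UMI3", "COL1A1"),
    ("AAACACCAATAACTGC", "UMI4", "ACTA2"),
]
matrix = build_count_matrix(reads)
print(matrix["COL1A1"])  # {'AAACAAGTATCTCCCA': 2, 'AAACACCAATAACTGC': 1}
```

Each gene's dictionary is one row of the sparse gene-by-location matrix; a real implementation would store it in a sparse array rather than nested dicts.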
Imaging‑based spatial transcriptomics reaches the same goal by different means. Instead of barcoded capture and NGS, it usually uses rounds of hybridization with fluorescent probes (for example, MERFISH or seqFISH). Those methods can push to subcellular resolution and are excellent when you have a defined panel of genes and strong microscopy infrastructure. Sequencing‑based approaches shine when you want unbiased, whole‑transcriptome coverage with a familiar NGS workflow. If you’ve ever run bulk or single‑cell RNA‑seq, the wet‑lab and compute patterns will feel comfortable.
Both families are powerful; they simply make different trade‑offs. Imaging focuses on ultrafine spatial detail and targeted multiplexing; sequencing emphasizes scalability and whole‑transcriptome discovery. As an analyst, you’ll see those design choices reflected in the files you receive and the tools you use downstream.
A tour of today’s leading sequencing‑based platforms: Visium and Curio Seeker
If you’re handed a dataset today, chances are good it comes from 10x Genomics Visium or Curio Seeker. They solve the same problem with different capture surfaces and resolutions.
Visium (10x Genomics) uses a glass slide patterned with thousands of 55‑μm spots, each spot containing capture oligos with a unique spatial barcode. Tissue is placed on the capture area, mRNA is reverse‑transcribed in situ, and the resulting libraries are sequenced. The accompanying Space Ranger pipeline outputs both a count matrix and spatial metadata. If you’ve analyzed single‑cell data, the output will feel very familiar: it’s a sparse gene‑by‑spot matrix plus a few files that connect each spot to its location on a histology image.
Curio Seeker commercializes Slide‑seq, which relies on a tightly packed monolayer of 10‑μm barcoded beads. You place fresh‑frozen tissue on a Seeker tile; mRNA transfers to beads; then you prepare libraries and sequence. Because each bead is about the size of a cell, Seeker delivers near single‑cell resolution while retaining whole‑transcriptome coverage and avoiding specialized microscopes or bespoke hardware. Many labs adopt it because it plugs into existing NGS workflows and its protocol runs on a timescale of about a day.
A quick rule of thumb helps decide which platform fits a study design. If you need an approachable workflow with an established software ecosystem and you’re comfortable with spot‑level resolution, Visium is a reliable baseline. If your biological question hinges on cellular‑scale neighborhoods, boundaries, or micro‑niches, the bead‑based Curio Seeker maps can be a compelling choice. In practice, groups often start with Visium to survey a tissue and then zoom in with a higher‑resolution technology when a region of interest emerges.
What your files look like: counts, spot positions, and images
Regardless of vendor, sequencing‑based spatial datasets share a common shape. You’ll have:
- A gene‑by‑location count matrix. For Visium, Space Ranger writes this both as a single HDF5 file (filtered_feature_bc_matrix.h5) and as a Matrix Market folder (filtered_feature_bc_matrix/) holding the barcodes, features, and matrix files; Seeker outputs are similarly structured and designed to be read by common R/Python packages.
- A mapping from locations to image coordinates. For Visium, tissue_positions.csv lists each barcode, whether it lies over tissue (in_tissue), its row and column on the capture array, and the pixel row/column of the spot center in the full‑resolution image. A companion scalefactors_json.json relates those full‑resolution pixel coordinates to the downsampled high‑res and low‑res images used for visualization. Together, these two files are the glue between expression counts and the H&E slide.
- One or more tissue images. A high‑resolution H&E (or IF) image plus a downsampled version used for quick plotting (for Visium, tissue_hires_image.png and tissue_lowres_image.png in the spatial/ folder). You’ll sometimes see “aligned fiducials” images as well; those help with registration and quality control.
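The sketch below shows how those pieces fit together: it reads a tissue_positions.csv-style table, keeps in-tissue spots, rescales full-resolution pixel coordinates with the hires scale factor, and joins them to per-barcode counts. It assumes the header layout written by recent Space Ranger releases (barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres) and a scalefactors file containing `tissue_hires_scalef`; the helper name and all values are made up, and the files are simulated with `io.StringIO` so the example runs on its own.

```python
import csv
import io
import json


def load_in_tissue_spots(positions_file, scalefactors_file, image="hires"):
    """Read Visium spot positions and rescale them to a downsampled image.

    Assumes a tissue_positions.csv with a header row and a scalefactors JSON
    with tissue_hires_scalef / tissue_lowres_scalef keys.
    """
    scalef = json.load(scalefactors_file)[f"tissue_{image}_scalef"]
    spots = {}
    for row in csv.DictReader(positions_file):
        if row["in_tissue"] != "1":
            continue  # spot does not lie over tissue
        spots[row["barcode"]] = {
            "array_row": int(row["array_row"]),
            "array_col": int(row["array_col"]),
            # Full-resolution pixel position mapped onto the chosen image
            "x": float(row["pxl_col_in_fullres"]) * scalef,
            "y": float(row["pxl_row_in_fullres"]) * scalef,
        }
    return spots


# Simulated files with made-up values, so the example is self-contained
positions = io.StringIO(
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAACAAGTATCTCCCA,1,50,102,8000,12000\n"
    "AAACACCAATAACTGC,0,59,19,9000,3000\n"
)
scalefactors = io.StringIO('{"tissue_hires_scalef": 0.125, "tissue_lowres_scalef": 0.0375}')
spots = load_in_tissue_spots(positions, scalefactors)

# Join per-barcode counts (e.g. COL1A1 UMIs) with coordinates for plotting
col1a1_umis = {"AAACAAGTATCTCCCA": 42, "AAACACCAATAACTGC": 3}
plot_rows = [(bc, s["x"], s["y"], col1a1_umis.get(bc, 0)) for bc, s in spots.items()]
print(plot_rows)  # [('AAACAAGTATCTCCCA', 1500.0, 1000.0, 42)]
```

Older Space Ranger 1.x runs write tissue_positions_list.csv without a header row; in that case you would pass the column names through `csv.DictReader`'s `fieldnames` argument. In day-to-day work, `sq.read.visium` or `Load10X_Spatial()` handles this bookkeeping for you.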
If you’re new to these outputs, the most practical mental model is: think of each barcode as a mini “RNA‑seq library” whose coordinates are known. Most downstream tasks (QC, normalization, clustering, and deconvolution against a single‑cell reference) reuse familiar single‑cell ideas, but you also gain access to the geometry, adjacency, and histology features of the tissue, which is what spatial statistics exploit.
To make this concrete, picture an H&E slide of a colorectal tumor. On Visium, you’d see a hexagonal grid of spots overlaid on the image; each spot barcode has a row and column on the array and pixel coordinates in the image. On Seeker, you’d zoom further, and each bead becomes a tiny point with its own coordinates. In both cases, your code will merge the count matrix with the coordinates to render a spatial heatmap, cluster neighborhoods, or map known cell types back onto the tissue.
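Adjacency is what separates spatial from ordinary single-cell analysis, so here is a small sketch of two ways to build a neighbor graph. For Visium it uses the array row/column indices and assumes the usual hexagonal layout, in which a spot's six neighbors sit two column indices away in the same row and one column away in the rows directly above and below; for bead data such as Seeker it connects beads within a chosen radius. Both helpers are hypothetical, brute-force illustrations; in practice Squidpy's `sq.gr.spatial_neighbors` builds these graphs (with `coord_type="grid"` or `"generic"`).

```python
import math

# Assumed Visium array geometry: within a row, neighboring spots are two
# column indices apart, and adjacent rows are offset by one column.
HEX_OFFSETS = [(0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def visium_neighbors(spots):
    """Map each barcode to its in-tissue hexagonal neighbors."""
    by_position = {(s["array_row"], s["array_col"]): bc for bc, s in spots.items()}
    graph = {}
    for bc, s in spots.items():
        r, c = s["array_row"], s["array_col"]
        graph[bc] = [by_position[(r + dr, c + dc)]
                     for dr, dc in HEX_OFFSETS if (r + dr, c + dc) in by_position]
    return graph


def radius_neighbors(coords, radius):
    """Connect beads closer than `radius` (same units as the coordinates).

    Brute-force O(n^2) pairwise check; large bead arrays need a spatial index.
    """
    ids = list(coords)
    graph = {i: [] for i in ids}
    for k, a in enumerate(ids):
        for b in ids[k + 1:]:
            if math.dist(coords[a], coords[b]) <= radius:
                graph[a].append(b)
                graph[b].append(a)
    return graph


spots = {
    "s0": {"array_row": 10, "array_col": 20},
    "s1": {"array_row": 10, "array_col": 22},  # same row, next spot over
    "s2": {"array_row": 11, "array_col": 21},  # diagonal neighbor in the next row
    "s3": {"array_row": 14, "array_col": 20},  # several rows away
}
print(visium_neighbors(spots)["s0"])  # ['s1', 's2']

beads = {"b0": (0.0, 0.0), "b1": (8.0, 6.0), "b2": (40.0, 0.0)}  # positions in μm
print(radius_neighbors(beads, radius=15.0))  # {'b0': ['b1'], 'b1': ['b0'], 'b2': []}
```

The grid version exploits the fixed spot layout, while the radius version makes no layout assumption, which is why it suits irregular bead positions.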
Your first analysis: Scanpy and Squidpy in Python, Seurat in R
Most teams analyze spatial data in either the scverse Python stack (AnnData, Scanpy, Squidpy) or in Seurat on the R side. The good news is that both ecosystems read vendor outputs directly and expose familiar workflows.
In Python, Squidpy builds on Scanpy/AnnData and adds spatial primitives, image handling, and neighborhood graphs tailored to tissue geometry. It offers convenient readers and plotting functions that understand the Space Ranger folder structure, load the images, map coordinates, and keep everything synchronized in a single AnnData object. That single object becomes your container for preprocessing, clustering, spatial statistics, and image features.
Here’s the quickest way to get a Visium dataset into memory and make a tissue‑space plot:
```python
import squidpy as sq

# Point to the Space Ranger output directory containing 'filtered_feature_bc_matrix.h5' and 'spatial/'
adata = sq.read.visium("path/to/sample/outs", counts_file="filtered_feature_bc_matrix.h5")
adata.var_names_make_unique()  # some gene symbols repeat in the 10x feature list

# Quick spatial plot of one gene over the tissue image
library_id = list(adata.uns["spatial"].keys())[0]
sq.pl.spatial_scatter(adata, color="COL1A1", library_id=library_id, size=1.3)
```
Two small pointers save headaches. First, keep the Space Ranger directory structure intact so the reader can find tissue_positions.csv and scalefactors_json.json next to your images. Second, store library_id when merging multiple sections; it helps later when you compare adjacent slices or stitch larger regions.
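Why does library_id matter so much? Visium capture areas share one barcode list, so the same barcode string appears in every section. Here is a minimal sketch, using plain dictionaries and hypothetical section names, of keeping merged spots distinct by prefixing each barcode with its library_id.

```python
def merge_sections(sections):
    """Merge per-section {barcode: {gene: count}} dicts into one table.

    Barcodes are prefixed with their library_id so that the same barcode
    from two capture areas stays two distinct spots.
    """
    merged = {}
    for library_id, counts in sections.items():
        for barcode, genes in counts.items():
            merged[f"{library_id}_{barcode}"] = {"library_id": library_id, **genes}
    return merged


# Hypothetical sections; both contain the same barcode from different tissue
sections = {
    "slice_A": {"AAACAAGTATCTCCCA": {"COL1A1": 42}},
    "slice_B": {"AAACAAGTATCTCCCA": {"COL1A1": 7}},
}
merged = merge_sections(sections)
print(sorted(merged))  # ['slice_A_AAACAAGTATCTCCCA', 'slice_B_AAACAAGTATCTCCCA']
```

With AnnData objects, passing the per-section objects as a dict to `anndata.concat` with `label="library_id"` and `index_unique="_"` does similar bookkeeping: it records the section in `.obs` and makes the barcodes unique.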
In R, Seurat’s spatial module provides parallel functionality. The Load10X_Spatial() helper reads Space Ranger outputs and attaches an image to the Seurat object, preserving the link between counts and coordinates. If you’re already comfortable with Seurat’s standard single‑cell workflow, the spatial functions will feel like a natural extension: same data structure, just with an image and spot coordinates.
Here’s a minimal Seurat example:
```r
library(Seurat)
# Path should contain filtered_feature_bc_matrix.h5 and the spatial/ folder
brain <- Load10X_Spatial(data.dir = "path/to/sample/outs")
# Visualize a gene in tissue space
SpatialFeaturePlot(brain, features = "COL1A1")
```
From there, the analysis playbook mirrors scRNA‑seq with spatial twists. You’ll still filter low‑quality spots or beads, normalize counts, and cluster. But you’ll also quantify spatial autocorrelation, build neighborhood graphs using spot adjacency, and enrich clusters with image features (for example, texture or nuclei density). Squidpy ships specialized methods for neighborhood enrichment and spatially variable genes; Seurat integrates spatial statistics and supports deconvolution workflows that map single‑cell references onto spatial positions.
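Here is a minimal sketch of two of those steps. It applies the usual library-size normalization followed by log1p, then computes Moran's I, a common spatial autocorrelation statistic: I = (N / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is a spot's deviation from the mean and W is the sum of the weights. It assumes binary neighbor weights and a toy four-spot chain instead of real tissue. In Scanpy/Squidpy the corresponding calls would be `sc.pp.normalize_total`, `sc.pp.log1p`, and `sq.gr.spatial_autocorr(adata, mode="moran")`.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each spot's counts to `target_sum` total, then apply log1p.

    `counts` is {barcode: {gene: UMI count}}.
    """
    normalized = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())
        normalized[barcode] = {
            gene: math.log1p(value / total * target_sum) if total else 0.0
            for gene, value in genes.items()
        }
    return normalized


def morans_i(values, graph):
    """Moran's I for one gene using binary neighbor weights (w_ij = 1).

    values: {barcode: expression}; graph: {barcode: [neighbor barcodes]}.
    Values near +1 mean neighbors look alike, near 0 mean no spatial
    structure, and negative values mean neighbors tend to differ.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {bc: v - mean for bc, v in values.items()}
    denom = sum(d * d for d in dev.values())
    weight_sum = sum(len(nbrs) for nbrs in graph.values())
    if denom == 0 or weight_sum == 0:
        return float("nan")  # undefined for constant genes or empty graphs
    num = sum(dev[i] * dev[j] for i, nbrs in graph.items() for j in nbrs)
    return (n / weight_sum) * (num / denom)


# Toy chain a-b-c-d: COL1A1 is high on one end and absent on the other
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
counts = {
    "a": {"COL1A1": 50, "ACTB": 50},
    "b": {"COL1A1": 50, "ACTB": 50},
    "c": {"COL1A1": 0, "ACTB": 100},
    "d": {"COL1A1": 0, "ACTB": 100},
}
norm = normalize_log1p(counts)
col1a1 = {bc: genes["COL1A1"] for bc, genes in norm.items()}
print(round(morans_i(col1a1, chain), 3))  # 0.333
```

The positive value reflects that high-COL1A1 spots sit next to each other; an alternating high/low pattern on the same chain would give a negative value.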
If you’re evaluating which ecosystem to use, let your team’s codebase guide you. Python’s scverse stack integrates tightly with modern deep‑learning tooling and image analysis libraries, while Seurat offers a polished R interface and a rich gallery of vignettes and tutorials for spatial data. Both handle Visium gracefully; both can import third‑party outputs. Curio Seeker’s pipeline conveniently exports .h5ad, .mtx, and .rds, making it straightforward to analyze in either environment without custom converters.
A quick word on data quality and expectations
Spatial data inherits everything we love—and everything we troubleshoot—from RNA‑seq. You’ll encounter ambient RNA, variable tissue permeabilization, and differences in capture efficiency across the slide. In practice, that means you should expect gradients in UMIs per spot or bead, and you should inspect mitochondrial content and tissue coverage before any biological interpretation. Visium’s “in_tissue” flag and scalefactors help align counts with the actual tissue footprint on the image, which is why keeping the spatial folder intact is so important.
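Here is a minimal sketch of those per-spot QC checks: total UMIs, genes detected, percent mitochondrial counts, and the in_tissue flag. The thresholds and the `MT-` prefix (human gene symbols; mouse uses `mt-`) are assumptions to adapt per tissue and species. Scanpy's `sc.pp.calculate_qc_metrics` computes comparable metrics on an AnnData object.

```python
def spot_qc(counts, in_tissue, mito_prefix="MT-"):
    """Per-spot QC: total UMIs, genes detected, and mitochondrial percentage.

    counts: {barcode: {gene: UMIs}}; in_tissue: {barcode: bool}.
    """
    qc = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())
        mito = sum(v for g, v in genes.items() if g.startswith(mito_prefix))
        qc[barcode] = {
            "in_tissue": in_tissue.get(barcode, False),
            "total_umis": total,
            "n_genes": sum(1 for v in genes.values() if v > 0),
            "pct_mito": 100.0 * mito / total if total else 0.0,
        }
    return qc


def passing_spots(qc, min_umis=500, max_pct_mito=20.0):
    """Keep in-tissue spots above a UMI floor and below a mito ceiling.

    The default thresholds are illustrative placeholders, not recommendations.
    """
    return [bc for bc, m in qc.items()
            if m["in_tissue"] and m["total_umis"] >= min_umis
            and m["pct_mito"] <= max_pct_mito]


counts = {
    "s1": {"COL1A1": 400, "ACTB": 500, "MT-CO1": 100},
    "s2": {"COL1A1": 100, "ACTB": 100, "MT-CO1": 300},  # high mito fraction
    "s3": {"COL1A1": 50, "ACTB": 20, "MT-CO1": 0},      # too few UMIs
    "s4": {"COL1A1": 600, "ACTB": 400, "MT-CO1": 0},    # off tissue
}
in_tissue = {"s1": True, "s2": True, "s3": True, "s4": False}
qc = spot_qc(counts, in_tissue)
print(passing_spots(qc))  # ['s1']
```

Plotting these metrics in tissue space, not only as histograms, is what makes capture or permeabilization gradients visible before you settle on thresholds.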
On the biological side, spatial resolution shapes your questions. With spot‑based data, you’ll often deconvolve mixtures of cell types within a spot and then reason about region‑level patterns. With near single‑cell bead maps, you can trace sharp boundaries, rare niches, and cellular interactions at a finer granularity. Either way, referencing a matched single‑cell RNA‑seq dataset (or a public atlas) can anchor your annotations and reduce over‑interpretation of purely unsupervised clusters.
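To show the idea behind spot deconvolution, here is a deliberately simplified sketch. Each spot is modeled as a non-negative mix of reference cell-type profiles (for example, cluster means from a matched scRNA-seq dataset) and solved as non-negative least squares by projected gradient descent. This is not the model used by dedicated tools such as RCTD or cell2location, which account for count noise and platform effects; the cell types and profiles below are hypothetical.

```python
def deconvolve_spot(spot, reference, n_iter=2000):
    """Estimate cell-type proportions in one spot via non-negative least squares.

    spot: expression values over G genes.
    reference: {cell_type: G mean expression values}. Minimizes
    ||R w - spot||^2 subject to w >= 0 by projected gradient descent,
    then rescales w to sum to 1.
    """
    types = list(reference)
    profiles = [reference[t] for t in types]  # one column of R per cell type
    k, g = len(profiles), len(spot)
    # 1 / ||R||_F^2 never exceeds 1 / (largest eigenvalue of R^T R): a safe step
    step = 1.0 / sum(x * x for profile in profiles for x in profile)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        residual = [sum(w[j] * profiles[j][i] for j in range(k)) - spot[i] for i in range(g)]
        grad = [sum(profiles[j][i] * residual[i] for i in range(g)) for j in range(k)]
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]  # project onto w >= 0
    total = sum(w)
    return {t: (wj / total if total else 0.0) for t, wj in zip(types, w)}


# Hypothetical mean profiles over four genes
reference = {
    "fibroblast": [10.0, 0.0, 5.0, 1.0],
    "hepatocyte": [0.0, 8.0, 1.0, 6.0],
}
# A synthetic spot that is 30% fibroblast and 70% hepatocyte signal
spot = [0.3 * f + 0.7 * h for f, h in zip(reference["fibroblast"], reference["hepatocyte"])]
proportions = deconvolve_spot(spot, reference)
print({t: round(p, 2) for t, p in proportions.items()})  # {'fibroblast': 0.3, 'hepatocyte': 0.7}
```

Reading the rescaled weights as proportions assumes the reference profiles are on a comparable scale to each other; real deconvolution methods handle that, along with noise, explicitly.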
Putting it together: a mental workflow for Part 2
Let’s tie this to an example you might run next week. Suppose you’re studying fibrosis. You profile a liver section on Visium, run Space Ranger, and create an AnnData or Seurat object. You filter out low‑count spots and non‑tissue areas, normalize, and cluster. Then you overlay collagen and ECM genes as spatial features, quantify spatial autocorrelation, and ask whether specific neighborhoods (for example, portal triads) show enriched myofibroblast signatures. If that survey reveals a sharp gradient near fibrotic foci, you follow up with Curio Seeker on an adjacent section from the same region to look for cellular gradients at near single‑cell scale. The two datasets, read into the same analysis environment, let you move fluidly between panorama and close‑up.
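The “enriched myofibroblast signature” question can be sketched as a per-spot gene-set score compared across annotated regions. This version averages per-gene z-scores; the gene set (ACTA2 and COL1A1 as example myofibroblast/ECM markers), the region labels, and the values are illustrative only. Scanpy's `sc.tl.score_genes` uses a different score that subtracts a control-gene background.

```python
from statistics import mean, pstdev


def signature_score(expr, genes):
    """Per-spot score: mean z-score across the genes in a gene set.

    expr: {barcode: {gene: normalized expression}}. Genes absent from the data
    or constant across spots are skipped.
    """
    barcodes = list(expr)
    z_scores = []
    for gene in genes:
        if not any(gene in expr[bc] for bc in barcodes):
            continue  # gene not measured in this dataset
        values = [expr[bc].get(gene, 0.0) for bc in barcodes]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # constant gene carries no contrast between spots
        z_scores.append([(v - mu) / sd for v in values])
    if not z_scores:
        return {bc: 0.0 for bc in barcodes}
    return {bc: mean(z[i] for z in z_scores) for i, bc in enumerate(barcodes)}


def region_means(scores, regions):
    """Average a per-spot score within each annotated region."""
    grouped = {}
    for bc, score in scores.items():
        grouped.setdefault(regions[bc], []).append(score)
    return {region: mean(vals) for region, vals in grouped.items()}


# Illustrative normalized values and hypothetical region labels
expr = {
    "s1": {"ACTA2": 3.0, "COL1A1": 4.0},
    "s2": {"ACTA2": 2.5, "COL1A1": 3.5},
    "s3": {"ACTA2": 0.5, "COL1A1": 1.0},
    "s4": {"ACTA2": 0.0, "COL1A1": 0.5},
}
regions = {"s1": "portal", "s2": "portal", "s3": "parenchyma", "s4": "parenchyma"}
scores = signature_score(expr, ["ACTA2", "COL1A1", "TAGLN"])  # TAGLN absent, skipped
summary = region_means(scores, regions)
print(summary["portal"] > summary["parenchyma"])  # True
```

A real analysis would test whether such a difference exceeds what random spot labelings produce, for example with a permutation test.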
In Part 2, we’ll make that concrete with a reproducible notebook in Python and an R script, covering QC, normalization, clustering, deconvolution, and spatial statistics on real public datasets.
Summary / Takeaways
Sequencing‑based spatial transcriptomics brings NGS workflows to the tissue slide, coupling RNA‑seq counts with precise coordinates. It originated from barcoded arrays that preserved spatial information during cDNA synthesis and sequencing, and today it’s delivered at scale by platforms like 10x Genomics Visium and Curio Seeker. Visium gives you a robust spot‑level survey with a mature pipeline and outputs; Seeker pushes to cellular‑scale resolution without special microscopes. In both cases, you receive a gene‑by‑location matrix, spot or bead coordinates, and one or more tissue images. Those files drop cleanly into Python (Scanpy/Squidpy) or R (Seurat), where you can apply familiar RNA‑seq practices plus spatial statistics.
As you plan experiments or pick up a dataset, focus first on the biological scale of the question, then on the resolution and workflow that match it. If you’re comfortable in scRNA‑seq analysis, you already know 80% of the journey. The remaining 20% is the fun part: leveraging the geometry of tissue to tell a story you couldn’t see before.
What tissue and question are you exploring? If you share a brief description and the platform you’re using, I’ll tailor Part 2’s walkthrough to your use case.
Further Reading
- Ståhl et al., “Visualization and analysis of gene expression in tissue sections by spatial transcriptomics” (Science, 2016)
- 10x Genomics Space Ranger spatial outputs: tissue_positions.csv and scalefactors_json.json
- Curio Bioscience Seeker: high‑resolution sequencing‑based spatial transcriptomics
- Seurat spatial vignette: Load10X_Spatial and visualization
- Squidpy documentation: read Visium data and plot spatial maps
