Free Spatial Transcriptomics Datasets: SODB & SpatialDB

Jonathan Alles

EVOBYTE Digital Biology


Introduction

If you’re building a cell–cell interaction model, benchmarking a spot deconvolution algorithm, or teaching a workshop, your first hurdle is almost never the method. It’s data. You need open, well-annotated spatial transcriptomics datasets that download quickly, load cleanly, and cover a range of tissues and technologies. The good news is that two community resources—SODB and SpatialDB—put hundreds to thousands of spatial omics datasets within reach, for free. Let’s unpack what each offers, how they differ, and how to start using them today.

SpatialDB: a curated window into early spatial transcriptomics

SpatialDB is a manually curated database focused on spatially resolved transcriptomics. It grew out of the early wave of spatial methods, including Spatial Transcriptomics (ST), Slide‑seq, LCM‑seq, seqFISH, MERFISH, Geo‑seq, and Tomo‑seq, and aggregates datasets from human, mouse, and other model organisms. You can browse gene maps, compare spatial expression for two genes side by side, and retrieve lists of spatially variable (SV) genes computed with SpatialDE and trendsceek. In short, it’s a teaching‑friendly and methods‑friendly gallery of classic datasets with built‑in visualization. For many projects, that’s all you need to prototype an analysis pipeline or reproduce a figure.
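Once you have SV gene lists from both methods, a natural first check is how much they agree. The snippet below is a minimal sketch of that comparison. It assumes each list has been saved as a CSV with a header column named `gene`; that layout, the helper names, and the toy gene names are hypothetical and not SpatialDB's documented export format, so adjust the column name to match the file you actually download.

```python
import csv
import io


def read_gene_list(handle, column="gene"):
    """Collect non-empty gene names from one column of a CSV with a header row."""
    genes = set()
    for row in csv.DictReader(handle):
        name = (row.get(column) or "").strip()
        if name:
            genes.add(name)
    return genes


def compare_sv_calls(spatialde_genes, trendsceek_genes):
    """Split SV genes into consensus and method-specific groups."""
    return {
        "both": sorted(spatialde_genes & trendsceek_genes),
        "spatialde_only": sorted(spatialde_genes - trendsceek_genes),
        "trendsceek_only": sorted(trendsceek_genes - spatialde_genes),
    }


# Toy stand-ins for two downloaded SV gene tables (placeholder gene names).
spatialde_csv = io.StringIO("gene\nGeneA\nGeneB\nGeneC\n")
trendsceek_csv = io.StringIO("gene\nGeneB\nGeneC\nGeneD\n")

result = compare_sv_calls(
    read_gene_list(spatialde_csv),
    read_gene_list(trendsceek_csv),
)
for group, genes in result.items():
    print(group, genes)
```

Genes called by both methods make a conservative starting set for teaching or figure recreation, while the method‑specific groups are worth a visual check in the gene map viewer.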

What does the catalog look like in practice? Expect studies like melanoma lymph node biopsies analyzed with ST, brain and prostate cancer sections, and other well‑cited exemplars. You won’t get the sheer volume of newer Visium or in situ runs here, but you do get clean metadata, technique‑aware viewers, and quick downloads that make first steps painless.

SODB: a broad spatial omics hub with analysis tooling

SODB (Spatial Omics DataBase) takes a bigger swing. It indexes thousands of experiments spanning more than two dozen spatial technologies, from Visium and Slide‑seq to imaging‑based assays such as MERFISH and CODEX. Crucially, it standardizes formats so you can load data into common analysis stacks without spending your afternoon on I/O gymnastics. Beyond hosting data, SODB ships interactive modules—most notably SOView—for quick tissue exploration in the browser, plus a Python client (pysodb) for programmatic access. If your goal is to benchmark across modalities or assemble a cross‑tissue panel at scale, this is where the breadth pays off.

Two details matter for day‑to‑day work. First, SODB links to primary raw data while also offering processed objects for immediate analysis, which shortens time‑to‑first‑plot. Second, pysodb gives you a scriptable API, so you can write reproducible data pulls into notebooks and CI jobs instead of copy‑pasting URLs. That combination turns “I saw a cool dataset in a paper” into “I can load five comparable datasets and run the same QC in minutes.”
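To make the “same QC across several datasets” idea concrete, here is a minimal sketch of a scripted pull. It assumes the pysodb client is created with `pysodb.SODB()` and exposes `load_experiment(dataset_name, experiment_name)` returning an AnnData‑like object with `n_obs` and `n_vars`; check both against the current pysodb documentation before relying on them. The helper names, the `min_spots` threshold, and the placeholder dataset names are hypothetical.

```python
def make_client():
    """Create a pysodb client. The import is lazy so the helpers below
    can be loaded and tested on machines without pysodb installed."""
    import pysodb  # assumption: installed per the pysodb project instructions
    return pysodb.SODB()  # assumption: constructor takes no arguments


def load_panel(client, requests):
    """Load every (dataset, experiment) pair through the same client call."""
    return {
        (dataset, experiment): client.load_experiment(dataset, experiment)
        for dataset, experiment in requests
    }


def qc_table(panel, min_spots=100):
    """Run one shared size check on every loaded object."""
    rows = []
    for (dataset, experiment), adata in panel.items():
        rows.append({
            "dataset": dataset,
            "experiment": experiment,
            "n_obs": adata.n_obs,    # spots or cells
            "n_vars": adata.n_vars,  # genes or features
            "passes": adata.n_obs >= min_spots,
        })
    return rows


# Intended use (needs network access and pysodb; names are placeholders):
# client = make_client()
# panel = load_panel(client, [("<dataset_name>", "<experiment_name>")])
# for row in qc_table(panel):
#     print(row)
```

Passing the client in as an argument keeps the QC logic independent of the download step, so the same functions run in a notebook against SODB and in a CI job against a small stub.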

Overview of available datasets

Think of SpatialDB as the curated museum of early spatial transcriptomics and SODB as the bustling warehouse. SpatialDB is ideal when you want canonical examples, lightweight exploration, and baked‑in SV gene computations. It shines in teaching, figure recreation, and quick comparisons across foundational techniques like ST, seqFISH, and MERFISH. SODB, on the other hand, casts a wider net. You’ll find extensive Visium runs across cancers and brain atlases, multi‑omics imaging panels, and newer chemistries—often harmonized so you can run the same downstream tools without bespoke parsers. If you’re stress‑testing a deconvolution method across tissues or building a benchmarking suite, SODB’s scale and API support will save you days.

It also helps to have a third door: vendor‑hosted public datasets. 10x Genomics publishes free Visium and Visium HD example datasets and keeps them aligned with current Space Ranger releases. These make reliable “known‑good” inputs when you’re validating a pipeline, including cases with trimmed FASTQs for faster iteration. They’re not a database in the same sense as SODB or SpatialDB, but they provide clean, versioned reference inputs that pair well with both.

Summary / Takeaways

Free spatial omics data is more accessible than ever. Use SpatialDB when you want a curated, visualization‑first tour of hallmark spatial transcriptomics datasets and SV‑gene lists without extra setup. Reach for SODB when you need scale, modality diversity, and a programmable path to loading data across technologies with pysodb. And don’t overlook vendor public sets, especially Visium and Visium HD, when you need clean, versioned inputs to validate an analysis. With these three sources in your toolkit, you can move from idea to reproducible spatial analysis with fewer detours and far more confidence. What question will you map in tissue next?