Best Free Spatial Omics Datasets: IHC, WSIs & Multi-Omics

Jonathan Alles

EVOBYTE Digital Biology

Introduction

If you’re starting a spatial multi‑omics project, the first roadblock is often data. You know the assays—immunohistochemistry (IHC), whole‑slide images (WSIs), spatial transcriptomics, multiplexed imaging, and spatial metabolomics—but where do you actually get high‑quality, free datasets to explore and benchmark? In this guide, we’ll map the most reliable open resources for spatial omics analysis and show quick ways to pull them into your workflow. Along the way, we’ll clarify a few acronyms so the landscape feels less overwhelming.

Spatial omics simply means measuring molecules—RNA, proteins, lipids—in their native tissue context. Because context matters, the best repositories pair rich metadata with downloadable files and, ideally, a viewer so you can inspect tissue structure before you compute. The portals below check those boxes and cover the spectrum from classic pathology to cutting‑edge multi‑omics.

Open spatial transcriptomics data: robust atlases you can actually download

For broad, technology‑agnostic access to spatial transcriptomics and proteomics, the HuBMAP Data Portal is a strong first stop. HuBMAP curates multi‑modal datasets from healthy human tissues, provides Vitessce‑based viewers for CODEX, MIBI, Visium, and imaging mass spectrometry data, and supports bulk downloads via Globus. As of October 2025, the portal lists thousands of datasets spanning dozens of organs and data types, with uniform processing and quality control that make cross‑study comparisons less painful.

When you need a single place to browse many spatial technologies at once—including Visium, Slide‑seq, MERFISH‑like panels, and more—the Spatial Omics DataBase (SODB) adds breadth. It aggregates over two thousand experiments across more than two dozen technologies and exposes data in unified formats that play nicely with Python and R, plus an interactive viewer for quick exploration. If your goal is method development across modalities, SODB is a practical jumping‑off point.

Spatial proteomics and metabolomics: multiplexed images and ion maps at scale

Not all spatial omics is transcriptomics. If you’re profiling proteins or small molecules in situ, you’ll want imaging‑based resources. METASPACE is the community’s primary hub for imaging mass spectrometry (IMS), that is, MALDI or DESI ion images registered to tissue. It hosts the largest public collection of spatial metabolomics datasets, with many thousands available, and offers programmatic access plus annotations controlled for false discovery rate (FDR), so you can compare methods across labs and instruments. For lipidomics‑heavy tumor microenvironment studies or brain mapping, METASPACE is hard to beat.

On the protein side, multiplexed antibody imaging such as CODEX, MIBI, and imaging mass cytometry (IMC) is increasingly represented in the big portals above. Many studies also deposit raw image data and segmentation masks in image archives linked from BioStudies. When you need fast discovery plus consistent annotations, though, it is usually simpler to start with HuBMAP, or with an IMC‑heavy study indexed there.

Pathology WSIs and IHC: population‑scale images for machine learning and QC

For whole‑slide histopathology, TCGA data served through the NCI Genomic Data Commons (GDC) remains the standard. You can browse diagnostic WSIs in a web viewer, fetch them via the API, and pair slides with matched molecular data for multi‑omics modeling. Some files are controlled access, but the WSI collections for many projects are openly downloadable. That makes TCGA/GDC ideal for building stain‑agnostic preprocessing or tile‑level models that you can later integrate with spatial assays.
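To make the tile‑level idea concrete, here is a minimal sketch of how a slide might be cut into a regular grid of tiles. It only computes coordinates and reads no image data. The function name tile_origins, the 512‑pixel tile size, and the example slide dimensions are illustrative assumptions, not part of GDC or any library.

```python
from typing import Iterator, Tuple


def tile_origins(
    width: int, height: int, tile: int = 512, stride: int = 512
) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of full tiles that fit inside the slide.

    Partial tiles at the right and bottom edges are skipped.
    """
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield (x, y)


# Hypothetical slide of 2048 x 1024 pixels, non-overlapping 512-pixel tiles.
coords = list(tile_origins(width=2048, height=1024, tile=512))
print(len(coords))  # 8 (4 columns x 2 rows)
```

With a slide reader such as OpenSlide, each origin could then be passed to a region‑reading call to extract the pixels. Setting stride smaller than tile would give overlapping tiles, which some models prefer.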

When you need high‑quality IHC images and protein‑level context, the Human Protein Atlas (HPA) Pathology Atlas offers millions of tissue microarray images with gene‑centric pages, prognostic associations, and consistent staining metadata. It’s a reliable source for benchmarking nuclear and cytoplasmic segmentation or validating cell‑type markers before you dive into multiplexed panels.

Quick‑start downloads: tiny code examples you can reuse

Once you’ve picked a repository, the fastest wins come from simple, robust scripts. Here are two tiny snippets you can adapt.

Example 1: list a few TCGA BRCA diagnostic WSIs through the GDC API in Python.

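import json

import requests

# Filter for diagnostic slide images in the TCGA-BRCA project.
filters = {
    "op": "and",
    "content": [
        {"op": "in", "content": {"field": "cases.project.project_id", "value": ["TCGA-BRCA"]}},
        {"op": "in", "content": {"field": "files.experimental_strategy", "value": ["Diagnostic Slide"]}},
    ],
}
# The filters object is sent as a JSON string; "size" caps the number of hits.
params = {
    "filters": json.dumps(filters),
    "fields": "file_id,file_name,cases.case_id",
    "size": 5,
}
response = requests.get("https://api.gdc.cancer.gov/files", params=params, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
hits = response.json()["data"]["hits"]
print([hit["file_name"] for hit in hits])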
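Listing files is only half the job. As a hedged follow‑up, the sketch below shows one way to fetch a single open‑access slide by its file_id through the GDC data endpoint, using only the standard library. The helper names gdc_data_url and download_slide and the slides output folder are hypothetical; controlled‑access files would additionally need an authentication token, which this sketch does not handle.

```python
import shutil
import urllib.request
from pathlib import Path

GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def gdc_data_url(file_id: str) -> str:
    """Build the download URL for one GDC file (hypothetical helper)."""
    return f"{GDC_DATA_ENDPOINT}/{file_id}"


def download_slide(file_id: str, file_name: str, out_dir: str = "slides") -> Path:
    """Stream one open-access file to disk and return its local path."""
    target = Path(out_dir) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks rather than loading the whole slide into memory:
    # whole-slide images are large.
    with urllib.request.urlopen(gdc_data_url(file_id), timeout=60) as resp:
        with open(target, "wb") as fh:
            shutil.copyfileobj(resp, fh)
    return target
```

Each hit returned by Example 1 carries the file_id and file_name fields requested there, so a call would look like download_slide(hit["file_id"], hit["file_name"]). For many files at once, the GDC Data Transfer Tool is likely a better fit than a loop over this function.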

Example 2: load a 10x Visium‑formatted dataset you’ve downloaded from a portal into Scanpy.

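import scanpy as sc

# Expects a Space Ranger output folder containing filtered_feature_bc_matrix.h5
# and a spatial/ subfolder. Newer Scanpy releases flag read_visium as deprecated
# in favor of squidpy.read.visium, so check your installed version.
adata = sc.read_visium("path/to/visium_folder")
adata.var_names_make_unique()  # gene symbols can repeat; make them unique
# Per-spot tissue flag and array grid position
print(adata.obs[["in_tissue", "array_row", "array_col"]].head())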

As you scale, remember that spatial files are big. Use cloud‑friendly formats (OME‑TIFF, Zarr) when available, and keep metadata—antibody clones, panel versions, segmentation parameters—next to the data. That context pays off when you compare across tissues, donors, or assays later.
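One lightweight way to keep metadata next to the data is a JSON sidecar file saved beside each image or matrix. The sketch below is only an illustration: the write_sidecar helper, the .meta.json suffix, and the example keys are assumptions, not a convention used by any of the portals above.

```python
import json
from pathlib import Path


def write_sidecar(data_path: str, metadata: dict) -> Path:
    """Write metadata to '<data file name>.meta.json' in the same folder."""
    data_file = Path(data_path)
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    # Sorted keys and indentation keep the file readable and easy to diff.
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar
```

For example, write_sidecar("sample.ome.tiff", {"panel_version": "v2", "antibody_clones": {"CD3": "hypothetical-clone"}, "segmentation": {"method": "hypothetical", "min_area_px": 20}}) would create sample.ome.tiff.meta.json in the same folder. Keeping the record in plain JSON means it can be read from Python or R without extra dependencies.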

A quick story from the lab bench

A colleague building a tumor–immune “niche” classifier started by pressure‑testing their nuclei segmentation on HPA IHC images. With that baseline set, they grabbed a handful of open TCGA WSIs to train a stain‑robust model. Finally, they layered in spatial omics: a HuBMAP CODEX dataset to test multi‑marker phenotyping and a METASPACE brain slice to prototype lipid‑based microenvironment features. By moving from simple IHC to multiplexed and metabolomic maps, each stage added signal without derailing the pipeline.

Summary / Takeaways

Free spatial omics data is easier to find than ever, but the best results come from picking the right portal for the question. Use HuBMAP when you need curated, multi‑modal spatial datasets with consistent processing. Reach for SODB when you want a broad, technology‑spanning catalog in unified formats. Tap METASPACE to work with ion images and spatial metabolomics at scale. Rely on GDC for pathology‑grade WSIs with matched genomics, and HPA for clean IHC images and protein‑level context. With a couple of lightweight scripts and careful attention to metadata, you can move from download to insight in an afternoon.