OMIM Database: An Introduction

Jonathan Alles

EVOBYTE Digital Biology

Introduction

If you work in bioinformatics, you’ve probably bumped into the acronym OMIM. It pops up in variant curation notes, phenotype-driven analyses, and gene panels. Yet many teams only scratch the surface of what the Online Mendelian Inheritance in Man database can do. In this post, we’ll unpack what OMIM actually is, what you can find inside, why its structured definitions of genetic disorders matter for computational workflows, and how to query it programmatically through its API with a short Python example. OMIM remains one of the most carefully curated sources connecting human genes to phenotypes, and it’s surprisingly approachable once you learn a few key terms.

What is OMIM? A quick definition with the right keywords

OMIM, short for Online Mendelian Inheritance in Man, is a continuously updated, expert-curated catalog of human genes, allelic variants, and genetic phenotypes. It emphasizes gene–phenotype relationships and traces back to the seminal work of Victor McKusick. Each record carries a stable identifier called a MIM number, which makes cross-referencing straightforward across tools and papers. You’ll often see entries broken into sections like Clinical Synopsis, Gene Map, Allelic Variants (AVs), and References, all written to be both human-readable and machine-parsable. OMIM is hosted at omim.org and mirrored at NCBI for search convenience, but the authoritative site and API live at OMIM.
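
Because MIM numbers are six-digit identifiers, they are easy to pull out of free text such as curation notes. The sketch below assumes the identifiers are written with an OMIM or MIM prefix (for example OMIM:154700); the function name and the sample note are illustrative and not part of any OMIM tooling.

```python
import re

# Assumption: IDs appear as "OMIM:154700", "MIM 154700", or "MIM#154700".
MIM_PATTERN = re.compile(r"\bO?MIM[:#\s]\s*(\d{6})\b", re.IGNORECASE)

def extract_mim_numbers(text: str) -> list[str]:
    """Return unique six-digit MIM numbers in order of first appearance."""
    seen = {}
    for match in MIM_PATTERN.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)

# Hand-written sample note for illustration.
note = "Variant in FBN1 (MIM:134797); phenotype consistent with OMIM:154700. See OMIM:154700."
print(extract_mim_numbers(note))
```

Normalizing identifiers this way early on makes it simpler to join notes, papers, and tool output on the same key.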

What you can find in the OMIM database

For everyday analysis, a few sections do most of the heavy lifting. Clinical Synopsis summarizes key features of a disorder using standardized terms mapped to ontologies such as HPO (Human Phenotype Ontology) and UMLS, which is invaluable for phenotype matching. The Gene Map ties genes to loci, cytogenetic bands, and known associated phenotypes, while the Morbid Map offers a disorder-centric subset useful for quick scanning. The Allelic Variants section highlights notable pathogenic or historically important variants with brief evidence notes and references. Thanks to this structure, OMIM functions as both a narrative review and a structured data source that your code can traverse predictably.
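
To make the gene-centric versus disorder-centric views concrete, here is a minimal sketch that indexes a couple of hand-written gene–phenotype records in both directions. The record layout and field names are hypothetical and do not reflect OMIM's download or API schema; the records exist only to show the two lookups.

```python
from collections import defaultdict

# Hand-written toy records; field names are hypothetical, not OMIM's schema.
records = [
    {"gene": "FBN1", "gene_mim": "134797", "band": "15q21.1",
     "phenotype": "Marfan syndrome", "phenotype_mim": "154700"},
    {"gene": "CFTR", "gene_mim": "602421", "band": "7q31.2",
     "phenotype": "Cystic fibrosis", "phenotype_mim": "219700"},
]

# Gene-centric view (Gene Map style) and disorder-centric view (Morbid Map style).
gene_to_phenotypes = defaultdict(list)
phenotype_to_genes = defaultdict(list)
for rec in records:
    gene_to_phenotypes[rec["gene"]].append(rec["phenotype_mim"])
    phenotype_to_genes[rec["phenotype_mim"]].append(rec["gene"])

print(gene_to_phenotypes["FBN1"])
print(phenotype_to_genes["219700"])
```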

Why OMIM’s definitions of genetic disorders matter for bioinformatics

Modern pipelines do more than list variants; they infer biological plausibility. When you annotate a VCF and then rank candidates by phenotype fit, you need clear, curated definitions of disorders and their features. Because Clinical Synopsis terms in OMIM are mapped to ontologies, you can compare a patient’s HPO terms against disorder profiles and gene associations and turn the overlap into a similarity score. Phenotype-driven prioritization frameworks and knowledge graphs lean on this structure: OMIM supplies curated gene–disease edges, phenotype tags, and canonical names that keep your downstream reasoning stable even as nomenclature evolves. Exomiser, for example, integrates OMIM identifiers alongside HPO terms to prioritize genes consistent with a patient’s phenotype. In short, OMIM provides the semantic backbone that helps a scoring model distinguish “incidental variant” from “causal candidate.”
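
As a rough illustration of phenotype matching, the sketch below ranks disorder profiles by plain set overlap (Jaccard index) with a patient's HPO terms. This is a deliberate simplification: production tools typically use ontology-aware semantic similarity rather than exact term overlap, and the profiles here are toy data, not taken from OMIM.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_disorders(patient_terms, profiles):
    """Return (disorder, score) pairs sorted by descending overlap with the patient."""
    patient = set(patient_terms)
    scores = {name: jaccard(patient, set(terms)) for name, terms in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy profiles for illustration only; the disorder names are placeholders.
profiles = {
    "toy_disorder_A": {"HP:0001166", "HP:0001083", "HP:0000098"},
    "toy_disorder_B": {"HP:0001250", "HP:0000098"},
}
patient = {"HP:0001166", "HP:0000098"}

ranked = rank_disorders(patient, profiles)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Exact overlap ignores the ontology's hierarchy, so a patient term that is a close relative of a profile term scores zero here; that gap is what semantic similarity measures are designed to close.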

Accessing the OMIM API: keys, endpoints, and a quick Python example

You can query OMIM programmatically via a REST API to fetch entries, search by text, or pull the gene and morbid maps. Registration is straightforward: use the Downloads link on omim.org to request an API key, which arrives by email with usage instructions. The API responds in formats like JSON or XML and supports parameters such as include=clinicalSynopsis or include=geneMap to control payload size. There’s even a browser-based helper at api.omim.org/api/html that lets you prototype requests before you commit code. Keep in mind that while browsing is free, redistributing or embedding substantial OMIM content may require a license, especially for commercial use. Always review the license terms before building OMIM data into products.
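
Before sending anything, it can help to see how these parameters compose into a request URL. The following sketch only builds the URL with the standard library and makes no network call; build_entry_url is a hypothetical helper name, and the parameter names mirror the ones described above.

```python
from urllib.parse import urlencode

BASE = "https://api.omim.org/api/entry"

def build_entry_url(mim_number, include=(), api_key="YOUR_OMIM_API_KEY", fmt="json"):
    """Compose an entry request URL; nothing is sent over the network."""
    params = {"mimNumber": mim_number, "format": fmt, "apiKey": api_key}
    if include:
        # Request only the sections you need to keep the payload small.
        params["include"] = ",".join(include)
    # safe="," keeps the comma-separated include list readable in the URL.
    return f"{BASE}?{urlencode(params, safe=',')}"

url = build_entry_url("100100", include=["clinicalSynopsis", "geneMap"])
print(url)
```

Printing the URL makes it easy to paste into the browser-based helper and compare responses while you tune the include list.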

Here’s a minimal Python example that fetches one disorder by MIM number and pulls the clinical synopsis and gene map. Set the OMIM_API_KEY environment variable, or replace YOUR_OMIM_API_KEY with your key.

import os
import requests

API_KEY = os.getenv("OMIM_API_KEY", "YOUR_OMIM_API_KEY")
BASE = "https://api.omim.org/api/entry"
params = {
    "mimNumber": "100100",              # Example MIM ID
    "include": "clinicalSynopsis,geneMap",
    "format": "json",
    "apiKey": API_KEY
}

r = requests.get(BASE, params=params, timeout=30)
r.raise_for_status()
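entry = r.json()["omim"]["entryList"][0]["entry"]

# The preferred title is nested under "titles" in the entry payload.
print(entry["titles"]["preferredTitle"])
# Synopsis sections are optional, so fall back to a default if one is missing.
print(entry.get("clinicalSynopsis", {}).get("headAndNeck", "no head/neck summary"))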

If you don’t know the MIM number yet, start with a text search. This sample searches for “Marfan syndrome” and prints the MIM number and title of the first match.

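import os
import requests

API_KEY = os.getenv("OMIM_API_KEY", "YOUR_OMIM_API_KEY")
BASE = "https://api.omim.org/api/entry/search"
params = {
    "search": "Marfan syndrome",
    # Same "include" parameter as the entry endpoint (assumed to apply to search too).
    "include": "geneMap,clinicalSynopsis",
    "format": "json",
    "apiKey": API_KEY,
    "start": 0,
    "limit": 1
}

r = requests.get(BASE, params=params, timeout=30)
r.raise_for_status()
hits = r.json()["omim"]["searchResponse"]["entryList"]
if hits:
    hit = hits[0]["entry"]
    # The preferred title is nested under "titles" in the entry payload.
    print(hit["mimNumber"], hit["titles"]["preferredTitle"])
else:
    print("No matching entries")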

Both patterns are fast enough to slot into annotation or curation tools. Prototype your parameters interactively at the API helper page, then lock them into your pipeline with sensible timeouts and error handling. For batch processing, respect rate limits and cache stable artifacts like mim2gene mappings to reduce calls and keep runs reproducible.
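
The caching and rate-limit advice can be sketched as a small wrapper around whatever function performs the request. Everything here is an assumption made for illustration: the CachedFetcher class, the one-second default spacing, and the stand-in fetch function are not part of the OMIM API, and the actual limits are the ones that come with your key.

```python
import time

class CachedFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay between calls."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval  # assumed spacing; tune to your key's limits
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, mim_number):
        # Serve repeated lookups from the cache without touching the network.
        if mim_number in self._cache:
            return self._cache[mim_number]
        # Wait until at least min_interval has passed since the previous real call.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = self._fetch(mim_number)
        self._last_call = self._clock()
        self._cache[mim_number] = result
        return result

# Stand-in fetch function; in a pipeline this would call the OMIM API.
calls = []
def fake_fetch(mim_number):
    calls.append(mim_number)
    return {"mimNumber": mim_number}

fetcher = CachedFetcher(fake_fetch, min_interval=0.0)
for mim in ["154700", "219700", "154700"]:
    fetcher.get(mim)
print(calls)  # the repeated lookup is served from the cache
```

Passing the clock and sleep functions in as arguments keeps the wrapper testable without real delays. For reproducible batch runs, the in-memory dictionary could be swapped for an on-disk cache.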

Summary / Takeaways

OMIM is more than a textbook-in-a-database; it’s a structured, curated map of gene–phenotype knowledge that plugs cleanly into computational biology. By leaning on MIM numbers, Clinical Synopsis terms mapped to ontologies, and the Gene/Morbid Maps, you can connect patient features to plausible genes and disorders with far less friction. And with an API key in hand, you can bring OMIM directly into your pipelines, querying just the fields you need and translating narrative curation into features your models can rank. If you’re building or refining a phenotype-aware variant workflow, try wiring in a small OMIM API step and measure whether your top-20 candidate list improves.