From EHR to Insight: Harmonizing Real‑World Data with FHIR and OMOP for Clinical Trial Enrichment

Introduction

Electronic health records (EHRs) are overflowing with clues about how treatments work outside the pristine walls of randomized trials. Turning those messy, real‑world data (RWD) into trial‑ready evidence can supercharge clinical research—especially for clinical trial enrichment, where you identify participants more likely to benefit or to complete follow‑up. The catch? EHR data are fragmented, inconsistent, and hard to reuse. This post shows how harmonizing data with FHIR and OMOP creates a dependable path from EHR to insight.

Why EHR‑derived RWD matters for enrichment

Clinical trial enrichment means using data to tighten inclusion criteria, balance risk, and optimize recruitment. With EHR‑derived RWD, you can pre‑screen eligible patients across sites, learn which phenotypes predict adherence or response, and design pragmatic endpoints that reflect routine care. The benefit is not just speed: because the data come from actual clinical practice, they can also improve a trial’s external validity.

Key terms worth knowing:
– RWD: data from routine care (EHRs, claims, registries). It underpins real‑world evidence (RWE), which regulators increasingly consider in decision‑making.
– Enrichment: strategies that increase the chance a study meets its objectives, for example by selecting patients with a biomarker or a disease severity band seen in prior data.

The challenge is consistency. One hospital codes labs differently; another stores medications in free text. Without a common structure and vocabulary, downstream analytics and machine learning models won’t travel well. That’s where standards step in.

FHIR and OMOP: the common language

Two standards do the heavy lifting in harmonization:

FHIR (Fast Healthcare Interoperability Resources) is an API‑first HL7 standard for exchanging clinical data. FHIR “resources” such as Patient, Condition, Observation, and MedicationRequest give EHR data a predictable shape and encourage standard terminologies (SNOMED CT, LOINC, RxNorm). In data engineering terms, FHIR addresses interoperability at the data access and schema level; sites can still fill those resources with local codes, so terminology mapping remains necessary.
OMOP (Observational Medical Outcomes Partnership) is a common data model (CDM), now maintained by the OHDSI community, designed for analytics at scale. OMOP’s relational tables (person, condition_occurrence, drug_exposure, measurement) use standardized concept_ids, enabling consistent queries, cohort definitions, and phenotyping. In short, OMOP is the analysis‑ready destination.

Together, FHIR helps you extract and normalize; OMOP helps you analyze and share methods reproducibly. For clinical trial enrichment, this pairing means you can define computable eligibility, pre‑run feasibility counts across sites, and reuse cohort logic with minimal refactoring.
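To make “standardized concept_ids” concrete: the OMOP vocabulary stores every code as a row in the CONCEPT table, and links non‑standard source codes to their standard equivalents through CONCEPT_RELATIONSHIP rows whose relationship_id is 'Maps to'. The sketch below mimics that lookup with tiny in‑memory tables. The codes E11.9 (ICD‑10‑CM) and 44054006 (SNOMED CT) are real type 2 diabetes codes, but every concept_id here is a hypothetical placeholder; a real ETL would query the vocabulary tables loaded from OHDSI’s Athena download.

```python
# Toy in-memory slices of the OMOP vocabulary tables CONCEPT and CONCEPT_RELATIONSHIP.
# All concept_ids below are hypothetical placeholders, not real vocabulary rows.
CONCEPT = {
    # concept_id: (vocabulary_id, concept_code, domain_id, standard_concept)
    900001: ("ICD10CM", "E11.9", "Condition", None),    # source (non-standard) concept
    900002: ("SNOMED", "44054006", "Condition", "S"),   # standard concept
}
CONCEPT_RELATIONSHIP = [
    # (concept_id_1, concept_id_2, relationship_id)
    (900001, 900002, "Maps to"),
]


def to_standard(vocabulary_id: str, concept_code: str) -> int:
    """Return the standard concept_id for a source code, or 0 if it cannot be mapped."""
    source_ids = [cid for cid, (voc, code, _, _) in CONCEPT.items()
                  if voc == vocabulary_id and code == concept_code]
    for sid in source_ids:
        if CONCEPT[sid][3] == "S":          # already a standard concept: use it directly
            return sid
        for c1, c2, rel in CONCEPT_RELATIONSHIP:
            if c1 == sid and rel == "Maps to" and CONCEPT[c2][3] == "S":
                return c2
    return 0                                 # OMOP convention for "no matching concept"
```

Both the ICD‑10‑CM and the SNOMED CT code resolve to the same standard concept, which is what lets one cohort query serve sites that code diagnoses differently.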

A pragmatic workflow from EHR to trial‑ready dataset

Think of the pipeline as four steps: access, map, assure, and qualify.

Access via FHIR APIs
Most modern EHRs expose FHIR endpoints. Pull Patient, Condition, Observation, and MedicationRequest resources incrementally with the _lastUpdated search parameter, so each run fetches only records changed since the previous one. Lean on terminology services to translate local codes to standard vocabularies early.
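A minimal sketch of an incremental pull, assuming a searchset Bundle response as defined by the FHIR REST API: the `gt` prefix on _lastUpdated means “updated after”, _count is a page‑size hint, and servers signal further pages with a Bundle link whose relation is "next". The base URL and the injected `fetch` function (standing in for your HTTP client, authentication, and retries) are hypothetical.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def search_url(base: str, resource: str, since: str, count: int = 100) -> str:
    """Build a FHIR search URL for resources updated after `since` (ISO 8601)."""
    # "gt" is the FHIR search prefix for "greater than"; _count is a paging hint.
    params = {"_lastUpdated": f"gt{since}", "_count": count}
    return f"{base.rstrip('/')}/{resource}?{urlencode(params)}"


def iter_resources(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield resources from a searchset Bundle, following 'next' links until none remain."""
    url = first_url
    while url:
        bundle = fetch(url)                        # hypothetical: returns parsed Bundle JSON
        for entry in bundle.get("entry", []):
            resource = entry.get("resource")
            if resource is not None:
                yield resource
        url = next((link["url"] for link in bundle.get("link", [])
                    if link.get("relation") == "next"), None)
```

For example, `iter_resources(search_url("https://fhir.example.org/R4", "Observation", "2024-01-01"), fetch)` walks every page of Observations changed since that date. One common choice is to persist the newest `meta.lastUpdated` seen as the watermark for the next run.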

Map FHIR to OMOP
The extract‑transform‑load (ETL) stage aligns FHIR resources to OMOP tables and concept_ids. The target table follows the domain of the mapped standard concept: a Condition usually becomes a condition_occurrence row, while an Observation becomes a measurement row when its concept is in the Measurement domain (most labs and vitals) and an observation row otherwise. Concept mapping is the heart of fidelity, so start with the high‑value domains (conditions, drugs, labs) that drive inclusion criteria and outcomes.
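Domain‑based routing can be sketched as follows, assuming a hypothetical LOOKUP dict that already resolves (system, code) pairs to a standard concept_id and its domain_id; in a real ETL that lookup would come from the vocabulary tables. LOINC 4548‑4 (HbA1c) and 72166‑2 (tobacco smoking status) are real codes, but the concept_ids, the domain assignments, and the choice to send unmapped codes to observation are illustrative placeholders.

```python
# Hypothetical lookup: (system, code) -> (standard concept_id, domain_id).
# In practice this is derived from CONCEPT + CONCEPT_RELATIONSHIP ('Maps to').
LOOKUP = {
    ("http://loinc.org", "4548-4"): (800001, "Measurement"),    # placeholder concept_id
    ("http://loinc.org", "72166-2"): (800002, "Observation"),   # placeholder concept_id
}


def route_observation(obs: dict, person_id: int) -> tuple[str, dict]:
    """Route a FHIR Observation to an OMOP table based on the mapped concept's domain."""
    coding = obs["code"]["coding"][0]
    # Unmapped codes get concept_id 0 and fall back to observation (a design choice).
    concept_id, domain = LOOKUP.get((coding["system"], coding["code"]), (0, "Observation"))
    date = (obs.get("effectiveDateTime") or "")[:10] or None
    qty = obs.get("valueQuantity", {})
    if domain == "Measurement":
        return "measurement", {
            "person_id": person_id,
            "measurement_concept_id": concept_id,
            "measurement_date": date,
            "value_as_number": qty.get("value"),
            "unit_source_value": qty.get("code") or qty.get("unit"),
            "measurement_source_value": coding["code"],
        }
    return "observation", {
        "person_id": person_id,
        "observation_concept_id": concept_id,
        "observation_date": date,
        "value_as_number": qty.get("value"),
        "value_as_string": obs.get("valueString"),
        "observation_source_value": coding["code"],
    }
```

Keying the decision on domain_id rather than on whether a value is numeric keeps qualitative lab results (such as positive/negative tests) in measurement, where OMOP expects them.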

Quality checks that matter
Before analysis, build a “data fitness” checklist: required fields present, date plausibility, unit normalization, concept coverage, and duplicate suppression. Simple metrics—proportion of mapped codes, missing value rates by domain—often catch issues that derail cohort selection later.
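The checklist can be turned into a handful of numbers per table. This sketch assumes rows are plain dicts shaped like OMOP records (for example, exported from condition_occurrence); the function name fitness_report, the 1900‑01‑01 lower date bound, and the exact‑duplicate rule are illustrative choices, not an OHDSI standard.

```python
from collections import Counter
from datetime import date


def fitness_report(rows: list[dict], concept_field: str, date_field: str,
                   required: list[str], today: date) -> dict:
    """Compute simple data-fitness metrics for one OMOP-shaped table (rows as dicts)."""
    n = len(rows)
    if n == 0:
        return {"rows": 0}
    # Concept coverage: concept_id 0 (or missing) means the source code was not mapped.
    mapped = sum(1 for r in rows if r.get(concept_field) not in (None, 0))
    # Missing-value rate for each required field.
    missing = {f: sum(1 for r in rows if r.get(f) in (None, "")) / n for f in required}

    def implausible(value) -> bool:
        """Flag dates that are absent, unparseable, or outside 1900-01-01..today."""
        try:
            return not (date(1900, 1, 1) <= date.fromisoformat(value) <= today)
        except (TypeError, ValueError):
            return True

    bad_dates = sum(1 for r in rows if implausible(r.get(date_field)))
    # Exact duplicates across all fields: a crude proxy for duplicate suppression.
    dupes = sum(c - 1 for c in Counter(tuple(sorted(r.items())) for r in rows).values())
    return {
        "rows": n,
        "mapped_rate": mapped / n,
        "missing_rate": missing,
        "implausible_date_rate": bad_dates / n,
        "duplicate_rows": dupes,
    }
```

Tracking these values per site and per load makes regressions visible, for example a sudden drop in mapped_rate after a local lab code change.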

Qualify cohorts, not just patients
Define eligibility in code and in English. Combine clinical logic (diagnoses, labs, drug exposure windows) with operational filters (encounter types, follow‑up availability). Reuse the same cohort definition for feasibility, screening, and analytic subsets to maintain traceability.
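One way to keep the English and the code from drifting apart is to store each rule once, with its text and its predicate side by side. In this sketch the Criterion class, the T2DM_COHORT list, and the patient‑summary fields (age, has_t2dm, latest_a1c_90d, insulin_180d, outpatient_visits_1y) are all hypothetical; a real pipeline would compute those fields from OMOP tables or compile the rules to SQL.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One eligibility rule: human-readable text plus a computable predicate."""
    text: str
    test: Callable[[dict], bool]
    kind: str = "inclusion"   # "inclusion" or "exclusion"


# Hypothetical patient-summary fields; clinical logic plus one operational filter.
T2DM_COHORT = [
    Criterion("Age 18 or older", lambda p: p["age"] >= 18),
    Criterion("Diagnosis of type 2 diabetes", lambda p: p["has_t2dm"]),
    Criterion("HbA1c >= 8% in the past 90 days", lambda p: (p["latest_a1c_90d"] or 0) >= 8.0),
    Criterion("Insulin exposure in the past 180 days", lambda p: p["insulin_180d"], "exclusion"),
    Criterion("At least one outpatient visit in the past year",
              lambda p: p["outpatient_visits_1y"] >= 1),
]


def describe(cohort: list[Criterion]) -> str:
    """Render the definition as eligibility text for protocols and screening logs."""
    return "\n".join(f"{'Include' if c.kind == 'inclusion' else 'Exclude'}: {c.text}"
                     for c in cohort)


def failed_rules(patient: dict, cohort: list[Criterion]) -> list[str]:
    """List rules a patient fails: an unmet inclusion or a matched exclusion."""
    return [c.text for c in cohort if bool(c.test(patient)) != (c.kind == "inclusion")]


def eligible(patient: dict, cohort: list[Criterion]) -> bool:
    """A patient qualifies when no rule fails."""
    return not failed_rules(patient, cohort)
```

Because failed_rules reports which criterion excluded each patient, the same definition can back feasibility counts, site screening logs, and analytic subsets with a traceable reason for every exclusion.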

Two tiny examples: mapping and cohort selection

Example 1: FHIR-to-OMOP mapping sketch in Python. Imagine you’ve pulled FHIR Conditions and want to populate OMOP’s condition_occurrence.

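# toy example; concept_map is a dict { (system, code): concept_id }
def fhir_condition_to_omop(cond, person_id, concept_map):
    codings = cond.get("code", {}).get("coding", [])
    coding = codings[0] if codings else {}
    concept_id = 0  # 0 = unmapped (OMOP convention)
    for c in codings:  # prefer the first coding that maps to a standard concept
        mapped = concept_map.get((c.get("system"), c.get("code")))
        if mapped:
            coding, concept_id = c, mapped
            break
    # onset[x] may be absent or not a dateTime; fall back to recordedDate
    start = cond.get("onsetDateTime") or cond.get("recordedDate")
    return {
        "person_id": person_id,
        "condition_concept_id": concept_id,
        "condition_start_date": start[:10] if start else None,  # None fails QC later
        "condition_source_value": coding.get("code"),
        "condition_type_concept_id": 32817,  # "EHR" type concept (example; verify in your vocabulary)
    }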

This snippet spotlights the pivotal step: mapping source codes to standardized concept_ids so downstream queries behave consistently across sites.

Example 2: OMOP SQL to pre‑screen a trial cohort. Suppose you need adults with type 2 diabetes, an HbA1c ≥ 8% in the past 90 days, and no insulin exposure in the past 180 days.

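-- PostgreSQL dialect; concept_ids are examples, verify against your vocabulary release
SELECT p.person_id
FROM person p
WHERE p.year_of_birth <= EXTRACT(YEAR FROM CURRENT_DATE) - 18      -- approximate age >= 18
  AND EXISTS (                                                     -- T2DM or any descendant concept
      SELECT 1 FROM condition_occurrence c
      JOIN concept_ancestor ca ON ca.descendant_concept_id = c.condition_concept_id
      WHERE c.person_id = p.person_id
        AND ca.ancestor_concept_id = 201826)                       -- T2DM SNOMED concept_id (example)
  AND EXISTS (                                                     -- HbA1c >= 8% in the past 90 days
      SELECT 1 FROM measurement m
      WHERE m.person_id = p.person_id
        AND m.measurement_concept_id = 3004410                     -- HbA1c LOINC concept_id (example)
        AND m.value_as_number >= 8.0                               -- assumes % (NGSP) units
        AND m.measurement_date >= CURRENT_DATE - INTERVAL '90 day')
  AND NOT EXISTS (                                                 -- no insulin exposure in the past 180 days
      SELECT 1 FROM drug_exposure d
      JOIN concept_ancestor ca ON ca.descendant_concept_id = d.drug_concept_id
      JOIN concept i ON i.concept_id = ca.ancestor_concept_id
      WHERE d.person_id = p.person_id
        AND i.vocabulary_id = 'RxNorm' AND i.concept_class_id = 'Ingredient'
        AND LOWER(i.concept_name) LIKE 'insulin%'                 -- heuristic ingredient match
        AND COALESCE(d.drug_exposure_end_date, d.drug_exposure_start_date)
            >= CURRENT_DATE - INTERVAL '180 day');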

Because it relies only on OMOP tables and standardized concept_ids, this query can run at multiple hospitals with minimal tweaks (mainly SQL dialect and vocabulary version), returning comparable feasibility counts for enrichment.

Summary / Takeaways

To make EHR‑derived RWD truly useful for clinical trial enrichment, you need a clear path from access to analysis. FHIR gives you structured, API‑first data acquisition; OMOP gives you a common analytics substrate with standardized concepts. Add disciplined ETL, lightweight but focused data quality checks, and code‑based cohort definitions, and you’ll move from sporadic data pulls to a reproducible pipeline that shortens timelines and improves generalizability. Start small—map the domains that matter for your eligibility criteria—and invest early in concept mapping and unit normalization. The payoff is a reusable, portable approach that turns EHR noise into trial‑ready insight.