Protein Design After Structure Prediction Using Generative Models


Jonathan Alles

EVOBYTE Digital Biology

Introduction

The last decade taught us to ask, “What is the fold?” AlphaFold and its peers answered decisively, turning protein structure prediction into a routine step rather than a moonshot. But prediction alone doesn’t heal a tumor or capture CO2. The new question is, “What should we build?” That shift—from passive analysis to active creation—is why generative models are remaking the bioinformatics workflow. Instead of treating structure prediction as the finish line, teams now use it as a filter inside an iterative design loop powered by diffusion models, protein language models, and sequence–structure models. Together, these tools let us specify function, sample candidates, and triage designs at a scale that was unthinkable even a few years ago.

Beyond AlphaFold: From accurate maps to actionable blueprints

AlphaFold2 proved that sequence alone could yield near-atomic accuracy for many monomers, and in May 2024 AlphaFold 3 extended prediction to complexes with nucleic acids, ligands, and glycans using a diffusion-style architecture. Those milestones changed expectations about what we can model, yet they also exposed a ceiling. Knowing the fold is vital, but discovery happens when we can steer toward properties—binding, stability, activity—under real-world constraints like developability and immunogenicity. In practice, labs increasingly treat structure predictors as downstream validators embedded in generative pipelines: they screen thousands of candidates produced upstream, rather than being the core engine of creativity. This is the mindset shift that turns bioinformatics from map-reading into engineering.

Diffusion models for 3D backbones and interfaces

Enter diffusion models, which learn to generate 3D protein geometry by denoising coordinates or internal frames. RFdiffusion, built on the RoseTTAFold structure prediction network, was a watershed: it can hallucinate entirely new backbones, scaffold catalytic motifs, and “inpaint” around functional sites or symmetry axes. Because the model conditions on constraints—say, an epitope surface or a C3 symmetry axis—it becomes a programmable backbone generator rather than a blind sampler. Designs are not copies of the PDB; they generalize beyond known folds and can be nudged by custom potentials for tasks like binder placement or pocket shaping. In published tests, RFdiffusion produced binders to challenging targets and complex symmetric assemblies, then paired with sequence design to realize atomistic models for experimental screening.
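
To make this concrete, here is a hedged sketch of driving RFdiffusion for a symmetric design. The public repository is driven by a run_inference.py script with Hydra-style overrides; the option names below follow its README, while the contig length, design count, and output prefix are illustrative assumptions:

import subprocess

# hedged sketch: option names follow the public RFdiffusion README;
# the contig length and output paths are illustrative assumptions
subprocess.run([
    "./scripts/run_inference.py",
    "--config-name", "symmetry",           # symmetric-oligomer config
    "inference.symmetry=C3",               # enforce C3 symmetry
    "contigmap.contigs=[240-240]",         # assumed total backbone length
    "inference.num_designs=200",
    "inference.output_prefix=out/trimer",
], check=True)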

Chroma, from Generate:Biomedicines, pushes in the same direction with a unified, conditional model over sequence and structure. It supports large complexes and user-steerable sampling for shape, symmetry, and function—crucial when you need an architecture that will tolerate a specific insertion, interface, or oligomeric state. The key is that both RFdiffusion and Chroma flip the workflow: instead of predicting the shape of a given sequence, they propose shapes that might achieve a desired function, then hand those shapes off for sequence realization and validation.

To ground this in a story, imagine you want a trimer that positions three binding domains to match the geometry of a viral spike. With RFdiffusion you seed a C3 scaffold, condition on a target interface, and sample backbones that respect the symmetry and surface constraints. You’re no longer hoping nature provides the right shape—you’re sampling it directly under guidance.

Protein language models as priors and fast predictors

If diffusion models supply shape, protein language models supply priors over sequences and rapid structure hints. Models like ESM‑2 learn from hundreds of millions of sequences, internalizing evolutionary constraints that correlate with structure and function. ESMFold then shows how a language model can enable single-sequence structure prediction at speed, providing a fast fold check when you’re sifting through thousands of candidates. In large-scale benchmarks, ESMFold achieved near-atomic predictions for many proteins directly from sequence, dramatically accelerating “design → fold → triage” loops. Even when not used as full generators, language models are potent sequence scorers and mutational effect priors that help you keep designs in “natural-like” territory.
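
As a concrete example of language-model scoring, here is a minimal sketch using the fair-esm package. It computes a masked pseudo-log-likelihood for a candidate sequence, a common heuristic for keeping designs in natural-like territory; the specific model size and the length normalization are assumptions rather than a prescribed recipe:

import torch
import esm

# load ESM-2 (650M) from the fair-esm package as a sequence scorer
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

def pseudo_log_likelihood(seq):
    # mask each position in turn and accumulate the log-probability
    # the model assigns to the true residue (one forward pass per position)
    _, _, tokens = batch_converter([("design", seq)])
    total = 0.0
    with torch.no_grad():
        for i in range(1, len(seq) + 1):   # token 0 is the BOS token
            masked = tokens.clone()
            masked[0, i] = alphabet.mask_idx
            logits = model(masked)["logits"]
            logp = torch.log_softmax(logits[0, i], dim=-1)
            total += logp[tokens[0, i]].item()
    return total / len(seq)               # length-normalized score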

Language models also generate sequences outright. Early systems such as ProGen demonstrated controllable sequence generation conditioned on tags like function or localization, and recent large models continue to explore de novo sequence space while maintaining biochemical plausibility. In modern workflows, unconditional LMs help seed diversity; conditional LMs focus sampling toward motifs or families; and masked LMs support targeted edits to rescue stability or reduce liability sites. The result is a flexible palette for sequence ideation that complements geometry-first approaches.
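
The targeted-edit move is just as compact: mask the liability site and let the model rank replacements. A small sketch reusing the ESM-2 model and batch_converter loaded above (the 0-based pos argument and the top-k cutoff are illustrative choices):

def propose_edit(seq, pos, top_k=5):
    # mask a single position and return the model's top-k
    # replacement residues with their probabilities
    _, _, tokens = batch_converter([("design", seq)])
    tokens[0, pos + 1] = alphabet.mask_idx      # +1 to skip the BOS token
    with torch.no_grad():
        logits = model(tokens)["logits"][0, pos + 1]
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, top_k)
    return [(alphabet.get_tok(int(i)), p.item())
            for i, p in zip(top.indices, top.values)]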

Sequence–structure models: making designs fold (inverse folding in the loop)

Once you have a candidate backbone from diffusion—or a target conformation you want to realize—the inverse folding step transforms structure into sequence. Sequence–structure models like ProteinMPNN treat residues as a graph on the backbone and learn to propose amino acids that will fold back to the same shape. ProteinMPNN’s reliability and speed made it a staple in design stacks, commonly used after RFdiffusion sampling to “paint” sequences onto generated scaffolds. In published work, this pairing achieved high success rates across tasks from de novo scaffolds to high-affinity binders, providing a practical route from geometry to sequences that survive downstream triage.
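
In practice ProteinMPNN ships as a command-line script rather than a pip-installable API, so design runs are usually wrapped around protein_mpnn_run.py. A minimal invocation, with flag names as in the public repository and input/output paths that are assumptions:

import subprocess

# paint 5 sequences onto one diffusion-generated backbone;
# flags follow the ProteinMPNN repository's protein_mpnn_run.py
subprocess.run([
    "python", "protein_mpnn_run.py",
    "--pdb_path", "scaffolds/design_001.pdb",  # assumed input path
    "--out_folder", "mpnn_out/",
    "--num_seq_per_target", "5",
    "--sampling_temp", "0.1",                  # low temperature = conservative
], check=True)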

Other inverse folding models, such as ESM‑IF1, lean into geometric vector perceptrons and large-scale training on solved and predicted structures. They double as scoring functions: for a given backbone, you can sample sequences, but you can also score existing candidates to rank order a library. This dual role makes inverse folding models natural “gatekeepers” early in the pipeline, long before you commit GPU days to exhaustive folding or CPU-months to docking.
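
That sample-and-score duality is visible in fair-esm's inverse-folding utilities. Here is a short sketch; the function names follow the esm repository's inverse_folding examples, while the PDB path, chain choice, and sampling temperature are placeholders:

import esm
import esm.inverse_folding

model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model.eval()

# extract backbone coordinates from a designed scaffold
structure = esm.inverse_folding.util.load_structure("scaffolds/design_001.pdb", "A")
coords, native_seq = esm.inverse_folding.util.extract_coords_from_structure(structure)

# score a candidate against the backbone (here the extracted sequence itself;
# swap in library members to rank-order a design library)
ll, _ = esm.inverse_folding.util.score_sequence(model, alphabet, coords, native_seq)

# or sample a fresh sequence for the same backbone
sampled_seq = model.sample(coords, temperature=0.7)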

Here’s a minimal sketch of how these pieces fit when you want a binder to a known surface. It’s deliberately short, but it captures the design loop:

# pseudo-code: generative protein design loop (APIs here are illustrative)

# 1) propose backbones with constraints (e.g., target epitope, symmetry)
backbones = rfdiffusion.sample(n=200, condition={"target_surface": epitope_mesh,
                                                 "symmetry": "C3",
                                                 "inpaint": motif_coords})

# 2) realize sequences onto each backbone (5 candidates per scaffold)
seq_lib = [protein_mpnn.design(bb, temperature=0.1, nseq=5) for bb in backbones]

# 3) fast fold check and interface triage
folded = [esmfold.predict(seq) for seq in flatten(seq_lib)]
scored = [(d, interface_score(d, target), stability_metric(d)) for d in folded]
ranked = sorted(scored, key=lambda x: (x[1], x[2]), reverse=True)

# 4) high-confidence subset goes to AF3 + physics/docking + wet lab
final = [d for d, iface, stab in ranked if d.plddt > 80 and iface > 0.6][:48]

In practice, teams layer in developability filters, immunogenicity heuristics, and targeted mutations proposed by a language model fine-tuned on the protein class of interest. What matters is the posture: you’re specifying constraints and objectives, sampling aggressively, and using predictors as filters—not as the source of novelty.

A modern, generative workflow: from spec to sequence to screening

Let’s zoom out and translate this into a day‑to‑day bioinformatics workflow.

You begin with a design spec: a functional goal, context, and constraints. Maybe you need a symmetric nanoparticle that displays an antigen at fixed separations, or an enzyme scaffold around a catalytic triad. Diffusion models handle the geometry under these constraints, producing backbones that respect symmetry, interface normals, or pocket sizes. You then run inverse folding to realize sequences on those shapes, optionally using a language model to bias toward family-consistent residues or to diversify outside local minima.
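
One lightweight way to keep that spec explicit is to encode it as data that every downstream step reads. A hypothetical sketch, where the field names and defaults are illustrative rather than any standard:

from dataclasses import dataclass, field

@dataclass
class DesignSpec:
    # illustrative container for a design brief; all fields are hypothetical
    objective: str                 # e.g. "C3 trimer matching spike geometry"
    symmetry: str = "C3"
    motif_pdb: str = ""            # catalytic or epitope motif to scaffold
    max_length: int = 250
    filters: dict = field(default_factory=lambda: {"plddt": 80, "interface_q": 0.6})

spec = DesignSpec(objective="symmetric nanoparticle displaying an antigen")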

Next comes fast triage. ESMFold provides low-latency fold checks; inverse folding likelihoods flag sequences likely to refold correctly; simple heuristics catch red flags like exposed hydrophobic patches or unpaired cysteines. Top candidates graduate to heavier checks: AlphaFold 3 for complexed structures and ligand placements, physics-based relaxation, docking for interface energetics, and molecular dynamics for stability under perturbation. Crucially, these predictors are not oracles; they’re prioritizers. They reduce an ocean of possibilities to a few dozen designs that justify synthesis.

Finally, experimental feedback closes the loop. Assays for expression, stability, binding, and activity produce labels that fine‑tune your models or reweight your sampling distributions. Over time, your private data—assay readouts, failure modes, and negative results—matter as much as public PDB and UniProt corpora. This is where bioinformatics becomes product engineering: design budgets, data flywheels, and model retraining schedules are now part of the craft.

Here’s a tiny helper that reflects the “predictor-as-filter” mindset:

def triage(design, target, plddt_cut=80, iface_cut=0.6):
    # cheap gate: fold the sequence, keep only plausible binders
    fold = esmfold.predict(design.sequence)
    if fold.plddt < plddt_cut:                       # low-confidence fold
        return None
    if interface_score(fold, target) < iface_cut:    # weak predicted interface
        return None
    if has_bad_motifs(design.sequence):              # sequence liabilities
        return None
    return fold

Used at scale, small gates like this turn days of GPU compute into minutes of filtering so the bench receives only the most plausible constructs.

Summary / Takeaways

Generative models have shifted protein informatics from forecasting to fabrication. Diffusion models give us programmable control over 3D geometry; protein language models contribute evolutionary priors and rapid structure hints; and inverse folding models translate shapes into sequences that are likely to refold and function. Structure predictors, including AlphaFold 3 as of May 2024, now play a vital but supporting role: they are the sieves through which generative ideas must pass, not the fountains from which ideas spring.

If you’re building a new pipeline, start by writing your “design spec” in the language of constraints and objectives, not just sequences and folds. Then wire a backbone diffusion step to a sequence realization step, and let fast predictors prune aggressively before you invest in heavy computation or wet‑lab cycles. The prize is speed: from months of bespoke modeling to days of programmable exploration—followed by a tighter experimental loop that gets smarter with every run.
