License: RFdiffusion3 is open source and free for academic and commercial use under a BSD-3-Clause license. Please refer to the license for full terms.

Proto is not affiliated with the Institute for Protein Design. This toolkit is open source and builds on the implementation produced by that organization. Product names, logos, and trademarks are the property of their respective owners.


RosettaCommons/foundry
Central repository for biomolecular foundation models with shared trainers and pipeline components
De novo Design of All-atom Biomolecular Interactions with RFdiffusion3
Jasper Butcher, Rohith Krishna, … David Baker
bioRxiv (2025)
@article{butcher2025rfdiffusion3,
  title={De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
  author={Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.09.18.676967},
  url={https://www.biorxiv.org/content/10.1101/2025.09.18.676967},
  publisher={Cold Spring Harbor Laboratory}
}
proto-bio/proto-tools/proto_tools/tools/structure_design/rfdiffusion3
Function: run_rfdiffusion3()
Description: De novo protein structure design using RFdiffusion3 (GPU)

Background

RFdiffusion3 (Butcher et al., 2025) is a denoising-diffusion generative model trained to design protein structures at all-atom resolution under arbitrary spatial constraints. Starting from random noise, it iteratively denoises atomic coordinates toward a plausible protein while jointly refining the underlying amino-acid sequence. Training combines structures from the Protein Data Bank with multi-task conditioning, in which each training example is presented with a randomly generated design problem that constrains a sampled combination of motif tokens, atom subsets, residue identities, or sequence-index labels. The model is therefore trained jointly on binder design, motif scaffolding, inverse folding, sidechain placement, and prediction-style tasks under a single objective, and a single trained checkpoint supports every conditioning style at inference. RFdiffusion3 is the successor to RFdiffusion (Watson et al., 2023), which diffused only over the backbone N, Cα, C, and O atoms and required ProteinMPNN as a separate sequence-design step. By denoising every atom and co-designing the sequence, RFdiffusion3 incorporates small-molecule pockets, hydrogen-bond donor and acceptor patterning, and explicit nucleotide and ligand context directly into the generative process. It is the structure-design model within the Foundry framework, which distributes it alongside RoseTTAFold3 for structure prediction and ProteinMPNN for inverse folding.

Tools

RFdiffusion3 Structure Design (rfdiffusion3-design)

Generates new protein structures and sequences subject to specified constraints. Each design task is described by an RFdiffusion3DesignSpec containing an optional input structure, a contig string, and per-residue selectors that fix atomic coordinates, constrain sequence positions, or designate hotspot residues. The diffusion sampler returns n_batches * diffusion_batch_size candidate structures per specification, each accompanied by its designed amino-acid sequence and the sampled contig.

API Reference

Inputs:
  • design_specs (List[RFdiffusion3DesignSpec]): List of design specifications. Each spec represents an independent design task with its own constraints. Multiple specs will be processed in a single run.
  • raw_json (string): Raw JSON string for advanced users who need full RFdiffusion3 flexibility. If provided, design_specs will be ignored and this JSON will be passed directly to RFdiffusion3.
Configuration (RFdiffusion3Config):
  • n_batches (integer, default: 1): Independent batches per spec (total designs = n_batches * diffusion_batch_size * num_specs).
  • diffusion_batch_size (integer, default: 8): Designs sampled in parallel per batch.
  • num_timesteps (integer, default: 200): Diffusion timesteps; more = slower, generally higher quality.
  • step_scale (number, default: 1.5): Step size scale; higher = less diverse, more designable.
  • sampler_kind (enum, default: "default"): Sampler kind; 'symmetry' for homo-oligomer design (paired with DesignSpec.symmetry). Available options: default, symmetry.
  • center_option (enum, default: "all"): Coordinate-frame centering: all (whole structure), motif (input motif), diffuse (diffused region only). Available options: all, motif, diffuse.
  • use_classifier_free_guidance (boolean, default: False): Enable CFG sampling.
  • cfg_scale (number, default: 1.5): CFG scale factor (typical 1.0-3.0); no-op when CFG is off.
  • gamma_0 (number, default: 0.6): Sampler stochasticity; lower = more designable, less diverse; 0.0 = deterministic ODE. Must be > 0.5 when sampler_kind="symmetry".
  • sampler_tuning (RFdiffusion3SamplerTuning): Finer inference_sampler settings (noise schedule, motif noise); see that class for the fields.
  • low_memory_mode (boolean, default: False): Memory-efficient tokenization (slower); enable only if GPU RAM is tight.
  • dump_trajectories (boolean, default: False): Save diffusion trajectory frames (debugging).
  • align_trajectory_structures (boolean, default: False): Align trajectory frames across timesteps (only when dump_trajectories=True).
  • prevalidate_inputs (boolean, default: False): Fail-fast input JSON validation.
  • ckpt_path (string, default: "rfd3"): Checkpoint path or alias ("rfd3" = production preset).
  • input_dir (string): Local-execution input directory; None uses a tempdir.
  • output_dir (string): Local-execution output directory; None uses a tempdir.
  • verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cuda"): "cuda" or "cpu".
  • timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Outputs:
  • designed_structures (List[RFdiffusion3Designs]): One bundle per input spec. Total design count is len(design_specs) * n_batches * diffusion_batch_size.
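
The design-count formula and the gamma_0 constraint in the parameter descriptions above can be checked with simple arithmetic before launching a GPU job. The following is a minimal plain-Python sketch that does not call the toolkit; total_designs and check_gamma_0 are hypothetical helper names introduced here for illustration, and they only mirror the documented rules rather than reproduce the toolkit's own validation.

```python
def total_designs(num_specs: int, n_batches: int = 1, diffusion_batch_size: int = 8) -> int:
    """Total designs returned by one call, following the documented formula
    num_specs * n_batches * diffusion_batch_size."""
    if min(num_specs, n_batches, diffusion_batch_size) < 1:
        raise ValueError("all counts must be positive integers")
    return num_specs * n_batches * diffusion_batch_size


def check_gamma_0(gamma_0: float = 0.6, sampler_kind: str = "default") -> None:
    """Mirror the documented rule: the symmetric sampler needs gamma_0 > 0.5."""
    if sampler_kind == "symmetry" and gamma_0 <= 0.5:
        raise ValueError("gamma_0 must be > 0.5 when sampler_kind='symmetry'")


# Two specs, three batches of eight designs each.
print(total_designs(num_specs=2, n_batches=3, diffusion_batch_size=8))  # 48
check_gamma_0(0.6, "symmetry")  # the default gamma_0 satisfies the constraint
```

With the documented defaults (n_batches=1, diffusion_batch_size=8), a single spec yields eight designs.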

Applications

This tool is appropriate for any task in which the output is a novel protein backbone together with a designed amino-acid sequence subject to spatial constraints. Representative applications include protein-binder design against a target hotspot, motif scaffolding around catalytic or epitope residues, enzyme design around a small-molecule active site, symmetric homo-oligomer assembly, and partial-diffusion refinement of an existing backbone. RFdiffusion3 additionally supports nucleic-acid and small-molecule binder design within the same model and the same inference run, capabilities that the original RFdiffusion addressed only through separate model variants or task-specific extensions.

Usage Tips

  • One RFdiffusion3DesignSpec describes a single design task. Multiple specifications passed in one call are processed independently. Each specification produces n_batches * diffusion_batch_size designs, and the resulting design sets are returned in input order. length is the only field that may be specified without an input_structure. The contig, unindex, and select_* fields are resolved against the input atomic coordinates and are rejected by validation when no input structure is provided.
  • The contig string specifies the design topology. Comma-separated segments select chain-indexed residues from the input (A40-60), insert designed regions of variable length (70-80), or introduce a chain break (/0). The full grammar is documented in the RFdiffusion3 input specification.
  • num_timesteps and step_scale are the principal parameters controlling design quality and diversity. Increasing num_timesteps (default 200) extends the number of denoising steps and generally improves designability at a linear cost in runtime. step_scale (default 1.5) scales the per-step denoising magnitude. Lower values yield more diverse but less designable outputs. Higher values produce structures closer to the training distribution.
  • sampler_kind="symmetry" is required for homo-oligomer design and must be paired with a symmetry group on the specification. A symmetry="C3" value on the specification is converted internally to the {"id": "C3"} SymmetryConfig required by the RFdiffusion3 sampler. The symmetric sampler also imposes a gamma_0 > 0.5 constraint, which is enforced during configuration validation. Under symmetry, length is per-protomer (length="100" + symmetry="C3" → 300-residue trimer).
  • Classifier-free guidance is disabled by default and enabled by setting use_classifier_free_guidance=True. When disabled, cfg_scale, cfg_features, and cfg_t_max have no effect. When enabled, increasing cfg_scale (typical range 1.0 to 3.0) more strongly weights the conditioning features (active-site hydrogen-bond donors and acceptors, per-atom relative accessible surface area) during sampling.
  • partial_t enables partial diffusion, in which the sampler refines an input structure rather than starting from random noise. The input is perturbed with Gaussian noise at the specified amplitude in angstroms and then denoised back, supporting local redesign and topology-preserving diversification of an existing backbone. The RFdiffusion3 InputSpecification reference lists 5.0 to 15.0 angstroms as a recommended range, but the partial-diffusion documentation notes that partial_t is nonlinear and recommends beginning near 2 angstroms and increasing gradually.
  • is_non_loopy is the only secondary-structure conditioning parameter available. Setting it on RFdiffusion3DesignSpec to True biases the sampler toward fewer loops and substantially more helical content, with reduced sheet content. Setting it to False biases toward more loops and correspondingly fewer helices. Leaving it None (the default) applies no topology preference. The parameter is a single boolean flag and offers no per-residue or fractional control.
  • Finer sampler tuning is grouped under the sampler_tuning field, a typed RFdiffusion3SamplerTuning whose settings map to the upstream inference_sampler block: the noise schedule (noise_scale, p, s_trans), the stochasticity threshold gamma_min (paired with gamma_0), and the motif-noise settings allow_realignment / s_jitter_origin (effective only for motif-conditioned designs). Each defaults to None and is forwarded to the sampler only when set, otherwise inheriting the checkpoint’s upstream default. Pass them as a dict, e.g. RFdiffusion3Config(sampler_tuning={"noise_scale": 1.003}); any other valid inference_sampler setting (e.g. cfg_t_max) may be included and is forwarded verbatim.
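
To make the contig forms described in the tips above concrete, the following sketch classifies each comma-separated segment and computes the range of total lengths a contig can produce. It is an illustration in plain Python, not the RFdiffusion3 parser: parse_contig and length_bounds are hypothetical names, only the three segment forms named above are handled, and accepting a single number as a fixed-length designed region is an assumption. The full grammar is in the RFdiffusion3 input specification.

```python
import re
from typing import List, Tuple

# Only the three segment forms named in the tips are covered here.
_FIXED = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")  # A40-60: residues kept from the input
_DESIGNED = re.compile(r"^(\d+)(?:-(\d+))?$")    # 70-80: designed region (a bare "75" is an assumption)
_BREAK = "/0"                                    # chain break


def parse_contig(contig: str) -> List[Tuple]:
    """Classify segments as ('fixed', chain, start, end),
    ('designed', min_len, max_len), or ('break',)."""
    segments: List[Tuple] = []
    for raw in contig.split(","):
        token = raw.strip()
        fixed = _FIXED.match(token)
        designed = _DESIGNED.match(token)
        if token == _BREAK:
            segments.append(("break",))
        elif fixed:
            start, end = int(fixed.group(2)), int(fixed.group(3))
            if start > end:
                raise ValueError(f"descending residue range: {token!r}")
            segments.append(("fixed", fixed.group(1), start, end))
        elif designed:
            lo = int(designed.group(1))
            hi = int(designed.group(2)) if designed.group(2) else lo
            if lo > hi:
                raise ValueError(f"descending length range: {token!r}")
            segments.append(("designed", lo, hi))
        else:
            raise ValueError(f"unrecognized contig segment: {token!r}")
    return segments


def length_bounds(segments: List[Tuple]) -> Tuple[int, int]:
    """Minimum and maximum total residue count implied by the segments."""
    lo = hi = 0
    for seg in segments:
        if seg[0] == "fixed":
            n = seg[3] - seg[2] + 1  # residue ranges are inclusive
            lo, hi = lo + n, hi + n
        elif seg[0] == "designed":
            lo, hi = lo + seg[1], hi + seg[2]
    return lo, hi


segments = parse_contig("A40-60,70-80,/0,B1-20")
print(segments)
print(length_bounds(segments))  # (111, 121)
```

Here A40-60 contributes 21 fixed residues, the designed region contributes 70 to 80, and B1-20 contributes 20, giving a total between 111 and 121 residues across two chains.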

Toolkit Notes

These apply to every RFdiffusion3 tool in this toolkit (rfdiffusion3-design).
  • A GPU is required for any practical use. CPU execution is supported via device="cpu" but is prohibitively slow for typical workloads. The default execution environment uses CUDA.
  • GPU memory consumption scales with diffusion_batch_size and the length of the designs. When GPU memory is exhausted, first reduce diffusion_batch_size. If memory exhaustion persists, set low_memory_mode=True to enable memory-efficient tokenization at the cost of throughput.
  • Designs are returned as a structure together with a sequence and carry no built-in confidence score. A standard validation procedure is to score the designed sequence against the designed structure with ProteinMPNN, then predict the structure of the designed sequence with ESMFold or Boltz2 and compare the predicted backbone to the designed backbone.
  • Each design exposes structure (coordinates) and a predictor-ready complex (Complex). complex carries the per-chain sequences and entity types; feed design.complex straight to a structure predictor. A chain’s sequence is design.complex.chains[i].sequence.
  • Output chains are returned with positional IDs (A, B, …) in emission order. Symmetric designs are initially emitted with transformation-suffixed chain IDs (e.g. A1, A2, A3 for a C3 trimer); these are normalized to A, B, C before being returned, so a chain’s ID is its position. Identify a chain by position or sequence rather than assuming a fixed semantic role.
  • seed does not guarantee bit-exact reproducibility. The diffusion sampler relies on non-deterministic CUDA operations, so repeated runs with the same seed can differ by small GPU floating-point noise. Generate and rank a batch of designs rather than relying on a single seeded sample.
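
The positional chain-ID normalization described in the notes above can be pictured as a simple remapping by emission order. The sketch below is an assumption about the behavior rather than the toolkit's code; positional_chain_ids is a hypothetical helper name, and the 26-chain limit is a simplification of this example only.

```python
from string import ascii_uppercase
from typing import Dict, List


def positional_chain_ids(emitted_ids: List[str]) -> Dict[str, str]:
    """Map chain IDs, in emission order, to positional letters A, B, C, ..."""
    if len(set(emitted_ids)) != len(emitted_ids):
        raise ValueError("emitted chain IDs must be unique")
    if len(emitted_ids) > len(ascii_uppercase):
        raise ValueError("this sketch only handles up to 26 chains")
    return {old: ascii_uppercase[i] for i, old in enumerate(emitted_ids)}


# A C3 trimer emitted with transformation-suffixed IDs.
print(positional_chain_ids(["A1", "A2", "A3"]))  # {'A1': 'A', 'A2': 'B', 'A3': 'C'}
```

Because the mapping depends only on position, downstream code should look chains up by index or sequence, as the note above recommends.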
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it on every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.