Proto is not affiliated with Institute for Protein Design. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
RFdiffusion3 (Butcher et al., 2025) is a denoising-diffusion generative model trained to design protein structures at all-atom resolution under arbitrary spatial constraints. Starting from random noise, it iteratively denoises atomic coordinates toward a plausible protein while jointly refining the underlying amino-acid sequence. Training combines structures from the Protein Data Bank with multi-task conditioning, in which each training example is presented with a randomly generated design problem that constrains a sampled combination of motif tokens, atom subsets, residue identities, or sequence-index labels. The model is therefore trained jointly on binder design, motif scaffolding, inverse folding, sidechain placement, and prediction-style tasks under a single objective, and a single trained checkpoint supports every conditioning style at inference. RFdiffusion3 is the successor to RFdiffusion (Watson et al., 2023), which diffused only over the backbone N, Cα, C, and O atoms and required ProteinMPNN as a separate sequence-design step. By denoising every atom and co-designing the sequence, RFdiffusion3 incorporates small-molecule pockets, hydrogen-bond donor and acceptor patterning, and explicit nucleotide and ligand context directly into the generative process. It is the structure-design model within the Foundry framework, which distributes it alongside RoseTTAFold3 for structure prediction and ProteinMPNN for inverse folding.
Tools
RFdiffusion3 Structure Design (rfdiffusion3-design)
Generates new protein structures and sequences subject to specified constraints. Each design task is described by an RFdiffusion3DesignSpec containing an optional input structure, a contig string, and per-residue selectors that fix atomic coordinates, constrain sequence positions, or designate hotspot residues. The diffusion sampler returns N candidate structures per specification, each accompanied by its designed amino-acid sequence and the sampled contig.
API Reference
Input: RFdiffusion3Input
design_specs will be ignored and this JSON will be passed directly to RFdiffusion3.
Config: RFdiffusion3Config
- n_batches * diffusion_batch_size * num_specs).
- 'symmetry' for homo-oligomer design (paired with DesignSpec.symmetry). Available options: default, symmetry
- all (whole structure), motif (input motif), diffuse (diffused region only). Available options: all, motif, diffuse
- 0.0 = deterministic ODE. Must be > 0.5 when sampler_kind="symmetry".
- inference_sampler settings (noise schedule, motif noise); see that class for the fields.
- dump_trajectories=True).
- "rfd3" = production preset).
- None uses a tempdir.
- None uses a tempdir.
- True is coerced to 1 and False to 0.
- "cuda" or "cpu".
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: RFdiffusion3Output
len(design_specs) * n_batches * diffusion_batch_size.
Applications
This tool is appropriate for any task in which the output is a novel protein backbone together with a designed amino-acid sequence subject to spatial constraints. Representative applications include protein-binder design against a target hotspot, motif scaffolding around catalytic or epitope residues, enzyme design around a small-molecule active site, symmetric homo-oligomer assembly, and partial-diffusion refinement of an existing backbone. RFdiffusion3 additionally supports nucleic-acid and small-molecule binder design within the same model and the same inference run, capabilities that the original RFdiffusion addressed only through separate model variants or task-specific extensions.
Usage Tips
- One RFdiffusion3DesignSpec describes a single design task. Multiple specifications passed in one call are processed independently. Each specification produces n_batches * diffusion_batch_size designs, and the resulting design sets are returned in input order.
- length is the only field that may be specified without an input_structure. The contig, unindex, and select_* fields are resolved against the input atomic coordinates and are rejected by validation when no input structure is provided.
- The contig string specifies the design topology. Comma-separated segments select chain-indexed residues from the input (A40-60), insert designed regions of variable length (70-80), or introduce a chain break (/0). The full grammar is documented in the RFdiffusion3 input specification.
- num_timesteps and step_scale are the principal parameters controlling design quality and diversity. Increasing num_timesteps (default 200) increases the number of denoising steps and generally improves designability at a linear cost in runtime. step_scale (default 1.5) scales the per-step denoising magnitude. Lower values yield more diverse but less designable outputs. Higher values produce structures closer to the training distribution.
- sampler_kind="symmetry" is required for homo-oligomer design and must be paired with a symmetry group on the specification. A value such as symmetry="C3" on the specification is converted internally to the {"id": "C3"} SymmetryConfig required by the RFdiffusion3 sampler. The symmetric sampler also imposes a gamma_0 > 0.5 constraint, which is enforced during configuration validation. Under symmetry, length is per-protomer (length="100" with symmetry="C3" yields a 300-residue trimer).
- Classifier-free guidance is disabled by default and enabled by setting use_classifier_free_guidance=True. When disabled, cfg_scale, cfg_features, and cfg_t_max have no effect. When enabled, increasing cfg_scale (typical range 1.0 to 3.0) more strongly weights the conditioning features (active-site hydrogen-bond donors and acceptors, per-atom relative accessible surface area) during sampling.
- partial_t enables partial diffusion, in which the sampler refines an input structure rather than starting from random noise. The input is perturbed with Gaussian noise at the specified amplitude in angstroms and then denoised back, supporting local redesign and topology-preserving diversification of an existing backbone. The RFdiffusion3 InputSpecification reference lists 5.0 to 15.0 angstroms as a recommended range, but the partial-diffusion documentation notes that partial_t is nonlinear and recommends beginning near 2 angstroms and increasing gradually.
- is_non_loopy is the only secondary-structure conditioning parameter available. Setting it on RFdiffusion3DesignSpec to True biases the sampler toward fewer loops and substantially more helical content, with reduced sheet content. Setting it to False biases toward more loops and correspondingly fewer helices. Leaving it None (the default) applies no topology preference. The parameter is a single boolean flag and offers no per-residue or fractional control.
- Finer sampler tuning is grouped under the sampler_tuning field, a typed RFdiffusion3SamplerTuning whose settings map to the upstream inference_sampler block: the noise schedule (noise_scale, p, s_trans), the stochasticity threshold gamma_min (paired with gamma_0), and the motif-noise settings allow_realignment / s_jitter_origin (effective only for motif-conditioned designs). Each defaults to None and is forwarded to the sampler only when set, otherwise inheriting the checkpoint’s upstream default. Pass them as a dict, e.g. RFdiffusion3Config(sampler_tuning={"noise_scale": 1.003}); any other valid inference_sampler setting (e.g. cfg_t_max) may be included and is forwarded verbatim.
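To make the contig tip concrete, here is a minimal Python sketch that splits a contig string into the three segment kinds named above and reports the range of total lengths it allows. It is an illustration only, not the toolkit's or RFdiffusion3's parser: the Segment record, parse_contig, and length_range are hypothetical names, single-letter chain IDs and single-value segments such as A40 or 75 are assumptions, and the real grammar in the RFdiffusion3 input specification may accept more forms.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record for one contig segment; not a type from RFdiffusion3.
@dataclass(frozen=True)
class Segment:
    kind: str             # "fixed", "designed", or "break"
    chain: Optional[str]  # chain ID for fixed segments, otherwise None
    start: int            # first residue index, or minimum designed length
    end: int              # last residue index, or maximum designed length


# Assumption: chain IDs are a single letter and ranges are inclusive.
_FIXED = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")  # e.g. A40-60
_DESIGNED = re.compile(r"^(\d+)(?:-(\d+))?$")         # e.g. 70-80


def parse_contig(contig: str) -> List[Segment]:
    """Split a comma-separated contig string into typed segments."""
    segments = []
    for raw in contig.split(","):
        token = raw.strip()
        if token == "/0":  # chain break
            segments.append(Segment("break", None, 0, 0))
            continue
        match = _FIXED.match(token)
        if match:
            kind, chain = "fixed", match.group(1)
            start = int(match.group(2))
            end = int(match.group(3) or start)
        else:
            match = _DESIGNED.match(token)
            if not match:
                raise ValueError(f"unrecognized contig segment: {token!r}")
            kind, chain = "designed", None
            start = int(match.group(1))
            end = int(match.group(2) or start)
        if end < start:
            raise ValueError(f"descending range in segment: {token!r}")
        segments.append(Segment(kind, chain, start, end))
    return segments


def length_range(segments: List[Segment]) -> Tuple[int, int]:
    """Return the (minimum, maximum) total residue count across all segments."""
    lo = hi = 0
    for seg in segments:
        if seg.kind == "fixed":
            count = seg.end - seg.start + 1  # inclusive residue range
            lo += count
            hi += count
        elif seg.kind == "designed":
            lo += seg.start
            hi += seg.end
    return lo, hi


segments = parse_contig("A40-60,70-80,/0,B1-10")
print(length_range(segments))  # (101, 111)
```

A check like this can catch a malformed contig before a GPU job is queued, but it does not replace the validation performed by the toolkit itself.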
Toolkit Notes
These apply to every RFdiffusion3 tool in this toolkit (rfdiffusion3-design).
- A GPU is required for any practical use. CPU execution is supported via device="cpu" but is prohibitively slow for typical workloads. The default execution environment uses CUDA.
- GPU memory consumption scales with diffusion_batch_size and the length of the designs. When GPU memory is exhausted, first reduce diffusion_batch_size. If memory exhaustion persists, set low_memory_mode=True to enable memory-efficient tokenization at the cost of throughput.
- Designs are returned as a structure together with a sequence and carry no built-in confidence score. A standard validation procedure is to score the designed sequence against the designed structure with ProteinMPNN, then predict the structure of the designed sequence with ESMFold or Boltz2 and compare the predicted backbone to the designed backbone.
- Each design exposes structure (coordinates) and a predictor-ready complex (Complex). complex carries the per-chain sequences and entity types; feed design.complex straight to a structure predictor. A chain’s sequence is design.complex.chains[i].sequence.
- Output chains are returned with positional IDs (A, B, …) in emission order. Symmetric designs are emitted with transformation-suffixed chain IDs (e.g. A1, A2, A3 for a C3 trimer); these are normalized to A, B, C so a chain’s ID is its position. Identify a chain by position or sequence rather than assuming a fixed semantic role.
- seed does not guarantee bit-exact reproducibility. The diffusion sampler relies on non-deterministic CUDA operations, so repeated runs with the same seed may produce different designs. Generate and rank a batch of designs rather than relying on a single seeded sample.
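The memory guidance above (reduce diffusion_batch_size first, then enable low_memory_mode) can be sketched as a small retry loop. This is a hedged illustration, not part of the toolkit: run_with_memory_fallback and the run callable are hypothetical, and MemoryError stands in for whatever exception the execution environment actually raises on GPU memory exhaustion (with PyTorch this would presumably be torch.cuda.OutOfMemoryError).

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_memory_fallback(
    run: Callable[[int, bool], T],
    diffusion_batch_size: int,
    oom_error: Type[BaseException] = MemoryError,  # assumption: stand-in for a CUDA OOM error
) -> Tuple[T, int, bool]:
    """Call run(batch_size, low_memory_mode), backing off on out-of-memory errors.

    Returns the result together with the batch size and low_memory_mode
    setting that finally succeeded.
    """
    batch = diffusion_batch_size
    # Step 1: shrink diffusion_batch_size, halving down to one design per batch.
    while True:
        try:
            return run(batch, False), batch, False
        except oom_error:
            if batch == 1:
                break
            batch = max(1, batch // 2)
    # Step 2: keep the smallest batch and enable low_memory_mode.
    # A further out-of-memory error here propagates to the caller.
    return run(batch, True), batch, True
```

Here run would wrap the actual tool call, taking the batch size and the low_memory_mode flag as arguments. Because each specification yields n_batches * diffusion_batch_size designs, a caller who wants the same number of designs after a fallback would need to raise n_batches to compensate.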

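The positional chain-ID convention described in the notes above can be illustrated with a short sketch. It only mirrors the stated behaviour (IDs assigned by emission order) and is not the toolkit's implementation: positional_chain_ids is a hypothetical helper, and the 26-chain limit is an assumption of this sketch rather than a documented constraint.

```python
import string
from typing import Dict, Sequence


def positional_chain_ids(emitted_ids: Sequence[str]) -> Dict[str, str]:
    """Map emitted chain IDs to positional letters (A, B, C, ...) in emission order."""
    if len(set(emitted_ids)) != len(emitted_ids):
        raise ValueError("emitted chain IDs must be unique")
    if len(emitted_ids) > len(string.ascii_uppercase):
        # Assumption: this sketch does not define IDs beyond 26 chains.
        raise ValueError("this sketch only handles up to 26 chains")
    return {old: string.ascii_uppercase[i] for i, old in enumerate(emitted_ids)}


# A C3 trimer emitted with transformation-suffixed IDs.
mapping = positional_chain_ids(["A1", "A2", "A3"])
print(mapping)  # {'A1': 'A', 'A2': 'B', 'A3': 'C'}
```

Since the ID encodes only position, downstream code that needs a specific partner (for example the target in a binder design) is safer matching on sequence than on the letter.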
Institute for Protein Design