Proto is not affiliated with ByteDance. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Protenix (ByteDance Research, 2025) predicts the joint 3D structure of a biomolecular assembly from the sequences and chemical components it contains. It is a trainable, openly licensed reproduction of the AlphaFold3 architecture: like AlphaFold3, one model folds complexes that mix proteins, DNA, RNA, and small-molecule ligands, and predicts how those components are arranged relative to one another. Each protein chain can be paired with a multiple-sequence alignment (MSA) of evolutionarily related sequences, whose covariation patterns supply the evolutionary signal the model uses to place residues. Architecturally, Protenix follows AlphaFold3 rather than AlphaFold2: it carries a single representation of the input tokens and a pairwise representation over token pairs, refines them through a Pairformer trunk, and generates all-atom coordinates with a diffusion module that starts from noise and iteratively denoises into a structure, in place of AlphaFold2’s structure module. Several structures are sampled per random seed and ranked by a confidence score. Protenix is distributed in several sizes: full-parameter base models for highest accuracy, and lighter mini and tiny variants for faster, lower-memory prediction; themini_esm and mini_ism variants replace the MSA with learned embeddings — from the ESM-2 protein language model or the ISM inverse-structure model, respectively — so they can fold without an alignment. Predicted confidence includes a per-residue predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the relative placement of any two tokens, a global predicted distance error (gPDE), and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores that summarize overall and interface accuracy.
The reference implementation is open-sourced at bytedance/Protenix, with both the code and the model parameters released under the Apache-2.0 license for academic and commercial use. It was developed by ByteDance’s AI4Science team as a comprehensive reproduction of AlphaFold3, trained on comparable data to reach competitive accuracy across protein, nucleic-acid, and protein-ligand benchmarks.
Learning Resources
- bytedance/Protenix (ByteDance) - the official repository, with a model card for each variant, benchmark results across protein, nucleic-acid, and ligand tasks, and a link to the hosted Protenix web server.
Tools
Protenix Structure Prediction (protenix-prediction)
Predicts the 3D structure of a biomolecular complex. Each input complex can combine protein, DNA, RNA, and ligand chains, with optional post-translational and nucleotide modifications; the assembly is folded by Protenix and returned as a predicted Structure per complex with confidence metrics: average pLDDT, pTM, interface pTM, per-chain and pairwise-chain scores, a global predicted distance error, and predicted aligned error.API Reference
Input: ProtenixInput
Input: ProtenixInput
StructurePredictionInput. Each complex can contain multiple chains of proteins, DNA, RNA, and/or ligands.ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.Config: ProtenixConfig
Config: ProtenixConfig
protenix_base_default_v1.0.0, protenix_base_20250630_v1.0.0, protenix_base_default_v0.5.0, protenix_base_constraint_v0.5.0, protenix_mini_esm_v0.5.0, protenix_mini_ism_v0.5.0, protenix_mini_default_v0.5.0, protenix_tiny_default_v0.5.0, protenix-v2num_diffusion_samples independent structure samples. Multiple seeds increase diversity of the sampled conformations. A single seed is sufficient for most use cases; more seeds may help for challenging docking tasks such as antibody-antigen complexes.None uses the upstream schedule: 200 for base/constraint, 5 for mini/tiny. Default None.None uses the upstream schedule: 10 for base/constraint, 4 for mini/tiny. Default None.StructurePredictionConfig. Default: False."cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".None waits indefinitely. Default: 1200.BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.pae (avg_pae always emitted). Default: False.MSAStructurePredictionConfig. Default: True.use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.MSAStructurePredictionConfig. Default: True.Output: ProtenixOutput
Output: ProtenixOutput
ProtenixMetrics instance on .metrics.structures item)| Metric | Type | Range | Availability |
|---|---|---|---|
confidence_score | float | unbounded | always |
ptm | float | 0.0 to 1.0 | always |
iptm | float | 0.0 to 1.0 | always |
avg_plddt | float | 0.0 to 1.0 | always |
gpde | float | ≥ 0.0 | always |
avg_pae | float | ≥ 0.0 | always |
pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |
chain_ptm | list[float] | 0.0 to 1.0 | depends on model output |
chain_plddt | list[float] | 0.0 to 1.0 | depends on model output |
chain_pair_iptm | list[list[float]] | 0.0 to 1.0 | depends on model output |
has_clash | bool | unbounded | depends on model output |
Applications
This tool predicts the structure of multi-component assemblies such as protein-DNA and protein-RNA complexes, protein-ligand binding poses, and chains carrying modified residues. For a multi-chain complex it also reports how confidently the chains are placed relative to one another: interface pTM (ipTM) gives a single 0-to-1 score for the overall inter-chain arrangement, per-chain-pair ipTM scores each individual interface, and the cross-chain blocks of the PAE matrix show which specific inter-chain regions are positioned confidently versus uncertainly. These let you rank or filter predicted complexes and judge whether a docking pose or binding interface is reliable before trusting it downstream.Usage Tips
model_nameselects the accuracy/speed trade-off. The defaultprotenix_base_default_v1.0.0is the most accurate (10 Pairformer cycles, 200 diffusion steps); theminiandtinyvariants are far faster with fewer cycles and steps, andprotenix_mini_esm_v0.5.0/protenix_mini_ism_v0.5.0use protein language-model embeddings for MSA-free prediction.protenix-v2weights are gated by ByteDance. They are not currently distributed publicly; if you have a copy, drop it into the resolved weights directory before selectingmodel_name="protenix-v2". Seenotes/storage.mdfor path resolution.use_msadefaults toTrue. A ColabFold search generates an MSA for each protein chain; set itFalse, attach precomputed MSAs, or use an ESM/ISM mini variant to skip alignments entirely.- Diffusion sampling is controlled by
seedsandnum_diffusion_samples. Protenix drawsnum_diffusion_samples(default5) structures per seed and keeps the best by ranking score; the total number of candidates islen(seeds)timesnum_diffusion_samples. Settingseedoverridesseedswith a single value for reproducibility. num_pairformer_cyclesandnum_diffusion_stepstrade accuracy for time. Defaults are checkpoint-aware: base and constraint variants use10cycles and200steps, while mini and tiny variants use4cycles and5steps, matching each checkpoint’s native upstream schedule. Override either field to apply a custom schedule regardless ofmodel_name.- Confidence is reported as pLDDT, pTM, ipTM, gPDE, and PAE.
confidence_score, the ranking score and primary metric, selects the best sample;avg_plddtis on a 0 to 1 scale and PAE and gPDE are in angstroms.has_clashflags steric clashes. Setinclude_pae_matrixto attach the full per-token PAE matrix. - Modified residues are supported. Protein PTMs and DNA/RNA modifications are passed through as CCD codes, as in AlphaFold3.
Toolkit Notes
These apply to every Protenix tool in this toolkit (protenix-prediction).
- Requires a GPU. Protenix runs through a PyTorch backend and needs an NVIDIA GPU; base models are memory-intensive and slower, while mini and tiny variants run on more modest hardware. CPU execution is not practical.
- Open AlphaFold3 reproduction. Unlike AlphaFold3, whose weights are gated and non-commercial, Protenix releases both code and weights under Apache-2.0 for academic and commercial use. Like Boltz-2 it follows the AlphaFold3 diffusion architecture, and additionally accepts modified residues.
- Predictions are stochastic. Structures come from a diffusion process, so repeated runs vary unless sampling is seeded.

ByteDance