Proto is not affiliated with Chai Discovery. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Chai-1 (Chai Discovery, 2024) predicts the joint 3D structure of a biomolecular assembly from the sequences and chemical components it contains. It is a multi-modal foundation model that folds proteins together with small-molecule ligands, nucleic acids, glycans, and covalent modifications in a single model. Each protein chain can be conditioned on evolutionary signal, either through a multiple-sequence alignment (MSA) of related sequences or through embeddings from an ESM protein language model.
Internally, Chai-1 follows the all-atom co-folding approach popularized by AlphaFold3. It tokenizes the assembly the same way, with one token per amino-acid residue or nucleotide and one token per atom for ligands and modified residues. A trunk network then builds and refines token and pairwise representations, optionally conditioned on the MSA and ESM embeddings, and a diffusion module generates the all-atom coordinates by starting from noise and iteratively denoising into a structure. Several structures are sampled per input and ranked by an aggregate confidence score. Predicted confidence includes a per-atom predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the relative placement of any two tokens, and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores that summarize overall and interface accuracy.
The reference implementation is open-sourced by Chai Discovery at chaidiscovery/chai-lab, with both the code and the model weights released under the Apache-2.0 license for academic and commercial use, including drug discovery. Chai Discovery also runs the model as a hosted web platform at lab.chaidiscovery.com.
Learning Resources
- chaidiscovery/chai-lab (Chai Discovery) - the official repository and inference code, linking the technical report and the hosted Chai Discovery web platform for running predictions in the browser.
Tools
Chai-1 Structure Prediction (chai1-prediction)
Predicts the 3D structure of a biomolecular complex. Each input complex can combine protein, ligand, and glycan chains; the assembly is folded by Chai-1 and returned as a predicted Structure per complex with confidence metrics: average pLDDT, pTM, interface pTM, predicted aligned error, and an overall confidence score.
API Reference
Input: Chai1Input
StructurePredictionInput. Each complex can contain multiple chains of proteins, ligands, and/or glycans. Total token count per complex must not exceed 2,048 (see Note below).
ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
Config: Chai1Config
use_msa; both can be enabled together and Chai-1 conditions on the ESM embeddings and the MSA simultaneously. Default: True.
StructurePredictionConfig. Default: False.
"cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
None waits indefinitely. Default: 1200.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
False.
MSAStructurePredictionConfig. Default: True.
use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.
MSAStructurePredictionConfig. Default: True.
Output: Chai1Output
Chai1Metrics instance on .metrics.structures item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | always |
| iptm | float | 0.0 to 1.0 | always |
| avg_pae | float | ≥ 0.0 | always |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |
| confidence_score | float | 0.0 to 1.0 | always |
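When the pae matrix is attached, its off-diagonal (cross-chain) blocks can be reduced to one number per chain pair. The sketch below is a minimal plain-Python illustration, assuming the matrix is square and ordered by token and that the token count of each chain is known; mean_interchain_pae and chain_lengths are hypothetical names for illustration and are not part of this toolkit's API.

```python
from itertools import accumulate


def mean_interchain_pae(pae, chain_lengths):
    """Return {(i, j): mean PAE} for every ordered pair of distinct chains.

    pae           -- square token-by-token matrix (list of lists of floats)
    chain_lengths -- positive token count per chain, in token order (assumed known)
    """
    n_tokens = sum(chain_lengths)
    if len(pae) != n_tokens or any(len(row) != n_tokens for row in pae):
        raise ValueError("PAE matrix shape does not match chain_lengths")

    # Token index range per chain, e.g. lengths [2, 1] -> [(0, 2), (2, 3)]
    ends = list(accumulate(chain_lengths))
    spans = [(end - length, end) for end, length in zip(ends, chain_lengths)]

    result = {}
    for i, (r0, r1) in enumerate(spans):
        for j, (c0, c1) in enumerate(spans):
            if i == j:
                continue  # skip intra-chain (diagonal) blocks
            block = [pae[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            result[(i, j)] = sum(block) / len(block)
    return result


# Toy 3-token matrix: chain 0 has two tokens, chain 1 has one.
toy_pae = [
    [0.5, 1.0, 8.0],
    [1.0, 0.5, 6.0],
    [9.0, 7.0, 0.5],
]
blocks = mean_interchain_pae(toy_pae, [2, 1])
print(blocks)  # {(0, 1): 7.0, (1, 0): 8.0}
```

Both orderings of a chain pair are kept because PAE is not symmetric in general: the error of token j when aligned on token i can differ from the reverse.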
Applications
This tool predicts the structure of multi-component assemblies such as protein-ligand binding poses and glycosylated proteins, which makes it well suited to drug-discovery screening and modeling carbohydrate-decorated targets. For a multi-chain complex it also reports how confidently the chains are placed relative to one another: interface pTM (ipTM) gives a single 0-to-1 score for the overall inter-chain arrangement, and the cross-chain blocks of the PAE matrix show which inter-chain regions are positioned confidently versus uncertainly, so you can rank or filter predicted complexes before trusting a pose downstream.
Usage Tips
- Total length is capped at 2,048 tokens per complex (1 per amino-acid residue, 1 per heavy atom for ligands and glycans); longer inputs are rejected.
- use_esm_embeddings defaults to True. Chai-1 conditions on embeddings from an ESM protein language model; they are used with or without an MSA.
- use_msa defaults to True. A ColabFold search generates an MSA for each protein chain; set it to False for single-sequence prediction, or attach precomputed MSAs to the input.
- Sampling and refinement are configurable. num_diffn_samples (default 5) independent samples are drawn per complex and the best is kept by confidence_score; num_diffn_timesteps (default 200) sets the denoising steps and num_trunk_recycles (default 3) trades accuracy for runtime.
- Confidence is reported as pLDDT, pTM, ipTM, PAE, and a confidence score. avg_plddt, the primary metric, is on a 0 to 1 scale; ipTM is meaningful only for multi-chain complexes. Set include_pae_matrix to attach the full per-token PAE matrix.
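The token cap in the first tip can be checked before submitting a complex. The following is a rough sketch that applies only the counting rule stated above (one token per residue, one per ligand or glycan heavy atom); estimate_tokens and fits_token_budget are hypothetical helpers, the heavy-atom counts are assumed to be supplied by the caller, and the toolkit's own tokenizer may count some inputs differently.

```python
MAX_TOKENS = 2048  # per-complex cap stated in the tips above


def estimate_tokens(protein_sequences, heavy_atom_counts):
    """Estimate tokens: 1 per amino-acid residue, 1 per ligand/glycan heavy atom."""
    residue_tokens = sum(len(seq) for seq in protein_sequences)
    atom_tokens = sum(heavy_atom_counts)
    return residue_tokens + atom_tokens


def fits_token_budget(protein_sequences, heavy_atom_counts, limit=MAX_TOKENS):
    """True when the estimated token count is within the cap."""
    return estimate_tokens(protein_sequences, heavy_atom_counts) <= limit


# Two protein chains (200 and 150 residues) plus one ligand with 21 heavy atoms.
proteins = ["MKTAYIAKQR" * 20, "GSHMLEDPVA" * 15]
ligand_atoms = [21]
total = estimate_tokens(proteins, ligand_atoms)
print(total, fits_token_budget(proteins, ligand_atoms))  # 371 True
```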
Toolkit Notes
These apply to every Chai-1 tool in this toolkit (chai1-prediction).
- Requires a GPU. Chai-1 runs through a PyTorch backend and needs an NVIDIA GPU; CPU execution is not practical. low_memory (default True) streams features per sample to reduce peak GPU memory at some cost in speed.
- Protein, ligand, and glycan only. The Chai-1 model additionally supports DNA, RNA, and covalent modifications; this toolkit currently wraps protein, ligand, and glycan prediction. Use AlphaFold3, Boltz-2, or Protenix for nucleic-acid complexes.
- Predictions are stochastic. Structures come from a diffusion process; set seed for reproducible sampling. recycle_msa_subsample and unseeded runs are intentionally non-deterministic.
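The effect of seeding on a sample-then-select workflow can be shown with a toy model. The sketch below does not call Chai-1 or this toolkit: sample_and_pick is a hypothetical stand-in that draws random scores in place of real confidence scores and keeps the best one, only to illustrate why a fixed seed makes the selected sample repeatable while an unseeded run does not.

```python
import random


def sample_and_pick(seed, num_samples=5):
    """Toy stand-in for sampling: draw scores, keep the best-scoring sample.

    The random scores only mimic a stochastic sampler; they are not model output.
    """
    rng = random.Random(seed)  # private generator, global random state untouched
    scores = [rng.random() for _ in range(num_samples)]
    best_index = max(range(num_samples), key=scores.__getitem__)
    return best_index, scores[best_index]


run_a = sample_and_pick(seed=42)
run_b = sample_and_pick(seed=42)
run_c = sample_and_pick(seed=None)  # unseeded: drawn from OS entropy, varies per run

print(run_a == run_b)  # True: same seed, same draws, same selected sample
```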
