FAMPNN
License: FAMPNN is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


richardshuai/fampnn
Sidechain conditioning and modeling for full-atom protein sequence design
@inproceedings{widatalla2025fampnn,
  title={Sidechain conditioning and modeling for full-atom protein sequence design with {FAMPNN}},
  author={Widatalla, Talal and Shuai, Richard W. and Hie, Brian L. and Huang, Po-Ssu},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  year={2025},
  series={PMLR},
  volume={267},
  address={Vancouver, Canada}
}
proto-bio/proto-tools/proto_tools/tools/inverse_folding/fampnn
Open notebook
Coming soon!
Run this tool directly in Proto with no setup required.
Function | Description
run_fampnn_pack() | Pack protein sidechains using FAMPNN with per-atom confidence (pSCE) (GPU)
run_fampnn_sample() | Design protein sequences with full-atom sidechain co-generation using FAMPNN (GPU)
run_fampnn_score() | Score protein mutations with full-atom context using FAMPNN (GPU)
run_fampnn_score_all_mutations() | Score every possible single mutation at every position using FAMPNN (GPU)

Background

FAMPNN (Widatalla et al., 2025) addresses the full-atom inverse-folding problem: given a fixed protein backbone, it predicts both an amino-acid sequence expected to fold into that backbone and the sidechain conformation of every residue. Most fixed-backbone design methods reason about sidechain interactions only implicitly, through backbone geometry and sequence labels, even though the three-dimensional arrangement of sidechain atoms drives protein conformation, stability, and function.

Internally, FAMPNN learns a per-residue joint distribution over the discrete amino-acid identity and the continuous sidechain conformation, trained with a combined categorical cross-entropy and diffusion objective. Sequences are generated by iterative unmasking: decoding starts from a fully masked state and reveals residues over several steps, while sidechain atoms are produced by a per-token Euclidean diffusion process in each residue’s local backbone frame. A confidence module predicts the per-atom sidechain error (pSCE) in angstroms, which correlates with true packing error both at the atom level and, when averaged over a residue, at the residue level. Learning sequence and sidechains jointly is synergistic: it improves both native-sequence recovery and sidechain packing relative to modeling the sequence alone.

The reference implementation is maintained at richardshuai/fampnn and was developed at Stanford University.
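The iterative unmasking loop described above can be illustrated with a toy sketch. This is not FAMPNN's code: toy_logits is a hypothetical stand-in for the model's forward pass, the random decoding order and the even split of positions across steps are assumptions made for illustration, and the sidechain diffusion step is left out entirely.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "X"  # placeholder for a position that has not been decoded yet


def toy_logits(position, context):
    """Hypothetical stand-in for a model forward pass (not FAMPNN)."""
    rng = random.Random(position * 7919 + sum(map(ord, context)))
    return [rng.uniform(-2.0, 2.0) for _ in AMINO_ACIDS]


def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def iterative_unmask(length, num_steps, temperature=0.1, seed=0):
    """Start fully masked and reveal residues over num_steps steps."""
    rng = random.Random(seed)
    sequence = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # random decoding order: an assumption of this sketch
    for step in range(num_steps):
        # Reveal a roughly equal share of positions at each step.
        start = step * length // num_steps
        stop = (step + 1) * length // num_steps
        for pos in order[start:stop]:
            # Already-revealed residues act as context for later ones.
            logits = toy_logits(pos, sequence)
            choice = sample_with_temperature(logits, temperature, rng)
            sequence[pos] = AMINO_ACIDS[choice]
    return "".join(sequence)


designed = iterative_unmask(length=12, num_steps=5, seed=1)
```

A low temperature such as 0.1 concentrates sampling on the highest-scoring residue at each position, while more steps mean fewer residues are committed at once, so each one is decoded with more context.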

Tools

FAMPNN Sampling (fampnn-sample)

Designs a sequence for a backbone and co-generates its full-atom sidechains. Each input structure is returned as a designed structure with packed sidechain coordinates and a per-residue pSCE.

API Reference

Input
  • inputs (List[FAMPNNStructureInput], required): Per-structure inputs, each containing a structure and optional chains_to_redesign, fixed_positions, and fixed_sidechain_positions selections.

Configuration
  • model_variant (string, default "0.3"): FAMPNN checkpoint variant. ‘0.3’ for sequence design (PDB-trained, 0.3 Å noise), ‘0.0’ for sidechain packing (PDB-trained, 0.0 Å noise), ‘0.3_cath’ for mutation scoring (CATH-trained).
  • num_steps (integer, default 100): Number of iterative unmasking steps for sequence design. More steps yield higher quality but slower inference. 10 steps is sufficient for high self-consistency; 100 for best quality.
  • seq_only (boolean, default False): If True, skip sidechain generation during sampling.
  • repack_last (boolean, default True): If True, repack sidechains after the final sequence is determined.
  • psce_threshold (number, default 0.3): Only condition on sidechains with predicted sidechain error below this threshold during iterative sampling.
  • scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
  • scn_step_scale (number, default 1.5): Step scale for sidechain diffusion (eta parameter).
  • verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run the model on. Options include ‘cuda’ (NVIDIA GPU), ‘cpu’ (CPU execution), or specific GPU devices like ‘cuda:0’.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed to use for sampling.
  • num_sequences_per_structure (integer, default 1): Total number of sequences to generate per input structure.
  • batch_size (integer): Number of sequences to process simultaneously on GPU. Defaults to num_sequences_per_structure.
  • temperature (number, default 0.1): Controls randomness in sampling from logits.

Output
  • design_sets (List[FAMPNNDesignSet], required): One FAMPNNDesignSet per input structure, in input order.

Applications

Use this to redesign or stabilize a protein when sidechain packing matters, for example interface or active-site redesign, since the model commits to a concrete sidechain arrangement rather than leaving it implicit. The pSCE flags residues whose packing the model is unsure about.

Usage Tips

  • psce_threshold (default 0.3) controls which generated sidechains the model conditions on. FAMPNN designs by iterative unmasking, and at each step only sidechains whose predicted error (pSCE, in angstroms) falls below this threshold are kept as context for decoding the remaining residues. Sidechains the model is less certain about (pSCE at or above the threshold) are not used as context, so low-confidence placements do not bias later predictions. Lower it to condition only on the most confident sidechains. Raise it to let more, lower-confidence sidechains inform the design.
  • fixed_positions and fixed_sidechain_positions are different knobs. fixed_positions holds a residue’s amino-acid identity, while fixed_sidechain_positions instead conditions on its existing sidechain coordinates. Both are per chain and indexed from 1 to match biological residue selection conventions.
  • seq_only (default False) controls sidechain co-generation. Setting it to True skips sidechain generation, which is faster but gives up the full-atom modeling that distinguishes FAMPNN, so leave it at False unless you only need a sequence.
  • Output is structured per design. output.design_sets[i].complexes[j] is a FAMPNNDesign with .chains, the full-atom .structure, and .metrics["avg_psce"]. FAMPNNDesign is a Complex subclass, so designs can be passed directly to structure predictors.
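The psce_threshold rule from the first tip can be sketched as a simple filter. This is an illustration only, not the toolkit's implementation: confident_sidechains and the example pSCE values are hypothetical, and the strict "below the threshold" comparison follows the wording of the tip above.

```python
def confident_sidechains(psce_by_residue, psce_threshold=0.3):
    """Return residue positions whose pSCE is strictly below the threshold.

    psce_by_residue maps a 1-indexed residue position to its predicted
    sidechain error in angstroms (hypothetical example data).
    """
    return sorted(
        position
        for position, error in psce_by_residue.items()
        if error < psce_threshold
    )


# Hypothetical per-residue pSCE values from one unmasking step.
example_psce = {1: 0.12, 2: 0.30, 3: 0.85, 4: 0.05}

default_context = confident_sidechains(example_psce)       # threshold 0.3
strict_context = confident_sidechains(example_psce, 0.1)   # fewer sidechains
loose_context = confident_sidechains(example_psce, 0.9)    # more sidechains
```

Residue 2 sits exactly at the default threshold and is excluded, which matches the "at or above" wording in the tip.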

FAMPNN Sidechain Packing (fampnn-pack)

Places sidechain atoms onto a structure whose sequence is fixed, returning the packed structure with per-atom pSCE confidence.

API Reference

Input
  • inputs (List[FAMPNNStructureInput], required): List of structure inputs for sidechain packing.

Configuration
  • model_variant (string, default "0.0"): Checkpoint variant. ‘0.0’ recommended for best packing accuracy.
  • num_samples_per_structure (integer, default 1): Number of packing samples per input structure.
  • batch_size (integer, default 16): Number of samples to process simultaneously on GPU.
  • scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
  • scn_step_scale (number, default 1.5): Step scale for sidechain diffusion.
  • verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run on.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output
  • packed_structures (List[array], required): Packed structures with sidechain coordinates. Outer list corresponds to input structures, inner list to packing samples. B-factor column carries per-atom pSCE.
  • psce (List[array], required): Per-residue predicted sidechain error (angstroms) for each sample.

Applications

Use this to build or rebuild sidechains for a known sequence and backbone, for example after backbone-only design or to repair incomplete models before docking or molecular dynamics.

Usage Tips

  • pSCE tells you which sidechains to trust. It is the predicted sidechain error in angstroms, reported per atom in the B-factor column of each packed structure and per residue in psce, so high-pSCE residues are the ones to inspect or rebuild.
  • Lower batch_size if you run out of GPU memory. It defaults to 16 samples per forward pass; reduce it for large structures.
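Because the B-factor column carries per-atom pSCE, a packed structure written out as PDB text can be summarized per residue with a few lines of parsing. The sketch below makes several assumptions: the structure has already been exported to fixed-width PDB format (the export step is not shown), only non-backbone atoms are averaged, and per_residue_psce and demo_atom_line are hypothetical helpers rather than toolkit functions.

```python
from collections import defaultdict

# Backbone atom names skipped when averaging; treating every other atom
# as "sidechain" is an assumption of this sketch.
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def per_residue_psce(pdb_text):
    """Average the B-factor column over the sidechain atoms of each residue.

    Returns {(chain_id, residue_number): mean_psce}. Residues without
    sidechain atoms (for example glycine) are omitted.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() in BACKBONE_ATOMS:
            continue
        key = (line[21], int(line[22:26]))    # chain ID, residue number
        totals[key][0] += float(line[60:66])  # PDB columns 61-66: B-factor
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def demo_atom_line(serial, name, resname, chain, resseq, bfactor):
    """Build a fixed-width PDB ATOM record (coordinates zeroed) for the demo."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{bfactor:>6.2f}"
    )


# Hypothetical two-residue fragment with made-up pSCE values.
demo_pdb = "\n".join([
    demo_atom_line(1, "N", "LEU", "A", 1, 0.0),
    demo_atom_line(2, "CA", "LEU", "A", 1, 0.0),
    demo_atom_line(3, "CB", "LEU", "A", 1, 0.10),
    demo_atom_line(4, "CG", "LEU", "A", 1, 0.30),
    demo_atom_line(5, "N", "GLY", "A", 2, 0.0),
    demo_atom_line(6, "CA", "GLY", "A", 2, 0.0),
])
residue_psce = per_residue_psce(demo_pdb)
```

Sorting the resulting dictionary by value, highest first, gives a quick list of residues to inspect or repack.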

FAMPNN Mutation Scoring (fampnn-score)

Scores specified mutations on a structure with full-atom context, returning a likelihood-based score per mutation.

API Reference

Input
  • inputs (List[MutationInput], required): List of MutationInput objects, each containing a structure and mutations to score.

Configuration
  • model_variant (string, default "0.3_cath"): Checkpoint variant. ‘0.3_cath’ recommended for scoring.
  • batch_size (integer, default 16): Number of mutations to score simultaneously on GPU.
  • seq_only (boolean, default False): If True, score without sidechain context (backbone-only).
  • scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
  • scn_step_scale (number, default 1.5): Step scale for sidechain diffusion.
  • verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run on.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output
  • results (List[MutationScoreResult], required): List of MutationScoreResult objects, one per input structure.

Applications

Use this to estimate the effect of specific point or multi-site mutations, for example triaging a designed variant list or interpreting a mutational scan, with sidechain context rather than backbone alone.

Usage Tips

  • Mutation strings are colon-joined, and positions are counted from 1. Use A1V for a single mutation and A1V:G5L for a multi-site variant.
  • seq_only (default False) controls sidechain context. The default scores with full-atom context, which is the point of FAMPNN; set it to True only for a faster, weaker backbone-only score.
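Checking mutation strings before submitting them helps catch off-by-one errors early. The sketch below parses the colon-joined, 1-indexed format described above and checks each wild-type letter against a sequence. parse_mutations and apply_mutations are hypothetical helpers, and whether the toolkit performs the same validation internally is not stated here.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec, sequence):
    """Parse 'A1V:G5L' into [(wild_type, position, mutant), ...].

    Positions are 1-indexed. Raises ValueError if a token is malformed,
    out of range, or disagrees with the given sequence.
    """
    parsed = []
    for token in spec.split(":"):
        match = MUTATION_PATTERN.match(token)
        if match is None:
            raise ValueError(f"malformed mutation {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
            raise ValueError(f"unknown residue in {token!r}")
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position out of range in {token!r}")
        if sequence[position - 1] != wild_type:  # convert to 0-indexed
            raise ValueError(f"wild-type mismatch in {token!r}")
        parsed.append((wild_type, position, mutant))
    return parsed


def apply_mutations(spec, sequence):
    """Return the mutated sequence for a validated mutation string."""
    residues = list(sequence)
    for _, position, mutant in parse_mutations(spec, sequence):
        residues[position - 1] = mutant
    return "".join(residues)


wild_type_sequence = "ACDEGK"  # hypothetical six-residue example
variant = apply_mutations("A1V:G5L", wild_type_sequence)
```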

FAMPNN Score All Mutations (fampnn-score-all-mutations)

Scores every possible single substitution at every position of a structure, returning a position-by-residue map of log-likelihood-ratio scores.

API Reference

Input
  • inputs (List[Structure], required): List of structures to score all mutations for.

Configuration
  • model_variant (string, default "0.3_cath"): Checkpoint variant. ‘0.3_cath’ recommended for scoring.
  • batch_size (integer, default 16): Number of positions to score simultaneously on GPU.
  • verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run on.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output
  • results (List[AllMutationsScoreResult], required): List of AllMutationsScoreResult objects, one per input structure.

Applications

Use this for a full in-silico deep mutational scan, to find stabilizing or tolerated substitutions across a protein without enumerating mutations by hand.

Usage Tips

  • Lower batch_size if you run out of GPU memory. It defaults to scoring 16 positions per forward pass, which dominates memory for large proteins.
  • Scores are log-likelihood ratios relative to the native residue. Positive values mean the substitution is favored over the wild-type residue under the model, so rank candidates by score rather than reading absolute values.
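To make the log-likelihood-ratio convention concrete, the sketch below builds a position-by-residue map from toy per-position logits and ranks substitutions by score. It illustrates the scoring convention only and makes no claim about how FAMPNN computes its scores internally: llr_map, top_substitutions, and the logits are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def llr_map(sequence, logits_per_position):
    """Return one {residue: log p(residue) - log p(wild type)} dict per position."""
    scores = []
    for wild_type, logits in zip(sequence, logits_per_position):
        log_probs = log_softmax(logits)
        wild_type_lp = log_probs[AMINO_ACIDS.index(wild_type)]
        scores.append(
            {aa: lp - wild_type_lp for aa, lp in zip(AMINO_ACIDS, log_probs)}
        )
    return scores


def top_substitutions(sequence, scores, k=3):
    """Rank all non-native substitutions by score, highest first."""
    candidates = [
        (score, f"{wild_type}{position}{aa}")
        for position, (wild_type, row) in enumerate(zip(sequence, scores), start=1)
        for aa, score in row.items()
        if aa != wild_type
    ]
    return [name for _, name in sorted(candidates, reverse=True)[:k]]


# Toy logits: position 1 prefers V over the native A; position 2 prefers its native C.
row_one = [0.0] * 20
row_one[AMINO_ACIDS.index("V")] = 2.0
row_two = [0.0] * 20
row_two[AMINO_ACIDS.index("C")] = 3.0

toy_scores = llr_map("AC", [row_one, row_two])
best = top_substitutions("AC", toy_scores, k=1)
```

The wild-type entry at each position is zero by construction, so a positive score means the model prefers the substitution to the native residue, and a position where every substitution is negative is one the model considers already well fitted.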

Toolkit Notes

These apply to every FAMPNN tool in this toolkit (fampnn-sample, fampnn-pack, fampnn-score, fampnn-score-all-mutations).
  • Requires a GPU. The diffusion-based sidechain model is not practical on CPU.
  • Each tool already defaults model_variant to the right checkpoint: 0.3 for design (fampnn-sample), 0.0 for packing (fampnn-pack), and 0.3_cath for scoring (fampnn-score, fampnn-score-all-mutations). Each checkpoint is trained and calibrated for its own task, so reusing one across tasks degrades results.
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.