FAMPNN - Proto

License: FAMPNN is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

richardshuai/fampnn

Sidechain conditioning and modeling for full-atom protein sequence design

@inproceedings{widatalla2025fampnn,
  title={Sidechain conditioning and modeling for full-atom protein sequence design with {FAMPNN}},
  author={Widatalla, Talal and Shuai, Richard W. and Hie, Brian L. and Huang, Po-Ssu},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  year={2025},
  series={PMLR},
  volume={267},
  address={Vancouver, Canada}
}

proto-bio/proto-tools/proto_tools/tools/inverse_folding/fampnn

Function	Description
`run_fampnn_pack()`	Pack protein sidechains using FAMPNN with per-atom confidence (pSCE) (GPU)
`run_fampnn_sample()`	Design protein sequences with full-atom sidechain co-generation using FAMPNN (GPU)
`run_fampnn_score()`	Score protein mutations with full-atom context using FAMPNN (GPU)
`run_fampnn_score_all_mutations()`	Score every possible single mutation at every position using FAMPNN (GPU)

Background

FAMPNN (Widatalla et al., 2025) solves the full-atom inverse-folding problem: given a fixed protein backbone, it predicts both an amino-acid sequence that folds into it and the sidechain conformation of every residue. Most fixed-backbone designers reason about sidechain interactions only implicitly, through backbone geometry and sequence labels, even though the three-dimensional arrangement of sidechain atoms drives protein conformation, stability, and function. Internally, FAMPNN learns a per-residue joint distribution over the discrete amino-acid identity and the continuous sidechain conformation, trained with a combined categorical cross-entropy and diffusion objective. Sequences are generated by iterative unmasking, starting from a fully masked state and revealing residues over several steps, and the sidechain atoms are produced by a per-token Euclidean diffusion process in each residue’s local backbone frame. A confidence module predicts the per-atom sidechain error (pSCE) in angstroms, which correlates with true packing error both per atom and, averaged over a residue, per residue. Learning sequence and sidechains jointly is synergistic, improving both native-sequence recovery and sidechain packing relative to modeling the sequence alone. The reference implementation is maintained at richardshuai/fampnn and was developed at Stanford University.
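
The iterative unmasking loop described above can be pictured with a toy sketch. This is not FAMPNN's code: `iterative_unmask` is a hypothetical helper, a random choice stands in for the model's prediction, and both the reveal order and the even per-step schedule are assumptions made only for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def iterative_unmask(length: int, num_steps: int, rng: random.Random) -> Tuple[str, List[int]]:
    """Fill a fully masked sequence over num_steps rounds.

    rng.choice stands in for the model's prediction at each revealed position.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    sequence: List[Optional[str]] = [None] * length  # None marks a masked residue
    masked = list(range(length))
    rng.shuffle(masked)  # toy reveal order (assumption, not the model's policy)
    revealed_per_step = []
    for step in range(num_steps):
        # Spread the still-masked positions evenly over the remaining steps.
        count = math.ceil(len(masked) / (num_steps - step))
        for position in masked[:count]:
            sequence[position] = rng.choice(ALPHABET)
        masked = masked[count:]
        revealed_per_step.append(count)
    return "".join(sequence), revealed_per_step


designed, schedule = iterative_unmask(length=12, num_steps=5, rng=random.Random(0))
print(designed, schedule)
```

In the real model each revealed residue would also receive diffused sidechain coordinates, and later steps would condition on what has been revealed so far; the sketch only shows the mask bookkeeping.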

Tools

FAMPNN Sampling (`fampnn-sample`)

Designs a sequence for a backbone and co-generates its full-atom sidechains. Each input structure is returned as a designed structure with packed sidechain coordinates and a per-residue pSCE.

API Reference

Input: FAMPNNSampleInput

inputs (List[FAMPNNStructureInput], required): Per-structure inputs, each containing a structure and optional chains_to_redesign / fixed_positions / fixed_sidechain_positions selections.

FAMPNNStructureInput fields:

fixed_sidechain_positions (ResidueSelection): Per-chain residue positions whose sidechain coordinates condition the model during sampling/packing (1-indexed). Accepts shorthand {"A": [1, 2]} at construction.
structure (Structure, required): Protein structure. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json').
chains_to_redesign (ChainSelection): Chains to redesign. None means redesign every chain in the structure. Accepts shorthand "A" or ["A", "B"] at construction.
fixed_positions (ResidueSelection): Per-chain positions whose residue identity is held fixed during design (1-indexed). Accepts shorthand {"A": [1, 2, 3]} at construction.
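
As a rough illustration of the shorthand forms listed above, the sketch below assembles one per-structure input as a plain dict and checks that every position is 1-indexed. `normalize_chains` and `check_positions` are hypothetical helpers written for this example, not toolkit functions, and the file name is made up; the toolkit performs its own coercion at construction.

```python
from typing import Dict, List, Optional, Union


def normalize_chains(chains: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Expand the "A" shorthand to ["A"]; None (redesign every chain) passes through."""
    if chains is None:
        return None
    return [chains] if isinstance(chains, str) else list(chains)


def check_positions(selection: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Reject positions below 1, since selections are documented as 1-indexed."""
    for chain, positions in selection.items():
        bad = [p for p in positions if p < 1]
        if bad:
            raise ValueError(f"chain {chain}: positions must be >= 1, got {bad}")
    # Deduplicate and sort so the selection is easy to read back.
    return {chain: sorted(set(positions)) for chain, positions in selection.items()}


structure_input = {
    "structure": "design.pdb",  # hypothetical file path
    "chains_to_redesign": normalize_chains("A"),
    "fixed_positions": check_positions({"A": [1, 2, 3]}),
    "fixed_sidechain_positions": check_positions({"A": [1, 2]}),
}
print(structure_input)
```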

Config: FAMPNNSampleConfig

model_variant (string, default "0.3"): FAMPNN checkpoint variant. ‘0.3’ for sequence design (PDB-trained, 0.3 Å noise), ‘0.0’ for sidechain packing (PDB-trained, 0.0 Å noise), ‘0.3_cath’ for mutation scoring (CATH-trained).
num_steps (integer, default 100): Number of iterative unmasking steps for sequence design. More steps yield higher quality but slower inference. 10 steps is sufficient for high self-consistency; 100 for best quality.
seq_only (boolean, default False): If True, skip sidechain generation during sampling.
repack_last (boolean, default True): If True, repack sidechains after the final sequence is determined.
psce_threshold (number, default 0.3): Only condition on sidechains with predicted sidechain error below this threshold during iterative sampling.
scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
scn_step_scale (number, default 1.5): Step scale for sidechain diffusion (eta parameter).
verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default "cuda"): Device to run the model on. Options include ‘cuda’ (NVIDIA GPU), ‘cpu’ (CPU execution), or specific GPU devices like ‘cuda:0’.
timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer): Random seed to use for sampling.
num_sequences_per_structure (integer, default 1): Total number of sequences to generate per input structure.
batch_size (integer): Number of sequences to process simultaneously on GPU. Defaults to num_sequences_per_structure.
temperature (number, default 0.1): Controls randomness in sampling from logits.
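
To give a feel for what a sampling temperature does, here is a generic softmax-with-temperature sketch. It is the textbook formulation, not code taken from FAMPNN, and whether the model applies temperature in exactly this way is an assumption. The logits are made-up values for three residue types.

```python
import math
import random
from typing import Dict


def temperature_probs(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: value / temperature for aa, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(value - peak) for aa, value in scaled.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}


def sample_residue(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one residue type from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    residues = list(probs)
    return rng.choices(residues, weights=[probs[aa] for aa in residues], k=1)[0]


toy_logits = {"A": 2.0, "L": 1.5, "G": 0.5}  # made-up logits
sharp = temperature_probs(toy_logits, 0.1)   # low temperature: nearly greedy
flat = temperature_probs(toy_logits, 1.0)    # higher temperature: more diverse
print(sharp, flat, sample_residue(toy_logits, 0.1, random.Random(0)))
```

At a low temperature such as the default 0.1 almost all probability mass sits on the top-scoring residue, so raising it is the usual way to trade confidence for sequence diversity.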

Output: FAMPNNSampleOutput

design_sets (List[FAMPNNDesignSet], required): One FAMPNNDesignSet per input structure, in input order.

FAMPNNDesignSet fields:

complexes (List[FAMPNNDesign], required): The FAMPNN complexes generated for one input, each a complete multi-chain complex.

Applications

Use this to redesign or stabilize a protein when sidechain packing matters, for example interface or active-site redesign, since the model commits to a concrete sidechain arrangement rather than leaving it implicit. The pSCE flags residues whose packing the model is unsure about.

Usage Tips

psce_threshold (default 0.3) controls which generated sidechains the model conditions on. FAMPNN designs by iterative unmasking, and at each step only sidechains whose predicted error (pSCE, in angstroms) falls below this threshold are kept as context for decoding the remaining residues. Sidechains the model is less certain about (pSCE at or above the threshold) are not used as context, so low-confidence placements do not bias later predictions. Lower it to condition only on the most confident sidechains. Raise it to let more, lower-confidence sidechains inform the design.
fixed_positions and fixed_sidechain_positions are different knobs. fixed_positions holds a residue’s amino-acid identity, while fixed_sidechain_positions instead conditions on its existing sidechain coordinates. Both are per chain and indexed from 1 to match biological residue selection conventions.
seq_only (default False) skips sidechain co-generation. Setting it True is faster but gives up the full-atom modeling that distinguishes FAMPNN, so leave it off unless you only need a sequence.
Output is structured per design. output.design_sets[i].complexes[j] is a FAMPNNDesign with .chains, the full-atom .structure, and .metrics["avg_psce"]. FAMPNNDesign is a Complex subclass, so designs can be passed directly to structure predictors.
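
The psce_threshold rule from the first tip can be sketched in a few lines. The helper `split_by_confidence` and the pSCE values are hypothetical; the only behavior taken from the text is that sidechains strictly below the threshold are kept as context and those at or above it are withheld.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # (chain ID, 1-indexed position)


def split_by_confidence(
    psce: Dict[Residue, float], threshold: float = 0.3
) -> Tuple[List[Residue], List[Residue]]:
    """Split residues into those kept as context and those withheld."""
    context: List[Residue] = []
    withheld: List[Residue] = []
    for residue, error in sorted(psce.items()):
        # Strictly below the threshold counts as confident enough to condition on.
        (context if error < threshold else withheld).append(residue)
    return context, withheld


toy_psce = {("A", 1): 0.12, ("A", 2): 0.45, ("A", 3): 0.30, ("A", 4): 0.08}  # made-up values
context, withheld = split_by_confidence(toy_psce)
strict_context, _ = split_by_confidence(toy_psce, threshold=0.1)
print(context, withheld, strict_context)
```

With the default threshold two of the four toy residues serve as context; lowering it to 0.1 leaves only the most confident one.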

FAMPNN Sidechain Packing (`fampnn-pack`)

Places sidechain atoms onto a structure whose sequence is fixed, returning the packed structure with per-atom pSCE confidence.

API Reference

Input: FAMPNNPackInput

inputs (List[FAMPNNStructureInput], required): List of structure inputs for sidechain packing.

FAMPNNStructureInput fields:

fixed_sidechain_positions (ResidueSelection): Per-chain residue positions whose sidechain coordinates condition the model during sampling/packing (1-indexed). Accepts shorthand {"A": [1, 2]} at construction.
structure (Structure, required): Protein structure. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json').
chains_to_redesign (ChainSelection): Chains to redesign. None means redesign every chain in the structure. Accepts shorthand "A" or ["A", "B"] at construction.
fixed_positions (ResidueSelection): Per-chain positions whose residue identity is held fixed during design (1-indexed). Accepts shorthand {"A": [1, 2, 3]} at construction.

Config: FAMPNNPackConfig

model_variant (string, default "0.0"): Checkpoint variant. ‘0.0’ recommended for best packing accuracy.
num_samples_per_structure (integer, default 1): Number of packing samples per input structure.
batch_size (integer, default 16): Number of samples to process simultaneously on GPU.
scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
scn_step_scale (number, default 1.5): Step scale for sidechain diffusion.
verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default "cuda"): Device to run on.
timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: FAMPNNPackingResult

packed_structures (List[array], required): Packed structures with sidechain coordinates. Outer list corresponds to input structures, inner list to packing samples. B-factor column carries per-atom pSCE.
psce (List[array], required): Per-residue predicted sidechain error (Angstroms) for each sample.

Applications

Use this to build or rebuild sidechains for a known sequence and backbone, for example after backbone-only design or to repair incomplete models before docking or molecular dynamics.

Usage Tips

pSCE tells you which sidechains to trust. It is the predicted sidechain error in angstroms, reported per atom in the B-factor column of each packed structure and per residue in the psce output. High-pSCE residues are the ones to inspect or rebuild.
Lower batch_size if you run out of GPU memory. It defaults to 16 samples per forward pass; reduce it for large structures.
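
Because the packed structures carry per-atom pSCE in the B-factor column, a short parser can flag residues worth a second look. The sketch below assumes the structure is available as PDB-format text with the standard fixed columns; `max_bfactor_per_residue`, `atom_line`, the toy atoms, and the 1.0 Å review cutoff are all illustrative and not part of the toolkit.

```python
from typing import Dict, Tuple

Residue = Tuple[str, int]  # (chain ID, residue number)


def max_bfactor_per_residue(pdb_text: str) -> Dict[Residue, float]:
    """Return the largest B-factor value among each residue's ATOM records."""
    worst: Dict[Residue, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        residue = (line[21], int(line[22:26]))  # chain ID (col 22), residue number (cols 23-26)
        value = float(line[60:66])              # B-factor (cols 61-66)
        worst[residue] = max(worst.get(residue, value), value)
    return worst


def atom_line(serial: int, name: str, res_name: str, chain: str, res_seq: int, b_factor: float) -> str:
    """Build a toy fixed-column ATOM record with zeroed coordinates."""
    return (
        f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}"
    )


toy_pdb = "\n".join([
    atom_line(1, "CB", "LEU", "A", 1, 0.20),
    atom_line(2, "CG", "LEU", "A", 1, 0.40),
    atom_line(3, "CB", "PHE", "A", 2, 0.30),
    atom_line(4, "CG", "PHE", "A", 2, 1.60),
])

REVIEW_CUTOFF = 1.0  # arbitrary illustrative cutoff in angstroms, not a documented value
worst = max_bfactor_per_residue(toy_pdb)
to_review = sorted(res for res, value in worst.items() if value > REVIEW_CUTOFF)
print(worst, to_review)
```

Taking the per-residue maximum is one design choice among several; a mean over sidechain atoms would be closer in spirit to the per-residue psce output.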

FAMPNN Mutation Scoring (`fampnn-score`)

Scores specified mutations on a structure with full-atom context, returning a likelihood-based score per mutation.

API Reference

Input: FAMPNNScoreInput

inputs (List[MutationInput], required): List of MutationInput objects, each containing a structure and mutations to score.

MutationInput fields:

structure (Structure, required): Protein structure to evaluate mutations against.
mutations (List[string], required): List of mutation strings. Each mutation uses the format ‘<WT><1-indexed_position><MUT>’ with single-letter amino acid codes. Multiple simultaneous mutations are joined with colons: ‘N1P:N2R’.

Config: FAMPNNScoreConfig

model_variant (string, default "0.3_cath"): Checkpoint variant. ‘0.3_cath’ recommended for scoring.
batch_size (integer, default 16): Number of mutations to score simultaneously on GPU.
seq_only (boolean, default False): If True, score without sidechain context (backbone-only).
scn_diffusion_steps (integer, default 50): Number of sidechain diffusion denoising steps.
scn_step_scale (number, default 1.5): Step scale for sidechain diffusion.
verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default "cuda"): Device to run on.
timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer)

Output: FAMPNNScoreOutput

results (List[MutationScoreResult], required): List of MutationScoreResult objects, one per input structure.

MutationScoreResult fields:

mutations (List[string], required): Mutation strings that were scored.
scores (List[number], required): Log-likelihood ratio scores for each mutation. Positive = mutation is more likely than wild-type.
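
The sign convention of a log-likelihood ratio can be checked with a small calculation. The probabilities below are made up, and how FAMPNN derives them or combines the sites of a multi-site variant is not specified here, so treat this only as the generic definition for a single position.

```python
import math


def log_likelihood_ratio(p_mutant: float, p_wild_type: float) -> float:
    """log p(mutant) - log p(wild type) at one position."""
    return math.log(p_mutant) - math.log(p_wild_type)


favored = log_likelihood_ratio(0.30, 0.10)     # mutant more likely than wild type
disfavored = log_likelihood_ratio(0.02, 0.10)  # mutant less likely than wild type
print(favored, disfavored)
```

A positive score means the model assigns the mutant residue a higher probability than the wild-type residue, and a score of zero means it sees no difference.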

Applications

Use this to estimate the effect of specific point or multi-site mutations, for example triaging a designed variant list or interpreting a mutational scan, with sidechain context rather than backbone alone.

Usage Tips

Mutation strings are 1-indexed and colon-joined. Use A1V for a single mutation and A1V:G5L for a multi-site variant; the wild-type residue comes first, then the position, then the mutant residue.
seq_only (default False) removes sidechain context. Leaving it off scores with full-atom context, which is the point of FAMPNN; set it True only for a faster, weaker backbone-style score.
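
Malformed mutation strings are easy to catch before a GPU job is launched. The sketch below parses the documented format and checks each wild-type letter against a sequence. `parse_mutations` and `check_against_sequence` are hypothetical helpers, and the check assumes a single chain whose positions simply count from 1; how positions map onto multi-chain structures is not covered here.

```python
import re
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# <WT><1-indexed position><MUT>, single-letter codes, no leading zero.
_MUTATION = re.compile(rf"([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])")


def parse_mutations(spec: str) -> List[Tuple[str, int, str]]:
    """Split a colon-joined spec such as 'A1V:G5L' into (wt, position, mutant) tuples."""
    parsed = []
    for token in spec.split(":"):
        match = _MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed mutation {token!r}")
        wild_type, position, mutant = match.groups()
        parsed.append((wild_type, int(position), mutant))
    return parsed


def check_against_sequence(spec: str, sequence: str) -> None:
    """Raise if a position is out of range or the wild-type letter does not match."""
    for wild_type, position, _ in parse_mutations(spec):
        if position > len(sequence):
            raise ValueError(f"position {position} exceeds sequence length {len(sequence)}")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
            )


print(parse_mutations("N1P:N2R"))
check_against_sequence("A1V:G5L", "AKTLG")  # passes silently on this toy sequence
```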

FAMPNN Score All Mutations (`fampnn-score-all-mutations`)

Scores every possible single substitution at every position of a structure, returning a position-by-residue map of log-likelihood-ratio scores.

API Reference

Input: FAMPNNScoreAllMutationsInput

inputs (List[Structure], required): List of structures to score all mutations for.

Structure fields:

structure (string, required): Raw structure content in PDB or CIF format.
structure_format (string): Format of the content string (auto-detected if omitted).
b_factor_type (BFactorType, default "unspecified"): What the B-factor column represents.
source (string): Optional source identifier (filepath or tool name).
metrics (Metrics): Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: FAMPNNScoreAllMutationsConfig

model_variant (string, default "0.3_cath"): Checkpoint variant. ‘0.3_cath’ recommended for scoring.
batch_size (integer, default 16): Number of positions to score simultaneously on GPU.
verbose (integer, default 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default "cuda"): Device to run on.
timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer)

Output: FAMPNNScoreAllMutationsOutput

results (List[AllMutationsScoreResult], required): List of AllMutationsScoreResult objects, one per input structure.

AllMutationsScoreResult fields:

scores (Dict[string, Dict[string, number]], required): Dictionary mapping position labels (e.g., ‘1A’ for position 1, wild-type Ala) to dictionaries of {mutant_residue: score}. Scores are log-likelihood ratios (positive = favored over wild-type).

Applications

Use this for a full in-silico deep mutational scan, to find stabilizing or tolerated substitutions across a protein without enumerating mutations by hand.

Usage Tips

Lower batch_size if you run out of GPU memory. It defaults to scoring 16 positions per forward pass, which dominates memory for large proteins.
Scores are log-likelihood ratios relative to the native residue. Positive values mean the substitution is favored over the wild-type residue under the model, so rank candidates by score rather than reading absolute values.
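
Ranking candidates from the nested scores dictionary takes only a few lines. This sketch assumes every position label is a run of digits followed by one wild-type letter, as in the ‘1A’ example; labels for multi-chain inputs may look different. `rank_substitutions` is a hypothetical helper and the scores are made up.

```python
from typing import Dict, List, Tuple


def rank_substitutions(
    scores: Dict[str, Dict[str, float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Flatten {position label: {mutant: score}} and return the top_k highest scores."""
    ranked: List[Tuple[str, float]] = []
    for label, by_mutant in scores.items():
        position, wild_type = int(label[:-1]), label[-1]  # '1A' -> 1, 'A'
        for mutant, score in by_mutant.items():
            if mutant == wild_type:
                continue  # skip the self-substitution if it is present
            ranked.append((f"{wild_type}{position}{mutant}", score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


toy_scores = {
    "1A": {"V": 0.8, "G": -1.2},
    "2K": {"R": 0.3, "E": -0.5},
}
top = rank_substitutions(toy_scores, top_k=3)
print(top)
```

The returned labels use the same ‘<WT><position><MUT>’ form that fampnn-score accepts, so a shortlist can be passed on for multi-site scoring.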

Toolkit Notes

These apply to every FAMPNN tool in this toolkit (fampnn-sample, fampnn-pack, fampnn-score, fampnn-score-all-mutations).

Requires a GPU. The diffusion-based sidechain model is not practical on CPU.
Each tool already defaults model_variant to the right checkpoint: 0.3 for design (fampnn-sample), 0.0 for packing (fampnn-pack), and 0.3_cath for scoring (fampnn-score, fampnn-score-all-mutations). Each checkpoint is trained and calibrated for its own task, so reusing one across tasks degrades results.
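
A pipeline that overrides model_variant by hand can guard against checkpoint mix-ups with a small lookup. The mapping below restates the defaults listed above; `check_variant` is a hypothetical helper for illustration and is not part of the toolkit.

```python
import warnings
from typing import Optional

# Defaults as listed in the note above.
DEFAULT_VARIANT = {
    "fampnn-sample": "0.3",
    "fampnn-pack": "0.0",
    "fampnn-score": "0.3_cath",
    "fampnn-score-all-mutations": "0.3_cath",
}


def check_variant(tool: str, model_variant: Optional[str] = None) -> str:
    """Return the variant to use, warning when it differs from the tool's default."""
    default = DEFAULT_VARIANT[tool]
    if model_variant is None:
        return default
    if model_variant != default:
        warnings.warn(
            f"{tool} defaults to model_variant={default!r}; got {model_variant!r}",
            stacklevel=2,
        )
    return model_variant


print(check_variant("fampnn-pack"))
```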

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.