License: FoldMason has a GPL-3.0 license. Please refer to the license for full terms.

Proto is not affiliated with the Steinegger Lab. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


steineggerlab/foldmason
Multiple Protein Structure Alignment at Scale with FoldMason
272 stars
search.foldseek.com
Multiple protein structure alignment at scale with FoldMason
Cameron L. M. Gilchrist, Milot Mirdita and Martin Steinegger
Science (2026)
@article{gilchrist2026foldmason,
  title={Multiple protein structure alignment at scale with {F}old{M}ason},
  author={Gilchrist, Cameron L. M. and Mirdita, Milot and Steinegger, Martin},
  journal={Science},
  volume={391},
  number={6784},
  pages={485--488},
  year={2026},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ads6733}
}
proto-bio/proto-tools/proto_tools/tools/structure_alignment/foldmason
Functions

  • run_foldmason_msa() - Multiple structure alignment via FoldMason, in remote (server) or local (CLI) mode.
  • run_foldmason_score_msa() - Scores a structural MSA with average and per-column LDDT using FoldMason msa2lddt.

Background

FoldMason (Gilchrist, Mirdita & Steinegger, 2026) is a progressive multiple-structure alignment method that scales to hundreds of thousands of protein structures. Each input structure is first encoded as a string over the 3Di alphabet, the structural alphabet introduced with Foldseek that represents the local backbone geometry of each residue as a discrete letter. FoldMason then aligns the 3Di strings alongside their amino-acid sequences through a progressive procedure that follows a structural guide tree, using Foldseek and TM-align as the pairwise structural aligners at each merge step. An optional iterative refinement procedure can re-align the result to maximise its LDDT score. The output is a column-by-column alignment expressed in both alphabets, together with the Newick guide tree.

Alignment quality is summarised with the Local Distance Difference Test (LDDT) (Mariani et al., 2013), a superposition-free metric that scores local atomic-distance agreement between two structures. FoldMason’s msa2lddt computes LDDT on each pairwise sub-alignment, maps the per-residue scores back to MSA columns, and averages across pairs to produce one score per column and one overall average.

The reference implementation is released as open source by the Steinegger Lab at steineggerlab/foldmason. The same group operates a public web service at search.foldseek.com/foldmason, which the remote execution mode of this toolkit targets.
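
To make the LDDT metric mentioned above concrete, the following is a minimal sketch of a simplified, C-alpha-only variant. It assumes the commonly cited definition (a 15 angstrom inclusion radius and tolerance thresholds of 0.5, 1, 2, and 4 angstroms) and two coordinate lists that are already paired residue by residue. The function name ca_lddt is hypothetical, and this is not FoldMason's msa2lddt implementation, which works on pairwise sub-alignments of an MSA.

```python
import math

# Assumed constants from the commonly cited LDDT definition.
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance tolerances in angstroms
INCLUSION_RADIUS = 15.0            # only reference distances below this are scored


def ca_lddt(ref, model):
    """Simplified C-alpha LDDT between two residue-paired coordinate lists."""
    if len(ref) != len(model):
        raise ValueError("coordinate lists must pair up residue by residue")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= INCLUSION_RADIUS:
                continue  # pair is outside the local neighbourhood
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # Count how many tolerance thresholds this distance satisfies.
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
    return preserved / total if total else 0.0


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# A rigid translation leaves every internal distance unchanged.
shifted = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in reference]
# Moving the last residue by 3 angstroms breaks the tighter thresholds.
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
```

Because only internal distances are compared, the translated copy scores the same as the reference itself, which is what "superposition-free" means in practice.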

Learning Resources

  • steineggerlab/foldmason (Steinegger Lab, Seoul National University) - official repository, command-line interface for easy-msa, structuremsa, refinemsa, and msa2lddt, and the FASTA output format that this toolkit parses.
  • search.foldseek.com/foldmason (Steinegger Lab) - the public web service that the remote execution mode targets, useful for a single browser-based alignment before scripting against the tool.

Tools

FoldMason MSA (foldmason-msa)

Aligns two or more PDB structures and returns the amino-acid and 3Di MSAs as FASTA strings together with the Newick guide tree, the alignment length, and the number of sequences aligned. The tool executes against the public Steinegger Lab web service in remote mode and against the bundled foldmason easy-msa program in local mode.

API Reference

Inputs
  • structures (List[Structure], required) - Structures to align (≥2). Each item may be a Structure object, a file path, or a raw PDB/CIF content string; each is normalised to a Structure.
  • structure_ids (array, optional) - IDs per structure (default: 'structure_0', 'structure_1', …). Length must match structures. IDs become the FASTA record headers and Newick leaf labels in the output.
Configuration
  • search_mode (enum, default: "remote") - 'remote' hits the public FoldMason server; 'local' runs the FoldMason CLI. Available options: remote, local.
  • poll_interval_seconds (number, default: 5.0) - Remote-only. Delay between status polls, in seconds.
  • timeout_seconds (number, default: 600.0) - Remote-only. Maximum wall-clock time, in seconds.
  • gap_open (integer, default: 10) - Local-only. Gap open cost.
  • gap_extend (integer, default: 1) - Local-only. Gap extension cost.
  • refine_iters (integer, default: 0) - Local-only. Number of alignment-refinement iterations. 0 = no refinement.
  • precluster (boolean, default: False) - Local-only. Pre-cluster structures before MSA construction. Recommended for large datasets (>1k structures).
  • guide_tree_newick (string) - Local-only. Newick guide tree to use instead of computing one. Leaf labels must match structure_ids.
  • num_threads (integer, default: 4) - Local-only. CPU threads.
  • verbose (integer, default: 0) - Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cpu") - Device to run the tool on.
  • timeout (integer, default: 600) - Maximum execution time in seconds. None waits indefinitely.
  • seed (integer) - Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Outputs
  • ticket_id (string, required) - Remote job ticket ID; empty in local mode.
  • aa_msa_fasta (string, required) - Amino-acid alphabet MSA in FASTA format.
  • three_di_msa_fasta (string, required) - 3Di alphabet MSA in FASTA format.
  • newick_tree (string, required) - Newick guide tree.
  • num_sequences (integer, required) - Number of sequences in the alignment.
  • alignment_length (integer, required) - Number of MSA columns.
  • result_url (string, required) - Remote result-archive URL; empty in local mode.
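
The output fields above are related to one another: both FASTA strings should contain num_sequences rows of alignment_length columns, and the IDs appear as FASTA headers and as Newick leaf labels. The sketch below checks those relationships on plain strings. The helper names (parse_fasta, newick_leaves, check_msa_output) are hypothetical and not part of the toolkit, the example strings are made-up illustrations rather than real FoldMason output, and the Newick parsing assumes unquoted leaf labels.

```python
import re


def parse_fasta(text):
    """Return (header, sequence) pairs in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif line and records:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def newick_leaves(newick):
    """Unquoted leaf labels are the names that directly follow '(' or ','."""
    found = re.findall(r"[(,]\s*([^():,;]+)", newick)
    return [label.strip() for label in found if label.strip()]


def check_msa_output(aa_fasta, three_di_fasta, newick, num_sequences, alignment_length):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    aa = parse_fasta(aa_fasta)
    three_di = parse_fasta(three_di_fasta)
    for name, records in (("aa", aa), ("3Di", three_di)):
        if len(records) != num_sequences:
            problems.append(f"{name}: {len(records)} records, expected {num_sequences}")
        bad = [h for h, s in records if len(s) != alignment_length]
        if bad:
            problems.append(f"{name}: rows with unexpected length: {bad}")
    aa_ids = {h for h, _ in aa}
    if aa_ids != {h for h, _ in three_di}:
        problems.append("aa and 3Di headers differ")
    if aa_ids != set(newick_leaves(newick)):
        problems.append("Newick leaf labels differ from FASTA headers")
    return problems


# Made-up illustrative values, not real FoldMason output.
aa = ">structure_0\nMK-LV\n>structure_1\nMKALV\n"
tdi = ">structure_0\nDV-PL\n>structure_1\nDVLPL\n"
tree = "(structure_0:0.12,structure_1:0.08);"
problems = check_msa_output(aa, tdi, tree, num_sequences=2, alignment_length=5)
```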

Applications

This tool is appropriate for aligning a fold family retrieved from a Foldseek search, for comparing designed scaffolds against their target backbone, or for assembling a multi-structure template ensemble for downstream template-based modelling. It also applies to AlphaFold predictions across an evolutionary set, where the alignment can identify residues that are structurally conserved as well as loops that vary in conformation.

Usage Tips

  • foldmason-msa supports both remote (search_mode="remote", the default) and local (search_mode="local") execution. Remote mode targets the Steinegger Lab web service. Local mode runs the bundled FoldMason program and accepts the full set of alignment parameters.
  • The Steinegger Lab web service does not accept alignment parameters. The configuration fields gap_open, gap_extend, refine_iters, precluster, and guide_tree_newick therefore require search_mode="local".
  • refine_iters controls how many iterative LDDT-maximising refinement passes run after the initial progressive alignment. Each pass adds runtime, and the default of 0 is appropriate for most workflows. Increase it only when an alignment shows poor quality in difficult regions.
  • The remote service has no authentication and no published rate limit. search.foldseek.com/foldmason is a free public academic resource. High-throughput or batch workloads should be performed in local mode to avoid overloading the shared service.
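
Since the alignment parameters listed above only take effect in local mode, it can help to catch a mismatched configuration before submitting a job. The following is a minimal sketch over a plain dictionary. The function local_only_fields_set and the dictionary layout are assumptions for illustration; the toolkit may perform its own validation, and its behaviour in that case is not documented here. The defaults are the ones listed in the API reference.

```python
# Defaults for the local-only fields, as listed in the API reference.
LOCAL_ONLY_DEFAULTS = {
    "gap_open": 10,
    "gap_extend": 1,
    "refine_iters": 0,
    "precluster": False,
    "guide_tree_newick": None,
}


def local_only_fields_set(config):
    """Names of local-only fields changed from their defaults in remote mode."""
    if config.get("search_mode", "remote") != "remote":
        return []
    return [
        name
        for name, default in LOCAL_ONLY_DEFAULTS.items()
        if config.get(name, default) != default
    ]


# Hypothetical configuration: refinement requested, but mode left at the default.
conflicts = local_only_fields_set({"refine_iters": 2, "gap_open": 10})
```

A non-empty result suggests either setting search_mode="local" or dropping the listed fields.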

FoldMason Score MSA (foldmason-score-msa)

Accepts a precomputed amino-acid MSA in FASTA format together with the underlying PDB structures, and returns the average MSA-wide LDDT score, the per-column LDDT scores, the number of columns considered, and the total alignment length.

API Reference

Inputs
  • structures (List[Structure], required) - Structures (≥2). Each item may be a Structure object, a file path, or a raw PDB/CIF content string; each is normalised to a Structure. Order must match the rows of msa.
  • structure_ids (array, optional) - IDs per structure (default: 'structure_0', …). Must match the FASTA record headers in msa so msa2lddt can resolve each row to its structure.
  • msa (MSA, required) - Amino-acid MSA, typically from foldmason-msa. Accepts an MSA object or a raw FASTA string.
Configuration
  • pair_threshold (number, default: 0.0) - Minimum fraction of pair sub-alignments with LDDT information required to score a column (0-1). 0.0 keeps all columns.
  • only_scoring_cols (boolean, default: False) - If True, normalise the average LDDT by the number of scoring columns rather than total alignment length.
  • guide_tree_newick (string) - Newick guide tree to score against; leaf labels must match structure_ids. None lets foldmason recompute the tree internally.
  • num_threads (integer, default: 4) - CPU threads.
  • verbose (integer, default: 0) - Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cpu") - Device to run the tool on.
  • timeout (integer, default: 600) - Maximum execution time in seconds. None waits indefinitely.
  • seed (integer) - Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Outputs
  • average_lddt (number, required) - Average MSA LDDT score (0-1) across all scored columns.
  • columns_considered (integer, required) - Number of columns that had enough pairwise information to be scored.
  • alignment_length (integer, required) - Total number of MSA columns.
  • column_scores (List[number]) - Per-column LDDT scores, length equal to alignment_length.
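
The effect of only_scoring_cols on average_lddt is easiest to see with numbers. The sketch below is an interpretation of the parameter description, not a transcription of msa2lddt: it assumes the average is the sum of the scored columns' LDDT values divided either by alignment_length (the default) or by columns_considered. The function name and the example scores are made up for illustration.

```python
import math


def average_lddt(scored_column_scores, alignment_length, only_scoring_cols=False):
    """Average LDDT under the two normalisations described in the API reference."""
    columns_considered = len(scored_column_scores)
    denominator = columns_considered if only_scoring_cols else alignment_length
    if denominator == 0:
        return 0.0
    return sum(scored_column_scores) / denominator


# Three scored columns in a five-column alignment (two columns lack pair information).
scores = [0.9, 0.8, 0.7]
by_length = average_lddt(scores, alignment_length=5)
by_scored = average_lddt(scores, alignment_length=5, only_scoring_cols=True)
```

Under this reading the same alignment gives roughly 0.48 when divided by the full length and roughly 0.8 when divided by the scored columns only, so the two settings should not be mixed when comparing alignments.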

Applications

This tool is appropriate for assigning a structural quality score to an MSA produced elsewhere, for identifying low-LDDT columns that should be masked or treated as variable loops before downstream analysis, or for comparing two candidate alignments of the same structures using a single summary score.
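
Masking low-LDDT columns, as mentioned above, can be sketched with plain Python once column_scores is available. The helper name mask_low_lddt_columns and the 0.5 cutoff are hypothetical choices for illustration; a suitable cutoff depends on the downstream analysis, and how unscored columns are represented in column_scores is not specified here.

```python
def mask_low_lddt_columns(rows, column_scores, min_lddt=0.5):
    """Drop MSA columns whose LDDT is below min_lddt.

    rows maps each sequence ID to its aligned sequence; every sequence must
    have one character per entry in column_scores.
    """
    for name, seq in rows.items():
        if len(seq) != len(column_scores):
            raise ValueError(f"row {name!r} does not match the number of column scores")
    keep = [i for i, score in enumerate(column_scores) if score >= min_lddt]
    masked = {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}
    return masked, keep


# Made-up alignment and scores.
rows = {"structure_0": "MK-LV", "structure_1": "MKALV"}
masked, kept_columns = mask_low_lddt_columns(rows, [0.9, 0.8, 0.1, 0.7, 0.3])
```

Returning the kept column indices alongside the masked rows makes it possible to map positions in the trimmed alignment back to the original columns.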

Usage Tips

  • FASTA record headers must match structure_ids. msa2lddt resolves each MSA row to its corresponding structure by matching the header against the supplied identifiers. Headers that do not correspond to a supplied structure are not scored, which can produce a misleadingly high score derived from a partial alignment.
  • only_scoring_cols=True normalises the average LDDT by the number of scored columns rather than by the total alignment length. Use this option when comparing alignments with different gap content. Leaving it False (the default) includes gap columns in the denominator.
  • This tool runs only in local mode. The public web service does not provide an msa2lddt endpoint, so every foldmason-score-msa call requires the local FoldMason program.
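
Because unmatched headers are skipped rather than reported, a quick pre-flight comparison of FASTA headers and structure_ids can catch a partial score before it is computed. The sketch below assumes exact string matching between the full header text and each ID; header_mismatches is a hypothetical helper, not a toolkit function.

```python
def fasta_headers(msa_fasta):
    """Header text of every FASTA record, without the leading '>'."""
    return [line[1:].strip() for line in msa_fasta.splitlines() if line.startswith(">")]


def header_mismatches(msa_fasta, structure_ids):
    """Return (headers with no matching ID, IDs with no matching header)."""
    headers = fasta_headers(msa_fasta)
    id_set = set(structure_ids)
    header_set = set(headers)
    unmatched_rows = [h for h in headers if h not in id_set]
    unused_ids = [i for i in structure_ids if i not in header_set]
    return unmatched_rows, unused_ids


# Made-up MSA in which the second row was renamed by an upstream step.
msa = ">structure_0\nMK-LV\n>chainB\nMKALV\n"
unmatched_rows, unused_ids = header_mismatches(msa, ["structure_0", "structure_1"])
```

If either list is non-empty, the headers or the IDs should be reconciled before scoring.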

Toolkit Notes

These apply to every FoldMason tool in this toolkit (foldmason-msa, foldmason-score-msa).
  • FoldMason runs on CPU only. Neither the remote service nor the local program uses a GPU. Local-mode runtime grows with both the number of structures and their lengths, since each progressive merge step performs a pairwise structural alignment.
  • Inputs are PDB-format text strings. Each entry is written to disk as {structure_id}.pdb before alignment, so each structure should be supplied as PDB-format text in structures. The upstream FoldMason CLI also accepts mmCIF, but this toolkit does not currently support mmCIF input.
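
The staging step described above, one {structure_id}.pdb file per input, can be approximated as follows. This is a sketch under stated assumptions rather than the toolkit's own code: stage_pdb_files and looks_like_mmcif are hypothetical names, the mmCIF check is a heuristic based on the data_ block header that opens an mmCIF file, and the one-atom PDB text is a made-up placeholder.

```python
import tempfile
from pathlib import Path


def looks_like_mmcif(text):
    """Heuristic: the first non-blank, non-comment line of mmCIF starts with data_."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped.startswith("data_")
    return False


def stage_pdb_files(pdb_texts, structure_ids, directory):
    """Write each PDB text to <directory>/<structure_id>.pdb and return the paths."""
    if len(pdb_texts) != len(structure_ids):
        raise ValueError("structure_ids must have one entry per structure")
    if len(set(structure_ids)) != len(structure_ids):
        raise ValueError("duplicate IDs would overwrite each other's files")
    paths = []
    for text, sid in zip(pdb_texts, structure_ids):
        if not sid or Path(sid).name != sid:
            raise ValueError(f"ID {sid!r} is not usable as a file name")
        if looks_like_mmcif(text):
            raise ValueError(f"{sid}: input looks like mmCIF, expected PDB text")
        path = Path(directory) / f"{sid}.pdb"
        path.write_text(text)
        paths.append(path)
    return paths


# Placeholder one-atom PDB text, used twice for illustration.
pdb_text = "ATOM      1  CA  MET A   1      11.104   6.134  -6.504  1.00  0.00           C\nEND\n"
with tempfile.TemporaryDirectory() as tmp:
    staged = stage_pdb_files([pdb_text, pdb_text], ["structure_0", "structure_1"], tmp)
    staged_names = sorted(p.name for p in staged)
```

Rejecting IDs that contain path separators keeps every file inside the staging directory, and rejecting duplicates avoids one structure silently replacing another.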
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.