Proto is not affiliated with the Steinegger Lab. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
| Function | Description | |
|---|---|---|
| `run_foldseek_cluster()` | Cluster a set of protein structures by structural similarity using Foldseek easy-cluster | Docs Source |
| `run_foldseek_multimer_search()` | Search Foldseek multimer (complex) structural homology, remote (server) or local (CLI) | Docs Source |
| `run_foldseek_multimercluster()` | Cluster a set of protein complexes by multimer-level structural similarity using Foldseek easy-mu… | Docs Source |
| `run_foldseek_rbh()` | Find reciprocal best-hit structural alignments between a query and a target DB using Foldseek eas… | Docs Source |
| `run_foldseek_search()` | Search Foldseek structural homology against PDB100/AlphaFold DB (remote) or a local DB (local) | Docs Source |
Background
Foldseek (van Kempen et al., 2024) performs structural homology search, identifying distant evolutionary relatives of a query protein by structural similarity rather than sequence similarity. Each residue of a protein structure is represented as a discrete letter over a learned structural alphabet (the 3Di alphabet) that captures the tertiary interactions between that residue and its spatial neighbours. Pairs of structures are then aligned by running MMseqs2-style sensitive sequence alignment over the 3Di strings together with the underlying amino-acid sequences. The original publication reports that this approach decreases computation times by four to five orders of magnitude relative to the established structural aligners Dali, TM-align, and CE. Foldseek can also accept amino-acid sequences directly, in which case the bundled ProstT5 language model predicts a 3Di sequence before alignment.
Foldseek-Multimer (Kim et al., 2025) extends the same machinery to multi-chain complexes. It computes pairwise chain-to-chain alignments and then clusters their superposition vectors to identify mutually compatible chain pairs. The multimer publication reports speedups of three to four orders of magnitude over the gold-standard multimer aligner while producing comparable alignments, and demonstrates that the method aligns billions of complex pairs within 11 hours of compute.
The Foldseek codebase is released as open source by the Steinegger Lab at steineggerlab/foldseek, and the same group operates a public web service at search.foldseek.com that the remote execution modes of this toolkit target.
Learning Resources
- steineggerlab/foldseek (Steinegger Lab, Seoul National University). Official repository and command-line interface for `easy-search`, `easy-cluster`, `easy-multimersearch`, `easy-multimercluster`, and `easy-rbh`.
- search.foldseek.com (Steinegger Lab). The public web service that the remote execution mode targets.
Tools
Foldseek Search (foldseek-search)
Aligns a single-chain query structure against one or more reference databases and returns a ranked list of structural hits. The remote execution mode submits the query to the Steinegger Lab web service and downloads the result archive. The local execution mode runs `foldseek easy-search` against a user-supplied target database.
API Reference
Input: FoldseekSearchInput
- … `Structure` object, a file path, or raw PDB/CIF content; normalised internally.
Config: FoldseekSearchConfig
- Available options: `remote`, `local`
- Available options: `3diaa`, `tmalign`, `lolalign`
- Available options: `0`, `1`, `2`, `3`
- … `True` is coerced to `1` and `False` to `0`.
- … `None` waits indefinitely.
- … `BaseToolOutput.approx_equal`), and the seed participates in cache keys. When `None`, cacheable seed-sensitive tools skip cache until seeded.
Output: FoldseekSearchOutput
- … `len(hits)`.
Applications
This tool is the structural analogue of BLAST. It is the appropriate first step for detecting distant homologues that fall below the sequence-similarity twilight zone (commonly cited as below 30 percent pairwise identity), for finding structural templates against the AlphaFold Database when no experimental structures are available for a target, and for assessing whether a designed protein recapitulates a known fold or represents a novel topology.
Usage Tips
- The remote service is the default execution mode and provides a hosted set of reference databases. Selectable databases are `pdb100`, `afdb50`, `afdb-swissprot`, `afdb-proteome`, `mgnify_esm30`, `gmgcl_id`, `BFVD`, `cath50`, and `bfmd`. The remote default queries `pdb100` and `afdb50`. Override the selection through the `databases` configuration field.
- The alignment algorithm is selected by `mode` in remote execution and by `alignment_type` in local execution. For `mode`, the default `3diaa` performs 3Di-plus-amino-acid local alignment, `tmalign` runs the global TM-align, and `lolalign` runs the LoL-aligner local alignment. The local-mode equivalent `alignment_type` takes the integer values `0` (3Di), `1` (TM-align), `2` (3Di+AA, the default), and `3` (LoL).
- Local execution requires a target database. Provide either a prebuilt Foldseek database or a directory of PDB files via the `local_db` configuration field. Foldseek constructs a temporary database from a directory of files at runtime, but a prebuilt database from `foldseek createdb` is more efficient for repeated queries.
- `sensitivity` controls the prefilter stage during local execution. Higher values recover more distant homologues at the cost of additional runtime. The wrapper default of 9.5 matches the upstream `--sensitivity` default.
- Local execution can be GPU-accelerated. Set `use_gpu=True` to run with `--gpu 1` on a compatible NVIDIA GPU host (see Toolkit Notes for requirements).
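The correspondence between the remote `mode` names and the local `alignment_type` integers can be written down as a small lookup. The sketch below is not the toolkit's implementation: `build_easy_search_argv` is a hypothetical helper, and the positional argument order (query, target, output, temporary directory) and the `-s` sensitivity flag follow the upstream `foldseek easy-search` command line as an assumption about how a local call might be assembled.

```python
from typing import List

# Mapping taken from the usage tips: remote `mode` name -> local `alignment_type`.
MODE_TO_ALIGNMENT_TYPE = {"3diaa": 2, "tmalign": 1, "lolalign": 3}


def build_easy_search_argv(
    query: str,
    target_db: str,
    out_m8: str,
    tmp_dir: str,
    mode: str = "3diaa",
    sensitivity: float = 9.5,
    use_gpu: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble a `foldseek easy-search` argument list."""
    if mode not in MODE_TO_ALIGNMENT_TYPE:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_TO_ALIGNMENT_TYPE)}"
        )
    argv = [
        "foldseek", "easy-search", query, target_db, out_m8, tmp_dir,
        "--alignment-type", str(MODE_TO_ALIGNMENT_TYPE[mode]),
        "-s", str(sensitivity),  # prefilter sensitivity
    ]
    if use_gpu:
        argv += ["--gpu", "1"]  # only meaningful on a GPU-enabled build
    return argv
```

The returned list can be handed to `subprocess.run(argv, check=True)` on a host where the `foldseek` binary is installed; nothing is executed by the helper itself.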
Foldseek Cluster (foldseek-cluster)
Groups a set of structures into clusters by 3Di structural similarity using `foldseek easy-cluster`. Inputs can be structure text (PDB or mmCIF) or amino-acid sequences (FASTA). The latter are routed through the bundled ProstT5 language model, which predicts a 3Di sequence per input before clustering proceeds.
API Reference
Input: FoldseekClusterInput
- … `structure_ids`).
- … `structure_0`, …); derived from filename stems for a directory.
Config: FoldseekClusterConfig
- Available options: `0`, `1`, `2`
- Available options: `0`, `1`, `2`, `3`
- … `resolve_weights_dir("foldseek")/prostt5/weights` on first FASTA call (honors `PROTO_FOLDSEEK_WEIGHTS_DIR` / `PROTO_MODEL_CACHE`).
- … `True` is coerced to `1` and `False` to `0`.
- … `None` waits indefinitely.
- … `BaseToolOutput.approx_equal`), and the seed participates in cache keys. When `None`, cacheable seed-sensitive tools skip cache until seeded.
Applications
This tool is appropriate for deduplicating a set of designed structures before downstream analysis, for surveying fold families across a screened library, and for partitioning a large structure collection into representative groups for further inspection. Clusters with a single member identify structurally isolated entries that share no near-neighbour in the input set.
Usage Tips
- `structures` accepts either a list or a directory path. Provide an in-memory list of structure or FASTA text strings (Structure objects and file paths are also accepted per item), or a single path to a directory of supported files, in which case filename stems become the structure identifiers.
- A single call must use one input format. Mixing FASTA inputs with PDB or mmCIF inputs is rejected by input validation. Format is auto-detected per input entry.
- `min_seq_id=0.0` is intentional and lets 3Di structural similarity dominate cluster assignment. Raising it adds a sequence-identity floor to cluster membership. Use a non-zero value only when a sequence-similarity constraint is desired alongside structural similarity.
- There is no parameter that requests an exact cluster count. Foldseek clusters by similarity threshold, not by a target count. To approximate a target number of clusters, sweep the `cov` field and select the run whose cluster count is closest to the target.
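The `cov` sweep described in the last tip reduces to counting clusters per run and choosing the nearest count. The following is a minimal sketch under stated assumptions: `count_clusters` and `closest_run` are hypothetical helpers, and the two-column representative/member TSV layout is the one upstream `easy-cluster` writes, which may differ from the fields this toolkit returns.

```python
from typing import Dict, Tuple


def count_clusters(cluster_tsv: str) -> int:
    """Count distinct representatives in a (representative<TAB>member) TSV."""
    reps = {
        line.split("\t")[0]
        for line in cluster_tsv.splitlines()
        if line.strip()
    }
    return len(reps)


def closest_run(cluster_counts: Dict[float, int], target: int) -> Tuple[float, int]:
    """Return (cov, n_clusters) for the run nearest to `target`.

    Ties are resolved in favour of the smaller `cov` value, because `min`
    keeps the first minimum it meets in the sorted key order.
    """
    if not cluster_counts:
        raise ValueError("no runs to choose from")
    cov = min(sorted(cluster_counts), key=lambda c: abs(cluster_counts[c] - target))
    return cov, cluster_counts[cov]
```

In practice each entry of `cluster_counts` would come from one clustering call at a given `cov`, so the cost of the sweep is one full clustering run per candidate value.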
Foldseek Multimer Search (foldseek-multimer-search)
Aligns a multi-chain query complex against multimer-aware reference databases using the same execution-mode pattern as `foldseek-search`. The remote service hosts the multimer endpoint, and the local execution mode runs `foldseek easy-multimersearch` against a user-supplied target database.
API Reference
Input: FoldseekMultimerSearchInput
- … `Structure` object, a file path, or raw PDB/CIF content.
Config: FoldseekMultimerSearchConfig
- … `search.foldseek.com/foldmulti`) or 'local' (runs `foldseek easy-multimersearch` locally). Available options: `remote`, `local`
- … `complex-` automatically. Available options: `3diaa`, `tmalign`, `lolalign`
- Available options: `0`, `1`, `2`, `3`
- … `True` is coerced to `1` and `False` to `0`.
- … `None` waits indefinitely.
- … `BaseToolOutput.approx_equal`), and the seed participates in cache keys. When `None`, cacheable seed-sensitive tools skip cache until seeded.
Output: FoldseekMultimerSearchOutput
- … `len(hits)`.
Applications
This tool ranks reference complexes by structural similarity to a query complex. It is appropriate for finding natural complexes that resemble a designed binder-target pose, for identifying multi-chain assemblies that share interface architecture with a query, and for mining experimentally determined complexes that match a hypothesised binding mode. Sequence-only methods cannot perform the equivalent search because chain compatibility is governed by tertiary contacts rather than sequence similarity.
Usage Tips
- The default remote database is `pdb100`. Override through the `databases` configuration field with any of the values in the database list documented under `foldseek-search`.
- The `mode` value is sent to the remote endpoint with a `complex-` prefix internally. Configure the field as plain `3diaa`, `tmalign`, or `lolalign`. The toolkit applies the multimer wire-format prefix during submission.
- Local execution requires a target database via `local_db`. As with single-chain search, either a prebuilt Foldseek database or a directory of multimer files is accepted.
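The prefixing rule for `mode` can be pictured as a one-line transformation. This is an illustrative sketch only: `to_multimer_wire_mode` is a hypothetical name, and the rejection of already-prefixed values is an assumption made here to show why the field should be configured with the plain name.

```python
ALLOWED_MODES = ("3diaa", "tmalign", "lolalign")


def to_multimer_wire_mode(mode: str) -> str:
    """Hypothetical helper: plain mode name -> prefixed multimer wire value."""
    if mode.startswith("complex-"):
        # Assumed behaviour: the prefix is added once, at submission time.
        raise ValueError("pass the plain mode name; the prefix is added on submission")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {ALLOWED_MODES}")
    return "complex-" + mode
```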
Foldseek Multimer Cluster (foldseek-multimercluster)
Groups a set of multi-chain assemblies into clusters using `foldseek easy-multimercluster`, which combines per-chain TM-score and interface lDDT into a multimer-level similarity score. Inputs are multi-chain PDB or mmCIF text.
API Reference
Input: FoldseekMultimerClusterInput
- … `structure_ids`).
- … `multimer-0`, …); derived from filename stems for a directory. No `_`.
Config: FoldseekMultimerClusterConfig
- `--multimer-tm-threshold`. Multimer-level TM-score (0-1) above which two multimers cluster together.
- `--chain-tm-threshold`. Per-chain TM-score (0-1) used to filter chain-pair alignments before assembling the multimer score.
- `--interface-lddt-threshold`. Interface lDDT (0-1) for chain-pair alignments.
- Available options: `0`, `1`, `2`, `3`
- … `True` is coerced to `1` and `False` to `0`.
- … `None` waits indefinitely.
- … `BaseToolOutput.approx_equal`), and the seed participates in cache keys. When `None`, cacheable seed-sensitive tools skip cache until seeded.
Output: FoldseekMultimerClusterOutput
- … `{multimer_id}_{chain}` suffixes per Foldseek’s chain-aware schema.
- … `len(clusters)`.
- … `#multimer_id` group separators between chains).
Applications
This tool is appropriate for partitioning a candidate set of designed complexes by overall complex geometry, for selecting structurally diverse representatives from a larger pool of binder-target poses, and for analysing the structural diversity of an experimentally determined complex collection.
Usage Tips
- Structure identifiers must not contain an underscore. Foldseek emits cluster member identifiers as `{multimer_id}_{chain}`, so an underscore in the multimer identifier would silently corrupt downstream parsing. Both user-supplied and filename-derived identifiers are validated and rejected if they contain an underscore.
- Three thresholds control cluster membership. `multimer_tm_threshold` (default `0.65`) sets the multimer-level TM-score required for inclusion. `chain_tm_threshold` (default `0.001`) governs the per-chain TM-score required during chain-pair filtering. `interface_lddt_threshold` (default `0.5`) sets the interface quality required for a chain-pair alignment to contribute to the multimer score.
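The underscore restriction is easiest to see with a concrete parse. The sketch below assumes a parser that splits a member identifier at its first underscore; both function names are hypothetical and the toolkit's actual validation and parsing code may differ.

```python
from typing import Tuple


def validate_multimer_id(multimer_id: str) -> str:
    """Reject identifiers that would make `{multimer_id}_{chain}` ambiguous."""
    if "_" in multimer_id:
        raise ValueError(f"multimer identifier {multimer_id!r} must not contain '_'")
    return multimer_id


def split_member(member: str) -> Tuple[str, str]:
    """Split `{multimer_id}_{chain}` at the first underscore (assumed rule)."""
    multimer_id, sep, chain = member.partition("_")
    if not sep or not chain:
        raise ValueError(f"{member!r} is not of the form '<multimer_id>_<chain>'")
    return multimer_id, chain
```

With a clean identifier, `split_member("multimer-0_A")` yields `("multimer-0", "A")`. With an identifier such as `my_complex`, the same call on `my_complex_A` yields `("my", "complex_A")` without raising, which is the silent corruption the validation step is meant to prevent.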
Foldseek Reciprocal Best Hits (foldseek-rbh)
Performs a reciprocal-best-hits structural search between a single-chain query and a target database using `foldseek easy-rbh`. Only mutual best matches are returned, in contrast to the all-hit output of `foldseek-search`.
API Reference
Input: FoldseekRBHInput
- … `Structure` object, a file path, or raw PDB/CIF content.
Config: FoldseekRBHConfig
- … `/data/pdb100`) or a directory of PDB files (Foldseek auto-builds a temporary DB). Required.
- … `setStructureRbhDefaults` (which, unlike the search workflow, does not bump sensitivity to 9.5).
- Available options: `0`, `1`, `2`, `3`
- … `cov` is measured: 0=bidirectional, 1=target-only, 2=query-only. Available options: `0`, `1`, `2`
- … `True` is coerced to `1` and `False` to `0`.
- … `None` waits indefinitely.
- … `BaseToolOutput.approx_equal`), and the seed participates in cache keys. When `None`, cacheable seed-sensitive tools skip cache until seeded.
Applications
This tool produces conservative one-to-one structural correspondences. It is appropriate for structural orthology calls between species, for mapping designed proteins to their closest natural counterpart in a curated reference set, and for any analysis in which the absence of a reciprocal best match should be interpreted as no confident correspondence.
Usage Tips
- This tool runs only in local execution mode. No remote endpoint exists for reciprocal best hits, and a `local_db` value pointing at a prebuilt database or a directory of PDB files is required.
- The output is sparse by construction. Most queries return zero or one hit, and the absence of a reciprocal best match indicates that no target in the database satisfies the reciprocity criterion.
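The reciprocity criterion itself is simple to state in code. The sketch below illustrates the general reciprocal-best-hits idea on precomputed score tables; it is not how `foldseek easy-rbh` is implemented, and the nested-dictionary score layout and function names are assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

ScoreTable = Dict[str, Dict[str, float]]  # source id -> {partner id: bit score}


def best_partner(scores: ScoreTable, key: str) -> Optional[str]:
    """Return the highest-scoring partner of `key`, or None if it has no hits."""
    hits = scores.get(key)
    if not hits:
        return None
    # Iterating in sorted order makes score ties deterministic.
    return max(sorted(hits), key=hits.get)


def reciprocal_best_hits(forward: ScoreTable, reverse: ScoreTable) -> List[Tuple[str, str]]:
    """Keep (query, target) only when each is the other's best hit."""
    pairs = []
    for query in sorted(forward):
        target = best_partner(forward, query)
        if target is not None and best_partner(reverse, target) == query:
            pairs.append((query, target))
    return pairs
```

A query whose best target prefers a different query drops out entirely, which is why the output is sparse and why a missing pair should be read as no confident correspondence.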
Toolkit Notes
These apply to every Foldseek tool in this toolkit (foldseek-search, foldseek-cluster, foldseek-multimer-search, foldseek-multimercluster, foldseek-rbh).
- Local memory consumption scales linearly with database size. The upstream documentation gives a per-residue cost of `(6 + 1 + 1) bytes × num_residues` for Cα coordinates, 3Di letters, and amino-acid letters, and reports that the 54 million entries in AFDB50 require approximately 151 GB of RAM under default settings.
- Hits use a 12-column M8 tabular schema with `sequence_identity` normalised to the range 0 to 1. Filtering structural hits by sequence identity defeats the purpose of structural search, since distant homologues commonly share fold without sharing sequence. `evalue` and `bit_score` are the appropriate ranking criteria.
- Accepted input formats differ by tool. `foldseek-search`, `foldseek-multimer-search`, and `foldseek-rbh` currently accept only raw PDB text, `foldseek-cluster` accepts PDB, mmCIF, or FASTA, and `foldseek-multimercluster` accepts PDB or mmCIF.
- Local execution requires a user-supplied target. Either a prebuilt Foldseek database or a directory of structure files must be provided through the `local_db` field. No reference database is bundled with the toolkit.
- A directory passed to `structures` caches by file content, not directory path. Modifying files in place between calls correctly invalidates the cache, so structure-set updates do not produce stale results.
- Local search can use an NVIDIA GPU. Set `use_gpu=True` on any local-mode tool; the GPU build auto-installs on Linux x86_64 hosts with a compatible NVIDIA driver (>= 525.60.13).
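Two of the notes above lend themselves to a short sketch: the per-residue memory formula and ranking M8 hits by `evalue` and `bit_score`. Everything here is illustrative. The column order follows the conventional BLAST-style M8 layout, only `sequence_identity`, `evalue`, and `bit_score` are names taken from this page, and the remaining column names and both helper functions are hypothetical.

```python
from typing import Dict, List, Union

BYTES_PER_RESIDUE = 6 + 1 + 1  # Cα coordinates + 3Di letter + amino-acid letter

# Assumed 12-column order; names other than sequence_identity, evalue and
# bit_score are placeholders chosen for this example.
M8_COLUMNS = (
    "query", "target", "sequence_identity", "alignment_length", "mismatches",
    "gap_openings", "q_start", "q_end", "t_start", "t_end", "evalue", "bit_score",
)
FLOAT_COLUMNS = {"sequence_identity", "evalue", "bit_score"}
TEXT_COLUMNS = {"query", "target"}

Hit = Dict[str, Union[str, int, float]]


def estimate_db_ram_bytes(num_residues: int) -> int:
    """Apply the per-residue formula quoted from the upstream documentation."""
    return BYTES_PER_RESIDUE * num_residues


def parse_m8(text: str) -> List[Hit]:
    """Parse tab-separated M8 rows into dictionaries with typed values."""
    hits: List[Hit] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != len(M8_COLUMNS):
            raise ValueError(f"expected {len(M8_COLUMNS)} columns, got {len(fields)}")
        hit: Hit = {}
        for name, value in zip(M8_COLUMNS, fields):
            if name in TEXT_COLUMNS:
                hit[name] = value
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = int(value)
        hits.append(hit)
    return hits


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Order by ascending e-value, breaking ties by descending bit score."""
    return sorted(hits, key=lambda h: (h["evalue"], -h["bit_score"]))
```

Ranking deliberately ignores `sequence_identity`, in line with the note that filtering structural hits by sequence identity discards the distant homologues the search is meant to find.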
