Foldseek - Proto

License: Foldseek has a GPL-3.0 license. Please refer to the license for full terms.

Proto is not affiliated with the Steinegger Lab. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.

steineggerlab/foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.

Fast and accurate protein structure search with Foldseek

Michel van Kempen, Stephanie S. Kim, … Martin Steinegger

Nature Biotechnology (2024)

@article{vanKempen2023foldseek,
  title={Fast and accurate protein structure search with {F}oldseek},
  author={van Kempen, Michel and Kim, Stephanie S. and Tumescheit, Charlotte and Mirdita, Milot and Lee, Jeongjae and Gilchrist, Cameron L. M. and S{\"o}ding, Johannes and Steinegger, Martin},
  journal={Nature Biotechnology},
  volume={42},
  pages={243--246},
  year={2024},
  publisher={Nature Publishing Group},
  doi={10.1038/s41587-023-01773-0}
}

@article{kim2025foldseekmultimer,
  title={Rapid and sensitive protein complex alignment with {F}oldseek-{M}ultimer},
  author={Kim, Woosub and Mirdita, Milot and Levy Karin, Eli and Gilchrist, Cameron L. M. and Schweke, Hugo and S{\"o}ding, Johannes and Levy, Emmanuel D. and Steinegger, Martin},
  journal={Nature Methods},
  volume={22},
  pages={469--472},
  year={2025},
  publisher={Nature Publishing Group},
  doi={10.1038/s41592-025-02593-7}
}

proto-bio/proto-tools/proto_tools/tools/structure_alignment/foldseek

Function	Description
`run_foldseek_cluster()`	Cluster a set of protein structures by structural similarity using Foldseek easy-cluster
`run_foldseek_multimer_search()`	Search Foldseek multimer (complex) structural homology: remote (server) or local (CLI)
`run_foldseek_multimercluster()`	Cluster a set of protein complexes by multimer-level structural similarity using Foldseek easy-mu…
`run_foldseek_rbh()`	Find reciprocal best-hit structural alignments between a query and a target DB using Foldseek eas…
`run_foldseek_search()`	Search Foldseek structural homology against PDB100/AlphaFold DB (remote) or a local DB (local)

Background

Foldseek (van Kempen et al., 2024) performs structural homology search, identifying distant evolutionary relatives of a query protein by structural similarity rather than sequence similarity. Each residue of a protein structure is represented as a discrete letter over a learned structural alphabet (the 3Di alphabet) that captures the tertiary interactions between that residue and its spatial neighbours. Pairs of structures are then aligned by running MMseqs2-style sensitive sequence alignment over the 3Di strings together with the underlying amino-acid sequences. The original publication reports that this approach decreases computation times by four to five orders of magnitude relative to the established structural aligners Dali, TM-align, and CE. Foldseek can also accept amino-acid sequences directly, in which case the bundled ProstT5 language model predicts a 3Di sequence before alignment.

Foldseek-Multimer (Kim et al., 2025) extends the same machinery to multi-chain complexes. It computes pairwise chain-to-chain alignments and then clusters their superposition vectors to identify mutually compatible chain pairs. The multimer publication reports speedups of three to four orders of magnitude over the gold-standard multimer aligner while producing comparable alignments, and demonstrates that the method aligns billions of complex pairs within 11 hours of compute.

The Foldseek codebase is released as open source by the Steinegger Lab at steineggerlab/foldseek. The same group operates a public web service at search.foldseek.com, which is the target of this toolkit's remote execution modes.

Learning Resources

steineggerlab/foldseek (Steinegger Lab, Seoul National University). Official repository and command-line interface for easy-search, easy-cluster, easy-multimersearch, easy-multimercluster, and easy-rbh.
search.foldseek.com (Steinegger Lab). The public web service that the remote execution mode targets.

Tools

Foldseek Search (`foldseek-search`)

Aligns a single-chain query structure against one or more reference databases and returns a ranked list of structural hits. The remote execution mode submits the query to the Steinegger Lab web service and downloads the result archive. The local execution mode runs foldseek easy-search against a user-supplied target database.

API Reference

Input: FoldseekSearchInput

structure

Structure

required

Query structure. Accepts a Structure object, a file path, or raw PDB/CIF content; normalised internally.

Show Structure

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: FoldseekSearchConfig

search_mode

enum

default:"remote"

‘remote’ (default) hits the public Foldseek server; ‘local’ runs the Foldseek CLI against a local DB. Available options: remote, local

databases

List[string]

Remote-only — server-hosted databases to search.

mode

enum

default:"3diaa"

Remote-only: alignment mode. ‘3diaa’ (default) is fast local 3Di+AA alignment; ‘tmalign’ is global TM-align; ‘lolalign’ is local LoL alignment. Available options: 3diaa, tmalign, lolalign

poll_interval_seconds

number

default:"5.0"

Remote-only — delay between status polls.

timeout_seconds

number

default:"600.0"

Remote-only — max wall-clock time for the search.

local_db

string

Local-only (required) — path to a local Foldseek DB.

evalue

number

default:"10.0"

Local-only — E-value cutoff (lower = stricter).

sensitivity

number

default:"9.5"

Local-only — prefilter sensitivity (1.0-9.5; higher = slower + more sensitive).

max_seqs

integer

default:"1000"

Local-only — max prefilter targets per query.

alignment_type

enum

default:"2"

Local-only: alignment scoring method (0=3Di, 1=TMalign, 2=3Di+AA, 3=LoL). Available options: 0, 1, 2, 3

tmscore_threshold

number

default:"0.0"

Local-only — keep alignments with TM-score above this (0-1). 0.0 keeps all.

lddt_threshold

number

default:"0.0"

Local-only — keep alignments with LDDT above this (0-1). 0.0 keeps all.

num_threads

integer

default:"4"

Local-only — CPU threads.

use_gpu

boolean

default:"False"

Local-only: run with --gpu 1 on a Linux x86_64 NVIDIA GPU host.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: FoldseekSearchOutput

ticket_id

string

required

Remote job ticket ID (re-fetchable for ~24h); empty in local mode.

hits

List[FoldseekHit]

All alignment hits across the queried databases, in the order Foldseek returned them.

Show FoldseekHit

database

string

required

Source database the hit came from (e.g. ‘pdb100’).

target_id

string

required

Database-specific target identifier (e.g. ‘1tup_A’, ‘AF-P04637-F1’).

sequence_identity

number

required

Sequence identity over the aligned region, as a fraction in [0, 1].

alignment_length

integer

required

Length of the aligned region in residues.

mismatches

integer

required

Number of mismatched columns.

gap_openings

integer

required

Number of gap-opening events.

query_start

integer

required

1-indexed start position in the query.

query_end

integer

required

1-indexed end position in the query.

target_start

integer

required

1-indexed start position in the target.

target_end

integer

required

1-indexed end position in the target.

evalue

number

required

Expectation value.

bit_score

number

required

Bit score.

num_hits

integer

required

len(hits).

databases_queried

List[string]

required

Databases included in this search; in local mode contains the single local DB path.

result_url

string

required

Remote result-archive URL; empty in local mode.
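The FoldseekHit fields above line up with the columns of a 12-column M8 row once the leading query identifier is set aside. The sketch below shows one way such a row could be parsed and ranked. The Hit dataclass and parse_m8_line are hypothetical stand-ins rather than the toolkit's classes, the sample rows are invented values, and the identity column is assumed to arrive already as a fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Illustrative mirror of the FoldseekHit fields; not the toolkit's class."""
    database: str
    target_id: str
    sequence_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    target_start: int
    target_end: int
    evalue: float
    bit_score: float


def parse_m8_line(line: str, database: str) -> Hit:
    """Parse one tab-separated 12-column M8 row into a Hit."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}")
    # cols[0] is the query ID, which is not one of the hit fields.
    return Hit(
        database=database,
        target_id=cols[1],
        sequence_identity=float(cols[2]),  # assumed to be a fraction in [0, 1]
        alignment_length=int(cols[3]),
        mismatches=int(cols[4]),
        gap_openings=int(cols[5]),
        query_start=int(cols[6]),
        query_end=int(cols[7]),
        target_start=int(cols[8]),
        target_end=int(cols[9]),
        evalue=float(cols[10]),
        bit_score=float(cols[11]),
    )


# Invented rows, for illustration only.
rows = [
    "query\tAF-P04637-F1\t0.210\t150\t110\t4\t3\t152\t10\t160\t3.0e-05\t90",
    "query\t1tup_A\t0.350\t180\t117\t3\t1\t180\t5\t186\t1.2e-15\t250",
]
hits = [parse_m8_line(r, database="pdb100") for r in rows]
# Rank by E-value (ascending), breaking ties by bit score (descending).
ranked = sorted(hits, key=lambda h: (h.evalue, -h.bit_score))
```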

Applications

This tool is the structural analogue of BLAST. It is the appropriate first step for detecting distant homologues that fall below the sequence-similarity twilight zone (commonly cited as below 30 percent pairwise identity), for finding structural templates against the AlphaFold Database when no experimental structures are available for a target, and for assessing whether a designed protein recapitulates a known fold or represents a novel topology.

Usage Tips

The remote service is the default execution mode and provides a hosted set of reference databases. Selectable databases are pdb100, afdb50, afdb-swissprot, afdb-proteome, mgnify_esm30, gmgcl_id, BFVD, cath50, and bfmd. The remote default queries pdb100 and afdb50. Override the selection through the databases configuration field.
The alignment algorithm is selected by mode in remote execution and by alignment_type in local execution. For mode, the default 3diaa performs 3Di-plus-amino-acid local alignment, tmalign runs the global TM-align, and lolalign runs the LoL-aligner local alignment. The local-mode equivalent alignment_type takes the integer values 0 (3Di), 1 (TM-align), 2 (3Di+AA, the default), and 3 (LoL).
Local execution requires a target database. Provide either a prebuilt Foldseek database or a directory of PDB files via the local_db configuration field. Foldseek constructs a temporary database from a directory of files at runtime, but a prebuilt database from foldseek createdb is more efficient for repeated queries.
sensitivity controls the prefilter stage during local execution. Higher values recover more distant homologues at the cost of additional runtime. The wrapper default of 9.5 matches the upstream --sensitivity default.
Local execution can be GPU-accelerated. Set use_gpu=True to run with --gpu 1 on a compatible NVIDIA GPU host (see Toolkit Notes for requirements).
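The correspondence between the remote mode names and the local alignment_type integers, and the way the local-only fields could translate into a foldseek easy-search command line, can be sketched as follows. This is an illustration under stated assumptions: build_easy_search_argv is a hypothetical helper, and the page does not show how the wrapper actually assembles its command. The option names used (-e, -s, --max-seqs, --alignment-type, --threads, --gpu) are Foldseek CLI options.

```python
from typing import Dict, List

# Remote `mode` names and their local `alignment_type` equivalents, per the tips above.
MODE_TO_ALIGNMENT_TYPE: Dict[str, int] = {"3diaa": 2, "tmalign": 1, "lolalign": 3}


def build_easy_search_argv(
    query_path: str,
    local_db: str,
    result_path: str,
    tmp_dir: str,
    *,
    evalue: float = 10.0,
    sensitivity: float = 9.5,
    max_seqs: int = 1000,
    alignment_type: int = 2,
    num_threads: int = 4,
    use_gpu: bool = False,
) -> List[str]:
    """Assemble an argv list for `foldseek easy-search` (hypothetical helper)."""
    argv = [
        "foldseek", "easy-search", query_path, local_db, result_path, tmp_dir,
        "-e", str(evalue),
        "-s", str(sensitivity),
        "--max-seqs", str(max_seqs),
        "--alignment-type", str(alignment_type),
        "--threads", str(num_threads),
    ]
    if use_gpu:
        argv += ["--gpu", "1"]
    return argv


# Example: the local equivalent of the remote "tmalign" mode, with GPU enabled.
argv = build_easy_search_argv(
    "query.pdb", "/data/pdb100", "result.m8", "tmp",
    alignment_type=MODE_TO_ALIGNMENT_TYPE["tmalign"],
    use_gpu=True,
)
```

The list could be passed to subprocess.run on a host where the foldseek binary is installed; the paths shown are placeholders.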

Foldseek Cluster (`foldseek-cluster`)

Groups a set of structures into clusters by 3Di structural similarity using foldseek easy-cluster. Inputs can be structure text (PDB or mmCIF) or amino-acid sequences (FASTA). The latter are routed through the bundled ProstT5 language model, which predicts a 3Di sequence per input before clustering proceeds.

API Reference

Input: FoldseekClusterInput

structures

List[Structure | string] | string | Path

Items to cluster (≥2) — a list of Structure objects / file paths / PDB·mmCIF·FASTA text, or a directory path (filename stems become structure_ids).

structure_ids

array

Optional IDs for the list form (default structure_0, …); derived from filename stems for a directory.

Config: FoldseekClusterConfig

min_seq_id

number

default:"0.0"

Sequence-identity threshold (0-1). Default 0.0 because Foldseek clusters by 3Di structural similarity, not seq id.

cov

number

default:"0.8"

Coverage threshold (0-1) for the alignment.

cov_mode

enum

default:"0"

Foldseek coverage mode (0: bidirectional, 1: target-only, 2: query-only). Available options: 0, 1, 2

evalue

number

default:"0.01"

E-value cutoff for cluster-membership alignments (lower = stricter; default 0.01 matches the foldseek cluster workflow’s runtime default).

alignment_type

enum

default:"2"

Alignment scoring method (0=3Di, 1=TMalign, 2=3Di+AA, 3=LoL). Available options: 0, 1, 2, 3

tmscore_threshold

number

default:"0.0"

Keep cluster-membership alignments with TM-score above this (0-1). 0.0 keeps all.

lddt_threshold

number

default:"0.0"

Keep cluster-membership alignments with LDDT above this (0-1). 0.0 keeps all.

prostt5_weights_dir

string

Path to ProstT5 model weights for FASTA inputs. If None, weights are auto-provisioned under resolve_weights_dir("foldseek")/prostt5/weights on first FASTA call (honors PROTO_FOLDSEEK_WEIGHTS_DIR / PROTO_MODEL_CACHE).

num_threads

integer

default:"4"

CPU threads.

use_gpu

boolean

default:"False"

Run with --gpu 1 on a Linux x86_64 NVIDIA GPU host (driver >= 525.60.13).

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Output: FoldseekClusterOutput

clusters

List[FoldseekCluster]

One entry per cluster, each holding a representative and its members.

Show FoldseekCluster

representative_id

string

required

ID of the cluster representative.

member_ids

List[string]

required

IDs of all members (includes the representative).

num_clusters

integer

required

len(clusters).

num_structures

integer

required

Total number of input structures clustered.

Applications

This tool is appropriate for deduplicating a set of designed structures before downstream analysis, for surveying fold families across a screened library, and for partitioning a large structure collection into representative groups for further inspection. Clusters with a single member identify structurally isolated entries that share no near-neighbour in the input set.
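Both deduplication and the detection of structurally isolated entries reduce to simple operations over the cluster list. The sketch below uses a hypothetical Cluster dataclass that mirrors the representative_id and member_ids fields of FoldseekCluster; the identifiers are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Illustrative mirror of FoldseekCluster; not the toolkit's class."""
    representative_id: str
    member_ids: List[str]  # includes the representative itself


def representatives(clusters: List[Cluster]) -> List[str]:
    """One ID per cluster: a deduplicated view of the input set."""
    return [c.representative_id for c in clusters]


def singleton_ids(clusters: List[Cluster]) -> List[str]:
    """IDs of entries that have no near-neighbour in the input set."""
    return [c.representative_id for c in clusters if len(c.member_ids) == 1]


# Invented clustering result, for illustration only.
clusters = [
    Cluster("design_0", ["design_0", "design_3", "design_4"]),
    Cluster("design_1", ["design_1"]),
    Cluster("design_2", ["design_2", "design_5"]),
]
dedup = representatives(clusters)
isolated = singleton_ids(clusters)
```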

Usage Tips

structures accepts either a list or a directory path. Provide an in-memory list of structure or FASTA text strings (Structure objects and file paths are also accepted per item), or a single path to a directory of supported files, in which case filename stems become the structure identifiers.
A single call must use one input format. Mixing FASTA inputs with PDB or mmCIF inputs is rejected by input validation. Format is auto-detected per input entry.
min_seq_id=0.0 is intentional and lets 3Di structural similarity dominate cluster assignment. Raising it adds a sequence-identity floor to cluster membership. Use a non-zero value only when a sequence-similarity constraint is desired alongside structural similarity.
There is no parameter that requests an exact cluster count. Foldseek clusters by similarity threshold, not by a target count. To approximate a target number of clusters, sweep the cov field and select the run whose cluster count is closest to the target.
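The cov sweep described in the last tip can be sketched as a small search over candidate values. The helper below is hypothetical: count_clusters stands for any callable that runs a clustering at a given cov and returns the resulting cluster count (for example, a thin wrapper that reads num_clusters from the tool output), and the counts in the usage example are invented.

```python
from typing import Callable, Iterable, Tuple


def sweep_cov_for_target(
    count_clusters: Callable[[float], int],
    cov_values: Iterable[float],
    target: int,
) -> Tuple[float, int]:
    """Return the (cov, cluster_count) pair whose count is closest to target.

    Ties are resolved in favour of the earlier cov value.
    """
    best = None
    for cov in cov_values:
        n = count_clusters(cov)
        if best is None or abs(n - target) < abs(best[1] - target):
            best = (cov, n)
    if best is None:
        raise ValueError("cov_values must not be empty")
    return best


# Stand-in for real clustering runs: invented cluster counts per cov value.
fake_counts = {0.5: 3, 0.7: 5, 0.8: 8, 0.9: 12}
best_cov, best_count = sweep_cov_for_target(
    fake_counts.__getitem__, sorted(fake_counts), target=6
)
```

Each candidate value costs one full clustering run, so a coarse grid followed by a finer one around the best value keeps the sweep affordable.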

Foldseek Multimer Search (`foldseek-multimer-search`)

Aligns a multi-chain query complex against multimer-aware reference databases using the same execution-mode pattern as foldseek-search. The remote service hosts the multimer endpoint, and the local execution mode runs foldseek easy-multimersearch against a user-supplied target database.

API Reference

Input: FoldseekMultimerSearchInput

structure

Structure

required

Multi-chain query complex. Accepts a Structure object, a file path, or raw PDB/CIF content.

Show Structure

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: FoldseekMultimerSearchConfig

search_mode

enum

default:"remote"

‘remote’ (default; hits the public Foldseek-Multimer endpoint at search.foldseek.com/foldmulti) or ‘local’ (runs foldseek easy-multimersearch locally). Available options: remote, local

databases

List[string]

Remote-only — server-hosted multimer-aware databases. Default [‘pdb100’].

mode

enum

default:"3diaa"

Remote-only: alignment mode. On the wire, the mode is automatically prefixed with complex-. Available options: 3diaa, tmalign, lolalign

poll_interval_seconds

number

default:"5.0"

Remote-only — delay between status polls.

timeout_seconds

number

default:"600.0"

Remote-only — max wall-clock time.

local_db

string

Local-only (required) — path to a local multimer-aware Foldseek DB.

evalue

number

default:"10.0"

Local-only — E-value cutoff (lower = stricter).

sensitivity

number

default:"4.0"

Local-only — prefilter sensitivity (1.0-9.5; higher = slower + more sensitive).

max_seqs

integer

default:"300"

Local-only — max prefilter targets per query.

alignment_type

enum

default:"2"

Local-only: alignment scoring method (0=3Di, 1=TMalign, 2=3Di+AA, 3=LoL). Available options: 0, 1, 2, 3

tmscore_threshold

number

default:"0.0"

Local-only — keep alignments with TM-score above this (0-1). 0.0 keeps all.

lddt_threshold

number

default:"0.0"

Local-only — keep alignments with LDDT above this (0-1). 0.0 keeps all.

num_threads

integer

default:"4"

Local-only — CPU threads.

use_gpu

boolean

default:"False"

Local-only: run with --gpu 1 on a Linux x86_64 NVIDIA GPU host.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Output: FoldseekMultimerSearchOutput

ticket_id

string

required

Remote job ticket ID; empty in local mode.

hits

List[FoldseekHit]

Multimer alignment hits.

Show FoldseekHit

database

string

required

Source database the hit came from (e.g. ‘pdb100’).

target_id

string

required

Database-specific target identifier (e.g. ‘1tup_A’, ‘AF-P04637-F1’).

sequence_identity

number

required

Sequence identity over the aligned region, as a fraction in [0, 1].

alignment_length

integer

required

Length of the aligned region in residues.

mismatches

integer

required

Number of mismatched columns.

gap_openings

integer

required

Number of gap-opening events.

query_start

integer

required

1-indexed start position in the query.

query_end

integer

required

1-indexed end position in the query.

target_start

integer

required

1-indexed start position in the target.

target_end

integer

required

1-indexed end position in the target.

evalue

number

required

Expectation value.

bit_score

number

required

Bit score.

num_hits

integer

required

len(hits).

databases_queried

List[string]

required

Databases included in this search; in local mode contains the single local DB path.

result_url

string

required

Remote result-archive URL; empty in local mode.

Applications

This tool ranks reference complexes by structural similarity to a query complex. It is appropriate for finding natural complexes that resemble a designed binder-target pose, for identifying multi-chain assemblies that share interface architecture with a query, and for mining experimentally determined complexes that match a hypothesised binding mode. Sequence-only methods cannot perform the equivalent search because chain compatibility is governed by tertiary contacts rather than sequence similarity.

Usage Tips

The default remote database is pdb100. Override through the databases configuration field with any of the values in the database list documented under foldseek-search.
The mode value is sent to the remote endpoint with a complex- prefix internally. Configure the field as plain 3diaa, tmalign, or lolalign. The toolkit applies the multimer wire-format prefix during submission.
Local execution requires a target database via local_db. As with single-chain search, either a prebuilt Foldseek database or a directory of multimer files is accepted.
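The mode prefixing described above amounts to a one-line transformation. The function below is a hypothetical illustration of that step, not the toolkit's code; it also shows why the field should be configured with the plain name, since an already-prefixed value is rejected here.

```python
VALID_MODES = ("3diaa", "tmalign", "lolalign")


def to_wire_mode(mode: str) -> str:
    """Map a plain mode name to the multimer wire format by adding 'complex-'."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"complex-{mode}"


wire = to_wire_mode("3diaa")
```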

Foldseek Multimer Cluster (`foldseek-multimercluster`)

Groups a set of multi-chain assemblies into clusters using foldseek easy-multimercluster, which combines per-chain TM-score and interface lDDT into a multimer-level similarity score. Inputs are multi-chain PDB or mmCIF text.

API Reference

Input: FoldseekMultimerClusterInput

structures

List[Structure | string] | string | Path

Multi-chain items to cluster (≥2) — a list of Structure objects / file paths / PDB·mmCIF text, or a directory path (filename stems become structure_ids).

structure_ids

array

Optional IDs for the list form (default multimer-0, …); derived from filename stems for a directory. IDs must not contain an underscore.

Config: FoldseekMultimerClusterConfig

multimer_tm_threshold

number

default:"0.65"

Maps to --multimer-tm-threshold. Multimer-level TM-score (0-1) above which two multimers cluster together.

chain_tm_threshold

number

default:"0.001"

Maps to --chain-tm-threshold. Per-chain TM-score (0-1) used to filter chain-pair alignments before assembling the multimer score.

interface_lddt_threshold

number

default:"0.5"

Maps to --interface-lddt-threshold. Interface lDDT (0-1) for chain-pair alignments.

alignment_type

enum

default:"2"

Alignment scoring method (0=3Di, 1=TMalign, 2=3Di+AA, 3=LoL). Available options: 0, 1, 2, 3

tmscore_threshold

number

default:"0.0"

Keep chain-pair alignments with TM-score above this (0-1). 0.0 keeps all.

lddt_threshold

number

default:"0.0"

Keep chain-pair alignments with LDDT above this (0-1). 0.0 keeps all.

num_threads

integer

default:"4"

CPU threads.

use_gpu

boolean

default:"False"

Run with --gpu 1 on a Linux x86_64 NVIDIA GPU host (driver >= 525.60.13).

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Output: FoldseekMultimerClusterOutput

clusters

List[FoldseekCluster]

One entry per cluster, each holding a representative multimer and its members. Member IDs may include {multimer_id}_{chain} suffixes per Foldseek’s chain-aware schema.

Show FoldseekCluster

representative_id

string

required

ID of the cluster representative.

member_ids

List[string]

required

IDs of all members (includes the representative).

num_clusters

integer

required

len(clusters).

num_multimers

integer

required

Total number of input multimers clustered.

rep_seq_fasta

string

required

Representative-multimer FASTA produced by Foldseek (with #multimer_id group separators between chains).

Applications

This tool is appropriate for partitioning a candidate set of designed complexes by overall complex geometry, for selecting structurally diverse representatives from a larger pool of binder-target poses, and for analysing the structural diversity of an experimentally determined complex collection.

Usage Tips

Structure identifiers must not contain an underscore. Foldseek emits cluster member identifiers as {multimer_id}_{chain}, so an underscore in the multimer identifier would silently corrupt downstream parsing. Both user-supplied and filename-derived identifiers are validated and rejected if they contain an underscore.
Three thresholds control cluster membership. multimer_tm_threshold (default 0.65) sets the multimer-level TM-score required for inclusion. chain_tm_threshold (default 0.001) governs the per-chain TM-score required during chain-pair filtering. interface_lddt_threshold (default 0.5) sets the interface quality required for a chain-pair alignment to contribute to the multimer score.
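The underscore rule exists because a member identifier of the form {multimer_id}_{chain} can only be split unambiguously when the multimer identifier itself contains no underscore. The sketch below illustrates both sides of that contract. Both functions are hypothetical and are not the toolkit's validators.

```python
from typing import Optional, Tuple


def validate_multimer_id(multimer_id: str) -> str:
    """Reject identifiers that would make '{multimer_id}_{chain}' ambiguous."""
    if "_" in multimer_id:
        raise ValueError(f"multimer ID {multimer_id!r} must not contain '_'")
    return multimer_id


def split_member_id(member_id: str) -> Tuple[str, Optional[str]]:
    """Split '{multimer_id}_{chain}' at the first underscore.

    Returns (multimer_id, None) when no chain suffix is present.
    """
    multimer_id, sep, chain = member_id.partition("_")
    return (multimer_id, chain if sep else None)


with_chain = split_member_id("multimer-0_A")
without_chain = split_member_id("multimer-0")
```

Splitting at the first underscore is safe only because validation guarantees that the multimer identifier contains none.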

Foldseek Reciprocal Best Hits (`foldseek-rbh`)

Performs a reciprocal-best-hits structural search between a single-chain query and a target database using foldseek easy-rbh. Only mutual best matches are returned, in contrast to the all-hit output of foldseek-search.

API Reference

Input: FoldseekRBHInput

structure

Structure

required

Single-chain query structure. Accepts a Structure object, a file path, or raw PDB/CIF content.

Show Structure

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: FoldseekRBHConfig

local_db

string

Path to the target — either a prebuilt Foldseek DB (e.g. /data/pdb100) or a directory of PDB files (Foldseek auto-builds a temporary DB). Required.

evalue

number

default:"10.0"

E-value cutoff (lower = stricter).

sensitivity

number

default:"4.0"

Prefilter sensitivity (1.0-9.5; higher = slower + more sensitive). Default 4.0 matches foldseek’s setStructureRbhDefaults (which, unlike the search workflow, does not bump sensitivity to 9.5).

max_seqs

integer

default:"1000"

Max prefilter targets per query.

alignment_type

enum

default:"2"

Alignment scoring method (0=3Di, 1=TMalign, 2=3Di+AA, 3=LoL). Note: foldseek’s RBH workflow only branches on TMalign (1) and 3Di+AA (2); 0 falls through to the same alignment branch as 2. Available options: 0, 1, 2, 3

cov

number

default:"0.0"

Minimum aligned-residue coverage for an RBH pair (0-1). 0.0 keeps all.

cov_mode

enum

default:"0"

How cov is measured: 0=bidirectional, 1=target-only, 2=query-only. Available options: 0, 1, 2

tmscore_threshold

number

default:"0.0"

Keep RBH pairs with TM-score above this (0-1). 0.0 keeps all.

lddt_threshold

number

default:"0.0"

Keep RBH pairs with LDDT above this (0-1). 0.0 keeps all.

num_threads

integer

default:"4"

CPU threads.

use_gpu

boolean

default:"False"

Run with --gpu 1 on a Linux x86_64 NVIDIA GPU host (driver >= 525.60.13).

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Output: FoldseekRBHOutput

hits

List[FoldseekHit]

Mutual best-hit alignments. Each hit is a standard 12-column M8 row, identical schema to foldseek-search.

Show FoldseekHit

database

string

required

Source database the hit came from (e.g. ‘pdb100’).

target_id

string

required

Database-specific target identifier (e.g. ‘1tup_A’, ‘AF-P04637-F1’).

sequence_identity

number

required

Sequence identity over the aligned region, as a fraction in [0, 1].

alignment_length

integer

required

Length of the aligned region in residues.

mismatches

integer

required

Number of mismatched columns.

gap_openings

integer

required

Number of gap-opening events.

query_start

integer

required

1-indexed start position in the query.

query_end

integer

required

1-indexed end position in the query.

target_start

integer

required

1-indexed start position in the target.

target_end

integer

required

1-indexed end position in the target.

evalue

number

required

Expectation value.

bit_score

number

required

Bit score.

num_hits

integer

required

len(hits).

target_db

string

required

The target DB path that was queried.

Applications

This tool produces conservative one-to-one structural correspondences. It is appropriate for structural orthology calls between species, for mapping designed proteins to their closest natural counterpart in a curated reference set, and for any analysis in which the absence of a reciprocal best match should be interpreted as no confident correspondence.

Usage Tips

This tool runs only in local execution mode. No remote endpoint exists for reciprocal best hits, and a local_db value pointing at a prebuilt database or a directory of PDB files is required.
The output is sparse by construction. Most queries return zero or one hit, and the absence of a reciprocal best match indicates that no target in the database satisfies the reciprocity criterion.
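The reciprocity criterion itself can be illustrated independently of Foldseek. The sketch below computes reciprocal best hits from two toy score tables, one per search direction. It is a conceptual illustration with invented scores and does not reproduce the internals of foldseek easy-rbh.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]


def reciprocal_best_hits(forward: Scores, reverse: Scores) -> List[Tuple[str, str]]:
    """Return (query, target) pairs that are each other's top-scoring hit.

    forward[q][t] is the score of target t in a search started from query q;
    reverse[t][q] is the score of q in a search started from t.
    """
    best_fwd = {q: max(ts, key=ts.get) for q, ts in forward.items() if ts}
    best_rev = {t: max(qs, key=qs.get) for t, qs in reverse.items() if qs}
    return [(q, t) for q, t in best_fwd.items() if best_rev.get(t) == q]


# Invented bit scores: q1 and tA pick each other, so the pair is reciprocal.
forward = {"q1": {"tA": 200.0, "tB": 150.0}}
reverse = {"tA": {"q1": 190.0}, "tB": {"q1": 140.0}}
pairs = reciprocal_best_hits(forward, reverse)

# Here tA prefers another structure (q2), so q1 gets no reciprocal best hit.
no_pairs = reciprocal_best_hits(
    {"q1": {"tA": 200.0}}, {"tA": {"q2": 300.0, "q1": 190.0}}
)
```

The second case shows why an empty result is informative: the query's best target exists, but that target's own best match lies elsewhere.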

Toolkit Notes

These apply to every Foldseek tool in this toolkit (foldseek-search, foldseek-cluster, foldseek-multimer-search, foldseek-multimercluster, foldseek-rbh).

Local memory consumption scales linearly with database size. The upstream documentation gives a per-residue cost of (6 + 1 + 1) bytes × num_residues for Cα coordinates, 3Di letters, and amino-acid letters, and reports that the 54 million entries in AFDB50 require approximately 151 GB of RAM under default settings.
Hits use a 12-column M8 tabular schema with sequence_identity normalised to the range 0 to 1. Filtering structural hits by sequence identity defeats the purpose of structural search, since distant homologues commonly share fold without sharing sequence. evalue and bit_score are the appropriate ranking criteria.
Accepted input formats differ by tool. foldseek-search, foldseek-multimer-search, and foldseek-rbh currently accept only raw PDB text, foldseek-cluster accepts PDB, mmCIF, or FASTA, and foldseek-multimercluster accepts PDB or mmCIF.
Local execution requires a user-supplied target. Either a prebuilt Foldseek database or a directory of structure files must be provided through the local_db field. No reference database is bundled with the toolkit.
A directory passed to structures caches by file content, not directory path. Modifying files in place between calls correctly invalidates the cache, so structure-set updates do not produce stale results.
Local search can use an NVIDIA GPU. Set use_gpu=True on any local-mode tool; the GPU build auto-installs on Linux x86_64 hosts with a compatible NVIDIA driver (>= 525.60.13).
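The per-residue memory figure in the first note lends itself to a quick sizing calculation before a local database is built. The sketch below applies the (6 + 1 + 1) bytes per residue formula; the function name is hypothetical, and the database size in the usage example is an invented figure. The result is a lower-bound estimate under that formula, not a measurement.

```python
# Bytes per residue: 6 for C-alpha coordinates, 1 for the 3Di letter, 1 for the amino acid.
BYTES_PER_RESIDUE = 6 + 1 + 1


def estimate_db_ram_bytes(num_residues: int) -> int:
    """Estimate RAM needed to hold a database of num_residues residues."""
    if num_residues < 0:
        raise ValueError("num_residues must be non-negative")
    return BYTES_PER_RESIDUE * num_residues


# Hypothetical database: 1 million structures averaging 300 residues each.
total_residues = 1_000_000 * 300
estimate = estimate_db_ram_bytes(total_residues)
print(f"{estimate / 1e9:.1f} GB")
```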

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
