TMalign - Proto

License: TMalign is distributed under a custom Zhang Lab academic-use license and may require explicit attribution when used. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

pylelab/USalign

Universal Structure Alignment of Monomeric and Complex Structure of Nucleic Acids and Proteins

176 stars

TM-align: a protein structure alignment algorithm based on the TM-score

Yang Zhang and Jeffrey Skolnick

Nucleic Acids Research (2005)

@article{zhang2005tmalign,
  title={{TM-align}: a protein structure alignment algorithm based on the {TM-score}},
  author={Zhang, Yang and Skolnick, Jeffrey},
  journal={Nucleic Acids Research},
  volume={33},
  number={7},
  pages={2302--2309},
  year={2005},
  publisher={Oxford University Press},
  doi={10.1093/nar/gki524}
}

proto-bio/proto-tools/proto_tools/tools/structure_alignment/tmalign

Function	Description
`run_tmalign()`	Pairwise protein structure alignment using TMalign (Zhang & Skolnick, 2005). Returns TM-scores no…

Background

The Template Modeling score (TM-score) (Zhang and Skolnick, 2004) is a length-independent measure of topological similarity between two protein structures. It scores each pair of corresponding residues with a distance-based weight whose distance scale depends on protein size. This removes the inherent length dependence of RMSD-style scores and lets the same TM-score value be compared across proteins of different sizes. The score ranges from 0 to 1, with 1 indicating identical structures.

TMalign (Zhang and Skolnick, 2005) is a structure alignment algorithm that identifies the optimal pairwise structural superposition by combining a TM-score-based rotation matrix with dynamic programming. Three initial alignments are seeded from secondary-structure matching, gapless threading, and a hybrid scoring matrix. The residue-to-residue correspondence is then refined iteratively, alternating rigid-body superposition with dynamic programming on the TM-score-weighted distance matrix until the alignment converges. Unlike alignment methods that optimise RMSD, TMalign directly optimises the TM-score, which decouples the alignment objective from chain length. The published benchmark reports that TMalign produces alignments with higher coverage and accuracy than CE, DALI, and SAL while running approximately four times faster than CE and twenty times faster than DALI on the same workload.

A subsequent statistical analysis of the TM-score (Xu and Zhang, 2010) provides quantitative interpretation guidance. The authors compute TM-scores for all pairs in a non-redundant set of 6,684 single-domain protein structures and report that the scores follow an extreme value distribution. They show that a TM-score above 0.5 is a strong probabilistic indicator of a shared SCOP and CATH fold classification, while scores below 0.5 mostly indicate different folds.
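To make the length-dependent weighting concrete, the sketch below evaluates the standard TM-score expression for one already-fixed alignment and superposition: TM-score = (1/L) × Σ 1 / (1 + (d_i / d0)^2), with d0(L) = 1.24 × (L − 15)^(1/3) − 1.8, where L is the normalising chain length and d_i the distance between the i-th aligned residue pair. It does not perform the superposition search that TMalign carries out (TMalign reports the maximum over superpositions). The short-chain clamp of d0 to 0.5 Å is an assumption modelled on the TMalign source, and both function names are hypothetical.

```python
from typing import Sequence


def tm_d0(length: int, d0_min: float = 0.5) -> float:
    """Length-dependent distance scale d0 (Angstrom) of the TM-score.

    d0(L) = 1.24 * (L - 15) ** (1/3) - 1.8. The floor of d0_min for short
    chains (L <= 21) is an assumption modelled on the TMalign source.
    """
    if length <= 21:
        return d0_min
    return max(d0_min, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_from_distances(distances: Sequence[float], norm_length: int) -> float:
    """TM-score for ONE fixed alignment and ONE fixed superposition.

    distances: distances (Angstrom) between aligned residue pairs after
    superposition. norm_length: length of the chain used for normalisation.
    Unaligned residues contribute nothing, which is why the sum is divided
    by the full chain length rather than by len(distances).
    """
    if norm_length <= 0:
        raise ValueError("norm_length must be positive")
    d0 = tm_d0(norm_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / norm_length
```

Dividing by the full chain length is what produces the two different scores discussed under Usage Tips: the same set of aligned pairs yields a different value depending on which chain's length is used for L.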

Learning Resources

pylelab/USalign (Pyle Lab, Yale University). The canonical distribution that bundles TMalign together with USalign, MMalign, and TMscore. This toolkit compiles the TMalign program from this repository.
Zhang Lab TMalign page (Zhang Lab). Background documentation and an online TMalign web service maintained by the original developers.

Tools

TMalign Structure Alignment (`tmalign-alignment`)

Aligns two protein structures with TMalign and returns the Template Modeling score normalised by the length of each input chain. The tool takes a query and reference Structure, runs the compiled TMalign program, and reports tm_score_chain_1 (normalised by the query length) and tm_score_chain_2 (normalised by the reference length).
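The two reported numbers correspond to the two "TM-score=" lines in TMalign's plain-text report. A minimal parsing sketch, assuming the line format printed by recent TMalign builds, for example `TM-score= 0.xxxxx (if normalized by length of Chain_1, i.e., LN=..., d0=...)`; the word "if" is treated as optional because wording can differ between builds. `parse_tmalign_stdout` is a hypothetical helper, not this toolkit's implementation.

```python
import re
from typing import Dict

# Assumed TMalign report line, e.g.
#   TM-score= 0.xxxxx (if normalized by length of Chain_1, i.e., LN=..., d0=...)
# Lines normalised by average or user-specified length do not match this pattern.
_TM_LINE = re.compile(
    r"^TM-score=\s*([0-9]*\.?[0-9]+)\s*\((?:if\s+)?normalized by length of Chain_([12])",
    re.MULTILINE,
)


def parse_tmalign_stdout(stdout: str) -> Dict[str, float]:
    """Return {"tm_score_chain_1": ..., "tm_score_chain_2": ...} from TMalign stdout.

    Chain_1 is the first structure passed to TMalign (the query here) and
    Chain_2 the second (the reference). Raises ValueError if a line is missing.
    """
    scores: Dict[str, float] = {}
    for value, chain in _TM_LINE.findall(stdout):
        scores[f"tm_score_chain_{chain}"] = float(value)
    missing = [k for k in ("tm_score_chain_1", "tm_score_chain_2") if k not in scores]
    if missing:
        raise ValueError(f"TM-score lines not found for: {missing}")
    return scores
```

Keying the result by the same names the tool reports keeps the mapping explicit: the first file on the TMalign command line determines which score is normalised by the query length.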

API Reference

Input: TMalignInput

Field	Type	Required	Description
query_structure	Structure	required	Query / candidate structure.
reference_structure	Structure	required	Reference / target structure.

Both inputs are Structure objects with the following fields:

Field	Type	Required / default	Description
structure	string	required	Raw structure content in PDB or CIF format.
structure_format	string	optional	Format of the content string (auto-detected if omitted).
b_factor_type	BFactorType	"unspecified"	What the B-factor column represents.
source	string	optional	Optional source identifier (filepath or tool name).
metrics	Metrics	optional	Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: TMalignConfig

Field	Type	Default	Description
verbose	integer	0	Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device	string	"cpu"	Device to run the tool on.
timeout	integer	600	Maximum execution time in seconds. None waits indefinitely.
seed	integer	optional	Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: TMalignOutput

Field	Type	Description
metrics	TMalignMetrics	Pairwise alignment scores. Access metrics via output.metrics.tm_score_chain_1 or output.tm_score_chain_1 (the forwarded shortcut from BaseToolOutput).

TMalignMetrics also carries:

Field	Type	Description
primary_metric	string	Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics

Metric	Type	Range	Availability
`tm_score_chain_1`	float	0.0 to 1.0	always
`tm_score_chain_2`	float	0.0 to 1.0	always

Applications

This tool is the standard method for pairwise protein structure comparison. Representative applications include validating that a designed protein adopts the intended fold, ranking predicted structures by topological similarity to a reference, classifying experimentally determined structures into known folds, and detecting distant structural homology where sequence similarity is too low for sequence-based comparison.

Usage Tips

The two TM-scores differ when the query and reference have different lengths. Each score is normalised by the length of the named chain, so the score normalised by the shorter chain is typically the larger of the two. Report the score normalised by the chain whose length matters for the comparison: when ranking candidates against a fixed structure, that is usually the reference (tm_score_chain_2 in this toolkit).
A TM-score above 0.5 generally indicates that the structures share the same fold. This threshold is derived from a statistical analysis of a non-redundant set of Protein Data Bank structures (Xu and Zhang, 2010) and is the standard fold-similarity cutoff in the literature. Scores above 0.3 are significantly above random with a P-value below 0.001, while scores below 0.17 are indistinguishable from random pairs (the random-pair distribution is centred near a TM-score of 0.15).
TMalign is designed for monomeric protein chains. Multi-chain assemblies are processed as a single chain and chain breaks are not preserved. For genuine multi-chain alignment use the USalign tool in this category, which is built for protein complexes.
Very short inputs produce unreliable scores. The TM-score distance scale d0 grows with the cube root of (L - 15), where L is the normalising chain length, so it becomes very small for chains of a few tens of residues and falls to a fixed floor for the shortest chains. Short-chain comparisons therefore lose the standard topological interpretation. Restrict comparison to chains of meaningful length before drawing fold-level conclusions.
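The first two tips can be combined into a small post-processing step. The sketch below selects the score by normalising chain and attaches a coarse label using the cut-offs quoted above (0.5, 0.3, 0.17). The function names and label wording are hypothetical, and the band between 0.17 and 0.3 is labelled ambiguous because the tips give no verdict for it.

```python
from typing import Dict, List, Literal, Tuple

Normalisation = Literal["query", "reference"]


def select_tm_score(tm_score_chain_1: float, tm_score_chain_2: float,
                    normalise_by: Normalisation = "reference") -> float:
    """Pick the score normalised by the chain whose length matters.

    tm_score_chain_1 is normalised by the query length,
    tm_score_chain_2 by the reference length.
    """
    if normalise_by == "reference":
        return tm_score_chain_2
    if normalise_by == "query":
        return tm_score_chain_1
    raise ValueError(f"unknown normalisation: {normalise_by!r}")


def interpret_tm_score(score: float) -> str:
    """Coarse label using the cut-offs quoted in the usage tips."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("TM-score must lie in [0, 1]")
    if score > 0.5:
        return "same fold (likely)"
    if score > 0.3:
        return "above random, fold unclear"
    if score < 0.17:
        return "indistinguishable from random"
    return "weak / ambiguous"


def rank_candidates(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank candidates against one fixed reference by tm_score_chain_2.

    results maps a candidate name to (tm_score_chain_1, tm_score_chain_2).
    """
    scored = [(name, select_tm_score(c1, c2)) for name, (c1, c2) in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Ranking on the reference-normalised score keeps every candidate on the same denominator, so a short candidate that matches only part of the reference is not rewarded for its own brevity.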

Toolkit Notes

These apply to every TMalign tool in this toolkit (tmalign-alignment).

Outputs are returned as typed metric objects. Each TMalignMetrics result carries both tm_score_chain_1 and tm_score_chain_2. Results can be exported to JSON through the standard export method.
Inputs accept a Structure object, a file path, or raw PDB or mmCIF content. Each input is normalised to a Structure before scoring, and the corresponding PDB text is passed to TMalign through a temporary file.
TMalign runs on CPU; no GPU is used. Per-pair runtime scales with the product of the two chain lengths. It is fast enough for all-against-all comparison of sizeable structure sets, but the number of alignments grows quadratically: N structures require N(N - 1)/2 pairwise runs.
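The note that PDB text is passed to TMalign through temporary files can be illustrated with the standard library alone. This is a minimal sketch, not the toolkit's implementation: it assumes a TMalign executable reachable under the hypothetical default name `TMalign` and the plain two-argument command line (`TMalign chain_1.pdb chain_2.pdb`), with the query written first so that it becomes Chain_1. The timeout mirrors the 600-second config default, and `all_against_all_pairs` shows where the N(N - 1)/2 count comes from.

```python
import itertools
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence, Tuple


def run_tmalign_binary(query_pdb: str, reference_pdb: str,
                       binary: str = "TMalign",
                       timeout: Optional[float] = 600) -> str:
    """Write both PDB strings to temporary files, run TMalign, return stdout.

    The query is passed first (Chain_1), the reference second (Chain_2).
    Raises FileNotFoundError if the binary is missing,
    subprocess.TimeoutExpired if the timeout is exceeded, and
    subprocess.CalledProcessError on a non-zero exit status.
    """
    with tempfile.TemporaryDirectory() as tmp:
        query_path = Path(tmp) / "query.pdb"
        reference_path = Path(tmp) / "reference.pdb"
        query_path.write_text(query_pdb)
        reference_path.write_text(reference_pdb)
        completed = subprocess.run(
            [binary, str(query_path), str(reference_path)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    # The temporary directory and both files are removed on leaving the block.
    return completed.stdout


def all_against_all_pairs(names: Sequence[str]) -> List[Tuple[str, str]]:
    """Each unordered pair once: len(names) * (len(names) - 1) / 2 alignments."""
    return list(itertools.combinations(names, 2))
```

Because each TMalign run reports both chain-normalised scores, one alignment per unordered pair is enough; running both orders would only swap Chain_1 and Chain_2.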

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
