USalign - Proto

License: USalign is licensed under Custom (Zhang Lab academic-use license) and may require explicit attribution when utilized. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

pylelab/USalign

Universal Structure Alignment of Monomeric and Complex Structure of Nucleic Acids and Proteins

176 stars

US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes

Chengxin Zhang, Morgan Shine, … Yang Zhang

Nature Methods (2022)

@article{zhang2022usalign,
  title={{US-align}: universal structure alignments of proteins, nucleic acids, and macromolecular complexes},
  author={Zhang, Chengxin and Shine, Morgan and Pyle, Anna Marie and Zhang, Yang},
  journal={Nature Methods},
  volume={19},
  number={9},
  pages={1109--1115},
  year={2022},
  publisher={Nature Publishing Group},
  doi={10.1038/s41592-022-01585-1}
}

proto-bio/proto-tools/proto_tools/tools/structure_alignment/usalign

Function	Description
`run_usalign()`	Universal structure alignment using USalign (Zhang et al., 2022). Supports monomers and multimers…

Background

USalign (Zhang, Shine, Pyle, and Zhang, 2022) is a universal structure alignment platform that aligns monomer and complex structures of proteins, RNA, and DNA under a single Template Modeling score (TM-score) objective. Multi-chain complexes are aligned jointly by combining residue-level structural alignment with a chain-to-chain mapping search, and nucleic acid residues are anchored on the C3’ backbone atom in place of the protein Cα. The published benchmark reports consistent advantages over state-of-the-art methods in pairwise and multiple structure alignment across these molecular types, and demonstrates that heterogeneous oligomeric complexes such as protein-RNA assemblies can be aligned within the same framework.

The TM-score (Zhang and Skolnick, 2004) is a length-independent measure of topological similarity between two structures. The score lies between 0 and 1, with 1 indicating identical structures. A subsequent statistical analysis (Xu and Zhang, 2010) established that a TM-score above 0.5 is a strong probabilistic indicator of shared SCOP and CATH fold classification for single-domain proteins, while scores below 0.17 are indistinguishable from random pairs (the random-pair distribution is centred near a TM-score of 0.15). The same 0.5 threshold is used by convention in the literature for multi-chain complex alignment, although the underlying statistical study was performed on monomers.
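To make the TM-score definition concrete, the following is a minimal sketch of the scoring sum for a fixed alignment and superposition, using the protein distance scale d0 from Zhang and Skolnick (2004). It is not USalign's implementation: the real program searches over superpositions and alignments to maximise this sum, and nucleic acid alignments may use a different distance scale that is not covered here. The function names and the 0.5 Å floor on d0 are assumptions of this sketch.

```python
def d0_protein(l_norm: int) -> float:
    """Distance scale d0 (in angstroms) for a normalisation length l_norm."""
    # Assumption of this sketch: floor d0 at 0.5 A for very short chains,
    # where the cube-root expression becomes small or undefined.
    if l_norm <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, l_norm: int) -> float:
    """TM-score of one fixed superposition.

    distances: distance in angstroms between each aligned residue pair.
    l_norm:    length of the structure the score is normalised by.
    """
    d0 = d0_protein(l_norm)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues
    # contribute nothing but still count in the denominator l_norm.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm
```

Because the denominator is the normalisation length and not the number of aligned pairs, the same set of aligned distances yields two different scores when the two structures differ in length.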

Learning Resources

pylelab/USalign (Pyle Lab, Yale University). The canonical distribution that bundles USalign together with TMalign, MMalign, and TMscore. This toolkit compiles the USalign program from this repository.
Zhang Lab US-align page (Zhang Lab). Background documentation, command-line reference, and an online USalign web service maintained by the original developers.

Tools

USalign Structure Alignment (`usalign-alignment`)

Aligns two macromolecular structures with USalign and returns the Template Modeling score normalised by the length of each input structure. The tool takes a query and reference Structure, runs the compiled USalign program in multi-chain oligomeric mode (-mm 1 -ter 1), and reports tm_score_structure_1 (normalised by the query length) and tm_score_structure_2 (normalised by the reference length).
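As a rough illustration of what the wrapper described above does, the sketch below builds a USalign command line with the `-mm 1 -ter 1` flags and extracts the two TM-scores from program output. The helper names, the binary name `USalign` on the PATH, and the exact shape of the `TM-score=` output lines are assumptions; the toolkit's own parsing code is not shown on this page and may differ. The sample text is a synthetic placeholder, not real USalign output.

```python
import re


def build_usalign_command(query_path, reference_path, binary="USalign"):
    """Command for multi-chain oligomeric mode (-mm 1), first model only (-ter 1)."""
    return [binary, query_path, reference_path, "-mm", "1", "-ter", "1"]


# Assumed line shape: "TM-score= <value> (normalized by length of Structure_<n>: ...)"
_TM_LINE = re.compile(
    r"TM-score=\s*([0-9.]+)\s*\(normalized by length of Structure_([12])"
)


def parse_tm_scores(stdout):
    """Map the two TM-score lines onto the metric names used by this tool."""
    scores = {}
    for match in _TM_LINE.finditer(stdout):
        scores[f"tm_score_structure_{match.group(2)}"] = float(match.group(1))
    if set(scores) != {"tm_score_structure_1", "tm_score_structure_2"}:
        raise ValueError("expected two TM-score lines in the USalign output")
    return scores


# Synthetic placeholder text in the assumed format (not a real result).
SAMPLE = """\
TM-score= 0.61234 (normalized by length of Structure_1: L=120, d0=4.05)
TM-score= 0.48765 (normalized by length of Structure_2: L=150, d0=4.56)
"""
```

In practice the command list would be passed to `subprocess.run(..., capture_output=True, text=True, timeout=...)` and the captured standard output handed to `parse_tm_scores`.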

API Reference

Input: USalignInput

Field	Type	Required / Default	Description
`query_structure`	Structure	required	Query / candidate structure.
`reference_structure`	Structure	required	Reference / target structure.

Structure fields (identical for `query_structure` and `reference_structure`)

Field	Type	Required / Default	Description
`structure`	string	required	Raw structure content in PDB or CIF format.
`structure_format`	string	-	Format of the content string (auto-detected if omitted).
`b_factor_type`	BFactorType	default: "unspecified"	What the B-factor column represents.
`source`	string	-	Optional source identifier (filepath or tool name).
`metrics`	Metrics	-	Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Config: USalignConfig

Field	Type	Default	Description
`verbose`	integer	0	Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
`device`	string	"cpu"	Device to run the tool on.
`timeout`	integer	600	Maximum execution time in seconds. None waits indefinitely.
`seed`	integer	-	Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: USalignOutput

Field	Type	Description
`metrics`	USalignMetrics	Pairwise alignment scores. Access metrics via output.metrics.tm_score_structure_1 or output.tm_score_structure_1 (the forwarded shortcut from BaseToolOutput).
`primary_metric`	string	Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics

Metric	Type	Range	Availability
`tm_score_structure_1`	float	0.0 to 1.0	always
`tm_score_structure_2`	float	0.0 to 1.0	always
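The two metrics above are normalised by the query and the reference respectively, so downstream code usually has to pick one before applying a cutoff. The following sketch shows one way to do that with plain floats. The function name, the `normalize_by` parameter, and the returned dictionary keys are hypothetical; only the two metric meanings and the conventional 0.5 cutoff come from this page.

```python
def summarize_scores(tm_score_structure_1, tm_score_structure_2,
                     normalize_by="reference", threshold=0.5):
    """Pick the TM-score normalised by the chosen structure and compare it
    against a fold-similarity cutoff (0.5 by convention)."""
    if normalize_by not in ("query", "reference"):
        raise ValueError("normalize_by must be 'query' or 'reference'")
    for value in (tm_score_structure_1, tm_score_structure_2):
        if not 0.0 <= value <= 1.0:
            raise ValueError("TM-scores are expected to lie in [0.0, 1.0]")
    # tm_score_structure_1 is normalised by the query length,
    # tm_score_structure_2 by the reference length.
    chosen = tm_score_structure_1 if normalize_by == "query" else tm_score_structure_2
    return {"score": chosen, "above_threshold": chosen > threshold}
```

For example, a short query that matches one domain of a longer reference can score above the cutoff when normalised by the query and below it when normalised by the reference, which is why the choice should be made explicitly.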

Applications

This tool is the appropriate choice for any pairwise structure comparison that may include multiple chains, nucleic acid components, or a mix of protein and nucleic acid. Representative applications include validating a predicted multimeric complex against an experimental reference, ranking designed binder-target poses by interface architecture, comparing predicted RNA tertiary structures against known folds, and assessing predicted protein-nucleic acid assemblies against experimentally determined complexes.

Usage Tips

The two TM-scores differ when the query and reference have different total lengths. Each score is normalised by the length of the named structure (summed across all aligned chains), so the score normalised by the shorter structure is typically the larger of the two. Use the score normalised by the structure whose length matters for the comparison, typically the reference or target when ranking candidates against a fixed structure.
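A TM-score above 0.5 is the conventional indicator that two structures share the same fold or complex architecture. The threshold comes from a statistical analysis of single-domain protein pairs drawn from a non-redundant set of the Protein Data Bank (Xu and Zhang, 2010) and is the standard fold-similarity cutoff in the literature. It is applied by convention to multimers and nucleic acid structures as well, but the underlying study covered monomeric proteins only, so treat it as a guideline for those cases and not as a calibrated probability.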
For single-chain protein-only inputs, prefer the TMalign tool. TMalign runs the original single-chain TM-score alignment algorithm and is faster for guaranteed single-chain protein inputs. USalign is the appropriate choice when either input may be multi-chain or may contain nucleic acid residues.
The tool always runs in multi-chain oligomeric mode (-mm 1) and aligns every chain of the first model in each input (-ter 1). This is the recommended mode for complex structure comparison and is also valid for monomer inputs. Molecule type is auto-detected from the input residues, so the same call handles proteins, RNA, DNA, and mixed assemblies without explicit configuration.
Multi-chain inputs should carry distinct chain identifiers. The chain-to-chain mapping algorithm uses the chain IDs from the input PDB to track the joint alignment, so duplicated or missing chain IDs in a multi-chain input can result in suboptimal chain pairings.
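The chain-identifier tip above can be checked before calling the tool. The sketch below scans the first model of a PDB-format string and reports blank or repeated chain IDs among multi-chain inputs. It is a simplified pre-flight check under stated assumptions: it reads only `ATOM` records (so ligands and waters that reuse a chain ID after `TER` are ignored), it relies on the chain ID sitting in column 22 of the fixed-width PDB format, and it does not handle mmCIF. The function name is hypothetical and not part of the toolkit.

```python
def chain_id_problems(pdb_text):
    """Return a list of human-readable problems with chain IDs (empty if none)."""
    segments = []        # chain ID of each contiguous chain segment, in file order
    new_segment = True   # True at the start and after every TER record
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # only the first model is considered, mirroring -ter 1
        if record == "TER":
            new_segment = True
            continue
        if record != "ATOM" or len(line) < 22:
            continue
        chain = line[21]  # column 22 holds the chain identifier
        if new_segment or chain != segments[-1]:
            segments.append(chain)
            new_segment = False

    problems = []
    if len(segments) > 1:  # a single-chain input needs no distinct IDs
        if " " in segments:
            problems.append("at least one chain has a blank chain ID")
        repeated = sorted({c for c in segments if c != " " and segments.count(c) > 1})
        if repeated:
            problems.append("chain IDs used by more than one chain: " + ", ".join(repeated))
    return problems
```

An empty list means the check found nothing to fix; otherwise the chains can be relabelled before alignment so that the chain-to-chain mapping has unambiguous identifiers to work with.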

Toolkit Notes

These apply to every USalign tool in this toolkit (usalign-alignment).

Outputs are returned as typed metric objects. Each USalignMetrics result carries both tm_score_structure_1 and tm_score_structure_2. Results can be exported to JSON through the standard export method.
Inputs accept a Structure object, a file path, or raw PDB or mmCIF content. Each input is normalised to a Structure before scoring, and the corresponding PDB text is passed to USalign through a temporary file.
USalign runs on CPU and is fast enough for batch comparison of large structure sets. No GPU is used, and per-pair runtime scales with the combined length of the two structures and the number of chains.
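Since each comparison is an independent CPU-bound subprocess call, a batch of candidates can be scored against one reference concurrently. The sketch below uses only the Python standard library and a caller-supplied `score_pair` function, which is a hypothetical stand-in for whatever returns the reference-normalised TM-score for one pair (for example a thin wrapper around `run_usalign()`). It does not use the toolkit's own parallel execution facilities, which are covered in the infrastructure guides.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_candidates(candidates, reference, score_pair, max_workers=4):
    """Score every candidate against one reference and rank best-first.

    candidates: dict mapping a candidate name to its structure.
    reference:  the fixed reference structure.
    score_pair: hypothetical callable (query, reference) -> float TM-score.
    """
    names = list(candidates)
    # Threads are sufficient when score_pair spends its time waiting on an
    # external process; pool.map preserves the order of `names`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda name: score_pair(candidates[name], reference), names))
    return sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
```

The result is a list of `(name, score)` tuples, so the top-ranked candidate is the first element.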

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.