Tools

For full tool documentation, API references, and standalone usage guides, see the Proto Tools docs.
Tools are standardized Python wrappers around 120+ bioinformatics tools: structure predictors, sequence scorers, gene annotators, and alignment engines. When a constraint needs to predict a protein’s 3D structure or search for sequence motifs, it calls a tool. Tools are rarely called directly; constraints call them behind the scenes, and the optimizer manages caching and execution. Understanding tools nonetheless helps in configuring constraints effectively.

The Input / Config / Output Pattern

Every tool follows the same three-part pattern using Pydantic models:
ToolInput (primary data) + ToolConfig (parameters) → Tool Function run_tool() → ToolOutput (results + metadata)
  • Input: What to analyze. The primary data: sequences, structures, files.
  • Config: How to analyze it. Parameters and settings. Always optional; sensible defaults are built in.
  • Output: Results plus standardized metadata (execution time, success status, tool ID, warnings).
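
The three-part pattern can be sketched as follows. This is a minimal illustration, not the library's code: every name here (MotifSearchInput, MotifSearchConfig, MotifSearchOutput, run_motif_search) is hypothetical, and plain dataclasses stand in for the Pydantic models so the sketch runs with the standard library alone.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MotifSearchInput:
    """Input: what to analyze (hypothetical example tool)."""
    sequence: str


@dataclass
class MotifSearchConfig:
    """Config: how to analyze it. Every field has a default."""
    motif: str = "GG"


@dataclass
class MotifSearchOutput:
    """Output: results plus standardized metadata."""
    positions: List[int]
    tool_id: str
    success: bool
    execution_time: float
    warnings: List[str] = field(default_factory=list)


def run_motif_search(
    inputs: MotifSearchInput,
    config: Optional[MotifSearchConfig] = None,
) -> MotifSearchOutput:
    """Toy tool function: report every start index where the motif occurs."""
    config = config or MotifSearchConfig()  # config is optional
    start = time.perf_counter()
    seq, motif = inputs.sequence, config.motif
    positions = [
        i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)
    ]
    return MotifSearchOutput(
        positions=positions,
        tool_id="motif_search",
        success=True,
        execution_time=time.perf_counter() - start,
        warnings=[] if positions else ["motif not found"],
    )
```

Calling run_motif_search(MotifSearchInput(sequence="AGGTGGG")) without a config uses the default motif; passing MotifSearchConfig(motif="CC") changes how the same input is analyzed without touching the input model.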

Tool Categories

Structure Prediction

Predict 3D structures from sequences. Tools: AlphaFold2, AlphaFold3, Boltz2, Chai1, ESMFold, Protenix, ViennaRNA

Structure Design

Generate novel protein backbone structures. Tools: RFDiffusion3

Structure Dynamics

Sample conformational ensembles. Tools: BioEmu

Inverse Folding

Design sequences for target structures. Tools: ProteinMPNN, LigandMPNN, FAMPNN

Masked Models

Protein language models for scoring and sampling. Tools: ESM2, ESM3

Causal Models

Autoregressive models for generation and scoring. Tools: Evo1, Evo2, ProGen2

Sequence Scoring

Predict functional effects from genomic sequences. Tools: Enformer, Borzoi, AlphaGenome, Segmasker

Gene Annotation

Annotate sequences with genes, domains, and motifs. Tools: PyHMMER, CRISPR-tracr, MinCED

Sequence Alignment

Search databases and align sequences. Tools: BLAST, MMseqs2, MAFFT, ColabFold Search

ORF Prediction

Find open reading frames in DNA. Tools: Orfipy, Prodigal

RNA Splicing

Predict splice sites and specificity. Tools: SpliceTransformer

Database Retrieval

Fetch sequences and structures from public databases. Tools: UniProt, PDB, NCBI, SequenceFetch

Structure Alignment

Align and compare 3D protein structures. Tools: TMAlign, USAlign

How Tools Connect to Constraints

When the optimizer evaluates a constraint built on a function such as structure_plddt_constraint, that function internally calls the appropriate tool. Here is the flow:
Optimizer score_energy() → Constraint evaluate() → Constraint Function structure_plddt_constraint() → Tool run_esmfold() → Raw Results (structures, scores) → Normalized Score (0.0 to 1.0)
The tool is configured through the constraint’s function_config. For example, the structure_plddt_constraint config selects which structure predictor to use:
python
from proto_language.constraint import structure_plddt_constraint

# Constraint and protein_segment are assumed to be imported/defined earlier.
# The constraint internally calls the tool named in function_config.
Constraint(
    inputs=[protein_segment],
    function=structure_plddt_constraint,
    function_config={
        "structure_tool": "esmfold",  # Which tool to use: "esmfold", "boltz2", "alphafold3", ...
    },
    weight=2.0,
)
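
To make the flow concrete, here is a rough sketch of how a constraint function might pick a tool from function_config and turn its raw result into a 0.0 to 1.0 score. Everything in it is an assumption for illustration: the stand-in predictors, the STRUCTURE_TOOLS registry, the plddt_score function, and the divide-by-100 normalization (pLDDT is reported on a 0 to 100 scale) are not taken from the library.

```python
from typing import Callable, Dict, Optional


def fake_esmfold(sequence: str) -> float:
    """Stand-in predictor returning a made-up mean pLDDT (0-100 scale)."""
    return 80.0


def fake_boltz2(sequence: str) -> float:
    """Second stand-in predictor with a different made-up value."""
    return 90.0


# Hypothetical registry mapping the "structure_tool" name to a callable.
STRUCTURE_TOOLS: Dict[str, Callable[[str], float]] = {
    "esmfold": fake_esmfold,
    "boltz2": fake_boltz2,
}


def plddt_score(sequence: str, function_config: Optional[dict] = None) -> float:
    """Dispatch to the configured tool, then normalize its raw result."""
    # Defaults first, user overrides second.
    config = {"structure_tool": "esmfold", **(function_config or {})}
    name = config["structure_tool"]
    if name not in STRUCTURE_TOOLS:
        raise ValueError(f"unknown structure tool: {name!r}")
    mean_plddt = STRUCTURE_TOOLS[name](sequence)  # raw result
    # Assumed normalization: scale 0-100 to 0-1 and clamp.
    return min(max(mean_plddt / 100.0, 0.0), 1.0)
```

With this shape, switching predictors is a one-key change in the config, and the scoring logic downstream of the tool stays the same.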

Tool Caching

When the same sequence is evaluated by multiple constraints that use the same tool, the tool cache prevents redundant computation:
python
# Assumes Constraint, segment, and both constraint functions are already in scope.
# Both constraints use ESMFold internally.
# The second call hits the cache -- no redundant GPU work.
plddt_constraint = Constraint(
    inputs=[segment],
    function=structure_plddt_constraint,
    function_config={"structure_tool": "esmfold"},
)
ptm_constraint = Constraint(
    inputs=[segment],
    function=structure_ptm_constraint,
    function_config={"structure_tool": "esmfold"},
)
The optimizer manages cache lifecycle. See Optimizers: Tool Cache Management for configuration options.
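
The idea behind the cache can be sketched with a dictionary keyed by tool and sequence. This is only an illustration under stated assumptions: the key layout, the cached_run helper, the stand-in predictor, and its return values are all hypothetical, and the real cache (including how it accounts for tool config and eviction) may work differently.

```python
from typing import Dict, Tuple

# Counts how often the (pretend) expensive tool actually runs.
call_count = {"structure_tool": 0}

# Hypothetical cache: (tool_id, sequence) -> raw tool results.
_cache: Dict[Tuple[str, str], dict] = {}


def fake_structure_tool(sequence: str) -> dict:
    """Stand-in for an expensive predictor; returns made-up metrics."""
    call_count["structure_tool"] += 1
    return {"plddt": 80.0, "ptm": 0.7}


def cached_run(tool_id: str, sequence: str) -> dict:
    """Run the tool once per (tool_id, sequence) and reuse the result."""
    key = (tool_id, sequence)
    if key not in _cache:
        _cache[key] = fake_structure_tool(sequence)
    return _cache[key]


def plddt_constraint(sequence: str) -> float:
    """First consumer: reads pLDDT from the shared tool result."""
    return cached_run("esmfold", sequence)["plddt"] / 100.0


def ptm_constraint(sequence: str) -> float:
    """Second consumer: reads pTM from the same tool result."""
    return cached_run("esmfold", sequence)["ptm"]
```

Evaluating both constraints on the same sequence runs the stand-in predictor once; a new sequence produces a new key and therefore a new run.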

GPU vs CPU Tools

GPU Tools

Deep learning models that require NVIDIA GPUs.
  • Structure Prediction: AlphaFold3, Boltz2, Chai1, ESMFold, Protenix
  • Inverse Folding: ProteinMPNN, LigandMPNN
  • Language Models: ESM2, ESM3, Evo2, ProGen2
  • Sequence Scoring: Enformer, Borzoi, AlphaGenome
  • Structure Design: RFDiffusion3
  • Structure Dynamics: BioEmu
  • RNA Splicing: SpliceTransformer

CPU Tools

Classical bioinformatics algorithms. Run anywhere.
  • Gene Annotation: PyHMMER, MinCED
  • Sequence Alignment: BLAST, MMseqs2, MAFFT, ColabFold Search
  • ORF Prediction: Orfipy, Prodigal
  • Structure Prediction: ViennaRNA (RNA only)
When designing optimization pipelines, put CPU-based filter constraints early to screen out bad proposals before GPU-based scoring constraints run. This minimizes expensive GPU time.
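
The ordering advice amounts to short-circuit evaluation, sketched below. The filter rule, the scoring stub, and the evaluate function are hypothetical, and whether the real optimizer skips later constraints after a failed filter is an assumption of this sketch rather than something stated here.

```python
# Counts how often the (pretend) GPU-backed scorer runs.
gpu_calls = {"count": 0}

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def cheap_cpu_filter(sequence: str) -> bool:
    """Hypothetical CPU check: non-empty, standard residues only."""
    return bool(sequence) and all(c in STANDARD_AMINO_ACIDS for c in sequence)


def expensive_gpu_score(sequence: str) -> float:
    """Stand-in for a GPU-backed scoring constraint; returns a made-up score."""
    gpu_calls["count"] += 1
    return 0.8


def evaluate(sequence: str) -> float:
    """Run the cheap filter first; only survivors reach the expensive scorer."""
    if not cheap_cpu_filter(sequence):
        return 0.0  # rejected early, no GPU work done
    return expensive_gpu_score(sequence)
```

A proposal containing a non-standard character is rejected by the filter and never reaches the scorer, so the GPU call counter only advances for sequences that pass.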

Next Steps

Tools Documentation

Full tool API references, standalone usage guides, and detailed documentation

Constraints

See how tools power constraint evaluation

Generators

Generators that wrap language model tools

Optimizers

How optimizers manage tool caching