
Overview

proto-tools provides standardized Python wrappers around 60+ bioinformatics tools: structure predictors, sequence scorers, gene annotators, alignment engines, and more. Every tool follows the same Input / Config / Output pattern, so learning one tool transfers to the rest.

The Input / Config / Output Pattern

Every tool follows a three-part pattern using Pydantic models:
Input: What to analyze. The primary data: sequences, structures, files.
python
ESMFoldInput(
    complexes=[
        Complex(
            chains=[
                Chain(sequence="MKTAYLLIGL...")
            ]
        )
    ]
)
Config: How to analyze it. Parameters and settings. Always optional; sensible defaults are built in.
python
ESMFoldConfig(
    num_recycles=4,
)
Output: Results plus standardized metadata (execution time, success status, tool ID, warnings):
python
# `input` and `config` are the ESMFoldInput and ESMFoldConfig instances shown above
output = run_esmfold(input, config)

output.success           # True
output.execution_time    # 12.3 (seconds)
output.tool_id           # "esmfold-prediction"
output.structures[0]     # Structure object with coordinates, pLDDT, etc.

Tool Categories

Structure Prediction

Predict 3D structures from sequences. Tools: AlphaFold2, AlphaFold3, Boltz2, Chai1, ESMFold, ESMFold2, Protenix, ViennaRNA

Structure Design

Generate novel protein backbone structures. Tools: RFDiffusion3

Structure Dynamics

Sample conformational ensembles. Tools: BioEmu

Inverse Folding

Design sequences for target structures. Tools: ProteinMPNN, LigandMPNN, FAMPNN, ESM-IF1

Masked Models

Protein language models for scoring and sampling. Tools: ESM2, ESM3, ESMC, AbLang

Causal Models

Autoregressive models for generation and scoring. Tools: Evo1, Evo2, ProGen2, ProGen3

Sequence Scoring

Predict functional effects from genomic sequences. Tools: Enformer, Borzoi, AlphaGenome, Malinois, Puffin, Segmasker

Gene Annotation

Annotate sequences and find functional elements. Tools: PyHMMER, CRISPRtracrRNA, MinCED, Promoter Calculator

Sequence Alignment

Align sequences and search databases for homologs. Tools: BLAST, MMseqs2, MAFFT, ColabFold Search

ORF Prediction

Find open reading frames in DNA. Tools: Orfipy, Prodigal

RNA Splicing

Predict splice sites and specificity. Tools: SpliceTransformer, Pangolin, SpliceAI

Structure Alignment

Align and compare 3D structures. Tools: TMAlign, USAlign, Foldseek, FoldMason, PyMOL RMSD

Database Retrieval

Fetch sequences and structures from public databases. Tools: UniProt, PDB, NCBI, SequenceFetch

Structure Scoring

Score and analyze 3D structure quality. Tools: DSSP, IPSAE, pDockQ2, PyRosetta, Structure Metrics

Binder Design

De novo antibody and binder design pipelines. Tools: BindCraft, Germinal

Mutagenesis

Random sequence mutagenesis. Tools: Random Protein, Random Nucleotide

GPU vs CPU Tools

GPU Tools

Deep learning models that require an NVIDIA GPU and will not run on CPU-only machines.
  • Structure Prediction: AlphaFold3, Boltz2, Chai1, ESMFold, Protenix
  • Inverse Folding: ProteinMPNN, LigandMPNN, FAMPNN
  • Language Models: ESM2, ESM3, ESMC, AbLang, Evo1, Evo2, ProGen2, ProGen3
  • Sequence Scoring: Enformer, Borzoi, AlphaGenome
  • Structure Design: RFDiffusion3
  • Structure Dynamics: BioEmu
  • RNA Splicing: SpliceTransformer

CPU Tools

Classical bioinformatics algorithms and binary tools. They run on any machine, with no GPU required.
  • Gene Annotation: PyHMMER, CRISPRtracrRNA, MinCED, Promoter Calculator
  • Sequence Alignment: BLAST, MMseqs2, MAFFT, ColabFold Search
  • ORF Prediction: Orfipy, Prodigal
  • Structure Prediction: ViennaRNA (RNA only)
  • Structure Alignment: TMAlign, USAlign
  • Database Retrieval: UniProt, PDB, NCBI
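
When a script has to run on both kinds of machine, it can help to check for a GPU up front and restrict itself to CPU tools when none is found. The sketch below is one way to do that with the standard library only. It is a heuristic and not a proto-tools API: it assumes the NVIDIA driver's `nvidia-smi` command-line utility is on `PATH` whenever a usable GPU is present, and the tool sets are short, hand-copied excerpts from the lists above.

```python
import shutil

# Abbreviated excerpts of the lists above; the grouping logic is illustrative only.
GPU_TOOLS = {"ESMFold", "Boltz2", "ProteinMPNN", "Evo2"}
CPU_TOOLS = {"BLAST", "MAFFT", "Prodigal", "ViennaRNA"}


def nvidia_driver_present() -> bool:
    """Heuristic check: the NVIDIA driver ships the nvidia-smi CLI, so look for it on PATH."""
    return shutil.which("nvidia-smi") is not None


def runnable_tools(has_gpu: bool) -> set[str]:
    """Return the tools that can run on this machine given GPU availability."""
    if has_gpu:
        return CPU_TOOLS | GPU_TOOLS
    return set(CPU_TOOLS)


has_gpu = nvidia_driver_present()
print(f"GPU detected: {has_gpu}")
print("Runnable tools:", sorted(runnable_tools(has_gpu)))
```

Finding `nvidia-smi` does not guarantee that a deep learning framework can actually use the device, so treat the result as a quick pre-check and not as proof.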

Environment Isolation

Some tools have complex or conflicting dependencies. These tools use isolated virtual environments managed by ToolInstance:
  • Each tool with isolated dependencies has a standalone/ directory containing setup.sh and run.py
  • Virtual environments are created automatically on first use
  • Execution is handled transparently; you call the same run_tool() API
See the Tool Environments guide for details on how this works.
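
As a rough illustration of the idea (not how ToolInstance is actually implemented), the sketch below creates a virtual environment the first time it is needed and then runs a script inside it, exchanging JSON over stdin and stdout. The function names `ensure_env` and `run_isolated` and the JSON protocol are assumptions made for this example. It uses only the standard-library `venv` and `subprocess` modules.

```python
import json
import subprocess
import sys
import venv
from pathlib import Path


def ensure_env(env_dir: Path) -> Path:
    """Create the virtual environment on first use and return its interpreter path."""
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    python = env_dir / bin_dir / exe_name
    if not python.exists():
        # with_pip=False keeps the sketch fast; a real setup step would
        # install the tool's dependencies into the environment here.
        venv.create(env_dir, with_pip=False)
    return python


def run_isolated(env_dir: Path, script: Path, payload: dict) -> dict:
    """Run `script` with the isolated interpreter, passing JSON in and reading JSON out."""
    python = ensure_env(env_dir)
    proc = subprocess.run(
        [str(python), str(script)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return json.loads(proc.stdout)
```

A caller would pass the environment directory, the path to a script that reads JSON from stdin and prints JSON to stdout, and the payload. The environment is created on the first call and reused afterwards, and the caller never touches the isolated interpreter directly.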

Tool Registry

All tools are registered via the @tool() decorator, enabling automatic discovery and schema generation:
python
from proto_tools.tools.tool_registry import ToolRegistry

# List all available tools
all_tools = ToolRegistry.list_all()
for tool_spec in all_tools:
    print(f"{tool_spec.key}: {tool_spec.label}")

# Get a specific tool's schema
schema = ToolRegistry.get_schemas("esmfold-prediction")

# Get a minimal example input
example = ToolRegistry.get_example_input("esmfold-prediction")

# Get citation
citation = ToolRegistry.get_citation("esmfold-prediction")
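
For readers unfamiliar with decorator-based registration, the following self-contained sketch shows the general mechanism. It is a simplified stand-in, not the proto-tools implementation: `DemoRegistry`, `DemoToolSpec`, and `demo_tool` are hypothetical names, and only the `key` and `label` attributes mirror the fields used in the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DemoToolSpec:
    """Metadata recorded for each registered function (hypothetical)."""
    key: str
    label: str
    func: Callable


class DemoRegistry:
    """Hypothetical stand-in for a tool registry; not the proto-tools class."""
    _tools: dict[str, DemoToolSpec] = {}

    @classmethod
    def register(cls, spec: DemoToolSpec) -> None:
        cls._tools[spec.key] = spec

    @classmethod
    def list_all(cls) -> list[DemoToolSpec]:
        return list(cls._tools.values())

    @classmethod
    def get(cls, key: str) -> DemoToolSpec:
        return cls._tools[key]


def demo_tool(key: str, label: str):
    """Decorator factory: registers the function, then returns it unchanged."""
    def decorator(func: Callable) -> Callable:
        DemoRegistry.register(DemoToolSpec(key=key, label=label, func=func))
        return func
    return decorator


@demo_tool(key="reverse-demo", label="Reverse a sequence")
def run_reverse(sequence: str) -> str:
    return sequence[::-1]


# Discovery needs no manual bookkeeping: decorating the function was enough.
for spec in DemoRegistry.list_all():
    print(f"{spec.key}: {spec.label}")
```

Because registration happens when the decorated function is defined, a tool becomes discoverable as soon as its module is imported.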

Next Steps

Entities

Structure and Ligand data objects used by tools

Quickstart

Run your first tool in 5 minutes

Tool Persistence

Batch workloads with persistent tool instances

Device Management

GPU allocation and multi-device execution