
Installation

Proto requires Python 3.10+ and runs on Linux and macOS. The package installs with pip and pulls in the proto-tools execution layer automatically, so there is no conda environment to create and no submodule to check out.

Setup

1. Install the package

bash
pip install git+https://github.com/proto-bio/proto-language.git
This installs the proto-tools execution layer automatically. The system build tools that standalone tool environments need (git, curl, gcc, make, cmake) are provisioned on first use through proto-tools’ shared foundation environment, so there is nothing else to install.
A direct PyPI install (pip install proto-language) is planned. Until proto-tools is published to PyPI, the GitHub installation above is the supported path.
Contributing to Proto itself? Use the editable installation instead.
2. Configure storage (optional)

All persistent data (model weights, tool environments, and the micromamba binary) lives under PROTO_HOME, which defaults to ~/.proto/ and is inherited from proto-tools. To move it elsewhere (recommended for lab and HPC environments), set it in your shell profile:
bash
# Add to your shell profile (for example ~/.bashrc or ~/.zshrc):
export PROTO_HOME=/path/to/your/proto_home
To override only the model-weights location and leave everything else under PROTO_HOME, set PROTO_MODEL_CACHE as well, for example: export PROTO_MODEL_CACHE=/path/to/shared/weights
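The precedence between the two variables can be sketched in Python. This is an illustration only, not the resolution code proto-tools actually uses: the helper name resolve_proto_paths and the models subdirectory are assumptions made for the example.

```python
import os
from pathlib import Path


def resolve_proto_paths(env=None):
    """Return (proto_home, model_cache) following the documented precedence."""
    env = os.environ if env is None else env

    # PROTO_HOME falls back to ~/.proto when unset or empty.
    home = Path(env.get("PROTO_HOME") or "~/.proto").expanduser()

    # PROTO_MODEL_CACHE, when set, overrides only the weights location.
    # The "models" subdirectory is a hypothetical default for this sketch.
    cache = Path(env.get("PROTO_MODEL_CACHE") or home / "models").expanduser()
    return home, cache


if __name__ == "__main__":
    home, cache = resolve_proto_paths()
    print(f"PROTO_HOME:        {home}")
    print(f"Model weights dir: {cache}")
```

Running it in a fresh shell after editing your profile is a quick way to confirm that the exports were picked up.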
3. Gated model access (optional)

Some generators and constraints load gated models (for example ESM3, AlphaGenome, and AlphaFold3) that require accepting a license and authenticating with Hugging Face. After accepting each model’s terms, export your token:
bash
export HF_TOKEN=your_token_here
See the proto-tools installation guide for the full procedure and the list of gated models.
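Before launching a long run that needs a gated model, it can help to confirm that the token is visible to Python. The sketch below only checks that HF_TOKEN is present and is not the placeholder from the example above; it does not contact Hugging Face or validate the token. The helper name hf_token_problem is hypothetical.

```python
import os


def hf_token_problem(env=None):
    """Return a message if HF_TOKEN looks unusable, otherwise None."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()

    if not token:
        return "HF_TOKEN is not set; gated models will fail to download."
    if token == "your_token_here":
        return "HF_TOKEN still holds the placeholder value."
    return None


if __name__ == "__main__":
    print(hf_token_problem() or "HF_TOKEN is set.")
```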

Verify Installation

Run this script to confirm everything is working:
python
from proto_language.core import Segment, Construct
from proto_language.generator import RandomNucleotideGenerator, RandomNucleotideGeneratorConfig

# Create a segment and construct
segment = Segment(length=50, sequence_type="dna")
construct = Construct(segments=[segment])

# Set up a generator
generator = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig())
generator.assign(segment)

print(f"Segment: {segment.sequence_length}bp {segment.sequence_type}")
print("Installation successful!")
Expected output:
Segment: 50bp dna
Installation successful!

Developers

Contributors install both layers in editable mode: the language layer from the repository checkout and proto-tools from its submodule:
bash
git clone https://github.com/proto-bio/proto-language.git
cd proto-language
git submodule update --init --recursive

pip install -e ".[dev]"               # language layer and dev tools (proto-tools installed from git)
pip install -e "./proto-tools[dev]"   # override with the editable submodule
Run the proto-tools editable install last: it replaces the git-installed proto-tools with the local submodule so edits within proto-tools/ take effect immediately. System build tools are still provisioned automatically through the foundation environment.

Install Options

| Extra | What’s Included | Use Case |
| --- | --- | --- |
| (none) | Core framework only | Writing and running optimization programs |
| dev | pytest (+asyncio, cov, forked, randomly), ruff, mypy, docstring-parser | Testing and linting |

Prerequisites

  • Python 3.10+
  • pip
  • Git (required by pip for the GitHub installation above and for the editable developer install)
  • NVIDIA GPU with CUDA 12.1+ (optional, for ML-based tools)
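
The non-GPU prerequisites can be checked from Python before installing. This is a minimal sketch using only the standard library; the function name check_prerequisites is hypothetical and the checks are limited to the Python version, the platform, and whether git is on PATH.

```python
import platform
import shutil
import sys


def check_prerequisites(version_info=None, system=None, which=shutil.which):
    """Return a list of human-readable problems; empty means all checks passed."""
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    problems = []

    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required.")
    # platform.system() reports "Darwin" on macOS.
    if system not in ("Linux", "Darwin"):
        problems.append(f"Unsupported platform: {system} (Linux and macOS only).")
    if which("git") is None:
        problems.append("git not found on PATH (required to clone or install from GitHub).")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites satisfied.")
```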

GPU Requirements

ML-based generators and constraints run substantially faster on a GPU; the table below lists per-tool speedups and VRAM requirements.
| Tool | Purpose | GPU Benefit | VRAM Required |
| --- | --- | --- | --- |
| ESMFold | Protein structure prediction | ~10x faster | 8-16 GB |
| ESM2 / ESM3 | Protein language model generation | Required for large batches | 4-16 GB |
| Evo2 | DNA generation | Required | 16+ GB |
| ProteinMPNN | Structure-conditioned protein design | ~5x faster | 4-8 GB |
| Boltz2 / Chai1 | Multi-chain structure prediction | ~20x faster | 16-24 GB |
| AlphaFold3 | Structure prediction | ~20x faster | 16-24 GB |
| Enformer / Borzoi | Genomic expression prediction | ~50x faster | 8-16 GB |
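
To see which of these tools a given machine can plausibly run, one option is to compare the installed GPU memory against the lower bound of each VRAM range. The sketch below shells out to nvidia-smi rather than importing PyTorch. The helper names are hypothetical, and rounding the reported memory to the nearest GB is a simplification, since drivers report slightly less than the nominal capacity.

```python
import shutil
import subprocess

# Lower bounds of the "VRAM Required" column above, in GB.
MIN_VRAM_GB = {
    "ESMFold": 8,
    "ESM2 / ESM3": 4,
    "Evo2": 16,
    "ProteinMPNN": 4,
    "Boltz2 / Chai1": 16,
    "AlphaFold3": 16,
    "Enformer / Borzoi": 8,
}


def parse_gpu_list(csv_text):
    """Parse 'name, memory_in_MiB' lines into (name, rounded_GB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), round(int(mem_mib) / 1024)))
    return gpus


def tools_that_fit(vram_gb):
    """Return the tools whose minimum VRAM is at most vram_gb, sorted by name."""
    return sorted(tool for tool, need in MIN_VRAM_GB.items() if vram_gb >= need)


def query_gpus():
    """Return [(name, GB), ...] from nvidia-smi, or [] if it is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    for name, vram in query_gpus():
        print(f"{name} ({vram} GB): {', '.join(tools_that_fit(vram))}")
```

Meeting the lower bound does not guarantee that a large batch or a long sequence will fit; treat the result as a first filter.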

Troubleshooting

If a run exhausts GPU memory, reduce the number of proposals maintained by the optimizer:
python
config = MCMCOptimizerConfig(
    num_results=1,  # Reduce from larger values
    num_steps=100,
)
You can also use a smaller model checkpoint for ESM2:
python
config = ESM2GeneratorConfig(
    model_checkpoint="esm2_t6_8M_UR50D",  # Smallest: 8M params
)
A ModuleNotFoundError for packages like esm, torch, or boltz in your main interpreter is expected. ML-based generators and constraints (ESM2, ESMFold, ProteinMPNN, and others) get their dependencies, including PyTorch, from proto-tools’ isolated tool environments, not from the main package. proto-tools builds each tool’s environment on first use and provisions the required system build tools through the shared foundation environment. If a tool fails to find its environment, confirm that proto-tools is installed with pip show proto-tools; the GitHub installation above pulls it in automatically.
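The same check can be made from inside the interpreter you run Proto with, which rules out the case where pip and python point at different environments. This sketch uses the standard-library importlib.metadata; the helper name installed_version is hypothetical, and the distribution name proto-tools is taken from the pip show command above.

```python
from importlib import metadata


def installed_version(dist_name="proto-tools"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("proto-tools is not installed in this interpreter.")
    else:
        print(f"proto-tools {version} is installed.")
```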
If flash-attn fails to build, check for the CUDA toolkit headers, which it requires. Make sure CUDA 12.1+ is installed on your system, then retry:
bash
pip install flash-attn --no-build-isolation
If it still fails, you can skip flash-attn; it is an optional performance optimization for attention-heavy models, not a hard requirement.

Next Steps

  • Quickstart: Design a first sequence
  • Core Concepts: Understand segments, constructs, generators, constraints, and optimizers