Installation

Proto requires Python 3.10+ and runs on Linux and macOS. The package installs with pip and pulls in the proto-tools execution layer automatically, so there is no conda environment to create and no submodule to check out.
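If you are unsure whether a machine meets these requirements, a short preflight check can tell you before you install anything. The sketch below uses only the standard library; the helper name `check_platform` is hypothetical and not part of Proto.

```python
import platform
import sys


def check_platform(version_info=sys.version_info, system=None):
    """Return a list of problems; an empty list means the requirements are met.

    Hypothetical helper, not part of Proto.
    """
    system = system or platform.system()
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append(
            f"Python 3.10+ required, found {version_info[0]}.{version_info[1]}"
        )
    if system not in ("Linux", "Darwin"):  # platform.system() reports macOS as "Darwin"
        problems.append(f"Unsupported OS: {system} (Linux and macOS only)")
    return problems


problems = check_platform()
print("\n".join(problems) if problems else "Platform requirements met")
```

Run it with the same interpreter you plan to use for `pip install`, since a machine may have several Python versions installed.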

Setup

Install the package

bash

pip install git+https://github.com/proto-bio/proto-language.git

This installs the proto-tools execution layer automatically. The system build tools that standalone tool environments need (git, curl, gcc, make, cmake) are provisioned on first use through proto-tools’ shared foundation environment, so there is nothing else to install.

A direct PyPI install (pip install proto-language) is planned. Until proto-tools is published to PyPI, the GitHub installation above is the supported path.

Contributing to Proto itself? Use the editable installation described under Developers instead.

Configure storage (optional)

All persistent data (model weights, tool environments, and the micromamba binary) lives under PROTO_HOME, which defaults to ~/.proto/ and is inherited from proto-tools. To move it elsewhere (recommended for lab and HPC environments), set it in your shell profile:

bash

# Add to your ~/.bashrc:
export PROTO_HOME=/path/to/your/proto_home

To override only the model-weights location, set export PROTO_MODEL_CACHE=/path/to/shared/weights.
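To see which locations your current shell would hand to Proto, you can read the two variables from Python. This is a minimal sketch that mirrors the rules described above (PROTO_HOME falls back to ~/.proto/, PROTO_MODEL_CACHE is an optional override). The function names are hypothetical, and proto-tools performs its own resolution internally, which this sketch does not call.

```python
import os
from pathlib import Path


def resolve_proto_home(env=None):
    """Return the directory PROTO_HOME points at, or the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("PROTO_HOME") or "~/.proto").expanduser()


def resolve_model_cache_override(env=None):
    """Return the PROTO_MODEL_CACHE override, or None when it is not set."""
    env = os.environ if env is None else env
    value = env.get("PROTO_MODEL_CACHE")
    return Path(value).expanduser() if value else None


home = resolve_proto_home()
print(f"PROTO_HOME        -> {home} (exists: {home.exists()})")
override = resolve_model_cache_override()
print(f"PROTO_MODEL_CACHE -> {override if override else 'not set'}")
```

On shared lab or HPC systems, checking these values in a fresh login shell confirms that the export in your profile actually took effect.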

Gated model access (optional)

Some generators and constraints load gated models (for example ESM3, AlphaGenome, and AlphaFold3) that require accepting a license and authenticating with Hugging Face. After accepting each model’s terms, export your token:

bash

export HF_TOKEN=your_token_here

See the proto-tools installation guide for the full procedure and the list of gated models.
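A common failure mode is exporting the token in one shell and launching Python from another. The sketch below only checks that HF_TOKEN is visible to the current process and is not the placeholder from the example. It does not contact Hugging Face, so it cannot tell you whether the token is valid or whether you have accepted a given model's license. The helper name `hf_token_status` is hypothetical.

```python
import os


def hf_token_status(env=None):
    """Classify HF_TOKEN as 'missing', 'placeholder', or 'set' (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "missing"
    if token == "your_token_here":  # the placeholder from the export example
        return "placeholder"
    return "set"


# Print only the status, never the token itself.
print(f"HF_TOKEN: {hf_token_status()}")
```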

Verify Installation

Run this script to confirm everything is working:

python

from proto_language.core import Segment, Construct
from proto_language.generator import RandomNucleotideGenerator, RandomNucleotideGeneratorConfig

# Create a segment and construct
segment = Segment(length=50, sequence_type="dna")
construct = Construct(segments=[segment])

# Set up a generator
generator = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig())
generator.assign(segment)

print(f"Segment: {segment.sequence_length}bp {segment.sequence_type}")
print("Installation successful!")

Expected output:

Segment: 50bp dna
Installation successful!

Developers

Contributors install both layers in editable mode: the language layer from the repository checkout and proto-tools from its submodule:

bash

git clone https://github.com/proto-bio/proto-language.git
cd proto-language
git submodule update --init --recursive

pip install -e ".[dev]"               # language layer and dev tools (proto-tools installed from git)
pip install -e "./proto-tools[dev]"   # override with the editable submodule

Run the proto-tools editable install last: it replaces the git-installed proto-tools with the local submodule so edits within proto-tools/ take effect immediately. System build tools are still provisioned automatically through the foundation environment.

Install Options

Extra	What’s Included	Use Case
(none)	Core framework only	Writing and running optimization programs
`dev`	pytest (+asyncio, cov, forked, randomly), ruff, mypy, docstring-parser	Testing and linting

Prerequisites

Python 3.10+
pip
Git (used by pip for the GitHub install above and for the editable developer install)
NVIDIA GPU with CUDA 12.1+ (optional, for ML-based tools)

GPU Requirements

ML-based generators and constraints run substantially faster on a GPU; the table below lists per-tool speedups and VRAM requirements.

Tool	Purpose	GPU Benefit	VRAM Required
ESMFold	Protein structure prediction	~10x faster	8-16 GB
ESM2 / ESM3	Protein language model generation	Required for large batches	4-16 GB
Evo2	DNA generation	Required	16+ GB
ProteinMPNN	Structure-conditioned protein design	~5x faster	4-8 GB
Boltz2 / Chai1	Multi-chain structure prediction	~20x faster	16-24 GB
AlphaFold3	Structure prediction	~20x faster	16-24 GB
Enformer / Borzoi	Genomic expression prediction	~50x faster	8-16 GB
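To compare your hardware against the VRAM column, you can query the driver directly. Because PyTorch is not installed in the main interpreter, this sketch shells out to `nvidia-smi` rather than importing torch. It assumes `nvidia-smi` is on your PATH and returns an empty list otherwise; the function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' CSV rows (MiB) into (name, GiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip blank or malformed rows
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        if not mem_mib.isdigit():
            continue  # memory can be reported as "[N/A]" on some devices
        gpus.append((name, int(mem_mib) / 1024))
    return gpus


def list_gpus():
    """Return (name, total GiB) for each visible NVIDIA GPU, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


gpus = list_gpus()
if not gpus:
    print("No NVIDIA GPU detected")
for name, gib in gpus:
    print(f"{name}: {gib:.1f} GiB total")
```

The figure reported is total memory per device. Memory already in use by other jobs reduces what a tool can actually allocate.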

Troubleshooting

CUDA out of memory

Reduce the number of proposals maintained by the optimizer:

python

config = MCMCOptimizerConfig(
    num_results=1,  # Reduce from larger values
    num_steps=100,
)

You can also use a smaller model checkpoint for ESM2:

python

config = ESM2GeneratorConfig(
    model_checkpoint="esm2_t6_8M_UR50D",  # Smallest: 8M params
)

Import errors for ML models

A ModuleNotFoundError for packages like esm, torch, or boltz in your main interpreter is expected. ML-based generators and constraints (ESM2, ESMFold, ProteinMPNN, and others) get their dependencies, including PyTorch, from proto-tools’ isolated tool environments, not from the main package. proto-tools builds each tool’s environment on first use and provisions the required system build tools through the shared foundation environment. If a tool fails to find its environment, confirm that proto-tools is installed with pip show proto-tools; the GitHub installation above pulls it in automatically.
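The same check as `pip show` can be run from inside Python, which guarantees you are inspecting the interpreter that raised the error. This sketch uses the standard library's `importlib.metadata`. The distribution names `proto-language` and `proto-tools` are the pip names used on this page, and the helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for dist in ("proto-language", "proto-tools"):
    version = installed_version(dist)
    print(f"{dist}: {version if version else 'NOT INSTALLED'}")
```

If proto-tools shows as not installed here but `pip show proto-tools` finds it, `pip` and `python` most likely point at different environments.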

flash-attn installation fails

flash-attn requires CUDA toolkit headers. Ensure you have CUDA 12.1+ installed on your system, then retry:

bash

pip install flash-attn --no-build-isolation

If it still fails, you can skip flash-attn; it is an optional performance optimization for attention-heavy models, not a hard requirement.
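Before retrying the build, it can help to confirm which CUDA toolkit the compiler actually reports. The sketch below parses the output of `nvcc --version`. It assumes `nvcc` is on your PATH and returns None otherwise, which can happen even when the toolkit is installed in a non-default location. The function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc's 'release X.Y' line, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_toolkit_version():
    """Return the CUDA toolkit version reported by nvcc, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


version = cuda_toolkit_version()
if version is None:
    print("nvcc not found on PATH; the CUDA toolkit may be missing or not exported")
elif version < (12, 1):
    print(f"CUDA toolkit {version[0]}.{version[1]} found; 12.1+ is required")
else:
    print(f"CUDA toolkit {version[0]}.{version[1]} found")
```

Note that the driver version shown by `nvidia-smi` is not the toolkit version; building flash-attn depends on the toolkit that `nvcc` belongs to.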

Next Steps

Quickstart

Core Concepts