Tool Environments

Each tool in proto_tools runs inside its own isolated environment, managed by micromamba. On a tool’s first invocation, the library detects the host hardware, resolves the tool’s dependencies, installs them into a dedicated environment under $PROTO_HOME/proto_tool_envs/, and executes the tool there. Subsequent invocations reuse the cached environment and proceed directly to execution. Because each tool is confined to its own environment, tools with mutually incompatible dependencies can be used together from a single Python program. One tool may require PyTorch 2.1 and another PyTorch 2.4, or one may pin an older Python release and another the most recent; the conflicting requirements never coexist in one environment, and no manual environment management is required.

The problem it solves

Computational biology tools are difficult to install alongside one another. AlphaFold3 requires a build of JAX targeting a specific CUDA version; ESM2 expects a particular release of PyTorch; BLAST is a system binary that must be available on the executable search path; RFdiffusion pins a specific Python version; and each tool stores its model weights in a different location. Installing several of them into a single environment frequently produces unresolvable dependency conflicts. The conventional solution is to maintain a separate Conda environment per tool and to activate the appropriate one before each run. proto_tools instead provisions and manages a dedicated environment for each tool automatically. Because no two tools share an environment, their conflicting requirements never interact.

Figure: Your script calls into proto_tools, which dispatches each tool into its own isolated environment.

Contents of an environment

Each environment resides at $PROTO_HOME/proto_tool_envs/{tool_key}_env/ and is fully self-contained. It comprises an isolated Python interpreter pinned to the version the tool requires; the tool’s dependencies, resolved as hardware-appropriate package builds (for example, the CUDA build matching the available GPU, or a CPU-only build on a machine without one); and a local cache for the compiled extensions and just-in-time (JIT) artifacts produced at run time. Tools that provide a command-line binary, such as BLAST or MAFFT, have that binary installed into the environment, and any variables declared in the tool’s env_vars.txt file (for example, Hugging Face access tokens or backend selections) are applied automatically when the tool runs. Model weights are the sole component stored outside the environment. They reside separately under $PROTO_HOME/proto_model_cache/{tool_key}/ (configurable via PROTO_MODEL_CACHE), so that rebuilding an environment does not re-download large model checkpoints.
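
The directory layout above can be expressed as a small path-resolution helper. This is a minimal sketch, not the proto_tools API: the names `tool_env_dir` and `model_cache_dir` are hypothetical, the `~/.proto` fallback for `PROTO_HOME` is borrowed from the notebook example in the next section and is an assumption, and the sketch assumes that `PROTO_MODEL_CACHE`, when set, replaces the entire `$PROTO_HOME/proto_model_cache` root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def _proto_home(environ: Mapping[str, str]) -> Path:
    # Assumption: fall back to ~/.proto when PROTO_HOME is unset.
    return Path(environ.get("PROTO_HOME", "~/.proto")).expanduser()


def tool_env_dir(tool_key: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: $PROTO_HOME/proto_tool_envs/{tool_key}_env/."""
    env = os.environ if environ is None else environ
    return _proto_home(env) / "proto_tool_envs" / f"{tool_key}_env"


def model_cache_dir(tool_key: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: $PROTO_HOME/proto_model_cache/{tool_key}/.

    Assumption: PROTO_MODEL_CACHE, if set, replaces the proto_model_cache root.
    """
    env = os.environ if environ is None else environ
    root = env.get("PROTO_MODEL_CACHE")
    base = Path(root).expanduser() if root else _proto_home(env) / "proto_model_cache"
    return base / tool_key


if __name__ == "__main__":
    demo = {"PROTO_HOME": "/data/proto"}
    print(tool_env_dir("esm2", demo))     # /data/proto/proto_tool_envs/esm2_env
    print(model_cache_dir("esm2", demo))  # /data/proto/proto_model_cache/esm2
```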

First call vs. cached call

On the first invocation of a tool on a given machine, the library performs the complete setup procedure. It reads the tool’s standalone setup scripts, detects the host hardware (GPU availability, CUDA version, operating system, and processor architecture), and resolves a compatible dependency set, selecting the appropriate PyTorch or JAX build, the required Python version, and any necessary binaries. It then creates the isolated environment under $PROTO_HOME/proto_tool_envs/, installs the resolved dependencies, and executes the tool.
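
The choice between the full setup path and the cached path amounts to a build-once check on the environment directory. The sketch below is a hypothetical illustration, not proto_tools internals: `ensure_env`, the `build` callback, and the `.ready` marker file are invented names, and it assumes an environment counts as ready only once the marker exists, so that a half-finished build is retried rather than reused.

```python
import tempfile
from pathlib import Path
from typing import Callable

READY_MARKER = ".ready"  # hypothetical marker file, written only after a successful build


def ensure_env(env_dir: Path, build: Callable[[Path], None]) -> bool:
    """Build env_dir on first use and reuse it afterwards.

    Returns True if a build ran (first call), False if the cached env was reused.
    """
    marker = env_dir / READY_MARKER
    if marker.exists():
        return False  # cached call: go straight to execution
    env_dir.mkdir(parents=True, exist_ok=True)
    build(env_dir)  # first call: detect hardware, resolve and install deps
    marker.touch()  # not reached if build() raises, so a failed build is retried
    return True


if __name__ == "__main__":
    builds = []
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp) / "esm2_env"
        print(ensure_env(env, builds.append))  # True: environment built
        print(ensure_env(env, builds.append))  # False: cached environment reused
    print(len(builds))  # 1
```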

Figure: The first call installs the environment; subsequent calls reuse the cache.

This setup cost is incurred once per tool, per machine: every subsequent invocation finds the environment already present and proceeds directly to execution. The process can be observed by inspecting the environment directory. Before the first invocation the directory is empty, as no tool has yet required an environment:

python

import os

os.environ.setdefault("PROTO_HOME", os.path.expanduser("~/.proto"))
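# `grep .` fails on empty output, so the message also appears when the directory exists but is empty.
!ls $PROTO_HOME/proto_tool_envs/ 2>/dev/null | grep . || echo "(no tool environments yet)"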

(no tool environments yet)

The following call invokes ESM2 for the first time. It uses the small esm2_t6_8M_UR50D checkpoint on CPU so that the example is reproducible on any machine. Because this first call builds the esm2_env environment before running the model, it takes considerably longer than the embedding computation alone:

python

from proto_tools.tools.masked_models.esm2.esm2_embeddings import (
    ESM2EmbeddingsConfig,
    ESM2EmbeddingsInput,
    run_esm2_embeddings,
)

# First call: builds the isolated environment, then runs inference.
output = run_esm2_embeddings(
    ESM2EmbeddingsInput(sequences=["MKTLLILAVVAAALA"]),
    ESM2EmbeddingsConfig(model_checkpoint="esm2_t6_8M_UR50D", device="cpu"),
)
print(f"Mean embedding dimension: {len(output.results[0].mean_embedding)}")

The environment now exists on disk and persists for all subsequent calls:

python

!ls $PROTO_HOME/proto_tool_envs/

esm2_env

A second invocation finds the environment already built and proceeds directly to inference, with no installation step.
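
Outside a notebook, where `!ls` shell escapes are unavailable, the same inspection can be done in plain Python. This sketch relies only on the `{tool_key}_env` directory naming described above; `list_tool_envs` is a hypothetical helper, not part of proto_tools, and the `~/.proto` fallback mirrors the assumption in the notebook cell above.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def list_tool_envs(proto_home: Optional[str] = None) -> List[str]:
    """Return the tool keys that already have an environment on disk (hypothetical helper)."""
    home = Path(proto_home or os.environ.get("PROTO_HOME", "~/.proto")).expanduser()
    envs_root = home / "proto_tool_envs"
    if not envs_root.is_dir():
        return []  # no tool has required an environment yet
    suffix = "_env"
    return sorted(
        entry.name[: -len(suffix)]
        for entry in envs_root.iterdir()
        if entry.is_dir() and entry.name.endswith(suffix)
    )


if __name__ == "__main__":
    # Demo against a throwaway PROTO_HOME so it runs on any machine.
    with tempfile.TemporaryDirectory() as tmp:
        print(list_tool_envs(tmp))  # []
        (Path(tmp) / "proto_tool_envs" / "esm2_env").mkdir(parents=True)
        print(list_tool_envs(tmp))  # ['esm2']
```

Calling `list_tool_envs()` with no argument inspects the real `$PROTO_HOME` instead of the temporary directory.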

Hardware awareness

A single shared detection routine determines the available hardware, including the CUDA version, GPU architecture, operating system, and processor architecture, and supplies this information to every tool’s setup. As a result, each tool installs the build appropriate to the host rather than a generic one: the PyTorch or JAX build is selected for the specific GPU present; a tool runs on CPU on a machine without a GPU and with GPU acceleration on a machine that has one; and tools that compile CUDA code at run time request the GCC and nvcc versions compatible with the detected CUDA toolkit. The correct environment is produced without manual configuration.
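
A minimal sketch of this kind of detection, using only the standard library, is shown below. It is an illustration under stated assumptions, not the shared proto_tools routine: it probes for a GPU by running `nvidia-smi`, whose banner reports the highest CUDA version the installed driver supports (not necessarily an installed toolkit), and the `select_build` mapping to PyTorch-style build tags such as `cu121` or `cpu` is hypothetical.

```python
import platform
import re
import shutil
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostInfo:
    os_name: str                 # platform.system(), e.g. "Linux" or "Darwin"
    arch: str                    # platform.machine(), e.g. "x86_64" or "arm64"
    cuda_version: Optional[str]  # driver-reported CUDA version, None without a GPU


def detect_cuda_version() -> Optional[str]:
    """Read the CUDA version from the nvidia-smi banner, if nvidia-smi exists."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    match = re.search(r"CUDA Version:\s*([\d.]+)", result.stdout)
    return match.group(1) if match else None


def detect_host() -> HostInfo:
    return HostInfo(platform.system(), platform.machine(), detect_cuda_version())


def select_build(host: HostInfo) -> str:
    """Hypothetical mapping from host facts to a PyTorch-style build tag."""
    # Simplification: only Linux hosts with a visible NVIDIA driver get a CUDA build.
    if host.cuda_version is None or host.os_name != "Linux":
        return "cpu"
    major, minor = (host.cuda_version.split(".") + ["0"])[:2]
    # A real resolver would round down to a CUDA release with published wheels.
    return f"cu{major}{minor}"


if __name__ == "__main__":
    host = detect_host()
    print(host, "->", select_build(host))
```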

Debugging setup

When a tool’s environment fails to build, most commonly on a newly encountered HPC cluster, two environment variables provide additional visibility into the execution of its setup.sh:

export PROTO_ENV_VERBOSE=1           # stream setup output to the terminal
export PROTO_ENV_LOG_DIR=./env-logs  # also retain a copy of each setup log

Setting PROTO_ENV_VERBOSE=1 streams each line of the tool’s setup.sh to the terminal as it executes, allowing a lengthy installation (for example, PyTorch or flash-attn) to be monitored in real time and the point of failure to be identified. Setting PROTO_ENV_LOG_DIR copies the completed log to <PROTO_ENV_LOG_DIR>/<toolkit>_setup.log once setup finishes, which is useful when the environment directory is ephemeral. Regardless of these settings, the complete log is always written to <env_path>/setup.log during setup and can be inspected afterward.
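
When a build fails, the end of the setup log is usually where the error surfaces. The helper below is a hypothetical sketch that checks the two documented locations, `<env_path>/setup.log` and `<PROTO_ENV_LOG_DIR>/<toolkit>_setup.log`, in that order and prints the tail of the first one that exists. The function names and the 40-line default are assumptions, not part of proto_tools.

```python
import os
import tempfile
from collections import deque
from pathlib import Path
from typing import List, Mapping, Optional


def setup_log_candidates(env_path: Path, toolkit: str,
                         environ: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Locations where a tool's setup log is written, in lookup order."""
    env = os.environ if environ is None else environ
    candidates = [env_path / "setup.log"]  # always written during setup
    log_dir = env.get("PROTO_ENV_LOG_DIR")
    if log_dir:
        # Copy made once setup finishes; outlives an ephemeral env directory.
        candidates.append(Path(log_dir) / f"{toolkit}_setup.log")
    return candidates


def tail(path: Path, n: int = 40) -> List[str]:
    """Return the last n lines of a file without loading it all into memory."""
    with path.open(errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def show_setup_failure(env_path: Path, toolkit: str, n: int = 40) -> None:
    """Print the end of the first setup log that exists."""
    for candidate in setup_log_candidates(env_path, toolkit):
        if candidate.is_file():
            print(f"--- last {n} lines of {candidate} ---")
            print("\n".join(tail(candidate, n)))
            return
    print(f"No setup log found for {toolkit}")


if __name__ == "__main__":
    # Demo with a throwaway env directory and a sample log file.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "esm2_env"
        env_dir.mkdir()
        (env_dir / "setup.log").write_text("step 1\nstep 2\nERROR: build failed\n")
        show_setup_failure(env_dir, "esm2", n=2)
```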

Overriding a tool’s setup

Each tool’s installation script is written to be as general and portable as possible, but a particular system may nonetheless present a conflict that warrants further customization. In such cases, the recommended approach is to patch that toolkit’s setup.sh through the override mechanism, which supplies an editable copy of the tool’s environment definition for proto_tools to use in place of the packaged one. It requires no modification to the installed package and behaves identically whether proto_tools is run from a source checkout or a pip installation. The command proto-tools eject-standalone <toolkit> copies the tool’s setup files into ./proto_standalone/<toolkit>/ and prints the environment variable that references them:

proto-tools eject-standalone esm2                   # -> ./proto_standalone/esm2/
# edit ./proto_standalone/esm2/setup.sh
export PROTO_ESM2_STANDALONE_DIR=$PWD/proto_standalone/esm2

Once this variable is set, subsequent calls to the tool build from the local copy rather than the packaged one. The reference below documents the full behavior, including environment isolation, validation, and per-project scoping.
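
The override rule can be sketched as a lookup that prefers the ejected copy whenever its variable is set. This is an illustration under assumptions, not the library's resolver: the variable name for toolkits other than esm2 is assumed to follow the `PROTO_ESM2_STANDALONE_DIR` pattern (upper-cased toolkit key), `packaged_dir` stands in for wherever the installed package keeps its setup files, and the check that the override directory contains a `setup.sh` is a hypothetical validation step.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def standalone_var(toolkit: str) -> str:
    # Assumed naming pattern, generalized from PROTO_ESM2_STANDALONE_DIR.
    return f"PROTO_{toolkit.upper()}_STANDALONE_DIR"


def resolve_setup_dir(toolkit: str, packaged_dir: Path,
                      environ: Optional[Mapping[str, str]] = None) -> Path:
    """Choose the directory whose setup.sh builds the tool's environment."""
    env = os.environ if environ is None else environ
    override = env.get(standalone_var(toolkit))
    if not override:
        return packaged_dir  # default: setup files shipped with the package
    override_dir = Path(override).expanduser()
    if not (override_dir / "setup.sh").is_file():
        # Hypothetical validation: fail loudly instead of silently using the packaged copy.
        raise FileNotFoundError(
            f"{standalone_var(toolkit)}={override_dir} contains no setup.sh"
        )
    return override_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        packaged = Path(tmp) / "packaged" / "esm2"  # stand-in for the installed copy
        ejected = Path(tmp) / "proto_standalone" / "esm2"
        ejected.mkdir(parents=True)
        (ejected / "setup.sh").write_text("#!/usr/bin/env bash\n")
        print(resolve_setup_dir("esm2", packaged, {}) == packaged)  # True
        override = {"PROTO_ESM2_STANDALONE_DIR": str(ejected)}
        print(resolve_setup_dir("esm2", packaged, override) == ejected)  # True
```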

Go deeper

For the complete implementation reference, consult the developer notes in the proto-tools repository: Tool Environments Reference. It covers compute-dependency detection; standalone environment setup patterns, builds, and overrides; shared environments; Python version pinning and version-pinned tools; env_vars.txt conventions; caches and binary installs; GCC/nvcc compatibility matrices and CUDA/ABI compatibility; the to_device() protocol; and instructions for authoring a new tool environment.

Next Steps

Device Management

How tools are placed on GPUs, with LRU eviction and CPU offload.

Tool Persistence

Keep a model loaded across calls to skip repeated load times.

Parallel Execution

Fan work out across every available GPU.

Cloud Inference

Dispatch tool runs to remote compute.
