Each tool in proto_tools runs inside its own isolated environment, managed by micromamba. On a tool’s first invocation, the library detects the host hardware, resolves the tool’s dependencies, installs them into a dedicated environment under $PROTO_HOME/proto_tool_envs/, and executes the tool there. Subsequent invocations reuse the cached environment and proceed directly to execution.
Because each tool is confined to its own environment, tools with mutually incompatible dependencies can be used together from a single Python program. One tool may require PyTorch 2.1 and another PyTorch 2.4, or one may pin an older Python release and another the most recent; the conflicting requirements never coexist in one environment, and no manual environment management is required.
The problem it solves
Computational biology tools are difficult to install alongside one another. AlphaFold3 requires a build of JAX targeting a specific CUDA version; ESM2 expects a particular release of PyTorch; BLAST is a system binary that must be available on the executable search path; RFdiffusion pins a specific Python version; and each tool stores its model weights in a different location. Installing several of them into a single environment frequently produces unresolvable dependency conflicts. The conventional solution is to maintain a separate Conda environment per tool and to activate the appropriate one before each run.
proto_tools instead provisions and manages a dedicated environment for each tool automatically. Because no two tools share an environment, their conflicting requirements never interact.
Contents of an environment
Each environment resides at $PROTO_HOME/proto_tool_envs/{tool_key}_env/ and is fully self-contained. It comprises an isolated Python interpreter pinned to the version the tool requires; the tool’s dependencies, resolved as hardware-appropriate package builds (for example, the CUDA build matching the available GPU, or a CPU-only build on a machine without one); and a local cache for the compiled extensions and just-in-time (JIT) artifacts produced at run time. Tools that provide a command-line binary, such as BLAST or MAFFT, have that binary installed into the environment, and any variables declared in the tool’s env_vars.txt file (for example, Hugging Face access tokens or backend selections) are applied automatically when the tool runs.
Model weights are the sole component stored outside the environment. They reside separately under $PROTO_HOME/proto_model_cache/{tool_key}/ (configurable via PROTO_MODEL_CACHE), so that rebuilding an environment does not re-download large model checkpoints.
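The layout described above amounts to a small amount of path arithmetic, sketched below. This is an illustration rather than the library's own code: the helper names proto_home, env_dir, and model_cache_dir are hypothetical, the ~/.proto fallback for an unset PROTO_HOME is an assumption, and PROTO_MODEL_CACHE is assumed to replace the $PROTO_HOME/proto_model_cache root.

```python
import os
from pathlib import Path


def proto_home() -> Path:
    # Assumption: fall back to ~/.proto when PROTO_HOME is unset.
    # The real default used by proto_tools may differ.
    return Path(os.environ.get("PROTO_HOME", Path.home() / ".proto"))


def env_dir(tool_key: str) -> Path:
    # Mirrors $PROTO_HOME/proto_tool_envs/{tool_key}_env/
    return proto_home() / "proto_tool_envs" / f"{tool_key}_env"


def model_cache_dir(tool_key: str) -> Path:
    # Assumption: PROTO_MODEL_CACHE, when set, replaces the cache root.
    root = os.environ.get("PROTO_MODEL_CACHE")
    base = Path(root) if root else proto_home() / "proto_model_cache"
    return base / tool_key
```

Because the two directories are computed independently, deleting the result of env_dir("esm2") leaves the weights under model_cache_dir("esm2") untouched, which is the property the paragraph above relies on.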
First call vs. cached call
On the first invocation of a tool on a given machine, the library performs the complete setup procedure. It reads the tool’s standalone setup scripts, detects the host hardware (GPU availability, CUDA version, operating system, and processor architecture), and resolves a compatible dependency set, selecting the appropriate PyTorch or JAX build, the required Python version, and any necessary binaries. It then creates the isolated environment under $PROTO_HOME/proto_tool_envs/, installs the resolved dependencies, and executes the tool.
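The hardware-detection step can be approximated with the standard library alone. The sketch below is not the detection routine proto_tools ships; detect_hardware and pick_build are hypothetical names, GPU presence is inferred only from a working nvidia-smi, and CUDA version detection is left out for brevity.

```python
import platform
import shutil
import subprocess


def detect_hardware() -> dict:
    """Collect a few host facts a setup script might need."""
    info = {
        "os": platform.system(),     # e.g. "Linux" or "Darwin"
        "arch": platform.machine(),  # e.g. "x86_64" or "arm64"
        "has_gpu": False,
        "gpu_names": [],
    }
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return info  # no NVIDIA driver tools on PATH: treat as CPU-only
    try:
        out = subprocess.run(
            [smi, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return info  # nvidia-smi exists but failed: fall back to CPU-only
    info["gpu_names"] = [ln.strip() for ln in out.splitlines() if ln.strip()]
    info["has_gpu"] = bool(info["gpu_names"])
    return info


def pick_build(info: dict) -> str:
    # Hypothetical labels; a real setup maps these to concrete package builds.
    return "cuda" if info["has_gpu"] else "cpu"
```

A setup script would consult a result like this once and branch on it, so the same tool definition yields a CPU build on a laptop and a CUDA build on a GPU node.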
As a concrete case, consider ESM2 with the esm2_t6_8M_UR50D checkpoint on CPU, a configuration that is reproducible on any machine. Because the first call builds the esm2_env environment before running the model, it takes considerably longer than the embedding computation alone. Subsequent calls reuse the cached environment and proceed directly to execution.
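The difference between a first call and a cached call comes down to a build-once, reuse-afterward check. The sketch below shows that pattern in isolation and does not call proto_tools: ensure_env is a hypothetical helper, the .ready marker file is an assumption about how completion might be recorded, and a list append stands in for the slow dependency installation.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_env(envs_root: Path, tool_key: str, build: Callable[[Path], None]) -> bool:
    """Build the tool's environment if it is missing; return True when a build ran."""
    env_path = envs_root / f"{tool_key}_env"
    marker = env_path / ".ready"  # hypothetical completion marker
    if marker.exists():
        return False              # cached call: nothing to install
    env_path.mkdir(parents=True, exist_ok=True)
    build(env_path)               # first call: the expensive step
    marker.touch()
    return True


builds = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = ensure_env(root, "esm2", builds.append)   # stand-in for the install
    second = ensure_env(root, "esm2", builds.append)  # reuses the environment

print(first, second, len(builds))  # True False 1
```

Writing the marker only after the build returns means an interrupted installation is retried on the next call instead of being mistaken for a finished environment.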
Hardware awareness
A single shared detection routine determines the available hardware, including the CUDA version, GPU architecture, operating system, and processor architecture, and supplies this information to every tool’s setup. As a result, each tool installs the build appropriate to the host rather than a generic one: the PyTorch or JAX build is selected for the specific GPU present; a tool runs on CPU on a machine without a GPU and with GPU acceleration on a machine that has one; and tools that compile CUDA code at run time request the GCC and nvcc versions compatible with the detected CUDA toolkit. The correct environment is produced without manual configuration.
Debugging setup
When a tool’s environment fails to build, most commonly on a newly encountered HPC cluster, two environment variables provide additional visibility into the execution of its setup.sh:
PROTO_ENV_VERBOSE=1 streams each line of the tool’s setup.sh to the terminal as it executes, allowing a lengthy installation (for example, PyTorch or flash-attn) to be monitored in real time and the point of failure to be identified. Setting PROTO_ENV_LOG_DIR copies the completed log to <PROTO_ENV_LOG_DIR>/<toolkit>_setup.log once setup finishes, which is useful when the environment directory is ephemeral. Regardless of these settings, the complete log is always written to <env_path>/setup.log during setup and can be inspected afterward.
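From Python, the two variables can be set in the process environment before the tool is first called, and the log can be inspected afterward. The sketch below assumes that proto_tools reads these variables at setup time from the current process, which the text implies but does not state; the proto_setup_logs directory and the tail helper are illustrative, not part of the library.

```python
import os
from collections import deque
from pathlib import Path

# Variable names come from the text above; set them before the tool's first call.
os.environ["PROTO_ENV_VERBOSE"] = "1"
os.environ["PROTO_ENV_LOG_DIR"] = str(Path.cwd() / "proto_setup_logs")  # example location


def tail(log_path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a setup log, or an empty list if it is missing."""
    if not log_path.is_file():
        return []
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

After a failed build, calling tail on the setup.log inside the tool's environment directory usually shows the failing command without paging through the whole installation.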
Overriding a tool’s setup
Each tool’s installation script is written to be as general and portable as possible, but a particular system may nonetheless present a conflict that warrants further customization. In such cases, the recommended approach is to patch that toolkit’s setup.sh through the override mechanism, which supplies an editable copy of the tool’s environment definition for proto_tools to use in place of the packaged one. It requires no modification to the installed package and behaves identically whether proto_tools is run from a source checkout or a pip installation.
The command proto-tools eject-standalone <toolkit> copies the tool’s setup files into ./proto_standalone/<toolkit>/ and prints the environment variable that references them.
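When the eject step is scripted, it can be driven from Python with subprocess. The sketch below uses only the command form and output directory given above; the helper names are hypothetical, no additional flags are assumed, and the name of the printed environment variable is not reproduced here, so it should be read from the command's output.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def eject_argv(toolkit: str) -> list[str]:
    # Command form taken from the text; no extra flags are assumed.
    return ["proto-tools", "eject-standalone", toolkit]


def ejected_dir(toolkit: str, cwd: Optional[Path] = None) -> Path:
    # The setup files are copied to ./proto_standalone/<toolkit>/
    return (cwd or Path.cwd()) / "proto_standalone" / toolkit


def eject(toolkit: str) -> str:
    """Run the eject command and return whatever it prints."""
    if shutil.which("proto-tools") is None:
        raise FileNotFoundError("proto-tools is not on PATH")
    result = subprocess.run(
        eject_argv(toolkit), capture_output=True, text=True, check=True
    )
    return result.stdout
```

After eject("esm2") succeeds (the toolkit name is only an example), the editable setup.sh would be found under ejected_dir("esm2"), and the returned text contains the variable to export.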
Go deeper
For the complete implementation reference, including standalone setup patterns, env_vars.txt conventions, GCC/nvcc compatibility matrices, the to_device() protocol, version-pinned tools, and instructions for authoring a new tool environment, consult the developer notes in the proto-tools repository:
Tool Environments Reference
Compute-dependency detection, standalone env builds and overrides, shared environments, Python version pinning, binary installs, caches, GCC/nvcc and CUDA/ABI compatibility, and the to_device() protocol.
Next Steps
Device Management
How tools are placed on GPUs, with LRU eviction and CPU offload.
Tool Persistence
Keep a model loaded across calls to skip repeated load times.
Parallel Execution
Fan work out across every available GPU.
Cloud Inference
Dispatch tool runs to remote compute.