The canonical version of this document lives in the proto-tools repo.
This note covers where proto_tools stores model weights, tool environments, caches, and databases, and the environment variables that control those locations. Tools download model weights on first use, and PROTO_HOME (defaults to ~/.proto/) controls where those weights are stored.
# Recommended: add to ~/.bashrc
export PROTO_HOME=/path/to/your/proto_home
PROTO_MODEL_CACHE can override just the model weights location (defaults to PROTO_HOME/proto_model_cache/). PROTO_DATABASES_DIR can override just the sequence-databases location (defaults to PROTO_MODEL_CACHE/databases/). See Databases.
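The default chain described above (PROTO_HOME, then PROTO_MODEL_CACHE, then PROTO_DATABASES_DIR) can be sketched roughly as follows. This is an illustration only, not the proto-tools implementation: resolve_storage_roots is a hypothetical helper, and it treats PROTO_MODEL_CACHE as a plain path without modelling the IN_ENV and NONE modes listed under Modes.

```python
import os
from pathlib import Path


def resolve_storage_roots(env=None):
    """Hypothetical helper: apply the documented defaults in order."""
    env = os.environ if env is None else env
    # PROTO_HOME defaults to ~/.proto/
    home = Path(env.get("PROTO_HOME") or "~/.proto").expanduser()
    # PROTO_MODEL_CACHE defaults to PROTO_HOME/proto_model_cache/
    model_cache = Path(
        env.get("PROTO_MODEL_CACHE") or home / "proto_model_cache"
    ).expanduser()
    # PROTO_DATABASES_DIR defaults to PROTO_MODEL_CACHE/databases/
    databases = Path(
        env.get("PROTO_DATABASES_DIR") or model_cache / "databases"
    ).expanduser()
    return {"home": home, "model_cache": model_cache, "databases": databases}
```

Calling resolve_storage_roots() with no argument reads the current shell environment; passing a dict makes it easy to check what a given combination of variables would resolve to before exporting them.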

Storage layout

Everything lives under PROTO_HOME regardless of install mode:
PROTO_HOME/                   (default: ~/.proto/)
├── proto_model_cache/        model weights (HF_HOME, TORCH_HOME, resolve_weights_dir)
│   └── databases/            provisioned MMseqs2 databases (override: PROTO_DATABASES_DIR)
├── proto_tool_envs/          micromamba-managed tool venvs
├── uv_cache/                 uv package download cache (UV_CACHE_DIR)
├── pip_cache/                pip HTTP cache (PIP_CACHE_DIR)
└── .micromamba/              micromamba binary + package cache

uv / pip package caches

Tool env builds use uv pip install (and occasionally pip install) to fetch Python wheels. By default, proto-tools routes both caches under PROTO_HOME, so all disk usage is consolidated in one place and can be cleaned up together.
  • UV_CACHE_DIR defaults to PROTO_HOME/uv_cache/ (extracted archives + HTTP cache for uv)
  • PIP_CACHE_DIR defaults to PROTO_HOME/pip_cache/ (HTTP + wheel cache for pip)
Both are injected by persistent_worker._build_subprocess_env() via setdefault, so any value set in your shell overrides the default. To share the cache across projects, set them explicitly:
export UV_CACHE_DIR=~/.cache/uv
export PIP_CACHE_DIR=~/.cache/pip
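The setdefault behaviour mentioned above is what lets a shell export win over the proto-tools default. A minimal sketch of the idea, assuming the subprocess environment is built from a copy of the parent environment (sketch_subprocess_env is a hypothetical stand-in, not the actual persistent_worker._build_subprocess_env() code):

```python
import os
from pathlib import Path


def sketch_subprocess_env(proto_home, base_env=None):
    """Hypothetical stand-in showing how setdefault preserves user overrides."""
    # Copy so the caller's environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # setdefault only writes the key when it is absent, so a value
    # exported in the shell is left untouched.
    env.setdefault("UV_CACHE_DIR", str(Path(proto_home) / "uv_cache"))
    env.setdefault("PIP_CACHE_DIR", str(Path(proto_home) / "pip_cache"))
    return env
```

With UV_CACHE_DIR already exported, only PIP_CACHE_DIR would be filled in from PROTO_HOME.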
The cache and tool envs share a filesystem by default, which lets uv hard-link files from the cache into envs (this saves disk space, because the extracted archive is not duplicated). Wiping PROTO_HOME/proto_tool_envs/ preserves the cache, so the next rebuild is fast.
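Hard links cannot cross filesystem boundaries, so if you move UV_CACHE_DIR elsewhere it may be worth checking that it still sits on the same device as the tool envs. One way to check, as a small sketch (same_filesystem is a hypothetical helper; both paths must already exist):

```python
import os


def same_filesystem(path_a, path_b):
    """Return True when both existing paths live on the same device."""
    # st_dev identifies the device a path resides on; hard links are
    # only possible between paths that share it.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, same_filesystem(os.path.expanduser("~/.cache/uv"), os.path.expanduser("~/.proto/proto_tool_envs")) would indicate whether a shared cache can still be hard-linked into the envs.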

Modes

| Mode | HF_HOME | Non-HF weights | TORCH_HOME |
|---|---|---|---|
| (unset, default) | {PROTO_HOME}/proto_model_cache/huggingface/ | {PROTO_HOME}/proto_model_cache/{toolkit}/ | {PROTO_HOME}/proto_model_cache/torch/ |
| /absolute/path | /absolute/path/huggingface/ | /absolute/path/{toolkit}/ | /absolute/path/torch/ |
| IN_ENV | {venv}/cache/huggingface/ | {venv}/model_weight_cache/ | {venv}/cache/torch/ |
| NONE | Parent HF_HOME passthrough | {venv}/weights/ | Parent TORCH_HOME passthrough |
The default (proto_model_cache/ under PROTO_HOME) keeps weights outside tool envs so they survive env rebuilds.
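The Modes table can also be read as a small lookup function. The sketch below is illustrative only: resolve_cache_dirs is a hypothetical name, the mode argument is assumed to be the value the Mode column refers to (None when unset), and None in the result stands for "pass the parent's value through".

```python
from pathlib import Path


def resolve_cache_dirs(mode, proto_home, venv, toolkit):
    """Hypothetical: return (HF_HOME, non-HF weights dir, TORCH_HOME)."""
    venv = Path(venv)
    if mode == "NONE":
        # The parent's HF_HOME / TORCH_HOME are passed through unchanged.
        return None, venv / "weights", None
    if mode == "IN_ENV":
        return (
            venv / "cache" / "huggingface",
            venv / "model_weight_cache",
            venv / "cache" / "torch",
        )
    # Unset means the default root; anything else is taken as an absolute path.
    root = Path(mode) if mode else Path(proto_home) / "proto_model_cache"
    return root / "huggingface", root / toolkit, root / "torch"
```

Only the unset and absolute-path modes place weights outside the tool env, which is why those are the ones that survive an env rebuild.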

Shared weights for teams

For teams sharing weights across collaborators, set PROTO_MODEL_CACHE to a shared directory while keeping PROTO_HOME per-user:
# Per-user: tool envs and micromamba (should NOT be shared between users,
# as different users may have different CUDA versions, library paths, etc.)
export PROTO_HOME=~/.proto

# Shared with collaborators: just model weights (safe for concurrent access;
# HuggingFace uses file locks internally to handle simultaneous downloads)
export PROTO_MODEL_CACHE=/shared/team/model_weights
Do not share PROTO_HOME itself across users: tool environments are user-specific. Only model weights (PROTO_MODEL_CACHE) and, as described under Databases, the databases root are safe to share.

Databases

Sequence databases for mmseqs2-homology-search (UniRef30, the ColabFold envdb, etc.) are large (tens to hundreds of GB each) and, once indexed, read-only. By default they live under PROTO_MODEL_CACHE/databases/, but PROTO_DATABASES_DIR overrides the databases root directly, so you can keep them on a separate filesystem from model weights, e.g. a high-capacity scratch volume:
# Databases on scratch; weights wherever PROTO_MODEL_CACHE points
export PROTO_DATABASES_DIR=/scratch/$USER/proto_databases
The override applies to both provisioning (setup_databases.py writes there) and runtime (the tool resolves datasets there), so a single export keeps the two consistent and no symlinks are needed. Like weights, the databases root is safe to NFS-mount and share across collaborators (it is read-only after indexing). Provision into it with:
python -m proto_tools.tools.sequence_alignment.mmseqs2.setup_databases <dataset>   # e.g. uniref30-2302
setup_databases.py --workdir <dir> overrides the root for a single invocation instead.

Per-tool override

PROTO_{TOOL_NAME}_WEIGHTS_DIR always wins, regardless of mode:
export PROTO_FAMPNN_WEIGHTS_DIR=/custom/path/fampnn
export PROTO_PROTENIX_WEIGHTS_DIR=/custom/path/protenix
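The precedence above (per-tool variable first, mode-derived directory otherwise) might be sketched like this. weights_dir_for is a hypothetical helper, and the name normalisation (upper-casing, with non-alphanumeric characters turned into underscores) is an assumption; the source only shows the PROTO_FAMPNN_WEIGHTS_DIR and PROTO_PROTENIX_WEIGHTS_DIR examples.

```python
import os
import re
from pathlib import Path


def weights_dir_for(tool_name, mode_default, env=None):
    """Hypothetical: a per-tool override wins over the mode-derived default."""
    env = os.environ if env is None else env
    # Assumption: the tool name is upper-cased and any run of
    # non-alphanumeric characters becomes a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", tool_name).upper()
    override = env.get(f"PROTO_{slug}_WEIGHTS_DIR")
    return Path(override) if override else Path(mode_default)
```

Here mode_default is whatever directory the active mode would otherwise select for that tool.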