The canonical version of this document lives in the proto-tools repo.
This page describes where proto_tools stores model weights, tool environments, caches, and databases, and the environment variables that control those locations.
Tools download model weights on first use, and PROTO_HOME (defaults to ~/.proto/) controls where those weights are stored.
PROTO_MODEL_CACHE can override just the model weights location (defaults to PROTO_HOME/proto_model_cache/).
PROTO_DATABASES_DIR can override just the sequence-databases location (defaults to PROTO_MODEL_CACHE/databases/). See Databases.
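The cascade of defaults described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the function name `resolve_locations` is hypothetical, and it assumes PROTO_MODEL_CACHE holds a plain path (the special values covered under Modes are ignored here).

```python
import os
from pathlib import Path


def resolve_locations(env=None):
    """Resolve the three storage roots from environment variables (illustrative sketch)."""
    env = os.environ if env is None else env
    # PROTO_HOME defaults to ~/.proto/
    home = Path(env.get("PROTO_HOME") or "~/.proto").expanduser()
    # PROTO_MODEL_CACHE defaults to PROTO_HOME/proto_model_cache/
    model_cache = Path(env.get("PROTO_MODEL_CACHE") or home / "proto_model_cache")
    # PROTO_DATABASES_DIR defaults to PROTO_MODEL_CACHE/databases/
    databases = Path(env.get("PROTO_DATABASES_DIR") or model_cache / "databases")
    return {"home": home, "model_cache": model_cache, "databases": databases}
```

Each variable only needs to be set when its location should differ from the one derived from the level above it.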
Storage layout
Everything lives under PROTO_HOME regardless of install mode:
uv / pip package caches
Tool env builds use uv pip install (and occasionally pip install) to fetch Python wheels. By default, proto-tools routes both caches under PROTO_HOME so all disk usage is consolidated and can be cleaned up in one step.
- UV_CACHE_DIR defaults to PROTO_HOME/uv_cache/ (extracted archives + HTTP cache for uv)
- PIP_CACHE_DIR defaults to PROTO_HOME/pip_cache/ (HTTP + wheel cache for pip)
These defaults are applied in persistent_worker._build_subprocess_env() via setdefault, so any value already set in your shell takes precedence over the default. To share the cache across projects, set them explicitly:
uv hard-links files from the cache into envs, which saves disk space because the extracted archive is not duplicated. Wiping PROTO_HOME/proto_tool_envs/ preserves the cache, so the next rebuild is fast.
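The setdefault behavior for the two cache variables can be illustrated with a small stand-in. The function below is a hypothetical sketch of the pattern only; it is not the body of persistent_worker._build_subprocess_env(), whose real signature is not shown in this document.

```python
import os


def build_subprocess_env(proto_home, base_env=None):
    """Hypothetical stand-in showing how setdefault keeps shell-provided values."""
    env = dict(os.environ if base_env is None else base_env)
    # setdefault only writes the key when it is absent, so a value
    # exported in the parent shell is left untouched.
    env.setdefault("UV_CACHE_DIR", os.path.join(proto_home, "uv_cache"))
    env.setdefault("PIP_CACHE_DIR", os.path.join(proto_home, "pip_cache"))
    return env
```

With an empty parent environment both caches land under PROTO_HOME; if the parent already exports UV_CACHE_DIR (for example a cache shared across projects), that value is kept and only PIP_CACHE_DIR receives the default.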
Modes
| Mode | HF_HOME | Non-HF weights | TORCH_HOME |
|---|---|---|---|
| (unset, default) | {PROTO_HOME}/proto_model_cache/huggingface/ | {PROTO_HOME}/proto_model_cache/{toolkit}/ | {PROTO_HOME}/proto_model_cache/torch/ |
| /absolute/path | /absolute/path/huggingface/ | /absolute/path/{toolkit}/ | /absolute/path/torch/ |
| IN_ENV | {venv}/cache/huggingface/ | {venv}/model_weight_cache/ | {venv}/cache/torch/ |
| NONE | Parent HF_HOME passthrough | {venv}/weights/ | Parent TORCH_HOME passthrough |
The default (proto_model_cache/ under PROTO_HOME) keeps weights outside tool envs so they survive env rebuilds.
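The table can be read as a lookup from mode to three directories. The sketch below encodes it directly, under two assumptions: the Mode column is the value of PROTO_MODEL_CACHE, and "passthrough" means the parent process's HF_HOME or TORCH_HOME is used unchanged. The function name and the example toolkit name are hypothetical.

```python
from pathlib import Path


def resolve_cache_dirs(mode, proto_home, venv, toolkit, parent_env=None):
    """Map a mode to HF_HOME, non-HF weights, and TORCH_HOME (sketch of the table)."""
    parent_env = parent_env or {}
    venv = Path(venv)
    if mode == "IN_ENV":
        # Everything lives inside the tool's own virtual environment.
        return {
            "HF_HOME": venv / "cache" / "huggingface",
            "weights": venv / "model_weight_cache",
            "TORCH_HOME": venv / "cache" / "torch",
        }
    if mode == "NONE":
        # Assumption: passthrough returns the parent's value, or None if unset.
        return {
            "HF_HOME": parent_env.get("HF_HOME"),
            "weights": venv / "weights",
            "TORCH_HOME": parent_env.get("TORCH_HOME"),
        }
    # Unset falls back to PROTO_HOME/proto_model_cache; otherwise an absolute path.
    root = Path(mode) if mode else Path(proto_home) / "proto_model_cache"
    return {
        "HF_HOME": root / "huggingface",
        "weights": root / toolkit,
        "TORCH_HOME": root / "torch",
    }
```

Only the unset and absolute-path modes place weights outside the venv, which is why those are the modes where weights survive an env rebuild.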
Shared weights for teams
For teams sharing weights across collaborators, set PROTO_MODEL_CACHE to a shared directory while keeping PROTO_HOME per-user:
Do not share PROTO_HOME itself across users, because tool environments are user-specific and should remain per-user. Only model weights (PROTO_MODEL_CACHE) are safe to share.
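As a rough illustration of that split, the helper below builds the two variables for a given user. Everything in it is a placeholder: the function name, the /home/<user>/.proto layout, and the /shared/proto_model_cache path are assumptions chosen for the example, not locations proto-tools prescribes.

```python
import os


def team_environment(user, shared_cache="/shared/proto_model_cache"):
    """Per-user PROTO_HOME plus a shared PROTO_MODEL_CACHE (placeholder paths)."""
    return {
        # Tool envs and package caches stay private to each user.
        "PROTO_HOME": os.path.join("/home", user, ".proto"),
        # Only the model weights point at the shared directory.
        "PROTO_MODEL_CACHE": shared_cache,
    }
```

Two collaborators end up with different PROTO_HOME values but the same PROTO_MODEL_CACHE, so weights are downloaded once while tool environments remain separate.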
Databases
Sequence databases for mmseqs2-homology-search (UniRef30, the ColabFold envdb, etc.) are large (tens to hundreds of GB each) and, once indexed, read-only. By default they live under PROTO_MODEL_CACHE/databases/, but PROTO_DATABASES_DIR overrides the databases root directly, so you can keep them on a separate filesystem from model weights, e.g. a high-capacity scratch volume:
The variable is honored at both provisioning time (setup_databases.py writes there) and runtime (the tool resolves datasets there), so a single export keeps them consistent, with no symlinks. Like weights, the databases root is safe to NFS-mount and share across collaborators (read-only after indexing). Provision into it with:
setup_databases.py --workdir <dir> overrides the root for a single invocation instead.
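The precedence implied above (a per-invocation --workdir first, then PROTO_DATABASES_DIR, then the default under the model cache) can be sketched as below. The function name and parameters are hypothetical, and the ordering is an assumption drawn from the wording here rather than from the setup_databases.py source.

```python
from pathlib import Path


def databases_root(env, model_cache, workdir=None):
    """Pick the databases root: --workdir, then PROTO_DATABASES_DIR, then the default."""
    if workdir:
        # Assumed: --workdir applies to a single invocation and wins outright.
        return Path(workdir)
    override = env.get("PROTO_DATABASES_DIR")
    if override:
        return Path(override)
    # Default location described above: PROTO_MODEL_CACHE/databases/
    return Path(model_cache) / "databases"
```

Because --workdir affects only one run, a root chosen that way is not seen by the tool at runtime unless PROTO_DATABASES_DIR points to the same place.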
Per-tool override
PROTO_{TOOL_NAME}_WEIGHTS_DIR always wins, regardless of mode:
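A minimal sketch of that precedence follows. How a tool name is turned into the {TOOL_NAME} part is not stated here, so the normalization below (upper-case, non-alphanumeric characters replaced by underscores) is an assumption, as are the function name and the example tool names.

```python
import os
import re
from pathlib import Path


def weights_dir_for(tool_name, mode_default, env=None):
    """Return the per-tool override if set, else the directory chosen by the mode."""
    env = os.environ if env is None else env
    # Assumption: "my-tool" maps to PROTO_MY_TOOL_WEIGHTS_DIR.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", tool_name).upper()
    override = env.get(f"PROTO_{slug}_WEIGHTS_DIR")
    # The per-tool variable wins regardless of what the mode resolved to.
    return Path(override) if override else Path(mode_default)
```

A tool with its override set resolves to that directory even when the mode would have placed weights elsewhere; tools without an override are unaffected.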