ToolPool does this automatically. When a call is wrapped in a with ToolPool(): block, its input is partitioned across every available device, the pieces run in parallel worker processes, and the results are reassembled in the original order.
ToolPool is the multi-GPU counterpart to ToolInstance.persist(). persist() is appropriate when one worker on one GPU is enough to amortize the load cost across a batch; ToolPool() is appropriate when that same batch should be spread across several GPUs to reduce wall-clock time. The two mechanisms cooperate: inside a ToolPool block, every worker stays warm for later calls in the same block, so persistence is obtained automatically.
| Feature | What it does |
|---|---|
| Transparent interception | Any tool with an iterable_input_field is auto-partitioned. |
| Cost-aware scheduling | Items are distributed via LPT bin-packing using each tool’s item_cost() estimate. |
| Built-in persistence | Workers stay alive across calls within the pool, so reloading happens at most once per GPU. |
| Device auto-detection | Discovers all visible GPUs, or accepts an explicit device list. |
| Automatic dedup | Duplicate items are computed once and expanded back to their original positions. |
ToolPool is for local inference on hardware under the caller’s control. Under cloud inference, partitioning and parallelism are handled remotely, so ToolPool is not needed.
1. Basic usage (auto-detect GPUs)
The simplest form of ToolPool takes no arguments. Every GPU the process can see joins the pool, and any run_* call inside the block is partitioned across all of them. Results are returned in input order, so the multi-GPU execution is invisible to the caller.
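The partition-and-reassemble behavior can be pictured with a minimal, library-independent sketch. It is not ToolPool's implementation: partition and run_partitioned are hypothetical names, threads stand in for the per-GPU worker processes, and items are dealt round-robin only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers):
    """Split items into n_workers lists, remembering each item's original index."""
    parts = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        parts[index % n_workers].append((index, item))
    return parts


def run_partitioned(fn, items, n_workers):
    """Apply fn to every item in parallel and return results in input order."""
    def work(part):
        # Each worker returns (original_index, result) pairs for its own share.
        return [(index, fn(item)) for index, item in part]

    results = [None] * len(items)
    # Threads are a stand-in here; a real pool would use one process per GPU.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for pairs in pool.map(work, partition(items, n_workers)):
            for index, value in pairs:
                results[index] = value  # put each result back at its input position
    return results


lengths = run_partitioned(len, ["MKT", "GAVLI", "PW", "ACDEFGH"], n_workers=2)
print(lengths)  # [3, 5, 2, 7]
```

Carrying the original index alongside each item is what lets the caller see an ordinary ordered list, whichever worker handled which item.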
2. Restricting or choosing devices
In some cases not every visible GPU should be used: another job may occupy some of the cards, or a benchmark may require a fixed number of devices. Passing gpus=[...] restricts the pool explicitly.
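The choice between auto-detection and an explicit list might be resolved along the following lines. This is a sketch under stated assumptions, not ToolPool's code: resolve_devices is a hypothetical helper, the visible indices are taken as already detected, and rejecting an unknown device with an error is an assumed policy.

```python
def resolve_devices(visible, gpus=None):
    """Return the device indices a pool would use.

    visible: indices the process can see (assumed already detected).
    gpus: optional explicit subset; None means "use everything visible".
    """
    if gpus is None:
        return list(visible)
    missing = [g for g in gpus if g not in visible]
    if missing:
        # Assumption: an unknown device is an error rather than silently skipped.
        raise ValueError(f"requested devices not visible: {missing}")
    # Preserve the caller's order and drop accidental repeats.
    return list(dict.fromkeys(gpus))


print(resolve_devices([0, 1, 2, 3]))               # [0, 1, 2, 3]
print(resolve_devices([0, 1, 2, 3], gpus=[2, 0]))  # [2, 0]
```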
3. Persistence within a pool
Once a worker loads inside a ToolPool block, it stays resident for every later call in that same block. The first call pays the model-loading cost on every GPU in the pool; every later call skips the load and runs against workers that are already warm.
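The load-once behavior can be illustrated with a toy worker that counts how often its expensive load step runs. WarmWorker is a hypothetical stand-in, and the "model" is a placeholder function; the sketch only shows why the second call is cheaper, not how ToolPool manages its processes.

```python
class WarmWorker:
    """Stand-in for one GPU worker that loads its model lazily, once."""

    def __init__(self, device):
        self.device = device
        self.model = None
        self.loads = 0  # counts how many times the expensive load ran

    def run(self, item):
        if self.model is None:
            # First call on this worker: pay the load cost.
            self.model = str.upper  # placeholder for a real model
            self.loads += 1
        return self.model(item)


workers = [WarmWorker(device) for device in (0, 1)]

# Two successive "calls" inside the same pool reuse the same workers.
first = [w.run("mkt") for w in workers]
second = [w.run("gav") for w in workers]

print([w.loads for w in workers])  # [1, 1]
```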
4. Cost-aware scheduling
Real batches rarely contain items of uniform cost. One protein sequence may be 40 residues and another 800; a longer sequence requires proportionally more compute, so it dominates its partition. Distributing items round-robin makes whichever worker received the long items the bottleneck while the others sit idle.
ToolPool avoids this with longest-processing-time-first (LPT) bin-packing. Each tool reports a per-item cost estimate (for structure prediction, for example, the total residue count). ToolPool sorts the batch by descending cost and assigns each item to whichever worker currently has the least total work. As a result, the GPUs finish at approximately the same wall-clock time for most input-size distributions; the exception is a batch dominated by a single very expensive item, which no assignment can split.
As long as the item_cost() estimate is reasonable, a mixed batch of short and long inputs is balanced automatically. For a batch of equally sized inputs, scheduling reduces to round-robin.
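The LPT rule itself is short enough to sketch directly. The function below is an illustrative implementation of the textbook heuristic, not ToolPool's scheduler: lpt_assign is a hypothetical name, the costs list plays the role of the per-item item_cost() estimates, and the residue counts are made-up numbers.

```python
import heapq


def lpt_assign(costs, n_workers):
    """Assign item indices to workers with longest-processing-time-first.

    Returns (assignment, loads): assignment[w] lists the item indices given
    to worker w, and loads[w] is that worker's total estimated cost.
    """
    assignment = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    # Min-heap of (current_load, worker_id); ties go to the lowest worker id.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    # Visit items from most to least expensive.
    for index in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(index)
        loads[worker] = load + costs[index]
        heapq.heappush(heap, (loads[worker], worker))
    return assignment, loads


# Residue counts for a mixed batch (illustrative numbers).
costs = [800, 40, 350, 300, 120, 60]
assignment, loads = lpt_assign(costs, n_workers=2)
print(loads)  # [840, 830]
```

For comparison, dealing the same six items round-robin in input order would give the two workers loads of 1270 and 400, so one GPU would sit idle for most of the run. With equal costs, the least-loaded rule alternates between workers, which matches the round-robin behavior noted above.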
5. Automatic deduplication
When the same input appears multiple times in a batch, ToolPool computes it once and expands the result back to every position where it appeared. This is transparent, so the returned list always has the same length and order as the input.
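A minimal sketch of compute-once-then-expand follows. It assumes the items are hashable (plain sequence strings here); run_deduplicated and fake_tool are hypothetical names used only for illustration, and how ToolPool identifies duplicates internally is not specified in this section.

```python
def run_deduplicated(fn, items):
    """Call fn once per distinct item and return results aligned with items."""
    unique = list(dict.fromkeys(items))          # distinct items, first-seen order
    computed = {item: fn(item) for item in unique}
    return [computed[item] for item in items]    # expand back to every position


calls = []


def fake_tool(sequence):
    calls.append(sequence)  # record each real computation
    return len(sequence)


batch = ["MKT", "GAVLI", "MKT", "MKT", "GAVLI"]
results = run_deduplicated(fake_tool, batch)
print(results)  # [3, 5, 3, 3, 5]
print(calls)    # ['MKT', 'GAVLI']
```

Five inputs produce five outputs in the original order, but the tool body runs only twice.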
6. When to use ToolPool versus ToolInstance.persist()
ToolPool and ToolInstance.persist() address related but distinct problems:
| Situation | Use |
|---|---|
| One GPU, batch of calls | ToolInstance.persist() |
| Multiple GPUs, single large batch | ToolPool() |
| Multiple GPUs, manual control over which tool runs where | ToolInstance.persist_tool(instance_name=...) |
| Cloud inference | Neither; the provider handles it |
ToolPool only accelerates tools that declare an iterable_input_field (for example, complexes for ESMFold, sequences for ESM2). A tool that takes a single indivisible input runs on one GPU regardless of the pool size, because there is nothing to partition.
Configuration reference