License: ProGen3 code is released under Apache-2.0 and the model weights under CC-BY-NC-SA-4.0. The weights license restricts commercial use and may require explicit attribution. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Profluent. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


Profluent-AI/progen3
Public Release of ProGen3 family of Models
Scaling unlocks broader generation and deeper functional understanding of proteins
Aadyot Bhatnagar, Sarthak Jain, … Ali Madani
bioRxiv (2025)
@article{bhatnagar2025progen3,
  title={Scaling unlocks broader generation and deeper functional understanding of proteins},
  author={Bhatnagar, Aadyot and Jain, Sarthak and Beazer, Joel and Curran, Samuel C and Hoffnagle, Alexander M and Ching, Kyle S and Martyn, Michael and Nayfach, Stephen and Ruffolo, Jeffrey A and Madani, Ali},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.04.15.649055},
  publisher={Cold Spring Harbor Laboratory}
}
proto-bio/proto-tools/proto_tools/tools/causal_models/progen3
| Function | Description |
| --- | --- |
| run_progen3_sample() | Sample protein sequences using ProGen3 language model (GPU) |
| run_progen3_score() | Score protein sequences using ProGen3 language model (GPU) |

Background

ProGen3 (Bhatnagar et al., 2025) is a family of generative protein language models from Profluent. ProGen3 models employ a sparse mixture-of-experts (MoE) architecture, which routes model activations in the transformer feed-forward layers to smaller specialized MLPs to make each forward pass more computationally tractable. The published family spans 112 million to 46 billion parameters; this toolkit exposes the progen3-112m through progen3-3b checkpoints. Pre-training used roughly 1.5 trillion amino-acid tokens sampled from the Profluent Protein Atlas, a curated collection of full-length natural proteins. Unlike a strictly left-to-right model, ProGen3 is trained autoregressively in both directions: forward predicts each residue from the N-terminus toward the C-terminus, and reverse predicts from the C-terminus toward the N-terminus. Generation runs in a chosen direction, and scoring combines both directions into a single per-residue likelihood. Two capabilities follow from this objective. Sampling from the predicted next-residue distributions produces new candidate protein sequences, and the likelihood the model assigns to an existing sequence provides a zero-shot proxy-fitness score with no additional task-specific training.

Learning Resources

  • ProGen3 showcase (Profluent) - an accessible overview of ProGen3, the Profluent Protein Atlas training data, and downstream applications such as antibody design and compact gene editors.

Tools

ProGen3 Sampling (progen3-sample)

Generates protein sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt one amino acid at a time in the chosen direction, drawing each residue from the model’s predicted distribution under the configured temperature and top_p settings. Generation produces at least min_new_tokens and at most max_new_tokens new residues per prompt.
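The per-residue draw described above can be illustrated with a minimal sketch of temperature scaling followed by nucleus (top_p) filtering. This is a generic illustration of the technique in pure Python, not ProGen3's or the toolkit's actual sampling code; the function name `sample_top_p` and the toy four-letter alphabet are hypothetical.

```python
import math
import random


def sample_top_p(logits, temperature=0.2, top_p=0.95, rng=None):
    """Draw one token index using temperature scaling and nucleus (top-p) filtering."""
    rng = rng or random.Random()

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens; random.choices renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy example: hypothetical logits over a four-residue alphabet.
alphabet = "ACDE"
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
draws = [alphabet[sample_top_p(logits, rng=rng)] for _ in range(5)]
print("".join(draws))  # AAAAA: at temperature 0.2 the top token alone exceeds top_p
```

With these toy logits, the default temperature of 0.2 concentrates more than 95% of the probability on the first token, so the nucleus contains only that token. Raising temperature to 1.0 widens the nucleus to three tokens, which is the diversity trade-off the parameters control.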

API Reference

prompts
List[string]
required
Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.
model_checkpoint
enum
default:"progen3-762m"
ProGen3 weights variant. Sizes range from 112M (fastest) to 3B (highest quality). Available options: progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b
local_path
string
Override HuggingFace download with a local weights directory.
direction
enum
default:"forward"
"forward" generates N→C, "reverse" generates C→N. Available options: forward, reverse
max_new_tokens
integer
default:"256"
Maximum new tokens to generate per prompt (excludes prompt).
min_new_tokens
integer
default:"1"
Minimum new tokens to generate per prompt before stopping is allowed.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device to run the model on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
prepend_prompt
boolean
default:"True"
If True, returned sequences include the prompt and newly generated residues; if False, only the newly generated residues.
temperature
number
default:"0.2"
Softmax temperature; lower values are more deterministic, higher values increase diversity.
top_p
number
default:"0.95"
Nucleus sampling threshold over per-position token probabilities.
batch_size
integer
default:"1"
Maximum number of same-length prompts to process simultaneously on GPU.
sequences
List[string]
required
Generated sequences.

Applications

This tool performs de novo protein design, generating novel sequences that resemble natural proteins, optionally conditioned on a prompt. Because generation can run in reverse (C-terminus toward N-terminus), a C-terminal fragment can be used as the prompt and the rest of the sequence grown toward the N-terminus, which a strictly left-to-right model cannot do.
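How the prompt and the new residues fit together in each direction can be sketched as follows. This is an illustration only: the helper `assemble_output` is hypothetical, and it assumes that newly generated residues are reported in the usual N→C reading order regardless of the direction in which they were generated.

```python
def assemble_output(prompt, new_residues, direction="forward", prepend_prompt=True):
    """Combine a prompt with newly generated residues, all written N->C.

    Assumption: new_residues is already in N->C reading order.
    """
    if direction not in ("forward", "reverse"):
        raise ValueError(f"direction must be 'forward' or 'reverse', got {direction!r}")
    if not prepend_prompt:
        # Only the newly generated residues are returned.
        return new_residues
    if direction == "forward":
        # The prompt is the N-terminal part; growth happens on the right.
        return prompt + new_residues
    # The prompt is the C-terminal part; growth happens on the left.
    return new_residues + prompt


# Hypothetical fragments, not real model output.
print(assemble_output("MKT", "AYIA", direction="forward"))   # MKTAYIA
print(assemble_output("GSLK", "MAT", direction="reverse"))   # MATGSLK
```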

Usage Tips

  • direction chooses which terminus is generated. "forward" (the default) continues a prompt from the N-terminus toward the C-terminus; "reverse" treats the prompt as a C-terminal fragment and generates toward the N-terminus, so new residues are added to the left of the prompt. In both modes, provide prompts written in the usual N→C order.
  • Sampling defaults are conservative. temperature defaults to 0.2 and top_p to 0.95, which keep generations close to natural-looking sequences; raise temperature for more diverse but riskier designs. This tool exposes only nucleus (top_p) sampling for ProGen3; there is no top-k cutoff.
  • max_new_tokens and min_new_tokens bound the generated length. They count only newly generated residues (default 256 and 1), separate from the prompt length.
  • Output includes the prompt by default. prepend_prompt=True (the toolkit default) returns the prompt joined to its continuation; set it to False to receive only the newly generated residues.
  • Generated sequences are candidates. Validate them with downstream tools (for example structure prediction, function annotation, or homology search) before drawing biological conclusions.
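The documented defaults can be collected into a keyword-argument dictionary before calling the sampler, as in the sketch below. The helper `build_sample_kwargs` and its sanity checks are hypothetical conveniences for this example, not part of the toolkit; passing the result as keyword arguments to run_progen3_sample() is an assumed call shape that should be checked against the API docs.

```python
# Option lists and defaults copied from the API reference above.
CHECKPOINTS = (
    "progen3-112m", "progen3-219m", "progen3-339m",
    "progen3-762m", "progen3-1b", "progen3-3b",
)
DEFAULTS = {
    "model_checkpoint": "progen3-762m",
    "direction": "forward",
    "max_new_tokens": 256,
    "min_new_tokens": 1,
    "temperature": 0.2,
    "top_p": 0.95,
    "prepend_prompt": True,
    "batch_size": 1,
    "seed": None,
}


def build_sample_kwargs(prompts, **overrides):
    """Merge documented defaults with overrides and apply basic sanity checks.

    The checks are choices made for this sketch, not the toolkit's own validation.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    # A single string is accepted and wrapped into a list.
    kwargs["prompts"] = [prompts] if isinstance(prompts, str) else list(prompts)

    if kwargs["model_checkpoint"] not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {kwargs['model_checkpoint']}")
    if kwargs["direction"] not in ("forward", "reverse"):
        raise ValueError("direction must be 'forward' or 'reverse'")
    if not 1 <= kwargs["min_new_tokens"] <= kwargs["max_new_tokens"]:
        raise ValueError("need 1 <= min_new_tokens <= max_new_tokens")
    if kwargs["temperature"] <= 0 or not 0 < kwargs["top_p"] <= 1:
        raise ValueError("need temperature > 0 and 0 < top_p <= 1")
    return kwargs


kwargs = build_sample_kwargs("MKT", direction="reverse", max_new_tokens=64, seed=0)
# Assumed call shape (requires the toolkit and a GPU):
# result = run_progen3_sample(**kwargs)
```

Setting seed explicitly, as in the example, follows the parameter description: seeded runs are reproducible up to small GPU float noise and can participate in caching.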

ProGen3 Scoring (progen3-score)

Scores existing protein sequences under ProGen3 using bidirectional likelihood. For each sequence it runs both a forward (N→C) and a reverse (C→N) pass, averages the two log-likelihoods at each position into a single bidirectional per-position value, and aggregates these values into a log-likelihood, an average log-likelihood per residue, and a perplexity. It also exposes the forward, reverse, and bidirectional per-position values, and optionally the per-position logits.
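The aggregation described above can be sketched in a few lines. The exact formulas are assumptions based on the standard definitions (total as the sum of per-position values, perplexity as the exponential of the negative mean), and both input lists are assumed to be indexed in N→C order; the function name and the `bidirectional_pp` key are hypothetical.

```python
import math


def aggregate_bidirectional(forward_ll, reverse_ll):
    """Average forward and reverse per-position log-likelihoods, then aggregate.

    Assumptions: both lists are indexed N->C, and the aggregates follow the
    standard definitions (sum, mean, exp of the negative mean).
    """
    if not forward_ll or len(forward_ll) != len(reverse_ll):
        raise ValueError("forward and reverse lists must be non-empty and equal in length")

    # One bidirectional value per residue position.
    bidirectional = [(f + r) / 2.0 for f, r in zip(forward_ll, reverse_ll)]
    log_likelihood = sum(bidirectional)
    avg_log_likelihood = log_likelihood / len(bidirectional)
    return {
        "bidirectional_pp": bidirectional,
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


# Made-up per-position values for a three-residue sequence.
metrics = aggregate_bidirectional([-1.0, -2.0, -3.0], [-3.0, -2.0, -1.0])
print(metrics["log_likelihood"], metrics["avg_log_likelihood"])  # -6.0 -2.0
```

Under these definitions a log-likelihood is never positive and a perplexity is never below 1, which matches the ranges listed in the metrics table.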

API Reference

sequences
List[string]
required
Sequences to score. Can be provided as a single string or a list of strings.
model_checkpoint
enum
default:"progen3-762m"
ProGen3 weights variant. Sizes range from 112M to 3B parameters. Available options: progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b
local_path
string
Override HuggingFace download with a local weights directory.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device to run the model on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
batch_size
integer
default:"1"
Number of sequences to process simultaneously on GPU.
return_logits
boolean
default:"False"
Whether to include forward-pass per-position logits in the output. Reverse-pass info is already exposed via per_position_metrics (forward/reverse/bidirectional log-likelihoods).
scores
List[CausalModelScoringMetrics]
required
List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.
Metrics (one set per scores item)
| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |

Applications

This tool gives a zero-shot measure of how consistent a protein sequence is with ProGen3’s training distribution, usable as a fitness or plausibility signal without additional task-specific training. Because it combines both directions, each residue’s score reflects context on both sides (left context from the forward pass, right context from the reverse pass) rather than left context only. Use it to rank or filter candidate sequences (including the output of progen3-sample), to compare variants of a sequence, or to flag sequences far from the model’s training distribution.
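Ranking and filtering candidates by score might look like the sketch below. It assumes each score has been converted to a plain dictionary with a `perplexity` key; the real CausalModelScoringMetrics objects may expose the value differently. The helper name, the threshold, and the numbers are invented for illustration and are not real model output.

```python
def rank_by_perplexity(sequences, scores, max_perplexity=None):
    """Return (sequence, perplexity) pairs sorted from lowest to highest perplexity.

    Assumption: each score is a plain dict with a 'perplexity' key.
    """
    if len(sequences) != len(scores):
        raise ValueError("need exactly one score per sequence")
    pairs = [(seq, score["perplexity"]) for seq, score in zip(sequences, scores)]
    if max_perplexity is not None:
        # Drop candidates the model finds too implausible.
        pairs = [pair for pair in pairs if pair[1] <= max_perplexity]
    return sorted(pairs, key=lambda pair: pair[1])


# Illustrative values only.
candidates = ["MKTAYIA", "MKTWWWW", "MKTAYLA"]
scores = [{"perplexity": 6.2}, {"perplexity": 18.9}, {"perplexity": 5.7}]
ranked = rank_by_perplexity(candidates, scores, max_perplexity=10.0)
print(ranked)  # [('MKTAYLA', 5.7), ('MKTAYIA', 6.2)]
```

Perplexity is used here because it is length-normalized, so candidates of different lengths can share one ranking within a single checkpoint.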

Usage Tips

  • Scores are bidirectional, not a single-direction log-likelihood. The reported log_likelihood, avg_log_likelihood, and perplexity are derived from the averaged forward and reverse per-position values, so they are not directly comparable to a one-directional model’s scores.
  • Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_checkpoint values are hard to compare directly; a lower perplexity means the sequence is more consistent with that checkpoint’s training distribution.
  • return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by the token vocabulary).
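A quick estimate shows why the logits output grows quickly. The sketch below multiplies sequence length by vocabulary size by bytes per value; the vocabulary size of 32 and the 4-byte float are placeholder assumptions for the arithmetic, not ProGen3's actual tokenizer size or output dtype.

```python
def logits_bytes(seq_lengths, vocab_size, bytes_per_value=4):
    """Estimate the memory needed to hold per-position logits for many sequences."""
    return sum(length * vocab_size * bytes_per_value for length in seq_lengths)


# 1,000 sequences of 500 residues each, with a placeholder vocabulary of 32 tokens.
total = logits_bytes([500] * 1000, vocab_size=32)
print(total / 1e6, "MB")  # 64.0 MB
```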

Toolkit Notes

These apply to every ProGen3 tool in this toolkit (progen3-sample, progen3-score).
  • Requires a GPU; memory scales with checkpoint size. This toolkit exposes the progen3-112m through progen3-3b checkpoints; larger checkpoints are more capable but need substantially more GPU memory. CPU execution is not practical.
  • batch_size trades memory for throughput across both tools. It sets how many same-length prompts (progen3-sample) or sequences (progen3-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
  • model_checkpoint selects the model size. The default is progen3-762m; smaller checkpoints (progen3-112m, progen3-219m, progen3-339m) are faster and lighter, while progen3-1b and progen3-3b are more capable at higher memory cost.
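Because batches contain same-length inputs, the benefit of a larger batch_size depends on how many inputs share a length. The sketch below groups inputs by length and splits each group into chunks of at most batch_size. It illustrates the idea only; how the toolkit forms batches internally is not described here, and `batch_by_length` is a hypothetical helper.

```python
from collections import defaultdict


def batch_by_length(sequences, batch_size=1):
    """Group sequence indices into batches whose members all share one length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Bucket the original indices by sequence length.
    by_length = defaultdict(list)
    for index, seq in enumerate(sequences):
        by_length[len(seq)].append(index)

    # Split each bucket into chunks of at most batch_size.
    batches = []
    for length in sorted(by_length):
        indices = by_length[length]
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


prompts = ["MKT", "MKTA", "GSL", "AAA", "MKTV"]
print(batch_by_length(prompts, batch_size=2))  # [[0, 2], [3], [1, 4]]
```

In this example five prompts need three forward passes at batch_size=2, because the three length-3 prompts cannot share a batch with the length-4 prompts.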
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.