ProGen3 - Proto

License: ProGen3 uses Apache-2.0 for code and CC-BY-NC-SA-4.0 for model weights. Commercial use is restricted, and explicit attribution may be required. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Profluent. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.

Profluent-AI/progen3

Public Release of ProGen3 family of Models

Scaling unlocks broader generation and deeper functional understanding of proteins

Aadyot Bhatnagar, Sarthak Jain, … Ali Madani

bioRxiv (2025)

@article{bhatnagar2025progen3,
  title={Scaling unlocks broader generation and deeper functional understanding of proteins},
  author={Bhatnagar, Aadyot and Jain, Sarthak and Beazer, Joel and Curran, Samuel C and Hoffnagle, Alexander M and Ching, Kyle S and Martyn, Michael and Nayfach, Stephen and Ruffolo, Jeffrey A and Madani, Ali},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.04.15.649055},
  publisher={Cold Spring Harbor Laboratory}
}

proto-bio/proto-tools/proto_tools/tools/causal_models/progen3

Function	Description
`run_progen3_sample()`	Sample protein sequences using the ProGen3 language model (GPU)
`run_progen3_score()`	Score protein sequences using the ProGen3 language model (GPU)

Background

ProGen3 (Bhatnagar et al., 2025) is a family of generative protein language models from Profluent. ProGen3 models employ a sparse mixture-of-experts (MoE) architecture, which routes model activations in the transformer feed-forward layers to smaller specialized MLPs to make each forward pass more computationally tractable. The published family spans 112 million to 46 billion parameters; this toolkit exposes the progen3-112m through progen3-3b checkpoints. Pre-training used roughly 1.5 trillion amino-acid tokens sampled from the Profluent Protein Atlas, a curated collection of full-length natural proteins. Unlike a strictly left-to-right model, ProGen3 is trained autoregressively in both directions: forward predicts each residue from the N-terminus toward the C-terminus, and reverse predicts from the C-terminus toward the N-terminus. Generation runs in a chosen direction, and scoring combines both directions into a single per-residue likelihood. Two capabilities follow from this objective. Sampling from the predicted next-residue distributions produces new candidate protein sequences, and the likelihood the model assigns to an existing sequence provides a zero-shot proxy-fitness score with no additional task-specific training.
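The two training directions described above correspond to the standard autoregressive factorization applied from either end of the sequence. As a sketch in notation that is not taken from the source, let x be a sequence of L residues written in N→C order:

```latex
\log p_{\text{fwd}}(x) = \sum_{i=1}^{L} \log p(x_i \mid x_1, \dots, x_{i-1}),
\qquad
\log p_{\text{rev}}(x) = \sum_{i=1}^{L} \log p(x_i \mid x_{i+1}, \dots, x_L)
```

Each term in the forward sum conditions only on residues to the left of position i, and each term in the reverse sum only on residues to its right.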

Learning Resources

ProGen3 showcase (Profluent) - an accessible overview of ProGen3, the Profluent Protein Atlas training data, and downstream applications such as antibody design and compact gene editors.

Tools

ProGen3 Sampling (`progen3-sample`)

Generates protein sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt one amino acid at a time in the chosen direction, drawing each residue from the model’s predicted distribution under the configured temperature and top_p settings. Each prompt receives at least min_new_tokens and at most max_new_tokens newly generated residues.
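The sampling loop described above can be illustrated with a minimal, standard-library sketch. It is not the toolkit's implementation: `toy_logits` is a made-up stand-in for a model forward pass, the `<eos>` stop token is an assumption about how early stopping works, and only the parameter names (temperature, top_p, min_new_tokens, max_new_tokens, seed) come from the configuration documented here.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EOS = "<eos>"  # hypothetical end-of-sequence token (an assumption of this sketch)


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature, shifted by the max for stability."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


def toy_logits(context):
    """Stand-in for a model call: fixed logits that ignore the context."""
    logits = {aa: 0.1 * i for i, aa in enumerate(AMINO_ACIDS)}
    logits[EOS] = 1.8
    return logits


def sample(prompt, max_new_tokens=256, min_new_tokens=1,
           temperature=0.2, top_p=0.95, seed=None):
    """Draw new residues one at a time until EOS or max_new_tokens."""
    rng = random.Random(seed)
    new = []
    while len(new) < max_new_tokens:
        logits = toy_logits(prompt + "".join(new))
        if len(new) < min_new_tokens:
            logits.pop(EOS)  # stopping is not allowed yet
        probs = top_p_filter(apply_temperature(logits, temperature), top_p)
        tokens = list(probs)
        token = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        if token == EOS:
            break
        new.append(token)
    return "".join(new)


if __name__ == "__main__":
    print(sample("MKT", max_new_tokens=20, min_new_tokens=5, seed=0))
```

Lowering temperature sharpens the distribution before the nucleus cutoff is applied, so fewer tokens survive the top_p filter and the output becomes more deterministic.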

API Reference

Input: CausalModelSampleInput

prompts

List[string]

required

Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.

Config: ProGen3SampleConfig

model_checkpoint

enum

default:"progen3-762m"

ProGen3 weights variant. Sizes range from 112M (fastest) to 3B (highest quality). Available options: progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b

local_path

string

Override HuggingFace download with a local weights directory.

direction

enum

default:"forward"

"forward" generates N→C, "reverse" generates C→N. Available options: forward, reverse

max_new_tokens

integer

default:"256"

Maximum new tokens to generate per prompt (excludes prompt).

min_new_tokens

integer

default:"1"

Minimum new tokens to generate per prompt before stopping is allowed.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

prepend_prompt

boolean

default:"True"

If True, returned sequences include the prompt and newly generated residues; if False, only the newly generated residues.

temperature

number

default:"0.2"

Softmax temperature; lower values are more deterministic, higher values increase diversity.

top_p

number

default:"0.95"

Nucleus sampling threshold over per-position token probabilities.

batch_size

integer

default:"1"

Maximum number of same-length prompts to process simultaneously on GPU.

Output: CausalModelSampleOutput

sequences

List[string]

required

Generated sequences.

Applications

This tool performs de novo protein design, generating novel sequences that resemble natural proteins, optionally conditioned on a prompt. Because generation can run in reverse (C-terminus toward N-terminus), a C-terminal fragment can be used as the prompt and the rest of the sequence grown toward the N-terminus, which a strictly left-to-right model cannot do.

Usage Tips

direction chooses which terminus is generated. "forward" (the default) continues a prompt from the N-terminus toward the C-terminus; "reverse" treats the prompt as a C-terminal fragment and generates toward the N-terminus. In reverse mode, new residues are added to the left of the prompt. Always provide starting sequences in the usual N→C order, regardless of direction.
Sampling defaults are conservative. temperature defaults to 0.2 and top_p to 0.95, which keep generations close to natural-looking sequences; raise temperature for more diverse but riskier designs. This tool exposes only nucleus (top_p) sampling for ProGen3; there is no top-k cutoff.
max_new_tokens and min_new_tokens bound the generated length. They count only newly generated residues (default 256 and 1), separate from the prompt length.
Output includes the prompt by default. prepend_prompt=True (the toolkit default) returns the prompt joined to its continuation; set it to False to receive only the newly generated residues.
Generated sequences are candidates. Validate them with downstream tools (for example structure prediction, function annotation, or homology search) before drawing biological conclusions.
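How direction and prepend_prompt shape the returned string can be sketched with a small helper. The function `assemble_output` is hypothetical and not part of the toolkit; it also assumes that newly generated residues are reported in N→C order even in reverse mode, which the text above implies but does not state outright.

```python
def assemble_output(prompt, new_residues, direction="forward", prepend_prompt=True):
    """Combine a prompt with newly generated residues, always in N->C order.

    Hypothetical helper for illustration only. `new_residues` is assumed to be
    written N->C already, regardless of the generation direction.
    """
    if direction not in ("forward", "reverse"):
        raise ValueError(f"unknown direction: {direction!r}")
    if not prepend_prompt:
        return new_residues  # only the newly generated residues
    if direction == "forward":
        return prompt + new_residues  # sequence grows toward the C-terminus
    return new_residues + prompt  # sequence grows toward the N-terminus


if __name__ == "__main__":
    print(assemble_output("MKT", "AAA"))                        # MKTAAA
    print(assemble_output("MKT", "AAA", direction="reverse"))   # AAAMKT
    print(assemble_output("MKT", "AAA", prepend_prompt=False))  # AAA
```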

ProGen3 Scoring (`progen3-score`)

Scores existing protein sequences under ProGen3 using bidirectional likelihood. For each sequence it runs both a forward (N→C) and a reverse (C→N) pass, averages the per-position log-likelihoods into a single bidirectional value, and aggregates these into a log-likelihood, an average log-likelihood per residue, and a perplexity. It also exposes the forward, reverse, and bidirectional per-position values, and optionally the per-position logits.
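The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's code: it assumes natural-log per-position values, that log_likelihood is the sum of the averaged per-position values, and that perplexity is exp(-avg_log_likelihood). The function name `bidirectional_metrics` is hypothetical.

```python
import math


def bidirectional_metrics(forward_ll, reverse_ll):
    """Average forward and reverse per-position log-likelihoods, then aggregate.

    Assumes both lists are aligned to the same residues in N->C order and hold
    natural-log probabilities (values <= 0).
    """
    if not forward_ll or len(forward_ll) != len(reverse_ll):
        raise ValueError("forward and reverse lists must be non-empty and equal length")
    per_position = [(f + r) / 2.0 for f, r in zip(forward_ll, reverse_ll)]
    log_likelihood = sum(per_position)
    avg_log_likelihood = log_likelihood / len(per_position)
    return {
        "bidirectional_per_position": per_position,
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


if __name__ == "__main__":
    # Made-up per-position values for a two-residue sequence.
    print(bidirectional_metrics([-1.0, -2.0], [-3.0, -2.0]))
```

Under these assumptions the documented ranges follow directly: log-probabilities are at most 0, so both likelihood metrics are at most 0 and perplexity is at least 1.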

API Reference

Input: CausalModelScoringInput

sequences

List[string]

required

Sequences to score. Can be provided as a single string or a list of strings.

Config: ProGen3ScoringConfig

model_checkpoint

enum

default:"progen3-762m"

ProGen3 weights variant. Sizes range from 112M to 3B parameters. Available options: progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b

local_path

string

Override HuggingFace download with a local weights directory.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

batch_size

integer

default:"1"

Number of sequences to process simultaneously on GPU.

return_logits

boolean

default:"False"

Whether to include forward-pass per-position logits in the output. Reverse-pass info is already exposed via per_position_metrics (forward/reverse/bidirectional log-likelihoods).

Output: CausalModelScoringOutput

scores

List[CausalModelScoringMetrics]

required

List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.

CausalModelScoringMetrics

logits

array

Per-position logits array (seq_len, vocab_size). None unless return_logits=True.

vocab

array

Token ordering for logits.

primary_metric

string

Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics (one set per scores item)

Metric	Type	Range	Availability
`log_likelihood`	float	≤ 0.0	always
`avg_log_likelihood`	float	≤ 0.0	always
`perplexity`	float	≥ 1.0	always

Applications

This tool gives a zero-shot measure of how consistent a protein sequence is with ProGen3’s training distribution, usable as a fitness or plausibility signal without additional task-specific training. Because it uses both directions, every residue is scored with full surrounding context rather than left context only. Use it to rank or filter candidate sequences (including the output of progen3-sample), to compare variants of a sequence, or to flag sequences far from the model’s training distribution.

Usage Tips

Scores are bidirectional, not a single-direction log-likelihood. The reported log_likelihood, avg_log_likelihood, and perplexity are derived from the averaged forward and reverse per-position values, so they are not directly comparable to a one-directional model’s scores.
Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_checkpoint values are hard to compare directly; a lower perplexity means the sequence is more consistent with that checkpoint’s training distribution.
return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by the token vocabulary).
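The length-normalization tip can be made concrete with a short ranking sketch. The dictionaries below are plain stand-ins for scoring results, with keys that mirror the documented metric names; the numbers are invented for illustration and are not real ProGen3 outputs, and `rank_candidates` is a hypothetical helper.

```python
def rank_candidates(scored, key="avg_log_likelihood"):
    """Sort scored candidates from most to least likely under the chosen metric."""
    return sorted(scored, key=lambda item: item[key], reverse=True)


if __name__ == "__main__":
    # Invented values: a short sequence with poor per-residue likelihood and a
    # long sequence with good per-residue likelihood.
    scored = [
        {"name": "short", "length": 10, "log_likelihood": -30.0, "avg_log_likelihood": -3.0},
        {"name": "long", "length": 100, "log_likelihood": -100.0, "avg_log_likelihood": -1.0},
    ]
    by_total = [s["name"] for s in rank_candidates(scored, key="log_likelihood")]
    by_avg = [s["name"] for s in rank_candidates(scored)]
    print(by_total)  # total log-likelihood favors the short sequence
    print(by_avg)    # the per-residue average favors the long sequence
```

Ranking by total log_likelihood rewards short sequences simply for having fewer terms in the sum, which is why the per-residue average (or perplexity) is the safer choice across lengths.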

Toolkit Notes

These apply to every ProGen3 tool in this toolkit (progen3-sample, progen3-score).

Requires a GPU; memory scales with checkpoint size. This toolkit exposes the progen3-112m through progen3-3b checkpoints; larger checkpoints are more capable but need substantially more GPU memory. CPU execution is not practical.
batch_size trades memory for throughput across both tools. It sets how many same-length prompts (progen3-sample) or sequences (progen3-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
model_checkpoint selects the model size. The default is progen3-762m; smaller checkpoints (progen3-112m, progen3-219m, progen3-339m) are faster and lighter, while progen3-1b and progen3-3b are more capable at higher memory cost.
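Since batch_size applies to same-length inputs, one way to picture the batching is to group inputs by length and then split each group into chunks. The sketch below is an assumption about how such grouping could work, not a description of the toolkit's internals; `same_length_batches` is a hypothetical name.

```python
from collections import defaultdict


def same_length_batches(sequences, batch_size=1):
    """Group input indices by sequence length, then chunk each group.

    Returns lists of indices into `sequences`; every batch holds at most
    `batch_size` inputs, all of the same length.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    by_length = defaultdict(list)
    for index, seq in enumerate(sequences):
        by_length[len(seq)].append(index)
    batches = []
    for length in sorted(by_length):
        indices = by_length[length]
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


if __name__ == "__main__":
    prompts = ["MK", "AAAA", "GG", "CCCC", "TT"]
    print(same_length_batches(prompts, batch_size=2))  # [[0, 2], [4], [1, 3]]
```

Under this assumption, a larger batch_size only helps when several inputs share a length; a set of inputs with all-distinct lengths would still run one at a time.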

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.