License: PyRosetta is distributed under a custom license (the PyRosetta Software License), which restricts commercial use and may require explicit attribution. Refer to the license for full terms.

Proto is not affiliated with RosettaCommons. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


RosettaCommons/rosetta
The Rosetta Bio-macromolecule modeling package. Available through license with the University of Washington.
pyrosetta.org
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta
Sidhartha Chaudhury, Sergey Lyskov and Jeffrey J Gray
Bioinformatics (2010)
@article{chaudhury2010pyrosetta,
  title={PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta},
  author={Chaudhury, Sidhartha and Lyskov, Sergey and Gray, Jeffrey J},
  journal={Bioinformatics},
  volume={26},
  number={5},
  pages={689--691},
  year={2010},
  publisher={Oxford University Press},
  doi={10.1093/bioinformatics/btq007}
}
proto-bio/proto-tools/proto_tools/tools/structure_scoring/pyrosetta
Function                            Description
run_pyrosetta_energy()              Compute Rosetta energy scores for protein structures (with optional FastRelax preprocess via conf…
run_pyrosetta_interface_analyzer()  Compute interface-quality metrics for a two-chain complex via Rosetta’s InterfaceAnalyzerMover + …
run_pyrosetta_relax()               Run PyRosetta FastRelax on a structure and return the relaxed Structure plus its total score
run_pyrosetta_sap()                 Compute Spatial Aggregation Propensity (SAP) scores for protein structures using PyRosetta
run_pyrosetta_sasa()                Compute Solvent Accessible Surface Area (SASA) for protein structures using PyRosetta

Background

The Rosetta molecular modelling suite (Alford et al., 2017) provides an all-atom energy function that combines van der Waals interactions, hydrogen bonding, electrostatics, and an implicit solvation model into a single score reported in Rosetta Energy Units (REU). The current community-standard energy function, REF2015, is parametrised against small-molecule and X-ray crystal structure data and is the default score function used by every tool in this toolkit. PyRosetta (Chaudhury, Lyskov, and Gray, 2010) exposes the Rosetta sampling and scoring functions through a Python interface, which this toolkit invokes to compute per-residue and overall energies together with a breakdown by score term.

Spatial Aggregation Propensity (SAP) (Chennamsetty, Voynov, Kayser, Helk, and Trout, 2009) quantifies how much hydrophobic surface area is exposed on a protein. SAP combines per-residue hydrophobicity with local solvent exposure within a sphere around each surface atom and aggregates the contributions across the protein, with higher values corresponding to greater aggregation risk. The published method was originally developed for therapeutic antibody engineering and has become a standard developability filter in protein design.

Solvent Accessible Surface Area (SASA) measures the surface area of a protein that is accessible to a spherical solvent probe (1.4 Å for water by default). Per-residue SASA values distinguish buried residues that contribute to the hydrophobic core from solvent-exposed residues that interact with the surroundings.

Rosetta’s FastRelax protocol performs many rounds of side-chain repacking and energy minimisation while gradually ramping the repulsive weight in the score function, which finds a low-energy conformation near the input structure and resolves the steric clashes that would otherwise dominate the energy.

The InterfaceAnalyzerMover extracts a set of structural descriptors that characterise the binding interface of a two-chain complex, including binding-energy difference (dG_separated), interface buried SASA (dSASA_int), hydrogen bond count (hbonds_int), packing statistic (packstat), and shape complementarity (sc_value), and is widely used as the basis for filter cascades in binder-design pipelines.

Learning Resources

  • PyRosetta documentation (Gray Lab, Johns Hopkins University). Tutorials, API reference, and installation guidance for the underlying Python interface.
  • RosettaCommons documentation (RosettaCommons). Reference manual for the Rosetta scoring functions, movers, and protocols invoked by this toolkit.
  • FastRelax mover reference (RosettaCommons). Documentation of the FastRelax protocol exposed as pyrosetta-relax.

Tools

PyRosetta Energy Score (pyrosetta-energy)

Scores one or more protein structures with a Rosetta score function and returns the total energy, a per-term breakdown (fa_atr, fa_rep, fa_sol, hbond_*, etc.), and a per-residue energy contribution. The full pose is always scored regardless of any chain selection. By default the input structure is scored as given. Set pre_relax_structures=True to run FastRelax first.

API Reference

Source
inputs
List[ScoringStructureInput]
required
Protein structures to score, each with optional chain selection. Accepts bare Structure objects, PDB file paths, or PDB content strings for convenience.
Source
scorefxn
string
default:"ref2015"
Rosetta score function name. ref2015 is the current community standard.
pre_relax_structures
boolean
default:"False"
If True, run pyrosetta-relax on each input structure before scoring (the settings come from relax_config). Default False: energy is reported on the input structure as given. Set to True for raw predicted structures with steric clashes that would otherwise inflate fa_rep.
relax_config
PyRosettaRelaxConfig
Settings used when pre_relax_structures=True. Ignored otherwise.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cpu"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
results
List[PyRosettaEnergyMetrics]
Energy scores, one per input structure. Each entry carries total_energy and relaxed as specified metrics, plus energy_terms and per_residue as declared non-metric fields.
Metrics (one set per results item)
Metric        Type   Range      Availability
total_energy  float  unbounded  always

Applications

This tool is appropriate for relative energy comparison across variants of the same protein. Representative applications include ranking sequence designs by predicted stability after a relax pass, identifying problematic residues through their per-residue energy contributions, and quantifying the energy cost of mutations or conformational changes.

Usage Tips

  • Compare REU values only across variants of the same protein with the same score function. Rosetta energies are not absolute thermodynamic quantities and do not transfer across proteins of different sizes or across different score function settings. Switching scorefxn from ref2015 to beta_nov16 produces a different scale and the values are not comparable.
  • Run with pre_relax_structures=True when scoring raw predicted complexes. Predicted structures from AlphaFold, Chai, Boltz, and similar tools commonly carry steric clashes that produce extremely high fa_rep values and dominate the total energy. Relaxing first resolves these clashes so the other energy terms become interpretable.
  • Chain selection filters the per-residue breakdown only. When chains_to_score is set on a ScoringStructureInput, total_energy and energy_terms are still computed on the full pose. Each per-residue energy reflects that residue’s contribution within the full complex, including pair interactions with the unselected chains. Score a chain in isolation by extracting it into its own Structure first.
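
The ranking and per-residue triage described above can be sketched in plain Python. The records below are hypothetical stand-ins for the documented result fields (total_energy, per_residue): the dictionary layout, the key names "scorefxn", "residue_index", and "energy", the variant names, and every number are assumptions made up for illustration, not toolkit output.

```python
from typing import Dict, List

# Hypothetical records that mimic the documented result fields. The key names
# ("scorefxn", "per_residue", "residue_index", "energy") and every number are
# made up for illustration; they are not real toolkit output.
variants: Dict[str, dict] = {
    "parent": {
        "scorefxn": "ref2015",
        "total_energy": -250.0,
        "per_residue": [
            {"chain": "A", "residue_index": 1, "energy": -2.5},
            {"chain": "A", "residue_index": 2, "energy": 4.0},
            {"chain": "A", "residue_index": 3, "energy": -1.0},
        ],
    },
    "design_1": {"scorefxn": "ref2015", "total_energy": -262.5, "per_residue": []},
    "design_2": {"scorefxn": "ref2015", "total_energy": -241.0, "per_residue": []},
}


def rank_by_total_energy(records: Dict[str, dict]) -> List[str]:
    """Sort variant names from lowest (most favourable) to highest energy."""
    scorefxns = {rec["scorefxn"] for rec in records.values()}
    if len(scorefxns) > 1:
        # REU values from different score functions are on different scales.
        raise ValueError(f"mixed score functions: {sorted(scorefxns)}")
    return sorted(records, key=lambda name: records[name]["total_energy"])


def worst_residues(per_residue: List[dict], top_k: int = 2) -> List[dict]:
    """Return the top_k residues with the highest (least favourable) energy."""
    return sorted(per_residue, key=lambda r: r["energy"], reverse=True)[:top_k]


ranking = rank_by_total_energy(variants)
hotspots = worst_residues(variants["parent"]["per_residue"])
print(ranking)
print([r["residue_index"] for r in hotspots])
```

The guard on mixed score functions encodes the first tip: a ranking is only meaningful when every record was scored with the same scorefxn on variants of the same protein.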

PyRosetta Interface Analyzer (pyrosetta-interface-analyzer)

Runs Rosetta’s InterfaceAnalyzerMover on a complex and returns seven always-on interface descriptors together with an optional eighth (delta_unsat_hbonds, available when DAlphaBall is installed). The interface is defined by the target_chains and binder_chain fields on each InterfaceStructureInput (multiple target chains score the binder against all of them, binder-vs-rest) and is validated at input construction.

API Reference

Source
inputs
List[InterfaceStructureInput]
required
Complexes to analyze, each paired with the target_chains and binder_chain labels that define its interface. A bare Structure / path / content string / dict is wrapped into a single-element list with default chains ["A"] / "B".
Source
scorefxn
string
default:"ref2015"
Rosetta score function name. ref2015 is the current community standard.
pre_relax_structures
boolean
default:"False"
If True, run pyrosetta-relax on each input structure before analyzing (the settings come from relax_config). Default False: the interface is analyzed on the input structure as given. Set to True for raw predicted complexes with steric clashes that would otherwise distort interface_dG and related energy-based metrics.
relax_config
PyRosettaRelaxConfig
Settings used when pre_relax_structures=True. Ignored otherwise.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cpu"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
results
List[PyRosettaInterfaceAnalyzerMetrics]
Interface-analysis metrics, one per input structure.
Metrics (one set per results item)
Metric                    Type   Range         Availability
interface_sc              float  0.0 to 1.0    always
interface_hbonds          int    ≥ 0.0         always
interface_dG              float  unbounded     always
interface_dSASA           float  ≥ 0.0         always
interface_packstat        float  0.0 to 1.0    always
interface_hydrophobicity  float  0.0 to 100.0  always
surface_hydrophobicity    float  0.0 to 1.0    always
delta_unsat_hbonds        int    ≥ 0.0         optional

Applications

This tool is appropriate for filtering and ranking designed protein binders against a target. Representative applications include gating candidate binders on shape complementarity and hydrogen bond count, ranking by predicted binding-energy difference, and identifying poses with excessive interface-buried hydrophobic surface area.
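
A filter cascade of the kind described above might look like the following sketch. It uses only the documented metric names; the cut-off values, the binder names, and all metric values are hypothetical and chosen only to make the example concrete. Because delta_unsat_hbonds is optional, the sketch assumes it can be None and skips that gate in that case.

```python
from typing import Dict, List, Optional, Union

Metrics = Dict[str, Optional[Union[int, float]]]

# Hypothetical cut-offs, chosen only to make the example concrete.
MIN_SC = 0.6
MIN_HBONDS = 3
MAX_UNSAT = 4


def passes_gates(m: Metrics) -> bool:
    """Apply shape-complementarity, hydrogen-bond, and unsatisfied-hbond gates."""
    if m["interface_sc"] < MIN_SC:
        return False
    if m["interface_hbonds"] < MIN_HBONDS:
        return False
    unsat = m.get("delta_unsat_hbonds")
    # None means the optional metric was not computed, so the gate is skipped.
    if unsat is not None and unsat > MAX_UNSAT:
        return False
    return True


def rank_binders(candidates: Dict[str, Metrics]) -> List[str]:
    """Keep candidates that pass the gates, most negative interface_dG first."""
    kept = [name for name, m in candidates.items() if passes_gates(m)]
    return sorted(kept, key=lambda name: candidates[name]["interface_dG"])


# Made-up metric values for three hypothetical binders.
candidates: Dict[str, Metrics] = {
    "binder_a": {"interface_sc": 0.70, "interface_hbonds": 5,
                 "interface_dG": -30.0, "delta_unsat_hbonds": None},
    "binder_b": {"interface_sc": 0.65, "interface_hbonds": 4,
                 "interface_dG": -42.0, "delta_unsat_hbonds": 2},
    "binder_c": {"interface_sc": 0.45, "interface_hbonds": 6,
                 "interface_dG": -50.0, "delta_unsat_hbonds": 1},
}

print(rank_binders(candidates))
```

Gating before ranking keeps a pose with a very negative interface_dG but poor shape complementarity (binder_c here) from topping the list.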

Usage Tips

  • The seven always-on metrics span well-defined ranges. interface_sc is in 0 to 1 (higher is better fit), interface_packstat is in 0 to 1 (higher is better packing), interface_hydrophobicity is in 0 to 100 (percent apolar plus aromatic interface residues), surface_hydrophobicity is in 0 to 1 (apolar plus aromatic fraction of the binder surface), interface_hbonds is an integer count, interface_dSASA is in Ų, and interface_dG is in REU (more negative indicates more favourable binding).
  • delta_unsat_hbonds requires DAlphaBall and is reported as None when that dependency is unavailable. The Rosetta BuriedUnsatHbonds filter uses DAlphaBall for accurate buried-surface SASA. The standalone environment installs DAlphaBall when possible. When the metric is None, the seven always-on metrics are still produced normally.
  • Relax raw predicted complexes before reading the energy-derived metrics. interface_dG and interface_packstat are sensitive to steric clashes in unrelaxed structures. Set pre_relax_structures=True on the configuration to run FastRelax first, or call pyrosetta-relax explicitly and pass the relaxed structure back in.
  • Chain labels follow the input format. PDB stores chain IDs as a single character, while mmCIF accepts multi-character labels. The tool transparently shortens multi-character labels to single characters when dispatching to PyRosetta and restores the originals in the output.
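
The shorten-and-restore behaviour for chain labels can be pictured as a pair of lookup tables. The sketch below is one possible scheme under stated assumptions, not the toolkit's actual implementation: the function name build_chain_maps, the candidate alphabet, and the assignment order are all hypothetical.

```python
import string
from typing import Dict, List, Tuple

# Candidate single-character chain IDs (an assumed alphabet for this sketch).
_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def build_chain_maps(labels: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map each original label to a unique single character, plus the inverse map."""
    unique = list(dict.fromkeys(labels))  # drop repeats, keep first-seen order
    if len(unique) > len(_ALPHABET):
        raise ValueError("more chains than available single-character IDs")

    forward: Dict[str, str] = {}
    used = set()
    # Labels that are already valid single characters keep their ID.
    for label in unique:
        if len(label) == 1 and label in _ALPHABET:
            forward[label] = label
            used.add(label)

    # Remaining labels take the next free character in alphabet order.
    free = (c for c in _ALPHABET if c not in used)
    for label in unique:
        if label not in forward:
            forward[label] = next(free)

    backward = {short: original for original, short in forward.items()}
    return forward, backward


forward, backward = build_chain_maps(["A", "HEAVY", "LIGHT", "A"])
print(forward)
print([backward[forward[label]] for label in ["A", "HEAVY", "LIGHT"]])
```

Keeping the inverse map alongside the forward map is what allows the original labels to be restored in the output.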

PyRosetta FastRelax (pyrosetta-relax)

Runs Rosetta’s FastRelax protocol on one or more input structures and returns the relaxed coordinates as a Structure together with the total Rosetta energy. The returned structure preserves the original chain labels and source format so that it composes directly into any of the other tools in this toolkit or into geometric Structure methods.

API Reference

Source
inputs
List[ScoringStructureInput]
required
Protein structures to relax. Accepts bare Structure objects, PDB file paths, or PDB content strings for convenience.
Source
scorefxn
string
default:"ref2015"
Rosetta score function name. ref2015 is the current community standard.
relax_cycles
integer
default:"1"
Number of FastRelax repeats. Germinal uses 1 for speed in cofolding filter pipelines; raise for better convergence at the cost of runtime.
constrain_to_start
boolean
default:"True"
When True, add a coordinate-constraint term to the relax score function and call constrain_relax_to_start_coords(True) on the FastRelax mover so atoms stay near their input positions. Recommended for filter use cases where large geometric deviations would defeat the purpose.
max_iter
integer
Maximum minimizer iterations per relax cycle. None uses PyRosetta’s default (2500). Upstream BindCraft uses 200 for faster turnaround in binder-design pipelines.
disable_jumps
boolean
default:"False"
Lock inter-chain rigid-body DOFs so chains cannot translate or rotate relative to each other during relax.
min_type
string
Optional minimizer type forwarded to FastRelax.min_type. BindCraft uses "lbfgs_armijo_nonmonotone".
align_to_start
boolean
default:"False"
If True, align the relaxed pose back to the input pose after FastRelax. BindCraft does this before saving its relaxed PDBs so coordinates remain in the original frame.
copy_b_factors_from_start
boolean
default:"False"
If True, copy the input pose’s per-residue B-factors onto the relaxed pose. BindCraft uses this to preserve AF2 pLDDT values after relaxation.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cpu"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
results
List[PyRosettaRelaxMetrics]
One entry per input structure, in input order. Each carries total_score and relaxed as specified metrics, plus relax (a RelaxResult) carrying the relaxed Structure.
Metrics (one set per results item)
Metric       Type   Range      Availability
total_score  float  unbounded  always

Applications

This tool is appropriate as a preprocessing step before downstream energy scoring, interface analysis, or geometric filtering of raw predicted structures. Representative applications include resolving steric clashes in cofolded complexes before binder-design filter cascades, generating a relaxed reference pose before screening sequence variants, and producing a stable starting point for further structural analyses.

Usage Tips

  • relax_cycles defaults to 1 and accepts integer values from 1 to 15. A single FastRelax cycle matches the default used by the Germinal binder-design pipeline and is appropriate for most filter-cascade applications. Increase to 5 to 15 for higher-quality convergence at proportional runtime cost.
  • constrain_to_start=True (the default) prevents FastRelax from drastically altering the structure. This adds a coordinate-constraint term to the relax score function so atoms stay near their input positions. Set to False for unconstrained minimisation when the goal is to find the nearest energy minimum.
  • Additional FastRelax controls are available on PyRosettaRelaxConfig. Pass disable_jumps=True to lock the inter-chain rigid-body degrees of freedom during relaxation, align_to_start=True to superpose the relaxed pose back onto the starting pose after relaxation, or copy_b_factors_from_start=True to copy the per-residue B-factor values from the input.
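
To make the option combinations above concrete, the sketch below defines a stand-in dataclass that mirrors the documented field names and defaults. RelaxSettings is hypothetical and is not the toolkit's PyRosettaRelaxConfig; the validation rules simply restate the documented 1 to 15 range for relax_cycles, and the preset names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelaxSettings:
    """Hypothetical stand-in mirroring the documented relax fields and defaults."""

    scorefxn: str = "ref2015"
    relax_cycles: int = 1
    constrain_to_start: bool = True
    max_iter: Optional[int] = None  # None defers to the underlying default
    disable_jumps: bool = False
    min_type: Optional[str] = None
    align_to_start: bool = False
    copy_b_factors_from_start: bool = False

    def __post_init__(self) -> None:
        if not 1 <= self.relax_cycles <= 15:
            raise ValueError("relax_cycles must be between 1 and 15")
        if self.max_iter is not None and self.max_iter <= 0:
            raise ValueError("max_iter must be positive when set")


# Documented defaults: one constrained cycle, suited to filter cascades.
fast_filter = RelaxSettings()

# Values the parameter descriptions attribute to BindCraft.
bindcraft_like = RelaxSettings(
    max_iter=200,
    min_type="lbfgs_armijo_nonmonotone",
    align_to_start=True,
    copy_b_factors_from_start=True,
)

# More cycles and no coordinate constraints, at higher runtime cost.
thorough = RelaxSettings(relax_cycles=5, constrain_to_start=False)

print(fast_filter)
print(bindcraft_like)
print(thorough)
```

Validating at construction time means an out-of-range relax_cycles fails before any relaxation starts.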

PyRosetta SAP Score (pyrosetta-sap)

Scores one or more protein structures with the Spatial Aggregation Propensity protocol from Rosetta’s core.pack.guidance_scoreterms.sap module and returns the overall SAP score together with a per-residue SAP contribution breakdown. Higher values indicate greater predicted aggregation risk.

API Reference

Source
inputs
List[ScoringStructureInput]
required
Protein structures to score, each with optional chain selection. Accepts bare Structure objects, PDB file paths, or PDB content strings for convenience.
Source
pre_relax_structures
boolean
default:"False"
If True, run pyrosetta-relax on each input structure before scoring. Default False.
relax_config
PyRosettaRelaxConfig
Settings used when pre_relax_structures=True. Ignored otherwise.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cpu"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
results
List[PyRosettaSAPMetrics]
SAP scores with per-residue breakdown, one per input structure.
Metrics (one set per results item)
Metric     Type   Range  Availability
sap_score  float  ≥ 0.0  always

Applications

This tool is appropriate for developability assessment during therapeutic protein and antibody engineering, where surface aggregation propensity is a critical liability. Representative applications include ranking antibody variants by predicted aggregation risk, identifying surface mutations that reduce SAP without affecting binding, and screening computationally designed proteins for developability before experimental characterisation.

Usage Tips

  • SAP is size-dependent and only meaningfully compared across variants of the same protein. Larger proteins naturally have higher absolute SAP values because more total surface area contributes. Comparisons across different proteins or different chain compositions are not informative.
  • chains_to_score controls which residues contribute to the score. Setting chains_to_score=["A"] on a ScoringStructureInput restricts the SAP sum to residues of chain A. The full structure is still loaded so the surrounding context informs the burial calculation, but only the selected chain’s atoms contribute to the score.
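
A per-residue SAP breakdown can be post-processed to compare a variant against its parent, as in the sketch below. The data layout (a mapping from (chain, residue index) to a SAP contribution) and all values are hypothetical. The sketch also assumes that summing the per-residue contributions of one chain from a full-structure run approximates what chains_to_score reports, which the source does not confirm.

```python
from typing import Dict, Iterable, List, Optional, Tuple

ResidueKey = Tuple[str, int]  # (chain label, 1-indexed residue index)


def sap_total(per_residue: Dict[ResidueKey, float],
              chains: Optional[Iterable[str]] = None) -> float:
    """Sum per-residue SAP, optionally restricted to the given chains."""
    selected = None if chains is None else set(chains)
    return sum(value for (chain, _), value in per_residue.items()
               if selected is None or chain in selected)


def sap_changes(parent: Dict[ResidueKey, float],
                variant: Dict[ResidueKey, float]) -> List[Tuple[ResidueKey, float]]:
    """Per-residue SAP change (variant minus parent), largest reduction first."""
    shared = parent.keys() & variant.keys()
    deltas = {key: variant[key] - parent[key] for key in shared}
    return sorted(deltas.items(), key=lambda item: item[1])


# Made-up contributions for a hypothetical two-chain parent and one variant.
parent = {("A", 1): 0.5, ("A", 2): 2.0, ("B", 1): 1.5}
variant = {("A", 1): 0.5, ("A", 2): 0.75, ("B", 1): 1.5}

print(sap_total(parent), sap_total(parent, chains=["A"]))
print(sap_changes(parent, variant)[0])
```

Comparing per-residue differences within the same protein sidesteps the size dependence noted above, since both breakdowns cover the same residues.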

PyRosetta SASA (pyrosetta-sasa)

Computes total and per-residue Solvent Accessible Surface Area using Rosetta’s SasaCalc module with a configurable probe radius. Returns the total SASA in Ų together with a per-residue breakdown of chain, 1-indexed residue index, three-letter residue name, and SASA value.

API Reference

Source
inputs
List[ScoringStructureInput]
required
Protein structures to analyze, each with optional chain selection. Accepts bare Structure objects, PDB file paths, or PDB content strings for convenience.
Source
probe_radius
number
default:"1.4"
Radius of the solvent probe sphere in Angstroms. The standard water probe is 1.4 Å.
pre_relax_structures
boolean
default:"False"
If True, run pyrosetta-relax on each input structure before scoring. Default False.
relax_config
PyRosettaRelaxConfig
Settings used when pre_relax_structures=True. Ignored otherwise.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cpu"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
results
List[PyRosettaSASAMetrics]
SASA results, one per input structure.
Metrics (one set per results item)
Metric      Type   Range  Availability
total_sasa  float  ≥ 0.0  always

Applications

This tool is appropriate for identifying buried and exposed residues, characterising hydrophobic surface patches, and computing a surface-area baseline for downstream developability or interaction analyses. Representative applications include flagging exposed hydrophobic residues as redesign candidates, summarising the surface-residue composition of a designed protein, and computing the buried surface area difference between bound and unbound states.

Usage Tips

  • probe_radius defaults to 1.4 Å. This is the conventional water probe radius. Larger probe values are sometimes used to approximate the accessibility seen by larger solvent molecules or interaction partners.
  • Per-residue SASA values near 0 indicate fully buried residues. Values above approximately 100 Ų indicate significant solvent exposure for a typical residue. Combine with residue identity to identify exposed hydrophobic residues as aggregation hotspots.
  • Total SASA scales with protein size. Normalise by residue count or surface area when comparing across proteins of different sizes.
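
The thresholds in the tips above can be applied to a per-residue breakdown as sketched below. The record keys ("chain", "residue_index", "residue_name", "sasa") are hypothetical names for the documented per-residue fields, the hydrophobic residue set and the 5 Å² "near 0" tolerance are assumptions, and the SASA values are made up.

```python
from typing import Dict, List

# Assumed set of hydrophobic residues (three-letter codes); adjust as needed.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}
EXPOSED_CUTOFF = 100.0  # Å^2, the approximate exposure threshold from the tips
BURIED_CUTOFF = 5.0     # Å^2, hypothetical tolerance for "near 0"


def classify(sasa: float) -> str:
    """Label a residue as buried, exposed, or intermediate by its SASA."""
    if sasa <= BURIED_CUTOFF:
        return "buried"
    if sasa > EXPOSED_CUTOFF:
        return "exposed"
    return "intermediate"


def exposed_hydrophobics(residues: List[Dict]) -> List[Dict]:
    """Return exposed residues whose name is in the hydrophobic set."""
    return [r for r in residues
            if r["residue_name"] in HYDROPHOBIC and classify(r["sasa"]) == "exposed"]


def mean_sasa_per_residue(residues: List[Dict]) -> float:
    """Total SASA divided by residue count, for cross-protein comparison."""
    return sum(r["sasa"] for r in residues) / len(residues)


# Made-up per-residue records for a hypothetical four-residue chain.
residues = [
    {"chain": "A", "residue_index": 1, "residue_name": "LEU", "sasa": 140.0},
    {"chain": "A", "residue_index": 2, "residue_name": "VAL", "sasa": 0.0},
    {"chain": "A", "residue_index": 3, "residue_name": "LYS", "sasa": 160.0},
    {"chain": "A", "residue_index": 4, "residue_name": "PHE", "sasa": 60.0},
]

print([r["residue_index"] for r in exposed_hydrophobics(residues)])
print(mean_sasa_per_residue(residues))
```

An absolute cut-off ignores residue size, so a relative measure (SASA divided by a per-residue-type maximum) may be preferable where small and large side chains are compared.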

Toolkit Notes

These apply to every PyRosetta tool in this toolkit (pyrosetta-energy, pyrosetta-interface-analyzer, pyrosetta-relax, pyrosetta-sap, pyrosetta-sasa).
  • Every tool accepts a list of inputs in a single call. The scoring, relaxation, and SASA tools take a list of ScoringStructureInput entries, and the interface analyzer takes a list of InterfaceStructureInput entries. Each entry independently accepts a Structure object, a file path, a PDB or mmCIF content string, or a dict shorthand. A single bare input is automatically wrapped in a list. Results are returned in the same order as the inputs.
  • The four scoring and interface-analyzer tools share an opt-in pre_relax_structures preprocess that runs pyrosetta-relax first. Set pre_relax_structures=True and optionally pass a PyRosettaRelaxConfig to relax every input structure before scoring. The framework’s preprocess hook dispatches pyrosetta-relax and substitutes the relaxed structures, so there is exactly one FastRelax implementation in the codebase.
  • Per-residue output uses 1-indexed positions consistent with PDB numbering. Residue indices in the per-residue energy and per-residue SASA breakdowns correspond directly to the residue numbers in the input structure.
Example notebook: See the full working example for a copy-paste-ready walkthrough.
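
The list-wrapping and ordering behaviour described in the notes above can be illustrated with a small sketch. The helper names as_input_list and run_in_order are hypothetical and do not come from the toolkit; the callable passed as tool stands in for any per-structure computation.

```python
from typing import Any, Callable, List, Tuple


def as_input_list(inputs: Any) -> List[Any]:
    """Wrap a single bare input in a list; copy lists and tuples into a list."""
    if isinstance(inputs, (list, tuple)):
        return list(inputs)
    # A string, path, dict, or any other single object counts as one input.
    return [inputs]


def run_in_order(inputs: Any, tool: Callable[[Any], Any]) -> List[Tuple[Any, Any]]:
    """Apply tool to each input and pair each input with its result, in order."""
    items = as_input_list(inputs)
    results = [tool(item) for item in items]
    return list(zip(items, results))


print(as_input_list("model.pdb"))
print(as_input_list(["a.pdb", "b.cif"]))
print(run_in_order(["a", "bb"], len))
```

Because results come back in input order, pairing them with zip is enough to recover which result belongs to which structure.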

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.