The Pipeline
Every design follows the same five-stage pipeline.
Core Components
Sequence
The fundamental data unit: a biological string (DNA, RNA, protein, or ligand) with type validation and rich metadata.
Segment
A region to design. Maintains dual pools of proposal and result sequences during optimization.
Construct
An ordered collection of Segments representing a complete biological design, like a gene expression cassette.
Generator
Proposes new sequences through mutation, autoregressive generation, inverse folding, or gradient-based design.
Constraint
Scores how well sequences meet requirements. Returns a score from 0.0 (perfect) to 1.0 (worst violation).
Optimizer
Orchestrates the generate-evaluate-select loop using search strategies such as MCMC or beam search.
Data Flow
Define the design space
Segments specify the regions to be designed, for example a 100 bp promoter, a 300-residue enzyme, or an existing sequence to optimize. They are combined into a Construct representing the full biological unit.
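The sketch below illustrates the idea of a design space with plain dataclasses. It is not Proto's API: the class fields, the `ALPHABETS` table, and the validation rule are assumptions made for illustration, and ligands are left out because their representation is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical alphabets used for type validation (an assumption, not Proto's rules).
ALPHABETS = {
    "dna": set("ACGT"),
    "rna": set("ACGU"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass(frozen=True)
class Sequence:
    """A biological string with a declared type, validated on creation."""
    residues: str
    seq_type: str = "dna"

    def __post_init__(self):
        if self.seq_type not in ALPHABETS:
            raise ValueError(f"unknown sequence type: {self.seq_type}")
        invalid = set(self.residues) - ALPHABETS[self.seq_type]
        if invalid:
            raise ValueError(f"invalid {self.seq_type} symbols: {sorted(invalid)}")


@dataclass
class Segment:
    """A region to design: a target length, optionally seeded with a sequence."""
    name: str
    length: int
    seq_type: str = "dna"
    initial: Optional[Sequence] = None


@dataclass
class Construct:
    """An ordered collection of segments forming one biological unit."""
    segments: List[Segment] = field(default_factory=list)


# The examples from the text: a 100 bp promoter and a 300-residue enzyme.
promoter = Segment("promoter", length=100, seq_type="dna")
enzyme = Segment("enzyme", length=300, seq_type="protein")
cassette = Construct([promoter, enzyme])
```

Validating at construction time means an invalid string fails early, before any generator or constraint sees it.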
Assign generators to segments
Each Generator is assigned to a Segment and proposes candidate sequences. Different generators use different strategies: random mutation, model-guided generation, or inverse folding.
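As a rough illustration of the simplest strategy, random mutation, the following standalone sketch proposes candidates by substituting one base at a time. The function names (`random_sequence`, `mutate`, `propose`) are hypothetical and do not come from Proto.

```python
import random

DNA = "ACGT"


def random_sequence(length, rng):
    """Draw a uniformly random DNA string to use as a starting point."""
    return "".join(rng.choice(DNA) for _ in range(length))


def mutate(sequence, rng, n_mutations=1):
    """Return a copy with n_mutations distinct positions changed to a different base."""
    bases = list(sequence)
    for pos in rng.sample(range(len(bases)), n_mutations):
        bases[pos] = rng.choice([b for b in DNA if b != bases[pos]])
    return "".join(bases)


def propose(parent, rng, n_proposals=8):
    """Generate a batch of single-mutation candidates from one parent."""
    return [mutate(parent, rng) for _ in range(n_proposals)]


rng = random.Random(0)  # seeded so runs are reproducible
parent = random_sequence(100, rng)
proposals = propose(parent, rng)
```

Model-guided generation or inverse folding would replace `mutate` with a call into a trained model; the surrounding propose step would look much the same.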
Define constraints with scoring functions
Constraints evaluate sequences and return a score from 0.0 (perfect) to 1.0 (worst). They can operate on individual segments or across multiple segments.
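To make the 0.0 to 1.0 convention concrete, here is a minimal sketch of two scoring functions written in plain Python. The linear penalty outside the target window and the function names are assumptions chosen for illustration; Proto's own constraints may normalize differently.

```python
def gc_fraction(sequence):
    """Fraction of bases that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in sequence) / len(sequence)


def gc_constraint(sequence, low=0.5, high=0.6):
    """Single-segment constraint: 0.0 inside [low, high], rising linearly to 1.0
    at the most extreme GC fraction on either side."""
    gc = gc_fraction(sequence)
    if gc < low:
        return (low - gc) / low
    if gc > high:
        return (gc - high) / (1.0 - high)
    return 0.0


def gc_balance_constraint(segment_a, segment_b):
    """Cross-segment constraint: penalizes a GC mismatch between two segments."""
    return abs(gc_fraction(segment_a) - gc_fraction(segment_b))


print(gc_constraint("GC" * 25 + "AT" * 25))   # 0.0 (GC = 50%, inside the window)
print(gc_constraint("G" * 25 + "A" * 75))     # 0.5 (GC = 25%, halfway to the worst case)
print(gc_balance_constraint("GGGG", "AAAA"))  # 1.0 (maximal mismatch)
```

Keeping every constraint on the same bounded scale is what lets an optimizer combine or compare scores from unrelated objectives.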
Configure and run the optimizer
The Optimizer drives the search loop: generate proposals, score them against the constraints, and select the best. The loop repeats until a stopping criterion is reached.
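The loop can be sketched with a Metropolis-style acceptance rule, one common form of MCMC. Everything here is illustrative: the toy `repeat_score` constraint, the `metropolis` function, and the temperature and step defaults are assumptions, not Proto's optimizer.

```python
import math
import random

DNA = "ACGT"


def repeat_score(seq):
    """Toy constraint: fraction of adjacent positions holding identical bases."""
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def mutate(seq, rng):
    """Substitute one random position with a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in DNA if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def metropolis(start, score, rng, max_steps=5000, temperature=0.005):
    """Generate, evaluate, select; stop on a perfect score or after max_steps."""
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(max_steps):
        if best_score == 0.0:  # stopping criterion: constraint fully satisfied
            break
        candidate = mutate(current, rng)           # generate
        candidate_score = score(candidate)         # evaluate
        delta = candidate_score - current_score
        # select: always accept improvements, occasionally accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


rng = random.Random(0)
best, best_score = metropolis("A" * 60, repeat_score, rng)
```

Tracking `best` separately from `current` lets the chain wander into worse states without losing the best sequence seen so far.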
The Dual Pool Architecture
A key design decision in Proto is that each Segment maintains two separate sequence pools. This separation lets optimizers explore freely without losing the best solutions found so far.
Proposal Pool
- Purpose: Workspace for the optimizer
- Populated by: Generators (mutations, new proposals)
- Consumed by: Constraints (scoring) and Optimizer (selection)
- Lifecycle: Rebuilt every optimization step
Result Pool
- Purpose: Best results found so far
- Populated by: Optimizer (after ranking proposals)
- Consumed by: User (final output), next stage in a Program
- Lifecycle: Persists across optimization steps and stages
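The two lifecycles can be sketched as follows. This is a simplified model under stated assumptions: the `SegmentPools` class, its `step` method, and the toy `generate` and `score` functions are hypothetical and only mirror the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentPools:
    """Two pools: proposals are rebuilt every step, results persist."""
    max_results: int = 2
    proposals: list = field(default_factory=list)  # sequences under evaluation
    results: list = field(default_factory=list)    # (score, sequence), best first

    def step(self, generate, score):
        # Proposal pool: discarded and repopulated by the generator each step.
        self.proposals = generate([seq for _, seq in self.results])
        scored = [(score(seq), seq) for seq in self.proposals]
        # Result pool: merge with earlier results and keep only the best few.
        self.results = sorted(set(self.results + scored))[: self.max_results]


def score(seq):
    """Toy constraint: 0.0 when every base is G."""
    return 1.0 - seq.count("G") / len(seq)


def generate(parents):
    """Toy generator: set each position of each parent to G in turn."""
    parents = parents or ["AAAA"]
    return [p[:i] + "G" + p[i + 1:] for p in parents for i in range(len(p))]


pools = SegmentPools(max_results=2)
for _ in range(4):
    pools.step(generate, score)

print(pools.results[0])  # (0.0, 'GGGG')
```

Because earlier results are merged back in before truncation, a step that produces only poor proposals cannot displace a good result.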
Design Patterns
- Single Constraint
- Multi-Constraint
- Multi-Stage
- Multi-Segment
The simplest pattern: one segment, one constraint, one optimizer, used for a single design objective.
Example: Design a 100 bp DNA promoter with 50-60% GC content.
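A self-contained approximation of this example, written without Proto, might look like the sketch below. The greedy batch selection, the `design_promoter` function, and all parameter defaults are assumptions for illustration rather than the library's actual workflow.

```python
import random

DNA = "ACGT"


def gc_score(seq, low=0.50, high=0.60):
    """0.0 when GC content lies in [low, high]; grows linearly to 1.0 outside it."""
    gc = sum(b in "GC" for b in seq) / len(seq)
    if gc < low:
        return (low - gc) / low
    if gc > high:
        return (gc - high) / (1.0 - high)
    return 0.0


def mutate(seq, rng):
    """Replace one random position with a random base (possibly the same one)."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(DNA) + seq[pos + 1:]


def design_promoter(length=100, n_proposals=20, max_steps=500, seed=0):
    rng = random.Random(seed)
    # One segment: start from an AT-only sequence, which scores 1.0.
    best = "".join(rng.choice("AT") for _ in range(length))
    best_score = gc_score(best)
    for _ in range(max_steps):
        if best_score == 0.0:
            break
        # One optimizer: propose a batch, keep the best if it is no worse.
        proposals = [mutate(best, rng) for _ in range(n_proposals)]
        candidate = min(proposals, key=gc_score)
        candidate_score = gc_score(candidate)
        if candidate_score <= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


promoter, score = design_promoter()
gc = sum(b in "GC" for b in promoter) / len(promoter)
print(f"length = {len(promoter)}, GC = {gc:.2f}, score = {score}")
```

The loop stops as soon as the single constraint is satisfied, which is the natural stopping criterion when there is only one objective.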
Choosing a Pattern
Next Steps
Sequences
The fundamental data unit
Segments
Design regions and dual pools
Constraints
Scoring functions for design objectives
Quickstart
A first design, end to end