Segments
A Segment is a region of biological sequence to be designed or optimized. It is analogous to a gene annotation on a genome map: a bounded region with a specific function (a promoter, a coding sequence, a linker domain) that the framework fills with an optimal sequence. Each Segment maintains two pools of sequences that the optimization loop uses: one for exploring proposals, and one for preserving the best results found so far.
Creating Segments
- From Length (design from scratch)
- From Sequence (optimize existing)
From Length creates a Segment for a region that is designed entirely from scratch. The Segment starts with an empty sequence, and the Generator populates it with proposals at the start of optimization. From Sequence starts from an existing sequence, which the framework then optimizes.
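The two creation modes can be sketched with a small stand-in class. `SegmentSketch` is a hypothetical illustration, not the framework's `Segment`; only the `length`, `sequence`, and `sequence_type` arguments and the `has_original_sequence` property come from this page. Deriving the length from a provided sequence is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentSketch:
    """Illustrative stand-in for Segment (hypothetical, not the real class)."""

    length: Optional[int] = None
    sequence: Optional[str] = None
    sequence_type: str = "dna"

    def __post_init__(self) -> None:
        if self.length is None and self.sequence is None:
            raise ValueError("provide either a length or a sequence")
        if self.sequence is not None:
            # Assumption: a provided sequence determines the segment length.
            self.length = len(self.sequence)

    @property
    def has_original_sequence(self) -> bool:
        # True only when the segment was created from an existing sequence.
        return self.sequence is not None


# From Length: an empty region that a Generator would fill later.
promoter = SegmentSketch(length=100, sequence_type="dna")

# From Sequence: an existing sequence used as the starting point.
linker = SegmentSketch(sequence="GGSGGS", sequence_type="protein")

print(promoter.length, promoter.has_original_sequence)
print(linker.length, linker.has_original_sequence)
```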
Sequence Types
| Sequence type | Valid characters | Example |
|---|---|---|
| DNA | A C G T | Segment(length=100, sequence_type="dna") |
| RNA | A C G U | Segment(length=50, sequence_type="rna") |
| Protein | 20 standard amino acids | Segment(length=300, sequence_type="protein") |
| Ligand | SMILES syntax (RDKit-validated) | Segment(sequence="CCO", sequence_type="ligand") |
Ligand segments must be initialized with a sequence (a SMILES string), not just a length. A randomly generated string is unlikely to be valid SMILES, and the molecule must be chemically valid.
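The rules in the table above can be expressed as a small argument check. This is a minimal sketch under stated assumptions: `ALPHABETS` and `check_segment_args` are hypothetical names, and SMILES validity itself is not checked here (the page attributes that to RDKit).

```python
from typing import Optional

# Character sets taken from the table; the protein set is the
# 20 standard amino acids in one-letter code.
ALPHABETS = {
    "dna": set("ACGT"),
    "rna": set("ACGU"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


def check_segment_args(
    sequence_type: str,
    length: Optional[int] = None,
    sequence: Optional[str] = None,
) -> bool:
    """Return True if the arguments are acceptable, else raise ValueError."""
    if sequence_type == "ligand":
        if sequence is None:
            raise ValueError("ligand segments need a SMILES string, not just a length")
        return True  # chemical validity is out of scope for this sketch
    if sequence_type not in ALPHABETS:
        raise ValueError(f"unknown sequence type: {sequence_type}")
    if sequence is None:
        if length is None:
            raise ValueError("provide either a length or a sequence")
        return True
    invalid = set(sequence) - ALPHABETS[sequence_type]
    if invalid:
        raise ValueError(f"invalid characters for {sequence_type}: {sorted(invalid)}")
    return True


print(check_segment_args("dna", length=100))
print(check_segment_args("ligand", sequence="CCO"))
```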
The Dual Pool Model
Each Segment maintains two separate lists of Sequence objects that serve different purposes during optimization:
proposal_sequences
The working space. Generators fill this pool with new proposals each step. Constraints score every proposal. The Optimizer ranks them and decides which survive.
- Rebuilt every optimization step
- Can contain many sequences (e.g., 100 proposals)
- Internal to the optimization loop
result_sequences
The results space. The Optimizer promotes the best proposals here. This pool persists across optimization steps, and when using multi-stage Programs, carries results from one stage to the next.
- Persists across steps and stages
- Contains the top-K best sequences
- User-facing output after optimization
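The interaction between the two pools can be sketched as a toy loop. Everything here is a stand-in: `run_steps`, the random DNA generator, and the GC-content score are hypothetical, and ranking new proposals together with the current results is an assumption about how promotion works, not the framework's Optimizer.

```python
import random
from typing import Callable, List


def run_steps(
    score: Callable[[str], float],
    length: int,
    num_proposals: int,
    top_k: int,
    steps: int,
    seed: int = 0,
) -> List[str]:
    """Toy dual-pool loop: proposals are rebuilt each step, results persist."""
    rng = random.Random(seed)
    result_sequences: List[str] = []  # persists across steps
    for _ in range(steps):
        # The proposal pool is rebuilt from scratch on every step.
        proposal_sequences = [
            "".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(num_proposals)
        ]
        # Assumption: proposals compete with the current results,
        # and only the top-K survive into the result pool.
        candidates = set(proposal_sequences) | set(result_sequences)
        ranked = sorted(candidates, key=lambda s: (-score(s), s))
        result_sequences = ranked[:top_k]
    return result_sequences


def gc_content(seq: str) -> float:
    """Toy score standing in for a Constraint: fraction of G and C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


results = run_steps(gc_content, length=12, num_proposals=100, top_k=5, steps=10)
for seq in results:
    print(seq, round(gc_content(seq), 2))
```

Because the result pool is carried forward, the best score in it can only stay the same or improve from one step to the next.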
Properties Reference
Pool properties
| Property | Type | Description |
|---|---|---|
| proposal_sequences | List[Sequence] | Current proposals (working space) |
| result_sequences | List[Sequence] | Best sequences found (results space) |
| num_proposals | int | Number of sequences in the proposal pool |
| num_results | int | Number of sequences in the result pool |
Sequence properties
| Property | Type | Description |
|---|---|---|
| sequence_type | SequenceType | "dna", "rna", "protein", or "ligand" (read-only) |
| valid_chars | Optional[Set[str]] | Allowed characters for this segment (read-only) |
| sequence_length | int | Expected length of sequences in this segment |
| original_sequence | Sequence | The original sequence provided at construction (read-only) |
| has_original_sequence | bool | True if created with a sequence (vs. just a length) |
State properties
| Property | Type | Description |
|---|---|---|
| populated_sequences | bool | Whether the segment has sequences from input or prior optimization |
| proposals_populated | bool | Whether the proposal pool has non-empty sequences |
| is_ligand | bool | Whether this is a ligand segment (ligands cannot be mutated) |
| label | Optional[str] | Identifier for this segment (auto-assigned if not provided) |
| construct_label | Optional[str] | Label of the parent Construct (set by Program) |
Labels
Labels identify segments in multi-segment designs and appear in constraint metadata, so per-segment results are attributable to a named region. If no label is provided, one is auto-assigned: segment_0, segment_1, etc.
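A minimal sketch of the auto-labeling behavior, assuming that the number in the default label is the segment's position in the design. `assign_labels` is a hypothetical helper, and the real numbering scheme may differ.

```python
from typing import List, Optional


def assign_labels(labels: List[Optional[str]]) -> List[str]:
    """Keep explicit labels; fill missing ones with segment_<position>."""
    return [
        label if label is not None else f"segment_{index}"
        for index, label in enumerate(labels)
    ]


# Two named regions and two unnamed ones.
labels = assign_labels(["promoter", None, "linker", None])
print(labels)  # ['promoter', 'segment_1', 'linker', 'segment_3']
```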
Custom Valid Characters
Restrict the allowed characters for specialized applications. The valid_chars restriction is enforced during validation, and Generators use it to propose only valid characters.
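The two roles of a restricted character set, validation and generation, can be sketched as follows. `propose` and `is_valid` are hypothetical helpers, not framework functions, and the G/C-only alphabet is just an example restriction.

```python
import random
from typing import Set


def propose(length: int, valid_chars: Set[str], rng: random.Random) -> str:
    """Draw a random sequence using only the allowed characters."""
    alphabet = sorted(valid_chars)  # sorted for reproducible sampling
    return "".join(rng.choice(alphabet) for _ in range(length))


def is_valid(sequence: str, valid_chars: Set[str]) -> bool:
    """Check that every character in the sequence is allowed."""
    return set(sequence) <= valid_chars


gc_only = {"G", "C"}  # example restriction: a GC-only region
rng = random.Random(0)
proposal = propose(20, gc_only, rng)

print(proposal, is_valid(proposal, gc_only))
print(is_valid("GATC", gc_only))  # False: A and T are not allowed
```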
Iteration and Indexing
Segments support direct iteration and indexing into the result pool (results).
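In Python, this kind of behavior is typically provided by the `__iter__` and `__getitem__` methods. The class below is a hypothetical stand-in that delegates both to a `result_sequences` list, and uses plain strings where the framework would hold Sequence objects.

```python
from typing import Iterator, List


class IterableSegmentSketch:
    """Stand-in showing iteration and indexing over the result pool."""

    def __init__(self, result_sequences: List[str]) -> None:
        self.result_sequences = list(result_sequences)

    def __iter__(self) -> Iterator[str]:
        # Iterating the segment walks the result pool in order.
        return iter(self.result_sequences)

    def __getitem__(self, index: int) -> str:
        # Indexing the segment indexes into the result pool.
        return self.result_sequences[index]


segment = IterableSegmentSketch(["ATGC", "ATGG", "TTGC"])

best = segment[0]
for seq in segment:
    print(seq)
```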
Creation Patterns
Serialization
Segments serialize to dictionaries, preserving both pools and all metadata.
Next Steps
- Constructs: Combine Segments into complete biological designs
- Generators: How Generators propose new sequences for Segments
- Sequences: The data model underlying each pool entry
- Overview: See how Segments fit into the full architecture