Constructs

A Construct is an ordered collection of Segments representing a complete biological design. Just as a gene is assembled from regulatory and coding parts (a promoter, ribosome binding site, coding sequence, and terminator), a Construct assembles Segment objects into a coherent whole.
[Figure: the Construct "gene_construct" composed of four Segments in order: Promoter (100 bp), RBS (AGGAGG), CDS (900 bp), Terminator (50 bp).]
The Construct handles validation (for example, all segments must share the same sequence type), auto-labeling, and, crucially, concatenation of segment results into joined sequences, which give the full designed sequence after optimization.

Creating Constructs

Pass an ordered list of Segments to create a Construct:
python
from proto_language.core import Segment, Construct

promoter = Segment(length=100, sequence_type="dna", label="promoter")
rbs = Segment(sequence="AGGAGG", sequence_type="dna", label="rbs")
cds = Segment(length=900, sequence_type="dna", label="cds")
terminator = Segment(length=50, sequence_type="dna", label="terminator")

gene_construct = Construct(
    [promoter, rbs, cds, terminator],
    label="gene_construct"
)
Segment order matters; it defines the physical arrangement of the final biological sequence.

Validation Rules

Constructs enforce several rules at creation time to catch design errors early:
All segments must share the same sequence type. DNA and protein segments cannot be mixed in one Construct. To represent a gene and its protein product, use separate Constructs.
All segments must share the same valid character set. If one segment uses custom valid_chars, all segments in the Construct must use the same set.
Segment labels must be unique within a Construct. Duplicate labels cause a ValueError.
python
from proto_language.core import Segment, Construct

# Valid: all DNA segments
construct = Construct([
    Segment(length=50, sequence_type="dna", label="upstream"),
    Segment(length=100, sequence_type="dna", label="target"),
])

# Invalid: mixed types -> raises ValueError
try:
    Construct([
        Segment(length=50, sequence_type="dna", label="gene"),
        Segment(length=100, sequence_type="protein", label="protein"),  # Error!
    ])
except ValueError as err:
    print(f"Rejected mixed types: {err}")

# Invalid: duplicate labels -> raises ValueError
try:
    Construct([
        Segment(length=50, sequence_type="dna", label="region"),
        Segment(length=100, sequence_type="dna", label="region"),  # Error!
    ])
except ValueError as err:
    print(f"Rejected duplicate labels: {err}")
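
The three rules listed in this section can be sketched in plain Python. This is a minimal stand-in and not the proto_language implementation: `ToySegment` and `validate_segments` are hypothetical names introduced here, and the real library may run its checks in a different order or with different error messages.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class ToySegment:
    """Hypothetical stand-in for Segment; only the fields the rules inspect."""
    label: str
    sequence_type: str
    valid_chars: Optional[FrozenSet[str]] = None


def validate_segments(segments: Sequence[ToySegment]) -> None:
    """Raise ValueError if the segments break any of the three rules."""
    if len({s.sequence_type for s in segments}) > 1:
        raise ValueError("all segments must share the same sequence type")
    if len({s.valid_chars for s in segments}) > 1:
        raise ValueError("all segments must share the same valid character set")
    labels = [s.label for s in segments]
    if len(labels) != len(set(labels)):
        raise ValueError("segment labels must be unique within a construct")


# Passes silently: one type, one character set, distinct labels.
validate_segments([ToySegment("upstream", "dna"), ToySegment("target", "dna")])
```

Collecting each attribute into a set and checking its size keeps every rule to a single comparison, and failing at construction time means a malformed design never reaches an optimizer.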

Joined Sequences

The most important property of a Construct is joined_sequences. After optimization, this gives the full concatenated sequence from all segments, with merged metadata from each segment.
[Figure: three individual Segment results (Seg 1: ATGC, Seg 2: GGAA, Seg 3: TTCC) concatenated into joined_sequences: ATGCGGAATTCC.]
python
# After optimization, get the full sequences
for seq in construct.joined_sequences:
    print(f"Full sequence ({len(seq)} bp): {seq.sequence[:50]}...")
    print(f"Segments: {list(seq.metadata.get('segments', {}).keys())}")

Multiple Results (Top-K)

When the optimizer selects multiple results per segment (e.g., top-3), joined_sequences pairs them by index:
python
# If each segment has 3 result sequences:
# segment_1.result_sequences = [Sequence("AAA"), Sequence("TTT"), Sequence("GGG")]
# segment_2.result_sequences = [Sequence("CCC"), Sequence("AAA"), Sequence("TTT")]

construct.joined_sequences
# [Sequence("AAACCC"),   # index 0 from each segment
#  Sequence("TTTAAA"),   # index 1 from each segment
#  Sequence("GGGTTT")]   # index 2 from each segment
All segments in a Construct must have the same number of result sequences. joined_sequences raises a RuntimeError if the per-segment result pools have mismatched lengths.
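
The index-wise pairing and the length check can be illustrated with plain strings. This is a sketch under stated assumptions: `join_by_index` is a hypothetical helper, it works on lists of strings rather than the library's Sequence objects, and it does not merge metadata.

```python
from typing import List


def join_by_index(result_pools: List[List[str]]) -> List[str]:
    """Concatenate the i-th result of every segment, in segment order."""
    pool_sizes = {len(pool) for pool in result_pools}
    if len(pool_sizes) > 1:
        raise RuntimeError(f"mismatched result pool sizes: {sorted(pool_sizes)}")
    # zip(*pools) groups index 0 from each segment, then index 1, and so on.
    return ["".join(parts) for parts in zip(*result_pools)]


segment_1_results = ["AAA", "TTT", "GGG"]
segment_2_results = ["CCC", "AAA", "TTT"]
joined = join_by_index([segment_1_results, segment_2_results])
print(joined)  # ['AAACCC', 'TTTAAA', 'GGGTTT']
```

Note that the pairing is positional, not combinatorial: three results per segment yield three joined sequences, not nine.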

Auto-Labeling

Both Constructs and Segments support automatic labeling:
python
# Unlabeled segments get position-based labels
construct = Construct([
    Segment(length=50, sequence_type="dna"),         # -> "segment_0"
    Segment(length=100, sequence_type="dna"),        # -> "segment_1"
    Segment(length=30, sequence_type="dna", label="my_term"),  # -> "my_term"
])

construct.segments[0].label  # "segment_0"
construct.segments[2].label  # "my_term"
Always provide explicit labels for clarity. They appear in constraint metadata and make optimization results easier to interpret.
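
The position-based scheme can be sketched as follows. `assign_labels` is a hypothetical helper, and it assumes the number in the label is the segment's position in the Construct rather than a running count of unlabeled segments; the example above is consistent with either reading.

```python
from typing import List, Optional


def assign_labels(labels: List[Optional[str]]) -> List[str]:
    """Fill in missing labels with 'segment_<position>'; keep explicit ones."""
    return [
        label if label is not None else f"segment_{position}"
        for position, label in enumerate(labels)
    ]


print(assign_labels([None, None, "my_term"]))
# ['segment_0', 'segment_1', 'my_term']
```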

Biological Design Patterns

The classic pattern for designing gene circuits places a promoter, RBS, coding sequence, and terminator in series.
python
# Bacterial gene
promoter = Segment(length=100, sequence_type="dna", label="promoter")
rbs = Segment(sequence="AGGAGG", sequence_type="dna", label="rbs")
cds = Segment(length=900, sequence_type="dna", label="cds")
terminator = Segment(
    sequence="AACAAAATCGCAATGATTTCGATTTTAAAAGGTCTG",
    sequence_type="dna",
    label="terminator"
)

gene_construct = Construct(
    [promoter, rbs, cds, terminator],
    label="gene_construct"
)
Use cross-segment constraints to evaluate predicted expression levels considering all elements together.

Working with Optimizers

Optimizers take a list of Constructs to optimize. Multiple optimizers in a Program must share the same Construct objects by identity so that results persist between stages.
python
# Correct: same construct object passed to both stages
construct = Construct([promoter, cds], label="my_design")

stage1 = RejectionSamplingOptimizer(
    constructs=[construct],  # same object
    generators=[broad_gen],
    constraints=[fast_constraint],
    config=RejectionSamplingOptimizerConfig(num_samples=5000, num_results=50)
)

stage2 = MCMCOptimizer(
    constructs=[construct],  # same object: results flow through
    generators=[fine_gen],
    constraints=[expensive_constraint],
    config=MCMCOptimizerConfig(num_steps=500)
)

program = Program(optimizers=[stage1, stage2], num_results=10)
program.run()

# Results are in the construct's segments
for seq in construct.joined_sequences:
    print(seq.sequence)
Do not create separate Construct instances for each optimizer stage. The result sequences from stage 1 would be lost. Always reuse the same Construct object.
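
Why identity matters can be shown with a toy model in which each stage reads and writes results on the object it receives. Everything here is hypothetical (`ToyConstruct`, `coarse_stage`, `refine_stage`); it mimics only the sharing behavior described above, not the actual optimizers.

```python
from typing import List


class ToyConstruct:
    """Hypothetical stand-in: holds only a mutable pool of result sequences."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.results: List[str] = []


def coarse_stage(construct: ToyConstruct) -> None:
    """Stage 1: writes candidate results into the construct it was given."""
    construct.results = ["ATGAAA", "ATGCCC", "ATGGGG"]


def refine_stage(construct: ToyConstruct) -> None:
    """Stage 2: keeps only the first two results already on the construct."""
    construct.results = construct.results[:2]


shared = ToyConstruct("my_design")
coarse_stage(shared)
refine_stage(shared)       # sees stage 1's output
print(shared.results)      # ['ATGAAA', 'ATGCCC']

first, second = ToyConstruct("my_design"), ToyConstruct("my_design")
coarse_stage(first)
refine_stage(second)       # different object: nothing to refine
print(second.results)      # []
```

Giving two instances the same label does not link them; only passing the same object to both stages does.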

Properties

| Property | Type | Description |
| --- | --- | --- |
| segments | tuple[Segment, ...] | Ordered tuple of Segment objects |
| sequence_type | SequenceType | Shared type across all segments (read-only) |
| valid_chars | Optional[Set[str]] | Shared valid characters (read-only) |
| joined_sequences | List[Sequence] | Concatenated sequences from result pools |
| label | Optional[str] | Identifier for this construct |

Serialization

Constructs serialize to dictionaries, including all their segments and both sequence pools:
python
data = construct.to_dict()
# {
#     "segments": [
#         { "label": "promoter", "sequence_length": 100, ... },
#         { "label": "cds", "sequence_length": 900, ... },
#     ],
#     "sequence_type": "dna",
#     "valid_chars": ["A", "C", "G", "T"],
#     "label": "gene_construct"
# }
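
A dictionary of this shape can be written to and read back from JSON with the standard library. The sketch below uses a hand-written dictionary instead of calling to_dict(), and it assumes the real output contains only JSON-compatible values, which the example suggests but the source does not state.

```python
import json

# Hand-written dictionary mirroring the shape shown above; fields elided
# with "..." in the source are simply left out here.
data = {
    "segments": [
        {"label": "promoter", "sequence_length": 100},
        {"label": "cds", "sequence_length": 900},
    ],
    "sequence_type": "dna",
    "valid_chars": ["A", "C", "G", "T"],
    "label": "gene_construct",
}

text = json.dumps(data, indent=2)   # serialize for storage or logging
restored = json.loads(text)         # parse it back into a dictionary

labels = [segment["label"] for segment in restored["segments"]]
print(labels)  # ['promoter', 'cds']
```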

Next Steps

Generators: How generators propose candidate sequences for each segment

Constraints: Scoring functions for design objectives

Programs: Chain optimizers into multi-stage pipelines

Overview: See how Constructs fit into the full architecture