Segment, a Construct, a Generator, one or more Constraint objects, an Optimizer, and a
Program.
Define the construct
A Segment is the stretch of sequence being designed; a Construct groups the segments that
make up one molecule. Here there is a single segment: passing length=100 with no starting
sequence leaves all 100 positions open for the optimizer to fill, sequence_type="dna"
restricts them to A, C, G, and T, and label="insert" is the name its results are filed under.
The Construct wraps that one segment.
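To make the roles concrete, here is a minimal plain-Python sketch of the idea. `SegmentSketch` and `ConstructSketch` are hypothetical stand-ins invented for illustration, not the library's classes; only the parameter names `length`, `sequence_type`, and `label` come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for illustration only; NOT the library's classes.
ALPHABETS = {"dna": "ACGT"}


@dataclass
class SegmentSketch:
    length: int
    sequence_type: str = "dna"
    label: str = "segment"
    sequence: Optional[str] = None  # None means no starting sequence was given

    @property
    def alphabet(self) -> str:
        # The sequence type decides which letters a position may take.
        return ALPHABETS[self.sequence_type]

    def open_positions(self) -> List[int]:
        # With no starting sequence, every position is open for design.
        if self.sequence is None:
            return list(range(self.length))
        return []


@dataclass
class ConstructSketch:
    # A construct groups the segments that make up one molecule.
    segments: List[SegmentSketch] = field(default_factory=list)


insert = SegmentSketch(length=100, sequence_type="dna", label="insert")
construct = ConstructSketch(segments=[insert])
print(len(insert.open_positions()), insert.alphabet)
```

The sketch only models the bookkeeping: how many positions are open and which alphabet they draw from.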
Assign a generator
The generator proposes new sequences for the optimizer to score. RandomNucleotideGenerator
substitutes random bases at masked positions, and because the segment starts empty it also fills
in the initial random sequence. generator.assign(insert) binds the generator to the segment it
will mutate.
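The proposal step can be sketched in a few lines of standard-library Python. The function `propose` and its arguments are assumptions made for this illustration and do not reproduce the library's RandomNucleotideGenerator; the sketch only shows the two behaviours described above, filling an empty segment and substituting bases at masked positions.

```python
import random
from typing import Optional, Sequence

DNA = "ACGT"


def propose(sequence: Optional[str], length: int,
            masked: Sequence[int], rng: random.Random) -> str:
    """Hypothetical proposal function, not the library's implementation."""
    if sequence is None:
        # Empty segment: draw the whole initial sequence at random.
        return "".join(rng.choice(DNA) for _ in range(length))
    bases = list(sequence)
    for i in masked:
        # Substitute a random base (it may equal the old one) at each masked position.
        bases[i] = rng.choice(DNA)
    return "".join(bases)


rng = random.Random(0)  # seeded so the sketch is reproducible
start = propose(None, 100, [], rng)
mutated = propose(start, 100, [3, 10], rng)
```

Positions outside the mask are left untouched, so a proposal differs from its parent in at most `len(masked)` places.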
Add constraints
Constraints score how well a sequence meets the design objective; the optimizer searches for
sequences that raise these scores. Two built-ins suffice here. gc_content_constraint with
min_gc=40 and max_gc=60 rewards sequences whose GC content falls in that range, and
max_homopolymer_constraint with max_length=5 penalizes any single-base run longer than five
bases. Each Constraint lists the segment it reads in inputs and carries a label, which is
the key its scores appear under in the result metadata.
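As a rough illustration of what these two constraints measure, the sketch below computes GC percentage and the longest single-base run, then turns each into a score. The scoring formulas (1.0 when satisfied, decreasing with the size of the violation) are assumptions chosen for this example; the library's actual scoring may differ.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    # Percentage of G and C bases in a non-empty sequence.
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    # Length of the longest stretch of one repeated base.
    return max(len(list(group)) for _, group in groupby(seq))


def gc_score(seq: str, min_gc: float = 40.0, max_gc: float = 60.0) -> float:
    # Assumed scoring: 1.0 inside the window, falling linearly outside it.
    gc = gc_percent(seq)
    if min_gc <= gc <= max_gc:
        return 1.0
    distance = min_gc - gc if gc < min_gc else gc - max_gc
    return max(0.0, 1.0 - distance / 100.0)


def homopolymer_score(seq: str, max_length: int = 5) -> float:
    # Assumed scoring: 1.0 when no run exceeds max_length, lower per extra base.
    excess = max(0, longest_run(seq) - max_length)
    return 1.0 / (1.0 + excess)


print(gc_score("ACGTACGT"), homopolymer_score("ACGTAAAAAAAT"))
```

Keeping the raw measurement (`gc_percent`, `longest_run`) separate from the score makes it easy to report diagnostics alongside the value the optimizer sees.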
Configure and run the optimizer
The MCMCOptimizer ties the construct, generator, and constraints together and runs
Metropolis-Hastings: at each step it generates proposals, scores them, and accepts or rejects,
always keeping improvements and accepting worse proposals with a probability that falls as the
temperature anneals. The config sets num_steps=100 steps along a single trajectory
(num_results=1, proposals_per_result=1) with max_temperature=1.0 as the starting
temperature. The optional custom_logging callable receives the step number and current outputs;
here it prints the GC content every 20 steps. The Program runs the optimizer and collects the
result.
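The accept/reject loop can be sketched without the library. Everything here is an assumption made for illustration: the toy `score` function, the single-base proposal, and the linear temperature schedule are not taken from MCMCOptimizer. Because the proposal is symmetric, the Metropolis-Hastings acceptance ratio reduces to the plain Metropolis rule.

```python
import math
import random
from itertools import groupby

DNA = "ACGT"


def gc_percent(seq: str) -> float:
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def score(seq: str) -> float:
    # Toy objective (an assumption): 0.0 is best, violations are negative.
    gc = gc_percent(seq)
    gc_violation = max(0.0, 40.0 - gc, gc - 60.0)
    run = max(len(list(group)) for _, group in groupby(seq))
    return -(gc_violation / 100.0) - max(0, run - 5)


def anneal(num_steps=100, max_temperature=1.0, length=100, seed=0, log_every=20):
    rng = random.Random(seed)
    current = "".join(rng.choice(DNA) for _ in range(length))
    current_score = score(current)
    for step in range(num_steps):
        # Assumed linear schedule: starts at max_temperature, stays above zero.
        temperature = max_temperature * (1.0 - step / num_steps)
        # Propose a single random substitution (a symmetric proposal).
        pos = rng.randrange(length)
        proposal = current[:pos] + rng.choice(DNA) + current[pos + 1:]
        delta = score(proposal) - current_score
        # Always keep improvements; accept worse moves with probability
        # exp(delta / T), which shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, current_score + delta
        if step % log_every == 0:
            print(f"step {step}: GC = {gc_percent(current):.1f}%")
    return current, score(current)


best, best_score = anneal()
```

A single trajectory with one proposal per step corresponds to the num_results=1, proposals_per_result=1 setting described above; running several seeds would give independent trajectories.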
Inspect the result
The final design is the construct’s joined sequence. Per-segment results live under
metadata["segments"][<label>], and each constraint’s diagnostics sit under the constraints
entry keyed by the label set above. Reading gc_content and max_homopolymer_length back out
confirms the design lands inside the 40-60% GC window with no run longer than five bases.
Next Steps
Using Constraints
Score sequences with built-in and custom constraints.
Using Generators
Propose candidate sequences.
Using Optimizers
Run and chain optimizers.
Multi-Stage DNA Optimization
The same objective refined across two optimizer stages.