[Figure: beam search from the prompt across steps 1 to 4. Each step: extend +beam_length tokens → score full sequence → keep top-K. Legend: kept, pruned, best path.]

This optimizer is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source: proto-bio/proto-language/proto_language/optimizer/beam_search_optimizer.py
Beam search optimizer for sequence generation. It generates a single target segment by beam search: the optimizer maintains K beams (running sequences) and, at each step, produces N variations per beam for K x N proposals in total. After constraints are evaluated on the FULL accumulated sequence, only the top K sequences by energy (lowest first) are retained for the next step.

The segment is split into ceil(sequence_length / beam_length) steps. Each step asks the single autoregressive generator for beam_length new tokens per proposal (the last step is truncated to the remaining tokens) and scores each proposal on its full accumulated sequence. Before ranking, any beam left with fewer than proposals_per_result valid proposals is resampled up to max_resample_attempts times; a RuntimeError is raised if a beam still falls short.

Within a beam, proposals are kept by their most recent step energy. Across beams, survivors are ranked by score_by (“mean” averages a beam’s per-step energies, “last” uses only the most recent); the top num_results become the next step’s parent beams and seed the final result_sequences.

prepend_prompt controls whether the prompt is included in the output, and use_kv_caching reuses generator cache state across steps (requires a KV-cache-capable generator). Use this optimizer for long autoregressive design under sequence-level constraints. It targets a single segment and requires a protein/DNA language-model generator (Evo1/Evo2/ProGen2), not a CPU generator.
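The step schedule described above can be illustrated with a small sketch. The helper name `step_lengths` is hypothetical and not part of proto_language, and the sketch assumes sequence_length counts only the tokens to be generated (whether the prompt is included is not stated here).

```python
import math


def step_lengths(sequence_length: int, beam_length: int) -> list[int]:
    """Tokens requested at each beam-search step (hypothetical helper)."""
    if sequence_length <= 0 or beam_length <= 0:
        raise ValueError("sequence_length and beam_length must be positive")
    # Number of steps is ceil(sequence_length / beam_length).
    num_steps = math.ceil(sequence_length / beam_length)
    # Every step asks for beam_length tokens except the last one,
    # which is truncated to whatever remains.
    return [
        min(beam_length, sequence_length - i * beam_length)
        for i in range(num_steps)
    ]
```

For example, a 10-token segment with beam_length=4 would be generated in three steps of 4, 4, and 2 tokens.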

How It Works

Beam search grows a single segment left to right, pruning weak beams as it goes. A segment of length L (sequence_length) is split into num_beams = ⌈ L / beam_length ⌉ steps. At each step, every one of the K = num_results beams is extended by beam_length tokens in proposals_per_result variations, each candidate is scored on its full accumulated sequence, and only the top K beams survive:
num_beams = ⌈ L / beam_length ⌉
score_agg(beam) = mean(beam_scores)   if score_by = "mean"
                = beam_scores[-1]      if score_by = "last"
keep the K beams with smallest score_agg
Beams left with too few valid proposals are resampled (up to max_resample_attempts). Beam search starts from prompt and ignores upstream results; use_kv_caching reuses the generator’s KV cache across steps.
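To make the loop concrete, here is a minimal, self-contained toy in plain Python. It is a sketch under stated assumptions, not the optimizer's implementation: the random extension stands in for the language-model generator, `gc_energy` is an illustrative energy, all K x N candidates are pooled and ranked directly (the within-beam selection and resampling are skipped), and every name in it is hypothetical.

```python
import random
from statistics import mean


def gc_energy(seq: str) -> float:
    """Toy energy: distance of the GC fraction from 0.5 (lower is better)."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    return abs(gc - 0.5)


def score_agg(energies: list[float], score_by: str) -> float:
    """Aggregate a beam's per-step energies as in the formula above."""
    if score_by == "mean":
        return mean(energies)
    if score_by == "last":
        return energies[-1]
    raise ValueError(f"unknown score_by: {score_by!r}")


def toy_beam_search(prompt, sequence_length, beam_length, num_results,
                    proposals_per_result, score_by="mean", seed=0):
    rng = random.Random(seed)
    # Each beam is (sequence so far, list of per-step energies).
    # Assumption: all K beams start as copies of the prompt.
    beams = [(prompt, []) for _ in range(num_results)]
    generated = 0
    while generated < sequence_length:
        # The last step is truncated to the remaining tokens.
        n_new = min(beam_length, sequence_length - generated)
        candidates = []
        for seq, energies in beams:
            for _ in range(proposals_per_result):
                # Stand-in for the autoregressive generator: random bases.
                extension = "".join(rng.choice("ACGT") for _ in range(n_new))
                new_seq = seq + extension
                # Score the FULL accumulated sequence, not just the extension.
                candidates.append((new_seq, energies + [gc_energy(new_seq)]))
        # Keep the K candidates with the smallest aggregated score.
        candidates.sort(key=lambda c: score_agg(c[1], score_by))
        beams = candidates[:num_results]
        generated += n_new
    return beams
```

Calling `toy_beam_search("ATCG", 20, 8, 3, 4)` runs ⌈20 / 8⌉ = 3 steps and returns three (sequence, energies) pairs, best first, each holding the prompt plus 20 generated bases.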

API Reference

Config: BeamSearchOptimizerConfig
Configuration object for BeamSearchOptimizer. This class defines configuration parameters for the beam search optimizer, which generates a single long segment by splitting it into beams of beam_length tokens and performing beam search at each beam boundary.
prompt
string
required
Non-empty seed sequence that every beam begins from and extends (e.g. ‘ATCG’ for DNA).
num_results
integer
Number of beams (top-K by energy) retained at each beam boundary. Overrides program-level count.
proposals_per_result
integer
required
Number of proposals to generate per result sequence at each beam step.
beam_length
integer
required
Tokens per beam-search step before re-ranking; segment split into ceil(len/this) steps.
score_by
enum
default:"mean"
‘mean’ averages a beam’s per-step energies across all steps; ‘last’ uses only the most recent. Options: mean, last
prepend_prompt
boolean
default:"True"
Whether to prepend the prompt to the generated sequence in the output.
use_kv_caching
boolean
default:"False"
Reuse cached KV state across beam steps to speed up generation; needs a KV-capable generator.
max_resample_attempts
integer
default:"3"
Maximum number of times to resample beams with invalid (inf/NaN) energies before giving up.
seed
integer
Random seed for reproducible optimization, generator, and constraint tool streams.
tracking_interval
integer
default:"1"
Save history and log progress every N steps. Step 0 and final step always saved.
track_proposals
boolean
default:"False"
Save granular per-proposal results (accept/reject) in history snapshots.
verbose
boolean
default:"False"
Emit per-step debug information about proposals, scores, and acceptance through the logger.
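The interaction between proposals_per_result and max_resample_attempts can be sketched as follows. This is an illustrative reading of the documented behavior, not the library's code: the function name and the `propose`/`energy` callables are hypothetical, and the sketch assumes each resample round redraws only the missing proposals.

```python
import math
from typing import Callable


def valid_proposals_with_resampling(
    propose: Callable[[], str],
    energy: Callable[[str], float],
    proposals_per_result: int,
    max_resample_attempts: int = 3,
) -> list[tuple[str, float]]:
    """Collect proposals with finite energy, resampling invalid ones."""
    valid: list[tuple[str, float]] = []
    # Round 0 is the initial sampling; later rounds are resample attempts.
    for _ in range(max_resample_attempts + 1):
        missing = proposals_per_result - len(valid)
        if missing == 0:
            break
        for _ in range(missing):
            seq = propose()
            e = energy(seq)
            # inf and NaN energies both count as invalid.
            if math.isfinite(e):
                valid.append((seq, e))
    if len(valid) < proposals_per_result:
        raise RuntimeError(
            f"beam has {len(valid)} valid proposals after "
            f"{max_resample_attempts} resample attempts; "
            f"need {proposals_per_result}"
        )
    return valid
```

Raising instead of continuing with a short beam keeps every beam at the same width, so the K x N proposal count per step stays fixed.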

Usage

python
>>> from proto_language.constraint import gc_content_constraint
>>> from proto_language.core import Constraint, Construct, Segment
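>>> from proto_language.generator import Evo2Generator, Evo2GeneratorConfig
>>> # Import path inferred from the source file location listed above
>>> from proto_language.optimizer.beam_search_optimizer import (
...     BeamSearchOptimizer,
...     BeamSearchOptimizerConfig,
... )
>>>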
>>> segment = Segment(length=10000, sequence_type="dna")
>>> generator = Evo2Generator(Evo2GeneratorConfig(prompts="ATCG"))
>>> gc = Constraint(
...     inputs=[segment], function=gc_content_constraint, function_config={"min_gc": 40, "max_gc": 60}
... )
>>> beam_search = BeamSearchOptimizer(
...     target_segment=segment,
...     constructs=[Construct([segment])],
...     generators=[generator],
...     constraints=[gc],
...     config=BeamSearchOptimizerConfig(
...         prompt="ATCG", beam_length=2000, num_results=5, proposals_per_result=10
...     ),
... )
>>> # beam_search.run() drives the loop

Metadata

Property | Value
Key | beam-search
Class | BeamSearchOptimizer
Targets Single Segment | True
Uses GPU | False
Compatible Generators | evo1, evo2, progen2