
This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
Background
MinCED is a derivative of the CRISPR Recognition Tool (CRT) (Bland et al., 2007), maintained by Connor Skennerton. CRISPR arrays are blocks of short, near-identical direct repeats (typically 23 to 47 nt) separated by unique spacer sequences (typically 26 to 50 nt) that record fragments of past viral and plasmid infections; they form the heritable memory of the CRISPR-Cas adaptive immune system of bacteria and archaea. Internally, MinCED uses a k-mer seed-and-extend strategy. It scans for short exact k-mer matches that recur at a consistent spacing, then extends each seed bidirectionally to the actual repeat length, and finally validates the candidate by checking that the inter-repeat spacers fall within the configured length window. The algorithm runs on raw DNA, has linear time complexity in sequence length, and finishes in seconds on a typical 5 Mb bacterial genome on commodity CPU hardware.
Learning Resources
- ctSkennerton/minced (Connor Skennerton) - official repository with the canonical command-line flag surface, installation instructions, and example output.
- PMC1924867 (CRT paper) (Bland et al.) - the full text of the algorithm description, including the seed-and-extend mechanism and the comparison against PatScan and PILER-CR.
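The seed-and-extend strategy described under Background can be illustrated with a toy Python sketch. It is not MinCED's implementation: it requires exact agreement between repeat copies, uses k = 8 as an assumed seed length, and skips the extra filters a real CRISPR finder applies. The length defaults mirror the 23 to 47 nt repeat and 26 to 50 nt spacer windows quoted above; seed_chains, extend, and find_arrays are hypothetical names.

```python
def seed_chains(seq, k, min_period, max_period, min_repeats):
    """Chain exact copies of a k-mer that recur every min_period..max_period bases."""
    chains, i, n = [], 0, len(seq)
    while i + k <= n:
        kmer, starts = seq[i:i + k], [i]
        while True:
            lo = starts[-1] + min_period
            hi = min(starts[-1] + max_period, n - k)
            nxt = seq.find(kmer, lo, hi + k)  # nearest copy starting in [lo, hi]
            if nxt == -1:
                break
            starts.append(nxt)
        if len(starts) >= min_repeats:
            chains.append(starts)
            i = starts[-1] + k  # skip past this candidate array
        else:
            i += 1
    return chains


def extend(seq, starts, k, max_len):
    """Grow the seed left and right while every copy still agrees (exact match only)."""
    # The repeat may not exceed max_len or run into the next copy.
    limit = min(max_len, min(b - a for a, b in zip(starts, starts[1:])))
    left = 0
    while (k + left < limit and starts[0] - left - 1 >= 0
           and len({seq[s - left - 1] for s in starts}) == 1):
        left += 1
    right = 0
    while (k + left + right < limit and starts[-1] + k + right < len(seq)
           and len({seq[s + k + right] for s in starts}) == 1):
        right += 1
    return [s - left for s in starts], k + left + right


def find_arrays(seq, k=8, min_rl=23, max_rl=47, min_sl=26, max_sl=50, min_nr=3):
    """Seed, extend, then validate repeat and spacer lengths."""
    arrays = []
    for seed in seed_chains(seq, k, min_rl + min_sl, max_rl + max_sl, min_nr):
        starts, rl = extend(seq, seed, k, max_rl)
        spacers = [seq[a + rl:b] for a, b in zip(starts, starts[1:])]
        if min_rl <= rl <= max_rl and all(min_sl <= len(s) <= max_sl for s in spacers):
            arrays.append((starts, seq[starts[0]:starts[0] + rl], spacers))
    return arrays


if __name__ == "__main__":
    # Synthetic locus: three copies of a 30 nt repeat separated by two 35 nt spacers.
    repeat = "GGGGGGGT" + "TGTTGGTTTGTGGTTGTGTTGT"
    sp1, sp2 = "ACCAC" * 7, "CAACC" * 7
    genome = "A" * 10 + repeat + sp1 + repeat + sp2 + repeat + "A" * 10
    for starts, rep, spacers in find_arrays(genome):
        print(starts, rep, spacers)
```

The toy is written for readability rather than speed, and its exact-match extension will stop at the first mismatched base, which real arrays with degenerate repeat copies would trip over.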
Tools
MinCED CRISPR Array Detection (minced-crispr)
Detects CRISPR arrays in one or more nucleotide sequences. Returns, per input sequence, a list of CrisprArray objects; each carries an ordered list of CrisprRepeatSpacer units with the repeat’s start position, the repeat sequence, and the following spacer (the last unit has no spacer).
API Reference
Input: MincedInput
seq_0, seq_1, …); results are returned in input order.
Config: MincedConfig
min_repeat_length.
min_spacer_length.
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: MincedOutput
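The output shape described for minced-crispr (a list of CrisprArray objects per sequence, each holding an ordered list of CrisprRepeatSpacer units) can be mirrored with plain dataclasses to show how spacers might be pulled out for alignment against phage or plasmid databases. This is a hedged sketch: the field names start, repeat, spacer, and units, and the helper spacers_to_fasta, are hypothetical and not the toolkit's actual attribute names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrisprRepeatSpacer:
    # Hypothetical field names; the toolkit's real attributes may differ.
    start: int              # repeat start position as reported by the tool
    repeat: str             # repeat sequence
    spacer: Optional[str]   # following spacer; None for the last unit of an array


@dataclass
class CrisprArray:
    units: List[CrisprRepeatSpacer] = field(default_factory=list)


def spacers_to_fasta(seq_id: str, arrays: List[CrisprArray]) -> str:
    """Emit every spacer as a FASTA record, skipping the spacer-less final unit."""
    lines = []
    for a_idx, array in enumerate(arrays, start=1):
        for u_idx, unit in enumerate(array.units, start=1):
            if unit.spacer is None:  # last repeat of the array carries no spacer
                continue
            lines.append(f">{seq_id}_crispr{a_idx}_spacer{u_idx}")
            lines.append(unit.spacer)
    return "\n".join(lines) + ("\n" if lines else "")
```

Since the last unit has no spacer, an array of n units yields n - 1 records.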
Applications
Use this to confirm and catalog CRISPR loci across newly sequenced bacterial and archaeal genomes, or to mine spacer libraries from metagenomic assemblies for phage-host interaction studies. As a pre-filter, run minced-crispr first to verify that a candidate contig actually carries a CRISPR array before spending compute on downstream analysis of the same locus: pyhmmer-hmmsearch for Cas effector domains and crispr-tracr-rna for tracrRNA. The spacers returned for each array can then be aligned against phage or plasmid sequence databases to reconstruct the host’s immune history.
Usage Tips
- min_num_repeats controls the sensitivity-versus-specificity trade-off. The default of 3 balances both for typical bacterial and archaeal genomes. Lower it to 2 to catch partial or degraded arrays at the cost of more false positives, and raise it to 4 or more when only high-confidence arrays should pass through.
- The 23 to 47 nt repeat and 26 to 50 nt spacer windows match canonical CRISPR loci. Widen max_repeat_length and max_spacer_length to detect atypical families such as Type IV-A or CRISPR systems with unusually long spacers, and lower min_repeat_length only when chasing partial repeats, since values below 20 nt start to pick up generic tandem repeats.
- MinCED only locates the array; it does not identify Cas genes or classify the CRISPR system. Type assignment requires downstream Cas-effector annotation, typically pyhmmer-hmmsearch against curated Cas HMMs or a dedicated classifier such as CRISPRcasIdentifier.
- Inverted length ranges are caught at config time. Setting max_repeat_length < min_repeat_length or max_spacer_length < min_spacer_length raises ValueError before the run starts, so the call fails fast instead of completing with an empty result set.
- Spacer count is not an immunity-breadth metric. Multiple spacers in an array can target the same phage, and many spacers are degraded remnants of historical encounters, so the number of spacers overestimates how many distinct threats the host can recognize today.
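The defaults and the fail-fast range check from the tips above can be sketched as a small frozen dataclass. CrisprSearchConfig is a hypothetical stand-in, not the toolkit's MincedConfig, and the minced_argv method assumes the fields map onto MinCED's -minNR, -minRL, -maxRL, -minSL and -maxSL command-line options; check the flag names against the official repository before relying on them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CrisprSearchConfig:
    """Hypothetical stand-in for a MinCED config; defaults follow the documented windows."""
    min_num_repeats: int = 3
    min_repeat_length: int = 23
    max_repeat_length: int = 47
    min_spacer_length: int = 26
    max_spacer_length: int = 50

    def __post_init__(self):
        # Fail fast on inverted windows instead of silently returning no arrays.
        if self.max_repeat_length < self.min_repeat_length:
            raise ValueError("max_repeat_length must be >= min_repeat_length")
        if self.max_spacer_length < self.min_spacer_length:
            raise ValueError("max_spacer_length must be >= min_spacer_length")

    def minced_argv(self, fasta: str, out: str) -> List[str]:
        """Translate the fields into a MinCED command line (assumed flag mapping)."""
        return ["minced",
                "-minNR", str(self.min_num_repeats),
                "-minRL", str(self.min_repeat_length),
                "-maxRL", str(self.max_repeat_length),
                "-minSL", str(self.min_spacer_length),
                "-maxSL", str(self.max_spacer_length),
                fasta, out]


# Presets for the sensitivity trade-off described in the first tip.
SENSITIVE = CrisprSearchConfig(min_num_repeats=2)  # catches degraded arrays, more false positives
STRICT = CrisprSearchConfig(min_num_repeats=4)     # high-confidence arrays only
```

Validating in __post_init__ means an inverted window is rejected the moment the config object is built, before any sequence is scanned.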
Toolkit Notes
These apply to every MinCED tool in this toolkit (minced-crispr).
- Runs on CPU only. MinCED is a Java program; the standalone install bundles a Java runtime alongside the minced program. There is no GPU acceleration to enable, and runtime is seconds per typical bacterial genome.
- Self-contained after install. The standalone setup.sh downloads the minced program once; subsequent runs need no network access and no model weights or reference databases.
- Sequences are processed one at a time. The wrapper iterates over inputs.sequences sequentially rather than parallelizing across them. For large batches, run independent calls in parallel from the caller side.
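Because the wrapper handles sequences one at a time, batch parallelism and the pre-filter step described under Applications can both live in the caller. This sketch assumes a hypothetical detect callable that runs minced-crispr on a single sequence and returns its list of arrays; it fans the calls out over a thread pool (reasonable when most of the time is spent in the external Java process) and keeps the IDs of contigs that carry at least one array.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def contigs_with_crispr(
    contigs: Dict[str, str],
    detect: Callable[[str], Sequence],  # hypothetical: runs minced-crispr on one sequence
    max_workers: int = 4,
) -> List[str]:
    """Run independent single-sequence calls in parallel; keep IDs with >= 1 array."""
    ids = list(contigs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with ids.
        results = list(pool.map(lambda cid: detect(contigs[cid]), ids))
    return [cid for cid, arrays in zip(ids, results) if len(arrays) > 0]
```

If detect does heavy work in Python itself, ProcessPoolExecutor sidesteps the GIL, but the lambda would then need to become a module-level function, since lambdas cannot be pickled.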