
This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
- CRISPRcasIdentifier: GPL-3.0
Background
CRISPRtracrRNA (Mitrofanov et al., 2022) detects trans-activating CRISPR RNA (tracrRNA) sequences in nucleotide CRISPR loci. A tracrRNA is a small non-coding RNA that base-pairs with the precursor crRNA in the Class 2 effector systems that depend on one, namely Type II (Cas9) and the tracrRNA-bearing Type V subtypes such as Cas12b, Cas12c, and Cas12e, and the resulting RNA duplex licenses Cas-mediated cleavage of target DNA. It is also the second component fused into the single-guide RNA used in modern genome editing. Because tracrRNAs share little primary-sequence conservation across families, single-model approaches such as Infernal covariance-model search alone miss divergent tracrRNAs in newly sequenced and metagenomic genomes.
Internally, the pipeline runs an array-detection step with CRISPRidentify (machine learning), a Cas-cassette step with CRISPRcasIdentifier (HMM and machine learning), a tracrRNA candidate scan with Infernal cmsearch against curated covariance models, an anti-repeat alignment step using fasta36, vmatch, Clustal Omega, and BLAST, an RNA-RNA interaction step with IntaRNA, and a transcription-terminator step with erpin. A final ranking step combines the per-candidate features into a single weighted score, and a faster model_run mode performs only the covariance-model scan and skips the validation evidence and the ranking step.
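The final ranking step can be pictured as a weighted sum over per-candidate evidence. The sketch below is illustrative only: the feature names, the weights, and the normalization are assumptions made for this example, not the formula or defaults used by CRISPRtracrRNA.

```python
from dataclasses import dataclass

# Hypothetical evidence channels, each assumed to be pre-scaled to [0, 1].
# Names and weights are illustrative, not the tool's actual defaults.
WEIGHTS = {
    "cm_score": 0.4,
    "anti_repeat_similarity": 0.3,
    "interaction": 0.2,
    "terminator": 0.1,
}


@dataclass
class Candidate:
    name: str
    features: dict  # evidence channel -> value in [0, 1]; missing channels count as 0


def ranking_score(candidate, weights=WEIGHTS):
    """Weighted sum of the available evidence, normalized by the total weight."""
    total = sum(weights.values())
    return sum(w * candidate.features.get(k, 0.0) for k, w in weights.items()) / total


def rank(candidates, weights=WEIGHTS):
    """Return candidates sorted from highest to lowest ranking score."""
    return sorted(candidates, key=lambda c: ranking_score(c, weights), reverse=True)


candidates = [
    # Strong covariance-model hit with no supporting evidence.
    Candidate("cm_only", {"cm_score": 0.9}),
    # Weaker model hit backed by anti-repeat, interaction, and terminator evidence.
    Candidate("multi_evidence", {"cm_score": 0.7, "anti_repeat_similarity": 0.8,
                                 "interaction": 0.6, "terminator": 1.0}),
]
ranked = rank(candidates)
```

In this toy setup the candidate with corroborating evidence outranks the one with a stronger covariance-model hit alone, which is the behavior a multi-evidence score is meant to provide.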
Learning Resources
- BackofenLab/CRISPRtracrRNA (Bioinformatics Group Freiburg) - official repository with installation instructions, the canonical configuration surface, and the curated covariance models distributed with the tool.
- EddyRivasLab/infernal (The Eddy/Rivas Laboratory, Harvard) - official repository and User’s Guide for the covariance-model search engine and the cmsearch E-value statistics that score tracrRNA candidates.
- BackofenLab/IntaRNA (Bioinformatics Group Freiburg) - official repository for the RNA-RNA interaction predictor that scores the anti-repeat to repeat duplex.
Tools
CRISPRtracrRNA Prediction (crispr-tracr-rna)
Predicts tracrRNA candidates from one or more nucleotide sequences and returns, per input sequence, a list of CrisprTracrRNAPrediction rows sorted by ranking score. Each row carries the candidate position and sequence, CRISPR array context, anti-repeat similarity and coverage, predicted RNA-RNA interaction with the repeat, terminator location and score, distance to the nearest Cas-effector cassette, and a single weighted multi-evidence score.
API Reference
Input: CrisprTracrRNAInput
seq_0, seq_1, …); results are returned in input order.
Config: CrisprTracrRNAConfig
model_type: II, all
run_type: complete_run, model_run
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: CrisprTracrRNAOutput
Applications
Use this to confirm and characterize Type II and Type V CRISPR-Cas loci, since a detected tracrRNA is the component that completes a functional Class 2 locus and distinguishes a Cas9 or Cas12 system from an unaccompanied CRISPR array. Pair it with minced on a confirmed array to recover the crRNA spacers, then design a single-guide RNA by fusing a spacer-bearing crRNA with the detected tracrRNA scaffold for genome-editing experiments. Run it across metagenomes and uncultured genomes to discover novel Cas9 or Cas12 systems whose tracrRNAs are too divergent to be caught by covariance-model search alone.
Usage Tips
- Provide each CRISPR locus with at least 5 kb of flanking sequence on either side. The multi-evidence pipeline needs adjacent context to locate the Cas cassette and the downstream transcription terminator. Loci submitted as narrow windows lose those evidence channels and fall back to a covariance-model-only score.
- model_type defaults to "II", which only screens for Cas9 systems. To also screen tracr-bearing Type V (Cas12b, Cas12c, Cas12e, …) loci, set model_type="all" and perform_type_v_anti_repeat_analysis=True. The Type V path is off by default because it is slower and irrelevant when only Cas9 loci are of interest.
- Type I and Type III CRISPR systems do not use a tracrRNA. A complete_run on such a locus returns array context and Cas annotations but empty tracrRNA fields, with the ranking score reflecting only the partial evidence.
- The ten weight_* ranking knobs interact. Sweep them together against a held-out positive and negative set rather than tuning a single weight in isolation, and keep upstream’s documented defaults when there is no specific objective to optimize for.
- run_type="model_run" is the high-throughput pre-filter, not the final answer. It runs only the Infernal cmsearch step and returns candidates with E-values but none of the array, interaction, or terminator evidence, so re-run promising candidates through complete_run before drawing conclusions.
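The flanking-sequence tip above can be checked before submission. The following is a minimal sketch, assuming 0-based, end-exclusive array coordinates on a contig; the helper names locus_window and has_full_flanks are hypothetical and are not part of the toolkit.

```python
def locus_window(contig_length, array_start, array_end, flank=5000):
    """Return the (start, end) window of the array plus flanks, clipped to the contig.

    Coordinates are 0-based and end-exclusive, so the window can be used
    directly as a Python slice.
    """
    if not 0 <= array_start < array_end <= contig_length:
        raise ValueError("array coordinates must lie within the contig")
    start = max(0, array_start - flank)
    end = min(contig_length, array_end + flank)
    return start, end


def has_full_flanks(contig_length, array_start, array_end, flank=5000):
    """True when the contig offers at least `flank` bases on both sides of the array."""
    return array_start >= flank and contig_length - array_end >= flank


contig = "ACGT" * 5000                     # 20 kb toy contig
start, end = locus_window(len(contig), 8000, 9000)
window = contig[start:end]                 # array plus 5 kb on each side
```

A locus for which has_full_flanks returns False can still be submitted, but per the tip above it is likely to lose the Cas-cassette or terminator evidence on the truncated side.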
Toolkit Notes
These apply to every CRISPRtracrRNA tool in this toolkit (crispr-tracr-rna).
- Runs on CPU only. The pipeline drives Infernal, IntaRNA, fasta36, vmatch, Clustal Omega, BLAST, and erpin, all CPU-based programs. There is no GPU acceleration to enable.
- Initial install pulls model archives from Google Drive. complete_run mode requires the CRISPRcasIdentifier ML and HMM archives, which the standalone install fetches once. Google Drive rate-limits anonymous fetches, so on a failed install retry after a minute or follow the upstream README to place the two archives in the CRISPRcasIdentifier directory by hand. After install the runtime needs no further network access.
- num_workers parallelizes across input sequences, not within a sequence. Each worker runs the full pipeline in its own working directory to avoid file-name collisions between concurrent jobs. The default of 1 is single-process; set it explicitly when batch-scanning many loci. The wrapper caps the effective worker count at len(sequences), so over-provisioning is safe.
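The worker behavior described in the last note can be sketched as follows. This is an illustration of the pattern, not the wrapper's implementation: effective_workers, process_one, and run_batch are hypothetical names, process_one is a placeholder for a full pipeline run, and a thread pool is used here only to keep the example short.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def effective_workers(num_workers, n_sequences):
    """Cap the worker count at the number of input sequences (never below 1)."""
    return max(1, min(num_workers, n_sequences))


def process_one(job):
    """Placeholder for one pipeline run; it writes only inside its own directory."""
    index, sequence = job
    with tempfile.TemporaryDirectory(prefix=f"seq_{index}_") as workdir:
        # Every job uses the same file name, but in a private directory,
        # so concurrent jobs cannot overwrite each other's files.
        fasta = Path(workdir) / "input.fasta"
        fasta.write_text(f">seq_{index}\n{sequence}\n")
        return index, len(sequence)


def run_batch(sequences, num_workers=1):
    workers = effective_workers(num_workers, len(sequences))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever order jobs finish in.
        return list(pool.map(process_one, enumerate(sequences)))


results = run_batch(["ACGT", "ACGTACGT", "AC"], num_workers=16)
```

Requesting 16 workers for three sequences uses only three, which is why over-provisioning num_workers costs nothing.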