Proto is not affiliated with NCBI. This toolkit is open source and builds on the implementation produced by NCBI. Product names, logos, and trademarks are the property of their respective owners.
Background
BLAST (Altschul et al., 1990) performs sequence-similarity search through a heuristic algorithm that approximates the exhaustive Smith-Waterman local alignment at a fraction of its computational cost. The query is first broken into short fixed-length words, exact word matches are located in the database, and each match is extended in both directions until the running alignment score falls more than a set amount below the best score seen so far. The statistical significance of each surviving alignment is expressed as an E-value derived from Karlin-Altschul statistics: the number of alignments scoring at least as high as the observed one that would be expected to occur by chance in a database of the given size.
BLAST supports five program variants that pair query and database types appropriately.
- blastn aligns a nucleotide query against a nucleotide database.
- blastp aligns a protein query against a protein database.
- blastx translates a nucleotide query and aligns the translations against a protein database.
- tblastn aligns a protein query against a database of translated nucleotide sequences.
- tblastx translates both query and database.
The toolkit’s local execution mode uses the NCBI BLAST+ command-line distribution (Camacho et al., 2009), which provides the blastn, blastp, blastx, tblastn, tblastx, and makeblastdb command-line programs that this toolkit invokes. The remote execution mode dispatches to the public NCBI BLAST web service through the QBLAST API.
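The seed-and-extend heuristic described above can be illustrated with a small sketch. This is a teaching toy, not the BLAST+ implementation: it handles nucleotide-style exact word seeds and ungapped extension only, and the function names, scoring values (match=1, mismatch=-3), and drop-off threshold (x_drop=5) are assumptions chosen for the example. Production BLAST adds gapped extension, neighbourhood words for proteins, and statistical scoring.

```python
from collections import defaultdict


def index_words(subject: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Ungapped extension of the seed query[q:q+w] == subject[s:s+w].

    Each direction stops once the running score has fallen more than
    x_drop below the best score seen in that direction.
    """
    def walk(qi, si, step):
        best = score = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + w, s + w, 1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    total = w * match + left_score + right_score
    # Returns (score, query start, query end exclusive, bases added on the left).
    return total, q - left_len, q + w + right_len, left_len


def seed_and_extend(query: str, subject: str, w: int = 4):
    """Return distinct ungapped hits as (score, q_start, q_end, s_start)."""
    index = index_words(subject, w)
    hits = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            score, q_start, q_end, left_len = extend(query, subject, q, s, w)
            hits.add((score, q_start, q_end, s - left_len))
    return sorted(hits, reverse=True)


best_hit = seed_and_extend("ACGTACGT", "TTTTACGTACGTTTTT")[0]
print(best_hit)
```

Several seeds on the same diagonal extend to the same alignment, which is why the hits are collected in a set before sorting.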
Learning Resources
- NCBI BLAST web service (NCBI). The public hosted interface that the remote execution mode targets, useful for an interactive run before scripting against the tool.
- NCBI BLAST+ User Manual (NCBI Bookshelf). The reference manual for the command-line distribution that the local execution mode runs.
Tools
BLAST Search (blast-search)
Aligns a query sequence against a reference database and returns the resulting hits. The remote execution mode submits the query to the NCBI BLAST web service through the QBLAST API. The local execution mode invokes the appropriate BLAST+ program (blastn, blastp, blastx, tblastn, or tblastx) against a user-supplied database. The query field accepts either a raw nucleotide or protein sequence string or a path to a FASTA file, and the input form is detected automatically.
API Reference
Config: BlastSearchConfig
- search_mode: "online" routes to NCBI QBLAST; "local" runs BLAST+ CLI against a local database. Available options: online, local
- program: Available options: blastn, blastp, blastx, tblastn, tblastx
- database: Available options: nt, nr, refseq_rna, refseq_protein, swissprot, pdb, pataa, patnt
- extra_args: verbatim BLAST+ CLI tokens (e.g. ["-max_hsps", "1"]). Local mode only; online mode goes through NCBIWWW.qblast, which doesn’t accept arbitrary CLI tokens.
- … True is coerced to 1 and False to 0.
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Applications
This tool is the standard first step in any analysis that begins with an unknown sequence and asks what it resembles. Representative applications include functional annotation of a newly assembled gene through homology to characterised proteins, taxonomic identification of an environmental DNA fragment, off-target screening of a PCR primer or CRISPR guide against a reference genome, and tracing the evolutionary distribution of a gene across species.
Usage Tips
- The program field must match the query and database types. Mismatched combinations return no hits and waste a search. Use blastn for nucleotide-against-nucleotide, blastp for protein-against-protein, blastx for a nucleotide query against a protein database, tblastn for a protein query against a nucleotide database, and tblastx for translated nucleotide against translated nucleotide.
- Remote execution targets the NCBI BLAST web service and is limited by NCBI rate limits. The database field selects from the hosted reference databases (nt, nr, refseq_rna, refseq_protein, swissprot, pdb, pataa, patnt). High-throughput or batch workloads should use local execution to avoid being throttled or blocked by NCBI.
- Local execution requires a local_db value pointing at a prebuilt database. Build one with blast-create-db or download a prebuilt NCBI database. The path is the database stem with no file extension. The configuration validator hard-errors when local_db is missing in local mode.
- evalue is the primary parameter controlling sensitivity. The BLAST+ default of 10.0 is permissive and returns spurious hits. Set it to 1e-5 or stricter to filter out alignments that would occur by chance, or use a higher value when searching for short or divergent matches.
- extra_args accepts verbatim BLAST+ CLI tokens and applies only in local execution. Pass any CLI flag not exposed as a typed field through this list (for example ["-max_hsps", "1"]). The remote QBLAST API does not accept arbitrary CLI tokens, so extra_args is ignored when search_mode="online" and the configuration validator emits a warning in that case.
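The pairing rules and the local-mode flags from the tips above can be sketched in a few lines of Python. The helper names pick_program and build_local_command are hypothetical and are not part of this toolkit's API; the sketch only shows which BLAST+ command line a local search of this kind would correspond to, using the standard -query, -db, -evalue, and -outfmt options.

```python
# Hypothetical helpers for illustration; not the toolkit's own functions.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def pick_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Choose the BLAST program that matches the query and database types."""
    if translate_both:
        # tblastx translates a nucleotide query and a nucleotide database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs nucleotide query and database")
        return "tblastx"
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported pairing: {query_type} vs {db_type}") from None


def build_local_command(program, query_fasta, local_db, evalue=1e-5, extra_args=()):
    """Assemble a BLAST+ argument list with tabular (-outfmt 6) output."""
    return [
        program,
        "-query", query_fasta,
        "-db", local_db,          # database stem, no file extension
        "-evalue", str(evalue),
        "-outfmt", "6",
        *extra_args,              # verbatim CLI tokens, local mode only
    ]


cmd = build_local_command(
    pick_program("nucleotide", "nucleotide"),
    "query.fasta",
    "/data/mydb",
    extra_args=["-max_hsps", "1"],
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where BLAST+ is installed; the example stops at building the list so that it runs without BLAST+ present.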
Create BLAST Database (blast-create-db)
Builds a local BLAST database from a FASTA file using the BLAST+ makeblastdb program. The output is a set of indexed files referenced by a common stem path. The stem path is returned as db_path and can be passed directly as local_db to blast-search.
API Reference
Input: CreateBlastDbInput
Config: CreateBlastDbConfig
- dbtype: "nucl" for DNA/RNA, "prot" for protein. Must match the input FASTA. Available options: nucl, prot
- out_prefix: … None falls back to the input FASTA stem.
- … makeblastdb falls back to the input file name when None.
- parse_seqids: … blastdbcmd can address sequences by ID; required for v5 taxonomy lookups.
- … 5 (taxonomy-aware) is the upstream default since BLAST+ 2.10. Available options: 4, 5
- … "1GB"); upstream caps at "4GB".
- extra_args: makeblastdb CLI tokens passed verbatim (e.g. ["-mask_data", "/path/to/mask"]). Escape hatch for flags not exposed as typed fields above.
- … True is coerced to 1 and False to 0.
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: CreateBlastDbOutput
- db_path: … local_db parameter in BlastSearchConfig. For example, if db_path is "/data/mydb", makeblastdb will have created multiple files like "/data/mydb.nhr", "/data/mydb.nin", "/data/mydb.nsq" (for nucleotide databases) or similar extensions for protein databases.
Applications
This tool is the prerequisite for any local BLAST workflow that searches against a custom reference set, such as an in-house genome assembly, a curated subset of a public database, or a panel of designed sequences. Building a local database once and reusing it across many queries avoids repeated network traffic to NCBI and gives full control over the reference content.
Usage Tips
- dbtype must match the input FASTA type. Use "nucl" for nucleotide sequences and "prot" for amino-acid sequences. The configuration validator hard-errors on any other value, and a mismatch against the FASTA content will be caught by makeblastdb at runtime.
- out_prefix defaults to the input FASTA stem in the same directory. Set it explicitly when the database should live in a different location or under a different name.
- parse_seqids=True is required for FASTA identifiers to be addressable. Enable it when downstream calls need to retrieve sequences by identifier through blastdbcmd or when building a taxonomy-aware database. Pair it with hash_index=True for faster identifier lookups.
- extra_args accepts verbatim makeblastdb CLI tokens. Use it for niche flags not exposed as typed fields, such as ["-mask_data", "/path/to/mask"] for premasking input or ["-taxid_map", "..."] for taxonomy-related options.
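As a rough illustration of how these settings map onto a makeblastdb invocation, the sketch below assembles the argument list and derives the database stem. The function names are hypothetical and not part of this toolkit; the flags (-in, -dbtype, -out, -parse_seqids, -hash_index) are standard makeblastdb options. PurePosixPath is used only so the printed paths look the same on every platform.

```python
from pathlib import PurePosixPath


def build_makeblastdb_command(fasta, dbtype, out_prefix=None,
                              parse_seqids=False, hash_index=False,
                              extra_args=()):
    """Return (argument list, database stem) for a makeblastdb run."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError('dbtype must be "nucl" or "prot"')
    fasta_path = PurePosixPath(fasta)
    # Default stem: the input FASTA path with its extension removed.
    stem = PurePosixPath(out_prefix) if out_prefix else fasta_path.with_suffix("")
    cmd = ["makeblastdb", "-in", str(fasta_path), "-dbtype", dbtype, "-out", str(stem)]
    if parse_seqids:
        cmd.append("-parse_seqids")
    if hash_index:
        cmd.append("-hash_index")
    cmd.extend(extra_args)  # verbatim tokens for flags not modelled here
    return cmd, str(stem)


def core_files(db_path, dbtype):
    """Core index files of a small, single-volume database."""
    prefix = "n" if dbtype == "nucl" else "p"
    return [f"{db_path}.{prefix}{ext}" for ext in ("hr", "in", "sq")]


cmd, db_path = build_makeblastdb_command("/data/mydb.fasta", "nucl", parse_seqids=True)
print(" ".join(cmd))
print(core_files(db_path, "nucl"))
```

The returned stem is the value that would be passed as local_db to blast-search. Large databases are split into numbered volumes, so the file list here only applies to small inputs.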
Toolkit Notes
These apply to every BLAST tool in this toolkit (blast-search, blast-create-db).
- Hits use the standard BLAST -outfmt 6 tabular schema. Each BlastHit carries the twelve canonical fields qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. pident is reported on a 0-to-100 scale.
- The local installation downloads the platform-specific NCBI BLAST+ distribution on first use. The standalone setup pulls the appropriate NCBI BLAST+ tarball and extracts the blastn, blastp, blastx, tblastn, tblastx, and makeblastdb executables. No reference database is bundled, so local execution requires either a user-built database from blast-create-db or a separately downloaded NCBI database.
- The two tools differ in execution mode. blast-search supports both online (search_mode="online", the default) and local (search_mode="local") execution. blast-create-db runs only locally because the NCBI web service does not expose makeblastdb.
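For readers who want to see the twelve-column tabular schema concretely, the following sketch parses one -outfmt 6 line into a typed record. TabularHit is a hypothetical stand-in written for this example and is not the toolkit's BlastHit class; only the field names and their order come from the schema described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TabularHit:
    """One row of BLAST -outfmt 6 output (illustrative stand-in)."""
    qseqid: str
    sseqid: str
    pident: float    # percent identity on a 0-to-100 scale
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6_line(line: str) -> TabularHit:
    """Parse a single tab-separated -outfmt 6 line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 12:
        raise ValueError(f"expected 12 columns, got {len(parts)}")
    return TabularHit(
        parts[0],
        parts[1],
        float(parts[2]),
        *(int(p) for p in parts[3:10]),  # length through send
        float(parts[10]),
        float(parts[11]),
    )


row = "q1\ts1\t98.5\t200\t3\t0\t1\t200\t51\t250\t1e-100\t360\n"
hit = parse_outfmt6_line(row)
print(hit.sseqid, hit.pident, hit.evalue)
```

A list of such records can then be filtered in plain Python, for instance keeping only hits with evalue at or below 1e-5.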
