License: UniProt is released under a CC BY 4.0 license, which requires attribution when the data is used. Please refer to the license for full terms.

Proto is not affiliated with SIB, EMBL-EBI, or PIR. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.


Website: uniprot.org
UniProt: the Universal Protein Knowledgebase in 2025
The UniProt Consortium
Nucleic Acids Research (2025)
@article{theuniprotconsortium2025,
  title={UniProt: the Universal Protein Knowledgebase in 2025},
  author={The UniProt Consortium},
  journal={Nucleic Acids Research},
  volume={53},
  number={D1},
  pages={D609--D617},
  year={2025},
  publisher={Oxford University Press},
  doi={10.1093/nar/gkae1010}
}
proto-bio/proto-tools/proto_tools/tools/database_retrieval/uniprot
Function              Description
run_uniprot_fetch()   Fetch protein entries from UniProt by accession, or search by name and organism.

Background

UniProt (The UniProt Consortium, 2025) is the central, freely accessible resource for protein sequence and functional annotation, maintained by SIB, EMBL-EBI, and PIR. Its core database, UniProtKB, has two sections: Swiss-Prot, whose entries are manually reviewed and curated from the literature, and TrEMBL, whose entries are automatically annotated. As of release 2026_01 (January 2026), Swiss-Prot contains 574,627 reviewed entries, alongside hundreds of millions of unreviewed TrEMBL entries; current counts are published on the UniProt statistics page.

Internally, the tool calls the UniProt REST API at rest.uniprot.org. Given an accession, it fetches that UniProtKB entry directly; given a protein or gene name and an organism, it runs a UniProt search and selects one entry deterministically, preferring an exact gene-name match, then (optionally) entries with linked PDB structures, then reviewed Swiss-Prot status. It extracts the sequence, length, review status, gene symbols, and PDB cross-references, and also returns the complete entry JSON; the fields option narrows the API response. Results reflect the live database at query time rather than a fixed release snapshot. Records and their provenance come directly from UniProt’s official REST API, maintained by the UniProt consortium.
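
The two lookup paths described above correspond to two public endpoints of the UniProt REST API. The sketch below only builds the request URLs and makes no network call. It is not the tool's own implementation: the helper names (entry_url, search_url) are hypothetical, and the search query string is an assumption, since the exact query the tool sends is not documented here.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession: str, fields: Optional[Sequence[str]] = None) -> str:
    """Build the URL for a direct JSON lookup of one UniProtKB entry."""
    url = f"{BASE_URL}/{accession}.json"
    if fields:
        # fields= restricts the response to the listed return fields.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


def search_url(
    target_name: str,
    organism: str,
    max_candidates: int = 5,
    fields: Optional[Sequence[str]] = None,
) -> str:
    """Build the URL for a name-and-organism search."""
    # Assumed query syntax, for illustration only.
    query = f'({target_name}) AND (organism_name:"{organism}")'
    params = {"query": query, "format": "json", "size": max_candidates}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{BASE_URL}/search?" + urlencode(params)
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect or log before anything is sent.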

Tools

UniProt Fetch (uniprot-fetch)

Retrieves a single UniProtKB entry, either by accession or by a ranked name-and-organism search, and returns its sequence, length, gene names, review status, PDB cross-references, source URL, and the full entry JSON.

API Reference

Input
  • uniprot_id (string): UniProt accession for direct entry lookup.
  • target_name (string): Gene or protein name for search-based lookup.
  • organism (string): Organism name for disambiguation during search.
  • prefer_pdb_crossref (boolean, default: False): When searching, prefer entries that have linked PDB structures.
  • max_candidates (integer, default: 5): Maximum number of search results to evaluate when ranking.
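
The search-mode ranking (exact gene-name match first, then linked PDB structures if prefer_pdb_crossref is set, then reviewed status) can be expressed as a tuple sort key. This is a minimal sketch of that ordering, not the tool's source. The candidate dictionary shape, the case-insensitive gene comparison, and the function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple


def rank_key(
    candidate: Dict[str, Any], target_name: str, prefer_pdb_crossref: bool
) -> Tuple[bool, bool, bool]:
    """Sort key where smaller tuples rank first (False sorts before True)."""
    genes = {g.upper() for g in candidate["gene_names"]}
    exact_gene = target_name.upper() in genes  # assumed: case-insensitive match
    has_pdb = bool(candidate["pdb_crossrefs"])
    return (
        not exact_gene,                         # 1. exact gene-name match
        not (prefer_pdb_crossref and has_pdb),  # 2. linked PDB structures, if requested
        not candidate["reviewed"],              # 3. reviewed (Swiss-Prot) status
    )


def select_entry(
    candidates: List[Dict[str, Any]],
    target_name: str,
    prefer_pdb_crossref: bool = False,
    max_candidates: int = 5,
) -> Dict[str, Any]:
    """Pick one entry from the first max_candidates search hits."""
    pool = candidates[:max_candidates]
    if not pool:
        raise LookupError("search returned no candidates")
    # min() returns the first minimal element, so ties keep the search order
    # and the choice stays deterministic.
    return min(pool, key=lambda c: rank_key(c, target_name, prefer_pdb_crossref))
```

Because prefer_pdb_crossref only changes the second element of the key, it reorders candidates but never removes one.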
Configuration
  • fields (array): UniProt’s fields= query parameter. If set, the API response is restricted to these fields. None (default) returns the full entry (~880 KB for human TP53); a targeted selection can shrink it ~1000x. Caveat: typed Output fields are only populated when the corresponding API field is included, so callers reading accession / sequence / entry_type / gene_names / pdb_crossrefs must include "accession" / "sequence" / "reviewed" / "gene_names" / "xref_pdb", respectively. Search-mode ranking reads reviewed / gene_names / xref_pdb. The full list of field names is given in UniProt’s documentation.
  • verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cpu"): Device to run the tool on.
  • timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
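
The caveat on fields pairs each typed output with the API field that populates it. One way to avoid silently empty outputs is to derive the fields list from the outputs you intend to read. The mapping below is taken from the fields description; the helper names are hypothetical, and length is left out because its source field is not stated here.

```python
from typing import Dict, List, Optional, Sequence

# Typed output name -> UniProt API field that populates it.
REQUIRED_API_FIELDS: Dict[str, str] = {
    "accession": "accession",
    "sequence": "sequence",
    "entry_type": "reviewed",
    "gene_names": "gene_names",
    "pdb_crossrefs": "xref_pdb",
}


def fields_for_outputs(
    outputs: Sequence[str], extra_fields: Sequence[str] = ()
) -> List[str]:
    """Return a de-duplicated fields list covering the requested outputs."""
    fields: List[str] = []
    for name in outputs:
        api_field = REQUIRED_API_FIELDS[name]  # KeyError on an unknown output
        if api_field not in fields:
            fields.append(api_field)
    for api_field in extra_fields:
        if api_field not in fields:
            fields.append(api_field)
    return fields


def unpopulated_outputs(fields: Optional[Sequence[str]]) -> List[str]:
    """List typed outputs that a given fields selection would leave empty."""
    if fields is None:  # None requests the full entry, so nothing is dropped
        return []
    return [out for out, api in REQUIRED_API_FIELDS.items() if api not in fields]
```

Calling unpopulated_outputs on a hand-written fields list before a large batch is a cheap check that nothing you read downstream will come back blank.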
Output
  • accession (string, required): Primary UniProt accession.
  • sequence (string): Protein sequence string.
  • length (integer): Sequence length.
  • entry_type (string): Review status (e.g. ‘UniProtKB reviewed (Swiss-Prot)’ for curated entries).
  • gene_names (List[string]): Extracted gene name symbols.
  • pdb_crossrefs (List[string]): PDB structure IDs linked to this protein entry.
  • source_url (string, required): UniProt entry URL.
  • raw_entry (Dict[string, any]): Complete UniProt JSON record for advanced programmatic access.
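
For callers working with raw_entry directly, the sketch below shows how the typed outputs could be derived from a UniProtKB JSON record. The key names (primaryAccession, entryType, genes, sequence, uniProtKBCrossReferences) follow the UniProt JSON format; the function itself is hypothetical and is not claimed to match the tool's own extraction code.

```python
from typing import Any, Dict


def summarize_entry(raw_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Extract sequence, gene symbols, and PDB IDs from a UniProtKB JSON record."""
    sequence = raw_entry.get("sequence", {})
    # Each gene record may carry a primary geneName plus synonyms; only the
    # primary symbol is collected here.
    gene_names = [
        gene["geneName"]["value"]
        for gene in raw_entry.get("genes", [])
        if "geneName" in gene
    ]
    # Cross-references cover many databases; keep only the PDB ones.
    pdb_ids = [
        xref["id"]
        for xref in raw_entry.get("uniProtKBCrossReferences", [])
        if xref.get("database") == "PDB"
    ]
    return {
        "accession": raw_entry.get("primaryAccession"),
        "sequence": sequence.get("value"),
        "length": sequence.get("length"),
        "entry_type": raw_entry.get("entryType"),
        "gene_names": gene_names,
        "pdb_crossrefs": pdb_ids,
    }
```

Using .get with defaults keeps the function usable on responses narrowed by fields, where whole sections of the record may be absent.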

Applications

Use this to pull a reference protein sequence and its annotation into a pipeline: fetch a target by accession before sequence design or optimization, resolve a gene symbol plus organism to a canonical reviewed entry, or discover which experimental structures are linked to a protein before structure-based work. The returned PDB identifiers feed directly into the PDB and AlphaFold DB tools.

Usage Tips

  • Provide either uniprot_id or both target_name and organism. An accession does a direct lookup; a name requires the organism to disambiguate, and the search returns the single best-ranked entry, not a list.
  • prefer_pdb_crossref only affects search ranking. It biases the name-and-organism search toward entries with linked PDB structures; it has no effect on a direct accession lookup and never filters out entries that lack structures.
  • fields narrows the response but can blank typed outputs. Restricting fields shrinks large entries substantially, but the typed outputs are only populated when their source fields are kept, so include accession, sequence, reviewed, gene_names, and xref_pdb if you read those.
  • Results track the live database. The same call can return updated annotation as UniProt releases change; it is not pinned to a fixed release.
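
The first tip can be checked before any request is made. The sketch below is a hypothetical pre-flight helper, not part of the toolkit; in particular, letting an accession take precedence when both kinds of input are supplied is an assumption, since the documented behavior for that case is not stated.

```python
from typing import Optional


def lookup_mode(
    uniprot_id: Optional[str] = None,
    target_name: Optional[str] = None,
    organism: Optional[str] = None,
) -> str:
    """Return "accession" or "search", or raise if the inputs are incomplete."""
    if uniprot_id:
        # Assumed precedence: a direct accession lookup wins over a search.
        return "accession"
    if target_name and organism:
        return "search"
    raise ValueError("Provide either uniprot_id, or both target_name and organism.")
```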

Toolkit Notes

These apply to every UniProt tool in this toolkit (uniprot-fetch).
  • Requires network access. The tool calls the live UniProt REST API; it does not run offline and keeps no local copy of the database.
  • Subject to UniProt rate limits. Large or rapid batches may be throttled by the UniProt API; space out high-volume requests.
  • Runs on CPU. There is no model and no GPU; latency is dominated by the network round-trip.
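
One way to space out a high-volume batch is a fixed pause between requests plus exponential backoff when a call fails. The sketch below is generic: fetch_one stands in for whatever performs a single lookup, the choice of ConnectionError as the retryable error is an assumption, and the default interval is an arbitrary illustrative value, not a documented UniProt limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_spaced(
    ids: Iterable[str],
    fetch_one: Callable[[str], T],
    min_interval: float = 0.5,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Fetch ids one at a time, pausing between calls and backing off on errors."""
    results: List[T] = []
    for index, accession in enumerate(ids):
        if index:
            sleep(min_interval)  # fixed spacing between consecutive requests
        delay = min_interval
        for attempt in range(max_retries + 1):
            try:
                results.append(fetch_one(accession))
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                delay *= 2  # exponential backoff before retrying
                sleep(delay)
    return results
```

The sleep argument is injectable so the pacing logic can be tested without actually waiting.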
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.