Proto is not affiliated with EMBL-EBI. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
InterPro (Blum et al., 2025) is a freely accessible classification of protein families, domains, conserved sites, and homologous superfamilies, maintained by EMBL-EBI. A protein family is a set of evolutionarily related proteins that descend from a shared ancestor and share detectable sequence similarity, typically along with a common three-dimensional fold or biological function. A single InterPro entry groups orthogonal member-database signatures, such as a Pfam HMM and a CATH-Gene3D structural model, under one accession. InterProScan is the analysis pipeline that runs the member-database models against a sequence, and EBI exposes it as a public web service.
Internally, the direct path issues GET https://www.ebi.ac.uk/interpro/api/entry/all/protein/uniprot/{accession}, walking the opaque next cursor across paginated responses until the result set is exhausted. The submit path issues POST https://www.ebi.ac.uk/Tools/services/rest/iprscan5/run/ with a required contact email and the sequence, receives a plain-text job ID, polls /status/{job_id} every three seconds until the job reaches FINISHED, then fetches /result/{job_id}/json.
Both paths flatten matches into the same row schema. Each member-database match contributes rows carrying 1-indexed inclusive start and end coordinates (matching biological residue selection conventions), a unified type label, the parent InterPro accession when integrated, and optional Gene Ontology (GO) and pathway cross-references.
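The cursor-following loop on the direct path can be sketched as below. This is a minimal illustration, not the toolkit's own code: the helper names (http_get_json, iter_entries) are hypothetical, and the sketch assumes each JSON page carries its matches under a "results" key alongside the "next" cursor.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator

# Endpoint template quoted in the text above.
BASE = "https://www.ebi.ac.uk/interpro/api/entry/all/protein/uniprot/{accession}"


def http_get_json(url: str) -> Dict[str, Any]:
    """Fetch one page and decode it as JSON (standard library only)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_entries(
    accession: str,
    get_page: Callable[[str], Dict[str, Any]] = http_get_json,
) -> Iterator[Dict[str, Any]]:
    """Yield every entry for an accession, following `next` until it is empty.

    Assumption: each page is a JSON object with a "results" list and a
    "next" field that is a URL (or null on the last page).
    """
    url = BASE.format(accession=accession)
    while url:
        page = get_page(url)
        yield from page.get("results", [])
        # The cursor is treated as opaque: it is requested verbatim, never parsed.
        url = page.get("next")
```

Passing the page fetcher as a parameter keeps the loop testable without network access; calling iter_entries("P12345") with the default fetcher would query the live service.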
Annotations and their provenance come directly from EMBL-EBI’s official InterPro REST API and iprscan5 service. Results reflect the live resource at query time rather than a fixed release snapshot.
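The polling step of the submit path described in this section can be sketched as follows. The function and constant names are hypothetical. The sketch assumes that /status and /result hang off the same iprscan5 service root as /run/, and that ERROR, FAILURE, and NOT_FOUND are terminal failure states; only FINISHED is confirmed by the text above.

```python
import time
from typing import Callable, Optional

# Assumption: status and result endpoints share the service root of /run/.
SERVICE = "https://www.ebi.ac.uk/Tools/services/rest/iprscan5"
# Assumption: these status strings mean the job will never finish.
FAILED_STATES = {"ERROR", "FAILURE", "NOT_FOUND"}


def status_url(job_id: str) -> str:
    return f"{SERVICE}/status/{job_id}"


def result_url(job_id: str) -> str:
    return f"{SERVICE}/result/{job_id}/json"


def wait_until_finished(
    job_id: str,
    get_status: Callable[[str], str],
    poll_seconds: float = 3.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until the job reports FINISHED and return the number of polls.

    `get_status` takes a job ID and returns the plain-text status.
    `timeout=None` waits indefinitely in this sketch.
    """
    deadline = None if timeout is None else clock() + timeout
    polls = 0
    while True:
        status = get_status(job_id).strip()
        polls += 1
        if status == "FINISHED":
            return polls
        if status in FAILED_STATES:
            raise RuntimeError(f"job {job_id} ended in state {status}")
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(poll_seconds)
```

Once wait_until_finished returns, the caller would fetch result_url(job_id). Injecting sleep and clock lets the loop be exercised in tests without real delays.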
Learning Resources
- InterPro documentation (EMBL-EBI) - official documentation covering InterPro entries, member databases, and the REST API.
- Job Dispatcher web services documentation (EMBL-EBI) - reference for the iprscan5 submit-and-poll REST service, including fair-use guidance.
Tools
InterProScan Fetch (interproscan-fetch)
Retrieves InterPro domain annotations for a protein, either by direct REST lookup of a UniProt accession or by submitting a raw sequence to the iprscan5 service, and returns the resolved accession, sequence length, the list of member-database hits, the source URL, the iprscan5 job ID on the sequence path, and the raw API entries.
API Reference
Config: InterProScanFetchConfig
- INTERPROSCAN_EMAIL environment variable; an explicit value passed to the config overrides the env var.
- None runs the EBI default set (every application enabled, matching upstream appl[] defaults).
- goterms form param on the submit path; filters parser output on the direct path.
- nucleic tells iprscan5 to 6-frame translate the input. Available options: protein, nucleic
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: InterProScanFetchOutput
- None when the sequence path returns a result without a UniProt cross-reference.
- len(domains).
Applications
Use this to attach domain, family, and site annotation to a protein before design or filtering: identify the residues of an active_site or conserved_site match to lock before a redesign loop, partition a sequence into typed family and domain regions, or collect GO and pathway cross-references for functional grouping. The resolved accession and the parent InterPro identifiers compose with the UniProt and AlphaFold DB tools for accession resolution and structural context.
Usage Tips
- The sequence-submission path requires a contact email. When sequence is provided, config.email must be set. Provide it either via the email config attribute or via the INTERPROSCAN_EMAIL environment variable; an explicit config value overrides the env var. The tool raises a clear ValueError before contacting the server if neither is set. The direct accession path ignores email.
- Provide exactly one of uniprot_id or sequence. The input validator rejects a call that supplies both or neither.
- score units are not uniform across rows. The field carries whichever value the source member database publishes, an e-value for some databases and a bit-score for others, so filter by member_database before comparing scores.
- The direct path returns no pathway cross-references. InterPro’s UniProt-keyed endpoint does not surface pathway data, so pathways stays empty on that path regardless of configuration. Pathways are only populated on the sequence-submission path.
- A direct lookup raises when the accession is not indexed. Very recent or removed UniProt accessions outside InterPro’s coverage return no entries, surfacing as a ValueError rather than an empty result.
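The checks and conventions in these tips can be mirrored on the caller's side. The sketch below is illustrative only: the helper names are hypothetical, and rows are modelled as plain dicts with the member_database, score, start, and end fields named in this document, which may not match the tool's actual row type.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def resolve_email(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """An explicit value wins; otherwise fall back to INTERPROSCAN_EMAIL."""
    env = os.environ if env is None else env
    return explicit or env.get("INTERPROSCAN_EMAIL")


def check_inputs(
    uniprot_id: Optional[str] = None,
    sequence: Optional[str] = None,
    email: Optional[str] = None,
) -> str:
    """Return "direct" or "submit", raising ValueError before any network call."""
    if (uniprot_id is None) == (sequence is None):
        raise ValueError("provide exactly one of uniprot_id or sequence")
    if sequence is not None and not email:
        raise ValueError("sequence submission requires a contact email")
    return "direct" if uniprot_id is not None else "submit"


def scores_by_database(rows: Iterable[Mapping]) -> Dict[str, List[float]]:
    """Group scores per member database so unlike units are never compared."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        if row.get("score") is not None:
            grouped[row["member_database"]].append(row["score"])
    return dict(grouped)


def residues(sequence: str, start: int, end: int) -> str:
    """Slice a match given 1-indexed inclusive coordinates."""
    return sequence[start - 1:end]
```

The residues helper shows the off-by-one conversion needed when turning reported coordinates into a Python slice: for "MKTAYIAK", a match at 2..4 covers "KTA".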
Toolkit Notes
These apply to every InterProScan tool in this toolkit (interproscan-fetch).
- Requires network access. The tool calls the live InterPro REST API and iprscan5 service. It does not run offline and keeps no local copy of the data.
- The sequence-submission path requires a contact email for identification. This email lets EBI contact the submitter about job issues. It does not raise any bandwidth or rate allowance.
- Sequence submissions are subject to a fair-use concurrency cap. EBI asks that jobs be submitted in batches of no more than 30 concurrent jobs.
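One way to stay within the 30-job guidance when annotating many sequences is to bound the worker pool. This is a sketch under stated assumptions: run_capped and the run_one callable are hypothetical, and run_one is expected to perform one full submit, poll, and fetch cycle for a single sequence.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Cap taken from the fair-use note above.
MAX_CONCURRENT_JOBS = 30


def run_capped(
    items: Sequence[T],
    run_one: Callable[[T], R],
    max_jobs: int = MAX_CONCURRENT_JOBS,
) -> List[R]:
    """Run `run_one` over `items` with at most `max_jobs` calls in flight.

    The pool size is clamped to the range 1..30, so a caller cannot
    accidentally raise the cap. Results come back in input order.
    """
    workers = max(1, min(max_jobs, MAX_CONCURRENT_JOBS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, items))
```

Threads suit this workload because each job spends almost all of its time waiting on the network; a new submission starts only when an earlier one has completed.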
