
This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
Background
The Protein Data Bank (Berman et al., 2000) is the single worldwide archive of experimentally determined macromolecular structures, served here through the RCSB PDB. It is operated by the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers University and the University of California San Diego, with funding from the National Science Foundation, the National Institutes of Health, and the Department of Energy. Entries are solved by X-ray crystallography, cryo-electron microscopy, nuclear magnetic resonance spectroscopy, and other experimental methods.
The tools call two RCSB HTTP endpoints directly. pdb-fetch-entry issues a GET request to the RCSB Data API core entry endpoint (https://data.rcsb.org/rest/v1/core/entry/{pdb_id}) and reads the structure title from struct.title, the experimental method from the first exptl record, and the resolution from rcsb_entry_info.resolution_combined, which covers both X-ray and cryo-EM entries; entries solved by NMR have no resolution value. pdb-fetch-fasta requests the FASTA endpoint (https://www.rcsb.org/fasta/entry/{pdb_id}), parses each record, extracts the author-assigned chain identifiers from the header, and classifies a sequence as protein when it contains amino-acid letters that do not also occur in nucleotide alphabets.
Both tools uppercase the supplied accession, retry transient HTTP failures with backoff, and return an empty result when the accession is not found (HTTP 404). Results reflect the live archive at query time rather than a fixed release snapshot.
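The field lookups described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: the function names entry_url and summarize_entry are hypothetical, the sample payload is invented for the example, and the handling of resolution_combined as a list (taking the first value) is an assumption about the response shape. The HTTP GET itself is left out so the sketch runs offline.

```python
from typing import Any, Dict

ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id: str) -> str:
    """Build the core entry URL; the accession is uppercased first."""
    return ENTRY_URL.format(pdb_id=pdb_id.upper())


def summarize_entry(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pull title, method, and resolution out of a decoded entry payload."""
    exptl = payload.get("exptl") or []
    info = payload.get("rcsb_entry_info") or {}
    resolution = info.get("resolution_combined")
    # Assumption: the value may arrive as a list; use its first element.
    if isinstance(resolution, list):
        resolution = resolution[0] if resolution else None
    return {
        "title": (payload.get("struct") or {}).get("title"),
        "method": exptl[0].get("method") if exptl else None,
        "resolution": resolution,  # stays None when the field is absent
    }


# Invented payload for illustration only; not a real archive entry.
sample_payload = {
    "struct": {"title": "Example structure"},
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [1.8]},
}
summary = summarize_entry(sample_payload)
```

A payload with no rcsb_entry_info block, as expected for an NMR entry, yields a resolution of None rather than an error.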
Learning Resources
- RCSB PDB Data API documentation (RCSB PDB) - reference for the REST endpoints, query syntax, and rate limits.
- PDB-101 training and education (RCSB PDB) - guided material on PDB data, structure determination methods, and how to interpret entries.
Tools
PDB Fetch Entry (pdb-fetch-entry)
Retrieves structure metadata for a PDB accession from the RCSB Data API core entry endpoint, returning the structure title, the experimental method, the resolution in angstroms, and the request URL.
API Reference
Config: PdbFetchConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Applications
Use this to assess whether an experimental structure is suitable as a reference before structure-based design or benchmarking: check the experimental method and resolution, then decide whether to use the entry. It pairs with UniProt, whose returned PDB cross-references can be ranked by resolution, and with PDB Fetch FASTA to pull the chain sequences once a suitable entry is selected.
Usage Tips
- Resolution is absent for some methods. NMR and fiber-diffraction entries have no resolution value, so resolution is None; filter on it before sorting entries by quality.
- This is metadata only. The tool returns the title, method, and resolution, not atomic coordinates or a structure file.
- An unknown accession is not an error. A missing or obsolete accession returns an empty output rather than raising, so check the populated fields before using the result.
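The first and third tips can be combined into a short ranking step. The sketch below assumes each result has already been reduced to a plain dictionary with hypothetical keys pdb_id, method, and resolution; the records and the function name rank_by_resolution are invented for illustration and do not come from the toolkit.

```python
from typing import Any, Dict, List


def rank_by_resolution(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop entries without a resolution, then sort best (lowest) first."""
    with_resolution = [e for e in entries if e.get("resolution") is not None]
    return sorted(with_resolution, key=lambda e: e["resolution"])


# Invented records for illustration; identifiers and values are placeholders.
candidates = [
    {"pdb_id": "entry_a", "method": "X-RAY DIFFRACTION", "resolution": 2.4},
    {"pdb_id": "entry_b", "method": "SOLUTION NMR", "resolution": None},
    {"pdb_id": "entry_c", "method": "ELECTRON MICROSCOPY", "resolution": 3.1},
    {"pdb_id": "entry_d", "method": "X-RAY DIFFRACTION", "resolution": 1.6},
    {},  # an empty output, as returned for an unknown accession
]
ranked_ids = [e["pdb_id"] for e in rank_by_resolution(candidates)]
```

Filtering before sorting matters because comparing None with a float raises a TypeError in Python; the same filter also discards the empty result of an unknown accession.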
PDB Fetch FASTA (pdb-fetch-fasta)
Retrieves the chain sequences of a PDB entry from the RCSB FASTA endpoint, returning one record per unique sequence with the author-assigned chain identifiers that share it, the FASTA header, the sequence, and a protein/nucleic-acid classification, plus the request URL.
API Reference
Config: PdbFetchConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Applications
Use this to extract reference sequences from an experimental structure for sequence design, alignment, or comparison against computational predictions. Filter chains by is_protein to separate protein subunits from nucleic-acid chains in a complex, and deduplicate identical sequences to recover the unique entities of a homo-oligomer. It follows PDB Fetch Entry once a suitable entry is chosen and consumes PDB identifiers surfaced by UniProt.
Usage Tips
- One record can cover several chains. A single PdbChain carries every author-assigned chain identifier that shares its sequence, so a homo-oligomer collapses to one record with multiple chain_ids.
- Protein classification is heuristic. A chain is called protein only when it contains amino-acid letters absent from nucleotide alphabets; peptide nucleic acids and other hybrid molecules may be misclassified.
- An unknown accession is not an error. A missing or obsolete accession returns an empty chains list rather than raising.
- Exporting to fasta writes the original headers. The fasta export emits each record using its stored FASTA header verbatim; json and csv are also supported, with the csv form joining shared chain identifiers with a semicolon.
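The parsing and classification behaviour described in these tips can be approximated as follows. This is a sketch under stated assumptions rather than the toolkit's code: the header layout (pipe-separated fields with a "Chains A, B" field, and an optional "[auth X]" suffix carrying the author-assigned identifier) is assumed, the nucleotide alphabet is assumed to be ACGTUN, and the function names and the sample FASTA text are invented for the example.

```python
import re
from typing import Any, Dict, List

AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")
NUCLEOTIDE_LETTERS = set("ACGTUN")  # assumed nucleotide alphabet
PROTEIN_ONLY_LETTERS = AMINO_ACID_LETTERS - NUCLEOTIDE_LETTERS


def is_protein(sequence: str) -> bool:
    """True when any letter is an amino-acid code absent from nucleotides."""
    return any(ch in PROTEIN_ONLY_LETTERS for ch in sequence.upper())


def chain_ids_from_header(header: str) -> List[str]:
    """Read chain identifiers from the second pipe-separated header field."""
    fields = header.lstrip(">").split("|")
    if len(fields) < 2:
        return []
    label = re.sub(r"^Chains?\s+", "", fields[1].strip())
    ids = []
    for token in label.split(","):
        token = token.strip()
        auth = re.search(r"\[auth\s+([^\]]+)\]", token)
        # Prefer the author-assigned identifier when one is given.
        ids.append(auth.group(1) if auth else token)
    return [i for i in ids if i]


def parse_fasta(text: str) -> List[Dict[str, Any]]:
    """Split FASTA text into records with chain ids and a protein flag."""
    records: List[Dict[str, Any]] = []
    header, parts = None, []
    for line in text.splitlines() + [">"]:  # trailing ">" flushes the last record
        if line.startswith(">"):
            if header is not None:
                sequence = "".join(parts)
                records.append({
                    "header": header,
                    "chain_ids": chain_ids_from_header(header),
                    "sequence": sequence,
                    "is_protein": is_protein(sequence),
                })
            header, parts = line[1:].strip(), []
        elif line.strip():
            parts.append(line.strip())
    return records


# Invented FASTA text for illustration; not taken from a real entry.
sample_fasta = (
    ">XXXX_1|Chains A, B|example protein|synthetic construct\n"
    "MKTAYIAKQR\n"
    ">XXXX_2|Chains C[auth D]|example DNA|synthetic construct\n"
    "GATTACA\n"
)
chains = parse_fasta(sample_fasta)
protein_chains = [c for c in chains if c["is_protein"]]
csv_chain_field = ";".join(chains[0]["chain_ids"])  # semicolon-joined, as in the csv export
```

A short peptide made only of A, C, G, T, and N would be classed as nucleic acid by this rule, which is the kind of edge case the heuristic warning covers.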
Toolkit Notes
These apply to every PDB tool in this toolkit (pdb-fetch-entry, pdb-fetch-fasta).
- Requires network access. The tools call the live RCSB PDB HTTP endpoints; they do not run offline and keep no local copy of the archive.
- Subject to RCSB rate limits. RCSB throttles clients that exceed a few requests per second and returns HTTP 429 when the limit is exceeded; space out high-volume requests, since no account or API key is available to raise the limit.
- Runs on CPU. There is no model and no GPU; latency is dominated by the network round-trip.
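For callers who batch many accessions, the retry-with-backoff and 404 behaviour noted in this toolkit can be sketched as below. This is an illustrative pattern, not the toolkit's implementation: the function name fetch_with_backoff, the set of retryable status codes, the attempt count, and the delay schedule are all assumptions, and the send callable stands in for whatever actually performs the HTTP request and returns a (status, body) pair.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}  # assumed transient codes


def fetch_with_backoff(
    send: Callable[[], Tuple[int, Optional[bytes]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call send() until it succeeds, doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 404:
            return None  # unknown accession: empty result, not an error
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return None  # not reached; kept for type checkers


# Offline demonstration with a scripted sender and a recording sleep.
scripted = iter([(429, None), (503, None), (200, b"ok")])
waits = []
result = fetch_with_backoff(lambda: next(scripted), sleep=waits.append)
```

Passing sleep as a parameter keeps the sketch testable without real delays; in real use the default time.sleep applies.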