Proto is not affiliated with NCBI. This toolkit is open source and builds on the implementation produced by NCBI. Product names, logos, and trademarks are the property of their respective owners.
Background
PubChem (Kim et al., 2023) is a freely accessible chemistry resource hosted by NCBI. It aggregates compound records with well-defined chemical structures, depositor-supplied substance records, and bioassay results contributed by hundreds of data sources. Each unique compound is assigned a stable Compound Identifier (CID), and standardized structure representations and computed descriptors are derived from a uniform processing pipeline.
Internally, the tool calls the PUG REST endpoint at https://pubchem.ncbi.nlm.nih.gov/rest/pug. It first resolves the supplied identifier to one or more CIDs: a name, SMILES, or InChIKey is sent as a URL-encoded GET to the matching /compound/{domain}/{value}/cids/JSON endpoint, an InChI is submitted via POST, and a CID skips resolution entirely. It then fetches the configured property bundle from /compound/cid/{cid}/property/{properties}/JSON and, optionally, retrieves synonyms, descriptions, and BioAssay identifiers through additional endpoints. Results reflect the live database at query time rather than a fixed release snapshot.
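The two-step flow described above (resolve to CIDs, then fetch properties) can be illustrated as pure URL construction. This is a minimal Python sketch that sends nothing over the network. The identifier-kind labels, the function names `resolution_request` and `property_url`, and the property subset are hypothetical illustrations, not this toolkit's API; the `inchi=` form field used for the POST body is an assumption based on PUG REST's POST conventions.

```python
from urllib.parse import quote, urlencode

# Base URL quoted in the text above.
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical property subset; the tool's configured bundle may differ.
EXAMPLE_PROPERTIES = ("MolecularFormula", "MolecularWeight", "InChI",
                      "InChIKey", "IUPACName", "XLogP")


def resolution_request(kind: str, value: str):
    """Return (method, url, body) for resolving an identifier to CIDs.

    kind is a hypothetical label: "name", "smiles", "inchikey", "inchi",
    or "cid". A CID needs no resolution, so None is returned for it.
    """
    if kind == "cid":
        return None  # CIDs skip the resolution call entirely
    if kind == "inchi":
        # Assumption: InChI strings are POSTed as form data, not put in the path.
        body = urlencode({"inchi": value})
        return ("POST", f"{PUG_BASE}/compound/inchi/cids/JSON", body)
    if kind in ("name", "smiles", "inchikey"):
        # Percent-encode everything, including '/', so the value is one path segment.
        encoded = quote(value, safe="")
        return ("GET", f"{PUG_BASE}/compound/{kind}/{encoded}/cids/JSON", None)
    raise ValueError(f"unsupported identifier kind: {kind!r}")


def property_url(cid: int, properties=EXAMPLE_PROPERTIES) -> str:
    """URL for the property bundle of one resolved CID."""
    return f"{PUG_BASE}/compound/cid/{cid}/property/{','.join(properties)}/JSON"
```

For example, `resolution_request("smiles", "CC(=O)O")` percent-encodes the parentheses and `=`, producing the path segment `CC%28%3DO%29O`, while `resolution_request("cid", "2244")` returns `None` because no resolution call is needed.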
Learning Resources
- PUG REST documentation (PubChem) - official reference for the request grammar, compound domains, property names, and response formats.
- Programmatic access (PubChem) - overview of the programmatic interfaces and the published usage policies and rate limits.
Tools
PubChem Fetch (pubchem-fetch)
Resolves a single small-molecule identifier to a PubChem CID and returns the requested property bundle, the full list of matched CIDs, and optionally synonyms, textual descriptions, BioAssay identifiers, the source URL, and the raw property record.
API Reference
Input: PubChemFetchInput
Config: PubChemFetchConfig
/description/JSON).
/aids/JSON). For common compounds this can return thousands of assay IDs.
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: PubChemFetchOutput
iupac_name.
include_synonyms is False).
include_description is False).
include_aids is False). For common compounds this can return thousands of IDs.
Applications
Use this to resolve a ligand to its canonical structure and properties before structure-based or chemical-constraint work: convert a user-supplied name or SMILES into a canonical CID and standardized SMILES/InChI/InChIKey before docking, deduplicate or join compound sets on canonical identifiers, or pull descriptor counts for rule-of-five style filtering. PubChem CIDs anchor cross-references into other chemistry resources. Pair this with NCBI E-utilities to pull linked literature or biomolecule records once a CID is resolved.
Usage Tips
- Ambiguous names can resolve to multiple CIDs. A generic name can match many compounds; the tool deterministically selects the first CID and records the full list in all_matched_cids. Pass a CID directly when the identity must be exact.
- Prefer CID inputs for large batches. Supplying a CID skips the resolution call and reduces the number of requests per query, which matters under the rate limits.
- Synonym, description, and BioAssay retrieval each add a request. Each option you enable issues one extra HTTP call, and for common compounds the BioAssay list can contain thousands of identifiers.
- Results track the live database. The same call can return updated structures or properties as PubChem ingests new depositions. It is not pinned to a release.
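The request-count reasoning in these tips can be made concrete with a small Python sketch. It assumes exactly one HTTP call per step (resolution unless a CID is given, one property-bundle call, one call per enabled option) and ignores retries. `FetchPlan`, `plan_fetch`, `select_cid`, and `max_queries_per_minute` are hypothetical helpers, not part of the toolkit; only the `include_*` option names and the 400 requests-per-minute figure come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Per-minute ceiling quoted in the Toolkit Notes.
REQUESTS_PER_MINUTE = 400


@dataclass
class FetchPlan:
    """Hypothetical tally of the HTTP calls behind one pubchem-fetch query."""
    needs_resolution: bool
    optional_calls: List[str] = field(default_factory=list)

    @property
    def request_count(self) -> int:
        # One property-bundle call, plus resolution unless a CID was given,
        # plus one call per enabled optional lookup (retries not counted).
        return 1 + int(self.needs_resolution) + len(self.optional_calls)


def plan_fetch(input_is_cid: bool, include_synonyms: bool = False,
               include_description: bool = False,
               include_aids: bool = False) -> FetchPlan:
    optional = [name for name, enabled in (
        ("synonyms", include_synonyms),
        ("description", include_description),
        ("aids", include_aids),
    ) if enabled]
    return FetchPlan(needs_resolution=not input_is_cid, optional_calls=optional)


def select_cid(matched: List[int]) -> Tuple[Optional[int], List[int]]:
    """Take the first matched CID and keep the full list alongside it."""
    return (matched[0] if matched else None), list(matched)


def max_queries_per_minute(plan: FetchPlan) -> int:
    """Upper bound on queries per minute under the per-minute limit alone."""
    return REQUESTS_PER_MINUTE // plan.request_count
```

Under these assumptions a name query with all three options enabled costs 5 requests (at most 80 queries per minute), while a bare CID query costs 1 (up to 400), which is why CID inputs help large batches.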
Toolkit Notes
These apply to every PubChem tool in this toolkit (pubchem-fetch).
- Requires network access. The tool calls the live PubChem PUG REST API. It does not run offline and keeps no local copy of the database.
- Subject to PUG REST throttling. PubChem applies dynamic per-user throttling, with limits of no more than 5 requests per second, 400 requests per minute, and 300 seconds of running time per minute. Exceeding them returns HTTP 503.
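Callers running many queries may want to pace requests client-side. The following is a minimal Python sketch of a sliding-window throttle for the documented 5 requests per second and 400 requests per minute, plus an exponential backoff schedule to apply after HTTP 503 responses. `SlidingWindowThrottle` and `backoff_delays` are hypothetical helpers, not part of this toolkit, and the backoff parameters are illustrative assumptions. The sketch cannot observe the server-side running-time budget or any dynamic tightening PubChem applies, so a 503 can still occur.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class SlidingWindowThrottle:
    """Client-side pacing for the documented per-second and per-minute limits.

    Tracks only request timestamps; the 300 s running-time budget is
    measured by PubChem and is not visible here.
    """

    def __init__(self, per_second: int = 5, per_minute: int = 400,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = ((1.0, per_second), (60.0, per_minute))
        self.longest = max(window for window, _ in self.limits)
        self.clock = clock
        self.history: Deque[float] = deque()

    def delay_needed(self) -> float:
        """Seconds to wait so the next request keeps every window under its limit."""
        now = self.clock()
        # Forget timestamps that have aged out of even the longest window.
        while self.history and now - self.history[0] >= self.longest:
            self.history.popleft()
        wait = 0.0
        for window, limit in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= limit:
                # The oldest of the last `limit` requests must leave the window.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def record(self) -> None:
        """Call once for each request actually sent."""
        self.history.append(self.clock())


def backoff_delays(retries: int = 4, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule (seconds) for retrying after HTTP 503."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

A typical loop would call `time.sleep(throttle.delay_needed())` before each request and `throttle.record()` after sending it, falling back to the `backoff_delays()` schedule when a 503 comes back. Injecting the clock keeps the pacing logic testable without real sleeps.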
