Proto is not affiliated with PDBe, EMBL-EBI, or wwPDB. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.
Background
The wwPDB Chemical Component Dictionary (CCD) is the dictionary of every chemical component observed in the Protein Data Bank: small-molecule ligands, modified amino acids, ions, cofactors, nucleotides, and sugars. Each component has a 1- to 5-character identifier (for example ATP for adenosine triphosphate, HEM for heme, MG for magnesium ion, SEP for phosphoserine). Each entry stores atoms, bonds, formula, IUPAC name, descriptors (SMILES / InChI / InChIKey), release status, and, for modified residues, a parent component (for example SEP to SER).
Cross-references via UniChem link CCD entries to external chemistry databases (DrugBank, ChEMBL, PubChem, ChEBI), so the same molecule can be looked up across resources. The tool reads a bundled copy of the CCD components.cif (Kunnakkattu et al., 2023) loaded via pdbeccdutils.core.ccd_reader. SMILES inputs are canonicalized with RDKit and matched against an index built over the bundled dictionary by canonical SMILES and InChIKey. Records and their provenance come directly from the wwPDB Chemical Component Dictionary, distributed by PDBe.
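The SMILES matching described above can be illustrated with a minimal sketch. It is not the toolkit's implementation: the three-entry TOY_COMPONENTS table and the helper names canonical_smiles, build_index, and lookup are assumptions made for illustration, and only the canonical-SMILES half of the index is shown (the InChIKey half is omitted). The RDKit calls used are Chem.MolFromSmiles and Chem.MolToSmiles.

```python
from typing import Dict, Optional

from rdkit import Chem


def canonical_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)


# Toy stand-in for the bundled dictionary: CCD code -> SMILES.
# The real tool indexes the full components.cif; this table is illustrative only.
TOY_COMPONENTS: Dict[str, str] = {
    "EOH": "CCO",  # ethanol
    "MOH": "CO",   # methanol
    "HOH": "O",    # water
}


def build_index(components: Dict[str, str]) -> Dict[str, str]:
    """Map canonical SMILES -> CCD code, keeping the first code seen per key."""
    index: Dict[str, str] = {}
    for code, smiles in components.items():
        key = canonical_smiles(smiles)
        if key is not None:
            index.setdefault(key, code)
    return index


INDEX = build_index(TOY_COMPONENTS)


def lookup(smiles: str) -> Optional[str]:
    """Resolve a SMILES string to a CCD code, or None when nothing matches."""
    key = canonical_smiles(smiles)
    return INDEX.get(key) if key is not None else None
```

Because both the query and the dictionary entries pass through the same canonicalization, differently written SMILES for the same molecule (for example "OCC" and "CCO") resolve to the same key.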
Learning Resources
- PDBe CCDUtils documentation (PDBe) - official documentation for the underlying library, covering the CCD reader, descriptors, and depiction.
- Chemical Component Dictionary (wwPDB) - the reference description of the CCD, its identifiers, and what each entry stores.
- UniChem (EMBL-EBI) - the cross-reference service used to map CCD entries to external chemistry databases.
Tools
CCD Lookup (ccd-lookup)
Enriches wwPDB Chemical Component Dictionary entries. Accepts CCD codes (such as "ATP") or SMILES strings, in mixed batches, and returns a CcdLookupOutput containing a Ligands collection plus parallel CcdEnrichment records: formula, descriptors, parent component, RDKit physicochemical properties, optional UniChem cross-references, and optional PDB usage.
API Reference
Input: CcdLookupInput
- CCD codes (such as "ATP") or SMILES strings. A single string is normalized to a list.
Config: CcdLookupConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: CcdLookupOutput
- … ccd_code=None); see enrichments[i] for the resolution status.
- … ligands.fragments.
Applications
Use this for user-facing enrichment workflows such as notebooks, scripts, dashboards, and ligand reports. Resolve a CCD code or SMILES to a canonical Fragment with formula, descriptors, and physicochemical properties before structure prediction or docking, map a ligand to external chemistry databases via UniChem, or discover which experimental structures contain a ligand before structure-based work. The returned Ligands collection feeds directly into tools that take ligands as input, and PDB identifiers from pdb_structures pair naturally with the PDB tool.
Usage Tips
- For per-fragment SMILES-to-CCD lookups, use proto_tools.entities.ligands.ccd_utils.map_smiles_to_ccd_code instead. It runs the same lookup in the current Python process, without the subprocess startup this tool incurs, so it can be much faster when you are not using the persistent tool context.
- A SMILES with no CCD match returns ccd_code=None rather than an error. Check enrichment.ccd_code is not None before treating an entry as found. result.num_unresolved gives the batch-level count.
- parent_ccd_code is populated only when the CCD entry declares a canonical parent component. Modified residues like SEP (phosphoserine, parent SER), MSE (selenomethionine, parent MET), or PTR (phosphotyrosine, parent TYR) carry a parent code. Most small-molecule ligands have no parent and return None.
- pdb_structures can be very large. For common cofactors and ions (HEM, ATP, NAG, MG, ZN) the list runs to many thousands of PDB IDs. Consider processing it in chunks rather than loading every entry at once when you pass it to a downstream step.
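Two of the tips above, skipping unresolved entries and chunking a long pdb_structures list, can be sketched as follows. EnrichmentStub is a hypothetical stand-in that only mirrors the field names mentioned in the tips (ccd_code, parent_ccd_code, pdb_structures); it is not the toolkit's CcdEnrichment class. The helpers resolved and chunked are likewise illustrative, and the PDB identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence


@dataclass
class EnrichmentStub:
    """Hypothetical stand-in for one enrichment record (not the toolkit's class)."""
    query: str
    ccd_code: Optional[str] = None
    parent_ccd_code: Optional[str] = None
    pdb_structures: List[str] = field(default_factory=list)


def resolved(enrichments: Sequence[EnrichmentStub]) -> List[EnrichmentStub]:
    """Keep only entries whose lookup found a CCD code."""
    return [e for e in enrichments if e.ccd_code is not None]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder batch: one resolved modified residue, one unresolved SMILES.
enrichments = [
    EnrichmentStub(
        query="SEP",
        ccd_code="SEP",
        parent_ccd_code="SER",
        pdb_structures=[f"id{i}" for i in range(5)],  # placeholder IDs
    ),
    EnrichmentStub(query="C1CC1C#N"),  # no match: ccd_code stays None
]

found = resolved(enrichments)
num_unresolved = len(enrichments) - len(found)

# Hand the structure list to a downstream step a few IDs at a time.
batches = [list(chunk) for chunk in chunked(found[0].pdb_structures, 2)]
```

With a real result, result.num_unresolved already gives the unresolved count; the subtraction here only keeps the sketch self-contained.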
Toolkit Notes
These apply to every CCD Lookup tool in this toolkit (ccd-lookup).
- Offline by default. The tool reads a bundled copy of the wwPDB CCD and runs fully offline. Only include_cross_references (UniChem) and include_pdb_usage (RCSB) require network access, and both default to off.
- One-time data download. First use downloads the roughly 70 MB compressed CCD bundle (components.cif.gz) to $PROTO_MODEL_CACHE/ccd_lookup/ and decompresses it to a components.cif of roughly 500 MB, which grows as the dictionary grows. Subsequent runs reuse the decompressed file.
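The "decompress once, reuse afterwards" behavior described above can be sketched with the standard library. This is an illustration under stated assumptions, not the toolkit's code: ensure_decompressed is a hypothetical helper, the download step is left out, and writing through a temporary ".part" file is a design choice of this sketch rather than something the source documents.

```python
import gzip
import os
import shutil
from pathlib import Path


def ensure_decompressed(gz_path: Path) -> Path:
    """Decompress gz_path next to itself once; later calls reuse the result."""
    target = gz_path.with_suffix("")  # components.cif.gz -> components.cif
    if target.exists():
        return target  # already decompressed on an earlier run

    # Stream into a temporary file so a crash never leaves a partial target.
    tmp = target.with_name(target.name + ".part")
    with gzip.open(gz_path, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return target


# Example call, assuming PROTO_MODEL_CACHE is set and the bundle is present:
# cif = ensure_decompressed(
#     Path(os.environ["PROTO_MODEL_CACHE"]) / "ccd_lookup" / "components.cif.gz"
# )
```

Streaming with shutil.copyfileobj keeps memory use flat, which matters when the decompressed file is several hundred megabytes.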
