We know users find our curated PDB links of high value, particularly where the ligand co-crystallised within the target protein is an activity-mapped approved drug, clinical candidate or key metabolite. The comparative SAR insights these can provide are substantial, especially for those located in recent GPCR and ion channel structures. Nevertheless, our experience indicates that verifying an explicit link (i.e. “this” chemical structure is in “this” protein sequence) is distinctly non-trivial for a range of reasons (some of which are outlined in this blog post). We therefore keep our curation rules under review and carry out statistical audits of our PDB ligand content. This post is somewhat technical: the Venn diagram below compares our ligands with two independent sources of small-molecule structure assignments (note that these are derived from 3D coordinates, but the analysis actually compares 2D structures).
The keys for the segments are as follows:
- “GtoP>PDB 293” indicates the ligands where we have curated a direct PDB link between the protein and chemical structure, counted as PubChem CIDs.
- “UC>GtoP>PDBe 938” refers to GtoPdb ligands with CIDs for which a Protein Data Bank in Europe (PDBe Chem) link is recorded via UniChem mappings.
- “GtoP>MMDB 988” refers to GtoPdb ligands for which the PDB link is recorded via the NCBI Molecular Modeling Database (MMDB) rather than PDBe.
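For readers who want to reproduce this kind of comparison, the segment counts of a three-way Venn diagram can be derived with plain set operations on identifier lists. The following is a minimal sketch, not the procedure we actually used: the function name `venn_segments`, the segment labels and the toy CID values are all hypothetical and chosen purely for illustration.

```python
def venn_segments(curated, pdbe, mmdb):
    """Count the seven regions of a three-set Venn diagram.

    Each argument is an iterable of identifiers (e.g. PubChem CIDs).
    The names and labels here are illustrative assumptions only.
    """
    curated, pdbe, mmdb = set(curated), set(pdbe), set(mmdb)
    return {
        "curated_only": len(curated - pdbe - mmdb),
        "pdbe_only": len(pdbe - curated - mmdb),
        "mmdb_only": len(mmdb - curated - pdbe),
        "curated_and_pdbe_only": len((curated & pdbe) - mmdb),
        "curated_and_mmdb_only": len((curated & mmdb) - pdbe),
        "pdbe_and_mmdb_only": len((pdbe & mmdb) - curated),
        "all_three": len(curated & pdbe & mmdb),
    }


# Toy CIDs, invented for illustration (not real data from the analysis)
curated = {1, 2, 3, 4}
pdbe = {2, 3, 5, 6}
mmdb = {3, 4, 6, 7}

segments = venn_segments(curated, pdbe, mmdb)
for label, count in segments.items():
    print(f"{label}: {count}")
```

The seven counts always sum to the size of the union of the three sets, which is a handy sanity check before plotting.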
The high-level interpretations are:
a) We freely admit to a backlog of ~500 PDB ligand structures for which we could retrospectively add links. The main reason is simply a legacy effect, where older ligands have more recently appeared in PDB structures. However, some of these will be hetero-atom structures rather than specifically bound ligands, or will be in the “wrong” targets (e.g. an ACE inhibitor in the Drosophila ACE protein).
b) The 170 and 35 intersects are both interesting and problematic. They represent discordant PDB small-molecule structures assigned either by PDBe (35) or by MMDB (170), but not by both (as is the case for the 560).
Three actions follow from this analysis to enhance the database. Firstly, we have established contacts with both the UniChem and PDBe teams at the EBI (we already have one of the principal scientists from the MMDB on our chemical Curation Committee) so we can engage with them on technical aspects. Secondly, priorities permitting, we will back-fill the ~500 missing links by triage (e.g. selecting the newest clinical candidates first). Thirdly, we are now in a better position to cross-check any new PDB ligands for the types of discrepancies outlined above. We can then add curatorial comments and cross-pointers as appropriate.
As ever, comments on this topic are welcome.