Our database now contains over 6500 ligands, spanning 7 chemical classes. An important part of our curation work is ensuring that the chemical structures of these ligands are accurate, both by carefully curating new ligands and by running quality control checks on existing ligand structures. In a series of blog posts over the coming weeks we will discuss some case studies from our recent chemical curation work. We encountered these cases while working on a dataset provided by the PubChem team, which involved quality control checks based on cross-referencing our ligands against their corresponding entries on PubChem, each identified by a PubChem compound identifier (CID). The GtoPdb team are working on adding links to PubChem from all our ligand pages via CIDs, so we are trying to ensure that the two resources agree on the structures of our ligands. Where a compound has been assigned multiple CIDs or is available in different preparations, we also want our choice of curated structures and links, and any relevant supporting comments, to reflect this. The topic of this post is discussed in further detail by our team member Chris Southan on his blog and recent poster.
Cross-referencing lists of INN-assigned compounds between GtoPdb and PubChem revealed some mismatches in the specified stereochemistry, and in some cases the nomenclature, of these compounds. In other cases, the structures of our ligands agreed with PubChem but the curated CIDs on the ligand pages pointed to PubChem entries with different stereochemistry or, in one instance, a slightly different structure. To resolve these differences, we cross-checked our structures against those specified by FDA labels, INN documents and other resources, and against the ‘consensus structures’ on PubChem. For readers unfamiliar with the term, a ‘consensus structure’ is the CID entry on PubChem with the highest number of same-structure matches, counted as substance identifiers (SIDs), for reported bioactivity data. We aim to link to this consensus entry. However, in exceptional cases, the consensus structure on PubChem may not match our structure.
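The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure our curators actually follow: the `CidCandidate` record, the function names, the tie-break on lowest CID and the sample identifiers are all hypothetical. Comparing InChIKeys is likewise our assumption for the sketch; it relies on the first block of a standard InChIKey hashing the core constitution of the molecule, so two keys that share it but differ later usually differ in stereochemistry, isotopic labelling or protonation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CidCandidate:
    """One PubChem compound entry considered for a ligand (hypothetical record)."""
    cid: int
    inchikey: str
    sid_count: int  # same-structure SID matches with reported bioactivity data


def pick_consensus(candidates):
    """Return the candidate with the most SID matches.

    Ties are broken by the lowest CID; that tie-break is an assumption
    made for this sketch, not a documented rule.
    """
    if not candidates:
        raise ValueError("no candidate CIDs supplied")
    return max(candidates, key=lambda c: (c.sid_count, -c.cid))


def compare_structures(curated_inchikey, consensus_inchikey):
    """Classify how a curated structure relates to the consensus structure."""
    if curated_inchikey == consensus_inchikey:
        return "match"
    # Same first block, different later blocks: the core constitution agrees,
    # so the difference lies in stereochemistry, isotopes or protonation.
    if curated_inchikey.split("-")[0] == consensus_inchikey.split("-")[0]:
        return "same skeleton"
    return "different structure"


if __name__ == "__main__":
    # Made-up identifiers and counts, for illustration only.
    candidates = [
        CidCandidate(cid=1001, inchikey="AAAAAAAAAAAAAA-BBBBBBBBSA-N", sid_count=12),
        CidCandidate(cid=1002, inchikey="AAAAAAAAAAAAAA-CCCCCCCCSA-N", sid_count=3),
    ]
    consensus = pick_consensus(candidates)
    curated = "AAAAAAAAAAAAAA-CCCCCCCCSA-N"
    print(consensus.cid, compare_structures(curated, consensus.inchikey))
```

A ligand whose curated structure comes back as "same skeleton" or "different structure" is the kind of entry that would be flagged for manual review.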
Following a review of each flagged ligand entry, we decided either to add a curators’ comment explaining any differences between the resources or, where necessary, to revise the chemical structure, ligand name or PubChem CID links on the ligand page. This exercise revealed ambiguities in several chemical structures and posed the challenge of deciding how best to represent them in databases. Some of the issues that arose were already familiar to the team, while others were encountered for the first time. We are devising protocols for our chemical curation methods, which some of these case studies will help to define. As usual, we welcome feedback from our users, so please feel free to contact us with any comments on the series of posts to follow.
Contributed by Helen Benson