Given the intense interest in drugs and their human targets, both in pharmacology and chemical biology, obtaining a simple list of either is surprisingly difficult. There are many reasons for this. One is that the lists that can be obtained vary in which names, synonyms or identifiers they use, both on the drug side for the specification of chemical structures and on the target side for the gene/protein. An example of the former is that the INN may be assigned to the parent structure for the approved drug name (e.g. atorvastatin, CID 60823), whereas the USAN specifies the salt form (atorvastatin calcium, CID 11227182), while the official label for the medication specifies atorvastatin hemi-calcium trihydrate (CID 656846). On the target side, the target of atorvastatin can be variously designated as Entrez Gene 3156, UniProt P04035, HMGCR or NP_000850. A second reason is that each list is populated according to different “rules”, whether it is a published collation or a selectable subset of a particular database. This produces various types of inter-list discordance. Consequently, no two independently produced lists agree 100% on either precisely which chemical structures are represented or how these are “mapped” (in the molecular mechanism of action sense) to protein identifiers.
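To make the identifier problem concrete, here is a minimal sketch of how two lists can describe the same drug-target relationship and still show no overlap until their entities are normalised. It uses only the atorvastatin and HMGCR identifiers quoted above. The alias tables, the `normalise` function, the two example lists, and the choice of the parent CID and UniProt accession as the canonical forms are all assumptions made for illustration, not a description of how any particular database does this.

```python
# Hypothetical alias tables built only from the identifiers quoted above.
# A real workflow would draw such mappings from curated resources.
DRUG_TO_PARENT = {
    "CID 60823": "CID 60823",      # atorvastatin (parent structure, INN)
    "CID 11227182": "CID 60823",   # atorvastatin calcium (USAN salt form)
    "CID 656846": "CID 60823",     # atorvastatin hemi-calcium trihydrate (label)
}
TARGET_TO_UNIPROT = {
    "Entrez Gene 3156": "P04035",
    "UniProt P04035": "P04035",
    "HMGCR": "P04035",
    "NP_000850": "P04035",
}


def normalise(pairs):
    """Map each (drug, target) pair to (parent CID, UniProt accession).

    Identifiers missing from the alias tables are passed through unchanged.
    """
    return {
        (DRUG_TO_PARENT.get(drug, drug), TARGET_TO_UNIPROT.get(target, target))
        for drug, target in pairs
    }


# Two made-up lists that record the same relationship with different identifiers.
list_a = [("CID 60823", "HMGCR")]
list_b = [("CID 11227182", "Entrez Gene 3156")]

raw_overlap = set(list_a) & set(list_b)
normalised_overlap = normalise(list_a) & normalise(list_b)

print(len(raw_overlap))         # 0
print(len(normalised_overlap))  # 1
```

Collapsing salts and hydrates onto the parent structure is itself one of the "rules" on which lists differ, so a different choice of canonical form would give different overlap counts.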
In the course of expanding and enhancing our own database, we have assessed a substantial number of such lists. While we much appreciate their availability, digging them out was non-trivial, and we also had to tackle the task of normalising their entity content. This normalisation is necessary for comparative analysis and for understanding the basis of their overlaps and differences (as described recently in this paper). Given the effort and expertise needed to prepare these lists, we decided to share them with our user community. You can see the result on this page. As ever, feedback is welcome.
contributed by Chris Southan