The key value of our curation is the extraction of chemistry-activity-target data from papers. Giving this relationship a formal structure in our database records not only provides direct value for users but is also propagated globally by other databases that link to and/or subsume our content. Within the pharmacology/chemogenomic database ecosystem, the largest source of chemistry <> PubMed ID (PMID) links is PubChem. Many PubChem records include depositor-provided cross-references to scientific articles in PubMed, related to both chemical structures and bioassay data. The recent paper by Kim and the PubChem team [1] includes a detailed statistical analysis of these relationships, which add up to 5.6 million connections between 2.2 million PMIDs and 301,000 compound records (CIDs). The paper also describes and compares in detail the different depositor-supplied, publisher-supplied and MeSH chemistry <> PMID links.
Since we are one of the PubChem depositors of these relationships, we were pleased to see not only a positive mention in this paper but also a detailed breakdown of our own contribution of 11,250 CID <> PMID relationships (presented in Table 1). Although this is a small number compared to the total, it should be noted that ~95% of the total was generated automatically (i.e. not curated) by the IBM patent extraction system, which IBM ran on PubMed in parallel with patent document processing up to 2010. Note that this chemistry-to-literature connectivity is slowly being expanded by journals, including the British Journal of Pharmacology [2].
Comments by Curation Team