Case study in chemical curation II: Racemates and how to represent them

Where the drug is a racemate, should we go ‘flat’?

Example: ketoconazole

Many drugs are racemates, as indicated by their INN document or FDA label. This presents us with a curatorial decision: should we represent the two enantiomers as separate ligand entries, create a single ligand entry that displays the structure of one enantiomer but represents the racemate, or display a flat structure representing the mixture? And which structure should we link out to in PubChem and other resources?

Ketoconazole is a racemate of a 2R, 4S enantiomer and a 2S, 4R enantiomer. For this ligand and other racemates we have chosen to display a ‘flat’ structure, with no stereochemistry specified, to represent the mixture. Before doing so, we cross-check against PubChem to ensure that the CID for the flat structure is supported by a sufficient number of ‘same structure’ matches, and we check that the references from which we derived our biological activity data for the compound do not specify that a particular enantiomer was used in the experiments. Once we are sure of a consensus match for the flat structure on PubChem, and that our activity data can be mapped to a non-specific structure, the structure is curated and the corresponding CID is added to the ligand entry. In these cases, we add contextual curators’ comments to the ligand entry, together with links to the CIDs representing the individual enantiomers. Where possible we also ensure that the links in our database links table specify the flat structure; where this is not possible, we say so in our comments.
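The relationship between the two ketoconazole enantiomers, and what going ‘flat’ discards, can be sketched with plain stereodescriptor labels. This is a toy model under stated assumptions: the dictionaries and the `mirror` and `flatten` helpers are hypothetical illustrations, not part of our database or of any cheminformatics toolkit, and real curation compares full structures (for example in PubChem) rather than labels.

```python
# Illustrative data model (an assumption, not a database schema or toolkit API):
# a stereo form is a dict mapping a locant to its descriptor.


def mirror(descriptors: dict[str, str]) -> dict[str, str]:
    """Descriptors of the mirror image.

    Reflection inverts every stereocentre (R <-> S) but leaves E/Z
    double-bond geometry unchanged. Assumes no pseudoasymmetric (r/s) centres.
    """
    swap = {"R": "S", "S": "R", "E": "E", "Z": "Z"}
    return {locant: swap[d] for locant, d in descriptors.items()}


def flatten(descriptors: dict[str, str]) -> dict[str, None]:
    """'Flat' form: the same stereo elements, none of them defined."""
    return {locant: None for locant in descriptors}


ketoconazole_a = {"2": "R", "4": "S"}
ketoconazole_b = mirror(ketoconazole_a)

print(ketoconazole_b)                                      # {'2': 'S', '4': 'R'}
print(mirror(ketoconazole_b) == ketoconazole_a)            # True: the pair are mirror images
print(flatten(ketoconazole_a) == flatten(ketoconazole_b))  # True: one flat form covers both
```

Because both enantiomers flatten to the same undefined form, a flat structure can stand for the racemate only when the activity data are not tied to either enantiomer, which is why the references are checked first.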

N.B. There are cases where we have activity data both for a particular chiral form and for the racemate; in these instances we create ligand entries to represent both structures. In a subset of these cases, the single enantiomer is also a drug in its own right, for example cetirizine and (R)-cetirizine (levocetirizine). In a limited number of cases we have data for the racemate and both enantiomers, and therefore maintain three separate ligand entries cross-linked by comments.
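The rule above (one ligand entry per form that has its own activity data) can be sketched as follows. `ActivityData` and `ligand_entries` are hypothetical names for illustration only; the sketch does not model the cross-linking curators’ comments, the PubChem consensus check, or special cases such as fluvastatin below.

```python
from dataclasses import dataclass


@dataclass
class ActivityData:
    """Which forms of a chiral drug have activity data (hypothetical model)."""
    racemate: bool
    enantiomers: tuple[str, ...] = ()  # e.g. ("R",) or ("R", "S")


def ligand_entries(name: str, data: ActivityData) -> list[str]:
    """Return one ligand entry label per form that has its own activity data."""
    entries = []
    if data.racemate:
        entries.append(f"{name} (racemate, flat structure)")
    for label in data.enantiomers:
        entries.append(f"({label})-{name}")
    return entries


print(ligand_entries("ketoconazole", ActivityData(racemate=True)))
# ['ketoconazole (racemate, flat structure)']
print(ligand_entries("cetirizine", ActivityData(racemate=True, enantiomers=("R",))))
# ['cetirizine (racemate, flat structure)', '(R)-cetirizine']
```

With data for the racemate and both enantiomers, the same function returns three labels, matching the three cross-linked entries described above.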

Special cases: active and inactive enantiomers 

Fluvastatin: a racemic mixture with a more active enantiomer

Fluvastatin was added to our database three years ago as part of a pilot project on enzyme pathway curation focusing on the lanosterol synthesis pathway. Unlike all the other statin drugs, fluvastatin as marketed (and as described by its INN-assigned structure) is a racemate of two enantiomers: a ‘3R, 5S’ enantiomer (CID 446155) and a ‘3S, 5R’ enantiomer (CID 1548972). The following image is from the INN document for the drug:

[Image: fluvastatin structure from the INN document]

(Image from: WHO MedNet, accessed 02/7/14)

However, these two enantiomers do not have equal potency at the target (HMGCR), and it is the 3R, 5S enantiomer that is more active (PMID 16480934). A preparation of the compound with this particular chirality is used experimentally as a ‘standard’ against which new potential inhibitors of the enzyme are compared.

As discussed above, we normally display a ‘flat’ structure to represent racemic drugs. For fluvastatin, however, the structure we display represents the more active enantiomer present in the approved drug preparation, and we have annotated our fluvastatin ligand entry with curators’ comments justifying this decision. PubChem users will note that the ‘Same, Isotopes’ link for the CID representing our ligand lists no fewer than 15 CIDs. These arise from the permutations of the two stereocentres and the E/Z double bond. We may write a future blog post discussing the complexity of curating representations of fluvastatin in more detail, and how this ligand compares with the other statin drugs.
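A rough count shows why so many related CIDs turn up. The sketch below is a toy enumeration over stereodescriptor labels (the helper names are hypothetical, and nothing here queries PubChem), assuming fluvastatin’s three stereo elements are the centres at C3 and C5 and the C6=C7 double bond. The totals are not expected to equal the 15 CIDs: that PubChem set can also include isotopically labelled forms, and it contains only forms that have actually been deposited.

```python
from itertools import product

# Stereo elements of fluvastatin as described above: stereocentres at C3 and
# C5, and the C6=C7 double bond, which can be E or Z.
ELEMENTS = {"C3": ("R", "S"), "C5": ("R", "S"), "C6": ("E", "Z")}


def mirror(isomer: tuple[str, ...]) -> tuple[str, ...]:
    """Reflection swaps R/S at each centre and leaves E/Z unchanged."""
    swap = {"R": "S", "S": "R"}
    return tuple(swap.get(d, d) for d in isomer)


# Every element defined: 2 * 2 * 2 stereoisomers.
full = list(product(*ELEMENTS.values()))
# Group each stereoisomer with its mirror image.
pairs = {frozenset((iso, mirror(iso))) for iso in full}
# Allow each element to be left undefined (None) as well: 3 * 3 * 3 forms.
partial = list(product(*(opts + (None,) for opts in ELEMENTS.values())))

print(len(full), len(pairs), len(partial))               # 8 4 27
print(("R", "S", "E") in full, mirror(("R", "S", "E")))  # True ('S', 'R', 'E')
```

The marketed racemate corresponds to one of the four enantiomeric pairs, the (3R, 5S, E) and (3S, 5R, E) forms; the ‘flat’ structure is the single combination in which all three elements are undefined.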

Watch out for the next posts in this series, coming soon.

Contributed by Helen Benson

Posted in Chemical curation
2 comments on “Case study in chemical curation II: Racemates and how to represent them”
  1. keithttaylor says:

    Representing racemates as flat structures downgrades the information about the substance. The flat representation stands for any mixture of any of the diastereomers, and this is clearly wrong for a known racemate, especially when you encounter a substance that is an arbitrary mixture of diastereomers, or whose stereochemistry is genuinely unknown.

    IUPAC has the AND Enantiomer representation for these types of substances. This representation is supported by most (if not all) chemical drawing packages and chemical structure databases.

  2. This observation is correct, and the issue was raised in a comment on the Minimum Standards for a Biomolecular Interaction paper. However, in the publications we use for activity mapping, experimentally determined enantiomeric excess (e.e.) values explicitly aligned to activity results are very rare. In the few reports of R/S resolution (an e.e. of 100%) matched to significantly different biological activity (e.g. distinct IC50s), we do record separate database records. Note also that the major public databases we link to are unlikely to be able to represent e.e. in structure records (even if it were possible to include it in comment lines for the very few sources that specify it). Thus, the bioactivity records in PubChem and ChEMBL, as well as the linked information in DrugBank records, are often linked to the “flat” versions, which may also be specified in INN approvals and FDA drug labels. Users can therefore be assured that we carefully assess the stereo-representational options for drug structures with a view to maximising connectivity to bioactivity results.
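For reference, enantiomeric excess is the absolute difference between the amounts of the two enantiomers expressed as a percentage of their total, so a racemate has an e.e. of 0% and a fully resolved enantiomer an e.e. of 100%. A minimal sketch (the function name is our own, chosen for illustration):

```python
def enantiomeric_excess(amount_a: float, amount_b: float) -> float:
    """Percent e.e. from the amounts (or mole fractions) of the two enantiomers."""
    total = amount_a + amount_b
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    # e.e. = |a - b| / (a + b) * 100
    return 100.0 * abs(amount_a - amount_b) / total


print(enantiomeric_excess(50, 50))  # 0.0   racemate
print(enantiomeric_excess(100, 0))  # 100.0 fully resolved enantiomer
print(enantiomeric_excess(98, 2))   # 96.0
```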
