Hot topics: Will the real splice variants please stand up?

The number of alternative mRNA splice forms that map to human protein coding loci has increased to the point that nearly all proteins have such associated database records. This gives rise to the paradox that the gene build pipeline from the latest Ensembl GRCh38 reference genome assembly indicates 19,919 protein coding loci (which shrinks to 19,022 with HGNC annotation stringency) but 198,002 transcripts (i.e. nearly 10 transcripts per protein coding locus). There is no question that a small number of these alternative splice forms (AS), plus alternative initiations, have not only been verified to exist as proteins but also have alternative biochemical functions and are of pharmacological importance [1]. Notwithstanding, compared to the massive transcript profiling that RNAseq now routinely provides, experimentally verifying AS existence at the protein level on a large scale is extremely difficult. This is because it can only be done by splice form-specific antibodies, western blots detecting different size forms, top-down proteomics (i.e. intact mass measurement) or the detection of alternative exon-specific tryptic peptides. A recent review [2] proposes that expanding data sets from the last of these approaches are consistently detecting only a single quantitatively dominant protein isoform from each locus. The provocative inference is that the vast majority of the 200K-odd predicted and/or verified alternative mRNA transcripts are not actually translated into proteins. This can be seen as an interesting methodological detection “gulf” between RNAseq and MS-proteomics. However, there has been previous support for the “single isoform” idea on the basis of transcript data alone [3]. An ancillary conclusion from this paper, generally overlooked in terms of its significance, was that when CDS length was taken into account approximately 50% of major transcripts did not correspond to the ‘canonical’, max-exon transcript as annotated in Swiss-Prot.
This crucial topic is further discussed in [4].
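
For readers who want to check the "nearly 10 transcripts" figure, the arithmetic can be sketched as follows. The three counts are the ones quoted above; the function and variable names are illustrative only.

```python
# Counts quoted in the post for the Ensembl GRCh38 gene build.
TRANSCRIPTS = 198_002
LOCI_ENSEMBL = 19_919   # protein coding loci from the gene build
LOCI_HGNC = 19_022      # loci remaining under HGNC annotation stringency


def transcripts_per_locus(transcripts: int, loci: int) -> float:
    """Return the mean number of transcripts per protein coding locus."""
    return transcripts / loci


ratio_ensembl = transcripts_per_locus(TRANSCRIPTS, LOCI_ENSEMBL)
ratio_hgnc = transcripts_per_locus(TRANSCRIPTS, LOCI_HGNC)

print(f"Ensembl loci: {ratio_ensembl:.2f} transcripts per locus")
print(f"HGNC loci:    {ratio_hgnc:.2f} transcripts per locus")
```

Under the stricter HGNC locus count the mean rises slightly above 10, so the choice of denominator does not change the overall picture.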

[1] Bonner, T.I. (2014). Should pharmacologists care about alternative splicing? IUPHAR Review 4. Br J Pharmacol. Mar;171(5):1231-40. doi: 10.1111/bph.12526. PMID: 24670145.

[2] Tress et al. (2016). Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci. Sep 16. doi: 10.1016/j.tibs.2016.08.008. PMID: 27712956.

[3] Gonzàlez-Porta et al. (2013). Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol.  Jul 1;14(7):R70. doi: 10.1186/gb-2013-14-7-r70. PMID: 23815980.

[4] Will the real canonical protein please stand up.
https://cdsouthan.blogspot.se/2016/11/will-real-canonical-proteins-please.html

Comments by Chris Southan

Posted in Hot Topics

GtoImmuPdb: technical update November 2016


During October we have made the first alpha-release (v1.0) of the Guide to IMMUNOPHARMACOLOGY. This blog post summarises some of the main features of the release and work on the documentation.

This first release marks an important step towards the public deployment of the first beta-release of GtoImmuPdb, scheduled for Spring 2017. We expect to make further alpha-releases over the next few months, as additional features are added.

An early synopsis of the project can be found in this blog post. Previous technical blogs are available for February, May, August & September 2016.

Development Progress

Alpha-Release v1.0

The portal has its own unique branding (header bar, logo and colour scheme) to distinguish it, but retains many of the layout features from the main GtoPdb site. This consistency should help users already familiar with GtoPdb to orientate themselves with the new GtoImmuPdb.


Screenshot of the GtoImmuPdb Portal, alpha-release v1.0

The portal provides a starting-point for accessing data in GtoImmuPdb, tailored to the requirements of users with a specific interest in immunopharmacology. Browsing by target, process and cell-type has been implemented in the alpha_v1.0 release. Ligands can be browsed, but there isn’t yet an immuno-specific view for the results.

The portal and other pages with the GtoImmuPdb view toggled on will display a specific Guide to IMMUNOPHARMACOLOGY header and menu-bar. A consistent feature on the GtoImmuPdb pages is a ‘toggle’ button that enables the user to switch out to the standard GtoPdb view (and back).


Family page on GtoImmuPdb, showing new header and toggle button (a key feature of GtoImmuPdb)

Alpha-Release v1.0 Documentation

The main area of development over October 2016 has been to prepare the documentation for the alpha-release. This explains the features included, how data were obtained and curated, and how to use the site. Detailed release notes have been prepared, which will be added to incrementally with subsequent releases. They cover the following main sections:

  • GtoImmuPdb portal
  • Receptor Family pages
  • Family Pages
  • Detailed Target pages
  • Immuno Process Association List pages
  • Immuno Cell Type Association List pages
  • Search
  • Database Development

Documentation has also been prepared that gives details on how the data for both the process and cell type associations were obtained. This includes a detailed spreadsheet of the full GO annotations, obtained via UniProt, that form the basis of the immuno process associations.

We have also prepared a tutorial document that is a guide to navigating from the new portal, to access GtoImmuPdb data and understand the new GtoImmuPdb pages.

Alpha-Release v1.0 Data

GtoImmuPdb uses the same underlying database as GtoPdb, which has been extended to include and integrate GtoImmuPdb data. The primary data-types of interest to GtoImmuPdb that have been addressed so far are processes and cell-types. The database schema has been extended to accommodate these data-types and to associate them with targets in the database.

Immuno Process Data

GtoImmuPdb has defined its own set of top-level immunological process categories against which targets in the database can be annotated and which form the basis of organising, navigating and searching for immunological processes and associations.

These categories are:

  • Immune system development and differentiation
  • Proliferation and cell death
  • Production of signals and mediators
  • Regulation and responses to signals
  • Migration and chemotaxis
  • Cell-mediated immunity
  • Inflammation

We have associated sets of Gene Ontology (GO) terms with each of these categories. This enables us to auto-curate targets annotated by GO to any of those terms (or their children) into our top-level immunological categories. GO data is obtained via an OBO file (http://purl.obolibrary.org/obo/go.obo) for the ontology, which is edited to restrict it to immuno-specific terms. We auto-curate targets to the top-level process terms by using GO annotation information from UniProt. Through UniProt, targets were selected that were annotated to the subset of GO terms and also cross-referenced in GtoPdb. This gave a total of 1,855 annotations to 401 targets.
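
The auto-curation step described above can be sketched roughly as follows. This is an illustration only, not the GtoImmuPdb pipeline: the GO identifiers, target accessions and category-to-term assignments are hypothetical placeholders, and the small in-memory `IS_A` mapping stands in for the is_a relationships that would be read from the edited OBO file.

```python
from collections import defaultdict

# Toy "is_a" edges (child -> parents); placeholder IDs, not real GO terms.
IS_A = {
    "GO:T2": ["GO:T1"],
    "GO:T3": ["GO:T2"],
    "GO:T5": ["GO:T4"],
}

# Hypothetical assignment of GO terms to two of the top-level categories.
CATEGORY_TERMS = {
    "Inflammation": {"GO:T1"},
    "Migration and chemotaxis": {"GO:T4"},
}

# Hypothetical (target accession, GO term) annotations, as might come from UniProt.
ANNOTATIONS = [
    ("TARGET_A", "GO:T3"),
    ("TARGET_A", "GO:T5"),
    ("TARGET_B", "GO:T2"),
    ("TARGET_C", "GO:T9"),  # not under any category term
]


def term_and_ancestors(term, is_a):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, []))
    return seen


def auto_curate(annotations, category_terms, is_a):
    """Place each target in every category whose terms cover its annotation."""
    curated = defaultdict(set)
    for accession, term in annotations:
        lineage = term_and_ancestors(term, is_a)
        for category, roots in category_terms.items():
            if lineage & roots:
                curated[category].add(accession)
    return dict(curated)


curated = auto_curate(ANNOTATIONS, CATEGORY_TERMS, IS_A)
for category, targets in sorted(curated.items()):
    print(category, sorted(targets))
```

Because a target can be annotated to terms under several categories, the per-category counts of distinct targets can sum to more than the total number of targets.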

The table below summarises the unique targets (UniProt) annotated under each category.

GtoImmuPdb ‘High-Level’ Process                   Distinct UniProt
Immune System Development and Differentiation    124
Proliferation and Cell Death                     33
Production of Signals and Mediators              74
Regulation and Responses to Signals              355
Migration and Chemotaxis                         81
Cell-Mediated Immunity                           99
Inflammation                                     261

Provision has been made in the database schema to capture curator comments against process information and annotations and the design is fully-adaptable to future changes.

Cell Type Data

The Cell Ontology provides the formalised vocabulary against which we annotate target to cell type associations. GtoImmuPdb has defined its own set of top-level immunological cell type categories against which targets in the database can be annotated and which form the basis of organising, navigating and searching for immunological cell types and associations.

These categories are:

  • pro-B-lymphocytes, B lymphocytes & Plasma cells
  • T lymphocytes (alpha-beta type) and their immediate progenitors
  • T lymphocytes (gamma-delta type) and their immediate progenitors
  • Natural Killer (NK) cells
  • Polymorphonuclear leukocytes (neutrophils, eosinophils, basophils)
  • Mononuclear leukocytes (syn: monocytes) (macrophages, dendritic cells, Kupffer cells)
  • Mast Cells
  • Innate Lymphoid Cell (added November 2016)

We have assigned one or more Cell Ontology (CO) terms to each of these categories. The assigned CO terms represent the highest-level parent term(s) within the ontology for that category. For the purposes of annotation, it is these CO terms and their children that can be used when annotating a target to a given category. The exception is innate lymphoid cells, which at present are not defined or included in the Cell Ontology.
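
The annotation rule above could be checked along these lines. This is a sketch under stated assumptions: the identifiers are hypothetical placeholders rather than real Cell Ontology IDs, and treating a category with no assigned terms (innate lymphoid cells) as annotatable at category level only is an assumption made here for illustration, not a documented behaviour of the submission tool.

```python
# Placeholder parent terms per category; not real Cell Ontology identifiers.
CATEGORY_PARENTS = {
    "Mast Cells": {"CL:PARENT_MAST"},
    "Natural Killer (NK) cells": {"CL:PARENT_NK"},
    "Innate Lymphoid Cell": set(),  # no ontology term available at present
}

# Toy "is_a" edges (child -> parents).
IS_A = {
    "CL:CHILD_MAST": ["CL:PARENT_MAST"],
    "CL:GRANDCHILD_MAST": ["CL:CHILD_MAST"],
    "CL:CHILD_NK": ["CL:PARENT_NK"],
}


def lineage(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def is_valid_annotation(category, term=None):
    """Accept a term only if it is a category parent term or one of its children."""
    parents = CATEGORY_PARENTS[category]
    if not parents:
        # Assumption: categories without ontology terms take no term at all.
        return term is None
    if term is None:
        return False
    return bool(lineage(term) & parents)


print(is_valid_annotation("Mast Cells", "CL:GRANDCHILD_MAST"))
print(is_valid_annotation("Mast Cells", "CL:CHILD_NK"))
print(is_valid_annotation("Innate Lymphoid Cell"))
```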

Other Developments & Next Steps

Fixes have been made to our submission tool to include the ability to add/remove cell type categories and to add definitions/descriptions of them.

Our focus in the next month is to develop the ligand browse landing pages (accessed via Ligand panel on the portal home), and add in icons to highlight immuno-flagged ligands throughout the main GtoPdb site.

We also want to develop the menu-bar navigation for GtoImmuPdb, as this will be important for the beta-release.

This project is supported by a 3-year grant awarded to Professor Jamie Davies at the University of Edinburgh by the Wellcome Trust (WT).

Posted in Technical

GtoPdb Ligands in PubChem

GtoPdb and its precursor IUPHAR-DB have been capturing the structures of pharmacologically relevant ligands since 2005. The snapshot below (Fig. 1) shows the approved drug section of our eight-category ligand classification.

As an active collaboration with the PubChem team, we have submitted our ligand records for every GtoPdb release since 2012. For the current release of 2016.4 the query (“IUPHAR/BPS Guide to PHARMACOLOGY”[SourceName]) retrieves 8674 Substance Identifiers (SIDs) and 6565 Compound Identifiers (CIDs). The excess of 2109 SIDs is accounted for by antibodies, small proteins and large peptides that cannot form CIDs. With just over 92 million CIDs from 473 sources, a range of property filters and full Boolean operations for combining query sets, PubChem provides an opportunity to “slice and dice” our ligand set in detailed, comparative and informative ways. A set of results is shown below.

gtop_02

The utility of each of these intersects is outlined below (in order of counts):

  1. CNER refers to “Chemical Named Entity Recognition” for the automated extraction of chemistry from patents by sources submitting to PubChem (of which SureChEMBL is the largest at 16.3 million). This means that users can track back most of our ligands to early patent filings, which can often include more SAR than eventually appeared in the papers.
  2. Our low overlap with DrugBank indicates both sources are complementary in bioactive compound selection (i.e. the OR union is 12605).
  3. The possibility of sourcing purchasable compounds is important for experimental pharmacologists. From the 64 million vendor structures in PubChem we have nearly an 80% overlap and similarity searches may pick up analogues where there is no exact match.
  4. The “BioAssay active” tag overlaps extensively with ChEMBL entries but users can check for a range of activities for a ligand that may be additional to the values we have extracted from selected papers.
  5. The MeSH term “pharmacological action” is useful but our impression is that NLM is falling behind in the PubChem indexing of this term.
  6. PDB ligand structures are valued database cross-references for many reasons.
  7. We have introduced a new feature that allows users to retrieve just our 1291 approved drug SID entries (query: approved[Comment] AND “IUPHAR/BPS Guide to PHARMACOLOGY”[SourceName]). The “PubChem Same Compound” select then generates 1174 small-molecule CIDs. This facilitates different types of comparative analysis between drug lists.
  8. As expected, our overlap with ChEMBL structures is high but we have captured 1147 structures not in this source, mainly due to different journal capture and shorter release cycles.
  9. The selection “unique to GtoPdb” indicates those CIDs where we are the only source in the whole of PubChem. These are predominantly novel structures we have extracted from papers but in some cases we have selected a different structure from other sources.
  10. There may be interest in which pharmacologically active peptides we have CIDs for. A simple Mw cut-off isolates 178 entries.
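
The source-name and approved-drug queries quoted above can also be run programmatically. The sketch below assumes NCBI's E-utilities `esearch` endpoint with the `pcsubstance` database name and the shape of its JSON response; none of that is described in this post, so treat it as an assumption to verify against the NCBI documentation. The network call is defined but not executed here, and live counts will differ from the 2016.4 figures.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities esearch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query strings as quoted in the post.
SOURCE_QUERY = '"IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]'
APPROVED_QUERY = f"approved[Comment] AND {SOURCE_QUERY}"

# Counts reported in the post for release 2016.4.
SIDS_2016_4 = 8674
CIDS_2016_4 = 6565
EXCESS_SIDS = SIDS_2016_4 - CIDS_2016_4  # SIDs that cannot form CIDs


def esearch_url(db, term):
    """Build an esearch URL that asks only for the hit count, as JSON."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_count(db, term):
    """Run the query against NCBI and return the hit count (needs network)."""
    with urlopen(esearch_url(db, term), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


all_sids_url = esearch_url("pcsubstance", SOURCE_QUERY)
approved_sids_url = esearch_url("pcsubstance", APPROVED_QUERY)

print(all_sids_url)
print(approved_sids_url)
print("SIDs without a CID (2016.4):", EXCESS_SIDS)
```

Calling `fetch_count("pcsubstance", SOURCE_QUERY)` would return the current SID count for our source.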

In regard to 7), a snapshot from our list of approved drugs is shown below.

gtop_03

 

 

Posted in Uncategorized

Hot topics: X-ray structure of the endothelin ETB receptor

Endothelin is a peptide that acts via two G protein-coupled receptors. ETA mainly causes vasoconstriction. In contrast, ETB predominantly acts as a beneficial clearing receptor and, through the release of endothelium-derived relaxing factors, causes vasodilatation [1,2]. This paper describes for the first time the crystal structure of the endothelin ETB receptor [3]. To date, fewer than 20 structures of Family A GPCRs (targets of nearly half of all drugs) have been solved experimentally. Those solved with small peptide ligands are limited to the opioid receptor and the receptor for the 13 amino acid peptide neurotensin. This manuscript extends information to a much larger 21 amino acid peptide and, interestingly, demonstrates interaction over a substantial portion of the molecule. The authors propose a model whereby the N-terminal tail and the ECL2 β-sheet of ETB together form a lid-like architecture that covers the orthosteric pocket, predicted to form a very stable complex. This provides one structural explanation for the unusual property of ET-1 in causing long-lasting responses. Mutations in ETB receptors can result in Hirschsprung disease in humans, characterized by an absence of enteric ganglia in the distal colon and a failure of innervation in the gastrointestinal tract [2]. ETB receptor mutations are also associated with lethal white foal syndrome in horses as a result of limiting migration of melanocytes, pigment-producing cells found in hair follicles and skin.

[1] Guide to PHARMACOLOGY: ETB receptor

[2] Davenport et al. (2016). Endothelin. Pharmacol Rev. 68:357-418. PMID: 26956245

[3] Shihoya et al. (2016). Activation mechanism of endothelin ETB receptor by endothelin-1. Nature, 537, 363-368. PMID: 27595334

Comments by Anthony Davenport

Posted in Hot Topics

Hot topics: Synthesis and SAR for depsipeptide natural products as selective G protein inhibitors

A team including the Gloriam Group at the University of Copenhagen (also the home of GPCRDB) have a paper out in Nature Chemistry reporting the first total synthesis of YM-254890 and FR900359 [1]. These are related cyclic depsipeptide natural products that specifically and potently inhibit the Gq subfamily of G proteins, a relatively rare but useful pharmacological property [2]. By a combination of solution and solid-phase approaches, the team generated sufficient YM-254890 and FR900359 material for confirmation of the structures, pharmacological characterisation and the synthesis of ten new analogues of YM-254890 for SAR analysis. The paper also includes docking studies based on the X-ray crystal structure of YM-254890 in PDB 3AH8 [3].

[1] Xiong et al. (2016). Total synthesis and structure–activity relationship studies of a series of selective G protein inhibitors. Nat Chem, advance online publication, doi:10.1038/nchem.2577

[2] Schrage et al. (2015). The experimental power of FR900359 to study Gq-regulated biological processes. Nat Commun. 6:10156. doi: 10.1038/ncomms10156. PMID: 26658454.

[3] Nishimura et al. (2010). Structural basis for the specific inhibition of heterotrimeric Gq protein by a small molecule. Proc Natl Acad Sci. 107(31):13666–13671. doi: 10.1073/pnas.1003553107. PMID: 20639466.

The two key potent ligands from the paper are included in the new GtoPdb release 2016.4. Details of this particular curation exercise are given in this blog post.
http://guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=9335

lig9335

http://guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=9336

lig9336

Comments by Chris Southan

Posted in Hot Topics

Hot topics: X-ray structure of P2X3 receptor

Extracellular ATP is able to activate two families of cell-surface receptors, one of which is the ligand-gated ion channel family of P2X receptors. This family of cation channels is distinct from the remainder of the ligand-gated ion channels, as they are constructed of three (usually homomeric) subunits each with two transmembrane domains. Amongst the P2X receptors, the P2X3 is associated particularly with synaptic transmission in the sensory system and has, therefore, attracted a lot of attention as a potential target for novel analgesics and/or bladder dysfunction therapies.

In this report [1], multiple crystal structures of the P2X3 receptor are described, which allow a novel insight into the gating of a ligand-gated ion channel through the resting, agonist-activated and refractory states of its cycle, as well as with antagonist bound.

[1] Mansoor et al. (2016). X-ray structures define human P2X3 receptor gating cycle and antagonist action. Nature 538:66-71. doi: 10.1038/nature19367. PMID: 27626375.

Comments by Steve Alexander

 

Posted in Hot Topics

GtoPdb database release 2016.4

We are pleased to announce our fourth database release of 2016. Version 2016.4 was published on 13th October 2016. The database is available through the Guide to Pharmacology website, download pages and web-services.

Target updates:

Website updates

A new dendrogram visualisation of VGICs is included on the ion channel page (http://www.guidetopharmacology.org/GRAC/ReceptorFamiliesForward?type=IC). It shows a representation of the amino acid sequence relations of the minimal pore regions of the voltage-gated ion channel superfamily. The visualisation was taken from:

The VGL-Chanome: A Protein Superfamily Specialized for Electrical Signaling and Ionic Homeostasis. Frank H. Yu and William A. Catterall. Sci STKE. 2004 Oct 5;2004(253):re15. PMID: 15467096. DOI: 10.1126/stke.2532004re15

SynPharm

We have created a new sister database to the main Guide to PHARMACOLOGY: SynPharm, a database of drug-responsive protein sequences. The sequences in SynPharm are derived from interactions in the Guide to PHARMACOLOGY, using data from the Protein Data Bank. It is expected that the SynPharm database will grow as the principal Guide to PHARMACOLOGY database is updated, or indeed as further structural data pertaining to interactions already documented is added to the PDB.

Please read the introductory SynPharm blog post (4th October 2016).

A summary of the current data can be found at synpharm.guidetopharmacology.org/about/data.

Database Statistics

In total the database now contains 14,701 curated interactions across 2,794 human targets and 8,675 ligands. More specifically, the database contains 1,465 human targets that have quantitative interactions with a ligand.


Number of human targets in GtoPdb 2016.4, measured by the number of distinct UniProt entries included for a given target class


Breakdown of ligand classes in GtoPdb 2016.4

PubChem Links

We refresh our PubChem Substance (SID) submissions at every release and this takes a week or so to surface in their system. For 2016.4 our SIDs increased from 8612 to 8675 (if you want to execute the same query use “IUPHAR/BPS Guide to PHARMACOLOGY”[SourceName]). The same query at the Compound Identifier (CID) level increases from 6519 to 6565. As previously mentioned, the 2,110 SIDs that do not merge into CIDs are antibodies, small proteins and large peptides. Note that there is a slight shortfall in the CID numbers you will find listed in our ligand download lists. This is because, for novel compounds where we were the first submitters to PubChem, we still have to catch up with adding the new CIDs into our records.

Posted in Database updates