Using LLMs for Text-to-SQL generation on the Guide to Pharmacology

Posted on May 12, 2025 by guidetopharmacology

In the last academic year (September 2024 to April 2025) we supported two final-year undergraduate projects (Honours Projects) that explored how ‘artificial intelligence’ tools could improve access to, or the capabilities of, the Guide to Pharmacology. Both projects focussed on using Large Language Models (LLMs) (such as OpenAI’s ChatGPT) to translate natural language queries into Structured Query Language (SQL) statements. This text-to-SQL conversion has the potential to broaden the ways users can ask questions of the data in the Guide to Pharmacology, because it removes the need to understand the database schema or SQL in order to retrieve data.

The two students, Ian Little and Nikita Rameshkumar, worked on individual projects with the same aims. Their work included analysing various LLMs to determine which are most suitable for text-to-SQL conversion in GtoPdb (the Guide to Pharmacology database). The analysis considered aspects such as cost, response time, performance and usability, and took into account the unique features of the GtoPdb schema and data.

Each project developed its own model, which was trained using a set of 50 natural language queries (with associated gold-standard SQL), and then tested using a previously unseen set of 30 natural language queries.

Their work investigated how best to construct the prompt for the LLM, including how much information about the database schema to provide, whether to supply example queries, and how to handle query validation and errors.
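To give a flavour of what these prompt-design choices look like in practice, here is a minimal sketch in Python. It is not taken from either dissertation: the function names, the toy `ligand` table and the prompt wording are all our own assumptions, and SQLite stands in for the real GtoPdb database purely so that the example is self-contained.

```python
import sqlite3

# Hypothetical, heavily simplified schema used only for illustration;
# it is NOT the real GtoPdb schema.
SCHEMA_DDL = """
CREATE TABLE ligand (
    ligand_id INTEGER PRIMARY KEY,
    name TEXT,
    approved INTEGER
);
"""

def build_prompt(question, schema_ddl, examples):
    """Assemble a prompt from instructions, schema text and example pairs."""
    parts = [
        "Translate the question into a single SQL SELECT statement.",
        "Database schema:\n" + schema_ddl.strip(),
    ]
    # Each example is a (natural language question, gold-standard SQL) pair.
    for ex_question, ex_sql in examples:
        parts.append(f"Question: {ex_question}\nSQL: {ex_sql}")
    parts.append(f"Question: {question}\nSQL:")
    return "\n\n".join(parts)

def validate_sql(sql, conn):
    """Return None if the statement compiles against the schema, else the error text."""
    try:
        # EXPLAIN makes SQLite compile the statement without returning its rows.
        conn.execute(f"EXPLAIN {sql}")
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)

def build_repair_prompt(prompt, failed_sql, error):
    """Append the failed query and its error so the model can be asked to try again."""
    return (
        f"{prompt} {failed_sql}\n\n"
        f"That query failed with the error: {error}\n"
        "Please return a corrected SQL statement.\nSQL:"
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_DDL)
    examples = [("How many ligands are there?", "SELECT COUNT(*) FROM ligand")]
    prompt = build_prompt("Which ligands are approved?", SCHEMA_DDL, examples)
    candidate = "SELECT nme FROM ligand WHERE approved = 1"  # deliberate typo
    error = validate_sql(candidate, conn)
    if error is not None:
        print(build_repair_prompt(prompt, candidate, error))
```

The call to the LLM itself is left out on purpose, since it depends on which model and provider is used. The point of the sketch is the shape of the loop: build a prompt from schema and examples, check the returned SQL against the database, and feed any error back into a follow-up prompt.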

Ian and Nikita developed performance metrics to evaluate their models. These include a partial execution accuracy (PEX) score, which indicates whether the model generated an SQL query that returns the same set of results as the gold-standard query. The SQL syntax does not have to be identical, and additional columns in the result set are allowed.
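As a rough illustration of how a score like PEX could be computed, the sketch below compares the rows returned by a generated query with those returned by the gold-standard query. This is our own reading of the description above and not the students' implementation. In particular, we assume that rows are compared as sets (ignoring order and duplicates), that a generated query which fails to run counts as a miss, and that the helper names and the use of SQLite are purely illustrative.

```python
import sqlite3
from itertools import permutations

def partial_execution_match(gold_rows, pred_rows):
    """True if some choice of predicted columns reproduces the gold rows as a set."""
    if not gold_rows or not pred_rows:
        # Two empty results match; an empty and a non-empty result do not.
        return not gold_rows and not pred_rows
    n_gold, n_pred = len(gold_rows[0]), len(pred_rows[0])
    if n_pred < n_gold:
        return False  # the prediction is missing columns, not adding them
    gold_set = set(gold_rows)
    # Try every ordered selection of predicted columns; this is fine for the
    # handful of columns in a typical query, but grows quickly for wide results.
    for cols in permutations(range(n_pred), n_gold):
        projected = {tuple(row[i] for i in cols) for row in pred_rows}
        if projected == gold_set:
            return True
    return False

def pex_score(conn, query_pairs):
    """Fraction of (gold_sql, predicted_sql) pairs whose results partially match."""
    if not query_pairs:
        return 0.0
    hits = 0
    for gold_sql, pred_sql in query_pairs:
        gold_rows = conn.execute(gold_sql).fetchall()
        try:
            pred_rows = conn.execute(pred_sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            continue  # assumption: a query that does not run scores zero
        if partial_execution_match(gold_rows, pred_rows):
            hits += 1
    return hits / len(query_pairs)
```

With this definition, a generated query that returns `ligand_id` alongside the requested `name` column still counts as correct, whereas one that returns extra or missing rows does not.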

These projects were very interesting to be involved with and provided great insight into how to develop a more robust and comprehensive LLM that can reliably and accurately convert natural language queries into SQL. We hope to run further projects in the next academic year that build on this work, perhaps by adding a user interface that allows users to interact with and refine the generated queries.

If you are interested in learning about the projects in more detail, you can access the project dissertations here:

Leveraging Large Language Models for Text-to-SQL on the IUPHAR/BPS Guide to Pharmacology Database

Ian Little, 2025, 4th Year Project Report

Computer Science and Mathematics, School of Informatics, University of Edinburgh

Tuning Large Language Models for Text-to-SQL on the IUPHAR/BPS Guide to Pharmacology

Nikita Rameshkumar, 2025, MInf Project (Part 1) Report

Master of Informatics, School of Informatics, University of Edinburgh