Will AI revolutionise pharmaceutical R&D?
Katrina Costa, MA (Oxon), MSc
Science Writer, Open Pharma Research Ltd.
January 2021
In the last few decades, the generation of big data in the life sciences, coupled with rapid advances in data processing technology, has fuelled pharma’s interest in the promise of AI. Scientists hope AI will speed up drug discovery and help create better medicines. The move towards precision medicine will also accelerate the development and adoption of AI as the industry becomes more focused on individual patient outcomes.
But is AI truly transformative?
Central to AI is the creation of smart machines that think in a human-like way, using rules to mimic (and potentially replace) human intelligence. An important branch of AI is Machine Learning (ML), in which algorithms ‘learn’ models from the ‘training’ data they encounter. These models are then used to make predictions or decisions without being explicitly programmed to do so. ML algorithms become increasingly effective with exposure to more data; intelligent email spam filters are one familiar example. Within pharma, ML can be used to analyse information on the structure and function of molecules to predict which candidates will make promising drug targets. Cognitive analytics can take this further, allowing machines to ‘think new thoughts’ and make sense of unstructured data.
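To make the spam-filter example concrete, here is a minimal sketch of a model that ‘learns’ from training data rather than following hand-written rules: a naive Bayes classifier that counts how often words appear in labelled spam and non-spam (‘ham’) messages, then scores new messages against those counts. The messages and the train and predict function names are invented for illustration; real filters learn from far larger datasets and richer features.

```python
import math
from collections import Counter

def train(messages):
    """Learn per-class word counts from (text, label) training pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # prior P(label)
        for word in text.lower().split():
            # Smoothed likelihood P(word | label); unseen words get a small non-zero value.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training data.
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lab results attached for review", "ham"),
]
model = train(training)
print(predict(model, "free prize money"))            # spam
print(predict(model, "agenda for the lab meeting"))  # ham
```

Adding more labelled messages changes the counts, and therefore the predictions, without any change to the code, which is what it means for an ML algorithm to improve with more data.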
For drug discovery, AI and ML offer several advantages over traditional computational methods, which require explicit programming to achieve a desired result. These include:
1. Minimal human intervention (frees up time and resources, and reduces human errors)
2. Faster, more accurate and more efficient data processing
3. Advanced predictive analysis
4. More data-driven decisions
The traditional drug discovery process faces numerous problems, including:
• Escalating costs of drug discovery
• Long lead times
• Huge volumes of heterogeneous, often unstructured data that are difficult to manage and extract meaning from
• High failure rates of lead compounds
AI and ML can help address these issues and allow scientists to extract more meaning from the data. The key benefit is speed: these techniques can quickly generate meaningful insights from a flood of data and present them in an easy-to-analyse format.
The use of AI in pharma is still in its infancy, and it is mostly applied at early stages such as target discovery. This makes sense, because the better the target, the easier the clinical trials will be downstream. But AI also holds a lot of promise further along the lifecycle: recruiting patients for clinical trials, predicting disease, acting as a digital assistant to patients in clinical trials, analysing data from wearable technology and handling the reporting of side effects.
AI in action
Improving drug target identification at Pfizer:
Pfizer are using AI in early-stage R&D to improve literature mining in support of target selection, across four areas:
1. Natural Language Processing (NLP): an automated way of making sense of human language. In everyday life, NLP is used in search and spelling correction. In pharma, it is mainly used for knowledge extraction, elucidating relationships between entities, ontology engineering and relationship prediction. The technology has grown from counting word frequencies without context to using AI to learn complex grammar rules.
Pfizer are using three NLP algorithms:
i. Latent Dirichlet Allocation: can determine key words and identify topics
ii. Word2vec: predicts the likely presence of a word from its surrounding text (context), and in doing so learns numerical representations that quantify how words are used
iii. BERT: a more powerful language model that can represent many different grammatical structures. It is used for text classification, question-answering systems and chatbots. It works out the relationships between words in context, for instance what ‘it’ refers to in a sentence
2. Knowledge graphs: integrate data from various sources, make them accessible in a human-readable way and provide context to the data. For example, you can view the relationships between genes, SNPs and diseases. These relationships can then be used to generate hypotheses, for example that this gene might be associated with binding to this drug.
3. The Winnow project: an algorithm that can search by disease or by gene. It takes all 30 million PubMed abstracts and looks for co-occurrences, for example of genes with your disease of interest. It then generates hypotheses and narrows down the list.
4. Genie Trend Detection Project: the goal here is to predict which gene–disease relationships will successfully progress to a clinical trial within the next 5 years, using data from PubMed and ClinicalTrials.gov.
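Word2vec itself is a shallow neural network, but the training signal behind its ‘predict a word from its context’ idea is easy to show. This minimal sketch only builds the (context, target) pairs used by word2vec’s continuous bag-of-words (CBOW) variant; the network that turns these pairs into word vectors is omitted, and the example sentence and window size are invented.

```python
def cbow_examples(tokens, window=2):
    """Build (context, target) pairs: each word is paired with up to `window`
    neighbours on either side, as in word2vec's CBOW training setup."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((left + right, target))
    return examples

sentence = "the kinase inhibitor binds the target protein".split()
for context, target in cbow_examples(sentence, window=2)[:3]:
    print(context, "->", target)
# ['kinase', 'inhibitor'] -> the
# ['the', 'inhibitor', 'binds'] -> kinase
# ['the', 'kinase', 'binds', 'the'] -> inhibitor
```

Training a model to guess each target from its context is what pushes words used in similar contexts towards similar vectors.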
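The gene, SNP and disease example from the knowledge-graph item can be sketched as a tiny graph of (subject, relation, object) triples, with hypotheses generated by following two-hop paths. Everything here is hypothetical: the entity names, the relation labels has_variant, associated_with and treats, and the rule that turns a path into a hypothesis. It is not Pfizer’s system; production knowledge graphs usually rely on dedicated graph databases and curated ontologies.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object). Entity names are placeholders.
triples = [
    ("GENE_A", "has_variant", "SNP_1"),
    ("GENE_B", "has_variant", "SNP_2"),
    ("SNP_1", "associated_with", "DISEASE_X"),
    ("SNP_2", "associated_with", "DISEASE_Y"),
    ("DRUG_1", "treats", "DISEASE_X"),
]

# Index the outgoing edges of each entity so paths can be followed quickly.
edges = defaultdict(list)
for subj, rel, obj in triples:
    edges[subj].append((rel, obj))

def genes_linked_to(disease):
    """Genes connected via gene -has_variant-> SNP -associated_with-> disease."""
    hits = []
    for subj, rel, snp in triples:
        if rel == "has_variant" and ("associated_with", disease) in edges[snp]:
            hits.append((subj, snp))
    return hits

def hypotheses(disease):
    """Pair each linked gene with drugs that treat the disease, as candidate hypotheses."""
    drugs = [s for s, r, o in triples if r == "treats" and o == disease]
    return [(gene, snp, drug) for gene, snp in genes_linked_to(disease) for drug in drugs]

for gene, snp, drug in hypotheses("DISEASE_X"):
    print(f"Hypothesis: {drug} may act via {gene} (linked through {snp})")
# Hypothesis: DRUG_1 may act via GENE_A (linked through SNP_1)
```

Each hypothesis is only a lead for scientists to test; the graph records that the entities are connected, not that the drug actually binds the gene product.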
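The co-occurrence search described for the Winnow project can be illustrated by counting how many abstracts mention both a disease and each candidate gene, then ranking the genes. This is a hypothetical sketch, not Pfizer’s algorithm: the abstracts and gene symbols are placeholders, and matching whole alphanumeric tokens (so a symbol such as ‘IL-6’ would not match) is a simplifying assumption. A real system would also need to handle gene synonyms.

```python
import re
from collections import Counter

def rank_cooccurring_genes(abstracts, disease, gene_names):
    """Count abstracts mentioning both `disease` and each gene; return genes ranked by count."""
    disease = disease.lower()
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        if disease not in lowered:
            continue  # only abstracts about the disease of interest
        tokens = set(re.findall(r"[a-z0-9]+", lowered))
        for gene in gene_names:
            if gene.lower() in tokens:
                counts[gene] += 1
    return counts.most_common()

# Hypothetical abstracts.
abstracts = [
    "GENE1 expression is elevated in fibrosis models.",
    "Loss of GENE2 protects against fibrosis in mice.",
    "GENE1 variants correlate with fibrosis severity.",
    "GENE3 regulates cell cycle progression.",
]
print(rank_cooccurring_genes(abstracts, "fibrosis", ["GENE1", "GENE2", "GENE3"]))
# [('GENE1', 2), ('GENE2', 1)]
```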
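How the Genie project makes its predictions is not described here, but one plausible input to a predictor like this is how quickly publications on a gene–disease pair are growing. This sketch computes an ordinary least-squares slope over yearly publication counts and ranks pairs by it; the counts are invented, and a real predictor would presumably combine many such features with ClinicalTrials.gov outcomes to train a classifier.

```python
def trend_slope(yearly_counts):
    """Ordinary least-squares slope of publication counts against year index (needs >= 2 years)."""
    n = len(yearly_counts)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(yearly_counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, yearly_counts))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical publication counts per year for three gene-disease pairs.
pairs = {
    ("GENE1", "fibrosis"): [2, 4, 7, 11, 16],
    ("GENE2", "fibrosis"): [5, 5, 6, 5, 6],
    ("GENE3", "asthma"): [9, 7, 6, 4, 3],
}
ranked = sorted(pairs, key=lambda p: trend_slope(pairs[p]), reverse=True)
for pair in ranked:
    print(pair, round(trend_slope(pairs[pair]), 2))
# ('GENE1', 'fibrosis') 3.5
# ('GENE2', 'fibrosis') 0.2
# ('GENE3', 'asthma') -1.5
```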
Better lead selection at Denovicon Therapeutics
A core goal for Denovicon Therapeutics is to shrink early R&D from 7 years to just 2 years. The company is currently 18 months into this process and on track to meet that timeline.
The company is an early adopter of the cloud-based BIOVIA Generative Therapeutics Design (GTD) system, which combines virtual and real-world screening. The system ‘learns’ from real experiments to screen and optimise candidate compounds: its predictions are checked by synthesising and testing the most promising candidates in the lab. The resulting data are then cycled back into the virtual screening to improve predictions and refine the chemical space, and the iterative process continues until a suitable lead is found.
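The iterative cycle described above, in which the model proposes compounds, the lab makes and tests them, and the results flow back into the model, is a form of active learning. The sketch below is a deliberately tiny toy, not how BIOVIA GTD works internally: compounds are hypothetical (x, y) descriptor pairs, run_experiment is a made-up stand-in for synthesis and assay, and the ‘model’ simply predicts the activity of the nearest compound already tested.

```python
def run_experiment(compound):
    """Hypothetical stand-in for synthesis plus assay: returns a hidden activity score."""
    x, y = compound
    return 10 - (x - 3) ** 2 - (y - 2) ** 2

def predict_activity(compound, measured):
    """Toy surrogate model: predict the activity of the nearest tested compound."""
    nearest = min(
        measured,
        key=lambda c: (c[0] - compound[0]) ** 2 + (c[1] - compound[1]) ** 2,
    )
    return measured[nearest]

def design_make_test(library, seed, threshold, max_rounds=50):
    """Alternate virtual screening and 'real' experiments until a lead is found."""
    measured = {c: run_experiment(c) for c in seed}  # initial experimental data
    for _ in range(max_rounds):
        best = max(measured, key=measured.get)
        if measured[best] >= threshold:
            break  # suitable lead found
        untested = [c for c in library if c not in measured]
        if not untested:
            break  # library exhausted
        # Virtual screen: choose the untested compound the model rates highest...
        pick = max(untested, key=lambda c: predict_activity(c, measured))
        # ...then 'make and test' it and feed the result back into the data.
        measured[pick] = run_experiment(pick)
    best = max(measured, key=measured.get)
    return best, measured[best], len(measured)

library = [(x, y) for x in range(6) for y in range(5)]
lead, activity, n_tested = design_make_test(library, seed=[(0, 0), (5, 4)], threshold=10)
print(lead, activity, n_tested)
```

A nearest-neighbour lookup is about the simplest possible surrogate and is used only to keep the loop short; a practical system would more likely use a trained model and pick a batch of compounds per round.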
An important problem with machine-generated molecules is that it is very difficult to model what constitutes ‘reasonable chemical matter’, whereas a chemist would spot this very quickly. To overcome this, scientists can tell GTD not to change certain parts of a molecule that they know to be significant. Even so, it is still important to grade compounds to help filter out ‘bad substructures’. Another problem is that the system might suggest molecules that are too difficult to synthesise, which is one reason scientists need to make the molecules in the real world, providing the experimental feedback the system needs for active learning.
Denovicon Therapeutics are favouring virtual screening (VS) over traditional high-throughput screening (HTS), which allows:
• Billions of compounds to be explored
• Exploration of a larger chemical space
• Much reduced screening times
• Higher hit rates and higher quality hits
• Shorter hit-to-lead times
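One of the simplest forms of virtual screening is a similarity search: rank every compound in a library by how closely its structural fingerprint resembles that of a known active, using the Tanimoto coefficient, and send only the top-ranked compounds for testing. The sketch assumes fingerprints are already available as sets of ‘on’ bit positions (in practice a cheminformatics toolkit would compute them); the compounds and bits are invented, and the article does not say which screening methods Denovicon actually use.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def virtual_screen(library, reference, top_n=2):
    """Rank library compounds by similarity to a known active; keep the top_n."""
    scored = sorted(library.items(), key=lambda item: tanimoto(item[1], reference), reverse=True)
    return [(name, round(tanimoto(fp, reference), 2)) for name, fp in scored[:top_n]]

# Hypothetical fingerprints.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 15},
    "cmpd_B": {2, 3, 5, 8},
    "cmpd_C": {1, 4, 9, 12, 20, 21},
    "cmpd_D": {4, 7},
}
print(virtual_screen(library, known_active))
# [('cmpd_A', 0.67), ('cmpd_C', 0.57)]
```

Because scoring a fingerprint is cheap, the same loop scales to very large virtual libraries, which is the kind of speed-up over physical screening the list above describes.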
Simulating structural biology at DeepMind
For over 50 years, scientists have struggled to determine how proteins spontaneously fold into their unique 3D shapes. This matters because a protein’s function is largely determined by its 3D shape.
Professor Dame Janet Thornton, Director Emeritus and Senior Scientist at EMBL-EBI, says “Every living thing – from the smallest bacteria to plants, animals and humans – is defined and powered by the proteins that help it function at the molecular level. So far, this mystery remained unsolved, and determining a single protein structure often required years of experimental effort”.
But in 2020, London-based lab DeepMind unveiled results from its AI system, AlphaFold, which made significant advances in predicting 3D protein structure.
Professor Thornton continues, “You need to know the structure of proteins to design drugs more easily, and this can also assist with vaccine design. It helps us to understand how a single amino acid change can impact disease. There is also potential for using computational structure prediction to build our own proteins in the future, such as green enzymes that break down plastics or more nutritious crops.”
Challenges to AI
A recent Lab of the Future survey revealed that lack of standards and disconnected data are the biggest obstacles to pharma adopting AI (see Digital Dialogues in ‘further reading’). There are other challenges we need to address to successfully integrate AI into the drug discovery process, including:
• Lack of computational expertise and understanding among scientists
• Providing sufficient storage and processing power for the massive amounts of data generated by AI algorithms
• AI systems can operate without direct human oversight, so we need checkpoints and validation systems to ensure things are progressing correctly
• ‘Concept validation’ is tricky – a human can instantly see if a Facebook tag is correct, but it can take months to confirm if a suggested molecule is effective in the real world
• AI can be difficult to regulate because we do not understand the inner workings of much of the AI technology being deployed
• Lack of interoperability in lab equipment can be a challenge for the data automation required for AI
• Poor quality data will affect the results. It is estimated that 80% of scientists’ time is spent preparing the data for modelling
AI holds much promise for the drug discovery process. As the Denovicon Therapeutics example suggests, AI has the potential to significantly shorten early R&D, reducing costs and lead times.
In fact, AI could benefit the entire pharmaceutical lifecycle, not just target identification. AI is already creating significant improvements in clinical trial design and analysis, and even in diagnosis: a recent example uses AI to process lung X-ray images to identify patients with coronavirus disease (COVID-19).
Lab of the Future spoke with Professor John Overington, Chief Informatics Officer at Medicines Discovery Catapult, who says: “The most interesting potential lies at the interface of AI and automation of chemical synthesis. Lee Cronin’s lab in Glasgow have created a ‘Chemputer’ that can synthesise compounds according to a programming language. That’s potentially transformative.
“However, the challenges we face in implementing AI, such as a lack of standards and disconnected data, mean that these changes are going to be gradual. Further, AI will not replace the drug discovery process altogether – or scientists for that matter – but it can improve the process and make it more efficient.
“A big debate in this field is how these methods actually compare to the expertise of humans. The forthcoming changes to the lab will require a shift in STEM workforce skills to include more training for life scientists in data analytics and programming”.
The key is finding ways for the algorithms to support the work of scientists, and fostering close collaboration between lab scientists and data scientists. AI is already improving early-stage R&D, and even greater benefits should follow as it is applied further along the drug discovery process. AI may indeed prove truly transformative for drug discovery.
Further reading
1. Strategies for Accelerating Drug Discovery with AI. (5 November 2020). Digital Dialogues session, Lab of the Future. https://www.lab-of-the-future.com/digital-dialogues/
2. Artificial Intelligence In Drug Discovery: A Bubble Or A Revolutionary Transformation? (3 August 2017). Council Post, Forbes. https://www.forbes.com/sites/forbestechcouncil/2017/08/03/artificial-intelligence-in-drug-discovery-a-bubble-or-a-revolutionary-transformation/
3. Artificial intelligence in pharma: utilising a valuable resource. (10 March 2020). European Pharmaceutical Review. https://www.europeanpharmaceuticalreview.com/article/114914/artificial-intelligence-in-pharma-utilising-a-valuable-resource/