AI in Drug Discovery – Harnessing the Power of LLMs

By David Hughes / Graph Practice Director

August 15, 2023



The promise of artificial intelligence (AI) in drug discovery is quickly becoming a practical reality, particularly with the advent of Large Language Models (LLMs). Applying LLMs across the drug discovery and development process will fundamentally change how pharmaceutical companies compete to bring new drugs to market in the coming years. This article explains how.

More than AI in Drug Discovery – The Significance of LLMs for the Pharmaceutical Industry

The pharmaceutical industry has a continuously growing demand for new, differentiating technology solutions as consumer demand and competition accelerate innovation. LLMs represent a significant opportunity for companies, one that reaches far beyond drug discovery alone, offering unprecedented acceleration for industry transformation and differentiation. The democratization of machine learning models that LLMs bring has opened up new opportunities for pharmaceutical companies to develop innovative solutions and to improve drug discovery, clinical operations, commercialization, and ultimately patient outcomes.

LLMs enable a transformative approach to data analysis and prediction by processing vast amounts of unstructured text and making it usable, so that analysts can extract valuable insights. Patient data analysis, for instance, is one key area that stands to gain enormously from LLMs. These models can sift through immense volumes of patient records, clinical notes, and other unstructured data, identifying patterns humans can easily overlook. These insights can lead to more accurate diagnoses, personalized treatments, improved patient journey mapping, and better patient outcomes.

Figure: Pattern identification of disease progression from clinical data.

AI in Drug Discovery and Development / LLM Drug Discovery

If you’re in tune with the latest advancements in AI, you may already be familiar with LLMs and their transformative power. What you may not know is how much potential these sophisticated models hold for the drug discovery and development process, where cutting-edge approaches like this can be a true differentiator.

As part of our continuing series on LLMs, we dig in further, this time through the lens of pharmaceutical drug discovery. We will unpack the reasons for urgency, showing how LLMs are poised to revolutionize pharmaceutical drug discovery and why adoption sooner rather than later is critical.

We’ll explore the tangible benefits of early adoption, the significant risks for those who lag behind, and how we can leverage LLMs to pioneer innovative solutions in the pharmaceutical drug discovery field.

What is the Drug Discovery/Launch Process?

A helpful view of the drug discovery and launch process is depicted in the image below. It sets the stage for the many ways graph databases / Knowledge Graphs, LLMs and AI in drug discovery and development can substantially improve the process:

Figure: Drug discovery and launch pipeline.

AI Drug Discovery with LLMs

Beyond making patient data more accessible and useful, LLMs can also contribute significantly to improving the impact of AI in drug discovery and development. Bringing new drugs to market is incredibly complex, costly and time-consuming. With their ability to analyze vast datasets, LLMs have the potential to help predict drug interactions, side effects, and efficacy, accelerating the drug discovery process and reducing costs.

One example of this is recent work where Graphable utilized LLMs to extract key insights critical to drug discovery from an ever-growing document warehouse at a pharmaceutical company. Another example was making worldwide research available to researchers in time to improve hypothesis formulation and positively impact costly experiments/assays.

Combining LLMs with machine learning for drug discovery represents a leap forward in capability, with the potential to multiply the productivity of previously unaided human researchers. The best way we have found to describe this is that it provides significant mechanical advantage to researchers and many others in the drug discovery and launch process.

Advantages of Embracing LLMs Sooner Rather than Later

Early adoption of LLMs in the pharmaceutical industry, and the associated use of AI in drug discovery and development, can drive significant competitive advantages in cost, timeframes, innovation, launch timing and competitive intelligence.

As LLMs evolve, their capacity for analysis and prediction continually improves through fine-tuning and the incorporation of empirical data and domain expertise, providing ever more valuable insights and solutions. Early adopters stand to gain by making informed decisions, accelerating research and development, and driving better patient outcomes, in a field where minor improvements can represent millions in margin.

Real-World Applications of LLMs in the Pharmaceutical Industry

LLMs have already started making an impact in the industry, with several pioneering organizations leveraging their capabilities to enhance operations and improve outcomes. These real-world applications of LLMs illuminate the potential of AI in drug discovery and development and give us a glimpse into a future shaped by these advanced technologies. One example is the Sherlock™ product from Graphable as shown below:

Figure: AI in drug discovery.

In this example, LLMs are used to analyze vast amounts of scientific literature, patents, and clinical trial data, along with many other internal and external sources, in order to keep relevant information within easy reach of key stakeholders in time to impact the program at hand.

LLMs are also valuable extensions to other analytic platforms and enhance efforts like clinical trial data quality. Another example is the AI-powered systems developed by Nvidia, which employ LLMs like MegaMolBART to accelerate the discovery process. By analyzing existing data, these systems demonstrate various cheminformatics applications in drug discovery with the potential to identify therapeutic targets and drug candidates much faster than traditional methods, reducing the time and cost of drug development.

In personalized medicine, LLMs play a pivotal role in analyzing genetic data and patient medical histories. Companies like Graphable use LLMs to extract insights from unstructured clinical data, which can inform patient journey mapping, improve patient outcomes, and enable democratized access to data using natural language interfaces and knowledge graphs.

Looking toward the future, the potential research and clinical applications of LLMs are compelling. Imagine a world where AI models can predict patient responses to certain treatments in silico (avoiding costly in vitro testing), where they can simulate and validate drug development hypotheses, or where they can provide real-time, personalized health advice based on an individual’s genetic makeup and lifestyle. Thanks to the progress in AI technologies like Meta’s ESM-2 LLM, which can help design new proteins, these possibilities are no longer out of reach.

The combination of AI and healthcare is a natural match, and LLMs play a significant role in driving that value forward. As we continue to unlock the power of LLMs, their applications in the pharmaceutical industry will only broaden and deepen, driving us toward a future where healthcare is more personalized, efficient, and effective than ever before.

Navigating the Challenges of LLM Implementation

Despite the immense potential of LLMs and AI in drug discovery and development, their implementation is not without challenges. Two prominent issues that can arise are data privacy concerns and the necessity of large, diverse datasets to function most effectively. LLMs, by their nature, require substantial amounts of data for training and operation, which often includes sensitive patient information. Therefore, the management of this data needs to be handled meticulously to ensure the privacy and trust of individuals are not compromised.

One effective strategy to address this is using anonymized datasets, which remove identifiable information, as well as federated learning techniques, where the model is trained across multiple decentralized devices or servers holding local data samples without exchanging the data itself. Furthermore, establishing robust data governance policies and adhering to data privacy regulations can help mitigate privacy risks.
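To make the federated learning idea more concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration only, not a description of any production system: the tiny linear model, the function names (local_update, federated_average, train) and the toy site data are all assumptions made for this example. Each site computes an update on its own records, and only the resulting model weights are shared and averaged.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, target y); toy stand-in for site data


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1) -> List[float]:
    """One gradient step for y ~ w*x + b, computed only on one site's own data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Average the sites' weights, weighting each site by its sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def train(sites: Sequence[Sequence[Sample]], rounds: int = 300) -> List[float]:
    """Run federated rounds; raw records never leave their site."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each site starts from the shared model and returns only new weights.
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


if __name__ == "__main__":
    # Two hypothetical sites whose private data follow y = 2x + 1.
    site_a = [(0.0, 1.0), (1.0, 3.0)]
    site_b = [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)]
    print(train([site_a, site_b]))
```

The coordinator in this sketch only ever sees weight vectors and sample counts. Real deployments typically add further protections, such as secure aggregation, because model updates can themselves leak information about the underlying data.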

Additional mitigation strategies include deploying LLMs in controlled environments, whether on-premises or through enterprise cloud services such as Microsoft’s Azure OpenAI Service, so that models and datasets remain secure and within the guidelines of data usage agreements. Semantic obfuscation of data is another approach, though further exploration is needed to validate its potential.

Regarding the need for large and diverse datasets, newer techniques in model training are helping, but the performance of LLMs in the pharmaceutical industry still depends heavily on the quality and quantity of the data on which they are trained.

Collaboration may provide one solution to this issue. For example, pharmaceutical companies can join forces with research institutions, healthcare providers, and technology companies to share and pool anonymized data. This collaborative approach could provide the volume and variety of data necessary for effective LLM training. Data sharing frameworks will need to address challenges such as differences in study designs, endpoint definitions, and other aspects of analysis to realize the full potential of sharing data.

The path to realizing the full potential of LLMs in the pharmaceutical industry is complex, demanding a careful balance between leveraging AI capabilities and maintaining privacy and ethical standards. However, with a well-thought-out strategy and the right safeguards in place, we can navigate these challenges successfully, unlocking the transformative potential of large language models in this space.


From accelerating drug discovery to personalizing medicine, LLMs are already reshaping how we approach healthcare and AI in drug discovery and development in particular. The competitive edge gained by early adopters is clear, and the risks of lagging in this AI revolution are substantial. As such, the urgency to harness the power of LLMs in the pharmaceutical industry is not just about keeping pace with technological advancements but about revolutionizing patient outcomes and transforming the future of healthcare.

But the journey doesn’t end here. As the potential applications of LLMs become better understood, the future of AI in pharmaceuticals, and of machine learning for drug discovery in particular, continues to expand, and this is still largely uncharted territory.

Stay tuned as we continue to delve into this topic in our upcoming posts, exploring the role of AI in pharmaceuticals, healthcare, and other industries, as well as the challenges we will navigate, and the opportunities that lie ahead.

Graphable delivers insightful graph database (e.g. Neo4j consulting) / machine learning (ml) / natural language processing (nlp) projects as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
