Clinical Trial Data Analytics – Getting Started
Clinical trial data analytics provides the most meaningful insights when clinical trial data quality is strong and insights are surfaced as rapidly as possible. In this and related articles, we will demonstrate how to shorten the time-to-value for critical data sources by quickly developing a Streamlit application to visualize the data and perform analytics with graph algorithms.
What are Clinical Trials?
Clinical trials are studies in humans that examine the safety and effectiveness of new drugs, devices, procedures, treatments, or tests. They are monitored by the U.S. Food and Drug Administration (FDA). Each trial progresses through four phases: the first phase focuses on safety in a small group, the second examines effectiveness in a larger group, the third evaluates both safety and effectiveness at a larger scale and in different populations, and the final phase monitors the long-term effects of approved drugs and devices.
Clinical trials are conducted across many diseases and conditions. In this article we will explore the neurodegenerative disease amyotrophic lateral sclerosis (ALS), since it was the focus of our project; however, the methods in this blog series would work on the full collection of clinical trials as well. ALS is a progressive condition that affects the motor neurons and results in the loss of the ability to speak, move, eat, and eventually breathe. Clinical trials are advancing clinical knowledge and treatments for this devastating disease and easing the difficult journey for patients and their families.
What is Clinical Trial Data Analytics?
Clinical trial data analytics refers to the process of collecting, cleaning, and analyzing data from clinical trials in order to gain insights and make decisions about the safety and efficacy of a medical treatment or device. This can include identifying patterns and trends in the data, comparing the results of different trials, and using statistical techniques to evaluate the significance of the findings.
The ultimate goal of clinical trial data analytics is to improve the efficiency and effectiveness of the clinical trial process and, in turn, to improve patient care. As one concrete example, it can help medical professionals understand how patient populations are responding to treatments, better design future trials, improve drug development processes, and much more.
Finding Treasures in Data Sources
Occasionally, during the development of a large knowledge graph project, a data source is identified as a unique opportunity for a standalone solution. In a recent project, we developed a clinical trial data analytics application using the NIH’s Clinical Trials database. It was one of many data sources for the development of an expansive clinical trials knowledge graph (What is a Knowledge Graph?) for biotech research.
Project stakeholders requested that this data source be developed into a clinical trial data analytics application that delivers meaningful insights to various internal audiences. In this article and the others in the series, we explore the ClinicalTrials.gov data source and its API. In a related article on graph etl / Neo4j etl, we look at ingesting the data into a graph database using GraphAware Hume (as one way to do so) and then building the clinical trial data analytics on top of it.
In yet another related article, a Streamlit tutorial, we document how to rapidly develop a standalone clinical trial data analytics application as a proof of concept. In the final article in the series, we consider next steps and opportunities for improvement given the lessons learned along the way, and ultimately how to maximize the effectiveness and impact of patient journey mapping.
How to Get Global Clinical Trials Data
The central collection of clinical trials is hosted at ClinicalTrials.gov. The site can be searched manually through its UI, which returns results that can be exported as text.
This site provides a well-documented API for programmatically exploring, searching, and retrieving trial data. A UI exists for learning the API and testing queries.
Since we are interested in ALS, we will develop a query focused on amyotrophic lateral sclerosis trials that:
- Are conducted in the United States
- Have actively recruiting research sites (we will use these in our clinical trial data analytics)
- Are interventional rather than observational studies
- Have a primary intervention of type ‘Drug’
The parameterized query for these inputs is:
amyotrophic lateral sclerosis AND SEARCH[Location](AREA[LocationCountry]United States AND AREA[LocationStatus]Recruiting) AND AREA[StudyType]Interventional AND AREA[InterventionType]Drug
This query conforms to the search expression syntax documented in the ClinicalTrials.gov API, and it can be composed and tested interactively in the site’s Full Studies API UI.
The query returns 39 studies for ALS. In our production system, we will retrieve all active clinical trials across all diseases. The data are returned as JSON, as requested in our query. Among the response elements that our clinical trial data analytics and application will use are each trial’s title, brief summary, locations, eligibility criteria, and other features.
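As a minimal sketch of retrieving these studies programmatically, the snippet below assumes the Python requests library and the classic Full Studies endpoint (https://clinicaltrials.gov/api/query/full_studies), which was current when this project was built; the endpoint and field names such as FullStudiesResponse may differ in newer API versions.

import requests

# Search expression copied verbatim from the query shown above.
EXPR = (
    "amyotrophic lateral sclerosis AND SEARCH[Location]"
    "(AREA[LocationCountry]United States AND AREA[LocationStatus]Recruiting) "
    "AND AREA[StudyType]Interventional AND AREA[InterventionType]Drug"
)

# The classic Full Studies endpoint returns at most 100 studies per request.
URL = "https://clinicaltrials.gov/api/query/full_studies"
params = {"expr": EXPR, "min_rnk": 1, "max_rnk": 100, "fmt": "json"}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()
payload = response.json()["FullStudiesResponse"]

print("Studies found:", payload["NStudiesFound"])
for wrapper in payload.get("FullStudies", []):
    protocol = wrapper["Study"]["ProtocolSection"]
    title = protocol["IdentificationModule"]["BriefTitle"]
    criteria = protocol["EligibilityModule"].get("EligibilityCriteria", "")
    print(title, "-", len(criteria), "characters of eligibility text")

One of the fields returned in each study’s EligibilityModule is the free-text eligibility criteria, for example: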
{
"EligibilityModule": {
"EligibilityCriteria": "Inclusion Criteria:\n\nPatients with clinically definite, probable, laboratory supported probable, or possible ALS per revised El Escorial criteria\nCramp frequency greater than 4 cramps per week during 2 week run in\nALS functional rating scale-revised (ALSFRS-R) score of greater than 24\nAble to lie on back for study procedures\n\nExclusion Criteria:\n\nTracheostomy invasive ventilation, or use of non-invasive ventilation greater than 12 hours per day\nPregnant or lactating\nParticipation in a prior experimental drug trial less than 30 days prior to screening\nPatients taking ranolazine\nPatients taking medications which are contraindicated for use with ranolazine such as strong CYP3 inhibitors (ketoconazole, clarithromycin, nelfinavir), and CYP3 inducers (rifampin, phenobarbital)\nPatients with clinically significant medical comorbidities (hepatic, renal, cardiac, etc)\nPatients with baseline QT interval prolongation on Electrocardiography (ECG)\nPatients pre-disposed to secondary QT prolongation for other health conditions like family history of congenital long QT syndrome, heart failure, bradycardia, or cardiomyopathies"
}
}
While much of the data from the API is structured, key fields such as the eligibility criteria are semi-structured. In the data transformation pipeline described in our graph etl / Neo4j etl article, we show how to perform basic graph NLP using regular expressions to split this field into inclusion and exclusion criteria, and then to break each of those into individual inclusions and exclusions used in our clinical trial data analytics and application features. We also leverage the location data in our clinical trial data analytics pipeline and Streamlit application. A simplified sketch of that splitting is shown below.
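The sketch below illustrates the idea on the EligibilityCriteria string; the production pipeline in the graph etl / Neo4j etl article is more involved and handles additional edge cases in the free text. The split_eligibility helper is purely illustrative.

import re

def split_eligibility(criteria_text: str):
    """Split the free-text EligibilityCriteria field into lists of
    individual inclusion and exclusion criteria (simplified sketch)."""

    def to_items(text: str):
        # Each criterion sits on its own line in the API response;
        # keep non-empty lines and trim surrounding whitespace.
        return [line.strip() for line in text.splitlines() if line.strip()]

    # Separate the two sections on the "Exclusion Criteria:" heading.
    parts = re.split(r"Exclusion Criteria:\s*", criteria_text, flags=re.IGNORECASE)
    inclusion_text = re.sub(r"Inclusion Criteria:\s*", "", parts[0], flags=re.IGNORECASE)
    exclusion_text = parts[1] if len(parts) > 1 else ""
    return to_items(inclusion_text), to_items(exclusion_text)

sample = ("Inclusion Criteria:\n\nAble to lie on back for study procedures\n\n"
          "Exclusion Criteria:\n\nPregnant or lactating")
inclusions, exclusions = split_eligibility(sample)
print(inclusions)  # ['Able to lie on back for study procedures']
print(exclusions)  # ['Pregnant or lactating']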
Conclusion
Fundamental to impactful clinical trial data analytics is delivering high-quality results quickly. By using the reliable sources mentioned above, standing up applications rapidly with tools like Streamlit, and surfacing critical connected insights with graph database, graph data science, and related technologies, truly impactful clinical trial data analytics is now possible at scale.
For more specifics, read the related Streamlit tutorial on the clinical trials application, the graph etl / Neo4j etl article with a focus on clinical trial data, the article on clinical trial data quality, and the article on patient journey mapping.
Still learning? Check out a few of our introductory articles to learn more:
- What is a Graph Database?
- What is Neo4j (Graph Database)?
- What Is Domo (Analytics)?
- What is Hume (GraphAware)?
Additional discovery:
- Hume consulting / Hume (GraphAware) Platform
- Neo4j consulting / Graph database
- Domo consulting / Analytics - BI
We would also be happy to learn more about your current project and share how we might be able to help. Schedule a consultation with us today. We can also discuss pricing on these initial calls, including Neo4j pricing and Domo pricing. We look forward to speaking with you!