Clinical Trial Data Analytics – Getting Started

By David Hughes / Graph Practice Director

January 26, 2023


Clinical trial data analytics provides the most meaningful insights when the underlying clinical trial data quality is strong and the insights are surfaced as rapidly as possible. In this and related articles, we demonstrate how to shorten the time-to-value of critical data sources by quickly developing a Streamlit application that visualizes the data and performs analytics using graph algorithms.

What are Clinical Trials?

Clinical trials are studies in humans that examine the safety and effectiveness of new drugs, devices, procedures, treatments, or tests. They are monitored by the U.S. Food and Drug Administration (FDA). Each trial progresses through four phases: the first focuses on safety in a small group, the second examines effectiveness in a larger group, the third explores both safety and effectiveness in different populations and at a larger scale, and the final phase, for drugs and devices, monitors long-term effects.

Clinical trials are conducted across various diseases and conditions. In this article we explore the neurodegenerative disease amyotrophic lateral sclerosis (ALS), since it was the focus of our project. However, the methods in this blog series would work on a collection of all clinical trials as well. ALS is a progressive condition that affects the motor neurons and results in the loss of the ability to speak, move, eat, and eventually breathe. Clinical trials are advancing clinical knowledge and treatments for this devastating disease and easing the difficult journey for patients and their families.

What are Clinical Trial Data Analytics?

Clinical trial data analytics refers to the process of collecting, cleaning, and analyzing data from clinical trials in order to gain insights and make decisions about the safety and efficacy of a medical treatment or device. This can include identifying patterns and trends in the data, comparing the results of different trials, and using statistical techniques to evaluate the significance of the findings.

The ultimate goal of clinical trial data analytics is to improve the efficiency and effectiveness of the clinical trial process and, in turn, to improve patient care. For example, it can help medical professionals understand how patient populations are responding to treatments, design better future trials, improve drug development processes, and much more.

Finding Treasures in Data Sources

Occasionally, during the development of a large knowledge graph project, a data source is identified as a unique opportunity for a standalone solution. In a recent project, we developed a clinical trial data analytics application using the NIH’s Clinical Trials database. It was one of many data sources for the development of an expansive clinical trials knowledge graph (What is a Knowledge Graph?) for biotech research.

Project stakeholders requested that this data source be developed into an application for clinical trial data analytics to deliver meaningful insights to various internal audiences. In this article and others, we explore the data source and its API. In a related article on graph ETL / Neo4j ETL, we look at ingesting the data into a graph database using GraphAware Hume (as one way to do that) and then creating the clinical trial data analytics themselves.

In another related article, a Streamlit tutorial, we document how to rapidly develop a standalone clinical trial data analytics application using Streamlit as a proof of concept. In the final article in the series, we consider next steps and opportunities for improvement given the lessons learned along the way, and ultimately how to maximize the effectiveness and impact of patient journey mapping.

How to Get Global Clinical Trials Data

The central collection of clinical trials is the NIH’s Clinical Trials database website. The site can be searched manually through its UI, which returns exportable text.

Example result from UI search

This site provides a well-documented API for programmatically exploring, searching, and retrieving trial data. A UI exists for learning the API and testing queries.

Since we are interested in ALS, we will develop a query for amyotrophic lateral sclerosis trials that:

  • Are conducted in the United States
  • Have actively recruiting research sites (we will use these in our clinical trial data analytics)
  • Are interventional studies rather than observational
  • Have a primary intervention of type ‘Drug’

The parameterized query for these inputs is:

amyotrophic lateral sclerosis AND SEARCH[Location](AREA[LocationCountry]United States AND AREA[LocationStatus]Recruiting) AND AREA[StudyType]Interventional AND AREA[InterventionType]Drug
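The expression above can also be assembled programmatically, which helps when the same filters are reused for other conditions. The following is a minimal sketch, not the code used in the project: the helper names `build_expression` and `build_query_string` are hypothetical, and the request parameter names (`expr`, `min_rnk`, `max_rnk`, `fmt`) are assumptions about the site's Full Studies API that should be checked against its documentation. The base URL is deliberately left out.

```python
from urllib.parse import urlencode


def build_expression(condition, country, status, study_type, intervention_type):
    """Assemble a search expression in the AREA[...] / SEARCH[...] style shown above."""
    # Both location filters sit inside one SEARCH[Location] group so that they
    # apply to the same research site.
    location = (
        f"SEARCH[Location](AREA[LocationCountry]{country} "
        f"AND AREA[LocationStatus]{status})"
    )
    return " AND ".join(
        [
            condition,
            location,
            f"AREA[StudyType]{study_type}",
            f"AREA[InterventionType]{intervention_type}",
        ]
    )


def build_query_string(expr, min_rank=1, max_rank=100, fmt="json"):
    """URL-encode the expression; the parameter names here are assumptions."""
    return urlencode(
        {"expr": expr, "min_rnk": min_rank, "max_rnk": max_rank, "fmt": fmt}
    )


expr = build_expression(
    condition="amyotrophic lateral sclerosis",
    country="United States",
    status="Recruiting",
    study_type="Interventional",
    intervention_type="Drug",
)
print(expr)
print(build_query_string(expr))
```

Keeping the expression builder separate from the URL encoding makes it easy to swap the condition (or drop it entirely) when moving from ALS to all diseases.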

This query conforms to the well-established expression logic that the site's API implements. For example, the query above adheres to the logic in this graphic from the API:

Example result from

In the clinical trials site’s Full Studies API UI, the query looks like this:

Example result from UI search

The query returns 39 studies for ALS. In our production system, we will retrieve all active clinical trials across all diseases. The response is returned as JSON, as requested in our query. The response data elements that our clinical trial data analytics and application will use include the trial’s title, brief summary, locations, eligibility criteria, and other features.
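
The fields named above can be pulled out of each returned study with a few defensive dictionary lookups. This is a minimal sketch under stated assumptions: only `EligibilityModule` and `EligibilityCriteria` appear in the excerpt that follows; every other key name (`ProtocolSection`, `IdentificationModule`, `BriefTitle`, `DescriptionModule`, `BriefSummary`, `ContactsLocationsModule`, `LocationList`, `Location`, and the location fields) is an assumption about the response layout, and `sample_study` is made-up placeholder data rather than a real trial.

```python
def extract_trial_fields(study):
    """Flatten one study record into the fields the application uses."""
    # Key names other than EligibilityModule/EligibilityCriteria are assumptions.
    protocol = study.get("ProtocolSection", {})
    identification = protocol.get("IdentificationModule", {})
    description = protocol.get("DescriptionModule", {})
    eligibility = protocol.get("EligibilityModule", {})
    locations = (
        protocol.get("ContactsLocationsModule", {})
        .get("LocationList", {})
        .get("Location", [])
    )
    return {
        "title": identification.get("BriefTitle", ""),
        "brief_summary": description.get("BriefSummary", ""),
        "eligibility_criteria": eligibility.get("EligibilityCriteria", ""),
        "locations": [
            (
                loc.get("LocationCity", ""),
                loc.get("LocationCountry", ""),
                loc.get("LocationStatus", ""),
            )
            for loc in locations
        ],
    }


# Hypothetical placeholder record, shaped like the assumed response layout.
sample_study = {
    "ProtocolSection": {
        "IdentificationModule": {"BriefTitle": "Hypothetical ALS drug study"},
        "DescriptionModule": {"BriefSummary": "Placeholder summary."},
        "EligibilityModule": {
            "EligibilityCriteria": "Inclusion Criteria:\n\nAble to lie on back"
        },
        "ContactsLocationsModule": {
            "LocationList": {
                "Location": [
                    {
                        "LocationCity": "Boston",
                        "LocationCountry": "United States",
                        "LocationStatus": "Recruiting",
                    }
                ]
            }
        },
    }
}

print(extract_trial_fields(sample_study))
```

Using `.get` with empty defaults means a study that lacks a module yields empty values instead of raising a `KeyError`, which matters when looping over trials across all diseases.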

    "EligibilityModule": {
        "EligibilityCriteria": "Inclusion Criteria:\n\nPatients with clinically definite, probable, laboratory supported probable, or possible ALS per revised El Escorial criteria\nCramp frequency greater than 4 cramps per week during 2 week run in\nALS functional rating scale-revised (ALSFRS-R) score of greater than 24\nAble to lie on back for study procedures\n\nExclusion Criteria:\n\nTracheostomy invasive ventilation, or use of non-invasive ventilation greater than 12 hours per day\nPregnant or lactating\nParticipation in a prior experimental drug trial less than 30 days prior to screening\nPatients taking ranolazine\nPatients taking medications which are contraindicated for use with ranolazine such as strong CYP3 inhibitors (ketoconazole, clarithromycin, nelfinavir), and CYP3 inducers (rifampin, phenobarbital)\nPatients with clinically significant medical comorbidities (hepatic, renal, cardiac, etc)\nPatients with baseline QT interval prolongation on Electrocardiography (ECG)\nPatients pre-disposed to secondary QT prolongation for other health conditions like family history of congenital long QT syndrome, heart failure, bradycardia, or cardiomyopathies"

While much of the data from the API is structured, key fields such as the eligibility criteria are only semi-structured. In the data transformation pipeline described in our graph ETL / Neo4j ETL article, we show how to perform basic graph NLP using regular expressions to break this field into inclusion and exclusion criteria. We then show how to split each of those further into individual inclusions and exclusions for use in our clinical trial data analytics and application features. We also leverage location data in our clinical trial data analytics pipeline and Streamlit application.
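
As an illustration of the regular-expression step, the sketch below splits an eligibility string into inclusion and exclusion lists. It is not the pipeline from the ETL article: the function name `split_criteria` is hypothetical, and the pattern assumes the text contains the literal headings "Inclusion Criteria:" and "Exclusion Criteria:" in that order, with one criterion per line, as in the excerpt above. The sample text is shortened from that excerpt.

```python
import re

# Capture everything between the two headings, then everything after the second.
CRITERIA_PATTERN = re.compile(
    r"Inclusion Criteria:\s*(.*?)\s*Exclusion Criteria:\s*(.*)",
    flags=re.DOTALL | re.IGNORECASE,
)


def split_criteria(text):
    """Split an eligibility string into lists of individual criteria."""
    match = CRITERIA_PATTERN.search(text)
    if match is None:
        # Headings missing or in an unexpected order: return empty lists
        # rather than guessing where the sections begin.
        return {"inclusion": [], "exclusion": []}

    def to_items(block):
        # One criterion per non-empty line.
        return [line.strip() for line in block.split("\n") if line.strip()]

    return {
        "inclusion": to_items(match.group(1)),
        "exclusion": to_items(match.group(2)),
    }


sample_text = (
    "Inclusion Criteria:\n\n"
    "Able to lie on back for study procedures\n"
    "ALS functional rating scale-revised (ALSFRS-R) score of greater than 24\n\n"
    "Exclusion Criteria:\n\n"
    "Pregnant or lactating\n"
    "Patients taking ranolazine"
)

criteria = split_criteria(sample_text)
print(criteria["inclusion"])
print(criteria["exclusion"])
```

Real eligibility text varies between trials, so a production version would likely need to handle missing headings, numbered or bulleted items, and criteria that wrap across lines.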


Fundamental to impactful clinical trial data analytics is delivering high-quality results quickly. By using reliable sources such as the one described above, standing up applications quickly with tools like Streamlit, and drawing out critical connected insights with graph databases, graph data science, and related technologies, truly impactful clinical trial data analytics is now possible at scale.

For more specifics, see the related articles on the clinical trials Streamlit application (Streamlit tutorial), graph ETL / Neo4j ETL with a focus on clinical trial data, clinical trial data quality, and patient journey mapping.

Graphable delivers insightful graph database (e.g. Neo4j consulting) / machine learning (ml) / natural language processing (nlp) projects as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
