Connected Anatomy: Building a Biomechanics Graph with a Named Entity Recognition LLM
The human body is a vastly complex system of bone, fascia, and muscle, and the field of biomechanics seeks to understand how these elements interact with one another. In this first installment of the series, we model a small portion of that system as a graph, treating it as a holistic network of interconnected parts. Using a named entity recognition LLM (Large Language Model), we extract the underlying components from semi-structured data and use the resulting output to create a graph representation of the human skeletal and muscular system, starting with the lower limbs.
Why a Named Entity Recognition LLM? The Data
To create our graph representation of the human body, we use the Anatomy Tables from The University of Michigan's Medical Gross Anatomy course as our data source. These tables provide several data elements which we'll use to construct our graph:
- Muscle: The muscle in question
- Origin: The point(s) where the muscle originates
- Insertion: The point(s) where the muscle inserts
- Action: The action(s) the muscle performs
- Innervation: The nerve(s) which innervates the muscle
- Artery: The artery which supplies blood to the muscle
- Notes: General notes relating to the muscle
- Image: At the time of writing, this field appears to be defective
After scraping these tables, we are able to extract the following dataframe:
For this first part of the series, we will focus on three fields from this data: Muscle, Origin, and Insertion. However, because these fields were written by hand, we will need to use an LLM for named entity recognition (NER) to infer which bones a given muscle originates from or inserts into.
Disclaimer: While many origin and insertion points attach to soft tissue such as fascia or tendons, for the purposes of this initial network, we will exclusively focus on the relationships between muscles and bones, with the intent to address soft tissue attachments in later installments.
Let's look at an example where an LLM for NER may be necessary:
| Muscle | Origin | Insertion |
| --- | --- | --- |
| semimembranosus | upper, outer surface of the ischial tuberosity | medial condyle of the tibia |
In this example, while the insertion for the semimembranosus (a muscle of the inner thigh) specifically names the tibia, its origin is given as the "upper, outer surface of the ischial tuberosity". For our purposes, we are interested in the bone being referenced rather than the specific location on it, in this case the ischium (which forms part of the hip bone).
To identify these indirect references, we employ a named entity recognition LLM.
Prompting an LLM for Named Entity Recognition
After some experimentation, we developed the following prompt to perform this NER:
You are an NLP system that performs NER and classification on text. Specifically, you identify bones from a list of input phrases. You return results in a list of tuples where the first element is an input phrase, the second element is a list of the identified bones. The accepted list of entities for bones are:
<list of human bones>
## If you cannot identify a bone contained in the phrase, attempt to infer what bones are related to the phrase given normal human anatomy.
## Only return bones from the accepted list of entities for bones
## This is an example
Input: ["base of the distal phalanx of the great toe", "medial portion of the superior pubic ramus", "bodies and transverse processes of lumbar vertebrae"]
Output: [("base of the distal phalanx of the great toe", ["phalanges (foot) (28)"]), ("medial portion of the superior pubic ramus", ["pubis"]), ("bodies and transverse processes of lumbar vertebrae", ["lumbar vertebra 1 (L1)", "lumbar vertebra 2 (L2)", "lumbar vertebra 3 (L3)", "lumbar vertebra 4 (L4)", "lumbar vertebra 5 (L5)"])]
## Only return the list of tuples in your response.
## Do not explain your answer.
## Do not preface the result in any way. Only return the output.
Input: <origin or insertion text>
Output:
Here we employ several methods of prompt engineering of our named entity recognition LLM:
- First, we provide it the persona of an "NLP system that performs NER and classification on text". Next, we define the input/output format and provide a list of acceptable human bones to identify.
- After this, we provide two additional forms of guidance via the “##” notation. The first instructs the model not to simply look for the name of a bone in the phrase, but rather infer it based on context (as seen in our example with the semimembranosus). The second acts to ensure the NER LLM does not deviate from our list of bones, which will be critical as we construct our graph.
- Next, we provide some few-shot learning examples of our expected input and output, being sure to choose examples which are representative of the instances of our data that the NER LLM will struggle with.
- After these examples, we provide specific guidance on output. ChatGPT 3.5 is known for how verbose it can be in its responses, so we inject additional instructions to drive home the point that the only thing we are interested in is the raw output.
- Lastly, we provide our input data and prompt the model for an output.
If you’d like to dive deeper into the methodology behind this prompt engineering, I discuss these concepts at length in my blog What is Prompt Engineering?.
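In practice, the prompt above can be templated in Python so that the bone list and each batch of input phrases are injected consistently. A minimal sketch follows; the helper name, batching approach, and sample values are our own illustrations, not part of the original pipeline:

```python
def build_ner_prompt(phrases, accepted_bones):
    """Assemble the bone-NER prompt for a batch of origin/insertion phrases."""
    bones = "\n".join(f"- {b}" for b in accepted_bones)
    return (
        "You are an NLP system that performs NER and classification on text. "
        "Specifically, you identify bones from a list of input phrases. "
        "You return results in a list of tuples where the first element is an "
        "input phrase, the second element is a list of the identified bones. "
        "The accepted list of entities for bones are:\n"
        f"{bones}\n"
        "## If you cannot identify a bone contained in the phrase, attempt to "
        "infer what bones are related to the phrase given normal human anatomy.\n"
        "## Only return bones from the accepted list of entities for bones\n"
        "## Only return the list of tuples in your response.\n"
        "## Do not explain your answer.\n"
        f"Input: {phrases}\n"
        "Output:"
    )

# Example call with illustrative values
prompt = build_ner_prompt(
    ["inferior pubic ramus"],
    ["pubis", "ischium", "femur"],
)
```

The resulting string can then be sent to whichever chat-completion API you are using; keeping the template in one function makes it easy to iterate on the instructions without touching the batching code.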
Wrangling the Output
After developing our prompt and running it over the origin points for the lower limb in our data, we receive the following output. Let's look at a small sample:
[('medial and lateral sides of the tuberosity of the calcaneus', ['calcaneus']),
('medial side of the tuberosity of calcaneus', ['calcaneus']),
('inferior pubic ramus', ['pubis']),
('oblique head: bases of metatarsals 2-4; transverse head: heads of metatarsals \r\n 3-5', ['metatarsal bones (10)']),
('medial portion of the superior pubic ramus', ['pubis']),
('ischiopubic ramus and ischial tuberosity', ['ischium']),
('lower portion of the inferior pubic ramus', ['pubis']),
('anterior surface of the femur above the patellar surface', ['femur']),
('long head: ischial tuberosity; short head: lateral lip of the linea aspera', ['femur'])]
Here we can see exactly the output we requested: a list of tuples where the first element is the origin/insertion text and the second is a list of any bones from our accepted list. If you're wondering "Why this format?", the answer is found in its destination: Pandas.
Plugging this output into a Pandas DataFrame we get the following:
And by repeating this for our insertion points and adding these columns to our main DataFrame we get something like this:
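One way the model's response string can be turned into those DataFrame columns is with `ast.literal_eval`, which parses a Python-literal string without executing arbitrary code. This is a sketch: the shortened sample response and the column names are ours.

```python
import ast

import pandas as pd

# A shortened stand-in for the raw model response (a Python-literal string)
response = """[('inferior pubic ramus', ['pubis']),
('anterior surface of the femur above the patellar surface', ['femur'])]"""

# Safely parse the string into a list of (phrase, [bones]) tuples
pairs = ast.literal_eval(response)

# Each tuple becomes a row; the list of bones stays as a list-valued column
origin_df = pd.DataFrame(pairs, columns=["Origin", "Origin_Bones"])
```

Because the prompt constrains the model to return exactly this literal format, the parse step stays a one-liner; malformed responses would raise a `ValueError` here, which is a useful early failure signal.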
Now that we've reached our goal of having the origin and insertion bones for each muscle in the lower limb, we can finally start constructing our graph.
Building the Graph
Since we did the bulk of our data transformations ahead of time, our scripts to load this data into Neo4j are quite straightforward.
First, we establish our Neo4j connection and a helpful run_cypher() function:
from neo4j import GraphDatabase

# Create a Neo4j driver (uri, user, password, and db are defined elsewhere)
driver = GraphDatabase.driver(uri,
                              auth=(user, password),
                              database=db)

# Run a Cypher statement, optionally returning its results
def run_cypher(cypher, results=False):
    with driver.session() as session:
        r = session.run(cypher).data()
    if results:
        return r
Loading the Bones
for bone in human_bones:
    cypher = f"""
    MERGE (b:Bone {{name: '{bone}'}})
    """
    run_cypher(cypher)
Loading the Muscles
for i in range(0, lower_limb_df.shape[0]):
    cypher = f"""
    MERGE (m:Muscle {{name: '{lower_limb_df.iloc[i].Muscle}'}})
    """
    run_cypher(cypher)
Loading the Origin and Insertion Connections
for i in range(0, lower_limb_df.shape[0]):
    # Origin relationships
    for bone in lower_limb_df.iloc[i].Origin_Bones:
        cypher = f"""
        MATCH (m:Muscle {{name: '{lower_limb_df.iloc[i].Muscle}'}})
        MATCH (b:Bone {{name: '{bone}'}})
        MERGE (m)-[r:HAS_ORIGIN]->(b)
        RETURN COUNT(m) as muscle_count, COUNT(b) as bone_count
        """
        run_cypher(cypher)
    # Insertion relationships
    for bone in lower_limb_df.iloc[i].Insertion_Bones:
        cypher = f"""
        MATCH (m:Muscle {{name: '{lower_limb_df.iloc[i].Muscle}'}})
        MATCH (b:Bone {{name: '{bone}'}})
        MERGE (m)-[r:HAS_INSERTION]->(b)
        RETURN COUNT(m) as muscle_count, COUNT(b) as bone_count
        """
        run_cypher(cypher)
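The nested loops above can also be flattened into a single table of relationship rows before loading. This is a sketch under the assumption that the `Origin_Bones` and `Insertion_Bones` columns hold lists, as shown earlier; the sample data is illustrative.

```python
import pandas as pd

# Illustrative stand-in for the real lower-limb DataFrame
lower_limb_df = pd.DataFrame({
    "Muscle": ["pectineus"],
    "Origin_Bones": [["pubis"]],
    "Insertion_Bones": [["femur"]],
})

# One (muscle, bone, relationship type) row per attachment
rows = []
for _, r in lower_limb_df.iterrows():
    rows += [(r.Muscle, b, "HAS_ORIGIN") for b in r.Origin_Bones]
    rows += [(r.Muscle, b, "HAS_INSERTION") for b in r.Insertion_Bones]

rel_df = pd.DataFrame(rows, columns=["muscle", "bone", "rel_type"])
```

Each row can then be loaded with a parameterized Cypher statement (passing `muscle` and `bone` via the driver's query parameters rather than f-string interpolation), which also sidesteps quoting problems for names containing apostrophes.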
With this loaded, let's take a look at our newly constructed graph and examine the results.
Examining the Final Graph
Finally, let's load up Neo4j and take a look at the graph we constructed from this data. We encourage you to take a closer look for yourself.
Having confirmed our hypothesis, we can clearly see the layout of the human leg. Reading from left to right, we start with several of the small bones of the lower back and hips, nearly all of which connect to the femur. We then see attachments from the mid-leg (patella, tibia, and fibula) to the many small muscles and bones of the foot.
We can also note a few locations where we were unable to find both the origin and the insertion for a given muscle. This could be addressed in a number of ways that we'll explore in future blogs.
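These gaps can be surfaced directly in the DataFrame before loading, by flagging muscles where the NER step returned no bones for one side of the attachment. A sketch with illustrative data (the empty origin here is contrived for the example):

```python
import pandas as pd

# Illustrative sample; real gaps come from phrases the model could not
# map onto the accepted bone list
lower_limb_df = pd.DataFrame({
    "Muscle": ["pectineus", "plantaris"],
    "Origin_Bones": [["pubis"], []],
    "Insertion_Bones": [["femur"], ["calcaneus"]],
})

# Muscles missing either an origin bone or an insertion bone
missing = lower_limb_df[
    (lower_limb_df.Origin_Bones.apply(len) == 0)
    | (lower_limb_df.Insertion_Bones.apply(len) == 0)
]
```

Rows in `missing` are candidates for a second prompting pass or manual review before they are loaded into the graph.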
Named Entity Recognition LLM – Conclusion
Using an LLM for NER is a powerful new way to meet this common requirement across semi-structured and unstructured datasets. With this example working well, we will next look at automating our NER LLM calls and expanding the graph to cover the entire human body. Stay tuned for more updates as we explore the world of human anatomy through graphs.
Related Articles
- AI in Drug Discovery – Harnessing the Power of LLMs
- AI consulting guidebook
- What is ChatGPT? A Complete Explanation
- Domo custom apps using Domo DDX Bricks with the assistance of ChatGPT
- Resolution for various ChatGPT errors, including ChatGPT Internal Server Error
- Understanding Large Language Models (LLMs)
- What is Prompt Engineering? Unlock the value of LLMs
- What are Graph Algorithms?
- Conductance Graph Community Detection: Python Examples
- What is Neo4j Graph Data Science?
- ChatGPT for Analytics: Getting Access & 6 Valuable Use Cases
- LLM Pipelines / Graph Data Science Pipelines / Data Science Pipeline Steps
- Using a Named Entity Recognition LLM in a Biomechanics Use Case
- What is Databricks? For data warehousing, data engineering, and data science
- Databricks Architecture – A Clear and Concise Explanation
- Databricks vs Snowflake compared
- What is Databricks Dolly 2.0 LLM?
Still learning? Check out a few of our introductory articles to learn more:
- What is a Graph Database?
- What is Neo4j (Graph Database)?
- What Is Domo (Analytics)?
- What is Hume (GraphAware)?