Connected Anatomy: Building a Biomechanics Graph with a Named Entity Recognition LLM

By Sean Robinson, MS / Director of Data Science

September 29, 2023

Blog

Reading Time: 6 minutes

The anatomy of the human body is a vastly complex system of bone, fascia, and muscle. The field of biomechanics seeks to understand this system and how its elements interact with one another. In the first installment of this series, we model a small portion of this system as a graph, in an attempt to understand its connections as a holistic network of interconnected parts. Using a named entity recognition LLM (Large Language Model), we extract the underlying anatomical components from semi-structured data and use the resulting output to create a graph representation of the human skeletal and muscular systems, starting with the lower limb in this NER LLM example.

Why a Named Entity Recognition LLM? The Data

In order to create our graph representation of the human body, we look to the Anatomy Tables from The University of Michigan’s Medical Gross Anatomy course as our data source. Here we find several data elements which we’ll use to construct our graph:

  • Muscle: The muscle in question
  • Origin: The point(s) where the muscle originates
  • Insertion: The point(s) where the muscle inserts
  • Action: The action(s) the muscle performs
  • Innervation: The nerve(s) which innervates the muscle
  • Artery: The artery which supplies blood to the muscle
  • Notes: General notes relating to the muscle
  • Image: At the time of writing, this field appears to be defective

After scraping these tables, we can extract the following DataFrame:

Medical Gross Anatomy Anatomy Tables Dataframe
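
If you are curious how that scraping step might look, here is a minimal sketch using pandas.read_html(). The URL, the table index, and the column handling below are illustrative assumptions rather than the exact scraping code used.

import pandas as pd

# Illustrative only: the real Anatomy Tables URL and layout may differ
url = "https://anatomy.example.edu/lower-limb-muscle-table.html"  # placeholder URL

# read_html parses every <table> on the page into a DataFrame
tables = pd.read_html(url)

# Assume the muscle table is the first one and rename its columns to match our fields
lower_limb_df = tables[0]
lower_limb_df.columns = ["Muscle", "Origin", "Insertion", "Action",
                         "Innervation", "Artery", "Notes", "Image"]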

For the purposes of this first part of the series on combining named entity recognition with an LLM, we will focus on three main fields from this data: Muscle, Origin, and Insertion. However, because this data was written by hand, we will need to use an LLM for named entity recognition (NER) to infer which bones a given muscle originates from or inserts into.

Disclaimer: While many origin and insertion points attach to soft tissue such as fascia or tendons, for the purposes of this initial network, we will exclusively focus on the relationships between muscles and bones, with the intent to address soft tissue attachments in later installments.

Let’s look at an example where an LLM for NER may be necessary:

Muscle | Origin | Insertion
semimembranosus | upper, outer surface of the ischial tuberosity | medial condyle of the tibia

In this example, while the insertion for the semimembranosus (a muscle of the inner thigh) specifically names the tibia, its origin is given as the “upper, outer surface of the ischial tuberosity”. For our purposes, we are interested in the bone being referenced rather than the specific location on that bone: in this case the ischium (which forms part of the hip bone).

To identify these indirect references, we employ a named entity recognition LLM.

Prompting an LLM for Named Entity Recognition

After some experimentation, we developed the following prompt to perform this NER:

You are an NLP system that performs NER and classification on text. Specifically, you identify bones from a list of input phrases. You return results in a list of tuples where the first element is an input phrase, the second element is a list of the identified bones. The accepted list of entities for bones is:
<list of human bones>

## If you cannot identify a bone contained in the phrase, attempt to infer what bones are related to the phrase given normal human anatomy.
## Only return bones from the accepted list of entities for bones

## This is an example
Input: ["base of the distal phalanx of the great toe", "medial portion of the superior pubic ramus", "bodies and transverse processes of lumbar vertebrae"]
Output: [("base of the distal phalanx of the great toe", ["phalanges (foot) (28)"]), ("medial portion of the superior pubic ramus", ["pubis"]), ("bodies and transverse processes of lumbar vertebrae", ["lumbar vertebra 1 (L1)", "lumbar vertebra 2 (L2)", "lumbar vertebra 3 (L3)", "lumbar vertebra 4 (L4)", "lumbar vertebra 5 (L5)"])]

## Only return the list of tuples in your response. 
## Do not explain your answer. 
## Do not preface the result in any way. Only return the output.

Input: <origin or insertion text>
Output: 

Here we employ several prompt engineering methods with our named entity recognition LLM:

  • First, we provide it the persona of an “NLP system that performs NER and classification on text”. Next, we define the input/output format and provide a list of acceptable human bones to identify.
  • After this, we provide two additional forms of guidance via the “##” notation. The first instructs the model not to simply look for the name of a bone in the phrase, but rather infer it based on context (as seen in our example with the semimembranosus). The second acts to ensure the NER LLM does not deviate from our list of bones, which will be critical as we construct our graph.
  • Next we provide some few-shot learning examples of our expected input and output, being sure to provide examples that are representative of the instances in our data that the NER LLM will struggle with.
  • After these examples, we provide some specific guidance on the output. ChatGPT 3.5 is known for how verbose its responses can be; to avoid this, we inject additional instructions to drive home the point that the only thing we are interested in is the raw output.
  • Lastly, we provide our input data and prompt the model for an output.
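
To make that last step concrete, here is a minimal sketch of the call itself, using the OpenAI Python client as it existed at the time of writing. The variable ner_prompt, the model name, and the temperature setting are illustrative assumptions rather than the exact values used.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# ner_prompt is assumed to hold the full prompt above, with our origin (or
# insertion) phrases substituted in as the Input list
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",   # assumed model; the post references ChatGPT 3.5
    messages=[{"role": "user", "content": ner_prompt}],
    temperature=0,           # keep the output as deterministic as possible
)

raw_output = response["choices"][0]["message"]["content"]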

If you’d like to dive deeper into the methodology behind this prompt engineering, I discuss these concepts at length in my blog What is Prompt Engineering?.

Wrangling the Output

After developing our prompt and running it on the origin points for the lower limb in our data, we receive the following output. Let’s look at a small sample:

[('medial and lateral sides of the tuberosity of the calcaneus', ['calcaneus']),
('medial side of the tuberosity of calcaneus', ['calcaneus']),
('inferior pubic ramus', ['pubis']),
('oblique head: bases of metatarsals 2-4; transverse head: heads of metatarsals \r\n 3-5', ['metatarsal bones (10)']),
('medial portion of the superior pubic ramus', ['pubis']),
('ischiopubic ramus and ischial tuberosity', ['ischium']),
('lower portion of the inferior pubic ramus', ['pubis']),
('anterior surface of the femur above the patellar surface', ['femur']),
('long head: ischial tuberosity; short head: lateral lip of the linea aspera', ['femur'])]

Here we can see the exact output we requested, a list of tuples where the first element is the origin/insertion text and the second is a list of any bones from our list. If you’re wondering “Why this format?”, the answer is found in its destination: Pandas.

Plugging this output into a Pandas DataFrame, we get the following:

Pandas DataFrame Target with Cypher
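
In code, that step might look roughly like the following. The variable llm_origin_response is illustrative; it stands in for the raw text returned by the model for the origin phrases.

import ast
import pandas as pd

# The model's reply is Python-literal syntax, so ast.literal_eval can parse it
origin_results = ast.literal_eval(llm_origin_response)

# One row per origin phrase, with the list of identified bones alongside it
origin_df = pd.DataFrame(origin_results, columns=["Origin", "Origin_Bones"])

# Join the identified bones back onto the main anatomy DataFrame
lower_limb_df = lower_limb_df.merge(origin_df, on="Origin", how="left")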

And by repeating this for our insertion points and adding these columns to our main DataFrame, we get something like this:

Pandas DataFrame - additional columns being added

Now that we’ve reached our goal of having the origin and insertion bones for each muscle in the lower limb, we can finally start constructing our graph.

Building the Graph

Since we did the bulk of our data transformations ahead of time using our LLM for NER, the scripts to load this data into Neo4j are quite straightforward.

First, we establish our Neo4j connection and a helpful run_cypher() function:

from neo4j import GraphDatabase

# Create a Neo4j driver (uri, user, password, and db hold your connection details)
driver = GraphDatabase.driver(uri,
                              auth=(user, password),
                              database=db)

# Helper to run a Cypher statement and optionally return its results
def run_cypher(cypher, results=False):
    with driver.session() as session:
        r = session.run(cypher).data()
    if results:
        return r

Loading the Bones

# Create one Bone node per bone in our accepted list
for bone in human_bones:
    cypher = f"""
    MERGE (b:Bone {{name: '{bone}'}})
    """
    run_cypher(cypher)

Loading the Muscles

# Create one Muscle node per row of the lower limb DataFrame
for i in range(lower_limb_df.shape[0]):
    cypher = f"""
    MERGE (m:Muscle {{name: '{lower_limb_df.iloc[i].Muscle}'}})
    """
    run_cypher(cypher)

Loading the Origin and Insertion Connections

# For each muscle, connect it to the bones identified as its origin and insertion
for i in range(lower_limb_df.shape[0]):

    for bone in lower_limb_df.iloc[i].Origin_Bones:
        cypher = f"""
                MATCH (m:Muscle {{name:'{lower_limb_df.iloc[i].Muscle}'}})
                MATCH (b:Bone {{name: '{bone}'}})
                MERGE (m)-[r:HAS_ORIGIN]->(b)
                RETURN COUNT(m) as muscle_count, COUNT(b) as bone_count
                """
        run_cypher(cypher)

    for bone in lower_limb_df.iloc[i].Insertion_Bones:
        cypher = f"""
                MATCH (m:Muscle {{name:'{lower_limb_df.iloc[i].Muscle}'}})
                MATCH (b:Bone {{name: '{bone}'}})
                MERGE (m)-[r:HAS_INSERTION]->(b)
                RETURN COUNT(m) as muscle_count, COUNT(b) as bone_count
                """
        run_cypher(cypher)
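
Before moving on, a quick sanity check can confirm that the relationships actually landed. This is just a sketch that reuses the run_cypher() helper from above:

# Count the loaded relationships by type (exact counts will depend on your data)
summary = run_cypher("""
    MATCH (m:Muscle)-[r]->(b:Bone)
    RETURN type(r) AS relationship, count(r) AS total
    """, results=True)
print(summary)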

With this loaded, let’s take a look at our newly constructed graph and examine our results.

Examining the Final Graph

Finally, let’s load up Neo4j and take a look at the graph we constructed from this data. We encourage you to take a closer look at the detail for yourself.

The constructed lower limb muscle and bone graph in Neo4j

Having realized our hypothesis, we can clearly see the layout of the human leg. Reading from left to right, we start with several of the small bones of the lower back and hips, which almost all connect to the femur. We then see attachments from the mid-leg (patella, tibia, and fibula) to the many small muscles and bones of the foot.

We can also note a few locations where we were unable to find both the origin and the insertion for a given muscle. This could be addressed in a number of ways, which we’ll explore in future blogs.
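
One quick way to surface those gaps is a query like the one below, again reusing the run_cypher() helper; it is a sketch rather than part of the original pipeline.

# List muscles that are missing either an origin or an insertion relationship
incomplete = run_cypher("""
    MATCH (m:Muscle)
    WHERE NOT (m)-[:HAS_ORIGIN]->(:Bone)
       OR NOT (m)-[:HAS_INSERTION]->(:Bone)
    RETURN m.name AS muscle
    """, results=True)
print(incomplete)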

Named Entity Recognition LLM – Conclusion

Using an LLM for NER is clearly a powerful new way to meet this significant requirement in so many semi-structured and unstructured datasets. With this example working well, we will next look at ways of automating our NER LLM calls and expanding the graph to cover the entire human body. Stay tuned for more updates as we explore the world of human anatomy through graphs.

Graphable helps you make sense of your data by delivering expert data analytics consulting, data engineering, custom dev and applied data science services.

We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we have deep expertise in Financial Services, Life Sciences, Security/Intelligence, Transportation/Logistics, HighTech, and many others.

Thriving in the most challenging data integration and data science contexts, Graphable drives your analytics, data engineering, custom dev and applied data science success. Contact us to learn more about how we can help, or book a demo today.
