As a developer or analyst, you may have heard the phrase “text analytics” without knowing exactly what it is or how to implement it. In this article we’ll explain what text analytics is, walk through some use cases, and show how graph databases can help.
What is text analytics?
So what is text analytics anyway? Text analytics is the discipline that enables organizations to extract valuable meaning from their semi-structured and unstructured data. Typically this is accomplished using machine learning/NLP, statistics, and linguistic techniques to consume what is often very high-volume, freeform data, usually with no pre-existing schema. Text analytics matters deeply today because, whether the source is social posts, internal reports, customer feedback, or even external legislation, many businesses and organizations are finding that detecting trends and patterns within their semi-structured and unstructured textual data is an essential component of their success. In practice, this means consolidating and standardizing freeform text so that the occurrence and co-occurrence of entities like people, things, and even feelings can be derived. The outcomes of these text analytics processes have many benefits for the organization. Here are a handful of example use cases:
- Tracking Research & Development Projects: In organizations where research is needed to test new products or formulations, tracking the reports and their outcomes is often difficult. Especially when the outcomes of each test are subjective, text analytics can be used to quickly search for the results of experiments and see how they relate to one another.
- Intellectual Property Analysis and “Whitespacing”: Organizations with revenue that depends heavily on intellectual property can leverage text analytics to find areas of convergence in the market. At the same time, gaps or “whitespace” in each domain are good signals for potential new ventures. This can be particularly helpful for patent research and detection.
- Consumer / Customer Sentiment: Tracking what customers say about products and services is essential to success. Text analytics breaks down customer reviews to highlight key areas where customers have positive and negative comments.
- Financial Risk Assessment: When an organization receives bad press, whether because it is under investigation, has been indicted, or for less serious issues, text analytics can provide a way to trawl news media for “dishonorable mentions” so that action can be taken in time to make a difference.
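To make the idea of entity occurrence and co-occurrence concrete, here is a minimal sketch in Python. It assumes a small, hypothetical entity vocabulary (`ENTITIES`) and made-up review text; a real pipeline would more likely obtain entities from an NLP extraction step than from a fixed word list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity vocabulary, for illustration only.
ENTITIES = {"battery", "screen", "price", "support"}


def cooccurrence(documents, entities=ENTITIES):
    """Count entity occurrences and same-document entity pairs."""
    occurrences = Counter()
    pairs = Counter()
    for text in documents:
        # Naive tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = {word.strip(".,!?;:").lower() for word in text.split()}
        found = sorted(tokens & entities)
        occurrences.update(found)
        # Each unordered pair of entities found in one document co-occurs once.
        pairs.update(combinations(found, 2))
    return occurrences, pairs


# Made-up sample reviews.
reviews = [
    "The battery is great, but the screen scratches easily.",
    "Great price, and the battery lasts all day!",
    "Support was slow and the price went up.",
]
occurrences, pairs = cooccurrence(reviews)
```

Pairs that co-occur often are natural candidates for relationships between entity nodes once the data is loaded into a graph.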
Text Analytics, NLP and the Problem of Changing Languages
Conventional wisdom in the data science world would point us to Natural Language Processing (NLP), using a process of extracting “named entities” such as people, places, or products. But classic NLP has its own limitations and constraints. In particular, in situations where the language used is highly contextual and differs from the common vernacular, standard NLP models can return unexpected or even incorrect results. This is often framed in terms of the “No Free Lunch” theorem: every model must trade off specificity and generality. Simply put, models that predict specific outcomes accurately tend not to generalize to broad situations, and, vice versa, models that work well across a broad range of scenarios tend to be less accurate within very specific conditions. As a simple example, in a general sense the word “dolphin” may be most closely related to the word “fish,” since dolphins eat fish and the mahi-mahi is also known as the “dolphin fish.” A more critical nuance in meaning, for example that dolphins are mammals and are related to orcas, could be a more difficult relationship to derive.
Another issue central to text analytics is the dynamic way humans actually use words. New terms and phrases are introduced all the time, whether in slang or in science, and especially where abbreviations, acronyms, or contractions are concerned. As a simple example from legal codes, something that is legal today might become illegal tomorrow, forcing a change in the NLP model and process.
What this means is that a static NLP model is like a car that starts losing its value as soon as it leaves the dealership. To stay accurate, it needs continuous performance tracking, and retraining once its accuracy has degraded, whether because of specificity requirements, changes in the way phrases are used, or the creation of new terms and phrases. Simply put, NLP alone cannot accommodate large-scale transitions in language use or contextual differences. Utilizing a graph database (e.g. Neo4j) can radically improve our ability to address these text analytics/NLP issues and more.
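As a rough illustration of continuous performance tracking, the sketch below keeps a rolling window of labeled predictions and flags when accuracy falls below a threshold. The class name `AccuracyMonitor`, the window size, and the threshold are all assumptions made for this example, not part of any particular NLP library.

```python
from collections import deque


class AccuracyMonitor:
    """Track rolling accuracy over the most recent labeled predictions."""

    def __init__(self, window=100, threshold=0.85):
        # Each entry is True (correct) or False (incorrect); old entries drop off.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched its verified label."""
        self.results.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the window, or None if nothing is recorded."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        """Flag retraining only once the window is full and accuracy is low."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.threshold


# Made-up entity-type predictions paired with verified labels.
monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [
    ("ORG", "ORG"),
    ("PERSON", "PERSON"),
    ("ORG", "PRODUCT"),
    ("ORG", "PERSON"),
]:
    monitor.record(predicted, actual)
```

Waiting for a full window before flagging is a design choice that avoids triggering retraining on just a handful of early mistakes.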
Using Graph Databases to Create Dynamic Taxonomies / Ontologies
Languages change, so the question becomes: what can we do about it? Knowing that words and phrases evolve over time means that we have to build multidimensional stores that combine context and time. Essentially, each entity in the graph can be stored as an “instance,” or point-in-time version, that is also contextualized. For example, the term “pop” within the context of music represents a form of music, but its most closely related terms should change from the 1980s to the 1990s and beyond. The term “pop” can also be a drink in certain regions and, in other contexts, a mini-explosion or something that has been gaining public interest. By using a graph database, the varying meanings can be connected through context with more specific adjacent nodes.
A graph database that utilizes this dynamic approach to context and time enables the creation of a network of terms with likelihoods of attachment or “affinity” to different contexts and even varying time windows. As an example, the graph database can surface links where it is known that “Term A” within “Context B” has “Meaning C” and used to have “Meaning D” in that same context. This is in some ways also a form of “taxonomical or ontological version control” that keeps terms from becoming either too specific or too generic. Dynamic graphs provide traceability of the history of terms while also driving understanding of the contexts in which they are being used and how many different ways a term can be interpreted.
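The “Term A / Context B” example can be sketched in plain Python, with each relationship carrying a context and a validity window. This is only a stand-in for what would be nodes and relationships with properties in a graph database; the `MeaningEdge` structure, the helper functions, and the years are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MeaningEdge:
    """One (term) -> (meaning) relationship, scoped by context and time."""
    term: str
    context: str
    meaning: str
    valid_from: int            # first year the meaning applies
    valid_to: Optional[int]    # last year it applies; None means still current


# Illustrative data only; the years are made up.
EDGES = [
    MeaningEdge("Term A", "Context B", "Meaning D", 1980, 1989),
    MeaningEdge("Term A", "Context B", "Meaning C", 1990, None),
    MeaningEdge("Term A", "Context E", "Meaning F", 1980, None),
]


def meaning_at(term, context, year, edges=EDGES):
    """Return the meaning of a term in a context at a given year, if known."""
    for edge in edges:
        in_window = edge.valid_from <= year and (
            edge.valid_to is None or year <= edge.valid_to
        )
        if edge.term == term and edge.context == context and in_window:
            return edge.meaning
    return None


def history(term, context, edges=EDGES):
    """Return the ordered version history of a term within one context."""
    matches = [e for e in edges if e.term == term and e.context == context]
    matches.sort(key=lambda e: e.valid_from)
    return [(e.valid_from, e.valid_to, e.meaning) for e in matches]
```

Here `meaning_at` answers “what did this term mean then, in this context?”, while `history` provides the traceability described above.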
Text Analytics Scaling and Performance with Graph Databases
One of the distinctive aspects of text analytics using NLP in the context of graph is the use of graph embeddings, which are stored as vectors. These vectors are simply numeric representations of the meaning encoded in a node of the graph. Because the comparison is between numbers instead of strings, algorithms like cosine similarity can quickly find connections between encoded meanings, at massive scale while still ensuring performance. As one example, an organization could use this method to encapsulate positive, neutral, and negative sentiment in graph database nodes, drawing from massive stores of social media and other public content, and automating the process of understanding issues and alerting stakeholders before they spiral out of control.
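For readers who have not used it, cosine similarity can be sketched in a few lines of Python. The three-dimensional vectors below are made up for illustration; real embeddings typically have many more dimensions and come from a trained model, and the `most_similar` helper is a hypothetical name used only here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Made-up embeddings keyed by the term each node represents.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "terrible": [-0.7, 0.3, 0.1],
}


def most_similar(term, embeddings=EMBEDDINGS):
    """Return the other term whose vector is closest to the given term's."""
    query = embeddings[term]
    scores = {
        other: cosine_similarity(query, vector)
        for other, vector in embeddings.items()
        if other != term
    }
    return max(scores, key=scores.get)
```

With these sample vectors, “great” scores closer to “excellent” than to “terrible”, which is the kind of numeric comparison that stands in for comparing strings.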
So when we ask the question, “What is text analytics?”, it becomes evident that it is more than a single, simple technology or technique, and that effectively combining the right approaches is critical in today’s context. While traditional forms of text analytics and associated NLP techniques have provided general insights into simpler problems in the past, today’s ever-changing industry landscape and exponentially growing data demand more context and detail from text analytics, while still requiring it to scale and perform. The natural interconnectedness of the graph database provides an intuitive way to capture that context, encode meaning in a scalable way, and draw actionable insights using graph analytics at today’s big data volumes. These contextual insights, together with a scalable approach, provide improved answers to the specific and important questions organizations want to tease out of their valuable semi-structured and unstructured textual data.
If you are looking for specific graph NLP-related topics, check out our text to graph machine learning blog article.