In our years delivering Neo4j consulting, we’ve had the good fortune of helping clients across a wide variety of use cases: recommendations, fraud detection, search, regulatory compliance, rings of international illegal asset purchases, and many more. Throughout most of these engagements, if a graph database existed in the architecture at all, we would find Neo4j as an ancillary component handling one specific responsibility: data piped out of a relational/tabular data store, analyzed in Neo4j, and then piped back into a table somewhere to be queried by the application.
Along the way, we began to notice that these “ancillary” pieces of functionality served by graph were, more often than not, actually core pieces of functionality for the application, and that the benefits of a tabular data structure (a rigid schema and easy querying/filtering within a table, among others) were often a hindrance rather than a help to our clients’ evolving applications. With this in mind, we began recommending a new model to our clients with Neo4j at the core, and they’ve quickly realized the benefits. We’ve seen legacy enterprise applications with 5-10 second query times in normal operation drop to milliseconds after replatforming. We have delivered previously impossible SaaS-based fraud analysis leveraging the power of graph; helped dramatically improve regulatory compliance capabilities, data governance, and data provenance at top-10 banks; and much more. Today I’d like to walk you through some of the lessons we’ve learned, and where we think this model will evolve in the future.
In the beginning there were abacuses, used for doing complex calculations that humans couldn’t hold in their heads. A long while later, computers began replacing human computers, performing complex calculations and executing instructions printed on cards. A short while after that, computers began abstracting logic and calculations, and we could produce much more complex and valuable functionality built on the foundations of those abstractions. Along the way, we’ve begun to realize that our data is typically more connected than we originally could have imagined. Users and their messages are not separate lists of information related by a foreign key purely to comply with Third Normal Form (defined in 1971); they are fundamentally interconnected chains of interactions. The same is true across a wide variety of uses. For example, Orders and Order Lines are not separate lists, but a loose hierarchy connected into an ontology of products. Digital events are not disparate data, but a connected and sequential path of travel. Third Normal Form can help you prevent data duplication, but it can’t help you achieve these latter goals. Graphs can.
Put simply: if you are building a new application in 2021, we would strongly recommend you consider having a graph database at its core. Think about your use case and its interconnected nature; you may find that the power of relating your data from one point to another significantly outshines the need for neat lists of information. Here’s why:
One of the primary reasons to choose a graph database over a relational one is flexibility. Applications evolve over time, and relational databases struggle to evolve with them. In our experience, relational models often create “star” players (star-schema dimensional pun completely intended), where normalized relational models require complex transformations into star-schema dimensional models, typically centered on a small handful of tables. Once a complex source relational model is transforming into a complex target dimensional model, simply keeping that transformation functioning can become the core goal of the application, rather than serving its original business purpose. We end up spending more time thinking about the complexity of trying to do anything than about the business possibilities of improving our application. On top of that, because chaining joins together from table to table to table carries such a performance cost, there’s a natural tendency to keep your schema centralized around a few core tables. Graph databases provide increased flexibility, where your schema can evolve and grow over time, because relationships between classes in your schema are a first principle of the data structure, instead of something that requires esoteric normalization rules to safely navigate. Importantly, because traversing relationships in a graph is so performant, there’s no need or temptation to keep everything central, and your database remains far more flexible and resilient.
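To make that evolutionary point concrete, here is a loose sketch in plain Python (not a database): in a property-graph model, introducing a new kind of relationship is just new data, not a schema migration. The node names and verbs below are hypothetical.

```python
# A graph as simple (subject, VERB, object) triples.
rels = [("order-1", "CONTAINS", "line-1")]

# Later, the application grows a returns feature. No ALTER TABLE,
# no new join table -- just a new verb connecting existing nodes:
rels.append(("customer-7", "RETURNED", "line-1"))

verbs = {verb for _, verb, _ in rels}
print(sorted(verbs))  # → ['CONTAINS', 'RETURNED']
```

In a relational schema, that same change would typically mean a migration and a new foreign key or join table; here the model simply grows.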
In addition to this evolutionary flexibility, graph databases enable you to easily represent arbitrary hierarchies, as well as multiple hierarchies over the same core data. For instance, product catalogs, CRMs, and content management systems all use hierarchies to organize their data, but relational systems force you into absolute rules, while graphs can easily handle items that belong to multiple categories or function at multiple levels of the hierarchy. This also creates new flexibility around the age-old problem of managing slowly changing dimensions (e.g. salespeople whose territories change over time).
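A toy sketch of the multiple-hierarchies idea, again in plain Python rather than a database, with hypothetical category names: a node may have more than one parent, which a strict relational tree cannot express without workarounds.

```python
# Parent categories per node; note "running-shoes" sits in TWO
# hierarchies at once (footwear and athletics).
IN_CATEGORY = {
    "running-shoes": ["footwear", "athletics"],
    "footwear": ["apparel"],
    "athletics": ["sports"],
}

def ancestors(node):
    """Walk IN_CATEGORY relationships upward, collecting every
    category the node belongs to across all hierarchies."""
    seen = set()
    stack = [node]
    while stack:
        for parent in IN_CATEGORY.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("running-shoes")))
# → ['apparel', 'athletics', 'footwear', 'sports']
```

In a graph database this traversal is a single variable-length relationship match; the point is that both hierarchies coexist on the same data with no duplication.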
Beyond flexibility, graphs can provide significant speed boosts in many applications. These boosts, while not guaranteed, derive principally from the significantly improved relationship handling central to graph databases, versus the rigid structure of an RDBMS-based application. One of the most often demonstrated use cases for this speed is the real-time recommendation engine, which often uses what is called “collaborative filtering”. Because relationships are so cheap to traverse in a graph database, finding “things read by people who read what you read, but that you haven’t read yet” becomes an almost instantaneous query, and you can quickly move beyond this basic implementation into more advanced recommendations/similarity with cosine similarity, GDS algorithms, and more.
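Here is a minimal collaborative-filtering sketch in plain Python; in a graph database this whole loop would collapse into a single traversal query. The users and articles are made up for illustration.

```python
from collections import Counter

# Who has read what (hypothetical data).
READ = {
    "alice": {"a1", "a2", "a3"},
    "bob":   {"a2", "a3", "a4"},
    "carol": {"a3", "a5"},
}

def recommend(user):
    """Score items read by people who share reads with `user`,
    excluding items the user has already read."""
    mine = READ[user]
    scores = Counter()
    for other, theirs in READ.items():
        if other != user and mine & theirs:   # a co-reader
            for item in theirs - mine:        # things I haven't read
                scores[item] += 1
    return [item for item, _ in scores.most_common()]

print(recommend("alice"))  # bob and carol both overlap with alice
```

For alice, both bob and carol share at least one article with her, so their unread-by-alice items ("a4" and "a5") surface as recommendations, ranked by how many co-readers vouch for them.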
Connecting two pieces of data isn’t always a binary activity. Sometimes two things are loosely related, sometimes they are strongly related, or you may have an ordered set of alternative products for when one is out of stock. All of these can be modeled with relationship properties, and once you’re using them, you can level up your queries with built-in shortest-path algorithms that find not only the shortest path in “hops” (number of relationships) but also in “distance” (the aggregate value of a relationship property). This can be immensely powerful in dependency graphs: we can find all the root dependencies of an endpoint, but we can also realistically model things like load balancers and their destination routing weights, factoring them into our dependency and impact queries for what-if analysis.
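The hops-versus-distance distinction can be sketched with Dijkstra’s algorithm over relationship weights (graph databases ship comparable weighted shortest-path functions built in). The little network below, with its routing weights, is hypothetical.

```python
import heapq

# node -> [(neighbor, weight), ...]; think routing weights on a
# load balancer. The 2-hop path app->lb->db weighs 11, but the
# 3-hop path through web2 weighs only 5.
EDGES = {
    "app":  [("lb", 1)],
    "lb":   [("db", 10), ("web2", 3)],
    "web2": [("db", 1)],
}

def shortest_distance(start, goal):
    """Smallest total edge weight from start to goal, or None."""
    heap = [(0, start)]
    best = {}
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            return dist
        if node in best:
            continue
        best[node] = dist
        for nbr, w in EDGES.get(node, []):
            if nbr not in best:
                heapq.heappush(heap, (dist + w, nbr))
    return None

print(shortest_distance("app", "db"))  # → 5, via web2
```

Counting hops alone would prefer the direct lb→db edge; weighting by the relationship property reveals the cheaper route, which is exactly the kind of nuance what-if impact analysis needs.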
Lastly, but perhaps most importantly, there is the semantic nature of graph schemas. It isn’t a perfect fit for every production-scale application, but in general a graph schema will have objects as the nodes and verbs as the relationships.
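A tiny illustration of that objects-and-verbs convention, using plain Python triples (in Cypher this would read along the lines of (:Person)-[:WROTE]->(:Message)); the labels and data are hypothetical.

```python
# Nodes carry object labels; relationships are named with verbs.
NODES = {
    "p1": {"label": "Person", "name": "Ada"},
    "m1": {"label": "Message", "text": "Hello"},
}
# (subject, VERB, object) -- because the relationship name is a
# verb, the schema reads like a sentence: Person WROTE Message.
RELS = [("p1", "WROTE", "m1")]

def sentences():
    return [
        f'{NODES[s]["label"]} {verb} {NODES[o]["label"]}'
        for s, verb, o in RELS
    ]

print(sentences())  # → ['Person WROTE Message']
```

The payoff is that the schema itself narrates the domain: each relationship type is a readable statement about how two kinds of objects interact.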
Building on that simple objects-and-verbs model, we’ve found through our experience that developing semantic models and rules within your application can drastically decrease development time and, with a little creativity, enable incredibly flexible API design, reducing the coupling between your current schema and your current API (important because, as discussed, applications evolve over time). This involves making rules for naming relationships that may have many targets versus a single target, specific naming conventions for instance nodes, and so on. Beyond the immediate benefit, there’s the added benefit of being self-documenting (NOTE: we also encourage actual documentation), where a small set of rules defines all or most of the connections within your database, making them self-evident so that new developers can join the team without the steep learning curve required to understand a typical relational model.
In short, maybe you aren’t in a position to build your application from scratch at the moment, but you should strongly weigh the benefits of putting a graph at its core against the ongoing costs of increased development time, slow performance, and difficult upgrades typical of relational database models. Rebuilding a core business application may not be an easy decision, but we’ve helped many of our clients, from start-ups to top-10 global financial institutions, successfully navigate the journey, and we have seen them reap the benefits.
In our Graph-centered AppDev webinar series, we’ll take this recommendation a step further and review our full-stack architecture from database to front-end and everything in between.
Graphable delivers insightful graph database (e.g. Neo4j), machine learning (ML), and natural language processing (NLP) projects, as well as graph and Domo analytics with measurable impact. We are known for operating ethically, communicating well, and delivering on time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.