Graph Database Fraud Detection: A Powerful Weapon for Financial Services

By Sean Robinson, MS / Director of Data Science

September 1, 2021



When it comes to detecting costly financial services fraud, there is simply nothing that can compare to graph database fraud detection (e.g. Neo4j fraud detection). Read on for details on how this technology is changing the game for a problem worth trillions of dollars to banks and other financial institutions.

Graph Database Fraud Detection

The ICIJ found that leaked FinCEN documents “…identify more than $2 trillion in transactions between 1999 and 2017 that were flagged by financial institutions’ internal compliance officers as possible money laundering or other criminal activity — including $514 billion at JPMorgan and $1.3 trillion at Deutsche Bank.” (ICIJ)

With the rapid advancement of technology, criminals have been able to devise increasingly sophisticated fraud schemes to hide from the traditional detection systems of the big banks. However, with the advent of graph databases (for more, read “What is a Graph Database?”), these banks are finding powerful new ways to combat such fraud schemes.

Due to their very different architecture and approach, graph databases can help to identify suspicious patterns that were previously difficult or even impossible to detect with traditional relational database systems (RDBMS). These suspicious patterns of fraudulent activity can be detected much more easily and effectively to help financial institutions to stop fraud in its tracks before critical financial damage can be done.

Graph Databases vs RDBMS

Both RDBMS and graph databases can support financial fraud analysis in fraud detection centers around the globe. What sets graph databases apart is that they capture the connectedness within the data by nature. As a result, they better facilitate analysis, at scale, of countless customer data points together with the relationships between those customers and the nature of those relationships.

RDBMS struggle to account for the many relationships between customers and related entities because they must rely on complex SQL joins that often require vast system resources. Banks are increasingly realizing that a graph database fraud detection approach offers more effective analysis without compromising performance or over-utilizing resources.

RDBMS-based fraud systems struggle with more sophisticated fraud detection because important information such as transactions, customer data, and account data is often normalized and stored in separate tables. Connecting this data with SQL requires a combination of complex joins and in-depth knowledge of the existing data model, which often asks too much of SQL given the resource cost.

Storing application and transaction data is where RDBMS shine, but detecting serious fraud at scale requires bringing a multitude of data sources together seamlessly, with their relationships intact, to create a more holistic picture of suspicious activity. As the number of data sources and the depth of connections increase, SQL queries become more computationally expensive and time-consuming, limiting the scalability and effectiveness of fraud detection solutions built on traditional RDBMS.

Graph databases are especially effective in financial crime use cases and for fraud detection graph analysis because of their ability to follow complex chains of transactions. These series of connections require many “hops” or traversals to identify and return the relevant set of relationships, and this is where traditional RDBMS struggle. To achieve this type of traversal query in SQL, many recursive inner joins are required, which becomes computationally unrealistic at the many hops usually required to analyze fraud.

This means that, in the case of financial fraud, the query must capture not only the core account data but also all the accounts connected to it, the accounts connected to those accounts, and so on. With purpose-built graph query languages for just such database interrogations (e.g. Cypher, Gremlin, etc.), graph databases offer a unique advantage over traditional SQL-based solutions in a financial crimes context, due to the naturally interconnected nature of the domain and data.
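To make the idea of "hops" concrete, here is a minimal Python sketch of a bounded traversal over an in-memory account graph. It is not how any particular graph database is implemented, and the account IDs, the TRANSFERS list, and the function names are all hypothetical. It only illustrates that, once connections are indexed per account, each extra hop is another lookup rather than another join.

```python
from collections import deque

# Hypothetical transfer edges (sender, receiver); account IDs are made up.
TRANSFERS = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("X", "Y"),
]


def build_adjacency(edges):
    """Index edges by account so each hop is a dictionary lookup."""
    adjacency = {}
    for sender, receiver in edges:
        adjacency.setdefault(sender, set()).add(receiver)
        adjacency.setdefault(receiver, set()).add(sender)
    return adjacency


def accounts_within_hops(adjacency, start, max_hops):
    """Breadth-first search returning {account: hop distance} up to max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        account = queue.popleft()
        if distances[account] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(account, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[account] + 1
                queue.append(neighbor)
    return distances


adjacency = build_adjacency(TRANSFERS)
reachable = accounts_within_hops(adjacency, "A", 2)
print(sorted(reachable.items()))
```

With this sample data, a two-hop search from account A reaches B, C, and D but stops short of E, which sits three hops away. In a graph query language such as Cypher, the same question would typically be expressed as a variable-length path pattern rather than hand-written traversal code.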

Examples of Money Laundering

Currently, most fraud systems attempting to combat money laundering are built on traditional RDBMS databases, which store data in columns and rows across tables, much like Excel with rows and columns across tabs. This type of architecture is not by nature designed for identifying the complex relationships which must be analyzed when detecting money laundering, whether that be in the case of risk evaluation or money trails.

In a typical money laundering scheme, criminals divide the sum of money they want to launder into dozens of different transactions, using many different bank accounts and identities. This is only the first of the stages of money laundering. They then divide the money up again into additional transactions to send to other intermediary accounts (referred to as pooling accounts), where the many small transactions are aggregated into a pool of funds.

This process is then repeated some number of times, and each repetition adds another layer of complexity that must be untangled in order to uncover the fraud, completing the various stages of money laundering. By the end, many transactions have likely been completed, making it much more difficult to pinpoint the original source. However, by storing and mapping this activity in a graph database, it becomes much more achievable to show the money traveling from one source to many intermediary accounts, and ultimately back to one main account again, revealing a trail of transactions between the two parties.

The image below shows the general shape of such a money laundering scheme, as stored and viewed in a graph database:

Graph Database Fraud Detection: Stages of money laundering

In traditional SQL, the ability to uncover this fraud is often not practically possible due to the sheer number of recursive inner joins that would be needed to recreate this interconnected structure, as well as the associated resource cost to return results at scale in any usable timeframe.
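As a rough illustration of following such a trail in a graph structure, the sketch below enumerates every path from an originating account to a final account through intermediary and pooling accounts. The account names, the TRANSFERS list, and the max_hops cutoff are hypothetical, and a production system would run an equivalent path query inside the graph database rather than in application code.

```python
# Hypothetical directed transfers (sender, receiver); all names are made up.
TRANSFERS = [
    ("origin", "mule_1"), ("origin", "mule_2"), ("origin", "mule_3"),
    ("mule_1", "pool_1"), ("mule_2", "pool_1"), ("mule_3", "pool_2"),
    ("pool_1", "final"), ("pool_2", "final"),
    ("origin", "grocer"),  # ordinary payment that leads nowhere
]


def build_outgoing(transfers):
    """Map each sender to the list of accounts it sent money to."""
    outgoing = {}
    for sender, receiver in transfers:
        outgoing.setdefault(sender, []).append(receiver)
    return outgoing


def find_trails(outgoing, source, target, max_hops):
    """Depth-first search for every simple path from source to target
    that uses at most max_hops transfers."""
    trails = []

    def walk(account, path):
        if account == target:
            trails.append(path)
            return
        if len(path) - 1 == max_hops:
            return  # hop budget used up without reaching the target
        for receiver in outgoing.get(account, []):
            if receiver not in path:  # never revisit an account on one path
                walk(receiver, path + [receiver])

    walk(source, [source])
    return trails


outgoing = build_outgoing(TRANSFERS)
trails = find_trails(outgoing, "origin", "final", max_hops=4)
for trail in trails:
    print(" -> ".join(trail))
```

In this sample data the search surfaces three trails, each passing through one intermediary account and one pooling account, while the unrelated payment to the grocer is ignored because it never reaches the final account.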

First Party Fraud with Credit Cards Example

Another serious problem for financial institutions is first party fraud with credit cards, or financial transaction card fraud. This is where bad actors use stolen, fake, or manipulated identifying data such as home addresses, social security numbers, email addresses, and phone numbers to create a “synthetic” identity and apply for several credit cards. Typically, they then use the credit cards in a normal way, making the payments on time to increase the credit line.

Once the limit is high enough, the actors max out the credit cards without paying them off. Synthetic identities make it difficult to track who originally created them, and the balance ultimately becomes uncollectible debt that the bank usually writes off. According to the global card and mobile payment experts at The Nilson Report, this was estimated to cost financial institutions $30 to $35 billion or more worldwide by 2020 and more than $40 billion by 2025 to 2030.

The challenge is using credit card fraud detection to find these fraud rings (another kind of fraud detection graph analysis) before real damage has been done. When fraud analysts look at these rings of financial transaction card fraud through the lens of a graph database, using graph fraud detection to find shared identifiers such as social security numbers and addresses, they have a much better opportunity to identify the bad actors and the associated fraud rings.

It is helpful to note that graph databases are advantageous for credit card fraud for a different reason than for money laundering, where the focus is on following trails or chains of transactions split across many accounts. In credit card fraud, because of the synthetic identities, the emphasis is instead on the interconnected shared identifiers.

In credit card fraud detection, a network theory technique called link analysis is typically applied to the graph to investigate how nodes are connected by edges. The unique ability to build relationships between interconnected data elements such as SSNs, physical addresses, and phone numbers, and even trails of transactions between accounts, is what enables graph databases and graph fraud detection in general to excel within a financial crimes context.

Below is an example of a simple fraud ring where two actors use their real street address, stolen social security numbers, and burner phones to create and use four synthetic identities with fake names to open the accounts:

First Party Fraud with Credit Cards in Financial Services

In the example above the fake identities opened 14 credit cards. Assuming all the credit cards average out to a $6,000 limit, that is a potential $84,000 loss for any given bank, and this would be an example of an extremely small fraud ring in today’s context.
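The link analysis idea can be sketched in a few lines of Python: identities that share any identifier are merged into the same group, and the group's total exposure is then added up. Every name, address, SSN placeholder, phone placeholder, and per-identity card count below is invented for illustration (the counts are chosen only so that they total the 14 cards mentioned above), and the union-find grouping is one simple way to find connected identities, not the method used by any specific product.

```python
from collections import defaultdict

# Hypothetical applications; all identifiers and card counts are invented.
APPLICATIONS = {
    "identity_1": {"address": "12 Oak St", "ssn": "SSN-1", "phone": "PHONE-1", "cards": 4},
    "identity_2": {"address": "12 Oak St", "ssn": "SSN-2", "phone": "PHONE-2", "cards": 3},
    "identity_3": {"address": "98 Elm Ave", "ssn": "SSN-2", "phone": "PHONE-1", "cards": 4},
    "identity_4": {"address": "98 Elm Ave", "ssn": "SSN-1", "phone": "PHONE-2", "cards": 3},
    "customer_5": {"address": "5 Pine Rd", "ssn": "SSN-3", "phone": "PHONE-3", "cards": 1},
}
AVERAGE_LIMIT = 6000  # average credit limit assumed in the example above


def find_rings(applications):
    """Group identities that share an address, SSN, or phone (union-find)."""
    parent = {name: name for name in applications}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    first_seen = {}  # (field, value) -> first identity that used it
    for name, record in applications.items():
        for field in ("address", "ssn", "phone"):
            key = (field, record[field])
            if key in first_seen:
                parent[find(name)] = find(first_seen[key])
            else:
                first_seen[key] = name

    groups = defaultdict(list)
    for name in applications:
        groups[find(name)].append(name)
    # Only groups with more than one identity are candidate rings.
    return [sorted(members) for members in groups.values() if len(members) > 1]


rings = find_rings(APPLICATIONS)
ring = rings[0]
exposure = sum(APPLICATIONS[name]["cards"] for name in ring) * AVERAGE_LIMIT
print(ring, exposure)
```

With this sample data the four synthetic identities fall into a single ring, the unrelated customer is left out, and the exposure works out to 14 cards × $6,000 = $84,000, matching the figure above.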

For a traditional SQL query to detect this type of fraud at scale, as discussed before, a complex and ill-suited series of joins and self-joins would be required, and such a query is functionally limited in the number of traversals it can perform and return. Using a native graph database, and a fraud detection knowledge graph in particular, for graph analysis fraud detection is the ideal way to pursue these fraud rings. It leverages the connected nature of graphs to traverse the connections within the data and uncover relationships, detecting illegal activity much more effectively.

Graph Database Fraud Detection: Conclusion

Above are just two examples of the many financial crime typologies. In the case of money laundering, criminals obfuscate the trail by breaking the money up into smaller transactions, making it difficult for traditional detection systems to follow the trails of transactions. In first party fraud with credit cards, fraud rings combine fraudulent identifiers into synthetic identities, which is an effective way to hide the fraud from more traditional fraud detection systems.

This costs banks billions every year. In more modern fraud detection centers, financial institutions are now often creating a fraud detection knowledge graph that leverages the connected nature of graph data models (whether with Neo4j fraud detection or any other option). The very nature of graph databases makes them uniquely suited for graph database fraud detection in these and many other financial services fraud contexts.

For more on related graph algorithms, check out the articles on Betweenness Centrality, Closeness Centrality and Graph Traversal Algorithms.

Graphable delivers insightful graph database (e.g. Neo4j consulting), machine learning (ML), and natural language processing (NLP) projects, as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
