In this overview of graph algorithms we will explore the families of algorithms within graph theory and examine how their mechanisms support a wide variety of use cases across countless domains. As a kind of graph algorithms cheat sheet we also look at a few famous examples, using the most common graph algorithms being used to power some of the most common technologies in the world. Beyond just Neo4j graph algorithms, this article applies to any graph data science context.
Graph Algorithms: Understanding the Family Tree
While there are a wide variety of graph algorithms designed for numerous use cases, all of them can be traced back to just a few basic concepts which are deeply ingrained into the fabric of graph theory. While the complexity of graph algorithms is real, from these basic groups we will be able to understand how they are used to construct the logic for additionally complex algorithms.
The first and most basic of these concepts are random walks. A random walk simply chooses a starting node from which to begin its walk and then randomly traverses the graph for some amount of steps or “hops”. A random walk can be performed on any graph whether it be directed, undirected, weighted, or unweighted. However, while random walks can still be performed on a disconnected graph, it does constrain those walks to the component where the walk starts. As we will see, these random walks can be used to solve a number of problems and are therefore the foundation for most graph algorithms.
Pathfinding & Search Algorithms
Another foundational graph algorithm family are graph shortest path algorithms. As we explored in our article on graph traversal algorithms (a.k.a. pathfinding algorithms), shortest path algorithms typically come in two flavors depending on the nature of the problem and how you want to explore the graph to ultimately find the shortest path.
- Depth First Search, starts by traversing as deeply into the graph as possible before returning to its starting point and pursuing another deep path traversal
- Breadth First Search, keeps its traversals as close to the starting node as possible and only ventures deeper into the graph when it has exhausted all possible paths closest to it
Pathfinding is used in many use cases, perhaps most notably in Google Maps. In the earliest days of GPS, Google Maps used pathfinding on a graph to calculate the fastest route to arrive at a given destination. This is just one of many examples of graphs being used to solve everyday problems for countless people.
Centrality Algorithms can be used to analyze a graph as a whole to understand which nodes within that graph have the greatest impact on the network. However, to measure the influence of a node in the network with an algorithm, we must first define what “impact” on a graph means. This differs from algorithm to algorithm and is a great starting point when trying to decide which centrality algorithm to choose.
- Degree Centrality uses the average degree of a node to measure how much of an impact it is having on the graph
- Closeness Centrality uses the inverse farness distance between a given node and all other nodes to understand how central the node is to the graph
- Betweenness Centrality uses shortest paths to determine which nodes serve as central ‘bridges’ across a graph to identify key bottlenecks within a network
- PageRank uses a set of random walks to measure how influential a given node is to a network. By measuring which nodes are more likely to be visited on a random walk. Note that PageRank addresses the disconnected graph problem which random walks face by occasionally jumping to a random point in the graph rather than making a direct hop. This allows the algorithm to explore even disconnected portions of the graph. Named for Google founder Larry Page, PageRank was developed as the backbone of the Google search engine and allowed it to exceed the performance of all its competitors in the early stages of the internet.
Community Detection Algorithms
Community detection is a common use case for a variety of graphs. Typically it is used in any situation where understanding the distinct groups of nodes within a graph offers some tangible value to the use case. This could be anything from social networks, from fleets of trucks making deliveries to a network of accounts transacting with one another. However, which algorithm you choose to discover these communities will greatly impact how they’re grouped.
- Triangle Count simply uses the principle that three nodes fully connected to one another (like a triangle) is the simplest community dynamic that can exist in a graph. It therefore finds every combination of triangles within a graph to determine how those nodes are grouped together
- Strongly Connected Components and Connected Components (aka Weakly Connected Components) are excellent algorithms for determining the shape of your graph. Both aim to measure how many graphs make up the entirety of the data. While Connected Components simply returns the number of completely disconnected graphs within a set of nodes and edges, Strongly Connected Components returns those subgraphs which are solidly connected by many linkages. Because of this, they are typically used in combination as a form of initial exploratory data analysis when first analyzing graph data
- Louvain Modularity finds communities by comparing clusters of nodes and edges to the average for the network. If a group of nodes are found to be generally greater than what is seen on average in the graph, those nodes can be considered a community.
In this article, as a kind of graph algorithms cheat sheet, we’ve merely scratched the surface of the most common graph algorithms for data science (a.k.a graph algos) which can be used to harness the interconnected power that graph has to offer for data analysis. For example, in future artciels we will also focus more on graph search algorithms among others.
We looked at the most fundamental graph theory algorithms which serve as the building blocks for more complex graph algorithms and examined those complex algorithms that can address a variety of problems for many use cases. This is true whether they be Neo4j graph database algorithms or for any other graph database.
For a deeper dive on some of the algorithms discussed here as well as more advanced graph topics, check out our additional articles on the subject:
Graphable delivers insightful graph database (e.g. Neo4j consulting) / machine learning (ml) / natural language processing (nlp) projects as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
Still learning? Check out a few of our introductory articles to learn more:
- What is a Graph Database?
- What is Neo4j (Graph Database)?
- What Is Domo (Analytics)?
- What is Hume (GraphAware)?
We would also be happy to learn more about your current project and share how we might be able to help. Schedule a consultation with us today. We can discuss Neo4j pricing or Domo pricing, or any other topic. We look forward to speaking with you!