Why Bloomberg Terminal and graph database? Our Neo4j consulting and graph data science experts look for novel, valuable ways to apply graphs in the real world. In this first example, we look at how much more valuable the Bloomberg Terminal could be with a graph database at its core.
What if the Bloomberg Terminal Used a Graph Database?
For anyone not familiar with it, the Bloomberg Terminal has been a cornerstone of the financial industry since the 1980s, helping users to obtain the information they need in time to make critical decisions and to use the data to interact with their clients. It is a physical console that sits atop the desks of analysts, portfolio managers, and executives, and is considered a critical tool even to this day.
What sets the Bloomberg Terminal apart is not only the volume and breadth of data in the system but also the way in which the end user can choose to consume and analyze that data. The data ranges from identifiers for a name traded on an international exchange to the location of oil tankers at sea across the world.
Under the covers, the Bloomberg Terminal is completely proprietary, so we are hypothesizing how a graph database would fit, especially given the complex and multidimensional nature of the data and the need to traverse the many entities quickly and efficiently, often in unexpected ways. Below we imagine how graph databases could support the various use cases Bloomberg mentions in a 60 Minutes interview (starts at 4:46), including searching for information, using it, and even presenting it effectively.
The goal of this example is to do some what-if thinking, using our imaginations and technical expertise to explore different aspects of the Terminal's use in the context of graph. This is not a far-fetched use case: Gideon Mann, Bloomberg’s Head of Data Science, has been trying to understand how graph will fit in since as far back as 2017.
Imagining the Bloomberg Terminal Graph Schema
Before diving into the improved questions one could ask of the Bloomberg Terminal data by using graph, we must first imagine the potential graph schema, based on what we do know of the data. For example, in the schema below, the company sits at the center, with connections to entities including but not limited to suppliers, board members, and sector.
Asking Connected Questions of The Bloomberg Terminal with Graph
Graph databases excel at helping users unlock the connected nature of, and insight within, their data. To illustrate how we would ask a question of a graph database using Neo4j’s Cypher query language, here is a basic example of a query on the above schema to find information on the company Anheuser-Busch InBev (BUD), including exchange, sector, and last price:
MATCH (c:Company)
WHERE c.name CONTAINS 'Anheuser-Busch'
RETURN c.name AS CompanyName, c.ticker AS Symbol, c.exchange AS Exchange, c.sector AS Sector, c.last_price AS LastPrice
A slightly more complex connected data query would be to find all companies that share the same industry and at least one supplier with BUD. In SQL, this would take at least a couple of comparatively expensive subqueries: first looking up BUD’s industry and its suppliers, then all names in that industry and their suppliers, and finally checking whether those suppliers are the same as any of BUD’s.
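To make the relational version concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything about the layout is an assumption for illustration: the table names, the column names, and the toy rows ("Company X", "Supplier A", and so on) are invented and do not reflect how Bloomberg actually stores its data.

```python
import sqlite3

# Hypothetical relational layout; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industry (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT,
                       industry_id INTEGER REFERENCES industry(id));
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE supplies (supplier_id INTEGER REFERENCES supplier(id),
                       company_id  INTEGER REFERENCES company(id));

-- Toy rows, invented for this sketch.
INSERT INTO industry VALUES (1, 'Beverages'), (2, 'Machinery');
INSERT INTO company  VALUES (1, 'Anheuser-Busch InBev', 1),
                            (2, 'Company X', 1),
                            (3, 'Company Y', 2);
INSERT INTO supplier VALUES (1, 'Supplier A'), (2, 'Supplier B');
INSERT INTO supplies VALUES (1, 1), (2, 1), (1, 2), (2, 3);
""")

# Two subqueries: one for BUD's industry, one for BUD's suppliers.
SHARED_SQL = """
SELECT c.name, s.name
FROM company c
JOIN supplies sp ON sp.company_id = c.id
JOIN supplier s  ON s.id = sp.supplier_id
WHERE c.industry_id IN (SELECT industry_id FROM company
                        WHERE name LIKE '%Anheuser-Busch%')
  AND sp.supplier_id IN (SELECT sp2.supplier_id
                         FROM supplies sp2
                         JOIN company c2 ON c2.id = sp2.company_id
                         WHERE c2.name LIKE '%Anheuser-Busch%')
  AND c.name NOT LIKE '%Anheuser-Busch%'
ORDER BY c.name, s.name
"""

shared_rows = conn.execute(SHARED_SQL).fetchall()
print(shared_rows)
```

Even in this small form, the reader has to hold three joins and two subqueries in mind to see which rows survive, which is the overhead the graph version aims to avoid.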
As shown below, in Cypher the query is much simpler: it identifies BUD’s industry, then all of BUD’s suppliers, and finally matches those suppliers and that industry to other companies in the schema, with no subqueries. Note that the RETURN clause combines company and supplier, since two companies could share more than one supplier. If you only want the list of companies, you could instead RETURN DISTINCT c.name.
// Find BUD and its industry
MATCH (c1:Company)-[:IN_INDUSTRY]-(i:Industry)
WHERE c1.name CONTAINS 'Anheuser-Busch'
// Find BUD's suppliers
MATCH (s:Supplier)-[:SUPPLIES]-(c1)
// Find other companies in the same industry that use any of those suppliers
MATCH (s)-[:SUPPLIES]-(c:Company)-[:IN_INDUSTRY]-(i)
WHERE c <> c1
RETURN c.name AS CompanyName, s.name AS SupplierName
Efficient querying of complex connected data is a strength of graph databases. For a system at the scale of the Bloomberg Terminal, implementing graph could materially improve its ability to support the more insightful questions users increasingly expect to ask of the data, while potentially reducing the compute power and infrastructure required to answer these kinds of connected data questions.
Graph is the Future of Data Science
As Google, one of the true thought leaders in data science, has pointed out, the future of data science and AI will be built around graph and related network technologies. In part, this is because a graph database is structured, and functions, more like the human brain. If graph is newer to you, find out more in our article What is a graph database?
One of the unique features of the property graph specifically, and one of the reasons this topic is a fit for our series Graph Database Project Ideas, is that we can label and even weight relationships between nodes. If we want to look beyond the schema and at the data itself, graph data science (GDS) has a concept called “projections,” which is similar to the database concept of a view. In this case, it is a view over the stored property graph that surfaces only the data relevant to the combination of queries and algorithms in the analysis, and that may contain aggregated data, as in the example below:
The figure on the left depicts a simple projection of the count of shared suppliers that the four beverage companies have in common. One can glean that Pepsi, Coke, and Anheuser-Busch share a number of suppliers. Coke, however, has a couple of additional suppliers that it shares with Starbucks, making it more diversified and possibly more resilient should an issue arise in its overall supply chain.
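The idea behind such a projection can be sketched in a few lines of plain Python. This is not the Neo4j GDS projection API, only an illustration of the aggregation it performs, and the supplier names and links are invented toy data chosen to mirror the shape of the figure, not real supply-chain information.

```python
from collections import Counter
from itertools import combinations

# Invented toy data: supplier -> companies it supplies (not real supply chains).
SUPPLIES = {
    "Supplier A": {"Pepsi", "Coke", "Anheuser-Busch"},
    "Supplier B": {"Pepsi", "Coke", "Anheuser-Busch"},
    "Supplier C": {"Coke", "Starbucks"},
    "Supplier D": {"Coke", "Starbucks"},
}


def project_shared_suppliers(supplies):
    """Collapse Supplier-Company links into weighted Company-Company edges.

    The weight of each edge is the number of suppliers the two companies share.
    """
    weights = Counter()
    for companies in supplies.values():
        # Every pair of companies served by the same supplier gains one unit of weight.
        for a, b in combinations(sorted(companies), 2):
            weights[(a, b)] += 1
    return dict(weights)


projection = project_shared_suppliers(SUPPLIES)
for (a, b), count in sorted(projection.items()):
    print(f"{a} -- {b}: {count} shared supplier(s)")
```

The supplier nodes disappear from the result entirely; what remains is a smaller company-to-company graph whose edge weights carry the aggregated information.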
The second, more complex projection example on the right could address a question of similarity between a subset of companies. Here the weight on each graph relationship (often referred to as an “edge”) is the calculated similarity score between a pair of the three companies, based on a GDS algorithm such as Cosine Similarity, Node Similarity, Jaccard, or others.
As one might expect, the similarity score between Pepsi and Starbucks is higher than either company’s score with Caterpillar. As we consider the unique insights GDS can drive, we can also see that the similarity between Caterpillar and Starbucks is higher than between Caterpillar and Pepsi, something we may not have anticipated. Thinking back to the larger schema, this could be a result of shared consumers, locations, analysts, exchanges, or indices, among others: information that could be helpful to an investor seeking to understand more deeply the companies they are investing in.
These two examples only scratch the surface of what is possible in deploying a graph database to unleash the power of GDS on the Bloomberg Terminal data.
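For intuition on how a score like this comes about, here is a minimal Jaccard similarity sketch in plain Python. It is not the GDS library's implementation, and the neighbor sets are invented placeholders ("Index 1", "Location 2", and so on) picked only so that the ordering matches the one described above.

```python
from itertools import combinations

# Invented neighbor sets: the entities each company connects to in a toy graph.
NEIGHBORS = {
    "Pepsi": {"Consumers 1", "Location 1", "Index 1", "Exchange 1"},
    "Starbucks": {"Consumers 1", "Location 1", "Index 1", "Exchange 2", "Location 2"},
    "Caterpillar": {"Index 1", "Exchange 2", "Analyst 1", "Location 3"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Score every pair of companies; keys are alphabetically ordered name pairs.
scores = {
    (x, y): jaccard(NEIGHBORS[x], NEIGHBORS[y])
    for x, y in combinations(sorted(NEIGHBORS), 2)
}

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Two companies score higher the more neighbors they have in common relative to all the neighbors either one has, which is why shared indices or exchanges can make seemingly unrelated companies look similar.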
Cypher is More Intelligible than SQL
Sharing one’s findings with a larger audience is a cornerstone of research no matter the industry, and the Bloomberg Terminal has many helpful capabilities in this area. But by leveraging graph, the querying itself can become more transparent and useful to the business.
As defined by Neo4j, Cypher is a declarative SQL-inspired language for describing visual patterns in graphs using ASCII syntax. This means that while it incorporates elements such as selecting fields and taking certain aggregations like SQL, unlike with SQL, an individual with little to no Neo4j or Cypher experience can typically follow the sequence of most queries.
Revisiting our earlier example of finding companies that share an industry and suppliers with Anheuser-Busch, a typical third-normal-form relational application database poses the challenge that its attributes are spread across various tables connected by foreign keys. Industries could be in one place, companies in another, and suppliers in a third, and each of their various attributes could be stored in a multitude of other reference tables. Linking those together in a way that includes all the necessary elements would take a fair amount of database understanding and time.
On the other hand, Neo4j with Cypher provides a far more visually accessible means of thinking about the database schema and querying. Cypher novices and experts alike can see connections between structure and syntax much more easily, whether it be through read-access to the queries, or if the Bloomberg Terminal decided to make a Cypher query window available for advanced questions about the relationships between entities.
Bloomberg Terminal on Graph: Conclusion
In starting this series on graph database project ideas, imagining the Bloomberg Terminal on graph was a fascinating first idea to explore. We focused on a handful of examples looking at nodes and relationships. But there is so much more that could be explored, including how leveraging node properties can be so powerful for analysis.
Imagine incorporating stats such as last price, P/E, or yield on the node to drive further graph data science analyses? Or imagine applying algorithms to compute risk, and then being able to count hops between companies in order to look at how risky companies may be associated with one’s existing portfolio of company investments?
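As a rough illustration of the hop-counting idea, the sketch below runs a breadth-first search over a tiny undirected graph in plain Python. The node names and connections are entirely hypothetical; in a real deployment a graph database would typically provide shortest-path queries of its own.

```python
from collections import deque

# Invented toy graph stored as undirected adjacency sets.
GRAPH = {
    "Portfolio Co": {"Supplier A"},
    "Supplier A": {"Portfolio Co", "Risky Co"},
    "Risky Co": {"Supplier A", "Board Member B"},
    "Board Member B": {"Risky Co", "Other Co"},
    "Other Co": {"Board Member B"},
    "Isolated Co": set(),
}


def hops(graph, start, goal):
    """Return the minimum number of relationships between two nodes, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no path connects the two nodes


print(hops(GRAPH, "Portfolio Co", "Risky Co"))
print(hops(GRAPH, "Portfolio Co", "Isolated Co"))
```

A small hop count between a holding and a company flagged as risky suggests closer exposure, for example through a shared supplier, while no path at all suggests the two are unrelated in the data at hand.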