What is Cypher? A Quick Neo4j Cypher Introduction (With Examples)

By Sam Chalvet / Sr Consultant

August 26, 2022

Blog

Reading Time: 5 minutes

Understanding a new database and accompanying syntax is usually a challenge, and it is no different in the case of Neo4j Cypher. That being said, Graphable has created this overview and Neo4j Cypher introduction, to make the journey that much easier.

What is Neo4j Cypher?

Neo4j Cypher is the Neo4j query language that makes accessing Neo4j much easier through declarative statements, very similar to the SQL language. First created for Neo4j, and now based on the open-sourced openCypher movement that Neo4j started to standardise Graph Query Language, it uses what is called ASCII art syntax to query data in a graph database.

Do You Already Know SQL? See How Neo4j Cypher is Similar and Different

As a method to query data from a graph database, Neo4j Cypher query behaviour shares many similarities with SQL and is classified like SQL as a declarative, text-based, structured query language. Because both consist of keywords, clauses, and expressions including functions and predicates, much of the syntax will feel familiar (e.g. p.ProductId = 3365, WHERE, ORDER BY, LIMIT, AND etc). However, because Neo4j Cypher is about expressing graph patterns, there are also some important differences that are primarily driven by the fact that Neo4j is a schemaless database.

Structured Database to Neo4j Cypher Schemaless
Schemaless databases make query results verification even more important

Some of the most critical differences resulting from the schemaless database paradigm can be classified in three main groups outlined below:


1. Neo4j Cypher queries can reference things that don’t even exist. While most query languages will throw an error if you try to reference a table or a column that doesn’t exist in the schema, Neo4j Cypher does not. This can be disconcerting due to added difficulty in debugging results. In SQL if no results are returned it is because there is no data that fits your query, but in Neo4j Cypher, it could also be due to misspelled (or missing) Node Labels, Relationship Types, or Property. This is because Neo4j Cypher is trying to match patterns, if the pattern doesn’t exist, then nothing is returned.

SQL:

SELECT alias.Property_a, alias.Property_b

FROM Persssson as alias -- misspelled Table Name
LIMIT 1;
>> ERROR Unknown table 'Persssson'

Cypher:

MATCH (alias:Persssson) // misspelled Label
RETURN alias.Property_a, alias.Property_b
LIMIT 1;
>> NULL, NULL

2. The syntax for accessing properties is the same for non-existent properties as it is for properties with no values. Here is a Neo4j Cypher example showing this different behaviour:

MATCH (alias:Person)
RETURN alias.unset_property, alias.set_property, alias.fake_property
LIMIT 1;
>> NULL, "prop value", NULL

3. The property Types stored on Nodes are entirely ad-hoc.

For example, several Nodes with the Label ‘Person’ can have completely different properties of different data types. When querying them, a value will be returned if it exists, or a NULL if is not set or doesn’t exist.

MATCH (p:Person) RETURN p.property1, p.age, p.email LIMIT 4;

Returns those three properties for each node, but datatypes may differ, and it could look like this:

Neo4j Cypher properties can contain data of various types.

4. Relationship direction in Cypher accomplishes something completely different.

In SQL when joining two tables, very common joins are LEFT/RIGHT joins which define the result set you are seeking. In Neo4j Cypher relationship directions is all about pattern matching.

The main difference comes from the fact that they are not equivalent.

MATCH (person1:Person)-[:HEARD_OF]->(person2:Person)

Is not the same as

MATCH (person1:Person)<-[:HEARD_OF]-(person2:Person)

Omitting the relationship direction is less efficient, but will return data where either relationship direction exists.

MATCH (person1:Person)-[:HEARD_OF]-(person2:Person)

Neo4j Cypher, a Graph Query Language

Let’s start at the highest level with an overview of the structure of a graph database composed of Nodes and the Relationships between them. Nodes typically have Labels which is used to define their type. Loosely speaking, you can think of a single Node as being one row of data with the Label being the table name. The Cypher query language expresses this with symbols directly incorporated into the language syntax. This makes reading a query easy, even for novice users.

Neo4j Cypher database schema
Step 1: Familiarize yourself with basic syntax

Nodes are surrounded by parentheses and contain the node Label and its alias.

(node_alias:NodeLabel)

Relationships are surrounded by square brackets and contain the relationship Type its alias.

[relationship_alias:RELATIONSHIP_TYPE]

Nodes and Relationships are linked together with dashes and a greater/lesser than symbol to indicate the direction of the relationship.

(person1:Person)-[rel1:HEARD_OF]->(person2:Person)

Information can be stored on Nodes and Relationships in the form of Properties, which are key-value pairs. These can be set in a few different ways.

On creation of the Node/Relationship using curly braces and colons:

CREATE (person1:Person {property1: “a string”, property2: 123});

CREATE (person1:Person)-[rel:HEARD_OF {property1: DATE()}]->(person2:Person);

CREATE (person2:Person {property1:[“element1”, “element2”]});

As expressed in these Neo4j cypher examples, properties can be many different data types such as dates, strings, numbers, and even arrays.

Properties can also be set with the SET command using dot notation after Node or Relationship creation:

MATCH (person1:Person) SET person1.property1 = "a different string";

Step 2: Understand the core Neo4hj Cypher keyword commands

As in any query language, there are many commands in the form of reserved keywords and keyword combinations used to perform different actions. Some of the common ones in Cypher are MATCH, CREATE, WHERE, SET, RETURN, WITH which represent the most used ones. You may find this Neo4j cypher cheat sheet helpful.

Step 3: Digest the most fundamental Neo4j Cypher syntax best practices

In the context of Neo4j, the interpreter itself doesn’t care how you write out your query so long as you follow the proper patterns such as surrounding Nodes with ( ) and Relationships with [ ], etc.

However, there are some syntax best practices that are highly recommended by the community, which help with consistency and makes reading the query easier.

  • Node Labels are in PascalCase
  • Relationship Types are in MACRO_CASE

These practices follow Java the convention for Class names (a node) and constants (a relationship).

Conventions for property names and node/relationship alias typically depend on the framework in which you are using the queries. In Python, the usage of snake_case for these is more common whereas in NodeJS camelCase would be the norm.

Step 4: Understand Neo4j Cypher case-sensitivity

This is simple but important: nodes, relationships, properties, and property values are all case-sensitive, only keyword commands are not.

A Node with a Label like ‘Person’, is of a different class than a Node with a Label like ‘person’. The same goes for Relationships and their Types.

(person_alias:Person) (person_alias:person)

()-[:HEARD_OF]->()    ()-[:heard_of]->()

Though both examples above are accepted by the interpreter, they are not equivalent and refer to two different types of Nodes/Relationships, and node properties follow this same logic.

Why use Cypher and Neo4j?

It is fast. When it comes to answering complex questions that involve joining many traditional tables together, It is fast, like really fast. Queries that would normally require one to generate multiple “views” to help speed up query time become unnecessary. Want to get more technical? Read more about Neo4j Performance Architecture Explained.

It is flexible. Neo4j being schemaless makes rapid development possible. The ability to completely redesign how the data is connected AFTER loading up all the data is incredibly helpful. You can read more about these benefits and the Application-Driven Graph Schema Design best practices.

It is also simple to read and simple to write. You can realistically expect to cut your query line count in half. Those perks go a long when designing and maintaining an application or a script.

This is why it is one of the best go-to tools for data scientists and anyone who ventures into the unknown with a tight schedule on ROI.

Other Neo4j Cypher Resources

https://en.wikipedia.org/wiki/Cypher_(query_language)

https://neo4j.com/developer/cypher/


Graphable delivers insightful graph database (e.g. Neo4j consulting) / machine learning (ml) / natural language processing (nlp) projects as well as graph and Domo consulting for BI/analytics, with measurable impact. We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.

Want to find out more about our Hume consulting on the Hume knowledge graph / insights platform? As the Americas principal reseller, we are happy to connect and tell you more. Book a demo by contacting us here.

Check out our article, What is a Graph Database? for more info.


We are known for operating ethically, communicating well, and delivering on-time. With hundreds of successful projects across most industries, we thrive in the most challenging data integration and data science contexts, driving analytics success.
Contact us for more information: