Neo4j recently released the BI Connector, which is a general JDBC driver that can process SQL queries against a Neo4j graph and return meaningful results. This enables users of tools like Tableau, that generate SQL queries, to plug directly into graphs. In this article, we’re going to show how you can query Neo4j directly using SQL rather than cypher.
Test Neo4j Database
For my test data, we’re going to load this cypher snippet into a Neo4j 3.5 series database, which just loads a list of cities and countries worldwide.
If you’re using Neo4j 4.0, you can also use this sample, just remember to change the toInt() in the cypher to toInteger() as that changed in Neo4j 4.0.
Once we have the data loaded in, we can see a simple graph of cities & their countries. Here, we’re looking at all US cities with more than 3 million population.
Connect to Neo4j with the BI Connector
I’m going to be using an app called SQuirrel SQL Client in these screenshots, which can act as a general SQL command shell for any database. But you don’t have to use this, the general approach that we’re using applies to most tools. Neo4j published a knowledge base article as well on how to use another tool called SQL Line in a similar way.
We simply fill out a few settings, with the path to the driver JAR file, the driver class, and so on.
This lets our tool know about the BI Connector in general, and how to connect to Neo4j. Notice the driver class, and also the extra class path, which needs to point to the JAR file that comes packaged with the BI Connector. Take note of the class name argument, this is needed for all JDBC drivers.
Now that the driver is set up, we can create a connection / alias to the machine we want to use, like this:Setting up a connection
Important: notice the StrictlyUseBoltScheme=true part. Because I’m connecting to a local database with Neo4j Desktop, this is how I force the driver to use the bolt:// connection scheme. By default, it will try to use bolt+routing:// or the neo4j:// scheme. So make sure you using the appropriate setting depending on your Neo4j deployment. Use of this setting will work with any Neo4j install, but it is only required for stand-alone instances.
After testing the connection, we’re looking good!
As you can see, once we’ve connected, to the SQL client it looks like this is a single relational database with two schemas named “Node” and “Relationship”. Under the “Node” schema, we have two tables:
Each table has its own set of attributes, which are the properties of the node label in Neo4j.
In the “Relationship” schema, we have a single relationship table. Just by its name, you can guess that the City_IN_Country table represents the (:City)-[:IN]->(:Country) path in Neo4j.
Let’s try an easy one:
SELECT _NodeId_, name, population FROM City WHERE population > 5000000 ORDER BY name ASC LIMIT 10;
This allows us to get a certain selection of fields, for only cities with more than 5 million in population, ordered by their name. Behind the scenes, the BI Connector translates the SQL into the necessary Cypher bits, and fetches data from Neo4j in real time.
We can also do much more complex SQL queries as well. Here is a join that connects three tables (City, City_IN_Country, and Country) via an equijoin in SQL, and places other constraints as well.
This query will show an important pattern you’ll see consistently throughout the BI Connector. Every node gets a _NodeId_ column, which corresponds to the ID of the node in Neo4j. Every relationship table will get _SourceId_ and _TargetId_ columns that allow you to join and navigate from one node to the other. Effectively, relationships become join tables.
Happy graph & table hacking!
- Download the BI Connector — The BI Connector is available at no extra charge for Neo4j Enterprise Edition customers.
- Check out the Neo4j Blog Post on how the technology works!
- Have a look at instructions on using it with Tableau Desktop