Neo4j users often need to monitor the internal health of the system. Halin was built to help them do that: it presents a “Tasks” screen for each member of a Neo4j cluster that shows what’s running at any given moment.
![Screenshot of tasks in Halin](upload://haowWzS40kTZrbgM1aauHVm24H3.png)
Often users want to know “what’s going on with my query” — but the query isn’t the only thing that’s going on under the hood. To understand that, we have to look at two other system objects: Transactions and Connections. In this post, we’ll look at Transactions, Connections, and Queries, and how they all hang together in a Neo4j system.
The query is the easiest part to start with: it’s a bit of Cypher you send to the database to answer a question or change the graph.
A transaction is a sequence of operations applied to a database that either succeeds or fails atomically, as a whole. The purpose of transactions is to create little “bubbles” of computation so the user knows that either all of it worked, or none of it did. Suppose you need to do three things:
- Create an employee named Peter
- Create a job called “Software Engineer Intern”
- Link Peter to that job
You want all of that work to succeed or fail together. If creating “Peter” fails, you don’t want to end up in an intermediate state where a “Software Engineer Intern” job exists that no one is doing. You get that guarantee by putting the three queries or operations in a single transaction.
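To make the all-or-nothing idea concrete, here is a toy sketch in Python. It is not how Neo4j implements transactions. It just stages the three operations from the Peter example on a scratch copy of a tiny in-memory “graph” and publishes that copy only if every step succeeds. `ToyGraph`, `run_atomically`, `create_node`, and `link` are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyGraph:
    """A deliberately tiny in-memory stand-in for a graph, not Neo4j itself."""
    nodes: dict[str, str] = field(default_factory=dict)              # node id -> label
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (from, type, to)


Operation = Callable[[ToyGraph], None]


def create_node(node_id: str, label: str) -> Operation:
    def op(g: ToyGraph) -> None:
        if node_id in g.nodes:
            raise ValueError(f"{node_id} already exists")
        g.nodes[node_id] = label
    return op


def link(src: str, rel_type: str, dst: str) -> Operation:
    def op(g: ToyGraph) -> None:
        if src not in g.nodes or dst not in g.nodes:
            raise KeyError("both endpoints must exist before linking")
        g.edges.append((src, rel_type, dst))
    return op


def run_atomically(graph: ToyGraph, operations: list[Operation]) -> bool:
    """Stage every operation on a scratch copy; publish it only if all succeed."""
    scratch = ToyGraph(dict(graph.nodes), list(graph.edges))
    try:
        for op in operations:
            op(scratch)
    except (KeyError, ValueError):
        return False  # "rollback": the real graph was never modified
    graph.nodes, graph.edges = scratch.nodes, scratch.edges  # "commit"
    return True
```

If the job is created but the link step then fails (say, because Peter was never created), `run_atomically` returns `False` and the original graph is untouched, so no orphaned intern job is left behind.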
At a low level, you can run CALL dbms.listTransactions() to see what’s happening at any given time, and kill an individual transaction with CALL dbms.killTransaction('some-id').
![](upload://iJbfTgYflqyjyXtqOXcI2mkJg0b.png)
It’s important to note that this output contains a currentQuery field. Because a transaction may perform several operations in sequence, the value of this field can change over the transaction’s lifetime. Further, because the graph can also be manipulated through the Java API, a transaction doesn’t have to involve any query at all! Not all changes to Neo4j are made via Cypher.
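If you pull the output of CALL dbms.listTransactions() into a script (for example as a list of dictionaries, one per row), a small filter can surface long-running transactions while tolerating an empty currentQuery. This is a sketch: the column names `transactionId`, `currentQuery`, and `elapsedTimeMillis` are assumptions based on Neo4j 3.5/4.x output and may differ in your version, and `long_running` is a hypothetical helper.

```python
from typing import Any, Iterable

Row = dict[str, Any]  # one row of CALL dbms.listTransactions(), as a dict


def long_running(rows: Iterable[Row], threshold_ms: int) -> list[Row]:
    """Summarize transactions open for at least threshold_ms, longest first."""
    slow = [r for r in rows if r.get("elapsedTimeMillis", 0) >= threshold_ms]
    slow.sort(key=lambda r: r["elapsedTimeMillis"], reverse=True)
    return [
        {
            "transactionId": r["transactionId"],
            "elapsedTimeMillis": r["elapsedTimeMillis"],
            # currentQuery may be empty: the transaction may be between
            # operations, or driven through the Java API rather than Cypher.
            "currentQuery": r.get("currentQuery") or "<no current Cypher query>",
        }
        for r in slow
    ]
```

The IDs returned here are what you would pass to CALL dbms.killTransaction('some-id') after deciding a transaction really should be stopped.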
Transactions have three key events in their lifecycle:
- They get opened
- They run zero or more operations
- They commit, or they roll back. This is the part where all of the operations atomically succeed or fail.
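The three lifecycle events can be modeled as a tiny state machine. The sketch below is a hypothetical illustration (`ToyTransaction` and `TxState` are not Neo4j classes): a transaction starts open, accepts zero or more operations, and then ends exactly once, either committed or rolled back. Any further work is rejected.

```python
from enum import Enum


class TxState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


class ToyTransaction:
    """Hypothetical model: open -> zero or more operations -> commit or rollback."""

    def __init__(self) -> None:
        self.state = TxState.OPEN       # 1. a transaction starts out open
        self.operations: list[str] = []

    def _require_open(self) -> None:
        if self.state is not TxState.OPEN:
            raise RuntimeError(f"transaction already {self.state.value}")

    def run(self, operation: str) -> None:
        self._require_open()            # 2. operations are accepted only while open
        self.operations.append(operation)

    def commit(self) -> None:
        self._require_open()            # 3a. all staged operations count as applied
        self.state = TxState.COMMITTED

    def rollback(self) -> None:
        self._require_open()            # 3b. ...or none of them do
        self.operations.clear()
        self.state = TxState.ROLLED_BACK
```

Committing with an empty `operations` list is allowed on purpose, matching the “zero or more operations” step.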
Neo4j accepts connections on both its HTTP and Bolt ports, and transactions can be tagged with the connection they came from. You can inspect connections at a low level by running CALL dbms.listConnections(), and kill an individual one with CALL dbms.killConnection('some-id').
![CALL dbms.listConnections()](upload://A0vtkDzB95PmguHlaWDPHRnb2ZD.png)
In this screenshot you can see the various connections Halin made to the database, and the address each one came from. Each connection has an ID. Correspondingly, in the output of CALL dbms.listTransactions() you’ll find that some (but possibly not all) transactions are associated with a specific connection ID. This tells us which client, at which network address, is running which transaction.
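Matching the two listings up by connection ID is a simple join. This sketch assumes each row is a dictionary keyed by the column names `connectionId`, `clientAddress`, and `transactionId`; check those names against your Neo4j version. `transactions_by_connection` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Optional

Row = dict[str, Any]
Key = tuple[Optional[str], Optional[str]]  # (connectionId, clientAddress)


def transactions_by_connection(transactions: list[Row], connections: list[Row]) -> dict[Key, list[str]]:
    """Group transaction ids by the connection (and client address) they came from.

    Transactions without a connection id are grouped under (None, None); a
    connection that closed between the two listings shows None as its address.
    """
    address_of = {c["connectionId"]: c["clientAddress"] for c in connections}
    grouped: dict[Key, list[str]] = defaultdict(list)
    for tx in transactions:
        conn_id = tx.get("connectionId") or None  # treat "" and missing alike
        grouped[(conn_id, address_of.get(conn_id))].append(tx["transactionId"])
    return dict(grouped)
```

Because the two procedures are called at slightly different moments, the join is only a snapshot; a connection can appear or disappear in between.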
With that background covered, let’s get back to queries. We can think of a query as a unit of work done within a transaction. In the common case, a single query is the only unit of work in its transaction. Remember: transactions don’t have to contain queries, and a single query may be only part of a transaction, not the whole thing.
Finally, queries might be associated with a connection, but don’t have to be. If a query is run by the database itself, it has no client connection.
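The relationships among the three objects can be summarized as a small data model. In this hypothetical sketch, a connection is optional on a transaction, a transaction holds zero or more queries, and (as a simplification) a query reaches its connection only through its enclosing transaction. None of these classes are part of a Neo4j API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Connection:
    connection_id: str
    client_address: str


@dataclass
class Transaction:
    transaction_id: str
    # None when no client opened it (e.g. work started inside the database).
    connection: Optional[Connection] = None
    # Zero or more units of work; may stay empty (e.g. Java API changes).
    queries: list[Query] = field(default_factory=list)


@dataclass
class Query:
    text: str
    # repr/compare disabled so we don't recurse through Transaction.queries.
    transaction: Transaction = field(repr=False, compare=False)

    @property
    def connection(self) -> Optional[Connection]:
        # Simplification: a query's connection is its transaction's connection.
        return self.transaction.connection


def run_query(tx: Transaction, text: str) -> Query:
    """Record a query as one unit of work inside the given transaction."""
    query = Query(text, tx)
    tx.queries.append(query)
    return query
```

The simplest case from the next section (1 connection, 1 transaction, 1 query) is just one `Connection`, one `Transaction` pointing at it, and one call to `run_query`.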
Ranging from Simple to Complex
In the simplest possible case, a program connects to Neo4j, issues 1 query, gets the results, and disconnects. There is 1 connection, 1 query, and 1 transaction, and they’re all neatly linked: easy peasy.
But a production system could have hundreds or thousands of transactions in flight from dozens or hundreds of connections, many of them lasting less than a second. In that kind of environment, transactions may not even live long enough for you to inspect them with Neo4j’s built-in procedures.
In those cases, polling CALL dbms.listTransactions() is a poor choice: by the time the table comes back, the answer has already changed! Instead, consider enabling system monitoring with Prometheus or a similar approach. The built-in system procedures are best suited to manual administration of heavy or long-running queries and transactions.
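A quick back-of-the-envelope simulation shows why polling falls short. The numbers are purely illustrative assumptions, not measurements: every transaction stays open for 20 ms, a new one starts every 5 ms for 5 seconds, and a poll (think of one CALL dbms.listTransactions() call) happens once per second. `fraction_observed` is a hypothetical helper that counts how many transactions were open at any poll instant.

```python
def fraction_observed(starts_ms, durations_ms, poll_interval_ms, horizon_ms):
    """Fraction of transactions that were open at one or more poll instants.

    A transaction starting at s with duration d is open on [s, s + d); a poll
    at time t sees it only if s <= t < s + d.
    """
    if not starts_ms:
        return 0.0
    polls = range(0, horizon_ms, poll_interval_ms)
    seen = sum(
        1
        for s, d in zip(starts_ms, durations_ms)
        if any(s <= t < s + d for t in polls)
    )
    return seen / len(starts_ms)


# Illustrative assumptions, not measurements: a 20 ms transaction starts every
# 5 ms for 5 seconds, and we poll once per second.
STARTS = list(range(0, 5000, 5))
print(fraction_observed(STARTS, [20] * len(STARTS), 1000, 5000))  # prints 0.017
```

Only 17 of the 1,000 simulated transactions are ever seen, which is why continuously collected metrics suit this situation better than point-in-time listings.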
It’s All Specific to a Cluster Member
Up to this point, we’ve been thinking about a single Neo4j machine, but if you’re running a Causal Cluster you may have 3 or more machines. In that case, it’s critical to know which machine your browser is pointed at. If you ask for a list of connections, you’ll get only the connections to the machine you’re talking to.
This is why Halin puts the query/task view under each cluster member’s tab. Watch out! If you connect to a cluster with the neo4j:// protocol through a routing driver, Neo4j Browser routes your queries automatically, and you might not know which machine’s connections you are looking at. So for manual administration tasks, I’d recommend using bolt:// to connect to a single cluster member, so you know whose connections, queries, and transactions you’re seeing.
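If you start from a routing URI, one way to get a direct connection to a particular member is to swap the routing scheme for its direct counterpart (neo4j:// to bolt://, neo4j+s:// to bolt+s://, neo4j+ssc:// to bolt+ssc://) while keeping the encryption choice. This is a sketch: `direct_uri` is a hypothetical helper, 7687 is assumed as the default Bolt port, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Routing schemes mapped to their direct, single-server counterparts.
DIRECT_SCHEME = {"neo4j": "bolt", "neo4j+s": "bolt+s", "neo4j+ssc": "bolt+ssc"}


def direct_uri(uri: str, member_host: str, member_port: int = 7687) -> str:
    """Point at one cluster member, keeping the encryption flavor of `uri`."""
    scheme = urlsplit(uri).scheme
    direct = DIRECT_SCHEME.get(scheme, scheme)  # bolt* schemes pass through unchanged
    # IPv6 literals would need [brackets]; not handled in this sketch.
    return urlunsplit((direct, f"{member_host}:{member_port}", "", "", ""))
```

For example, `direct_uri("neo4j+s://cluster.example.com", "core2.example.com")` gives `"bolt+s://core2.example.com:7687"` (both host names are placeholders).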
With these concepts in hand, let’s look at Neo4j’s built-in procedures and functions and pull out only the ones relevant to managing connections, transactions, and queries. After everything covered in this post, what these procedures do should be clear and easy to follow. A list of the relevant procedures is below.
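The procedures named in this post can also be driven from a script. The sketch below keeps them in one dictionary and builds parameterized statements, so IDs are passed as parameters rather than spliced into the Cypher text. The dbms.listQueries/dbms.killQuery pair is included as the query-level counterpart. Procedure names have changed across Neo4j versions, so treat the table as an assumption to check against the version you run; `list_statement` and `kill_statement` are hypothetical helpers.

```python
# Assumed (list, kill) procedure pairs; names vary between Neo4j versions.
PROCEDURES: dict[str, tuple[str, str]] = {
    "connection": ("dbms.listConnections", "dbms.killConnection"),
    "transaction": ("dbms.listTransactions", "dbms.killTransaction"),
    "query": ("dbms.listQueries", "dbms.killQuery"),
}


def list_statement(kind: str) -> str:
    """Cypher listing every object of this kind on the member you're connected to."""
    list_proc, _ = PROCEDURES[kind]
    return f"CALL {list_proc}()"


def kill_statement(kind: str, target_id: str) -> tuple[str, dict[str, str]]:
    """Parameterized Cypher that kills one object; the id travels as $id."""
    _, kill_proc = PROCEDURES[kind]
    return f"CALL {kill_proc}($id)", {"id": target_id}
```

With the official Python driver, a statement and its parameters can then be run as `session.run(statement, params)`, against a member reached over bolt:// so you know whose objects you are killing.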
Using all of these tools and concepts, you can have fine-grained control over exactly what your cluster is doing at all times, and understand the results of what you’re seeing. Happy graph hacking!