Neo4j Bloom hangs with error indication

When touching certain items in my knowledge graph, Bloom stops working and needs to be restarted. Here is the error message from the debug.log file:

2020-03-06 15:52:25.352+0000 ERROR [o.n.b.r.MetricsReportingBoltConnection] Protocol breach detected in bolt session 'bolt-57431'. Message 'BEGIN Map{mode -> String("r"), tx_metadata -> Map{app -> String("neo4j-bloom_v1.2.0"), type -> String("system")}}' cannot be handled by a session in the TX_READY state.
org.neo4j.bolt.runtime.BoltProtocolBreachFatality: Message 'BEGIN Map{mode -> String("r"), tx_metadata -> Map{app -> String("neo4j-bloom_v1.2.0"), type -> String("system")}}' cannot be handled by a session in the TX_READY state.
at org.neo4j.bolt.v1.runtime.BoltStateMachineV1.nextState(BoltStateMachineV1.java:149)
at org.neo4j.bolt.v1.runtime.BoltStateMachineV1.process(BoltStateMachineV1.java:92)
at org.neo4j.bolt.messaging.BoltRequestMessageReader.lambda$doRead$1(BoltRequestMessageReader.java:89)
at org.neo4j.bolt.runtime.MetricsReportingBoltConnection.lambda$enqueue$0(MetricsReportingBoltConnection.java:68)
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:191)
at org.neo4j.bolt.runtime.MetricsReportingBoltConnection.processNextBatch(MetricsReportingBoltConnection.java:86)
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:139)
at org.neo4j.bolt.runtime.ExecutorBoltScheduler.executeBatch(ExecutorBoltScheduler.java:171)
at org.neo4j.bolt.runtime.ExecutorBoltScheduler.lambda$scheduleBatchOrHandleError$2(ExecutorBoltScheduler.java:154)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Does anyone have an idea what might be the problem?

Many thanks in advance.

Hi, I can see from the error log that you are using Bloom version 1.2.0. Could you also let me know the following details about your setup:

  1. Are you using the Bloom Desktop app, the Bloom plugin or is Bloom self-hosted on a server?
  2. Which version of Neo4j are you connecting to?

Many thanks,
Clemens

Hi Clemens,

I am using Neo4j Desktop 1.2.4.1101 with Bloom 1.2.0 installed as an app. The Neo4j database version is 3.5.14. I am working with a MacBook Pro 3.1 GHz i7 with 16GB RAM.
I have been able to get Bloom to work again after shutting it down and restarting it. However, I am still running into "hangs"; in other words, Bloom no longer responds to any input. It happens intermittently and doesn't seem to be repeatable. I can usually only recover by either closing Bloom and re-opening it or by completely rebooting my machine and starting from scratch. I initially thought the "hangs" didn't leave any message in the logs; however, I seem to be getting them later in the day or the next morning. It seems that for every hang I get the same message block.

2020-03-14 01:51:43.894+0000 ERROR [o.n.b.r.MetricsReportingBoltConnection] Protocol breach detected in bolt session 'bolt-10516'. Message 'BEGIN Map{mode -> String("r"), tx_metadata -> Map{app -> String("neo4j-bloom_v1.2.0"), type -> String("system")}}' cannot be handled by a session in the TX_READY state.
org.neo4j.bolt.runtime.BoltProtocolBreachFatality: Message 'BEGIN Map{mode -> String("r"), tx_metadata -> Map{app -> String("neo4j-bloom_v1.2.0"), type -> String("system")}}' cannot be handled by a session in the TX_READY state.
at org.neo4j.bolt.v1.runtime.BoltStateMachineV1.nextState(BoltStateMachineV1.java:149)
at org.neo4j.bolt.v1.runtime.BoltStateMachineV1.process(BoltStateMachineV1.java:92)
at org.neo4j.bolt.messaging.BoltRequestMessageReader.lambda$doRead$1(BoltRequestMessageReader.java:89)
at org.neo4j.bolt.runtime.MetricsReportingBoltConnection.lambda$enqueue$0(MetricsReportingBoltConnection.java:68)
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:191)
at org.neo4j.bolt.runtime.MetricsReportingBoltConnection.processNextBatch(MetricsReportingBoltConnection.java:86)
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:139)
at org.neo4j.bolt.runtime.ExecutorBoltScheduler.executeBatch(ExecutorBoltScheduler.java:171)
at org.neo4j.bolt.runtime.ExecutorBoltScheduler.lambda$scheduleBatchOrHandleError$2(ExecutorBoltScheduler.java:154)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-14 01:55:20.857+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=365, gcTime=406, gcCount=1}
2020-03-14 01:55:22.294+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=100, gcTime=172, gcCount=1}

Hope we can figure out what might be causing this as it does not provide a good user experience when an application hangs in the middle of an analysis.

Thank you so much for your insight.
Wolfgang

Thank you for the information Wolfgang. I'm talking to the Drivers team to see if they can identify the issue.
In the meantime, do you have access to the console logs of the Bloom app itself? You can access these logs by activating developer mode in the Neo4j Desktop settings and then clicking the "App Developer tools" button that will appear in Bloom when re-opening it. Any logs that might appear there once the error occurs would help us as well.
Alternatively, if you have error reporting enabled in Bloom (you can opt in for error reporting on the login window) and DM me your IP address, I can look for these logs in our error reporting tool as well.

Many thanks,
Clemens

I have activated the "Developer Mode" in Bloom and can now see the Console. So far I haven't had much time and haven't hit the error, but I will keep using this approach and report back here should I run into it.

In the meantime I did upgrade to Bloom 1.2.1, not sure whether that is going to make a difference, but just wanted to mention it.

Thank you,
Wolfgang

Thank you, that would be very helpful.
The upgrade to 1.2.1 shouldn't make a difference unfortunately.
In case you have error reporting enabled, the UTC times of when the error occurs (or the times of any previous errors) could also be of help to us, as we can then track them down in our error reporting tool, which might give us some more information about why Bloom freezes.

Many thanks,
Clemens

I have been trying to work with Bloom over a few days now, using the Developer Mode. During this time I have experienced a number of "hangs"; in each case I don't really see anything specific in the Developer window. Today, March 23rd at 9:20 Pacific Standard Time, I had another one of those hangs, and I am including a screenshot of the Developer window console. Unfortunately I have not been able to determine any specific sequence of events for these hangs, as they occur right at the beginning when trying to see the Relationships or the Neighbors. It happens on some nodes, but not on others.
Here is the screenshot:

Not quite sure where in Bloom I would enable error reporting. I can certainly share my IP with you should that help in tracking down this issue.

Thanks,

Wolfgang

Thanks for providing the screenshot of the console logs. It doesn't look like there are any JavaScript errors, so our error reporting wouldn't show anything on that end either I'm afraid.

It sounds like the error usually occurs when inspecting a node and viewing its connections. In addition to the debug.log, would it be possible to provide the query.log records during the time of the hangs as well? If there's no query.log file in the logs directory, you can enable query logging by setting dbms.logs.query.enabled=true in the database config.

This would enable us to see the type and number of queries Bloom is performing against the database when Bloom freezes. With graphs that are very connected and/or have many labels, the queries Bloom generates can become expensive. Bloom might generate Cypher queries so extensive that they bring the app to a standstill.
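
For anyone who wants a quick look at the query.log before sharing it, here is a minimal Python sketch that counts log entries per minute, which can help spot a burst or a loop of queries around the time of a hang. It assumes each query.log entry starts with a timestamp in the same form as the debug.log lines above; the function names and the file path in the comment are hypothetical, not part of Neo4j or Bloom.

```python
import re
from collections import Counter

# Assumption: every query.log entry begins with a timestamp shaped like the
# debug.log ones in this thread, e.g. "2020-03-14 01:51:43.894+0000 ...".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3}[+-]\d{4}\s")


def queries_per_minute(lines):
    """Count log entries per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # lines without a timestamp (e.g. wrapped queries) are skipped
            counts[match.group(1)] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` minutes with the most entries, busiest first."""
    return queries_per_minute(lines).most_common(top)


def report(path):
    """Print the busiest minutes of a log file (path is a hypothetical example,
    e.g. report("logs/query.log"))."""
    with open(path, encoding="utf-8") as handle:
        for minute, count in busiest_minutes(handle):
            print(minute, count)
```

A minute with far more entries than its neighbours, right around the time of a hang, would be the part of the log worth looking at first.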

Thanks for pointing me to the query.log setup; I hadn't tried that one before. I actually just had an incident where I tried to run an updated search in a Bloom perspective and Bloom started to hang. Looking into the query.log file, I noticed that it appeared to be running into an endless loop. At least it didn't stop until I stopped Bloom itself. I'll share the log file via Dropbox to your Neo4j e-mail.
My graph certainly consists of a lot of labels (86 to be precise) and a lot of relationship types (190 as of this date). However, the examples in Bloom should only touch a handful of these labels and relationships. If I run the same query in the Browser it works fine.

Thank you for providing the query log and apologies for taking a while to get back to you. I'm surprised to see that the search phrase attempts to retrieve so many nodes. Bloom limits the number of unique nodes it returns (a limit which can be changed in the settings panel) and this limit applies to search phrases as well. However, it doesn't look as if it is being applied to your search phrase.
I've talked with my colleagues and we were wondering whether there are a lot of duplicate nodes or relationships in your graph, by any chance?

Regarding duplicate nodes and relationships: No, not that I am aware of. I have my own unique identifier for nodes, running off a single sequence. For relationships it depends on what is considered a duplicate. Each relationship has its own id in Neo4j; I don't have one of my own in that case. There could be relationships that have the same set of connected nodes. That could be remediated if need be, but I am not clear how that would cause issues in Bloom.
At the moment I can interactively get to a view via a two-step expansion of nodes by relationships. Attempting to do the same with a Cypher query defined as a Search fails: it hangs. Running the Cypher in Neo4j Browser works fine.
As if this problem wasn't enough, I just tried to switch from one Perspective to another and all of a sudden my Bloom screen is blank again. I tried to restart everything and turn on Developer Mode, and I see the following issues:


I had the white screen before and it somehow ended up resolving itself. Will attempt to upgrade to Neo4j Desktop 1.2.6 to see whether it resolves, but it is not good to run into this.
Thanks for any further help.

Interesting Finding: After the upgrade to Neo4j Desktop 1.2.6 I had the same problem with a white screen, essentially a blank window, when opening Bloom. Then I remembered that I solved this in the past by creating a new database, re-loading it and then launching Bloom again. I did that again. This somehow forces Bloom to ask for the Perspective to be used; I chose the proper Perspective and I am back in business.

So something must not be quite right with Bloom when switching Perspectives. Hope this helps sort out what might cause this "blank screen" issue.

We've been investigating the search phrase issue further and noticed in the query.log file that the search phrase that causes the hangs seems to be a union of two identical matches.

It looks like the union of two queries that return the same result causes unexpected behaviour in Bloom's logic for limiting the number of unique nodes that are returned for a query (the limit can be changed in Bloom's settings panel). This is what seems to be causing the hangs you are experiencing.

We've added a ticket for this bug to our backlog to be fixed in a future release. In the meantime, if you remove the union from the search phrase we saw in the query logs that causes the hangs, the query's result should be identical but Bloom won't hang anymore.
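
To illustrate why dropping the union should not change the result, here is a small Python sketch that models the row semantics of Cypher's UNION (combine both results, then remove duplicate rows). The `union` helper and the sample rows are hypothetical stand-ins, not Bloom's or Neo4j's implementation, and the equality only holds as long as the single match does not itself return duplicate rows.

```python
def union(left_rows, right_rows):
    """Model Cypher UNION: concatenate both results, then drop duplicate rows
    while keeping the order of first appearance."""
    seen = set()
    result = []
    for row in list(left_rows) + list(right_rows):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical rows (node ids) standing in for what one of the two
# identical matches in the search phrase would return.
match_rows = [(1,), (2,), (3,)]

# The union of two identical results collapses back to a single result.
assert union(match_rows, match_rows) == match_rows
```

In other words, the second, identical branch adds no rows of its own, so a search phrase with only the single match should show the same nodes.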

Regarding the issue when switching perspectives: we recently released Bloom version 1.3.0 with a major update on how perspectives are being managed in Bloom. Please let us know if you are still experiencing issues when switching perspectives with Bloom's latest version. Since you're using the Bloom app in Neo4j Desktop, the update should have happened automatically.

Please let me know if you have any questions or comments and thank you for your help with identifying this bug.

Thank you for pointing this out. I was able to refactor the Cypher query without the union and now it works beautifully.
Switching perspectives doesn't appear to be an issue anymore after the 1.3 upgrade.

Thanks for getting back to me on this issue.