Cluster not processing write cypher

mdfrenchman · January 16, 2020, 12:33pm

We have a cluster of 3 servers (normal config, leader is read/write, 2 followers are read).

We're having an issue where WRITE queries do not complete; they eventually return an OOM exception.

The Cypher (with labels and props obfuscated):

UNWIND ["01", "02", "03", "04", "05", "06", "07", "08", "09"] as pnum
MATCH (:Thing {thingUniqueNumber:pnum})-[r:REL_TO_DELETE]->()
RETURN pnum, r

This returns in 13 ms with 27 results.

Replacing the RETURN with DELETE r runs for hours until a log message reports an OOM (I'll update with the actual message if I can).

Possibly related: after being restarted earlier in the week, one of our followers wasn't picking up the heartbeat or receiving replicated transactions from the leader. We cleared the cluster_state folder, which was the recommended solution for that.

Thanks for any help!

-Mike French

mdfrenchman · January 16, 2020, 3:27pm

A reboot of the cluster has resolved the issue. Further investigation returned this:
ERROR LEAK: ByteBuf.release() was not called before it's garbage-collected. See Netty.docs: Reference counted objects for more information.

What MAY have caused that:
There were 2 sessions, one adding labels and one removing a different label from the same subset of nodes. That may have caused lock contention.

jeremie · January 16, 2020, 5:32pm

Hello Mike,

Which version are you using?
Are you able to reproduce it?

mdfrenchman · January 16, 2020, 6:14pm

Enterprise version 3.5.5, and we haven't been able to reproduce it yet.

jeremie · January 19, 2020, 6:10am

Can you try to update to the latest maintenance release available?
Did you notice any GC pauses on any node of the cluster?
You should have messages like this in debug.log: Detected VM stop-the-world
as mentioned in "Fatal error occurred when handling a client connection causes crash",
or
in neo4j.log: java.lang.OutOfMemoryError: Java heap space
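
If it helps, here is a minimal Python sketch for checking the logs on each cluster member for the two messages above. It only does a substring match on the text quoted in this post; the function names and the way the log paths are passed in are my own assumptions, and it assumes nothing about the rest of the log line format.

```python
import sys

# Substrings quoted in this thread. The rest of each log line is not assumed.
MARKERS = {
    "gc_pause": "Detected VM stop-the-world",
    "heap_oom": "java.lang.OutOfMemoryError: Java heap space",
}


def scan_lines(lines):
    """Return {label: [(line_number, text), ...]} for lines containing a marker."""
    hits = {label: [] for label in MARKERS}
    for number, line in enumerate(lines, start=1):
        for label, marker in MARKERS.items():
            if marker in line:
                hits[label].append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    # Hypothetical usage: pass the debug.log and neo4j.log paths as arguments.
    for log_path in sys.argv[1:]:
        for label, found in scan_file(log_path).items():
            print(f"{log_path}: {label}: {len(found)} matching line(s)")
            for number, text in found[:5]:
                print(f"  line {number}: {text}")
```

Run it on every node with the paths to that node's debug.log and neo4j.log. A cluster of GC pause messages around the time the DELETE was running would point towards heap pressure rather than a lock issue.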
