Memory issue in Causal Cluster

volodymyr.metlyakov · September 21, 2021, 9:22am

Hello team,

we've run into a weird problem with memory consumption in our cluster setup.
Important note: the problem described here is specific to the cluster. When we run the same setup on a single node (a heftier one, though), the problem does not manifest itself.

Environment info:

neo4j enterprise 4.2;
the cluster consists of 5 members: 3 cores, 2 read replicas;
each member is a 16 GB / 4-core Azure VM instance;
load intensity is very low, averaging a few queries per second and peaking (rarely, and unrelated to the problem) at a few dozen per second, mostly read queries;
CPU average is 8-9%, with peaks at 25%;
the graph size is around 400 000 nodes and 2 000 000 edges.

The problem is that memory usage on each member of the cluster grows steadily until the system OOM killer terminates the JVM.
The growth rate differs from node to node, but the pattern is always the same.

Here is the memory consumption pattern:

All our queries are profiled, optimized and parameterized, so they are eligible for query plan caching. Playing with the query plan cache parameters didn't help.

I also took a JVM dump from one of the members that was maxing out on memory and analyzed it with a memory analysis tool. I don't know if it is of any help, but here is some info:

Any hints on which direction to look in are greatly appreciated. We really want to go on with the cluster setup.

michael.hunger · September 28, 2021, 11:31pm

Hi, this sounds like a bug, so please open a support ticket or a GitHub issue (Issues · neo4j/neo4j · GitHub).

The first memory dump points more to the stats/query collector running. I think you can disable that, e.g. with call db.stats.stop().
It shouldn't be running by default, though, and I'm not sure if there's a setting for it. The related procedures are:

db.stats.clear()
db.stats.collect()
db.stats.retrieve()
db.stats.retrieveAllAnonymized()
db.stats.status()
db.stats.stop()
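
A minimal sketch of how the collector could be checked and stopped from cypher-shell or Neo4j Browser. The section name 'QUERIES' is an assumption here, not something confirmed in this thread; check it against the output of db.stats.status() on your version before relying on it.

```cypher
// List the collector sections and whether each one is currently collecting.
CALL db.stats.status();

// Stop collection for the query collector section (section name assumed).
CALL db.stats.stop('QUERIES');

// Discard data that was already collected for that section.
CALL db.stats.clear('QUERIES');
```

If status() already reports the collector as idle, it is probably not the source of the growth.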

As a workaround, you can also disable the PIPELINED runtime by setting the default runtime to SLOTTED:

unsupported.cypher.runtime=SLOTTED
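
If changing the server-wide default is not an option, the runtime can also be selected per query with a Cypher query option prefix. A minimal sketch, where the Person label and the $name parameter are hypothetical placeholders:

```cypher
// Force the slotted runtime for this one query only.
CYPHER runtime=slotted
MATCH (p:Person {name: $name})
RETURN p;
```

Running the query with PROFILE or EXPLAIN shows in the plan summary which runtime was actually used, which is a quick way to confirm the setting took effect.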

michael.hunger · September 29, 2021, 10:54am

It seems a bug was introduced in one of the versions.

https://github.com/neo-technology/neo4j/pull/10720

The team recommends upgrading to 4.3.4 if you can.
I'm still trying to figure out whether the fix also went into the latest version of 4.2.

volodymyr.metlyakov · September 29, 2021, 11:09am

Hi,

thanks for the resolution,
will try and get back with the results.

azocankara · January 7, 2022, 2:38pm

Hi, the GitHub link does not work anymore. Do you have another link? I can't find anything related to memory problems and the pipelined runtime in the changelogs. Do you know which version of 4.2 the fix went into?

azocankara · January 11, 2022, 5:00pm

I got an answer from the support team: the issue is fixed in 4.2.12, but it does not appear in the changelog under the expected title.
