Server restarts in the middle of a query for no apparent reason

  • neo4j server version 3.5.1
  • using cypher in cypher shell locally on the database server machine

I'm running the following APOC query:

call apoc.periodic.iterate("match (pr:PolicyReport) where not (pr)-[:FIRST|LAST|NEXT|HAS_REPORT]-() with pr match (pr)-[*]-(a) return pr,a","detach delete pr,a",{batchSize:1000}) yield total return total;

Each PolicyReport is the top of a tree of about 20-30 nodes. Before running this query I ran queries to disconnect each PolicyReport from the rest of the graph, so only these 20-30 nodes remain connected per PolicyReport, and I need them all deleted.

After running for about 30 minutes the server simply restarts for no apparent reason. neo4j.log and debug.log contain nothing except the restart info. After starting, the server runs recovery before becoming available again.

  • There are about 8,500,000 PolicyReport nodes so this script is supposed to delete about 255,000,000 nodes.

  • this machine is a virtual machine with 256GB of memory and 12 CPUs. Total size of the DB on disk is 370GB, but I don't think it's a memory problem: I don't see an out-of-memory exception anywhere.

  • while running the script the rest of the system is down, there are no other users connected and no processes doing anything.
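
To put the scale of the job in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes 20-30 nodes per tree as stated above, and it assumes the first statement of apoc.periodic.iterate returns roughly one row per node to delete, so that batchSize:1000 means about 1000 nodes per transaction. The variable names are only illustrative.

```python
# Rough estimate of the delete workload (figures taken from the post above).
policy_reports = 8_500_000
nodes_per_tree_low = 20
nodes_per_tree_high = 30
batch_size = 1000  # batchSize passed to apoc.periodic.iterate

nodes_low = policy_reports * nodes_per_tree_low
nodes_high = policy_reports * nodes_per_tree_high

# Ceiling division: assumed one row per node, so rows / batch_size transactions.
batches_high = -(-nodes_high // batch_size)

print(f"nodes to delete: {nodes_low:,} to {nodes_high:,}")
print(f"transactions at batch size {batch_size}: up to about {batches_high:,}")
```

So the upper bound is the 255,000,000 nodes mentioned above, spread over roughly 255,000 transactions.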

I'm looking for suggestions on how to investigate or correct this.

Thanks.

Could you upgrade to a newer version of Neo4j? It's on 4.3.3 now.

I can't upgrade yet; I need to do a lot of work on query parameters first.

There has been a development. My initial assessment of the problem was wrong. The restart has nothing to do with the query. I left the machine on with nothing happening, no query running at all, and the server still restarts into recovery mode, always recovering from the same transaction log point.

After the recovery ends, even though nothing is connected to the DB and nothing is running, I see many entries like this:

Detected VM stop-the-world pause: {pauseTime=280, gcTime=0, gcCount=0}

One such log entry appears every few seconds.
After about 20-30 minutes of this the server restarts.

I have 265 transaction log files that were created between 15:00, when I started the delete, and 22:00. Since then I have not run any other query on the DB, but I am still experiencing the restarts.

So, I now know what is going on.
I looked in the journalctl log on the Linux box. It turns out that the OS determined it did not have enough memory and then killed the Java process. The process was restarted immediately afterwards, which is why I see a restart.
So, if I understand correctly, the OS needs more memory. My current memory configuration is this:
32GB heap memory, 220GB off-heap memory, 4GB left for the OS.
The recommendation from "memrec" is to use 31GB heap memory and 205GB off-heap, leaving about 20GB to the OS.
My database is 426GB in size. I need to delete about a quarter of it. This is what I'm currently failing to do.
So I'm going to go with the recommended sizes and try again.
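
For anyone checking the numbers, the two memory budgets above can be compared with a small Python sketch. It assumes the VM's full 256GB is available to split between heap, off-heap memory and the OS; the function name os_reserve_gb is only illustrative and not part of any Neo4j tool.

```python
# Compare how much memory each configuration leaves for the OS.
TOTAL_GB = 256  # memory of the virtual machine, from the original post


def os_reserve_gb(heap_gb: int, off_heap_gb: int, total_gb: int = TOTAL_GB) -> int:
    """Memory left for the OS after the heap and off-heap memory are claimed."""
    return total_gb - heap_gb - off_heap_gb


current = os_reserve_gb(heap_gb=32, off_heap_gb=220)
recommended = os_reserve_gb(heap_gb=31, off_heap_gb=205)

print(f"current config leaves {current}GB for the OS")
print(f"memrec config leaves {recommended}GB for the OS")
```

The current settings leave only 4GB of headroom, so any growth of the Java process beyond its configured sizes can push the machine into the situation where the OS kills it. The recommended settings leave 20GB.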