We're using Neo4j as a read model in an event sourced system. It's job is basically to process a stream of events and build a database for query purposes.
The events are translated into cypher statements and sent to new4j using the bolt driver, in batches of 1000-5000 statements per transaction. (sequentially, single threaded)
This has been working great up to neo4j 3.5.16, with neo4j processing +/- 1000 events per second when rebuilding from scratch.
Now, we just updated to neo4j 4.2.1, and I'm seeing following strange behaviour when doing a full rebuild (i.e. starting from an empty DB and processing all events, approx 3 million in total)
- after about 40000 events, processing starts to slow down
- after about 100000 events, processing is about 5x slower than it should be
It basically keeps going at this rate from then on.
Now the strange part:
- when I pause the event stream for about 10-15 seconds, then resume, it picks up at full speed again and doesn't slow down anymore. To be clear: this is without restarting neo4j or the client, just a dumb "sleep 10000".
It's looks like neo4j isn't able to keep up somehow, but after waiting a while, all is back to normal.
What could be causing this?
Things I tried:
- increasing/decreasing heap space/page cache settings
- switching to the http driver
- community/enterprise edition
The logs do not contain any related information.
- Mac OS Big Sur
- neo4j 4.2.1 community
- SDN 5.2.6 / OGM 3.2.19
- plugins: apoc 126.96.36.199, spatial algorithms 0.2.4
- java 11.0.9
Anything in your I/O behavior?
Can you turn that into a reproducible test?
Did you test it on another system? E.g. linux / windows / older OSX ?
Anything with transaction handling that changed? Or closing of sessions/transactions?
Do you use UNWIND $array for your batches ? or sending 1k-5k statements in individually?
I deployed this on our debian servers last week (AWS), and the behaviour is exactly the same. So it doesn't seem related to anything environment-specific.
I don't use UNWIND to combine statements as each event stands on its own, there is no guarantee that consecutive events are similar. It's possible that UNWIND is used to translate a particular type of event, but that won't include a whole lot of data. So yes 1-5k events are sent individually.
One thing I tried however is playing with the cypher.min_replan_interval property.
The default of 10s seemed suspicious, as that's exactly the time I need to wait before things get back to normal.
Increasing that setting also seemed to increased the time needed to wait.
One theory I came up with so far:
- We create quite a lot of indices right before processing the first events.
- The planner doesn't use indices at first, as there isn't much data yet
- After a few thousand events however, that data set starts to grow and the planner should start using indices to speed up the statements.
- For some reason it doesn't, and statements start to take longer and longer to process.
- Pausing > 10s triggers replanning.
It's just a wild guess, as I don't know enough about neo4j internals to confirm it.
Could this make sense?
Yes that sounds sensible.
You should also wait for the indexes/constraints to be online with
btw. for keys it's better to use constraints than indexes.
One think you could do is to invalidate the plan cache manually with a procedure call.
There are also some prefix-settings for forcing or skipping replanning as of 4.2
Did you try
EXPLAIN on the 3.5 and 4.2 queries?
Do the resulting plots look the same?
It would be useful to know if the Query planning changed or not.
Hi @clem , given the releases of Neo4j from 3.5 to 4.2 with various improvements on index and other various parameters, I hardly doubt the explain plan will look alike.
I would follow michael.hunger advice.
My thought is that it would be good to rule out that the plan didn't get worse between the two releases.
If indeed the plan got worse, then it would be good to investigate.
Not likely, but possible.
@michael.hunger I did try out your suggestions:
- Waiting for indices using db.awaitIndexes() doesn't make a difference
- Calling "db.clearQueryCaches()" however when things start to slow down seems to solve the issue. Immediately after the call, things speed up again.
So it looks like replanning doesn't work as it should ... ?
Probably best then to raise this as an issue with the core database (cypher planner)
can you copy this thread in to a GitHub issue here
I'm currently manually calling
db.clearQueryCaches() more regularly than I'd like during a 3 hour batched insert job but at least it mostly works. Each time I call it it fixes the issue for a shorter time so occasionally I have restart the database container and pray the job doesn't fall over.
I'm only inserting nodes with 10 or so properties, and no edges during this pass, so I'm not doing anything complex, unless the 1st party abstraction layers are doing something strange.
For anyone else who's interested, here's the GitHub issue that was raised by Christian but closed by the dev team: performance issue after upgrade 3.5 => 4.2 · Issue #12721 · neo4j/neo4j · GitHub
We're still using a workaround by pausing the job for a few seconds or calling clearQueryCaches(). As a rebuild is pretty rare, it's not that big of an issue for us.
I didn't get to making a reproducible scenario yet as requested in the github issue. We have so many parameters that could lead to this behaviour that it would probably be difficult to isolate and reproduce.
And hoping that the issue will eventually disappear in a future release...