How does one tune the concurrency (and batchSize) parameters of apoc.periodic.iterate()? Is there a rule of thumb for parallelizing based on the number of cores on my machine or the available RAM?
Any help would be greatly appreciated!
This is only my opinion, but apoc.periodic.iterate is far from easy to use.
To the best of my knowledge, there is no rule of thumb for setting the batch size, because the right value depends heavily on what your query does.
The batch size has a large impact on how fast your query runs. You will want it as large as possible, but if it is too large, your database may crash (I crashed mine many times this way).
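Because the goal is a batch size that is as large as possible without crashing the database, one way to find it empirically is to start small and double the size on a test copy of the data until runs stop getting faster or start failing. The sketch below assumes the official `neo4j` Python driver and a hypothetical `Item` label and `touched` property; the doubling strategy, the start value and the ceiling are my own assumptions, not a documented rule.

```python
import time
from typing import Callable, Optional

# Hypothetical trial workload: the Item label and touched property are placeholders.
# batchSize and parallel are real apoc.periodic.iterate config keys.
TRIAL_QUERY = """
CALL apoc.periodic.iterate(
  "MATCH (n:Item) RETURN n",
  "SET n.touched = true",
  {batchSize: $batchSize, parallel: false}
)
YIELD batches, total, timeTaken
RETURN batches, total, timeTaken
"""


def find_batch_size(
    run_trial: Callable[[int], float],
    start: int = 1_000,
    max_size: int = 1_000_000,
) -> Optional[int]:
    """Double the batch size while trials keep getting faster.

    run_trial(batch_size) runs the workload once (ideally on a test copy of
    the data) and returns elapsed seconds; any exception counts as "too big".
    Returns the fastest size seen, or None if even `start` failed.
    """
    best_size: Optional[int] = None
    best_time = float("inf")
    size = start
    while size <= max_size:
        try:
            elapsed = run_trial(size)
        except Exception as exc:  # e.g. out-of-memory or driver error
            print(f"batchSize={size} failed: {exc}")
            break
        print(f"batchSize={size}: {elapsed:.2f}s")
        if elapsed >= best_time:
            break  # no longer getting faster, so stop growing
        best_size, best_time = size, elapsed
        size *= 2
    return best_size


def make_neo4j_runner(driver) -> Callable[[int], float]:
    """Adapt a neo4j Python driver instance into a run_trial callable."""
    def run_trial(batch_size: int) -> float:
        began = time.perf_counter()
        with driver.session() as session:
            # consume() waits until the procedure has finished.
            session.run(TRIAL_QUERY, batchSize=batch_size).consume()
        return time.perf_counter() - began
    return run_trial
```

To use it, create a driver with `neo4j.GraphDatabase.driver(uri, auth=(user, password))` and call `find_batch_size(make_neo4j_runner(driver))`. The search loop itself has no Neo4j dependency, so a different `run_trial` could be written to compare `concurrency` values in the same way.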
As far as I know, there is no easy way to put a timeout on this procedure, and this can be a real problem: imagine the query has been running for an hour and you have no idea how close it is to finishing. What do you do? Stop it, or keep it going?
I found that handling the batching myself, when possible, was far more efficient (I use MATCH with LIMIT).
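Handling the batching yourself can look roughly like the sketch below: each batch is a separate query that MATCHes a bounded number of unprocessed nodes with LIMIT, SETs a marker on them, and returns a count, so progress can be printed after every batch and the loop can stop (for example, on a deadline) without losing the batches already committed. The `Item` label and `processed` marker property are hypothetical, and the wiring assumes the official `neo4j` Python driver, whose `session.run` executes each call as its own auto-commit transaction.

```python
import time
from typing import Callable, Optional

# One batch: bounded MATCH ... LIMIT, then SET a marker so the same nodes are
# not matched again. The Item label and processed property are hypothetical.
BATCH_QUERY = """
MATCH (n:Item)
WHERE n.processed IS NULL
WITH n LIMIT $limit
SET n.processed = true
RETURN count(n) AS done
"""


def process_in_batches(
    run_batch: Callable[[int], int],
    limit: int = 10_000,
    deadline_s: Optional[float] = None,
) -> int:
    """Run batches until one touches no nodes or the deadline has passed.

    run_batch(limit) runs BATCH_QUERY once and returns its `done` count.
    Returns the total number of nodes processed in this call.
    """
    start = time.monotonic()
    total = 0
    while True:
        done = run_batch(limit)
        total += done
        elapsed = time.monotonic() - start
        print(f"batch of {done}, {total} so far, {elapsed:.1f}s elapsed")
        if done == 0:
            break  # nothing left to process
        if deadline_s is not None and elapsed > deadline_s:
            print("deadline reached; rerun later to continue")
            break
    return total


def make_neo4j_batch_runner(driver) -> Callable[[int], int]:
    """Adapt a neo4j driver; each call is one auto-commit transaction."""
    def run_batch(limit: int) -> int:
        with driver.session() as session:
            record = session.run(BATCH_QUERY, limit=limit).single()
            return record["done"]
    return run_batch
```

For example, `process_in_batches(make_neo4j_batch_runner(driver), limit=10_000, deadline_s=3600)` stops after roughly an hour; rerunning it later picks up where it left off, because already-processed nodes no longer match the WHERE clause. The `limit` value still needs tuning, just like `batchSize`.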
If you really want to stick with apoc.periodic.iterate, my advice would be:
- don't use MERGE in your query.
- don't use CREATE in your query.
- don't be greedy with your batch size.
I have no idea why, but I noticed that things went more smoothly when my query only used MATCH and SET.
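As an illustration of the MATCH-and-SET-only shape suggested above, here is an apoc.periodic.iterate call wrapped for the `neo4j` Python driver. The `Item` label and `reviewed` property are hypothetical; `batchSize`, `parallel` and `concurrency` are the procedure's own config keys, and the defaults in `iterate_params` are conservative starting points to tune, not values derived from core count or RAM.

```python
from typing import Any, Dict

# Outer query streams nodes; inner query only SETs. Names are hypothetical.
ITERATE_QUERY = """
CALL apoc.periodic.iterate(
  "MATCH (n:Item) RETURN n",
  "SET n.reviewed = true",
  {batchSize: $batchSize, parallel: $parallel, concurrency: $concurrency}
)
YIELD batches, total, failedOperations, errorMessages
RETURN batches, total, failedOperations, errorMessages
"""


def iterate_params(
    batch_size: int, parallel: bool = False, concurrency: int = 1
) -> Dict[str, Any]:
    """Validate and package the config values passed to ITERATE_QUERY."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if concurrency < 1:
        raise ValueError("concurrency must be positive")
    return {"batchSize": batch_size, "parallel": parallel, "concurrency": concurrency}


def run_iterate(
    driver, batch_size: int, parallel: bool = False, concurrency: int = 1
) -> Dict[str, Any]:
    """Run the job once and return the procedure's summary row as a dict."""
    params = iterate_params(batch_size, parallel, concurrency)
    with driver.session() as session:
        record = session.run(ITERATE_QUERY, **params).single()
        return record.data()
```

If you do need MERGE or CREATE, keeping `parallel: false` is a common precaution, because parallel batches that write relationships to the same nodes can contend for locks; this is one possible explanation, not something verified here. Checking `failedOperations` and `errorMessages` in the returned row also shows whether any batches failed, since the procedure reports batch errors there rather than aborting the whole call.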
Hope that helps. If you find a rule of thumb for setting the batch size, please share it; I know other people in the same situation.
EDIT: just so you know, I was using apoc.periodic.iterate when matching around one million nodes or a few million relationships.