Memory usage impact of `--skip-duplicate-nodes=true`

Detail question on the batch importer. If you use --skip-duplicate-nodes=true, I assume some in-memory structures are built up to check whether a given node is already there. Do those data structures reside in the JVM heap, in the page cache, or off-heap separately from the page cache?

Hi Stefan, great to see you.

Yes, there are in-memory data structures for the de-duplication; the Kernel team confirmed this.

A: Sorry, this one took a while to track down.
From what I am reading, yes, potentially all of them. It tries multiple places, depending on whether an OutOfMemoryError or a NativeMemoryAllocationRefusedError is thrown when it tries to allocate the cache it uses. First it tries off-heap memory.
If that fails, and if --cacheOnHeap (default: false) is true, it then tries the heap.
If that fails, it tries the page cache.
Finally, if that fails too, it just throws. So depending on the configuration, the cache could end up in any of them.
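To make the fallback order easier to follow, here is a minimal Java sketch of it. It is not Neo4j's code: the class, the Location enum and the Supplier-based "allocators" are hypothetical stand-ins, and NativeMemoryAllocationRefusedError is redefined locally so the example compiles on its own. It also assumes that when --cacheOnHeap is false the heap step is simply skipped and allocation goes straight to the page cache.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the cache-allocation fallback order described above.
 * None of these names are Neo4j's; each Supplier stands in for one allocation
 * attempt that either returns where the cache landed or throws.
 */
public class CacheFallbackSketch {

    /** Local stand-in for Neo4j's NativeMemoryAllocationRefusedError. */
    static class NativeMemoryAllocationRefusedError extends Error {
        NativeMemoryAllocationRefusedError(String message) { super(message); }
    }

    enum Location { OFF_HEAP, HEAP, PAGE_CACHE }

    static Location allocateCache(boolean cacheOnHeap,
                                  Supplier<Location> offHeap,
                                  Supplier<Location> heap,
                                  Supplier<Location> pageCache) {
        try {
            // 1. Always try off-heap memory first.
            return offHeap.get();
        } catch (OutOfMemoryError | NativeMemoryAllocationRefusedError offHeapFailure) {
            if (cacheOnHeap) {
                try {
                    // 2. Only attempted when --cacheOnHeap=true (assumption: skipped otherwise).
                    return heap.get();
                } catch (OutOfMemoryError | NativeMemoryAllocationRefusedError heapFailure) {
                    // Heap allocation failed too; fall through to the page cache.
                }
            }
            // 3. Try the page cache; 4. if this throws, the error propagates to the caller.
            return pageCache.get();
        }
    }

    public static void main(String[] args) {
        // Simulate the off-heap allocation being refused with --cacheOnHeap=false.
        Supplier<Location> refused =
                () -> { throw new NativeMemoryAllocationRefusedError("simulated: off-heap refused"); };
        Location where = allocateCache(false, refused, () -> Location.HEAP, () -> Location.PAGE_CACHE);
        System.out.println("Cache allocated in: " + where);
    }
}
```

The point of the sketch is only that the final location depends on which attempts fail and on --cacheOnHeap, which is why the answer to "heap, page cache or off-heap?" is "potentially any of them".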

Q: Does it do the same for the de-duplication data structures?

A: I believe so; I think they use the same factory to create their respective caches.
