I am trying to load a large (4.5 GB) JSON file into Neo4j. The file is in JSONL format, meaning each JSON object is on its own line. There are about 5.3 million entries.
I read about the apoc.load..() functions but have a few questions:
Do I have to take care of periodic commits?
Can I split the file via apoc.load on the line endings?
Thanks in advance.
From my understanding, if the JSON file is essentially a list at the top level (and not a map), it is streamed; see neo4j-apoc-procedures/LoadJson.java at 3.5 · neo4j-contrib/neo4j-apoc-procedures · GitHub.
There is no periodic commit by default, but you can easily get one by wrapping the load in apoc.periodic.iterate (untested code below, take care; the batchSize value is only an example):
call apoc.periodic.iterate(
"call apoc.load.json(....) yield value return value",
"create (p:Person) set p = $value // placeholder for your create/merge statement, which runs for every JSON list element, i.e. every value",
{batchSize: 1000});
My problem is that the file is not proper JSON as a whole; instead, each line is a JSON object. I will try some command-line magic to turn this into a JSON array.
Good to know that periodic commits are possible.
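The conversion mentioned above could also be done with a small script instead of shell tools. The following is a minimal sketch, assuming the input is UTF-8 and every non-blank line holds exactly one JSON value; the function name jsonl_to_json_array and any file names are hypothetical. It reads one line at a time, so the whole file is never held in memory.

```python
import json
from pathlib import Path


def jsonl_to_json_array(src: Path, dst: Path) -> int:
    """Stream a JSON Lines file into a single JSON array file.

    Returns the number of objects written. Blank lines are skipped.
    """
    count = 0
    with src.open("r", encoding="utf-8") as fin, \
            dst.open("w", encoding="utf-8") as fout:
        fout.write("[\n")
        for line in fin:
            line = line.strip()
            if not line:
                continue
            # Validate the line; raises json.JSONDecodeError if it is malformed.
            json.loads(line)
            if count:
                fout.write(",\n")
            fout.write(line)
            count += 1
        fout.write("\n]\n")
    return count
```

Calling it as jsonl_to_json_array(Path("companies.jsonl"), Path("companies.json")) would write the array next to the input (both file names are placeholders). The per-line validation costs extra time on a file of this size and can be dropped if the input is trusted.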
Thanks again, the import of the German Handelsregister (commercial register) is running now. It will take some time, as it is over 4 GB of JSON with over 5,000,000 company entries.
Or not. The import ran out of memory (OOM). There was not enough heap space, even though I had already increased it to 8 GB (dbms.memory.heap.max_size).
It looks like apoc.load.json($url) is not streaming and tries to load the whole file up front.
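One possible way to limit how much data a single load call has to handle is to split the JSONL file into smaller pieces first and import them one by one. The sketch below is only an illustration: the function name split_jsonl, the chunk size, and the output naming scheme are assumptions, and whether smaller files actually avoid the heap problem on a given APOC version is not verified here. It is also not necessarily what was done for the final import.

```python
from pathlib import Path
from typing import List


def split_jsonl(src: Path, out_dir: Path, lines_per_chunk: int = 100_000) -> List[Path]:
    """Split a JSON Lines file into numbered chunk files, reading line by line."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    out = None
    written = 0  # lines written to the current chunk
    try:
        with src.open("r", encoding="utf-8") as fin:
            for line in fin:
                if not line.strip():
                    continue  # skip blank lines
                if out is None or written == lines_per_chunk:
                    # Start a new chunk file, closing the previous one first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}_{len(chunks):05d}.jsonl"
                    out = path.open("w", encoding="utf-8")
                    chunks.append(path)
                    written = 0
                out.write(line if line.endswith("\n") else line + "\n")
                written += 1
    finally:
        if out is not None:
            out.close()
    return chunks
```

Each resulting chunk is still in JSONL format, so it can be converted or loaded separately, for example inside a loop that issues one import statement per file.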
Just to close this thread: I finally managed to complete the import and wrote a bit about it: Importing corporate data into Neo4j • Bert Radke
Thanks for the help.