I filled a database with nodes (CREATE) via a Python script using neo4j.GraphDatabase.
Every 1,000 entries I closed the session and transaction and began new ones.
This worked fine for a continuous set of ~30,000 nodes.
Then I tried it on a much larger dataset.
After ~1.5 million nodes the Python script stopped making progress; the task manager showed Java running at 100% CPU.
To my understanding, this should not happen(?)
My code looks about like this:
driver = GraphDatabase.driver(uri, auth=(user, password))
for batch in batches:  # each batch holds 1000 rows
    session = driver.session()
    transaction = session.begin_transaction()
    for row in batch:
        transaction.run("CREATE ( ... )")
    transaction.commit()
    session.close()
Is this a memory issue within Neo4j?
What is your heap configuration?
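For reference, heap size is set in neo4j.conf. These property names are from Neo4j 4.x (5.x renamed them to server.memory.heap.*), and 2g is just an illustrative value, not a recommendation:

```
dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=2g
```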
You should not create a new session 3000 times.
Just create a single session and use 3000 transactions.
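A minimal sketch of that pattern with the Python driver: one session kept open for the whole run, and a fresh transaction committed per batch. The `driver` argument is the object returned by GraphDatabase.driver(uri, auth=...); the :Item label and the `id` property are placeholder names, not from the original post.

```python
def batched(rows, size):
    # Yield consecutive slices of at most `size` rows.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def insert_all(driver, rows, batch_size=1000):
    # One session for the whole run; a fresh transaction per batch.
    with driver.session() as session:
        for batch in batched(rows, batch_size):
            tx = session.begin_transaction()
            for row in batch:
                tx.run("CREATE (n:Item {id: $id})", id=row["id"])
            tx.commit()  # commit the batch, keep the session open
```

Committing per batch bounds the transaction state the server has to hold in memory, while reusing the session avoids the connection churn of opening thousands of them.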
Batching also helps, if you are able. Doing an UNWIND over a batch of data (say 10k rows at a time) and processing the entire batch per transaction, rather than a single CREATE per transaction, will be more efficient.
Agreed with Dave on the config; please share what you have configured for your server.
It would be good if you used parameters, e.g. a list of dicts for your data,
and then use
UNWIND $params AS row
CREATE (n:Something) SET n += row
see also 5 Tips & Tricks for Fast Batched Updates of Graph Structures with Neo4j and Cypher | by Michael Hunger | Neo4j Developer Blog | Medium
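A minimal sketch of that UNWIND pattern with the Python driver, assuming each element of `rows` is a plain dict of node properties (the function name, batch size, and :Something label are placeholders):

```python
# Cypher from above: create one node per row of the batch.
UNWIND_QUERY = (
    "UNWIND $rows AS row "
    "CREATE (n:Something) "
    "SET n += row"
)

def insert_unwind(driver, rows, batch_size=10_000):
    # `driver` is the object from GraphDatabase.driver(...).
    # session.run auto-commits, so each batch is one transaction.
    with driver.session() as session:
        for start in range(0, len(rows), batch_size):
            session.run(UNWIND_QUERY, rows=rows[start:start + batch_size])
```

This sends one query per 10k rows instead of one per node, so the per-query overhead is amortized across the whole batch.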
Thank you for your quick feedback.
I did not set any parameters. I used a fresh installation, created a database and started inserting.
I will try to use only a single session and see if things go better.
I will also try using UNWIND. Thank you for the hint and the link!
(1) Creating only one session with several transactions let me insert all the data into the database.
(2) UNWIND: This sped up the insertion process by a factor of 5. Very nice hint.