Load-CSV very slow with millions of nodes

Two things:

1.) Use neo4j-admin import, documented here: Import - Operations Manual
We load 6 billion nodes and 10 billion relationships in 2-3 hours this way. You want to break the files down into distinct nodes and distinct pairs of nodes. If your numeric IDs are unique, you may be able to use the `--id-type=INTEGER` flag, which reduces the memory the import needs. A sketch is below.
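Roughly what the invocation looks like. This is a minimal sketch assuming Neo4j 4.x flag syntax (3.x used `--nodes:Person persons.csv` instead) and hypothetical file names `persons.csv` / `knows.csv`:

```bash
# Hypothetical headers: persons.csv needs "id:ID,name",
# knows.csv needs ":START_ID,:END_ID".
# --id-type=INTEGER tells the importer IDs are numeric, saving memory.
bin/neo4j-admin import \
    --nodes=Person=persons.csv \
    --relationships=KNOWS=knows.csv \
    --id-type=INTEGER
```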

2.) If you have to use some kind of LOAD CSV, load in parallel. Consider apoc.periodic.iterate, seen here: Neo4j APOC Procedures User Guide.

Relationships cannot be loaded in parallel unless you are sure the file is built so that no two CPUs will try to grab the same nodes, so it looks like you may be stuck with slow loading there. If you can be smart about sorting the data so that concurrent batches never race for the same nodes, I think it could be parallelized: with a batch size of 1000 and 8 CPUs, rows 1-1000 (one batch) can share nodes with each other, but the seven batches running alongside it (rows 1001-8000) must touch different nodes. It might be tough, but it could give a lift if the sort was figured out. Also, I bet Neo4j would appreciate it xD!

I would also suggest increasing the commit size to 100k; 5k is small. You have 30 GB of RAM to use when you load relationships, so you might be able to get away with higher, like 300k per batch. Both ideas are sketched below.
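For the node pass, parallel batching is safe because each row only touches its own node. A minimal sketch, assuming hypothetical file and property names (`persons.csv`, `id`, `name`):

```cypher
// Nodes: parallel:true is safe, each row MERGEs a single node keyed by its own id.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row RETURN row",
  "MERGE (p:Person {id: toInteger(row.id)}) SET p.name = row.name",
  {batchSize: 100000, parallel: true});
```

For relationships, keep `parallel: false` unless you have done the sorting described above, but raise the batch size. Again a sketch with hypothetical column names (`src`, `dst`) in `knows.csv`:

```cypher
// Relationships: parallel:false avoids lock contention on shared nodes;
// a larger batchSize (100k here instead of 5k) cuts per-commit overhead.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row RETURN row",
  "MATCH (a:Person {id: toInteger(row.src)})
   MATCH (b:Person {id: toInteger(row.dst)})
   MERGE (a)-[:KNOWS]->(b)",
  {batchSize: 100000, parallel: false});
```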