We are trying to do a one-time bulk data load as part of our go-live.
We are using Databricks to transform data from the source DB into Neo4j format, loading with a batch size of 20k.
Initially we tried a cluster with 3 core nodes and 3 read replicas. The load failed after a certain point, so we moved to a single core-node instance, but it still took ~9 hours to load about 240 million relationships. Each JVM heap is 31 GB, and each instance runs on a 16-core box.
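For context, our load follows the usual pattern of sending relationships in fixed-size batches (e.g. via UNWIND). A minimal sketch of the batching side is below; the Cypher template, labels, and property names are illustrative assumptions, not our actual schema, and no driver calls are shown:

```python
# Sketch: split relationship rows into 20k batches for UNWIND-style loads.
# The Cypher below is a hypothetical example, not our real query.

BATCH_SIZE = 20_000

CYPHER_TEMPLATE = """
UNWIND $rows AS row
MATCH (a:Node {id: row.src})
MATCH (b:Node {id: row.dst})
MERGE (a)-[:RELATES_TO]->(b)
"""

def batches(rows, size=BATCH_SIZE):
    """Yield successive fixed-size lists of row dicts from an iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Example: 50,001 rows split into batches of 20k, 20k, and 10,001.
rows = ({"src": i, "dst": i + 1} for i in range(50_001))
sizes = [len(b) for b in batches(rows)]
print(sizes)  # [20000, 20000, 10001]
```

Each batch would then be passed as the `$rows` parameter of one transaction, so transaction size stays bounded regardless of total volume.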
Question: What are some recommendations to expedite the data load? Thanks in advance.