How to make this LOAD CSV go faster?

skmami · May 13, 2020, 9:31pm

Greetings,

I have the following query, and my file contains about 30 million records. Is there a way to make this run faster?

It has been running for well over 40 minutes and is still running.

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line return line.FAREID as fareid, toInteger(line.TARIFF_NBR) as tariff ','
match (f:Fare {ID: fareid})
match (ft:FareTariff {name: tariff})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
',{batchSize:1, iterateList:true, parallel:true})

There are fewer tariff numbers than fares, so many fares share the same tariff. I am afraid that I might get deadlock errors if I use a bigger batchSize.

Thanks

koji · May 14, 2020, 1:52am

Hi,

I think it would be faster if you created indexes before calling apoc.periodic.iterate.

For 4.x:

CREATE INDEX id FOR (n:Fare) ON (n.ID);
CREATE INDEX name FOR (n:FareTariff) ON (n.name);

For 3.x:

CREATE INDEX ON :Fare(ID);
CREATE INDEX ON :FareTariff(name);

skmami · May 14, 2020, 2:23am

Thanks @koji. I have created constraints on both labels. Wouldn't that be enough? I thought constraints created an index.

CREATE CONSTRAINT ON (f:FareTariff) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FareBasis) ASSERT f.name IS UNIQUE;

Also, I verified with CALL db.constraints(); that my constraints were created properly:

"constraint_9fff29c0"	"CONSTRAINT ON ( faretariff:FareTariff ) ASSERT (faretariff.name) IS UNIQUE"

"constraint_f599caff"	"CONSTRAINT ON ( fare:Fare ) ASSERT (fare.ID) IS UNIQUE"

Is there any other way to check what is causing this to be so slow?

Thanks again for your help.
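One way to check is to PROFILE a read-only version of the inner statement on a small sample of rows (the LIMIT of 1000 below is an arbitrary choice). In the resulting plan, both lookups should show up as unique-index seeks (NodeUniqueIndexSeek); a NodeByLabelScan would mean that every CSV row scans all Fare or FareTariff nodes. If matchedRows is much lower than expected, it is also worth checking that FareTariff.name is stored as an integer, since the query compares it with toInteger(line.TARIFF_NBR).

```cypher
// Read-only diagnostic: profiles the two MATCH lookups for the first 1000 CSV rows
// without creating any relationships.
PROFILE
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
WITH line LIMIT 1000
MATCH (f:Fare {ID: line.FAREID})
MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
RETURN count(*) AS matchedRows;
```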

intouch_vivek · May 14, 2020, 8:53am

Hi Satish,

Avoid using parallel:true for complex operations; here, parallel batches would compete for locks on the same FareTariff nodes.

Also, why have you set batchSize to 1? Its value should be based on the amount of data you want to process in each transaction. The default value is 10000.
https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/

Could you please try the following:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
match (f:Fare {ID: line.FAREID})
match (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
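If you prefer to stay with apoc.periodic.iterate, the two changes suggested above (parallel:false and a larger batchSize) would look roughly like the sketch below. With batchSize:1, 30 million rows means about 30 million separate transactions; at 10000 (the documented default, used here as a starting point rather than a tuned value) it is about 3000.

```cypher
// Same file and matching logic as the original query; only the config map changes.
// parallel:false runs the batches one after another, so two batches never hold
// locks on the same FareTariff node at the same time.
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM
   "file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
   RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff',
  'MATCH (f:Fare {ID: fareid})
   MATCH (ft:FareTariff {name: tariff})
   CREATE (f)-[:fare_to_faretariff]->(ft)',
  {batchSize: 10000, iterateList: true, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

The errorMessages column is worth checking afterwards, because failed batches are reported there rather than aborting the whole call.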

koji · May 14, 2020, 10:57pm

That's enough.
A uniqueness constraint (CONSTRAINT ON ... ASSERT ... IS UNIQUE) automatically creates the backing index.
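To confirm that those backing indexes exist and have finished populating, you can list them. The exact output columns differ between 3.x and 4.x, but both versions include a state column, which should read ONLINE for the indexes on :Fare(ID) and :FareTariff(name); the planner only uses indexes that are online.

```cypher
// Lists all indexes, including the ones created automatically for the
// uniqueness constraints on :Fare(ID) and :FareTariff(name).
CALL db.indexes();
```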

skmami · May 15, 2020, 3:17am

For some reason it is very, very slow. I am now looking into the import tool. Hopefully that works.
