Neo4j merge performance optimization

Hi

Does anyone have advise on the optimal way to run many merge statements. Eg. I have thousands of records that I want to write to Neo4j (insert or update) continuously throughout the day.

I'm following the tips in this post but was also wondering if there were further optimizations since the post is quite old.

Regards
Geoff

That guidance you're using is still pretty good. Rather than recap all of that, I think what would be more helpful is posting your MERGE code right now, and what your issue with it is, along with which indexes you have, and maybe a data sample. This would enable more targeted advice people can offer.

Thanks @david_allen

here is the merge code (dummy object names and data)

with [
{cid:'234242',prop1:'999', state:'ACTIVE', cycle:'3', val1:'999', date1:'2020-03-23T23:42:52', date2:'2020-02-25T10:01:32' , date3:'2020-02-25T11:01:39', val2:'+462462456546', val3:'3.534123128943' , val4:'678.437955'}
,{cid:'43534662',prop1:'999' , state:'ACTIVE' , cycle:'8' , val1:'999' , date1:'2020-03-23T23:42:53' , date2:'2020-01-17T14:34:43' , date3:'2020-01-17T15:34:48' ,val2:'+2354246', val3:'12.234' , val4:'9.3675096'} 
,{cid:'245235432',prop1:'999' , state:'ACTIVE' , cycle:'5' , val1:'999' , date1:'2020-03-23T23:43:11' , date2:'2020-01-25T00:13:38' , date3:'2020-01-25T01:13:45' ,val2:'+4554373657',val3:'23.123123' , val4:'9.3719427'}
,{cid:'56375637657',prop1:'999' , state:'ACTIVE' , cycle:'4' , val1:'999' , date1:'2020-03-23T23:43:11' , date2:'2020-01-25T05:09:04' , date3:'2020-01-25T06:09:10' ,val2:'+34565437', val3:'12.345345' , val4:'9.9074033'} 
,{cid:'3756756',prop1:'999' , state:'ACTIVE' , cycle:'5' , val1:'999' , date1:'2020-03-23T23:43:15' , date2:'2020-01-22T06:48:27' , date3:'2020-01-22T07:48:32' ,val2:'+345654367', val3:'56.0721976' , val4:'9.4616742'}
,{cid:'67867867',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999.99' , date1:'2020-03-23T23:43:16' , date2:'2019-12-16T16:43:10' , date3:'2019-12-16T17:43:13' ,val2:'+34576578',val3:'456.9676098' , val4:'435.570213' } 
,{cid:'35676575467',prop1:'999' , state:'ACTIVE' , cycle:'10' , val1:'999' , date1:'2020-03-23T23:43:17' , date2:'2020-01-25T01:16:26' , date3:'2020-01-25T02:16:32' ,val2:'+56785678', val3:'567.3400092' , val4:'6.8716568' } 
,{cid:'356735675',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999' , date1:'2020-03-23T23:43:17' , date2:'2020-01-25T01:22:32' , date3:'2020-01-25T02:22:38' ,val2:'+35675367', val3:'123.4146345' , val4:'56.9644088' } 
,{cid:'45867648476',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T22:14:35' , date3:'2020-02-23T23:14:42' ,val2:'+6798987', val3:'75.311285798443771' , val4:'34.623363014315835' } 
,{cid:'35673567567',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T23:02:20' , date3:'2020-02-24T00:02:28' ,val2:'+678068998', val3:'9.311508576432221' , val4:'64.62342035774968' } 
,{cid:'3567653765',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T23:31:53' , date3:'2020-02-24T00:32:00' ,val2:'+45676487648', val3:'9.281032089744471' , val4:'76.574261019797286' }
,{cid:'467876969',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:05:17' , date3:'2020-02-23T20:05:31' ,val2:'+35675648', val3:'9.2663239706366' , val4:'45.627927547325965' } 
,{cid:'46785476567',prop1:'999' , state:'ACTIVE' , cycle:'1' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:30:39' , date3:'2020-02-23T20:30:46' ,val2:'+456435756', val3:'345.0265098', val4:'87.5634466' }
,{cid:'078908970',prop1:'999' , state:'ACTIVE' , cycle:'10' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:49:20' , date3:'2020-02-23T20:49:26' ,val2:'+568679578', val3:'48.6547792' , val4:'9.3496854' } 
,{cid:'3567658568',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T19:59:57' , date3:'2020-02-23T21:00:10' ,val2:'+467846757', val3:'867.311389147309876' , val4:'9.623492514719835' } 
,{cid:'4679879897',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T20:05:55' , date3:'2020-02-23T21:06:02' ,val2:'+35746584768', val3:'4.311530792642691' , val4:'9.62344690663975' } 
,{cid:'4658768765',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T20:42:41' , date3:'2020-02-23T21:42:53' ,val2:'+3567536345', val3:'354.311543542700233' , val4:'9.6234034662664145' } 
,{cid:'35674657648',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999' , date1:'2020-03-23T23:43:23' , date2:'2020-02-12T14:40:03' , date3:'2020-02-12T15:40:14' ,val2:'+345567654', val3:'59.4612873' , val4:'435.7003877' } 
,{cid:'3567567',prop1:'999' , state:'ACTIVE' , cycle:'1' , val1:'999' , date1:'2020-03-23T23:43:31' , date2:'2020-03-09T06:20:41' , date3:'2020-03-09T07:20:49' ,val2:'+098908779', val3:'34.019712' , val4:'56.9003335' } 
,{cid:'47987696',prop1:'999' , state:'ACTIVE' , cycle:'4' , val1:'999' , date1:'2020-03-23T23:43:31' , date2:'2019-12-24T23:15:21' , date3:'2019-12-25T00:15:26' ,val2:'+5678675647', val3:'35.9476002' , val4:'56.3394755' } 
] as batch
unwind batch as c
MERGE (p:node1{cid:c.cid}) 
WITH p,c SET p.prop1=c.prop1 , p.state=c.state , p.cycle=c.cycle , p.val1=c.val1, p.date1=datetime(c.date1) , p.date2=datetime(c.date2) , p.date3=datetime(c.date3) 
MERGE (m:val2{val2:c.val2}) 
MERGE (p)-[:HAS_VAL2]-> (m) 
MERGE (g:val3{ val3_val4:toString((round(toFloat(c.val3) *1000)/1000)) + '_' + toString((round(toFloat(c.val4) *1000)/1000)) , val3:(round(toFloat(c.val3) *1000)/1000) , val4:(round(toFloat(c.val4) *1000)/1000) }) 
MERGE (p)-[:HAS_VAL3]-> (g) 

I have indexes on node1[cid], val2[val2] and val3[val3_val4,val3,val4]

Currently this takes 16 seconds to run which seems very slow considering it's only 20 records

Since merges are like matches (and if no match exists, a create), having indexes in place to support quick lookup is critical. You may want to ensure you have indexes on the following:

:node1(cid)
:val2(val2)

:val3 may be tricky though, which properties are meant to uniquely identify the node? Create an index on that property (or for multiple, create a composite index).

Double-check that an EXPLAIN of your query doesn't have any NodeByLabelScans but uses NodeIndexSeeks.

2 Likes

What Andrew said! Also for val3
Also if you can turn val3 into using a simpler, single key that would be highly beneficial.

In general this looks like a RDF model not a property graph one,
why don't you store the val2 and val3properties directly on the node?

Also please use parameters if any possible,
esp. if you feed this data from a script or program

{"batch": [
{cid:'234242',prop1:'999', state:'ACTIVE', cycle:'3', val1:'999', date1:'2020-03-23T23:42:52', date2:'2020-02-25T10:01:32' , date3:'2020-02-25T11:01:39', val2:'+462462456546', val3:'3.534123128943' , val4:'678.437955'}
,{cid:'43534662',prop1:'999' , state:'ACTIVE' , cycle:'8' , val1:'999' , date1:'2020-03-23T23:42:53' , date2:'2020-01-17T14:34:43' , date3:'2020-01-17T15:34:48' ,val2:'+2354246', val3:'12.234' , val4:'9.3675096'} 
,{cid:'245235432',prop1:'999' , state:'ACTIVE' , cycle:'5' , val1:'999' , date1:'2020-03-23T23:43:11' , date2:'2020-01-25T00:13:38' , date3:'2020-01-25T01:13:45' ,val2:'+4554373657',val3:'23.123123' , val4:'9.3719427'}
,{cid:'56375637657',prop1:'999' , state:'ACTIVE' , cycle:'4' , val1:'999' , date1:'2020-03-23T23:43:11' , date2:'2020-01-25T05:09:04' , date3:'2020-01-25T06:09:10' ,val2:'+34565437', val3:'12.345345' , val4:'9.9074033'} 
,{cid:'3756756',prop1:'999' , state:'ACTIVE' , cycle:'5' , val1:'999' , date1:'2020-03-23T23:43:15' , date2:'2020-01-22T06:48:27' , date3:'2020-01-22T07:48:32' ,val2:'+345654367', val3:'56.0721976' , val4:'9.4616742'}
,{cid:'67867867',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999.99' , date1:'2020-03-23T23:43:16' , date2:'2019-12-16T16:43:10' , date3:'2019-12-16T17:43:13' ,val2:'+34576578',val3:'456.9676098' , val4:'435.570213' } 
,{cid:'35676575467',prop1:'999' , state:'ACTIVE' , cycle:'10' , val1:'999' , date1:'2020-03-23T23:43:17' , date2:'2020-01-25T01:16:26' , date3:'2020-01-25T02:16:32' ,val2:'+56785678', val3:'567.3400092' , val4:'6.8716568' } 
,{cid:'356735675',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999' , date1:'2020-03-23T23:43:17' , date2:'2020-01-25T01:22:32' , date3:'2020-01-25T02:22:38' ,val2:'+35675367', val3:'123.4146345' , val4:'56.9644088' } 
,{cid:'45867648476',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T22:14:35' , date3:'2020-02-23T23:14:42' ,val2:'+6798987', val3:'75.311285798443771' , val4:'34.623363014315835' } 
,{cid:'35673567567',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T23:02:20' , date3:'2020-02-24T00:02:28' ,val2:'+678068998', val3:'9.311508576432221' , val4:'64.62342035774968' } 
,{cid:'3567653765',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T23:31:53' , date3:'2020-02-24T00:32:00' ,val2:'+45676487648', val3:'9.281032089744471' , val4:'76.574261019797286' }
,{cid:'467876969',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:05:17' , date3:'2020-02-23T20:05:31' ,val2:'+35675648', val3:'9.2663239706366' , val4:'45.627927547325965' } 
,{cid:'46785476567',prop1:'999' , state:'ACTIVE' , cycle:'1' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:30:39' , date3:'2020-02-23T20:30:46' ,val2:'+456435756', val3:'345.0265098', val4:'87.5634466' }
,{cid:'078908970',prop1:'999' , state:'ACTIVE' , cycle:'10' , val1:'999' , date1:'2020-03-23T23:43:19' , date2:'2020-02-23T19:49:20' , date3:'2020-02-23T20:49:26' ,val2:'+568679578', val3:'48.6547792' , val4:'9.3496854' } 
,{cid:'3567658568',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T19:59:57' , date3:'2020-02-23T21:00:10' ,val2:'+467846757', val3:'867.311389147309876' , val4:'9.623492514719835' } 
,{cid:'4679879897',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T20:05:55' , date3:'2020-02-23T21:06:02' ,val2:'+35746584768', val3:'4.311530792642691' , val4:'9.62344690663975' } 
,{cid:'4658768765',prop1:'999' , state:'ACTIVE' , cycle:'0' , val1:'999' , date1:'2020-03-23T23:43:20' , date2:'2020-02-23T20:42:41' , date3:'2020-02-23T21:42:53' ,val2:'+3567536345', val3:'354.311543542700233' , val4:'9.6234034662664145' } 
,{cid:'35674657648',prop1:'999' , state:'ACTIVE' , cycle:'2' , val1:'999' , date1:'2020-03-23T23:43:23' , date2:'2020-02-12T14:40:03' , date3:'2020-02-12T15:40:14' ,val2:'+345567654', val3:'59.4612873' , val4:'435.7003877' } 
,{cid:'3567567',prop1:'999' , state:'ACTIVE' , cycle:'1' , val1:'999' , date1:'2020-03-23T23:43:31' , date2:'2020-03-09T06:20:41' , date3:'2020-03-09T07:20:49' ,val2:'+098908779', val3:'34.019712' , val4:'56.9003335' } 
,{cid:'47987696',prop1:'999' , state:'ACTIVE' , cycle:'4' , val1:'999' , date1:'2020-03-23T23:43:31' , date2:'2019-12-24T23:15:21' , date3:'2019-12-25T00:15:26' ,val2:'+5678675647', val3:'35.9476002' , val4:'56.3394755' } 
]}

then send the param along with your statement in any of the APIs
and change the statement to, so the cypher parser + query planner have to parse it only once and not a huge statement every time

unwind $batch as c
MERGE (p:node1{cid:c.cid}) 
WITH p,c SET p.prop1=c.prop1 , p.state=c.state , p.cycle=c.cycle , p.val1=c.val1, p.date1=datetime(c.date1) , p.date2=datetime(c.date2) , p.date3=datetime(c.date3) 
MERGE (m:val2{val2:c.val2}) 
MERGE (p)-[:HAS_VAL2]-> (m) 
MERGE (g:val3{ val3_val4:toString((round(toFloat(c.val3) *1000)/1000)) + '_' + toString((round(toFloat(c.val4) *1000)/1000)) , val3:(round(toFloat(c.val3) *1000)/1000) , val4:(round(toFloat(c.val4) *1000)/1000) }) 
MERGE (p)-[:HAS_VAL3]-> (g)