Hello.
I have created about 1,000,000 nodes from the "citations file.csv" file, and each node has the properties {lensid, title, year, journal_name}. I created the nodes with the following query.
LOAD CSV FROM 'file:///citations%20file.csv' AS ci
CREATE (jj:Journal{lensid:ci[0], title:ci[1], year:ci[2], journal_name:ci[3]})
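The query above does not itself create an index on lensid. If none exists, later lookups by lensid may have to scan all Journal nodes. A minimal sketch of adding one, assuming Neo4j 4.x syntax (the version is not stated here) and a hypothetical index name journal_lensid:

```cypher
// Assumption: Neo4j 4.x syntax; on 3.5 the form is CREATE INDEX ON :Journal(lensid).
// journal_lensid is a hypothetical index name chosen for illustration.
CREATE INDEX journal_lensid FOR (j:Journal) ON (j.lensid);
```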
Next, to connect the nodes, I am trying to create relationships using a unique identifier called lensid.
The matching identifiers come from a separate "reference file.csv" file, where each line holds one lensid and one reference. Each lensid has about 30 references on average, so the file contains approximately 30,000,000 (lensid, reference) pairs. Below is the query I wrote to create the relationships. ref[0] is the lensid, and ref[1] is a reference belonging to that lensid.
LOAD CSV FROM 'file:///reference%20file.csv' AS ref
WITH ref
MATCH (j:Journal{lensid:ref[1]})
WITH j, ref
MATCH(j2:Journal{lensid:ref[0]})
CREATE (j)-[:referenced]->(j2)
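A possible variant of the query above, sketched under two assumptions that are not confirmed for this setup: an index on :Journal(lensid) already exists, and the Neo4j version still supports USING PERIODIC COMMIT (4.x and earlier; it was removed in 5). The batch size of 10000 is an arbitrary illustrative value.

```cypher
// Commit every 10000 rows instead of building all relationships in one transaction.
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///reference%20file.csv' AS ref
// Both lookups are by lensid, so each can use an index on :Journal(lensid) if one exists.
MATCH (j:Journal {lensid: ref[1]})
MATCH (j2:Journal {lensid: ref[0]})
CREATE (j)-[:referenced]->(j2)
```

In Neo4j Browser 4.x, a query like this typically has to be prefixed with :auto so that it runs in an auto-commit transaction.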
My question is this.
I ran the query above, but it is still running after 3 days.
It seems that most of the time is spent on matching the nodes.
Could somebody give me some advice on how to simplify or speed up creating these relationships?
I don't know if it's necessary, but the hardware information and RAM allocation are as follows.
CPU: Intel i7-9750H
RAM: 32GB
dbms.memory.heap.initial_size=16G
dbms.memory.heap.max_size=16G
dbms.memory.pagecache.size=10G
Thank you for reading.