How to import CSV file and link entire contents to a previously existing node

I have a master CSV file that I have imported. I would like to import another CSV file (formatted exactly like the master) and, upon import, link each and every entry in the secondary CSV to an existing node created by the master. What's the best way to accomplish this?

We probably need more detail in your question, but let me see if I can help.
Let's say you performed an initial insert of the data in master.csv and created a node for each row, something like this:

USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM 'file:///master.csv' AS row
CREATE (a:LabelA)
SET a.id = row.id,
    a.propA = row.propA
    etc,etc,etc;

Now you want to load data from secondary.csv and want to create a node for each row, but also relate it so some node that you created from master.csv. You'd want to do something like this:

USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM 'file:///secondary.csv' AS row
MATCH (a:LabelA {id: row.master_id})
CREATE (b:LabelB)<-[:SOME_REL]-(a)
SET b.id = row.id,
    b.propB = row.propB,
    etc,etc,etc;

This is making a lot of assumptions about what you are attempting to do and that you have some field in the secondary.csv file that can link you to an individual node that was previously created.

For best results, on any significant sized data you'll want to be sure to have an unique constraint or index available on some the matching propery of the nodes created is master.csv before you run the secondary load.

Here's what I've used to import (I'm just using some partial data for the SET since I thought that would simplify getting started and getting the hang of it):

// Import ALL Matches
LOAD CSV WITH HEADERS FROM 'file:///Matches.csv' AS row
WITH row.fullName AS fullName, row.fName AS fName, row.mName AS mName, row.lName AS lName, row.mDate AS mDate,
row.relateRange AS relateRange, row.SuggRelate AS suggRelate, toInteger(row.sharedcM) AS sharedcM, toInteger(row.longestBlock) AS longestBlock,
row.linkedRelate AS linkedRelate, row.email AS email, row.ancestralSurnames AS ancestralSurnames,
row.yHaplo AS yHaplo, row.mHaplo AS mHaplo, row.matchingBucket AS matchingBucket
MERGE (r:Relative {fullName: fullName})
  SET r.sharedcM = sharedcM, r.longestBlock = longestBlock
RETURN count(r)

All of the CSV files are in the same format since that is all I can pull from the DNA matching site I use:

fullName,fName,mName,lName,mDate,relateRange,SuggRelate,sharedcM,longestBlock,linkedRelate,email,ancestralSurnames,yHaplo,mHaplo,notes,matchingBucket
John Doe,John,,Doe,1/01/19,1st Cousin - 3rd Cousin,2nd Cousin,358,81,1st Cousin 1R,email@someplace.com,,,,,N/A

I start by pulling a CSV file of ALL of my DNA matches. Then, I select one person I'm related to and filter on the website to see just the relatives I share with the selected individual. I can then export that filtered result as a CSV. It'll take time to pull all of them (2,000+), so I'll probably be selective at first to see how things go.

So, with the master imported, I would then want to load the next CSV containing a list of people John Doe and I share a relationship to and link all of those people with a relationship to John Doe.

I think I see. If I understand correctly...
You've already loaded the Matches.csv as described above.
Then you are going to pick a Relative from that file (i.e John Doe) and then get a file in the same format of the Relatives you have in common. Then you want to relate John Doe to each Relative in the file.

How about something like this:

LOAD CSV WITH HEADERS FROM 'file:///john-doe-mutual-relatives.csv' AS row
WITH row.FULLNAME AS name
MATCH (startPerson:Relative {fullName: 'John Doe'}), (b:Relative {fullName: row.fullName})
MERGE (startPerson)-[:ALSO_RELATED_TO]->(b)

You probably need to make sure you have a unique constraint on the fullName property of the Relative label

I was getting an error:

Variable row not defined (line 4, column 90 (offset: 217))
"MATCH (startPerson:Relative {fullName: 'John Doe'}), (b:Relative {fullName: row.fullName})"

So, I was tinkering around trying to figure out what was going on. I noticed the

WITH row.FULLNAME AS name

line and didn't see "name" used anywhere. As a test, I commented it out and it ran and connected everything perfectly (as far as I can see). I even ran it again with another CSV list of relatives.

Do you know why it worked when I commented out that line?

//Import relationships TEST v1
LOAD CSV WITH HEADERS FROM 'file:///john-doe-mutual-relatives.csv' AS row
//WITH row.FULLNAME AS name
MATCH (startPerson:Relative {fullName: 'John Doe'}), (b:Relative {fullName: row.fullName})
MERGE (startPerson)-[:ALSO_RELATED_TO]-(b)

Yeah sorry, that was my mistake.
When using that WITH clause, the scope of the row variable is lost.

Glad you got it working.

I'm running into issues with new imports. I got the first two imported and working correctly but now I just get a "(no changes, no records)" message when trying to import.

Forget that! I had a formatting issue with my CSV!

Everything's been going well with your help! Thank you so much!

1 Like