Query reading multiple types of nodes from single file

Hi everyone,

I'm struggling to efficiently import multiple types of nodes from a single CSV file.

The setup is as follows:

I have a file containing (unique) patent application IDs. The file also holds the associated patent family ID for each application (not unique; multiple patents can belong to the same family). I want to create applications and families as distinctly labeled nodes and record the affiliation between them as a relationship. Currently I do so as follows:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
MERGE (fam:Family {family_id: toInteger(row.family_id)})
CREATE (app)-[:BELONGS_TO]->(fam)
SET app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted

Because the family IDs are not unique in the file (and there is no separate file with unique family IDs), I have to use MERGE for the families, and without an index on the family IDs every MERGE has to scan all existing Family nodes. Running this query is prohibitively expensive even for a small sample of the final dataset, which will have many millions of applications.
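Without an index on :Family(family_id), each MERGE falls back to a label scan over all Family nodes. You can check this without running the import by looking at the plan (a sketch; the ID value is just a placeholder):

EXPLAIN MERGE (fam:Family {family_id: 12345})

With no index, the plan shows a NodeByLabelScan plus a property filter instead of a NodeIndexSeek.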

I'd appreciate any help in making this feasible!

PS: I hope I'm not overlooking similar questions that have already been answered; I'm still very new to Neo4j and don't yet know how to search for things effectively.

You can still use an index even if the values are not unique.
The syntax is:

CREATE INDEX index_name FOR (n:Person)
ON (n.surname)
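Applied to the case above, that would be an index on the family IDs so the MERGE lookup is index-backed (the index name here is just an example):

CREATE INDEX idx_family_id FOR (f:Family)
ON (f.family_id)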

Thanks for the tip!
I tried splitting the import into two steps: first importing the family IDs as a property on the Application nodes, creating an index on that property, and then using MERGE to create the Family nodes and the relationships between the two sets of nodes:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.docdb_family_id = toInteger(row.docdb_family_id),
    app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted;

CREATE INDEX idx_docdb_family_id FOR (a:Application) ON (a.docdb_family_id);

MATCH (app:Application)
MERGE (fam:Family {family_id: app.docdb_family_id})
MERGE (app)-[:BELONGS_TO]->(fam)

However, this still takes very long, so I'm not sure I'm utilizing the index correctly.
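One likely issue with the second step: the index is on :Application(docdb_family_id), but the MERGE looks up :Family(family_id), so that lookup still falls back to a label scan. The MATCH ... MERGE also runs as one huge transaction over every Application node. If the APOC plugin is installed, a batched version might look like this (a sketch; the batch size and the index on :Family(family_id) created first are assumptions):

CREATE INDEX idx_family_id FOR (f:Family) ON (f.family_id);

CALL apoc.periodic.iterate(
  "MATCH (app:Application) RETURN app",
  "MERGE (fam:Family {family_id: app.docdb_family_id})
   MERGE (app)-[:BELONGS_TO]->(fam)",
  {batchSize: 1000, parallel: false});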

Here's my solution after some fiddling, in case another newbie like me runs into a similar problem :) :

CREATE CONSTRAINT ON (a:Application) ASSERT a.appln_id IS UNIQUE;
CREATE CONSTRAINT ON (f:Family) ASSERT f.family_id IS UNIQUE;

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted;

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MERGE (fam:Family {family_id: toInteger(row.docdb_family_id)});
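The two passes above create the Application and Family nodes; a third pass over the same file then creates the :BELONGS_TO relationships. A sketch of what that could look like (assuming the two uniqueness constraints above are in place, so both MATCH lookups are index-backed):

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MATCH (app:Application {appln_id: toInteger(row.appln_id)})
MATCH (fam:Family {family_id: toInteger(row.docdb_family_id)})
MERGE (app)-[:BELONGS_TO]->(fam);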