Query reading multiple types of nodes from single file

Hi everyone,

I'm struggling to efficiently import multiple types of nodes from a single CSV file.

The setup is as follows:

I have a file containing (unique) patent application IDs. The file also holds the associated patent family ID for each application (not unique; multiple patents can belong to the same family). I want to create applications and families as distinctly labeled nodes and record the affiliation between them as a relationship. Currently I do so as follows:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
MERGE (fam:Family {family_id: toInteger(row.family_id)})
CREATE (app)-[:BELONGS_TO]->(fam)
SET app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted

Because the family IDs are not unique in the file (and there is no separate file with unique family IDs), I have to use MERGE for the families, and without an index on the family IDs every MERGE has to scan all existing Family nodes. Running this query is prohibitively expensive even for a small sample of the final dataset, which will have many millions of applications.
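Without an index on :Family(family_id), each MERGE falls back to a label scan over all Family nodes. You can check this without running the import by looking at the plan (a sketch; the ID value is just a placeholder):

EXPLAIN MERGE (fam:Family {family_id: 12345})

With no index, the plan shows a NodeByLabelScan plus a property filter instead of a NodeIndexSeek.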

I'd appreciate any help in making this feasible!

PS: I hope I'm not overlooking similar questions that have already been answered; I'm still very new to Neo4j and don't yet know how to search for things effectively.

You can still use an index even if the values are not unique.
The syntax is:

CREATE INDEX index_name FOR (n:Person)
ON (n.surname)
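Applied to the case above, that would be an index on the family IDs so the MERGE lookup is index-backed (the index name here is just an example):

CREATE INDEX idx_family_id FOR (f:Family)
ON (f.family_id)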

Thanks for the tip!
I tried splitting the import into two steps: first importing the family IDs as a property on the Application nodes, creating an index on that property, and then using MERGE to create the Family nodes and the relationships between the two sets of nodes:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.docdb_family_id = toInteger(row.docdb_family_id),
    app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted;

CREATE INDEX idx_docdb_family_id FOR (a:Application) ON (a.docdb_family_id);

MATCH (app:Application)
MERGE (fam:Family {family_id: app.docdb_family_id})
MERGE (app)-[:BELONGS_TO]->(fam)

However, this still takes very long, so I'm not sure I'm utilizing the index correctly.
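One likely issue with the second step: the index is on :Application(docdb_family_id), but the MERGE looks up :Family(family_id), so that lookup still falls back to a label scan. The MATCH ... MERGE also runs as one huge transaction over every Application node. If the APOC plugin is installed, a batched version might look like this (a sketch; the batch size and the index on :Family(family_id) created first are assumptions):

CREATE INDEX idx_family_id FOR (f:Family) ON (f.family_id);

CALL apoc.periodic.iterate(
  "MATCH (app:Application) RETURN app",
  "MERGE (fam:Family {family_id: app.docdb_family_id})
   MERGE (app)-[:BELONGS_TO]->(fam)",
  {batchSize: 1000, parallel: false});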

Here's my solution after some fiddling, in case another newbie like me runs into a similar problem :) :

CREATE CONSTRAINT ON (a:Application) ASSERT a.appln_id IS UNIQUE;
CREATE CONSTRAINT ON (f:Family) ASSERT f.family_id IS UNIQUE;

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.filing_date = date(row.appln_filing_date),
    app.filing_year = toInteger(row.appln_filing_year),
    app.granted = row.granted;

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MERGE (fam:Family {family_id: toInteger(row.docdb_family_id)});
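The two passes above create the Application and Family nodes; a third pass over the same file then creates the :BELONGS_TO relationships. A sketch of what that could look like (assuming the two uniqueness constraints above are in place, so both MATCH lookups are index-backed):

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MATCH (app:Application {appln_id: toInteger(row.appln_id)})
MATCH (fam:Family {family_id: toInteger(row.docdb_family_id)})
MERGE (app)-[:BELONGS_TO]->(fam);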