CSV column with Null values

Hi,

I am working on a problem, where I have to create a node only when the value exists in the column and create no nodes otherwise for null values.

Should I use CREATE or MATCH?

My query:

LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line

WITH line
WHERE NOT line.Form Name Extension IS NULL and NOT line.Study Title is NULL
and NOT line.Domain Labelis NULL and NOT line.SDTM Variable Name is NULL

MERGE (a: Study {name: line.Study Title, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label, type: 'SDTM Domain', domain_identifier: line.Study Title + line.Domain Label})

Merge (e: Target_Variable {name: line.SDTM Variable Name,type: 'SDTM Domain Variable', transformation:line.Transformation Code, transformation_type:line.Transformation Type,SDTM_Domain_Label:line.SDTM Variable Label,
field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

CREATE (d: Source_Variable {name: line.Form Variable Name,type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label})

Merge (c: Source_File {name: line.Form Name,type: 'Source_Form_File', form_name_extension:line.Form Name Extension, file_identifier: line.Study Title + line.Domain Label + line.Form Name})

MERGE (e)-[q:Constitutes]->(b)

MERGE (b)-[p:Part_of]->(a)

MERGE (c)-[u:Has]->(d)

MERGE (d)-[u:transforms]->(e)

RETURN a,b,e;

I am confused on the part: CREATE (d: Source_Variable ......

Why wouldn't you just use MERGE? I find myself rarely using CREATE except on an initialization CYPHER script to create some placeholder nodes when re-building my DB cleanly (when I KNOW the node won't already exist).

MATCH would be appropriate if you aren't sure if it exists, and you want to bail out if the node doesn't already exist. Do you want the :Source_Variable node created if it doesn't already exist? If so, then go with MERGE.

Or are you concerned about "Form Variable Name" being null?

Thanks for your response Paul.

I am concerned about Form Variable Name" being null. I do not want to create a node when this value is null.

So suppose my column Form Variable Name" has values h,p,q,NULL,NULL,a , I want nodes to be created only for h,p,q,a and no nodes for NULL values.

I have two codes now:


This code does not give any output graph.

LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line

WITH line
WHERE NOT line.Form Name Extension IS NULL and NOT line.Study Title is NULL
and NOT line.Domain Labelis NULL and NOT line.SDTM Variable Name is NULL

MERGE (a: Study {name: line.Study Title, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label, type: 'SDTM Domain', domain_identifier: line.Study Title + line.Domain Label})

Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name,type: 'SDTM Domain Variable', transformation:line.Transformation Code, transformation_type:line.Transformation Type,SDTM_Domain_Label:line.SDTM Variable Label,
field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

Merge (c: Source_Form_File {name: line.Form Name,type: 'Source_Form_File', form_name_extension:line.Form Name Extension, file_identifier: line.Study Title + line.Domain Label + line.Form Name})

WITH line

WHERE NOT line.Form Variable Name IS NULL

MATCH (d: Source_Form_Variable {name: line.Form Variable Name,type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

MERGE (c)-[u:Has]->(d)

MERGE (e)-[q:Constitutes]->(b)

MERGE (b)-[p:Part_of]->(a)

MERGE (d)-[g:transforms]->(e)

REMOVE b.domain_identifier

REMOVE e.field_identifier

REMOVE d.domain_identifier

REMOVE c.file_identifier

RETURN a,b,e;


This code creates named "Other" for all NULL values. Is there a way I could have just one node named "Other" and point all NULL to that?

LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line

WITH line
WHERE NOT line.Form Name Extension IS NULL and NOT line.Study Title is NULL
and NOT line.Domain Labelis NULL and NOT line.SDTM Variable Name is NULL

MERGE (a: Study {name: line.Study Title, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label, type: 'SDTM Domain', domain_identifier: line.Study Title + line.Domain Label})

Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name,type: 'SDTM Domain Variable', transformation:line.Transformation Code, transformation_type:line.Transformation Type,SDTM_Domain_Label:line.SDTM Variable Label,
field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

Merge (c: Source_Form_File {name: line.Form Name,type: 'Source_Form_File', form_name_extension:line.Form Name Extension, file_identifier: line.Study Title + line.Domain Label + line.Form Name})

MATCH (d: Source_Form_Variable {name: coalesce(line.Form Variable Name,"Other"),type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

MERGE (c)-[u:Has]->(d)

MERGE (e)-[q:Constitutes]->(b)

MERGE (b)-[p:Part_of]->(a)

MERGE (d)-[g:transforms]->(e)

REMOVE b.domain_identifier

REMOVE e.field_identifier

REMOVE d.domain_identifier

REMOVE c.file_identifier

RETURN a,b,e;

Replace this:

WITH line

WHERE NOT line.Form Variable Name IS NULL

MATCH (d: Source_Form_Variable {name: line.Form Variable Name,type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

with

FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |

CREATE (d: Source_Form_Variable {name: line.Form Variable Name, type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})like 

)

You cannot use MERGE under FOREACH loop.

This way node, (d: Source_Form_Variable), will be created for non null values and no node will be created for null values.

-Kamal

FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |

Here after ELSE two square brackets. Somehow it's appearing as a box when I post it here.

-Kamal

Hi Kamal,

I tried what you told. It gives me one null node for source variable and everything points to that. Even for the one where (d: Source_Form_Variable) is not null, it still doesn't create new nodes and points it to the empty node.

My code:

LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line

WITH line
WHERE NOT line.Form Name Extension IS NULL and NOT line.Study Title is NULL
and NOT line.Domain Labelis NULL and NOT line.SDTM Variable Name is NULL

MERGE (a: Study {name: line.Study Title, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label, type: 'SDTM Domain', domain_identifier: line.Study Title + line.Domain Label})

Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name,type: 'SDTM Domain Variable', transformation:line.Transformation Code, transformation_type:line.Transformation Type,SDTM_Domain_Label:line.SDTM Variable Label,
field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})

Merge (c: Source_Form_File {name: line.Form Name,type: 'Source_Form_File', form_name_extension:line.Form Name Extension, file_identifier: line.Study Title + line.Domain Label + line.Form Name})

FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |

CREATE (d: Source_Form_Variable {name: line.`Form Variable Name`,type: 'Source_Form_Variable', field_identifier: line.`Study Title` + line.`Domain Label` + line.`SDTM Variable Name`})

)
MERGE (c)-[u:Has]->(d)

MERGE (e)-[q:Constitutes]->(b)

MERGE (b)-[p:Part_of]->(a)

MERGE (d)-[g:transforms]->(e)

REMOVE b.domain_identifier

REMOVE e.field_identifier

REMOVE d.domain_identifier

REMOVE c.file_identifier

RETURN a,b,e;

Post one line of your .csv file to check the failure point.
-Kamal

Matrix ID Study Title Domain Name Domain Label Domain Name Extension Sub Domain Label Form Name Form Label Form Name Extension Form Variable Name Form Variable Label SDTM Variable Name SDTM Variable Label Join Criteria Transformation Type Transformation Pseudo Code Transformation Code
1 Study123 DM Demographics DM.csv None DM Demographics DM.CSV PROJECT STUDYID Study Identifier No Transformation No Transformation
2 Study123 DM Demographics DM.csv None DM Demographics DM.CSV DOMAIN Domain Abbreviation Manual Entry Text "DM"

Check to see if Form Variable Name is enclosed between back ticks in FOREACH statement. I might have missed it in my example.

Thanks. Form Variable Name column shows blank in second row. In this case you should use
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name <> "" THEN [1] ELSE END |

Tried that. So the problem, it creates a null node and points everything to that.

I want line.Form Variable name node "PROJECT" created and it should be related to line.Form Name . But when we encounter empty cell for Form Variable name it shouldn't create a node at all.

Here is the code that worked for me.

Replace part of your code with this:

FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |

MERGE (d: Source_Form_Variable {name: line.Form Variable Name,type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})
MERGE (c)-[u:Has]->(d)
MERGE (d)-[g:transforms]->(e)

)

MERGE (e)-[q:Constitutes]->(b)

MERGE (b)-[p:Part_of]->(a)

MERGE seems to be working now under FOREACH.

puneet

-Kamal