LOAD CSV issue

Hello All,

I am using the desktop version to access a browser-based Neo4j instance.
While trying to load data into the database using "LOAD CSV", I got an error on one of the rows, which has the column value "l cells, eosinophils and macrophages in C57BL/6 mice, TCR-δ−".
I am getting the error below:
@ position 244043317 - there's a field starting with a quote and whereas it ends that quote there seems to be characters in that field after that ending quote. That isn't supported. This is what I read: 'l cells, eosinophils and macrophages in C57BL/6 mice, TCR-δ−",2,97,20050008627-3316T
1503204,20050008627,CELLTYP,CL0000235,macrophage,out3289.xml,18395,17142,macrophages,"ll'

I used this Cypher query:
:auto USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM 'file:///hits_termite.csv' AS line
CREATE (d:Hit {
  id: line.id,
  hitTerm: line.hitTerm,
  loc: line.loc,
  frag: line.frag,
  hitID: line.hitID,
  docID: line.docID,
  entityType: line.entityType,
  patent_no: line.doc_number,
  sentenceNbr: line.sentenceNbr,
  loc_custom: line.loc_custom,
  name: line.name,
  nonambigsyns: line.nonambigsyns
})

I am unable to understand what the issue is or how to overcome it.
Could you please help?

Hello @i.varikuti and welcome to the Community!

From what you are describing, your data is not "clean".

You should inspect it to see where the quotes are being misinterpreted. For example, you may need to escape some characters in the field.

Rather than creating nodes from the data initially, I would recommend that you simply return the suspect field values so you can see them. That is the easiest way to see if the data is clean.
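If the file is too large to inspect by eye, a small script can narrow down which lines to look at before running LOAD CSV at all. The following is only a rough sketch in Python, not part of Neo4j: the function name find_suspect_lines and the sample rows are made up for illustration, and the heuristic assumes every record sits on a single physical line. It flags lines with an odd number of double quotes, and lines where a backslash sits directly before a double quote, which some CSV loaders may read as an escaped quote.

```python
import re

# A double quote directly preceded by a backslash. Assumption: the loader may
# treat this as an escaped quote instead of the end of the field.
BACKSLASH_QUOTE = re.compile(r'\\"')


def find_suspect_lines(lines):
    """Return (line_number, reason, text) for lines worth inspecting by hand."""
    suspects = []
    for number, text in enumerate(lines, start=1):
        text = text.rstrip("\r\n")
        if BACKSLASH_QUOTE.search(text):
            suspects.append((number, "backslash before quote", text))
        elif text.count('"') % 2 == 1:
            suspects.append((number, "odd number of quotes", text))
    return suspects


# Hypothetical sample rows, not taken from the real file.
sample = [
    'id,name,count\n',
    '1,"plain value",5\n',
    '2,"ends with a backslash\\",6\n',
    '3,missing opening quote",7\n',
]

for number, reason, text in find_suspect_lines(sample):
    print(number, reason, text)
```

To run it against a real file, pass an open file object instead of the sample list, since iterating over a text file yields its lines one by one.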

Best regards,
Elaine Rosenberg

Hello @elaine.rosenberg,

Thanks for your warm welcome and for your insights on the thread.
As you might already know, I am new to Neo4j. After my analysis, I understood that the issue is caused by a backslash character at the end of the string.
Could you please let me know how I can escape the backslash, and whether it would cause any issue if the sentence ends with this character?
In addition, it would be really useful if you could point me to some information (such as blogs or documentation) on how to clean the data before loading it into the DB.

Regards,
Indrakaran Varikuti

@i.varikuti,

Use single quotes around strings, and use a \ to escape the \ character.

For example dlfkgjsdlfgjsddlfg\\
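For a large file, the doubling of backslashes can be done in a pre-processing step rather than by hand. Below is a minimal sketch in Python under some assumptions: the helper names escape_backslashes and clean_file are hypothetical, the file is UTF-8, and no backslash in the file has already been escaped (running the script twice would double them again). Neo4j also has a configuration setting, dbms.import.csv.legacy_quote_escaping, that as far as I know controls whether the backslash is treated as an escape character, so it is worth checking the documentation for your version before rewriting the data.

```python
def escape_backslashes(line):
    """Double every backslash so a loader that treats it as an escape reads it literally."""
    return line.replace("\\", "\\\\")


def clean_file(src_path, dst_path):
    """Write a copy of src_path with every backslash doubled. Run this once only."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(escape_backslashes(line))


# Hypothetical row whose quoted field ends in a single backslash.
print(escape_backslashes('7,"value ending in a backslash\\",2'))
```

Writing to a new file instead of editing in place keeps the original data available in case the cleaned copy needs to be regenerated.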

I recommend that you look at the lesson on using LOAD CSV in our online course, Introduction to Neo4j 4.0, at GraphAcademy Online Training.

This lesson methodically goes through some steps you can take to examine and clean up/transform data.

Elaine

Hi @elaine_rosenber,

Thanks for your response and suggestions.

Regards,
Indrakaran

The first column's data seems to be missing its starting double quote, and the last column is also in error. I created a .csv file from your data line after making the necessary changes and ran it on my local DB (version 4.1.0). Here is the result:

csv file:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"l cells, eosinophils and macrophages in C57BL/6 mice, TCR-δ−",2,97,20050008627-3316T1503204,20050008627,CELLTYP,CL0000235,macrophage,out3289.xml,18395,17142,macrophages,ll

Cypher query:

LOAD CSV WITH HEADERS FROM 'file:///iv.csv' as line
return line

Result:

{
"c11": "17142",
"c10": "18395",
"c13": "ll",
"c12": "macrophages",
"c1": "l cells, eosinophils and macrophages in C57BL/6 mice, TCR-δ−",
"c2": "2",
"c3": "97",
"c4": "20050008627-3316T1503204",
"c5": "20050008627",
"c6": "CELLTYP",
"c7": "CL0000235",
"c8": "macrophage",
"c9": "out3289.xml"
}
Hope this works for you.
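The same corrected line can also be checked outside Neo4j before loading. This is just a sketch using Python's standard csv module, with the header and data line copied from the file above. Note one assumption: by default Python's csv module does not treat the backslash as an escape character, so this only confirms that the quotes balance and that the row splits into 13 fields. It does not reproduce every rule that LOAD CSV applies.

```python
import csv
import io

# Header and corrected data line, copied from the .csv file shown above.
CSV_TEXT = (
    "c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13\n"
    '"l cells, eosinophils and macrophages in C57BL/6 mice, TCR-δ−",'
    "2,97,20050008627-3316T1503204,20050008627,CELLTYP,CL0000235,"
    "macrophage,out3289.xml,18395,17142,macrophages,ll\n"
)

# DictReader maps each header name to the matching field of the row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for row in rows:
    print(len(row), row["c1"])
```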