My CSV file has two columns:
nid embedding
001 [0.001, 0.002, 0.003]
002 [0.004, 0.005, 0.006]
My current load statement, run through the Python driver, is:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:' '})
yield map as row",
"MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount", {batchSize:1000, iterateList:true, parallel:true})
However, this loads the second column, embedding, as a single string rather than as an array of floats in Neo4j. How can I load it as an array of floats? I can also change the format of the embedding column if that makes things easier.
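To make the target concrete, here is a minimal Python sketch of the conversion I am after. It assumes each embedding string is valid JSON (a bracketed, comma-separated list of numbers) and that the nid is separated from the embedding by whitespace. `parse_row` is a hypothetical helper name for illustration, not part of the Neo4j driver or APOC.

```python
import json


def parse_row(line):
    """Split one 'nid embedding' line and decode the embedding into floats."""
    # Split only on the first run of whitespace, because the embedding
    # itself contains spaces after each comma.
    nid, raw_embedding = line.strip().split(None, 1)
    # The bracketed list is valid JSON, so json.loads returns a Python list.
    embedding = [float(x) for x in json.loads(raw_embedding)]
    # Keep nid as a string so the leading zeros are preserved.
    return {"nid": nid, "embedding": embedding}


# The two sample lines from the file above (header line omitted).
sample_lines = [
    "001 [0.001, 0.002, 0.003]",
    "002 [0.004, 0.005, 0.006]",
]
rows = [parse_row(line) for line in sample_lines]
```

A list of dictionaries like `rows` could presumably be passed to the driver as a query parameter and unpacked with UNWIND, although that bypasses apoc.load.csv.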
I am thinking of something like this:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:' '})
yield map as row",
"
WITH row.embedding = apoc.convert.fromJsonList(row.embedding)
MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount", {batchSize:1000, iterateList:true, parallel:true})
But the WITH clause I added before the MERGE is not allowed. How can I update the row's value before the MERGE?