Export Millions of Records from Neo4j to CSV or JSON with Official Drivers

Hello, I use several of the official Neo4j drivers (the Java Driver, the Python Driver, and the JavaScript Driver), and I need to export the results of a query like

MATCH (n:Client) return n.name , n.address, n.clientNumber

to a CSV file. In Python, I currently run the query, iterate over the result record by record, and append each record to a CSV file using the native Python libraries.
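For reference, the record-by-record approach described above can be sketched roughly as follows. The function names `write_records` and `export_query` are hypothetical, and the URI, credentials, and output path are placeholders. The sketch assumes the `neo4j` Python package is installed and relies on the driver's result being iterable, so rows are streamed rather than loaded all at once.

```python
import csv


def write_records(out, fieldnames, records):
    """Write an iterable of mapping-like records to an open text file as CSV.

    Works with plain dicts and with neo4j Record objects, since both
    provide .get(key). Missing or null values become empty cells.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(fieldnames)
    count = 0
    for record in records:
        writer.writerow([record.get(name) for name in fieldnames])
        count += 1
    return count


def export_query(uri, auth, query, path):
    """Stream a query result into a CSV file (hypothetical helper).

    Assumes the official `neo4j` Python driver; it is imported here so the
    CSV helper above stays usable without it.
    """
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            result = session.run(query)
            # newline="" is what the csv module expects for output files.
            with open(path, "w", newline="", encoding="utf-8") as out:
                return write_records(out, result.keys(), result)
```

A call might look like `export_query("bolt://localhost:7687", ("neo4j", "password"), "MATCH (n:Client) RETURN n.name, n.address, n.clientNumber", "clients.csv")`, with all four arguments being placeholders.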

Is there a function in any of the official drivers that exports this data directly to CSV or JSON, without having to process the records first?

CALL apoc.periodic.iterate("MATCH (n:Client) return n",
"CALL apoc.export.json.query('UNWIND $_batch as row with row.n as cli RETURN {name:cli.name,address:cli.address,clientNumber:cli.clientNumber} as map','/export_data/test-json-export-' + $_count+'.json',{useTypes:true, storeNodeIds:false,params:{_batch:$_batch}}) YIELD nodes return sum(nodes)",{batchSize:100000,iterateList:true,parallel:true,concurrency:20});

This exports the data in batches of 100,000 rows, one file per batch, and runs the batches in parallel using APOC. Instructions for installing APOC and using its export procedures are easy to find online.


Hi Ben,
I just wrote a Python script that runs the data extraction for each node label in the database I'm working on. I used your idea of calling apoc.export.csv.query inside apoc.periodic.iterate. I first tested it on a single node label, extracting a few columns, and it ran really well. When I expanded the script to extract all properties of the nodes with a given label, I got errors every time I tested it. The issue is that if a batch of 100,000 rows does not contain all of the properties, the export fails on that batch. Do you know of a way to keep apoc.export.csv.query from failing when a property is missing? If this can be fixed, I'll use your method for the extraction.

Hi Brett,

I believe this can be done with the COALESCE() function. In my example above, supposing that one of those properties is null or doesn't exist, I would change the second argument to the following:

"CALL apoc.export.json.query('UNWIND $_batch as row with row.n as cli RETURN {name:COALESCE(cli.name,\"null\"),address:COALESCE(cli.address,\"null\"),clientNumber:COALESCE(cli.clientNumber,\"null\")} as map','/export_data/test-json-export-' + $_count+'.json',{useTypes:true, storeNodeIds:false,params:{_batch:$_batch}}) YIELD nodes return sum(nodes)",{batchSize:100000,iterateList:true,parallel:true,concurrency:20});
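Since the extraction script is written in Python and covers every property of a label, the COALESCE map can be generated rather than typed by hand. Below is a minimal sketch; the helper name `coalesce_map` is hypothetical, and the default value "null" simply mirrors the example above. It only builds the map text; escaping the double quotes for the surrounding apoc.periodic.iterate string is left to the caller.

```python
def coalesce_map(alias, properties, default="null"):
    """Build a Cypher map literal that wraps each property in COALESCE.

    `alias` is the node variable used in the query (e.g. "cli") and
    `properties` is the list of property names to export.
    """
    entries = []
    for prop in properties:
        # Reject names that would need backtick quoting or could inject Cypher.
        if not prop.isidentifier():
            raise ValueError(f"unsupported property name: {prop!r}")
        entries.append(f'{prop}:COALESCE({alias}.{prop},"{default}")')
    return "{" + ",".join(entries) + "}"


# Example: the map used in the query above.
projection = coalesce_map("cli", ["name", "address", "clientNumber"])
```

The resulting text can then be placed after RETURN in the inner export query.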
