Importing CSV Data into Neo4j

LOAD CSV is great for importing small- or medium-sized data (up to 10M records). For larger data sets, there is a command-line bulk importer. The neo4j-admin import tool imports CSV data into an empty database from the node files and relationship files you specify.

We want to use it to import order data into Neo4j: customers, orders, and ordered products.

The tool is located in <neo4j-home>/bin/neo4j-admin and is used as follows:

bin/neo4j-admin import --id-type=STRING \
                       --nodes:Customer=customers.csv --nodes=products.csv  \
                       --nodes="orders_header.csv,orders1.csv,orders2.csv" \
                       --relationships:CONTAINS=order_details.csv \
                       --relationships:ORDERED="customer_orders_header.csv,orders1.csv,orders2.csv"

The first few rows of data used for this import look like this:

Table 1. customers.csv

  customerId:ID(Customer)   name
  23                        Delicatessen Inc
  42                        Delicous Bakery

Table 2. products.csv

  productId:ID(Product)   name        price   :LABEL
  11                      Chocolate   10      Product;Food

Table 3. orders_header.csv,orders1.csv,orders2.csv

  orderId:ID(Order)   date         total   customerId:IGNORE
  1041                2015-05-10   130     23
  1042                2015-05-12   20      42

Table 4. order_details.csv

  :START_ID(Order)   amount   price   :END_ID(Product)
  1041               13       130     11
  1042               2        20      11

Table 5. customer_orders_header.csv,orders1.csv,orders2.csv

  :END_ID(Order)   date:IGNORE   total:IGNORE   :START_ID(Customer)
  1041             2015-05-10    130            23
  1042             2015-05-12    20             42

If you call bin/neo4j-admin import without parameters, it prints a comprehensive help page.

Each repeated --nodes or --relationships parameter names a group of (potentially split) CSV files for the same entity, i.e. files with the same column structure.

All files per group are treated as if they could be concatenated as a single large file. A header row in the first file of the group or in a separate, single-line file is required. Placing the header in a separate file can make it easier to handle and edit than having it in a multi-gigabyte text file. Compressed files are also supported.
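
As a rough sketch of the separate-header layout described above, the following shell commands split one CSV export into a single-line header file and header-less data chunks, then compress the chunks. The file names (orders_all.csv, orders_part_*), the comma delimiter, and the chunk size are assumptions made for illustration, and the tiny sample stands in for a much larger file.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical export: one header line followed by the data rows.
cat > orders_all.csv <<'EOF'
orderId:ID(Order),date,total,customerId:IGNORE
1041,2015-05-10,130,23
1042,2015-05-12,20,42
EOF

# Keep the header in its own single-line file, which is easy to edit.
head -n 1 orders_all.csv > orders_header.csv

# Split the remaining rows into chunks of one row each
# (a real import would use a far larger chunk size).
tail -n +2 orders_all.csv | split -l 1 - orders_part_

# Compress each chunk separately (assumes gzip is an accepted format).
gzip orders_part_*

ls
```

Under these assumptions, the group could then be passed in the same style as the command above, for example --nodes="orders_header.csv,orders_part_aa.gz,orders_part_ab.gz".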

  • The --id-type=STRING indicates that all :ID columns contain alphanumeric values (there is an optimization for numeric-only IDs).

  • The customers.csv is imported directly as nodes with the :Customer label and the properties are taken directly from the file.

  • Product nodes follow the same pattern where the node-labels are taken from the :LABEL column.

  • The Order nodes are taken from 3 files - one header and two content files.

  • Line item relationships typed :CONTAINS are created from order_details.csv, relating orders with the contained products via their IDs.

  • Orders are connected to customers by using the order CSV files again, but this time with a different header, which marks the non-relevant columns with :IGNORE.
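
The reuse of the order files under two different headers can be previewed outside the importer. The sketch below assumes comma-delimited files (the tables above do not show the delimiter) and uses cat to approximate how each group reads once its header and data files are concatenated. The preview_* and column_counts.txt file names are invented for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Two header variants for the same data files (sample rows from the tables).
echo 'orderId:ID(Order),date,total,customerId:IGNORE' > orders_header.csv
echo ':END_ID(Order),date:IGNORE,total:IGNORE,:START_ID(Customer)' > customer_orders_header.csv
echo '1041,2015-05-10,130,23' > orders1.csv
echo '1042,2015-05-12,20,42' > orders2.csv

# The group as it would read for the Order nodes ...
cat orders_header.csv orders1.csv orders2.csv > preview_order_nodes.csv

# ... and the same data files as they would read for the ORDERED relationships.
cat customer_orders_header.csv orders1.csv orders2.csv > preview_ordered_rels.csv

# Both headers should describe as many columns as the data rows contain,
# so exactly one distinct column count is expected here.
awk -F, '{ print NF }' preview_order_nodes.csv preview_ordered_rels.csv | sort -u > column_counts.txt
cat column_counts.txt
```

A quick check like this catches a header that has drifted out of step with the data files before a long import is started.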

The column names are used as the property names of your nodes and relationships. Some columns carry special markup, explained below.

  • name:ID - global ID column, used to look up the node later when connecting it through relationships.

    • if the property name is left off, the ID will not be stored (it is only used temporarily during the import); these import IDs are what --id-type refers to.

    • if you have repeated IDs across entities, you have to provide the entity (id-group) in parentheses like :ID(Order).

    • if your IDs are globally unique, you can leave that off.

  • :LABEL - label column for nodes. Multiple labels can be separated by the array delimiter (a semicolon by default, as in Product;Food).

  • :START_ID, :END_ID - relationship file columns referring to the node ids. For id-groups, use :END_ID(Order).

  • :TYPE - column to specify relationship-type.

  • All other columns are treated as properties but skipped if empty or annotated with :IGNORE.

  • Type conversion is possible by suffixing the name with indicators like :INT, :BOOLEAN, etc.
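
To make the header markup above more concrete, here is a small sketch that writes a node file and a relationship file using an ID group, :INT type conversion, a :LABEL column, and a :TYPE column. It then runs a pre-import sanity check with awk, which is not part of neo4j-admin, to list :END_ID values that match no product ID. The *_typed.csv file names, the comma delimiter, and the deliberately dangling row for order 1043 are assumptions added for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Node file: ID in the Product id-group, an integer property, and two labels.
cat > products_typed.csv <<'EOF'
productId:ID(Product),name,price:INT,:LABEL
11,Chocolate,10,Product;Food
EOF

# Relationship file: typed properties plus a :TYPE column.
# The last row is made up and points at a product ID that does not exist.
cat > order_details_typed.csv <<'EOF'
:START_ID(Order),amount:INT,price:INT,:END_ID(Product),:TYPE
1041,13,130,11,CONTAINS
1042,2,20,11,CONTAINS
1043,1,5,99,CONTAINS
EOF

# Report every :END_ID (column 4) that has no matching product ID (column 1).
awk -F, '
  FNR == 1   { next }                  # skip the header line of each file
  NR == FNR  { known[$1] = 1; next }   # first file: remember product IDs
  !($4 in known) { print $4 }          # second file: print unknown IDs
' products_typed.csv order_details_typed.csv > unresolved_ids.txt

cat unresolved_ids.txt
```

The check only compares raw column values, so it assumes simple, unquoted CSV fields; quoted fields containing commas would need a real CSV parser.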

For more details on this header format and the tool, see the documentation in the Neo4j Manual and the accompanying tutorial.


This is a companion discussion topic for the original entry at https://neo4j.com/developer/guide-import-csv/