LOAD CSV command reads all values as a string.
No matter how the value appears in a file, it will be loaded as a string with
So, before we import, we want to ensure we convert any values that are non-string.
There are a variety of conversion functions in Cypher. The ones we will use for this exercise are as follows:
toInteger(): converts a value to an integer.
toFloat(): converts a value to a float (in this case, for monetary amounts).
datetime(): converts a value to a datetime.
We will look at the values in each CSV file to determine what needs to be converted.
The values in the products.csv files are for product ID, product name, and unit cost.
Product ID looks like an integer value that increases with each row, so we can convert this to an integer using the
toInteger() function in Cypher.
Product name can remain a string since it consists of characters.
The final column is the product unit cost.
Though the sample values from our inspection are all whole numbers, we know that monetary amounts often have decimal place values, so we will convert these values to floats using the
We can see the Cypher to handle all of these conversions below; however, we are still not loading the values into Neo4j yet. We are just viewing the CSV files with converted values.
LOAD CSV FROM 'file:///desktop-csv-import/products.csv' AS row WITH toInteger(row) AS productId, row AS productName, toFloat(row) AS unitCost RETURN productId, productName, unitCost LIMIT 3
Note that we are using collection positions (row, row, row) to refer to the columns in the row and improve readability by using aliases to reference them in the return. In a file that has no headers, this is how to reference values in each position.
The values in the orders.csv (per the column names) are for orderID, orderDate, and shipCountry. Again, we can evaluate the values and determine any conversions to apply.
OrderID looks like an integer, so we can convert that using the
The orderDate column is certainly in a date format and will require us to format it using the
Finally, the shipCountry values are characters, so we can leave that as a string.
Just as we did with the last CSV file, let us look at the results of these conversions without importing the data.
LOAD CSV WITH HEADERS FROM 'file:///desktop-csv-import/orders.csv' AS row WITH toInteger(row.orderID) AS orderId, datetime(replace(row.orderDate,' ','T')) AS orderDate, row.shipCountry AS country RETURN orderId, orderDate, country LIMIT 5
There was one tricky thing with this CSV in the
Neo4j’s datetime uses the ISO 8601 format which uses the delimiter
T between the date and time values.
The CSV file does not have the 'T' joining the date and time values but has a space between them instead.
We used the
replace() function to change the space to the character 'T' and get the string into the expected format.
Then, we wrapped the
datetime() function around that to convert the changed string to a datetime value.
The values in the order-details.csv (from column names) are for productID, orderID, and quantity. Let us look at which ones need to be converted.
Our product ID is also from our products.csv file, where we converted that value to an integer.
We will do the same here to ensure we match formats.
The order ID field contains values from our orders.csv file, so we will match our previous conversion and translate this field to an integer, as well.
The quantity field in this file is a numeric value.
We can convert this to an integer with the
toInteger() function we have been using.
The results of these conversions are in the code below. Remember that we still are not loading any data yet.
LOAD CSV WITH HEADERS FROM 'file:///desktop-csv-import/order-details.csv' AS row WITH toInteger(row.productID) AS productId, toInteger(row.orderID) AS orderId, toInteger(row.quantity) AS quantityOrdered RETURN productId, orderId, quantityOrdered LIMIT 8
This is a companion discussion topic for the original entry at https://neo4j.com/developer/desktop-csv-import/