In Cypher how can I check the data type of a property?
Once a property is used for one data type, must all values of that property on nodes with that label be the same type? I'm assuming not. So if I have a node label :testnode with a property called mystery, I could mix and match, putting numbers, strings, dates, etc. in different nodes. Obviously a data quality nightmare, but that leads me back to my first question: I'd like to be able to query my nodes and look for data quality issues.
Cypher property values always have a type, but Neo4j doesn't constrain which type a given property holds. That is to say, if you have a node property called mystery, it's possible for it to be sometimes a string and sometimes an integer.
One way to profile for data quality issues is to check types with a comparison like this:
MATCH (t:testnode) WHERE t.mystery = toString(t.mystery) RETURN count(t)
That will tell you how many strings there are. If the property value were an int, it wouldn't match. You could do similar checks with other types too, to build up a table of how many instances of an attribute were which type.
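As a sketch of that table-building approach (assuming the :testnode label from the question; note that count() ignores nulls, so each CASE only counts matching rows):

```cypher
// Tally how many `mystery` values are strings vs. integers.
// A value only equals toString()/toInteger() of itself when it already has that type.
MATCH (t:testnode)
RETURN
  count(CASE WHEN t.mystery = toString(t.mystery)  THEN 1 END) AS strings,
  count(CASE WHEN t.mystery = toInteger(t.mystery) THEN 1 END) AS integers
```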
UPDATE August 2020
In recent versions of APOC, there are CALL apoc.meta.nodeTypeProperties() and CALL apoc.meta.relTypeProperties(), which sample the database and output a schema for all labels and relationship types. This is a great way to detect (for example) whether any property in the database ever has more than one distinct type. So in the example I gave above with the "mystery" property, those procedures would report that "mystery" is of type ["String", "Long"].
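For example, using the documented yield columns of that procedure:

```cypher
// Sample the graph and list every distinct type observed for each property
CALL apoc.meta.nodeTypeProperties()
YIELD nodeLabels, propertyName, propertyTypes
RETURN nodeLabels, propertyName, propertyTypes
ORDER BY propertyName
```

A property whose propertyTypes list has more than one entry is a data quality candidate to investigate.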
I think you may have given me an idea for a way I could contribute to the APOC library. Have a function like DataType() that would go through a series of case statements testing each data type and return a string value of the data type that it was determined to be.
Benoit's got a good idea -- the thing is, I'd just caution you that there's a way of telling the type of an individual property value, but that's not the same thing as the property having a type -- properties don't have types, or at least their values can vary.
The reason I bring this up is that you need to do some kind of sampling, for example MATCH (n:Node) RETURN n.mystery LIMIT 100 or MATCH (n:Node) WHERE id(n) % 3 = 0 RETURN n.mystery LIMIT 100. Only if the types of all the sampled values agree is it probably safe to assume that's the type of the property.
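Combined with APOC's existing apoc.meta.type() function (assuming APOC is installed, and reusing the :testnode label from earlier), such a sample might look like:

```cypher
// Sample 100 nodes and tally the stored type of each `mystery` value
MATCH (n:testnode)
WITH n LIMIT 100
RETURN apoc.meta.type(n.mystery) AS type, count(*) AS occurrences
```

More than one row in the result means the sampled values disagree on type.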
Once a property is set as INT, how do I make sure new values for that property are also INT?
I created a node with an age property and typecast it to INT.
When I'm creating nodes using LOAD CSV or JDBC, this age property is loaded as a string, not INT.
Do I need to typecast every time I load?
Is this always true? I seem to be ingesting CSV files using neo4j-admin import with arguments like --nodes "import/nodes_header.csv,import/nodes.csv", where nodes_header.csv specifies data types, e.g. value:int and live:Boolean.
Properties themselves do not have types, but the values they hold do. On import, numeric values, booleans, and strings are all stored differently, which is why they need to be imported as typed data. However, you can have a property X which could be assigned a string value in one place and a numeric value in another; we don't enforce typing for a property. Of course, it would not make sense to assign values of different types to the same property across your graph.
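So with LOAD CSV, which reads every field as a string, you do need to cast on every load. A minimal sketch, assuming a hypothetical people.csv with name and age columns:

```cypher
// LOAD CSV yields strings; toInteger() casts age at write time
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (:Person {name: row.name, age: toInteger(row.age)})
```

(The neo4j-admin import header syntax like age:int performs the equivalent cast for the offline bulk importer.)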
The syntax for creating constraints was designed with future extensibility in mind. With a growing interest in more specific and rigid constraints, it might be worth revisiting the Cypher constraint parser to add more flexible syntax.
Existing constraint syntax implies that ASSERT xxx is a parsed expression, with xxx expected to yield a boolean result. If it were, we could do something like the following:
CREATE CONSTRAINT ON (book:Book) ASSERT apoc.meta.type(book.id) = "INTEGER";
Currently supported constraints are only those documented in 5.4 Constraints:
ASSERT (node.property [, ...]) IS UNIQUE
ASSERT (node.property [, ...]) EXISTS
ASSERT (node.property [, ...]) IS NODE KEY
These constraints are a boon to defining production data, but they have not yet been expanded beyond the initial definition.
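For contrast with the hypothetical type assertion above, the uniqueness flavour of the existing syntax quoted here looks like:

```cypher
// Supported today: assert that :Book ids are unique
CREATE CONSTRAINT ON (book:Book) ASSERT book.id IS UNIQUE;
```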
So, if anyone out there is looking for a way to contribute to Neo4j in a big way, this would be a good one. (Might be me, once I finish my current efforts)