How to decide between a node and a property on a node for performance?
I have a label called :Customer, and there will be about 80 million or so nodes of this type. They all have the properties Country, State and City, along with other properties like firstname, lastname, contact_phone etc. Requests for a customer and the data related to that customer start by passing either Country + State or Country + State + City. Should I create these as nodes and connect them to the customer nodes, or should I leave them as properties on the customer node and filter on them in the WHERE clause when retrieving customers? Which is better for performance? I am planning on creating indexes on these three properties to make it faster. Is there a downside to this?
Thanks a bunch.
Hi there, if you plan on grouping Customers by Country, City or State, I would certainly model these concepts as separate nodes and connect Customers to them. This would let you write matches more elegantly and also avoid issues where the spelling of city/country names is inconsistent due to bad input. Just use the data from the Country nodes in the UI, so that customers select the correct node, and then connect them to it.
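As a rough sketch of what such a match could look like: the Country and State labels, the IN_COUNTRY and IN_STATE relationship types, and the name property are all made up for illustration, so adjust them to your own model. The lookup starts at the two location nodes and walks to the customers connected to both, instead of filtering 80 million Customer nodes on their properties.

```cypher
// Hypothetical schema (not from the original post):
//   (:Customer)-[:IN_COUNTRY]->(:Country {name})
//   (:Customer)-[:IN_STATE]->(:State {name})
// $country and $state are query parameters supplied by the caller.
MATCH (:Country {name: $country})<-[:IN_COUNTRY]-(c:Customer)-[:IN_STATE]->(:State {name: $state})
RETURN c.firstname, c.lastname, c.contact_phone;
```

For the starting lookups to be fast you would still want an index or uniqueness constraint on whatever property identifies a Country or State node.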
Also consider if you want these nodes to be standalone, or if you want them linked (city in a state in a country). If linked, be aware that you will need multiple city nodes of the same name to represent cities with the same name that are in different states (Springfield, for example). In a hierarchical structure you do not want to represent these with a single node for all Springfields.
You will have to be careful in how you MERGE and MATCH entries here: the pattern has to include the state and/or country, otherwise you end up pulling in unrelated cities of the same name in different states or countries.
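To make that concrete, here is a minimal sketch for the linked (hierarchical) variant. The IN_COUNTRY, IN_STATE and LIVES_IN relationship types, the name property, and the id property on Customer are assumptions for illustration, not something from the original posts. Each MERGE is anchored on the already-bound parent node, so a city is only reused if it hangs off the right state. A bare MERGE (:City {name: $city}) would instead reuse whichever Springfield happens to exist already.

```cypher
// Hypothetical schema:
//   (:City)-[:IN_STATE]->(:State)-[:IN_COUNTRY]->(:Country)
//   (:Customer {id})-[:LIVES_IN]->(:City)

// Create or reuse the location path, one level at a time.
MERGE (country:Country {name: $country})
// Reuses a State only if it is already connected to this country.
MERGE (state:State {name: $state})-[:IN_COUNTRY]->(country)
// Reuses a City only if it is already connected to this state.
MERGE (city:City {name: $city})-[:IN_STATE]->(state)
WITH city
MATCH (c:Customer {id: $customerId})
MERGE (c)-[:LIVES_IN]->(city);

// Look up customers by Country + State + City along the full path.
MATCH (c:Customer)-[:LIVES_IN]->(:City {name: $city})
      -[:IN_STATE]->(:State {name: $state})
      -[:IN_COUNTRY]->(:Country {name: $country})
RETURN c;
```

Dropping the City part of the second pattern (and adding a variable-length or extra hop from Customer to State) gives the Country + State lookup on the same structure.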
Good point, had that challenge in one of my earlier projects and fixing it afterwards is no fun.