Hi all. I'm trying to decide whether Neo4j is the best tool for the information I'm collating. I'm putting together a PESTLE analysis of the UK car industry and would then like to use Neo4j to display how the PESTLE factors relate to each other. The difficulty is that the data comes from lots of different sources and is not necessarily easy to standardise. Ultimately I would like to produce an impact assessment of how new technology would affect these PESTLE factors. So I'm asking whether this is possible, or whether anyone has done anything similar. Any help or advice would be much appreciated.
You neglected to say what PESTLE analysis is...
A graph DB is at its most powerful when you have lots of many-to-many relationships, and especially when those relationships form long chains of indefinite length (e.g. six degrees of Kevin Bacon).
It's hard to know if this is the case with your problem.
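To make the "long chain" idea concrete, here is a minimal sketch in plain Python (not Neo4j itself) that finds the shortest chain of influence between two factors with a breadth-first search. The factor names and the links between them are invented purely for illustration; in Cypher the same question would typically be asked with a variable-length relationship pattern.

```python
from collections import deque

# Hypothetical PESTLE factors and "influences" links, invented for illustration.
INFLUENCES = {
    "Emissions regulation": ["EV adoption"],
    "EV adoption": ["Battery demand", "Charging infrastructure"],
    "Battery demand": ["Lithium price"],
    "Charging infrastructure": ["Grid investment"],
    "Lithium price": ["Vehicle price"],
    "Grid investment": [],
    "Vehicle price": ["Consumer demand"],
    "Consumer demand": [],
}


def shortest_chain(graph, start, goal):
    """Return the shortest list of factors linking start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


chain = shortest_chain(INFLUENCES, "Emissions regulation", "Consumer demand")
print(" -> ".join(chain))
```

If most of your questions look like this one (how does A eventually affect B, through however many steps), that is a good sign the data is graph-shaped.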
For example, suppose that under Law there are Consumer Protection, Anti-Trust, Fair Advertising, etc. laws. And suppose you have an electric vehicle that is both a sedan and a luxury car (Tesla Model S). Then you can have a many-to-many relationship between a Tesla Model S and the different laws that might affect it.
You can even make a Manufacturer a Label. If a car is a joint venture, you can give it multiple Manufacturer Labels. Or say you have an old Fiat that was built before the Fiat-Chrysler merger: you can give new Fiat-Chrysler cars two Labels and old Fiats one Label.
I will note that Cypher queries that filter on a Label instead of a Property value will generally perform better.
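As a rough sketch of the Tesla example, the Python below only builds Cypher query text; it does not talk to a database. The label names, the `AFFECTED_BY` relationship type and the `name` property are my own assumptions, not anything prescribed by Neo4j. Labels are interpolated into the query text here, so the helper checks them against a simple identifier pattern first, while values go in as `$` parameters.

```python
import re

# Assumed naming rule for this sketch: plain identifier-style labels only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def label_clause(labels):
    """Turn ["Car", "Sedan"] into ":Car:Sedan", rejecting unsafe names."""
    for label in labels:
        if not _IDENTIFIER.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
    return "".join(f":{label}" for label in labels)


def merge_node(labels):
    """Cypher that creates (or reuses) a node carrying every given label."""
    return f"MERGE (n{label_clause(labels)} {{name: $name}})"


def link_car_to_law():
    """Cypher that records one car-to-law edge; run it once per pair."""
    return (
        "MATCH (c:Car {name: $car}), (l:Law {name: $law}) "
        "MERGE (c)-[:AFFECTED_BY]->(l)"
    )


def laws_for(labels):
    """Cypher that finds the laws affecting any node with these labels."""
    return (
        f"MATCH (c{label_clause(labels)})-[:AFFECTED_BY]->(l:Law) "
        "RETURN DISTINCT l.name AS law"
    )


print(merge_node(["Car", "ElectricVehicle", "Sedan", "LuxuryCar", "Tesla"]))
print(link_car_to_law())
print(laws_for(["ElectricVehicle", "LuxuryCar"]))
```

The last query filters purely on Labels, which is the style of query I was recommending above.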
One big advantage of Neo4j is that the schema is flexible. Suppose you decide that you need a new Node Type (called a Label). It's easy to add one. And a Node can have multiple Labels. Or suppose you discover that one of your data sources has an interesting property that you hadn't thought of. You can just add it. This flexibility will be useful when your data is inconsistent and comes from different sources.
In a traditional relational DB, you have to change the schema and maybe do a data migration when you change your mind. All that is a headache!
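A small sketch of what "just add it" could look like, again as Python that only produces Cypher text. The `Car` and `Hybrid` labels and the `name` key are made up for the example; the point is that neither statement needs a schema change first.

```python
import re


def _checked(label):
    """Allow only identifier-style labels, since they go into the query text."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")
    return label


def add_label(match_label, new_label):
    """Cypher that tags an existing node with one more Label."""
    return (
        f"MATCH (n:{_checked(match_label)} {{name: $name}}) "
        f"SET n:{_checked(new_label)}"
    )


def add_properties(match_label):
    """Cypher that merges a map of new properties into an existing node."""
    return f"MATCH (n:{_checked(match_label)} {{name: $name}}) SET n += $props"


print(add_label("Car", "Hybrid"))
print(add_properties("Car"))
```

The second statement takes the new properties as a `$props` map, so a property you only discovered in one data source can be attached without touching the nodes that lack it.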
I hope that helps.
Thanks Clem, really informative. I'm still in the feasibility stage at the moment so may come back to you but thank you for your help.
Hi @matthewprowse, whenever you are dealing with multiple different sources and running analysis on them, don't create too many labels or properties. The golden rule for analysing data from different sources is to have a translation layer and a property for "source".
For example, for a train a Stop is a railway station, and for a bus it's a bus stop. Both are stops, but they belong to different transportation domains, and their behaviour differs depending on the domain. It's something like type casting.
The flexible schema of NoSQL is both its advantage and its disadvantage. In the data engineering and data science world, consumers always expect a defined schema for test and training sets. This is also the reason why, when publishing an API, the author publishes API documentation as well.
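Here is a minimal sketch of what I mean by a translation layer, in plain Python. The feed names (`rail_feed`, `bus_feed`) and the raw field names are hypothetical; the idea is that each source gets its own small translator, and everything downstream sees one defined shape that always carries a `source`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """The one agreed shape that analysis code consumes."""
    name: str
    kind: str    # "railway_station" or "bus_stop"
    source: str  # which feed the record came from


def from_rail(record):
    """Translate a raw rail record (hypothetical field names)."""
    return Stop(record["station_name"].strip(), "railway_station", "rail_feed")


def from_bus(record):
    """Translate a raw bus record (hypothetical field names)."""
    return Stop(record["stopName"].strip(), "bus_stop", "bus_feed")


TRANSLATORS = {"rail_feed": from_rail, "bus_feed": from_bus}


def translate(source, record):
    """Pick the translator for a source and return a normalised Stop."""
    try:
        translator = TRANSLATORS[source]
    except KeyError:
        raise ValueError(f"no translator for source {source!r}") from None
    return translator(record)


stops = [
    translate("rail_feed", {"station_name": " London Euston "}),
    translate("bus_feed", {"stopName": "Euston Bus Station"}),
]
for stop in stops:
    print(stop)
```

Adding a new source then means adding one translator function, not a new set of labels and properties in the graph.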
Let me know how I can further assist you.