I'm brand new to graph databases in general and trying to design a system using the appropriate technology. I have 20+ years of experience in the RDBMS realm and am adding yet another tool to the toolbox.
I'm trying to design a model that identifies people and the communications between them. For example, Person:A sends Communication:Email to Person:B. I can identify that each person is a node, and it's logical to assume that each person has a property for email address. However, in this universe it's not possible to assume an email address is unique to any given Person (nor is any other type of communication, i.e. phone call, Facebook message, SMS message, etc.). Does that mean the email address value is itself a node? My gut feeling says it is not, but my RDBMS background is yelling at me to ensure uniqueness! I also identify the actual email as another node, since it is a specific instance of a Communication, links Person nodes, and has properties of its own.
So in my current model, we have many methods of communication, none of which can be considered unique to an individual Person, and none of which are required. Do Person nodes just have a property for each communication type, and we let queries perform the necessary aggregation (e.g. how many emails from address A to address B)? Or is the address of a communication enough to warrant making it a node itself?
Here is how I would model this:
Create 'Person' nodes with properties that store the relevant info. If person A sends an email to person B, create an 'EMAIL' relationship from A to B.
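A minimal, database-free sketch of this relationship-only model in plain Python, assuming a hypothetical `person_id` as each Person's unique key. The email address is just an ordinary property, so duplicates are allowed. `Graph`, `add_email` and `email_counts` are illustrative names, not a Neo4j API; the counting step mirrors what a Cypher `count()` over parallel `EMAIL` relationships would do.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    person_id: str   # hypothetical unique key; email addresses are NOT unique
    name: str
    email: str = ""  # stored as a plain property, duplicates allowed


@dataclass
class Graph:
    # Each EMAIL relationship: (sender_id, recipient_id, relationship properties)
    emails: list = field(default_factory=list)

    def add_email(self, sender: Person, recipient: Person, **props) -> None:
        # Relationships hang off person_id, never off the (non-unique) address
        self.emails.append((sender.person_id, recipient.person_id, props))

    def email_counts(self) -> Counter:
        # Aggregate parallel EMAIL relationships per (sender, recipient) pair
        return Counter((s, r) for s, r, _ in self.emails)
```

Two people can share an address here without conflict. In Cypher the same aggregation would be roughly `MATCH (a:Person)-[e:EMAIL]->(b:Person) RETURN a.name, b.name, count(e)`.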
I also come from a SQL background and have migrated data from SQL to Neo4j.
Hope this will help you to start your project.
When I model, I use this rule of thumb:
Model it the way you would speak it. Nouns become nodes, verbs become relationships. Be wary of lazy speech: you don't email someone, you send someone an email.
Here's a quick model of how I would start. Don't be afraid to use nodes. Nodes give you something to connect the numerous relationships to when you need them to express the data you're after. Also remember that it is by traversing the graph that you'll see performance benefits over an RDBMS.
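To make the rule of thumb concrete, here is a rough plain-Python sketch (no database) of the email as a node: `(Person)-[:SENT]->(Email)-[:TO]->(Person)`. The relationship names `SENT` and `TO` and the small `Graph` class are assumptions for illustration, not necessarily what the model above uses.

```python
from collections import Counter


class Graph:
    def __init__(self):
        self.nodes = {}  # node_id -> (labels, properties)
        self.rels = []   # (start_id, rel_type, end_id)

    def add_node(self, node_id, *labels, **props):
        self.nodes[node_id] = (set(labels), props)

    def add_rel(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))

    def send_email(self, email_id, sender, recipients, **props):
        # "Email" is a noun, so it becomes a node carrying its own properties
        self.add_node(email_id, "Email", **props)
        self.add_rel(sender, "SENT", email_id)  # verb: Person SENT Email (hypothetical name)
        for r in recipients:
            self.add_rel(email_id, "TO", r)     # Email TO Person (hypothetical name)

    def sender_recipient_counts(self):
        # Traverse Person -SENT-> Email -TO-> Person and count each pair
        sender_of = {e: p for p, t, e in self.rels if t == "SENT"}
        return Counter((sender_of[e], p) for e, t, p in self.rels if t == "TO")
```

Because the email is a node, a message to several recipients is stored once, and properties such as the subject live on the email instead of being copied onto each relationship.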
This is a very good explanation, and also reinforces some of the concepts I've been researching. For example, the point on lazy speech makes perfect sense when you break it down into core components. Thank you for your response.
I was thinking about this some more (once you start with graphs, you can't stop) and realized the model I laid out could be condensed in its actual implementation. Almost like going from a logical to a physical model.
Email, SMS, phone calls, chat: they're all just forms of communication, right? Since nodes can have zero-to-many labels, my thought is that you could model them with a generic :Message label. Every message has some common attributes/relationships (time, for example). Then you could add a second label to further identify the message, such as :Chat. And since the graph is schema-less, if there are attributes specific to a secondary label (emails have subject lines where chats do not), you can still store that information on those nodes.
I think modeling this way would greatly simplify queries when you're just analyzing the general question of who is communicating with whom, and it would visually condense those communication patterns. With the secondary labels, you still retain the flexibility of fine-grained queries on specific communication methods.
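A minimal plain-Python sketch of the multi-label idea, assuming every message gets the generic `Message` label plus one specific label (`Email`, `Chat`, `SMS`, ...), a common `time` property, and hypothetical `SENT`/`TO` relationships. The same counting function answers the condensed who-with-whom question or, given a secondary label, a fine-grained one.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: frozenset
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.rels = []  # (start_id, rel_type, end_id); SENT/TO are hypothetical names

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(frozenset(labels), props)

    def add_rel(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))

    def send(self, msg_id, kind, sender, recipient, time, **extra):
        # Generic :Message label plus one specific label; type-specific
        # attributes (e.g. an email subject) go in **extra, schema-free
        self.add_node(msg_id, {"Message", kind}, time=time, **extra)
        self.add_rel(sender, "SENT", msg_id)
        self.add_rel(msg_id, "TO", recipient)

    def who_talks_to_whom(self, label="Message"):
        # label="Message" gives the condensed view; label="Email" narrows it
        sender_of = {m: p for p, t, m in self.rels if t == "SENT"}
        return Counter(
            (sender_of[m], p)
            for m, t, p in self.rels
            if t == "TO" and label in self.nodes[m].labels
        )
```

In Cypher the two views would correspond roughly to matching `(m:Message)` versus `(m:Email)` in a pattern such as `MATCH (a:Person)-[:SENT]->(m:Message)-[:TO]->(b:Person)`.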
That's the path I was trying to go down. If you refer to my original post, I was calling them "Communication", but "Message" is a better, simpler name (thanks!). I agree that messages in general share many common attributes, so it makes sense to simplify them into a common label with the appropriate attributes.
At this point I still don't understand how I'm going to relate the sender to the email. Referring to my original post:
Because it's possible for multiple Person nodes to have the same address (email, phone, etc.; legacy system, don't ask...), I was planning to use the unique identifier for Person to relate it to the email. Does this sound like an appropriate relationship? I think that once I understand this part I can actually begin pushing real data to a test database and start writing queries against it.
What about a model like this? Since the definition of account owner can be vague, you can leave off that relationship and just infer that anyone who has used an account to send a message is also an owner. If you later decide to build in that relationship, you can.
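One possible reading of the Account model, sketched in plain Python under stated assumptions: `(:Person)-[:SENT]->(:Message)`, `(:Message)-[:FROM]->(:Account)` and `(:Message)-[:TO]->(:Account)`, with an optional explicit `OWNS` relationship. All of these names are hypothetical. The sketch also speaks to the shared-address concern: each address becomes exactly one Account node, so several Person nodes can use it without breaking uniqueness.

```python
from collections import defaultdict


class Graph:
    # Hypothetical relationship names throughout: SENT, FROM, TO, OWNS
    def __init__(self):
        self.accounts = {}               # address -> Account properties
        self.sent = []                   # (person_id, msg_id): Person-[:SENT]->Message
        self.msg_from = {}               # msg_id -> address: Message-[:FROM]->Account
        self.msg_to = defaultdict(list)  # msg_id -> [address]: Message-[:TO]->Account
        self.owns = defaultdict(set)     # optional explicit Person-[:OWNS]->Account

    def account(self, address, **props):
        # Get-or-create (like Cypher MERGE): one Account node per address,
        # even when several Person nodes share that address
        acct = self.accounts.setdefault(address, {})
        acct.update(props)
        return acct

    def send(self, person_id, msg_id, from_addr, to_addrs):
        self.account(from_addr)
        for addr in to_addrs:
            self.account(addr)
        self.sent.append((person_id, msg_id))
        self.msg_from[msg_id] = from_addr
        self.msg_to[msg_id].extend(to_addrs)

    def owners(self, address):
        # Inferred owners: anyone who sent a message from this account,
        # plus any explicit OWNS relationships added later
        inferred = {p for p, m in self.sent if self.msg_from[m] == address}
        explicit = {p for p, addrs in self.owns.items() if address in addrs}
        return inferred | explicit
```

Ownership is derived from usage rather than stored, so adding explicit `OWNS` edges later does not require reshaping existing data.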
I think Account makes perfect sense - you require an account to send a message, that account has attributes associated with it, and it's utilized by a Person. You've provided some excellent guidance, Mike. Thank you for taking the time to help me better understand how to organize my model to fit our needs!