[Modeling question] Graph matching algorithm used

Hi there,

I got in an argument with someone that wants to add more links the the schema.
His argument is that it's faster if the entity is linked directly to the entity I want to search.

My argument is that it is better to have the most natural model possible, that mostly resemble reality. and that even if an entity is 4 link deeper, it will be fast since traversing node and their linked nodes is linear. Also, making the Cypher query will be closer to natural language and easier to build. + I don't want to maintain 'fake' link just for performance reason.

Am I wrong? Is adding link that are not natural a good way to link entity and have faster queries?

Example is for a virus on a computer

My model
User -> [HasFile] -> File -> [Contains] -> Virus

His model that is challenging mine
User -> [Has] -> Virus (I don't agree with this one)
User -> [Is_Owner] -> File
File -> [Has] -> Virus

Some Use case that the graph needs to answer:

  • know all the users that own a certain virus (virus has a Hashed unique value as Id)
  • know all the virus of one user


User -> [Has] -> Virus
User -> [Is_Owner] -> File
File -> [Has] -> Virus

If it was this few relationships, I would agree with your simple model.

If "File -> [Has] -> Virus" has changed, "User -> [Has] -> Virus" must also be changed. This is because it requires dual management.

But if I have a lot of relationships, such as network devices, I'll make each relationship with the service layer, the logic layer, and the physical layer.

Maybe I was not clear, but I want to know the difference in complexity from the last 2 use-case query.
Is one model more performant than the other?

I find adding a link between User and Virus to be hard to understand and hard to maintain on my side. A user does not have a virus, it's the file that contains the virus, the user has it 'indirectly', so I would not add a direct link there.

1- Match (user:User)-[Has]-(File)-[Has]-(virus:Virus) where virus.hash= '123' return user
2- Match (user:User)-[Has]-(virus:Virus) where virus.hash= '123' return user

Any difference in performance? Thanks

I had in mind the graph finding algorithm does not need to traverse the whole Tree from the top (User), but instead will find the match first (Virus) and go up the tree to return the requested element.

If the goal is to query with a user name and get to the virus they have, I think your buddy is going to be right to have the direct relationship. Now is that a versatile model and possible a headache to maintain, possibility yes.

@maxdemarzi has one of my favorite videos to watch and watch over again about understanding how you moved can affect query performance: Secret Sauce if Neo4j

Thanks @mike_r_black for your reply.
Ill check the video to understand better.

To add to the original problem, only "Files" entity with Virus are stored in the model.

For the time complexity of the matching algorithm on finding the Virus of a User, It will need to be linear with the number of File that a user has? So If a user as 10 file, there will be 10 link to pass by to find their File, then 10 link to pass to find their Virus (leaf node), On the proposed model with direct link to Virus, there will also be 10 links to pass correct? Is passed through a secondary entity that costly? For me, that is still O(NumberOfFile) in both case