Pretty new to this. Would love to get help on what to use for this.
I have data with user IDs and a bunch of actions, ranked in chronological order (an integer from 1 to some number). Each action has properties such as its name.
I've created a relationship that, for each user, connects all actions from first to last.
I want to take all these relationships and aggregate them so I can see the most common paths taken (both the order and the name of each action matter).
So, for example, I could see that the path X->Y->Z, in that order, is shared by 1,000 different users and is the most popular one, while X->Z->Y has fewer.
I imagine the action names (from the property) connected and counted by the number of users who took both actions at the same index positions (so in the example above, X would split, going either to Y or to Z).
Was trying hard to look it up but not even sure what's the term for this analysis. How would you approach this?
You might abstract away the individual user nodes and aggregate the edges instead, adding new properties to each aggregated edge (e.g. count, average time elapsed per action, etc.). This raises questions about node identity, though, so depending on your situation you might be able to simply consider the relationship paths and count how many times each one occurs.
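As a rough sketch of the edge-aggregation idea, assuming a model that has not been confirmed in this thread: per-user nodes labelled `Action` with `userId`, `idx` and `name` properties, chained by a `Phase` relationship with consecutive `idx` values. The `Step` label, the `NEXT` relationship type and the `users` property are made up for illustration. Keying each aggregated node on (index, name) is one possible answer to the node-identity question.

```cypher
// Assumed (hypothetical) source model:
//   (:Action {userId, idx, name})-[:Phase]->(:Action {userId, idx, name})
// Count distinct users per (step index, from-name, to-name) transition.
MATCH (a:Action)-[:Phase]->(b:Action)
WITH a.idx AS step, a.name AS fromName, b.name AS toName,
     count(DISTINCT a.userId) AS users
// Materialize one aggregated node per (index, action name).
// Assumes b.idx = a.idx + 1 for every Phase relationship.
MERGE (s:Step {idx: step, name: fromName})
MERGE (t:Step {idx: step + 1, name: toName})
// One aggregated edge per transition, carrying the user count.
MERGE (s)-[n:NEXT]->(t)
SET n.users = users;
```

The resulting `Step`/`NEXT` graph is small enough to browse visually, and `n.users` could drive edge thickness in whatever visualization tool is used.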
Possibly not exactly what you were looking for, but here is a simplistic brute-force way to find the most popular paths. I ran this Cypher on a set of PubChem nodes (trying to think of where to run it, I thought this would be interesting to know...)
Caveat: my example query is open-ended, so it will search paths of any length. That is fine here because this part of my graph contains only directed acyclic graphs (DAGs) and I know the maximum possible path length...
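// The MATCH clause was not included in the post; an open-ended directed path is assumed here.
MATCH p = ()-[*]->()
WITH relationships(p) AS rels
WITH [r IN rels | [type(r)]] AS steps
RETURN steps, count(steps) AS occurrences
ORDER BY occurrences DESC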
One question: are you only interested in full paths end to end (e.g. directed relationships a->b->c in a row), as I've shown, or in fixed-length segments (e.g. a sub-series of actions)?
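To make the distinction concrete, here is a hedged sketch of both variants. It makes no assumptions about labels or properties; the variable names `firstNode` and `lastNode` and the segment length of 2 are arbitrary choices for illustration.

```cypher
// Variant 1: full paths only. The first node has no incoming
// relationship and the last node has no outgoing one.
MATCH p = (firstNode)-[*]->(lastNode)
WHERE NOT ()-->(firstNode) AND NOT (lastNode)-->()
RETURN [r IN relationships(p) | type(r)] AS steps, count(*) AS occurrences
ORDER BY occurrences DESC;

// Variant 2: fixed-length segments. Every run of exactly two
// consecutive relationships, wherever it sits inside a longer path.
MATCH p = ()-[*2]->()
RETURN [r IN relationships(p) | type(r)] AS steps, count(*) AS occurrences
ORDER BY occurrences DESC;
```

The unbounded `[*]` in the first variant can be expensive on large or cyclic graphs, so an upper bound such as `[*..10]` may be worth adding when the maximum path length is known.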
To answer your question first, I'm interested in full paths, and the way some actions iterate in their chronological order. I don't have a fixed length (if really necessary then I can just take max from the data, though).
Your suggestion is a good direction. I don't know if what I want is something that is too complex, but I imagine something that will portray graphically all the splits between different orders.
To put it simply, I imagine each action name as a node that is connected to other nodes based on order. All users start with the same action, and from there we can see all the directions they take based on index + action. The size of each node/relationship depends on the volume of users going through it.
Also, I have just one type of relationship ("Phase"). The action names are properties of the nodes it connects. So if I use your suggestion, all I get is things like [[Phase],[Phase],[Phase]] count = 100.
Does it make any sense?
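One possible way around the single relationship type, sketched under assumptions: collect a property from the nodes along each path instead of the relationship types. The `Action` label and the `name` property are guesses at the model; only the `Phase` type comes from the description above.

```cypher
// Assumed (hypothetical) model: each user has one chain of
// (:Action {name})-[:Phase]->(:Action {name}) nodes.
MATCH p = (firstAction:Action)-[:Phase*]->(lastAction:Action)
// Keep only complete chains: nothing before the first action,
// nothing after the last one.
WHERE NOT ()-[:Phase]->(firstAction) AND NOT (lastAction)-[:Phase]->()
// Describe each path by its ordered action names.
WITH [n IN nodes(p) | n.name] AS actions
RETURN actions, count(*) AS users
ORDER BY users DESC
LIMIT 10;
```

Because each user is assumed to have exactly one chain, `count(*)` is the number of users who followed that exact sequence. Users with a single action have no `Phase` relationship and would not appear in the result.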
What I often do is start with a very small dataset, then draw by hand what I want to see as the output. This often reveals the challenges (e.g. data modeling) and also how to implement a solution.
Can you provide a tiny dataset (ideally as a Cypher CREATE statement) and a hand drawing of the output you have in mind? It sounds like just a few purchase paths that overlap might be enough. Fake data is fine as long as it maps back to your real data...
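As an illustration of the kind of tiny dataset meant here, something along these lines might do. Everything in it is made up: the `Action` label, the `userId`, `idx` and `name` properties, and the three fake users. Only the `Phase` relationship type and the X/Y/Z action names come from the thread.

```cypher
// Three fake users: u1 and u2 follow X -> Y -> Z, u3 follows X -> Z -> Y.
CREATE
  (:Action {userId: 'u1', idx: 1, name: 'X'})-[:Phase]->
  (:Action {userId: 'u1', idx: 2, name: 'Y'})-[:Phase]->
  (:Action {userId: 'u1', idx: 3, name: 'Z'}),

  (:Action {userId: 'u2', idx: 1, name: 'X'})-[:Phase]->
  (:Action {userId: 'u2', idx: 2, name: 'Y'})-[:Phase]->
  (:Action {userId: 'u2', idx: 3, name: 'Z'}),

  (:Action {userId: 'u3', idx: 1, name: 'X'})-[:Phase]->
  (:Action {userId: 'u3', idx: 2, name: 'Z'})-[:Phase]->
  (:Action {userId: 'u3', idx: 3, name: 'Y'});
```

With data this small, the hand-drawn target is easy to state: X->Y->Z with 2 users ranked above X->Z->Y with 1 user, and X splitting toward Y (2 users) or Z (1 user) at the second step.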