I have a graph in which nodes labelled User carry two static timestamp (datetime) properties: the first and last observation of each user at the time the data was loaded. Both properties are indexed. I am trying to maintain one week's worth of data, which means that each day I want to delete the oldest day of the week before I add the newest day. In addition, User nodes are connected to third-party ids, which do not have any timestamps on them.
What I want to do is delete every user from the first day whose 4-hop subgraph contains no timestamps later than the first day.
A couple of approaches I am considering:
1.) Collect all users who are from the first day.
2.) Using UNWIND, for each user in that collection, run apoc.path.subgraphNodes with maxLevel set to 4 and return all User nodes in the subgraph. If any of them has a last observation after the first day, do not delete any of the ids. Otherwise, if no user in the subgraph has a last observation after the first day, delete (or mark for deletion) all nodes in the subgraph.
3.) Alternatively, use Py2Neo and do everything in Python?
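To make the per-user check described above concrete, here is a minimal pure-Python sketch on a toy in-memory graph. It only illustrates the decision rule, not a way to process the real data: the dictionary layout, the node names and the functions `subgraph_within` and `nodes_to_delete` are all hypothetical, and third-party ids are modelled simply as nodes that have no entry in `last_obs`.

```python
from collections import deque
from datetime import datetime


def subgraph_within(adjacency, start, max_hops=4):
    """Return every node reachable from start in at most max_hops hops (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def nodes_to_delete(adjacency, last_obs, cutoff, max_hops=4):
    """Collect the subgraphs in which no user was observed on or after cutoff."""
    doomed = set()
    for user, seen_at in last_obs.items():
        if seen_at >= cutoff:
            continue  # not a first-day user, so not a candidate
        subgraph = subgraph_within(adjacency, user, max_hops)
        # Third-party ids carry no timestamp, so only users are checked.
        if all(last_obs[n] < cutoff for n in subgraph if n in last_obs):
            doomed |= subgraph
    return doomed


# Toy data (assumed): u1 shares an id with the still-active u2, u3 is isolated.
adjacency = {
    "u1": ["id_a"], "id_a": ["u1", "u2"], "u2": ["id_a"],
    "u3": ["id_b"], "id_b": ["u3"],
}
last_obs = {
    "u1": datetime(2018, 1, 1, 9),
    "u2": datetime(2018, 1, 3, 12),
    "u3": datetime(2018, 1, 1, 18),
}
cutoff = datetime(2018, 1, 2)

print(sorted(nodes_to_delete(adjacency, last_obs, cutoff)))  # ['id_b', 'u3']
```

In this toy case u1 is kept because its subgraph contains u2, which was seen after the first day, while u3 and its third-party id are selected for deletion.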
I am looking for help setting this query up and for suggestions on optimizations. This is a toy example; in production we are aiming to maintain 1-2 years of data, and there may be around 5-15 million users per day to consider for deletion out of 1.8 billion User nodes.
What I have so far:
MATCH (u:User)
WHERE datetime(u.last_obs) < datetime('2018-01-02')
WITH u LIMIT 10
WITH collect(u) AS coll
UNWIND coll AS u
CALL apoc.path.subgraphNodes(u, {maxLevel:2, filterStartNode:true, relationshipFilter:'OBSERVED_WITH', labelFilter:'>User'}) YIELD node
RETURN node.last_obs
ORDER BY node.last_obs DESC
This returns all users in each subgraph for the original 10 users (WITH u LIMIT 10). What I want is to wrap this in a CASE WHEN with a limit: for each user, order the subgraphNodes results by node.last_obs DESC and take LIMIT 1. If that latest last_obs is earlier than '2018-01-02' (that is, nobody in the subgraph was seen after the first day), I then want to call subgraphNodes again with no label filter and mark/delete all nodes in the subgraph.
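One possible shape for the batched version, sketched as a Python helper that only builds the Cypher text. Several things here are assumptions rather than tested facts about this setup: that last_obs is stored as a native datetime (so it is compared directly instead of being wrapped in datetime()), that the installed APOC version provides apoc.periodic.iterate with the `params` config key, and that marking nodes with a hypothetical `ToDelete` label first is acceptable. The helper name `build_cleanup_query` is also made up for illustration.

```python
def build_cleanup_query(max_level=4, batch_size=1000):
    """Return a Cypher statement that marks stale subgraphs in batches."""
    # Outer statement: stream the candidate users from the oldest day.
    iterate = "MATCH (u:User) WHERE u.last_obs < datetime($cutoff) RETURN u"
    # Inner statement: expand each candidate and mark the whole subgraph
    # only when no User in it was observed on or after the cutoff.
    action = (
        "CALL apoc.path.subgraphNodes(u, "
        "{maxLevel: %d, relationshipFilter: 'OBSERVED_WITH'}) YIELD node "
        "WITH u, collect(node) AS nodes "
        "WHERE none(n IN nodes WHERE n:User AND n.last_obs >= datetime($cutoff)) "
        "UNWIND nodes AS n "
        "SET n:ToDelete"  # hypothetical marker label
    ) % max_level
    # Wrap both statements so the work is committed batch by batch.
    return (
        'CALL apoc.periodic.iterate("%s", "%s", '
        "{batchSize: %d, parallel: false, params: {cutoff: $cutoff}})"
    ) % (iterate, action, batch_size)


query = build_cleanup_query(max_level=4, batch_size=1000)
print(query)
```

The resulting string could then be sent with cutoff as a query parameter, for example through py2neo's Graph.run. Marking rather than deleting inside the batches is a deliberate choice in this sketch: a candidate user removed by an earlier row could otherwise be expanded again later in the same run. The marked nodes would be removed in a separate pass such as MATCH (n:ToDelete) DETACH DELETE n, itself batched.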
- Neo4j version: 3.4.9 Community
- Possibly using py2neo. Otherwise just Cypher or APOC