Graph Search with path pruning


I'm trying to solve a path-finding problem between a root_node and a target_type_id-nodes. There can be multiple paths from the root_node to the target_type_id-nodes, but I know a potential max_depth.
Each node in the graph has 3 properties, type_id, unique_id, and weight.

What I want to figure out is all paths between root_node and target_type_id, but I'm only interested each path that passes through a unique sequence of type_id nodes . It is very common that there are multiple paths for each unique type_id sequence, but I'm only interested in the "max" one, according to they weight. The weight should be derived as, if from node X there are multiple relations to node [Y1..Yn], where all Yn nodes have the same type_id, explore the Y-node with the highest weight.

Explore until the path reaches a node of type_id == target_type_id, or max_depth is reached.

Speed is very important. Right now I simply extract all paths between root_node and target_type_id-nodes, and do this in python (way too slow for large graphs, the cypher-query is what is the bottle-neck). The weight and type_id is on the nodes atm, but could be moved to the relations.

Any thoughts?
Thanks for the support!

You could look at the transversal methods in the APOC library. Maybe some of the APOC experts could guide you in reviewing other methods.

if that does not work, this seems to be a very good application for writing your own procedure. The procedure could take the two nodes and the sequence of id's and return the path you desire. I have written my own procedures for my application that iterates over graphs calculating graph metrics. I feel that your requirement would be pretty straight forward. The procedure runs on the neo4j server, so it should be your fastest option. The APOC library is implemented the same way, so if you could find one that meets your needs that would save you a lot of trouble. The only downside is that you need to deploy a jar with your procedures on the server itself. I don't think this is supported in the hosted solutions. You would need access to your server.

let me know if you are interested in this option, as I can help you get started.

Alright, thanks!

I have been looking at writing my own procedure. I host it on my own hardware so that is currently not a problem.

I'll let you know what path we decide on :)

Hi Emil!

How many type_ids do you have? If you wanna get an idea about how performant this traversal can be, create a label that identifies the type_id value for each node and use APOC expand config.

A combination of labelFilters, maxLevel and terminatorNodes should help you achieve the desired configuration.



type_id has 500K different ids ish right now, max up to a few million.
I've tried expandConfig without much success :( I'll post my results from a custom procedure later on! :)

Thanks for your support! :D