Filter relationships returned by subgraphAll on relationship properties

I need to convert the relationships returned by subgraphAll into paths and filter those paths by relationship properties.

I used expandConfig and expand, which return paths, but both queries hang on large data sets. subgraphAll runs fast in my case.

If you need paths, then subgraphAll() isn't the one to use.

You might want to look at using apoc.path.spanningTree() if you want to ensure every node is only visited once. Note that this may exclude relationships, since a relationship that connects to an already visited node will be discarded.
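Here is a rough Python model of a breadth-first spanning tree that shows the visit-once behaviour. It runs over the A/B/C example that comes up later in this thread and is restricted to projectId 3. It only illustrates the idea and is not APOC's implementation. The tuple layout and the spanning_tree() helper are hypothetical.

```python
from collections import deque

# Hypothetical data mirroring the A/B/C example in this thread.
# Each relationship is (rel_id, start_node, end_node, projectId).
RELS = [
    ("r1", "A", "B", 1),
    ("r2", "A", "B", 2),
    ("r3", "A", "B", 3),
    ("r4", "B", "C", 3),
    ("r5", "C", "A", 3),
]

def spanning_tree(start, rels, project_id):
    """Breadth-first traversal that visits each node at most once.

    Returns (kept, discarded): relationships used to reach a new node, and
    matching relationships dropped because their end node was already visited.
    """
    visited = {start}
    kept, discarded = [], []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel_id, src, dst, pid in rels:
            if src != node or pid != project_id:
                continue  # not outgoing from this node, or wrong project
            if dst in visited:
                discarded.append(rel_id)  # leads back to an already visited node
            else:
                visited.add(dst)
                kept.append(rel_id)
                queue.append(dst)
    return kept, discarded

kept, discarded = spanning_tree("A", RELS, 3)
print(kept, discarded)  # ['r3', 'r4'] ['r5']
```

The C ---> A relationship is dropped because A was already visited. That is exactly the kind of relationship a spanning tree leaves out.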

You may need to give more detail about your requirements.

I need to return all nodes and relationships (a subgraph, more or less) reachable from a node, filtered by the relationship property 'projectId' as described below. Two nodes can have at most one 'Forward' relationship per projectId value, so they may be connected by several 'Forward' relationships with different property values, as shown below.
Given graph data,
A -----:Forward{ projectId: 1}-----> B
A -----:Forward{ projectId: 2}-----> B
A -----:Forward{ projectId: 3}-----> B
B -----:Forward{ projectId: 3}-----> C
C -----:Forward{ projectId: 3}-----> A
If I get the lineage of A for projectId 3, the query has to return A ---> B ---> C ---> A.
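If the goal is the set of nodes and relationships rather than every individual path, you can collect it without enumerating paths. Expand each node once, but keep every matching relationship, including the one that closes the loop. Below is a minimal Python sketch of that idea over the example data. The tuple layout and the lineage() helper are hypothetical, and this is not an APOC procedure.

```python
from collections import deque

# Hypothetical data for the example above: (rel_id, start_node, end_node, projectId).
RELS = [
    ("r1", "A", "B", 1),
    ("r2", "A", "B", 2),
    ("r3", "A", "B", 3),
    ("r4", "B", "C", 3),
    ("r5", "C", "A", 3),
]

def lineage(start, rels, project_id):
    """Return every relationship reachable from `start` by following only
    relationships whose projectId equals `project_id`.

    Each node is expanded at most once, but every matching relationship is
    kept, so a relationship that closes a loop (C -> A) is still returned.
    """
    seen_nodes = {start}
    found = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel in rels:
            _, src, dst, pid = rel
            if src == node and pid == project_id:
                found.append(rel)
                if dst not in seen_nodes:
                    seen_nodes.add(dst)
                    queue.append(dst)
    return found

for _, src, dst, _ in lineage("A", RELS, 3):
    print(f"{src} ---> {dst}")
# A ---> B
# B ---> C
# C ---> A
```

Because each node is expanded only once, the work grows with the size of the matching subgraph rather than with the number of paths through it.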

Ah, in that case the path expanders won't do what you need.

What have you tried so far using just Cypher? And do you have an expanded PROFILE plan of the query?

I tried this:
MATCH p = (:ColumnNode)-[r:Forward*]->(:ColumnNode)
WHERE all(x in relationships(p) WHERE x.projectId in ['3'])
RETURN p

The above query never finishes.

This will also get you all possible path segments. Do you have specific start and/or end nodes? Do you want only the longest segments?

If so, you may want to ensure that your start node has no incoming :Forward relationships with projectId = 3, and likewise that your end node has no outgoing :Forward relationships with projectId = 3.
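Here is a quick way to find which nodes qualify as start or end nodes, sketched in Python over hypothetical data. The endpoints() helper and the tuple layout are made up for illustration. A start node has no incoming projectId 3 relationship, and an end node has no outgoing one.

```python
# Hypothetical acyclic data: (rel_id, start_node, end_node, projectId).
RELS = [
    ("r1", "X", "Y", 3),
    ("r2", "Y", "Z", 3),
    ("r3", "W", "Y", 3),
    ("r4", "Z", "Q", 1),  # different project, ignored for projectId 3
]

def endpoints(rels, project_id):
    """Return (start_nodes, end_nodes) for relationships with `project_id`."""
    sources = {src for _, src, _, pid in rels if pid == project_id}
    targets = {dst for _, _, dst, pid in rels if pid == project_id}
    starts = sources - targets  # no incoming matching relationship
    ends = targets - sources    # no outgoing matching relationship
    return sorted(starts), sorted(ends)

print(endpoints(RELS, 3))  # (['W', 'X'], ['Z'])
```

On the A ---> B ---> C ---> A loop from earlier in the thread, both lists come back empty, because every node there has both an incoming and an outgoing projectId 3 relationship.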

If these are the start and end nodes you want, do you know roughly how many of them there are in the graph?

Yes, but the above query never returns results on a large graph. I don't have an end node as such. I just provide a start node, and the query has to return all possible paths starting from it.

My graph has nearly half a million nodes and 6 million relationships.

How many paths are you expecting? Have you tried returning a count() of the possible paths instead of the paths themselves? If so, and if that returns, do the counts make sense?

If you mean the plain Cypher query, I'm sorry to say it never returns, so I wasn't able to check the path count. With expandConfig, however, the query returns exactly what I want if I set the limit option. I want expandConfig to run without the limit option.

Using expandConfig with a limit, I iterate over the paths and keep only those whose relationships all have projectId = 3, using all(). This gives me the expected results.

What limits are you using?

(Also, are you always going to start from a certain node? If so, you may want to change your Cypher approach accordingly.)

The point of my previous request was for you to check whether the number of possible paths is reasonable, or whether it's far more than you'd want to work with at once. That is, if this is returning millions or billions of results, is that something you really need? Is there a number of paths at which you would want it to cut off, or will you always be working with the full result set, no matter how large it gets?

You can put a LIMIT before your count() aggregation to get something similar to the limit in the expandConfig() call. Also, have you verified that the paths you're getting are all useful and what you're looking for, or is there additional filtering you would need to get your desired result set?
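A LIMIT before count() helps because paths are produced lazily, so the count can stop as soon as the cap is reached. Here is a Python model of that on a hypothetical graph. Two parallel relationships between each pair of consecutive nodes double the path count at every hop. The names diamond_chain, paths_from and capped_count are illustrative only.

```python
from itertools import islice

def diamond_chain(k):
    """Hypothetical graph: nodes 0..k, with two parallel relationships between
    each consecutive pair, so the number of paths doubles at every hop."""
    rels = []
    for i in range(k):
        rels.append((f"a{i}", i, i + 1))
        rels.append((f"b{i}", i, i + 1))
    return rels

def paths_from(start, rels):
    """Lazily yield every non-empty path from `start` as a tuple of rel ids.
    Depth-first; terminates because the graph is acyclic."""
    out = {}
    for rel_id, src, dst in rels:
        out.setdefault(src, []).append((rel_id, dst))
    stack = [(start, ())]
    while stack:
        node, path = stack.pop()
        for rel_id, dst in out.get(node, []):
            new_path = path + (rel_id,)
            yield new_path
            stack.append((dst, new_path))

def capped_count(paths, limit):
    """Count at most `limit` paths, like putting LIMIT before count()."""
    return sum(1 for _ in islice(paths, limit))

print(capped_count(paths_from(0, diamond_chain(20)), 1000))  # 1000
```

With 20 hops this graph already has 2,097,150 distinct paths from node 0 (2^21 - 2), even though it holds only 21 nodes and 40 relationships. The capped count returns after generating just 1000 of them.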

Do you by chance have loops in your graph for relationships with the given projectId? If so, it may be worth trying out expandConfig() while using uniqueness:'NODE_PATH' in the config, as that will ensure you'll never have loops in your paths.

If loops are allowed, then with sufficiently complex branching the number of distinct paths can grow very large. Using NODE_PATH could help, if that's the case.
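Here is a small Python model of the difference between the two uniqueness modes on the A/B/C loop. The mode names follow APOC's, but the expand() helper, the tuple layout and the exact rules are a simplified assumption, not APOC's code. The zero-length path is not emitted.

```python
# Hypothetical data for the A/B/C example: (rel_id, start_node, end_node, projectId).
RELS = [
    ("r1", "A", "B", 1),
    ("r2", "A", "B", 2),
    ("r3", "A", "B", 3),
    ("r4", "B", "C", 3),
    ("r5", "C", "A", 3),
]

def expand(start, rels, project_id, uniqueness):
    """Enumerate paths from `start` over relationships with `project_id`.

    NODE_PATH: a node may appear at most once per path (no loops).
    RELATIONSHIP_PATH: a relationship may appear at most once per path, so a
        path can return to an earlier node but cannot loop forever.
    """
    if uniqueness not in ("NODE_PATH", "RELATIONSHIP_PATH"):
        raise ValueError(f"unsupported uniqueness: {uniqueness}")
    out = {}
    for rel_id, src, dst, pid in rels:
        if pid == project_id:
            out.setdefault(src, []).append((rel_id, dst))
    results = []

    def walk(node, nodes, rel_ids):
        for rel_id, dst in out.get(node, []):
            if uniqueness == "NODE_PATH" and dst in nodes:
                continue  # would revisit a node already on this path
            if uniqueness == "RELATIONSHIP_PATH" and rel_id in rel_ids:
                continue  # would reuse a relationship already on this path
            path = nodes + [dst]
            results.append("->".join(path))
            walk(dst, path, rel_ids | {rel_id})

    walk(start, [start], frozenset())
    return results

print(expand("A", RELS, 3, "NODE_PATH"))          # ['A->B', 'A->B->C']
print(expand("A", RELS, 3, "RELATIONSHIP_PATH"))  # ['A->B', 'A->B->C', 'A->B->C->A']
```

Only RELATIONSHIP_PATH keeps A ---> B ---> C ---> A. NODE_PATH stops at C because A is already in the path.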

I am using limit:2000 in the expandConfig configuration. Yes, I always start from a certain node. I don't think my graph is dense enough to return millions or billions of results. I checked that the number of paths returned is in the thousands, and I verified that all the paths I get are useful. No additional filtering is needed apart from projectId.
I also need to get loops from my graph, so I use RELATIONSHIP_PATH as the uniqueness, and it works, provided the limit is set to 2000 in my expandConfig configuration. Ultimately I need to remove that limit and get the full result set, no matter how large it is.

Try gradually increasing the LIMIT (with Cypher) to see how high it can go before you start seeing timing issues, and let us know how high that gets. Even if in the end you want everything, it's useful at this point to understand how large the results can get. You may also want to check how long the paths are getting by returning length(path) as well.

I would still expect the Cypher version to outperform expandConfig(). With an all() predicate on the relationships, Cypher should be able to check the predicate during expansion rather than filtering afterwards. With a limit on expandConfig() it may be faster, but you will get fewer hits. The limit applies only to the expandConfig() call, which isn't doing the filtering.
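A toy Python model shows the effect of a limit that sits in front of the filter. It runs the same lazy breadth-first expansion twice: once with the projectId check applied after a limit of 4, and once with the check applied while expanding. The data and the bfs_paths() helper are hypothetical. This models only the ordering of limit and filter, not expandConfig() itself.

```python
from collections import deque
from itertools import islice

# Hypothetical data: (rel_id, start_node, end_node, projectId).
RELS = [
    ("s1", "S", "X1", 1), ("s2", "S", "X2", 3),
    ("s3", "S", "X3", 1), ("s4", "S", "X4", 3),
    ("t1", "X1", "Y1", 3), ("t2", "X2", "Y2", 3),
    ("t3", "X3", "Y3", 3), ("t4", "X4", "Y4", 3),
]

def is_p3(rel):
    """The relationship predicate: projectId must be 3."""
    return rel[3] == 3

def bfs_paths(start, rels, keep=lambda rel: True):
    """Lazily yield paths (tuples of relationships) in breadth-first order.
    `keep` is checked on each relationship while expanding."""
    queue = deque([(start, ())])
    while queue:
        node, path = queue.popleft()
        for rel in rels:
            if rel[1] == node and keep(rel):
                new_path = path + (rel,)
                yield new_path
                queue.append((rel[2], new_path))

# Limit first, filter afterwards: the filter only sees the first 4 paths.
limit_then_filter = [p for p in islice(bfs_paths("S", RELS), 4)
                     if all(is_p3(r) for r in p)]
# Filter during expansion, then limit: all 4 returned paths match.
filter_then_limit = list(islice(bfs_paths("S", RELS, keep=is_p3), 4))

print(len(limit_then_filter), len(filter_then_limit))  # 2 4
```

Filtering during expansion also prunes non-matching branches early, so no work is spent extending paths that could never pass the all() check.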

Also, expandConfig() with 'RELATIONSHIP_PATH' uniqueness should perform about the same as plain Cypher, so I wouldn't expect much benefit from using it instead of Cypher.

Thanks. Will check and let you know.

MATCH path=(c:ColumnNode{key:'APS0117001:BIQA:DBO.CERTIFICATEREPORTINGDETAIL:CREATED'})-[:ColumnReverseLineage*]->(:ColumnNode)
WHERE all(a in relationships(path) WHERE a.projectId in ['3'])
WITH path
UNWIND [x in relationships(path)] as r
WITH DISTINCT r
RETURN collect( distinct { /* My custom json based on result set */}) as lineage

My question is: where should I put LIMIT in my query to limit the result set?

Depends on what you want to limit. If it's paths, then add it after you get and filter your paths:

MATCH path=(c:ColumnNode{key:'APS0117001:BIQA:DBO.CERTIFICATEREPORTINGDETAIL:CREATED'})-[:ColumnReverseLineage*]->(:ColumnNode)
WHERE all(a in relationships(path) WHERE a.projectId in ['3']) 
WITH path 
LIMIT 100000
UNWIND relationships(path) as r 
WITH DISTINCT r 
RETURN collect( distinct { /* My custom json based on result set */}) as lineage
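The tail of that query limits the paths, then unwinds and deduplicates their relationships. It can be modelled in a few lines of Python. This is a hypothetical sketch over made-up path data, meant only to show why many paths can collapse into a small set of distinct relationships.

```python
from itertools import islice

def distinct_relationships(paths, limit):
    """Model of: WITH path LIMIT n UNWIND relationships(path) AS r WITH DISTINCT r.
    Keeps the first `limit` paths, then returns each relationship once,
    in first-seen order."""
    seen = set()
    result = []
    for path in islice(paths, limit):
        for rel in path:
            if rel not in seen:
                seen.add(rel)
                result.append(rel)
    return result

# Hypothetical lineage paths: longer paths share the relationships of shorter ones.
PATHS = [("r3",), ("r3", "r4"), ("r3", "r4", "r5")]

print(distinct_relationships(PATHS, 100000))  # ['r3', 'r4', 'r5']
print(distinct_relationships(PATHS, 1))       # ['r3']
```

Cypher's DISTINCT does not promise any particular order. The first-seen order here is just to make the output easy to read.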

Also make sure you have an index or unique constraint on :ColumnNode(key), though you likely have that already.