Finding all connected nodes in a cyclic graph

I have a graph with lot of cycles in it. I want to find all nodes connected to a specific node by any length of path.
I've tried this
match (a{firstName:"XYZ"})-[ * ]-(b) return distinct b
but i think this got stuck in a cycle.
So, is their any way to do this?
Also, i was curious to know that can we find path with longest length and having distinct nodes in a cyclic graph?

In answer to your first question, you might want to come at it from another angle (if you are able to add properties onto the nodes). I believe what you are describing is counting the nodes in a subgraph that is disconnected from the rest of the nodes in a Neo4j db? There is at least one algorithm in apoc that can be used to label all nodes in every disconnected graph with a unique group number, once that is complete then a cypher to count the nodes in a group (all with the same group number) is quick and easy.

More information on a related thread

Doing this can be very useful to get a perspective on what is in a graph, the characteristics and number of disconnected subgraphs can provide insights.

Hi Harshit,
Could you please share any dataset and your required pattern?
I tried with below

A B
B C
C D
D A
D E
E F
F G

match path = (n:Person{name:'A'})-->(n:Person{name:'A'}) return path
and it gave me one cyclic path only.
A-B-C-D-A

match path = (n:Person{name:'E'})-->(n:Person{name:'E'}) return path
no path

So, this is an example dataset. I want to know all connected nodes from a node (connection can be of any length) by any relationship
So, i have tried
match (a{firstName:"Vikas"})-[ * ]-(b) return distinct b.firstName
but this query does not give results as it may have got stuck in a cycle....i wanted to know a way to do this

Just to clarify something, Cypher queries will never get caught in an infinite loop. Cypher uses a uniqueness property such that per path, a relationship may only be used once. By definition infinite loops must reuse relationships.

That said, Cypher can be overwhelmed by the permutation of all possible distinct paths (that don't repeat relationships), that's what's happening here. If you set an upper bound on your var length relationship (let's say *..8), and RETURN count(*), you might see a pretty huge humber of paths. Keep increasing that by one and getting the counts, and that should explain why your query is hanging. It's likely trying to process tens or hundreds of millions or even billions of possible paths, each distinct because they use a different permutation of relationships.

This is also why it's generally a bad idea to to have reciprocal relationships between two nodes if a single relationship is enough to capture the the relation...the sheer presence of a second relationship between every two related nodes vastly increases the permutations of distinct paths for queries like these.

As for solutions, if you have APOC Procedures installed you can use apoc.path.subgraphNodes(), which is designed to find distinct connected nodes from a start node. It uses a different kind of uniqueness than Cypher such that a node can only be visited once across all paths, that means it's not prone to the skyrocketing permutations of paths issue that's affecting your current query.

Hi Andrew, Thank you for your post on path following. Are there any references that cover this in more detail?

One note on this to be sure everyone is aware, (and let me if I'm wrong Andrew) as I understand it a single path can cross over the same node multiple times, as long as it is following unique relationships. A nuance that might not be obvious at first, but is important to know.

Yes, here's the section on uniqueness behavior in Cypher in the current docs, with some examples.

With this approach, you are correct that the same node can occur multiple times in the same path as long as every relationship in the path is unique to the path.

In the APOC docs there's a section on the different kinds of uniqueness that exist. These can only be changed/selected via the traversal API. The APOC path expanders are built using the traversal API, and you can of course build your own custom procs that use it too.

Hi Harshit,

Dataset I used

Query I tried to get the nodes which are somehow connected with A
match path = (from:Person{name:"A"})-->(to:Person) return distinct apoc.map.mergeList(nodes(path))

mergeList() isn't an aggregation function, so that likely wouldn't work (each path is on a separate row and likewise the path nodes). What you're doing is, per path, merging the properties of all of the nodes of that particular path together, so it's like you have one smushed properties map representing all the nodes of that path at once (and likely several of those properties are overwriting the others), which likely isn't what you want.

To get all nodes connected from your start node, you could use the path expander procedures:

If direction doesn't matter:

MATCH (from:Person{name:"A"})
CALL apoc.path.subgraphNodes(from, {labelFilter:'Person'}) YIELD node
RETURN node

This finds all :Person nodes at any distance via any relationship reachable from your start node. The nature of the procedure is such that the nodes yielded will be distinct.

It is possible that mergeList will not give the desired ouput however when I tried subgraph it gave me all the nodes, irrespective whether i can reach from A or not .
Output of query you gave me is as below and you can see that although I cannot reach to I from A but even then it is coming

Ah, I missed that direction matters to you, the query I gave would find you all connected nodes regardless of the direction of the relationships. We can add a relationship filter to get you the desired results:

MATCH (from:Person{name:"A"})
CALL apoc.path.subgraphNodes(from, {labelFilter:'Person', relationshipFilter:'>'}) YIELD node
RETURN node

For this simple graph, a small fix on your previous Cypher should work just as well:

match path = (from:Person{name:"A"})-->(to:Person)
return distinct to

If you want the nodes in a list rather than per row, then you can chang eyour return to return collect(distinct to) as nodes