Get Leaf Nodes for all Nodes in directed graph

Hi Team,

I have a simple graph that has Nodes which represent duplicate record id in the below form
Duplicate Id, Original Id
A,B
B,C
C,D
X,Y
Y,Z

The directed graph looks like A -> B ->C ->D and I want CSV result that looks like below that will represent the ultimate Original Id
A,D
B,D
C,D
X,Z
Y,Z

Have been struggling for more than a day being newbie to graph databases, any help will be highly appreciated.

Thanks

Here ya go:
http://console.neo4j.org/r/vqzcbq

image

Cypher:

MATCH p=(a:Ident)-[:ORIGID*]->(b:Ident)
WHERE NOT (b)-[:ORIGID]->(:Ident)
WITH a, LAST(nodes(p)) as b
RETURN a.name, b.name
ORDER BY a.name

image

Thank you for the reply and it worked well for small clusters however I have 1 large cluster of 50 connected nodes and it is struggling to make all traversals and failing. Is there any way to prevent further traversals from Origin node if the 1 leaf node is found? All nodes in my cluster are connected to each other so at all times there is only 1 leaf node in a cluster so I can use that fact to have a query that works without failing.

So it's a directed graph, but not acyclic.

You could use subgraphNodes() from APOC path expanders, but you would need to pre-match to leaf nodes first, collect() them and use them as end nodes, and use a limit:1 so that it stops looking after a single path to a leaf node is identified.

Alternately, if using Neo4j 4.x, you can use subqueries to limit the expansion to only one leaf node:

MATCH (a:Ident)
CALL {
  WITH a
  MATCH (a)-[:ORIGID*]->(b:Ident)
  WHERE NOT (b)-[:ORIGID]->()
  RETURN b
  LIMIT 1
}
RETURN a.name as aName, b.name as bName
ORDER BY aName

That will find only one leaf node per a node. It won't look for others.

The LIMIT can't be dynamic from a variable, but if you executed a query prior to this to find the number of leaf nodes ( MATCH (a:Ident) WHERE NOT (a)-[:ORGID]->() RETURN count(a) ), and you passed that value as a parameter to your query, then you could use RETURN DISTINCT b LIMIT $leafNodeCount so it would stop searching when all leaf nodes have been found.