How to only return the "best" connecting node between pairs

I have a graph that generally looks like this:

(:A)-[:HAS_B]->(:B)-[:HAS_C]->(:C)

And, :B represents a kind of "study" that was done that yielded some result. So, as quite often is the case, there are many B's for each A/C pairing.

What I'd like to do is return the distinct A/C pairs, but only the "best" B (by score property) between them for each pair. I've tried something like:

match (a:A)-[:HAS_B]->(b:B)-[:HAS_C]->(c:C)
return * order by b.score desc limit 1

But, this limits the entire query to a single row, returned (a single A/C pair). I've thought about collecting all the B's together in a list and taking the top one, but this also doesn't have the desired results:

match (a:A)-[:HAS_B]->(b:B)-[:HAS_C]->(c:C)
with * order by b.score desc
return a, c, head(collect(b)) as b

I've also considered a two-pass query (first get all the A/C pairs, then go back and get the best B connecting them), but I've been stumped implementing it.

Any help appreciated. Thanks!

APOC to the rescue:

MATCH (a:A)-[:HAS_B]->(b:B)-[:HAS_C]->(c:C)
RETURN a, c, head(apoc.coll.sortNodes(collect(b), "score")) as bestB

apoc.coll.sortNodes takes a list of nodes and a property name as input and sorts the node list using the provided property. Applying head should return the "best".

As an alternative, APOC has some useful aggregation functions. Shouldn't be any significant performance differences, just a matter of preference.

MATCH (a:A)-[:HAS_B]->(b:B)-[:HAS_C]->(c:C)
WITH a, c, b
ORDER BY b.score DESC
RETURN a, c, apoc.agg.first(b) as bestB

Just a note, looks like the APOC documentation for the aggregation functions got missed when redoing the docs. We'll work on that, in the meantime you can use this to see the aggregation functions available to you:

CALL apoc.help('agg.')
1 Like

Fantastic! Thank you both.