Pattern comprehension - sort output

Is there a way to get the output of a pattern comprehension sorted without wrapping it in an apoc.coll.sortNodes or apoc.coll.sortMulti call?

It would be nice if WITH and ORDER BY could be applied directly within the predicate section. Maybe I missed it?


A collection can be unwound and then ordered:

WITH [3, 1, 2, 5] as stuff
UNWIND stuff as thing
RETURN thing
ORDER BY thing

A pattern comprehension returns a collection, so combining the two lets you do what you're asking.
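
As a rough sketch of that combination on the movie graph (the labels, properties, and the choice of sort key here are only for illustration, not taken from your model):

```cypher
// Build the list with a pattern comprehension, then unwind, sort, and re-collect.
MATCH (p:Person {name: "Tom Hanks"})
WITH p, [(p)-[:ACTED_IN]->(m:Movie) | m] AS movies
UNWIND movies AS movie      // note: an empty list drops the row entirely
WITH p, movie
ORDER BY movie.released DESC
// collect() generally keeps the incoming row order, which is what this relies on
RETURN p.name AS name, collect(movie.title) AS titles
```

One thing to watch for: UNWIND on an empty list removes the row, so a person with no movies would disappear from the result unless you handle that case separately.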

Yes, I know, but this is part of a much bigger Cypher query (54 lines) that gets all applicants for a vacancy and a lot of related data as nested objects. In some cases there are about 200 applications, each has 1-2 CVs, and each CV has about 5-20 workphases. I have to select the youngest workphase before collecting the applications, as this info is part of the application map projection.

So before pattern comprehension was introduced, I would have used an optional match, ordered applications and workphases, collected the workphases, and taken the first with head(), but this seems quite slow compared to pattern comprehension. Using pattern comprehension but then unwinding, sorting (with all objects carried along in the WITH clause) and collecting the workphases again also doesn't seem straightforward to me, and it probably nullifies the speed advantage of using pattern comprehension (just a guess).

So I came to the following approach, which works quite fast, but is not easy to read:

MATCH (application:Application)-[:APPLIED_FOR]->(positionNode:Position {uuid:"xyz"})
WITH positionNode, application { .*, LastEmployer: apoc.coll.sortMulti([(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) WHERE wp.validEmployment | wp ], ['dateTo','dateFrom']) [0] } AS application
WITH positionNode { .*, applications:COLLECT(application) } AS positionNode
RETURN positionNode

It would be nicer if the pattern comprehension could return sorted output directly, something like:

[(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) | wp ORDER by wp.dateTo, wp.dateFrom ]

This improvement is on the backlog. We agree that having ORDER BY, SKIP, and LIMIT available within pattern comprehensions would be great!

Unfortunately there is a lot on the backlog, so we'll have to see if and when that improvement gets prioritized and picked up for a subsequent release.

Good to know it's on the backlog, thanks for that information.

Just to mention: I just had a case that throws an error with the pattern comprehension wrapped in apoc.coll.sortMulti:

Failed to invoke function `apoc.coll.sortMulti`: Caused by: java.lang.NullPointerException

I thought pattern comprehension always returns an array, maybe an empty array, but it seems it returns null in some cases. Wrapping it in coalesce( ... ,[]) works.

That's strange, you're right that pattern comprehensions should always return an array. Can you provide the full query?

It occurs when the path in the pattern comprehension is built upon a node variable that is null (from an optional match with no result). In my example above it happens when application is null. I assume this is by design and not a bug.

You can try it with the movie graph and the following Cypher. With the condition released = 2018, both movie and the result of the pattern comprehension are null.

MATCH (tom:Person {name: "Tom Hanks"}) 
OPTIONAL MATCH (tom)-[:ACTED_IN]->(movie:Movie)
WHERE movie.released = 2018
RETURN tom, movie.title, movie.released, [(movie)<-[:ACTED_IN]-(coActor:Person) WHERE <> | ] as coActors

But apoc.coll.sortMulti should accept null as an input parameter and return null, as apoc.coll.sort and apoc.coll.min do, instead of throwing this error.

Ah, in this case, you may want to wrap the pattern comprehension result in a coalesce() to provide a default value [] if it comes out null.
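
A minimal sketch of that wrapping, reusing the pattern from the query earlier in this thread (the OPTIONAL MATCH and the returned aliases are assumptions added only to show the null case):

```cypher
MATCH (positionNode:Position {uuid: "xyz"})
// application is null when the position has no applications
OPTIONAL MATCH (application:Application)-[:APPLIED_FOR]->(positionNode)
WITH positionNode, application,
     // fall back to an empty list when the comprehension comes out null
     coalesce(
       [(application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase) WHERE wp.validEmployment | wp],
       []
     ) AS workPhases
RETURN positionNode.uuid AS position,
       apoc.coll.sortMulti(workPhases, ['dateTo', 'dateFrom'])[0] AS lastEmployer
```

Indexing an empty list with [0] yields null rather than an error, so lastEmployer simply comes back null in that case.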

Do we have any updates on this feature request? As Reiner mentioned, allowing limit and sort would be a big help with complex queries. My queries are getting borderline unreadable without comprehensions!

Sorry, it remains on the backlog for consideration, no movement yet.

In the meantime, you may want to use subqueries to scope the expansion, sorting, and limiting of results before collecting. A subquery is more verbose than a pattern comprehension, but it should perform the same operations, it is scoped per row like a pattern comprehension, and it can encapsulate your logic fairly well.
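
Something along these lines might work for the query posted earlier in this thread. Treat it as a sketch: it assumes a Neo4j version with CALL { ... } subqueries and an importing WITH, and it assumes that a descending sort on dateTo and dateFrom is the ordering you want.

```cypher
MATCH (application:Application)-[:APPLIED_FOR]->(positionNode:Position {uuid: "xyz"})
CALL {
  WITH application
  // OPTIONAL MATCH keeps the application row even if it has no valid work phase
  OPTIONAL MATCH (application)-[:HAS_CV]->(:CV)-[:WORKED_AT]->(wp:WorkPhase)
  WHERE wp.validEmployment
  RETURN wp
  ORDER BY wp.dateTo DESC, wp.dateFrom DESC
  LIMIT 1
}
WITH positionNode, application { .*, LastEmployer: wp } AS application
WITH positionNode, collect(application) AS applications
RETURN positionNode { .*, applications: applications } AS positionNode
```

The sort and limit run once per application row, so only one work phase per application is carried forward into the map projection.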

Thanks for the update. I have been using subqueries to maintain the scope and that has helped. But they are definitely more verbose than patterns, which seems like overhead if I don't plan to reuse those subqueries.
Sorry to go on a tangent here, but can you confirm whether there is any performance impact from splitting my logic into multiple subqueries (most of them use APOC functions)? I have not read anything along those lines or profiled my code yet; I'm just wondering whether new variables being created and passed back and forth with subqueries have any impact. I can add a new question if that's cleaner.

Depends on which APOC functions. If the function has to perform an entirely new query under the hood, then the overhead of having to do that (per row) can have an impact, and native subqueries would often perform better.
