WITH producing different result depending on variables declared

First of all, thanks for a great project, and thanks for always offering help!

Say there's a sample data like this:


The following query returns one project

with collect(p)[..1] as p
return p

But the following query returns three projects

with collect(p)[..1] as p,e
return p

What's causing the difference?
How should it be written to slice a list and return related engineers together?

collect is an aggregation function. The base for the aggregation is everything listed in the with not being an aggregation. If your first query there are no other columns, so the base is null, in the second query you have e as non-aggregating expression.
Therefore the first one will do a global aggregation wheres the second query aggregates per e.

1 Like

Hi stefan,

Thanks for offering help!
How would you suggest to re-write the query?

I was planning something like this

with collect(p)[..1] as projects
unwind projects as p
match (p)-[:BELONGS_TO]->(e:Engineer)
return p,e

I guess related engineers are engineers having a relationship to the same project

MATCH (e1:Engineer)<-[:BELONGS_TO]-(p:Project)-[:BELONGS_TO]->(e2:Engineer)
WHERE id(e1) < id(e2) // trick to prevent e1 and e2 from changing roles causing duplication of result set
RETURN e1.name, e2.name, count(*) AS numberOfCommonProjects

If you want a list of engineers per project:

MATCH (p:Project)-[:BELONGS_TO]->(e:Engineer)
RETURN p.name, collect(e.name)

If you have projects without engineers but wanted them listed as well:

MATCH (p:Project)
OPTIONAL MATCH (p)-[:BELONGS_TO]->(e:Engineer)
RETURN p.name, collect(e.name)

That's an interesting trick!

Your answer really helped a lot, I think I can continue developing now.