Count() returns same thing using two very different variables

Oh man, this has me a little confused. This is from intro-neo4j-exercises #7, Controlling Query Processing, question number 6. It asks: "Retrieve all actors that have not appeared in more than 3 movies. Return their names and list of movies."


I tried this and got 96 records that looked like I expected.

MATCH (p:Person)-[:ACTED_IN]-(m:Movie)
WITH p, collect(m.title) AS Movies, count(m) AS movieCount
WHERE movieCount <= 3
RETURN p.name, Movies

This was the answer provided by the exercise, producing what looks like the same 96 records. But this code looks like it's counting the (Person) nodes as (Movie) nodes?!?

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(a) AS numMovies, collect(m.title) AS movies
WHERE numMovies <= 3
RETURN a.name, movies

I would have expected it to produce an error...

Thanks for any insight.

Both code blocks perform the same operations. The first binds the variable 'p' to Person nodes, while the second binds 'a' to them. The first computes the number of movies with count(m), while the second uses count(a). Either way, the count is the number of rows that share the same Person node, because the WITH clause groups by its non-aggregated variable ('p' in the first query, 'a' in the second), and every row in a group carries exactly one person and one movie.

Another choice would have been 'count(*)'.
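
For reference, here is a sketch of that variant. It assumes the same Movie graph as the exercise and changes only the aggregate:

```cypher
// Same query as the exercise answer, but counting rows directly.
// 'a' is the only non-aggregated expression in WITH, so it is the grouping key.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(*) AS numMovies, collect(m.title) AS movies
WHERE numMovies <= 3
RETURN a.name, movies
```

count(*) counts rows without naming either variable, which makes it arguably the clearest way to say "how many matches did this actor have".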


OK, I get it. I was thinking about the structure incorrectly. It makes sense: there's a row for each actor/movie relationship, and the rows are grouped per actor. It doesn't matter whether it counts the (:Person) or the (:Movie) values, since every row has one of each. When an actor has three or fewer rows, the movie titles collected from those rows are returned as a list.
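
A minimal sketch of that row-by-row picture, runnable without the movie graph. The actor and movie names are made-up placeholders, and the UNWIND list stands in for the rows that MATCH would produce:

```cypher
// Hypothetical sample data: one row per (actor, movie) pair,
// which is the shape of the MATCH output before aggregation.
UNWIND [
  {actor: 'Actor A', title: 'Movie 1'},
  {actor: 'Actor A', title: 'Movie 2'},
  {actor: 'Actor B', title: 'Movie 1'}
] AS row
WITH row.actor AS a, row.title AS m
// Grouping by 'a': each aggregate sees the same set of rows per actor.
WITH a,
     count(a) AS countOfA,
     count(m) AS countOfM,
     count(*) AS countOfRows,
     collect(m) AS movies
RETURN a, countOfA, countOfM, countOfRows, movies
ORDER BY a
```

All three counts come out equal for each actor (2 for Actor A, 1 for Actor B). They would only differ if the counted expression could be null, for example after an OPTIONAL MATCH, because count(expr) skips nulls while count(*) counts every row.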

I thought the request made an object with the (:Person) and list of (:Movie) already together.

My mistake. Thanks for helping me see this correctly.