Match vs comma separated values

Why do these two queries produce different results? I thought that comma-separated patterns in a single MATCH clause were equivalent to using another MATCH statement. And yet the former returns actors with Keanu Reeves in the mix while the latter does not. Is that a coincidence?

match (k:Person {name: "Keanu Reeves"})-[:KNOWS]-(friends)
match (friends)-[:KNOWS]-(friendsOFfriends)
return distinct friendsOFfriends.name
limit 5

vs

match (k:Person {name: "Keanu Reeves"})-[:KNOWS]-(friends),
(friends)-[:KNOWS]-(friendsOFfriends)
return distinct friendsOFfriends.name
limit 5

This is a consequence of the uniqueness of matching used within a single MATCH pattern. Cypher uses a type of uniqueness called 'RELATIONSHIP_PATH', meaning that within a single path, the same relationship will only ever be traversed once (this is useful for many things, notably for preventing infinite loops caused by traversing the same relationships over and over). This uniqueness applies to the entire pattern in a single MATCH (or OPTIONAL MATCH), so it also applies to comma-separated parts of the pattern.

Since you're using two separate MATCHes in the first query, the :KNOWS relationship you traversed to get from k to friends can be reused to get from friends to friendsOFfriends, so Keanu Reeves can be returned in those results.

In the second query, since it's the same pattern being used (just comma-separated), the :KNOWS relationship from k cannot be traversed again to get friendsOFfriends, so Keanu Reeves cannot be reached or returned in the results.
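
To see the difference on a small scale, here is a minimal sketch meant for an empty scratch database. The three Person nodes and the two :KNOWS relationships are made up for illustration; they are an assumption, not the data from the question.

```cypher
// Hypothetical sample data: Keanu knows Carrie-Anne, Carrie-Anne knows Laurence.
create (k:Person {name: "Keanu Reeves"}),
       (c:Person {name: "Carrie-Anne Moss"}),
       (l:Person {name: "Laurence Fishburne"}),
       (k)-[:KNOWS]->(c),
       (c)-[:KNOWS]->(l);

// Two MATCH clauses: the Keanu-Carrie relationship may be reused in the second MATCH.
match (k:Person {name: "Keanu Reeves"})-[:KNOWS]-(friends)
match (friends)-[:KNOWS]-(friendsOFfriends)
return distinct friendsOFfriends.name;
// On this sample graph: "Keanu Reeves" and "Laurence Fishburne" (row order not guaranteed).

// One MATCH, comma-separated: each relationship may be used only once per match.
match (k:Person {name: "Keanu Reeves"})-[:KNOWS]-(friends),
      (friends)-[:KNOWS]-(friendsOFfriends)
return distinct friendsOFfriends.name;
// On this sample graph: only "Laurence Fishburne".
```

The only difference between the two queries is whether the second pattern sits in its own MATCH, which is what decides whether the first relationship is still available.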


Thank you @andrew.bowman. Do you mind showing me where this is in the documentation? I may have misread, so I would much appreciate being able to see it.

Sure, this is described in the Uniqueness section of the docs, tucked away in the Cypher introduction section.


Hi Andrew,

I know this is a bit old. But just wanted to confirm the following:

This only happens in the case when the relationship is not explicitly given a direction, correct? In other words, this would also solve the double MATCH clause:

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f)
match (f)-[:KNOWS]->(contact2)
return contact2.name
limit 5

Why does it make a difference?

In your example, the thing that makes this a non-issue is that you have the :KNOWS relationships in two different patterns (two separate MATCH clauses), so the uniqueness issue never comes into play.

If you had both patterns in a single MATCH:

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]->(contact2)
...

then the issue is very much in play, but it would only manifest under two conditions: the :KNOWS relationship is incoming to Keanu Reeves (and thus matches the direction of your second pattern, a :KNOWS relationship outgoing from f), and there is ONLY a single :KNOWS relationship between Keanu and f. In that case Keanu would never appear in the contact2 results, since the :KNOWS relationship would already have been traversed once for the given MATCH and could not be traversed a second time to match the rest of the pattern.

Remember that relationships ALWAYS have a direction, even if you aren't specifying it in your pattern.
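
As a rough illustration of the directed case, the sketch below assumes an empty scratch database containing exactly one :KNOWS relationship, stored as incoming to Keanu. The node for f is a hypothetical stand-in.

```cypher
// Hypothetical sample data: a single :KNOWS relationship, pointing at Keanu.
create (k:Person {name: "Keanu Reeves"}),
       (f:Person {name: "Carrie-Anne Moss"}),
       (f)-[:KNOWS]->(k);

// Single MATCH: the only relationship is consumed by the first part of the pattern,
// so nothing is left to satisfy (f)-[:KNOWS]->(contact2).
match (n:Person {name: "Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]->(contact2)
return contact2.name;
// On this sample graph: no rows.

// Two MATCH clauses: the same relationship can be traversed again in the second MATCH.
match (n:Person {name: "Keanu Reeves"})-[:KNOWS]-(f)
match (f)-[:KNOWS]->(contact2)
return contact2.name;
// On this sample graph: "Keanu Reeves".
```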

Ok. If there is a single outgoing relationship from Keanu to f, then for (f)->(contact2), contact2 can't be Keanu. But Keanu could reappear in the results in one of the following cases:

1 - Keanu -> (f ), but there also exists a relationship ( f ) -> (Keanu)

2 - When a direction is not specified in the pattern, regardless of the direction between the two nodes, so long as a relationship exists, Keanu can be returned in the results

Is that correct?

Correct that if the only connection between Keanu and f is outgoing toward f, that pattern will never match since it's explicitly looking for the relationship incoming toward Keanu.

  1. If two relationships existed between them in opposite directions, you're correct that the pattern could be matched successfully within a single MATCH clause, since both relationships would only need to be traversed once each.

  2. If this was within a single MATCH, it depends on both what the query is looking for and also on the actual number/direction of the relationships. If you still only had a single relationship between them and you had the following:

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)

You would still not get Keanu Reeves as a result for contact2. The single :KNOWS relationship between them would be traversed once by one of these, and thus not be available to be traversed for the other :KNOWS in the pattern.

Ultimately what this comes down to is the limitation that a specific relationship may only be traversed once within the context of a pattern in a single MATCH (or OPTIONAL MATCH).

And the behavior in 2) is what the RELATIONSHIP_PATH uniqueness guarantees, correct?

However, if there was a relationship going from (f)->(keanu), then the below:

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)

could return Keanu Reeves, correct?

Yes, that's the behavior of RELATIONSHIP_PATH uniqueness.

If you meant that there was an ADDITIONAL relationship going from (f)->(keanu) (so there would then be two :KNOWS relationships between them) then not only would this query return Keanu for contact2, you would actually see two occurrences of him in the results, because there would be two unique paths that meet this pattern, switching up the order of which :KNOWS relationship fulfills which part of the pattern.

Yes, I meant there's an additional one. So if the nodes had the following relationships:

(keanu)-[:KNOWS]->(f)
(f)-[:KNOWS]->(keanu)

So the question was -

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)

would definitely return Keanu. But why would this now return Keanu twice here?

Remember that Cypher is interested in finding all possible paths that match the pattern.

It helps to understand if you have a means to differentiate the relationships. Let's say for the sake of example that one of the :KNOWS relationships has its internal id() = 1, and the second has its internal id()=2.

There would be two possible paths that match the pattern. The first path uses relationship 1 between n and f (the first part of the pattern), and relationship 2 between f and contact2 (the second part of the pattern).

The second possible path switches the relationships used, so relationship 2 in the first part, relationship 1 in the second part.

If you wanted distinct contact2 results, you can use the DISTINCT keyword for that in your WITH or RETURN.
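
To make the two paths visible, here is a minimal sketch for an empty scratch database with two :KNOWS relationships between the same pair of nodes. The data is hypothetical, and the variables r1 and r2 are added only so the relationship ids can be returned. The thread's id() function is used here; newer Neo4j versions offer elementId() as its replacement.

```cypher
// Hypothetical sample data: two :KNOWS relationships, one in each direction.
create (k:Person {name: "Keanu Reeves"}),
       (f:Person {name: "Carrie-Anne Moss"}),
       (k)-[:KNOWS]->(f),
       (f)-[:KNOWS]->(k);

// Each row is one way of assigning the two relationships to the two pattern parts.
match (n:Person {name: "Keanu Reeves"})-[r1:KNOWS]-(f), (f)-[r2:KNOWS]-(contact2)
return contact2.name, id(r1) as firstRel, id(r2) as secondRel;
// On this sample graph: two rows, both "Keanu Reeves", with the two ids swapped.

// DISTINCT collapses the duplicate names into one row.
match (n:Person {name: "Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)
return distinct contact2.name;
// On this sample graph: a single "Keanu Reeves" row.
```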

Ah, ok. Yes, I think the simplest explanation is that the Cypher query looks for any path match. I generally end up focusing on the structure of how it is in the database and not the query itself.

Ok, I apologize for the repeated questions on this, but this query returned an unexpected result where I would have expected uniqueness.

match (n:Person {name:"Al Pacino"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)
where contact2.name = "Al Pacino"
return contact2.name

This returns Al Pacino twice. I assume this means there must be a path actually going from (f)-[:KNOWS]->(Al Pacino). That's the only possibility, correct?

match (n:Person {name:"Keanu Reeves"})-[:KNOWS]-(f), (f)-[:KNOWS]-(contact2)
where contact2.name = "Keanu Reeves"
return contact2.name

Where the above doesn't return Keanu Reeves, as expected.

I checked the graph, and there is only one Al Pacino node, and one Keanu Reeves node. Why do I get this unexpected outcome with the Al Pacino node?

The most likely explanation is that there is a node f that has two :KNOWS relationships to Al Pacino. It doesn't matter what direction those relationships are in (since your pattern doesn't have any restrictions on direction). Because there are two :KNOWS relationships, there are two distinct paths that fit the pattern, using the same relationships but in a different order of use.
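
One way to check this, sketched under the assumption that the graph uses the same Person label, name property, and :KNOWS type as the queries above, is to count the :KNOWS relationships per neighbor. The alias knowsCount is just an illustrative name.

```cypher
// List every neighbor f that shares more than one :KNOWS relationship with Al Pacino.
match (n:Person {name: "Al Pacino"})-[r:KNOWS]-(f)
with f, count(r) as knowsCount
where knowsCount > 1
return f.name, knowsCount;
```

Any f returned here contributes more than one matching path back to Al Pacino in the single-MATCH query, which would account for the duplicate rows.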