Query Performance for Label Matching

Hello!

I'm running into some unintuitive behavior that I can't explain.

My database has about 50k nodes and 200k relationships. I have two queries with wildly different performance characteristics.

Query 1:

MATCH p=(x:Element {name: "Target"})<-[:Has|Belongs*]-(y) RETURN y

This completes with 5,200 total db hits in 65 ms.

Query 2:

MATCH p=(x:Element {name: "Target"})<-[:Has|Belongs*]-(y:Node) RETURN y

This completes with 8,813,850 total db hits in 32,396 ms.

I would have expected the second query to be faster, since the set of source nodes is restricted to a specific label. Am I missing something?

Hi @rookuu!

Can you share the EXPLAIN output of each query?

The second one certainly adds a Filter on the y nodes.

Bennu

Query 1:

NodeIndexSeek x:Element(name) WHERE name = $
VarLengthExpand (x)<-[anon_0:Has|Belongs*]-(y)
ProduceResults y

Query 2:

NodeIndexSeek x:Element(name) WHERE name = $ && NodeByLabelScan y:Node
CartesianProduct x, y
VarLengthExpand (x)<-[anon_0:Has|Belongs*]-(y)
ProduceResults y

Apologies for the notation. The difference is that in Query 2, NodeByLabelScan y:Node runs alongside NodeIndexSeek x:Element(name), and both results feed into a CartesianProduct.

Profiling both queries shows that the VarLengthExpand operator accounts for the difference: its db hits go from about 4.5k in Query 1 to about 9 million in Query 2.
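To see why putting the label scan in front of the expansion can inflate db hits this much, here is a toy cost model in Python. It is a sketch under stated assumptions, not Neo4j's implementation: the graph, node names and labels are made up, and one "hit" stands for one relationship traversal. `expand_then_filter` mirrors the shape of Query 1's plan (expand once from x, then check the label), while `scan_then_expand_into` mirrors Query 2's plan shape, where every y from the label scan is paired with x and the variable-length expansion is repeated for each pair.

```python
from collections import defaultdict

# Hypothetical toy data: each edge is (y, x), mirroring (y)-[:Has|Belongs]->(x).
EDGES = [("a", "target"), ("b", "target"), ("c", "a"), ("d", "a"), ("e", "b")]
LABELS = {"target": "Element", "a": "Node", "b": "Other",
          "c": "Node", "d": "Node", "e": "Other"}


def build_incoming(edges):
    """Map each node to the nodes that have an edge pointing at it."""
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(src)
    return incoming


def expand(incoming, start, hits):
    """Walk incoming edges from start, like (start)<-[*]-(y).

    Returns the end node of every path of length >= 1 and adds one to
    hits[0] per relationship traversed. Assumes the graph is acyclic.
    """
    ends = []
    stack = [start]
    while stack:
        node = stack.pop()
        for src in incoming[node]:
            hits[0] += 1
            ends.append(src)
            stack.append(src)
    return ends


def expand_then_filter(incoming, labels, start, label):
    """Query 1 shape: expand once from start, then keep ends carrying the label."""
    hits = [0]
    rows = [y for y in expand(incoming, start, hits) if labels[y] == label]
    return rows, hits[0]


def scan_then_expand_into(incoming, labels, start, label):
    """Query 2 shape: label scan for y, then one expansion per (start, y) pair."""
    hits = [0]
    rows = []
    for y in [n for n, lbl in labels.items() if lbl == label]:  # like NodeByLabelScan
        # The whole expansion is repeated for every candidate y,
        # keeping only the paths that end at that y.
        rows.extend(end for end in expand(incoming, start, hits) if end == y)
    return rows, hits[0]


if __name__ == "__main__":
    incoming = build_incoming(EDGES)
    print(expand_then_filter(incoming, LABELS, "target", "Node"))     # (['a', 'c', 'd'], 5)
    print(scan_then_expand_into(incoming, LABELS, "target", "Node"))  # (['a', 'c', 'd'], 15)
```

Both strategies return the same rows, but in this model the second one's cost is multiplied by the number of Node-labeled candidates (three here). With a real label that has many nodes, the same multiplication would explain a jump of several orders of magnitude.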

Hi @rookuu!

Clearly the problem is that the query planner is using a NodeByLabelScan plus a CartesianProduct instead of expanding from x and filtering on y afterwards. Which version of Neo4j are you using?

Can you try:

MATCH (x:Element {name: "Target"})
// WITH ends this query part, so the next MATCH is planned with x already bound
WITH x
MATCH p=(x)<-[:Has|Belongs*]-(y:Node)
RETURN y

Bennu

PS: Next time, a screenshot of the query plan would be easier for both of us. :wink: