Performance of a parallel query execution

michael.horak · April 24, 2019, 12:37pm

Hello,

I have a Cypher query (please see below) that is used to get user data filtered by user permissions.

We use role-based access control: every user has one or more roles, and each role has specified permissions on a group of nodes. A role can extend another role, and a group can extend another group, so both cases need a variable-length path.

The query seems to perform quite well for a single user, but when 4 users execute it in parallel, the execution time increases roughly 4 times.

Since we are running the test on a 12-CPU machine with 16 GB of RAM, we expected Neo4j to execute the read queries in parallel, so the time should be similar to that of a single execution.

Is there something wrong with the query, or is there a way to improve this result?

Thanks a lot,

Michael

Neo4j Version: 3.5.3
Driver: neo4j-jdbc-driver (version: 3.1.0)

MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:owner]-(i:N) WHERE NOT (i)-[:permission]->()
RETURN COLLECT(i.t) AS nodes, COLLECT((i)-->()) AS relations
UNION
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:hasRole]-(:N:Role)-[:extendRole*0..]->(r:N:Role)
OPTIONAL MATCH (r)<-[:permission]-(p:N:Perm) WHERE p.perm = "READ"
WITH p
MATCH (p)<-[:group]-(:N:Group)<-[:groupExtend*0..]-(:N:Group)<-[:nodes]-(i:N)
RETURN COLLECT(i.t) AS nodes, COLLECT((i)-->()) AS relations

Execution plan: (image not included)

eric13013 · April 26, 2019, 2:10pm

After investigating, we noticed that Neo4j was retrieving many relationships that we do not need, so we modified the query as follows:

MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:owner]-(i:N) WHERE NOT (i)-[:permission]->()
MATCH (i)-[r]->()
RETURN COLLECT(DISTINCT i) AS nodes, COLLECT(DISTINCT r) AS relations
UNION
MATCH (user:N {userId: "1234"})
OPTIONAL MATCH (user)<-[:hasRole]-(:N:Role)-[:extendRole*0..]->(r:N:Role)
OPTIONAL MATCH (r)<-[:permission]-(p:N:Perm) WHERE p.perm = "READ"
WITH p
MATCH (p)<-[:group]-(:N:Group)<-[:groupExtend*0..]-(:N:Group)<-[:nodes]-(i:N)
MATCH (i)-[r]->()
RETURN COLLECT(DISTINCT i) AS nodes, COLLECT(DISTINCT r) AS relations

The key modification is COLLECT(DISTINCT r), which seems to return the same number of relationships as before but runs faster than the COLLECT((i)-->()) we used previously.

Here is the PROFILE of the query: (image not included)

It seems there is one fewer branch for the collect.
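
For comparison, here is a minimal sketch of the two forms, reduced to the owner branch only. It relies on one known behaviour of Cypher in Neo4j 3.5: a pattern used as an expression, such as (i)-->(), evaluates to a list of paths. COLLECT((i)-->()) therefore builds a list of path lists, whereas MATCH (i)-[r]->() binds each relationship to r and COLLECT(DISTINCT r) yields a flat list of relationships. Whether this accounts for the missing branch in the plan is an assumption and has not been confirmed in this thread. The $userId parameter is a placeholder for the literal used in the queries above.

```cypher
// Form 1: pattern expression.
// Each (i)-->() evaluates to a list of paths, so `relations`
// is a list of lists of paths (start node, relationship, end node).
MATCH (user:N {userId: $userId})<-[:owner]-(i:N)
WHERE NOT (i)-[:permission]->()
RETURN COLLECT((i)-->()) AS relations;

// Form 2: explicit MATCH.
// Each outgoing relationship is bound to r, so `relations`
// is a flat, de-duplicated list of relationships.
MATCH (user:N {userId: $userId})<-[:owner]-(i:N)
WHERE NOT (i)-[:permission]->()
MATCH (i)-[r]->()
RETURN COLLECT(DISTINCT r) AS relations;
```

Running each statement with PROFILE on the same data should show how the operators and db hits differ between the two forms.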

Could someone from the Neo4j staff explain this?

Thanks.
