Query taking unusually long to complete

I have a Neo4j 4.1.0 Community Edition setup on an EC2 instance (Ubuntu 18.04) with 16 GB of RAM. The database is 211M on disk, as reported by running
du -hs /var/lib/neo4j/data/databases/neo4j/
It contains about 93K nodes across 3 labels, with a single property on each node.

I have configured the following settings as suggested by neo4j-admin memrec.

dbms.memory.heap.initial_size=6g
dbms.memory.heap.max_size=6g
dbms.memory.pagecache.size=7g

I am running the following query, which takes about 6 minutes to complete:
MATCH (person:Person), (person:Person)-[r0:STUDIED_AT]-(college:College), (college:College)-[r]-(x)
RETURN type(r) AS label, last(labels(x)) AS target, count(r) AS count
ORDER BY count(r) DESC

Can someone help me understand why this query takes so long, given that the graph is quite small and the machine specs should be more than adequate? Also, is there a way to speed up execution considerably without modifying the query? The query is generated by popoto.js, and I do not have much control over it.

I have already tried the following:

  1. Ran CALL apoc.warmup.run()
  2. Ran the same query twice (expecting a better time on the second execution)
  3. Created indexes on all three labels (I do not need to write to the DB; it is largely read-only).

A couple more questions:

  1. What limits the size and number of requests the DB can handle? How can I accommodate more?
  2. Is caching query results possible? I know that Neo4j caches the database and query plans, but I am not sure whether results can be cached. I saw a feature request in the GitHub issues but am not sure whether it was ever addressed.
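On the caching question, one option that does not depend on any Neo4j feature is to cache results in the application layer. The following is a minimal Python sketch under two loudly stated assumptions: the data really is read-only (nothing here ever invalidates an entry), and `run_query(query, params)` is a hypothetical stand-in for whatever function actually sends Cypher to Neo4j and returns a list of records. `make_cached_runner` is likewise a made-up name.

```python
import json
from functools import lru_cache


def make_cached_runner(run_query, maxsize=256):
    """Wrap a hypothetical run_query(query, params) -> list of records with an LRU cache.

    Only safe while the database is read-only: nothing here invalidates entries.
    """

    @lru_cache(maxsize=maxsize)
    def _cached(query, params_json):
        # Store an immutable tuple of records; run() hands out a fresh list each time.
        return tuple(run_query(query, json.loads(params_json)))

    def run(query, params=None):
        # JSON with sorted keys gives a hashable, key-order-independent cache key.
        return list(_cached(query, json.dumps(params or {}, sort_keys=True)))

    # Expose lru_cache statistics (hits, misses, currsize) for monitoring.
    run.cache_info = _cached.cache_info
    return run
```

Keying on the query text plus sorted JSON parameters means the same generated query arriving twice is answered from memory the second time; the trade-offs are memory use and stale results if the data ever changes.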

Can you post the output of the query prepended with the keyword EXPLAIN?
This shows how the query will be executed and gives more insight.

See https://neo4j.com/docs/cypher-manual/current/query-tuning/how-do-i-profile-a-query/ for more info

Here is the output of EXPLAIN. Please let me know if you need more details.

You could probably rewrite it in a way that avoids some of the cartesian duplication:
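As a hedged illustration of the idea (not the rewrite originally posted here), this small Python model compares enumerating every row of the generated pattern with counting students per college once and multiplying. The toy graph, the `LOCATED_IN` type and `City` label, and the function names are all hypothetical. The model mimics Cypher's rule that a relationship is matched at most once per MATCH, so `r` can never be the same relationship as `r0`.

```python
from collections import Counter

# Hypothetical toy graph; labels are lists so [-1] mirrors last(labels(x)).
NODE_LABELS = {
    "p1": ["Person"], "p2": ["Person"], "p3": ["Person"],
    "c1": ["College"], "c2": ["College"],
    "t1": ["City"],
}
# Relationship id -> (type, start node, end node). LOCATED_IN and City are made up.
RELS = {
    1: ("STUDIED_AT", "p1", "c1"),
    2: ("STUDIED_AT", "p2", "c1"),
    3: ("STUDIED_AT", "p3", "c2"),
    4: ("LOCATED_IN", "c1", "t1"),
    5: ("LOCATED_IN", "c2", "t1"),
}


def neighbours(node):
    """Yield (rel_id, other_node) for each relationship on node, ignoring direction."""
    for rid, (_, start, end) in RELS.items():
        if start == node:
            yield rid, end
        elif end == node:
            yield rid, start


def brute_force():
    """Enumerate every row of the original MATCH, then group like its RETURN clause."""
    groups, n_rows = Counter(), 0
    for person, labels in NODE_LABELS.items():
        if "Person" not in labels:
            continue
        for r0, college in neighbours(person):
            if RELS[r0][0] != "STUDIED_AT" or "College" not in NODE_LABELS[college]:
                continue
            for r, x in neighbours(college):
                if r == r0:  # a relationship is matched at most once per MATCH
                    continue
                groups[(RELS[r][0], NODE_LABELS[x][-1])] += 1
                n_rows += 1
    return groups, n_rows


def aggregated():
    """Count student relationships per college once, then weight each r by that count."""
    groups = Counter()
    for college, labels in NODE_LABELS.items():
        if "College" not in labels:
            continue
        students = {rid for rid, other in neighbours(college)
                    if RELS[rid][0] == "STUDIED_AT" and "Person" in NODE_LABELS[other]}
        for r, x in neighbours(college):
            # r pairs with every student relationship except itself.
            weight = len(students) - (1 if r in students else 0)
            if weight > 0:
                groups[(RELS[r][0], NODE_LABELS[x][-1])] += weight
    return groups
```

Both functions produce the same grouped counts, but `brute_force` does work proportional to students times relationships per college, while `aggregated` visits each college's relationships a fixed number of times regardless of how many students it has.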

Unfortunately I can't edit the query. It's generated internally by a JS library that my application uses. So first I am trying to assess whether this performance is expected, given the size of the data and the machine configuration, and whether there is a way to configure Neo4j for faster performance.

Which JS library is that?

Even if you can't change the generated code, it would be interesting to know how the rewrite compares to the generated query.

The library is popoto.js

Sorry, I'm not familiar with it; perhaps others are :)

Thanks for trying to help. Would you be able to comment on whether this performance is expected, given the size of the data and the machine configuration?

6 minutes seems outrageously long. Which instance type are you using?
I would love to see how much time is shaved off with the query rewrite.
Even if you can't "fix" the query, it's good to know whether this helps.

Are you able to download the dataset and try it on a local Neo4j Desktop instance?
Just to see how it compares to the EC2 instance.

You might want to post the output of PROFILE as well, just to get a bit more insight.
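PROFILE output can also be inspected programmatically. This is a minimal sketch that walks a profile tree and reports the operator that produced the most rows, which is usually where a row blow-up shows itself. It assumes the profile is a nested dict with `operatorType`, `rows`, `dbHits` and `children` keys, which is how the official Python driver's `ResultSummary.profile` is exposed in recent versions as far as I know; verify the key names against your driver. `hottest_operator` is a hypothetical helper name.

```python
def hottest_operator(profile):
    """Return (operatorType, rows, dbHits) for the operator that produced the most rows.

    Assumes a nested-dict profile with "operatorType", "rows", "dbHits" and
    "children" keys; check these against your driver version.
    """
    best = (profile.get("operatorType"), profile.get("rows", 0), profile.get("dbHits", 0))
    for child in profile.get("children", []):
        candidate = hottest_operator(child)
        if candidate[1] > best[1]:
            best = candidate
    return best


# Assumed usage with the official neo4j Python driver (not executed here):
#   with driver.session() as session:
#       summary = session.run("PROFILE " + query).consume()
#       print(hottest_operator(summary.profile))
```

Ranking by rows rather than db hits is a design choice: rows show how much data flows between operators, which is what grows when a pattern multiplies out.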

Here you go, thanks for looking.

I'm not sure if it is zoomable. Here is the link to the image in case it is not.

As you can see, the query causes an enormous cartesian product; this is why it's so slow.
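A rough way to see the size of that product: every (person, STUDIED_AT, college) row is paired with every other relationship of the same college, so a college with `s` student relationships and `d` relationships in total contributes `s * (d - 1)` rows before `count()` runs. This Python sketch computes that, next to the approximate work of counting students per college once. The numbers in the example are invented to show the shape of the growth, not measured on this dataset, and the function names are hypothetical.

```python
def expanded_rows(colleges):
    """Rows that reach count() in the generated query.

    colleges: iterable of (students, degree) per College node, where students is
    the number of STUDIED_AT relationships to Person nodes and degree is the total
    number of relationships on that college.
    """
    return sum(s * (d - 1) for s, d in colleges if s > 0)


def aggregated_work(colleges):
    """Roughly the relationships touched when students are counted once per college."""
    return sum(s + d for s, d in colleges if s > 0)


if __name__ == "__main__":
    big_college = [(30_000, 30_500)]  # hypothetical numbers
    print(expanded_rows(big_college))  # 914970000
    print(aggregated_work(big_college))  # 60500
```

Because the row count grows with students times degree, a single well-connected College node can dominate the runtime even in a graph of only 93K nodes.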

What is it you are trying to build?

I would look into getting popoto to generate smarter queries, or move away from popoto.

Thanks. I am trying to build a web interface for Neo4j to make a dataset available for users to explore. I figured out a way to edit the queries created by popoto on the server side, and that resolved the issue. Thanks for looking into this.
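The exact server-side approach wasn't shared, so the following is only a hypothetical sketch of one way to do it, not the poster's method: a proxy between popoto.js and Neo4j looks up each incoming Cypher string in a table of known slow queries and substitutes a hand-written equivalent. `POPOTO_QUERY` is the query quoted at the top of this thread; `FASTER_QUERY`, `REWRITES` and `rewrite` are made-up names, and the replacement Cypher is an unverified rewrite that aims to return the same counts by multiplying per-college student counts instead of enumerating every row.

```python
import re

# The generated query quoted at the top of the thread.
POPOTO_QUERY = (
    "MATCH (person:Person), (person:Person)-[r0:STUDIED_AT]-(college:College), "
    "(college:College)-[r]-(x) RETURN type(r) AS label, last(labels(x)) AS target, "
    "count(r) AS count ORDER BY count(r) DESC"
)

# Unverified rewrite: count students per college once instead of once per row.
# The CASE removes the pairing of a STUDIED_AT relationship with itself, which
# the original MATCH excludes through relationship uniqueness.
FASTER_QUERY = """
MATCH (:Person)-[:STUDIED_AT]-(college:College)
WITH college, count(*) AS students
MATCH (college)-[r]-(x)
WITH type(r) AS label, last(labels(x)) AS target,
     sum(CASE WHEN type(r) = 'STUDIED_AT' AND x:Person
              THEN students - 1 ELSE students END) AS n
WHERE n > 0
RETURN label, target, n AS count
ORDER BY count DESC
"""


def _normalise(query):
    """Collapse runs of whitespace so formatting differences don't defeat the lookup."""
    return re.sub(r"\s+", " ", query).strip()


# Hypothetical table of known slow queries and their replacements.
REWRITES = {_normalise(POPOTO_QUERY): FASTER_QUERY}


def rewrite(query):
    """Return a replacement for a known generated query, else the query unchanged."""
    return REWRITES.get(_normalise(query), query)
```

Exact-match lookup is deliberately conservative: any query the table does not recognise passes through untouched, at the cost of needing one entry per generated query shape.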


Great, thanks for the update, appreciated!

Hi, could you please share how you edited the queries created by popoto on the server side? I would also like to know how to write custom queries in popoto.js. I am trying to do it with the help of the schema, but your help would mean a lot to me. Thanks in advance.