Simple count query taking too long to execute

I have a db with 20 million nodes and 50 million edges. I'm trying to execute a query of the form:

Match (u:User)-[:MENTIONED]->(:Keyword {keyword:"baseball"}) 
Return Count(u)

I have around 10 million users and only 10 keywords in the db. Approximately 50,000 results are returned

Is it normal for the above query to take around 10 minutes to finish? How can I speed this up?I have constraints on the users forcing them to be unique.

Can you please paste the schema here?

Most likely reason is that you might not have an index on User (active) property and Keyword(keyword) property.

If you do not have index on either of them it will cause label scan.

Best way to check is to do

Match (u:User)-[:MENTIONED]->(:Keyword {keyword:"baseball"}) 
Return Count(u)

It will tell you if there are any label scans happening.

Have you got indexes on the property and the keyword.keyword property?

Along with Rik's recommendation. You can try
Match (u:User:active:'True'})-[:MENTIONED]->(:Keyword {keyword:"baseball"})
Return Count(u)

Thank you for the suggestions.

I have just done
CREATE INDEX ON :Keyword(keyword)
CREATE INDEX ON :User(active)

I also ran the query as modified by Vivek and this seems to have sped things up.

This is my first usage of indexes and brings up some questions:

  1. How do I properly gauge the time of query executions? There seems to be quite a bit variance if run the same query over and over again. Why is this?

  2. Before I created the indexes the first step in the query was a NodeByLabelScan on :Keyword and now the first step is NodeIndexSeek n :Keyword. What exactly is going on here?

  3. What is the drawback of creating indexes? Why not just create an index on every property? I don't really know what's going on under the hood when I create an index, but I guess they will use up more space? What are some guidelines as to when it is appropriate to create indexes?


First time when you run, the query planner needs to create an execution plan. So it can take more time. Second time onwards when you execute the query, the query plan is taken from query cache. So it might seem faster. Any variance after that could be based on the system load.

NodeByLabelScan means for the property used in query there is no index. So, it has to look at all the nodes with the given label to find the one you need. NodeIndexSeek means it is using the index to find the node that matches your query.

Indexes take storage and memory. If you create indexes for all the properties, then it might need more memory to hold the indexes themselves, which is needed to do other work like db traversals etc.