Slow cypher query computation

Hey, just wondering, why is cypher query on community edition is very slow?

I tried using profile and explain, followed the practice to name the label whenever I can. In addition, I have also added an index and constraint. So I am wondering whether the computation is very slow, or is it just the community edition.

Data ingestion from csv is really fast by the way. And the query I'm trying now is this:

MATCH (c:user)-[r:trx]->(p:user) 
WITH r.created_trx_date AS date, c.user_id as user, SUM(r.amount) AS total_transaction_amount
WHERE date > date('2019-03-01')
    AND total_transaction_amount >= 10000000 
RETURN wallet, total_transaction_amount
ORDER BY total_transaction_amount DESC
LIMIT 25

I have about 24,309,729 nodes and 156,617,848 relationships, currently 11 node labels, and 10 relationship labels.

I feel that the where clause is the reason it is very slow. I want to write a more complex cypher query, but if it will take very long, I am concerned that I cannot do real-time analysis on this. Can someone shed some light in query computation for me please?

Hello,

You should change your model so that the properties you are testing are node properties. Then you can place indexes on these values for better performance. In your query, you are testing relationship properties.

Elaine

Your query plan currently is currently doing label scans, not index lookups (check the PROFILE or EXPLAIN plans to verify this), which won't be efficient and will see performance issues as the number of nodes with these labels increases.

Alternately with 3.5 you can create a fulltext schema index on the relationship type + property. That said, this is a more complex process both for creation and querying, and as Elaine suggests, refactoring your model such that you are performing lookup and filtering on the node property will more easily utilize the schema index for fast lookups.

Hey, thanks a lot for answering my questions. All your answers are very helpful. Yeah, right now I am still using 3.4, but I will migrate to 3.5 to explore fulltext schema index. Actually, after reading about the fulltext scheme index, the idea is really cool!!

I will prefer to have my transaction type relationship as an edge rather than a node for visualization purpose. However, right now, I am focusing more on efficient computation.

So, Andrew Bowman, aside from complex creation and querying, do you think full text schema index will accelerate the query computation process right now compare to transaction as index?

You should also move the date check before the aggregation.

If you're on 3.4 you can also use apoc indexes for that, then you'll have an index trx.

MATCH (c:user)-[r:trx]->(p:user) 
WHERE r.created_trx_date > date('2019-03-01')
WITH r.created_trx_date AS date, c.user_id as user, SUM(r.amount) AS total_transaction_amount
WHERE total_transaction_amount >= 10000000 
RETURN wallet, total_transaction_amount
ORDER BY total_transaction_amount DESC
LIMIT 25