How to NOT use an index?

chris3 · June 24, 2019, 4:02pm

Hello everyone!

I have been using Neo4j for a while now but just realized that I have a performance problem with the following query (simplified):

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) WHERE ID(u)=12345 AND dp.localDate>='20190601' AND dp.localDate<='20190624' RETURN dp

(FYI Result = only 24 DailyProfit nodes, one per day)

Instead of using the index on the ID, getting about 500 dp nodes and then filtering those on the localDate property, the query planner uses a NodeIndexSeekByRange on all of the 500k (!) DailyProfit nodes in the graph ... which takes about 1.5 seconds (!)

BTW the query with just the localDate>=x (and not a range >= and <=) takes 18ms (milli seconds that is!) because it does exactly what I would expect: Start with a single User node and filter the ~500 DailyProfit nodes...

Can I tell the query planner to start with the index on the ID of the user and filter the very small result with a range filter somehow?

Or any other suggestions to optimize this so that the planner doesn't always start with the range seek just because it thinks that is the fastest way (because of course it isn't since there are about 500 DailyProfits nodes per user but 500k in the entire DB)?

THX!

andrew_bowman · June 24, 2019, 4:34pm

So this is a frustrating thing, the planner REALLY wants to use that index seek by range. Even if I massage the query to perform that lookup by id, it still wants to use an index seek by range and use a hash join between the results.

I'll raise this with the Cypher team, but in the meantime we can throw something into the query which will cause the planner to only use the localDate predicates as a filter and not for lookup. Try this out, verify it with EXPLAIN:

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) 
WHERE ID(u)=12345 
WITH u, dp, true as ignored // introducing a new variable to throw off the planner
WHERE dp.localDate>='20190601' AND dp.localDate<='20190624' 
RETURN dp

chris3 · June 24, 2019, 5:06pm

Hey Andrew,
I have tried all sorts of stuff ... but your solution is genius AND it works!
THX!!!
Cheers, Chris

PS: I was thinking along the lines of USING INDEX u:User(*ID*) ... or NOT USING INDEX dp:DailyProfit(localDate) ... or some way of telling the planner where to start since I know that it would be smarter to start at the user-index...

andrew_bowman · June 24, 2019, 5:09pm

By default the planner SHOULD be preferring the graph id lookup (it isn't an index lookup so we can't use a hint for it) over all the others.

And in the case where I could get it to use the graph id lookup, it just couldn't let go of the index range scan on the others, so it would still be inefficient, so I had to resort to trickery.

I've raised the suggestion with our Cypher team on introducing a new planner hint to disable a lookup of any type on a node, so only expansions to that node would be possible. We'll see if that gets picked up later.

Topic		Replies	Views
How to Optimize a Query with Huge Where Clause Cypher performance , cypher	7	685	June 16, 2020
Querying relationships slow performance Cypher performance , cypher , relationship	4	1557	October 15, 2020
1000 queries takes time(as expected), how can i approach this this in a better way Neo4j Graph Platform performance , cypher	3	234	November 24, 2023
ORDER BY not using index Neo4j Graph Platform performance	3	175	June 8, 2023
Cypher performance optimization when Node doesn't have indexing Cypher performance , cypher	3	854	December 20, 2018

How to NOT use an index?

Related Topics