I have a problem working with a very large database. It has only one label, Person, and one relationship type, FOLLOWS.
There are 376 million Person nodes and 2.8 billion FOLLOWS relationships. I want to get the followers of specific users. For example, given 3 people whose ids are '1327956520', '3984666707' and '3271563388', I use the code below to get their followers' ids.
where a.id in ['1327956520','3984666707','3271563388']
with collect(x.id) as ids
However, this query is too slow.
Neo4j version: 4.2.3
The query is doing an index seek, so that's good.
Keep in mind that you didn't provide a direction on the :FOLLOWS relationship in the pattern, so x matches both the followers of a and the persons that a is following. If you ONLY want the followers of a, use the correct relationship direction in the pattern. That should also reduce the rows (work) and the results.
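A minimal sketch of the directed version. It assumes the relationship points from the follower to the person being followed, i.e. (follower)-[:FOLLOWS]->(followed), and that the original query bound the given users to a and their neighbours to x. The MATCH and RETURN lines are reconstructed for illustration; if your data models the direction the other way round, flip the arrow.

```cypher
// Assumption: the data is modelled as (follower)-[:FOLLOWS]->(followed),
// so the arrow points INTO a when x is a follower of a.
MATCH (a:Person)<-[:FOLLOWS]-(x:Person)
WHERE a.id IN ['1327956520', '3984666707', '3271563388']
WITH collect(x.id) AS ids
RETURN ids
```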
If :FOLLOWS relationships can only connect :Person nodes (and that's not likely to change anytime soon), then you can remove the :Person label from x in your pattern, since you already know from your own graph that those will always be :Person nodes. That saves an unnecessary Filter operation on the label for x.
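One way to check this is to PROFILE the query with the label on x removed and compare the plan with the one you had before; the Filter operator on x's label should no longer appear. This is only a sketch: the arrow direction assumes (follower)-[:FOLLOWS]->(followed), and the MATCH and RETURN lines are assumed for illustration.

```cypher
// PROFILE runs the query and shows the executed plan with rows and db hits.
// The label stays on a so the index seek on Person.id can still be used;
// x carries no label, so no label Filter is needed for it.
PROFILE
MATCH (a:Person)<-[:FOLLOWS]-(x)
WHERE a.id IN ['1327956520', '3984666707', '3271563388']
WITH collect(x.id) AS ids
RETURN ids
```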
If you have the ability to turn on query logging, find this query in the query log and note its page cache hits and misses. Then run the query again and see whether those numbers have changed.
If there are still cache misses, that may indicate that more memory needs to be assigned to the pagecache, which holds the in-memory portion of the graph. If most of the graph fits in the pagecache, you won't have cache misses, so db operations can avoid hitting disk.
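To see what the server is currently configured with, something like the following can be run as an admin user. It is a sketch based on the Neo4j 4.x setting names (dbms.memory.pagecache.size for the pagecache, dbms.logs.query.enabled and dbms.logs.query.page_logging_enabled for the query log); check them against the documentation for your exact version before relying on them.

```cypher
// Lists the current values of the pagecache and query-log settings.
// Requires admin privileges; setting names are as in Neo4j 4.x.
CALL dbms.listConfig('') YIELD name, value
WHERE name IN [
  'dbms.memory.pagecache.size',
  'dbms.logs.query.enabled',
  'dbms.logs.query.page_logging_enabled'
]
RETURN name, value
```

An empty value for dbms.memory.pagecache.size means the size was not set explicitly and Neo4j chose a default based on the available memory.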
Also, your query seems to collect the followers of all the given users into a single list. If you want the followers bucketed into one list per user, then you will need to keep a in scope in your WITH clause.
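A minimal sketch of the per-user version. Keeping a in the WITH clause makes it the grouping key for collect(), so each user gets their own list. The arrow direction again assumes (follower)-[:FOLLOWS]->(followed), and the column names userId and followerIds are only illustrative.

```cypher
// Keeping a in WITH groups the aggregation by user:
// one row per matched user, each with its own list of follower ids.
MATCH (a:Person)<-[:FOLLOWS]-(x)
WHERE a.id IN ['1327956520', '3984666707', '3271563388']
WITH a, collect(x.id) AS followerIds
RETURN a.id AS userId, followerIds
```

Note that a user with no followers produces no row at all with this pattern; an OPTIONAL MATCH on the relationship would be needed to return them with an empty list.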