I have a Neo4j 4.1 DB with 200 nodes (just for demo purposes). The nodes can be grouped into either 5 or 50 distinct groups. When I use 5 labels, distributed almost equally over the 200 nodes, Neo4j Bloom runs smoothly. When I use 50 labels, again distributed roughly equally, Bloom slows down dramatically. For example, building the graph pattern below Bloom's search field takes almost 100x longer.
In both scenarios each node has exactly one label. No indexes are set in either case.
I’m curious whether the total number of labels in the DB (not the number per node; that question has already been answered) can have such a dramatic effect. As said, it is only a demo DB with just 200 nodes, 1500 relationships, and one label per node. The only difference is the total number of labels in the DB, which shouldn't affect the query time in Bloom this dramatically.
Is there any trick to speed up the fetching of the graph pattern in Bloom?
Local installation on standard development hardware. No customizations for memory, heap etc… all standard as it comes out of the box.
Any idea that could explain this difference in performance is highly appreciated.
Hi @krid_mail !
The way Neo4j accesses data is through this hierarchy:
- Anchor node label. Indexed node properties
- Type of relationship
- Anchor node properties, non-indexed
- Flow nodes' labels
- Relationship's properties
The anchor node is the one where a query starts to traverse the graph. So, as you can see, the first thing the engine looks up is the label, which means that having lots of labels can have an impact on how Neo4j searches for the nodes you want.
I'd recommend using indexes, but mainly I'd try running the queries you want inside Neo4j Browser with Cypher. You can put the PROFILE or EXPLAIN command before the MATCH to see the execution plan and, in the case of PROFILE, how many times the database is consulted (db hits). This way you can know for sure where the problem in the queries is.
The database is really small, so I'd say the number of labels would be the primary reason (50 labels is a fairly large number; I'm working on a production-level graph and we have around 25 labels).
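As a minimal sketch of what that looks like in Neo4j Browser: the label `Group1` below is only a placeholder (an assumption, not a label from the original database), so substitute one of your own labels.

```cypher
// `Group1` is a hypothetical label; replace it with one from your database.

// EXPLAIN shows the planned operators without running the query.
EXPLAIN
MATCH (n:Group1)-[r]->(m)
RETURN n, r, m
LIMIT 25;

// PROFILE runs the query and reports rows and db hits per operator.
PROFILE
MATCH (n:Group1)-[r]->(m)
RETURN n, r, m
LIMIT 25;
```

Running the same PROFILE in the 5-label and the 50-label database and comparing the db hits should show whether the label lookup itself is where the time goes.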
Edit: If you do need this many labels, using indexes will certainly help (in 4.3 there are also relationship indexes), as will changing the default database config: the heap minimum size, its maximum size, and the page cache. But the main thing would be to profile the queries and see where the issue is: whether it's the labels, something else about the model, etc.
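A rough sketch of those two suggestions in Cypher, using the 4.x syntax. The label `Group1`, the property `caption`, and the index name `group1_caption` are placeholders chosen for illustration; adjust them to your model.

```cypher
// Hypothetical label, property, and index name; adapt to your own data.
CREATE INDEX group1_caption FOR (n:Group1) ON (n.caption);

// List existing indexes and their state (the 4.1 procedure).
CALL db.indexes();

// Inspect the current memory settings (heap and page cache).
// The values themselves are changed in neo4j.conf, followed by a restart.
CALL dbms.listConfig('dbms.memory')
YIELD name, value
RETURN name, value;
```

The settings to look for in the last result are `dbms.memory.heap.initial_size`, `dbms.memory.heap.max_size`, and `dbms.memory.pagecache.size`.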
Thanks for your quick reply and your suggestion to follow the query execution with the profiler. The thing about my example is that it runs smoothly in Neo4j Browser. As you said, it is a very small example and shouldn't cause any performance issues. The performance problems I encounter are in Neo4j Bloom, especially when Bloom tries to create the graph pattern after you activate the search field in the upper left.
A query like this:
fetching the whole DB in Neo4j Browser runs with a reasonable performance (it's a small DB with just limited number of nodes and relations).
The fetching of the meta-graph pattern in Bloom seems to be affected somehow by the number of labels in the DB. I'm searching for the bottleneck causing this (and ideally a fix). The generation of the graph pattern itself cannot be profiled (at least I haven't found any way to do this).
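Although Bloom's own pattern generation can't be profiled, the amount of schema metadata it has to work with can at least be measured from Neo4j Browser. The sketch below uses built-in procedures; whether Bloom calls any of them internally is an assumption, not something confirmed here.

```cypher
// How many labels and relationship types exist in total?
CALL db.labels() YIELD label
RETURN count(label) AS labelCount;

CALL db.relationshipTypes() YIELD relationshipType
RETURN count(relationshipType) AS relationshipTypeCount;

// How many distinct (label)-[type]->(label) combinations occur in the data?
MATCH (a)-[r]->(b)
RETURN DISTINCT labels(a) AS fromLabels, type(r) AS relType, labels(b) AS toLabels;

// The meta-graph as shown by Neo4j Browser.
CALL db.schema.visualization();
```

Comparing the number of distinct combinations between the 5-label and the 50-label database gives a rough idea of how much larger the meta-graph becomes.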
For me the question is whether there is a way to speed up the graph pattern construction within Neo4j Bloom by tweaking settings (indexes etc.) somewhere.
I already tried setting BTREE indexes and a full-text index on the caption property, but that didn't have any effect on the performance.
I haven't tried Label indexes because my demo is in 4.1 where this functionality isn't available.
I may have found an explanation for the performance issue with Bloom. I managed to get Bloom to respond with better performance by creating a new perspective in which the allowed categories (labels) and relationship types are dramatically reduced. It appears to me that Bloom gets into trouble when it has to deal with a large number of labels and (even worse) relationship types. Crafting multiple perspectives is at least one approach to dealing with this. The downside is that you lose some of the nice flexibility Bloom offers for exploring your data, because you need to pre-define at least the major direction within a perspective.
I'm happy to read that you have found a solution. Bloom also uses the GPU to render the animations, so perhaps something can be done by tweaking that. I'll keep looking and will post if I find something that could help. I'm sorry for not answering until now, but it was a bit of a rough week!