Spring Data Neo4j performance issues

Hi all!

I'm facing a performance issue on a rather small graph (~30K nodes and ~270K relationships).

I'm exposing the database through Spring Data REST, using Spring 2.6.2 and Neo4j 4.4.2-community.

The source code is freely available at Jedidex / public API · GitLab

For example, if I query the appearances of a character, it can take up to 30 seconds to get a response. Meanwhile, if the same query is executed directly on the database, it is almost instantaneous.

For reference, this is the Character model:

package com.holodex.publicapi.model.resource.character;

@Node("Character")
public class Character extends SWElement {

    private String gender;
    private Double height;
    private Double mass;
    private String hair;
    private String eyes;
    private String skin;
    private String cyber;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "BORN_IN")
    private SWElement homeWorld;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "OF_SPIECES")
    private SWElement species;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "AFFILIATED_TO")
    private Set<SWElement> affiliation;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "APPRENTICE_OF")
    private Set<SWElement> masters;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "MASTER_OF")
    private Set<SWElement> apprentices;

    @JsonSerialize(contentAs = SWElement.class)
    @Relationship(type = "APPEARS_IN")
    private Set<SWElement> appears_in;

}

To keep things lightweight, I expect each relationship target to be of type "SWElement", which is the supertype of every entity. SWElement itself does not have any relationships.

An example of a result can be found here: https://api.jedidex.com/v1/characters/452217/appears_in. It returns ~350 elements, which are the various media in which Luke Skywalker appears.
This request is performed by the following method in the Neo4jRepository:

package com.holodex.publicapi.repository.character;

    @Override
    @Query("MATCH path=(n:Character {element_id:$id})-[r]-(x) RETURN n, collect(nodes(path)), collect(relationships(path))")
    Optional<Character> findById(Integer id);

The element_id property is indexed. I think it is more of an object-mapping problem, since the query is quite fast when executed directly on Neo4j. I have no clue how to optimize this case :confused:

I'm using Spring Data REST since the goal is to have all the kinds of resources exposed by this API without writing too much boilerplate code.

Thanks in advance
Niko

The problem for me is to get the data :wink:
I see the problem that you are somewhat limited by the Spring Data REST structure and need to use findById.
My first thoughts/questions are:

  • Why are you using a path to get the data?
  • Why do you create a custom query at all?

I think MATCH (n:Character {element_id:$id})-[r]-(x) RETURN n, collect(r), collect(x) would already improve the mapping, because collect(nodes(path)) might contain duplicates and SDN needs time to filter out the already mapped / irrelevant data.
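
To make the duplicates point concrete, here is a minimal plain-Java sketch (no SDN or driver types involved). It assumes one-hop paths, where nodes(path) is [n, x], and distinct neighbours; the class name, method names and integer ids are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: node ids are plain integers, not SDN or driver types.
final class PathCollectDemo {

    // Mimics collect(nodes(path)) for one-hop paths (n)-[r]-(x), flattened for counting:
    // every path contributes [n, x], so the start node n is repeated once per path.
    static List<Integer> collectNodesOfPaths(int startId, List<Integer> neighbourIds) {
        List<Integer> flattened = new ArrayList<>();
        for (int neighbourId : neighbourIds) {
            flattened.add(startId);
            flattened.add(neighbourId);
        }
        return flattened;
    }

    // Mimics collect(x): only the neighbour of each match is returned.
    static List<Integer> collectNeighbours(List<Integer> neighbourIds) {
        return new ArrayList<>(neighbourIds);
    }

    // The de-duplication a mapper has to perform before it can build objects.
    static Set<Integer> distinct(List<Integer> ids) {
        return new LinkedHashSet<>(ids);
    }

    public static void main(String[] args) {
        List<Integer> neighbours = new ArrayList<>();
        for (int id = 1; id <= 350; id++) {
            neighbours.add(id);
        }
        List<Integer> viaPath = collectNodesOfPaths(0, neighbours);
        System.out.println("entries via collect(nodes(path)): " + viaPath.size());
        System.out.println("distinct nodes among them: " + distinct(viaPath).size());
        System.out.println("entries via collect(x): " + collectNeighbours(neighbours).size());
    }
}
```

With the path-based form, the start node shows up once per matched path, so roughly half of the collected node entries are repeats of n that the mapper has to recognise and skip.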

Hi @gerrit.meier
Thanks for the kind answer!

Well, I was using a path because of an example I found online, my bad I guess :smiley:
I tried your suggested query, but I still experience poor performance.

To answer your second question, I used a custom query because otherwise it would fetch the whole database :confused:

For the sake of completeness, I have deployed a test database (containing all the 30K nodes and relationships).
The database is exposed at: bolt://jedidex.com:8087
using neo4j as both username and password. Maybe it can be useful :slight_smile:

The explorer is available here: https://neo4test.jedidex.com/

Executing the query you mentioned took around 72 ms:

Started streaming 1 records after 5 ms and completed after 72 ms.

Please run the query with PROFILE or EXPLAIN in the Neo4j explorer or Bloom. It will show you the amount of actual work done by the graph DB engine, in terms of the number of hits per millisecond. Please also check that your indexes are set up correctly.
This should give you enough clues to optimise your query.

Many thanks
Sameer S Gijare

Hi @sameer.gijare14
thanks for answering

Well, I ran the profile and it says that I got 19099 hits in 59 milliseconds.

These are the indexes that I have if I execute the :schema command

As I said, the query is quite fast, or at least it seems to be. But through the API it takes quite some time to show any results.

The bottleneck is in the mapping, specifically in determining the concrete class for each node. And with the SWElement you are really challenging SDN :wink:
But hey, challenge accepted: Inheritance determination performance improvements · Issue #2487 · spring-projects/spring-data-neo4j · GitHub
In ~30 minutes there should be a snapshot available, 6.2.3-GH-2487-SNAPSHOT, that should improve the performance a lot. The request for "Luke" went down to something like 1 - 1.5, but this was also with a profiler running.
Would be great if you could give us feedback here or (even better) on the issue, if you also have a GitHub account.

Your pom should then contain this:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-neo4j</artifactId>
    <version>6.2.3-GH-2487-SNAPSHOT</version>
</dependency>

<repositories>
  <repository>
    <id>spring-milestones</id>
    <name>Spring Milestones</name>
    <url>https://repo.spring.io/milestone</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>spring-snapshots</id>
    <name>Spring Snapshots</name>
    <url>https://repo.spring.io/snapshot</url>
    <releases>
      <enabled>false</enabled>
    </releases>
  </repository>
</repositories>
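
As a rough illustration of what "concrete class determination" involves when every relationship target is declared as the supertype, here is a plain-Java sketch. It is not SDN's code, and the change made for the linked issue may work quite differently. The resolver class, the idea of matching on label sets, the cache, and the labels SWElement and Film are all assumptions made for this example; only the Character label comes from the model above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT SDN's implementation: picks the most specific
// registered type for a node's labels and remembers the answer per label set.
final class ConcreteTypeResolver {

    // Type name -> labels a node must carry to be mapped to that type (assumed model).
    private final Map<String, Set<String>> labelsByType = new LinkedHashMap<>();
    // Label set -> previously resolved type name.
    private final Map<Set<String>, String> cache = new HashMap<>();
    private int scans = 0;

    void register(String typeName, String... requiredLabels) {
        labelsByType.put(typeName, new HashSet<>(Arrays.asList(requiredLabels)));
        cache.clear(); // earlier answers may no longer be the most specific ones
    }

    // Returns the matching type with the most required labels, or null if none matches.
    String resolve(Set<String> nodeLabels) {
        Set<String> key = new HashSet<>(nodeLabels); // defensive copy used as cache key
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String resolved = scan(key);
        if (resolved != null) {
            cache.put(key, resolved);
        }
        return resolved;
    }

    // Full scan over all registered types; this is the part worth avoiding per node.
    private String scan(Set<String> nodeLabels) {
        scans++;
        String best = null;
        int bestSize = -1;
        for (Map.Entry<String, Set<String>> entry : labelsByType.entrySet()) {
            Set<String> required = entry.getValue();
            if (nodeLabels.containsAll(required) && required.size() > bestSize) {
                best = entry.getKey();
                bestSize = required.size();
            }
        }
        return best;
    }

    int scanCount() {
        return scans;
    }

    public static void main(String[] args) {
        ConcreteTypeResolver resolver = new ConcreteTypeResolver();
        resolver.register("SWElement", "SWElement");
        resolver.register("Character", "SWElement", "Character");
        resolver.register("Film", "SWElement", "Film");

        Set<String> labels = new HashSet<>(Arrays.asList("SWElement", "Film"));
        for (int i = 0; i < 350; i++) {
            resolver.resolve(labels);
        }
        System.out.println("resolved type: " + resolver.resolve(labels));
        System.out.println("full scans performed: " + resolver.scanCount());
    }
}
```

The point of the sketch is only that a field typed as the supertype forces a per-node decision about which subclass to instantiate, and that repeating a full scan for each of a few hundred related nodes adds up.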

Hi @gerrit.meier ,

Thank you! I'll continue the discussion on the linked issue.