Returning Variable Depth Relationships & Nodes as POJOs

tl;dr: What is an appropriate replacement for "MATCH (n1:UserID)-[r1*]-(n2) RETURN n1, n2, r1" when I want to return all relationships & nodes?

We are using Neo4j to store complex POJOs.

@NodeEntity
class User(
    var id: UserID,
    var text: Narrative,
    var name: MutableSet<HumanName>,
    var address: MutableSet<Address>
)

where UserID, Narrative, HumanName, and Address are similar entities, and the layers go deeper still.

We store this information as (:User)-[:ID]-(:UserID) and so on.
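For reference, the same class with the mapping spelled out. A minimal sketch, assuming Neo4j-OGM annotations and guessing the TEXT/NAME/ADDRESS type names from the ID example; UserID, Narrative, HumanName, and Address are @NodeEntity classes of their own:

import org.neo4j.ogm.annotation.NodeEntity
import org.neo4j.ogm.annotation.Relationship

@NodeEntity
class User(
    @Relationship(type = "ID") var id: UserID? = null,
    @Relationship(type = "TEXT") var text: Narrative? = null,
    @Relationship(type = "NAME") var name: MutableSet<HumanName> = mutableSetOf(),
    @Relationship(type = "ADDRESS") var address: MutableSet<Address> = mutableSetOf()
)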

So when we want to return all User information we run "MATCH (n1:UserID)-[r1*]-(n2) RETURN n1, n2, r1", but it is very slow (about 1 second per User).

Is there an alternative?

Can you clarify whether this pattern is what you really want: (n1:UserID)-[r1*]-(n2)?

That means, for each :UserID node, it will find all paths of any depth, over relationships of any type, to every reachable node. That may include your entire graph, if a :UserID node is connected to every node at some distance. Even worse, if every :UserID node is connected to the entire graph, then you would essentially be matching the entire graph multiple times, once per user.

Did you instead mean (n1:UserID)-[r1]-(n2)?

This would find all :UserID nodes in the graph and, for each of them, the paths to all adjacent nodes (one relationship away).
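To make the contrast concrete, here are the two patterns side by side as Kotlin query strings (a sketch; only the relationship part differs):

// Variable depth: expands over any number of relationships of any type,
// reaching every node connected to each :UserID node at any distance.
val variableDepth = "MATCH (n1:UserID)-[r1*]-(n2) RETURN n1, r1, n2"

// Single hop: touches only nodes directly attached to each :UserID node.
val singleHop = "MATCH (n1:UserID)-[r1]-(n2) RETURN n1, r1, n2"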

We ingest large, deeply nested JSONs.

For example, one JSON holds all the demographic information on a user: name, address, text, names of children, etc. We store each such object as its own graph (so it's not connected to other people, even if they are related).

One API call we have returns all the information on a selected User, so we run MATCH (n1:UserID)-[r1*]-(n2) WHERE n1 = {id} RETURN n1, r1, n2. Overall there may be 50-100 nodes connected to n1. We cannot do MATCH (n1:UserID)-[*]-(n2) WHERE n1 = {id} RETURN n1, n2 because we need r1 to map it correctly.
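For context, the call is wired up roughly like this. A sketch, not our exact code: the repository name, the id property on :UserID, and the Long key type are assumptions, and {id} is the Neo4j 3.x parameter syntax:

import org.springframework.data.neo4j.annotation.Query
import org.springframework.data.neo4j.repository.Neo4jRepository
import org.springframework.data.repository.query.Param

interface UserIdRepository : Neo4jRepository<UserID, Long> {

    // Fetches the :UserID node plus every node and relationship reachable
    // from it, so the mapper can hydrate the whole nested POJO at once.
    @Query("MATCH (n1:UserID)-[r1*]-(n2) WHERE n1.id = {id} RETURN n1, r1, n2")
    fun findWithSubgraph(@Param("id") id: String): UserID?
}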

Can you provide a sample graph showing this? Also, what is the average depth from a user to its nodes, and how interconnected are the nodes? A highly interconnected graph (most nodes in the subgraph connected to each other) may produce a large number of distinct paths per user, with the same data repeated because of the different permutations of relationships used for expansion.

There are no interconnected nodes and the average depth may be something like 5.

An intake may look something like this:

{
  "resourceType": "User",
  "id": "example",
  "text": {
    "status": "generated",
    }
  ],
  "active": true,
  "name": [
    {
      "use": "official",
      "family": "Chalmers",
      "given": [
        "Peter",
        "James"
      ]
    },
    {
      "use": "usual",
      "given": [
        "Jim"
      ]
    },
    {
      "use": "maiden",
      "family": "Windsor",
      "given": [
        "Peter",
        "James"
      ],
      "period": {
        "end": "2002"
      }
    }
  ],
  "telecom": [
    {
      "use": "home"
    },
    {
      "system": "phone",
      "value": "(03) 5555 6473",
      "use": "work",
      "rank": 1
    },
    {
      "system": "phone",
      "value": "(03) 3410 5613",
      "use": "mobile",
      "rank": 2
    },
    {
      "system": "phone",
      "value": "(03) 5555 8834",
      "use": "old",
      "period": {
        "end": "2014"
      }
    }
  ]
}

And an image of the graph may be something like this

Okay, these look like acyclic graphs, so your match pattern should work fine, with each connected node only having a single distinct path to it.
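If you want to double-check that, a query along these lines counts the distinct paths reaching each node (a sketch; the id property and {id} parameter syntax are assumptions). In a truly acyclic subgraph every pathCount should be exactly 1; anything higher means cycles or parallel routes, and thus repeated data:

// Counts how many distinct paths reach each node from the :UserID node.
val pathCountCheck = """
    MATCH (n1:UserID)-[*]-(n2)
    WHERE n1.id = {id}
    RETURN id(n2) AS nodeId, count(*) AS pathCount
    ORDER BY pathCount DESC
""".trimIndent()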

Were your timings with respect to just the Cypher query (from the browser, but not the graph result view, or from cypher-shell), or from your application? Knowing whether the performance cost is in the query itself vs the projection of the data may be helpful.

They were from the application. We send them as @Query from our Kotlin / Spring Data Neo4j backend to Neo4j through Bolt.

Looks like you have about 40 nodes for that user. Is that about the number of connected nodes per user in your graph? How many users? And do you need to get all of that data at once?

You may also want to perform timings via Cypher vs via your application with Spring. It is possible the bottleneck is object creation on the Spring side of things.
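One way to separate the two costs is to time the query over raw Bolt, bypassing Spring and OGM entirely. A minimal sketch using the 1.x Java driver from Kotlin; the URI, credentials, and id value are placeholders:

import org.neo4j.driver.v1.AuthTokens
import org.neo4j.driver.v1.GraphDatabase

fun main() {
    // Raw Bolt round trip: no Spring, no OGM object mapping.
    GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "secret")).use { driver ->
        driver.session().use { session ->
            val start = System.currentTimeMillis()
            val records = session.run(
                "MATCH (n1:UserID)-[r1*]-(n2) WHERE n1.id = {id} RETURN n1, r1, n2",
                mapOf("id" to "example")
            ).list()  // force full consumption of the result stream
            println("fetched ${records.size} rows in ${System.currentTimeMillis() - start} ms")
        }
    }
}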

We currently have only 10 Users, with about 50-100 nodes per user. Generally speaking we won't need every user in the database at once, but we would need to be able to return 10-100 users at once. The current timing is 12000 ms with relationships and 330 ms without relationship information. These are database timings without Spring.

And do you need all user data at once? Up to 100 nodes per user would seem to encompass a lot of data. In your application, is all of that data relevant, and does it need to be viewable at once? If not, you might consider the minimal amount of data needed to address the case; then, if there's a drilldown or detail view, you might fetch the full data set for that user.
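If a summary view would do, one option is to cap the expansion depth so the first fetch stays small, as in this sketch (the depth of 2 and the id property are arbitrary assumptions):

import org.springframework.data.neo4j.annotation.Query
import org.springframework.data.neo4j.repository.Neo4jRepository
import org.springframework.data.repository.query.Param

interface UserIdSummaryRepository : Neo4jRepository<UserID, Long> {

    // Caps the variable-length expansion at depth 2: the summary view pulls
    // only the first couple of layers; a detail view can run the unbounded
    // [r1*] query for a single user on demand.
    @Query("MATCH (n1:UserID)-[r1*..2]-(n2) WHERE n1.id = {id} RETURN n1, r1, n2")
    fun findSummary(@Param("id") id: String): UserID?
}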

Also, can you clarify what you mean by with and without relationship information? Is this in reference to relationship properties? Are there a lot of properties on each of the relationships in the user subgraphs?