Optimising or restructuring n-n relationship queries causing very slow response times

Hello Folks,

I'm looking for some assistance with one of my live graphs. My nodes and relationships looks something like this

This is an example of one end to end relationship within the graph.

The issue is related to node (Store) having multiple (Notifications) and they being related to multiple (Notification Action Types) & (Notification Types)

The query looks something like this.

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute) WHERE s.uuid={app_id} AND sta.key='name' OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand) OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty) RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And the response times for one query is more than 8 secs. And my application ends up requiring this information fairly frequently. Which was causing overall poor response and crashes which is why in the interim I have introduced redis in between to cache some of this data required by the app.

Please provide the following information if you ran into a more serious issue:

  • neo4j version: neo4j:3.5.12-enterprise and running in Docker on AWS m5.xlarge machine with 12G heap and cache size configured

  • what kind of API / driver do you use: Rest API on ECS and Node JS on separate instances

  • screenshot of [PROFILE or EXPLAIN]

  • which plugins / extensions / procedures do you use: neo4j spatial plugin

Also, attaching query.log which explains the execution time lines on live environment.

query.log
query.log.1

Any assistance on this is much appreciated!

Thanks,
Arnab

What exactly are you trying to accomplish with your query? It would help to know what information you're trying to get from the graph. I don't want to assume but it looks like you're finding stores, their locations, and then directions? But it's hard to tell.

@MuddyBootsCode

Thanks for the reply. Sorry if it wasn't clear. But the query is trying find the related notification nodes associated to the stores where the profiling query looks like:

PROFILE MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute) WHERE s.uuid='dd669910-e0d3-11e8-b431-1d83bfd113a3' AND sta.key='name' OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand) OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty) RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And the e.g. of the response of one entry from the original query is

And, the json looks like below where the column names are the nodes in the graph

[
  {
    "nt_id": "002a3ba0-2584-11ea-93de-118eb121a0f8",
    "st_ids": [
      "e5fb2cc0-2246-11ea-a327-c1a6ac2ca4a0"
    ],
    "store_names": [
      "AND"
    ],
    "notification_brands": [],
    "notification_labels": [],
    "notification": [
      {
        "sub_text": "Happy shopping!!",
        "action_url": "https://www.tatacliq.com/and/c-mbh11a00015",
        "image_url": "",
        "notification_match_type": "GENERAL",
        "validity_start": 1,
        "text": "Welcome to {{store_name}}",
        "inventory_request_params": "",
        "isActive": true,
        "validity_end": 1,
        "uuid": "002a3ba0-2584-11ea-93de-118eb121a0f8",
        "active_days": "{\"SUNDAY\":\"1100-2100\",\"MONDAY\":\"1100-2100\",\"TUESDAY\":\"1100-2100\",\"WEDNESDAY\":\"1100-2100\",\"THURSDAY\":\"1100-2100\",\"FRIDAY\":\"1100-2100\",\"SATURDAY\":\"1100-2100\"}"
      }
    ],
    "notification_type": [
      {
        "name": "Deals & Offers",
        "uuid": "2fdc2b20-4faf-11e9-bfff-47192e190163"
      }
    ],
    "notification_action_type": [
      {
        "name": "In Store",
        "uuid": "ce78fc50-4fae-11e9-b974-7995b4e2b93d"
      }
    ]
  }
]

I can tell you that the shortest path algorithm is pretty expensive. I’m not sure in your use case that it’s necessary. You could probably restructure your relationships a bit and make it work better. I’ll try to make some better suggestions when I get more time. But, I would start there.

Hi @MuddyBootsCode

Thanks for the answer. Well, I've managed to resolve this now. The reason why I had introduced shortestPath was I had this query earlier. Which is

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And if in my graph there was an incoming relationship i.e. [r:HAS_NOTIFICATION_TYPE] & [ra:HAS_NOTIFICATION_ACTION_TYPE] for each Notification node, and there were multiple notification to each Store node. When the graph grew i.e. When I had more than 1500+ store and ~200 notifications under each and each of those had the above incoming relationships. The query would just not work and hang and crash the server. Changing those to shortestPath did stop that from happening but the results time was still high.

Now, I'm saving the Notif_Type & Notif_Action_Type nodes' uuid in the Notification node's properties. And updated the query like so:

MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), 
        (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
        WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
        OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
        OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
        OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
        RETURN nt.uuid as nt_id,  COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
        COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification, 
        COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type

And, the response times are under 500ms now. Thanks for the help! Cheers!

Glad to be able to help. Sometimes all it takes is a little different perspective. Happy you got it working.