Unexpected UNWIND results with multiple records

I’m new to Neo4j/Cypher and apologize upfront if there is a simple newbie mistake going on. I’m looking to accept feeds of multiple records (from Kafka), where subsequent records can change or remove relationships from previous records. I’m seeing different results depending upon whether there is a single record in the set for UNWIND or multiple records in the set. I’m looking for a way for the multiple record results to be the same as multiple single record runs.

This is a simplified example that seems to illustrate the issue encountered in a more complex scenario.

WITH [{namespace:"proj3", person:"joe",role:"admin" }
      ] AS events
UNWIND events AS event
WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, uuid
MATCH ()-[rd1:hasAccess]->(namespace) WHERE rd1.Runid <> uuid DELETE rd1

Added 2 labels, created 2 nodes, set 4 properties, created 1 relationship, completed after 3 ms.

When run with this first record. It creates a Namespace node “proj3” and a person node “joe” and creates a hasAccess relationship between them with a Role property set to admin and a Runid UUID value of this run.

This is run again with a second transaction:

WITH [{namespace:"proj3", person:"bill",role:"admin" }
      ] AS events
UNWIND events AS event
WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, uuid
MATCH ()-[rd1:hasAccess]->(namespace) WHERE rd1.Runid <> uuid DELETE rd1

Added 1 label, created 1 node, set 3 properties, deleted 1 relationship, created 1 relationship, completed after 3 ms.

This time a Person node “bill” is created and a hasAccess relationship with a roleAttribute of admin and Runid UUID value of this run. The hasAccess relationship between the Person node joe and the namespace has been removed.

This is all as expected. However, when both records are include in the set in a single run. Different results occur. Start with a clean database (match (n) detach delete n) and run them both together.

WITH [{namespace:"proj3", person:"joe",role:"admin" },
    {namespace:"proj3", person:"bill",role:"admin" }
      ] AS events
UNWIND events AS event
WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, uuid
MATCH ()-[rd1:hasAccess]->(namespace) WHERE rd1.Runid <> uuid DELETE rd1

Added 3 labels, created 3 nodes, set 7 properties, deleted 2 relationships, created 2 relationships, completed after 8 ms.

The end result is that the two Person nodes “joe” and “bill” are created and the Namespace node proj3 is created. But there are no relationships in this final result. There is also the interesting stats that 2 relationships were created and 2 were deleted.

Why isn’t the end result the same as when the two are run separately? What can be done so the same result is received when the two transactions are present as a set in the unwind?

Try running your query this far.

WITH [{namespace:"proj3", person:"joe",role:"admin" },
    {namespace:"proj3", person:"bill",role:"admin" }
      ] AS events
UNWIND events AS event
WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
RETURN namespace, uuid

If you look at the table view in Neo4j Browser, you will see that you have two records, one for each UUID. Since yo have two rows, you are doing the following match twice. The first time you match all the hasAccess relationships for namespace and delete the one that doesn't match your first UUID. The second time, you match all the hasAccess relationships for namespace and delete the ones that don't match the second UUID. That leaves you no relationships left.

If you would like to delete any relationships that don't have a UUID that matches one of the ones that you just created for that namespace, you could do that at the start of your statement before you add any new nodes and relationships. Here's another alternative that matches what I think you are trying to do.

WITH [{namespace:"proj3", person:"joe",role:"admin" },
    {namespace:"proj3", person:"bill",role:"admin" }
      ] AS events
UNWIND events AS event
WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, collect(uuid) as uuids
MATCH ()-[rd1:hasAccess]->(namespace) WHERE NOT rd1.Runid IN uuids DELETE rd1

@nsmith_piano thank you for taking a look at this and for the suggestions. Unfortunately that doesn’t provide the results I was looking for. When running your final example, the relationship to both joe and bill remain. In the intended result, the relationship to only bill in the second record should remain.

In the intended result when starting with an empty database, the first record would create the namespace ‘proj3’, the person ‘joe’ and a hasAccess relationship between them with the Role admin and the first Runid value. The match/delete wouldn’t delete any relationships as there are no other relationships yet with a different Run ID value.

The intended result with the second record would create the person ‘bill’ and a hasAccess relationship from ‘bill’ to ‘proj3’ with Role admin and the second Runid value. The match/delete when run with this record should delete the hasAccess relationships to ‘proj3’ that do not have the second Runid value, that is, it should delete the relationship from ‘joe’ in the first record, but leave the relationship from ‘bill’ in the second record which does have the second Runid value.

This is the result received if each of those records are run in individual executions of the statement. However, when they are run with both records in the events array, different results are received.

Your first example shows that the relationships are setup with different Runid values, one from the first record and a different value for the second record. The Runid values appear to be set as expected. But the match/delete part of it does not seem to remove them as intended.

Any thoughts on how that result might be obtained?

According to my understanding, a Cypher statement represents a single transaction against the database. What you're describing sounds to me like two transactions. If are using the apoc library, you could turn each event into it's own transaction this way.

call apoc.periodic.iterate(
"WITH [{namespace:'proj3', person:'joe',role:'admin' },
    {namespace:'proj3', person:'bill',role:'admin' }
      ] AS events unwind events as event RETURN event",
"WITH event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, collect(uuid) as uuids
MATCH ()-[rd1:hasAccess]->(namespace) 
WHERE NOT rd1.Runid IN uuids DELETE rd1", 
{batchSize:1, parallel:false})

I am pretty new to Cypher myself, so I would welcome a more experienced community member to chime in! :grin:

You could also do a bigger batch size and set iterateList to false in the parameters for apoc.peridoic.iterate.

{batchSize:100, iterateList:false, parallel:false}

I don't know if it makes a difference for performance.

@nsmith_piano, I started down a similar path. I'm intending to use this to consume messages from Kafka and the "With" part of this with the records would be external to the scope I could influence. But I did come up with this that appears to provide the results expected:

WITH [{namespace:"proj3", person:"joe",role:"admin" },
    {namespace:"proj3", person:"bill",role:"admin" }
      ] AS events
UNWIND events AS event
call apoc.cypher.doIt("
WITH $event as event, randomUUID() AS uuid
MERGE (namespace:Namespace {Name:event.namespace})
MERGE (who:Person {Name:event.person})
MERGE (who)-[r:hasAccess]->(namespace) 
  ON CREATE set r.Runid = uuid, r.Role = event.role
  ON MATCH  set r.Runid = uuid, r.Role = event.role
WITH namespace, uuid
MATCH ()-[rd1:hasAccess]->(namespace) WHERE rd1.Runid <> uuid DELETE rd1
",{event:event}) yield value
return value

I need to take this simple example and try to implement it back into the more complex use case to see if it still holds out. But I'm optimistic at the moment that it will work out. I'll let you know if it does or not. And if it does will mark this closed.

Thank you very much for your input and suggestions.

1 Like

Thanks for sharing apoc.cypher.doIt(). I haven't used that one before. It looks like you're on the path to success.

Here's an alternative Cypher-only solution. It only creates the hasAccess relationship for the last event on each namespace. Does that fit the bill?

WITH [{namespace:'proj3', person:'joe',role:'admin' },
    {namespace:'proj3', person:'bill',role:'admin' },
    {namespace: 'proj4', person:'emily', role:'admin'}
      ] AS events 
      unwind events as event
      MERGE (namespace:Namespace {name:event.namespace})
      MERGE (:Person {name:event.person})
      WITH event, namespace
      OPTIONAL MATCH ()-[rd1:hasAccess]->(namespace)
      DELETE rd1
      WITH event.namespace AS namespace, collect(event) as nsEvents
      WITH namespace, head(reverse(nsEvents)) as lastEvent, randomUUID() as uuid
      MATCH (p:Person {name:lastEvent.person}), (ns:Namespace {name:namespace})
      MERGE (p)-[:hasAccess {Runid:uuid, role:lastEvent.role}]->(ns)
      RETURN *