Match nodes in a FOREACH statement

I understand Neo4j doesn't allow MATCH inside FOREACH statements, but I don't quite know how to connect my new nodes to existing nodes without using a MATCH statement. (My code explains it better.)

UNWIND $data as row
MERGE (a:Assembly)
ON CREATE SET a+= row
WITH a
UNWIND $components as components
FOREACH(id IN components.uuid | MATCH (c:Component {_uuid: id})
MERGE (a)-[:RECIPE]->(c))

However, if I just use MERGE, I will be creating new Component nodes instead of matching the components that I already have. I am looking for a query that can help me solve this task.

Disclaimer: I saw a similar question in the community, but I don't quite understand the answer given, so I am posting my own problem.

UNWIND $data as row
MERGE (a:Assembly)
ON CREATE SET a+= row
WITH a
UNWIND $components as component
MATCH (c:Component {_uuid: component.uuid})
MERGE (a)-[:RECIPE]->(c)
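
The query above treats component.uuid as a single value. The FOREACH in the question iterates over components.uuid, which suggests it may actually be a list of ids. If that is the case (an assumption about the shape of $components, which isn't shown in the thread), a second UNWIND is one way to handle it. A minimal sketch:

```cypher
// Assumes each entry in $components looks like {uuid: [id1, id2, ...]}
UNWIND $data AS row
MERGE (a:Assembly)
ON CREATE SET a += row
WITH a
UNWIND $components AS component
// Second UNWIND: one row per id in the inner list
UNWIND component.uuid AS id
// MATCH only finds existing components; it never creates new ones
MATCH (c:Component {_uuid: id})
MERGE (a)-[:RECIPE]->(c)
```

Ids with no matching Component simply produce no row, so no relationship is created for them.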

Thank you for the quick reply. This answer worked for me. Please correct me if I'm wrong: from what I understand, I can use UNWIND similarly to FOREACH as long as what's in the UNWIND has the same format (not sure if this is the correct term).

There are three main ways of grouping/storing data in Cypher:

  • Map -- Like an object: prop: {key: val, key: val2, ...}
  • List -- Like an array: prop: [val1, val2, ...]
  • Simple -- Simple: prop: val

FOREACH and UNWIND both operate on lists, but in different ways. While they may appear similar, FOREACH is very limited in what it can run within the loop (only updating clauses: CREATE, MERGE, SET, DELETE, REMOVE, and nested FOREACH), but can do it much faster than UNWIND.

UNWIND brings each element in the list into the main body of the Cypher statement.

FOREACH runs one simple command on each element in the list.
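
To make the difference concrete, here is a rough sketch of the same task written both ways, as two separate statements. The Assembly name 'demo' and the Component names are made up for illustration:

```cypher
// FOREACH: only updating clauses inside; the loop variable n
// and the node c are not visible after the closing parenthesis.
MERGE (a:Assembly {name: 'demo'})
FOREACH (n IN ['bolt', 'nut'] |
  MERGE (c:Component {name: n})
  MERGE (a)-[:RECIPE]->(c)
);

// UNWIND: each element becomes its own row, so any clause
// (including MATCH and RETURN) can follow.
MATCH (a:Assembly {name: 'demo'})
UNWIND ['bolt', 'nut'] AS n
MATCH (c:Component {name: n})
MERGE (a)-[:RECIPE]->(c)
RETURN a.name, c.name;
```

Note that the FOREACH version has to MERGE the components (creating any that are missing), while the UNWIND version can MATCH them and skip the ones that don't exist.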

Thanks this helped a lot

I should add two very useful things about lists:

List comprehension:

Any list, including those returned by functions, can be filtered, and operated on, with a relatively simple syntax.

Example1:

RETURN [x IN [0,1,2,3,4,5] WHERE (x % 2) - 1 = 0 | x / 2.0] AS result
// [0.5, 1.5, 2.5]

Example2:

CREATE (temp:Thing {prop1: 1, prop2: 2, prop3: 3})
RETURN [k IN KEYS(temp) WHERE temp[k] % 2 != 0 | k + "--" + temp[k]] AS result
// [ "prop1--1", "prop3--3" ]

REDUCE

When you need to compress a list into a single value, like an average, this is your tool. Note that Cypher aggregating functions like AVG() operate across rows in the main body of the query, rather than on lists.

WITH [1, 2, 3, 4, 5] AS list1
// Start from 0.0 so the division is not truncated to an integer
RETURN REDUCE(sum = 0.0, val IN list1 | sum + val) / SIZE(list1) AS average
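
For comparison, here is a small sketch of the other route: turn the list into rows with UNWIND first, and then the aggregating function AVG() applies.

```cypher
WITH [1, 2, 3, 4, 5] AS list1
// One row per element, so avg() has rows to aggregate over
UNWIND list1 AS val
RETURN avg(val) AS average
```

Which one to pick mostly depends on whether the values are already in a list (REDUCE) or already in rows (AVG).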

[edit: with fixes from andrew.bowman. Thanks for the clarification!]

I think the confusion comes from the fact that Cypher is a declarative language and people are used to procedural languages. This is a paradigm shift that will take a bit of getting used to.

If you do something like:

MATCH (p:Person) WHERE p.born > 1980
RETURN p.name

you're getting back a stream of Persons which gets operated on in a declarative way (you can almost think of it as being in parallel). It seems like a LIST but it's not!

You can force the stream into a list:

MATCH(p:Person) WHERE p.born > 1970
WITH COLLECT(p.name) AS nameslist
RETURN nameslist  // returns names inside [] which is a list
// RETURN toUpper(nameslist) // causes an error!

Now you can UNWIND any list so that each member of the list (each name) gets treated individually in a stream. Here we uppercase each individual name:

MATCH(p:Person) WHERE p.born > 1970
WITH COLLECT(p.name) AS nameslist
UNWIND nameslist AS names // no longer a list
RETURN toUpper(names) // returns a stream of uppercase names

Note that RETURN toUpper(nameslist) gives an error because toUpper expects a string and not a list of strings.

A stream can be viewed as a Table in the browser.

Close, but be careful with your descriptions. In Cypher, a collection and a list are the same thing, and your comment that "you're getting back a collection of Persons" isn't accurate.

A more accurate description would be that operations produce streams of records. We informally refer to them as "rows" in the query plan.

So the result of this:

MATCH (p:Person)
WHERE p.born > 1980
RETURN p.name

is a stream of records/rows. Each row will only hold a single name.

If only looking at the results of the MATCH, this is a stream of node records, and for each row, p will refer to only a single person. Each operation you perform on p will operate on each row. So if you did some additional MATCH from p, that expansion would be applied per row (and thus, per person). There is no collection/list at this point.

This is one reason why it's important to be careful with singular vs. plural forms in your variable names.

MATCH (persons:Person) is not good, because per row, persons is actually a singular person node, not a plural, not a collection, and this can create confusion. It would be better to use person instead, singular.

But MATCH (p:Person) WITH collect(p) as persons makes sense, because persons is a list of person nodes, it's a collection, so we can use things like the IN inclusion check, or list predicates, or things that only work on lists.
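
As a rough illustration of the list-only operations mentioned above, assuming Person nodes with name and born properties as in the earlier examples:

```cypher
MATCH (person:Person)             // one row per person
WITH collect(person) AS persons   // now a single row holding a list
RETURN size(persons) AS total,
       // list predicate: true if at least one element satisfies the condition
       any(p IN persons WHERE p.born > 1980) AS anyBornAfter1980,
       // list comprehension: filter and project in one step
       [p IN persons WHERE p.born > 1980 | p.name] AS names
```

Here person is singular because it is one node per row, and persons is plural because it really is a list.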


Thanks! That was helpful!

The concept of a stream is somewhat abstract for newbies (vs. procedural languages).

A stream does still seem like it might connote a procedure though...

It's a fair term in the sense that you don't have visibility into the contents of a stream the way you can step through a procedural program.

There have been some questions about debugging Cypher, and a tool to set breakpoints and examine stream contents at certain points of the query might be useful. That said, depending on the execution (whether there's an Eager operator in the plan), the order of execution might change (from streaming row by row, to operating on all rows at once), which could be confusing for many.

At the high level, it's best to stick with a simpler conception, like:

Operations produce rows
Operations execute per row
A variable, unless it is explicitly a list (from a collect(), pattern comprehension, or some other function that produces a list), refers to a single entity per row.
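
A tiny sketch of these three points, using only literal lists so it doesn't depend on any data in the graph:

```cypher
UNWIND [1, 2, 3] AS x            // produces 3 rows
UNWIND ['a', 'b'] AS y           // runs once per incoming row: 3 x 2 = 6 rows
WITH x, collect(y) AS ys         // aggregates back to 3 rows, one per x
RETURN x, ys, size(ys) AS n      // x is a single value per row; ys is explicitly a list
```

The second UNWIND is never told to "loop over x"; it simply executes for every row it receives, which is the per-row behaviour described above.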

I still don't understand why MATCH can't occur inside FOREACH.

Is it a missing feature in the language, or is it because of something profound about how Neo4j works that I don't yet appreciate?

I'd say it's just a missing feature of the language, some consequence of a long-ago design decision. I could be wrong, but I don't see any real reason why it couldn't be added in. I can ask around and see if we can raise it as a feature request.