How do I create a subnetwork from a large network?

This problem is defeating me, but it seems so simple that I must be looking at it backwards.

Graph Now: Drivers start at the top and drive down a directed graph [x:DRIVE] from node to node. Along that route they have two types of stops, "Delivery" and "Rest". Routes can be bidirectional (not really important here), so when I look at the route (Graph Now), I get a fairly simple graph. All good so far.

But one of the analytics wants a graph like End State, showing only the network of nodes that are "Rest" stops [x:REST]. Rather than compute this on the fly (the actual app is big), I want to build a separate, stored set of REST relationships.

But I cannot figure out how to build the one in the middle.

I can't simply connect every rest stop to every other rest stop, because that loses the flow and doesn't follow the route. I have to follow the "path" of DRIVE. So in essence, I want a simplified version of DRIVE that contains only Rest stops, with a relationship between them created as an overlay.

merge (a:Delivery {inode:0,type:'Start',load:'Truck',compileunit:'route'});
merge (a:Delivery {inode:1, type:'Delivery', load:'pallet', compileunit:'route'});
merge (a:Delivery {inode:2, type:'Delivery' , load:'pallet', compileunit:'route'});
merge (a:Delivery {inode:3, type:'Delivery' , load:'pallet', compileunit:'route'});
match (a:Delivery {inode:0}), (b:Delivery {inode:1}) merge (a)-[x:drives]-(b);
match (a:Delivery {inode:1}), (b:Delivery {inode:2}) merge (a)-[x:drives]-(b);
match (a:Delivery {inode:1}), (b:Delivery {inode:3}) merge (a)-[x:drives]-(b);
merge (a:Delivery {inode:4, type:'Rest' , load:'coffee', compileunit:'route'});
match (a:Delivery {inode:2}), (b:Delivery {inode:4}) merge (a)-[x:drives]-(b);
merge (a:Delivery {inode:5, type:'Delivery', load:'pallet', compileunit:'route'});
merge (a:Delivery {inode:6, type:'Delivery' , load:'pallet', compileunit:'route'});
merge (a:Delivery {inode:7, type:'Rest', load:'coffee', compileunit:'route'});
merge (a:Delivery {inode:8, type:'Rest' , load:'lunch', compileunit:'route'});
match (a:Delivery {inode:4}), (b:Delivery {inode:5}) merge (a)-[x:drives]-(b);
match (a:Delivery {inode:4}), (b:Delivery {inode:6}) merge (a)-[x:drives]-(b);
match (a:Delivery {inode:6}), (b:Delivery {inode:7}) merge (a)-[x:drives]-(b);
match (a:Delivery {inode:6}), (b:Delivery {inode:8}) merge (a)-[x:drives]-(b);
merge (a:Delivery {inode:9, type:'Rest' , load:'pallet', compileunit:'route'});
match (a:Delivery {inode:3}), (b:Delivery {inode:9}) merge (a)-[x:drives]-(b);

I used your Cypher script and added the [:rest] relationships with the aim to get End State

merge (a1:Delivery {inode:0,type:'Start',load:'Truck',compileunit:'route'})
merge (a2:Delivery {inode:1, type:'Delivery', load:'pallet', compileunit:'route'})
merge (a3:Delivery {inode:2, type:'Delivery' , load:'pallet', compileunit:'route'})
merge (a4:Delivery {inode:3, type:'Delivery' , load:'pallet', compileunit:'route'})

merge (a1)-[:drives]->(a2)
merge (a2)-[:drives]->(a3)
merge (a2)-[:drives]->(a4)

merge (a5:Delivery {inode:4, type:'Rest' , load:'coffee', compileunit:'route'})
merge (a3)-[:drives]->(a5)
merge (a1)-[:rest]->(a5)

merge (a6:Delivery {inode:5, type:'Delivery', load:'pallet', compileunit:'route'})
merge (a7:Delivery {inode:6, type:'Delivery' , load:'pallet', compileunit:'route'})
merge (a8:Delivery {inode:7, type:'Rest', load:'coffee', compileunit:'route'})
merge (a9:Delivery {inode:8, type:'Rest' , load:'lunch', compileunit:'route'})

merge (a5)-[:drives]->(a6)
merge (a5)-[:drives]->(a7)
merge (a7)-[:drives]->(a8)
merge (a8)-[:drives]->(a9)
merge (a5)-[:rest]->(a8)
merge (a5)-[:rest]->(a9)

merge (a10:Delivery {inode:9, type:'Rest' , load:'pallet', compileunit:'route'})
merge (a4)-[:drives]->(a10)
merge (a1)-[:rest]->(a10)

Added 10 labels, created 10 nodes, set 40 properties, created 13 relationships, completed after 175 ms

Query:

match (a:Delivery) where a.type = 'Start'
match (b:Delivery) where b.type = 'Rest'
match (a)-[:rest]->(b)
optional match (b)-[:rest]->(c) where c.type = 'Rest'
return a, b, c

Result:
[Image: Screen Shot 2020-11-10 at 8.19.52 PM]

The result above includes the relationship you marked 'not valid' (shown in red). This is a consequence of how Neo4j works, and we cannot avoid it.

One solution I can think of is using virtual nodes. Here is the Cypher script:

match (a:Delivery) where a.type = 'Start'
match (b:Delivery) where b.type = 'Rest'
match (a)-[:rest]->(b)
with a, b, collect(b) as b1
optional match (b)-[:rest]->(c)
with a, b1, collect(c) + b1 as c1
unwind c1 as c2
with apoc.create.virtual.fromNode(a, ['type']) as d1, c2
WITH d1, c2.type as t2, head(labels(c2)) AS l2, 'rest' AS rel_type
CALL apoc.create.vNode([l2],{name:l2, type:t2}) yield node as g
CALL apoc.create.vRelationship(d1,rel_type,{},g) yield rel
RETURN *;

Result:
[Image: Screen Shot 2020-11-10 at 8.36.32 PM]

Interesting. Well at least I wasn't completely stupid. This was very helpful. One step may still be an issue.

Talking to them a few minutes ago, we agree that while the one at the bottom is close, this one
[Image: Screen Shot 2020-11-10 at 8.19.52 PM]

is enough. That extra Drives between 8 and 7 is annoying, but the representation is much closer to what we needed. So thank you, that's close enough. As long as 0 doesn't connect directly to 7 or 8, that's perfect.

But you also did something I can't do. I cannot define those relationships from the data as it's coming in. I have to build the REST connections after all the data is in.

so this line - merge (a1)-[:rest]->(a5)

I can't do that manually. That has to be constructed from the data. So picture this from the standpoint of not having these lines.

merge (a1)-[:rest]->(a5)
merge (a5)-[:rest]->(a8)
merge (a5)-[:rest]->(a9)
merge (a1)-[:rest]->(a10)

So step 1 is to get those built; step 2 is what you have nicely laid out.

Any suggestions on how to build those 4 statements/connections from the data I sent?

Using your Cypher scripts, I recreated the scenario without adding the :rest relationships.

Here is my shot at creating the :rest relationships with this small set of data. 
I did this in two steps.

Step: 1
match (c:Delivery), (d:Delivery)
match (c)-[*..3]-(d)
where c.type = 'Start' and d.type = 'Rest'
with c, d
merge (c)-[:rest]->(d)
return c, d

[Image: Screen Shot 2020-11-11 at 12.54.27 PM]

Step: 2

match (c:Delivery) where c.type = 'Start'
match(c)-[:rest]-(d)-[]-(e)-[]-(f)-[]-(g)
where d.type = 'Rest' and e.type = 'Delivery' and f.type = 'Rest' and g.type = 'Rest'
with c, d, e, f, g
merge (d)-[:rest]->(f)
merge (d)-[:rest]->(g)
return c, d, e, f, g

[Image: Screen Shot 2020-11-11 at 12.55.42 PM]

Finally you get your result.

APOC Procedures can help here, but you'll need to first alter the graph so that nodes of type 'Rest' get an additional :Rest label:

CALL apoc.periodic.iterate("MATCH (d:Delivery) WHERE d.type = 'Rest' RETURN d", "SET d:Rest", {}) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages

Now that we have that, we can use APOC's path-finding procedures, which let us specify a traversal that stops expanding at the first node it encounters with a given label. From each :Rest node (and from the starting node, which has no incoming :drives relationships), we traverse outgoing :drives relationships and return only the first :Rest node encountered on each path, never traversing past a :Rest node. Then we create a :rest relationship from the start node to each of those nodes:

MATCH (start:Delivery)
WHERE start:Rest OR NOT ()-[:drives]->(start)
CALL apoc.path.subgraphNodes(start, {relationshipFilter:'drives>', labelFilter:'/Rest'}) YIELD node as restStop
CREATE (start)-[:rest]->(restStop)

In the labelFilter portion, prefixing a label name with / means it's a termination filter, so expansion will stop at :Rest nodes (expanding no further) and return those nodes. This prevents us from accidentally creating :rest relationships that bypass :Rest nodes.
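One way to sanity-check the derived overlay is to list each :rest pair along with how many relationships connect it. This is only a sketch against the sample data: the aliases fromStop, toStop, and copies are made up for illustration. Because the query above uses CREATE, running it more than once adds a second :rest relationship for the same pair, which would show up here as copies greater than 1.

```cypher
// List every derived :rest pair and how many relationships connect it.
// A count above 1 means the CREATE query was run more than once.
MATCH (a:Delivery)-[r:rest]->(b:Delivery)
RETURN a.inode AS fromStop, b.inode AS toStop, count(r) AS copies
ORDER BY fromStop, toStop
```

If a pair appears that skips over an intermediate Rest stop on the route, the termination filter was not applied as intended.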

Thank you! This should be enough. Another great solution. Thanks again.

Ahhhhhhhh that makes sense. Thank you both for some excellent answers.

Both solutions work, but if you will forgive a follow-up question:

We have no problem changing the "start" node to have a :Rest label. Does that simplify the query at all?

Yes, that means we can just use WHERE start:Rest as the filter for the MATCH.
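As a rough sketch of that simplification, the two statements below first label the root and then build the overlay from every :Rest node. It assumes the root is the node with type 'Start', as in the sample data, and it swaps CREATE for MERGE (a change from the query above, not part of it) so that re-running the statement does not add duplicate :rest relationships.

```cypher
// Assumption: the root is the node with type 'Start', as in the sample data.
MATCH (s:Delivery {type: 'Start'})
SET s:Rest;

// Every :Rest node, now including the root, is a traversal start.
// MERGE (instead of CREATE) keeps the statement safe to re-run.
MATCH (start:Delivery)
WHERE start:Rest
CALL apoc.path.subgraphNodes(start, {relationshipFilter: 'drives>', labelFilter: '/Rest'}) YIELD node AS restStop
MERGE (start)-[:rest]->(restStop);
```

One thing to keep in mind: with the root carrying :Rest, any query that treats :Rest as "a rest stop" will also pick up the root, so filtering on the type property may still be needed in places.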

Makes sense. It's all working now. It's not quick, but it's fast enough; 5m nodes is not trivial. I made the change to the root node because I always know that's in the chain. Thank you both!