Create a relationship based on a condition

Hello! I have a database with students, subjects, and auditoriums, and two relationships:
a) ()-[:study]->() the student studies the subject
b) ()-[:visit {timestamp}]->() the student visits the auditorium at the specified time.
Note that timestamp is stored as a string (I could not use a time type, but this is not particularly important).

I need to add a relationship (undirected or two-way) ()-[:classmate]->(). Two students are classmates if they study at least one common subject and have visited the same auditorium at least three times. I also want to visualize the graph of groups (the nodes are students, and an edge means a classmate relationship exists).


The JSONs of the nodes and relationships are here:
https://cloud.mail.ru/public/qaSu/28oSmVknV

For simplicity, the dates are the same whenever students visited the auditorium together. However, Donskoy does not have a common subject with Belousova, Gomenyuk, and Razumov. Razumov visited the auditorium only 2 times and is not included in any group. Donskoy is a classmate only of Vasilchenko.
It is clear that the result should be like this:

Hi Nick,

Try this Cypher query.

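// Pairs of students who share at least one subject and have
// simultaneous visits to the same auditorium a
MATCH (s1:Student)-[:study]->()<-[:study]-(s2:Student),
      (s1)-[v1:visit]->(a)<-[v2:visit]-(s2)
WHERE v1.timestamp = v2.timestamp
// DISTINCT keeps pairs with several shared subjects from inflating the count
WITH s1, s2, a, count(DISTINCT v1) AS visitCount
WHERE visitCount >= 3
  AND id(s1) < id(s2)  // create one relationship per pair
MERGE (s1)-[:classmate]->(s2)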
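
To look at the resulting groups, something like the query below might work. It is only a sketch: it assumes you run it in Neo4j Browser (which draws returned nodes and relationships as a graph) and that the labels and relationship types match the ones above. Leaving the arrowhead off the pattern matches the stored relationship in either direction, so one classmate relationship per pair behaves like an undirected edge.

```cypher
// Match classmate relationships regardless of stored direction
MATCH (s1:Student)-[c:classmate]-(s2:Student)
WHERE id(s1) < id(s2)   // return each pair once
RETURN s1, c, s2
```

Students without any classmate relationship will not appear in this result.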

Hi Nathan Smith!! It helped, thanks a lot :)

Is it possible to write a query that removes relationships between fake classmates? For example, here Donskoy Ilya needs to be cut off.

Hi Nick,

This is kind of a long query. Does it do what you want for removing fake classmates?

If we step through the code, here's what it does.

  1. Search for students with classmate relationships.
  2. Count the number of times the students both study the same thing.
  3. Count the number of times they have visited the same auditorium at the same time.
  4. Find the student pairs where the number of study links is zero or there are no auditoriums with three simultaneous visits.
  5. Delete the classmate relationship.
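// 1. Student pairs that already have a classmate relationship
MATCH (s1:Student)-[c:classmate]->(s2:Student)
// 2. Count shared subjects (0 if none)
OPTIONAL MATCH (s1)-[st:study]->()<-[:study]-(s2)
WITH s1, s2, c, count(st) AS studies
// 3. Count simultaneous visits per shared auditorium
OPTIONAL MATCH (s1)-[v1:visit]->(a)<-[v2:visit]-(s2)
WITH s1, s2, c, studies, a,
     sum(CASE WHEN v1.timestamp = v2.timestamp THEN 1 ELSE 0 END) AS visitCount
WITH s1, s2, c, studies, collect(visitCount) AS visitCounts
// 4. Keep pairs that fail either condition
WHERE studies = 0 OR NONE(vc IN visitCounts WHERE vc >= 3)
// 5. Delete their classmate relationship
DELETE c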

The query runs correctly, but the result is (no changes, no records).

I would expect the code to return no changes, no records if all of the classmate relationships in your graph are legitimate. Our previous query didn't create any fake classmates, so there should be none to delete. If you found later that one of the visit relationships was entered at the wrong time and updated the graph to reflect the correction, it might cause one of the classmate relationships to be invalid. In that case, the query would delete the invalid classmate relationship.
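
If you want to check this without changing anything, a read-only variant along these lines might help. It is a sketch that reuses the same labels and relationship types and simply returns the pairs instead of deleting them; the alias bestVisitCount is a name I made up for illustration.

```cypher
// Dry run: list classmate pairs that fail either condition; nothing is deleted
MATCH (s1:Student)-[c:classmate]->(s2:Student)
OPTIONAL MATCH (s1)-[st:study]->()<-[:study]-(s2)
WITH s1, s2, c, count(st) AS studies
OPTIONAL MATCH (s1)-[v1:visit]->(a)<-[v2:visit]-(s2)
WITH s1, s2, c, studies, a,
     sum(CASE WHEN v1.timestamp = v2.timestamp THEN 1 ELSE 0 END) AS visitCount
// Highest number of simultaneous visits over all shared auditoriums
WITH s1, s2, c, studies, max(visitCount) AS bestVisitCount
WHERE studies = 0 OR bestVisitCount < 3
RETURN s1, s2, studies, bestVisitCount
```

An empty result here would mean every existing classmate relationship still satisfies both conditions.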

Did I misunderstand what you mean by fake? I don't read the Cyrillic alphabet well, so perhaps I misunderstood your example.

I want it to look like this:

You said above that Donskoy is a classmate of Vasilchenko. They both study English for Academic Purposes and visited auditorium G467 three times. Doesn't that make them classmates?

Yes, but Donskoy is not a classmate of the rest of the students in the group: Vasilchenko, Tyschenko, Polikutin.

Perhaps something like this query might help. It's a variation of the triangle counting algorithm. Students with a low ratio of one-step paths to two-step paths might have fake classmate relationships.

MATCH p=(s1:Student)-[:classmate*2]-(s2)
OPTIONAL MATCH (s1)-[r2:classmate]-(s2)
RETURN s1, 
COUNT(p) AS twoStepPaths, 
COUNT(r2) AS oneStepPaths, 
COUNT(r2)*1.0/count(p) AS ratio
ORDER BY ratio
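
Another way to look at it, per relationship rather than per student, is to count how many classmates the two endpoints have in common. This is only a sketch under the same schema assumptions, and commonClassmates is a made-up alias. A classmate relationship whose endpoints share no other classmate is a candidate for the kind of link you describe, though a genuine group of just two students would be flagged as well, so review the results before deleting anything.

```cypher
// For each classmate pair, count the classmates they have in common
MATCH (s1:Student)-[c:classmate]-(s2:Student)
WHERE id(s1) < id(s2)   // consider each pair once
OPTIONAL MATCH (s1)-[:classmate]-(common:Student)-[:classmate]-(s2)
WITH s1, s2, c, count(DISTINCT common) AS commonClassmates
WHERE commonClassmates = 0
RETURN s1, s2, commonClassmates
```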