@andrew_bowman thanks once again for your help.
Regarding multiple contacts between 2 individuals, it is one of the most complex behaviours to model (I think). That is where we are struggling!
This is mainly because every new contact between 2 individuals must take into account the state of the transmitting individual at that moment in time (which could be the combined result of multiple previous contact events).
Let's look at the following simple example:
Let's say Mark tests positive for covid19 on day0, and we want to calculate the probability of Ann being sick due to his contacts.
The contact paths for Ann are:
Mark -> day1 -> Tom -> day2 -> Ann
Mark -> day3 -> Tom -> day4 -> Ann
Mark is in contact with Tom on day1, transmitting the virus with a certain probability p1, and later Tom is in contact with Ann on day2, transmitting the virus with a certain probability p2.
In this initial sequence (path1) we can calculate the probability of Ann being infected as
P(AnnSickContact2)= p1•p2
which are all values that we can get from the path itself.
Now, Mark is in contact with Tom again on day3 with a transmission probability of p3, and later, on day4, Tom is in contact with Ann again with a transmission probability of p4.
In this second sequence (path2), the resulting probability of Ann being infected due to the contact on day4 is NOT p3•p4. If we only take p3•p4 into account, we are assuming that the initial contact between Mark and Tom on day1 didn't exist. On day4 Tom transmits to Ann his state at that later moment, which is the result of having been in contact with Mark 2 times.
The true probability of Ann being infected due to the contact on day4 (assuming the events are statistically independent, for simplicity) is
P(AnnSickContact4)= p4•(p1+p3 - (p1•p3))
This is because the probability of Tom being infected by the time of the second (later) path is:
(p1+p3 - (p1•p3))
(the standard formula for the probability that at least one of 2 independent events occurs)
So, what I find challenging is how to represent this in the Cypher queries. As you can see, calculating the probability of Ann being sick in the second path requires information from the first path (we need p1).
In addition to that, we need to calculate the overall probability of Ann being sick, which is the result of 2 independent events too: transmission in the contact on day2 and transmission in the contact on day4. We can again apply the same formula for combining independent events:
Overall probability of Ann being sick =
P(AnnSickContact2) + P(AnnSickContact4) - [P(AnnSickContact2)•P(AnnSickContact4)]
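To make the arithmetic concrete, here is a minimal Python sketch of the example above. The numeric values for p1..p4 are made up purely for illustration, and the helper name `either` is my own; it just encodes the same independence formula used above.

```python
def either(a: float, b: float) -> float:
    """Probability that at least one of two independent events occurs."""
    return a + b - a * b

# Hypothetical transmission probabilities, chosen only for illustration.
p1, p2, p3, p4 = 0.5, 0.5, 0.5, 0.5

# Day 1: Mark (known positive) contacts Tom.
tom_after_day1 = p1
# Day 2: Tom contacts Ann, carrying his state as of day 1.
ann_contact2 = p2 * tom_after_day1
# Day 3: Mark contacts Tom again, so Tom's state combines both contacts.
tom_after_day3 = either(p1, p3)
# Day 4: Tom contacts Ann, carrying his state as of day 3 (not just p3).
ann_contact4 = p4 * tom_after_day3
# Overall: Ann is sick if either contact infected her.
ann_overall = either(ann_contact2, ann_contact4)

print(ann_contact2, ann_contact4, ann_overall)
```

With these made-up values, the day4 term comes out as 0.375 rather than the 0.25 that p3•p4 alone would give.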
In this example I used 2 double contacts between the same people (I could've used an even simpler example), but in general terms the issue we want to address is that the "state" being transmitted by personB to personC in a given contact event cannot be evaluated by taking into account only the state of personB as a result of that path; it has to reflect all the transmission paths that reached him at previous moments.
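Outside of Cypher, one way I can picture the general case is to walk through all contact events in chronological order and keep a running infection probability per person. This is only a sketch under the same independence assumption as above; the function name `propagate`, the tuple layout (day, from, to, p), the direction of each contact and the probabilities are all assumptions for illustration, not something we have implemented.

```python
from collections import defaultdict

def propagate(contacts, source):
    """Walk contacts in time order, updating each person's infection probability.

    contacts: iterable of (day, giver, receiver, p) tuples (hypothetical layout).
    source:   the person known to be positive at the start.
    Assumes all transmission events are statistically independent.
    """
    state = defaultdict(float)  # unknown people start at probability 0
    state[source] = 1.0
    for day, giver, receiver, p in sorted(contacts, key=lambda c: c[0]):
        # The giver transmits their state at this moment, not a per-path value.
        incoming = p * state[giver]
        # Combine with whatever the receiver had accumulated from earlier contacts.
        state[receiver] = state[receiver] + incoming - state[receiver] * incoming
    return dict(state)

# The Mark/Tom/Ann example, with made-up probabilities of 0.5 for every contact.
contacts = [
    (1, "Mark", "Tom", 0.5),
    (2, "Tom", "Ann", 0.5),
    (3, "Mark", "Tom", 0.5),
    (4, "Tom", "Ann", 0.5),
]
risk = propagate(contacts, "Mark")
print(risk)
```

Because each person's state is updated in place as time advances, the day4 contact automatically sees Tom's combined state from day1 and day3. Contacts on the same day would need an explicit ordering rule, which this sketch does not address.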