I'm looking for help with modelling and querying my problem domain. My data input is a sequence of real-time measurements which represent trains traveling within Europe. Each measurement is a point on the map, with a type: departure, passage or arrival. It has additional properties like the train id, event time, country code, ... It's 1 year of data, which is about 5M measurements. For a single train, the events follow a repeating pattern: a departure, a number of passages, then an arrival. This represents a train making intermediate stops, for example for red lights, driver changes, ...
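To make the event pattern concrete, here is a minimal Python sketch of one measurement record and a check that one train's events follow the departure, passages, arrival cycle. The class and field names are hypothetical, chosen only to mirror the properties listed above, and the check assumes a train's events can be ordered by event time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    # Hypothetical field names for the properties listed in the post.
    train_id: str
    event_type: str  # "departure", "passage" or "arrival"
    event_time: datetime
    country_code: str
    lat: float
    lon: float


def follows_pattern(events: list[Measurement]) -> bool:
    """True if one train's events, ordered by time, repeat the pattern:
    departure, zero or more passages, arrival."""
    in_trip = False
    for m in sorted(events, key=lambda e: e.event_time):
        if m.event_type == "departure":
            if in_trip:
                return False  # a new departure before the previous arrival
            in_trip = True
        elif not in_trip:
            return False  # a passage or arrival outside any trip
        elif m.event_type == "arrival":
            in_trip = False  # the current trip is closed
    return not in_trip  # the last trip must have ended with an arrival
```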
What I would like to query are custom groupings of these measurements, along with the distance traveled for each group. Potential types of groupings:
- stop-based grouping: measurements delimited by "departure" and "arrival" events. This would give me the distance traveled between consecutive stops of a specific train
- country-based grouping: this would give the distance traveled by the train in each country
- time-based grouping: the distance traveled between two specific points in time
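As a reference for what each grouping should return, independent of any database, here is a minimal Python sketch. It assumes that a group's distance is the sum of great-circle (haversine) distances between consecutive measurements of one train, using a mean Earth radius of 6371 km; that a leg crossing a border counts entirely for the country of its starting measurement; and that the standstill between an arrival and the next departure adds no distance. All names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


@dataclass(frozen=True)
class Measurement:
    # Hypothetical field names for the properties listed in the post.
    train_id: str
    event_type: str  # "departure", "passage" or "arrival"
    event_time: datetime
    country_code: str
    lat: float
    lon: float


def haversine_km(a: Measurement, b: Measurement) -> float:
    """Great-circle distance between two measurements, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def moving_legs(events):
    """Pairs of consecutive measurements of one train, ordered by time,
    skipping the standstill between an arrival and the next departure."""
    ordered = sorted(events, key=lambda m: m.event_time)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a.event_type != "arrival"]


def distance_per_trip(events):
    """Stop-based grouping: one total per departure ... arrival run."""
    totals = []
    for a, b in moving_legs(events):
        if a.event_type == "departure" or not totals:
            totals.append(0.0)  # a new trip starts at each departure
        totals[-1] += haversine_km(a, b)
    return totals


def distance_per_country(events):
    """Country-based grouping: each leg counts for the country where it starts."""
    totals = defaultdict(float)
    for a, b in moving_legs(events):
        totals[a.country_code] += haversine_km(a, b)
    return dict(totals)


def distance_between(events, start: datetime, end: datetime):
    """Time-based grouping: legs whose both endpoints lie within [start, end]."""
    return sum(
        haversine_km(a, b)
        for a, b in moving_legs(events)
        if start <= a.event_time and b.event_time <= end
    )
```

Only the grouping key changes between the three cases; ordering by time, pairing consecutive measurements and summing the distances stay the same.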
My questions:
- Is this problem domain a good fit for a graph DB?
- How should this preferably be modelled? I'm deciding between two approaches: 1. model the measurements as a chain of nodes, with relationships between consecutive measurements, or 2. model a separate "train" node holding the train id, with a relationship to each measurement of that train.
- How would you query this using Cypher? I've been reading a lot of documentation about graph grouping, node collapsing and spatial functions, but I'm new to this, so I'm a bit lost as to what the correct approach would be.
Any help would be greatly appreciated!