I'm looking for help modelling and querying my problem domain. My input data is a sequence of real-time measurements that represent trains traveling within Europe. Each measurement is a point on the map with a type: departure, passage, or arrival. It also has properties such as the train id, event time, country code, ... The data covers 1 year, which is about 5M measurements. For a single train, the events form a repeating pattern: a departure, followed by a number of passages, and then an arrival. This reflects a train making intermediate stops, for example for red lights, driver changes, ...

What I would like to do is query custom groupings of these measurements and get the distance traveled for each group. Potential types of groupings:

  • group measurements delimited by "departure" and "arrival" events. This would give me the distances traveled between each stop for a specific train
  • country based grouping: this would give the distance traveled by the train in each country
  • time based grouping: distance traveled between two specific points in time
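
To make the three groupings concrete outside of any database, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Measurement` fields, the helper names, the sample coordinates, and the use of the haversine formula for the distance between consecutive points. It also assumes the list holds the measurements of a single train, already sorted by event time.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass
class Measurement:  # hypothetical record layout, not the real data schema
    train_id: str
    kind: str      # "departure", "passage" or "arrival"
    time: str      # ISO 8601 timestamp, so string comparison follows time order
    country: str
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance in km between two measurements."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def path_km(points):
    """Sum of the distances between consecutive points."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def legs(points):
    """Yield one list per departure..arrival run."""
    leg = []
    for m in points:
        if m.kind == "departure":
            leg = [m]
        elif leg:
            leg.append(m)
            if m.kind == "arrival":
                yield leg
                leg = []


def km_per_country(points):
    """Distance per country. Assumption: a segment that crosses a border
    is attributed entirely to the country of its starting point."""
    totals = {}
    for p, q in zip(points, points[1:]):
        totals[p.country] = totals.get(p.country, 0.0) + haversine_km(p, q)
    return totals


def km_between(points, start, end):
    """Distance covered by the measurements with start <= time <= end."""
    return path_km([m for m in points if start <= m.time <= end])


# Made-up sample: one train moving north along the 4°E meridian.
sample = [
    Measurement("T1", "departure", "2023-01-01T08:00", "BE", 50.0, 4.0),
    Measurement("T1", "passage",   "2023-01-01T08:30", "BE", 50.5, 4.0),
    Measurement("T1", "arrival",   "2023-01-01T09:00", "NL", 51.0, 4.0),
    Measurement("T1", "departure", "2023-01-01T09:10", "NL", 51.0, 4.0),
    Measurement("T1", "arrival",   "2023-01-01T10:00", "NL", 52.0, 4.0),
]

leg_km = [path_km(leg) for leg in legs(sample)]
country_km = km_per_country(sample)
window_km = km_between(sample, "2023-01-01T08:30", "2023-01-01T09:10")
```

In this sample each leg spans one degree of latitude, so both entries of `leg_km` come out at roughly 111 km. Whatever the storage engine, the same idea applies: order the measurements per train, compute the distance of each consecutive pair, and sum those distances per group.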

My questions:

  • is this problem domain a good fit for a graph DB?
  • how would this preferably be modeled? I'm undecided between two approaches: 1. model the measurements as a chain of nodes, with relationships between them, or 2. model a separate "train" node which has the train id, with a relationship to each measurement for that train
  • how would you query this using Cypher? I've been reading a lot of documentation about graph grouping, node collapsing and spatial functions, but I'm new to this, so I'm a bit lost on what the correct approach would be.

I would suggest you whiteboard your data model first and then watch the basic data modelling exercise videos by Emil Eifrem. Later on you can take the ETL and CQL lessons
that will be available on the Neo4j website indicated below.