Model split on date

The developer guide "Link Prediction with the Graph Data Science Library" describes how to perform link prediction. In that guide, the data is split into train and test sets.

How can I influence this train/test split? In particular, I have a temporal graph and want to perform the split using temporal cross-validation (i.e. taking the time attribute into account when performing the split).

It looks like you may want to roll your own version of the Split Relationships functionality in Neo4j Graph Data Science. Usually one does not specify how or where to split the dataset, because that introduces user bias; random sampling is used instead. That said, I can imagine a few scenarios where I would be likely to split on timestamp (e.g. train on past events to predict future events).

@Joel has got the right idea. Instead of using the splitRelationships procedures (which break your in-memory graph's relationships into test / train / holdout), you can label the relationships you want to use for each of those sets yourself (either in your source data model, by using a Cypher projection, or when you load the data) and specify them when you call link prediction.

If you have timestamped data, time slicing is considered a best practice, so definitely try it out, and let us know how it works out :)
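
To make the idea of labelling relationships by time concrete, here is a minimal Python sketch, not GDS functionality. It assumes each interaction is available as a (source, target, timestamp) record; the `Edge` type, the `tag_by_cutoff` helper, and the `TRAIN` / `TEST` names are all hypothetical and only illustrate the cutoff logic you would apply before writing the tags back to the database.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: int
    target: int
    timestamp: int  # hypothetical time property, e.g. epoch seconds


def tag_by_cutoff(edges, cutoff):
    """Tag edges strictly before `cutoff` as TRAIN and all others as TEST."""
    return [
        (e.source, e.target, "TRAIN" if e.timestamp < cutoff else "TEST")
        for e in edges
    ]


# Toy data: four interactions at increasing timestamps.
edges = [Edge(1, 2, 100), Edge(2, 3, 200), Edge(1, 3, 300), Edge(3, 4, 400)]
tagged = tag_by_cutoff(edges, cutoff=300)
```

The point of the strict comparison is that nothing at or after the cutoff leaks into training, so the model only ever learns from the past.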

Thanks for the suggestions. Indeed, I do have timestamped edges, i.e. an interaction between nodes takes place at a certain timestamp.

Choosing the right relationships is easy and already clear to me. However, so far it is unclear to me how to use this edge attribute natively inside Neo4j, i.e. with the split relationships tool, to perform such a time series cross-validation.

Is there any native & integrated tool to support time slicing?

You can use the latest Dev Tools, which will help you see the relationships clearly. But if you want time series to validate your findings, then you need to push the data into a time series database of your choice. The good site to have a look at is where you will get concepts clear on time series data analysis.

Thanking you
Sameer G

Not yet - although we're aware of the request, and it's in our backlog.

What you'll want to do now is add relationship types (labels) to your existing DB based on your time split (e.g. MATCH (n1)-[r:RELATIONSHIP]-(n2) WHERE r.timestamp < $date MERGE (n1)-[:TEST_DATA]-(n2), where r.timestamp stands in for whatever property holds your timestamp and $date for your cutoff), then load the data into the in-memory graph with those relationship types, and pass them to the link prediction algorithm.

It's a bit cumbersome at the moment, but we're working on making the process less painful. Another alternative is to use the subgraph projection features coming out in 1.6 and break your in-memory graph into separate projections based on the timestamp, but those would be separate graphs instead of a single graph where you can control the holdouts.
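
Since the original question was about time series cross-validation rather than a single cutoff, the following sketch shows one common way to derive several time-ordered folds (an expanding window). It is plain Python with no GDS calls; `expanding_window_folds` is a hypothetical helper, and edges are assumed to be (source, target, timestamp) tuples. Each fold could then be mapped to its own pair of relationship types or its own projection.

```python
def expanding_window_folds(edges, n_folds):
    """Yield (train, test) lists where each test chunk follows its train set in time."""
    ordered = sorted(edges, key=lambda e: e[2])  # sort by timestamp
    chunk = len(ordered) // (n_folds + 1)
    if chunk == 0:
        raise ValueError("not enough edges for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = ordered[: k * chunk]
        # The last fold absorbs any remainder left by the integer division.
        end = (k + 1) * chunk if k < n_folds else len(ordered)
        test = ordered[k * chunk : end]
        yield train, test


# Toy data: seven interactions with distinct timestamps.
edges = [(1, 2, 10), (2, 3, 20), (3, 4, 30), (1, 4, 40),
         (2, 4, 50), (1, 3, 60), (4, 5, 70)]
folds = list(expanding_window_folds(edges, n_folds=2))
```

Note that this splits by position in the sorted list, so edges sharing a timestamp could land on both sides of a boundary; if that matters for your data, split on timestamp values instead.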