"Git commit" for graphs: Compare two graphs and apply the latest changes

Dear Community,

I am facing an interesting problem: I have a graph. A subgraph of it is copied into a different DB (e.g. a separate database in Neo4j) and worked on there. When the user wants to save his changes, these changes need to be transferred back into the original graph. Basically, I am looking for "git commit" and "git merge" functionality, where a user "creates a local branch", "commits his changes", and the changes are then "merged" into the "main branch". Is there something like that? Has anyone had this problem before?

I know there are methods to check whether two graphs differ (e.g. comparing md5sums), but is there a simple way to "apply" the differences to the other graph?
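On the "apply the differences" part: a hash such as an md5sum only tells you whether two graphs differ, not what differs. If every node and relationship carries a stable id of your own (Neo4j's internal ids are not stable across databases), finding what changed becomes a plain set comparison. A minimal Python sketch, assuming both graphs have already been read into dicts keyed by a hypothetical `uid`; run it once for nodes and once for relationships.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# A snapshot maps a stable, application-level id (hypothetical "uid") to a record,
# e.g. {"labels": [...], "props": {...}} for nodes or
# {"type": ..., "start": uid, "end": uid, "props": {...}} for relationships.
Snapshot = Dict[str, Dict[str, Any]]


@dataclass
class ChangeSet:
    added: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    removed: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # uid -> (record before, record after)
    changed: Dict[str, Tuple[Dict[str, Any], Dict[str, Any]]] = field(default_factory=dict)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.changed)


def diff_snapshots(base: Snapshot, edited: Snapshot) -> ChangeSet:
    """Report which records the edit added, removed, or modified."""
    cs = ChangeSet()
    for uid, rec in edited.items():
        if uid not in base:
            cs.added[uid] = rec
        elif base[uid] != rec:  # dict equality compares labels/props deeply
            cs.changed[uid] = (base[uid], rec)
    for uid, rec in base.items():
        if uid not in edited:
            cs.removed[uid] = rec
    return cs
```

The resulting change set is what a "commit" would record; applying it means translating each entry into a write against the original graph, matched by `uid`.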

Looking forward to a good discussion,
Elena


How big of a subgraph are you talking about? I have an idea based on something I am using to host demo "fake" data that I think might be applicable.

Not too big. My "main" graph has about 5000 nodes and the subgraph about 1500. The changes I actually want to "commit and merge" apply to a sub-subgraph of about 100-200 nodes with maybe 100 relationships.

Ok, so this might not be the cleanest solution, but it is something I have been using at work.

Use apoc.export.json.query to export the subgraph returned by your query to a JSON file. See the Neo4j documentation for the formatting options. With your subgraph in a JSON file, you can then commit it to your git repo for your records.

From there, you could edit that JSON file directly so the changes show up in the git repo. Or you can load that subgraph into a separate graph and make your changes using Cypher. Pick your poison. Loading the subgraph into a separate graph to make changes means all of your internal IDs will change, which can make things tricky when you export and migrate the changes back.
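If the exported file goes into git, the diffs are only readable when the file is deterministic: same record order and key order on every export, and no internal ids that shift between reloads. A minimal Python sketch of a normalization step, assuming one JSON object per line with a "type" field and a "properties" map that holds your own stable id under a hypothetical `uid` key; check this against what your export config actually produces. Nested references to internal ids (for example relationship endpoints) would need the same treatment.

```python
import json
from typing import Iterable, List


def canonicalize_json_lines(lines: Iterable[str], key_prop: str = "uid") -> List[str]:
    """Return export records in a stable order with sorted keys.

    Assumes each non-empty line is a JSON object with a "type" field and a
    "properties" dict containing the stable id under `key_prop` (hypothetical).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        rec.pop("id", None)  # top-level internal id: unstable across reloads
        records.append(rec)
    # Sort by record type, then by the stable application-level id.
    records.sort(key=lambda r: (r.get("type", ""), str(r.get("properties", {}).get(key_prop, ""))))
    # sort_keys=True makes key order independent of how the export emitted it.
    return [json.dumps(r, sort_keys=True) for r in records]
```

Writing the result back with "\n".join(...) before each commit means two exports of the same data produce an identical file, so git shows only real edits.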

Here are some caveats to think about:

  1. Depending on what kind of changes you are making (prop value changes, deleting nodes, deleting rels, etc), the queries you use to reimport might get tricky. Not terribly tricky, but you just need to consider if you have to delete nodes/rels.
  2. Don't count on using the internal node/rel IDs (you'll see these in the exported JSON file). If you are doing significant node/rel additions or deletions, these can change when you re-add the data to the graph after making your changes.
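One way to keep the reimport manageable, deletes included, is to turn the list of changes into parameterized Cypher and run it in a fixed order: relationship deletes, node deletes, node upserts, relationship upserts. A Python sketch under stated assumptions: every managed node has a hypothetical shared label `Entity`, every node and relationship has a stable `uid` property, labels are not edited, and a relationship whose type or endpoints change is sent as a delete plus an add.

```python
import re
from typing import Any, Dict, Iterable, List, Tuple

Statement = Tuple[str, Dict[str, Any]]
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _rel_type(name: str) -> str:
    # Relationship types cannot be Cypher parameters, so they go into the
    # query text; allow only plain identifiers to avoid query injection.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe relationship type: {name!r}")
    return f"`{name}`"


def build_statements(
    node_upserts: Dict[str, Dict[str, Any]],  # uid -> full property map
    node_deletes: Iterable[str],
    rel_upserts: Dict[str, Dict[str, Any]],   # uid -> {"type", "start", "end", "props"}
    rel_deletes: Iterable[str],
) -> List[Statement]:
    """Translate a change list into (query, params) pairs, in a safe order."""
    stmts: List[Statement] = []
    # 1. Explicit relationship deletions.
    for uid in rel_deletes:
        stmts.append(("MATCH ()-[r {uid: $uid}]->() DELETE r", {"uid": uid}))
    # 2. Node deletions; DETACH DELETE also removes any remaining relationships.
    for uid in node_deletes:
        stmts.append(("MATCH (n:Entity {uid: $uid}) DETACH DELETE n", {"uid": uid}))
    # 3. Node upserts; SET n = $props replaces the whole property map,
    #    so uid is included in props to keep it on the node.
    for uid, props in node_upserts.items():
        stmts.append((
            "MERGE (n:Entity {uid: $uid}) SET n = $props",
            {"uid": uid, "props": {**props, "uid": uid}},
        ))
    # 4. Relationship upserts between endpoints that now exist.
    for uid, rel in rel_upserts.items():
        query = (
            "MATCH (a:Entity {uid: $start}), (b:Entity {uid: $end}) "
            f"MERGE (a)-[r:{_rel_type(rel['type'])} {{uid: $uid}}]->(b) "
            "SET r = $props"
        )
        stmts.append((query, {
            "uid": uid,
            "start": rel["start"],
            "end": rel["end"],
            "props": {**rel.get("props", {}), "uid": uid},
        }))
    return stmts
```

With the official neo4j Python driver, each pair could be passed to tx.run(query, params) inside one write transaction (e.g. via session.execute_write), so the whole change either applies or rolls back.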

Do you think these subgraph changes will follow a consistent pattern that you will use to make graph changes?

In my view, the tricky part really is the "reimport", i.e. the first caveat you mention. It would be nice not to have to do this "manually". We are also looking for a coded process flow that merges the graphs at the click of a button.
As for your second caveat: we use (additional) IDs of our own to identify nodes and relationships, so this probably would not be a problem for us.
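With stable ids of your own, the "merge at the click of a button" step can follow git's three-way merge, which also covers the case where the main graph changed while the user was editing the copy: compare each entity in main and in the edited copy against the snapshot taken when the copy was made. A minimal Python sketch, assuming all three snapshots are dicts keyed by your own ids, that a missing key means the entity is absent (deleted or never existed), and that the base snapshot covers only the copied subgraph, so entities outside it keep main's version. Keeping main's version on conflict is just one policy.

```python
from typing import Any, Dict, List, Tuple

Snapshot = Dict[str, Dict[str, Any]]


def three_way_merge(base: Snapshot, main: Snapshot, branch: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Git-style merge of records keyed by a stable id.

    Returns the merged snapshot and the ids both sides changed in
    different ways (conflicts); conflicting ids keep main's version.
    """
    merged: Snapshot = {}
    conflicts: List[str] = []
    for uid in sorted(set(base) | set(main) | set(branch)):
        b, m, br = base.get(uid), main.get(uid), branch.get(uid)  # None = absent
        if br == b:       # branch left it alone: keep main's state
            result = m
        elif m == b:      # only the branch changed it (edit, add, or delete)
            result = br
        elif m == br:     # both sides made the same change
            result = m
        else:             # both changed it differently: conflict
            conflicts.append(uid)
            result = m
        if result is not None:  # None means the entity ends up deleted
            merged[uid] = result
    return merged, conflicts
```

Returning the conflicts instead of resolving them silently lets the UI show the user which entities someone else changed in the meantime.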

The changes to the subgraph will be more or less consistent: there will be a UI where the user can change the displayed entities (nodes/relationships) but cannot see more than we show him. Hence, the set of possible changes is large but limited.
