Graph algorithms 3.5.13 source code

I noticed that the triangleCount algorithm sometimes throws an ArrayIndexOutOfBoundsException. This seems to be fixed in 3.5.13.0, so I moved to 3.5.13. But then I noticed that the Jaccard similarity algorithm now sometimes throws an ArithmeticException. I was going to look at the code to understand why that happens, but it looks like the GitHub repo does not have the code for 3.5.13; the latest version in the repo is 3.5.4. Does that mean the source code of the newer versions is not open?

Can you create a GH issue for the exceptions you see?

We are working on making the code available again; it's currently undergoing some internal restructuring and modifications.

Thanks and yes @michael.hunger I already did that:

@michael.hunger Are algo.nodeSimilarity and algo.similarity.jaccard using the same libraries behind the scenes? I am thinking that if I use algo.nodeSimilarity instead of algo.similarity.jaccard, I may not get that ArithmeticException anymore.

Hi Shan!

We're in the process of moving the labs code into a product-supported library, which should be released in the next month or two. We're deprecating Jaccard in favor of nodeSimilarity, which uses the same Jaccard similarity scoring function but is a much more performant implementation :slight_smile:

Look for open sourced code in the next few weeks as we get ready for a major release - I'll post on the forums as soon as it's available!
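For readers unfamiliar with the scoring function mentioned above, here is a minimal sketch of Jaccard similarity in plain Python. It is not the library's implementation; the function name `jaccard` and the choice to return 0.0 when both sets are empty are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|, inputs treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both sets are empty, so the ratio would be 0/0.
        # Returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return len(set_a & set_b) / len(union)


# Two items shared out of four distinct items overall.
print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The explicit empty-set check is there only because a naive intersection-over-union would divide by zero in that case.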

Hi Alicia,

Thanks for your reply. I am glad to hear the labs graph algorithms are going to be officially supported. Thanks for letting us know.
Looking forward to the release.

Seyed

Hi Alicia,

I noticed that nodeSimilarity does not support sourceId and targetId whereas jaccardSimilarity does. Is there any workaround for that?

Thanks,
Seyed

@shan - when using a Cypher projection? The syntax is source/target, e.g.:

CALL algo.nodeSimilarity.stream(
     'MATCH (n) WHERE n:Person OR n:ItemType RETURN id(n) as id',
     'MATCH (p:Person)-[:PURCHASED]->(e:Item)-[:INSTANCE_OF]->(m:ItemType) RETURN id(p) as source, id(m) as target',
{graph:'cypher', direction:'outgoing'})

If you're looking for something equivalent to the sourceIds and targetIds parameters, where you could pass a vector specifying which nodes you want to compare, we don't explicitly support that input in nodeSimilarity. You'll want to specify the node labels for source and target either directly or via the Cypher loader.

Hope that helps!

Thanks very much @alicia.frame.
Yes, I am using a Cypher projection, and I meant sourceIds and targetIds.
Just as feedback, the good thing about having those parameters is that sometimes you have a graph, you find similarities between nodes, then add some new nodes/edges to your graph, and now you only want to calculate similarity between the newly added nodes and the old ones. Recalculating all those similarities every time a new node is added could be inefficient if you have a large graph.
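To make the use case above concrete, here is a rough sketch in plain Python of scoring only newly added nodes against the rest of the graph. It is independent of the algo procedures; the names `similarity_for_sources`, `neighbors`, `source_ids`, `target_ids` and `cutoff` are hypothetical, and the graph is assumed to be a small in-memory dict mapping each node id to its set of neighbor ids.

```python
def jaccard(set_a, set_b):
    """Jaccard similarity of two sets; 0.0 if both are empty (a convention)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similarity_for_sources(neighbors, source_ids, target_ids=None, cutoff=0.0):
    """Score only the given sources against the targets (default: all nodes).

    neighbors: dict of node id -> set of neighbor ids (hypothetical layout).
    Returns (source, target, score) tuples with score above the cutoff.
    """
    targets = list(neighbors) if target_ids is None else list(target_ids)
    results = []
    for source in source_ids:
        for target in targets:
            if source == target:
                continue  # skip self-comparison
            score = jaccard(neighbors[source], neighbors[target])
            if score > cutoff:
                results.append((source, target, score))
    return results


# Nodes 1 and 2 already existed; node 3 was just added.
neighbors = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "b", "c"}}
print(similarity_for_sources(neighbors, source_ids=[3]))
```

With k new nodes in a graph of n nodes, this does on the order of k * n comparisons instead of n * n, which is the saving described above.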

Another difference between the new nodeSimilarity and the old jaccardSimilarity: the former adds two edges between every pair of nodes (a-->b and a<--b), whereas the latter was smart enough to add just one. Two similarity edges with the same score that differ only in direction do not carry much extra information.
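Until something like that is available, one possible workaround is to collapse the mirrored pairs after streaming the results. The sketch below assumes the results have already been collected as (source, target, score) tuples; the function name `undirected_pairs` is hypothetical and this is plain Python, not a library call.

```python
def undirected_pairs(rows):
    """Keep one (low_id, high_id, score) row per unordered node pair.

    rows: iterable of (source, target, score) tuples, where a-->b and
    b-->a are expected to carry the same score.
    """
    seen = {}
    for source, target, score in rows:
        key = (min(source, target), max(source, target))
        # Keep the first score seen for this pair; mirrored rows are dropped.
        seen.setdefault(key, score)
    return [(low, high, score) for (low, high), score in seen.items()]


rows = [(1, 2, 0.5), (2, 1, 0.5), (3, 1, 0.25)]
print(undirected_pairs(rows))  # [(1, 2, 0.5), (1, 3, 0.25)]
```

Ordering each pair by node id gives a stable direction for the single edge that remains.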

@shan - thanks for the feedback! I've added it to our backlog so we keep track of it when we talk about enhancements :slightly_smiling_face:

WRT your first question, we just open sourced the code for the Graph Data Science library, ahead of our preview release in February: neo4j/graph-data-science on GitHub (source code for the Neo4j Graph Data Science library of graph algorithms). It's still a work in progress, but if you want to see the underlying code, open issues, etc., this will be the place for it.

That's awesome. Thanks a lot @alicia.frame
Looking forward to its official release :slight_smile: