I noticed that the triangleCount algorithm sometimes throws an ArrayIndexOutOfBoundsException. This seems to be fixed in 3.5.13.0, so I moved to 3.5.13. But now the Jaccard similarity algorithm sometimes throws an ArithmeticException. I wanted to look at the code to understand why that happens, but the GitHub repo does not seem to have the code for 3.5.13; the latest version in the repo is 3.5.4. Does that mean the source code of the newer versions is not open?
@michael.hunger Are algo.nodeSimilarity and algo.similarity.jaccard using the same libraries behind the scenes? I am thinking that if I use algo.nodeSimilarity instead of algo.similarity.jaccard, I may no longer get that ArithmeticException.
We're in the process of moving the Labs code into a product-supported library, which should be released in the next month or two. We're deprecating Jaccard in favor of nodeSimilarity, which uses the Jaccard similarity scoring function but is a much more performant implementation.
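For reference, the Jaccard scoring function is |A ∩ B| / |A ∪ B|, where A and B are the neighbor sets of the two nodes being compared. Below is a minimal Python sketch of that formula, not the library's implementation. The neighbor sets are hypothetical, and returning 0.0 when both sets are empty is a convention assumed here (the ratio is undefined in that case); whether that case is related to the ArithmeticException mentioned above is not established in this thread.

```python
from typing import Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two neighbor sets.

    Assumption: if both sets are empty the union is empty and the ratio
    is undefined, so this sketch returns 0.0 instead of dividing by zero.
    """
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


# Hypothetical neighbor sets: the item types each person has purchased.
alice = {"book", "music", "film"}
bob = {"book", "film", "game"}
print(jaccard(alice, bob))  # 2 shared / 4 distinct = 0.5
```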
Look for the open-sourced code in the next few weeks as we get ready for a major release. I'll post on the forums as soon as it's available!
Thanks for your reply. I am glad to hear the Labs graph algorithms are going to be officially supported. Thanks for letting us know.
Looking forward to the release.
@shan - when using a Cypher projection, the syntax is source/target, e.g.:
CALL algo.nodeSimilarity.stream(
'MATCH(n) WHERE n:Person OR n:ItemType RETURN id(n) as id',
'MATCH (p:Person)-[:PURCHASED]->(e:Item)-[:INSTANCE_OF]->(m:ItemType) RETURN id(p) as source, id(m) as target',
{graph:'cypher', direction:'outgoing'})
If you're looking for something equivalent to the sourceIds and targetIds parameters, where you could pass a list specifying which nodes you want to compare, we don't explicitly support that input in nodeSimilarity. You'll want to specify the node labels for source and target, either directly or via the Cypher loader.
Thanks very much @alicia.frame.
Yes, I am using a Cypher projection, and I meant sourceIds and targetIds.
Just as feedback: the good thing about having those parameters is that you can compute similarities on a graph, then add some new nodes/edges, and afterwards calculate similarity only between the newly added nodes and the old ones. Recalculating all similarities every time a new node is added could be inefficient on a large graph.
Another difference between the new nodeSimilarity and the old Jaccard similarity: the former adds two edges between every pair of nodes (a-->b and a<--b), whereas the latter was smart enough to add just one. Two similarity edges with the same score that differ only in direction don't carry much additional information.
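Because Jaccard is symmetric, a score for (a, b) equals the score for (b, a), so one row per unordered pair is enough. Below is a minimal Python sketch of collapsing such results before writing them back; `dedupe_symmetric` is a hypothetical helper, and the rows are made-up (node, node, score) tuples rather than actual procedure output.

```python
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, Hashable, float]


def dedupe_symmetric(rows: Iterable[Row]) -> List[Row]:
    """Keep one row per unordered pair {a, b}, in first-seen order.

    Since Jaccard is symmetric, (a, b, s) and (b, a, s) carry the same
    score, so the later duplicate is dropped.
    """
    seen = set()
    kept = []
    for a, b, score in rows:
        key = frozenset((a, b))  # order-independent pair key
        if key in seen:
            continue
        seen.add(key)
        kept.append((a, b, score))
    return kept


rows = [(1, 2, 0.5), (2, 1, 0.5), (1, 3, 0.25), (3, 1, 0.25)]
print(dedupe_symmetric(rows))  # [(1, 2, 0.5), (1, 3, 0.25)]
```

An equivalent approach for streamed results is to keep only rows whose first node id is smaller than the second.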