Graph algorithms 3.5.13 source code

shan · December 18, 2019, 9:33pm

I noticed that the triangleCount algorithm sometimes throws ArrayIndexOutOfBoundsException. This seems to be fixed in 3.5.13.0, so I moved to 3.5.13. But then I noticed that the Jaccard similarity algorithm now sometimes throws ArithmeticException. I was going to look at the code to understand why that happens, but it looks like the GitHub repo does not have the code for 3.5.13. The latest version in the repo is 3.5.4. Does that mean the source code of the newer versions is not open?

michael.hunger · December 19, 2019, 9:48am

Can you create a GH issue for the exceptions you see?

We are working on making the code available again; it's currently undergoing some internal restructuring / modifications.

shan · December 19, 2019, 3:21pm

Thanks and yes @michael.hunger I already did that:

GitHub issue (github.com/neo4j/neo4j): algo.similarity.jaccard throws ArithmeticException

Opened 04:29PM - 18 Dec 19 UTC by seyeda, closed 10:00PM - 23 Dec 19 UTC, label: bug

**Server spec:**
- Neo4j version: 3.5.12
- Graph algorithms version: 3.5.12.0 or 3.5.13.0
- Operating system: macOS Sierra

**Steps to reproduce:**
1. Install neo4j with the above spec. Don't forget to add the graph algorithms jar file.
2. Run the following Cypher statements to create a small sample graph:
```
merge (a:Test {name:"a"})
merge (t1:Test {name:"t1"})
merge (t2:Test {name:"t2"})
merge (t3:Test {name:"t3"})
merge (t4:Test {name:"t4"})
merge (t5:Test {name:"t5"})
merge (t6:Test {name:"t6"})
merge (t7:Test {name:"t7"})
merge (t8:Test {name:"t8"})
merge (t9:Test {name:"t9"})
merge (t10:Test {name:"t10"})
merge (t11:Test {name:"t11"})
merge (t12:Test {name:"t12"})
merge (t13:Test {name:"t13"})
merge (t14:Test {name:"t14"})
merge (t15:Test {name:"t15"})
merge (t16:Test {name:"t16"})
merge (t17:Test {name:"t17"})
merge (t18:Test {name:"t18"})
merge (t19:Test {name:"t19"})
merge (t20:Test {name:"t20"})
merge (t21:Test {name:"t21"})
merge (t22:Test {name:"t22"})
merge (t23:Test {name:"t23"})
merge (t24:Test {name:"t24"})
merge (t25:Test {name:"t25"})
merge (t26:Test {name:"t26"})
merge (a)-[:CONNECTED_TO]->(t1)
merge (a)-[:CONNECTED_TO]->(t2)
merge (a)-[:CONNECTED_TO]->(t3)
merge (a)-[:CONNECTED_TO]->(t4)
merge (a)-[:CONNECTED_TO]->(t5)
merge (a)-[:CONNECTED_TO]->(t6)
merge (a)-[:CONNECTED_TO]->(t7)
merge (a)-[:CONNECTED_TO]->(t8)
merge (a)-[:CONNECTED_TO]->(t9)
merge (a)-[:CONNECTED_TO]->(t10)
merge (a)-[:CONNECTED_TO]->(t11)
merge (a)-[:CONNECTED_TO]->(t12)
merge (a)-[:CONNECTED_TO]->(t13)
merge (a)-[:CONNECTED_TO]->(t14)
merge (a)-[:CONNECTED_TO]->(t15)
merge (a)-[:CONNECTED_TO]->(t16)
merge (a)-[:CONNECTED_TO]->(t17)
merge (a)-[:CONNECTED_TO]->(t18)
merge (a)-[:CONNECTED_TO]->(t19)
merge (a)-[:CONNECTED_TO]->(t20)
merge (a)-[:CONNECTED_TO]->(t21)
merge (a)-[:CONNECTED_TO]->(t22)
merge (a)-[:CONNECTED_TO]->(t23)
merge (a)-[:CONNECTED_TO]->(t24)
merge (a)-[:CONNECTED_TO]->(t25)
merge (a)-[:CONNECTED_TO]->(t26)
merge (b:Test {name:"b"})
merge (b)-[:CONNECTED_TO]->(t1)
```
3. Find jaccard similarity between nodes `a` and `b`:
```
match (a:Test {name:"a"})-[ie:CONNECTED_TO]->(t:Test)
with [{item:id(a), categories: collect(distinct id(t))}] as source_data,
     collect(distinct id(a)) as source_id
match (b:Test {name:"b"})-[ie:CONNECTED_TO]->(t:Test)
with [{item:id(b), categories: collect(distinct id(t))}] as target_data,
     source_data, source_id, collect(distinct id(b)) as target_id
CALL algo.similarity.jaccard(source_data+target_data, {
  similarityCutoff:0.01, sourceIds:source_id, targetIds: target_id,
  write:true, writeRelationshipType:'SIMILAR', writeProperty: 'jaccardSimilarity'})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
return nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
```

**Expected behaviour:** Returns the jaccard similarity between the two nodes and adds an edge of type `SIMILAR` between them.

**Actual behaviour:** Throws an exception and shows the following error message: `Failed to invoke procedure algo.similarity.jaccard: Caused by: java.lang.ArithmeticException: / by zero`

**Further explanation:** I noticed that if I set the `similarityCutoff` threshold to zero or `write` to false, it works. If I use `algo.similarity.jaccard.stream`, it also works. It looks like it fails when it needs to add a new edge to the graph. If I use version 3.5.11 of the graph algorithms library, it seems to work.

* As a side note, I noticed that if I set the `similarityCutoff` to zero, it never writes the result back to the graph. This is not documented in the neo4j manual. In fact, there are some examples in the manual for `algo.similarity.jaccard.stream` where `similarityCutoff:0.0`, which makes you think that you can do the same with `algo.similarity.jaccard`.
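
For reference, the score this sample graph should produce can be checked by hand. The sketch below is plain Python, not the library's implementation: `jaccard` is a hypothetical helper, and returning 0.0 for two empty sets is an assumption made here for illustration. It uses the neighbour sets from the steps above (`a` points to `t1`..`t26`, `b` points to `t1`).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union.

    Returning 0.0 when both sets are empty is an assumption of this sketch,
    not documented library behaviour.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Neighbour sets taken from the sample graph in the issue.
a_targets = {f"t{i}" for i in range(1, 27)}  # a -> t1 .. t26
b_targets = {"t1"}                           # b -> t1

score = jaccard(a_targets, b_targets)
print(round(score, 4))  # 0.0385, i.e. 1/26
```

The shared neighbour is `t1` and the union has 26 members, so the expected score is 1/26, which is above the `similarityCutoff` of 0.01 used in the query.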

shan · December 19, 2019, 3:38pm

@michael.hunger Are algo.nodeSimilarity and algo.similarity.jaccard using the same libraries behind the scenes? I am thinking that if I use algo.nodeSimilarity instead of algo.similarity.jaccard, it may not give me that ArithmeticException anymore.

alicia.frame · December 20, 2019, 9:38am

Hi Shan!

We're in the process of moving the labs code into a product-supported library, which should be released in the next month or two. We're deprecating Jaccard in favor of nodeSimilarity, which uses the Jaccard similarity scoring function but is a much more performant implementation.

Look for open sourced code in the next few weeks as we get ready for a major release - I'll post on the forums as soon as it's available!

shan · December 20, 2019, 3:34pm

Hi Alicia,

Thanks for your reply. I am glad to hear lab graph algorithms are going to be officially supported. Thanks for letting us know.
Looking forward to the release.

Seyed

shan · January 21, 2020, 6:59pm

Hi Alicia,

I noticed that nodeSimilarity does not support sourceId and targetId whereas jaccardSimilarity does. Is there any workaround for that?

Thanks,
Seyed

alicia.frame · January 22, 2020, 7:04pm

@shan - when using a cypher projection? The syntax is source/target, eg:

CALL algo.nodeSimilarity.stream(
     'MATCH (n) WHERE n:Person OR n:ItemType RETURN id(n) AS id',
     'MATCH (p:Person)-[:PURCHASED]->(e:Item)-[:INSTANCE_OF]->(m:ItemType) RETURN id(p) AS source, id(m) AS target',
     {graph:'cypher', direction:'outgoing'})

If you're looking for something equivalent to the sourceIds and targetIds parameters, where you could pass a vector specifying which nodes you want to compare, we don't explicitly support that input in nodeSimilarity. You'll want to specify the node labels for source and target either directly or via the cypher loader.

Hope that helps!

shan · January 23, 2020, 3:50pm

Thanks very much @alicia.frame.
Yes I am using cypher projection and I meant sourceIds and targetIds.
Just as feedback, the good thing about having those parameters is that sometimes you have a graph, you find the similarity between nodes, then add some new nodes/edges to your graph, and now you only want to calculate the similarity between the newly added nodes and the old ones. Recalculating all those similarities every time a new node is added could be inefficient if you have a large graph.
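
To make the use case concrete, here is a minimal Python sketch of what restricting the comparison to chosen source and target nodes amounts to. It is not the library's code: `similarity_between`, `categories`, `source_ids` and `target_ids` are hypothetical names chosen to mirror the parameters discussed above, and the node ids are made up.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; 0.0 for two empty sets (an assumption of this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_between(categories, source_ids, target_ids, cutoff=0.0):
    """Score only source x target pairs instead of all pairs in the graph.

    categories: dict mapping node id -> set of neighbour ids.
    Returns (source, target, score) tuples with score >= cutoff.
    """
    results = []
    for s, t in product(source_ids, target_ids):
        if s == t:
            continue  # skip comparing a node with itself
        score = jaccard(categories[s], categories[t])
        if score >= cutoff:
            results.append((s, t, score))
    return results


# Nodes 1 and 2 already existed; node 3 was added later.
categories = {1: {10, 11}, 2: {10}, 3: {11, 12}}
new_pairs = similarity_between(categories, source_ids=[3], target_ids=[1, 2], cutoff=0.1)
print(new_pairs)
```

Only the two pairs involving the new node are scored here, rather than every pair in the graph.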

As another difference between the new nodeSimilarity and the old jaccardSimilarity, the former adds two edges between every pair of nodes (a-->b and a<--b) whereas the latter was smart enough to just add one edge. Adding two same similarity edges with the same score that are different only in their directions does not carry that much information.
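
As a possible client-side workaround, the mirrored results could be collapsed before writing anything back. The Python sketch below assumes the streamed rows arrive as `(source, target, score)` tuples and that the score is symmetric; `undirected_pairs` is a hypothetical helper, not part of the library.

```python
def undirected_pairs(directed):
    """Collapse mirrored (a, b) / (b, a) results into one entry per node pair.

    Assumes the similarity score is symmetric, so either direction carries
    the same value. The smaller node id is always placed first.
    """
    unique = {}
    for source, target, score in directed:
        key = (min(source, target), max(source, target))
        unique[key] = score
    return [(a, b, s) for (a, b), s in sorted(unique.items())]


rows = [(1, 2, 0.5), (2, 1, 0.5), (1, 3, 0.2)]
print(undirected_pairs(rows))  # [(1, 2, 0.5), (1, 3, 0.2)]
```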

alicia.frame · January 24, 2020, 5:21pm

@shan - thanks for the feedback! I've added it to our backlog so we keep track of it when we talk about enhancements

WRT your first question, we just open sourced the code for the graph data science library, ahead of our preview release in February: neo4j/graph-data-science on GitHub (source code for the Neo4j Graph Data Science library of graph algorithms). It's still a work in progress, but if you want to see the underlying code or open issues etc., this will be the place for it.

shan · January 24, 2020, 6:30pm

That's awesome. Thanks a lot @alicia.frame
Looking forward to its official release.
