How to find the similarity between nodes that are connected through multiple node types?

So I'm using the "PIMA INDIANS DIABETES DATASET", and I made these node types:

  1. Person {age, id}
  2. BMI {bmi level}
  3. OUTCOME {outcome}
  4. Blood Pressure {blood pressure level}, and so on...

I want to find the similarity between all persons whose age is between 21 and 25 and who have been diagnosed with diabetes.
I want the answer to look something like this:
BMI similarity: 0.82
BP similarity: 0.67
I have looked through all the graph algorithms but didn't find anything relevant.
Can we achieve this using Neo4j?

P.S. All the examples I have seen compute similarity over a single relationship type.
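One possible reading of "BMI similarity" and "BP similarity" for a group is: restrict the graph to a single relationship type at a time and average the pairwise similarity of the selected persons over that type. The sketch below assumes that interpretation; the toy data, field names and the choice of averaging Jaccard scores are all hypothetical, not something Neo4j returns out of the box. Because each person has exactly one value per attribute, each pairwise Jaccard score here is either 1 or 0, so the average is the fraction of pairs that share that value.

```python
from itertools import combinations

# Hypothetical toy data: one value per attribute, as reached via one
# relationship type each (e.g. a HAS_BMI or HAS_BP relationship).
people = {
    "p1": {"age": 22, "outcome": 1, "BMI": "obese", "BP": "high"},
    "p2": {"age": 24, "outcome": 1, "BMI": "obese", "BP": "high"},
    "p3": {"age": 25, "outcome": 1, "BMI": "obese", "BP": "normal"},
    "p4": {"age": 40, "outcome": 1, "BMI": "normal", "BP": "normal"},  # too old
    "p5": {"age": 23, "outcome": 0, "BMI": "normal", "BP": "high"},    # no diabetes
}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two neighbor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_similarity(group: dict, attribute: str) -> float:
    """Average pairwise Jaccard score using only one attribute's neighbors."""
    pairs = list(combinations(group.values(), 2))
    if not pairs:
        return 0.0
    scores = [jaccard({a[attribute]}, {b[attribute]}) for a, b in pairs]
    return sum(scores) / len(scores)


# Select persons aged 21-25 who were diagnosed with diabetes.
group = {pid: p for pid, p in people.items()
         if 21 <= p["age"] <= 25 and p["outcome"] == 1}

for attr in ("BMI", "BP"):
    print(f"{attr} similarity: {attribute_similarity(group, attr):.2f}")
```

With this toy data all three selected persons share the same BMI level, so BMI similarity is 1.00, while only one of the three pairs shares a BP level, giving 0.33.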

Welcome to the community. Can you describe in more detail what you mean by similarity, or give a link to an example you have considered?

OK, so I have a dataset that contains the following columns:

1. ID
2. AGE
3. BMI
4. BP
5. INSULIN
6. OUTCOME

I took ID as a node and added age as its property.
Then I made separate nodes for all the other columns, like BMI, BP, INSULIN, etc.
I created relationships so that each ID node is connected to its BMI, BP, INSULIN, etc. value nodes.

Now my query is this:
"Find the mean BMI of all persons whose age is <= 25."

Is creating a dedicated node set for each column the most efficient way?

Our Node Similarity algorithm calculates the similarity of nodes based on their neighboring nodes. Think of a (:Person)-[:LIKES]->(:Instrument) graph: we measure how similar two Person nodes are based on the number of Instruments they both like versus the number of Instruments only one of them likes.
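As I understand it, Node Similarity scores a pair by the Jaccard similarity of their neighbor sets: shared neighbors divided by all distinct neighbors of either node. A minimal plain-Python sketch of that idea on a hypothetical (:Person)-[:LIKES]->(:Instrument) adjacency; it is not the library's implementation, which adds options such as degree cutoffs and topK.

```python
from itertools import combinations

# Hypothetical (:Person)-[:LIKES]->(:Instrument) data as adjacency sets.
likes = {
    "Alice": {"guitar", "piano"},
    "Bob": {"guitar", "keyboard"},
    "Carol": {"drums"},
}


def jaccard(a: set, b: set) -> float:
    """Shared neighbors divided by all distinct neighbors of either node."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_similarity(adjacency: dict) -> list:
    """Jaccard score for every pair of source nodes, highest first."""
    results = []
    for (n1, s1), (n2, s2) in combinations(adjacency.items(), 2):
        score = jaccard(s1, s2)
        if score > 0:  # pairs with no shared neighbor are dropped in this sketch
            results.append((n1, n2, score))
    return sorted(results, key=lambda r: r[2], reverse=True)


print(node_similarity(likes))
```

Alice and Bob share one instrument out of three distinct ones, so their Jaccard score is 1/3; Carol shares nothing with either and is left out.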

If you wanted to use that algorithm, you would need to turn the things you want to measure similarity on (e.g. outcomes) into nodes. If you have a schema where Person is a node label with age and id properties, and Outcome is a node label with a description property, you could use nodeSimilarity like this:

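CALL algo.nodeSimilarity.stream(
     'MATCH (n) WHERE (n:Person AND n.age < 25) OR n:Outcome RETURN id(n) AS id',
     'MATCH (p:Person)-[:HAS_OUTCOME]->(o:Outcome) WHERE p.age < 25 RETURN id(p) AS source, id(o) AS target',
     {graph: 'cypher'})
YIELD node1, node2, similarity
RETURN node1, node2, similarity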

In your reply to @nsmith_piano, you're asking about a mean value. Check out our documentation on aggregating functions here: Aggregating functions - Cypher Manual.
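A mean like this is a plain aggregation rather than a graph algorithm. In Cypher it would be something along the lines of `MATCH (p:Person)-[:HAS_BMI]->(b:BMI) WHERE p.age <= 25 RETURN avg(b.value)`, where `HAS_BMI` and `b.value` are hypothetical names for this schema. The Python sketch below only illustrates what that aggregation computes over rows of the same (hypothetical) shape.

```python
from statistics import mean

# Hypothetical rows, shaped like the result of
# MATCH (p:Person)-[:HAS_BMI]->(b:BMI) RETURN p.age AS age, b.value AS bmi
rows = [
    {"age": 21, "bmi": 28.1},
    {"age": 25, "bmi": 33.6},
    {"age": 31, "bmi": 26.6},
    {"age": 24, "bmi": 23.3},
]


def mean_bmi(rows, max_age=25):
    """Mean BMI over persons aged <= max_age; None if nobody matches."""
    values = [r["bmi"] for r in rows if r["age"] <= max_age]
    return mean(values) if values else None


print(round(mean_bmi(rows), 2))
```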


Thanks for the info. As you mentioned, node similarity calculates similarity over only one type of relationship, "LIKES" in your example: "A likes guitar and piano", "B likes keyboard and guitar", so they are 50% similar. What I want is: "A likes guitar and lives in London", "B likes guitar and lives in Mumbai", so A and B are 50% similar because they like the same instrument but live in different places. I know we can do this by measuring similarity over the "LIKES" relationship once and then over "LIVES" once. But what if I want to compare using two relationships at the same time? By the way, sorry if I framed the question wrong; I was just confused.

You can combine multiple node and relationship types for the purpose of running an algorithm, either by pre-loading a named graph (see section 2.3.4, loading multiple relationship types and node labels) or by using a Cypher projection that references the nodes and relationships you want to consider.

For the musical instrument example, if we add a Place node and a LIVES_IN relationship, you could use a Cypher projection like this:

CALL algo.nodeSimilarity.stream(
     'MATCH (n) WHERE n:Person OR n:Instrument OR n:Place RETURN id(n) AS id',
     'MATCH (s:Person)-[]->(t) RETURN id(s) AS source, id(t) AS target',
     {graph: 'cypher', direction: 'outgoing'})
YIELD node1, node2, similarity
RETURN node1, node2, similarity
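A projection like this gives each Person one combined neighbor set (instruments and places together), so a single Jaccard score covers both relationship types. That is not quite the same as averaging per-type scores. Take A, who likes guitar and lives in London, and B, who likes guitar and lives in Mumbai: the combined set gives 1 shared neighbor out of 3 distinct ones (about 0.33), while scoring each type separately and averaging gives 0.5. The sketch below, with hypothetical data, computes both so the difference is visible. Which one fits depends on what "similar" should mean for your use case.

```python
# Hypothetical data: each Person's outgoing neighbors, split by relationship type.
neighbors = {
    "A": {"LIKES": {"guitar"}, "LIVES_IN": {"London"}},
    "B": {"LIKES": {"guitar"}, "LIVES_IN": {"Mumbai"}},
}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(n1: dict, n2: dict) -> float:
    """One Jaccard score over all neighbors, whatever the relationship type
    (roughly what a projection returning every outgoing edge produces)."""
    all1 = set().union(*n1.values())
    all2 = set().union(*n2.values())
    return jaccard(all1, all2)


def per_type_average(n1: dict, n2: dict) -> float:
    """Alternative: score each relationship type separately, then average."""
    types = set(n1) | set(n2)
    scores = [jaccard(n1.get(t, set()), n2.get(t, set())) for t in types]
    return sum(scores) / len(scores)


a, b = neighbors["A"], neighbors["B"]
print(combined_similarity(a, b))  # 1 shared out of 3 distinct neighbors
print(per_type_average(a, b))     # (1.0 for LIKES + 0.0 for LIVES_IN) / 2
```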

Solved my issue. Thanks a lot! 🙂


Hey Alicia, great solution!
How can we return the node label instead of node id?

You can use the asNode function: YIELD the node IDs, then pass them to algo.asNode to access labels and properties. For example:

CALL algo.nodeSimilarity.stream('Person | Instrument', 'LIKES', {
  direction: 'OUTGOING'
})
YIELD node1, node2, similarity
RETURN algo.asNode(node1).name AS Person1, algo.asNode(node2).name AS Person2, similarity
ORDER BY similarity DESCENDING, Person1, Person2

I have a similar question.
How can we apply node similarity based on a relationship property value?
I have a graph in which stock names are nodes and dates are nodes,
and a price relationship links each stock to a date.
How can I apply node similarity to compare different stocks?
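Node Similarity only looks at which neighbors two nodes share, not at property values on the relationships, so every stock linked to the same dates would look identical. One alternative, assumed here rather than taken from this thread, is to build a price vector per stock over the dates both stocks share and compare those vectors with a correlation or cosine measure (GDS's similarity functions include Pearson and cosine). A minimal plain-Python sketch with hypothetical data, using Pearson correlation:

```python
from math import sqrt

# Hypothetical prices: stock -> {date: price}, i.e. the value stored on each
# (:Stock)-[:PRICE]->(:Date) relationship (labels and type are illustrative).
prices = {
    "AAA": {"2020-01-01": 10.0, "2020-01-02": 11.0, "2020-01-03": 12.0},
    "BBB": {"2020-01-01": 20.0, "2020-01-02": 22.0, "2020-01-03": 24.0},
    "CCC": {"2020-01-01": 5.0, "2020-01-02": 4.0, "2020-01-03": 3.0},
}


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def stock_similarity(p1: dict, p2: dict) -> float:
    """Correlate two stocks' prices over the dates they have in common."""
    dates = sorted(set(p1) & set(p2))
    if len(dates) < 2:
        return 0.0
    return pearson([p1[d] for d in dates], [p2[d] for d in dates])


print(stock_similarity(prices["AAA"], prices["BBB"]))
print(stock_similarity(prices["AAA"], prices["CCC"]))
```

AAA and BBB move together, so their score comes out at (or within floating-point error of) 1.0; CCC moves in the opposite direction, giving roughly -1.0.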


Hi Alicia,

I think nodeSimilarity is now deprecated. I tried to run this Cypher projection with Jaccard similarity, but I get an error: "Procedure call does not provide the required number of arguments: got 3 expected 2."

@mangesh.karangutkar Node Similarity has not been deprecated: Node Similarity - Neo4j Graph Data Science

The error message you received from jaccard indicates that you've provided more inputs than it expects. The jaccard function expects a pair of inputs (the two nodes being compared, each given as a list); perhaps that's the issue. I would look at the docs for more information on the syntax: Similarity functions - Neo4j Graph Data Science

Could you please help me? How can I apply a node similarity algorithm to nodes that have no relationships?

Thank you.

In that case, nodes can just be considered as classes, the way we treat them in OOP. You can write your own algorithm for finding or comparing similarities between two classes/nodes.

But then that's your design, and you need to tailor the algorithm to your needs. If you need more help, you need to be more specific about what exactly you want.

Thanks
Sameer

Hi Alicia! I have done this with a similar structure, but the algorithm takes too long. I'm using four node labels (Client, plus nodes for their data: income range, age, business line, etc.) and three relationship types in a named graph (using GDS). What could be happening?

If you don't have any relationships, you'd need to calculate similarity on node properties: check out KNN or Cosine Similarity. Those can create relationships between nodes that have similar properties but no relationships between them.
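To show the idea behind property-based similarity, here is a brute-force k-nearest-neighbors sketch in plain Python. Each node's properties form a vector, every pair is compared with cosine similarity, and each node keeps its top k matches as "similar" edges, which is the kind of relationship such an algorithm can write back. The property values are hypothetical, and my understanding is that GDS's KNN uses an approximate method rather than comparing all pairs like this.

```python
from math import sqrt

# Hypothetical node properties (e.g. normalized age and income); no relationships.
props = {
    "c1": [0.20, 0.90],
    "c2": [0.25, 0.85],
    "c3": [0.90, 0.10],
    "c4": [0.80, 0.20],
}


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two property vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn(props: dict, k: int = 1) -> list:
    """Brute-force k-nearest neighbors by cosine similarity.

    Returns (node, neighbor, score) triples: the similarity relationships
    that could be written back to the graph."""
    edges = []
    for node, vec in props.items():
        scored = [(other, cosine(vec, ov))
                  for other, ov in props.items() if other != node]
        scored.sort(key=lambda t: t[1], reverse=True)
        edges.extend((node, other, score) for other, score in scored[:k])
    return edges


for edge in knn(props):
    print(edge)
```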

Can you share your code? And how many nodes / relationships are in your graph?

Of course! Thank you!

Here I create the named graph and execute the algorithm:

// Client job graph
CALL gds.graph.create("client-job-graph", ["Client", "BusinessLine", "EconomicActivity","MonthlyIncome"],
["HAS_BUSINESS_LINE", "HAS_ACTIVITY", "HAS_MONTHLY_INCOME"]) YIELD nodeCount, relationshipCount;

CALL gds.nodeSimilarity.write("client-job-graph", {
    writeRelationshipType: "SIMILAR_J",
    writeProperty: "score_j",
    degreeCutoff: 3,
    topK: 5
})

In the named graph there are 528,739 nodes and 1,586,139 relationships. Almost all of the nodes are Client nodes, since the other ones are essentially categories.

Does it execute too slowly? Or not at all? Usually the first thing I recommend is adding a degree cutoff and setting topK, but you've done that already.

You can take a peek at the debug logs to check on progress: as Node Similarity runs, it prints the percentage of each stage that's complete.

One thing you can try is to first run WCC on your client-job-graph and then run node similarity on the individual components: this breaks the problem up and can make it much faster.
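To illustrate the WCC-first approach, the sketch below finds weakly connected components with a union-find structure and groups nodes by component, so that the pairwise similarity step only has to run within each group. The edges are hypothetical. Note that this only helps if the graph really splits into several components: if a handful of category nodes (for example, a few MonthlyIncome ranges) connect almost every client, WCC may return one giant component.

```python
from collections import defaultdict


def wcc(edges: list) -> dict:
    """Weakly connected components via union-find; returns node -> root id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:  # edge direction is ignored, hence "weakly"
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return {node: find(node) for node in parent}


# Hypothetical Client -> category edges.
edges = [
    ("client1", "lineA"), ("client2", "lineA"),
    ("client3", "lineB"), ("client4", "lineB"),
]

components = defaultdict(list)
for node, root in wcc(edges).items():
    components[root].append(node)

# Each component can now be handled on its own, e.g. by running the
# pairwise similarity only over the clients inside it.
for root, members in components.items():
    print(root, sorted(members))
```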

It executes slowly: it does finish, but only after an hour to an hour and a half. I will try your suggestions and comment on the results. Thanks a lot, Alicia!