Weighted Personalised PageRank

My work is related to creating an autonomous decision model for cyber counterintelligence.
In short, I have "attack" nodes connected to countermeasure nodes with a weighted countermeasure_success relationship. Multiple network connections will perform the attack, and successful countermeasures will be linked to a common countermeasure node.

I have created a simple native graph projection with the relationship property set to 'weight', and I can also see that this information is available through streamRelationshipProperty.

Anyone got any experience with this algorithm?

I tried to play around in NEuler as well, and the issue with weight exists there too. Maybe there are some prerequisites I'm missing in the graph data.

Hello @bjorge_eikenes and welcome to the Neo4j community :slight_smile:

Can we see your queries (projection and PageRank query)?

Regards,
Cobra

Sure, I think that will be OK. What is the easiest way to export the relevant subset of my data as Cypher?

What problems are you having with the weights in NEuler? If there's a bug there I will try to fix it!

subgraph-community-2020-11-25.cypher.txt (6 KB)

My graph projection would look something like this:

call gds.graph.create('threat', 
  ['SyntheticRequest','SyntheticCountermeasure'],
  'COUNTERMEASURE_SUCCESS',
  {relationshipProperties:['weight']}
)

My personalised PageRank would look something like this:

MATCH (attack:SyntheticRequest)-[:COUNTERMEASURE_SUCCESS]->(c:SyntheticCountermeasure) WHERE attack.url contains 'attack1'
call gds.pageRank.stream('threat', {
  sourceNodes: [c],
  relationshipWeightProperty: 'weight'
})
YIELD nodeId, score
WHERE score > 0
RETURN nodeId,gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC

I uploaded my graph data as well as my graph projection and personalised PageRank call.
Could be my lack of understanding since I'm still new to this :slight_smile:

You'd need to change the query that collects the source nodes to read like this:

MATCH (attack:SyntheticRequest)-[:COUNTERMEASURE_SUCCESS]->(c:SyntheticCountermeasure) 
WHERE attack.url contains 'attack1'
WITH collect(c) AS sourceNodes
call gds.pageRank.stream('threat', {
  sourceNodes: sourceNodes,
  relationshipWeightProperty: 'weight'
})
YIELD nodeId, score
WHERE score > 0
RETURN nodeId,gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC

What I discovered in NEuler was that the weighted PageRank code for a named graph had relationshipWeightProperty in it; for a named graph, I believe it is only used in the call to the gds.pageRank functions. (It is still valid for anonymous graphs, though.)

When I tried to use "WITH collect(c) AS sourceNodes", I ended up with countermeasures 3 and 5 at the same rank for attack1. When I tried "WITH collect(attack)+collect(c) AS sourceNodes", I got different ranks for the two, but then the Request nodes were ranked as well.

I found out that I could just add another condition to the WHERE score > 0 clause.
Adding gds.util.asNode(nodeId):SyntheticCountermeasure returns only the nodes I'm looking for :smiley:

(Ignore this if you already understand it, but just in case...)

When you don't provide any source nodes, every node in the threat graph starts with a score of 0.15. Those scores are then dispersed to neighbours over the iterations of the algorithm.

When you provide source nodes, only the source nodes in the threat graph start with a score of 0.15. And then the same dispersing process happens.

What it means is that when you provide source nodes, the nodes that are close to those source nodes will likely end up with higher (relative) scores than they otherwise would have.

So it is expected that you should get different scores (and maybe ranks) for nodes when you vary the source nodes.
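To make the description above concrete, here is a minimal Python sketch of that dispersing process. It is a simplified model, not the GDS implementation: the function name, the edge-list format, and the example node names are made up for illustration, and it assumes that relationship weights are normalised by the total outgoing weight of each node.

```python
from collections import defaultdict

def personalized_pagerank(edges, source_nodes=None, damping=0.85, iterations=20):
    """Illustrative sketch only. edges is a list of (source, target, weight) tuples."""
    nodes = set()
    out_weight = defaultdict(float)
    for src, dst, w in edges:
        nodes.update((src, dst))
        out_weight[src] += w

    # Base score of 1 - damping (0.15): for every node when no source nodes
    # are given, otherwise only for the source nodes.
    sources = nodes if source_nodes is None else set(source_nodes)
    base = {n: (1.0 - damping) if n in sources else 0.0 for n in nodes}

    scores = dict(base)
    for _ in range(iterations):
        incoming = defaultdict(float)
        for src, dst, w in edges:
            if out_weight[src] > 0:
                # Assumption: each node splits its score across its outgoing
                # relationships in proportion to their weights.
                incoming[dst] += scores[src] * w / out_weight[src]
        scores = {n: base[n] + damping * incoming[n] for n in nodes}
    return scores

# Hypothetical toy graph: attack1 reaches two countermeasures, attack2 one.
edges = [
    ("attack1", "cm3", 1.0),
    ("attack1", "cm5", 3.0),
    ("attack2", "cm1", 1.0),
]
personalised = personalized_pagerank(edges, source_nodes=["attack1"])
unpersonalised = personalized_pagerank(edges)
```

In this toy graph, cm1 ends up with a score of zero in the personalised run because nothing flows to it from attack1, while it gets a positive score in the run without source nodes. Under the normalisation assumption, a node with only one outgoing relationship passes on the same amount whatever that relationship's weight is, so weights only change the result where a node has several outgoing relationships to split its score between.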

Are there other "recommender algorithms" that are relevant for my use case?

It looks like I'm still doing something wrong, because I get the same results with "relationshipWeightProperty: 'weight'" set and without it.

I have started a new synthetic database in order to explore this algorithm further.
At the moment I'm trying to understand the impact of tweaking different settings, as well as the impact of having a single versus multiple "successful" relationships to a countermeasure node, and of having multiple "successful" relationships with different weights.
This is the full dump of my database in json format: synthetic-threat-2020-11-29.json.txt. (4.9 KB)

To explain my steps in my testing:

First, I create a graph projection of all nodes and relationships, including the relationship property 'weight'.

Second, I run the following personalised PageRank algorithm:

MATCH (attack:Request) WHERE attack.url contains 'attack1'
WITH collect(attack) as sourceNodes
CALL gds.pageRank.stream('full',{sourceNodes: sourceNodes,relationshipWeightProperty:'weight'}) YIELD nodeId,score
WHERE gds.util.asNode(nodeId):Countermeasure
RETURN gds.util.asNode(nodeId).name,score
ORDER BY score DESC

The countermeasure with a single successful relationship gets the lower rank, but my two countermeasures that each have two successful relationships with different weights get the same rank.
At the moment I do not understand why.