I am testing graph embedding algorithms, in particular fastRP. I split all nodes in my graph into train and test sets, and then evaluate the prediction capability of fastRP/fastRPExtended. I am comparing performance across 3 categories:
- bertEmbedding: I created a BERT embedding vector from the 'name' property and use it as the single feature
- fastRP: I created a fastRP embedding vector and use it as the single feature
- fastRPExtended: I created a fastRPExtended embedding vector that incorporates the 'bertEmbedding' vector as a node feature property, and use it as the single feature
In all cases, I am running the classification algorithm:
CALL gds.alpha.ml.nodeClassification.train
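with a configuration along these lines (the model name, target property, metric, and model parameters shown here are illustrative placeholders, not my exact values):

```cypher
CALL gds.alpha.ml.nodeClassification.train('nodeGraph', {
  modelName: 'embedding-classifier',      // hypothetical model name
  featureProperties: ['graphEmbedding'],  // swapped per category being tested
  targetProperty: 'class',                // placeholder target label property
  metrics: ['F1_WEIGHTED'],
  holdoutFraction: 0.2,
  validationFolds: 5,
  randomSeed: 42,
  params: [{penalty: 0.0625}, {penalty: 0.5}]
})
YIELD modelInfo
```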
The results are below:
- fastRP is the worst, at about 72% F1
- fastRPExtended (using bertEmbedding as a feature property) is better, at about 75% F1
- bertEmbedding alone is the best, at about 90% F1
It seems that the structure of my graph doesn't help much, while the BERT embedding alone is far better than fastRP or fastRPExtended. Since fastRPExtended takes advantage of the bertEmbedding, I didn't expect it to be significantly worse than bertEmbedding alone. I suspect this may be due to my parameter settings when training it; these are my settings:
CALL gds.beta.fastRPExtended.write(
  'nodeGraph',
  {
    embeddingDimension: 512,
    iterationWeights: [0.0, 1.0, 1.0, 1.0],
    normalizationStrength: 0,
    propertyDimension: 96,
    featureProperties: ['bertEmbedding'],
    writeProperty: 'graphEmbedding'
  }
)
YIELD nodePropertiesWritten
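For reference, fastRPExtended can only read 'bertEmbedding' if that property was projected into the in-memory graph; my projection looks roughly like this (the node label and relationship type here are placeholders for my actual schema):

```cypher
CALL gds.graph.create(
  'nodeGraph',
  { Node: { properties: ['bertEmbedding'] } },  // placeholder label; the property must be projected
  '*'                                           // placeholder: all relationship types
)
YIELD graphName, nodeCount
```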
Graph embedding authors, any suggestions on tuning the parameters? The BERT embedding is 768-dimensional in its standard form.