The docs for the Louvain algorithm mention that when doing the seeded approach, "the algorithm will try to keep the seeded community IDs", which is very vague. And after doing some testing, I did confirm that seeded community IDs can be overwritten - i.e. they are not locked-in. My questions are:
Are there additional resources where I can learn more about what specifically happens when doing the seeded approach with the Louvain algorithm? That is, what does it mean that it will "try to keep the seeded community IDs"? I've done a lot of searching and can't seem to find any deeper information about it.
If we know certain nodes belong in the same community and therefore want to seed them with the same community ID, is there a way we can run the Louvain algorithm ensuring that these nodes are kept in the same community (essentially a way of establishing a must-link constraint)?
Thanks in advance for any help!
Hi, here is the answer from our engineering team:
It's not possible to lock seed IDs. Our seeds are just start IDs that differ from the default start ID, which is the internal node ID. The algorithm runs the same way, and if a node fits better in another community than in its seeded one, its community will change.
I created some cards for those questions to get them into our pipeline.
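To make "seeds are just start IDs" concrete, here is a minimal sketch of a single Louvain-style local-moving pass in plain Python. It is not the library's implementation: the function name local_moving, the seeds dictionary, and the toy graph are all made up for illustration, and it assumes an undirected, unweighted graph with no self-loops. The only place the seeds matter is the initial labelling; after that, every node moves to whichever neighbouring community gives the best modularity gain, seeded or not.

```python
from collections import defaultdict


def local_moving(edges, seeds=None):
    """One Louvain-style local-moving phase (illustrative sketch only).

    edges: iterable of (u, v) pairs, undirected, unweighted, no self-loops.
    seeds: optional dict node -> starting community ID (hypothetical input).
           Nodes without a seed start in a community named after themselves.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {n: len(ns) for n, ns in adj.items()}
    m = sum(degree.values()) / 2  # number of edges

    # Seeds only change the starting labels; the rest of the run is identical.
    seeds = seeds or {}
    community = {n: seeds.get(n, n) for n in adj}

    # Total degree of each community.
    tot = defaultdict(int)
    for n in adj:
        tot[community[n]] += degree[n]

    improved = True
    while improved:
        improved = False
        for n in sorted(adj):
            current = community[n]
            tot[current] -= degree[n]  # take n out of its community

            # Count edges from n into each neighbouring community.
            links = defaultdict(int)
            for nb in adj[n]:
                links[community[nb]] += 1

            # Modularity gain of joining c, up to a constant factor.
            def gain(c):
                return links.get(c, 0) - tot[c] * degree[n] / (2 * m)

            best, best_gain = current, gain(current)
            for c in links:
                g = gain(c)
                if g > best_gain:  # strict: stay put on ties
                    best, best_gain = c, g

            tot[best] += degree[n]
            community[n] = best
            if best != current:
                improved = True
    return community


# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Node 3 is deliberately seeded into the "wrong" community 100.
seeds = {0: 100, 1: 100, 2: 100, 3: 100, 4: 200, 5: 200}
result = local_moving(edges, seeds)
print(result)  # node 3 ends up in community 200 despite its seed of 100
```

In this toy run the seeded ID of node 3 is overwritten because joining its triangle's community gives a higher modularity gain, which matches the behaviour reported in the original question. A full Louvain run would also aggregate communities and repeat; that step is left out here.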
Thanks for your response, Michael! So is the advantage of providing seed IDs simply a performance improvement, i.e. it reaches the optimal modularity faster (assuming many nodes have not changed communities)? I just want to make sure I understand the benefits of providing seed IDs.
Hi, seeded algorithms could execute faster for the reason you mentioned, yes. Another benefit is that during write-back (that is, when using a .write procedure mode), the seed ID can be used to check whether the community ID has changed, and then only those changed values need to be written.
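The write-back point can be illustrated with a small sketch. This only shows the idea of comparing seed and result and is not how the library does it internally; the helper name changed_assignments and the sample data are hypothetical.

```python
def changed_assignments(seeds, result):
    """Return only the nodes whose final community differs from their seed.

    Nodes that had no seed at all are always included, since there is
    no previous value to compare against.
    """
    return {node: cid for node, cid in result.items() if seeds.get(node) != cid}


# Hypothetical seed and result community IDs for six nodes.
seeds = {0: 100, 1: 100, 2: 100, 3: 100, 4: 200, 5: 200}
result = {0: 100, 1: 100, 2: 100, 3: 200, 4: 200, 5: 200}

to_write = changed_assignments(seeds, result)
print(to_write)  # {3: 200}
```

Here only one of the six nodes would need to be written back, which is where the saving comes from when most seeded communities are stable between runs.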