How to infer the results of Weakly Connected Components?

The database looks like this

id | parent | rank
1 | 1 | norank
5 | 10987 | genus
6 | 5 | species
10 | 5 | species

Similarly, the database has 2.6M nodes. I have created a relationship namely
(child)<-[:IS_PARENT]-(parent).

Now I'm trying to analyze the isolated subgraphs using the Weakly Connected Components algorithm. I used the query below to get some details of the subgraphs:

 MATCH (c:childnode)
 WITH c.subgraphComponent AS Subgraph_ID,
      count(*) AS Subgraph_Size,
      collect(c.rank_name) AS Ranks,
      collect(DISTINCT c.rank_name) AS Subgraph_Ranks
 ORDER BY Subgraph_Size DESC LIMIT 5
 RETURN Subgraph_ID,
        Subgraph_Ranks

OUTPUT:

So I get the five largest connected components with their ranks and component size. But from the results generated, I can see that something weird is happening.
There cannot be a child and a parent having the same rank. For example, 5(genus) is a parent of 6(species) from the data given above. While loading the CSV file, I used the same labels for rank_name i.e., id with rank_name as property and parent with the same rank_name as property.

Questions I have:

  • When creating WCC graphs, does the algorithm consider the existing relationships?
  • How to infer the result of WCC algorithms, does it output the total number of subgraphs or the total number of isolated weak connections?
  • How do I alter the above cypher to display the list of subgraph ids and the distinct ranks belonging to those ids.

Thank you,
Ankita R

Hi Ankita,

Quick question. So are you saying, for example, that Subgraph_ID 372310 should not have both rank_name 'species' and 'genus'? In your query you are matching only on nodes with a :childnode label. Is it possible for childnodes with the same subgraph ID to have those rank_name values?

Thank you,

-Matt

Hello Matt,

Let's consider the subgraph id 421971 from the above output. We can see that it comprises only Species. I can understand that this might be due to matching only on :childnode . I'm not sure how to implement a relationship-based Match query.

Every child node has a parent node and there cannot be a parent and a child with the same rank names. Also, a child_node can be a parent node. If you can refer to the data given above we can see that there is only one rank column.

The required output is Subgraph_ID, Subgraph_Ranks (where I want the ranks to display both the distinct rank_name of child nodes and the distinct rank_name of parent nodes).

I hope I have answered your questions :)

Regards,
Ankita R

I see. I think I have a better understanding of the problem. I’ll have some more time later, however you mentioned you are not sure how to implement a relationship based match. Borrowing from your original post you could match a pattern like this:

MATCH (child)<-[r:IS_PARENT]-(parent) RETURN child, r, parent LIMIT 10

I’m not exactly sure what your data looks like, but this may give you some clues on how to get what you're looking for.

Thank you,

-Matt

Yes, you are correct, but what I meant was that I wasn't sure how to use a relationship-based match query to get a subgraph component. I use the variable 'c' to access the child nodes in the subgraph components.

Match (c:childnode)
WITH c.subgraphComponent as Subgraph_ID

I want the weakly connected component to consider the relationship that I have created so that it counts the isolated subgraphs based on the relationship.

How do we modify this query MATCH (child)<-[r: IS_PARENT]-(parent) to access the rank_names of both the child and parent nodes of the subgraph component?

Regards,
Ankita R

MATCH (child) <- [r:IS_PARENT] - (parent)
RETURN child.subgraphComponent, r, parent.subgraphComponent
LIMIT 10

You can access the properties of all the nodes in your query. However, I’m not sure this will get what you are looking for.

The Graph Data Science Library has algorithms to detect WCC. Maybe that can help?

-Matt

Hi
Please call apoc.meta.relType (your definition of weakness) on the returned entities and filter them. If the current release of the APOC library does not include that, then write your own plugin and install it on your desktop.
If successful, you can share your contribution on Neo4j's GitHub.
Thanks for asking the question; to me it looks like a new function that needs to be added to the library. If you have an older version, just upgrade your library and then try again.
Thanking you
Yours faithfully
Sameer SG

Hi
There is one more plugin readily available, the Graph Data Science plugin. Explore the functions in it so you can order them in descending order.
Hope that will resolve your problem.
Thanks
Sameer SG

Hello Sameer SG,

The main issue is that there is only a single column for rank names in the database. When I loaded the CSV file, I created child nodes with properties child_id and child_rank, and parent nodes with properties parent_id and parent_rank. I did it this way because I wanted to hover over the nodes, visualize the parent-child relationships, and check the rank_name for each node.

child_id | parent_id | rank
1 | 1 | norank
5 | 10987 | genus
6 | 5 | species
10 | 5 | species

In the above data, the rank_name for id = 5 is a genus. I used cypher query to get the rank_name assigned for 5. The results show rank_name as genus when child_id = 5 and rank_name as species when parent_id = 5. (Which is correct based on the way I created each node).

Similarly, when WCC (GDS plugin) works on the same data it considers that child and parent belong to the same species rank name. It considers the row where 5 is a parent_id and outputs that information.

Output:
subgraph_id | subgraph_size | child_id | child_rank | parent_id | parent_rank
23451       |      2        |    6     |   species  |    5      | species

Required output:
subgraph_id | subgraph_size | child_id | child_rank | parent_id | parent_rank
23451       |      2        |    6     |   species  |    5      | genus

So this satisfies the criterion that parent and child don't belong to the same rank_name.
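
The mismatch can be reproduced outside the database. The following is a minimal Python sketch, not the actual import code: it assumes each CSV row carries the rank of the child id only, so a parent's rank has to be looked up from the row where that id appears as child_id. The names `rows`, `rank_of`, and `edges_with_ranks` are made up for illustration.

```python
# Sample rows from the thread: (child_id, parent_id, rank of the child).
rows = [
    (1, 1, "norank"),
    (5, 10987, "genus"),
    (6, 5, "species"),
    (10, 5, "species"),
]

# One rank per id, taken from the row in which the id is the child.
rank_of = {child_id: rank for child_id, _parent_id, rank in rows}


def edges_with_ranks(rows, rank_of):
    """Yield (child_id, child_rank, parent_id, parent_rank) per row.

    The parent's rank is looked up by id; it is None when the parent
    has no row of its own in the data (e.g. 10987 in this sample).
    """
    for child_id, parent_id, _rank in rows:
        yield child_id, rank_of[child_id], parent_id, rank_of.get(parent_id)


for edge in edges_with_ranks(rows, rank_of):
    print(edge)
```

If the parent's rank were instead copied from the rank column of the same row, the row for child 6 would report species for parent 5, which is the unwanted output shown above. Keeping one node per id, with a single rank property, avoids that.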

@johnmattgrogan I also tried out a solution similar to yours. It works, but I still do not get the desired output. The output shows that child and parent belong to the same rank, so I assume this is caused by the problem above.

This might help, though I have yet to try it: I will create the nodes again from scratch, but this time I will create the rank_name property for child nodes alone.

Also, I will try to update this thread if I'm successful :)

Best,
Ankita R

Hello,

I thought of giving you an update that my solution didn't work and the issue still persists.
So I'm open to trying any of your solutions.

Do let me know if you have ideas :)

Thank you!!

Best,
Ankita R

There is a lot of context missing here that I think would be vital to answering your question. I don't think it is entirely clear how your graph model looks. This statement is unclear:

For example, 5(genus) is a parent of 6(species) from the data given above. While loading the CSV file, I used the same labels for rank_name i.e., id with rank_name as property and parent with the same rank_name as property.
Does id 5 have the labels of both child and parent? Please provide a clearer graph model along with the properties and labels associated with it.

What is your query to run WCC? What is the gds.graph.create command that you use?

To your question:

How do we modify this query MATCH (child)<-[r: IS_PARENT]-(parent) to access the rank_names of both the child and parent nodes of the subgraph component?

The answer would be:

Match (child)<-[r:IS_PARENT]-(parent) RETURN child.rank_name as child_rank_name, r, parent.rank_name as parent_rank_name
LIMIT 10

Per some of your other questions:
When creating WCC graphs, does the algorithm consider the existing relationships?
I would ask again: what does your graph.create statement look like? WCC considers all relationships if you specify them in the graph.create step.
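
To illustrate the point in plain Python rather than GDS: a weakly connected component ignores relationship direction, and only the relationships handed to the algorithm can merge nodes. The sketch below is a generic union-find, not the GDS implementation; the edge list is assumed from the sample data in this thread (the self-loop on id 1 is left out since it cannot merge anything), and `weakly_connected_components` is a hypothetical helper.

```python
def weakly_connected_components(nodes, edges):
    """Return {node: component_id}, treating every edge as undirected."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the representative of the merged set.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {n: find(n) for n in nodes}


# (parent, child) pairs assumed from the sample rows: 5 -> 6, 5 -> 10, 10987 -> 5.
nodes = [1, 5, 6, 10, 10987]
is_parent = [(5, 6), (5, 10), (10987, 5)]

with_rels = weakly_connected_components(nodes, is_parent)
without_rels = weakly_connected_components(nodes, [])

print(with_rels)
print(without_rels)
```

With the relationships included, ids 5, 6, 10 and 10987 share one component id and id 1 stays alone; with no relationships projected, every node ends up as its own component.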

How to infer the result of WCC algorithms, does it output the total number of subgraphs or the total number of isolated weak connections?
It writes a new property, in this case subgraph_id, to every node that exists in the in-memory graph defined by the graph.create statement. One thing you can do is index subgraph_id on all your nodes and then simply run:
Match (n) where n.subgraph_id={SOME SUBGRAPH_ID} return n limit 1000
to get a view of what a component looks like. To make sure you understand what the graph is doing, perhaps choose a smaller subgraph so you can see the entire structure and confirm that your model is correct.
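
On how to read the result: the written property is one component id per node, so both the number of subgraphs and each subgraph's size come from aggregating it. Here is a small Python sketch of that aggregation, assuming a hypothetical node-to-component mapping (the ids are invented for illustration, not taken from the real database):

```python
from collections import Counter

# Hypothetical result of a WCC run: node id -> component id.
component_of = {1: 1, 5: 5, 6: 5, 10: 5, 10987: 5, 42: 42}

# Size of each component: how many nodes share a component id.
sizes = Counter(component_of.values())

# Number of isolated subgraphs = number of distinct component ids.
component_count = len(sizes)

# Largest components first, as in ORDER BY Subgraph_Size DESC.
largest = sizes.most_common(2)

print(component_count, largest)
```

The component id itself carries no meaning beyond grouping; two nodes are in the same subgraph exactly when their ids match.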
How do I alter the above cypher to display the list of subgraph ids and the distinct ranks belonging to those ids.
Your query looks to be doing exactly what you asked for: it takes the largest subgraphs in the database, restricted to nodes labeled childnode, and collects the distinct ranks of each. My reading of line 5 of your results, the one you disagree with or find weird, is that among all the childnodes in subgraph 2998 the following rank_names exist: species, genus, norank. Based on your sample data:
5 | 10987 | genus
6 | 5 | species
10 | 5 | species

Let's pretend that subgraph 2998 contains id 5; then we would see
subgraph_id | distinct_rank_name

2998 | [ genus,species]
because 5 is a child node and has genus rank name but is also connected to 6 and 10 which are child nodes with species rank_name.
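
The same reasoning in a small Python sketch, assuming for illustration that ids 5, 6 and 10 all carry component id 2998 as in the pretend example above (`distinct_ranks_by_component` is a made-up helper):

```python
from collections import defaultdict

# (node id, component id, rank_name) for the pretend subgraph 2998.
nodes = [
    (5, 2998, "genus"),
    (6, 2998, "species"),
    (10, 2998, "species"),
]


def distinct_ranks_by_component(nodes):
    """Group nodes by component id and collect each rank once, sorted."""
    ranks = defaultdict(set)
    for _node_id, component_id, rank in nodes:
        ranks[component_id].add(rank)
    return {cid: sorted(r) for cid, r in ranks.items()}


print(distinct_ranks_by_component(nodes))
```

Seeing more than one rank in a component is therefore expected: the list describes every node in the subgraph, not a single parent-child pair.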

What is your query for the incorrect output shown here?

Output:
subgraph_id | subgraph_size | child_id | child_rank | parent_id | parent_rank
23451 | 2 | 6 | species | 5 | species

Cheers,
Ben

Hello Ben,

I have sent you a private message containing answers for all your questions. Do have a look at it.

I will update this thread once the issue gets solved.

Regards,
Ankita R