Hello, I have a huge graph dataset loaded into Neo4j, and I need to project the whole graph to find its weakly connected components. Are there procedures or a standard way to delete nodes so that I can reduce the size of the graph?

Hi @chim3yy ,

Creating a graph projection is a common step in using Neo4j GDS. The short explanation is that you can either describe the graph using Cypher queries, which is called a "Cypher projection", or you can use a "native projection" by specifying the node labels and relationship types.

For creating graph projections, take a look at: https://neo4j.com/docs/graph-data-science/current/graph-create/

Creating a Cypher projection:

https://neo4j.com/docs/graph-data-science/current/graph-create-cypher/

Creating a native projection:

https://neo4j.com/docs/graph-data-science/current/graph-create/
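
As a rough illustration of the two projection styles, the sketch below projects the whole graph both ways. It assumes the GDS 1.x procedure names that match the graph-create pages linked above (in GDS 2.x the corresponding procedures are `gds.graph.project` and `gds.graph.project.cypher`). The graph names `wholeGraph` and `wholeGraphCypher` are made up for this example.

```cypher
// Assumption: GDS 1.x procedure names. Run each statement separately.

// Native projection: '*' selects every node label and every relationship type.
CALL gds.graph.create('wholeGraph', '*', '*')
YIELD graphName, nodeCount, relationshipCount;

// Cypher projection: the first query returns node ids,
// the second returns one row per relationship as source/target ids.
CALL gds.graph.create.cypher(
  'wholeGraphCypher',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (a)-[r]->(b) RETURN id(a) AS source, id(b) AS target'
)
YIELD graphName, nodeCount, relationshipCount;
```

Native projections are generally the faster and more memory-efficient of the two, so the Cypher form is mostly useful when the graph has to be filtered or reshaped by a query.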

Then, for weakly connected components, see:

Best,

ABK

Thank you so much for your kind response and recommendations. My issue is that whenever I try to project the whole graph to run the weakly connected components algorithm, I get a memory error. So I was wondering whether I can reduce the graph by removing some nodes, but the problem is deciding which nodes to remove.
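
Before removing anything, it may help to check how much memory the full projection would need. The sketch below assumes GDS 1.x procedure names (in GDS 2.x the procedure is `gds.graph.project.estimate`). It only estimates and does not load the graph.

```cypher
// Assumption: GDS 1.x procedure names.
// Estimate the memory required to project every node and relationship.
// Nothing is loaded into memory by this call.
CALL gds.graph.create.estimate('*', '*')
YIELD nodeCount, relationshipCount, requiredMemory
RETURN nodeCount, relationshipCount, requiredMemory;
```

If `requiredMemory` is larger than the heap available to Neo4j, the options are to give the database more heap or to project a smaller node set.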

Actually, I want to scale down the graph using the random node sampling method, as was done in this blog post (Sampling A Neo4j Database - DZone). The problem is that I have many nodes with various labels, so I am not sure how to do it. The main objective is to reduce the size of the graph so that I can project the sampled graph (from the large graph) and then run the weakly connected components algorithm. It's not feasible to project the whole large graph, as it gives a memory error.

Does it make sense to do something like this?

MATCH (n1:label1)-[r]-(), (n2:label2)-[r2]-()
WHERE rand() < 0.1
RETURN n1, r, n2, r2

And then I can export this graph using APOC.
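
One caveat about the query above: the two comma-separated patterns share no variable, so Cypher combines every match of the first with every match of the second (a Cartesian product), and `rand()` then filters those combined rows rather than sampling nodes. A possible alternative, sketched below, tags a random share of all nodes regardless of label and projects only the tagged ones. It assumes GDS 1.x procedure names. The label `Sampled` and the graph name `sampledGraph` are hypothetical.

```cypher
// Assumption: GDS 1.x procedure names. Run each statement separately.

// 1. Tag roughly 10% of all nodes, whatever their labels, with a temporary label.
//    On a very large graph this write probably has to be batched
//    (for example with apoc.periodic.iterate).
MATCH (n)
WHERE rand() < 0.1
SET n:Sampled;

// 2. Project only the tagged nodes. A native projection keeps a relationship
//    only when both of its end nodes are in the projected node set.
CALL gds.graph.create('sampledGraph', 'Sampled', '*')
YIELD nodeCount, relationshipCount;

// 3. Run WCC on the sample and list the ten largest components.
CALL gds.wcc.stream('sampledGraph')
YIELD nodeId, componentId
RETURN componentId, count(*) AS size
ORDER BY size DESC
LIMIT 10;

// 4. Clean up: drop the in-memory graph and remove the temporary label.
CALL gds.graph.drop('sampledGraph') YIELD graphName;
MATCH (n:Sampled) REMOVE n:Sampled;
```

Keep in mind that dropping 90% of the nodes also removes many of the paths between the remaining ones, so components found on the sample will tend to be smaller and more numerous than those of the full graph.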

Any suggestions in this regard would be very helpful.