#### Using The Neo4j Graph Data Science Library And Rule-Based Graph Data Visualization In Neo4j Bloom

In this post, we explore how to use the graph algorithms available in the Neo4j Graph Data Science library along with the Neo4j Bloom graph visualization tool. We show how to run graph algorithms and use the results to style graph visualizations in Bloom, all in Neo4j Sandbox.

### The Neo4j Graph Data Science Library

Neo4j Graph Data Science is a library that provides efficiently implemented parallel versions of common graph algorithms for Neo4j, exposed as Cypher procedures. Types of algorithms available include centrality, community detection, similarity, pathfinding, and link prediction. You can read more about Neo4j Graph Data Science in the docs.

### Neo4j Bloom

Neo4j Bloom is a graph exploration application for visually interacting with graph data. You can read more about Neo4j Bloom in the docs or on the Bloom landing page.

![](upload://w1KKJ9uiTc8R0o7iN6tnpjZjxov.png)The many features of Neo4j Bloom.### Neo4j Sandbox

![](upload://4c70KVyUjw3HPgyDSCW01PPIgCw.png)The different projects available in Neo4j Sandbox.Neo4j Sandbox is a great free way to try out Neo4j without downloading or installing anything locally. It gives you access to Neo4j database, Neo4j Bloom, and plugins like Neo4j Graph Data Science, all hosted online and private to you. Once you sign in you can choose which project to spin up. Each project contains a different dataset and guided experience to show you how to query the dataset using Cypher and the Neo4j drivers, all in a private hosted environment.

![](upload://xCn3EEIfHDqkDFOaXawLkfQhzUS.png)Today weβre interested in the Graph Data Science project so weβll start there. Once we create a sandbox instance we can open Neo4j Browser where weβll see a friendly interactive guide with embedded queries and images walking us through the process of using the Neo4j Graph Data Science library.

![](upload://xe5o5ocbITxw1O2xhrLviMGKLp6.png)The guide explores how to use PageRank, Label Propagation, Weakly Connected Components (WCC), Louvain, and Node Similarity.

**The Dataset**

The Graph Data Science sandbox comes loaded with a dataset of Game of Thrones characters and their interactions across the many books of the fantasy series. The interactions are derived from the text of the books: any two character names appearing within 15 words of one another is counted as an interaction.

![](upload://2nE8EyLpTvjgsJw9pheJC3ZppX7.png)The Graph of Thrones data model.The dataset is partly based on the following works:

*Network of Thrones, A Song of Math and Westeros**, research by Dr. Andrew Beveridge.**A. Beveridge and J. Shan, βNetwork of Thrones,β Math Horizons Magazine , Vol. 23, β4 (2016), pp. 18β22**Game of Thrones, Explore deaths and battles from this fantasy world**, by Myles OβNeill, **https://www.kaggle.com/**Game of Thrones**, by Tomaz Bratanic, GitHub repository.*

### Creating A Graph Data Science Graphs

The GDS algorithms run on a subgraph or projected graph that can be specified based on labels and relationships, or a Cypher query that may create a projected graph (one that doesnβt exist in the database). For our first example, we will use the(:Person)-[:INTERACTS]-(:Person) subgraph. We load this graph using the gds.graph.create procedure.

CALLgds.graph.create('got-interactions', 'Person', {

INTERACTS: {

orientation: 'UNDIRECTED'

}

})

### Finding Influential Nodes With PageRank

A common question when working with graph data is to ask, βWhat are the most influential nodes in this network?β Centrality algorithms are used to determine the importance of nodes in a network. The Graph Data Science library includes several centrality algorithms including PageRank, ArticleRank, betweenness, closeness, degree, and eigenvector centrality.

![](upload://g1Qd8BFNw12gziwj1e9nXMMnMEJ.png)The PageRank algorithmPageRank measures the transitive influence and connectivity of nodes to find the most influential nodes in a graph. As a result, the score of a node is a certain weighted average of the scores of its direct neighbors.

PageRank is an *iterative* algorithm. In each iteration, every node propagates its score evenly divided to its neighbours.

You can read more about PageRank and other centrality algorithms in the documentation.

**Computing PageRank With Neo4j Graph Data Science**

When running algorithms with the Neo4j Graph Data Science procedures, we have the option of streaming the results back, or writing the values to the graph as node or relationship properties.

Here we compute PageRank over our interactions subgraph, showing the top 10 nodes with the highest PageRank score.

CALLgds.pageRank.stream('got-interactions')YIELDnodeId, scoreRETURNgds.util.asNode(nodeId).nameASname, scoreORDERBY scoreDESCLIMIT10

-----------------------------

ββββββββββββββββββββββ€βββββββββββββββββββ

β"name" β"score" β

ββββββββββββββββββββββͺβββββββββββββββββββ‘

β"Jon Snow" β17.213974114507444β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Tyrion Lannister" β16.862491100281478β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Cersei Lannister" β13.730249913781883β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Jaime Lannister" β13.522417121008036β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Stannis Baratheon" β12.067919091135265β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Daenerys Targaryen"β11.790235754847526β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Arya Stark" β11.221931967139245β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Robb Stark" β10.819953513145444β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Eddard Stark" β10.244713875278832β

ββββββββββββββββββββββΌβββββββββββββββββββ€

β"Catelyn Stark" β9.997365672886371 β

ββββββββββββββββββββββ΄βββββββββββββββββββ

Other centrality measures include degree centrality and weighted degree, which we can easily compute in Cypher.

CALLgds.pageRank.stream('got-interactions')YIELDnodeId, scoreASpageRankWITHgds.util.asNode(nodeId)ASn, pageRankMATCH(n)-[i:INTERACTS]-()RETURNn.nameASname, pageRank, count(i)ASdegree, sum(i.weight)ASweightedDegreeORDERBYpageRankDESCLIMIT10

----------------------------------

ββββββββββββββββββββββ€βββββββββββββββββββ€βββββββββ€βββββββββββββββββ

β"name" β"pageRank" β"degree"β"weightedDegree"β

ββββββββββββββββββββββͺβββββββββββββββββββͺβββββββββͺβββββββββββββββββ‘

β"Jon Snow" β17.213974114507444β190 β2757 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Tyrion Lannister" β16.862491100281478β217 β2873 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Cersei Lannister" β13.730249913781883β199 β2232 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Jaime Lannister" β13.522417121008036β169 β1569 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Stannis Baratheon" β12.067919091135265β147 β1375 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Daenerys Targaryen"β11.790235754847526β120 β1608 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Arya Stark" β11.221931967139245β132 β1460 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Robb Stark" β10.819953513145444β136 β1424 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Eddard Stark" β10.244713875278832β126 β1649 β

ββββββββββββββββββββββΌβββββββββββββββββββΌβββββββββΌβββββββββββββββββ€

β"Catelyn Stark" β9.997365672886371 β126 β1230 β

ββββββββββββββββββββββ΄βββββββββββββββββββ΄βββββββββ΄βββββββββββββββββ

Instead of streaming the results back we can write them to the graph, stored as node property values in Neo4j. Here we compute PageRank for each Person node in our projected graph and write the value back to the pageRank node property.

CALL

gds.pageRank.write('got-interactions', {writeProperty: 'pageRank'})

Writing the values to the graph is useful because we can then use those values stored in Neo4j in our real-time queries or data visualizations.

**Rule-Based Node Size In Neo4j Bloom**

Now that weβve computed PageRank, letβs switch over to Neo4j Bloom and start exploring the graph visually. In Bloom, the **perspective** is the configuration for our visualization environment. The perspective defines the node labels, relationships, properties, icons, size, and colors to use. We can also define search phrases in a perspective that enable powerful natural language like searching functionality by exposing parameterized Cypher queries.

In the perspective, we can also define rule-based styling that takes into account the values of node or relationship properties when styling the visualization. For example, now that weβve computed PageRank for our Person nodes, letβs create a styling rule that nodes with higher PageRank score should appear larger in the visualization. To do this we select the Person category in the legend panel and create a new rule-based style.

![](upload://cSzaVt84371xE0NdFMV4e2q4xjn.png)Creating a rule-based style to scale node size relative to the PageRank value.Now when we query for the Person category we see node size scaled by the PageRank score for that node. This allows us to immediately determine which nodes are most important, just by visually examining the scene.

![](upload://wWpEHKSTRzd8L48MI3zLW5TkwMN.png)Node size scales relative to the computed PageRank value.### Community Detection

Next, weβll apply the Label Propagation algorithm to find communities in our graph. Community detection algorithms are used to determine how groups of nodes in a graph are clustered and also the tendency of clusters or partitions to break apart. The Neo4j Graph Data Science library currently includes the following community detection algorithms: Louvain, Label Propagation, Weakly Connected Components, K-1 Coloring, Modularity Optimization, Strong Connected Components, and Triangle Counting / Clustering Coefficient.

![](upload://6H5MD8WSE6SsfB48Yc3U8DR3AcI.png)Label Propagation is a fast algorithm for finding communities in a graph. It propagates labels throughout the graph and forms communities of nodes based on their influence.

Label Propagation is an iterative algorithm. In the first iteration, a unique community label is assigned to to each node. Then in each iteration, this label is assigned to the most common label among a nodeβs neighbors. Densely connected nodes quickly spread their labels across the graph. For more details, see the documentation.

Since label propagation is a weighted algorithm (it can take into account relationship weights), we need to create a new interaction subgraph which includes relationship weights:

CALLgds.graph.create(

'got-interactions-weighted',

'Person',

{

INTERACTS: {

orientation: 'UNDIRECTED',

properties: 'weight'

}

}

)

And now letβs run the algorithm, writing the assigned communities back to the graph in a property called community. Since label propagation is an iterative algorithm and can take a few iterations to converge on stable community assignment, we specify the maximum number of iterations in the procedure call.

CALLgds.labelPropagation.write(

'got-interactions-weighted',

{

relationshipWeightProperty: 'weight',

maxIterations: 10,

writeProperty: 'community'

}

)

**Rule-Based Node Color**

Now that weβve assigned each node to a community, we want our visualization in Neo4j Bloom to use the results of this algorithm, coloring nodes by community assignment. To do this we add a new rule-based styling to the perspective in the legend panel.

![](upload://tmULcRZk8OO3EHdIlbLD2VUxHxS.png)Configuring a rule-based style to color nodes according to their community.Now in our Bloom scene, we see the node size styled by PageRank and the color determined by which community the node is assigned.

![](upload://y8tnUb2xL19wOsDsuF2CLmNzJdL.png)Next, we explore the node similarity algorithm.

**Node Similarity**

![](upload://velbMMyVPgyMwrvaY8LqcA5n9FC.png)Jaccard similarity is the size of the intersection of two sets divided by the size of their union.The Node Similarity algorithm is used to compute similarity scores for pairs of nodes in a graph. The similarity between two nodes is based on their respective sets of neighbors. To obtain a similarity measure between two sets, we compute the Jaccard Similarity of their neighboring nodes*.* Nodes are similar if most nodes that are neighbors to either node are in fact neighbors to both. Typically, Node Similarity is used on a bipartite graph, for example containing People that have a LIKES relationship to Items. Node Similarity can then be used to find the pairs of People that are the most similar in the sense that they mostly like the same items. For more information see the documentation.

Letβs create a new subgraph of Person , Book , House , and Culture nodes and all relationships connecting them.

CALLgds.graph.create('got-character-related-entities', ['Person', 'Book', 'House', 'Culture'], '*')

And then we run the node similarity algorithm on this subgraph. This will create a weighted SIMILARITY relationships in the graph, storing the Jaccard similarity for each node pair in the property character_similarity .

CALLgds.nodeSimilarity.write(

'got-character-related-entities',

{

writeRelationshipType: 'SIMILARITY',

writeProperty: 'character_similarity'

}

)

**Rule-Based Relationship Size**

We can use this similarity score to style our graph in Bloom by making the thickness of the relationship scale relative to the similarity score of two nodes: nodes that are more similar should have a thicker relationship connecting them. To do this we add a new rule-based style to the SIMILARITY relationship, specifying the size of the node should be determined by the character_similarity relationship property value.

![](upload://wVcPf1k6hg3tz8Wtxj9nipfz6QN.png)Styling SIMILARITY relationships relative to the node similarity score in the legend panel.Having added this style rule, we can use Bloomβs natural language like querying to search for a character and all similar characters in the search bar.

Person with name Daenerys Targaryen SIMILARITY Person![](upload://c4E4KMbcp2ZqtwXKz5AsaP3aKZp.png)

The equivalent Cypher query for this search would be something like this:

MATCH

(p1:Person {name: "Daenerys Targaryen"})-[r:SIMILARITY]-(p2:Person)RETURN*

Now in the visualization, we can see the relationships connecting characters most similar to Daenerys are thicker.

![](upload://eWnDic6iqTu2sgJjkaTzWII9383.png)We can verify this by inspecting all relationships connected to Daenerys in the Card List, and see that Barristan Selmy is indeed most similar to Daenerys.

![](upload://wlbD5i9SlVviKiLZAmm8QPxm9Ue.png)### Putting It All Together

Letβs say we want to visualize the top characters (as defined by PageRank) and see who they interacted with the most. We could write a Cypher query like this to give us the results:

MATCH(p:Person)WHEREEXISTS(p.pageRank)ANDEXISTS(p.community)

// only take top 10 by PageRankWITHpORDERBYp.pageRankDESCLIMIT10

//find interactions with other charactersMATCHinteractPath=(p)-[r:INTERACTS]->(other)

// order by number of interactionsWITHinteractPath,p,r,otherORDERBYr.weightDESC

// take top ten interactions per PersonWITHCOLLECT(interactPath)[0..10]AStopInteractions, pRETURNtopInteractions

But how can we visualize the results of this Cypher query in Bloom? To do this we create a search phrase in the Bloom perspective drawer. Search phrases allow us to define parameterized Cypher queries that can then be used in the natural language like search by Bloom users via the search bar.

![](upload://e4PJLOrFa0f7wOuDm7Ngxlc20Su.png)Adding a search phrase in the perspective drawer.Now we can easily search for the 20 top characters as defined by PageRank, and for each character see the top 40 characters they interacted with, simply by using the search phrase in the search bar.

Top 20 characters and 40 connections

The values 20 and 40 are passed into the parameterized Cypher query we defined in the search phrase as $numCharacters and $numConnections

![](upload://m6JeNsYdR4Ajh2a0LFGa6VeXKcI.png)This visualization is styled using the results of the graph algorithms we ran earlier

- Node size is determined by PageRank β more important nodes are larger in the visualization
- Node color is determined by label propagation, a community detection algorithm. Nodes belonging to the same community are the same color.
- Relationships are styled according to the weight property of the relationship. Relationships are thicker when two characters have interacted more frequently.

When we zoom out we can see the full subgraph:

![](upload://u5O5x31DBOfFcrem4SviEsxuXsw.png)### Resources To Learn More About Graph Algorithms And Neo4j Bloom

Here are a few resources to learn more about Neo4j Graph Algorithms and Neo4j Bloom.

- Neo4j Graph Data Science sandbox β use graph algorithms and Neo4j Bloom to explore a dataset of Game of Thrones characters and their interactions.
- Applied Graph Algorithms Online Course β use graph algorithms to add functionality to a business search application using Neo4j and React.js.
- Data Science With Neo4j Online Course β this course introduces you to using Neo4j as part of your Data Science and Machine Learning workflows.
- Download The Book βGraph Algorithms: Practical Examples in Apache Spark and Neo4jβ βfree download of the authoritative book on using graph algorithms with Neo4j.
- The Neo4j Graph Data Science Library Manual
- Neo4j Bloom landing page and documentation.
- The Bloom Visual DiscoveryNeo4j Sandbox β use Neo4j Bloom to explore a financial services dataset to find potential fraud.

Hands-On With The Neo4j Graph Data Science Sandbox was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.