New to Neo4j graph databases for data science. Need help

Hi, my name is Wilson and I work at a fintech company. I'm currently researching whether we need to create a graph database. Is it wise to create a really big graph for some use cases (e.g. transaction relations, P2P relations)? It would have more than ten million nodes, with multiple labels and properties.

This graph database would be used for daily operations (querying data) and for analytics. One analytics example would be Jaccard similarity. How does Jaccard performance in Neo4j compare to a Python package such as SciPy's cdist, and could Neo4j compute Jaccard similarity for, say, 1,000 nodes?
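For reference, here is a minimal sketch of the in-Python baseline being asked about. It assumes each node is represented as a binary row vector marking which neighbours (for example, counterparties) it is connected to; the data is random and illustrative, not a benchmark. Note that SciPy's `cdist` with `metric="jaccard"` returns Jaccard *distance*, so similarity is `1 - distance`.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data only: 1,000 "nodes", each a binary vector over
# 500 possible neighbours (e.g. counterparties). Random, not real data.
rng = np.random.default_rng(seed=0)
X = rng.random((1000, 500)) < 0.05  # boolean matrix, roughly 5% density

# cdist's "jaccard" metric returns Jaccard distance (1 - similarity).
# This materialises a dense 1,000 x 1,000 matrix, i.e. n^2 entries.
jaccard_sim = 1.0 - cdist(X, X, metric="jaccard")

# Top-5 most similar other nodes for node 0 (exclude node 0 itself).
scores = jaccard_sim[0].copy()
scores[0] = -1.0
top5 = np.argsort(scores)[::-1][:5]

print(jaccard_sim.shape)
print(top5, scores[top5])
```

At 1,000 nodes the full result is one million float64 values (roughly 8 MB), which is easy; the same dense approach at ten million nodes would need 10^14 entries, which is where the data-movement and scaling concerns come from.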

Thanks

Hi @wilsonchand95! Common use cases for graph databases in the financial sector include fraud/anomaly detection, customer 360, recommendations, and knowledge graphs. Neo4j can certainly support databases with tens of millions of nodes!

For your analytics workloads, we recommend using Neo4j's Graph Data Science (GDS) library. We offer both an alpha implementation of Jaccard similarity (which should easily process a thousand nodes) and node similarity, a highly optimized implementation of Jaccard that scales much better. For even bigger graphs, we offer KNN, which uses an approximation algorithm to provide very good results while being much faster computationally.
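To make the idea behind node similarity concrete, here is a plain-Python sketch of what it computes conceptually: Jaccard similarity between the *neighbour sets* of nodes (for example, the set of merchants each customer has paid), keeping only the top K matches per node above a cutoff. This illustrates the metric under those assumptions only; it is not GDS's optimized Java implementation. The `top_k` and `cutoff` parameters are hypothetical names that only loosely mirror the kind of top-K and cutoff settings such algorithms expose, and the customer data is made up.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of intersection / size of union; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(neighbours: dict, top_k: int = 2, cutoff: float = 0.0) -> dict:
    """Exact all-pairs comparison, keeping the top_k most similar nodes per node.

    Compares n * (n - 1) / 2 pairs, which is why approximate methods such as
    KNN become attractive as the graph grows.
    """
    scores = {node: [] for node in neighbours}
    for u, v in combinations(neighbours, 2):
        s = jaccard(neighbours[u], neighbours[v])
        if s > cutoff:  # drop pairs at or below the cutoff (here: no overlap)
            scores[u].append((v, s))
            scores[v].append((u, s))
    # Sort by descending similarity (ties broken by node id) and truncate.
    return {
        node: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
        for node, pairs in scores.items()
    }


# Hypothetical data: customer -> set of merchants they have paid.
paid = {
    "alice": {"m1", "m2", "m3"},
    "bob": {"m2", "m3"},
    "carol": {"m3", "m4"},
    "dave": {"m5"},
}

for customer, matches in top_k_similar(paid).items():
    print(customer, matches)
```

Here `alice` matches `bob` (2/3) and `carol` (0.25), while `dave` gets no matches because he shares no merchants with anyone. The all-pairs loop is the part that does not scale: 1,000 nodes means 499,500 pairs, while ten million nodes means about 5 × 10^13.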

Compared to a Python library, using Neo4j's GDS avoids the I/O cost of pulling data out of and back into your database, and GDS is typically much more scalable. Our implementations are written in Java, using internal APIs, to optimize them for speed and memory consumption, and are purpose-built for working with graph data.

Check out our white paper on Financial Fraud with Graph Data Science, or this more general one. If you want to see a hands-on example, @dave.voutila has a great series of blog posts on analyzing fraud using a synthetic dataset: PaySim.


Hi @alicia_frame1, thanks a lot for your explanation. I've researched and tried Neo4j Desktop and GDS a lot this week, and it's really fun! Thanks for confirming that the number of nodes is feasible; I'll go ahead and create a graph database.

Many thanks once again and have a great day Alicia!