I have 100 GB of data in CSV format in an AWS S3 bucket. I want to load that data into a Neo4j Fabric (horizontally scalable) database. Can someone please suggest a tool for bulk-importing the S3 data into Neo4j Fabric?
I have deployed a Neo4j Enterprise causal cluster, version 4.0.4, from the AWS Marketplace with 3 nodes. I want to shard the 100 GB of data across all three nodes.
I feel you might be thinking you will have the data distributed equally among the 3 nodes in the cluster. That's not how clustering works in Neo4j: each node in a causal cluster holds exactly the same data. In other words, all the nodes in a cluster are exact replicas of each other.
So every node will hold the whole dataset.
As for loading the data, if you are comfortable with Python you can use this ingest utility to load the data into Neo4j.
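Whatever utility you pick, the usual pattern for bulk loads from CSV is to send rows in batches through a parameterized `UNWIND` query rather than one query per row. A minimal sketch of the batching side, using only the standard library (the `Person` label, the `id`/`name` columns, and the connection details in the comment are placeholders for illustration; in practice you would stream the file from S3, e.g. with boto3, instead of the inline sample):

```python
import csv
import io

def batches(rows, size=1000):
    """Yield successive lists of up to `size` rows for UNWIND-based batch inserts."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Inline sample standing in for a CSV streamed from S3.
sample = io.StringIO("id,name\n1,Alice\n2,Bob\n3,Carol\n")
rows = list(csv.DictReader(sample))

# Each batch then becomes one parameterized Cypher call, e.g. with the
# official neo4j Python driver (URI, credentials, label and properties
# below are assumptions, not from the original post):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
#   with driver.session() as session:
#       for batch in batches(rows, size=1000):
#           session.run(
#               "UNWIND $rows AS row "
#               "MERGE (p:Person {id: row.id}) SET p.name = row.name",
#               rows=batch,
#           )

for b in batches(rows, size=2):
    print(len(b))
```

Batching keeps transaction sizes bounded, which matters at 100 GB: one transaction per row is far too slow, and one transaction for the whole file will exhaust memory.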
But the documentation says Neo4j Fabric is horizontally scalable and that it shards the data across the nodes of the cluster. I am a bit confused here: for Neo4j Fabric, you are saying it just replicates the same data on each node of the cluster and doesn't do sharding. Then what is the use of the horizontal scalability feature of Neo4j Fabric?
Reference: Sharding Graph Data with Neo4j Fabric - Developer Guides
Considering your requirements, I think Architecture pattern 1 would be a good fit and will most likely clear up your confusion for the time being.

But before you finalize the architecture, it is better to consider how much data you are going to add or sync daily, monthly, and yearly, and whether this solution will scale as your data grows. In the enterprise you are working for, at what rate is the data growing, and what level of data granularity does your enterprise need?
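To make the replication-vs-sharding distinction concrete: Fabric shards are separate databases that you declare in `neo4j.conf` on the Fabric proxy instance, and queries address them with `USE`. A rough sketch of such a configuration, assuming Neo4j 4.x key names from the operations manual (hostnames and database names below are placeholders):

```
# neo4j.conf on the instance acting as the Fabric entry point
fabric.database.name=fabric

# Shard 0 lives on one cluster/instance...
fabric.graph.0.uri=neo4j://shard-host-1:7687
fabric.graph.0.database=shard0
fabric.graph.0.name=shard0

# ...shard 1 on another. Each shard can itself be a causal cluster
# (replicated for availability), while Fabric splits the data between them.
fabric.graph.1.uri=neo4j://shard-host-2:7687
fabric.graph.1.database=shard1
fabric.graph.1.name=shard1
```

A query would then target a shard with, e.g., `USE fabric.shard0 MATCH (n) RETURN count(n)`. So the causal cluster gives you replication within a shard, and Fabric gives you horizontal scaling across shards; the two are layered, not the same mechanism.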