So I am really interested in making use of Neo4j on quite large social network data, probably in the range of tens of millions of nodes and billions of edges. The data that would be the basis for this is currently stored in Postgres. The environment in which I work is constrained both computationally and in storage capacity. My question is whether anybody has strategies for using Neo4j without having to store two full copies of my data. A very general question, I know, but perhaps people have faced similar constraints...
Now of course one solution is to drop Postgres completely, but in the interim I would need to show the benefit of Neo4j over the traditional DB.
If the types of queries you have against the dataset are focused on the connections between data (which is typical of a social network), then Neo4j is a great fit. I recommend that you use the ETL tool to load data from your Postgres database into Neo4j and compare queries across the two systems.
Hi, thanks for the reply. I'm quite sure that Neo4j is the correct tool for what I want, but my issue is that I cannot have two copies of the data. What I am wondering is whether there is a way of storing my data only once while still being able to query it from both Postgres and Neo4j. I suspect it's not possible, but thought someone might have an idea, or could at least confirm my suspicion...
We have many customers who store the connections between data in Neo4j, but store things such as profiles in a different database. Of course, you would need to determine what data you are querying on, as you probably want that data in Neo4j, and use Postgres only for the data that will never be part of a query but may be linked to (such as an image).
You could have a Java client that has a connection to both databases.
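To make the dual-client idea concrete, here is a minimal sketch. The names `DualStoreClient`, `friendGraph`, and `profileStore` are hypothetical; in a real application `friendGraph` would wrap a Neo4j driver session (e.g. a Cypher `MATCH` returning friend ids) and `profileStore` would wrap a JDBC query against Postgres. The sketch uses in-memory maps as stand-ins so it runs without either database:

```java
import java.util.*;
import java.util.function.Function;

public class DualStoreClient {
    // Stand-in for the graph side: person id -> ids of connected people.
    // In practice this would call the Neo4j Java driver with something like
    // MATCH (p:Person {id: $id})-[:FRIEND]->(f) RETURN f.id
    private final Function<Long, List<Long>> friendGraph;

    // Stand-in for the relational side: person id -> profile row that
    // lives only in Postgres (fetched via JDBC in a real client).
    private final Function<Long, String> profileStore;

    public DualStoreClient(Function<Long, List<Long>> friendGraph,
                           Function<Long, String> profileStore) {
        this.friendGraph = friendGraph;
        this.profileStore = profileStore;
    }

    // Traverse in the graph store first, then hydrate each hit
    // from the relational store by id.
    public List<String> friendProfiles(long personId) {
        List<String> profiles = new ArrayList<>();
        for (long friendId : friendGraph.apply(personId)) {
            profiles.add(profileStore.apply(friendId));
        }
        return profiles;
    }

    public static void main(String[] args) {
        // In-memory fakes so the sketch is self-contained.
        Map<Long, List<Long>> graph = Map.of(1L, List.of(2L, 3L));
        Map<Long, String> store = Map.of(2L, "alice", 3L, "bob");
        DualStoreClient client = new DualStoreClient(graph::get, store::get);
        System.out.println(client.friendProfiles(1L)); // [alice, bob]
    }
}
```

The key design point is that only node ids are duplicated across the two stores: the graph holds ids and relationships, Postgres holds the bulky per-node payload, so the storage overhead of adding Neo4j stays small.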