Hi everyone, I wanted to ask if there is any possible way to import records from multiple relational databases into my Neo4j instance using the neo4j-import tool for an initial offline bulk load.
I know there are other options available to import data into Neo4j from an RDBMS, but online import will be too slow for my needs.
What I want to achieve is:
- Build a metadata graph in Neo4j from the schema info of all my databases. For this I could use SchemaCrawler.
- Import all my data records into Neo4j so I will be able to run graph analytics.
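For the second step, the exported CSVs would need the header conventions the offline bulk importer expects. Below is a minimal sketch of that export step; the file names, labels, and columns (Customer, Order, PLACED, etc.) are hypothetical, but `:ID`, `:LABEL`, `:START_ID`, `:END_ID`, and `:TYPE` are the importer's reserved header fields, and the `(Customer)`/`(Order)` parts are ID groups that keep IDs from different source tables from colliding:

```python
import csv

def write_nodes(path, rows):
    # Hypothetical node file: one label, one property column.
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["customerId:ID(Customer)", "name", ":LABEL"])
        for r in rows:
            w.writerow([r["id"], r["name"], "Customer"])

def write_rels(path, rows):
    # Hypothetical relationship file joining two ID groups.
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow([":START_ID(Customer)", ":END_ID(Order)", ":TYPE"])
        for r in rows:
            w.writerow([r["customer_id"], r["order_id"], "PLACED"])
```

The point being: if each database's tables can be dumped into files shaped like this, one bulk-import run over all of them at once should be possible, even though the databases were exported separately.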
The offline bulk import tool needs an empty database to work, so connecting to multiple databases and importing their records in separate runs is not feasible (per the official documentation).
Looking into the current architecture of the ETL tool:
My initial thought was that it would be easy to write a script that generates the metadata mapping,
exports the relational data to CSV, generates the mapping headers, and then imports all of them at once...something like @michael.hunger's 2015 implementation, relational_to_neo4j_import_tool
...but the minimal command-line example for export also requires input parameters for the import tool, as written in the documentation of the 'export' command:
neo4j-etl export - Export from RDBMS and import into NEO4J via CSV
Examples of command usage:
Minimal command line
./bin/neo4j-etl export \
--rdbms:url <url> --rdbms:user <user> --rdbms:password <password> \
--destination $NEO4J_HOME/data/databases/graph.db/ --import-tool $NEO4J_HOME/bin
Is it possible to separate export and import from the command line?
If not, another option would be to create multiple databases and then manage & query them with Neo4j Fabric to get my insights.
Any other option you may have to suggest?
The import tool generates the mappings (I think with `generate-mappings`),
and then you run the import separately.
Why is online import too slow? How many records do you want to import, and in which timeframe?
What you can do in Neo4j Enterprise (which is available in Desktop with a developer license) is simply choose different databases for bulk import mode, and then e.g. use Neo4j Fabric to query across them.
Thank you Michael for taking the time to respond.
I have no issue generating the mappings separately; the problem is exporting the CSV files on their own. If that were possible, I could then run the bulk import once, since it cannot run separately.
I have TBs of data, and the graph will have to be regenerated over time.
Creating relationships with online import will take a lot of time, as it locks the nodes, right?
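On the locking point: the usual mitigation for online loads is to batch writes into moderately sized transactions (e.g. an UNWIND over a parameter list) so each transaction holds locks only briefly. A minimal batching helper; the driver call in the comment is a hypothetical sketch (assuming the official Neo4j Python driver), not a tested claim about its performance:

```python
def batches(rows, size=10_000):
    """Yield fixed-size chunks so each write transaction stays small."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Hypothetical usage with the Neo4j Python driver (not run here):
# with driver.session() as session:
#     for batch in batches(rows):
#         session.run(
#             "UNWIND $rows AS row "
#             "MERGE (c:Customer {id: row.id}) SET c.name = row.name",
#             rows=batch,
#         )
```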
Is there any specific reason for export and import to be tied into one command?
It seems from the ER diagram as well that those two functionalities are properly separated, just not exposed to the end user.
If this is not an option, then Neo4j Fabric will definitely be the way to go, but that means Neo4j Enterprise only.
It's just how the tool evolved. Originally there were separate generate-mapping and import steps, which we then also combined.
I presume you've seen the command line docs which are basically what's used by the UI app behind the scenes.
And so far we haven't gotten any requests to combine multiple schemas/imports.
Please also note that the neo4j-etl tool was really meant for beginners getting their first bit of data into Neo4j, not for heavy production loads. Also, the generated schema just mirrors the relational schema and might not be a great graph schema (depending on the use case).
For those, industry-level ETL infrastructures like Kettle/Hop, Talend, or NiFi would be used.