Data import from EDL 1.0 to Neo4j

Hi Everyone,

I have my data in EDL 1.0 and I want to import it into Neo4j.
The data is available in the publish layer of EDL and I can access it through Impala; I can write queries there and fetch the data.
Could you please tell me the best way to do this?
I have never done a data import from HDFS before.

Thanks in advance.

Please provide more context. What is EDL 1.0? I only know the acronym as "Eclipse Distribution License".

Hi Stefan, thanks for your reply.

EDL stands for Enterprise Data Lake; in other words, my data is on HDFS.
I need to import it into Neo4j.

The APOC library allows accessing a Hive data source via apoc.load.jdbc.
Additionally, all procedures that take URLs accept hdfs:// style URLs, e.g. to load CSV, JSON, XML or other formats.
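
As a rough illustration of the hdfs:// option, a CSV file sitting on HDFS might be read like this. The namenode host, port, file path and column names are placeholders, not values from this thread, and the Hadoop client jars would probably need to be in the plugins folder for the hdfs:// scheme to resolve.

```cypher
// Hypothetical namenode host, port and file path; replace with your own.
// With header: true, each row is exposed as a map keyed by column name.
CALL apoc.load.csv('hdfs://namenode.example.com:8020/data/publish/student.csv', {header: true})
YIELD map
RETURN map.rollno, map.class
LIMIT 10;
```

This only reads and returns rows; nothing is written to the graph.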

Can we access Impala through APOC?

Never used Impala. Quick googling shows it has a jdbc driver, so I assume apoc.load.jdbc will work with it.
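
A minimal sketch of what that might look like, assuming the Cloudera Impala JDBC driver jar has been copied to the plugins folder. The driver class name, host, port 21050, database and table are assumptions for illustration; check the driver's own documentation for the exact class name and URL format of your version.

```cypher
// Statement 1: register the driver class (assumed class name, depends on the driver version).
CALL apoc.load.driver('com.cloudera.impala.jdbc41.Driver');

// Statement 2 (run separately): query Impala through JDBC.
// Host, port, database and table are placeholders.
WITH 'jdbc:impala://impala-host.example.com:21050/my_database' AS url
CALL apoc.load.jdbc(url, 'SELECT rollno, class FROM student') YIELD row
RETURN row.rollno, row.class
LIMIT 10;
```

The second argument of apoc.load.jdbc can be either a table name or a full SQL statement, so filtering can be pushed down to Impala.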

Thank you so much, Stefan.
Could you please share sample syntax for loading data with the above APOC procedure?
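
For reference, a load that writes rows into the graph instead of just returning them could be sketched as follows. The URL, the student table, its rollno and class columns, and the Student label are all hypothetical.

```cypher
// Placeholder connection URL; replace host, port and database.
WITH 'jdbc:hive2://hive-host.example.com:10000/my_database' AS url
// Each table row comes back as a map named row.
CALL apoc.load.jdbc(url, 'student') YIELD row
// One node per distinct rollno; MERGE avoids duplicates on re-runs.
MERGE (s:Student {rollno: row.rollno})
SET s.class = row.class
RETURN count(s) AS loaded;
```

For large tables this single transaction would likely be too big, and some form of batching (for example with apoc.periodic.iterate) would be worth looking at.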

Hi Stefan,
I tried it with Hive and used the connection URL below:

WITH 'jdbc:hive2://my_Usrname:my_password@ip-172-31-6-58.ap-south-1.compute.internal:10000/cts687382_test' as url
CALL apoc.load.jdbc(url,'student') YIELD row
RETURN row.rollno, row.class;

But after running this I get the error below.

I have downloaded the Hive JDBC connector and put it in the plugins folder.
Please help me find what I missed here.

Any suspicious entries in logs/debug.log? You can also try to explicitly load a JDBC driver via CALL apoc.load.driver("com.mysql.jdbc.Driver"); (you need to replace the class name with the Hive equivalent, of course).

Hi,
please find my log file below.


2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] LAST_TRANSACTION_COMMIT_TIMESTAMP (Commit time timestamp for last committed transaction): 1539811587053
2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] UPGRADE_TRANSACTION_COMMIT_TIMESTAMP (Commit timestamp of transaction the most recent upgrade was performed at): 0
2018-11-04 18:08:05.771+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for NEO_STORE_RECORDS END ---
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for TRANSACTION_RANGE START ---
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] Transaction log:
2018-11-04 18:08:05.787+0000 INFO [o.n.k.i.DiagnosticsManager] Oldest transaction 2 found in log with version 0
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for TRANSACTION_RANGE END ---
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles START ---
2018-11-04 18:08:05.803+0000 INFO [o.n.k.i.DiagnosticsManager] Disk space on partition (Total / Free / Free %): 69688356864 / 2212962304 / 3
Storage files: (filename : modification date - size)
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   New folder:
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-11-04T02:52:10-0800 - 0.00 B
2018-11-04 18:08:05.818+0000 INFO [o.n.k.i.DiagnosticsManager]   certificates:
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]     neo4j.cert: 2017-09-23T00:11:21-0700 - 1002.00 B
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]     neo4j.key: 2017-09-23T00:11:21-0700 - 1.69 kB
2018-11-04 18:08:05.834+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:21-0700 - 2.67 kB
2018-11-04 18:08:05.849+0000 INFO [o.n.k.i.DiagnosticsManager]   data:
2018-11-04 18:08:05.849+0000 INFO [o.n.k.i.DiagnosticsManager]     dbms:
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]       auth: 2017-09-23T00:12:44-0700 - 113.00 B
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]     - Total: 2017-09-23T00:12:44-0700 - 113.00 B
2018-11-04 18:08:05.865+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:26-0700 - 113.00 B
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]   import:
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]     test.csv: 2018-10-17T12:09:02-0700 - 87.00 B
2018-11-04 18:08:05.881+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-10-17T12:24:50-0700 - 87.00 B
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   index:
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:24-0700 - 0.00 B
2018-11-04 18:08:05.896+0000 INFO [o.n.k.i.DiagnosticsManager]   logs:
2018-11-04 18:08:05.912+0000 INFO [o.n.k.i.DiagnosticsManager]     debug.log: 2018-11-04T10:08:05-0800 - 5.14 MB
2018-11-04 18:08:05.912+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2017-09-23T00:11:18-0700 - 5.14 MB
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore: 2018-10-17T14:39:19-0700 - 8.00 kB
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.counts.db.a: 2018-10-17T14:39:19-0700 - 960.00 B
2018-11-04 18:08:05.928+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.counts.db.b: 2018-10-17T14:24:18-0700 - 928.00 B
2018-11-04 18:08:05.943+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labelscanstore.db: 2018-11-04T10:07:46-0800 - 48.00 kB
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db: 2018-10-17T12:39:13-0700 - 8.00 kB
2018-11-04 18:08:05.959+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.974+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.names: 2018-10-17T12:39:13-0700 - 8.00 kB
2018-11-04 18:08:05.974+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.labeltokenstore.db.names.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db: 2018-10-17T14:39:19-0700 - 16.00 kB
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.id: 2018-11-04T10:07:45-0800 - 873.00 B
2018-11-04 18:08:05.990+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.labels: 2017-09-23T00:11:21-0700 - 8.00 kB
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.nodestore.db.labels.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db: 2018-11-04T02:43:26-0800 - 135.45 kB
2018-11-04 18:08:06.006+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.arrays: 2018-10-17T14:39:19-0700 - 8.00 kB
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.arrays.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.021+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index: 2018-10-17T13:39:16-0700 - 8.00 kB
2018-11-04 18:08:06.037+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.037+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.keys: 2018-10-17T13:39:16-0700 - 8.00 kB
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.index.keys.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.strings: 2017-09-23T01:12:02-0700 - 16.00 kB
2018-11-04 18:08:06.053+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.propertystore.db.strings.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipgroupstore.db: 2017-09-23T01:12:02-0700 - 8.00 kB
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipgroupstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.068+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipstore.db: 2017-09-23T01:12:02-0700 - 23.91 kB
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshipstore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db: 2017-09-27T00:46:23-0700 - 8.00 kB
2018-11-04 18:08:06.084+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.099+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.names: 2017-09-23T00:32:56-0700 - 8.00 kB
2018-11-04 18:08:06.099+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.relationshiptypestore.db.names.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.schemastore.db: 2017-09-23T00:11:23-0700 - 8.00 kB
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.schemastore.db.id: 2018-11-04T10:07:45-0800 - 9.00 B
2018-11-04 18:08:06.115+0000 INFO [o.n.k.i.DiagnosticsManager]   neostore.transaction.db.0: 2018-11-04T10:07:42-0800 - 451.49 kB
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]   plugins:
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]     apoc-3.2.3.6-all.jar: 2018-11-04T02:42:07-0800 - 7.01 MB
2018-11-04 18:08:06.131+0000 INFO [o.n.k.i.DiagnosticsManager]     hadoop-common-3.1.1.jar: 2018-11-04T03:52:09-0800 - 3.85 MB
2018-11-04 18:08:06.146+0000 INFO [o.n.k.i.DiagnosticsManager]     hive-exec-3.1.0.jar: 2018-11-04T09:39:48-0800 - 38.72 MB
2018-11-04 18:08:06.153+0000 INFO [o.n.k.i.DiagnosticsManager]     hive-jdbc-3.1.0.jar: 2018-11-04T01:12:06-0800 - 122.33 kB
2018-11-04 18:08:06.157+0000 INFO [o.n.k.i.DiagnosticsManager]     httpasyncclient-4.0-beta4.jar: 2018-11-04T10:04:52-0800 - 150.93 kB
2018-11-04 18:08:06.161+0000 INFO [o.n.k.i.DiagnosticsManager]     httpclient-4.5.jar: 2018-11-04T09:48:31-0800 - 710.51 kB
2018-11-04 18:08:06.169+0000 INFO [o.n.k.i.DiagnosticsManager]     libthrift-0.9.3.jar: 2018-11-04T09:15:49-0800 - 228.71 kB
2018-11-04 18:08:06.177+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-11-04T10:06:15-0800 - 50.76 MB
2018-11-04 18:08:06.181+0000 INFO [o.n.k.i.DiagnosticsManager]   store_lock: 2017-09-23T00:11:21-0700 - 0.00 B
2018-11-04 18:08:06.185+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles END ---
2018-11-04 18:08:06.209+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED START ---
2018-11-04 18:08:06.813+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED END ---
2018-11-04 21:33:22.577+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 10798696ms.
2018-11-04 21:33:43.232+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 12549ms.
2018-11-05 03:19:37.205+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 20752015ms.
2018-11-05 03:19:56.070+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 324ms.

I also tried CALL apoc.load.driver("org.apache.hive.jdbc.HiveDriver");

It returns nothing but executes fine, and after adding several dependency jars I now get the error below:

Failed to invoke procedure `apoc.load.jdbc`: Caused by: java.lang.NoClassDefFoundError: org/apache/http/HttpRequestInterceptor

NoClassDefFoundError might mean you're missing one of the dependent jars required to load the JDBC driver class. Which jars did you add to the plugins folder?

Hi Stefan,

Please find below a screenshot of the jars available in the plugins folder.
[screenshot of the plugins folder]

The APOC docs for 3.2 advise adding these (APOC User Guide 3.2.3.6):

  • hadoop-common-2.7.3.2.6.1.0-129.jar
  • hive-exec-1.2.1000.2.6.1.0-129.jar
  • hive-jdbc-1.2.1000.2.6.1.0-129.jar
  • hive-metastore-1.2.1000.2.6.1.0-129.jar
  • hive-service-1.2.1000.2.6.1.0-129.jar
  • httpclient-4.4.jar
  • httpcore-4.4.jar
  • libfb303-0.9.2.jar
  • libthrift-0.9.3.jar
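
The class named in the error, org.apache.http.HttpRequestInterceptor, ships in the Apache HttpCore jar. The plugins listing in the posted debug.log shows no httpcore, hive-metastore, hive-service or libfb303 jar at the time of that startup, while the list above names all of them. After copying the missing jars into the plugins folder and restarting Neo4j, the earlier attempt could be retried roughly like this; the host and database in the URL are placeholders.

```cypher
// Statement 1: register the Hive driver class explicitly.
CALL apoc.load.driver('org.apache.hive.jdbc.HiveDriver');

// Statement 2 (run separately): retry the load.
// Placeholder URL; reuse the jdbc:hive2:// URL from the earlier attempt.
WITH 'jdbc:hive2://hive-host.example.com:10000/my_database' AS url
CALL apoc.load.jdbc(url, 'student') YIELD row
RETURN row.rollno, row.class
LIMIT 5;
```

If a different NoClassDefFoundError shows up afterwards, the class name in the message usually points to the next jar that is still missing.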