Hi All,
We are uploading bulk data into neo4j using below script in which we are using apoc.load.jdbc with periodic iterate to connect our oracle database and create nodes and relationships into Neo4j.
we are using neo4j enterprise 3.5.5 version in our testing environment and have set heap initial and max heap memory 8 GB and page cache as 10 GB as recommended by neo4j admin memrec .
Our target is to load aroud 2000 million (node+property+relationships) .
once we execute our script , initially it execute with very fast speed around 10000 nodes per second and gradually performance degrades.
For further investigation we are using Jconsole for memory and thread utilization tracking in which we identified that in heap memory Old Gen is not getting cleared by GC.
Find below attachment for reference.
We need to know what is the issue in script i ran profile also but didn't observed any eager operation in it.
My question here are
1- Is it right approach to do bulk data insert or is any other method in neo4j ?
2- if we use this method then why old gen area is not getting cleared by GC?
with "select 'JioSite-XX-XX-XX-'||SITE_INST_ID uuid,a.SITE_INST_ID, a.SITE_HUM_ID, a.CLLI,
a.NUM, a.BASE_NUM, a.LATITUDE, a.LONGITUDE, a.RESTRICTIONS, a.CONTACTS, a.ADDRESS, a.POST_CODE_1, a.POST_CODE_2, a.CITY, a.STATE_PROV, a.COUNTY,
a.COUNTRY, a.ROOM, a.FLOOR, a.NPA_NXX, a.COMMENTS, TO_CHAR(nvl(LAST_MOD_TS,'01-JAN-9999'),'YYYY-MM-DD HH24:MI:SS')LAST_MOD_TS, a.LAST_MOD_BY, a.INST_VER,
TO_CHAR(nvl(LOCK_TS,'01-JAN-9999'),'YYYY-MM-DD HH24:MI:SS')LOCK_TS, a.LOCK_BY, a.PARENT_SITE_INST_ID, a.STATUS, a.FPLAN_DEPTH, a.FPLAN_WIDTH, a.FPLAN_HEIGHT,
a.FPLAN_NAME, a.FPLAN_ORIGIN_X, a.FPLAN_ORIGIN_Y from SITE_INST a
where a.num<>'CUSTOMER' AND a.STATUS<> 'IN USE' and SITE_INST_ID >=50000000 AND SITE_INST_ID <= 55000000 " as query
CALL apoc.periodic.iterate('
CALL apoc.load.jdbc("jdbc:oracle:thin:rilprod/jioprod@10.147.161.47:1522/XCOMSTB2","'+query+'") YIELD row ',
"MATCH (m:classes{name:'gFacility'})
MERGE (n:Facility { _uuid: row.UUID})
SET
n.site_inst_id = toInteger(row.SITE_INST_ID),
n.name = row.SITE_HUM_ID,
n.clli = row.CLLI,
n.num = row.NUM,
n.base_num = row.BASE_NUM,
n.site_type = row.SITE_TYPE,
n.site_id = row.SITE_ID,
n.parent_site_inst_id = toInteger(row.PARENT_SITE_INST_ID),
n.status = row.STATUS
MERGE (n)-[r:INSTANCE_OF]->(m)
",
{ batchSize:2000, parallel:false})
YIELD batches, total , timeTaken , failedOperations , failedBatches , errorMessages , committedOperations , operations
RETURN '-----SITE_INST--------',batches, total , timeTaken , failedOperations , failedBatches , errorMessages , committedOperations , operations;