Deleting older relationships

I'm new to graph databases and I'm evaluating whether Neo4j would fit my use case.

I have two CSV files:

  1. Persons file (phone number, name columns)
  2. Calls file (callerNumber, recipientNumber, callDate columns)

I anticipate more than 50 million nodes and more than 20 billion relationships.

I have been able to create the nodes from the Persons file and the relationships from the Calls file using neo4j-admin import.

The challenge comes when deleting the relationships for a certain callDate so that I can add newer ones. It is painfully slow for large datasets:
MATCH ()-[r {callDate: 20200101}]->() DELETE r;

I found out I can't index relationship properties.

Is there a way to optimize this Cypher query? How could I re-model my CSVs?

Hello @DanielGittx,

Yes, it's possible :slight_smile: This query should work:

CALL apoc.periodic.iterate('MATCH ()-[r {callDate: 20200101}]->() RETURN r', 'DELETE r', {batchSize: 1000, iterateList: true})

It deletes the relationships in batches of 1000.

Regards,
Cobra
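
A possible variant of the call above, as a sketch only. It assumes the relationship type is CALLS (the type that appears later in this thread) and that the property is named callDate; both are assumptions to adjust to the real import headers. The date is passed through the params config key instead of being hard-coded, and the batch size of 10000 is an arbitrary starting point.

```cypher
// Assumptions: relationship type CALLS, integer property callDate.
// The first statement selects the relationships, the second one runs
// once per batch in its own transaction.
CALL apoc.periodic.iterate(
  'MATCH ()-[r:CALLS]->() WHERE r.callDate = $day RETURN r',
  'DELETE r',
  {batchSize: 10000, iterateList: true, parallel: false, params: {day: 20200101}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Batching mainly keeps each transaction small. Without an index, the first statement still has to scan the relationships to find the matching ones.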

Hi @cobra,

Thanks very much. The APOC call you shared does work (I just refactored the syntax a bit), but it's a bit slow for the roughly 10 billion relationships I'm working with (6 months of data).

I came across db.index.fulltext.createRelationshipIndex as a way of indexing a relationship property.
The index is currently populating; hopefully the Cypher query will gain some speed once it is done.
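
One thing worth checking: as far as I know, a plain MATCH does not use a full-text index; it has to be queried through its own procedure. The sketch below assumes the index name callDateRelationship from later in this thread, and assumes the date is stored as a string such as "20200101". Full-text indexes only cover string properties, so if the date was imported as an integer the index would not contain it.

```cypher
// Assumptions: full-text index "callDateRelationship" exists and is online,
// and the indexed date property holds string values like "20200101".

// Count the hits first to confirm the index returns what you expect.
CALL db.index.fulltext.queryRelationships("callDateRelationship", "20200101")
YIELD relationship
RETURN count(relationship);

// Then feed the index hits to a batched delete.
CALL apoc.periodic.iterate(
  'CALL db.index.fulltext.queryRelationships("callDateRelationship", "20200101") YIELD relationship RETURN relationship AS r',
  'DELETE r',
  {batchSize: 10000, iterateList: true}
);
```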

Nice, happy to hear this :)

The apoc procedure and the index should really speed up your query :slight_smile:

Regards,
Cobra

Just an update...
The indexing process is very slow.

Considering:

  • Database size: 1.2t

Server configs:

  • Heap: 230g
  • Page cache: 1.182t

Neo4j version:

  • Neo4j Browser version: 4.0.3
  • Neo4j Server version: 3.5.15

The index population has taken 3 hours just to reach 12%. This is the call I used:

CALL db.index.fulltext.createRelationshipIndex("callDateRelationship",["CALLS"],["CALL_DATE"], { analyzer: "url_or_email", eventually_consistent: "true" })

Why is it so slow, and is it possible to speed it up?
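
To keep an eye on the population while it runs, the built-in procedures below may help. This is a sketch; the column names mentioned in the comment are what I would expect on a 3.5 server and may differ in other versions. These calls only report or wait on progress; they do not make the population any faster.

```cypher
// Lists all indexes. On Neo4j 3.5 the result should include
// "state" (e.g. POPULATING, ONLINE) and "progress" columns.
CALL db.indexes();

// Blocks until all indexes are online, or fails once the
// timeout (in seconds, here one hour) has passed.
CALL db.awaitIndexes(3600);
```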

Hello @DanielGittx,

Yes, it's slow because it has to index your whole existing database. That's why it's better to create the index when you create the database :slight_smile:

Regards,
Cobra

Agreed. However, I had initially done a bulk import (neo4j-admin import).
Will neo4j-admin import preserve indexes if I create them in advance and then do the bulk import?

If I'm right, the index is set during the import :slight_smile:

Regards,
Cobra

I don't think so, especially for relationship indexes.

I marked one of your messages as the solution because I tested it on a subset of the graph and it worked (it was fast), and also because I'm now solving a different issue.

I don't know much more about this topic, but I think you're right. According to the docs:

Full-text indexes are powered by the Apache Lucene indexing and search library

so it must be pre-computed already :thinking:

Regards,
Cobra

Daniel,
I would suggest changing your data model so that the day of the call is a node. It would look like this:
(:Person)-[:CALLED_ON]->(:DayOfCall)<-[:RECEIVED_CALL]-(:Person)
Then you can index the date property of DayOfCall, and the DELETE will run much faster because the day node can be found through the index instead of by scanning every relationship. Please note that you will still need to use apoc.periodic.iterate() to delete in batches.
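
A minimal sketch of how that could look in Cypher on Neo4j 3.5. The property name date and its integer format are assumptions for illustration, as is the batch size.

```cypher
// Assumption: each DayOfCall node has a `date` property such as 20200101.
// Neo4j 3.5 syntax for a single-property node index.
CREATE INDEX ON :DayOfCall(date);

// Find the day node through the index, then delete the relationships
// pointing at it in batches.
CALL apoc.periodic.iterate(
  'MATCH (d:DayOfCall {date: $day})<-[r]-(:Person) RETURN r',
  'DELETE r',
  {batchSize: 10000, iterateList: true, params: {day: 20200101}}
);

// Once its relationships are gone, the day node itself can be removed.
MATCH (d:DayOfCall {date: 20200101}) DELETE d;
```

Note that with the pattern exactly as drawn, every caller and recipient of a given day hangs off the same DayOfCall node, so it no longer records who called whom. If that pairing matters, the model would need something extra to keep it.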