How to speed up ORDER BY

Hello, I have a question about the ORDER BY performance of neo4j.

  • neo4j version: 4.0.6 / neo4j desktop

Node & Relationship

Node - 10,001,203 EA

  1. Article - 10,000,000 EA
  2. ...

Releationship - 11,251,324 EA

  1. Section-[CONSISTS_OF]->Article

Indexes (native-btree-1.0)

| 1 | "article_index" | ["Article"] | ["articleId", "officeId"] | "native-btree-1.0" |
| 2 | "office_index" | ["Article"] | ["officeId"] | "native-btree-1.0" |
| 3 | "order_index" | ["Article"] | ["updateDateTime"] | "native-btree-1.0" |

without order by desc

MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a LIMIT 100

with order by desc

MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a ORDER BY a.updateDateTime DESC LIMIT 100

The number of 'Article' nodes is about 10 million.
and The number of '(a:Article {officeId: '001'})' is 227,245.

There are many performance differences depending on whether ORDER BY is used or not.

Is there any other way to address ORDER BY performance issues?

Thanks

Can you try this?

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

@soham.dhodapkar

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
RETURN a LIMIT 100

The above query itself is slow. it takes about 400ms

Is there a problem with the 'updateDateTime' index?
Or need something like a descending index?

The 'Article' Node properties are as follows.

        "properties": {
          "contents.title": "...",
          "contents.content": "...",
          "officeId": "001",
          "articleId": "0009656943",
          "contents.section": "10464f000",
          "insertDateTime": "20180820035121",
          "updateDateTime": "20181029013007",
          "refinedContent": "..."
        }

Do you have indexes on officeId and updateDateTime ?
Also, can you please profile the query and share the query execution plan?

PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

Indexes

I have indexes on officeId and updateDateTime.

Profiles

PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

And are the results returned correct?
I am trying to think of a way to make this faster than 282ms. Number of (a:Article {officeId: '001'}) is 227,245. How many nodes are (s:Section {sectionId:'104'}) ?

We can try to leverage index-backed ordering, but this requires a means to hint the type of the property, such as in a WHERE clause against a value of the same type. Can you give this a try?

PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a

@soham.dhodapkar

MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a

The result of returned is not correct.There are only seven results.

MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})

I think the above query is executed as a result from 'ORDER BY a.updateDateTime DESC LIMIT 100'.

And Number of (s:Section {sectionId:'104'}) is only one node

@andrew_bowman

PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a

No result from the above query. I think where a.updateDateTime < dateTime() clause doesn't seem to work.

updateDateTime format

"updateDateTime": "20181029013007"

Profile

Is this right condition between updateDateTime and dateTime()?

Ah, I thought you were using a dateTime type. If you're using the epoch time, then maybe something like WHERE a.updateDateTime < timestamp() for that segment.

I just changed the property 'insertDateTime' to dateTime type with the query below

CALL apoc.periodic.iterate(
    "MATCH (a:Article) RETURN a",
    "SET a.insertDateTime = datetime({epochmillis: apoc.date.parse(a.insertDateTime, 'ms', 'yyyyMMddhhmmss')})",
    {batchSize:1000, parallel:true})

I made a Index on 'insertDateTime' property and I ran the command you recommend

PROFILE
MATCH (a:Article {officeId:'001'})
where a.insertDateTime < datetime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.insertDateTime DESC
LIMIT 100
RETURN a

Below query is also slow ..

PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) 
WHERE a.insertDateTime < dateTime()
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100

And this query is faster than both above

PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) 
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100

Is leverage index-backed ordering working?