Why isn't this measurement of performance consistent?

I have a query:

MATCH (m:Product)
WHERE 'beautify' IN m._effect
RETURN properties(m) as properties LIMIT 1

I used a logger to measure the time spent on the query alone:

print(cypher)
logger.info('before query ...')
result = tx.run(cypher)
logger.info('after query ...')

The log output from the Python Bolt driver is below:

2020-11-19 19:33:54,841 - kg_api - INFO - 461 - before query ...
2020-11-19 19:33:54,895 - kg_api - INFO - 463 - after query ...

So the elapsed time is 895 - 841 = 54 milliseconds. However, the Neo4j Browser reports that the same query takes only 8 milliseconds:

Started streaming 1 records after 1 ms and completed after 8 ms.

I repeated the query many times in both environments, and the two measurements remain very different.
Are these two measurements comparable?
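The 54 ms figure is simply the difference between the two logging timestamps. As a minimal sketch (the helper name `log_delta_ms` is hypothetical, and it assumes the `date time,milliseconds` timestamp layout shown in the log lines above), the arithmetic can be reproduced like this. Note that these timestamps only carry millisecond precision, so they cannot resolve anything finer than 1 ms.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the log lines above: date, time, then milliseconds after a comma.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_delta_ms(before_line: str, after_line: str) -> int:
    """Return the whole milliseconds between two log lines (hypothetical helper)."""
    # The timestamp is the text before the first " - " separator.
    start = datetime.strptime(before_line.split(" - ")[0], LOG_TIME_FORMAT)
    end = datetime.strptime(after_line.split(" - ")[0], LOG_TIME_FORMAT)
    # Dividing one timedelta by another with // yields an exact int, avoiding float rounding.
    return (end - start) // timedelta(milliseconds=1)


before = "2020-11-19 19:33:54,841 - kg_api - INFO - 461 - before query ..."
after = "2020-11-19 19:33:54,895 - kg_api - INFO - 463 - after query ..."
print(log_delta_ms(before, after))  # 54
```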

Depending on the size of your database, caching can cause some variability in speed. If the data the query touches (including intermediate results) is not in the cache, the resulting cache misses slow the query down. And if other queries later evict the data your query needs, the next run will be slow again.

So a lot depends on the size of your database, the amount of memory available, and how you configured Neo4j (for example, with more or less memory for caching). There are probably other factors I'm not aware of, too.
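One way to check whether caching explains part of the variance is to run the same query several times and compare the first (cold) run with the later (warm) runs. This is a minimal sketch that assumes you wrap the query in a zero-argument callable; `time_runs` and `summarize` are hypothetical helpers. With the Python driver, that callable might be something like `lambda: session.run(cypher).consume()`, but treat that wiring as an assumption to verify against your driver version.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_once: Callable[[], object], repeats: int = 20) -> List[float]:
    """Time `run_once` `repeats` times and return each duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations: List[float]) -> dict:
    """Separate the first (cold) run from the remaining (warm) runs; needs >= 2 runs."""
    warm = durations[1:]
    return {
        "cold_ms": durations[0],
        "warm_median_ms": statistics.median(warm),
        "warm_min_ms": min(warm),
        "warm_max_ms": max(warm),
    }


if __name__ == "__main__":
    # Stand-in workload; replace it with a callable that runs and fully consumes your query.
    print(summarize(time_runs(lambda: sum(range(100_000)))))
```

If the cold run is much slower than the warm median, caching is a likely contributor; if the warm runs themselves are widely spread, other load on the machine is worth investigating.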

A few thoughts:

  • these are very different environments
  • my first guess is that the query you are measuring is very fast (we know it takes under 8 ms, right?) and that most of what you are measuring right now is each environment's own overhead for starting/stopping a timer and sending/receiving the query, not the query itself
  • these times are close to the limits of the timer resolution; at this level of granularity, the host operating system (and other tasks running on it) will introduce timing variance
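To get a feel for how much of a few-millisecond measurement is timing noise rather than query work, you can ask Python what resolution its high-resolution clock reports and how long an empty timed call takes on your machine. This sketch uses only the standard library; the function names are hypothetical, the numbers are platform dependent, and it says nothing about how the Browser measures time.

```python
import time


def clock_resolution_s() -> float:
    """Resolution (in seconds) that the platform reports for time.perf_counter."""
    return time.get_clock_info("perf_counter").resolution


def empty_call_overhead_ms(samples: int = 10_000) -> float:
    """Average cost, in milliseconds, of starting/stopping a timer around a no-op call."""
    def noop() -> None:
        pass

    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        noop()
        total += time.perf_counter() - start
    return total / samples * 1000.0


if __name__ == "__main__":
    print(f"perf_counter resolution: {clock_resolution_s():.2e} s")
    print(f"empty timed call: {empty_call_overhead_ms():.6f} ms on average")
```

Using `time.perf_counter` directly, instead of subtracting logging timestamps, also avoids the 1 ms granularity of the log format.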

I suggest timing a longer-running query, one that takes more than a few seconds. Then you'll be measuring the query time instead of the startup/shutdown overhead. Make sure everything else is the same for both tests and remove all other variables (no load on the client, no load on the server, both tests run on the same machine as the database, etc.).

If we were comparing language drivers, you could start a timer, open the database connection, run 10,000 fast queries, then close the connection and stop the timer. (You are comparing against the Browser, so I didn't suggest this.)
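The batch approach described above spreads the fixed start/stop and connection cost over many executions. Here is a minimal sketch in which `open_connection`, `run_query` and `close_connection` are hypothetical placeholders you would wire up to whichever driver you are comparing; only the standard library is used, and a list stands in for a real connection so the example runs without a database.

```python
import time
from typing import Callable, Tuple, TypeVar

Conn = TypeVar("Conn")


def time_batch(
    open_connection: Callable[[], Conn],
    run_query: Callable[[Conn], object],
    close_connection: Callable[[Conn], None],
    n: int = 10_000,
) -> Tuple[float, float]:
    """Time open + n queries + close as one block; return (total_s, average_ms)."""
    start = time.perf_counter()
    conn = open_connection()
    try:
        for _ in range(n):
            run_query(conn)
    finally:
        # Close even if a query fails, so the connection is not leaked.
        close_connection(conn)
    total = time.perf_counter() - start
    return total, total / n * 1000.0


if __name__ == "__main__":
    # Stand-in "connection": a list that records one entry per query.
    fake_conn: list = []
    total_s, avg_ms = time_batch(lambda: fake_conn, lambda c: c.append(1), lambda c: None)
    print(f"{len(fake_conn)} queries in {total_s:.4f} s, {avg_ms:.6f} ms each on average")
```

With the official Python driver, `open_connection` might return `driver.session()` and `run_query` might call `session.run(cypher).consume()` so that every result is actually fetched; check that wiring against your driver version.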