I am looking for advice and/or best practices for handling block device errors in a clustered environment.
I am experimenting with causal clustering and noticed that the database engine does not shut down when the block device backing the data volume becomes inaccessible. The affected node still accepts client connections and is far too happy to return a low-level Neo4j error saying that it has trouble accessing the data store.
Is there a best-practice approach for shutting down the database engine when a node experiences such a permanent error? My best idea so far is to monitor the logs and fire off safety events when certain patterns are identified.
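To make the log-monitoring idea concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the function name `watch_for_storage_errors`, the threshold, and especially the patterns, which are just the generic Linux I/O error strings and not messages I have confirmed Neo4j emits.

```python
import re
from typing import Callable, Iterable, Sequence

# Hypothetical patterns: generic OS-level error strings, NOT confirmed
# Neo4j log messages. Replace them with what the node actually logs.
DEFAULT_PATTERNS = (r"Input/output error", r"Read-only file system")


def watch_for_storage_errors(
    lines: Iterable[str],
    on_failure: Callable[[str], None],
    patterns: Sequence[str] = DEFAULT_PATTERNS,
    threshold: int = 3,
) -> bool:
    """Scan log lines and fire on_failure once `threshold` lines have matched.

    Returns True if the callback fired, False if the input ended first.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = 0
    for line in lines:
        if any(rx.search(line) for rx in compiled):
            hits += 1
            if hits >= threshold:
                # Pass the triggering line along so the handler can log it.
                on_failure(line)
                return True
    return False
```

The `lines` argument could be an open log file or the output of a tailing process, and `on_failure` could stop the service or fence the node. Requiring several matches before acting is meant to avoid shutting down on a single transient error, but the right threshold is a guess on my part.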
Thank you for sharing your thoughts!