Running neo4j as part of a docker multi-stage build

Hi all,

I've been using Neo4j locally and am now trying to integrate it into my Python-based web application (using FastAPI and Starlette). My deployment strategy is to deploy a single Docker container to Elastic Beanstalk. I am aware it is a Docker anti-pattern to put too much into a single container; however, for time and cost reasons, this is the option I am left with.

My question is: how can I use Neo4j as part of a multi-stage build?
Docker Page on Multi-stage builds

Stage 1: Build off of the provided Neo4j Docker image
Stage 1.b: Configure the Neo4j environment
Stage 2: Build off a Python image
Stage 2.b: Configure the Python application environment
Stage 2.c: Import requisite code from Stage 1

Note: there are other use cases for this. One best practice is to ship in your Docker image only the code required to run, and it would be great to know whether that option is available for the Neo4j binaries that run on the JVM.

Thanks in advance.

I think we need more information on what you're trying to do. On the face of it, what you're asking is straightforward: have a look at the repo our Dockerfiles are built from, and you can see that you can build your own Docker image on top of this. You just need to ensure that docker-entrypoint.sh gets called, which is what actually starts the Neo4j process.

But on the other hand, it is a really good idea to isolate one process per Docker container. If you're trying to run a Python server and Neo4j inside the same container, that is doable, but not something I would advise.

For an app with multiple components, more typical would be to have one Docker container per component (one for your Python app, one for Neo4j, and others if needed) and then use something like Docker Compose, Docker Swarm, or Kubernetes to string all of those containers together into the one big thing that is your application.
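As a sketch, the recommended one-container-per-component setup might look like the following docker-compose file. The service names, ports, volumes, and credentials here are placeholder assumptions, not taken from the original post, and would need adjusting for a real deployment:

```yaml
# docker-compose.yml -- one container per component (names/ports are assumptions)
version: "3"
services:
  neo4j:
    image: neo4j:3.5
    ports:
      - "7474:7474"   # HTTP (Neo4j Browser)
      - "7687:7687"   # Bolt protocol, used by Python drivers
    environment:
      - NEO4J_AUTH=neo4j/changeme   # placeholder password
    volumes:
      - neo4j-data:/data            # persist the graph between restarts

  app:
    build: .                        # your FastAPI app's own Dockerfile
    ports:
      - "8000:8000"
    environment:
      # the service name "neo4j" resolves over the compose network
      - NEO4J_URI=bolt://neo4j:7687
    depends_on:
      - neo4j

volumes:
  neo4j-data:
```

With this layout, `docker-compose up` starts both containers, and each image stays small because it only carries one component's dependencies.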

Hi David,

Thank you for your response.

…have a look at the repo our Dockerfiles are built from, and you can see that you can build your own Docker image on top of this. You just need to ensure that docker-entrypoint.sh gets called, which is what actually starts the Neo4j process.

I did in fact do just that.

But on the other hand, it is a really good idea to isolate one process per Docker container. If you're trying to run a Python server and Neo4j inside the same container, that is doable, but not something I would advise.

I agree. I am working around some cost and knowledge-capital constraints at the moment, but may ultimately move towards the recommended 1-container, 1-component approach.

Independent of my stated use case, Docker multi-stage builds are one recommended way to produce the smallest possible image to deploy to a production environment. The neo4j image is ~362 MB. Is it possible to extract only the Neo4j executables and the JVM? I had something like the following at the top of my Dockerfile:

# JAVA: Neo4j
FROM neo4j:3.5 AS builder

# PYTHON: Franklin Web App
FROM python:3.6 AS appenv

# Get java resources for neo4j
COPY --from=builder /usr/local/openjdk-8 /usr/local/openjdk-8
COPY --from=builder /var/lib/neo4j /var/lib/neo4j
ENV JAVA_HOME=/usr/local/openjdk-8 \
    NEO4J_HOME="/var/lib/neo4j" \
    NEO4J_EDITION=community

And this does work, but the size of the resulting image is roughly the sum of the two images (Python, Neo4j). Is it possible to deploy a smaller image to production?

As for the size of the Neo4j Docker image, it's probably dominated by the size of the JDK needed to run Neo4j; that's really an entire Java platform.

If you really want to combine the images in this way, I'd recommend instead that you start with a stripped-down Linux distribution like a slim Debian, or maybe even Alpine, and then install everything manually. The Neo4j site has instructions on how to install on Debian, so in your Dockerfile you'd start with a Debian image (not neo4j, not python) and apt-get install everything that you need. You could then copy over our entrypoint script and adapt it to the new paths and so on.
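A rough sketch of that approach, assuming the Debian install instructions from the Neo4j site (the repository URL, key, and package names below are my best understanding and would need checking against the current docs, and the entrypoint script is copied in as a placeholder):

```dockerfile
# Untested sketch: one slim image with both Neo4j and Python installed via apt
FROM debian:buster-slim

# Add Neo4j's Debian repository (key URL and repo line as documented by Neo4j;
# verify these against the site before using)
RUN apt-get update && apt-get install -y --no-install-recommends wget gnupg ca-certificates \
    && wget -O - https://debian.neo4j.com/neotechnology.gpg.key | apt-key add - \
    && echo 'deb https://debian.neo4j.com stable 3.5' > /etc/apt/sources.list.d/neo4j.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends neo4j python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy the entrypoint script (adapted from the official image's repo)
# and fix up its paths for this layout
COPY docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
```

Because everything comes from one base OS, you avoid carrying two distributions' worth of libraries, which is where the doubled size in the multi-stage COPY approach comes from.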

The way you're doing this layering is probably bloating the image because (this is just a guess) the python image may be based on a completely different OS, so when you add that layer you're bringing in truckloads of dependencies needed for that build of Python which you wouldn't necessarily have if you did a clean install of both on a single machine.

Let me stress, though, that optimizing for Docker image size is usually not the way you want to go. Partly you're going to be stuck with a big image no matter what you try, precisely because you're putting quite a lot of stuff into it: JVM + Neo4j + Python interpreter + your application + every dependency of those components.

So I think if you start with a stripped-down Debian and build it up, the result will be smaller, but probably not 50% smaller.

Thanks again. I'll probably move to a multi-container solution before trying to self-install all of the necessary parts.

I appreciate your response.