Docker: no bad tips


In the comments on my article "Docker: bad advice" there were many requests to explain why the Dockerfile described there was so terrible.


A quick recap of the previous episode: two developers on a tight deadline throw together a Dockerfile. Along the way, the ops engineer Igor Ivanovich drops in on them. The resulting Dockerfile is so bad that Igor Ivanovich ends up on the verge of a heart attack.



Let's see what's wrong with this Dockerfile.


So, a week has passed.


Dev Peter runs into Ops Igor Ivanovich in the cafeteria over a cup of coffee.


P: Igor Ivanovich, are you very busy? I would like to find out where we screwed up.


AI: Sure, I have a minute. It's not often you see developers taking an interest in operations.
First, let's agree on a few things:


  1. Docker ideology: one container, one process.
  2. The smaller the container, the better.
  3. The more that can be cached, the better.

P: Why should there be one process in one container?


AI: When a container starts, Docker tracks the state of the process with PID 1. If that process dies, Docker tries to restart the container. Now suppose you run several applications in one container, or the main application does not run as PID 1: if a process dies, Docker will know nothing about it.
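A detail that often bites here: whether your application actually gets PID 1 depends on the form of CMD. A minimal illustration (the `my-app` binary is a hypothetical stand-in; in a real Dockerfile only the last CMD takes effect, the two forms are shown side by side for contrast):

```dockerfile
# Shell form: Docker actually runs /bin/sh -c "my-app",
# so PID 1 is the shell, and Docker watches the shell, not your app.
CMD my-app

# Exec form: my-app itself becomes PID 1,
# so Docker tracks (and signals) the right process.
CMD ["my-app"]
```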


If there are no more questions, show your Dockerfile.


And Peter showed:


  FROM ubuntu:latest

 # Copy the source code
 COPY . /app
 WORKDIR /app

 # Update the list of packages
 RUN apt-get update

 # Upgrade packages
 RUN apt-get upgrade

 # Install the necessary packages
 RUN apt-get -y install imagemagick gsfonts ruby-full ssh supervisor libpq-dev

 # Install bundler
 RUN gem install bundler

 # Install nodejs, used to build static assets
 RUN curl -sL https://deb.nodesource.com/setup_9.x | sudo bash -
 RUN apt-get install -y nodejs

 # Install dependencies
 RUN bundle install --without development test --path vendor/bundle

 # Clean caches
 RUN rm -rf /usr/local/bundle/cache/*.gem
 RUN apt-get clean
 RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 RUN rake assets:precompile
 # At container start, run the script that starts everything else.
 CMD ["/app/init.sh"]

AI: Oh, let's sort it out in order. Let's start with the first line:


  FROM ubuntu:latest

You take the latest tag. Using latest leads to unpredictable consequences. Imagine: the image maintainer builds a new version of the image with a different set of software, and that image gets the latest tag. At best, your image simply stops building; at worst, you catch bugs that weren't there before.


You also take an image with a full OS and a lot of unnecessary software, which inflates the size of the container. And the more software there is, the more holes and vulnerabilities.


In addition, the larger the image, the more space it takes up on the host and in the registry (you do store images somewhere, right?)


P: Yes, of course, we have a registry, and you set it up.


AI: So, what was I saying?.. Ah, yes, size... The load on the network grows too. For a single image it's imperceptible, but with continuous builds, tests, and deploys it becomes noticeable. And if you don't have god mode on AWS, you'll get an astronomical bill on top of that.


Therefore, you need to choose the most suitable image, with an exact version and a minimum of software. For example, take: FROM ruby:2.5.5-stretch


P: Oh, I see. And where and how do I look at the available images? How do I know which one I need?


AI: Images are usually taken from Docker Hub (not to be confused with that other kind of hub :)). There are usually several builds of an image:

  1. Alpine: images built on a minimalist Linux base, only about 5 MB. The downside: it is compiled against its own libc implementation (musl), so standard packages don't always work in it. Finding and installing the right package can take a long time.
  2. Scratch: an empty base image, built on nothing. It is intended solely for running prepared binaries. Ideal for binary applications that bundle everything they need, such as Go applications.
  3. Images based on a full OS, such as Ubuntu or Debian. These, I think, need no explanation.


AI: Now we need to install all the extra packages and clean up the caches. And you can throw out apt-get upgrade right away. Otherwise, despite the pinned tag of the base image, each build would produce a different image. Updating the packages inside an image is the maintainer's job, and it comes with a tag change.


P: Yes, I tried that. Here's what I came up with:


  WORKDIR /app
 COPY . /app

 RUN curl -sL https://deb.nodesource.com/setup_9.x | bash - \
  && apt-get -y install imagemagick gsfonts ruby-full ssh supervisor nodejs libpq-dev \
  && gem install bundler \
  && bundle install --without development test --path vendor/bundle

 RUN rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

AI: Not bad, but here too, there is something to work on. Look, here is this command:


  RUN rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

... does not remove data from the final image; it only adds another layer in which that data is absent. The correct way:


  RUN curl -sL https://deb.nodesource.com/setup_9.x | bash - \
  && apt-get -y install imagemagick gsfonts nodejs libpq-dev \
  && gem install bundler \
  && bundle install --without development test --path vendor/bundle \
  && rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

But that's not all. What do you have there, Ruby? Then you don't need to copy the whole project at the start. It's enough to copy Gemfile and Gemfile.lock.


With this approach, bundle install will not be executed for every source change, but only if Gemfile or Gemfile.lock has changed.


The same approach works for other languages with a dependency manager, such as npm, pip, or composer, and their dependency-list files.
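For instance, for a Node.js service the same trick with dependency files might look like this (a sketch; the image tag and npm layout are assumptions):

```dockerfile
FROM node:10-stretch

WORKDIR /app

# Copy only the dependency manifests first...
COPY package.json package-lock.json /app/

# ...so this expensive layer is rebuilt only when the manifests change.
RUN npm ci

# Source changes invalidate the cache only from this point onward.
COPY . /app
```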


Finally, remember how at the beginning I talked about the Docker ideology of "one container, one process"? It means that supervisor is not needed. Nor should you install systemd, for the same reasons. In essence, Docker itself is the supervisor, and trying to run several processes inside it is like running several applications under a single supervisor process.
At build time you make a single image, and then you run the required number of containers, so that each process gets its own container.


But more on that later.


P: I think I've got it. Look what happens:


  FROM ruby:2.5.5-stretch

 WORKDIR /app
 COPY Gemfile* /app/

 RUN curl -sL https://deb.nodesource.com/setup_9.x | bash - \
  && apt-get -y install imagemagick gsfonts nodejs libpq-dev \
  && gem install bundler \
  && bundle install --without development test --path vendor/bundle \
  && rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

 COPY . /app
 RUN rake assets:precompile

 CMD ["bundle", "exec", "passenger", "start"]

So we restart a daemon by restarting its container?


AI: Yes, that's right. By the way, you can use either CMD or ENTRYPOINT. Figuring out the difference between them is your homework; there is a good article about it on Habr.
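As a small spoiler for that homework (a sketch, not the only way to combine the two): ENTRYPOINT fixes the command, while CMD supplies default arguments that docker run can override.

```dockerfile
# ENTRYPOINT is the fixed part of the command.
ENTRYPOINT ["bundle", "exec"]
# CMD provides default arguments that `docker run` can replace.
CMD ["passenger", "start"]
```

With this, a plain `docker run image` executes `bundle exec passenger start`, while `docker run image rake db:migrate` executes `bundle exec rake db:migrate`.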


So, let's go on. You download a script to install node, but you have no guarantee that it is what you expect. You need to add verification. For example:


  RUN curl -sL https://deb.nodesource.com/setup_9.x > setup_9.x \
  && echo "958c9a95c4974c918dca773edf6d18b1d1a41434  setup_9.x" | sha1sum -c - \
  && bash setup_9.x \
  && rm -rf setup_9.x \
  && apt-get -y install imagemagick gsfonts nodejs libpq-dev \
  && gem install bundler \
  && bundle install --without development test --path vendor/bundle \
  && rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

With the checksum, you can verify that you downloaded the correct file.
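This verify-before-use pattern is easy to try locally in plain shell; here sample.txt and its checksum are made up for the demonstration:

```shell
# Create a sample "downloaded" file.
printf 'hello\n' > sample.txt

# Verify it against a known-good checksum; sha1sum -c exits non-zero
# on mismatch, so an && chain stops before the file is ever executed.
echo "f572d396fae9206628714fb2ce00f72e94f2258f  sample.txt" | sha1sum -c - \
  && echo "checksum ok"
```

Note the two spaces between the hash and the filename: that is the format `sha1sum -c` expects.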


P: But if the file changes, the build will fail.


AI: Yes, and oddly enough, that's a plus too. You will know that the file has changed, and you can review what exactly changed in it. You never know: maybe someone added a script that deletes everything it can reach, or plants a backdoor.


P: Thank you. So, the final Dockerfile will look like this:


  FROM ruby:2.5.5-stretch

 WORKDIR /app
 COPY Gemfile* /app/

 RUN curl -sL https://deb.nodesource.com/setup_9.x > setup_9.x \
  && echo "958c9a95c4974c918dca773edf6d18b1d1a41434  setup_9.x" | sha1sum -c - \
  && bash setup_9.x \
  && rm -rf setup_9.x \
  && apt-get -y install imagemagick gsfonts nodejs libpq-dev \
  && gem install bundler \
  && bundle install --without development test --path vendor/bundle \
  && rm -rf /usr/local/bundle/cache/*.gem \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

 COPY . /app
 RUN rake assets:precompile

 CMD ["bundle", "exec", "passenger", "start"]

P: Igor Ivanovich, thanks for the help. I have to run; I still have 10 commits to make today.


Igor Ivanovich stops his hurrying colleague with a glance and takes a sip of strong coffee. After thinking for a few seconds about the 99.9% SLA and bug-free code, he asks a question.


AI: Where do you keep logs?


P: In production.log, of course. Come to think of it, how do we get to them without ssh?


AI: If you keep them in files, there is already a solution for you. The docker exec command lets you run any command inside a container; for example, you can cat the logs. And with the -it flags and bash (if it is installed in the container), you get interactive access to the container.


But you shouldn't store logs in files at all. At a minimum, it leads to uncontrolled growth of the container, since nobody rotates the logs inside it. All logs should be written to stdout; from there you can view them with the docker logs command.
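If an application insists on writing to a log file, a common workaround (the official nginx image does exactly this) is to symlink the file to the container's stdout/stderr at build time:

```dockerfile
# Send file-based logs to stdout/stderr so `docker logs` can see them.
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
 && ln -sf /dev/stderr /var/log/nginx/error.log
```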


P: Igor Ivanovich, could we put the logs into a directory mounted from the physical node, like user-uploaded data?


AI: Good that you remembered to keep uploaded data on the node's disk. The same works for logs; just don't forget to set up rotation.
That's all, off you go.
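For the default json-file log driver, rotation can be configured on the host side, for example in /etc/docker/daemon.json (the sizes here are illustrative):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

The same options can be set per container: `docker run --log-opt max-size=10m --log-opt max-file=3 ...`.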


P: Igor Ivanovich, what would you recommend reading?


AI: First, read the recommendations from the Docker developers; hardly anyone knows Docker better than they do.


And if you want hands-on practice, sign up for an intensive course. After all, theory without practice is dead.
