Closed Bug 1597334 (opened 6 years ago, closed 6 years ago)

reduce CI times

Categories: Socorro :: General, enhancement, P3
Type: enhancement
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: willkg, Assigned: willkg
Attachments: 2 files

Socorro takes a long time in CI. Sometimes it's as short as 13 minutes, but I've seen it take as long as 18 minutes. Every time we create a PR, add a commit to an existing PR, rerun a build because of ephemeral errors, land a PR, or push a tag, we're waiting a while.

This bug covers looking for easy ways to reduce CI times.

I was thinking we could pull the mozilla/socorro_app:latest image from Docker Hub before running make build. Tests on my local machine suggest that I don't understand Docker layer caching and images, and that this doesn't help at all.

We could shave off 3 or so minutes by moving the minidump-stackwalk build into a separate project with its own build cycle; building Socorro would then involve downloading that image rather than building it. We used to do something like this when we used pre-built minidump-stackwalk binaries built by Taskcluster.

Brian suggested switching on Docker layer caching in CircleCI:

https://circleci.com/docs/2.0/docker-layer-caching/

That's a one-line change. Seems like it'd be fine and could help. John, Brian, and I think that's worth trying out. Let's see how far that gets us.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P3
Type: task → enhancement

It may be useful to clean up the apt files at the end of set_up_ubuntu.sh. This is a common pattern in official Docker images:

https://github.com/docker-library/python/blob/0b1fb9529c79ea85b8c80ff3dd85a32a935b0346/3.6/stretch/slim/Dockerfile#L20

This might be done to reduce image size, but it may also be done to remove time-stamped files, which would create a new layer and prevent Docker from using cached layers further down the stack.

John: Want to test those two statements out?

I've read up a bit, and the Docker docs for RUN and apt-get helped.

Docker caching for RUN commands is based solely on the command, not the output of the command. The reason to remove the cache files is to reduce image size, not to increase cache hits. The caching for COPY commands is based on a checksum of the copied contents, so changes to set_up_ubuntu.sh will cause a cache miss, causing the script to run with the fresh contents.
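To make these two caching rules concrete, here is a toy model in Python. It is a simplified sketch under stated assumptions, not Docker's implementation: `Step`, `cache_key`, and `build` are hypothetical names, and the real builder takes more inputs into account. It encodes only what is described above: a RUN step is keyed on its parent layer plus its command string, while a COPY step is also keyed on a checksum of the copied contents, so editing set_up_ubuntu.sh misses the cache for that COPY and for every step after it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One Dockerfile instruction in this simplified (hypothetical) model."""
    instruction: str      # e.g. "RUN" or "COPY"
    argument: str         # the command line, or the COPY source/destination
    content: bytes = b""  # copied file contents; only used for COPY


def cache_key(parent_key: str, step: Step) -> str:
    """Cache key for `step` built on top of the layer `parent_key`.

    RUN: parent key + command string only; whatever the command writes
    (apt lists, timestamps) never affects the key.
    COPY: additionally includes a checksum of the copied contents.
    """
    h = hashlib.sha256("\0".join([parent_key, step.instruction, step.argument]).encode())
    if step.instruction == "COPY":
        h.update(hashlib.sha256(step.content).digest())
    return h.hexdigest()


def build(steps: list[Step], cache: set[str]) -> list[bool]:
    """Walk the steps and report a cache hit (True) or miss (False) for each.

    Each key includes the parent's key, so one miss changes the key of
    every later step, and those steps miss too.
    """
    hits = []
    key = "base-image"
    for step in steps:
        key = cache_key(key, step)
        hits.append(key in cache)
        cache.add(key)
    return hits
```

In this model, adding an apt cleanup to the end of set_up_ubuntu.sh changes the COPY checksum once, but the cleanup itself never influences any later key, which matches the conclusion that the cleanup affects image size, not cache hits.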

There is a later command in one of the builds to RUN apt-get -y install gdb vim. If the cache files are removed, this will fail. This would need to be expanded to a full command, like RUN apt-get update && apt-get -y install ....

I'll look into how much we'd save in image size.

I think gdb and vim are only added to the intermediary minidump-stackwalk build container so they won't contribute to the final socorro_app image size.

Locally, the local/socorro_app:latest image is 1.2GB. Removing the apt cache lowers it to 1.19GB.

I looked a little into image sizes. The command docker history local/socorro_app:latest shows the layer sizes, in reverse order of execution. Adding --no-trunc expands the commands, at the cost of filling the terminal. Here are the sizes added by some steps:

Step                         Size
python:3.6.8-slim-stretch    138MB
set_up_ubuntu.sh             507MB
COPY stackwalk               49.5MB
npm install                  161MB
pip install                  264MB
COPY app                     15.1MB
manage.py collectstatic      15.7MB
RUN chown app.app            30.8MB
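The same numbers can be pulled out of docker history with a short script, which makes it easy to spot the biggest layers. This is a sketch under assumptions: it relies on docker history's --no-trunc flag and its --format Go template with the {{.Size}} and {{.CreatedBy}} placeholders, and it assumes sizes are printed with decimal units (kB, MB, GB). The helpers `parse_size`, `parse_history`, and `image_history` are hypothetical names.

```python
import re
import subprocess

# Assumption: docker history prints human-readable sizes in decimal units,
# e.g. "0B", "49.5MB", "1.2GB".
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text: str) -> int:
    """Convert a size string such as '49.5MB' or '15.2 MB' to bytes."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([kKMG]?B)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = m.groups()
    return round(float(number) * _UNITS[unit])


def parse_history(output: str) -> list[tuple[int, str]]:
    """Parse '<size>\\t<created by>' lines into (bytes, command) pairs,
    sorted largest layer first."""
    layers = []
    for line in output.splitlines():
        if not line.strip():
            continue
        size, _, created_by = line.partition("\t")
        layers.append((parse_size(size), created_by))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)


def image_history(image: str) -> list[tuple[int, str]]:
    """Run docker history on a local image (needs a running Docker daemon)."""
    output = subprocess.run(
        ["docker", "history", "--no-trunc",
         "--format", "{{.Size}}\t{{.CreatedBy}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_history(output)
```

For example, image_history("local/socorro_app:latest")[:5] would return the five largest layers of the locally built image, with their full commands.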

There are some small things we can do (COPY --chown=app:app would help with that last layer), but we'd need role-specific images to make a dent in the image sizes.

I tried a build with COPY --chown=app:app, and it does not appear to reduce the final image size or the intermediate images. I'll submit just the apt cache cleanup. I'll reference this bug, but it may increase build time while reducing the image size by only about 1%.

This PR has work from me and willkg:

  1. apt cache clearing
  2. Only install breakpad building tools in the socorro_breakpad stage
  3. Fix quoting in set_up_ubuntu.sh
  4. Combine some commands to reduce layers
  5. Set USER app sooner, and run collectstatic as the app user, to avoid RUN chown and another layer
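Item 5 gets rid of the RUN chown layer. Why a chown is so expensive can be sketched with a simplified model of image layers: a layer stores every file that was added or changed, and a metadata-only change such as a new owner still stores the whole file again. This is an assumption-laden sketch (`FileState`, `layer_size`, and `chown_all` are hypothetical names; deletions and whiteouts are ignored), but the measured sizes are consistent with it: the 30.8MB RUN chown app.app layer equals the 15.1MB COPY app layer plus the 15.7MB collectstatic layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """A file as seen by the model: content is represented only by its size,
    and owner stands in for all metadata."""
    size: int   # bytes
    owner: str  # e.g. "root" or "app"


Filesystem = dict[str, FileState]


def layer_size(before: Filesystem, after: Filesystem) -> int:
    """Bytes stored in the layer that turns `before` into `after`.

    Any file that is new, or whose content or metadata changed, is stored
    again in full; deleted files are ignored in this model.
    """
    return sum(
        state.size
        for path, state in after.items()
        if before.get(path) != state
    )


def chown_all(fs: Filesystem, owner: str) -> Filesystem:
    """Model of `chown -R`: same contents, new owner for every file."""
    return {path: FileState(state.size, owner) for path, state in fs.items()}
```

Creating the files as the app user from the start (as item 5 does for collectstatic) means the ownership is already correct in the layer that adds them, so no second full copy is stored.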

jwhitlock merged PR #5038: "bug 1597334: Adjust Dockerfile for size" in 08c377c.

I think this is enough for this round of Dockerfile improvement. The size improvement is small, 1.2 GB to 1.14 GB on my system. I think more radical changes would be needed to reduce Docker image size, but at the cost of build complexity, splitting images by role, and layering development images on top of production-specific images. This is already scope creep on top of the bug goal of reducing CI times.

For completeness, here are the new layer sizes:

Step                         Old Size  New Size  Change
python:3.6.8-slim-stretch    138MB     138MB     0
set_up_ubuntu.sh             507MB     497MB     -10MB
COPY stackwalk               49.5MB    49.5MB    0
npm install                  161MB     161MB     0
pip install                  264MB     264MB     0
COPY app                     15.1MB    15.2MB    +0.1MB *
manage.py collectstatic      15.7MB    15.7MB    0
RUN chown app.app            30.8MB    Removed   -30.8MB

The increase in COPY app is probably due to an extra file in my working directory.
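The Change column can be recomputed from the Old Size and New Size columns. This sketch copies the sizes from the table into two dictionaries (`OLD_MB`, `NEW_MB`, `layer_changes`, and `total` are hypothetical names) and treats the removed chown layer as 0MB. Summing the listed layers gives about 1181.1MB before and 1140.4MB after, in line with the 1.2 GB to 1.14 GB reported above.

```python
# Layer sizes in MB, copied from the table of old and new layer sizes.
OLD_MB = {
    "python:3.6.8-slim-stretch": 138,
    "set_up_ubuntu.sh": 507,
    "COPY stackwalk": 49.5,
    "npm install": 161,
    "pip install": 264,
    "COPY app": 15.1,
    "manage.py collectstatic": 15.7,
    "RUN chown app.app": 30.8,
}
NEW_MB = {
    "python:3.6.8-slim-stretch": 138,
    "set_up_ubuntu.sh": 497,
    "COPY stackwalk": 49.5,
    "npm install": 161,
    "pip install": 264,
    "COPY app": 15.2,
    "manage.py collectstatic": 15.7,
    # "RUN chown app.app" was removed by PR #5038.
}


def layer_changes(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """New minus old size per step, in MB; a removed layer counts as 0MB."""
    return {step: round(new.get(step, 0.0) - size, 1) for step, size in old.items()}


def total(sizes: dict[str, float]) -> float:
    """Sum of the listed layer sizes, in MB, rounded to one decimal place."""
    return round(sum(sizes.values()), 1)
```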

John and I agree we've done a good pass here and getting further probably requires more work than the improvements would be worth.

I'll keep an eye on CI times, but time to close this out.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED