Closed Bug 1432390 Opened 2 years ago Closed Last year

Allow to derive docker images from in-tree images

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla60

People

(Reporter: glandium, Assigned: dustin)

References

(Blocks 1 open bug)

Details

Attachments

(9 files, 1 obsolete file)

59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
Details
Currently, the best we can do is upload an image to the docker hub and then use FROM in the Dockerfile (like e.g. valgrind-build or desktop-build do, deriving from centos6-build-upd).
Comment on attachment 8945389 [details]
Bug 1432390 - Directly call the docker API over its unix socket instead of calling docker load.

https://reviewboard.mozilla.org/r/215578/#review221212


Static analysis found 1 defect in this patch.
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint check path/to/file` (Python/Javascript/wpt)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: taskcluster/taskgraph/util/docker.py:74
(Diff revision 1)
> +        # Mimick how docker itself presents the output. This code was tested
> +        # with API version 1.18 and 1.26.
> +        if 'status' in data:
> +            if 'id' in data:
> +                if sys.stderr.isatty():
> +                    l = len(status_line)

Error: ambiguous variable name 'l' [flake8: E741]
Example of uses in bug 1433033.
Comment on attachment 8945384 [details]
Bug 1432390 - Vendor requests-unixsocket.

https://reviewboard.mozilla.org/r/215568/#review221338

I'm not sure I'm qualified to review a vendoring, but this looks pretty straightforward
Attachment #8945384 - Flags: review?(dustin) → review+
Comment on attachment 8945392 [details]
Bug 1432390 - Allow to derive docker images from other in-tree images.

https://reviewboard.mozilla.org/r/215584/#review221374

::: taskcluster/taskgraph/transforms/docker_image.py:66
(Diff revision 2)
>              docker_image_schema, task,
>              "In docker image {!r}:".format(task.get('name', 'unknown')))
>          yield task
>  
>  
> +def order_image_tasks(tasks):

This could also be done, perhaps more simply, in a loader, but this works!
Attachment #8945392 - Flags: review?(dustin) → review+
Comment on attachment 8945391 [details]
Bug 1432390 - Make `mach taskcluster-build-image` talk directly to the docker socket in the image builder.

https://reviewboard.mozilla.org/r/215582/#review221370

::: taskcluster/docker/image_builder/build-image.sh:26
(Diff revision 2)
> -# Construct a CONTEXT_FILE
> -CONTEXT_FILE=/builds/worker/workspace/context.tar
> +# The docker socket is mounted by the taskcluster worker in a way that prevents
> +# us changing its permissions to allow the worker user to access it. Create a
> +# proxy socket that the worker user can use.
> +export DOCKER_SOCKET=/var/run/docker.proxy
> +socat UNIX-LISTEN:$DOCKER_SOCKET,fork,group=worker,mode=0775 UNIX-CLIENT:/var/run/docker.sock </dev/null &
> +trap "kill $!" EXIT

Good fix!  When we switch to docker-engine, we should work out a way to remove this, but until that time it will hopefully work fine :)
Attachment #8945391 - Flags: review?(dustin) → review+
Comment on attachment 8945389 [details]
Bug 1432390 - Directly call the docker API over its unix socket instead of calling docker load.

https://reviewboard.mozilla.org/r/215578/#review221366

::: taskcluster/taskgraph/docker.py:212
(Diff revision 2)
> +                remaining -= len(buf)
> +                yield buf
> +            # Pad to fill a 512 bytes block.
> +            remainder = member.size % 512
> +            if remainder:
> +                yield '\0' * (512 - remainder)

Wow, this is pretty cool!  Worth mentioning that this 512-block is part of the tar format.  In fact, for someone not familiar with the vagaries of tar this code could be pretty baffling.  Please add a few comments describing how download_and_modify_image edits the tarfile "on the fly" by reading each member and then writing it out in tar format.
Attachment #8945389 - Flags: review?(dustin) → review+
Comment on attachment 8945390 [details]
Bug 1432390 - Directly call the docker API over its unix socket instead of calling `docker build`.

https://reviewboard.mozilla.org/r/215580/#review221376
Attachment #8945390 - Flags: review?(dustin) → review+
Comment on attachment 8945388 [details]
Bug 1432390 - Use zstandard and requests modules instead of spawning curl | zstd in docker.load_image.

https://reviewboard.mozilla.org/r/215576/#review221354

::: taskcluster/taskgraph/docker.py:164
(Diff revision 1)
>      image, tag, layer = None, None, None
>      try:
> -        # Setup piping: curl | zstd | tarin
> -        curl = Popen(['curl', '-#', '--fail', '-L', '--retry', '8', url], stdout=PIPE)
> -        zstd = Popen(['zstd', '-d'], stdin=curl.stdout, stdout=PIPE)
> -        tarin = tarfile.open(mode='r|', fileobj=zstd.stdout)
> +        print("Downloading from {}".format(url))
> +        # get_session() gets us a requests.Session set to retry several times.
> +        req = get_session().get(url, stream=True)
> +        if req.status_code != 200:

maybe `req.raise_for_status()`?
Attachment #8945388 - Flags: review?(dustin) → review+
Comment on attachment 8945386 [details]
Bug 1432390 - Avoid creating a temporary file for generate_context_hash.

https://reviewboard.mozilla.org/r/215572/#review221350

::: taskcluster/taskgraph/util/docker.py:110
(Diff revision 1)
>      """
> +    with open(out_path, 'wb') as fh:
> +        return stream_context_tar(topsrcdir, context_dir, fh, prefix, args)
> +
> +
> +def stream_context_tar(topsrcdir, context_dir, out_file, prefix, args=None):

Maybe `write_context_tar` is a better name?
Comment on attachment 8945387 [details]
Bug 1432390 - Remove explicit exception handling in docker.load_image().

https://reviewboard.mozilla.org/r/215574/#review221378
Attachment #8945387 - Flags: review?(dustin) → review+
Comment on attachment 8945386 [details]
Bug 1432390 - Avoid creating a temporary file for generate_context_hash.

https://reviewboard.mozilla.org/r/215572/#review221380
Attachment #8945386 - Flags: review?(dustin) → review+
Comment on attachment 8945385 [details]
Bug 1432390 - Hash the contents of the docker image context as it is created.

https://reviewboard.mozilla.org/r/215570/#review221382
Attachment #8945385 - Flags: review?(dustin) → review+
Comment on attachment 8945389 [details]
Bug 1432390 - Directly call the docker API over its unix socket instead of calling docker load.

https://reviewboard.mozilla.org/r/215578/#review221394

Great series!

Obvious scope bloat for this series, but I'd like to point out that we currently build Docker images with multiple layers. Each layer effectively tracks the differences from the previous layer. That means each layer constitutes overhead for export, import, and even run-time operations. For example, I/O requests for a file need to resolve which layer(s) the state of that file is in. I think it would be a worthwhile improvement (as a follow-up) to squash everything down to a single layer as part of exporting/saving the images. Newer versions of docker have a "squash" parameter on the "image build" HTTP API. But I don't think we run a new-enough Docker in DIND to support that. There are userland processes that do it. https://github.com/goldmann/docker-squash (available via `pip install docker-squash`) is one such program. Something to look into now that we have more powerful control over image building.
> But I don't think we run a new-enough Docker in DIND to support that.

Squash is marked "experimental build only" even in the most recent version of the docker API. It's not in the version used on dind anyways.

Another thing I was thinking about is that it would be nice if taskcluster itself was able to download multiple images, so that it would, for instance, download the debian7-base and the toolchain-build images. That would avoid duplicating the data from the base image in the child image.
(In reply to Mike Hommey [:glandium] from comment #27)
> Another thing I was thinking about is that it would be nice if taskcluster
> itself was able to download multiple images, so that it would, for instance,
> download the debian7-base and the toolchain-build images. That would avoid
> duplicating the data from the base image in the child image.

My understanding about how the Docker "Registry" image repositories work is they index the layers separately. When someone pulls an image, the client fetches each layer as a separate stream and applies those concurrently. We could probably get some of the benefits of that without running a full server. For example, we could store the layers to separate artifacts. But we'd have to reinvent image saving and loading. And I'm not sure if Docker allows you to import individual layers. So we may not be able to import images concurrently. This optimization feels a bit beyond around reach at this time.
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s a616a5750451737dc323bd24f0d5e27d39312585 -d 804f26b2c6b8: rebasing 444158:a616a5750451 "Bug 1432390 - Vendor requests-unixsocket. r=dustin"
rebasing 444159:03da06c2c41f "Bug 1432390 - Hash the contents of the docker image context as it is created. r=dustin"
rebasing 444160:904ff21f52c5 "Bug 1432390 - Avoid creating a temporary file for generate_context_hash. r=dustin"
rebasing 444161:9c0dd711592b "Bug 1432390 - Remove explicit exception handling in docker.load_image(). r=dustin"
rebasing 444162:829b108168e4 "Bug 1432390 - Use zstandard and requests modules instead of spawning curl | zstd in docker.load_image. r=dustin"
rebasing 444163:99d54a80a474 "Bug 1432390 - Directly call the docker API over its unix socket instead of calling docker load. r=dustin"
rebasing 444164:c557f45eaf9e "Bug 1432390 - Directly call the docker API over its unix socket instead of calling `docker build`. r=dustin"
rebasing 444165:65b91704340e "Bug 1432390 - Make `mach taskcluster-build-image` talk directly to the docker socket in the image builder. r=dustin"
merging taskcluster/docker/image_builder/build-image.sh
merging taskcluster/docker/image_builder/setup.sh
warning: conflicts while merging taskcluster/docker/image_builder/build-image.sh! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 9 changesets with 33 changes to 24 files
remote: 
remote: 
remote: ************************** ERROR ****************************
remote: Rev fa3b28f942bb uses try syntax. (Did you mean to push to Try instead?)
remote: Mike Hommey <mh+mozilla@glandium.org>
remote: Bug 1432390 - Remove explicit exception handling in docker.load_image(). r=dustin
remote: 
remote: The used pattern:
remote:   try:
remote:     ...
remote:   except Exception:
remote:     error = sys.exc_info()[0]
remote:   finally:
remote:     ...
remote:     if error:
remote:       raise error
remote: 
remote: actually loses everything that is interesting about the original
remote: exception. Not catching the exception just makes it thrown up the stack,
remote: except when a different exception is thrown from the finally block,
remote: which is what that if error: raise error is attempting to do... except
remote: it doesn't throw the original exception, but its type only.
remote: *************************************************************
remote: 
remote: 
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.c_commitmessage hook failed
abort: push failed on remote
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/557157d299f2
Vendor requests-unixsocket. r=dustin
https://hg.mozilla.org/integration/autoland/rev/28bbbfd93724
Hash the contents of the docker image context as it is created. r=dustin
https://hg.mozilla.org/integration/autoland/rev/67b3289061df
Avoid creating a temporary file for generate_context_hash. r=dustin
https://hg.mozilla.org/integration/autoland/rev/c80238a31d8d
Remove explicit exception handling in docker.load_image(). r=dustin
https://hg.mozilla.org/integration/autoland/rev/d7d6780f2b42
Use zstandard and requests modules instead of spawning curl | zstd in docker.load_image. r=dustin
https://hg.mozilla.org/integration/autoland/rev/5ed2c5587d03
Directly call the docker API over its unix socket instead of calling docker load. r=dustin
https://hg.mozilla.org/integration/autoland/rev/3543eaf9b9c4
Directly call the docker API over its unix socket instead of calling `docker build`. r=dustin
https://hg.mozilla.org/integration/autoland/rev/1316aa94e6bc
Make `mach taskcluster-build-image` talk directly to the docker socket in the image builder. r=dustin
https://hg.mozilla.org/integration/autoland/rev/4b0499734887
Allow to derive docker images from other in-tree images. r=dustin
Product: TaskCluster → Firefox Build System
It looks like the changes to image_builder here were never deployed.  I think that would need the image pushed to docker hub and the sha updated in taskcluster/taskgraph/transforms/docker_image.py.  It looks like that was last done in bug 1430037, before this one landed.

I'll give that a shot and do a try push.  Mike, do you recall if there was a reason this wasn't done?
Status: RESOLVED → REOPENED
Flags: needinfo?(dustin)
Resolution: FIXED → ---
The only image that uses the docker-hub hosted image_builder image is the image_builder itself. The rest of the images get built using the in-tree image. So it wouldn't hurt to update the docker image to the in-tree image, but the code here is being successfully used in-tree.
Ah, indeed.  How meta!  The old version did have some (vague..) issues trying to rebuild image_builder:
https://tools.taskcluster.net/groups/OpzQnFdOT72GbFSOkhZRiQ/tasks/ALFKJKt7Tou3lkgGO6vQdQ/runs/0/logs/public%2Flogs%2Flive.log

So hopefully the update fixes that..
This also switches the image_builder build to get the image hash from HASH like
other registry images, rather than hard-coding it in the source.  And updates
the docs.
Depends on: 1486071, 1498626, 1498629
Flags: needinfo?(dustin)
For comparison,
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf64a3ba8304e66c2c2cbf3673a4e34d42f986e5
is a try push with just a whitespace change to the image_builder dockerfile, to check that the issues in the linked bugs are due to bitrot of the docker images and not to the image_builder issues.
(taking this bug since deployment is turning into A Project)
Assignee: mh+mozilla → dustin
Would it be better to move the additional work to a new bug, since we have had working in-tree docker dependencies working for half a year?
No longer depends on: 1486071, 1498626, 1498629
Depends on: 1498640
Status: REOPENED → RESOLVED
Closed: 2 years agoLast year
Resolution: --- → FIXED
Attachment #9016529 - Attachment is obsolete: true
No longer depends on: 1498640
You need to log in before you can comment on or make changes to this bug.