Ability to add any file from source directory to docker image build context

RESOLVED FIXED

Status

task
RESOLVED FIXED
3 years ago
Last year

People

(Reporter: gps, Assigned: gps)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(8 attachments)

I'll upload a proof of concept for this in a few minutes...
A limitation of traditional `docker build` is that it only has access
to files in the same directory as the Dockerfile.

Typically, when you do `docker build`, Docker will create a tar archive
of all files in the same directory as the Dockerfile and upload that to
Docker and the image building process will have access to all files in
the archive.

Over a year ago, I realized you could write some code to create custom
context archives and talk to the Docker build API directly to use your
custom archive. I hacked some code into version-control-tools that
parsed Dockerfiles for special syntax denoting extra paths from the
source checkout to add to the context and proceed to add them to
context archives. This commit essentially copied that code for use
by taskgraph's built-in Docker image building.

Using the syntax "# %include <path>" you are able to include paths
or directories (relative from the top source directory root) in the
generated context archive. Files add this way are available under the
"topsrcdir/" path.

The "lint" image has been changed to use this syntax to add in
in-tree version of tooltool.py (instead of downloading from github.com).
This eliminates a dependency on a third party service and increases
security and determinism. Yay.

In order to write tests, I had to make archiving deterministic. That's
why we no longer use a single "tar.add()" for the Dockerfile directory.
Instead, we obtain the list of files up front, sort them, then add with
uid/gid set to 0, so uid/gid is consistent no matter what it is on the
filesystem performing context creation. More determinism, yay.

I would like to test this feature a bit more. However, the test
environment for custom Docker image building doesn't currently
facilitate custom source paths: it expects Docker files to be in
$topsrcdir/testing/docker. If we add more functionality to this, we
should definitely invest in writing better tests.

Review commit: https://reviewboard.mozilla.org/r/66244/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66244/
It looks like tar.gz files are not deterministic by 1 byte:

00000000  1f 8b 08 08 28 63 91 57  02 ff 63 6f 6e 74 65 78  |....(c.W..contex|
00000010  74 2e 74 61 72 00 ed 7d  6d 77 db 36 96 70 3f eb  |t.tar..}mw.6.p?.|

00000000  1f 8b 08 08 68 63 91 57  02 ff 63 6f 6e 74 65 78  |....hc.W..contex|
00000010  74 2e 74 61 72 00 ed 7d  6d 77 db 36 96 70 3f eb  |t.tar..}mw.6.p?.|

I'll have to research the tar.gz format to see why the 5th byte isn't identical.
According to http://www.forensicswiki.org/wiki/Gzip, offset 4-7 (0 indexed) are last modified time. Not sure what that is computed from. But it should be easy enough to set those bytes to a deterministic value.
From CPython's stdlib:

   def _init_write_gz(self):
        self.cmp = self.zlib.compressobj(9, self.zlib.DEFLATED,
                                            -self.zlib.MAX_WBITS,
                                            self.zlib.DEF_MEM_LEVEL,
                                            0)

        timestamp = struct.pack("<L", long(time.time()))
        self.__write("\037\213\010\010%s\002\377" % timestamp)

The time.time() is obviously not deterministic. We can fix it in post :)
Depends on: 1288610
I'm factoring this into a proper series with better tests. Expect something reviewable by EOD.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Upcoming commits will refactor how context tarballs are created. In
preparation for this, we establish a standalone function for creating
context tarballs and refactor docker_image.py to use it.

Review commit: https://reviewboard.mozilla.org/r/66536/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66536/
Attachment #8773546 - Attachment description: Bug 1288567 - Add special Dockerfile syntax to add arbitrary files to context → Bug 1288567 - Add special Dockerfile syntax to add arbitrary files to context;
Attachment #8773915 - Flags: review?(dustin)
Attachment #8773916 - Flags: review?(dustin)
Attachment #8773917 - Flags: review?(dustin)
Attachment #8773918 - Flags: review?(dustin)
Attachment #8773919 - Flags: review?(dustin)
Attachment #8773920 - Flags: review?(dustin)
Attachment #8773546 - Flags: review?(dustin)
We recently implemented code in mozpack for performing deterministic
tar file creation. It normalizes things like uids, gids, and mtimes
that creep into archives.

Review commit: https://reviewboard.mozilla.org/r/66538/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66538/
Now that tar file generation is deterministic, we can use the hash
of the created archive rather than the hash of the files that are
(presumably) in the archive.

This temporarily breaks consistent hashing by using independent
hashing mechanisms. This will be cleaned up in a subsequent commit.

Review commit: https://reviewboard.mozilla.org/r/66540/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66540/
Relying on global variables like GECKO is a bit dangerous. To facilitate
testing of archive generation in subsequent commits, let's pass an
path into this function.

The argument is currently unused.

Review commit: https://reviewboard.mozilla.org/r/66542/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66542/
Now that the context tar creation function is standalone and doesn't
rely on external state, we can start unit testing it easier.

We establish a basic unit test that verifies the function works as
advertised and that output is deterministic.

Review commit: https://reviewboard.mozilla.org/r/66544/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66544/
This restores order to only having a single hash for a context
directory.

Using a tempfile here is a bit unfortunate. It can be optimized later,
if needed.

Review commit: https://reviewboard.mozilla.org/r/66546/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66546/
Comment on attachment 8773546 [details]
Bug 1288567 - Add special Dockerfile syntax to add arbitrary files to context;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66244/diff/1-2/
Blocks: 1247168
Comment on attachment 8773919 [details]
Bug 1288567 - Add basic test for context tar creation;

https://reviewboard.mozilla.org/r/66544/#review63656
Attachment #8773919 - Flags: review?(dustin) → review+
Attachment #8773915 - Flags: review?(dustin) → review+
Comment on attachment 8773915 [details]
Bug 1288567 - Extract function for creating context tars;

https://reviewboard.mozilla.org/r/66536/#review63648

::: taskcluster/taskgraph/task/docker_image.py:141
(Diff revision 1)
>          'Creates a tar file of a particular context directory.'
>          destination = os.path.abspath(destination)
>          if not os.path.exists(os.path.dirname(destination)):
>              os.makedirs(os.path.dirname(destination))
>  
> -        with tarfile.open(destination, 'w:gz') as tar:
> +        create_context_tar(context_dir, destination, image_name)

Why leave this as a class method -- why not just call it directly?
Comment on attachment 8773916 [details]
Bug 1288567 - Use deterministic tar archive generation;

https://reviewboard.mozilla.org/r/66538/#review63658
Attachment #8773916 - Flags: review?(dustin) → review+
Comment on attachment 8773918 [details]
Bug 1288567 - Pass topsrcdir into create_context_tar;

https://reviewboard.mozilla.org/r/66542/#review63662
Attachment #8773918 - Flags: review?(dustin) → review+
Attachment #8773920 - Flags: review?(dustin) → review+
https://reviewboard.mozilla.org/r/66244/#review63668

::: taskcluster/docs/docker-images.rst:31
(Diff revision 2)
> +
> +The argument to ``# %include`` is a relative path from the root level of
> +the source directory. It can be a file or a directory.
> +
> +Files added using ``# %include`` syntax are available inside the build
> +context under the ``topsrcdir/`` path.

Please also mention that the contents will be written into the context tarball -- as written, I'd be tempted to put `# %include .` at the top of every Dockerfile, because why not, but that would end up with very large context tarballs.

::: taskcluster/taskgraph/util/docker.py:65
(Diff revision 2)
> +    If a line in the Dockerfile has the form ``# %include <path>``,
> +    the relative path specified on that line will be matched against
> +    files in the source repository and added to the context under the
> +    path ``topsrcdir/``. If an entry is a directory, we add all files
> +    under that directory.
> +

If we later change things around so that docker-image tasks no longer download a context tar from the decision task (which we would like to do so we can avoid attaching every context tar to every decision task..), this will mean we need a full gecko checkout for docker-image tasks.  I think that's probably OK, but thought it worth bringing up.
Comment on attachment 8773546 [details]
Bug 1288567 - Add special Dockerfile syntax to add arbitrary files to context;

https://reviewboard.mozilla.org/r/66244/#review63670
Attachment #8773546 - Flags: review?(dustin) → review+
https://reviewboard.mozilla.org/r/66536/#review63648

> Why leave this as a class method -- why not just call it directly?

I didn't want to change too much. I also wasn't sure if you wanted to retain the @classmethod for future API needs.

I feel this is territory for a follow-up bug. So I'm going to drop the issue (you did grant r+ after all) :)
Honestly, it's a simple enough fix to remove the classmethod.  As you know, I love me some class methods, but this one is totally unnecessary.  Follow-up bug is fine, or just another short cset on this bug (which I'll look at quickly if you ping me, or someone else can review).
Comment on attachment 8773915 [details]
Bug 1288567 - Extract function for creating context tars;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66536/diff/1-2/
Comment on attachment 8773916 [details]
Bug 1288567 - Use deterministic tar archive generation;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66538/diff/1-2/
Comment on attachment 8773917 [details]
Bug 1288567 - Use context hash of tar file;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66540/diff/1-2/
Comment on attachment 8773918 [details]
Bug 1288567 - Pass topsrcdir into create_context_tar;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66542/diff/1-2/
Comment on attachment 8773919 [details]
Bug 1288567 - Add basic test for context tar creation;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66544/diff/1-2/
Comment on attachment 8773920 [details]
Bug 1288567 - Use create_context_tar in generate_context_hash;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66546/diff/1-2/
Comment on attachment 8773546 [details]
Bug 1288567 - Add special Dockerfile syntax to add arbitrary files to context;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66244/diff/2-3/
The function was only used once and was providing little to no value.

A test of this function has been removed. Tests for the lower-level
context creation function are sufficient.

Review commit: https://reviewboard.mozilla.org/r/66896/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66896/
Attachment #8774448 - Flags: review?(dustin)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f36727c412bc
Extract function for creating context tars; r=dustin
https://hg.mozilla.org/integration/autoland/rev/ce408d28753e
Use deterministic tar archive generation; r=dustin
https://hg.mozilla.org/integration/autoland/rev/df2587962e55
Use context hash of tar file; r=dustin
https://hg.mozilla.org/integration/autoland/rev/f13b4d398a1b
Pass topsrcdir into create_context_tar; r=dustin
https://hg.mozilla.org/integration/autoland/rev/a8b8e6806110
Add basic test for context tar creation; r=dustin
https://hg.mozilla.org/integration/autoland/rev/d5555a753cc6
Use create_context_tar in generate_context_hash; r=dustin
https://hg.mozilla.org/integration/autoland/rev/a1ab5ac3c5f4
Add special Dockerfile syntax to add arbitrary files to context; r=dustin
https://hg.mozilla.org/integration/autoland/rev/3178d22820d4
Inline create_context_tar; r=dustin
Blocks: 1290531
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.