Closed Bug 1288610 Opened 3 years ago Closed 3 years ago

Create functions that produce deterministic tar archives

Categories

(Firefox Build System :: General, defect)

defect
Not set

Tracking

(firefox50 fixed)

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: gps, Assigned: gps)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We have some Python APIs that produce deterministic and reproducible zip/jar files. We should add something similar for tar files.

I need this for bug 1288567. But it also blocks deterministic build efforts.
I have a need to create tar archives deterministically
and reproducibly. Since we already have similar functionality in
mozpack for producting zip/jar archives, I figured it made sense for
this functionality to live in mozpack.

I made the functionality as simple as possible: we only accept
files from the filesystem and the set of files must be known in
advance. No class to hold/buffer state: just a simple function
that takes a mapping of files and writes to a stream.

Review commit: https://reviewboard.mozilla.org/r/66340/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/66340/
Attachment #8773607 - Flags: review?(mh+mozilla)
https://reviewboard.mozilla.org/r/66340/#review63104

::: python/mozbuild/mozpack/archive.py:36
(Diff revision 1)
> +            # Set uid, gid, username, and group as deterministic values.
> +            ti.uid = 0
> +            ti.gid = 0
> +            ti.uname = ''
> +            ti.gname = ''
> +
> +            # Set mtime to a constant value.
> +            ti.mtime = DEFAULT_MTIME

I wonder if we should also normalize the file mode. e.g. unset other and possibly group write bits. Ensure setuid and setgid bits are also unset.
Comment on attachment 8773607 [details]
Bug 1288610 - Add functions for creating deterministic tar archives;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66340/diff/1-2/
Summary: Creating functions that produce deterministic tar archives → Create functions that produce deterministic tar archives
Comment on attachment 8773607 [details]
Bug 1288610 - Add functions for creating deterministic tar archives;

https://reviewboard.mozilla.org/r/66340/#review63702

::: python/mozbuild/mozpack/archive.py:14
(Diff revision 2)
> +import stat
> +import tarfile
> +
> +
> +# 2016-01-01T00:00:00+0000
> +DEFAULT_MTIME = 1451606400

The zip-writing code we have uses a similar fake mtime, is it worthwhile to use the same value?

::: python/mozbuild/mozpack/archive.py:17
(Diff revision 2)
> +
> +# 2016-01-01T00:00:00+0000
> +DEFAULT_MTIME = 1451606400
> +
> +
> +def create_tar_from_files(fp, files):

I don't know what your intended use cases look like, but having the API only accept a file object seems awkward for users. I would imagine many callers would want to pass a filename, so they'd wind up having to wrap this in `with open(...)` themselves.

::: python/mozbuild/mozpack/archive.py:42
(Diff revision 2)
> +            # Disallow setuid and setgid bits. This is an arbitrary restriction.
> +            # However, since we set uid/git to root:root, setuid and setgid
> +            # would be a glaring security hole if the archive were
> +            # uncompressed as root.
> +            if ti.mode & (stat.S_ISUID | stat.S_ISGID):
> +                raise ValueError('cannot add file with setuid or setgit set: '

You typoed `gid` as `git` here a few times. :)

::: python/mozbuild/mozpack/archive.py:96
(Diff revision 2)
> +        data = self.compressor.flush()
> +        self.pos += len(data)
> +        self.fp.write(data)
> +
> +
> +def create_tar_bz2_from_files(fp, files, compresslevel=9):

The API feels a little wonky here, seems like compression type ought to be an argument to `create_tar_from_files` or something like that instead of having a separate method per type of compression. Not worth blocking landing this over, but maybe file a followup to clean that up?

::: python/mozbuild/mozpack/archive.py:104
(Diff revision 2)
> +    This is a glorified wrapper around ``create_tar_from_files`` that
> +    adds bzip2 compression.
> +
> +    This function is similar to ``create_tar_gzip_from_files()``.
> +    """
> +    proxy = _BZ2Proxy(fp, compresslevel=compresslevel)

Is there a reason you can't use `BZ2File` here?
https://reviewboard.mozilla.org/r/66340/#review63702

> I don't know what your intended use cases look like, but having the API only accept a file object seems awkward for users. I would imagine many callers would want to pass a filename, so they'd wind up having to wrap this in `with open(...)` themselves.

I wanted to make the API as generic as possible so callers could e.g. write into BytesIO instances.

I agree there is room to provide an API to produce a file so callers don't have to do the file management themselves. Follow-up as needed?

> Is there a reason you can't use `BZ2File` here?

BZ2File operates on file names and won't accept a file descriptor. This is strangely and sadly not how GZipFile works (which does accept a fileobj argument).
https://reviewboard.mozilla.org/r/66340/#review63702

> The zip-writing code we have uses a similar fake mtime, is it worthwhile to use the same value?

I'm not sure if this buys us anything. The zip writing code uses an mtime from the distant past. At least this value is somewhat modern.
Comment on attachment 8773607 [details]
Bug 1288610 - Add functions for creating deterministic tar archives;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66340/diff/2-3/
Attachment #8773607 - Flags: review+
Attachment #8773607 - Flags: review?(mh+mozilla)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6f6d78bc38f4
Add functions for creating deterministic tar archives; r=ted
https://hg.mozilla.org/mozilla-central/rev/6f6d78bc38f4
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.