Closed Bug 1479533 Opened 4 years ago Closed 3 years ago

More fetch task improvements

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(5 files)

Follow-up from bug 1467359 to track things that didn't initially land.
Previously, we told `tar` or `unzip` to operate on an explicit file.
This worked when `tar` understood the compression format of the file.
And this worked in the majority of cases.

But `tar` does not support zstandard compression (at least not outside
extremely new versions, which aren't yet widely deployed). And not all
versions of `tar` support the `-a` argument.

This commit changes our invocation of `tar` so input data is piped
to it from Python. In the case of `tar`, we perform decompression in
Python, if possible. This allows us to support zstandard and `tar`
binaries that don't support `-a` to auto-detect the compression format.

I wanted to be consistent and always pipe the raw data via stdin.
But `unzip` doesn't appear to like this. Oh well.

We also refactor the logic around detecting archives. We have a
function to identify the archive type based on a filename. We then
pass the archive type to the extraction function and key off that
logic within. We also conditionally call extract_archive() and
fail hard in extract_archive() when things fail. This will make
future archive code easier to reason about.
Comment on attachment 8996099 [details]
Bug 1479533 - Refactor archive decompression; r?tomprince

Tom Prince [:tomprince] (limited availability Jul 16-29) has approved the revision.

https://phabricator.services.mozilla.com/D1576
Attachment #8996099 - Flags: review+
Currently, we download archives and store them as artifacts verbatim.

Sometimes it is useful to change the behavior of an archive. For example,
we may want to normalize on a specific compression format. Or we may want
to normalize the uid/gid stored in tar archives. Or we may want to
strip leading path components and/or replace them with a static value.
Or we may want to produce multiple archive variations - one optimized
for size and another for speed.

This commit introduces a mechanism for "rearchiving" archives that
allows us to do these things.

Given a source archive or directory and an output filename, the command
will deduce the output format automatically and produce an archive
matching the expected output.

Subsequent work will integrate this feature into CI.
This is what a lot of programs do.

We do logging in a helper function so we can flush after every write.
We currently feed static-url fetched artifacts directly into dependent
tasks, where they are typically extracted. This works and is the
simplest implementation.

A deficiency with the existing model is that archives used by tasks
are whatever is provided by upstream. We currently have a mix of
bzip, gzip, and xz compressed tarballs. Compared to say zstandard,
all of these archive formats are relatively slow to decompress. And
in all cases but xz, their compression ratios aren't as good.

This commit introduces the ability to "repackage" fetches. Essentially,
fetch tasks define an array of "repackage" operations to perform on
the fetch. Each entry gets converted to a new task which takes the
original fetch artifact and repackages it to a new archive format /
artifact.

By default, fetched archives automatically derive a repackage task
that produces a zstandard compressed tarball.

It was tempting to perform the repackging in the main fetch task
itself. This is certainly doable. However, if things were implemented
this way, we'd need to fetch the original artifact when repackaging
settings change. A goal of the fetch tasks is to minimize dependencies
on 3rd party services. It is advantageous for the 3rd party fetches
to run as infrequently as possible. This means we need to separate
3rd party fetches from repackaging tasks.

Because we derive job descriptions that are fed into the job
transform, we needed to update the "using: run-task" schema to
allow task references in the "command" attribute of definitions.
Life is too short to wait for bzip2 or lzma decompression when faster
alternatives are available.

Now that we automatically produce zstandard tarballs for all fetched
artifacts, it is trivial to change the dependencies so we use the
(faster) zstandard tarballs instead of the vanilla upstream tarballs.

The toolchain build scripts pull the fetch dependencies from an
environment variable and automatically recognize and support zstandard
tarballs. So we just need to point the dependencies at the zstandard
tarballs and everything should "just work."
Comment on attachment 8996104 [details]
Bug 1479533 - Use zstandard tarballs for toolchain fetches; r?glandium

Mike Hommey [:glandium] has approved the revision.

https://phabricator.services.mozilla.com/D1607
Attachment #8996104 - Flags: review+
Comment on attachment 8996101 [details]
Bug 1479533 - Log to stderr, capitalize messages; r?tomprince

Tom Prince [:tomprince] (limited availability Jul 16-29) has approved the revision.

https://phabricator.services.mozilla.com/D2526
Attachment #8996101 - Flags: review+
Comment on attachment 8996103 [details]
Bug 1479533 - Support repackaging fetched artifacts; r?tomprince, glandium

Tom Prince [:tomprince] (limited availability Jul 16-29) has approved the revision.

https://phabricator.services.mozilla.com/D1606
Attachment #8996103 - Flags: review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a056e94342d
Refactor archive decompression; r=tomprince
https://hg.mozilla.org/integration/mozilla-inbound/rev/560bed9dac99
Log to stderr, capitalize messages; r=tomprince
Keywords: leave-open
Comment on attachment 8996100 [details]
Bug 1479533 - Command to repackage archives; r?glandium

Tom Prince [:tomprince] has approved the revision.

https://phabricator.services.mozilla.com/D1605
Attachment #8996100 - Flags: review+
Comment on attachment 8996100 [details]
Bug 1479533 - Command to repackage archives; r?glandium

Mike Hommey [:glandium] has approved the revision.
Attachment #8996100 - Flags: review+
Assignee: gps → nobody
Status: ASSIGNED → NEW

The leave-open keyword is there and there is no activity for 6 months.
:tomprince, maybe it's time to close this bug?

Flags: needinfo?(mozilla)

Repacking fetches has landed (in slightly different form in Bug 1571589, and the rest of patches here have landed.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mozilla)
Resolution: --- → FIXED
Assignee: nobody → gps
You need to log in before you can comment on or make changes to this bug.