Closed Bug 1467359 Opened 2 years ago Closed 2 years ago

Fetch task improvements

Categories

(Firefox Build System :: Task Configuration, task)


Tracking

(firefox-esr60 fixed, firefox63 fixed)

RESOLVED FIXED
mozilla63

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(5 files, 5 obsolete files)

46 bytes, text/x-phabricator-request (tomprince: review+)
46 bytes, text/x-phabricator-request (tomprince: review+)
46 bytes, text/x-phabricator-request
46 bytes, text/x-phabricator-request
46 bytes, text/x-phabricator-request
Filing a bug to track the patches I'm about to submit for review.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Previously, the fetch kind was defined by a job "using" flavor.

An upcoming commit will need to derive new tasks from fetch job
definitions. In order to do this, we require a transform. And
this transform would need access to the original data from the
job description. But by the time the "using" transform runs, a
lot of this data is thrown away. It would be possible to stuff
the meaningful metadata inside attributes on the created task.
But this would result in the fetch logic being fragmented across
multiple Python modules (a fetch-specific transform and a job/using-
specific transform). I think it is better to keep all that logic
in a single Python module.

This commit converts the "using: fetch-url" job transform to a
dedicated transform for the fetch kind.

Since we're now using a dedicated transform, we no longer need
to use the normal job schema for defining fetch jobs. So we
refactor the schema a little to make it simpler.

I verified the taskgraph output is nearly identical to before by
diffing the JSON output of `mach taskgraph full`. Aside from the
additions of ['worker']['implementation'] and ['worker']['os']
fields (which seem to be required by a schema somewhere), everything
was identical.
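The shape of such a kind-level transform can be sketched as follows. This is a minimal illustration of the pattern, not the real fetch transform: the field names and the attribute key are assumptions, and in the actual taskgraph code transforms are registered on a TransformSequence rather than called directly.

```python
def fetch_transform(config, jobs):
    """Sketch of a dedicated kind transform: it consumes fetch job
    definitions and yields task definitions.

    Field names here are illustrative, not the real fetch schema.
    """
    for job in jobs:
        yield {
            "label": "fetch-" + job["name"],
            "description": job.get("description", ""),
            # Because this runs as the kind's own transform, the original
            # fetch metadata is still in scope here, instead of having been
            # discarded by an earlier job/"using" transform.
            "attributes": {"fetch-artifact": job["artifact-name"]},
        }
```

Keeping the whole pipeline in one transform is what lets later commits derive new tasks from the same job definitions.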
Having to define Treeherder symbol names explicitly feels too redundant for my liking.

I believe we didn't do this before because we were defining things
in terms of "using: fetch-url." Now that we are using a custom
transform, we can have nice things.
Previously, we told `tar` or `unzip` to operate on an explicit file.
This worked when `tar` understood the compression format of the file.
And this worked in the majority of cases.

But `tar` does not support zstandard compression (at least not outside
extremely new versions, which aren't yet widely deployed). And not all
versions of `tar` support the `-a` argument.

This commit changes our invocations of `tar` and `unzip` so input
data is piped to them from Python. In the case of `tar`, we perform
decompression in Python, if possible. This allows us to support
zstandard, as well as `tar` binaries that lack the `-a` argument for
auto-detecting the compression format.
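A minimal sketch of this piping approach, assuming a Unix `tar` on PATH. The real fetch-content also handles zstandard via the third-party `zstandard` package; this stdlib-only sketch covers gzip, bzip2, and xz and passes anything else through raw:

```python
import bz2
import gzip
import lzma
import shutil
import subprocess


def decompressed_stream(fh, magic):
    """Wrap a raw file object in a stdlib decompressor chosen by magic bytes.

    (Sketch only: the real code also supports zstandard via the
    ``zstandard`` package.) Unrecognized formats return the raw stream.
    """
    if magic.startswith(b"\x1f\x8b"):
        return gzip.GzipFile(fileobj=fh)
    if magic.startswith(b"BZh"):
        return bz2.BZ2File(fh)
    if magic.startswith(b"\xfd7zXZ"):
        return lzma.LZMAFile(fh)
    return fh


def pipe_to_tar(archive_path, dest_dir):
    """Decompress in Python and pipe the plain tar stream to `tar` on
    stdin, so `tar` never needs to understand the compression format."""
    with open(archive_path, "rb") as fh:
        magic = fh.read(6)
        fh.seek(0)
        stream = decompressed_stream(fh, magic)
        proc = subprocess.Popen(
            ["tar", "-xf", "-", "-C", dest_dir], stdin=subprocess.PIPE
        )
        shutil.copyfileobj(stream, proc.stdin)
        proc.stdin.close()
        return proc.wait()
```

Because `tar` only ever sees an uncompressed stream on stdin, old `tar` binaries and exotic compression formats both stop being a problem.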
Comment on attachment 8984029 [details]
Bug 1467359 - Derive Treeherder symbol names for fetch jobs; r?tomprince

Tom Prince [:tomprince] has approved the revision.

https://phabricator.services.mozilla.com/D1575
Attachment #8984029 - Flags: review+
We add an is_archive() function to help identify a file as an archive.
We make extract_archive() error if it is passed a file that isn't an
archive. We now conditionally call extract_archive() depending on the
result of is_archive().

This will make future code easier to implement.
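The gating function can be as simple as the following sketch. The function name mirrors the commit message, but the suffix list and extension-based check are assumptions; the real implementation may also inspect file contents.

```python
def is_archive(path):
    """Guess whether ``path`` is an archive we know how to extract.

    Minimal sketch keyed on file extension (illustrative, not the real
    fetch-content logic).
    """
    suffixes = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz",
                ".tar.zst", ".zip")
    return path.lower().endswith(suffixes)
```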
Currently, we download archives and store them as artifacts verbatim.

Sometimes it is useful to transform an archive. For example,
we may want to normalize on a specific compression format. Or we may want
to normalize the uid/gid stored in tar archives. Or we may want to
strip leading path components and/or replace them with a static value.
Or we may want to produce multiple archive variations - one optimized
for size and another for speed.

This commit introduces a mechanism for "rearchiving" archives that
allows us to do these things.

Given a source archive or directory and an output filename, the command
will deduce the output format automatically and produce an archive
matching the expected output.

Subsequent work will integrate this feature into CI.
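The format deduction step can be sketched as a hypothetical helper like the one below; the name and the set of recognized suffixes are assumptions for illustration, not the real command's table:

```python
def deduce_output_format(output_filename):
    """Deduce (container, compression) from the requested output filename.

    Hypothetical helper sketching the "deduce the output format
    automatically" step; suffix table is illustrative.
    """
    formats = {
        ".tar.zst": ("tar", "zstd"),
        ".tar.gz": ("tar", "gzip"),
        ".tar.xz": ("tar", "xz"),
        ".zip": ("zip", None),
    }
    for suffix, fmt in formats.items():
        if output_filename.endswith(suffix):
            return fmt
    raise ValueError("cannot deduce archive format from %s" % output_filename)
```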
We currently feed static-url fetched artifacts directly into dependent
tasks, where they are typically extracted. This works and is the
simplest implementation.

A deficiency with the existing model is that archives used by tasks
are whatever is provided by upstream. We currently have a mix of
bzip, gzip, and xz compressed tarballs. Compared to say zstandard,
all of these archive formats are relatively slow to decompress. And
in all cases but xz, their compression ratios aren't as good.

This commit introduces the ability to "repackage" fetches obtained
via static-url fetch types. Essentially, the job defines an array of
"repackage" operations to perform to the fetch. Each entry gets
converted to a new task which takes the original fetch artifact
and repackages it to a new archive format / artifact.

By default, fetched tarballs automatically derive a repackage task
that produces a zstandard compressed tarball.

It was tempting to perform the repackaging in the main fetch task
itself. This is certainly doable. However, if things were implemented
this way, we'd need to fetch the original artifact when repackaging
settings change. A goal of the fetch tasks is to minimize dependencies
on 3rd party services. It is advantageous for the 3rd party fetches
to run as infrequently as possible. This means we need to separate
3rd party fetches from repackaging tasks.

Because we derive job descriptions that are fed into the job
transform, we needed to update the "using: run-task" schema to
allow task references in the "command" attribute of definitions.
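The derivation step can be sketched roughly as follows. All keys and the default rule are illustrative assumptions about the scheme described above, not the real transform code:

```python
def derive_repackage_tasks(fetch_task):
    """Sketch: yield one derived task per "repackage" entry, with
    tarballs getting a default zstandard repackage. Keys are
    illustrative, not the real task schema.
    """
    entries = fetch_task.get("repackage")
    if entries is None and fetch_task["artifact"].endswith(
        (".tar.gz", ".tar.bz2", ".tar.xz")
    ):
        # Default: derive a zstandard tarball for every fetched tarball.
        entries = [{"format": "tar.zst"}]
    for entry in entries or []:
        root = fetch_task["artifact"].partition(".tar")[0]
        yield {
            "label": "%s-%s" % (fetch_task["label"], entry["format"]),
            # The repackage task depends on the fetch task, so changing
            # repackage settings never re-runs the 3rd party download.
            "dependencies": {"fetch": fetch_task["label"]},
            "artifact": "%s.%s" % (root, entry["format"]),
        }
```

The dependency edge is the important design point: the upstream fetch and the repackaging are separate tasks precisely so the 3rd party service is hit as rarely as possible.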
Life is too short to wait for bzip2 or lzma decompression when faster
alternatives are available.

Now that we automatically produce zstandard tarballs for all fetched
artifacts, it is trivial to change the dependencies so we use the
(faster) zstandard tarballs instead of the vanilla upstream tarballs.

The toolchain build scripts pull the fetch dependencies from an
environment variable and automatically recognize and support zstandard
tarballs. So we just need to point the dependencies at the zstandard
tarballs and everything should "just work."
So we can produce artifacts of Git and Subversion checkouts.
Various tasks operate on files that come from a checkout of a Git
or Subversion repository.

This commit introduces 2 new fetch-content commands for producing
archives for Git and Subversion checkouts. Essentially, you provide
the URL of a repository, a revision to operate on, and a destination
path and the commands write an archive of all files in the checkout
at that revision.
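The underlying VCS invocations might look like the following command builders. These are illustrative sketches only: the real fetch-content commands may clone first, use different flags, and then write the archive themselves (note that `git archive --remote` also requires server-side support).

```python
def git_archive_command(repo_url, revision, dest_path):
    """Illustrative `git archive --remote` invocation for exporting the
    files of one revision straight into an archive (assumption, not the
    real fetch-content implementation)."""
    return ["git", "archive", "--remote=" + repo_url,
            "--output=" + dest_path, revision]


def svn_export_command(repo_url, revision, dest_dir):
    """Illustrative `svn export` invocation for a checkout of one
    revision, which would then be archived separately."""
    return ["svn", "export", "--revision", str(revision), repo_url, dest_dir]
```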
We introduce a new fetch type that produces a tarball of a Subversion
repository export. The tasks simply call into a `fetch-content` command.

Rather than manually define the 3*6=18 tasks that would be required to
fetch all LLVM project repositories, we also introduce an
"llvm-release" fetch type that effectively calls into the
svn-export-archive code path to generate all required tasks per entry.
This keeps the YAML simple at the expense of some Python code.

We don't yet hook the tasks or their artifacts up to the toolchain
tasks. This will be done in a subsequent commit. I'm not yet 100%
convinced that these tasks are producing the correct artifacts. But
this definitely gets us a few steps closer to having toolchain tasks
not perform `svn` operations in CI. Tweaks can be done as follow-ups
easily enough.
Comment on attachment 8984028 [details]
Bug 1467359 - Implement fetch-url jobs using a transform; r?tomprince

Tom Prince [:tomprince] has approved the revision.

https://phabricator.services.mozilla.com/D1574
Attachment #8984028 - Flags: review+
Comment on attachment 8984030 [details]
Bug 1467359 - Refactor archive decompression; r?dustin

Tom Prince [:tomprince] has approved the revision.

https://phabricator.services.mozilla.com/D1576
Attachment #8984030 - Flags: review+
Comment on attachment 8984898 [details]
Bug 1467359 - Use zstandard tarballs for toolchain fetches; r?glandium

Mike Hommey [:glandium] has approved the revision.

https://phabricator.services.mozilla.com/D1607
Attachment #8984898 - Flags: review+
Attachment #8984028 - Attachment description: Bug 1467359 - Implement fetch-url jobs using a transform; r?dustin → Bug 1467359 - Implement fetch-url jobs using a transform; r?tomprince
Attachment #8984029 - Attachment description: Bug 1467359 - Derive Treeherder symbol names for fetch jobs; r?dustin → Bug 1467359 - Derive Treeherder symbol names for fetch jobs; r?tomprince
Attachment #8984895 - Attachment is obsolete: true
Attachment #8984030 - Attachment description: Bug 1467359 - Pipe decompressed data to tar; r?dustin → Bug 1467359 - Refactor archive decompression; r?dustin
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d22981772dd9
Implement fetch-url jobs using a transform; r=tomprince
https://hg.mozilla.org/integration/mozilla-inbound/rev/3bb31648056b
Derive Treeherder symbol names for fetch jobs; r=tomprince
Blocks: 1479533
Backed out for linting opt failure on taskgraph/transforms/fetch.py

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&fromchange=3bb31648056bf9d7760c59420d26de25fbf2b268&tochange=f3adbcf8e716b4faf47710b745a7541ed0ee1ef7&filter-searchStr=Linting%20opt%20source-test-mozlint-py-flake8%20(f8)&selectedJob=190941495

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=190941495&repo=mozilla-inbound&lineNumber=266

Backout link: https://hg.mozilla.org/integration/mozilla-inbound/rev/f3adbcf8e716b4faf47710b745a7541ed0ee1ef7


[task 2018-07-30T18:53:27.620Z] 
[task 2018-07-30T18:53:27.620Z] Error processing command. Ignoring because optional. (optional:packages.txt:comm/build/virtualenv_packages.txt)
[task 2018-07-30T18:54:20.104Z] TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/fetch.py:12:1 | 'voluptuous.Extra' imported but unused (F401)
[taskcluster 2018-07-30 18:54:20.642Z] === Task Finished ===
[taskcluster 2018-07-30 18:54:20.642Z] Unsuccessful task run with exit code: 1 completed in 327.367 seconds
Flags: needinfo?(gps)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/18e9f8b05874
Implement fetch-url jobs using a transform; r=tomprince
https://hg.mozilla.org/integration/mozilla-inbound/rev/6dce0e15337b
Derive Treeherder symbol names for fetch jobs; r=tomprince
Attachment #8984030 - Attachment is obsolete: true
Attachment #8984896 - Attachment is obsolete: true
Attachment #8984897 - Attachment is obsolete: true
Attachment #8984898 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/18e9f8b05874
https://hg.mozilla.org/mozilla-central/rev/6dce0e15337b
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/588cd38b040d
Port bug 1467359 - add Fetch-URL to config.yml. rs=bustage-fix DONTBUILD
Relanded with trivial fix to remove unused import.
Flags: needinfo?(gps)