Enable sourcedir caches for Windows generic-worker builds
Categories
(Firefox Build System :: Task Configuration, task, P1)
Tracking
(firefox75 fixed)
Tracking | Status | |
---|---|---|
firefox75 | --- | fixed |
People
(Reporter: ahal, Assigned: tomprince)
References
Details
(Whiteboard: [ci-costs-2020:done])
Attachments
(24 files)
Bug 1527313: [taskgraph] Ensure that the mozharness workspace is part of a docker volume; r?glandium
In bug 1519472 I turned on caching for generic-worker tasks. However there's a bug in generic-worker that was causing these builds to fail. The error happens because the cache is set to a different volume than the workspace.
We can either wait for the fix in bug 1526311 or try to move the cache from Y: to Z:. In bug 1526311#c1 it looks like the fact the checkout happens on Y: might have been a hack to work around an issue anyway, and keeping it on Z: is more optimal. Might be worth investigating whether switching back to Z: causes any problems.
Comment 1•6 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #0)
In bug 1519472 I turned on caching for generic-worker tasks. However there's a bug in generic-worker that was causing these builds to fail. The error happens because the cache is set to a different volume than the workspace.
We can either wait for the fix in bug 1526311 or try to move the cache from Y: to Z:. In bug 1526311 comment 1 it looks like the fact the checkout happens on Y: might have been a hack to work around an issue anyway, and keeping it on Z: is more optimal. Might be worth investigating whether switching back to Z: causes any problems.
I've added bug 1528198 as a dependency, which is the bug to move caches from Y: to Z:. That bug is blocked on bug 1433854, which is another generic-worker fix.
Comment 2•6 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #1)
I've added bug 1528198 as a dependency, which is the bug to move caches from Y: to Z:. That bug is blocked on bug 1433854, which is another generic-worker fix.
It turns out bug 1528198 does not depend on bug 1433854 after all, so we can probably go ahead with bug 1528198, which also removes the dependency on bug 1526311, since that only exists when the caches are on a different drive.
So I think we only need to implement bug 1528198 to unblock this bug. \o/
Comment 3•6 years ago
|
||
Please don't enable caches without doing something about bug 1528891. The current situation is a mess.
- Many (most?) build scripts assume some paths, ignoring whatever taskcluster sets in the environment
- Windows builds are set up differently from docker-worker builds
- The cache setups of both Windows builds and docker-worker builds are broken, in different ways
- Docker worker builds set a checkout cache and a workspace cache. The former is shared across all job types. The latter is not. Most builds check out in the workspace cache directory, which is actually a good thing because they also build in there, which means the checkout is polluted with build artifacts. Lots of them.
- Windows builds, with caches enabled, set a checkout cache and nothing else, shared across all job types. Builds happen in that directory, which means build artifacts are being cached along with the checkouts.
- Interestingly, we do check out in a way that cleans up build artifacts, so they don't actually affect subsequent builds, but that means we may spend an enormous amount of time cleaning up a large amount of data that we cached... for nothing (and with the slow I/O on Windows, this is noticeable).
- While bug 1519472 disabled windows caches on some task kinds, it didn't do so on all windows builds... most notably, use-caches is still true for spidermonkey jobs.
Comment 4•6 years ago
|
||
It turns out the setup on docker workers is actually fine, because robustcheckout does put the hg-share in the checkout cache. But per bug 1528891, Windows workers don't do that.
Assignee | ||
Comment 5•6 years ago
|
||
I think the way to address this is to move the objdir out of the source directory, so that we can cache the source directory, and the object directory doesn't get cached. This might improve the behavior slightly on linux, as well.
Comment 6•5 years ago
|
||
(In reply to Tom Prince [:tomprince] from comment #5)
I think the way to address this is to move the objdir out of the source directory, so that we can cache the source directory, and the object directory doesn't get cached. This might improve the behavior slightly on linux, as well.
Tom, is this something we can resurface as a higher priority? If you're super swamped I completely understand, but maybe consider it for the near future. It would be great if we could reduce our rather long Windows build times a bit. It's not clear how big of an impact this would have, but it looks like maybe 5-10 minutes.
Comment 7•5 years ago
|
||
Jordan, could you please add this to your roadmap for H1? Saving 5-10 minutes per Windows build would save a lot of money.
thanks
Comment 8•5 years ago
|
||
Sorry, I was on PTO on Jan 2nd and 3rd.
Tom and I will discuss this this week. I'm leaving the needinfo open and will reply with an update tomorrow (Jan 7th).
Comment 9•5 years ago
|
||
Tom met with some build and dev workflow folks to try and get a better idea of the options we have, what they can do, and what work would be involved on our side.
Tom, could you update the status here so we are all on the same page?
We can allocate time for releng bits in H1.
Assignee | ||
Comment 10•5 years ago
|
||
I chatted with the build team. An objdir outside the checkout should be supported (and there are people that use this configuration locally).
It should be possible to adjust the taskcluster+mozharness config to point to a new objdir and the build should work, though it would be good to verify via diffoscope that that is in fact the case.
Comment 11•5 years ago
|
||
Note that some commands do always create an objdir under the source directory for virtualenvs. Those would need to be changed to be able to create them outside the source directory.
Comment 12•5 years ago
|
||
As a data point to validate the potential benefit here: it looks like the median number of tasks performed by the windows build workers is 5 tasks per instance.
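A rough back-of-the-envelope sketch of what that median implies, assuming the first task on a fresh instance always starts with a cold cache and every later task can reuse it. The per-task saving reuses the 5-10 minute guess from comment 6, which is itself only an estimate, and the function name is made up for illustration.

```python
def warm_cache_fraction(tasks_per_instance):
    """Fraction of tasks on one instance that can find a warm cache.

    Assumes the first task on a fresh instance always starts cold.
    """
    if tasks_per_instance < 1:
        raise ValueError("need at least one task")
    return (tasks_per_instance - 1) / tasks_per_instance


median_tasks = 5  # median tasks per instance quoted above
warm_tasks = median_tasks - 1
# Guessed saving per warm task in minutes (range from comment 6).
low, high = 5, 10

print(f"warm fraction: {warm_cache_fraction(median_tasks):.0%}")
print(f"saving per instance: {low * warm_tasks}-{high * warm_tasks} minutes")
```

Under these assumptions at most four out of five tasks benefit, so the real saving depends heavily on how long instances live.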
Comment 13•5 years ago
|
||
the median number of tasks performed by the windows build workers is 5 tasks per instance.
That doesn't seem like a lot.
Reporter | ||
Comment 14•5 years ago
•
|
||
(In reply to Mike Hommey [:glandium] from comment #11)
Note that some commands do always create an objdir under the source directory for virtualenvs. Those would need to be changed to be able to create them outside the source directory.
Afaik, they'll create the objdir wherever the objdir is supposed to exist, as determined by MozbuildObject.from_environment().topobjdir. So we shouldn't need to worry about this as long as the build system is configured to use .
Comment 15•5 years ago
|
||
The builds have MOZ_OBJDIR set to obj-firefox, and that doesn't prevent obj-x86_64-pc-linux-gnu from being created on builders (I've seen it on loaners).
Reporter | ||
Comment 16•5 years ago
•
|
||
Interesting, here's where the virtualenv path is defined:
https://searchfox.org/mozilla-central/rev/f98dad153b59a985efd4505912588d4651033395/python/mozbuild/mozbuild/base.py#269
I see configure instantiates a few of these classes, but appears to use the same path as well. So I guess for whatever reason, that MOZ_OBJDIR isn't getting picked up by the MozbuildObject class.
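For reference, the behavior being expected in this exchange can be sketched as follows. This is a hypothetical lookup, not the actual mozbuild logic: the function name, the default directory name, and the rule that relative values are resolved against the source directory are all assumptions made for the example.

```python
import posixpath


def resolve_objdir(topsrcdir, env, default_name="obj-default"):
    """Hypothetical objdir lookup; not the actual mozbuild implementation.

    Prefers MOZ_OBJDIR from the given environment and only falls back to
    a directory under the source tree when the variable is unset.
    """
    objdir = env.get("MOZ_OBJDIR")
    if not objdir:
        return posixpath.join(topsrcdir, default_name)
    # Assumption: relative values are resolved against the source directory.
    # posixpath.join keeps an absolute objdir unchanged.
    return posixpath.normpath(posixpath.join(topsrcdir, objdir))


print(resolve_objdir("/builds/gecko", {"MOZ_OBJDIR": "obj-firefox"}))
print(resolve_objdir("/builds/gecko", {}))
```

If a tool skips the environment lookup, it silently ends up on the fallback branch, which would match a second objdir appearing under the source directory.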
Assignee | ||
Comment 17•5 years ago
|
||
The alternative code path was unused (as demonstrated by the presence of
manifestdestiny package). Remove that code path, so we can fail with a better
error message, if we don't have the right path to the requirements file.
Assignee | ||
Comment 18•5 years ago
|
||
The alternative code path was unused (as demonstrated by the presence of
manifestdestiny package). Remove that code path, so we can fail with a better
error message, if we don't have the right path to the requirements file.
Assignee | ||
Comment 19•5 years ago
|
||
Assignee | ||
Comment 20•5 years ago
|
||
Assignee | ||
Comment 21•5 years ago
|
||
Assignee | ||
Comment 22•5 years ago
|
||
The build tasks already use that spelling, so make the naming consistent.
Assignee | ||
Comment 23•5 years ago
|
||
The original code hard-coded the path of the source directory. Instead, use the
actual source directory.
Assignee | ||
Comment 24•5 years ago
|
||
Comment 25•5 years ago
|
||
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/5db1dc4cff3d
[mozharness] Fail if we can't find the mozbase requirements file; r=ahal
https://hg.mozilla.org/integration/autoland/rev/40216892b9f4
[mozharness] Fail if we can't find the marionette requirements file; r=ahal
Assignee | ||
Comment 26•5 years ago
|
||
In automation, GECKO_PATH always refers to the source directory, so use that in mozharness
rather than assuming it is somewhere relative to work_dir.
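A minimal sketch of that approach, assuming a hypothetical helper name and an example checkout path; only the GECKO_PATH variable itself comes from the description above.

```python
import posixpath


def source_dir_from_env(env):
    """Return the source directory named by GECKO_PATH.

    Fails loudly instead of guessing a location relative to work_dir.
    """
    try:
        return env["GECKO_PATH"]
    except KeyError:
        raise RuntimeError("GECKO_PATH is not set; cannot locate the source directory") from None


# Example environment; the path is illustrative only.
env = {"GECKO_PATH": "/builds/worker/checkouts/gecko"}
source_dir = source_dir_from_env(env)
# Anything inside the checkout is then found relative to the source directory.
print(posixpath.join(source_dir, "testing", "mozharness"))
```

Raising when the variable is missing keeps a misconfigured task from silently using a stale or wrong directory.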
Assignee | ||
Comment 27•5 years ago
|
||
Assignee | ||
Comment 28•5 years ago
|
||
Assignee | ||
Comment 29•5 years ago
|
||
Comment 30•5 years ago
|
||
bugherder |
Assignee | ||
Comment 31•5 years ago
|
||
We use the shell to expand this, rather than substituting the value here,
because GECKO_PATH will be set in the run_task transform (after the rest of this
stack lands).
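The difference between substituting early and expanding late can be illustrated with a small sketch. Here `string.Template` merely stands in for the shell, and the script path and helper name are hypothetical.

```python
from string import Template

# The command keeps a literal "$GECKO_PATH"; nothing is substituted while
# the task definition is being generated. The script path is made up.
COMMAND_TEMPLATE = "$GECKO_PATH/some/script.py"


def expand_late(template, run_time_env):
    """Expand variables using the environment available when the task runs."""
    return Template(template).substitute(run_time_env)


# Only at run time is the variable known, so only then can it be expanded.
run_time_env = {"GECKO_PATH": "/builds/worker/checkouts/gecko"}
print(expand_late(COMMAND_TEMPLATE, run_time_env))
```

Deferring the expansion means the code that writes the command does not need to know the value that a later step will set.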
Assignee | ||
Comment 32•5 years ago
|
||
We set it here, rather than depending on taskcluster script to set it, so that
we can use it to construct the objdir we will use.
Assignee | ||
Comment 33•5 years ago
|
||
This moves the object directory and source directories around in all mozharness
jobs, to allow enabling caching on windows builders.
This makes a number of changes that all need to land at once:
- Moves the source checkout from the workspace cache mount to the checkouts cache mount.
- Moves the object directory from underneath the source directory to directly in the work directory (which is still under workspace).
- Sets the object directory to obj-build instead of obj-firefox.
- Stops caching the workspace directory (it is still a volume in docker workers, so writes perform well; a followup revision adds some checks around this).
- Removes one level of directory in the mozharness workdir (things were under workspace/build, but are now just under workspace/).
- Adjusts paths in environment variables and artifact specifications to match the above changes.
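The resulting layout can be sketched roughly as below. The mount points are parameters because their real values depend on the worker, and the `gecko` checkout directory name and the function name are assumptions for the example; only obj-build and the relative structure follow the description above.

```python
import posixpath


def mozharness_layout(checkouts_mount, workspace):
    """Illustrative directory layout after the change described above."""
    return {
        # The source checkout lives in the cached checkouts mount.
        "source": posixpath.join(checkouts_mount, "gecko"),
        # The objdir sits directly in the work directory, outside the source.
        "objdir": posixpath.join(workspace, "obj-build"),
        # Only the checkouts mount is cached; the workspace is not.
        "cached": [checkouts_mount],
    }


layout = mozharness_layout("/builds/worker/checkouts", "/builds/worker/workspace")
for name, value in layout.items():
    print(name, value)
```

Because the objdir no longer sits under the source directory, caching the checkout does not drag build artifacts along with it.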
Assignee | ||
Comment 34•5 years ago
|
||
Since mozharness tasks are no longer caching the workspace directory, we don't
need a key for different tasks.
Assignee | ||
Comment 35•5 years ago
|
||
Since the workspace is no longer cached, but needs to be a volume for
performance reasons, add a check to ensure that is the case.
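A check of this kind might look like the following sketch. The function names and the example paths are hypothetical, and the real check in taskgraph may be structured differently.

```python
from pathlib import PurePosixPath


def is_in_volume(path, volumes):
    """Return True if path is one of the volumes or lies beneath one."""
    target = PurePosixPath(path)
    for volume in volumes:
        root = PurePosixPath(volume)
        if target == root or root in target.parents:
            return True
    return False


def check_workspace_volume(workspace, volumes):
    """Raise if the workspace would not be backed by a docker volume."""
    if not is_in_volume(workspace, volumes):
        raise ValueError(f"workspace {workspace} is not part of any volume: {volumes}")


# Example paths only.
check_workspace_volume("/builds/worker/workspace", ["/builds/worker/workspace"])
print(is_in_volume("/builds/worker/workspace2", ["/builds/worker/workspace"]))
```

Comparing path components rather than string prefixes avoids treating a sibling such as `workspace2` as if it were inside `workspace`.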
Assignee | ||
Comment 36•5 years ago
|
||
Assignee | ||
Comment 37•5 years ago
|
||
These jobs only use the workspace for obj directories, and have some logic to
clean up the directory at the beginning of the run, so there is no reason to
cache the directory.
This also removes the now-unused common code for creating a workspace cache directory.
Assignee | ||
Comment 38•5 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #15)
The builds have MOZ_OBJDIR set to obj-firefox, and that doesn't prevent obj-x86_64-pc-linux-gnu from being created on builders (I've seen it on loaners).
I think this is because linux workers run mozharness with mach, so at that point mach is being run without a .mozconfig, and does not appear to honor the MOZ_OBJDIR in the environment at that point, in that state.
Comment 39•5 years ago
|
||
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/7989682c179c
[taskgraph] Use `artifact-reference` in upload-sources task; r=Callek
https://hg.mozilla.org/integration/autoland/rev/5625ed988192
Consolidate multi-locale config, since it does not vary by branch; r=Callek
https://hg.mozilla.org/integration/autoland/rev/e72c5f59ff11
[mozharness] Use `abs_src_dir` instead of `abs_mozilla_dir` in l10n; r=Callek
https://hg.mozilla.org/integration/autoland/rev/073a74ec38c4
[mozharness] Find multi-l10n source files relative to source directory; r=Callek
https://hg.mozilla.org/integration/autoland/rev/07bfd4b77f6c
[mozharness] Rename `abs_objdir` to `abs_obj_dir`; r=Callek
https://hg.mozilla.org/integration/autoland/rev/2e2bf618e78c
[mozharness] Use `GECKO_PATH` consitently to find the source directory; r=Callek
https://hg.mozilla.org/integration/autoland/rev/9ce1849d70cc
[mozharness] Unconditionally set `abs_obj_dir` in l10n repcks; r=Callek
https://hg.mozilla.org/integration/autoland/rev/7dd467519723
[mozharness] Don't hardcode `MOZ_OBJDIR` in mozharness configs; r=Callek
Comment 40•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7989682c179c
https://hg.mozilla.org/mozilla-central/rev/5625ed988192
https://hg.mozilla.org/mozilla-central/rev/e72c5f59ff11
https://hg.mozilla.org/mozilla-central/rev/073a74ec38c4
https://hg.mozilla.org/mozilla-central/rev/07bfd4b77f6c
https://hg.mozilla.org/mozilla-central/rev/2e2bf618e78c
https://hg.mozilla.org/mozilla-central/rev/9ce1849d70cc
https://hg.mozilla.org/mozilla-central/rev/7dd467519723
Comment 41•5 years ago
|
||
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/c290aa0e39be
Reduce extraneous differences in taskcluster mozharness scripts; r=Callek
https://hg.mozilla.org/integration/autoland/rev/328669c39fa8
[mozharness] Checkout l10n-central next to mozilla-central; r=firefox-build-system-reviewers,mshal
https://hg.mozilla.org/integration/autoland/rev/d8b482f1e496
[taskgraph] Use `GECKO_PATH` to find the taskcluster mozharness script; r=Callek
https://hg.mozilla.org/integration/autoland/rev/182892303a6e
[taskgraph] Set `$WORKSPACE` for mozharness tasks in taskgraph; r=Callek
Assignee | ||
Comment 42•5 years ago
|
||
Assignee | ||
Comment 43•5 years ago
|
||
Comment 44•5 years ago
|
||
bugherder |
Assignee | ||
Comment 45•5 years ago
|
||
Assignee | ||
Comment 46•5 years ago
|
||
Assignee | ||
Comment 47•5 years ago
|
||
This is instead of extracting them in the gecko source directory, where they
will get deleted by the next task anyway.
Comment 48•5 years ago
|
||
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/9339904adde3
Pass workdir down to multi-l10n script; r=Callek
https://hg.mozilla.org/integration/autoland/rev/f2a1721f5d43
Use `GECKO_PATH` based directories in more places in android pgo; r=nalexander
https://hg.mozilla.org/integration/autoland/rev/5ee475239171
Set `MOZ_OBJDIR` explicitly in release-source tasks; r=Callek
https://hg.mozilla.org/integration/autoland/rev/278f9917616e
Adjust openh264 build to not assume the the source checkout is in work-dir; r=Callek
https://hg.mozilla.org/integration/autoland/rev/42db12ccc1be
Move objdir out of source directory for all mozharness builds; r=glandium,Callek
https://hg.mozilla.org/integration/autoland/rev/66b544de8846
[taskgraph] Remove workspace-key from mozharness tasks; r=glandium
https://hg.mozilla.org/integration/autoland/rev/424f2ad03fe8
[taskgraph] Ensure that the mozharness workspace is part of a docker volume; r=glandium
https://hg.mozilla.org/integration/autoland/rev/f4c911c975a3
Adjust openh264 build to extract tools in uncached workspace; r=Callek
https://hg.mozilla.org/integration/autoland/rev/53bbc6da5941
Enable caching on windows builds; r=glandium
https://hg.mozilla.org/integration/autoland/rev/fbe3fcbb31d8
[taskgraph] Don't use a workspace cache for hazard builds; r=glandium
Comment 49•5 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/dfaa6a437e00
Fix Windows cross-compilation bustage. patch by glandium on Matrix
Comment 50•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9339904adde3
https://hg.mozilla.org/mozilla-central/rev/f2a1721f5d43
https://hg.mozilla.org/mozilla-central/rev/5ee475239171
https://hg.mozilla.org/mozilla-central/rev/278f9917616e
https://hg.mozilla.org/mozilla-central/rev/42db12ccc1be
https://hg.mozilla.org/mozilla-central/rev/66b544de8846
https://hg.mozilla.org/mozilla-central/rev/424f2ad03fe8
https://hg.mozilla.org/mozilla-central/rev/f4c911c975a3
https://hg.mozilla.org/mozilla-central/rev/53bbc6da5941
https://hg.mozilla.org/mozilla-central/rev/fbe3fcbb31d8
https://hg.mozilla.org/mozilla-central/rev/dfaa6a437e00