"comm" clone remains in Firefox checkout, even in non-TB CI jobs.
Categories
(Firefox Build System :: Task Configuration, defect)
Tracking
(firefox97 fixed)
| | Tracking | Status |
|---|---|---|
| firefox97 | --- | fixed |
People
(Reporter: mhentges, Assigned: mozilla)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
This can lead to intermittents due to in-tree code assuming it's running from a specific revision without any code changes. Since this "mutated checkout" state isn't consistent, tracking down associated failures will be tough.
This bug was caused by this, because a previous Thunderbird task left comm/ sitting in the checkout.
Comment 1 (Assignee) • 2 years ago
For docker-worker, at least, it looks like we:
- grab a cache that persists across tasks, e.g. gecko-level-3-checkouts-hg58-v3-35e6d2147228a7dd8319, and mount that at /builds/worker/checkouts
- set the hg-store path to /builds/worker/checkouts/hg-store, which makes sense, because we want the hg store to persist across tasks
- set the gecko path to /builds/worker/checkouts/gecko, which also persists across tasks
Then we run hg robustcheckout with --purge, which in theory should clean up all unknown files lying around inside the clone and give us a clean checkout. This runs hg purge.
However, hg purge does not remove a comm checkout inside of your gecko checkout, presumably because we're only purging our clone, not nested clones. To verify, I cloned comm-central inside mozilla-unified, then ran hg purge -p --all. This printed all unknown files, including my objdir, but did not mention comm/.
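The same check can be approximated without Mercurial. This is a minimal sketch, assuming a nested clone is any directory below the checkout root that contains its own .hg directory; find_nested_clones is a hypothetical helper, not something robustcheckout or run-task provides:

```python
import os


def find_nested_clones(checkout_root):
    """Return sorted paths of directories under checkout_root that hold their own .hg.

    Hypothetical helper: the checkout root's own .hg is skipped, and we do not
    descend into a nested clone once it has been found.
    """
    checkout_root = os.path.abspath(checkout_root)
    nested = []
    for dirpath, dirnames, _filenames in os.walk(checkout_root):
        if dirpath == checkout_root:
            # Never walk into the store of the checkout itself.
            if ".hg" in dirnames:
                dirnames.remove(".hg")
            continue
        if ".hg" in dirnames:
            nested.append(dirpath)
            # Stop descending: everything below belongs to the nested clone.
            dirnames[:] = []
    return sorted(nested)
```

Run against a gecko checkout that still has comm/ in it, this should list the comm directory even though hg purge -p --all stays silent about it.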
Therefore, I'm inclined to rename this bug to one of the following:
- "robustcheckout --purge doesn't handle nested clones", or
- "thunderbird builds should use a separate checkout path, e.g. /builds/worker/checkouts/tb-gecko/ [until they run from a single branch]" (this may be disk inefficient, but may avoid this bug), or
- "thunderbird builds should use separate worker pools". This last will be very inefficient, especially with our limited PGO hardware builder pools.
I'm leaning towards someone adding a nested clone cleanup option to robustcheckout. What do you think? Keeping this under the --purge option will be easier, since we don't have to uplift everywhere, but a --purge that does more than hg purge may be a confusing misnomer. Having a different option, on the other hand, may require us to uplift everywhere to roll out.
Comment 2 (Assignee) • 2 years ago
(Maybe hg purge should be able to handle nested clones, given a --include-nested-clones arg or similar.)
Comment 3 (Assignee) • 2 years ago
(Or maybe whatever directory traversal we're using for our python stuff should be clone-boundary aware?)
Comment 4 (Reporter) • 2 years ago
> Therefore, I'm inclined to rename this bug to one of the following:
Hmm, good point, this bug name is a little bit too broad. I'll modify it :)
> I'm leaning towards someone adding a nested clone cleanup option to robustcheckout. What do you think? Keeping this under the --purge option will be easier, since we don't have to uplift everywhere, but --purge that does more than hg purge may be a confusing misnomer. But having a different option may require us to uplift everywhere to roll out.
That makes sense - it looks like hg handles nested repos in weird ways.
For example, even if comm is removed from .hgignore, it still doesn't show up in hg status or even hg status -A.
Another option is that, in the same code that runs hg purge, we can optionally remove <checkout>/comm if the current task doesn't need Thunderbird? Unsure how viable that is :)
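As a rough illustration of that last option, here is a minimal sketch. Both remove_comm_if_unneeded and its needs_thunderbird flag are hypothetical; how a task would actually signal that it needs Thunderbird is not settled in this bug:

```python
import shutil
from pathlib import Path


def remove_comm_if_unneeded(checkout, needs_thunderbird):
    """Delete <checkout>/comm when the current task does not build Thunderbird.

    Both the function name and the needs_thunderbird flag are hypothetical.
    Returns True if a directory was removed, False otherwise.
    """
    comm = Path(checkout) / "comm"
    if needs_thunderbird or not comm.is_dir():
        return False
    shutil.rmtree(comm)
    return True
```

This only covers the one known directory name, so any other nested clone would still survive the purge.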
Comment 5 (Assignee) • 2 years ago
I'd rather we be generic since the issue will appear for any nested clone, but yeah, we could hardcode nuking comm/.
Comment 6 (Assignee) • 2 years ago
Moving components. If we hack the comm removal, that's probably in https://searchfox.org/mozilla-central/source/taskcluster/scripts/run-task, probably after we run robustcheckout --purge on line 471.
Otherwise we may want to hack https://hg.mozilla.org/hgcustom/version-control-tools/file/tip/hgext/robustcheckout to update robustcheckout --purge to purge across clone boundaries, or we may want to hack https://www.mercurial-scm.org/repo/hg/file/tip/hgext/purge.py to purge across clone boundaries. (Edit: this last may be more dirstate; purge is now hg core)
Comment 7 (Reporter) • 2 years ago
Thanks Aki, I appreciate it. I'll float this by :sheehan as well next time we have a chat :)
Comment 8 (Assignee) • 2 years ago
I suspect https://www.mercurial-scm.org/repo/hg/file/tip/rust/hg-core/src/dirstate_tree/status.rs for the upstream, but I'm not familiar enough with the mercurial project to know if they've fully cut over to rust dirstate. (Edit: yeah, I think dirstate_node means it has an .hg/dirstate, and we compare these against our current path, and so we ignore nested clones. Not sure if we can convince upstream that we want to be able to nuke, or at least detect, nested clones)
Comment 9 (Assignee) • 2 years ago
Oh, https://www.mercurial-scm.org/repo/hg/annotate/tip/rust/hg-core/src/dirstate_tree/status.rs#l703
# DirEntry
    /// If a `.hg` sub-directory is encountered:
    ///
    /// * At the repository root, ignore that sub-directory
    /// * Elsewhere, we’re listing the content of a sub-repo. Return an empty
    ///   list instead.
    <snip>
    if name == b".hg" {
        if is_at_repo_root {
            // Skip the repo’s own .hg (might be a symlink)
            continue;
        } else if metadata.is_dir() {
            // A .hg sub-directory at another location means a subrepo,
            // skip it entirely.
            return Ok(Vec::new());
        }
    }
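A rough Python model of the quoted rule may help show why an untracked comm/ never reaches purge: listing the nested clone's directory yields nothing, so there are no files to report as unknown. list_children is a hypothetical name, and this simplified model ignores everything else the real status walk does:

```python
import os


def list_children(directory, is_at_repo_root):
    """Mimic the quoted rule: a `.hg` directory away from the root hides its parent.

    Simplified Python model for illustration only; not Mercurial's actual code.
    """
    children = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name == ".hg":
                if is_at_repo_root:
                    # Skip the repository's own .hg.
                    continue
                if entry.is_dir():
                    # Nested clone: report nothing at all for this directory.
                    return []
            children.append(entry.name)
    return sorted(children)
```

In this model the root listing still names comm as a child directory, but descending into it returns an empty list, which matches the behaviour seen with hg purge -p --all.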
Comment 10 • 2 years ago
An easy way out would be for comm-central to use different caches. As a matter of fact, it is concerning that comm-central ends up using caches from m-c!! (and vice versa)
Comment 11 (Assignee) • 2 years ago
Hm, https://hg.mozilla.org/comm-central/file/tip/taskcluster/ci/config.yml#l5 says trust-domain: comm, and https://searchfox.org/mozilla-central/source/taskcluster/gecko_taskgraph/transforms/task.py#2086 seems to tell me their caches should be named e.g. comm-level-3-*. Not sure what's going on.
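To make the naming expectation concrete, here is a minimal sketch. The <trust-domain>-level-<level>-<name> format is inferred only from the cache names quoted in this bug, and cache_name is a hypothetical helper; the real transform may build the name differently:

```python
def cache_name(trust_domain, level, name):
    """Build a cache name of the form <trust-domain>-level-<level>-<name>.

    The format is an assumption inferred from the cache names quoted in this
    bug; the real taskgraph transform may add further components.
    """
    return f"{trust_domain}-level-{level}-{name}"


# The suffix is copied from the gecko cache name quoted in comment 1.
suffix = "checkouts-hg58-v3-35e6d2147228a7dd8319"
gecko_cache = cache_name("gecko", 3, suffix)
comm_cache = cache_name("comm", 3, suffix)
```

If names are built this way, a gecko task and a comm task at the same level would never mount the same checkout cache, so a shared cache alone would not explain the stray comm/.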
Comment 12 (Assignee) • 2 years ago
And indeed, https://firefox-ci-tc.services.mozilla.com/tasks/aUU8VrF4S7KkN8qMOrGR3A/definition says
    "cache": {
      "comm-level-3-checkouts-hg58-v3-35e6d2147228a7dd8319": "/builds/worker/checkouts"
    },
Comment 13 (Assignee) • 2 years ago
Ah, mystery solved.
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-3/worker-types/b-linux/workers/us-west-2/i-0f07dbdd34c9d0b11 is the busted worker. Before running the busted task it ran https://firefox-ci-tc.services.mozilla.com/tasks/Z_7cNdyASPi5LMkEnfcEbw/runs/0 .
Cross-channel clones all of the shipping repos and then builds a set of all en-US strings for localizers, so we cloned comm-central as part of the task.
Solutions:
- nuke comm/ at the end of cross-channel (finally block after https://searchfox.org/mozilla-central/rev/361f258f46af4b9c881be81d1291000827c15704/tools/compare-locales/mach_commands.py#182 ?)
- I think :rjl was looking at moving cross-channel for thunderbird into a separate task / repo so :flod doesn't have to deal with Thunderbird strings; if we ran these tasks on comm workers, we wouldn't have to worry about comm clones. (This solution may take a while to implement, though.)
- continue with robustcheckout/hg purge nested-clone support, since the above two address this one case, but this solution will solve future cases as well
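The first solution could look roughly like the sketch below. run_then_remove_comm and its action callable are hypothetical stand-ins for the cross-channel command body; this is not the patch that landed:

```python
import shutil
from pathlib import Path


def run_then_remove_comm(checkout, action):
    """Run action(), then delete <checkout>/comm whether or not it raised.

    Hypothetical wrapper: `action` stands in for the whole cross-channel run.
    """
    try:
        return action()
    finally:
        # ignore_errors keeps a missing comm/ from masking the original exception.
        shutil.rmtree(Path(checkout) / "comm", ignore_errors=True)
```

Cleanup placed like this has to wrap the entire command. If it runs before a later step that still reads comm/, that step will fail because the clone is already gone.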
Comment 14 (Assignee) • 2 years ago
Comment 15 (Assignee) • 2 years ago
^ should address the first point. I think the second is in progress, and we may want to file a followup for the third.
Comment 16 • 2 years ago
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/68098f573c46 nuke comm/ after cross-channel. r=releng-reviewers,jmaher DONTBUILD
Comment 17 • 2 years ago
bugherder
Comment 18 • 2 years ago
Backed out for causing Bug 1747545.
[task 2021-12-24T11:08:06.995Z] Processing mozilla-unified in /builds/worker/checkouts/gecko
[task 2021-12-24T11:08:06.995Z] Gathering files for central
[task 2021-12-24T11:08:06.995Z] Gathering files for beta
[task 2021-12-24T11:08:06.995Z] Gathering files for release
[task 2021-12-24T11:08:06.995Z] Gathering files for esr91
[task 2021-12-24T11:08:06.995Z] Writing mozilla-unified content to target
[task 2021-12-24T11:08:06.995Z] Error running mach:
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] ['l10n-cross-channel', '-o', '/builds/worker/artifacts/outgoing.diff', '--attempts', '5', '--ssh-secret', 'project/releng/gecko/build/level-3/l10n-cross-channel-quarantine-ssh', 'prep', 'create', 'push']
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] The error occurred in code that was called by the mach command. This is either
[task 2021-12-24T11:08:06.995Z] a bug in the called code itself or in the way that mach is calling it.
[task 2021-12-24T11:08:06.995Z] You can invoke |./mach busted| to check if this issue is already on file. If it
[task 2021-12-24T11:08:06.995Z] isn't, please use |./mach busted file l10n-cross-channel| to report it. If |./mach busted| is
[task 2021-12-24T11:08:06.995Z] misbehaving, you can also inspect the dependencies of bug 1543241.
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] If filing a bug, please include the full output of mach, including this error
[task 2021-12-24T11:08:06.995Z] message.
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] The details of the failure are as follows:
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] hglib.error.ServerError: server exited with status 255: b'abort: repository /builds/worker/checkouts/gecko/comm not found'
[task 2021-12-24T11:08:06.995Z]
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/tools/compare-locales/mach_commands.py", line 194, in cross_channel
[task 2021-12-24T11:08:06.995Z] actions,
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/third_party/python/redo/redo/__init__.py", line 185, in retry
[task 2021-12-24T11:08:06.995Z] return action(*args, **kwargs)
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/tools/compare-locales/mach_commands.py", line 297, in _do_create_content
[task 2021-12-24T11:08:06.995Z] status = ccc.create_content()
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/python/l10n/mozxchannel/__init__.py", line 103, in create_content
[task 2021-12-24T11:08:06.995Z] with hglib.open(repo_config["path"]) as repo:
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/third_party/python/python-hglib/hglib/__init__.py", line 11, in open
[task 2021-12-24T11:08:06.995Z] return client.hgclient(path, encoding, configs)
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/third_party/python/python-hglib/hglib/client.py", line 67, in __init__
[task 2021-12-24T11:08:06.995Z] self.open()
[task 2021-12-24T11:08:06.995Z] File "/builds/worker/checkouts/gecko/third_party/python/python-hglib/hglib/client.py", line 261, in open
[task 2021-12-24T11:08:06.995Z] % (ret, serr.strip()))
[taskcluster 2021-12-24 11:08:07.393Z] === Task Finished ===
[taskcluster 2021-12-24 11:08:07.506Z] Artifact "public/build" not found at "/builds/worker/artifacts"
[taskcluster 2021-12-24 11:08:07.606Z] Unsuccessful task run with exit code: 1 completed in 261.352 seconds
Comment 20 • 2 years ago
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/237d30dec089 nuke comm/ after cross-channel. r=mhentges,releng-reviewers,jmaher DONTBUILD
Comment 21 • 2 years ago
bugherder