Closed Bug 1382982 Opened 7 years ago Closed 6 years ago

Permafailing artifact builds on autoland: Exception: Could not find any candidate pushheads in the last 50 revisions.

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox57 wontfix, firefox61 fixed)

RESOLVED FIXED
mozilla61

People

(Reporter: aryx, Assigned: chmanchester)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

So, according to the logs of several of those builds, the current state of affairs is that we're lucky when artifact builds work on autoland, not unlucky when they don't.

What artifact builds do is look for the most recent public changeset among the last 50 changesets in the history of the repo being built. They then try to find that changeset on mozilla-central, mozilla-inbound and beta.
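A minimal sketch of the lookup described above, under stated assumptions: the function names, the `phases` mapping and the `tree_contents` mapping are hypothetical stand-ins for illustration, not the actual code in artifacts.py. Only the limit of 50 revisions, the three tree names and the exception text come from this bug.

```python
# Simplified, hypothetical model of the search; not the real mozbuild code.
NUM_REVISIONS_TO_QUERY = 50  # "the last 50 revisions" from the error message


def find_last_public(history, phases, limit=NUM_REVISIONS_TO_QUERY):
    """Return the newest public changeset among the first `limit` entries.

    history: changeset hashes, newest first.
    phases: dict mapping hash -> "public" or "draft".
    """
    for rev in history[:limit]:
        if phases.get(rev) == "public":
            return rev
    return None


def trees_containing(rev, tree_contents):
    """Return the candidate trees that know about `rev`."""
    return [tree for tree, revs in tree_contents.items() if rev in revs]


def find_candidate(history, phases, tree_contents):
    """Find the last public changeset and the candidate trees that have it."""
    rev = find_last_public(history, phases)
    trees = trees_containing(rev, tree_contents) if rev else []
    if not trees:
        raise Exception(
            "Could not find any candidate pushheads in the last %d revisions."
            % NUM_REVISIONS_TO_QUERY
        )
    return rev, trees
```

In this model the search only ever considers the single newest public changeset, so if that changeset is public on autoland but not yet on any candidate tree, the lookup fails even though older public changesets exist on mozilla-central.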

For this to actually work, it needs the last public changeset on autoland to be on those repos. Guess what? According to the logs of the artifact builds, green or red, the last public changeset on autoland changes kind of randomly. I suspect manual pushes to autoland are one source, but that doesn't seem to be the whole story.

Anyways, it appears that we're lucky in most cases, and the last public changeset happens to also be on mozilla-central because it's not recent enough and was part of the last merge. But sometimes, we're not so lucky, and it's not, and then artifact builds are busted. The last time I know this happened was in February, but maybe it happened since then and I'm just unaware of it, I don't look at autoland bustages ;)

The only way this fixes itself currently is when the last public changeset on autoland is merged to mozilla-central.

An easy way out would be to add autoland to the candidate trees in artifacts.py. That would only paper over deeper issues:
- not all changesets on autoland are public, which may be a conscious choice, but then /something/ (people pushing to autoland manually being at least one cause) is making some of them public when they shouldn't be.
- artifact builds build js/xul changes against the last public changeset. When they aren't busted, that currently means a changeset on mozilla-central, i.e. many changesets ago. That might not really be in sync with whatever changes have happened on autoland.

Ideally, artifact builds should depend on the corresponding task whose artifacts they need, and be forced to download from there, not from some random changeset that happens to be public and have artifacts.
As glandium said, this is a fundamental problem with the way we look for artifacts today. The builds will start working again once autoland and mozilla-central are merged with each other.

Bug 1382507 will also help. I may try to hack on that today now that I know the current implementation can cause problems like this.
Adding the keyword so this gets suggested in Treeherder for the failures.


When Ryan did merges this morning, we got one green artifact build out of it, before it went back to failing: https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=320f9642bcef85a3934e94a6ef600cd3f3b5622f&noautoclassify&filter-searchStr=28bf2bc2f2111945292e86c9a27b389ce3d534bc&selectedJob=116389135
Summary: artifact builds on autoland broken: Exception: Could not find any candidate pushheads in the last 50 revisions. → Permafailing artifact builds on autoland: Exception: Could not find any candidate pushheads in the last 50 revisions.
Things will work once a public changeset from central or inbound gets near the top of the autoland repo, preferably not behind a head of a merge.
Whiteboard: [stockwell needswork]
Component: Buildduty → Build Config
Product: Release Engineering → Core
QA Contact: catlee
Whiteboard: [stockwell needswork] → [stockwell infra]
:catlee, this keeps showing up on my intermittent failure dashboards, is there work to do here so the failures are not so common?
Flags: needinfo?(catlee)
Flags: needinfo?(catlee) → needinfo?(gps)
The easy fix is to a) search for artifact builds on the autoland repo and b) increase the range of pushes that we examine.

The proper fix is to change the artifact search code to be DAG aware. I have patches for bug 1382507 in my local repo. Those are waiting on bug 1393242, which should land shortly. I'll try to get those landed in the next week or so.
Flags: needinfo?(gps)
:ted, would you have any ideas why this might have occurred at such a high frequency yesterday?
Flags: needinfo?(ted)
I don't know much about the artifact build code, sorry. chmanchester or nalexander would be better people to ask.
Flags: needinfo?(ted)
This is the issue described in comment 1. Somewhere around the backout of f13bc708c440 yesterday, the artifact builds started seeing this changeset as public, and because it wasn't on central yet, subsequent artifact builds failed, I guess until it got merged to central.
This bug has failed 68 times in the last 7 days, affecting only Linux debug builds.

Failing test: build-linux64-artifact.

Part of that log:
[task 2018-02-14T22:03:27.642Z] 22:03:27     INFO -  Exception: Could not find any candidate pushheads in the last 50 revisions.
[task 2018-02-14T22:03:27.643Z] 22:03:27     INFO -  Search started with 57b4c9bc875a6b7d61ced24ee06bce7cd6585f30, which must be known to Mozilla automation.
[task 2018-02-14T22:03:27.643Z] 22:03:27     INFO -  see https://developer.mozilla.org/en-US/docs/Artifact_builds
[task 2018-02-14T22:03:27.643Z] 22:03:27     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/mach_commands.py", line 1182, in artifact_install
[task 2018-02-14T22:03:27.643Z] 22:03:27     INFO -      return artifacts.install_from(source, self.distdir)
[task 2018-02-14T22:03:27.643Z] 22:03:27     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/artifacts.py", line 1164, in install_from
[task 2018-02-14T22:03:27.644Z] 22:03:27     INFO -      return self.install_from_recent(distdir)
[task 2018-02-14T22:03:27.644Z] 22:03:27     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/artifacts.py", line 1125, in install_from_recent
[task 2018-02-14T22:03:27.644Z] 22:03:27     INFO -      return self._install_from_hg_pushheads(hg_pushheads, distdir)
[task 2018-02-14T22:03:27.644Z] 22:03:27     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/artifacts.py", line 1104, in _install_from_hg_pushheads
[task 2018-02-14T22:03:27.645Z] 22:03:27     INFO -      for trees, hg_hash in hg_pushheads:
[task 2018-02-14T22:03:27.645Z] 22:03:27     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/artifacts.py", line 1021, in _find_pushheads
[task 2018-02-14T22:03:27.645Z] 22:03:27     INFO -      rev=last_revs[0], num=NUM_PUSHHEADS_TO_QUERY_PER_PARENT))
[task 2018-02-14T22:03:27.645Z] 22:03:27     INFO -  Makefile:221: recipe for target 'recurse_artifact' failed
[task 2018-02-14T22:03:27.645Z] 22:03:27     INFO -  make[3]: *** [recurse_artifact] Error 1
[task 2018-02-14T22:03:27.646Z] 22:03:27     INFO -  make[3]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox'
[task 2018-02-14T22:03:27.646Z] 22:03:27     INFO -  /builds/worker/workspace/build/src/config/recurse.mk:32: recipe for target 'artifact' failed

:chmanchester Do you have any updates?
Flags: needinfo?(cmanchester)
Whiteboard: [stockwell infra] → [stockwell infra], [stockwell needswork
I still don't quite understand why random changesets on autoland end up being public, but given this only happens occasionally, and artifact builds on autoland only exist to test that we aren't breaking artifact builds with build system changes landed on autoland, I don't know if fixing this is a big priority from the perspective of artifact builds at this point.

Insofar as this bug shows up in intermittent triage and people spend time coming back to it, that is a bigger concern, and as far as I can tell something is going wrong here. These jobs are Tier 2, which according to the visibility policy (at https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy ) means these jobs are "not sheriff-managed", and their results are shown "for information only". In practice it seems our tools and processes are treating these much as they would a Tier 1 failure. We don't usually have jobs in Tier 2 for the long term, but I've found these jobs useful for catching regressions in the ~2 years since standing them up, even though they occasionally fail for a while due to a quirk of how they interact with version control in automation.

To cut down on headaches we could just get this fixed. Unfortunately, of the easy fixes mentioned in comment 14, expanding the range of pushes we look at isn't going to work without a significant refactor of the artifact build code, and while broadening our search for artifacts to autoland would easily work, it would make everyone's artifact builds marginally slower, and wouldn't serve any purpose I can see other than working around this issue.
Flags: needinfo?(cmanchester)
Product: Core → Firefox Build System
> while broadening our search for artifacts to autoland would easily work, it would make everyone's artifact builds marginally slower, and wouldn't serve any purpose I can see other than working around this issue.

What if we searched for artifacts in autoland, after we've searched the other places?
Then it would only make artifact builds slower when no artifacts are found.
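A rough sketch of that ordering, purely for illustration: `PRIMARY_TREES`, `FALLBACK_TREES`, `find_trees_with_fallback` and the `has_rev` callback are hypothetical names, not part of artifacts.py. The point is only that the extra autoland query happens when the usual trees come up empty.

```python
# Hypothetical names; a sketch of "search autoland last", not the real code.
PRIMARY_TREES = ["mozilla-central", "mozilla-inbound", "beta"]
FALLBACK_TREES = ["autoland"]


def find_trees_with_fallback(rev, has_rev):
    """Return the trees that know `rev`, querying fallback trees only if needed.

    has_rev(tree, rev) -> bool is an assumed lookup supplied by the caller
    (in practice it would be a query against the tree's pushlog).
    """
    found = [tree for tree in PRIMARY_TREES if has_rev(tree, rev)]
    if found:
        # The common case: no additional query against autoland is made.
        return found
    return [tree for tree in FALLBACK_TREES if has_rev(tree, rev)]
```

With this shape, a build whose changeset is found on one of the primary trees pays no extra cost, and only the otherwise-failing case performs the additional lookup.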
I'm not too worried about these actual failures (comment 27), but if we're going to keep spending time attempting to track them down...
Assignee: nobody → cmanchester
Flags: needinfo?(cmanchester)
Attachment #8963268 - Flags: review?(core-build-config-reviews) → review?(nalexander)
Comment on attachment 8963268 [details]
Bug 1382982 - Accept artifacts from autoland when performing an artifact build in automation.

https://reviewboard.mozilla.org/r/232160/#review237642

::: commit-message-ddf01:1
(Diff revision 1)
> +Bug 1382982 - Accept artifacts from autoland when performing an artifact build in automation.

Can you explain why you're not changing `CANDIDATE_TREES` directly?  I see that it is also used in https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/artifacts.py#1148 and it's not clear to me why we wouldn't include integration/autoland as a source for HG to pull from.

::: python/mozbuild/mozbuild/artifacts.py:934
(Diff revision 1)
>          """
>  
>          with self._pushhead_cache as pushhead_cache:
>              found_pushids = {}
> -            for tree in CANDIDATE_TREES:
> +            search_trees = list(CANDIDATE_TREES)
> +            if os.environ.get('MOZ_AUTOMATION'):

Can we get a comment here explaining that we could have a lot of pushes to autoland in between merges to one of the "blessed" branches?
Attachment #8963268 - Flags: review?(nalexander) → review+
Comment on attachment 8963268 [details]
Bug 1382982 - Accept artifacts from autoland when performing an artifact build in automation.

https://reviewboard.mozilla.org/r/232160/#review237642

> Can you explain why you're not changing `CANDIDATE_TREES` directly?  I see that it is also used in https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/artifacts.py#1148 and it's not clear to me why we wouldn't include integration/autoland as a source for HG to pull from.

We use the changeset phase to determine whether a changeset is likely to be known to automation, but since autoland isn't a publishing repository, artifact builds based on autoland aren't going to work as expected in general. This isn't too much of a concern because people are generally instructed not to work locally based on autoland, but it comes up in this bug because some changesets on autoland become public for unknown reasons (some related to backouts, not all). So adding autoland to CANDIDATE_TREES would slow everyone's artifact builds down, but only serve the purpose of working around this issue, which only really occurs in automation.

> Can we get a comment here explaining that we could have a lot of pushes to autoland in between merges to one of the "blessed" branches?

The number of pushes isn't the problem here, but I'll add a comment...
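For readers following the review, here is a self-contained sketch of the approach the diff excerpt suggests. The first two statements of the function body mirror the quoted diff (`list(CANDIDATE_TREES)` and the `MOZ_AUTOMATION` check). Everything else is an assumption: the contents of `CANDIDATE_TREES`, the body of the `if` (which the excerpt does not show), and the `get_search_trees` wrapper with its `environ` parameter, which exists only to make the sketch easy to exercise.

```python
import os

# Assumed values for illustration; the real tuple lives in artifacts.py.
CANDIDATE_TREES = (
    "mozilla-central",
    "integration/mozilla-inbound",
    "releases/mozilla-beta",
)


def get_search_trees(environ=None):
    """Return the trees to search, adding autoland only in automation.

    `environ` defaults to os.environ; passing a dict is a convenience for
    this sketch and not something the real code is known to do.
    """
    environ = os.environ if environ is None else environ
    search_trees = list(CANDIDATE_TREES)
    if environ.get("MOZ_AUTOMATION"):
        # Assumption: the excerpt does not show this line. autoland is not a
        # publishing repository, so it is only considered in automation,
        # where changesets on it sometimes become public.
        search_trees.append("integration/autoland")
    return search_trees
```

Copying `CANDIDATE_TREES` into a new list leaves the shared constant untouched, so local artifact builds, which do not set `MOZ_AUTOMATION`, keep searching the same trees as before.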
Pushed by cmanchester@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/79d9d02670f5
Accept artifacts from autoland when performing an artifact build in automation. r=nalexander
Whiteboard: , [stockwell needswork[stockwell unknown] → [stockwell unknown]
https://hg.mozilla.org/mozilla-central/rev/79d9d02670f5
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61