Release sanity should check that all platforms have completed before release runner grabs l10n_config

RESOLVED INVALID

Status

Release Engineering
Release Automation
RESOLVED INVALID
2 years ago
2 years ago

People

(Reporter: mtabara, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
While preparing this build for promotion (https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=191f5eb4cbd72590277296cdb90d355adb347d45), we encountered:

  File "release-runner.py", line 588, in main
    "l10n_config": get_l10n_config(release, branchConfig, branch, l10n_changesets, index),
  File "release-runner.py", line 266, in get_l10n_config
    platform=platform,
  File "/builds/releaserunner/lib/python2.7/site-packages/taskcluster/client.py", line 455, in apiCall
    return self._makeApiCall(e, *args, **kwargs)
  File "/builds/releaserunner/lib/python2.7/site-packages/taskcluster/client.py", line 232, in _makeApiCall
    return self._makeHttpRequest(entry['method'], route, payload)
  File "/builds/releaserunner/lib/python2.7/site-packages/taskcluster/client.py", line 424, in _makeHttpRequest
    superExc=rerr
TaskclusterRestFailure: Indexed task not found

Release runner failed to grab its l10n configs. To build them, it looks up artifacts in the Taskcluster index, specifically those for the platforms listed under l10n_release_platforms (http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/config.py#l2742). Because the build being promoted had not completed, release runner could not find the indexed tasks for those platforms.

We should add a sanity check, or otherwise make sure those builds have completed, before looking up the l10n configs.
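
A minimal sketch of what such a pre-check could look like. This is not release runner's actual code: `missing_platforms`, `namespace_for`, `find_task`, `IndexedTaskNotFound` and the `example.route.*` namespaces are all hypothetical, and the index lookup is passed in as a plain callable so the idea can be shown without a real Taskcluster client.

```python
class IndexedTaskNotFound(Exception):
    """Hypothetical stand-in for the failure raised when the index has no entry."""


def missing_platforms(platforms, namespace_for, find_task):
    """Return the platforms whose indexed task cannot be found.

    platforms     -- iterable of platform names (e.g. l10n_release_platforms)
    namespace_for -- hypothetical callable mapping a platform to an index namespace
    find_task     -- hypothetical callable that raises IndexedTaskNotFound
                     when nothing is indexed under the namespace
    """
    missing = []
    for platform in platforms:
        try:
            find_task(namespace_for(platform))
        except IndexedTaskNotFound:
            missing.append(platform)
    return missing


# Fake index in which only the linux64 build has completed (made-up data).
fake_index = {"example.route.linux64": {"taskId": "abc"}}


def fake_find_task(namespace):
    if namespace not in fake_index:
        raise IndexedTaskNotFound(namespace)
    return fake_index[namespace]


result = missing_platforms(
    ["linux64", "win32"],
    lambda platform: "example.route." + platform,
    fake_find_task,
)
print(result)
```

If the returned list is non-empty, the caller could stop and report every unfinished platform at once, rather than failing on the first lookup.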
(Reporter)

Comment 1

2 years ago
Note to self: based on the conversations we had today, this is not necessarily an issue. Release runner failed fast while preparing the kwargs config. The release sanity check only starts after that, so this failure counts as a kind of pre-sanity check.

On the other hand, we could improve the error wrapping so that the logs show something more meaningful.
Catlee also suggested that we might eventually want to start the release before all the builds in Treeherder are done, in which case this would no longer be an issue.
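
As a rough illustration of the error wrapping mentioned above, the sketch below re-raises a lookup failure with the platform name and a hint about the likely cause. `ReleaseConfigError`, `get_config_for_platform` and `lookup` are hypothetical names, not part of release-runner.py or the Taskcluster client. A real implementation would presumably catch the specific TaskclusterRestFailure seen in the traceback rather than a bare Exception.

```python
class ReleaseConfigError(Exception):
    """Hypothetical error type carrying a readable message for the logs."""


def get_config_for_platform(platform, lookup):
    """Call lookup(platform) and re-raise any failure with added context.

    lookup is a hypothetical callable standing in for the index lookup.
    """
    try:
        return lookup(platform)
    except Exception as exc:  # a real version would catch the specific client error
        raise ReleaseConfigError(
            "Could not find the indexed task for platform %r (%s). "
            "Has the build for this platform completed?" % (platform, exc)
        )


def failing_lookup(platform):
    # Simulates the failure from the traceback in the description.
    raise RuntimeError("Indexed task not found")


try:
    get_config_for_platform("win32", failing_lookup)
except ReleaseConfigError as err:
    message = str(err)
    print(message)
```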
 
More info to come here next week.
(Reporter)

Comment 2

2 years ago
We talked about this in the relpro meeting today and decided to leave it for now, as more release sanity checks will be turned on soon; we will deal with it then. We'll post an update when that happens, in the next few days or weeks.
(Reporter)

Comment 3

2 years ago
Note to self:
* most likely that will include nicer error handling for this scenario
* in the long term we shouldn't have to deal with this at all, as task scheduling in Taskcluster should cover the dependency handling
(Reporter)

Comment 4

2 years ago
Not currently working on this; deferring to avoid blocking.
Assignee: mtabara → nobody
(Reporter)

Comment 5

2 years ago
This can be ignored for now; the idea is not to proceed with automation if something unexpected happens.
I'll close the bug; there's no need to have it lurk in bug triage.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INVALID