Closed Bug 1461919 Opened 7 years ago Closed 2 years ago

[tracking] improve reruns in release automation

Categories

(Release Engineering :: Release Automation, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Unassigned)

References

(Depends on 4 open bugs)

Details

(Whiteboard: [releaseduty])

I'm back on releaseduty for cycle 61 and couldn't help noticing how flaky and fragile the reruns are. Many tasks still need manual reruns, although some of them do retry automatically. We've already filed a bunch of similar bugs in the past few weeks; let's track the whole discussion here and bump the priority. We need to automate the reruns as much as possible.
Kind of worrying to see the following (a jamun-based staging release task) referenced in the "balrog-my-linux64-nightly/opt" job that's currently running on "balrogworker-3":

2018-05-16T09:53:57 DEBUG - Getting source url for balrog:beetmover:signing:partials:docker-image:parent Z2xxHbjbR5a64PcnVvfW-Q...
2018-05-16T09:53:57 INFO - balrog:beetmover:signing:partials:docker-image:parent Z2xxHbjbR5a64PcnVvfW-Q: found https://hg.mozilla.org/projects/jamun/raw-file/3544f46fd55df80340b38f96d793e739c93b99a2/.taskcluster.yml
2018-05-16T09:53:57 DEBUG - task_ids: {'default': 'Z2xxHbjbR5a64PcnVvfW-Q', 'decision': 'Z2xxHbjbR5a64PcnVvfW-Q'}
2018-05-16T09:53:57 INFO - Pushlog url https://hg.mozilla.org/projects/jamun/json-pushes?changeset=3544f46fd55df80340b38f96d793e739c93b99a2&tipsonly=1&version=2&full=1
2018-05-16T09:53:57 INFO - Downloading https://hg.mozilla.org/projects/jamun/json-pushes?changeset=3544f46fd55df80340b38f96d793e739c93b99a2&tipsonly=1&version=2&full=1
2018-05-16T09:53:57 INFO - Done
2018-05-16T09:53:57 WARNING - Pushlog error: expected a single push at https://hg.mozilla.org/projects/jamun/json-pushes?changeset=3544f46fd55df80340b38f96d793e739c93b99a2&tipsonly=1&version=2&full=1 but got {}!
2018-05-16T09:53:57 CRITICAL - Fatal exception
Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/worker.py", line 124, in main
    loop.run_until_complete(async_main(context))
  File "/tools/python36/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/worker.py", line 99, in async_main
    await run_tasks(context)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/worker.py", line 64, in run_tasks
    await verify_chain_of_trust(chain)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1843, in verify_chain_of_trust
    task_count = await verify_task_types(chain)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1624, in verify_task_types
    await valid_task_types[task_type](chain, obj)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1393, in verify_parent_task
    await verify_parent_task_definition(chain, link)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1298, in verify_parent_task_definition
    chain, parent_link, decision_link, tasks_for
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1224, in populate_jsone_context
    await _get_additional_hgpush_jsone_context(parent_link, decision_link)
  File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/cot/verify.py", line 1138, in _get_additional_hgpush_jsone_context
    pushlog_id = list(pushlog_info['pushes'].keys())[0]
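The traceback stops at `list(pushlog_info['pushes'].keys())[0]`, right after a WARNING that the pushlog came back empty, so the worker crashes with an unhelpful error instead of a clear verification failure. A minimal sketch of a stricter check, assuming the pushlog JSON is a dict with a `pushes` mapping; `get_single_push_id` and `PushlogError` are hypothetical names, not scriptworker's actual fix.

```python
from typing import Any, Mapping


class PushlogError(Exception):
    """Raised when a pushlog response does not contain exactly one push."""


def get_single_push_id(pushlog_info: Mapping[str, Any], url: str = "<pushlog url>") -> str:
    # Hypothetical helper: tolerate a missing "pushes" key as well as an
    # empty mapping, and fail with an explicit message in both cases.
    pushes = pushlog_info.get("pushes", {})
    if len(pushes) != 1:
        raise PushlogError(
            f"expected a single push at {url} but got {len(pushes)}: {pushes!r}"
        )
    # Exactly one key is present, so tuple unpacking is safe here.
    (push_id,) = pushes.keys()
    return push_id
```

For example, `get_single_push_id({"pushes": {"12345": {}}})` returns `"12345"`, while `get_single_push_id({})` raises `PushlogError` instead of an indexing error.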
Depends on: 1450075
Depends on: 1456293
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #1)
> Kind of worrying to see the following (a jamun-based staging release task)
> referenced in the "balrog-my-linux64-nightly/opt" job's currently running in
> "balrogworker-3":

I'm guessing we first landed a docker-image change on jamun? We'll keep using that image until the hash of the various files for that docker image changes.
(In reply to Aki Sasaki [:aki] from comment #2)
> (In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #1)
> > Kind of worrying to see the following (a jamun-based staging release task)
> > referenced in the "balrog-my-linux64-nightly/opt" job's currently running in
> > "balrogworker-3":
>
> I'm guessing we first landed a docker-image change on jamun? We'll keep
> using that image until the hash of the various files for that docker image
> change.

This sounds plausible; I'll dive into it when I tackle this.
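The reuse behaviour described above follows from keying an image on the contents of its input files rather than on the branch that built it. A minimal sketch of that idea, assuming the cache key is a hash over the image's files; `image_context_hash` is a hypothetical name, and this is not the actual taskgraph implementation.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def image_context_hash(paths: Iterable[Path]) -> str:
    """Hash the files that make up a docker image context.

    Hypothetical illustration: identical file contents yield an identical
    hash, so an image first built on one branch (e.g. jamun) would keep
    being reused elsewhere until one of its input files changes.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):  # sort so the hash does not depend on input order
        # Include the path itself so renaming a file also changes the hash.
        digest.update(path.as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Because nothing branch-specific goes into the hash, the key only changes when a file in the image context changes, which is consistent with a jamun-built image showing up in another branch's chain of trust.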
To go after the low-hanging fruit, we'd need to improve the exit codes in the graph and possibly rope in ciduty for those one-liners. For the others, we'd need to take things step by step.
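One way to turn known one-liner failures into automatic reruns is to mark their exit codes as retryable in the task payload. A minimal sketch, assuming a docker-worker-style payload with an `onExitStatus.retry` list of exit codes after which the worker reruns the task; `add_retry_exit_codes` is a hypothetical helper, and exit code 72 in the example is arbitrary, not a code any release task is known to use.

```python
import copy
from typing import Any, Dict, Iterable


def add_retry_exit_codes(payload: Dict[str, Any], codes: Iterable[int]) -> Dict[str, Any]:
    """Return a copy of `payload` whose onExitStatus.retry also lists `codes`.

    Assumes a docker-worker-style payload where onExitStatus.retry holds the
    exit codes that should trigger an automatic rerun.
    """
    new_payload = copy.deepcopy(payload)  # leave the caller's dict untouched
    on_exit = new_payload.setdefault("onExitStatus", {})
    merged = set(on_exit.get("retry", [])) | set(codes)
    on_exit["retry"] = sorted(merged)  # stable order keeps task definitions diffable
    return new_payload
```

For example, `add_retry_exit_codes({"maxRunTime": 3600}, [72])` returns `{"maxRunTime": 3600, "onExitStatus": {"retry": [72]}}`. Which codes are genuinely transient would still have to be decided per task kind.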
61.0b6 reruns: J-3x3FJ0Tfar5hx8CKNLrA Jo_-PXtwS32sxXC1GDAkkQ
Depends on: 1465639
We agreed that this is something ciduty could help with if it happens again. It might be a good starting point for them in the release overview process.
repackage-l10n-mai-win32-nightly bUByc-Y_TiCsGG7HYf6M6w: failed to check out; needed a rerun.
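A transient checkout failure like this one is the kind of error an in-task retry can absorb before anyone has to rerun the task by hand. A generic sketch with exponential backoff, assuming the failing step can be wrapped in a callable; it is not the actual checkout code used by the repackage tasks, and `retry` is a hypothetical helper. Catching every exception is a simplification: a real version should retry only errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, backing off exponentially between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets callers and tests observe the backoff delays without actually waiting.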
Depends on: 1421530
Depends on: 1484924
Depends on: 1318173
Depends on: 1371378
Depends on: 1499265
Depends on: 1500264
Depends on: 1501519
Depends on: 1502122
Depends on: 1502269
See Also: → 1461895
Depends on: 1504353
Depends on: 1768026, 1782607
Severity: normal → S3

Most dependencies are fixed, no need to keep this old tracker open.

Status: NEW → RESOLVED
Type: defect → task
Closed: 2 years ago
Resolution: --- → FIXED