Closed Bug 1489405 Opened Last year Closed 7 months ago

improvements for bouncer locations job that runs on nightly

Categories

(Release Engineering :: Release Automation: Bouncer, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Assigned: tomprince)

References

Details

(Keywords: leave-open, Whiteboard: [releaseduty])

Attachments

(4 files, 1 obsolete file)

In bug 1432656 we added support to automate the bouncer nightly locations instead of manual steps as part of mergeduty. However, a few small items still need polishing:

* change the TH to fit it under somewhere else, it's confusing to see the
task under "Firefox Release Tasks"
* potentially change the Tier-2 to Tier-1 once this is fixed and working
* fix the proper dependency in post-beetmover-dummy task to chain it at the end of the graph

Given bug 1432656 comment 26, we need to fix the last item before we perform mergeduty - day II on 23 October 2018.
Whiteboard: [releaseduty]
> * change the TH to fit it under somewhere else, it's confusing to see the
task under "Firefox Release Tasks"

I think this is the correct place for it, given that it is non-platform specific tasks related to the nightly release.


We should also look at improving the scriptworker constraints, so that this action is the only one that m-c can run, while not limiting what actions level-1 jobs can do.
(In reply to Tom Prince [:tomprince] from comment #1)
> > * change the TH to fit it under somewhere else, it's confusing to see the
> task under "Firefox Release Tasks"
> 
> I think this is the correct place for it, given that it is non-platform
> specific tasks related to the nightly release.
On second thought, you might be right. However, if you search for "nightly" in Treeherder, the task won't show up in the graph, so it's harder to trace unless you pay attention to the Tier-* tasks.
 
> We should also look at improving the scriptworker constraints, so that this
> action is the only one that m-c can run, while not limiting what actions
> level-1 jobs can do.

+1

Note to self: See above comments too when addressing these.
User Story: (updated)
This slipped my radar during this release cycle so I don't have the correct solutions implemented yet.

Brainstormed with Aki the other week and the solutions seem to be as follows:
The correct long-term solution is to block this task on post-beetmover-dummy. However, since we don't have that on central nightlies, here are a few temporary solutions:

1. With shippable builds, we'll get post-beetmover-dummy for free in central nightlies too, so we can chain it to that then. Before then, we can define a specific post-beetmover-dummy set of tasks for central only.
2. Land a human breakpoint on central before the bump happens and revert the code afterwards (to avoid any of us having to resolve that task via the TC client twice a day for the remainder of the release cycle).
3. Make sure we fail the task as soon as it's scheduled to give beetmover enough time to actually send the files and then rerun it to correctly update the version.

I'm going to go with option 2 for now as it's the safest. Option 1 is still blocked on shippable builds, while option 3 is error-prone, as sheriffs may rerun the task unless coordinated otherwise.
Status: NEW → ASSIGNED
Priority: P2 → P1
Dumping here some thoughts from IRC:

13:40:10 <jlorenzo> mtabara: what if we make https://tools.taskcluster.net/groups/XMi8OU8hRPqhybCggyyUNA/tasks/e5nOmw0lQY6gUWGZBR0RIw/details depend on the build-signing tasks? It will reduce the time of 404 and avoid a human task 
13:40:18 <jlorenzo> what do you think? 
13:41:54 <mtabara> that's a good compromise. however, I think we hit the same issue. we update the l10n bouncer entries too, which means we need all the locales artifacts beetmove-d too
13:42:17 <mtabara> which means we'll need build-signing + nightly-l10n-signing, for which we need a "post-beetmover-dummy"-like task
13:45:52 <jlorenzo> or what if we changed the number of chunks of nightly-l10n-signing to have exactly 100 tasks. We're 5 tasks away from this limit
13:46:09 <jlorenzo> meaning: 
13:47:09 <jlorenzo> 1. nightly-l10n-signing gets 20 tasks per platform
13:47:09 <jlorenzo> 2. bouncer-locations-firefox gets split into 2: bouncer-locations-firefox-en-US bouncer-locations-firefox-l10n
Comment on attachment 9019010
[in-tree] temporarily add a bouncer-locations-breakpoint in the nightly graph

Review of attachment 9019010:
-----------------------------------------------------------------

Okay to get this patch landed before we bump nightly to 64. However, I think there's a solution that doesn't require releaseduty to perform any human action. Until we get shippable-builds, I suggest bumping the number of locales per chunk to 6 [1]. This way, the number of l10n jobs is down to 17-18 per platform. Given that we have 5 platforms, we don't reach the 100-dependency limit. We can then let bouncer-locations depend on both build-signing and nightly-l10n-signing, so that bouncer-locations doesn't run too early.

[1] https://searchfox.org/mozilla-central/rev/0ec9f5972ef3e4a479720630ad7285a03776cdfc/taskcluster/ci/nightly-l10n/kind.yml#44
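
A rough check of the arithmetic in the suggestion above, as a minimal Python sketch. The locale count of 102 is a hypothetical value chosen only to reproduce the 17-18 l10n jobs per platform quoted above, the "one build-signing task per platform" count is an assumption, and the function names are illustrative rather than taken from the tree.

```python
import math

MAX_DEPENDENCIES = 100  # per-task dependency limit discussed in this bug
PLATFORMS = 5           # number of platforms, as stated in the review comment


def l10n_chunks(locale_count: int, locales_per_chunk: int) -> int:
    """Number of l10n signing tasks per platform for a given chunk size."""
    return math.ceil(locale_count / locales_per_chunk)


def total_dependencies(locale_count: int, locales_per_chunk: int) -> int:
    """Dependencies bouncer-locations would have across all platforms.

    Assumption: one build-signing task per platform, plus the l10n chunks.
    """
    return PLATFORMS * (1 + l10n_chunks(locale_count, locales_per_chunk))


HYPOTHETICAL_LOCALE_COUNT = 102  # assumption, not taken from the tree

for per_chunk in (5, 6):
    deps = total_dependencies(HYPOTHETICAL_LOCALE_COUNT, per_chunk)
    verdict = "fits" if deps <= MAX_DEPENDENCIES else "exceeds the limit"
    print(f"{per_chunk} locales per chunk -> {deps} dependencies ({verdict})")
```

Under these assumptions, 5 locales per chunk goes over the limit while 6 stays under it, which matches the reasoning in the review.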
Attachment #9019010 - Flags: review?(jlorenzo) → review+
Note to self: Johan's idea could work well as a mid-term solution until we have shippable builds.  I gave it a try here[1] with Johan adding: "yeah, that looks like it! I'd put a more precise comment on why we put 6 instead of another number."

Sadly, while doing the taskgraph-diff thing, the graph changed more than it should have. Since we're under time pressure, we might go for the breakpoint for now, then revert it and fix this approach so it can land as a stopgap until we have shippable builds. We're also making good progress on declarative artifacts, so hopefully we can start testing this week with nightlies on birch and give this approach a go there too.

[1]: https://gist.github.com/MihaiTabara/c1a2294d1f69f5c3814a2f75dae29136
Note to self: more context for the previous comment:

14:51:37 <mtabara> I think there's something wrong I'm doing about the deps appending
14:51:37 <mtabara> the other tasks using that in the release graphs use the release_deps transform which is slightly more complex than this
14:51:48 <mtabara> but relies on shipping* attributes which are undefined in nightly graphs
14:52:56 <mtabara> it's not only the build dependencies that are wrong, there's also some devedition jobs being added and more signing than it should.
14:56:13 <jlorenzo> okay
14:58:27 <mtabara> unrelated, maybe it's worth investigating whether we could chain repackage-signing instead of signing?
14:59:10 <mtabara> even though the dmg/exe/target.bz are signed in the "build-signing" / "nightly-l10n-signing" jobs, the beetmover counterpart won't actually be unblocked until repackage + repackage-signing happens
14:59:38 <mtabara> so we do save a lot of the 404 time (build + build signing) but we won't have any artifacts in S3 until repackage + repackage signing happens
14:59:51 <mtabara> still way better than currently, but worth considering
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f47a5207daa
temp add bouncer-locations-breakpoint in nightly graphs.r=jlorenzo
(In reply to Pulsebot from comment #10)
> Pushed by mtabara@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/1f47a5207daa
> temp add bouncer-locations-breakpoint in nightly graphs.r=jlorenzo

Landed to inbound to be grabbed by upcoming merge to central + before the 10pm nightlies tonight, after the bump to 65.
Callek mentioned we should also hold off on landing the chunkified stuff for now, as we might benefit a lot from the multi-dep work, which might simplify our lives.

Plan for this week is:
* see https://hg.mozilla.org/integration/mozilla-inbound/rev/1f47a5207daa on central
* see the central successfully bumped to 65
* wait for nightlies to be triggered to confirm the location-bouncer-breakpoint worked
* resolve the task only when the task is green
* once that happens, revert this patch first on inbound + then on central
* touch base in 2-3 weeks to see if multi-dep is landed and how to approach this
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4688a861d512
Backed out changeset 1f47a5207daa r=jlorenzo
Attachment #9019011 - Flags: feedback?(jlorenzo)
Will let this bake for another two weeks. I'm waiting on the multi-dep work to see if we can redo some of the logic with those upcoming goodies. If not, I'll go with Johan's aforementioned idea.
Component: Release Automation: Other → Release Automation: Bouncer
QA Contact: sfraser
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/27fd8e123b64
temp add bouncer-locations-breakpoint in nightly graphs.r=jlorenzo
Same plan as last time, to prevent the 66 bump from leaving the official Firefox page with no builds.
I landed https://hg.mozilla.org/integration/mozilla-inbound/rev/27fd8e123b64 on inbound; hopefully it makes it to central in time, before we bump it.
Duplicate of this bug: 1513100

I needed to push some fixes for the bouncer-locations job this week and was reminded of this bug. I'd like to solve it, since yet another mergeduty is coming.

@aki: I'm sort of out of context with the most recent multi-dep changes that took place in-tree taskgraph. Feel free to 302 as well.

tl;dr - currently the bouncer-locations job runs at the beginning of the nightly graph (it has no deps). One side effect is that after each mergeduty, when we bump the version, the bouncer pages on the Mozilla website point to 404 URLs for 2-3h, until all (en-US + locales) builds/platforms finish. (Well, technically that has never happened because we've always used various hacks to prevent it, but it's time to fix this properly.)

IIUC the long-term solution is shippable builds, when tasks such as release-generate-checksums and the like are ported into the nightly graphs. But until that happens I need something like 'post-beetmover-dummy' for the nightly desktop graph as well, so that the bouncer-locations job waits until all artifacts are in place before enabling URLs in bouncer.

Question: Is there an easier way to do this with the multi-dep stuff that landed in the past weeks/months, or should I consider duplicating the post-beetmover-dummy tasks into something like post-beetmover-dummy-nightly tasks and ensuring they are not filtered out when we build the nightly desktop graph?
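
For reference, the general shape of a "dummy" fan-in like post-beetmover-dummy can be sketched in Python as below. This is an illustration under assumptions, not the in-tree transform: the function name fan_in, the dummy-N-M labels and the beetmover-N task labels are all hypothetical, and the limit of 100 is the dependency limit discussed earlier in this bug.

```python
from typing import Dict, List

MAX_DEPS = 100  # dependency limit discussed earlier in this bug


def fan_in(final_label: str, upstream: List[str],
           max_deps: int = MAX_DEPS) -> Dict[str, List[str]]:
    """Return {task label: dependencies} so that no task exceeds max_deps.

    Upstream tasks are grouped under intermediate "dummy" tasks until the
    final task can depend on few enough labels to respect the limit.
    """
    graph: Dict[str, List[str]] = {}
    current = sorted(upstream)
    level = 0
    while len(current) > max_deps:
        next_level = []
        for start in range(0, len(current), max_deps):
            label = f"dummy-{level}-{start // max_deps}"  # hypothetical label
            graph[label] = current[start:start + max_deps]
            next_level.append(label)
        current = next_level
        level += 1
    graph[final_label] = current
    return graph


# Hypothetical example: 250 beetmover tasks feeding one bouncer task.
beetmover_tasks = [f"beetmover-{i}" for i in range(250)]
graph = fan_in("bouncer-locations-firefox", beetmover_tasks)
print(graph["bouncer-locations-firefox"])
```

With 250 upstream tasks, the final task ends up depending on three dummy tasks, so it only becomes runnable once every upstream task has completed.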

Flags: needinfo?(aki)
User Story: (updated)

Tom Prince gave this an initial attempt in https://phabricator.services.mozilla.com/D16580.
I'm going to play with that and run it against taskgraph-diff to ensure we're not disrupting the graphs.
Thanks again Tom!

Flags: needinfo?(aki)
Comment on attachment 9019010
[in-tree] temporarily add a bouncer-locations-breakpoint in the nightly graph

I had to rebase this patch: https://phabricator.services.mozilla.com/D17824
Attachment #9019010 - Attachment is obsolete: true
Pushed by jlorenzo@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/d5549e46baed
temporarily add a bouncer-locations-breakpoint in the nightly graph r=Callek a=Aryx

Callek, would you mind resolving the human task after all beetmover jobs are done in the next nightly? I won't be around at 23:00 UTC.

Flags: needinfo?(bugspam.Callek)

Resolved D2SBiegZSFOHSNaAZXCpPw

Flags: needinfo?(bugspam.Callek)
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/2de6f0429d77
Run bouncer-locations job after beetmover has run; r=mtabara

With Tom's https://phabricator.services.mozilla.com/D16580 landed on autoland, let's see how the next set of nightlies behaves once it merges to central. Thanks again Tom!

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #32)

https://hg.mozilla.org/mozilla-central/rev/2de6f0429d77

This is live and working well in the nightly graphs: https://tools.taskcluster.net/groups/flqno_C7QSC0Rnqn9E_34Q/tasks/fq4AxTqrR0O-Lw3jFWPCPw/details

One less thing to worry about in next mergeduty.
We can finally close this.

Switching assignee to Tom since he's been doing the heavy lifting in porting post-beetmover-dummy to the nightly graph.

Assignee: mtabara → mozilla
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED

Changes to the desktop_nightly filter chain selected only tasks with no
shipping_product attribute set or tasks where it is set to "firefox".

This led to the decision task optimizer removing a large portion of the tasks
that run as part of Thunderbird's nightly builds.

This update adds "thunderbird" to the list of shipping_product values whose tasks
are kept in the task graph.
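
In spirit, the change amounts to a filter like the following Python sketch. The function and constant names are hypothetical and the real filter in the tree may be structured differently; only the shipping_product attribute and the "firefox" and "thunderbird" values come from the description above.

```python
from typing import Any, Dict

# None stands for "no shipping_product attribute set".
ALLOWED_SHIPPING_PRODUCTS = {None, "firefox", "thunderbird"}


def keep_for_desktop_nightly(attributes: Dict[str, Any]) -> bool:
    """Keep tasks with no shipping_product, or with an allowed one."""
    return attributes.get("shipping_product") in ALLOWED_SHIPPING_PRODUCTS


# Hypothetical task attributes, for illustration only.
print(keep_for_desktop_nightly({}))
print(keep_for_desktop_nightly({"shipping_product": "thunderbird"}))
print(keep_for_desktop_nightly({"shipping_product": "some-other-product"}))
```

Before the follow-up, the equivalent of the allowed set would not have contained "thunderbird", which is why those tasks were dropped from the graph.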

Pushed by thunderbird@calypsoblue.org:
https://hg.mozilla.org/integration/autoland/rev/1aadea60432f
Followup: Allow for thunderbird as shipping_product. r=tomprince