automate nightly latest bouncer aliases

RESOLVED FIXED

Status

Type: enhancement
Priority: P2
Severity: normal
Opened: 2 years ago
Updated: 6 months ago

People

(Reporter: jlund, Assigned: lguo)

Tracking

(Blocks 1 bug)

unspecified

Firefox Tracking Flags

(firefox-esr60 fixed)

Details

(Whiteboard: [releaseduty])

Attachments

(5 attachments, 3 obsolete attachments)

As part of the effort to automate merge steps.
Blocks: 1432518
I'll add the "[releaseduty]" tag to keep it on the radar.
Whiteboard: [releaseduty]
Priority: -- → P2
As part of mergeduty, one of the tasks is to manually[1] update the bouncer entries once the newly generated nightlies (following the version bump) have built successfully. Since this is done manually, after mergeduty is done, it is easy to skip or miss by accident. It also eats up time, so automation++

What we want to do in this bug is:
a) automate the nightly latest bouncer aliases and make sure this happens with each nightly graph. In 99% of cases no version change has happened, so it'll simply bail out and no-op. But when a version change is detected, it can update the bouncer aliases accordingly.

b) make sure we no longer update the stub installer ones. More details in bug 1464026 comment 9.

[1]: https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#bump-bouncer-versions
See Also: → 1464026
Mihai and I decided to add another object in the task payload for Bouncer, as follows:

'payload': {
    'bouncer_location': {
        'version': 65,
        'products': ['firefox-nightly-latest', 'firefox-nightly-latest-ssl', 'firefox-nightly-latest-l10n', 'firefox-nightly-latest-l10n-ssl']
    }
}
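To make the expected shape of that payload concrete, here is a minimal Python sketch of a pre-submission check. It assumes the payload is a plain dict with the `bouncer_location` key shown above; `validate_bouncer_payload` and the rules it enforces are hypothetical, not bouncerscript's actual schema validation.

```python
from typing import Any, Dict


def validate_bouncer_payload(payload: Dict[str, Any]) -> None:
    """Raise ValueError if the `bouncer_location` object is malformed (hypothetical checks)."""
    location = payload.get("bouncer_location")
    if not isinstance(location, dict):
        raise ValueError("payload is missing a 'bouncer_location' object")

    version = location.get("version")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, (int, str)):
        raise ValueError("'version' must be an int or a string")

    products = location.get("products")
    if not isinstance(products, list) or not products:
        raise ValueError("'products' must be a non-empty list")
    if not all(isinstance(p, str) and p for p in products):
        raise ValueError("every product must be a non-empty string")
    if len(set(products)) != len(products):
        raise ValueError("'products' contains duplicates")


payload = {
    "bouncer_location": {
        "version": 65,
        "products": ["firefox-nightly-latest", "firefox-nightly-latest-l10n"],
    }
}
validate_bouncer_payload(payload)  # passes silently
```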

We're working on changing the transforms and bouncerscript now.
To add a bit more context on this, for posterity.

Initially, we thought these were simply aliases, and hence easily updated via the existing behavior in bouncerscript[1] from a nightly-run job. However, on second glance during last Friday's sprint with Lisa, we realized they are actually products in bouncer, with one location per platform. My guess is that someone, early in bouncer's existence, added those entries as products in order to serve the nightly locations. Regular releases are bumped on a bi-weekly basis (betas) and on a per-cycle basis (every six weeks for releases), so serving them via a fixed bouncer alias is easy.

For nightlies this doesn't apply: we only bump the version every six weeks, but we ship twice a day. Aliases won't work here, since all nightlies in a cycle share the same version, "X.0a1". To solve this, a handful of "products" (whose names look more like aliases) were added, each with one path per platform. We update those at the end of mergeduty[2].
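To illustrate why a single alias cannot cover nightlies: each of these products holds one location per platform, and every path embeds the version string, so all of them change at each version bump. A minimal sketch; the platform keys and the path templates below are hypothetical placeholders, not the real bouncer entries.

```python
from typing import Dict

# Hypothetical per-platform path templates; the real bouncer locations may differ.
PREFIX = "/firefox/nightly/latest-mozilla-central/firefox-{version}.en-US"
PATH_TEMPLATES: Dict[str, str] = {
    "linux": PREFIX + ".linux-i686.tar.bz2",
    "linux64": PREFIX + ".linux-x86_64.tar.bz2",
    "osx": PREFIX + ".mac.dmg",
    "win": PREFIX + ".win32.installer.exe",
    "win64": PREFIX + ".win64.installer.exe",
}


def build_locations(version: str) -> Dict[str, str]:
    """Return the per-platform locations a nightly product would need for `version`."""
    return {platform: tpl.format(version=version) for platform, tpl in PATH_TEMPLATES.items()}


# Every location changes when the tree moves from 63.0a1 to 64.0a1.
print(build_locations("64.0a1")["osx"])
```

An alias, by contrast, only maps a name to one product, which is why the nightly "aliases" had to be products with their own locations.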

In order to automate and fix this, Lisa and I have taken the following approach:
a) add a task that runs in the nightly graph (twice a day) and schedules a task to talk to bouncer "locations"
* it takes the products listed in [2] and the current version in the tree
b) the "locations" job:
* for each of the "products", query that information from bouncer
* validate the paths to make sure they are well-formed
* if version from tree differs from version returned in bouncer, update bouncer via `location_modify`
* else: no-op, exit successfully.
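The check-and-update flow in (b) could look roughly like the following Python sketch. It runs under stated assumptions: `FakeBouncer`, `get_locations` and `modify_location` are hypothetical stand-ins for the real bouncer API (the comment above only names `location_modify`), and the version is assumed to appear in each path as `-<version>.`.

```python
import re
from typing import Dict

# Assumption: the nightly version appears in each path as "-<major>.<minor>a1."
NIGHTLY_VERSION = re.compile(r"-(\d+\.\d+a1)\.")


class FakeBouncer:
    """Hypothetical in-memory stand-in for bouncer: product -> {platform: path}."""

    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self.data = data

    def get_locations(self, product: str) -> Dict[str, str]:
        return dict(self.data[product])  # copy, so callers can iterate safely

    def modify_location(self, product: str, platform: str, path: str) -> None:
        self.data[product][platform] = path


def update_product(bouncer: FakeBouncer, product: str, tree_version: str) -> bool:
    """Point every location of `product` at `tree_version`; return True if anything changed."""
    changed = False
    for platform, path in bouncer.get_locations(product).items():
        match = NIGHTLY_VERSION.search(path)
        if match is None:
            # Validation step: refuse to rewrite a path we cannot parse.
            raise ValueError(f"{product}/{platform}: no nightly version in {path!r}")
        if match.group(1) != tree_version:
            new_path = path[:match.start(1)] + tree_version + path[match.end(1):]
            bouncer.modify_location(product, platform, new_path)
            changed = True
    return changed  # False means no-op: bouncer already matches the tree


bouncer = FakeBouncer({"firefox-nightly-latest": {"osx": "/firefox-63.0a1.en-US.mac.dmg"}})
print(update_product(bouncer, "firefox-nightly-latest", "64.0a1"))  # True: path bumped
print(update_product(bouncer, "firefox-nightly-latest", "64.0a1"))  # False: no-op
```

Raising on an unparseable path, rather than skipping it, keeps a malformed bouncer entry from being silently left behind.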

Codewise, this means:
* add an in-tree task in the nightly graph that can talk to bouncer
* add a new behavior in bouncerscript to enhance this communication

[1]: https://github.com/mozilla-releng/bouncerscript/blob/master/bouncerscript/script.py#L64
[2]: https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#bump-bouncer-versions
To recap: on Friday, Lisa and I sprinted on this bug. Our work led to:
* identifying the problem and solutions
* implementing the in-tree side of comment 5

The in-tree task has currently landed on maple here[1]. I still need to find a way to test it. I'm thinking of mimicking https://tools.taskcluster.net/hooks/project-releng/nightly-fennec%2Fmaple with a slightly modified hook that runs a desktop nightly, so that I can generate maple nightlies via hooks. Alternatively, I'll set up a project branch for nightly development.

The in-tree task we modeled this on is the bouncer-submission one, e.g. here[2].

What's left now is to amend the behavior in bouncerscript so it works for product locations as well:
a) new behavior to handle the "locations" mechanism[3]
b) implement logic to check and update those locations [4-7]

Also we'd need to add scopes to handle this new behavior. 
c) That means adding "project:releng:bouncer:action:locations" to the prod/staging clients. More details here[8].
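For reference, Taskcluster treats a trailing `*` in a granted scope as a prefix wildcard; otherwise a required scope must be matched exactly, which is why the new action scope has to be added explicitly. A simplified Python sketch of that rule (it ignores role expansion via `assume:` scopes); the helper name and the client scopes are illustrative assumptions.

```python
from typing import Iterable


def scope_satisfied(required: str, granted: Iterable[str]) -> bool:
    """True if a granted scope equals `required` or ends in '*' and prefixes it."""
    for scope in granted:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


# Illustrative client scopes, before the new action scope is added.
client_scopes = [
    "project:releng:bouncer:action:submission",
    "project:releng:bouncer:server:staging",
]
required = "project:releng:bouncer:action:locations"
print(scope_satisfied(required, client_scopes))  # False
print(scope_satisfied(required, client_scopes + [required]))  # True
```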

[1]: https://hg.mozilla.org/projects/maple/rev/48ea9d81ed77b56edfcf4fd2d127fb9cd34ccbd2
[2]: https://tools.taskcluster.net/groups/WD8eqRHGQaGKdHiwWC5gew/tasks/XPXccbEqTjSf2rlQ7M5YuA/details
[3]: https://github.com/mozilla-releng/bouncerscript/blob/master/bouncerscript/script.py#L83
[4]: https://bounceradmin.mozilla.com/admin/mirror/location/?product__id__exact=2005
[5]: https://bounceradmin.mozilla.com/admin/mirror/location/?product__id__exact=6508
[6]: https://bounceradmin.mozilla.com/admin/mirror/location/?product__id__exact=6506
[7]: https://bounceradmin.mozilla.com/admin/mirror/location/?product__id__exact=6507
[8]: https://bugzilla.mozilla.org/show_bug.cgi?id=1466627#c18
Assignee: nobody → lguo
Status: NEW → ASSIGNED
Lisa did the heavy lifting here, hence assigning this to her. I'm merely documenting to keep things up to date. I pushed on her behalf, as she doesn't have L3 access. Depending on her timeframe after her presentation next week and EOI (end of internship), she might sprint again to finish the bouncerscript side documented above. If not, she'll hand off to me and I'll add it to my backlog.
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #6)
> Also we'd need to add scopes to handle this new behavior. 
> c) That means adding "project:releng:bouncer:action:locations" to the
> prod/staging clients. More details here[8].
> [8]: https://bugzilla.mozilla.org/show_bug.cgi?id=1466627#c18

Fixed; it's now done. The maple decision task was broken because of the missing scopes.
Note to self: there's an issue with the in-tree task, as it runs on-push instead of just in the nightly graphs. See this[1]. It can be ignored for now, since it fails anyway until we add support in bouncerscript, but it should be generated only in the nightly graphs, not on-push as part of CI. Or maybe that's expected, since CI = nightly on maple? I need to double-check in #mozbuild.

[1]: https://tools.taskcluster.net/groups/bVnVr09gR2iWy79E-TfuMQ/tasks/cCAo1SztQrG2k38WTqCwAA/details
Maple runs "nightly" graphs on push, as does beta.
We may want to test on oak, if the oak owners are ok with it, since oak is configured more like m-c.
I don't think maple and beta run the "nightly" graphs. They run graphs with target task `mozilla_beta_tasks`, whereas nightly runs with `nightly_desktop` and `nightly_fennec`.

The `mozilla_beta_tasks` graph does include nightly builds, but doesn't include L10n or any of the release process. It also includes debug builds. The `nightly_*` graphs include nightly builds and the release process for them.
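The distinction matters because a task only runs if the target-tasks method selects it. A simplified, hypothetical Python sketch of such a filter (it does not reproduce taskgraph's real target task methods; the task labels and the choice of `nightly_desktop` as the only allowed method are assumptions):

```python
from typing import Dict, List

# Assumption: only the desktop nightly graph should update Firefox bouncer products.
BOUNCER_LOCATIONS_METHODS = {"nightly_desktop"}


def select_tasks(tasks: List[Dict[str, str]], target_tasks_method: str) -> List[str]:
    """Return the labels to schedule, dropping bouncer-locations outside nightly graphs."""
    selected = []
    for task in tasks:
        if task["kind"] == "bouncer-locations" and target_tasks_method not in BOUNCER_LOCATIONS_METHODS:
            continue  # e.g. mozilla_beta_tasks: nightly builds, but no release process
        selected.append(task["label"])
    return selected


# Hypothetical labels.
tasks = [
    {"kind": "build", "label": "build-linux64-nightly"},
    {"kind": "bouncer-locations", "label": "bouncer-locations-firefox"},
]
print(select_tasks(tasks, "mozilla_beta_tasks"))  # ['build-linux64-nightly']
print(select_tasks(tasks, "nightly_desktop"))  # ['build-linux64-nightly', 'bouncer-locations-firefox']
```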
13:44:52 <aki> we probably want to make sure (via taskgraph-diff or the like?) that this change doesn't break something in the beta/release graphs
13:45:05 <aki> but oak should help test the nightly graph case
13:50:08 <tomprince> I don't think maple runs a nightly graph on-push.
13:50:28 <tomprince> It does run nightly builds, but that is from the `mozilla_beta_tasks` graph.
13:50:58 <aki> right
13:51:28 <aki> but it does run is_nightly tasks, which is probably where the issue comes from
13:51:34 <aki> combined with not parsing mozilla_beta_tasks
13:51:46 <mtabara> isn't that the same? as in a nightly graph trimmed to  just build + signed (with dep) but absent everything afterwards (beetmover, balrog, etc)
13:52:22 <aki> it's the same, but "nightly graph" may mean "full nightly release graph including shipping to the nightly channel", so tom's description is more accurate
13:53:28 <mtabara> ok, makes sense
13:55:11 <aki> however, it doesn't change things -- you still want to test on oak
14:03:01 <mtabara> if I get approval to land there :) 
14:03:11 <mtabara> otherwise, we'll have to set our own project branch I suppose
14:04:58 <aki> we can also switch maple or birch or something if other people aren't actively testing
14:05:37 <mtabara> I think changing maple is not a good idea, I've seen people testing there various things in the past month and we still have upcoming stuff
14:05:54 <mtabara> but I'd have no mercy for birch :) 
14:06:09 <aki> taskgraph/decision.py target_tasks_method i think
14:10:43 <tomprince> I wonder if the nightly graph would run on try? Probably.
14:11:05 <aki> i think so. bouncerscript no
14:11:12 <aki> not til try-staging phase 2
14:11:16 <tomprince> Although, bouncer doesn't currently work on try, since only the TB-dev workers have staging-only credentials.
14:11:17 <aki> unless we relax some rules
14:11:29 <aki> right, so testing there would be either useless or introduce hairier yaks
14:11:52 <tomprince> It wouldn't be too hard to create some new credentials, and switch bouncer-dev over to that.
14:12:27 <aki> calendar has you on pto :)
14:13:12 <tomprince> Yeah. I'm working ~1h a day or so. Doing reviews and stuff.
14:14:05 <aki> might be worth a shot. i'll leave it to m.tabara and l.isa whether they prefer getting try-staging bouncer or project branch bouncer working
14:15:13 <mtabara> mind if I paste these irc history in the bug or is it sec-sensitive and I should sum-it up differently? 
14:15:30 <aki> i'm good with it
14:15:40 <mtabara> just for context later on in the week, I like to "dump thoughts" for posterity.
14:17:54 <aki> we'd likely need to remove this line or use a dev scope + dev pool if we want to use try https://github.com/mozilla-releng/scriptworker/blob/master/scriptworker/constants.py#L228
Note to self: taskgraph-diff this before landing.
I'm piggy-backing on Lisa's great work[1]. More info in the PR.

[1]: https://github.com/mozilla-releng/bouncerscript/pull/32
Attachment #8995484 - Attachment description: PR → [bouncerscript] automate nightly latest bouncer products
Attachment #8995484 - Flags: review?(jlorenzo)
In-tree patches on maple look like this:
* https://hg.mozilla.org/projects/maple/rev/48ea9d81ed77
* https://hg.mozilla.org/projects/maple/rev/53684ea92a91

Once this validates well enough on staging releases, I'll put them up for r?
For the moment, dropping this here for reference. Will update once it's fully ready and taskgraph-diff confirmed.
Attachment #8995484 - Flags: review?(jlorenzo) → review+
Attachment #9001261 - Flags: review?(jlorenzo) → review+
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #18)
> In'tree patches on maple look like this:
> * https://hg.mozilla.org/projects/maple/rev/48ea9d81ed77
> * https://hg.mozilla.org/projects/maple/rev/53684ea92a91
> 
> Once this validates on staging releases good enough, I'll add them up for r?

+ https://hg.mozilla.org/projects/maple/rev/b1d7d3b9bff6
Attachment #9002758 - Flags: review?(jlorenzo)
Throwing this at review once again, as I've added:
* mozilla-version as dep of bouncerscript
* bouncerscript version bump
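mozilla-version is a library for parsing Mozilla version strings. As a stdlib-only illustration of why structured parsing matters (deliberately not mozilla-version's API; the class and function names are hypothetical), here is a Python parser for nightly versions of the form `X.0a1`:

```python
import re
from typing import NamedTuple


class NightlyVersion(NamedTuple):
    """Hypothetical stand-in for a parsed Gecko nightly version (not mozilla-version's API)."""
    major: int
    minor: int


_NIGHTLY_RE = re.compile(r"^(\d+)\.(\d+)a1$")


def parse_nightly(version: str) -> NightlyVersion:
    match = _NIGHTLY_RE.match(version)
    if match is None:
        raise ValueError(f"not a nightly version: {version!r}")
    return NightlyVersion(int(match.group(1)), int(match.group(2)))


# Tuple comparison gives numeric ordering, unlike comparing the raw strings.
print(parse_nightly("64.0a1") > parse_nightly("63.0a1"))  # True
print(parse_nightly("100.0a1") > parse_nightly("99.0a1"))  # True, though "100.0a1" < "99.0a1" as strings
```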
Attachment #9001261 - Attachment is obsolete: true
Attachment #9002767 - Flags: review?(jlorenzo)
@jlorenzo:

While discussing with you today, I realized we have a problem with deploying this. As the in-tree patches currently stand, the bouncer locations job is among the first to run (since it has no dependencies in the nightly graph). That means that for a short window (1-2h), assuming the job goes green (the first run after we bump the in-tree version), the bouncer entries will point to the new version while the corresponding files aren't in S3 yet.

That means that for 1-2h, Mozilla's official website will point to nonexistent URLs, returning 404s.

The official download page[1] points to:
* `firefox-nightly-latest-l10n-ssl` (e.g. this[2]) for localized installations
* `firefox-nightly-latest-ssl` for en-US, e.g. this[3]

Likewise, the all-products page[4] points to:
* `firefox-nightly-latest-l10n-ssl` for localized Firefox e.g. this[5]
* `firefox-nightly-latest-ssl` for en-US e.g. this[6]

All of these will return 404 until beetmover has uploaded the files to the S3 buckets.

That means we'd need to chain all the beetmover-repackage jobs to a post-beetmover-dummy task, like we do in the promotion phases of relpro.

Question is: I'm not sure if it's doable in-tree. What do you think: can we reuse the existing post-beetmover-dummy task, or should we define a separate one solely for nightly?
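The fix under discussion is an ordering constraint: bouncer locations should only run once every beetmover-repackage job has uploaded to S3. A Python sketch of that fan-in using the stdlib `graphlib.TopologicalSorter` (Python 3.9+); the task labels are hypothetical and the real graph has many more beetmover-repackage jobs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical labels; the real nightly graph has many more beetmover-repackage jobs.
beetmover_jobs = [f"beetmover-repackage-{p}" for p in ("linux64", "macosx64", "win64")]

graph = {
    # The dummy task fans in every beetmover-repackage job...
    "post-beetmover-dummy": set(beetmover_jobs),
    # ...and bouncer locations waits only on the dummy task.
    "bouncer-locations": {"post-beetmover-dummy"},
}

order = list(TopologicalSorter(graph).static_order())
print(order[-1])  # bouncer-locations
```

Depending on a single dummy task keeps the bouncer task's dependency list short, instead of listing every beetmover job directly.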

[1]: https://www.mozilla.org/en-GB/firefox/channel/desktop/
[2]: https://download.mozilla.org/?product=firefox-nightly-latest-l10n-ssl&os=osx&lang=en-GB
[3]: https://download.mozilla.org/?product=firefox-nightly-latest-l10n-ssl&os=osx&lang=en-GB
[4]: https://www.mozilla.org/en-GB/firefox/nightly/all/
[5]: https://download.mozilla.org/?product=firefox-nightly-latest-l10n-ssl&os=linux&lang=zh-CN
[6]: https://download.mozilla.org/?product=firefox-nightly-latest-ssl&os=osx&lang=en-US
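The links above are download.mozilla.org URLs parameterized by `product`, `os` and `lang`. A small Python helper that builds them with the stdlib, for example to list the URLs worth spot-checking for 404s after a bump (the function name is hypothetical):

```python
from urllib.parse import urlencode


def bouncer_url(product: str, os_name: str, lang: str) -> str:
    """Build a download.mozilla.org URL for a product/os/lang triple."""
    query = urlencode({"product": product, "os": os_name, "lang": lang})
    return f"https://download.mozilla.org/?{query}"


print(bouncer_url("firefox-nightly-latest-ssl", "osx", "en-US"))
# https://download.mozilla.org/?product=firefox-nightly-latest-ssl&os=osx&lang=en-US
```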
Flags: needinfo?(jlorenzo)
Instructions to land this while I'm on PTO, if the situation calls for it:
* merge the bouncerscript PR
* bump the version to 4.0.0 and put it under python packages-3x
* land the puppet patch
* land in-tree patches (ideally before the mergeduty part 2 that happens on the 4th of September)
* once we've confirmed all is good, remove the releasewarrior part.
Attachment #9002769 - Flags: review?(jlorenzo) → review+
Attachment #9002767 - Flags: review?(jlorenzo) → review+
Comment on attachment 9002758 [details] [diff] [review]
[in-tree] add nightly bouncer products task in the nightly graph

Review of attachment 9002758 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, modulo the tiny nit!

::: taskcluster/ci/bouncer-locations/kind.yml
@@ +35,5 @@
> +
> +jobs:
> +   firefox:
> +      bouncer-products:
> +         by-project:

Nit: by-project is not needed as we provide the same dataset in both cases.
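For context on the nit: `by-project` selects a value based on the project being built, so when every branch carries the same data the indirection adds nothing. A simplified, hypothetical Python resolver (not taskgraph's own keyed-by resolution) showing that both forms resolve identically:

```python
from typing import Any, Dict


def resolve_by_project(value: Any, project: str) -> Any:
    """Resolve {'by-project': {...}} for `project`, falling back to 'default'."""
    if isinstance(value, dict) and list(value) == ["by-project"]:
        options: Dict[str, Any] = value["by-project"]
        return options.get(project, options.get("default"))
    return value  # plain values pass through unchanged


products = ["firefox-nightly-latest", "firefox-nightly-latest-ssl"]
keyed = {"by-project": {"mozilla-central": products, "default": products}}

# Same dataset in every branch, so the keyed form resolves to the plain list everywhere.
for project in ("mozilla-central", "maple"):
    print(resolve_by_project(keyed, project) == resolve_by_project(products, project))  # True
```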
Attachment #9002758 - Flags: review?(jlorenzo) → review+
(In reply to Mihai Tabara [:mtabara]⌚️GMT - on PTO - back on 3rd Sept from comment #26)
> Questions is: Not sure if it's doable in-tree, what do you think, can we
> reuse the existing post-beetmover-dummy one or should we define a separate
> one solely for nightly? 

post-beetmover-dummy looks like the perfect dependency to me. I'm not sure if there's something specific to the nightly graph, but it's worth a try, I think!
Flags: needinfo?(jlorenzo)
Given comment 26 and the risk of 404s for locales on the download.mozilla.org page, we decided not to land this just yet.

Plan is:
* land everything that isn't in-tree
* wait until we've bumped central and the nightlies are available
* address Johan's comment 28
* land central patch and see what’s going on with the bouncer entries
* check the no-op afterwards in a follow-up run or wait for next set of nightlies
* fix it properly during the next 6 weeks by chaining the post-beetmover-dummy task
Comment on attachment 9002767 [details] [diff] [review]
[puppet] bump version, add mozilla-version, add action in script_config.json

Throwing this again at review, as this time it's a proper PR, not a splinter diff :)

Once this is landed, changes from bug 1486747 will also be taken into account.
Attachment #9002767 - Flags: review+ → review?(jlorenzo)
Leftover(s): 
* land the puppet patch tomorrow (Tuesday)
* wait for mergeduty to be completed (Tuesday)
* wait for first set of 64 green nightlies (Tuesday/Wednesday)
* land in-tree changes to mozilla-central (Wednesday)
* confirm bump was successful (Wednesday)
* address in the next 6 weeks the real problem - which is chaining the task to a post-beetmover-dummy task in the nightly graph
Attachment #9002767 - Flags: review?(jlorenzo) → review+
Comment on attachment 9002767 [details] [diff] [review]
[puppet] bump version, add mozilla-version, add action in script_config.json

https://github.com/mozilla-releng/build-puppet/commit/ad809a14416e8706b194596045a9b38dd52253dd
Attachment #9002767 - Flags: checked-in+
Comment on attachment 9006461 [details] [review]
[releasewarrior] more bouncer cleanup in mergeduty docs

r+'ed by Johan in the PR.
https://github.com/mozilla-releng/releasewarrior-2.0/commit/7c8d5cb02eeaab77f04603e51fcf7b6d9322c905
Attachment #9006461 - Flags: review?(jlorenzo)
Attachment #9006461 - Flags: review+
Attachment #9006461 - Flags: checked-in+
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/b82e60a43397
add automation for nightly latest bouncer products. r=jlorenzo a=Aryx
Comment on attachment 9002758 [details] [diff] [review]
[in-tree] add nightly bouncer products task in the nightly graph

https://hg.mozilla.org/mozilla-central/rev/b82e60a43397c740d9f3f32837930cd200a3921a
Attachment #9002758 - Attachment description: [in-tree] add release-mark-as-started kind → [in-tree] add nightly bouncer products task in the nightly graph
Attachment #9002758 - Flags: checked-in+
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #34)
> Leftover(s): 
> * land the puppet patch tomorrow (Tuesday)
> * wait for mergeduty to be completed (Tuesday)
> * wait for first set of 64 green nightlies (Tuesday/Wednesday)
> * land in-tree changes to mozilla-central (Wednesday)
Done so far.

Leftovers ...
> * confirm bump was successful (Wednesday)
> * address in the next 6 weeks the real problem - which is chaining the task
> to a post-beetmover-dummy task in the nightly graph
Pushed by mtabara@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/26990836dc5c
follow-up linting fixes. r=me a=Aryx
Note to self:
* I need to plan these landings better, so as not to land directly on central unless I really have to
* make sure to run linters before landing, either via try or locally via `mach lint`
* make sure the scopes are fine (maple vs central, relpro vs nightlies, hooks / roles)
Thanks to :jlorenzo who unblocked me, I've added:
* project:releng:bouncer:action:locations
* project:releng:bouncer:server:production

under this[1]

[1]: https://tools.taskcluster.net/auth/scopes/project%3Areleng%3Abeetmover%3Abucket%3Anightly/role:project%3Areleng%3Anightly%3Alevel-3%3A*

Hopefully that's going to be reflected under the following as well:
* for cron job hook https://tools.taskcluster.net/hooks/project-releng/cron-task-mozilla-central 
> assume:repo:hg.mozilla.org/mozilla-central:cron:* (Do not edit!)
* for nightly desktop hook https://tools.taskcluster.net/hooks/project-releng/nightly-desktop%2Fmozilla-central
> assume:repo:hg.mozilla.org/mozilla-central:* (Do not edit!)
Scheduling worked for both cron/hooks.
However, we're now hitting another issue, a CoT error, which makes sense.

"scriptworker.exceptions.CoTError: 'bouncer Hbag2iSCRoG2hqNBfld4SQ: repo /mozilla-central not allowlisted for scope project:releng:bouncer:server:production!'"

The full log is here[1]. It's somewhat expected, as pushing to bouncer prod is reserved solely for production release branches, and mozilla-central is not one of them. We need to patch scriptworker to allow it, most likely as a separate exception rather than by making central a production branch. I'll prep a PR for that.
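The check that failed can be pictured as a mapping from restricted scopes to the repos allowed to use them. A simplified, hypothetical Python sketch (the mapping, repo paths and function are illustrative, not scriptworker's actual constants or code), including one way to picture the proposed fix: allowlisting mozilla-central for this one scope only.

```python
from typing import Dict, FrozenSet


class CoTError(Exception):
    """Local stand-in for scriptworker's CoTError."""


SCOPE = "project:releng:bouncer:server:production"
# Illustrative only: restricted scope -> repo paths allowed to use it.
PRODUCTION_RELEASE_REPOS = frozenset({"/releases/mozilla-beta", "/releases/mozilla-release"})
RESTRICTED_SCOPES: Dict[str, FrozenSet[str]] = {SCOPE: PRODUCTION_RELEASE_REPOS}


def check_scope_allowed(restricted: Dict[str, FrozenSet[str]], task_id: str, repo_path: str, scope: str) -> None:
    """Raise CoTError if `scope` is restricted and `repo_path` is not allowlisted for it."""
    allowed = restricted.get(scope)
    if allowed is not None and repo_path not in allowed:
        raise CoTError(f"bouncer {task_id}: repo {repo_path} not allowlisted for scope {scope}!")


# Allowlist mozilla-central for this one scope, rather than making it a production branch.
PATCHED_SCOPES = dict(RESTRICTED_SCOPES)
PATCHED_SCOPES[SCOPE] = PRODUCTION_RELEASE_REPOS | {"/mozilla-central"}

try:
    check_scope_allowed(RESTRICTED_SCOPES, "Hbag2iSCRoG2hqNBfld4SQ", "/mozilla-central", SCOPE)
except CoTError as err:
    print(err)
check_scope_allowed(PATCHED_SCOPES, "Hbag2iSCRoG2hqNBfld4SQ", "/mozilla-central", SCOPE)  # passes
```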

Given that I'm out tomorrow and the download.mozilla.org page is still offering 63, I think it's time to go with plan B and fall back to the usual manual bump.

Action items for RelEng today, Wednesday 5 sept:
* manually bump the versions as we've done so far, using this[3]
* ignore BncLoc failing with CoT exceptions for now

Action items for the following days:
* move the task to a different Treeherder group; it's confusing to see it under "Firefox Release Tasks"
* potentially promote the task from Tier-2 to Tier-1 once this is fixed and working
* as said above, fix the scopes in scriptworker, as something is not right there[2]
* make sure the ci-admin scopes are correct. They need to be applied manually by someone with the appropriate scopes, by running a command from ci-admin (the ones from the previous comment)
* fix the proper dependency in post-beetmover-dummy task to chain it at the end of the graph

[1]: https://taskcluster-artifacts.net/Hbag2iSCRoG2hqNBfld4SQ/1/public/logs/chain_of_trust.log
[2]: https://github.com/mozilla-releng/scriptworker/blob/master/scriptworker/constants.py#L227
[3]: https://github.com/mozilla-releng/releasewarrior-2.0/pull/173/files#diff-e235d9d2d06299a092b678b115f40079L304
Tom fixed scriptworker[1] whilst I was gone and the follow-up reruns worked smoothly, yay!
This[2] was the first task to run successfully after the merge was done and the patches landed on central. The following bouncer locations jobs[3][4], part of subsequent nightly graphs, are no-op-ing correctly. Mozilla's download page[5] correctly offers 64.0a1!

All bumped products look good!

[1]: https://github.com/mozilla-releng/scriptworker/pull/251
[2]: https://taskcluster-artifacts.net/PkGoIDiUREaDLsF88SwmOg/1/public/logs/live_backing.log
[3]: https://taskcluster-artifacts.net/NdNqFwdBQEKk4a-o-54mXQ/0/public/logs/live_backing.log
[4]: https://taskcluster-artifacts.net/YbT4ztzTQ8-7BEGnudAU8w/0/public/logs/live_backing.log
[5]: https://download.mozilla.org/?product=firefox-nightly-latest-ssl&os=osx&lang=en-US
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #45)
> Action items for RelEng today, Wednesday 5 sept:
> * manually bump the versions as we used to so far by using this[3]
> * ignore BncLoc failing with CoT exceptions for now

These were handled by automation after all, see the previous comment!
 
> Action items for the following days:
> * change the TH to fit it under somewhere else, it's confusing to see the
> task under "Firefox Release Tasks"
> * potentially change the Tier-2 to Tier-1 once this is fixed and working
> * as said above, fix the scopes in scriptworker - something is not right -
> * fix the proper dependency in post-beetmover-dummy task to chain it at the
> end of the graph

Still TODO. 

This worked successfully and is now part of the nightly graph. Given that the items above are improvements, I filed bug 1489405 to track them, to be addressed during the cycle in which 63 goes to release.

Thanks again @lisa for providing the groundwork for fixing this!
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Duplicate of this bug: 1083328