Closed Bug 1398796 Opened 7 years ago Closed 6 years ago

Do uptake monitoring in TC

Categories

(Release Engineering :: Release Automation: Other, enhancement, P2)


Tracking

(firefox60 fixed)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: rail)

References

Details

(Keywords: leave-open)

Attachments

(3 files, 4 obsolete files)

Alternatively, we could kill uptake monitoring. I'm not sure what it buys us right now.
Depends on: 1413179
Assignee: nobody → rail
No longer blocks: 1433459
Attached patch uptake.diff (obsolete) — Splinter Review
I ended up using a mozharness script + futures. I tested it as a standalone script, and it worked as expected.

Hopefully the scheduling part will work too!
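
For context, the general shape of a "mozharness script + futures" URL check, as a minimal Python 3 sketch (illustrative only; the URLs and product names are made up, and this is not the actual patch):

```
# Check a batch of bouncer URLs in parallel and report any failures.
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

def check_url(url):
    # bouncer redirects to the real file; urlopen follows the redirect,
    # so a clean return means the product/os/lang combination resolves.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status

urls = [
    "https://download.mozilla.org/?product=firefox-59.0b9&os=linux64&lang=en-US",
    "https://download.mozilla.org/?product=firefox-59.0b9&os=win64&lang=en-US",
]
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(check_url, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            print("OK", future.result(), url)
        except Exception as exc:
            print("FAIL", url, exc)
```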
Attachment #8949531 - Flags: feedback?(mtabara)
Comment on attachment 8949531 [details] [diff] [review]
uptake.diff

Review of attachment 8949531 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great to me, nice to see it off of BB. 
However, I'm still somewhat concerned about the name.

Are we to finally kill the notion of "uptake monitoring" and instead continue with "bouncer check"? 
If so, I think there are pros and cons to this:

Pros:
* there are three operations related to bouncer: bouncer submission, bouncer aliases and bouncer check. I think it's nice to unify these and kill the good old "uptake" notion, which is somewhat out of context.

Cons:
* bouncer submission and bouncer alias are to be operated by bouncer-scriptworker while `bouncer check` remains on docker-worker. Might be confusing at first glance for anyone who needs to debug this.

What do you think?

::: taskcluster/ci/release-bouncer-check/kind.yml
@@ +94,5 @@
> +                    maple:
> +                        - releases/dev_bouncer_firefox_beta.py
> +                    default:
> +                        - releases/bouncer_firefox_dev.py
> +        treeherder:

I think we can squeeze all the `treeherder` properties into the job-defaults to reduce boilerplate and keep only the differences within the jobs themselves:

For firefox:

treeherder:
     platform: linux64/opt 

For Fennec
devedition:
     platform: linux64-devedition/opt

::: testing/mozharness/scripts/release/bouncer_check.py
@@ +33,5 @@
> +            "dest": "prev_versions",
> +            "action": "extend",
> +            "help": "Previous version(s)",
> +        }],
> +        [["--locale"], {

Neat idea!

@@ +36,5 @@
> +        }],
> +        [["--locale"], {
> +            "dest": "locales",
> +            # Intentionally limited for several reasons:
> +            # 1) faster to heck

s,heck,check

@@ +99,5 @@
> +            product_name = product["product-name"] % {"version": self.config["version"]}
> +            for path in product["paths"].values():
> +                bouncer_platform = path["bouncer-platform"]
> +                for locale in self.config["locales"]:
> +                    url = "{bouncer_prefix}?product={product}&os={os}&lang={lang}".format(

Optional nit: there's close to zero chance we'll see this change over the following years, but for the sake of it, I'd rather see it outside of this function: a templated string, either as a class variable or even in the configs, and then interpolated here with `product`, `os` and `lang`.
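
Something like this hypothetical sketch (all names illustrative, not the actual patch):

```
# Hoist the templated string out of the function; a class variable or a
# config entry would work equally well.
BOUNCER_URL_TEMPLATE = "{bouncer_prefix}?product={product}&os={os}&lang={lang}"

def build_url(config, product_name, bouncer_platform, locale):
    return BOUNCER_URL_TEMPLATE.format(
        bouncer_prefix=config["bouncer_prefix"],
        product=product_name,
        os=bouncer_platform,
        lang=locale,
    )

print(build_url({"bouncer_prefix": "https://download.mozilla.org/"},
                "Firefox-59.0b9", "linux64", "en-US"))
```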
Attachment #8949531 - Flags: feedback?(mtabara) → feedback+
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
> For Fennec
> devedition:
>      platform: linux64-devedition/opt
> 

Obviously meant s/Fennec/Devedition here ^_^
Thank you for the quick review!

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
> However, I'm still somewhat concerned about the name.
> 
> Are we to finally kill the notion of "uptake monitoring" and instead
> continue with "bouncer check"? 

This was intentional. "Uptake" assumes that you have some uptake values to watch. When we used the community mirror network, every mirror had a weight and we waited until enough mirrors had the files synced, so the uptake value was high enough.

The current approach just tries to check a single server, so I thought that in 5 years from now the "uptake" term would be confusing.

I'll address your comments and test it on maple. Thank you!
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #4)
> Thank you for the quick review!
> 
> (In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
> > However, I'm still somewhat concerned about the name.
> > 
> > Are we to finally kill the notion of "uptake monitoring" and instead
> > continue with "bouncer check"? 
> 
> This was intentional. "Uptake" assumes that you have some uptake values to
> watch. When we used the community mirror network, every mirror had a weight
> and we waited until enough mirrors had the files synced, so the uptake value
> was high enough.
> 
> The current approach just tries to check a single server, so I thought that
> in 5 years from now the "uptake" term would be confusing.
> 
> I'll address your comments and test it on maple. Thank you!

Sounds good to me, thought so too.
Awesome to see BBB-- :)
Attached patch uptake.diff (obsolete) — Splinter Review
Addressed the comments and pushed to maple: https://hg.mozilla.org/projects/maple/rev/e79dacf0182225c64a8c70ade840924f5bbc815c
Attachment #8949531 - Attachment is obsolete: true
Now I need to address the PEP8/yaml errors :/
Attached patch uptake-maple.diff (obsolete) — Splinter Review
With linter fixes
Attachment #8949779 - Attachment is obsolete: true
Comment on attachment 8949531 [details] [diff] [review]
uptake.diff

Review of attachment 8949531 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/ci/release-bouncer-check/kind.yml
@@ +30,5 @@
> +    index:
> +        type: release
> +    routes:
> +       - index.releases.v1.{branch}.latest.{product}.latest.bouncer_check
> +       - index.releases.v1.{branch}.{revision}.{product}.{underscore_version}.build{build_number}.bouncer_check

At a second glance, what is `underscore_version` here?
underscore_version = version.replace(".", "_") - indexes use dots to separate namespaces.
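
For illustration, a quick sketch of that interpolation (values made up):

```
# Dots separate index namespaces, so the version gets its dots rewritten
# before being substituted into the route.
version = "59.0b9"
underscore_version = version.replace(".", "_")  # "59_0b9"

route = ("index.releases.v1.{branch}.{revision}.{product}"
         ".{underscore_version}.build{build_number}.bouncer_check").format(
    branch="maple", revision="468fd0e4e4a9", product="firefox",
    underscore_version=underscore_version, build_number=1)
print(route)
# index.releases.v1.maple.468fd0e4e4a9.firefox.59_0b9.build1.bouncer_check
```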
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #10)
> underscore_version = version.replace(".", "_") - indexes use dots to
> separate namespaces.

Ah, right. Makes sense, mahalo!
Attached patch uptake.diff (obsolete) — Splinter Review
Another take on this, using run.using.run-task to avoid unnecessary hacks.
Attachment #8949784 - Attachment is obsolete: true
See Also: → 1437565
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

Actually, it's not ready yet; the partials didn't work for some reason.
Attachment #8950745 - Flags: review?(mtabara)
Attachment #8950745 - Flags: review?(aki)
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #16)
> https://hg.mozilla.org/projects/maple/rev/468fd0e4e4a935fa8dd555832890c3dcbde6239b#l1.12 should fix the partials issue.

Pushed your latest staging release to test the patch - 59.0b10-build1
The job is here: https://tools.taskcluster.net/groups/Qn-rtcT6Sm6yqqKKmPlLKw/tasks/A8shEWQFROCbr5puh_Dgbg/

It's still failing, not sure if it's because of missing interpolation there or just the bouncer products could not be found.
Logs are here: https://public-artifacts.taskcluster.net/A8shEWQFROCbr5puh_Dgbg/0/public/logs/live_backing.log
Those failures are expected, because https://bouncer-bouncer-releng.stage.mozaws.net/ is 404 everywhere.
You can submit the same task using the prod config and it should work fine. I actually tested it like that yesterday and it was green. The only outstanding issue ATM is the missing partials.
I think this one is ready to go. I ran a staging release; the group is https://tools.taskcluster.net/groups/K2tOotpWStaL6CU2SBqMCw, with a failed check, which is expected because the staging bouncer server is not working properly.
I hope my changes to scriptworker.py and release_promotion.py are valid; that's why I added two people to the review request.
To get a green task, I edited the existing task:
1) use the prod config
2) set the version to 59.0b9
3) set the corresponding partials for that release

https://tools.taskcluster.net/groups/O2cW5fINQVmcNSLv5Q0f4Q/tasks/O2cW5fINQVmcNSLv5Q0f4Q/details
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

https://reviewboard.mozilla.org/r/219988/#review226184

\o/

::: taskcluster/ci/release-bouncer-check/kind.yml:24
(Diff revision 3)
> +    run-on-projects: []  # to make sure this never runs as part of CI
> +    shipping-phase: push
> +    worker-type: aws-provisioner-v1/gecko-{level}-b-linux
> +    worker:
> +        max-run-time: 1200
> +        docker-image: {in-tree: "lint"}

Is this right?

::: taskcluster/ci/release-bouncer-check/kind.yml:28
(Diff revision 3)
> +        max-run-time: 1200
> +        docker-image: {in-tree: "lint"}
> +    run:
> +        using: run-task
> +        sparse-profile: mozharness
> +    # 'release' index type interpolates the routes placeholders

If you rebase on top of Nick's tc-notify patches, we probably don't need the index, routes, or the notifications blocks, making this a lot cleaner.

I'm ok with that as a followup if you prefer to land this sooner.

::: taskcluster/taskgraph/transforms/bouncer_check.py:26
(Diff revision 3)
> +def add_command(config, jobs):
> +    release_config = get_release_config(config)
> +    version = release_config["version"]
> +    for job in jobs:
> +        job = copy.deepcopy(job)  # don't overwrite dict values here
> +        command = [

I have a slight preference for specifying the command in the kind yaml, and then formatting the strings inside with a replacement dict, e.g.

kind.yml:

```
command: ['foo', 'bar', '--version={version}']
```

transforms:
```
    if isinstance(command, list):
        for index, substr in enumerate(command):
            command[index] = substr.format(version=version)
```

I don't think there's a huge difference though.
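
Fleshed out a bit, the transform side of that suggestion might look like this (hypothetical sketch, mirroring the transform style in the patch):

```
# Format each command element with a replacement dict, without mutating
# the input job.
import copy

def render_command(jobs, version):
    for job in jobs:
        job = copy.deepcopy(job)  # don't overwrite dict values here
        command = job["run"]["command"]
        if isinstance(command, list):
            for index, substr in enumerate(command):
                command[index] = substr.format(version=version)
        yield job
```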

::: taskcluster/taskgraph/transforms/bouncer_check.py:77
(Diff revision 3)
> +@transforms.add
> +def command_to_string(config, jobs):
> +    """Convert command to string to make it work properly with run-task"""
> +    for job in jobs:
> +        job = copy.deepcopy(job)  # don't overwrite dict values here
> +        job["run"]["command"] = " ".join(job["run"]["command"])

`subprocess.list2cmdline(command)` will do this with some intelligent quoting etc. It'll be a little less fragile.
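
For example (the flag name is made up):

```
# Arguments containing spaces get quoted instead of silently splitting.
import subprocess

command = ["python", "bouncer_check.py", "--product-name", "Firefox 59.0b9"]
print(" ".join(command))
# python bouncer_check.py --product-name Firefox 59.0b9    <- ambiguous
print(subprocess.list2cmdline(command))
# python bouncer_check.py --product-name "Firefox 59.0b9"  <- one argument
```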
Attachment #8950745 - Flags: review?(aki) → review+
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

https://reviewboard.mozilla.org/r/219988/#review226232

I have an overall question: I saw you killing the TC task but not the mozharness script itself.
Should we kill that too, plus the corresponding configs, and let it ride the trains?

::: build/sparse-profiles/mozharness:1
(Diff revision 3)
> +%include build/sparse-profiles/mach

Not sure I understand what this is but seems to work, so wfm!

::: testing/mozharness/configs/releases/bouncer_fennec.py:7
(Diff revision 3)
>  config = {
> +    "bouncer_prefix": "https://download.mozilla.org/",
>      "products": {
>          "apk": {
>              "product-name": "Fennec-%(version)s",
>              "check_uptake": True,

Should we kill the `check_uptake` configs and let this ride the trains?
Attachment #8950745 - Flags: review?(mtabara) → review+
Thank you for the review. Some comments below.

(In reply to Aki Sasaki [:aki] from comment #23)
> > +        docker-image: {in-tree: "lint"}
> 
> Is this right?

I picked one of the existing docker images. The only thing that I need is python, so I can run mozharness. I have no strong opinion here.

> If you rebase on top of Nick's tc-notify patches, we probably don't need the
> index, routes, or the notifications blocks, making this a lot cleaner.

Oh, nice. I definitely can use those. Thanks for the tip.
 
> I'm ok with that as a followup if you prefer to land this sooner.

I'll probably land this in one chunk. There is no rush with this.
 
> I have a slight preference for specifying the command in the kind yaml, and
> then formatting the strings inside with a replacement dict, e.g.
> I don't think there's a huge difference though.

That was my initial idea too. There are 2 cons:

1) Multiple --previous-version parameters. I don't know any good way to represent this in yaml. As a workaround I could use --previous-versions=x,y,z and split in the script (see the sketch after this list).

2) The list of partials is not known until we start a release, or is not applicable for Fennec (not implemented yet). So in this case the approach in 1) would fail.
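
Here's the comma-split workaround from 1) as a sketch (option and dest names are illustrative):

```
# A single --previous-versions=x,y,z flag, split inside the script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--previous-versions", dest="prev_versions",
                    type=lambda v: v.split(","), default=[])
args = parser.parse_args(["--previous-versions=58.0,58.0.1,59.0b8"])
print(args.prev_versions)  # ['58.0', '58.0.1', '59.0b8']
```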

Another approach would be not specifying the previous version parameters in yaml, but appending them in the transform. In that case we end up with `command` specified in multiple places, which makes it harder to find and edit later.

These were my thoughts about `command` and this is how I ended up with the current solution.

I'm open to more ideas! :)

> `subprocess.list2cmdline(command)` will do this with some intelligent
> quoting etc. It'll be a little less fragile.

Oh, GTK!
Thanks!

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #24)
> I have an overall question: I saw you killing the TC task but not the
> mozharness script itself.
> Should we kill that too, plus the corresponding configs, and let it ride the trains?

Good point. The configs are shared (by bouncer submitter and aliases) and reused by the new script, so it's too early to kill them. I'll remove the old script, though.

> 
> ::: build/sparse-profiles/mozharness:1
> (Diff revision 3)
> > +%include build/sparse-profiles/mach
> 
> Not sure I understand what this is but seems to work, so wfm!

These are mercurial sparse profile definitions to limit the checkouts to particular directories instead of the whole tree.
 
> ::: testing/mozharness/configs/releases/bouncer_fennec.py:7
> (Diff revision 3)
> >  config = {
> > +    "bouncer_prefix": "https://download.mozilla.org/",
> >      "products": {
> >          "apk": {
> >              "product-name": "Fennec-%(version)s",
> >              "check_uptake": True,
> 
> Should we kill the `check_uptake` configs and let this ride the trains?

I probably should use this variable, so we skip some products.
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #26)
> > 
> > ::: build/sparse-profiles/mozharness:1
> > (Diff revision 3)
> > > +%include build/sparse-profiles/mach
> > 
> > Not sure I understand what this is but seems to work, so wfm!
> 
> These are mercurial sparse profile definitions to limit the checkouts to
> particular directories instead of the whole tree.

Oh neat, makes sense.
> I probably should use this variable, so we skip some products.

Apparently I do! :)
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

Hmm, rebasing didn't work as expected... Let me try again.
Attachment #8950745 - Flags: review+
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

Actually, the final diff is correct; it's just ReviewBoard messing up the interdiffs.

The actual interdiff is: https://gist.github.com/rail/01caca703ca8c1d11e99f92b88fc2490
Attachment #8950745 - Flags: review?(mtabara)
Comment on attachment 8950745 [details]
Bug 1398796 - Do uptake monitoring in TC

https://reviewboard.mozilla.org/r/219988/#review226732

Good job, nice to finally see uptake monitoring retired! \o/

Side question: until we kill BBB completely, we won't be able to kill the configs + builder from bbconfigs/bbcustom, right? We still need them for esr52 and there's no way to make those ride the trains. Just wanted to confirm.
Attachment #8950745 - Flags: review?(mtabara) → review+
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #32)
> Side question: until we kill BBB completely, we won't be able to kill the
> configs + builder from bbconfigs/bbcustom, right? We still need them for
> esr52 and there's no way to make those ride the trains. Just wanted to
> confirm.

Sounds about right.
Attachment #8950745 - Flags: checked-in+
Attachment #8950248 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/45b2ae093db8
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Two issues:

1) we have to wait for sentry to run, thus we had to rerun: https://tools.taskcluster.net/groups/DzR0kEOiRNy6lZQAOnkk3w/tasks/bTakoitaSCqVox3WtkHpsA/details
2) devedition config is probably wrong: https://tools.taskcluster.net/groups/Wn10U6j5Qf6QrlylAzzhxQ/tasks/LxnAOo3QQKKFWg7Jw4sgPA/details
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch configs.diffSplinter Review
Attachment #8953258 - Flags: review?(jlund)
Also, partials are not set, probably need to add push_firefox, push_devedition, and push_firefox_rc  to PARTIAL_UPDATES_FLAVORS.
Keywords: leave-open
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #39)
> Also, partials are not set, probably need to add push_firefox,
> push_devedition, and push_firefox_rc  to PARTIAL_UPDATES_FLAVORS.

I’m changing that to PARTIAL_UPDATES_PRODUCTS since we need it set for all fx and devedition flavors.
Attachment #8953258 - Flags: review?(jlund) → review?(mtabara)
Attachment #8953258 - Flags: review?(mtabara) → review+
(In reply to Aki Sasaki [:aki] from comment #40)
> (In reply to Rail Aliiev [:rail] ⌚️ET from comment #39)
> > Also, partials are not set, probably need to add push_firefox,
> > push_devedition, and push_firefox_rc  to PARTIAL_UPDATES_FLAVORS.
> 
> I’m changing that to PARTIAL_UPDATES_PRODUCTS since we need it set for all
> fx and devedition flavors.

Is there a bug to track this, or should I go ahead and fix the issue by updating PARTIAL_UPDATES_FLAVORS?
Flags: needinfo?(aki)
I fixed it in https://hg.mozilla.org/projects/maple/rev/a106c41a8cfa5e3ed5e69775f6ebec239be5cd2b for bug 1433488 but the patch had larger problems, so I backed it out. Feel free to take whatever portions of the patch work for you, or add PARTIAL_UPDATES_FLAVORS.
Flags: needinfo?(aki)
Attachment #8954400 - Flags: review?(aki)
Attachment #8954400 - Flags: review?(aki) → review+
Pushed by raliiev@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea664b3e9e6a
Add push phase to PARTIAL_UPDATES_FLAVORS r=aki
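
For reference, a hypothetical sketch of what such a change looks like (the flavor names are illustrative; the real list lives in the release promotion action code):

```
# Flavors whose action input must include partial-update versions.
PARTIAL_UPDATES_FLAVORS = [
    "promote_firefox", "promote_firefox_rc", "promote_devedition",
    # added: the push phase needs partials for the bouncer check too
    "push_firefox", "push_firefox_rc", "push_devedition",
]
```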
See Also: → 1443104
This worked like a charm in Devedition 60.0b1.
Let's declare victory and reopen if we need to.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
See Also: → 1444131