Closed Bug 1214487 Opened 9 years ago Closed 8 years ago

Implement uptake monitoring on slaves/workers

Categories

(Release Engineering :: Release Automation: Other, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: mtabara)

References

Details

Attachments

(9 files, 1 obsolete file)

The current implementation runs as part of the master process; see http://hg.mozilla.org/build/buildbotcustom/file/d93d0b0b1c3e/scheduler.py#l218

Even though the surrounding scheduler code (boilerplate!) is a bit complicated, the actual business logic is here: http://hg.mozilla.org/build/tools/file/19c4bb07e1a8/lib/python/util/tuxedo.py#l66

Basically it just checks multiple URLs (API endpoints behind auth) and calculates some minimum values. The API is XML-based.

We could use TC if the secret service is ready.
Assignee: nobody → tabara.mihai
Server side implementation: https://github.com/mozilla/tuxedo/blob/master/apps/api/views.py#L57

It accepts 3 params (https://github.com/mozilla/tuxedo/blob/master/apps/api/views.py#L57):

product: Firefox-$version
os: a platform (see http://hg.mozilla.org/build/tools/file/tip/lib/python/release/platforms.py#l7)
fuzzy: if set to true, will match Firefox-42.0 when you search for Firefox-4

Example request: https://bounceradmin.mozilla.com/api/uptake/?product=Firefox-36.0.4-Partial-35.0.1&os=win

example response:

<mirror_uptake>
  <item>
    <available>2000100</available>
    <product>Firefox-36.0.4-Partial-35.0.1</product>
    <total>2000100</total>
    <os>win</os>
  </item>
</mirror_uptake>

If you skip "os", the response will contain all "os"es:

https://bounceradmin.mozilla.com/api/uptake/?product=Firefox-36.0.4-Partial-35.0.1

<?xml version="1.0"?>
<mirror_uptake>
  <item>
    <available>2000100</available>
    <product>Firefox-36.0.4-Partial-35.0.1</product>
    <total>2000100</total>
    <os>linux</os>
  </item>
  <item>
    <available>2000100</available>
    <product>Firefox-36.0.4-Partial-35.0.1</product>
    <total>2000100</total>
    <os>win</os>
  </item>
  <item>
    <available>2000100</available>
    <product>Firefox-36.0.4-Partial-35.0.1</product>
    <total>2000100</total>
    <os>linux64</os>
  </item>
  <item>
    <available>2000100</available>
    <product>Firefox-36.0.4-Partial-35.0.1</product>
    <total>2000100</total>
    <os>osx</os>
  </item>
</mirror_uptake>
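For reference, here's a minimal sketch of consuming this endpoint: fetch one product/os pair and take the minimum "available" across the returned items. It assumes the `requests` library and HTTP basic auth; the helper name and credentials are illustrative, and the production logic lives in util/tuxedo.py linked in comment 0.

import requests
import xml.etree.ElementTree as ET

UPTAKE_URL = "https://bounceradmin.mozilla.com/api/uptake/"

def get_min_uptake(product, os_, auth):
    # One request per product/os pair; the response is the
    # <mirror_uptake> XML shown above.
    resp = requests.get(UPTAKE_URL,
                        params={"product": product, "os": os_},
                        auth=auth, timeout=60)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Take the minimum across items so we don't declare uptake OK
    # while any entry is still lagging.
    return min(int(item.findtext("available"))
               for item in root.findall("item"))

# e.g.:
# get_min_uptake("Firefox-36.0.4-Partial-35.0.1", "win",
#                ("tuxedoUsername", "tuxedoPassword"))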


We use the first method (explicit "os") and make requests for every os we care about. This way we can add the contrib platforms (Solaris, for example) without blocking ourselves on their uptake.

We add these products in http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/scripts/bouncer_submitter.py using configs from http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/configs/releases

We can probably reuse that code, especially the configs. We could mark some platforms as not participating in uptake checks, something like "check_uptake": True/False. That should also simplify generating the list of products to be checked without relying on buildbot-configs; see the sketch below.
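A rough sketch of what such a config could look like. The "check_uptake" key is the flag proposed above and doesn't exist yet; the product entries are illustrative stand-ins for the real bouncer_submitter configs (which use the same "product-name"/%(version)s templating):

config = {
    "products": {
        "installer": {
            "product-name": "Firefox-%(version)s",
            "check_uptake": True,
        },
        "contrib": {
            # hypothetical entry for a contrib platform product
            "product-name": "Firefox-%(version)s-Contrib",
            "check_uptake": False,  # don't block the release on this one
        },
    },
    "partials": {
        "releases-dir": {
            "product-name": "Firefox-%(version)s-Partial-%(prev_version)s",
            "check_uptake": True,
        },
    },
}

# Only products opted in end up in the list to poll:
products_to_check = [info["product-name"]
                     for info in config["products"].values()
                     if info.get("check_uptake")]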
Note that we can't use this from Taskcluster atm, due to bounceradmin (tuxedo) not having a netflow from the public internet.

We do have tuxedoUsername/Password on the masters (not the slaves); those can be passed down to mozharness if we set `use_credentials_file=True` in the buildbot factory for it.

See, for example, https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/bouncer/submitter.py#20 for how mozharness grabs that data for other uses.
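The pattern there boils down to exec'ing the credentials file the master drops and pulling the two tuxedo variables out of it. A hedged sketch (Python 2, as mozharness is; not the verbatim source):

def query_credentials(self):
    # The buildbot master writes a Python file defining
    # tuxedoUsername/tuxedoPassword; we execfile() it rather than parse it.
    global_dict = {}
    local_dict = {}
    execfile(self.config["credentials_file"], global_dict, local_dict)
    return (local_dict["tuxedoUsername"], local_dict["tuxedoPassword"])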
Also, these configs could be useful to generate a list of products that we need to check: https://dxr.mozilla.org/mozilla-central/search?q=path%3Abouncer&redirect=false&case=true
Depends on: 1267629
Throwing this up for a first glance. I still need to add:
* buildbotcustom builder
* tweak the plug-in within releasetasks (make sure I pass all the info I need)

+ testing
Attachment #8746211 - Flags: feedback?(rail)
Comment on attachment 8746211 [details] [diff] [review]
Uptake monitoring logic within mozharness

Review of attachment 8746211 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozharness/scripts/release/uptake_monitoring.py
@@ +21,5 @@
> +from mozharness.base.script import BaseScript
> +from mozharness.mozilla.buildbot import BuildbotMixin
> +
> +
> +def get_tuxedo_uptake_url(tuxedo_server_url, related_product, os=None):

I think you can simplify this and always require `os`. At least, the only usage here passes `os`.

@@ +132,5 @@
> +                                                   related_product, bouncer_platform))
> +        # handle the partials as well
> +        subs = {
> +            "version": version
> +        }

I would move `subs` close to where you set subs["prev_version"] and just create it from scratch every time. This way the dict stays immutable.

@@ +137,5 @@
> +        prev_versions = self.config.get("partial_versions")
> +        for product, info in self.config["partials"].iteritems():
> +            product_template = info["product-name"]
> +            for prev_version in prev_versions:
> +                subs["prev_version"] = prev_version

as I said above:
  subs = {
      "version": version,
      "prev_version": prev_version
  }

@@ +162,5 @@
> +        def check_uptake(uptake, min_uptake=10000):
> +            self.info("Current uptake value to check is {}".format(uptake))
> +            if uptake >= min_uptake:
> +                return True
> +            return False

return uptake >= min_uptake 

:P

@@ +168,5 @@
> +        def poll():
> +            uptake = self._get_release_uptake(auth)
> +            return check_uptake(uptake)
> +
> +        def run_loop(poll_interval=60, poll_timeout=2*60*60):

2h is a bit long. It'd be better to know about possible issues early rather than late. I'd set it to something like 20m. Usually it doesn't take more than 10 minutes for sentry to check.

@@ +170,5 @@
> +            return check_uptake(uptake)
> +
> +        def run_loop(poll_interval=60, poll_timeout=2*60*60):
> +            import datetime
> +            import time

these are built-in modules; you can move them to the top to reduce clutter here

@@ +178,5 @@
> +                if delta > poll_timeout:
> +                    self.error("Uptake monitoring sadly timed-out")
> +                    raise Exception("Time-out during uptake monitoring")
> +                r = poll()
> +                if r:

poll() sounds a bit off, because it not only polls, but also returns some value. I'd probably combine poll() and check_uptake() and call the result get_uptake or check_uptake. Or you can just use self._get_release_uptake(auth) here instead and get rid of poll() and check_uptake().

@@ +185,5 @@
> +                else:
> +                    self.info("Mirrors not yet updated, sleeping for a bit ...")
> +                    time.sleep(poll_interval)
> +
> +        run_loop()

You can even get rid of run_loop() by removing the function definition and dedenting its body one level. ;) You just need to define poll_interval and poll_timeout somewhere, maybe in super(UptakeMonitoring, self).__init__(..., config={..., poll_interval: xx, ...}), and access them as self.config[...]. Something like the sketch below.
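Putting the suggestions above together, a sketch of the collapsed version (the min_uptake and poll_* config keys are illustrative names, not existing ones):

import time

def monitor_uptake(self, auth):
    poll_interval = self.config["poll_interval"]   # e.g. 60
    poll_timeout = self.config["poll_timeout"]     # e.g. 20 * 60
    min_uptake = self.config.get("min_uptake", 10000)
    start = time.time()
    while True:
        if time.time() - start > poll_timeout:
            self.error("Uptake monitoring sadly timed-out")
            raise Exception("Time-out during uptake monitoring")
        # poll() and check_uptake() folded into the loop body:
        uptake = self._get_release_uptake(auth)
        self.info("Current uptake value to check is {}".format(uptake))
        if uptake >= min_uptake:
            return
        self.info("Mirrors not yet updated, sleeping for a bit ...")
        time.sleep(poll_interval)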
Attachment #8746211 - Flags: feedback?(rail) → feedback+
Depends on: 1268824
Keywords: leave-open
Comment on attachment 8747009 [details]
MozReview Request: Bug 1214487 - Add uptake monitoring builder. r=rail

https://reviewboard.mozilla.org/r/49673/#review46435
Attachment #8747009 - Flags: review?(rail) → review+
Comment on attachment 8747010 [details]
MozReview Request: Bug 1214487 - Add uptake monitoring builder config. r=rail

https://reviewboard.mozilla.org/r/49679/#review46437

Can you also add similar entries for jamun and date in `mozilla/project_branches.py`? Otherwise we won't be able to run it on those branches.
Attachment #8747010 - Flags: review?(rail)
Comment on attachment 8746993 [details]
MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD

https://reviewboard.mozilla.org/r/49653/#review46439

A nit:

::: testing/mozharness/scripts/release/uptake_monitoring.py:93
(Diff revision 1)
> +        import requests
> +
> +        url = get_tuxedo_uptake_url(tuxedo_server_url, related_product, os)
> +        self.info("Requesting {} from tuxedo".format(url))
> +        # TODO: wipe off below line
> +        url = 'https://people.mozilla.org/~mtabara/tuxedo/uptake.xml'

I don't think you want to land this, do you? ;)
Attachment #8746993 - Flags: review?(rail)
Comment on attachment 8746993 [details]
MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49653/diff/1-2/
Attachment #8746993 - Flags: review?(rail)
Attachment #8747075 - Flags: review?(rail) → review+
Comment on attachment 8747075 [details]
MozReview Request: Bug 1214487 - Add uptake monitoring builder config. r=rail

https://reviewboard.mozilla.org/r/49711/#review46469
Attachment #8747078 - Flags: review?(rail) → review+
Comment on attachment 8747078 [details]
MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD

https://reviewboard.mozilla.org/r/49719/#review46473
Comment on attachment 8747070 [details]
MozReview Request: Bug 1214487 - Enable uptake monitoring flag in release runner. r=rail

https://reviewboard.mozilla.org/r/49709/#review46475
Attachment #8747070 - Flags: review?(rail) → review+
Comment on attachment 8746993 [details]
MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD

https://reviewboard.mozilla.org/r/49653/#review46477

Squashing would be great before we push.
Attachment #8746993 - Flags: review?(rail) → review+
There are a lot of patches to land; I'll put my thoughts down for now:
Note to self:
1. Squash + land buildbot-configs changes + merge
2. Squash + land gecko mozharness changes to inbound + cherry pick to aurora/beta/esr45
3. Land buildbotcustom changes (they depend on buildbot-configs and gecko mozharness)
4. Land release runner changes to default branch
5. Push PR for review, make sure tests pass, and merge to mozilla/releasetasks (still in progress)

Test on date/jamun to see if the uptake monitoring is working properly.
For the record, PR for releasetasks changes is https://github.com/mozilla/releasetasks/pull/170
(In reply to Mihai Tabara [:mtabara] from comment #20)
> 2. Squash + land gecko mozharness changes to inbound + cherry pick to
> aurora/beta/esr45

and mozilla-release, otherwise uptake monitoring won't work for a dot release.
I squashed and pushed the gecko patch to jamun: https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=9f6c2dca91e7

To test things on jamun we need:

* wait for the build to be ready or use mh revision in ship-it
* buildbotcustom and buildbot-configs changes landed/merged/reconfigured
* deploy the release runner and releasetasks changes on bm83, restart release runner
* start a release and see how it fails :D - it'll fail because sentry may not work as expected in staging. We can try to build a 46.0 release using an older build and a fresh mh revision set in ship-it.
Just a little status update with the leftovers:

0. Done so far:

* buildbotcustom/bb-configs changes are landed/merged/reconfigured
* bm83 is patched with https://github.com/MihaiTabara/releasetasks/tree/uptake_monitoring
* bm83 releaserunner is also patched with the change and restarted - all ready to start a new release


1. Start a release and see how it fails :D - it'll fail because sentry may not work as expected in staging. We can try to build a 46.0 release using an older build and a fresh mh revision set in ship-it.

2. Once we sort things out and have a successful run, we need to merge:
* merge releasetasks PR to mozilla/releasetasks
* push release runner changes to default

3. Squash + land gecko mozharness changes to inbound + cherry-pick to aurora/beta/esr45/release (currently cherry-picked only on jamun)
Attached patch 1214487.diff (obsolete) — Splinter Review
We had a mock-up release on jamun this morning and a few issues came up:
* a rookie mistake of mine
* the old logic was iterating over all products/partials from mozharness configs, including the candidates dir and partials candidates from mozilla-release configs.

The current patch addresses both of these issues and is built atop jamun.
Attachment #8748718 - Flags: review?(rail)
Comment on attachment 8748718 [details] [diff] [review]
1214487.diff

Can you add "check_uptake": False to the other products, so it's explicit and we make fewer mistakes when copy/pasting code?
Attachment #8748718 - Flags: review?(rail)
Comment on attachment 8748801 [details] [diff] [review]
Add check_uptake flag to products considered for uptake monitoring

Review of attachment 8748801 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Can you also write another patch for the tb and fennec configs, so we don't forget them in the future? The new flag is harmless to have there.
Attachment #8748801 - Flags: review?(rail) → review+
I added the check_uptake flag for TB/Fennec and esr since it may be next in the promotion-ify process. If that's not the case, just let me know and I'll remove it.
Attachment #8748833 - Flags: review?(rail)
Attachment #8748833 - Flags: review?(rail) → review+
All the remaining pieces have been pushed to production:
* releasetasks https://github.com/mozilla/releasetasks/pull/170 merged
* tools repo changes http://hg.mozilla.org/build/tools/rev/6bc076797116
* gecko changes cherry-picked to inbound, aurora, beta, release, esr45 as Rail indicated above.

We've had successful uptake monitoring tests in several releases today on bm83.
I updated releasetasks/tools and restarted releaserunner on both bm83/bm85.

I'm closing the bug for now, will reopen if any issues come along.
Thank you Rail for all the help and patience in solving this bug!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I am investigating a linux issue with talos (bug 1271948) where, on Monday May 9th, our data became unreliable; backfilling showed this was an infrastructure issue vs. an in-tree change.

I see this landed a few days before the issue started (I can safely ignore weekends due to low volume, we already have other patterns on weekends).

:mihai, can you explain what this does on the physical machines?  Does this affect the linux machines?  Does this run all the time or only as part of a specific job?
Flags: needinfo?(mtabara)
These jobs don't run on the talos machines, so probably aren't responsible.
Flags: needinfo?(mtabara)
Blocks: 1283853
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open