Closed
Bug 1214487
Opened 8 years ago
Closed 8 years ago
Implement uptake monitoring on slaves/workers
Categories
(Release Engineering :: Release Automation: Other, defect)
Release Engineering
Release Automation: Other
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rail, Assigned: mtabara)
References
Details
Attachments
(9 files, 1 obsolete file)
7.91 KB,
patch
|
rail
:
feedback+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
rail
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
rail
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
Details | |
58 bytes,
text/x-review-board-request
|
rail
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
rail
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
rail
:
review+
|
Details |
16.22 KB,
patch
|
rail
:
review+
|
Details | Diff | Splinter Review |
7.30 KB,
patch
|
rail
:
review+
|
Details | Diff | Splinter Review |
The current implementation runs as a part of master process, see http://hg.mozilla.org/build/buildbotcustom/file/d93d0b0b1c3e/scheduler.py#l218 Even though the surrounding scheduler code (boilerplate!) is a bit complicated, the actual business logic is here: http://hg.mozilla.org/build/tools/file/19c4bb07e1a8/lib/python/util/tuxedo.py#l66 Basically it just checks multiple URLs (API end points behind AUTH) and calculates some minimum values. The API is XML based. We could use TC if the secret service is ready.
Updated•8 years ago
|
Assignee: nobody → tabara.mihai
Reporter | ||
Comment 1•8 years ago
|
||
Server side implementation: https://github.com/mozilla/tuxedo/blob/master/apps/api/views.py#L57 It accepts 3 params (https://github.com/mozilla/tuxedo/blob/master/apps/api/views.py#L57): product: Firefox-$version os: a platform (see http://hg.mozilla.org/build/tools/file/tip/lib/python/release/platforms.py#l7) fuzzy: is set to true, will match Firefox-42.0 if you search for Firefox-4 Example request: https://bounceradmin.mozilla.com/api/uptake/?product=Firefox-36.0.4-Partial-35.0.1&os=win example response: <mirror_uptake> <item> <available>2000100</available> <product>Firefox-36.0.4-Partial-35.0.1</product> <total>2000100</total> <os>win</os> </item> </mirror_uptake> If you skip "os", the response will contain all "os"es: https://bounceradmin.mozilla.com/api/uptake/?product=Firefox-36.0.4-Partial-35.0.1 <?xml version="1.0"?> <mirror_uptake> <item> <available>2000100</available> <product>Firefox-36.0.4-Partial-35.0.1</product> <total>2000100</total> <os>linux</os> </item> <item> <available>2000100</available> <product>Firefox-36.0.4-Partial-35.0.1</product> <total>2000100</total> <os>win</os> </item> <item> <available>2000100</available> <product>Firefox-36.0.4-Partial-35.0.1</product> <total>2000100</total> <os>linux64</os> </item> <item> <available>2000100</available> <product>Firefox-36.0.4-Partial-35.0.1</product> <total>2000100</total> <os>osx</os> </item> </mirror_uptake> We use the first method (explicit "os") and make requests for every os we care about. This way we can add the contrib platforms (solaris, for example) and don't block ourselves on there uptake. We add these products in http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/scripts/bouncer_submitter.py using configs from http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/configs/releases We can probably reuse that code, esp the configs. We could mark some platforms as not participating in uptake checks, something like, "check_uptake": True/False. It will also should simplify generating the list of products to be checked and not rely on buildbot-configs
Comment 2•8 years ago
|
||
Note that we can't use this from Taskcluster atm, due to bounceradmin (tuxedo) not having a netflow from the public internet. We do have the tuxedoUsername/Password on the masters, (not slaves) that can then be passed down to mozharness if we `use_credentials_file=True` from the buildbot factory for it. And you can see, for example: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/bouncer/submitter.py#20 for how mozharness grabs that data, for other uses.
Reporter | ||
Comment 3•8 years ago
|
||
Also, these configs could be useful to generate a list of products that we need to check: https://dxr.mozilla.org/mozilla-central/search?q=path%3Abouncer&redirect=false&case=true
Assignee | ||
Comment 4•8 years ago
|
||
Throwing this for a first glance. I still need to add: * buildbotcustom builder * tweak the plug-in within the releasetasks (make sure I pass all info I need) + testing
Attachment #8746211 -
Flags: feedback?(rail)
Reporter | ||
Comment 5•8 years ago
|
||
Comment on attachment 8746211 [details] [diff] [review] Uptake monitoring logic within mozharness Review of attachment 8746211 [details] [diff] [review]: ----------------------------------------------------------------- ::: testing/mozharness/scripts/release/uptake_monitoring.py @@ +21,5 @@ > +from mozharness.base.script import BaseScript > +from mozharness.mozilla.buildbot import BuildbotMixin > + > + > +def get_tuxedo_uptake_url(tuxedo_server_url, related_product, os=None): I think you can simplify this and always require `os`. At least the only usage here is with os passed. @@ +132,5 @@ > + related_product, bouncer_platform)) > + # handle the partials as well > + subs = { > + "version": version > + } I would move `subs` close to where you set subs["prev_version"] and just create it from scratch every time. This way the dict stay immutable. @@ +137,5 @@ > + prev_versions = self.config.get("partial_versions") > + for product, info in self.config["partials"].iteritems(): > + product_template = info["product-name"] > + for prev_version in prev_versions: > + subs["prev_version"] = prev_version as I said above: subs = { "version": version, "prev_version": prev_version } @@ +162,5 @@ > + def check_uptake(uptake, min_uptake=10000): > + self.info("Current uptake value to check is {}".format(uptake)) > + if uptake >= min_uptake: > + return True > + return False return uptake >= min_uptake :P @@ +168,5 @@ > + def poll(): > + uptake = self._get_release_uptake(auth) > + return check_uptake(uptake) > + > + def run_loop(poll_interval=60, poll_timeout=2*60*60): 2h is a bit long. It'd be better to know about possible issues early rather than late. I'd set it something like 20m. Usually it doesn't take more than 10 minutes for sentry to check. @@ +170,5 @@ > + return check_uptake(uptake) > + > + def run_loop(poll_interval=60, poll_timeout=2*60*60): > + import datetime > + import time these are builtin modules, you can move them to the top to reduce clutter here @@ +178,5 @@ > + if delta > poll_timeout: > + self.error("Uptake monitoring sadly timed-out") > + raise Exception("Time-out during uptake monitoring") > + r = poll() > + if r: poll() sounds a bit off, because it not only polls, but also return some value. I'd probably combine poll() and check_uptake() together and called get_uptake or check_uptake. Or you can just use self._get_release_uptake(auth) here instead and get rid of poll() and check_uptake(). @@ +185,5 @@ > + else: > + self.info("Mirrors not yet updated, sleeping for a bit ...") > + time.sleep(poll_interval) > + > + run_loop() you can even get rid of run_loop() by removing the function definition and dedenting it one level. ;) just need to define poll_interval and poll_timeout somewhere, maybe in super(UptakeMonitoring, self).__init__(..., config={..., poll_interval: xx, ...}) and access them as self.config[...]
Attachment #8746211 -
Flags: feedback?(rail) → feedback+
Assignee | ||
Comment 6•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49653/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49653/
Attachment #8746993 -
Flags: review?(rail)
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Comment 7•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49673/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49673/
Attachment #8747009 -
Flags: review?(rail)
Assignee | ||
Comment 8•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49679/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49679/
Attachment #8747010 -
Flags: review?(rail)
Reporter | ||
Comment 9•8 years ago
|
||
Comment on attachment 8747009 [details] MozReview Request: Bug 1214487 - Add uptake monitoring builder. r=rail https://reviewboard.mozilla.org/r/49673/#review46435
Attachment #8747009 -
Flags: review?(rail) → review+
Reporter | ||
Comment 10•8 years ago
|
||
Comment on attachment 8747010 [details] MozReview Request: Bug 1214487 - Add uptake monitoring builder config. r=rail https://reviewboard.mozilla.org/r/49679/#review46437 can you also add similar entries for jamun and date in `mozilla/project_branches.py`. Otherwise we won't be able to run it on those branches.
Attachment #8747010 -
Flags: review?(rail)
Reporter | ||
Comment 11•8 years ago
|
||
Comment on attachment 8746993 [details] MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD https://reviewboard.mozilla.org/r/49653/#review46439 A nit: ::: testing/mozharness/scripts/release/uptake_monitoring.py:93 (Diff revision 1) > + import requests > + > + url = get_tuxedo_uptake_url(tuxedo_server_url, related_product, os) > + self.info("Requesting {} from tuxedo".format(url)) > + # TODO: wipe off below line > + url = 'https://people.mozilla.org/~mtabara/tuxedo/uptake.xml' I dont' think you want to land this, do you? ;)
Attachment #8746993 -
Flags: review?(rail)
Assignee | ||
Comment 12•8 years ago
|
||
Comment on attachment 8746993 [details] MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49653/diff/1-2/
Attachment #8746993 -
Flags: review?(rail)
Assignee | ||
Comment 13•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49709/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49709/
Attachment #8747070 -
Flags: review?(rail)
Assignee | ||
Comment 14•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49711/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49711/
Attachment #8747075 -
Flags: review?(rail)
Reporter | ||
Updated•8 years ago
|
Attachment #8747075 -
Flags: review?(rail) → review+
Reporter | ||
Comment 15•8 years ago
|
||
Comment on attachment 8747075 [details] MozReview Request: Bug 1214487 - Add uptake monitoring builder config. r=rail https://reviewboard.mozilla.org/r/49711/#review46469
Assignee | ||
Comment 16•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/49719/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/49719/
Attachment #8747078 -
Flags: review?(rail)
Reporter | ||
Updated•8 years ago
|
Attachment #8747078 -
Flags: review?(rail) → review+
Reporter | ||
Comment 17•8 years ago
|
||
Comment on attachment 8747078 [details] MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD https://reviewboard.mozilla.org/r/49719/#review46473
Reporter | ||
Comment 18•8 years ago
|
||
Comment on attachment 8747070 [details] MozReview Request: Bug 1214487 - Enable uptake monitoring flag in release runner. r=rail https://reviewboard.mozilla.org/r/49709/#review46475
Attachment #8747070 -
Flags: review?(rail) → review+
Reporter | ||
Comment 19•8 years ago
|
||
Comment on attachment 8746993 [details] MozReview Request: Bug 1214487 - Implement uptake monitoring on slaves/workers. r=rail. a=release DONTBUILD https://reviewboard.mozilla.org/r/49653/#review46477 Sqashing would be great before we push
Attachment #8746993 -
Flags: review?(rail) → review+
Assignee | ||
Comment 20•8 years ago
|
||
There are a lot of patches to land, I'll put my thoughts down for now: Note to self: 1. Squash + land buildbot-configs changes + merge 2. Squash + land gecko mozharness changes to inbound + cherry pick to aurora/beta/esr45 3. Land buildbotcustom changes (they depind on buildbot-configs and gecko mozharness) 4. Land release runner changes to default branch 5. Push PR for review, make sure tests pass and merge to mozilla/releasetasks (still-in-progress) Test on date/jamun to see if the uptake monitoring is working properly.
Assignee | ||
Comment 21•8 years ago
|
||
For the record, PR for releasetasks changes is https://github.com/mozilla/releasetasks/pull/170
Reporter | ||
Comment 22•8 years ago
|
||
(In reply to Mihai Tabara [:mtabara] from comment #20) > 2. Squash + land gecko mozharness changes to inbound + cherry pick to > aurora/beta/esr45 and mozilla-release, otherwise uptake monitoring won't work for a dot release.
Reporter | ||
Comment 23•8 years ago
|
||
I squashed and pushed the gecko patch to jamun: https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=9f6c2dca91e7 To test things on jamun we need: * wait for the build to be ready or use mh revision in ship-it * buildbotcustom and buildbot-configs changes landed/merged/reconfigured * deploy the release runner and releasetasks changes on bm83, restart release runner * start a release and see how it fails :D - it'l fail because sentry may not work as expected in staging. We can try to build a 46.0 release using older build and fresh mh revision set in ship-it.
Comment 24•8 years ago
|
||
In production: https://hg.mozilla.org/build/buildbot-configs/rev/07af46009466
Comment 25•8 years ago
|
||
In production: https://hg.mozilla.org/build/buildbotcustom/rev/6a21eadc4deb
Comment 26•8 years ago
|
||
In production: https://hg.mozilla.org/build/buildbotcustom/rev/7d70b0149b14
Assignee | ||
Comment 27•8 years ago
|
||
Just a little status update with the leftovers: 0. * buildbotcustom/bb-configs changes are landed/merged/reconfiged * bm83 is patched with https://github.com/MihaiTabara/releasetasks/tree/uptake_monitoring * bm83 releaserunner is also patched with the change and restarted - all ready to start a new release 1. start a release and see how it fails :D - it'l fail because sentry may not work as expected in staging. We can try to build a 46.0 release using older build and fresh mh revision set in ship-it. 2. once we sort out and have a successfully run, we need to merge: * merge releasetasks PR to mozilla/releasetasks * push release runner changes to default 3. Squash + land gecko mozharness changes to inbound + cherry pick to aurora/beta/esr45/release (currently cherry-picked only in jamun)
Assignee | ||
Comment 28•8 years ago
|
||
We had a mock-up release on jamun this morning and few issues came up: * a rookie mistake of mine * the old logic was iterating over all products/partials from mozharness configs, including the candidates dir and partials candidates from mozilla-release configs. The current patch addresses both of these issues and it's built atop of jamun.
Attachment #8748718 -
Flags: review?(rail)
Reporter | ||
Comment 29•8 years ago
|
||
Comment on attachment 8748718 [details] [diff] [review] 1214487.diff Can you add "check_uptake": False to other products, so it's explicit and we make less mistakes when copy/paste code?
Attachment #8748718 -
Flags: review?(rail)
Assignee | ||
Comment 30•8 years ago
|
||
Attachment #8748718 -
Attachment is obsolete: true
Attachment #8748801 -
Flags: review?(rail)
Reporter | ||
Comment 31•8 years ago
|
||
Comment on attachment 8748801 [details] [diff] [review] Add check_uptake flag to products considered for uptake monitoring Review of attachment 8748801 [details] [diff] [review]: ----------------------------------------------------------------- LGTM. Can you also write another patch for tb and fenenc configs, so we don't forget them in the future. The new flag is harmless to have there.
Attachment #8748801 -
Flags: review?(rail) → review+
Assignee | ||
Comment 32•8 years ago
|
||
I added the check_uptake flag for TB/Fennec and esr since it may be the next in the promotion-ify process. If that's not the case, just let me know and I'll remove it.
Attachment #8748833 -
Flags: review?(rail)
Reporter | ||
Updated•8 years ago
|
Attachment #8748833 -
Flags: review?(rail) → review+
Reporter | ||
Comment 34•8 years ago
|
||
Gecko: https://hg.mozilla.org/integration/mozilla-inbound/rev/bb7a2a4569d2 https://hg.mozilla.org/releases/mozilla-aurora/rev/c1849128bc3d https://hg.mozilla.org/releases/mozilla-beta/rev/b38f32029661 https://hg.mozilla.org/releases/mozilla-release/rev/0458959c31c1 https://hg.mozilla.org/releases/mozilla-esr45/rev/78c6f56c9758
Assignee | ||
Comment 35•8 years ago
|
||
All the remaining pieces have been pushed to production: * releasestasks https://github.com/mozilla/releasetasks/pull/170 merged * tools repo changes http://hg.mozilla.org/build/tools/rev/6bc076797116 * gecko changes cherry-picked to inbound,aurora,beta,release,esr45 as Rail indicated above. We've had successful uptake monitoring tests in several releases today on bm83. I updated releasetasks/tools and restarted releaserunner on both bm83/bm85. I'm closing the bug for now, will reopen if any issues come along. Thank you Rail for all the help and patience in solving this bug!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 36•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bb7a2a4569d2
Comment 37•8 years ago
|
||
I am investigating a linux issue with talos (bug 1271948) where on Monday May 9th our data became unreliable- backfilling showed this was an infrastructure issue vs an in-tree change. I see this landed a few days before the issue started (I can safely ignore weekends due to low volume, we already have other patterns on weekends). :mihai, can you explain what this does on the physical machines? Does this affect the linux machines? Does this run all the time or only as part of a specific job?
Flags: needinfo?(mtabara)
Comment 38•8 years ago
|
||
These jobs don't run on the talos machines, so probably aren't responsible.
Flags: needinfo?(mtabara)
Comment 39•6 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•