Closed Bug 394498 Opened 17 years ago Closed 14 years ago

Release automation should support for mirror-replication monitoring

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: joduinn, Assigned: rail)

References

Details

(Whiteboard: [automation])

Attachments

(4 files, 12 obsolete files)

3.69 KB, patch (bhearsum: review+)
11.62 KB, patch (bhearsum: review+)
455 bytes, patch (catlee: review+)
827 bytes, patch (catlee: review+)
As part of the build process, we push bits out to mirrors and then wait until "an acceptable amount" of mirrors have picked up the bits before we proceed to the next steps in the build/release process. In the past, "an acceptable amount" of mirrors was decided by waiting 8-12 hours. However, with bug#390201, bouncer can now tell us what % of our total mirror bandwidth has picked up the newly pushed bits, which means we can now typically wait 4-6 hours, a big saving. Currently this is done through a webpage, so it still requires human interaction. bug#394494 tracks getting a command-line util to do the same thing. Once this util is available, we should call it from Bootstrap so we can check mirror absorption as part of automation.
Priority: -- → P3
See my question in bug 394496, please; I'm confused about what Bootstrap is supposed to use this information for.
Once mirror absorption is enough to proceed, automation would send the notification email telling QA it's OK to proceed. This saves a human build engineer from manually polling bouncer and then manually sending the same email.
Assignee: build → nobody
QA Contact: mozpreed → build
Tweaked bug dependency after closing bug#394494 as DUP.
Depends on: 408673
Assignee: nobody → joduinn
Haven't had time to work on this - putting back in the pool.
Assignee: joduinn → nobody
Component: Release Engineering → Release Engineering: Future
QA Contact: build → release
Found in triage. Tweaking summary because at this point, we'd only do this for moz2-based releases, and they do not use bootstrap.
Priority: P3 → --
Summary: Bootstrap support for mirror-replication monitoring → Release automation should support for mirror-replication monitoring
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
I have been looking at r10249 from bug 408673 and it pretty much meets our needs: it returns a string indicating the uptake percentage per product and per OS. According to bug 408673#c3, it seems to require the CSE Python library (http://wiki.osuosl.org/display/Bouncer/cse+utils).

I am thinking about where this script should be run to detect the proper uptake percentage. I will discuss this with Nick since he has given it some thought (https://bugzilla.mozilla.org/show_bug.cgi?id=408673#c10).

John, we are looking into getting the average uptake of the 3 main platforms for a product and then shoot the email, right?
(In reply to comment #6)
> John, we are looking into getting the average uptake of the 3 main platforms
> for a product and then shoot the email, right?

I think we should look for a minimum number in 'available' for all product/platform combinations. Dave, how much availability do we normally want before we ship?
I generally pick a number out of the air for each release based on traffic of previous releases and how much relative traffic we think this release will take. In general for the past several releases, for a non-throttled security release which is auto-pushed to all users, I've been using 35000. We opted for 45000 for the Firefox 3.5 release, but that was probably unwarranted (it had less traffic than a usual security release).

I also have Sentry pulling that number and putting it at the top of https://nagios.mozilla.org/sentry/ to make it easy to see, for the current releases. The number shown there is the *lowest* number among the different types of downloads (direct, complete, partial) for all Windows platforms. I don't even bother to look at Mac and Linux because those two are so much smaller in load, comparatively, that if we have enough to release Windows we have MORE than enough to release Mac and Linux... :)
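The "lowest number among the different types of downloads" rule described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function names and the sample figures are assumptions made up for the example, not the Sentry or Bouncer data format. Only the 35000 threshold comes from the comment.

```python
# Hypothetical uptake figures keyed by (platform, download type).
# Keys and numbers are invented for illustration only.
def lowest_windows_uptake(uptake):
    """Return the lowest uptake among all Windows download types."""
    windows = [value for (platform, _kind), value in uptake.items()
               if platform == "win32"]
    if not windows:
        raise ValueError("no Windows uptake figures available")
    return min(windows)


def enough_to_release(uptake, threshold=35000):
    """True once the weakest Windows download type reaches the threshold."""
    return lowest_windows_uptake(uptake) >= threshold


sample = {
    ("win32", "direct"): 52000,
    ("win32", "complete"): 41000,
    ("win32", "partial"): 36500,
    ("macosx", "direct"): 60000,
}
```

Taking the minimum rather than an average means a single lagging download type holds the release back, which matches the cautious intent of the comment.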
bug 508965 is tracking replacing Bouncer with MirrorBrain - not sure if we still want to spend time doing this with Bouncer.
(In reply to comment #9)
> bug 508965 is tracking replacing Bouncer with MirrorBrain - not sure if we
> still want to spend time doing this with Bouncer.

That is, bug 482160.
I would venture to say we're a month or two out from setting up MirrorBrain, at *least*. Probably longer. We need a web UI to manage it, and MirrorBrain doesn't have one at all right now, so our webdev is going to be writing that part. The scope of what we wanted in the web UI isn't going to happen overnight, and we'll need some time to play around with it and test it before we outright switch.
Priority: P3 → P2
Priority: P2 → P3
We are halfway through the quarter, I have not even touched this goal, and the mobile-L10n infra still needs a few more weeks. Let me know what to do with this.
Haven't had time to work on this - putting back in the pool.
Assignee: armenzg → nobody
Component: Release Engineering → Release Engineering: Future
Priority: P3 → --
Moving to Future until we have time for this. :-(
Priority: -- → P3
I'm planning to resolve this bug this quarter.
Assignee: nobody → bhearsum
Whiteboard: [automation]
The dependent bugs here are closed now, but we're still waiting for them to be rolled out before we can start work here. Frederic tells me this should be happening in the next couple of weeks.
Moving over to Rail.
Assignee: bhearsum → rail
Component: Release Engineering: Future → Release Engineering
Assignee: rail → raliiev
Depends on: 543452
Depends on: 545642
Attached file Prototype (obsolete) —
Attached is the first prototype. It's implemented as change source for buildbot. Comments are welcome.
Attachment #427286 - Attachment mime type: text/x-python → text/plain
Attached file Implementation (obsolete) —
Thanks for the comments on IRC. The current version is available at http://bitbucket.org/rail/tuxedo-client/src (attaching the current tip). The code has been split into 3 parts:
1. TuxedoClient library
2. tuxedo-cli (CLI script)
3. TuxedoPoller buildbot change source

Example usage is included in each part. Comments are welcome.
Attachment #427286 - Attachment is obsolete: true
Attached file tuxedo-add (obsolete) —
Updated version.
* Added --dry-run parameter to dump URLs and POST data without real HTTP requests.
* Added --platform parameter, so Solaris can be passed as well (multiple, default set to --platform win32 --platform linux --platform macosx)
* Path templates moved to the config file (--config). Example config embedded. Configure the file to get any FTP layout.
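For readers without the attachment, the three options listed above could be declared roughly as follows. This is a sketch using Python's standard argparse module, not the actual tuxedo-add code; the function name `parse_args` and the help strings are assumptions, while the option names and the default platform set come from the comment.

```python
import argparse

# Default platform set named in the comment above.
DEFAULT_PLATFORMS = ["win32", "linux", "macosx"]


def parse_args(argv):
    """Parse a tuxedo-add style command line (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="tuxedo-add option sketch")
    parser.add_argument("--dry-run", action="store_true",
                        help="dump URLs and POST data without sending requests")
    # default=None avoids argparse appending user values to a default list.
    parser.add_argument("--platform", action="append", default=None,
                        help="may be given several times")
    parser.add_argument("--config", help="file holding the path templates")
    args = parser.parse_args(argv)
    if args.platform is None:
        args.platform = list(DEFAULT_PLATFORMS)
    return args
```

Filling in the default after parsing means that passing `--platform solaris` replaces the default set instead of extending it, which seems to be the behaviour the comment implies.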
Attachment #427759 - Attachment is obsolete: true
Attached file tuxedo-add (obsolete) —
Attachment #430322 - Attachment is obsolete: true
Attached file test-tuxedo-add (obsolete) —
Attached is a simple bash script to test the tuxedo-add script. After submitting entries to Tuxedo (takes ~15 min here), it tries to download the list of the submitted files and checks whether the files exist (HTTP HEAD). A staging Tuxedo account is required. The scripts don't try to delete products; this should be done manually for safety.
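The attached test is a bash script; the existence check it performs (an HTTP HEAD request per submitted file) can be sketched in Python as below. The function names and the injectable `opener` argument are assumptions made for this example so the check can be exercised without network access; nothing here is taken from the attachment.

```python
import urllib.error
import urllib.request


def url_exists(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if an HTTP HEAD request for url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        response = opener(request, timeout=timeout)
    except urllib.error.URLError:
        # HTTPError (4xx/5xx) is a subclass of URLError.
        return False
    try:
        return 200 <= response.status < 400
    finally:
        response.close()


def missing_files(urls, opener=urllib.request.urlopen):
    """Return the URLs for which the HEAD check failed."""
    return [url for url in urls if not url_exists(url, opener=opener)]
```

A HEAD request avoids downloading the installers themselves, so the check stays cheap even for a full release file list.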
Depends on: 550510
Comment 20 and the following are not related to this bug (mess with bug 372746). Going to reupload all attachments.
Attached file lib part (obsolete) —
Attachment #430324 - Attachment is obsolete: true
Attachment #430331 - Attachment is obsolete: true
Attached file buildbotcustom/changes/tuxedopoller.py (obsolete) —
Long time without any touch, back to pool.
Assignee: rail → nobody
Priority: P3 → --
What's left to do here?
(In reply to comment #27)
> What's left to do here?

IIRC,
1) Revise the code against the current Tuxedo REST API
2) Drop the buildbot/bouncer platform mapping code, reuse buildbotcustom.common functions
3) Add a builder which emails the results to r-d@
This is a P5 IMHO.
Priority: -- → P5
Assignee: nobody → rail
Ideas from catlee:
1) send email to r-d when uptake is enough for testing (10k)
2) send email to r-d when uptake is enough for release (45k?)
3) run backupsnip
Priority: P5 → P3
(In reply to comment #30)
> Ideas from catlee:
>
> 1) send email to r-d when uptake is enough for testing (10k)
> 2) send email to r-d when uptake is enough for release (45k?)
> 3) run backupsnip

Yeah, I'm thinking of something like a "push to mirrors" builder that:
- submits bouncer entries
- pushes to mirrors
- send email to r-d when uptake is enough for testing (10k)
- send email to r-d when uptake is enough for release (45k?)
- run backupsnip
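The two uptake milestones in the list above amount to a small threshold check. A minimal sketch, assuming the 10k and 45k figures floated in the thread (the second was still marked with a question mark) and hypothetical milestone names chosen for this example:

```python
QA_UPTAKE = 10000        # "enough for testing" figure from the thread
RELEASE_UPTAKE = 45000   # tentative "enough for release" figure


def due_notifications(uptake, already_sent):
    """Return the milestone names newly reached at this uptake, in order."""
    milestones = [("ready_for_qa", QA_UPTAKE),
                  ("ready_for_release", RELEASE_UPTAKE)]
    return [name for name, threshold in milestones
            if uptake >= threshold and name not in already_sent]
```

Tracking which milestones were already sent keeps a repeated poll from mailing r-d twice for the same threshold.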
Blocks: 478420
Attached file Triggerable poller (obsolete) —
Attached implements the triggerable poller approach we talked about on IRC:

* TuxedoUptakeClient simplified: we don't use uptake values per OS nor per installer, complete/partial mar. We use the minimum value of all products for all platforms.
* The push to mirrors step (bug 540598), which already has the revision, repo and release_config properties (required to know how to push to mirrors), triggers the scheduler and passes these properties.
* trigger() fires a deferred which gets the release config from the HG repo. After populating the needed variables (version, oldVersion, productname, etc.), trigger() runs poll() in a loop. After a timeout (pollTimeout, 12 hours would be enough, IMO) polling will be stopped to prevent undead pollers.
* poll() gets the current uptake. If the uptake is good enough, it calls Triggerable.trigger to trigger the corresponding builders and stops the loop.
* In my test configuration I have 3 builders and 2 schedulers. The 1st scheduler fires "ready_for_qa_builder" (which is supposed to be a dummy builder with an email notification) and "final_verification" (already exists in our configuration). The 2nd one fires the dummy "ready_for_release_builder" builder.

    c['schedulers'].append(TriggerBouncerCheck(
        name="ready_for_qa",
        builderNames=["ready_for_qa_builder", "final_verification"],
        min_uptake=qa_uptake,
        pollTimeout=30,
        username=None,
        password=None))
    c['schedulers'].append(TriggerBouncerCheck(
        name="ready_for_release",
        builderNames=["ready_for_release_builder"],
        min_uptake=release_uptake,
        pollTimeout=60,
        username=None,
        password=None))

    # Note copy_properties parameter
    f1 = factory.BuildFactory()
    f1.addStep(trigger.Trigger(schedulerNames=['ready_for_qa', 'ready_for_release'],
                               copy_properties=['revision','repo', 'release_config']))

Comments are welcome.

At the moment the following variables are semi-hardcoded (require reconfig): addMARs, min_uptake, username, password. IMO, the only variable we may want to be version specific is min_uptake (may be lower for betas).

If we want it to be version specific (and "reconfigless"), we need to add 2 variables to release_config.py (ready_for_qa_uptake, ready_for_release_uptake) and pass another variable to the new triggerable poller which tells us which variable to use for a particular instance. Not sure if it's worth implementing.
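The "reconfigless" option described above could look roughly like this. It is a sketch under stated assumptions: the release config is treated as a plain dictionary, the helper name `min_uptake_for` is invented for the example, and the fallback values are the 10K/45K figures discussed elsewhere in this bug. Only the two variable names come from the comment.

```python
# Fallbacks used when release_config.py does not override a threshold.
DEFAULT_UPTAKE = {"ready_for_qa_uptake": 10000,
                  "ready_for_release_uptake": 45000}


def min_uptake_for(release_config, uptake_key):
    """Pick the threshold for one poller instance from the release config."""
    if uptake_key not in DEFAULT_UPTAKE:
        raise KeyError("unknown uptake setting: %s" % uptake_key)
    return release_config.get(uptake_key, DEFAULT_UPTAKE[uptake_key])
```

Each scheduler instance would be given its own `uptake_key`, so a beta could lower `ready_for_release_uptake` in its release config without a master reconfig.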
Attachment #433531 - Attachment is obsolete: true
Attachment #433532 - Attachment is obsolete: true
Attachment #491505 - Flags: feedback?(dustin)
Attachment #491505 - Flags: feedback?(catlee)
Attachment #491505 - Flags: feedback?(bhearsum)
From our 1x1 with Chris:
* use getPage instead of urllib
* it would be great to have a generic triggerable poller class which accepts pollInterval and polling_fn parameters (at least)
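A generic poller taking `pollInterval` and `polling_fn`, as suggested above, might be shaped like the class below. This is a blocking, plain-Python sketch for illustration only; the real scheduler runs inside Buildbot/Twisted and would not block. The class name, the `pollTimeout` default and the injectable `sleep`/`clock` arguments are assumptions of this example.

```python
import time


class Poller:
    """Call polling_fn every pollInterval seconds until it returns True."""

    def __init__(self, polling_fn, pollInterval=60, pollTimeout=12 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
        self.polling_fn = polling_fn
        self.pollInterval = pollInterval
        self.pollTimeout = pollTimeout
        self._sleep = sleep    # injectable so tests need not wait
        self._clock = clock

    def run(self):
        """Return True if polling_fn succeeded, False on timeout."""
        deadline = self._clock() + self.pollTimeout
        while True:
            if self.polling_fn():
                return True
            # Stop when the next poll would fall past the deadline.
            if self._clock() + self.pollInterval > deadline:
                return False
            self._sleep(self.pollInterval)
```

Keeping the uptake check behind `polling_fn` means the same loop could serve both the ready-for-QA and ready-for-release instances, with the timeout guarding against pollers that never finish.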
Comment on attachment 491505 [details] Triggerable poller Nice approach. As you mentioned already, using getPage and getting rid of threading is probably the right way to go here. Also, not sure if lxml is available on all our masters.
Attachment #491505 - Flags: feedback?(catlee) → feedback+
Attached file Triggerable poller (obsolete) —
(In reply to comment #33)
> * use getPage instead of urllib

Done.

> Also, not sure if lxml is available on all our masters.

xml.dom.minidom used instead.
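To illustrate the switch to xml.dom.minidom, here is a minimal sketch that extracts the smallest "available" figure from an uptake document. The XML layout, the sample values and the function name are assumptions made for this example; the thread only establishes that the response carries "available" and "total" figures, not its exact shape.

```python
from xml.dom import minidom

# Assumed response layout, invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<mirror_uptake>
  <item><product>Firefox-4.0b11</product><os>win32</os>
        <available>52000</available><total>60000</total></item>
  <item><product>Firefox-4.0b11</product><os>linux</os>
        <available>48000</available><total>60000</total></item>
</mirror_uptake>
"""


def minimum_available(xml_text):
    """Return the smallest <available> value found in the document."""
    dom = minidom.parseString(xml_text)
    values = [int(node.firstChild.data)
              for node in dom.getElementsByTagName("available")]
    if not values:
        raise ValueError("no <available> elements found")
    return min(values)
```

minidom ships with the Python standard library, which sidesteps the question of whether lxml is installed on every master.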
Attachment #491505 - Attachment is obsolete: true
Attachment #491818 - Flags: feedback?(dustin)
Attachment #491818 - Flags: feedback?(bhearsum)
Attachment #491505 - Flags: feedback?(dustin)
Attachment #491505 - Flags: feedback?(bhearsum)
Comment on attachment 491818 [details] Triggerable poller

    def get_release_uptake(tuxedoServerUrl, productName, version,
                           oldVersion=None, addMARs=True,
                           username=None, password=None):
        bouncerProduct = '%s-%s' % (productName, version)
        bouncerCompleteMARProduct = '%s-Complete' % bouncerProduct
        bouncerPartialMARProduct = '%s-Partial-%s' % (bouncerProduct, oldVersion)
        d = []
        d.append(TuxedoUptakeClient(tuxedoServerUrl=tuxedoServerUrl,
                                    productName=bouncerProduct,
                                    username=username,
                                    password=password).getUptake())

In general, 'd' is a deferred, so I would prefer to see this called 'dl'. Also, it looks like you're appending TuxedoUptakeClients, rather than deferreds that one of their methods return. Can you set up a class method instead?

    dl.append(TuxedoUptakeClient.getUptake(...))

As a further stylistic note, it looks like you've defined a class here merely as a container for several async functions. It's much cleaner to define such a thing with nested methods:

    def get_tuxedo_uptake(..):
        d = defer.succeed(None)
        def getUptakePage(_):
            ...
        d.addCallback(getUptakePage)
        def calculateUptake(page):
            ...
        d.addCallback(calculateUptake)
        return d

This has the advantage of nesting a sequence of operations into a function, as expected, and those operations execute more-or-less in lexical order. You can write

    @d.addCallback
    def calculateUptake(page):
        ....

too, but that might be getting a bit fancy. It *is* possible to write readable Twisted code. You just need to plan ahead.
Attachment #491818 - Flags: feedback?(dustin) → feedback+
Comment on attachment 491818 [details] Triggerable poller

Where is this script going to live?

Please rename addMars to 'checkMars' or something similar.

There's a couple things here that should go in a library:
- bouncer product name generation
- bouncer url generation

Can you use the readReleaseConfig method from release.info?
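The "bouncer product name generation" piece suggested for a library could be factored out along these lines. The three format strings are the ones quoted from the patch in the earlier review comment; the function name, the `checkMARs` spelling and the decision to skip the partial entry when no old version is given are assumptions of this sketch.

```python
def bouncer_product_names(productName, version, oldVersion=None,
                          checkMARs=True):
    """Return the bouncer product names to check for one release."""
    product = "%s-%s" % (productName, version)
    names = [product]
    if checkMARs:
        names.append("%s-Complete" % product)
        # Assumption: a partial MAR only exists when there is an old version.
        if oldVersion is not None:
            names.append("%s-Partial-%s" % (product, oldVersion))
    return names
```

For example, `bouncer_product_names("Firefox", "4.0b11", "4.0b10")` would yield the installer, complete MAR and partial MAR product names in that order.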
(In reply to comment #37)
> Comment on attachment 491818 [details]
> Triggerable poller
>
> Where is this script going to live?

It's not a script. It will be a part of buildbot-custom and will be running on a scheduler master. There is a simple usage of it in comment 32. I plan to put Tuxedo related functions somewhere to 'lib' and poller stuff to buildbot-custom/schedulers. I've used all-in-one-file approach just to simplify local testing.

> Please rename addMars to 'checkMars' or something similar.

I reused this name from tuxedo-add.py, but in the current context we can use checkMars, sure.

> There's a couple things here that should go in a library:
> - bouncer product name generation
> - bouncer url generation

Sure. Probably we can use tool's lib for this.

> Can you use the readReleaseConfig method from release.info?

It requires master side hg checkout, so we may have race conditions. The current implementation is very compact and works in memory.

I'll post an updated version tomorrow. Thanks for the suggestions.
Attached patch Triggerable poller (obsolete) — Splinter Review
* Made on top of changes from 540598: push_to_mirrors_factory triggers mirror monitoring scheduler
* Minimum uptakes set to 10K and 45K
* Tuxedo functions moved to util/tuxedo.py
* Not sure where to put credentialsFromFile (and maybe rename it)
* Dummy builder requires an idle slave to run

I tested the patch in staging against an example xml file protected by .htaccess. It worked pretty smoothly.

This patch and the patch from bug 588150 won't work without fixing external imports introduced in bug 588150.
Attachment #491818 - Attachment is obsolete: true
Attachment #493259 - Flags: review?(bhearsum)
Attachment #491818 - Flags: feedback?(bhearsum)
(In reply to comment #38)
> (In reply to comment #37)
> > Can you use the readReleaseConfig method from release.info?
>
> It requires master side hg checkout, so we may have race conditions. The
> current implementation is very compact and works in memory.

Can you expand on this? I don't understand...
Comment on attachment 493259 [details] [diff] [review] Triggerable poller

I'd feel better about the tuxedo username and password being passed at config time. Since this is master side I don't think there's any reason we need to read it at runtime.

I think util.tuxedo belongs in build/tools/lib/python, not buildbotcustom. We already depend on tools for release masters, so that shouldn't be a big deal.

This is mostly there!
Attachment #493259 - Flags: review?(bhearsum) → review-
(In reply to comment #40)
> > It requires master side hg checkout, so we may have race conditions. The
> > current implementation is very compact and works in memory.
>
> Can you expand on this? I don't understand...

The triggerable mirror monitoring code runs on masters and may have more than one instance. Unlike builders, schedulers (this one is a scheduler) have no unique work directory per instance. We're going to run more than one instance of this scheduler even if we support only 1 branch: one for the minimum (ready for qa) uptake and a second one for the maximum (ready for release) uptake.

If we want to reuse the mentioned code and use local hg clones, we have to keep these clones in unique directories (a directory per instance) to prevent a mess when updating the files (hg up -r $TAG), which implies additional coding.

The current implementation is very simple and straightforward, IMHO.
Attached patch Triggerable poller (obsolete) — Splinter Review
(In reply to comment #41)
> I'd feel better about the tuxedo username and password being passed at config
> time. Since this is master side I don't think there's any reason we need to
> read it at runtime.

I used the following scenario. Is it OK?

    import BuildSlaves # time to move to passwords.py?
    ......
    mirror_scheduler1 = TriggerBouncerCheck(
        ......
        password=BuildSlaves.tuxedoPassword,
        ......

> I think util.tuxedo belongs in build/tools/lib/python, not buildbotcustom. We
> already depend on tools for release masters, so that shouldn't be a big deal.

Moved. See the next patch.

> This is mostly there!

Yay!
Attachment #493259 - Attachment is obsolete: true
Attachment #493376 - Flags: review?(bhearsum)
Attachment #493377 - Flags: review?(bhearsum)
(In reply to comment #42)
> (In reply to comment #40)
> > > It requires master side hg checkout, so we may have race conditions. The
> > > current implementation is very compact and works in memory.
> >
> > Can you expand on this? I don't understand...
>
> Triggerable mirror monitoring code runs on masters and may have more than one
> instances. Unlike builders, schedulers (this one is a scheduler) have no unique
> work directory per instance. We're going to run more than one instances of this
> scheduler even if case if we support 1 branch: one for minimum (ready for qa)
> uptake and the second one for maximum (ready for release) uptake.
>
> If we want to reuse the mentioned code and use local hg clones, we have to keep
> these clones in unique directories (a directory per instance) to prevent a mess
> with updating the files (hg up -r $TAG), which implies additional coding.
>
> The current implementation is very simple and straightforward, IMHO.

Ah, ok. I was proposing the alternative of reading the config from the master directory, but that has the disadvantage of not necessarily getting the tagged version. This is fine, then.
(In reply to comment #43)
> Created attachment 493376 [details] [diff] [review]
> Triggerable poller
>
> (In reply to comment #41)
> > I'd feel better about the tuxedo username and password being passed at config
> > time. Since this is master side I don't think there's any reason we need to
> > read it at runtime.
>
> I used the following scenario. Is it OK?
>
> import BuildSlaves # time to move to passwords.py?
> ......
> mirror_scheduler1 = TriggerBouncerCheck(
> ......
> password=BuildSlaves.tuxedoPassword,
> ......

Yeah, that's fine.
Attachment #493376 - Flags: review?(bhearsum) → review+
Attachment #493377 - Flags: review?(bhearsum) → review+
Depends on: 615097
Attached patch Triggerable poller (obsolete) — Splinter Review
Refreshed patch. Carrying r+.
Attachment #493376 - Attachment is obsolete: true
Attachment #494342 - Flags: review+
Added ready_for_qa, ready_for_release and final verification to notify_builders.
Attachment #494342 - Attachment is obsolete: true
Attachment #494347 - Flags: review?(bhearsum)
Attachment #494347 - Flags: review?(bhearsum) → review+
Comment on attachment 494347 [details] [diff] [review] Triggerable poller I imported the older patch accidentally. :( I backed out the commit and imported the proper one. http://hg.mozilla.org/build/buildbotcustom/rev/9c48a2f4a700
Preproduction tests fail due to missing variables in BuildSlaves.py.template.
Attachment #497527 - Flags: review?(catlee)
Attachment #497527 - Flags: review?(catlee) → review+
Comment on attachment 494347 [details] [diff] [review] Triggerable poller

I've backed out this patch because it requires tools/lib/python in PYTHONPATH for every master, including those which don't use this scheduler. As a possible solution I can move this scheduler into a separate file, so only masters used for releases (they have tools in PYTHONPATH already) use this scheduler. Another solution would be moving util.tuxedo from tools to buildbotcustom.
Attachment #494347 - Flags: checked-in+ → checked-in-
(In reply to comment #54)
> I've backed out this patch because it requires tools/lib/python in PYTHONPATH
> for every master, including those which don't use this scheduler.

Hrm, now, after a backout of the backout, I run into exactly this problem on my master. Any way we can make things smoother, i.e. not failing with:

  File "/tools/buildbotcustom/buildbotcustom/scheduler.py", line 20, in <module>
    from util.tuxedo import get_release_uptake
ImportError: No module named util.tuxedo

?

Where does util.tuxedo live right now and why isn't it in buildbotcustom in the first place?
(In reply to comment #55)
> Where does util.tuxedo live right now and why isn't it in buildbotcustom in the
> first place?

Oops, sorry for the late reply. util.tuxedo lives in the tools repository. We've added tools/lib/python to the PYTHONPATH environment for all masters. Note that a reconfig doesn't work if you change PYTHONPATH; you need a restart, or temporary symlinks from the needed module directories into an existing PYTHONPATH directory (buildbotcustom, for example).
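One way to keep a module importable on masters that lack tools/lib/python is to defer the failure until the optional dependency is actually used. This is a hypothetical alternative for illustration, not what was landed (the thread resolves the problem by adding tools/lib/python to PYTHONPATH); the helper name `load_optional` is invented here.

```python
import importlib


def load_optional(module_name, attribute):
    """Return module_name.attribute, or a stub that fails only when called."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        message = "%s is unavailable: %s" % (module_name, exc)

        def unavailable(*args, **kwargs):
            # Raised at call time, so merely importing the caller stays safe.
            raise ImportError(message)
        return unavailable
    return getattr(module, attribute)
```

With this, a scheduler module could bind `get_release_uptake = load_optional("util.tuxedo", "get_release_uptake")` at import time, and masters that never run the mirror-monitoring scheduler would not hit the ImportError shown above.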
Depends on: 620641
The Bouncer API backend (Tuxedo) uses https for the REST API URLs, which requires the Python OpenSSL module to be installed on builder masters. IIRC, some of the masters already have this module installed. Bug 620641 filed.
Priority: P3 → P2
A silly mistake. I should have used "available" instead of "total"...
Attachment #503801 - Flags: review?(catlee)
Attachment #503801 - Flags: review?(catlee) → review+
Attachment #503801 - Flags: checked-in+
No longer blocks: 478420
oldbugs--!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Verified during the Firefox 4.0b11 release cycle.
Status: RESOLVED → VERIFIED
Product: mozilla.org → Release Engineering