Closed Bug 421917 Opened 17 years ago Closed 17 years ago

Talos changes for improvements to staging setup

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Attachments

(2 files, 2 obsolete files)

As part of bug 419978, we've been planning to improve the virus scanning for everything that goes onto ftp.m.o. Currently new files are scanned after they are published, and there is no delay between a tinderbox pushing a build and it being downloadable. The new system will go one better and scan them before they become available, so there will be a short delay. The directories used by Talos have a special fast-path to a scanner, so that time-critical bits don't (eg) get stuck behind a huge pile of l10n hourlies. We're testing this now, but the time-to-scan-and-publish should be less than 10 minutes. The other change is that http://stage.m.o will no longer be accessible, and http://ftp.m.o should be used instead. This attachment is a stab at the changes needed. There's currently no difference between ftp.m.o and stage.m.o, so I think this should be safe to use now.
Blocks: 419978
Attached patch: Use stage-old.m.o (obsolete)
Alternatively, the existing staging server will be available at stage-old.m.o (once bug 421915 is fixed), so this is an option to preserve the status quo while any bugs shake out.
Rob, Alice: sorry for the late notice; the full impact on Talos only occurred to me today. We had Thursday pencilled in for this change, but need your feedback on how plausible this is. 20 questions time: Did I put enough information into comment #0? How much testing/prep would make you comfortable with this change? Are the patches any good?
Blocks: 394069
No longer blocks: 419978
From my understanding of comment #0:
- talos exclusively uses stage.m.o links to download builds, so we'd need to switch that over to ftp.m.o
- the builds will be published as available but we won't be able to download them
So, for the switch to ftp.m.o I'd need a timeline of when that should be done and then we can get that fixed. I'm more concerned about tinderbox publishing builds as complete/successful and them not being available for download. We've already had some problems in the past with talos attempting to download builds that aren't there and I'd like to avoid that in the future. Can I get a better idea of how the talos buildmaster could realize that a build is completed but not yet available? Ideally, we'd have some in-between state where a build was completed/unavailable and then switched to completed/available, or we'd simply include the virus scan as part of the build and consider it incomplete without it.
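To make the in-between state concrete, here is a rough Python sketch of the completed/unavailable versus completed/available distinction. All names (BuildState, classify_build, should_start_talos_run) are hypothetical and not taken from the Talos buildmaster or tinderbox code; how "download_reachable" would actually be determined is left open.

```python
from enum import Enum


class BuildState(Enum):
    """Hypothetical states for a build as seen by the Talos buildmaster."""
    IN_PROGRESS = "in progress"
    COMPLETED_UNAVAILABLE = "completed/unavailable"  # built, still waiting on the scan
    COMPLETED_AVAILABLE = "completed/available"      # built, scanned and published


def classify_build(build_finished: bool, download_reachable: bool) -> BuildState:
    """Map what tinderbox reports and what the download server serves to a state.

    Both inputs are assumptions for illustration: build_finished would come from
    tinderbox, download_reachable from checking the published location.
    """
    if not build_finished:
        return BuildState.IN_PROGRESS
    if download_reachable:
        return BuildState.COMPLETED_AVAILABLE
    return BuildState.COMPLETED_UNAVAILABLE


def should_start_talos_run(state: BuildState) -> bool:
    """Only start a test run once the build can actually be downloaded."""
    return state is BuildState.COMPLETED_AVAILABLE
```

With a model like this, a build that tinderbox reports as finished but that has not yet cleared the scanner would stay in completed/unavailable and no download would be attempted.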
Sorry, I just looked over the patches and that would be enough to switch from stage.m.o to ftp.m.o/stage-old.m.o. But, I'm still more concerned about talos seeing a build as available and then failing on attempting to download it.
I'm not sure how easy this will be to do in buildbot, but here goes. From the looks of tinderboxpoller.py, it's currently pulling quickparse.txt and knows there is something to test when the timestamp changes. With all the hourlies going into one directory, we could test the timestamp on the file. If it's later than the stamp from tinderbox (build start time) then the build has been scanned and published, and the Change should be fired. Otherwise, don't update the saved state and check again on the next poll. Once we keep 24 hours' worth of hourlies (bug 291167) then there's a unique dir & file to test for (using the build start time). I guess there is a danger when Talos's 10 min poll interval + the scan lag is more than a tinderbox cycle time (eg 15 mins on Fx/Trunk/Linux), so that start times and builds get scrambled. Perhaps we can mitigate that by polling more frequently, and/or making sure the scan fast-path doesn't wait long between its checks for new builds. Probably also good to use the uncached quickparse at (eg) http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox&quickparse=1
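A minimal Python sketch of the check described above, under stated assumptions: this is not the real tinderboxpoller.py, and PollerState, poll_once and builds_may_scramble are hypothetical names. Timestamps are assumed to be epoch seconds, with the build start time coming from quickparse and the modification time from the published file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollerState:
    """Saved state between polls (hypothetical; not the real poller's state)."""
    last_fired_build_start: Optional[int] = None


def poll_once(state: PollerState, build_start: int,
              published_mtime: Optional[int]) -> bool:
    """Return True if a Change should be fired on this poll.

    build_start: build start time reported by tinderbox.
    published_mtime: mtime of the published file, or None if it is missing.
    """
    if build_start == state.last_fired_build_start:
        return False  # tinderbox timestamp unchanged, nothing new to test
    if published_mtime is None or published_mtime <= build_start:
        # Not yet scanned and published: leave the saved state alone so the
        # same build is checked again on the next poll.
        return False
    state.last_fired_build_start = build_start
    return True


def builds_may_scramble(poll_interval_min: float, scan_lag_min: float,
                        cycle_time_min: float) -> bool:
    """True when poll interval plus scan lag exceeds one tinderbox cycle."""
    return poll_interval_min + scan_lag_min > cycle_time_min
```

Taking the 10 minute poll interval and treating the "less than 10 minutes" scan time from comment #0 as a worst case, builds_may_scramble(10, 10, 15) is true for a 15 minute cycle, which is the risk noted above; shorter poll intervals or a faster scan path bring the sum back under the cycle time.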
If we need time to get something like comment #5 into place, then the fallback plan is to have Talos builds not go thru the scanner yet. Then we'd only need to switch to ftp.m.o to keep Talos going.
It does sound like there would need to be some work on tinderboxpoller.py, along with testing/baking time to ensure that it was working correctly. We could split that into another bug and go with switching to ftp.m.o if you want to move this ahead quickly.
Sounds good.
Assignee: nobody → nrthomas
Priority: -- → P2
Attachment #308425 - Flags: review?(anodelman)
Switch the 6 machines doing 1.8branch and trunk nightlies/hourlies to push to stage-old.m.o until we can teach Talos how to cope with the scan lag.
Attachment #308427 - Attachment is obsolete: true
Attachment #309164 - Flags: review?(rhelmer)
Farmed the talos changes out to bug 422725.
Comment on attachment 309164: Firefox Trunk & Moz1.8 should not go thru the scanner for now
Oops, these should be done in the bootstrap config.
Attachment #309164 - Attachment is obsolete: true
Attachment #309164 - Flags: review?(rhelmer)
Bootstrap changes for 1.8 branch (and 1.9 for completeness), and tinder-config.pl for trunk.
Attachment #309179 - Flags: review?(rhelmer)
Attachment #309179 - Flags: review?(rhelmer) → review+
Attachment #308425 - Flags: review?(anodelman) → review+
Comment on attachment 309179: [checked in] Firefox Trunk & Moz1.8 should not go thru the scanner for now - v2
Checking in release-auto-nightly/fx-moz18-nightly-bootstrap.cfg;
/cvsroot/mozilla/tools/release/configs/fx-moz18-nightly-bootstrap.cfg,v <-- fx-moz18-nightly-bootstrap.cfg
new revision: 1.12; previous revision: 1.11
done
Checking in release-auto-nightly/fx-moz19-nightly-bootstrap.cfg;
/cvsroot/mozilla/tools/release/configs/fx-moz19-nightly-bootstrap.cfg,v <-- fx-moz19-nightly-bootstrap.cfg
new revision: 1.9; previous revision: 1.8
done
Checking in trunk/firefox/linux/tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/linux/tinder-config.pl,v <-- tinder-config.pl
new revision: 1.24; previous revision: 1.23
done
Checking in trunk/firefox/macosx/tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/macosx/tinder-config.pl,v <-- tinder-config.pl
new revision: 1.40; previous revision: 1.39
done
Checking in trunk/firefox/win32/tinder-config.pl;
/cvsroot/mozilla/tools/tinderbox-configs/firefox/win32/tinder-config.pl,v <-- tinder-config.pl
new revision: 1.31; previous revision: 1.30
done
Attachment #309179 - Attachment description: Firefox Trunk & Moz1.8 should not go thru the scanner for now - v2 → [checked in] Firefox Trunk & Moz1.8 should not go thru the scanner for now - v2
Checking in master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/testing/talos/perfmaster/master.cfg,v <-- master.cfg
new revision: 1.46; previous revision: 1.45
done
Attachment #308425 - Attachment description: Use ftp.m.o → [checked in] Use ftp.m.o
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
A "buildbot reconfig" worked fine this time.
Product: mozilla.org → Release Engineering