Closed
Bug 1148424
Opened 9 years ago
Closed 9 years ago
don't block cdntest update channels on antivirus completion
Categories
(Release Engineering :: Release Automation: Other, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
Details
Attachments
(2 files)
6.04 KB,
patch
|
rail
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
2.90 KB,
patch
|
nthomas
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
We discussed this at length on release-drivers, here's the summary: * Making cdntest updates available prior to AV completion would let QE run tests in parallel with it, which shaves off time in the critical path. * We still want to block shipping releases on running antivirus. * If antivirus has a failure after we pushed to cdn, one of two things will happen: ** It will be a false positive, as it usually is. We'll want to address that as usual (get the file whitelisted by the vendors). ** It will a real failure. This would most likely indicate some sort of failure of security, and we probably won't be able to ship anything until we get to the bottom of it. The fact that we've put files on the cdn already doesn't really matter at this point. Once completed, Betas, beta-cdntest for RCs, and ESR releases should all end up ready for cdntest testing much sooner than before. release-cdntest will no longer be blocked on antivirus, but it will still wait for the "push to mirrors" step to be run, which doesn't happen automatically. We will have the option of making it available much sooner than in the past though.
Assignee | ||
Comment 1•9 years ago
|
||
It's a little tough to verify this because the scheduler graphs I generate don't work with triggered builders. But, I'm pretty confident this does the trick. ready-for-cdntest is triggered by the start_uptake_monitoring builder, which is triggered by the TriggerBouncerCheck scheduler ("uptake_check"). That has potential upstreams of push_to_mirrors (if requiresMirrors is True), and updates (if enabled). With this patch moving push_to_mirrors as _always_ downstream of updates, it means that antivirus is never in this chain of events. I also fixed a bug that was recently introduced where Fennec schedulers end up depending on the Firefox antivirus builder because of bad naming. These two commits might make this patch easier to review: https://github.com/mozilla/build-buildbotcustom/commit/234eced6c5b2a3a72171ea4123902c7a33d2f496 https://github.com/mozilla/build-buildbotcustom/commit/337b0588825f77670800117d25137521748834f8 Here's the before and after scheduler graphs, in case it helps: http://people.mozilla.org/~bhearsum/beforebug1148424/ http://people.mozilla.org/~bhearsum/afterbug1148424/
Attachment #8584657 -
Flags: review?(rail)
Updated•9 years ago
|
Attachment #8584657 -
Flags: review?(rail) → review+
Assignee | ||
Comment 2•9 years ago
|
||
Planning to land this right after 38.0b1 ships to avoid any potential risk to important releases.
Comment 3•9 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #0) > * If antivirus has a failure after we pushed to cdn, one of two things will > happen: > ** It will be a false positive, as it usually is. We'll want to address that > as usual (get the file whitelisted by the vendors). > ** It will a real failure. This would most likely indicate some sort of > failure of security, and we probably won't be able to ship anything until we > get to the bottom of it. The fact that we've put files on the cdn already > doesn't really matter at this point. I'd very much want to origin-delete and CDN-purge all files in a directory firefox/releases/<infected-version>/ if we get an AV failure, since being in firefox/releases/ implies we've released it to the outside world. No reason that can't happen in parallel to getting to the bottom of it though, and repushing is cheap if the files aren't changing (aka flaky CDN purge is not an issue).
Assignee | ||
Comment 4•9 years ago
|
||
Comment on attachment 8584657 [details] [diff] [review] don't block anything except "ready for release" on antivirus Pushed this, should go to production today. 37.0.1 is starting soon, and this may help it ship faster.
Attachment #8584657 -
Flags: checked-in+
Comment 5•9 years ago
|
||
In production: https://hg.mozilla.org/build/buildbotcustom/rev/1f45eb189b13
Assignee | ||
Comment 6•9 years ago
|
||
Hard to tell if this worked or not because 38.0b2's cdntest channel push was delayed for other reasons. 38.0b3 will probably be a proper verification, leaving open for now.
Comment 7•9 years ago
|
||
I think there's a bug here. Previously av took a while and then the push happened, now there isn't time for checksums to run before the push starts. In 38.0b3 both checksums and check_permissions happened after the push was finished so we don't have SUMS files in http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit a file permission error, which looks like it happened to check right between the SUMs file upload and chmod. This may have been a long standing issue and we were just unlucky to hit it at the same time.
Comment 8•9 years ago
|
||
Before I fixed this up the dir listing was nthomas@upload1:/pub/mozilla.org/firefox/releases/38.0b3$ ls linux-i686 linux-x86_64 mac source update win32 win64
Assignee | ||
Comment 9•9 years ago
|
||
Verified to be working in 38.0b3. beta-cdntest became available ~6min after beta-localtest, which was hours before AV finished.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 10•9 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #7) > I think there's a bug here. Previously av took a while and then the push > happened, now there isn't time for checksums to run before the push starts. > In 38.0b3 both checksums and check_permissions happened after the push was > finished so we don't have SUMS files in > http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit > a file permission error, which looks like it happened to check right between > the SUMs file upload and chmod. This may have been a long standing issue and > we were just unlucky to hit it at the same time. Ack, sorry for not reading comments before closing the bug. I'll look into this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #7) > I think there's a bug here. Previously av took a while and then the push > happened, now there isn't time for checksums to run before the push starts. > In 38.0b3 both checksums and check_permissions happened after the push was > finished so we don't have SUMS files in > http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit > a file permission error, which looks like it happened to check right between > the SUMs file upload and chmod. This may have been a long standing issue and > we were just unlucky to hit it at the same time. The file permission thing is interesting. We've previously seen races that cause failures in the AV builder (bug 848351), but those happen because "checksums" uploads mid-scan. I'm guessing that check_permissions can only his that same race condition...
Assignee | ||
Comment 12•9 years ago
|
||
Alright, I think I've got this sorted. I've got new graphs up in http://people.mozilla.org/~bhearsum/afterbug1148424-round2/. The Firefox beta and release+beta ones are awfully difficult to read, sorry about that. Basically, we're now blocking on all "updates" builders as well as check permissions and checksums. I think this actually fixed another potential bug, where you'd fire push_to_mirrors twice if you had multiple channels and enableAutomaticPushToMirrors enabled, because you'd have two "updates_done" builders which both had push_to_mirrors as a downstream. Doing so let me get rid of the "post_update_builders" list, because it only contained update_verify_builders now. The graphs look sane to me now...the beta releases have push_to_mirrors downstream of updates, check_permissions, and checksums. The other releases have no upstreams for push_to_mirrors (because it's manually triggered), which leaves checksums and check_permissions with no downstreams. Technically, we should make these upstreams to "ready for release", but that's too much churn for this fix IMO. I'd like to do it as part of bug 1150162, though.
Attachment #8590876 -
Flags: review?(nthomas)
Comment 13•9 years ago
|
||
Comment on attachment 8590876 [details] [diff] [review] fix push to mirrors dependencies Review of attachment 8590876 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, I think ;-) I resorted to braindump/buildbot-related/dump_master_json.py with a hack to include upstream builders to get something diffable, see http://people.mozilla.org/~nthomas/graph-diff.txt. Hoping that templates for TC task graphs will easier to untangle. ::: process/release.py @@ +1861,2 @@ > )) > + Nit, stray whitespace.
Attachment #8590876 -
Flags: review?(nthomas) → review+
Assignee | ||
Comment 14•9 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #13) > Comment on attachment 8590876 [details] [diff] [review] > fix push to mirrors dependencies > > Review of attachment 8590876 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good, I think ;-) I resorted to > braindump/buildbot-related/dump_master_json.py with a hack to include > upstream builders to get something diffable, see > http://people.mozilla.org/~nthomas/graph-diff.txt. Hoping that templates for > TC task graphs will easier to untangle. Oh, wow - this is much more diffable!
Assignee | ||
Comment 15•9 years ago
|
||
Comment on attachment 8590876 [details] [diff] [review] fix push to mirrors dependencies Landed with the whitespace fixed.
Attachment #8590876 -
Flags: checked-in+
Comment 16•9 years ago
|
||
In production: https://hg.mozilla.org/build/buildbotcustom/rev/43f28d202700
Assignee | ||
Comment 17•9 years ago
|
||
Firefox 38.0b5 validated this, finally. antivirus blocked on both checksums and check_permissions, and all three jobs passed \o/
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•