Closed Bug 1148424 Opened 8 years ago Closed 8 years ago

don't block cdntest update channels on antivirus completion

Categories

(Release Engineering :: Release Automation: Other, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Details

Attachments

(2 files)

We discussed this at length on release-drivers, here's the summary:
* Making cdntest updates available prior to AV completion would let QE run tests in parallel with it, which shaves off time in the critical path.
* We still want to block shipping releases on running antivirus.
* If antivirus has a failure after we pushed to cdn, one of two things will happen:
** It will be a false positive, as it usually is. We'll want to address that as usual (get the file whitelisted by the vendors).
** It will a real failure. This would most likely indicate some sort of failure of security, and we probably won't be able to ship anything until we get to the bottom of it. The fact that we've put files on the cdn already doesn't really matter at this point.

Once completed, Betas, beta-cdntest for RCs, and ESR releases should all end up ready for cdntest testing much sooner than before. release-cdntest will no longer be blocked on antivirus, but it will still wait for the "push to mirrors" step to be run, which doesn't happen automatically. We will have the option of making it available much sooner than in the past though.
It's a little tough to verify this because the scheduler graphs I generate don't work with triggered builders. But, I'm pretty confident this does the trick. ready-for-cdntest is triggered by the start_uptake_monitoring builder, which is triggered by the TriggerBouncerCheck scheduler ("uptake_check"). That has potential upstreams of push_to_mirrors (if requiresMirrors is True), and updates (if enabled). With this patch moving push_to_mirrors as _always_ downstream of updates, it means that antivirus is never in this chain of events.

I also fixed a bug that was recently introduced where Fennec schedulers end up depending on the Firefox antivirus builder because of bad naming. These two commits might make this patch easier to review:
https://github.com/mozilla/build-buildbotcustom/commit/234eced6c5b2a3a72171ea4123902c7a33d2f496
https://github.com/mozilla/build-buildbotcustom/commit/337b0588825f77670800117d25137521748834f8

Here's the before and after scheduler graphs, in case it helps:
http://people.mozilla.org/~bhearsum/beforebug1148424/
http://people.mozilla.org/~bhearsum/afterbug1148424/
Attachment #8584657 - Flags: review?(rail)
Attachment #8584657 - Flags: review?(rail) → review+
Planning to land this right after 38.0b1 ships to avoid any potential risk to important releases.
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> * If antivirus has a failure after we pushed to cdn, one of two things will
> happen:
> ** It will be a false positive, as it usually is. We'll want to address that
> as usual (get the file whitelisted by the vendors).
> ** It will a real failure. This would most likely indicate some sort of
> failure of security, and we probably won't be able to ship anything until we
> get to the bottom of it. The fact that we've put files on the cdn already
> doesn't really matter at this point.

I'd very much want to origin-delete and CDN-purge all files in a directory firefox/releases/<infected-version>/ if we get an AV failure, since being in firefox/releases/ implies we've released it to the outside world. No reason that can't happen in parallel to getting to the bottom of it though, and repushing is cheap if the files aren't changing (aka flaky CDN purge is not an issue).
Comment on attachment 8584657 [details] [diff] [review]
don't block anything except "ready for release" on antivirus

Pushed this, should go to production today. 37.0.1 is starting soon, and this may help it ship faster.
Attachment #8584657 - Flags: checked-in+
Hard to tell if this worked or not because 38.0b2's cdntest channel push was delayed for other reasons. 38.0b3 will probably be a proper verification, leaving open for now.
I think there's a bug here. Previously av took a while and then the push happened, now there isn't time for checksums to run before the push starts. In 38.0b3 both checksums and check_permissions happened after the push was finished so we don't have SUMS files in http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit a file permission error, which looks like it happened to check right between the SUMs file upload and chmod. This may have been a long standing issue and we were just unlucky to hit it at the same time.
Before I fixed this up the dir listing was
 nthomas@upload1:/pub/mozilla.org/firefox/releases/38.0b3$ ls
 linux-i686  linux-x86_64  mac  source  update  win32  win64
Verified to be working in 38.0b3. beta-cdntest became available ~6min after beta-localtest, which was hours before AV finished.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Nick Thomas [:nthomas] from comment #7)
> I think there's a bug here. Previously av took a while and then the push
> happened, now there isn't time for checksums to run before the push starts.
> In 38.0b3 both checksums and check_permissions happened after the push was
> finished so we don't have SUMS files in
> http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit
> a file permission error, which looks like it happened to check right between
> the SUMs file upload and chmod. This may have been a long standing issue and
> we were just unlucky to hit it at the same time.

Ack, sorry for not reading comments before closing the bug.

I'll look into this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Nick Thomas [:nthomas] from comment #7)
> I think there's a bug here. Previously av took a while and then the push
> happened, now there isn't time for checksums to run before the push starts.
> In 38.0b3 both checksums and check_permissions happened after the push was
> finished so we don't have SUMS files in
> http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/38.0b3/. We also hit
> a file permission error, which looks like it happened to check right between
> the SUMs file upload and chmod. This may have been a long standing issue and
> we were just unlucky to hit it at the same time.

The file permission thing is interesting. We've previously seen races that cause failures in the AV builder (bug 848351), but those happen because "checksums" uploads mid-scan. I'm guessing that check_permissions can only his that same race condition...
Alright, I think I've got this sorted. I've got new graphs up in http://people.mozilla.org/~bhearsum/afterbug1148424-round2/. The Firefox beta and release+beta ones are awfully difficult to read, sorry about that.

Basically, we're now blocking on all "updates" builders as well as check permissions and checksums. I think this actually fixed another potential bug, where you'd fire push_to_mirrors twice if you had multiple channels and enableAutomaticPushToMirrors enabled, because you'd have two "updates_done" builders which both had push_to_mirrors as a downstream.

Doing so let me get rid of the "post_update_builders" list, because it only contained update_verify_builders now.

The graphs look sane to me now...the beta releases have push_to_mirrors downstream of updates, check_permissions, and checksums. The other releases have no upstreams for push_to_mirrors (because it's manually triggered), which leaves checksums and check_permissions with no downstreams. Technically, we should make these upstreams to "ready for release", but that's too much churn for this fix IMO. I'd like to do it as part of bug 1150162, though.
Attachment #8590876 - Flags: review?(nthomas)
Comment on attachment 8590876 [details] [diff] [review]
fix push to mirrors dependencies

Review of attachment 8590876 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I think ;-)  I resorted to braindump/buildbot-related/dump_master_json.py with a hack to include upstream builders to get something diffable, see http://people.mozilla.org/~nthomas/graph-diff.txt. Hoping that templates for TC task graphs will easier to untangle.

::: process/release.py
@@ +1861,2 @@
>              ))
> +    

Nit, stray whitespace.
Attachment #8590876 - Flags: review?(nthomas) → review+
(In reply to Nick Thomas [:nthomas] from comment #13)
> Comment on attachment 8590876 [details] [diff] [review]
> fix push to mirrors dependencies
> 
> Review of attachment 8590876 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, I think ;-)  I resorted to
> braindump/buildbot-related/dump_master_json.py with a hack to include
> upstream builders to get something diffable, see
> http://people.mozilla.org/~nthomas/graph-diff.txt. Hoping that templates for
> TC task graphs will easier to untangle.

Oh, wow - this is much more diffable!
Comment on attachment 8590876 [details] [diff] [review]
fix push to mirrors dependencies

Landed with the whitespace fixed.
Attachment #8590876 - Flags: checked-in+
Firefox 38.0b5 validated this, finally. antivirus blocked on both checksums and check_permissions, and all three jobs passed \o/
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.