Closed Bug 1389312 Opened 4 years ago Closed 4 years ago

Make releases with stale or no release_eta publish to balrog using immediate scheduled_changes

Categories

(Release Engineering :: Release Automation: Other, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: tomprince)

References

Details

(Whiteboard: [releaseduty])

Attachments

(3 files)

http://mozilla-release-logs.s3.amazonaws.com/mozilla-release/firefox-55.0.1/build2/firefox_mozilla-release_publish_balrog-all-x95sg3qHTPGGYfBwl7Muvw-0

16:40:14     INFO -  Balrog request to https://aus4-admin.mozilla.org/api/rules/145
16:40:14     INFO -  Data sent: {'data_version': 153, 'mapping': 'Firefox-55.0.1-build2'}
16:40:14     INFO -  "POST /api/rules/145 HTTP/1.1" 400 188
16:40:14     INFO -  Caught HTTPError: {"status": 400, "exception": "This change requires signoff, it cannot be done directly. No Signoffs given", "type": "about:blank", "detail": "SignoffRequiredError", "title": "Bad Request"}

This is like bug 1388330 in that we're using the old rule change API and have enabled signoffs in the meantime. Separate bug because likely a releasetasks fix.
Priority: -- → P1
Whiteboard: [releaseduty]
Assignee: nobody → rgarbas
I believe this is broken because release_eta wasn't specified in ship-it.
If this is mandatory, we should either populate a default release_eta, or require release_eta in ship-it.
See Also: → 1395128
Assignee: rgarbas → nobody
Summary: Firefox X.0.Y releases fail in 'publish balrog' task → 'publish balrog' task fails if release_eta not set
I don't see the correlation between comment 0 and comment 1. Per the logs, I agree with comment 0, we're not allowed to modify rules directly.

I think we're now good. It's been several releases where release signoffs were correctly sent to Balrog. Hence, this bug is valid anymore. Do you agree, Nick?
Flags: needinfo?(nthomas)
I strongly suspect that RelMan always set a release eta now, but IMO that's not always the right thing to do.

In email I suggested that we only know an eta for scheduled releases like 'version X.0 on release', and sometimes X.0b1. Anything else (beta 2+, esr, X.0.Y release, Thunderbird) we ship when we're ready, and don't know the eta when starting the release. In those cases RelMan have to guess late enough that Balrog doesn't complain about about the eta being in the past, and not so late that we signoff and then it takes a long while to go live (or RelEng has a manual step to change the when on the scheduled change).

We also have an issue where the code only uses the scheduled change endpoints if release eta is set, otherwise it uses the old API. This doesn't make sense when all our Firefox changes require signoff.

My proposal was:
1, release eta shouuld only be set in ship-it for Firefox X.0 releases, when we know we're going to ship at 8am Pacific (or w/e). Otherwise we leave it blank
2, we change releasetasks and the balrog script so that
 a, it always submit scheduled changes
 b, if release eta isn't specified, or in the past, we use now + 2 minutes

RelMan were on board with this. Does it make sense to you ?
Flags: needinfo?(nthomas)
Thank you for the clarification, Nick! I understand what's going on. I updated the title to reflect the proposal.
Summary: 'publish balrog' task fails if release_eta not set → Make releases without release_eta published to balrog thanks to immediate scheduled_changes
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #3)
> I strongly suspect that RelMan always set a release eta now, but IMO that's
> not always the right thing to do.
> 
> In email I suggested that we only know an eta for scheduled releases like
> 'version X.0 on release', and sometimes X.0b1. Anything else (beta 2+, esr,
> X.0.Y release, Thunderbird) we ship when we're ready, and don't know the eta
> when starting the release. In those cases RelMan have to guess late enough
> that Balrog doesn't complain about about the eta being in the past, and not
> so late that we signoff and then it takes a long while to go live (or RelEng
> has a manual step to change the when on the scheduled change).
> 
> We also have an issue where the code only uses the scheduled change
> endpoints if release eta is set, otherwise it uses the old API. This doesn't
> make sense when all our Firefox changes require signoff.
> 
> My proposal was:
> 1, release eta shouuld only be set in ship-it for Firefox X.0 releases, when
> we know we're going to ship at 8am Pacific (or w/e). Otherwise we leave it
> blank
> 2, we change releasetasks and the balrog script so that
>  a, it always submit scheduled changes
>  b, if release eta isn't specified, or in the past, we use now + 2 minutes
> 
> RelMan were on board with this. Does it make sense to you ?

Found in triaging: FWIW, from outside of releaseduty - this makes a lot of sense ;-)
Assignee: nobody → mozilla
Comment on attachment 8921553 [details]
Bug 1389312: When pushing updates, always submit balrog rule changes as scheduled-changes.

https://reviewboard.mozilla.org/r/192564/#review197856

Seems promising but unfortunately we can't land this as it is.

I tried running it in Firefox staging with release_eta set to null and got https://archive.mozilla.org/pub/firefox/tinderbox-builds/jamun-noarch/release-jamun-firefox_updates-bm70-build1-build12.txt.gz. My guess is that Balrog doesn't accept aliases for signoffs and sends back that 400, ie bug 1374638, but that's what tools/scripts/build-promotion/balrog-release-pusher.py is given based on https://hg.mozilla.org/projects/jamun/file/default/testing/mozharness/configs/releases/dev_updates_firefox_beta.py (staging config!).

For Thunderbird the sequence of jobs is l10n repacks -> updates -> update verify. That's a blocking sequence now but will become async when ReleasePusher() starts using ReleaseScheduler() to modify the test channels. The scheduled change could take up to 3 minutes to be enacted (two plus up to 1 minute more for the agent to come along), but the update verify jobs are scheduled immediately after the updates job is done. So Balrog will return stale information for a short window, and I'm expecting we'll have to retrigger update verify jobs for spurious failures. You could work around this by adding a delay to tools/scripts/updates/balrog-release-pusher.py

Both of these problems are the result of using signoffs to modify the test channels, when one isn't required there. Any ideas to avoid that ?

::: lib/python/balrog/submitter/cli.py:509
(Diff revision 1)
>  
>      def run(self, productName, version, build_number, rule_ids, backgroundRate=None):
> -        name = get_release_blob_name(productName, version, build_number,
> -                                     self.suffix)
> -        for rule_id in rule_ids:
> -            data = {"mapping": name}
> +        when = arrow.now().shift(minutes=2)
> +        self.scheduler.run(
> +            productName=productName,
> +            versoin=version,

Looks like a typo here
Attachment #8921553 - Flags: review?(nthomas) → review-
FYI: I backed this out of https://hg.mozilla.org/users/stage-ffxbld/tools for now to unbreak staging releases. Feel free to reland when it's ready to be tested again.
I don't think I'lll get a chance to look at this week. I plan to pick it up again next week.
Comment on attachment 8921553 [details]
Bug 1389312: When pushing updates, always submit balrog rule changes as scheduled-changes.

https://reviewboard.mozilla.org/r/192564/#review197856

Based on the above, balrog-release-shipper.py (in two versions) is the only thing that pushes to release channels (and so require signoff). So we can just update them to use scheduled changes. I've changed ReleaseScheduer to default to 2 minutes if not passed a time.
Comment on attachment 8921553 [details]
Bug 1389312: When pushing updates, always submit balrog rule changes as scheduled-changes.

https://reviewboard.mozilla.org/r/192564/#review206226

::: lib/python/balrog/submitter/cli.py:528
(Diff revision 2)
> -    def run(self, productName, version, build_number, rule_ids, when, backgroundRate=None):
> +    def run(self, productName, version, build_number, rule_ids, when=None, backgroundRate=None):
>          name = get_release_blob_name(productName, version, build_number,
>                                       self.suffix)
> +
> +        if when is not None:
> +            when = arrow.get(when)

Very close! Could we please test for `when` already in the past, and also use now + 2 min for that case.
Attachment #8921553 - Flags: review?(nthomas) → review-
Comment on attachment 8921553 [details]
Bug 1389312: When pushing updates, always submit balrog rule changes as scheduled-changes.

https://reviewboard.mozilla.org/r/192564/#review208010

Tested fine in staging with https://github.com/mozilla-releng/releasetasks/pull/296. I'd like to land them together.
Attachment #8921553 - Flags: review?(nthomas) → review+
Comment on attachment 8921553 [details]
Bug 1389312: When pushing updates, always submit balrog rule changes as scheduled-changes.

https://hg.mozilla.org/build/tools/rev/af7058b4e00bf49bfa37cbd0f9de813448407911
Attachment #8921553 - Flags: checked-in+
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #14)
> Tested fine in staging with
> https://github.com/mozilla-releng/releasetasks/pull/296. I'd like to land
> them together.

https://github.com/mozilla-releng/releasetasks/commit/17b85980223e2f7e4cb2c8468f1ac8e60c7a77a2 and deployed to staging and prod.
Removes these buildbot builders:
  release-{jamun,maple,mozilla-beta}-{devedition,firefox}_publish_balrog
  release-mozilla-{release,esr52}-firefox_publish_balrog
while leaving alone
  release-{jamun,maple,mozilla-beta}-{devedition,firefox}_schedule_publishing_in_balrog
  release-mozilla-{release,esr52}-firefox_schedule_publishing_in_balrog

Thunderbird never had two sets of builders, and the code path of the existing builder changes in the tools patch.
Attachment #8931982 - Flags: review?(rail)
Attachment #8931982 - Flags: review?(rail) → review+
I've moved the tags THUNDERBIRD_58_0b1_BUILD3 & THUNDERBIRD_58_0b1_RELEASE from tools rev c43e705ae164 to af7058b4e00b to test this fix. The only changes in that range are the update verify/patcher config for that release. 

The release-comm-beta-update_shipping_beta builder succeeded in scheduling a change with that. It sent email with subject
  [release] Thunderbird 58.0b1 build3: Updates available on beta
which is incorrect. The templates for those messages are in https://dxr.mozilla.org/build-central/source/buildbot-configs/mozilla/release_templates/update_shipping_beta_success and friends.
Should also fix up the Firefox email message, generated at https://github.com/mozilla-releng/releasetasks/blob/master/releasetasks/templates/notification/email_release_drivers_task.yml.tmpl#L12. It sends 'updates are available' messages when release_eta is not set, which is untrue because only a change has been scheduled.

Probably it would be better to change both cases to a 'scheduled changes are ready for signoff' type message, which gets us a notification we've been missing anyway.
Comment on attachment 8933085 [details]
Bug 1389312: Change update message to indicate that updates are scheduled, rather than available.

https://reviewboard.mozilla.org/r/204058/#review209568
Attachment #8933085 - Flags: review?(nthomas) → review+
Duplicate of this bug: 1392703
Duplicate of this bug: 1395128
Summary: Make releases without release_eta published to balrog thanks to immediate scheduled_changes → Make releases with stale or no release_eta publish to balrog using immediate scheduled_changes
Duplicate of this bug: 1388330
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Oh, that slipped my memory. I haven't had a chance to examine that test failure.
Looks like it was a regression in pytest 3.3.0 that was fixed in 3.3.1. The PR has been merged at 
 https://github.com/mozilla-releng/releasetasks/commit/a759cc7ececc13e0e3f2eb8e8a1e799573ab9770

Thanks for the fixes.
You need to log in before you can comment on or make changes to this bug.