Closed Bug 1425571 Opened 2 years ago Closed 2 years ago

Use taskcluster-notify for release notifications

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set

Tracking

(firefox-esr60 fixed, firefox60 fixed, firefox61 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr60 --- fixed
firefox60 --- fixed
firefox61 --- fixed

People

(Reporter: nthomas, Assigned: nthomas)

References

Details

Attachments

(4 files, 11 obsolete files)

59 bytes, text/x-review-board-request
aki
: review+
Details
59 bytes, text/x-review-board-request
aki
: review+
Details
59 bytes, text/x-review-board-request
aki
: review+
Details
1.64 KB, patch
aki
: review+
Details | Diff | Splinter Review
... so that we can get notifications on failures/exception via pulse notify.

Yesterday Rok fixed up prod pulse notify to listen on maple but we noticed that some tasks don't have index.releases.v1... routes set like releasetasks does:
- balrog submitters like balrog-bs-android-api-16-nightly & co (all android and en-US desktop)
- beetmover-checksums-macosx64-devedition-nightly/opt & co
- push to cdns
- final verify
- update verify
- updates
Blocks: 1397773
Most of this is moving routes into job-defaults and using {product} to get back the same result. In release-updates-builder/kind.yml some routes are added.
Attachment #8938320 - Flags: review?(bhearsum)
This is much more speculative. 

taskcluster/taskgraph/transforms/task.py is the best place to start. The idea is that there are jobs which are shared between nightly and release graphs, and some which don't have any of the index or route config set right now. So define some templates and add them based on shipping_phase being set, and no obvious release routes already present. It's a bit hacky, and may catch more than intended in the en-US builds.

release-source and release-update-verify leverage the default routes by just defining release type in index.

In the dummy workers, convert to job-defaults to avoid repetition, and add index: type: release. job-defaults blows up in post-beetmover-checksums-dummy, so do it the dummy way.

I've been testing this using 
./mach taskgraph target-graph -p /path/to/braindump/taskcluster/taskgraph-diff/params/mb-promote-firefox/mb-promote-firefox.yml --json
then various inspection techniques. The patch makes sure there are routes defined on all jobs (based on the shipping_phase attribute) except for the 5 en-US desktop builds (eg build-linux-nightly/opt). I'm not sure they are all valid/sensible.

Known issues
* not enough testing! eg taskgraph-gen/taskgraph-diff
* is / valid in an index ?
* en-US builds like build-signing-linux64-nightly/opt gets a release route, probably shouldn't
* release-update-verify don't end up with product set, eg routes of   index.releases.v1.mozilla-beta.latest.None.release-update-verify-firefox-linux-2/12
Attachment #8938326 - Flags: feedback?(bhearsum)
Comment on attachment 8938320 [details] [diff] [review]
Part 1, simple fixes to exisiting routes, and updates-builder

Review of attachment 8938320 [details] [diff] [review]:
-----------------------------------------------------------------

This one looks good to me. I ran it through taskgraph-diff and the only changes were the addition of indexes to the updates builder, like:
[
    "add", 
    "release-updates-builder-firefox.task.routes", 
    [
        [
            0, 
            "index.releases.v1.mozilla-release.latest.firefox.latest.updates"
        ], 
        [
            1, 
            "index.releases.v1.mozilla-release.196059cada7070ab1e35db83a22b60389bd0c794.firefox.58_0b23.build1.updates"
        ]
    ]
]
Attachment #8938320 - Flags: review?(bhearsum) → review+
Comment on attachment 8938326 [details] [diff] [review]
Part 2, add default release routes

Review of attachment 8938326 [details] [diff] [review]:
-----------------------------------------------------------------

The code looks reasonable, modulo one comment below, but something isn't working as intended here - I don't see any release routes on some tasks (eg: final verify).

::: taskcluster/taskgraph/transforms/task.py
@@ +1221,5 @@
>  
>      for rt in task.get('routes', []):
>          routes.append(rt.format(**subs))
>  
> +    # Append default release routes when none are present, so that we get

This comment says "when none are present" but the block below uses < 2 to determine that. Is that on purpose? It might be safer to look for the entire route rather than counting.
Attachment #8938326 - Flags: feedback?(bhearsum) → feedback+
(In reply to Ben Hearsum (:bhearsum) from comment #4)
> The code looks reasonable, modulo one comment below, but something isn't
> working as intended here - I don't see any release routes on some tasks (eg:
> final verify).

I've only looked at the promote graph, so it's likely there are other tasks in push and ship which need some extra help.

> ::: taskcluster/taskgraph/transforms/task.py
> > +    # Append default release routes when none are present, so that we get
> 
> This comment says "when none are present" but the block below uses < 2 to
> determine that. Is that on purpose? It might be safer to look for the entire
> route rather than counting.

I agree that the comment isn't consistent with the code, and the check isn't very robust. TBH I wasn't sure how to avoid reconstructing the middle part of the route to do a better check.
Duplicate of this bug: 1289980
Duplicate of this bug: 1430772
In bug 1430772 RelMan mentioned that we're losing some of the failures along the way. I did some debugging and reached the same conclusion as here, then found this bug :facepalm: :) 

@nthomas: 
Are the patches from this bug still good, can we retest against latest HEAD and land? I'm happy to help if your plate is full with other stuff, just want to make sure we're getting these notifications fixed asap sincer we're live in beta with the in-tree scheduling.
Flags: needinfo?(nthomas)
Attached patch Part 2, add default routes v2 (obsolete) — Splinter Review
This isn't final but makes some progress. From looking at taskgen-gen/diff output:

What it does do:
* defines routes on all release jobs, with one exception. To be clear I haven't vetted them all, and dictdiffer makes it hard to tell
* make sure we don't use any / in index paths

What it doesn't do:
* handle firefox-push-to-cdns jobs. There are routes added to the kind.yml but they don't get substitutions from taskgraph.transforms.tasks.add_release_index_routes, while other jobs do

What it does but shouldn't do:
* The change to taskgraph.transforms.tasks.add_index_routes must be too broad, because index.v1.releases routes get added for fennec and dekstop nightly builds, jobs like 
   balrog-ach-android-api-16-nightly/opt
   beetmover-android-aarch64-nightly/opt
   beetmover-checksums-android-aarch64-nightly/opt
   beetmover-l10n-ca-android-api-16-nightly/opt
   checksums-signing-bs-android-api-16-nightly/opt
   nightly-l10n-android-api-16-nightly-10/opt
   nightly-l10n-signing-android-api-16-nightly-16/opt
* leave stray logging code
Flags: needinfo?(nthomas)
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #9)
> What it does but shouldn't do:
> * The change to taskgraph.transforms.tasks.add_index_routes must be too
> broad, because index.v1.releases routes get added for fennec and dekstop
> nightly builds, jobs like 
>    balrog-ach-android-api-16-nightly/opt
>    beetmover-android-aarch64-nightly/opt
>    beetmover-checksums-android-aarch64-nightly/opt
>    beetmover-l10n-ca-android-api-16-nightly/opt
>    checksums-signing-bs-android-api-16-nightly/opt
>    nightly-l10n-android-api-16-nightly-10/opt
>    nightly-l10n-signing-android-api-16-nightly-16/opt
> * leave stray logging code

Removing this has the opposite problem - release repacks, signing, beetmover etc don't have routes. I'm not familiar enough with the task generation to know if there is a better way to pick up the relevant jobs than shipping_phase.
This is a subset of previous part 2's, basically just re-arranging kind.yml files for some other builders.
Attachment #8938326 - Attachment is obsolete: true
Attachment #8944658 - Attachment is obsolete: true
Attachment #8945370 - Attachment description: Part, more simple fixes (dummies, final & update verify, source) → Part 2, more simple fixes (dummies, final & update verify, source)
Attached patch Part 3, beetmover-cdn fix (obsolete) — Splinter Review
This worker has a schema which accepts index and routes, just needs to pass then on to the later transforms for us to get routes and do substitutions in them.
We have some jobs that we use for nightlies on m-c and for release on beta. This is a good thing, except when they mean slightly different things in each case. I took the view that I didn't want to modify anything for try or m-c, only promote/push/ship for m-b and m-r.

We also aren't super consistent about how we set up jobs, hence the conditions on shipping_product (to ensure we catch all all jobs shared or exclusively for releases) and target_tasks_method (to ensure we're running a release action). This feels a bit 'bang it with a hammer until you get what you want'. Open to advice here.
This hasn't been tested in staging yet. Maple got a merge from m-c rather than beta, so has a version of 60.0a1, I'm not sure what our intentions are there. Had been expecting a merge from beta.

I have done a bunch of checks taskgraph-gen/diff, and will attach some files here. This was with a small patch to munge the dictdiffer output a bit - https://gist.github.com/nthomas-mozilla/fabab96b8a5b45d352aac08e7e9b9798. Wanted to reverse the order of the routes to make the diffs a bit shorter, and output json so I could read it back easily.

These graphs don't change:  {maple,mb,mc}-onpush.json,  mc-{desktop,fennec}-nightly.json, try,json.

Everything else does. Mostly it's adding two index.releases.v1 routes, with any / in the job names replaced with a -. The exceptions are the source builders, which lose
 "index.gecko.v2.mozilla-release.revision.196059cada7070ab1e35db83a22b60389bd0c794.firefox.linux64-source-opt",
 "index.gecko.v2.mozilla-release.pushlog-id.8069.firefox.linux64-source-opt",
 "index.gecko.v2.mozilla-release.latest.firefox.linux64-source-opt"
 "index.gecko.v2.mozilla-release.pushdate.2017.10.28.20171028042225.firefox.linux64-source-opt"
to gain
 "index.releases.v1.mozilla-release.196059cada7070ab1e35db83a22b60389bd0c794.firefox.59_0b3.build1.release-source-linux64-source-opt"
 "index.releases.v1.mozilla-release.latest.firefox.release-source-linux64-source-opt"

Over in the Fennec builds, like build-signing-android-api-16-nightly/opt, the jobs pick up a couple of index.releases.v1 while retaining their index.gecko.v2. dictdiffer has a hard time comparing two lists of routes, we don't actually care about the ordering but it does.
Attached file mb-ship-firefox.json.diff.xz (obsolete) —
Attached file mb-ship-fennec.json.diff.xz (obsolete) —
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #14)
> This hasn't been tested in staging yet. Maple got a merge from m-c rather
> than beta, so has a version of 60.0a1

I don't know what happened, but :mtabara and I saw 59.0b4 at the current tip[1]. Then, I went ahead and landed your 3 patches[2]. However, some missing scopes broke the promote_* tasks[3]. I backed out the patches[4]. 

[1] https://hg.mozilla.org/projects/maple/file/cd6b97fce1ec/browser/config/version_display.txt
[2] https://treeherder.mozilla.org/#/jobs?repo=maple&revision=3bef821e5950056e8d7f8c4fb8639d59f281f1ab
[3] https://public-artifacts.taskcluster.net/fluzVadXSKyPJLb1JWqmZQ/0/public/logs/live_backing.log and https://public-artifacts.taskcluster.net/Z9bjt1rGR8aKooivcsWnYw/0/public/logs/live_backing.log
[4] https://treeherder.mozilla.org/#/jobs?repo=maple&revision=41b83945f6894219ef4b3f6d69b8a8e741b2cb96
Comment on attachment 8945370 [details] [diff] [review]
Part 2, more simple fixes (dummies, final & update verify, source)

Review of attachment 8945370 [details] [diff] [review]:
-----------------------------------------------------------------

Neat cleanup overall to trim duplicate code!

::: taskcluster/ci/post-beetmover-checksums-dummy/kind.yml
@@ +23,1 @@
>           implementation: docker-worker

I wonder if we can move this under `job-defaults` here too and leave solely the `shipping_product` field as done for other tasks in this patch.
Attachment #8945370 - Flags: review+
Comment on attachment 8945374 [details] [diff] [review]
Part 3, beetmover-cdn fix

Review of attachment 8945374 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch!
Attachment #8945374 - Flags: review+
Comment on attachment 8945375 [details] [diff] [review]
Part 4, add release routes to jobs shared with nightlies when they're used for releases

Review of attachment 8945375 [details] [diff] [review]:
-----------------------------------------------------------------

I agree this type of munging is not ideal, but nor I have a better idea for now.
Sharing code logic between nightly/release graphs sort of implies some drawbacks too at some point, I suppose.

::: taskcluster/taskgraph/transforms/task.py
@@ +1261,5 @@
>          routes.append(rt.format(**subs))
>  
> +    # Append default release routes when the normal 2 aren't present, so that we get
> +    # notifications via pulse-notify, and logs in the index
> +    if len(filter(lambda r: r.startswith('index.releases.v1'), routes)) < 2:

if by any magic one of the `index.release.v1` routes ends up here wouldn't we add both entries from RELEASE_TEMPLATES, hence duplicating the original one?
Attachment #8945375 - Flags: review+
Comment on attachment 8945375 [details] [diff] [review]
Part 4, add release routes to jobs shared with nightlies when they're used for releases

Re scopes,

15:56 <aki> we have "index:insert-task:gecko.v2.maple.*" so if you s,v1,v2 it may work
15:56 <aki> s,releases.v1,gecko.v2,
15:57 <aki> otherwise we can get index:insert-task:releases.v1.{project}.* added to the role

I don't know the layout. Mike Shal worked on the indexes iirc; he may know the history of v1 vs v2. I also don't know the downstream consumers who may be watching the nightly routes that will change to release routes. Outside of these concerns, I think these diffs look good.

> +        # Some release jobs are also part of nightly graphs, and should have release routes set
> +        # when used in release case (where target_task_method corresponds to an action name)

At some point this year, we'll have promoted nightlies, which means the nightly indexes may go away? We may eventually make this per-project, maybe.

> +        if task.get('attributes', {}).get('shipping_phase') in ('promote', 'push', 'ship') and \

We may add more phases at some point. I'd rather we make this a constant somewhere and import it, e.g.

RELEASE_PHASES = ('promote', 'push', 'ship')
KNOWN_PHASES = tuple([None, 'build'] + list(RELEASE_PHASES))

if shipping_phase in RELEASE_PHASES:
    ...
pulse-notify watches for index.releases.v1 changes. We could modify that but it might open up a different can of worms. Which role did you mean for adding the insert-task scope to ?

Good point about the constant.
https://tools.taskcluster.net/auth/roles/moz-tree:level:1:* if we want try staging releases to work.
https://tools.taskcluster.net/auth/roles/moz-tree:level:3:* if we're more concerned with only having these updated with release tree info, with the caveat that this is something we will have to follow up on to get try staging releases to work.
Looks like project:releng:branch:gecko:level-1:* instead (https://tools.taskcluster.net/auth/roles/project%3Areleng%3Abranch%3Agecko%3Alevel-1%3A*). That has
  queue:route:index.gecko.v2.<..>.*
  index:insert-task:gecko.v2.<..>.*
and
  queue:route:index.releases.v1.<..>.*
but not
  index:insert-task:releases.v1.<..>.*.

I'll open a TC service bug to add the last.
Depends on: 1436258
Depends on: 1436501
I'd pushed a WIP patch to maple at https://hg.mozilla.org/projects/maple/rev/16b32573d44d45588f477e6559f87064aa1c5d43, and with the scopes fixed (bug 1436258, thanks bhearsum) we were able to run the promote action. However a lot of papertrail alerts came in with this form:

Feb 07 19:54:37 heroku-release-notifications app/worker.1: 2018-02-07 19:54:37,551 - pulsenotify.consumer - ERROR - Task(id=WL9bHD2dRg64WTHEcXN3pw, status=task-completed) has no notifications section.
Feb 07 19:54:38 heroku-release-notifications app/worker.1: KeyError: 'notifications'
Feb 07 19:54:38 heroku-release-notifications app/worker.1: raise NoNotificationConfigurationError() from ke
Feb 07 19:54:38 heroku-release-notifications app/worker.1: pulsenotify.consumer.NoNotificationConfigurationError

So I'd missed that we need a task['extra']['notifications'] for pulse-notify, and for a firefox ship action 2174 of the 4750 are missing that. Backed out from maple at 704219569302.
We're going to use taskcluster-notify instead, which allows us to deprecate pulse-notify (probably after ESR52 dies). We don't need to set any index.releases.v1 routes to use tc-notify.
Summary: Add index.releases.v1 routes to in-tree scheduled tasks → Use taskcluster-notify for release notifications
Attached file Obsoleting older attachments (obsolete) —
Attachment #8938320 - Attachment is obsolete: true
Attachment #8945370 - Attachment is obsolete: true
Attachment #8945374 - Attachment is obsolete: true
Attachment #8945375 - Attachment is obsolete: true
Attachment #8945379 - Attachment is obsolete: true
Attachment #8945380 - Attachment is obsolete: true
Attachment #8950803 - Attachment is obsolete: true
Known followups
* add a test that ensures all release jobs have notifications set. One approach here is to use ./mach taskgraph test-action-callback, and then you know all the tasks in the generated graph are for promotion
* in the meantime watch for new/converted jobs that need new config and cleanup
* convert the release-notify-{promote,push,ship} over to tc-notify. This will require allowing setting the subject template in the kind, and fall back to the default if not specified. Similarly email destination.
Comment on attachment 8950805 [details]
Bug 1425571 - switch to taskcluster-notify for all but the notify tasks,

https://reviewboard.mozilla.org/r/220042/#review225866

:ship:
Attachment #8950805 - Flags: review?(aki) → review+
Comment on attachment 8950806 [details]
Bug 1425571 - remove deprecated pulse-notify config for jobs that moved to tc-notify,

https://reviewboard.mozilla.org/r/220044/#review225868

Awesome!
Attachment #8950806 - Flags: review?(aki) → review+
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #27)
> We're going to use taskcluster-notify instead, which allows us to deprecate
> pulse-notify (probably after ESR52 dies). We don't need to set any
> index.releases.v1 routes to use tc-notify.

This is music to my ears, greata job nthomas!
Pushed by nthomas@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6f856b4b8e27
switch to taskcluster-notify for all but the notify tasks, r=aki
https://hg.mozilla.org/integration/autoland/rev/22fa2b14913a
remove deprecated pulse-notify config for jobs that moved to tc-notify, r=aki
Keywords: leave-open
https://hg.mozilla.org/releases/mozilla-beta/rev/fe558d76faddcf10d0c0da32ac84f1289c9a7003
https://hg.mozilla.org/releases/mozilla-beta/rev/d0d8c94efadf5dd4006a0bd28a7cedeb06e5fd29

The first leaves out the change to release-secondary-update-verify/kind.yml in autoland's 6f856b4b8e27, because bug 1398799 isn't on beta yet. Left a note on that bug too.
Opened https://github.com/taskcluster/taskcluster-notify/issues/23 to look for solutions to the 'get spammed with 1000s of exception emails' when we cancel release graphs.
See Also: → 1374497
TODO - Log artifacts may have only a 2 week expiry, longer than that would be better, eg 1 year.
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #39)
> TODO - Log artifacts may have only a 2 week expiry, longer than that would
> be better, eg 1 year.

This is true for signing/repackage-signing jobs, via 
 https://github.com/mozilla-releng/signingworker/blob/8f83a8c4f76ef20e713b8ac6f4132587a04367f4/signingworker/worker.py#L176
Otherwise 1 year
 https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#1441
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #40)
> (In reply to Nick Thomas [:nthomas] (UTC+13) from comment #39)
> > TODO - Log artifacts may have only a 2 week expiry, longer than that would
> > be better, eg 1 year.
> 
> This is true for signing/repackage-signing jobs, via 
>  https://github.com/mozilla-releng/signingworker/blob/
> 8f83a8c4f76ef20e713b8ac6f4132587a04367f4/signingworker/worker.py#L176
> Otherwise 1 year
>  https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/
> transforms/task.py#1441

signingworker is funsize esr52 signing.
D'oh, wrong code reference. Here's the artifact listing from a Linux32 Ns from the tip of beta:
(https://queue.taskcluster.net/v1/task/eadDvmbwQfifN1gkh-nSFw/runs/0/artifacts)

{
  "artifacts": [
    {
      "storageType": "s3",
      "name": "public/build/target.tar.bz2",
      "expires": "2018-03-07T14:09:21.518Z",
      "contentType": "application/x-tar"
    },
    {
....
I think this kind of works, based on inspection of a few task definitions.

One thing to note is that you can only have one subject/message per task with tc-notify, so we'll get duplicate emails to both release-signoff and release-automation-notifications for these tasks, for all task statuses.
Attachment #8961737 - Flags: feedback?
Comment on attachment 8961737 [details] [diff] [review]
switch notify tasks to use tc-notify

>diff --git a/taskcluster/ci/release-notify-promote/kind.yml b/taskcluster/ci/release-notify-promote/kind.yml
>--- a/taskcluster/ci/release-notify-promote/kind.yml
>+++ b/taskcluster/ci/release-notify-promote/kind.yml

>    notifications:
>-      completed:
>-         subject: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} is in the candidates directory"
>-         message: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} is in the candidates directory"
>-         plugins: ["ses"]
>-         emails:
>-            by-project:
>-               mozilla-beta: ["release-signoff@mozilla.org"]
>-               mozilla-release: ["release-signoff@mozilla.org"]
>-               try: ["{task_def[metadata][owner]}"]
>-               maple: ["release+tcstaging@mozilla.com"]
>-               default: []
>+     subject: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} is in the candidates directory"

Looks like you're changing indents. `./mach lint -l yaml` will complain.

>+     message: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} is in the candidates directory"
>+     emails:
>+        by-project:
>+           mozilla-beta: ["release-signoff@mozilla.org"]
>+           mozilla-release: ["release-signoff@mozilla.org"]
>+           try: ["{task_def[metadata][owner]}"]
>+           maple: ["release+tcstaging@mozilla.com"]

We probably want to add birch.

>+           default: []
>    routes:
>       - index.releases.v1.{branch}.{revision}.{product}.{underscore_version}.build{build_number}.email-{channel}
>       - index.releases.v1.{branch}.latest.{product}.latest.email-{channel}
>    index:
>       type: release
>       channel:
>          by-project:
>             mozilla-beta: beta-candidates

I think we can drop index and routes; these are for pulse-notify.
Attachment #8961737 - Flags: feedback? → feedback+
(In reply to Chris AtLee [:catlee] from comment #44)
> One thing to note is that you can only have one subject/message per task
> with tc-notify, so we'll get duplicate emails to both release-signoff and
> release-automation-notifications for these tasks, for all task statuses.

Hm, I think we only need release-signoff for the -notify tasks, and release-automation-notifications for the non-notify tasks.
I can land this first on maple or other staging branch if you like.
Attachment #8961737 - Attachment is obsolete: true
Attachment #8962373 - Flags: review?(aki)
A little bit related: it looks like the current wording of the push cdn email isn't quite obvious for release management. For instance:

> 17:45:25 <RyanVM> jlorenzo: looks like firefox-push-to-cdns ran successfully but no email went out?
> 17:46:00 <jlorenzo> RyanVM: I think it did. I see "firefox 59.0.2 build1/mozilla-release has been pushed to releases". I got it 30 minutes ago 
> 17:46:09 <RyanVM> i saw that, is there one for release-cdntest also?
> 17:46:17 <RyanVM> or is that implied by the other?
> 17:46:52 <jlorenzo> I think this is the one for cdntest. I remember the wording isn't quite right. Let me check 
> 17:46:58 <RyanVM> oh, ok
> 17:47:01 <RyanVM> good to know :)
> 17:47:36 <•bhearsum> i think we fixed that in 60.0
> 17:47:50 <jlorenzo> yeah, that's the right email https://tools.taskcluster.net/groups/ScnkAhV0Tdm8_H5j8Ln9Ng/tasks/RAGByIPORBK1OiJ0xN15cQ/details (see "extra" section) 
> 17:49:47 <jlorenzo> bhearsum: same message in 60, though https://tools.taskcluster.net/groups/Ikhsz6f0SMqBXsnmNZJDog/tasks/bJgV3ac_RR-BetJqsIExBA/details 
> 17:50:50 <•catlee> is that something I should keep in mind for https://bugzilla.mozilla.org/show_bug.cgi?id=1425571 ?
Comment on attachment 8962373 [details] [diff] [review]
switch notify tasks to use tc-notify

(In reply to Chris AtLee [:catlee] from comment #47)
> Created attachment 8962373 [details] [diff] [review]
> switch notify tasks to use tc-notify
> 
> I can land this first on maple or other staging branch if you like.

That would make sense. We can land directly on beta if you're super confident :)

>diff --git a/taskcluster/ci/release-notify-push/kind.yml b/taskcluster/ci/release-notify-push/kind.yml
>+      subject: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} has been pushed to releases"
>+      message: "{task[shipping-product]} {release_config[version]} build{release_config[build_number]}/{config[params][project]} has been pushed to releases"

As Johan mentioned above, this subject line makes sense to me, but per #releaseduty, other people may find something about cdntest more clear.

>diff --git a/taskcluster/taskgraph/transforms/release_notifications.py b/taskcluster/taskgraph/transforms/release_notifications.py
>+        # Handle notification overrides
>+        notifications = job.get('notifications')
>+        if notifications:
>+            resolve_keyed_by(notifications, 'emails', label, project=config.params['project'])
>+            emails = notifications['emails']
>+            format_kwargs = dict(
>+                task=job,
>+                config=config.__dict__,
>+                release_config=release_config,
>+            )
>+            subject = notifications['subject'].format(**format_kwargs)
>+            message = notifications['message'].format(**format_kwargs)
>+            routes = ['notify.email.{email_dest}.on-any']

Hm. This is interesting. I suppose -notify tasks can fail or throw an exception, and we still want the email sent. I'm not sure this is expected behavior, though. It might make more sense to have a completed email subject line, and fall back to the SUBJECT_TEMPLATE.

If we keep this as is, let's add a comment to this block so we know that custom notifications get sent no matter what, and we don't differentiate by status.
Attachment #8962373 - Flags: review?(aki) → review+
Attachment #8963286 - Flags: review?(aki)
Attachment #8962373 - Attachment is obsolete: true
Comment on attachment 8963286 [details]
Bug 1425571: Replace pulse-notify with taskcluster-notify

https://reviewboard.mozilla.org/r/232186/#review237664

:ship:
Attachment #8963286 - Flags: review?(aki) → review+
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a4a40801ee2e
Replace pulse-notify with taskcluster-notify r=aki
Comment on attachment 8963286 [details]
Bug 1425571: Replace pulse-notify with taskcluster-notify

Approval Request Comment
[Feature/Bug causing the regression]:
[User impact if declined]: releng will be sad to support pulse-notify for the lifetime of esr60.
[Is this code covered by automated tests?]: 
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: no
[Why is the change risky/not risky?]: only impacts notification of release tasks
[String changes made/needed]: no
Attachment #8963286 - Flags: approval-mozilla-beta?
Comment on attachment 8963286 [details]
Bug 1425571: Replace pulse-notify with taskcluster-notify

Nobody wants a sad RelEng. Approved for 60.0b9.
Attachment #8963286 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Thanks for working on the last part here Chris. FWIW, the content of the emails is the usual tc-notify output instead of repeating the subject. That comes from https://hg.mozilla.org/releases/mozilla-beta/rev/a44672d0cf7c#l6.95 setting the 'message' key in the dict instead of 'content', see https://docs.taskcluster.net/reference/core/taskcluster-notify/docs/usage#setting-custom-messages.
ah, should we change that to 'content' then to have simpler emails?
I don't really mind myself, but cleaner may be better for RelMan/QE ?
Reopen to fix up the bodies on the notify kinds, and also only send them on success (which avoids false positives when cancelling graphs etc).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sends 'in the candidates directory', 'pushed to cdntest', etc emails only when the underlying dummy tasks succeed, avoiding false messages when graphs are cancelled or the dummy job fails for infrastructure reasons.

Also sets the body of the email to be the same string as the subject, as we used to do, to avoid unnecessary information about the dummy task.

tc-notify docs at https://docs.taskcluster.net/reference/core/taskcluster-notify/docs/usage.
Attachment #8971920 - Flags: review?(aki)
Attachment #8971920 - Flags: review?(aki) → review+
Pushed by nthomas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0247bf652299
fixups for release notifications, r=aki
https://hg.mozilla.org/releases/mozilla-release/rev/5754290305728aa8a2ba8da7ef48dd4f2e5dd51f
https://hg.mozilla.org/releases/mozilla-esr60/rev/aed520e7ef1df0cb23ecce2243f0f760be013f2d

beta will be taken care of by the sheriff merges from central.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Docs for tc-notify were incorrect - should say .on-completed instead of .on-success. Bug 1425571 fixes this up.
Depends on: 1459185
You need to log in before you can comment on or make changes to this bug.