Closed Bug 1500166 Opened Last year Closed Last year

notify ciduty by email if a nightly hook fails

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(firefox-esr60 fixed, firefox66 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr60 --- fixed
firefox66 --- fixed

People

(Reporter: aryx, Assigned: rmutter)

Details

Attachments

(1 file, 3 obsolete files)

Today code sheriffs manually requested desktop and Android Nightly builds.

The desktop cron task for that failed: https://tools.taskcluster.net/groups/TfhsspsISMaTtuQkfYTV4A/tasks/TfhsspsISMaTtuQkfYTV4A/details

The desktop one was requested at https://tools.taskcluster.net/hooks/project-releng/nightly-desktop%2fmozilla-central

It mentions "Email On Error? true", but neither the person who requested the Nightly nor the CI sheriff on duty was notified of this failure, which was not shown on Treeherder. The task details mention "Owner	release@m....c..."

As a result, nobody noticed that desktop Nightly builds were not running. Please notify either the CI duty team or the code sheriffs, who will then take care of it.
It looks like the hook itself succeeded, but the decision task failed.  To capture that, you'll want to modify the task to include a notify route for this cron task.  One option is to modify `.taskcluster.yml` to add a route to every cron task (there are already similar routes for actions and pushes).  But that may be more notifications than you want!

The other option is to modify the cron task generation for those specific cron tasks to add the route.  That would let you be a bit more precise about which jobs you want an alert from, but is more complex.
Component: Hooks → Task Configuration
Product: Taskcluster → Firefox Build System
@ciduty - comment 1 sounds like a good idea. Let's set up alerts for when cron tasks fail. We'll see how many alerts we get and then possibly add filters for the ones we care about.

Can someone please add a route to .taskcluster.yml and ask for review here? iiuc, it would be to this block:

https://dxr.mozilla.org/mozilla-central/source/.taskcluster.yml#94-97

with something along the lines of:

```
- "notify.email.ciduty+failedcron@mozilla.com.on-failed"
- "notify.email.ciduty+exceptioncron@mozilla.com.on-exception"
```
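The two proposed routes follow one pattern: a plus-addressed recipient per alert kind, suffixed with the Taskcluster notify state (`on-failed` or `on-exception`). A minimal sketch of generating them, assuming that pattern is kept for every mailbox; `cron_notify_routes` and `ALERT_KINDS` are hypothetical names for illustration, not part of `.taskcluster.yml`:

```python
from typing import List

# Hypothetical mapping: plus-address tag -> Taskcluster notify state,
# mirroring the two routes proposed above.
ALERT_KINDS = (
    ("failedcron", "on-failed"),
    ("exceptioncron", "on-exception"),
)


def cron_notify_routes(mailbox: str) -> List[str]:
    """Return notify.email routes for a mailbox such as 'ciduty@mozilla.com'.

    Each alert kind gets its own plus-addressed recipient, as in the proposal.
    """
    local, sep, domain = mailbox.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {mailbox!r}")
    return [
        f"notify.email.{local}+{tag}@{domain}.{state}"
        for tag, state in ALERT_KINDS
    ]
```

For example, `cron_notify_routes("ciduty@mozilla.com")` yields exactly the two routes listed above.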
Flags: needinfo?(ciduty)
Attached patch diff.txt (obsolete) — Splinter Review
Assignee: nobody → rmutter
Flags: needinfo?(ciduty)
Attachment #9018756 - Flags: review?(dustin)
Comment on attachment 9018756 [details] [diff] [review]
diff.txt

Review of attachment 9018756 [details] [diff] [review]:
-----------------------------------------------------------------

::: .taskcluster.yml
@@ +99,5 @@
>                  - "index.gecko.v2.${repository.project}.latest.firefox.decision-${cron.job_name}"
> +                - "notify.email.ciduty+failedcron@mozilla.com.on-failed"
> +                - "notify.email.ciduty+exceptioncron@mozilla.com.on-exception"
> +                - "notify.email.${ownerEmail}.on-failed"
> +                - "notify.email.${ownerEmail}.on-exception"

We probably don't want to email the owner here. For cron tasks, that is probably going to be cron@noreply.mozilla.org, which probably doesn't accept mail. I suspect that trying to email it might cause us issues with other mail delivery as well.
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1)
> It looks like the hook itself succeeded, but the decision task failed.

https://tools.taskcluster.net/groups/TfhsspsISMaTtuQkfYTV4A/tasks/TfhsspsISMaTtuQkfYTV4A/details
looks like it was actually the hook task. We probably need to add those routes to the hook as well. Ideally, ci-admin could be extended to generate these types of hooks.
Attached patch diff2.txt (obsolete) — Splinter Review
Attachment #9018756 - Attachment is obsolete: true
Attachment #9018756 - Flags: review?(dustin)
Attachment #9018774 - Flags: review?(mozilla)
(In reply to Tom Prince [:tomprince] from comment #5)
> https://tools.taskcluster.net/groups/TfhsspsISMaTtuQkfYTV4A/tasks/TfhsspsISMaTtuQkfYTV4A/details
> looks like it actually the hook task. We probably need to change the hook as well with those routes. Ideally, ci-admin could be extended to generate these types of hooks.

This sounds fine. For example, several cron tasks failed today because of VCS issues (bug 1493702).
Comment on attachment 9018774 [details] [diff] [review]
diff2.txt

Review of attachment 9018774 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, as long as sheriffs are OK with getting this email. Can you add a comment explaining the reason for it, and move it above "# These are the old index routes for the decision task." (no need for another round of review).

I do want to emphasize that this will send email for failures that *do* show up on Treeherder, so I'm not sure it addresses the original bug.
Attachment #9018774 - Flags: review?(mozilla) → review+
@Aryx are you okay with what was stated in comment 8?
Flags: needinfo?(aryx.bugmail)
Let's give it a try. We can still remove the sheriffs from those mails if they're only helpful for the CI team.
Flags: needinfo?(aryx.bugmail)
Pushed by dlabici@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/cc803b2429d8
notify ciduty by email if a nightly hook fails, r=tomprince a=dlabici
https://hg.mozilla.org/mozilla-central/rev/cc803b2429d8
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
It looks like that's invalid YAML:

```
Oct 23 14:52:08 mozilla-taskcluster app/worker.1: YAMLException: end of the stream or a document separator is expected at line 84, column 3: 
Oct 23 14:52:08 mozilla-taskcluster app/worker.1:     		- "notify.email.ciduty+failedcro ...
```
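The offending line in the log starts with tab characters, and YAML does not allow tabs in indentation. A minimal pre-landing check can flag such lines before the file reaches mozilla-taskcluster; `find_tab_indentation` is a hypothetical helper written for illustration, not an existing tool in the tree:

```python
from typing import List, Tuple


def find_tab_indentation(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each line whose leading whitespace has a tab.

    YAML forbids tabs in indentation, so any line reported here is likely
    to make the parser fail, as in the log above.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((lineno, line))
    return problems
```

Running it over the contents of `.taskcluster.yml` (e.g. `find_tab_indentation(open(".taskcluster.yml").read())`) would list each mis-indented route line; tabs inside quoted values are not flagged, since only leading whitespace is inspected.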
Status: RESOLVED → REOPENED
Flags: needinfo?(dlabici)
Resolution: FIXED → ---
Pushed by dlabici@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/38b95489a34a
notify ciduty by email if a nightly hook fails + fix identation issue. r=tomprice a=dlabici
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/eea979fe8a92
notify ciduty by email if a nightly hook fails + fix identation issue: fix indentation. a=bustage-fix
Status: REOPENED → RESOLVED
Closed: Last year → Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Flags: needinfo?(dlabici)
ReOpening this as more issues were found:

1) The two email addresses are not correct: they should be sheriffs+{failed,exception}cron@mozilla.org, not .com.

2) It seems we also get emails for Try pushes. For example:
Your try push has been submitted. Use the link to view the status of your jobs.

Note the "try" part. We are supposed to get them only for Nightly builds.
Anything wrong in the revision, other than the email? 
https://hg.mozilla.org/mozilla-central/rev/eea979fe8a92
Status: RESOLVED → REOPENED
Flags: needinfo?(dustin)
Resolution: FIXED → ---
It looks like you added the new routes under the on-push section
https://hg.mozilla.org/mozilla-central/file/eea979fe8a92/.taskcluster.yml#l73
not the cron section
https://hg.mozilla.org/mozilla-central/file/eea979fe8a92/.taskcluster.yml#l99

I didn't catch this when reviewing, as the patch didn't include enough context. It is probably better to use Phabricator for things like this, as it includes context in the patch by default.
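The fix is to attach the notify routes only when the decision task is created by cron, not for pushes. The real `.taskcluster.yml` expresses this in JSON-e; the sketch below only models the intended selection logic in Python, assuming the context value `tasks_for` is `"cron"` for cron-triggered decision tasks and something else (e.g. `"hg-push"`) for pushes. `decision_task_routes` and `CRON_NOTIFY_ROUTES` are hypothetical names, and the sheriffs addresses use mozilla.org as requested when the bug was reopened:

```python
from typing import List

# Notify routes from the patch under review, with the sheriffs
# addresses on mozilla.org.
CRON_NOTIFY_ROUTES = [
    "notify.email.ciduty+failedcron@mozilla.com.on-failed",
    "notify.email.ciduty+exceptioncron@mozilla.com.on-exception",
    "notify.email.sheriffs+failedcron@mozilla.org.on-failed",
    "notify.email.sheriffs+exceptioncron@mozilla.org.on-exception",
]


def decision_task_routes(tasks_for: str, base_routes: List[str]) -> List[str]:
    """Return the routes for a decision task.

    Notify routes are appended only for cron-triggered decision tasks, so
    pushes (including Try pushes) do not generate email.
    """
    routes = list(base_routes)  # copy so the caller's list is not mutated
    if tasks_for == "cron":
        routes.extend(CRON_NOTIFY_ROUTES)
    return routes
```

Keeping the notify routes next to the cron index routes, rather than in the shared push block, is what prevents Try pushes from triggering these emails.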
Flags: needinfo?(dustin)
Hi Roland, what's the plan forward here?
Flags: needinfo?(rmutter)
Thanks :aryx for the NI.
Made the patch, waiting for review.
Attachment #9018774 - Attachment is obsolete: true
Flags: needinfo?(rmutter)
Attachment #9030391 - Flags: review?(mozilla)
Comment on attachment 9030391 [details] [diff] [review]
Move ciduty and sheriffs mail notification on cron

Review of attachment 9030391 [details] [diff] [review]:
-----------------------------------------------------------------

::: .taskcluster.yml
@@ +94,5 @@
> +		 # BUG 1500166 Notify ciduty by email if a nightly hook fails
> +                - "notify.email.ciduty+failedcron@mozilla.com.on-failed"
> +                - "notify.email.ciduty+exceptioncron@mozilla.com.on-exception"
> +                - "notify.email.sheriffs+failedcron@mozilla.com.on-failed"
> +                - "notify.email.sheriffs+exceptioncron@mozilla.com.on-exception"

1) The sheriffs lines need @mozilla.org as email address (comment 16 issue 1).
2) Please add the notify block below the complete index block.
3) Getting notifications for Try pushes (comment 16 issue 2) should be fixed by the move to the cron block but Tom can confirm.
Attachment #9030391 - Flags: review-
Attached patch diff4.txtSplinter Review
Attachment #9030391 - Attachment is obsolete: true
Attachment #9030391 - Flags: review?(mozilla)
Attachment #9030410 - Flags: review?(aryx.bugmail)
Comment on attachment 9030410 [details] [diff] [review]
diff4.txt

Review of attachment 9030410 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to land once the tabs are removed.

::: .taskcluster.yml
@@ +91,4 @@
>                  - "index.gecko.v2.${repository.project}.pushlog-id.${push.pushlog_id}.decision-${cron.job_name}"
>                  # list each cron task on this revision, so actions can find them
>                  - 'index.gecko.v2.${repository.project}.revision.${push.revision}.cron.${as_slugid("decision")}'
> +		 # BUG 1500166 Notify ciduty by email if a nightly hook fails

This line has a tab that should be replaced by spaces.
Attachment #9030410 - Flags: review?(mozilla) → review+
Flags: needinfo?(dlabici)
Pushed by dlabici@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/89801fda4bb7
notify ciduty by email if a nightly hook fails + fix identation issue. r=tomprice
Patch landed: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=89801fda4bb727def3fffc0f1b0a92786731bd71

Assuming this bug is now complete, I'm closing it.
Status: REOPENED → RESOLVED
Closed: Last year → Last year
Flags: needinfo?(dlabici)
Resolution: --- → FIXED
The bug will get resolved once this merges to central.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla65 → ---
https://hg.mozilla.org/mozilla-central/rev/89801fda4bb7
Status: REOPENED → RESOLVED
Closed: Last year → Last year
Resolution: --- → FIXED