Closed
Bug 939817
Opened 11 years ago
Closed 8 years ago
more vcs-sync email tweaks
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Unassigned)
References
Details
Attachments
(4 files, 1 obsolete file)
2.96 KB,
patch
|
hwine
:
review+
mozilla
:
checked-in+
|
Details | Diff | Splinter Review |
2.24 KB,
patch
|
hwine
:
review+
mozilla
:
checked-in+
|
Details | Diff | Splinter Review |
40.66 KB,
application/pdf
|
Details | |
5.77 KB,
patch
|
mozilla
:
review+
pmoore
:
checked-in-
|
Details | Diff | Splinter Review |
Per email thread. I've got this patch running in staging, so far...
Comment 1•11 years ago
|
||
Comment on attachment 8333876 [details] [diff] [review] email.diff Review of attachment 8333876 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozharness/base/vcs/vcssync.py @@ +51,5 @@ > + subject = "[vcs2vcs] Successful no-op conversion for %s" % job_name > + if error_contents and not fatal: > + subject += " with warnings" > + if self.successful_repos: > + subject += ' (' + ','.join(self.successful_repos) + ')' question: how many items could be in self.successful_repos? If 20, that would make for a very messy subject line that would not be displayed in many viewers. If it could be long, the info s/b repeated in the message body.
Reporter | ||
Comment 2•11 years ago
|
||
(In reply to Hal Wine [:hwine] (use needinfo) from comment #1) > Comment on attachment 8333876 [details] [diff] [review] > email.diff > > Review of attachment 8333876 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mozharness/base/vcs/vcssync.py > @@ +51,5 @@ > > + subject = "[vcs2vcs] Successful no-op conversion for %s" % job_name > > + if error_contents and not fatal: > > + subject += " with warnings" > > + if self.successful_repos: > > + subject += ' (' + ','.join(self.successful_repos) + ')' > > question: how many items could be in self.successful_repos? If 20, that > would make for a very messy subject line that would not be displayed in many > viewers. > > If it could be long, the info s/b repeated in the message body. I'm trying not to spam release+vcs2vcs with no-text emails by adding text to the email.
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #2) > (In reply to Hal Wine [:hwine] (use needinfo) from comment #1) > > question: how many items could be in self.successful_repos? If 20, that > > would make for a very messy subject line that would not be displayed in many > > viewers. > > > > If it could be long, the info s/b repeated in the message body. > > I'm trying not to spam release+vcs2vcs with no-text emails by adding text to > the email. Also, for there to be ~20 repos in the list, each of those 20 repos would have to be updated since the last run. That's definitely possible the first run; very unlikely if we're running every minute.
Reporter | ||
Comment 4•11 years ago
|
||
Trying this: if self.successful_repos: - subject += ' (' + ','.join(self.successful_repos) + ')' + if len(self.successful_repos) <= 5: + subject += ' (' + ','.join(self.successful_repos) + ')' + else: + text += "Successful repos: %s\n\n" % ', '.join(self.successful_repos)
Comment 5•11 years ago
|
||
Thanks Aki for this. :)
Reporter | ||
Comment 6•11 years ago
|
||
Seems kosher in staging.
Attachment #8334539 -
Flags: review?(hwine)
Reporter | ||
Updated•11 years ago
|
Attachment #8333876 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #8334539 -
Flags: review?(hwine) → review+
Reporter | ||
Comment 7•11 years ago
|
||
Comment on attachment 8334539 [details] [diff] [review] email.diff http://hg.mozilla.org/build/mozharness/rev/9fe8cd18e1b6
Attachment #8334539 -
Flags: checked-in+
Reporter | ||
Updated•11 years ago
|
Reporter | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
|
Attachment #8334871 -
Flags: review?(hwine) → review+
Reporter | ||
Comment 9•11 years ago
|
||
Comment on attachment 8334871 [details] [diff] [review] email.diff http://hg.mozilla.org/build/mozharness/rev/6ea046dcfd0c Transplanted to production Nuked the repo_update.json's on vcssync{1,2} so everything gets pushed.
Attachment #8334871 -
Flags: checked-in+
Reporter | ||
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 10•11 years ago
|
||
something[s] here made it to production
Comment 11•10 years ago
|
||
The email thread that led to this ticket being created.
Comment 12•10 years ago
|
||
I think we still need to fix the subject so that it does not say "successful" if there is a non-self-recovering error that requires human attention.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 13•10 years ago
|
||
For ref, I asked Pete to look into this today following a chain of cron spam this morning reporting "Successful no-op conversion for build-repos" when the output contained the following: 10:17:06 ERROR - abort: data/lib/python/mozilla_buildtools/test/sample-patcher-config.cfg.i@59fcd33f2135: no match found! 10:17:06 ERROR - Automation Error: hg not responding 10:17:06 ERROR - Return code: 255 ... 10:17:09 ERROR - error: src refspec refs/tags/FIREFOX_28_0b6_RELEASE does not match any. 10:17:09 ERROR - error: src refspec refs/tags/FIREFOX_28_0b6_BUILD1 does not match any. 10:17:09 ERROR - error: src refspec refs/tags/FENNEC_28_0b6_RELEASE does not match any. 10:17:09 ERROR - error: src refspec refs/tags/FENNEC_28_0b6_BUILD1 does not match any. 10:17:09 ERROR - error: failed to push some refs to 'git@github.com:mozilla/build-tools.git' 10:17:09 ERROR - Return code: 1 10:17:09 ERROR - build-tools: Can't push /opt/vcs2vcs/vcs_sync/build/conversion/build-tools to build-tools-github! 10:17:09 ERROR - Unable to push build-tools. ... 10:17:09 FATAL - Unable to push these repos: 10:17:09 FATAL - build-tools: Can't push /opt/vcs2vcs/vcs_sync/build/conversion/build-tools to build-tools-github! 10:17:09 FATAL - 10:17:09 FATAL - Running post_fatal callback... Either we're not catching those failures properly, or we're ignoring them. We're retrying this conversion every minute without true "Success," so we should fix or throttle.
Reporter | ||
Comment 14•10 years ago
|
||
Pete: is this on your plate, or did you want me to fix this?
Flags: needinfo?(pmoore)
Updated•10 years ago
|
Assignee: aki → pmoore
Flags: needinfo?(pmoore)
Comment 15•10 years ago
|
||
I can take it.
Comment 16•10 years ago
|
||
I haven't forgotten about this - I'm going to work on this after I land the work for Bug 799719 - (vcs-sync) tracker to retire legacy vcs2vcs. It will essentially mean altering this function: http://hg.mozilla.org/build/mozharness/file/3347b848256c/mozharness/base/vcs/vcssync.py#l28
Comment 17•10 years ago
|
||
This patch makes sure we always have an email subject of "[vcs2vcs] Failed conversion for XXXXX" if there are any failures. This is something of a simplification, but hopefully should be clearer to people receiving the email, that there is a problem, and the problems are then detailed precisely in the body of the email. I also choose not to explicitly list successful repos in the case there are failures too, to avoid that a user sees a list of successful repos, and assumes that all were successful. Therefore, this is an attempt to keep things simple - if there is a failure, the subject states this simply, and the body of the email details the problems. Let me know what you think! :) Pete
Attachment #8459902 -
Flags: review?(aki)
Reporter | ||
Comment 18•10 years ago
|
||
Comment on attachment 8459902 [details] [diff] [review] bug939817_mozharness_v1.patch It is a simplification. I currently am wondering what happens when there's an intermittent hg burp -- let's say one out of every hg pulls or clones fails. Right now with that kind of thing will go into my filtered vcs-sync box, and the actual fatal() calls go into my inbox. Now both will go into my inbox, or I'll be tempted to filter the failures as well. Not that the status quo is particularly good or anything. Maybe this is the first step, and we keep some sort of info about latest email sent about a particular repo in the repo_update.json, so we don't email every minute? We can definitely try this to see if it works for the team.
Attachment #8459902 -
Flags: review?(aki) → review+
Comment 19•10 years ago
|
||
Comment on attachment 8459902 [details] [diff] [review] bug939817_mozharness_v1.patch Rolled out to production, which caused the following issue: Uncaught exception: Traceback (most recent call last): File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1243, in run self.run_action(action) File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1185, in run_action self._possibly_run_method(method_name, error_if_missing=True) File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1126, in _possibly_run_method return getattr(self, method_name)() File "/opt/vcs2vcs/mozharness/mozharness/base/vcs/vcssync.py", line 52, in notify if len(message) > max_log_sample_size: TypeError: object of type 'NoneType' has no len() 04:49:17 INFO - ##### 04:49:17 INFO - ##### Running notify step. 04:49:17 INFO - ##### 04:49:17 INFO - Running main action method: notify 04:49:17 INFO - Job took 16 seconds. 04:49:17 FATAL - Uncaught exception: Traceback (most recent call last): 04:49:17 FATAL - File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1243, in run 04:49:17 FATAL - self.run_action(action) 04:49:17 FATAL - File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1185, in run_action 04:49:17 FATAL - self._possibly_run_method(method_name, error_if_missing=True) 04:49:17 FATAL - File "/opt/vcs2vcs/mozharness/mozharness/base/script.py", line 1126, in _possibly_run_method 04:49:17 FATAL - return getattr(self, method_name)() 04:49:17 FATAL - File "/opt/vcs2vcs/mozharness/mozharness/base/vcs/vcssync.py", line 52, in notify 04:49:17 FATAL - if len(message) > max_log_sample_size: 04:49:17 FATAL - TypeError: object of type 'NoneType' has no len() 04:49:17 FATAL - Running post_fatal callback... 04:49:17 INFO - Job took 16 seconds. 04:49:17 INFO - Getting output from command: ['egrep', '-C5', '^[0-9:]+ +(ERROR|CRI[T]ICAL|FATAL) -', '/opt/vcs2vcs/logs/project-branches_info.log'] 04:49:17 INFO - Copy/paste: egrep -C5 "^[0-9:]+ +(ERROR|CRI[T]ICAL|FATAL) -" /opt/vcs2vcs/logs/project-branches_info.log I fixed this with a bustage fix: https://hg.mozilla.org/build/mozharness/rev/0322797e1ac9 However, then I spotted that we were getting success emails constantly, which was not intended. In fairness Aki had pointed this out in comment 18 ("Maybe this is the first step, and we keep some sort of info about latest email sent about a particular repo in the repo_update.json, so we don't email every minute?") - but I had (falsely) thought this was related to when there were some failures, not when everything was successful - my bad. I have now backed out the changes. Will fix up, and attach a new patch for review. Apologies for the mess up. Pete
Attachment #8459902 -
Flags: checked-in-
Reporter | ||
Comment 20•10 years ago
|
||
Stepping back. The intent here was to create 5? classes of email: 1) Success! No new changes, but maybe you want to see that we're running and how long a no-op run takes. These are skipped with 'skip_empty_messages': True 2) Success! Something was converted and pushed, but nothing went wrong, and nothing went wrong with this repo the previous time. These are also skipped with 'skip_empty_messages': True 3) Success! Something was converted and pushed, and this is the first time this repo was pushed ever, or the first time this repo was pushed after a failure on that repo. This is probably skipped if 'failure_only': True, but it might be nice to get these anyway, like a nagios ok or a releng-ci yippie! 4) Error! The script finished all the way through, but something threw an error along the way. Send the log, but with a "Success with errors" subject so it can be parsed at a non-urgent pace. Ideally this is only sent if it fails X number of times, and then it only sends an email after X amount of time. We may need to deal with flapping as well. 5) Fatal! Actionable! The script needs urgent attention! Do not filter this email! Have it hit your inbox and jump on it ASAP! This would also be a good class of email to have sent to a pager. The subject has useful info: success, success with errors, failure; time to convert; what repos were converted. The first 3 allow you to filter the first two easily while paying very close attention to the third. There are definitely bugs here, especially around the "don't send email on every hiccup" threshold -- it would be nice to have that configurable. Your simplification is great from a code perspective; way fewer code paths to care about. It's also great from a multi-repo-conversion perspective; why fatal() on one repo when you can convert the rest? I don't like it from an email user perspective: both success and failure emails spammed, so all 5 classes just went into a massive clump of email that meant I either have to slog through the whole thing to separate signal from noise. My preferences here no longer apply as of EOD Friday, however, so whatever works for people. I think the more-noise-less-signal won't work for the team. I think you agree that we need more signal and less noise. I thought explicitly stating original intent might help in getting there.
Reporter | ||
Comment 21•10 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #20) > The subject has useful info: success, success with errors, failure; time to > convert; what repos were converted. The first 3 allow you to filter the > first two easily while paying very close attention to the third. "The third" meaning failure. The latter two, time and which repos, can be useful for other reasons. Time helps when you're dealing with someone complaining about conversion turnaround times, which hasn't happened in recent memory. Which repos might be helpful, but I haven't really found it critical info. It went in the subject rather than the body to have this class of email fall under "skip_empty_messages" only.
Updated•9 years ago
|
Assignee: pmoore → nobody
Comment 23•8 years ago
|
||
moving to current product/component, as relevant for anyone working on mozharness based vcs-sync, but appears all work done.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 8 years ago
Component: Tools → General
Flags: needinfo?(hwine)
Product: Release Engineering → Developer Services
QA Contact: hwine
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•