Closed Bug 1104317 Opened 5 years ago Closed 5 years ago

Signatures for shutdown crashes

Categories

(Socorro :: General, task)

x86_64
Windows 7
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dmajor, Assigned: lars)

References

Details

The shutdown crashes added by bug 1038342 are coming from a watchdog thread, which is uninteresting for purposes of crash bucketing and analysis. It would be more helpful to see what the main thread was doing.

If the signature matches:
[@ mozilla::`anonymous namespace''::RunWatchdog(void*)]
[@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
(or maybe just some regex for RunWatchdog)

Then instead let's show the stack for thread 0, ignoring anything on the regular ignore-list as well as anything containing ProcessNextEvent. E.g. bp-569807e3-431a-4f7b-a800-0bbc62141124 would display as "shutdownhang | mozilla::layers::CompositorParent::ShutDown()"
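
A minimal Python sketch of the rule proposed above, to make the intent concrete. This is not the Socorro implementation: the function name, the regex contents, and the fallback for an empty result are all assumptions made for illustration, and the real ignore-list is much longer.

```python
import re

# Hypothetical patterns for illustration only; the real lists live in Socorro's config.
WATCHDOG_RE = re.compile(r"RunWatchdog")
IGNORE_RE = re.compile(r"^(NtWaitFor|WaitForSingleObject)")
SKIP_RE = re.compile(r"ProcessNextEvent")


def shutdown_hang_signature(signature, thread0_frames):
    """Return a 'shutdownhang | ...' signature for watchdog crashes.

    signature: the signature already generated from the crashing thread.
    thread0_frames: function names on thread 0, top of stack first.
    """
    if not WATCHDOG_RE.search(signature):
        return signature  # not a watchdog crash, leave the signature unchanged
    for frame in thread0_frames:
        # Skip frames on the ignore-list and anything containing ProcessNextEvent.
        if IGNORE_RE.search(frame) or SKIP_RE.search(frame):
            continue
        return "shutdownhang | " + frame
    # Assumed fallback: nothing usable on thread 0, so keep the watchdog frame.
    return "shutdownhang | " + signature


frames = [
    "NtWaitForSingleObject",
    "nsThread::ProcessNextEvent(bool, bool*)",
    "mozilla::layers::CompositorParent::ShutDown()",
]
print(shutdown_hang_signature("mozilla::(anonymous namespace)::RunWatchdog(void*)", frames))
```

With the made-up thread 0 stack above, the first frame that survives both filters is the CompositorParent one, which matches the display suggested for the example crash.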

This won't work well for busy-hangs like bp-b776b61d-a73a-40b1-81dd-581212141124. Taking the approach above, we'd get tons of random signatures. bsmedberg, are you OK with that, or do you want to include some kind of cleverness, like searching for a frame containing the word Shutdown?
Flags: needinfo?(benjamin)
Let's ignore the busy-hang case for now: if we want to, we can go back and instrument that case by measuring and annotating CPU usage.

In bug 1103833 I suggested that rather than using RunWatchdog as the marker, we could use an explicit annotation. But in the short term, RunWatchdog is probably good enough.
Flags: needinfo?(benjamin)
See Also: → 1103833
Blocks: 1103833
Lonnen can you find an owner for this? It's needed in order to diagnose one of our biggest nightly crashes.
Flags: needinfo?(chris.lonnen)
Can "ProcessNextEvent" be added to the general ignore list, or must it be a special case for this signature variant only?

Adding that frame signature to the general ignore list makes this trivial to implement. Having it as a special case makes the implementation a bit more complicated.
Flags: needinfo?(dmajor)
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(benjamin)
QA Contact: lars
Am I remembering correctly that there are two lists, an "ignore" list and an "append" list?

I think it would be reasonable to include /ProcessNextEvent/ on the general append list. (I wouldn't want to strip it altogether though.)
Flags: needinfo?(dmajor)
Yes, that is correct, there are both "ignore" and "append" lists (see: http://socorro.readthedocs.org/en/v8/signaturegeneration.html).  

I have proceeded with adding ProcessNextEvent to the "append" list.  
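
For readers unfamiliar with the two lists, here is a rough Python sketch of how they differ, as I understand them: a frame on the "ignore" list is dropped, while a frame on the "append" list is kept and the next frame is appended after it. The function name and the list contents are hypothetical, not Socorro's actual code or configuration.

```python
import re

# Hypothetical list contents for illustration only.
IGNORE_RE = re.compile(r"^(NtWaitFor|WaitForSingleObject)")
APPEND_RE = re.compile(r"ProcessNextEvent")


def generate_signature(frames, delimiter=" | "):
    """Build a signature from function names, top of stack first."""
    parts = []
    for frame in frames:
        if IGNORE_RE.search(frame):
            continue  # "ignore" list: drop the frame entirely
        parts.append(frame)
        if not APPEND_RE.search(frame):
            break  # ordinary frame: the signature ends here
        # "append" list: keep the frame and continue to the next one
    return delimiter.join(parts)


print(generate_signature([
    "NtWaitForSingleObject",
    "nsThread::ProcessNextEvent(bool, bool*)",
    "mozilla::layers::CompositorParent::ShutDown()",
    "main",
]))
```

Under these assumptions ProcessNextEvent stays visible in the signature but no longer ends it, so the frame underneath it still distinguishes one hang from another.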

To see if my modifications produce what you expect, please compare these*:

from production, a crash with the target signature:
https://crash-stats.mozilla.com/report/index/1d69a631-7ece-4570-8c72-bffa12150120

that same crash in staging, reprocessed with a new signature rule for "shutdown hangs":
https://crash-stats.allizom.org/report/index/1d69a631-7ece-4570-8c72-bffa12150120

Please verify that the signature in staging is correct. On your approval, I will submit this PR, and if you act quickly, I bet we can get this into production this week.

* The original example cited in Comment #0 could not be used because its symbols have expired; reprocessing it now results in a signature that wouldn't trigger the shutdown hang rule.
Assignee: nobody → lars
Flags: needinfo?(dmajor)
QA Contact: lars
Flags: needinfo?(benjamin)
f+
Flags: needinfo?(dmajor)
Blocks: 1123698
See Also: 1103833
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/a9a3cd3ff837b7e44d48499d29c72215588bcccd
Merge pull request #2588 from twobraids/runwatchdog

Fixes Bug 1104317 - adds SignatureRunWatchDog rule to processor
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
"shutdownhang..." signatures are now streaming out of the Socorro processor.  There are about 4 per minute.