Closed Bug 1709423 Opened 3 years ago Closed 3 years ago

some crash signatures on Windows 7 and 8.1 with unexpected function in frame #0 of crashing thread

Categories

(Toolkit :: Crash Reporting, defect)

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox92 --- affected

People

(Reporter: aryx, Assigned: gsvelto)

Attachments

(1 file)

It took me a while to figure out what's going on. The crash reason for all these crashes is EXCEPTION_BREAKPOINT which is covered by our exception handler so it was unclear why they were caught by WER instead... until I realized they're not crashes, they're hangs. If you ignore the crashing thread and look at the main one instead you'll see consistent stacks under those signatures.

WER also captures hangs when an application stops responding, but I didn't think those would be passed on to the runtime exception module, so I hadn't planned for them. These are interesting to us but we're not ready to handle them, so in the short term I'll disable them; we'll re-enable their capture when we're ready to handle them.

Flags: needinfo?(gsvelto)

I ran a bunch of tests to figure out what's going on but couldn't come up with a definitive solution. Here's what I found though:

  • I tried deliberately hanging Firefox on Windows 10 and this doesn't cause the WER module to be invoked. On Windows 10 it seems that those reports go straight to the Windows event log (or to Microsoft?). This would explain why these reports only come from Windows 7 and 8.1.
  • WER seems to offer an option that sounds like it could be used to opt out. It's called WER_FAULT_REPORTING_DISABLE_SNAPSHOT_HANG and needs to be passed to WerSetFlags() in the process that registered the WER module. However, this option is not documented outside of the werapi.h header, so it's unclear if it actually does what the name suggests. Additionally, this flag was added in the Windows 8 SDK, so even assuming it allows us to opt out of hang reports, I'm not sure it's guaranteed to work on Windows 7.
  • The WER exception does not include information to tell apart crashes from hangs (or at least nothing is documented as such). There seems to be a way to detect hangs anyway: when encountering a crash, the crashing thread is suspended before the WER module callbacks are invoked. In hangs, on the other hand, the crashing thread isn't suspended. We could print out the suspend count of the threads in Socorro's stackwalker and then use that information to have Socorro's signature generation flag those crashes as hangs.

The last option would be the most desirable but it's also the hardest to implement so I'll first try to disable these entirely.
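
To make the suspend-count idea concrete, here's a minimal sketch of the check signature generation could apply. Everything in it is hypothetical: `ThreadInfo`, `looks_like_hang` and the sample data are made up for illustration and are not the stackwalker's or Socorro's actual data model. It only assumes the observation above holds, i.e. that the reported crashing thread is suspended in real crashes and not in hangs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadInfo:
    """Hypothetical per-thread record, as a stackwalker might emit it."""
    thread_id: int
    suspend_count: int  # assumed to come from the minidump's thread list


def looks_like_hang(threads: List[ThreadInfo], crashing_thread: int) -> bool:
    """Return True when the reported crashing thread was not suspended.

    Assumption from the investigation above: for real crashes the crashing
    thread is suspended before the WER module callbacks run, for hangs it
    is not.
    """
    return threads[crashing_thread].suspend_count == 0


# Made-up sample data: thread index 1 is the reported crashing thread.
crash_report = [ThreadInfo(100, 0), ThreadInfo(101, 1)]
hang_report = [ThreadInfo(200, 0), ThreadInfo(201, 0)]

print(looks_like_hang(crash_report, 1))
print(looks_like_hang(hang_report, 1))
```

A report flagged this way could then get its signature built from the main thread's stack rather than from the reported crashing thread, which is where the consistent stacks show up.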

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/995a96533a0f
Opt-out of WER hang reports r=KrisWright
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
Regressions: 1710046

The patch landed in nightly and beta is affected.
:gsvelto, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.
If yes, don't forget to request an uplift for the patches in the regression caused by this fix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)

Comment on attachment 9220609 [details]
Bug 1709423 - Opt-out of WER hang reports r=KrisWright

Beta/Release Uplift Approval Request

  • User impact if declined: None but we'll get crash reports for things we don't know how to handle and that will make triage harder.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only flips an option that informs Windows Error Reporting not to grab snapshots of application hangs. We hope this fixes the problem here but it's impossible to be sure until this gets into beta and we see the volume of the existing crashes go down (or not if it didn't work).
  • String changes made/needed: none
Flags: needinfo?(gsvelto)
Attachment #9220609 - Flags: approval-mozilla-beta?

Comment on attachment 9220609 [details]
Bug 1709423 - Opt-out of WER hang reports r=KrisWright

That sounds like an improvement we want in beta, approved for 89 beta 11, thanks.

Attachment #9220609 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I'm not 100% sure this is fixed. The original signatures have all but disappeared, but a couple of new suspect ones have popped up:

These are all very old versions of Windows though so we can't rule out a bug in the old WER implementations.

Again not 100% sure this is fixed, as I found more crashes such as these:

The volume is very low but I'll file a follow-up to address this. It will require a fair bit of surgery in the runtime exception module though, as there's no documented way of telling apart crashes from hangs in WER.

Blocks: 1714334

I can still find instances of this in recent-ish builds:

Reopening, this needs a more complex fix.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 90 Branch → ---
Depends on: 1718226

It seems that bug 1718226 got rid of this. I couldn't find any new instances in nightly versions following the first one with the patch applied. Let's keep this open until it rides to beta so that we're 100% sure it's fixed before closing the bug.

Confirmed that bug 1718226 made this issue go away, closing.

Status: REOPENED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED