Closed Bug 1667476 Opened 5 years ago Closed 1 year ago

Crash in [@ xul.dll | wmain] with most calls in the crash stack unsymbolified

Categories

(Toolkit :: Crash Reporting, defect)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- unaffected
firefox82 --- wontfix
firefox83 --- wontfix
firefox86 --- wontfix

People

(Reporter: aryx, Unassigned)

References

Details

(Keywords: crash, topcrash)

Crash Data

This seems to have started in late August, with at least 82.0a1 build id 20200828153126 affected. Before that, there was ~1 crash/week.

I also experienced the crash during some tab cleanup and had switched to a Google Docs tab whose UI still seemed to be initializing when I closed it. FTR, I have the warp JIT enabled (javascript.options.warp).

Crash report: https://crash-stats.mozilla.org/report/index/b4b9effb-15e0-4223-9fcf-f19790200925

Top 10 frames of crashing thread:

0 xul.dll xul.dll@0x2713efa 
1 xul.dll xul.dll@0x2712d12 
2 xul.dll xul.dll@0x2712fc1 
3 xul.dll xul.dll@0x1929493 
4  @0x37948d1d 
5 xul.dll xul.dll@0xe1893 
6 xul.dll xul.dll@0xeb7e2 
7 xul.dll xul.dll@0x2881a1 
8 xul.dll xul.dll@0x4858cb 
9 xul.dll xul.dll@0x16e0e9f 

Only the bottom of the stack is symbolified:

22 	firefox.exe	wmain(int, wchar_t**)	toolkit/xre/nsWindowsWMain.cpp:138 	frame_pointer
23 	firefox.exe	__scrt_common_main_seh()	/builds/worker/workspace/obj-build/browser/app/f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288 	cfi
24 	kernel32.dll	BaseThreadInitThunk		cfi
25 	ntdll.dll	_RtlUserThreadStart		cfi
26 	ntdll.dll	_RtlUserThreadStart		cfi

These are catastrophic OOM crashes. Most of them are lacking available memory information, and those that have it report only a few MiB of free commit space. The XUL DLL is getting unloaded so the minidump generation code doesn't populate its entry correctly and thus the stack walking machinery can't find the appropriate symbols. This is very similar to bug 1572723 and in fact we might want to duplicate against that bug.
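To illustrate why an incomplete module entry leaves frames unsymbolified, here is a minimal sketch, assuming a symbol store keyed by debug file and debug ID. The names `SYMBOLS` and `symbolicate`, the debug ID value, and the function names are all hypothetical; this is not how the actual stack walker is implemented.

```python
import bisect

# Hypothetical symbol store: (debug_file, debug_id) -> sorted list of
# (start_offset, function_name). All values are made up for illustration.
SYMBOLS = {
    ("xul.pdb", "ABCDEF0123456789"): [
        (0x1000, "hypothetical_function_a"),
        (0x2000, "hypothetical_function_b"),
    ],
}


def symbolicate(module, offset, debug_file=None, debug_id=None):
    """Return a function name if symbols can be located, else module@offset."""
    table = SYMBOLS.get((debug_file, debug_id))
    if not table:
        # Without a debug file/ID from the minidump's module entry there is
        # no key to look symbols up with, so only the raw offset is known.
        return f"{module}@{offset:#x}"
    starts = [start for start, _ in table]
    index = bisect.bisect_right(starts, offset) - 1
    if index < 0:
        return f"{module}@{offset:#x}"
    return f"{table[index][1]} ({module}+{offset:#x})"


# Frame 0 from the report above: the module entry carries no debug ID.
print(symbolicate("xul.dll", 0x2713efa))
# Same kind of frame when the (hypothetical) debug file and ID are present.
print(symbolicate("xul.dll", 0x1800, "xul.pdb", "ABCDEF0123456789"))
```

The first call falls back to the same `xul.dll@0x...` form seen in the crashing thread's frames.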

Crash Signature: [@ xul.dll | wmain] → [@ OOM | large | xul.dll | wmain ] [@ xul.dll | wmain]

(In reply to Gabriele Svelto [:gsvelto] from comment #1)

The XUL DLL is getting unloaded so the minidump generation code doesn't populate its entry correctly

This raises another question: Should we pin our dependent DLLs so this (hopefully) can't happen?

(In reply to Aaron Klotz [:aklotz] from comment #2)

This raises another question: Should we pin our dependent DLLs so this (hopefully) can't happen?

Is it possible? If it is, then I'd try it. BTW, I'm only guessing that the DLL has been unloaded: we don't have any form of error reporting in the minidump writer (it's provided by a Microsoft library), so I can't be 100% sure that's the issue. It could be that the minidump writer fails to gather the code/debug ID for some other reason (like memory being too tight).
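For reference, Windows does offer a way to pin a loaded module: `GetModuleHandleExW` with `GET_MODULE_HANDLE_EX_FLAG_PIN` keeps the module loaded until the process terminates. Below is a minimal sketch via `ctypes`; the helper name `pin_module` is hypothetical, it does nothing on non-Windows platforms, and whether pinning would actually help with this crash is not established in this bug.

```python
import ctypes
import sys

# Documented Win32 flag: the module stays loaded until the process exits.
GET_MODULE_HANDLE_EX_FLAG_PIN = 0x00000001


def pin_module(name):
    """Pin an already-loaded module by name.

    Returns True on success, False on failure, and None when not running
    on Windows (hypothetical helper, for illustration only).
    """
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetModuleHandleExW.argtypes = [
        wintypes.DWORD,
        wintypes.LPCWSTR,
        ctypes.POINTER(wintypes.HMODULE),
    ]
    kernel32.GetModuleHandleExW.restype = wintypes.BOOL

    handle = wintypes.HMODULE()
    ok = kernel32.GetModuleHandleExW(
        GET_MODULE_HANDLE_EX_FLAG_PIN, name, ctypes.byref(handle)
    )
    return bool(ok)


# kernel32.dll is loaded in every Windows process, so it is a safe demo target.
result = pin_module("kernel32.dll")
```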

Severity: -- → S3
Crash Signature: [@ OOM | large | xul.dll | wmain ] [@ xul.dll | wmain] → [@ OOM | large | xul.dll | wmain ] [@ xul.dll | wmain] [@ xul.dll]

There are 7 xul.dll crashes from 3+ installations with 64-bit builds on Nightly in the last 24 hours (Nightly updates were disabled for some time).

Chris, could this be from the rollout increase in the Fission experiment?

Flags: needinfo?(cpeterson)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #4)

There are 7 xul.dll crashes from 3+ installations with 64-bit builds on Nightly in the last 24 hours (Nightly updates were disabled for some time).

Chris, could this be from the rollout increase in the Fission experiment?

Possibly, though it looks like those 7 crash reports are from only 3 users, 2 with Fission and 1 without. Let's keep an eye on this.

crash query with "dom fission enabled" column

Flags: needinfo?(cpeterson)

I don't think this is a Fission crash. 24.49% of [@ xul.dll] crash reports from 86 Nightly have Fission enabled, which is only a little higher than the 15% of 86 Nightly users with Fission enabled. Fission users launch more content processes, so it's not surprising they would have more crashes.
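The comparison above can be made concrete with a quick calculation using the two percentages quoted in the comment. The helper name `overrepresentation` is hypothetical.

```python
# Shares quoted in the comment above (86 Nightly).
fission_share_of_crashes = 0.2449  # [@ xul.dll] crash reports with Fission enabled
fission_share_of_users = 0.15      # Nightly users with Fission enabled


def overrepresentation(crash_share, user_share):
    """How many times more common a group is among crashes than among users."""
    return crash_share / user_share


ratio = overrepresentation(fission_share_of_crashes, fission_share_of_users)
print(f"{ratio:.2f}x")  # prints "1.63x"
```

So Fission users appear in these crash reports about 1.6 times as often as their share of the Nightly population would suggest, which the comment attributes to their launching more content processes.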

Yes, while this is not immediately apparent, what you're seeing here is an increase in OOM crashes, and Fission users might be affected. >95% of the crashes here are coming from 32-bit users where the crashed process seems to have run out of virtual address space. That explains the lack of symbols in libxul. On 32-bit builds we set aside a chunk of virtual memory for Breakpad which we release upon hitting a crash, hoping that it's enough to let dbghelp.dll do its job while writing the minidump. In these cases apparently it's not.
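The reserve-then-release idea described above can be sketched conceptually as follows. This is only an illustration of the pattern, not Breakpad's or Firefox's actual code: the class and function names, the 1 MiB size, and the use of an anonymous `mmap` region are all assumptions.

```python
import mmap


class EmergencyReserve:
    """Holds an anonymous memory mapping that can be given back on demand."""

    def __init__(self, size):
        # Anonymous mapping (fileno -1): occupies address space up front.
        self._region = mmap.mmap(-1, size)

    def release(self):
        """Free the reserve; returns True only the first time it is freed."""
        if self._region is None:
            return False
        self._region.close()
        self._region = None
        return True


def write_dump_with_reserve(reserve, write_dump):
    # Give the address space back first, hoping the dump writer can use it.
    reserve.release()
    return write_dump()


reserve = EmergencyReserve(1 << 20)  # hypothetical 1 MiB reserve
status = write_dump_with_reserve(reserve, lambda: "dump written")
print(status)
```

As the comment notes, releasing the reserve only improves the odds: if the dump writer needs more than was set aside, the module information can still end up incomplete.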

I don't think there's anything actionable here. Most of the crashes are on 32-bit OSes, so while the underlying hardware seems to support 64-bit operation, the users wouldn't even be able to switch to a 64-bit build because of the OS.

The bug is linked to topcrash signatures, which match the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 20 desktop browser crashes on beta (startup)

:gsvelto, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)

This crash isn't actionable.

Flags: needinfo?(gsvelto)

I'm gonna retract comment 9. Thanks to :willkg's excellent work we're this close to solving this issue, see this comment.

Depends on: 1746940

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

This was fixed by bug 1746940; the remaining crashes here are coming from prehistoric installations for which we don't have symbols anymore because they're too old.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED