Closed Bug 1572723 Opened 5 years ago Closed 3 years ago

Crash in [@ xul.dll | NS_internal_main]. Mostly OOM.

Categories

(Core :: General, defect, P3)

x86
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox69 --- wontfix
firefox70 --- fix-optional
firefox71 --- fix-optional

People

(Reporter: pascalc, Unassigned)

Details

(Keywords: crash, regression, topcrash-thunderbird, Whiteboard: [tbird topcrash])

Crash Data

This bug is for crash report bp-0c1c8c78-cc44-4ae1-83b5-c547c0190806.

Top 10 frames of crashing thread:

0 xul.dll xul.dll@0x96b01f 
1 xul.dll xul.dll@0xb16358 
2 xul.dll xul.dll@0x9b6742 
3 xul.dll xul.dll@0x9b5faf 
4 xul.dll xul.dll@0x859889 
5 xul.dll xul.dll@0x84a747 
6 xul.dll xul.dll@0x8491b8 
7 xul.dll xul.dll@0x854275 
8 xul.dll xul.dll@0x854945 
9 xul.dll xul.dll@0x8aa72a 
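
For anyone post-processing these reports: the frames above are unsymbolicated, so each one carries only a module name and an offset into that module. A minimal sketch of splitting such lines into parts, assuming the `index module module@0xOFFSET` layout shown in this report (`Frame` and `parse_frame` are illustrative names, not part of any Socorro tooling):

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    index: int    # position in the stack, 0 is the crashing frame
    module: str   # module name, e.g. "xul.dll"
    offset: int   # offset into the module, parsed from hex


# Matches lines such as "0 xul.dll xul.dll@0x96b01f" (trailing spaces allowed).
_FRAME = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)@0x([0-9a-fA-F]+)\s*$")


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one unsymbolicated frame line; return None for any other layout."""
    match = _FRAME.match(line)
    if match is None:
        return None
    return Frame(int(match[1]), match[2], int(match[4], 16))
```

A frame that already has a function name instead of `module@0xOFFSET` does not match and yields `None`, which makes it easy to count how many frames in a stack are missing symbols.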

Not sure we can do anything with this signature, but these crashes started popping up with the 69 beta cycle.

Does either of you know why xul.dll would be unmapped in these cases? And/or, given that we know the module revision of "69.0.0.7158" is there a way to pull the symbols based on that or based on the signatures of the sibling modules? (Context: The triage robot wants me to triage this bug and I'm worried that this might be some type of systemic issue. Certainly it's problematic that we have a MOZ_CRASH that should be pointing to a very specific location but it's not easily accessible.)

Flags: needinfo?(willkg)
Flags: needinfo?(gsvelto)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #2)

> Does either of you know why xul.dll would be unmapped in these cases?

No idea, it's rather odd. Most of the crashes are in MOZ_CRASH(). Some have the crash reason set to "Can't allocate mozilla::ReentrantMonitor" and others to "IPC FatalError in the parent process!". Most have no reason at all, though; sadly, we have dozens of callers invoking MOZ_CRASH() without a reason, so it's hard to tell where they're coming from.

> And/or, given that we know the module revision of "69.0.0.7158" is there a way to pull the symbols based on that or based on the signatures of the sibling modules? (Context: The triage robot wants me to triage this bug and I'm worried that this might be some type of systemic issue. Certainly it's problematic that we have a MOZ_CRASH that should be pointing to a very specific location but it's not easily accessible.)

I believe we can get the symbols manually, but we'd need to modify the stack-walking logic to pick symbols for a module that has no debug ID field.

Flags: needinfo?(gsvelto)

I don't know how you'd figure out which symbols file it is without a debug id. I don't think I can help here.
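
To illustrate why the missing debug ID is a blocker: symbol files are looked up by debug file name plus debug ID, so without the ID there is no key to fetch with. A rough sketch, assuming the usual `<debug_file>/<debug_id>/<name>.sym` layout and `https://symbols.mozilla.org` as the base URL (both are assumptions here, and `symbol_url` is a hypothetical helper, not an existing API):

```python
from typing import Optional

# Assumed base URL of the symbol server.
SYMBOL_SERVER = "https://symbols.mozilla.org"


def symbol_url(debug_file: str, debug_id: Optional[str]) -> Optional[str]:
    """Build the assumed .sym URL for a module, or None when no debug ID is known."""
    if not debug_id:
        # This is the situation in this bug: the module has no debug ID,
        # so no lookup key can be formed.
        return None
    # "xul.pdb" maps to "xul.sym"; other names just get ".sym" appended.
    stem = debug_file[:-4] if debug_file.lower().endswith(".pdb") else debug_file
    return f"{SYMBOL_SERVER}/{debug_file}/{debug_id.upper()}/{stem}.sym"
```

The module version ("69.0.0.7158") does not appear anywhere in that path, which is why knowing it alone does not let the stack walker pick a symbol file automatically.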

Flags: needinfo?(willkg)
Priority: -- → P3

These are some generated correlations on release:
(88.16% in signature vs 33.29% overall) os_arch = x86
(100.0% in signature vs 39.00% overall) cpu_arch = x86
(33.33% in signature vs 100.0% overall) is_garbage_collecting = null
(32.89% in signature vs 00.14% overall) ipc_system_error = 6
(32.89% in signature vs 00.14% overall) moz_crash_reason = MOZ_CRASH(IPC FatalError in the parent process!)
(31.14% in signature vs 00.05% overall) ipc_fatal_error_msg = Error deserializing 'data' (SerializedStructuredCloneBuffer) member of 'ClonedMessageData'

As with other xul.dll signatures, this may be a by-product of an OOM situation.
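
For comparing these correlations programmatically, here is a minimal sketch that parses the lines above and computes how over-represented each value is in this signature. It assumes the exact `(X% in signature vs Y% overall) field = value` layout shown in this comment; `Correlation`, `parse_correlation`, and `lift` are illustrative names only:

```python
import re
from typing import NamedTuple, Optional


class Correlation(NamedTuple):
    in_signature: float  # percentage of crashes in this signature
    overall: float       # percentage of crashes overall
    field: str
    value: str


_LINE = re.compile(
    r"^\((?P<sig>[\d.]+)% in signature vs (?P<all>[\d.]+)% overall\) "
    r"(?P<field>\w+) = (?P<value>.*)$"
)


def parse_correlation(line: str) -> Optional[Correlation]:
    """Parse one correlation line; return None if it does not match the layout."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Correlation(
        float(match["sig"]), float(match["all"]), match["field"], match["value"]
    )


def lift(c: Correlation) -> float:
    """Ratio of the in-signature rate to the overall rate."""
    return c.in_signature / c.overall if c.overall else float("inf")
```

By this measure the IPC-related fields stand out far more than the architecture fields: `ipc_system_error = 6` is 32.89% vs 00.14%, a ratio of over 200, while `os_arch = x86` is below 3.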

Keywords: regression

Any other possibilities here or is this stalled?

Obtaining the symbols is a non-trivial activity. Looking at the crashes, there seem to be three different scenarios, but the vast majority point to an OOM cause. Either the available commit space (available page file on Socorro) or the available virtual memory is low, which hints pretty strongly at an OOM and might explain why we don't get proper debug IDs for the DLLs.
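
The check described above can be sketched roughly as follows. The field names `available_page_file` and `available_virtual_memory` are assumed to match what Socorro exposes for a crash, and the 256 MiB thresholds are arbitrary placeholders chosen for illustration, not values anyone in this bug used:

```python
from typing import Mapping, Optional

# Hypothetical cut-offs; tune against real data before relying on them.
LOW_COMMIT_BYTES = 256 * 1024 * 1024
LOW_VIRTUAL_BYTES = 256 * 1024 * 1024


def _as_int(value: object) -> Optional[int]:
    """Convert an annotation to int, returning None if missing or malformed."""
    try:
        return int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def looks_like_oom(crash: Mapping[str, object]) -> bool:
    """Flag a crash as a likely OOM if commit space or virtual memory is low."""
    page_file = _as_int(crash.get("available_page_file"))
    virtual = _as_int(crash.get("available_virtual_memory"))
    low_commit = page_file is not None and page_file < LOW_COMMIT_BYTES
    low_virtual = virtual is not None and virtual < LOW_VIRTUAL_BYTES
    return low_commit or low_virtual
```

Crashes with neither annotation are treated as "not obviously OOM" rather than guessed at, so the result is a lower bound on the OOM share.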

Flags: needinfo?(gsvelto)

Thunderbird crashes currently exceed Firefox crashes: since October 1, Firefox crashes per day have doubled and Thunderbird crashes have increased fivefold.
https://crash-stats.mozilla.org/signature/?signature=xul.dll%20%7C%20NS_internal_main&date=%3E%3D2019-07-29T13%3A00%3A00.000Z&date=%3C2020-01-29T13%3A00%3A00.000Z#graphs

OS: Windows 7 → Windows
Summary: Crash in [@ xul.dll | NS_internal_main] → Crash in [@ xul.dll | NS_internal_main]. Mostly OOM.
Whiteboard: [tbird topcrash]
QA Whiteboard: qa-not-actionable

Thank you for looking into the crash signatures! It sounds like this can be closed then; marking WFM, as that seems to best cover "this seemed to get fixed/go away on its own for reasons that are not directly understood".

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(overholt)
Resolution: --- → WORKSFORME