Large OOMs with only xul.dll frames in 36 and higher

RESOLVED FIXED in Firefox 36

Status

()

Toolkit
Crash Reporting
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: Robert Kaiser, Assigned: dmajor)

Tracking

({crash})

unspecified
mozilla38
x86
Windows NT
crash
Points:
---

Firefox Tracking Flags

(firefox35 unaffected, firefox36+ fixed, firefox37+ fixed, firefox38+ fixed)

Details

(crash signature)

Attachments

(2 attachments)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-e5e141e8-fc0d-4852-a2ef-1e17f2150122.
=============================================================

We've been seeing OOM|large crashes with mostly ~2M allocations in topcrash ranges of 36 and higher, see for example https://crash-stats.mozilla.com/signature/?build_id=20150120155007&product=Firefox&release_channel=beta&version=36.0&hang_type=crash&signature=OOM+|+large+|+mozalloc_abort%28char+const*+const%29+|+mozalloc_handle_oom%28unsigned+int%29+|+moz_xmalloc+|+xul.dll%400xd3956+|+xul.dll%400x11c648c+|+xul.dll%400x11327b1+|+xul.dll%400x112e90d+|+xul.dll%400x112e730+|+xul.dll%400x1541f5+|+xul.dll%400x158795+|+xul.dll%400xdf784+...&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1


I feel like that's an OOM|large that makes us unable to actually map xul.dll and not get a debug ID for it, see bp-e5e141e8-fc0d-4852-a2ef-1e17f2150122 as one example.
(Assignee)

Comment 1

3 years ago
xul.dll has grown by 8 megs between 34 and 36, mostly due to the DLL merges in bug 609976 and bug 922912. We likely need to update ReserveBreakpadVM's kReserveSize.

(I bet OOM|small is equally affected, but since Socorro doesn't put stack frames in those signatures, we don't notice)
(Assignee)

Comment 2

3 years ago
Bug 1028972 comment 15: "This ought to buy us a year and change. Here's hoping we'll revisit this for unified xul before then." Well, at least that came true :-)
(Assignee)

Comment 3

3 years ago
[Tracking Requested - why for this release]: This causes a lot of useless crash signatures on 36+
status-firefox35: --- → unaffected
status-firefox36: --- → affected
status-firefox37: --- → affected
tracking-firefox36: --- → ?
tracking-firefox37: --- → ?
(Assignee)

Comment 4

3 years ago
Created attachment 8553365 [details] [diff] [review]
Adjust Breakpad reservation for xul.dll inflation

Increase by 10MB to cover recent growth plus a little future breathing room.

(For anyone following along, this patch is not intended to fix the crashes, but it should turn the "xul.dll@0x1234" signatures into something more actionable)
Attachment #8553365 - Flags: review?(benjamin)
Some of the signatures I checked had lots of youtube URLs. That makes the importance of trying to figure these crashes out all the more important for MSE. Adding Anthony and Sheila to the bug as we will need to discuss this next week.

Updated

3 years ago
Attachment #8553365 - Flags: review?(benjamin) → review+
Important crash, tracking.
tracking-firefox36: ? → +
tracking-firefox37: ? → +
(Reporter)

Comment 7

3 years ago
David, can you please request uplift to aurora and beta for this ASAP? As you know, we have those crashes on beta that we need data for and then need to react to that, so the sooner we can get this there, the better.
Flags: needinfo?(dmajor)
Comment on attachment 8553365 [details] [diff] [review]
Adjust Breakpad reservation for xul.dll inflation

[Triage Comment]
I am going to bypass the procedure to make sure we have it in beta 4.
David, hope you don't mind!
Attachment #8553365 - Flags: approval-mozilla-beta+
Attachment #8553365 - Flags: approval-mozilla-aurora+
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/8aa2406244bc
https://hg.mozilla.org/releases/mozilla-aurora/rev/94ecdce16831
https://hg.mozilla.org/releases/mozilla-beta/rev/59aa16cfd49f
status-firefox36: affected → fixed
status-firefox37: affected → fixed
status-firefox38: --- → fixed
tracking-firefox38: --- → ?
Keywords: checkin-needed
(Reporter)

Comment 10

3 years ago
Given that it landed on the branches, clearing the ni?
Flags: needinfo?(dmajor)
https://hg.mozilla.org/mozilla-central/rev/8aa2406244bc
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(Assignee)

Comment 12

3 years ago
Agreed re the uplifts, thanks for jumping in!
(Reporter)

Comment 13

3 years ago
From what I see, it's substantially lower volume, but we still have reports like this in 36.0b4 even with this patch: https://crash-stats.mozilla.com/report/list?signature=OOM%20%7C%20large%20%7C%20mozalloc_abort%28char%20const%2A%20const%29%20%7C%20mozalloc_handle_oom%28unsigned%20int%29%20%7C%20moz_xmalloc%20%7C%20xul.dll%400x278d96%20%7C%20xul.dll%400x7bfa98%20%7C%20xul.dll%400x10f419a%20%7C%20xul.dll%400x10f0227%20%7C%20xul.dll%400x10f004a%20%7C%20xul.dll%400x1e8527%20%7C%20xul.dll%400x10867e%20%7C%20xul.dll%400x286684%2E%2E%2E and https://crash-stats.mozilla.com/report/list?signature=xul.dll%400x31f195%20%7C%20xul.dll%400x54561%20%7C%20xul.dll%400x56dd3%20%7C%20xul.dll%400x3f66c3%20%7C%20xul.dll%400x3f697a%20%7C%20xul.dll%400x603795%20%7C%20xul.dll%400x1a47e43
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> From what I see, it's substantially lower volume, but we still have reports
> like this in 36.0b4 even with this patch:

Yes, I was just going to mention that as well, http://bit.ly/1tyKWex shows almost all youtube.com URLs which concerns me a bit since this patch made into that build ID.
tracking-firefox38: ? → +
David: Is there anything in particular we could tweak to get the remaining signatures to show more information? The concern from the MSE side would be that many URLs show youtube, and it would great to find out what the root cause of those remaining crashes is. Thanks.
Flags: needinfo?(dmajor)
(Assignee)

Comment 16

3 years ago
The previous patch really should have worked, but I guess we can try increasing the value even higher.

In the meantime, WinDbg can make sense of the dump files if you already have a PDB cached for that xul version. If you have a list of the top N bogus signatures that you care about, I can go and open bugs with the correct stacks.
Flags: needinfo?(dmajor) → needinfo?(mozillamarcia.knous)
(Assignee)

Comment 17

3 years ago
Created attachment 8556794 [details] [diff] [review]
If at first you don't succeed, use more memory
Attachment #8556794 - Flags: review?(benjamin)
(Assignee)

Comment 18

3 years ago
> http://bit.ly/1tyKWex 
That one looks like it's bug 1102642.

Updated

3 years ago
Attachment #8556794 - Flags: review?(benjamin) → review+
It looks as if it may be too late to get this in B5 since I already see build candidates - ni on Sylvestre to see the possibilities.

Sylvestre we see crashes in the latest B4 explosive report that we would like to get a better stack for, so getting this patch into Beta 5 may be important for the MSE decision.
Flags: needinfo?(mozillamarcia.knous) → needinfo?(sledru)
Not sure we should postpone beta 5. Even if we start a build2 right now, we won't have the QE sign off from SV and QE before next Monday.
Beta 6 is going to build next Monday, should be released next Tuesday.
Flags: needinfo?(sledru)
(Assignee)

Comment 21

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b3d6f5a3f1b
(Assignee)

Comment 22

3 years ago
Comment on attachment 8556794 [details] [diff] [review]
If at first you don't succeed, use more memory

Approval Request Comment
[Feature/regressing bug #]: xul.dll growth, mostly bug 609976 and bug 922912
[User impact if declined]: Unactionable crash reports
[Describe test coverage new/current, TreeHerder]: m-i is green
[Risks and why]: We will use more address space
[String/UUID change made/needed]: None
Attachment #8556794 - Flags: approval-mozilla-beta?
Attachment #8556794 - Flags: approval-mozilla-aurora?
Comment on attachment 8556794 [details] [diff] [review]
If at first you don't succeed, use more memory

I don't know if it is possible but maybe we should automatically detect if this needs to be increased.
Attachment #8556794 - Flags: approval-mozilla-beta?
Attachment #8556794 - Flags: approval-mozilla-beta+
Attachment #8556794 - Flags: approval-mozilla-aurora?
Attachment #8556794 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/4b3d6f5a3f1b
(Assignee)

Comment 25

3 years ago
I went ahead and pulled up the correct stacks for a few of these on 36b4: they led to bug 1128409, bug 1114976, bug 1081911.
(Assignee)

Comment 26

3 years ago
Ryan could you help with the uplift of comment 23? (I'm surprised it didn't happen automagically. Were the scripts thrown off by the previous patch? Was there some marking I should have put on the bug?)
Flags: needinfo?(ryanvm)
(In reply to David Major [:dmajor] (UTC+13) from comment #26)
> Ryan could you help with the uplift of comment 23? (I'm surprised it didn't
> happen automagically. Were the scripts thrown off by the previous patch? Was
> there some marking I should have put on the bug?)

Yeah, the status flags should have been set back to affected when there was more follow-up work to land.
status-firefox36: fixed → affected
status-firefox37: fixed → affected
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/releases/mozilla-aurora/rev/5e50ea349b47
status-firefox37: affected → fixed
https://hg.mozilla.org/releases/mozilla-beta/rev/5a2b0dfc3832
status-firefox36: affected → fixed
You need to log in before you can comment on or make changes to this bug.