Closed Bug 1028972 Opened 10 years ago Closed 10 years ago

crash in xul.dll@0x8e13cd | xul.dll@0x1ab44c | xul.dll@0x126047 | xul.dll@0x19effe | xul.dll@0x1c8403 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0x11d2d4 | xul.dll@0x1c844d | xul.dll@0x34a70d

Categories

(Core :: General, defect)

31 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox30 --- unaffected
firefox31 + verified
firefox32 --- verified
firefox33 --- verified

People

(Reporter: u279076, Assigned: away)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-d79585ce-bb49-435f-a639-d88c72140622.
=============================================================
0 	xul.dll 	xul.dll@0x8e13cd 	
1 	xul.dll 	xul.dll@0x1ab44c 	
2 	xul.dll 	xul.dll@0x126047 	
3 	xul.dll 	xul.dll@0x19effe 	
4 	xul.dll 	xul.dll@0x1c8403 	
5 	nss3.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c
6 	nss3.dll 	PR_Unlock 	nsprpub/pr/src/threads/combined/prulock.c
7 	xul.dll 	xul.dll@0x11d2d4 	
8 	xul.dll 	xul.dll@0x1c844d 	
9 	xul.dll 	xul.dll@0x34a70d 	
=============================================================
More reports:
https://crash-stats.mozilla.com/report/list?signature=xul.dll%400x8e13cd%20|%20xul.dll%400x1ab44c%20|%20xul.dll%400x126047%20|%20xul.dll%400x19effe%20|%20xul.dll%400x1c8403%20|%20_MD_CURRENT_THREAD%20|%20PR_Unlock%20|%20xul.dll%400x11d2d4%20|%20xul.dll%400x1c844d%20|%20xul.dll%400x34a70d#tab-sigsummary

All reports for this crash are with Firefox 31.0b2 so it looks like we may have a regression there. It currently ranks at #80 with 321 crashes per 302 installations in the last week. The majority of the URLs are with Facebook.
Benjamin, is there anything you can advise here?
Flags: needinfo?(benjamin)
Looking at the link in comment 0, we haven't recorded the debug ID of xul.dll which means we can't look up symbols for it. This doesn't appear to be a problem with this buildid in general, just some kinds of crashes.

I think we've seen problems with not recording debug IDs in OOM situations before, but I don't recognize it recently. Kairo can you poke through the data and see if there are any other patterns here?
Flags: needinfo?(benjamin) → needinfo?(kairo)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> Looking at the link in comment 0, we haven't recorded the debug ID of
> xul.dll which means we can't look up symbols for it.

I asked Anthony to ask you about this because you or dmajor can probably figure out the actual stack by looking at the minidump, or am I wrong in that?

> I think we've seen problems with not recording debug IDs in OOM situations
> before, but I don't recognize it recently. Kairo can you poke through the
> data and see if there are any other patterns here?

I know I discussed a similar case with dmajor or someone else a while back. IIRC, we need to map the library into memory to determine the debug ID to put into the crash report, and libxul tends to be very large, so it's the most likely to fail and end up with an empty debug ID.

From all I have seen recently, this is not something that happens a lot, but it seems to happen with the 31.0b2 build for two signatures that end up visible in explosiveness with topcrash ranks of 40-50.
Flags: needinfo?(kairo)
I doubt we can easily do that, no, because it seems that information isn't in the minidump.

Do these crashes have consistently low available memory?

This particular signature will be build-specific, but you could look back through nightly/aurora for any crashes with xul.dll@* to see if there was a specific place this started.
Flags: needinfo?(kairo)
With a local copy of the binary and symbols, I managed to get a real stack:

xul!nsTArray_Impl<nsTextFrame::LineDecoration,nsTArrayInfallibleAllocator>::AppendElements<nsTextFrame::LineDecoration>+0x7f117d
xul!nsThread::ProcessNextEvent+0x1fc
xul!NS_ProcessNextEvent+0x2d
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x70
xul!MessageLoop::RunHandler+0x51
xul!MessageLoop::Run+0x19
xul!nsThread::ThreadFunc+0x90
nss3!_PR_NativeRunThread+0x167
msvcr100!_callthreadstartex+0x1b
msvcr100!_threadstartex+0x64
kernel32!BaseThreadInitThunk+0xe
ntdll!__RtlUserThreadStart+0x70
ntdll!_RtlUserThreadStart+0x1b

This looks a lot like the OOM in bug 1007763 comment 9, including the low VM values and Mega URLs.

Any other signature starting with "xul.dll@0x8e13cd | xul.dll@0x1ab44c" is probably the same thing. Its predecessor was "xul.dll@0x8f7247 | xul.dll@0xb368c" in 31b1.
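
To illustrate the prefix rule above, here is a minimal sketch of grouping signatures by their leading frames. The helper names (`split_frames`, `same_crash`) are hypothetical and not part of Socorro; the sample signatures are taken from this bug, with the first one shortened to four frames for readability.

```python
# The two leading frames named in the comment above.
PREFIX = "xul.dll@0x8e13cd | xul.dll@0x1ab44c"


def split_frames(signature):
    """Split a Socorro-style signature string into its individual frames."""
    return [frame.strip() for frame in signature.split("|")]


def same_crash(signature, prefix=PREFIX):
    """Return True if the signature's leading frames equal the prefix frames.

    Comparing whole frames (rather than using str.startswith) avoids a false
    match on an offset that merely begins with the same hex digits.
    """
    want = split_frames(prefix)
    return split_frames(signature)[:len(want)] == want


signatures = [
    "xul.dll@0x8e13cd | xul.dll@0x1ab44c | xul.dll@0x126047 | xul.dll@0x19effe",
    "xul.dll@0x901098 | xul.dll@0x14c1fc | xul.dll@0xc1be7",
    "xul.dll@0x8f7247 | xul.dll@0xb368c",  # the 31b1 predecessor
]

# Only the first sample shares the 31b2 prefix.
matches = [s for s in signatures if same_crash(s)]
```

Because the offsets are build-specific, the prefix would have to be updated per build (for example, to the 31b1 predecessor) before it matches anything from another build.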

These xul.dll hex signatures are showing up in the top-200 on beta and aurora. It might be that xul has begun to outgrow our Breakpad reservation. Here are some recent sizes:
30.0 - 23,390 KB
31b1 - 23,848 KB
31b2 - 23,844 KB
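
For reference, the release-to-release change in those sizes can be worked out with a few lines of Python. This is only arithmetic on the three figures listed above; the `deltas` helper is a name made up for this sketch.

```python
# xul.dll sizes in KB, as listed above.
sizes_kb = [("30.0", 23390), ("31b1", 23848), ("31b2", 23844)]


def deltas(sizes):
    """Return (version, change_in_kb) for each version after the first."""
    return [(cur[0], cur[1] - prev[1]) for prev, cur in zip(sizes, sizes[1:])]


for version, change in deltas(sizes_kb):
    print(f"{version}: {change:+d} KB")
```

The jump is between 30.0 and 31b1; 31b2 is marginally smaller than 31b1.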
(In reply to David Major [:dmajor] from comment #7)
> With a local copy of the binary and symbols, I managed to get a real stack

Thanks, David!

> 
> xul!nsTArray_Impl<nsTextFrame::LineDecoration,nsTArrayInfallibleAllocator>::
> AppendElements<nsTextFrame::LineDecoration>+0x7f117d
> xul!nsThread::ProcessNextEvent+0x1fc
> xul!NS_ProcessNextEvent+0x2d
> xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x70
> xul!MessageLoop::RunHandler+0x51
> xul!MessageLoop::Run+0x19
> xul!nsThread::ThreadFunc+0x90
> nss3!_PR_NativeRunThread+0x167
> msvcr100!_callthreadstartex+0x1b
> msvcr100!_threadstartex+0x64
> kernel32!BaseThreadInitThunk+0xe
> ntdll!__RtlUserThreadStart+0x70
> ntdll!_RtlUserThreadStart+0x1b
> 
> This looks a lot like the OOM in bug 1007763 comment 9, including the low VM
> values and Mega URLs.

As a note, that was duped to bug 991845, which seems to have started with the GGC landing.

> These xul.dll hex signatures are showing up in the top-200 on beta and
> aurora. It might be that xul has begun to outgrow our Breakpad reservation.

Hmm, should we increase the reservation, then?

I think David's comment answers Benjamin's questions, thanks!
Flags: needinfo?(kairo)
Robert, does that mean this is GGC related? I've marked it blocking the GGC Crash tracking bug just in case.
Blocks: 994589
Well, it sounds to me like this just might be a dupe of bug 1007763, which is a dupe of bug 991845, which is a signature that started with GGC. I can't tell if that means it's directly related to GGC or not.
Those crashes keep happening with those kinds of signatures:
https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&range_value=4&range_unit=weeks&query_search=signature&query_type=contains&query=_MD_CURRENT_THREAD+|+PR_Unlock

That said, the volume is not too high, about rank #50, ~40 crashes per million ADI on beta.
Crash Signature: [@ xul.dll@0x8e13cd | xul.dll@0x1ab44c | xul.dll@0x126047 | xul.dll@0x19effe | xul.dll@0x1c8403 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0x11d2d4 | xul.dll@0x1c844d | xul.dll@0x34a70d] → [@ xul.dll@0x8e13cd | xul.dll@0x1ab44c | xul.dll@0x126047 | xul.dll@0x19effe | xul.dll@0x1c8403 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0x11d2d4 | xul.dll@0x1c844d | xul.dll@0x34a70d] [@ xul.dll@0x901098 | xul.dll@0x14c1fc | xul.dll@0xc1be7 | xul.dll@…
xul has grown by 1846 KB since we started reserving address space.

Want me to spend some time in xperf to figure out what the minidump overhead is? On the other hand, the cheaper solution would be to just add a few megs and back out if that doesn't help.
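
As a rough sketch of the "just add a few megs" option: the snippet below shows how much room each candidate bump would leave once the 1846 KB of growth is absorbed. It assumes, hypothetically, that the original reservation was exactly sufficient when it was introduced and that the minidump overhead has not changed; neither has been measured here. The candidate bump sizes are illustrative only.

```python
XUL_GROWTH_KB = 1846  # growth since the reservation was introduced (from above)


def slack_after_bump_kb(extra_mb, growth_kb=XUL_GROWTH_KB):
    """KB of room left after a bump of extra_mb, once xul's growth is covered.

    Assumption: the pre-growth reservation was exactly sufficient.
    """
    return extra_mb * 1024 - growth_kb


for extra_mb in (2, 4, 8):  # hypothetical candidate bumps, in MB
    print(f"+{extra_mb} MB leaves {slack_after_bump_kb(extra_mb)} KB of slack")
```

Every extra megabyte reserved is also a megabyte of address space unavailable to the rest of the process, so a larger bump is not free.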
Flags: needinfo?(benjamin)
Crash Signature: xul.dll@0x1534fe0 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0x112d34 ] [@ xul.dll@0x901098 | xul.dll@0x14c1fc | xul.dll@0xc1be7 | xul.dll@0x16ab0e | xul.dll@0x165a13 | xul.dll@0x1534fe0 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0xbaee4 ] → xul.dll@0x1534fe0 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0x112d34 ] [@ xul.dll@0x901098 | xul.dll@0x14c1fc | xul.dll@0xc1be7 | xul.dll@0x16ab0e | xul.dll@0x165a13 | xul.dll@0x1534fe0 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0xbaee4 ] [@ xul.dll@0…
Up to you. The original number was mostly a guess.
Assignee: nobody → dmajor
Flags: needinfo?(benjamin)
So I just remembered that the bulk of the allocation is in MapViewOfFile areas that xperf doesn't see, and I don't feel like analyzing this by hand. Plan B!
This ought to buy us a year and change. Here's hoping we'll revisit this for unified xul before then.
Attachment #8452069 - Flags: review?(benjamin)
Attachment #8452069 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/6b925c984240
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
David, if you think it is low risk, can we have an uplift request for aurora and beta? Thanks
Flags: needinfo?(dmajor)
Comment on attachment 8452069 [details] [diff] [review]
Adjust Breakpad reservation for inflation.

Approval Request Comment
[Feature/regressing bug #]: All of them (general code growth)
[User impact if declined]: Useless stacks for some OOM crashes
[Describe test coverage new/current, TBPL]: Only crash-stats
[Risks and why]: This puts us 4MB closer to VM OOM. It's unclear whether this will fully address the broken xul.dll signatures.
[String/UUID change made/needed]: None
Attachment #8452069 - Flags: approval-mozilla-beta?
Attachment #8452069 - Flags: approval-mozilla-aurora?
Flags: needinfo?(dmajor)
Attachment #8452069 - Flags: approval-mozilla-beta?
Attachment #8452069 - Flags: approval-mozilla-beta+
Attachment #8452069 - Flags: approval-mozilla-aurora?
Attachment #8452069 - Flags: approval-mozilla-aurora+
I've looked through Socorro for crashes from the last 4 weeks, containing strings:

1. "| _MD_CURRENT_THREAD | PR_Unlock | xul.dll@" - https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&range_value=4&range_unit=weeks&date=07%2F22%2F2014+11%3A00%3A00&query_search=signature&query_type=contains&query=%7C+_MD_CURRENT_THREAD+%7C+PR_Unlock+%7C+xul.dll%40&reason=&release_channels=&build_id=&process_type=any&hang_type=any

- Firefox 31 - many crashes up to July 3rd, 1 crash in build from July 7th (before the fix landed), 0 crashes since then
- Firefox 32 - 12 crashes for builds between July 4th - July 13th (before the fix landed), 3 crashes (xul.dll@0x929fa7 | xul.dll@0x16fe00 | xul.dll@0xf591e | xul.dll@0xf5928 | xul.dll@0x16713e | xul.dll@0x1917a3 | _MD_CURRENT_THREAD | PR_Unlock | xul.dll@0xf37a4 | xul.dll@0x18930d | xul.dll@0x35a2f8) for build from July 16th (20140716004001), 0 crashes since July 17th
- Firefox 33 - 0 crashes

2. "_MD_CURRENT_THREAD | PR_Unlock" (crashes in addition to those in #1) - https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&range_value=4&range_unit=weeks&query_search=signature&query_type=contains&query=_MD_CURRENT_THREAD+|+PR_Unlock

- Firefox 31 - the most recent are 3 crashes (with 3 different signatures) with build from July 10th (20140710141843), before the fix landed
- Firefox 32 - 0 crashes for July
- Firefox 33 - 0 crashes

Looking at when the fix landed on each branch, and given that since then there have been only 3 crashes in the 32.0 Aurora build from July 16th and 0 crashes for 33.0 Nightly and 31.0 Beta, I think this can be marked as verified.