Closed Bug 815542 Opened 12 years ago Closed 11 years ago

crash in IncrementalCollectSlice

Categories

(Core :: JavaScript Engine, defect)

19 Branch
All
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox19 + unaffected
firefox20 + fixed
firefox21 --- fixed
firefox-esr17 --- unaffected
b2g18 --- unaffected

People

(Reporter: scoobidiver, Assigned: jonco)

References

Details

(4 keywords, Whiteboard: [js:p1:fx21][native-crash])

Crash Data

Attachments

(1 file)

It's #2 top browser crasher in 20.0a1 on Linux.
It started spiking around 19.0a1/20121117. The regression range for the spike might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58ebb638a7ea&tochange=9a6d708faf3f

Signature 	IncrementalCollectSlice More Reports Search
UUID	dc45ff0c-32fc-4517-b219-2ae202121126
Date Processed	2012-11-26 22:38:35
Uptime	1889
Last Crash	35.4 minutes before submission
Install Age	31.5 minutes since version was first installed.
Install Time	2012-11-26 22:06:43
Product	Firefox
Version	20.0a1
Build ID	20121126030823
Release Channel	nightly
OS	Linux
OS Version	0.0.0 Linux 3.5.2 #1 SMP Wed Aug 15 17:35:07 EDT 2012 i686
Build Architecture	x86
Build Architecture Info	AuthenticAM family 15 model 75 stepping 2
Crash Reason	SIGSEGV
Crash Address	0x2c9
App Notes 	
OpenGL: VMware, Inc. -- Gallium 0.4 on llvmpipe (LLVM 0x209) -- 2.1 Mesa 8.0.4 -- texture_from_pixmap
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 	libxul.so 	IncrementalCollectSlice 	js/src/jsgc.cpp:3429
1 	libxul.so 	GCCycle 	js/src/jsgc.cpp:4558
2 	libxul.so 	Collect 	js/src/jsgc.cpp:4673
3 	libxul.so 	js::IncrementalGC 	js/src/jsfriendapi.cpp:190
4 	libxul.so 	nsJSContext::GarbageCollectNow 	dom/base/nsJSEnvironment.cpp:2948
5 	libxul.so 	GCTimerFired 	dom/base/nsJSEnvironment.cpp:3210
6 	libxul.so 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:482
7 	libxul.so 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:565
8 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:627
9 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:221
10 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:117
11 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:215
12 	libxul.so 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
13 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:290
14 	libxul.so 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3823
15 	libxul.so 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3890
16 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4084
17 	firefox 	main 	browser/app/nsBrowserApp.cpp:174

More reports at:
https://crash-stats.mozilla.com/report/list?signature=IncrementalCollectSlice
CC'ing billm and jonco who touched this code the most over the summer when this is likely to have regressed.

Worries me that crashing in GC could be a security problem. Many of the crashes are near(ish) null like this one and worry me somewhat less. For example, the crash on this line occurs a lot and is either near 0x541 in 64-bit builds or 0x2c9 in x86 or arm builds (some minor variation by build date). "dst->compartment()->maybeAlive = true;"

Some of the Firefox 17.0 crashes on Windows look scarier, but maybe it's a separate GC problem: bp-d6f96ab7-c82c-4343-b021-f3b0a2121124. and few in Linux look like they've jumped off into nowhere as well.
Bug 812393 is in that range.
There seem to be a few problems grouped in the same signature. The spike is from a NULL pointer deref in which a cross-compartment wrapper is wrapping NULL.
One comment says Firefox crashed after accessing settings of LastPass 2.0.
Tracking for FF20 since it's a top crash there, but this isn't yet a major user issue in FF19.
Keywords: qawanted
QA Contact: jbecerra
(In reply to Bill McCloskey (:billm) from comment #3)
> The spike is
> from a NULL pointer deref in which a cross-compartment wrapper is wrapping
> NULL.

Sounds like that might same issue as bug 816054 then.
Assignee: general → jcoppeard
Whiteboard: [js:p2]
(In reply to Alex Keybl [:akeybl] from comment #5)
> Tracking for FF20 since it's a top crash there, but this isn't yet a major
> user issue in FF19.
It's #1 top browser crasher in 19.0a2 on Linux.
These are mostly crashes on a small-ish address (0x2cd) here: http://hg.mozilla.org/releases/mozilla-aurora/annotate/32dba69af0fa/js/src/jsgc.cpp#l3457

This code was introduced by Bill's TRANSPLANT gc patch, bug 803376.
That line is:
  dst->compartment()->maybeAlive = true;

0x2cd is 717, and JSCompartment is a pretty huge structure with maybeAlive fairly far along it, so my guess would be that given the consistency in that particular address that compartment() is NULL, which sounds similar to comment 3.
Dropping qawanted as per yesterdays Channel Meeting discussion. Given level of engagement by Engineering there's no immediate QA need here. Please re-add QAWANTED if there's something you'd like QA to follow-up on.
Keywords: qawanted
Whiteboard: [js:p2] → [js:p1:fx21]
KaiRo - should we consider uplifting bug 816054 to FF19? Looks like this isn't a very prevalent crash in FF20/21, which both have that fix.
Flags: needinfo?(kairo)
(In reply to Alex Keybl [:akeybl] from comment #11)
> KaiRo - should we consider uplifting bug 816054 to FF19?

If :billm and/or :jonco think it fixes this problem, I'm all for it, but I have no clue as to this could be true or not.
Flags: needinfo?(kairo)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)

The patch for bug 816054 doesn't actually fix any problems, it simply improves the assertions to make the problem apparent sooner and make it easier to debug.

Whatever caused this must have got fixed by something else I think.
(In reply to Jon Coppeard (:jonco) from comment #13)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> 
> The patch for bug 816054 doesn't actually fix any problems, it simply
> improves the assertions to make the problem apparent sooner and make it
> easier to debug.
> 
> Whatever caused this must have got fixed by something else I think.

How are you planning on determining that? This is a high volume Linux top crasher, and beta 3 is going to build today. We need your help.
Attached patch Possible fixSplinter Review
Here's a fix for an issue introduced by 812393, which is possibly the cause of these crashes.  It's a bit difficult to tell for sure because I haven't been able to reproduce the issue.

The problem is that XPCStringConvert::ReadableToJSVal keeps a cached JSString and it's possible for it to hand out a pointer to this after sweeping has started but before the string is finalized, resulting in a pointer to garbage.

If this is in fact the root of this issue, then the reason the crash occurs more infrequently in 20 onwards is because bug 790338 greatly reduced the window of opportunity for this to happen, because sweeping now happens in small groups of compartments rather than all compartments at once.
Attachment #705971 - Flags: review?(wmccloskey)
Comment on attachment 705971 [details] [diff] [review]
Possible fix

Review of attachment 705971 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. Sorry you had to track this down; looks like it was all my fault.

::: js/xpconnect/src/XPCString.cpp
@@ -35,5 @@
>      nsStringBuffer* buf = nsStringBuffer::FromData(chars);
> -    if (buf == sCachedBuffer) {
> -        sCachedBuffer = nullptr;
> -        // No need to clear sCachedString
> -    }

Maybe assert here that buf != sCachedBuffer.
Attachment #705971 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #16)

No worries :)

> ::: js/xpconnect/src/XPCString.cpp
> @@ -35,5 @@
> >      nsStringBuffer* buf = nsStringBuffer::FromData(chars);
> > -    if (buf == sCachedBuffer) {
> > -        sCachedBuffer = nullptr;
> > -        // No need to clear sCachedString
> > -    }
> 
> Maybe assert here that buf != sCachedBuffer.

I tried this but actually I think it is valid for another string with the same nsStringBuffer to be created and cached after sweeping starts but before the original is finalized.  Really I think we want to assert that the string being finalized isn't cached, but the string pointer is not passed to the finalizer.
Whiteboard: [js:p1:fx21] → [js:p1:fx21][native-crash]
https://hg.mozilla.org/mozilla-central/rev/cd471ce8fd8e
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(In reply to Jon Coppeard (:jonco) from comment #18)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/cd471ce8fd8e

Jon - can you nominate for uplift to aurora/beta no later than today with a risk evaluation?
Comment on attachment 705971 [details] [diff] [review]
Possible fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 812393
User impact if declined: Occasional crashes
Testing completed (on m-c, etc.): This patch has been on m-c for a week
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #705971 - Flags: approval-mozilla-beta?
Attachment #705971 - Flags: approval-mozilla-aurora?
(In reply to Jon Coppeard (:jonco) from comment #21)
> Risk to taking this patch (and alternatives if risky): Low

Do you have any ideas as to where regressions would rear their head if found? Would it be another stability issue?
Comment on attachment 705971 [details] [diff] [review]
Possible fix

Approving for Aurora in preparation for uplift to Beta.
Attachment #705971 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(jcoppeard)
(In reply to Alex Keybl [:akeybl] from comment #22)
> Do you have any ideas as to where regressions would rear their head if
> found? Would it be another stability issue?

It's always possible that a similar issue would occur, I think the patch is pretty safe though.
Flags: needinfo?(jcoppeard)
Spoke with jonco - we plan to backout bug 812393 (thus causing a slight perf regression) for FF19 and only take the forward fix for FF20.
Blocks: 812393
Attachment #705971 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
There are 54 crashes on FF 20b1 (buildID: 20130220104816) in the last week. I assume this is not really fixed ?
This signature contains a bunch of different crashes. However, the Linux crashes do seem to have gone away, as far as I can tell.
(In reply to Bill McCloskey (:billm) from comment #29)
> This signature contains a bunch of different crashes. However, the Linux
> crashes do seem to have gone away, as far as I can tell.

Were the changes made in this bug for Firefox 20 only meant to address the Linux crashes contained in this signature? If so, Paul, please file a follow up bug.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: