Closed Bug 429442 Opened 12 years ago Closed 12 years ago

crashes [@ nsJSIID::HasInstance][@ XPCNativeSet::FindInterfaceWithIID]

Categories

(Core :: XPConnect, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: dbaron, Assigned: bent.mozilla)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [has patch][has review][has approval])

Crash Data

Attachments

(1 file, 3 obsolete files)

I just updated, and browsing around I pretty quickly crashed my Linux build.

The crash I hit turns out to be a recent topcrash that started in 20080415 builds:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&range_unit=weeks&version=Firefox%3A3.0pre&signature=nsJSIID%3A%3AHasInstance(nsIXPConnectWrappedNative*%2C+JSContext*%2C+JSObject*%2C+long%2C+int*%2C+int*)&range_value=1
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&range_unit=weeks&version=Firefox%3A3.0pre&signature=XPCNativeSet%3A%3AFindInterfaceWithIID(nsID+const%26)+const&range_value=1

Stack is (from a build with no inlining):

#5  <signal handler called>
#6  0x01054ad4 in XPCNativeInterface::GetIID (this=0xe900e8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:382
#7  0x01054b23 in XPCNativeSet::FindInterfaceWithIID (this=0x13e9f428, 
    iid=@0x881b5d8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:521
#8  0x01054b70 in XPCWrappedNative::HasInterfaceNoQI (this=0x13c23088, 
    iid=@0x881b5d8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:680
#9  0x01053390 in nsJSIID::HasInstance (this=0xa535048, wrapper=0xa535068, 
    cx=0x9a19708, obj=0xa533900, val=295205824, bp=0xbf8076c0, 
    _retval=0xbf80738c)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcjsid.cpp:591
#10 0x0107a18f in XPC_WN_Helper_HasInstance (cx=0x9a19708, obj=0xa533900, 
    v=295205824, bp=0xbf8076c0)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1016
#11 0x00549ee1 in js_HasInstance (cx=0x9a19708, obj=0x0, v=295205824, 
    bp=0xbf8076c0) at /builds/trunk/mozilla/js/src/jsobj.c:4507
#12 0x0052de3e in js_Interpret (cx=0x9a19708)
    at /builds/trunk/mozilla/js/src/jsinterp.c:6214
#13 0x0053e2b3 in js_Invoke (cx=0x9a19708, argc=1, vp=0x14b43ef8, flags=2)
    at /builds/trunk/mozilla/js/src/jsinterp.c:1299
#14 0x0053d71a in js_InternalInvoke (cx=0x9a19708, obj=0x8ed1d60, 
    fval=216719584, flags=2, argc=1, argv=0x123bb800, rval=0xbf8078dc)
    at /builds/trunk/mozilla/js/src/jsinterp.c:1355
#15 0x004e7cbb in JS_CallFunctionValue (cx=0x9a19708, obj=0x8ed1d60, 
    fval=216719584, argc=1, argv=0x123bb800, rval=0xbf8078dc)
    at /builds/trunk/mozilla/js/src/jsapi.c:5058
#16 0x031849eb in nsJSContext::CallEventHandler (this=0x10e273e0, 
    aTarget=0x12fb6000, aScope=0x8ed1d60, aHandler=0xceae0e0, 
    aargv=0x13ce6514, arv=0xbf807a24)
    at /builds/trunk/mozilla/dom/src/base/nsJSEnvironment.cpp:1962
#17 0x031acce3 in nsGlobalWindow::RunTimeout (this=0x12fb6000, 
    aTimeout=0x14109598)
    at /builds/trunk/mozilla/dom/src/base/nsGlobalWindow.cpp:7754
#18 0x031ad339 in nsGlobalWindow::TimerCallback (aTimer=0x1497da78, 
    aClosure=0x14109598)
    at /builds/trunk/mozilla/dom/src/base/nsGlobalWindow.cpp:8085
#19 0x0065a90c in nsTimerImpl::Fire (this=0x1497da78)
    at /builds/trunk/mozilla/xpcom/threads/nsTimerImpl.cpp:400
#20 0x0065abb4 in nsTimerEvent::Run (this=0xb2bb6180)
    at /builds/trunk/mozilla/xpcom/threads/nsTimerImpl.cpp:490
#21 0x00654b1f in nsThread::ProcessNextEvent (this=0x87bc480, mayWait=1, 
    result=0xbf807b60) at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:524
#22 0x005ec41f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1)
    at nsThreadUtils.cpp:227
#23 0x012c7400 in nsBaseAppShell::Run (this=0x887f790)
    at /builds/trunk/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:170
#24 0x013c109f in nsAppStartup::Run (this=0x88bb480)
    at /builds/trunk/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:181
#25 0x001337cf in XRE_main (argc=<value optimized out>, argv=0xbf8080f4, 
    aAppData=0x87839d8)
    at /builds/trunk/mozilla/toolkit/xre/nsAppRunner.cpp:3172
#26 0x08048d01 in main (argc=4, argv=0x11fe72)
    at /builds/trunk/mozilla/browser/app/nsBrowserApp.cpp:158


(gdb) f 6
#6  0x01054ad4 in XPCNativeInterface::GetIID (this=0xe900e8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:382
382         return NS_SUCCEEDED(mInfo->GetIIDShared(&iid)) ? iid : nsnull;
(gdb) p mInfo
$1 = {mRawPtr = 0xffff1df4}
(gdb) p *$.mRawPtr
Cannot access memory at address 0xffff1df4
(gdb) p *this
$2 = {mInfo = {mRawPtr = 0xffff1df4}, mName = 15500, mMemberCount = 8100, 
  mMembers = {{mName = 15536, mVal = -57356, mIndex = 15572, mFlags = 0}}}
(gdb) up
#7  0x01054b23 in XPCNativeSet::FindInterfaceWithIID (this=0x13e9f428, 
    iid=@0x881b5d8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:521
521             if(iface->GetIID()->Equals(iid))
(gdb) p *this
$3 = {mMemberCount = 230, mInterfaceCount = 231, mInterfaces = {0xe900e8}}
(gdb) p i
No symbol "i" in current context.
(gdb) p pp
$4 = (XPCNativeInterface * const *) 0x13e9f42c
(gdb) p &this->mInterfaces
$5 = (XPCNativeInterface *(*)[1]) 0x13e9f42c
(gdb) up
#8  0x01054b70 in XPCWrappedNative::HasInterfaceNoQI (this=0x13c23088, 
    iid=@0x881b5d8)
    at /builds/trunk/mozilla/js/src/xpconnect/src/xpcinlines.h:680
680         return nsnull != GetSet()->FindInterfaceWithIID(iid);
(gdb) p *this
$6 = (XPCWrappedNative) {<nsIXPConnectWrappedNative> = {<nsIXPConnectJSObjectHolder> = {<nsISupports> = {_vptr.nsISupports = 0x10a59a8}, <No data fields>}, 
    mIdentity = 0x918b208}, mRefCnt = {mValue = 1}, _mOwningThread = {
    mThread = 0x877e548}, 
  static _cycleCollectorGlobal = {<nsXPCOMCycleCollectionParticipant> = {<nsScriptObjectTracer> = {<nsCycleCollectionParticipant> = {
          _vptr.nsCycleCollectionParticipant = 0x10a5a08}, <No data fields>}, <No data fields>}, <No data fields>}, {mMaybeScope = 0x12f87a88, 
    mMaybeProto = 0x12f87a88}, mSet = 0x13e9f428, mFlatJSObject = 0x11987ba0, 
  mScriptableInfo = 0x144e4ae8, mFirstChunk = {mTearOffs = {{
        mInterface = 0x88d95f8, mNative = 0x918b208, mJSObject = 0x0}}, 
    mNextChunk = 0x1372f820}, mWrapper = 0x11987bc0, mThread = {
    mRawPtr = 0x87bc480}}
(gdb) x/wa $6.mNative
There is no member or method named mNative.
(gdb) x/wa 0x918b208
0x918b208:      0x34ef6e8 <_ZTV25nsHTMLSharedObjectElement+8>
(gdb) p $6.mInterface
There is no member or method named mInterface.
(gdb) p (XPCNativeInterface*)0x88d95f8
$7 = (XPCNativeInterface *) 0x88d95f8
(gdb) p *$
$8 = {mInfo = {mRawPtr = 0x88d61e8}, mName = 143447012, mMemberCount = 1, 
  mMembers = {{mName = 143446732, mVal = 144298992, mIndex = 0, mFlags = 3}}}
(gdb) p/x $.mName
$9 = 0x88cd3e4
(gdb) p *(JSString*)0x88cd3e0
$10 = {length = 536870923, u = {chars = 0x88d6200, base = 0x88d6200}}
(gdb) p/x $10.length
$11 = 0x2000000b
(gdb) x/12hc $10.u.chars
0x88d6200:      110 'n' 115 's' 73 'I'  83 'S'  117 'u' 112 'p' 112 'p' 111 'o'
0x88d6210:      114 'r' 116 't' 115 's' 0 '\0'

This makes it seem like the XPCWrappedNative is ok, but the memory pointed to by its mSet is trashed.
Flags: blocking1.9?
Severity: normal → critical
Keywords: regression
I can't reproduce this, fwiw. Maybe this is related to bug 429585?
dbaron:  You still able to reproduce this?
Please re-nom if you find this crash is a top crasher, have more info, or we can easily reproduce.

Marking wanted1.9.0.x+.
Flags: wanted1.9.0.x+
Flags: blocking1.9?
Flags: blocking1.9-
As mentioned in the Firefox 3 meeting (and as noted by the "topcrash" keyword), this is the current topcrash in nightly builds and is a recent regression (first appeared in 2008041500).

Renominating.
Flags: blocking1.9- → blocking1.9?
Where in top crash?  #3 or #40?

re-nom when known.
Flags: blocking1.9? → blocking1.9-
It's the nr. 1 topcrash at http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.0pre
Flags: blocking1.9- → blocking1.9?
OK.  Well, I'll add it to the blocker list, but if we can't find a way to reproduce and fix, it's going to fall off.
Flags: blocking1.9? → blocking1.9+
Almost (but not quite) all of the breakpad crash dumps look like this:

0  	xul.dll  	nsJSIID::HasInstance  	 mozilla/js/src/xpconnect/src/xpcjsid.cpp:591
1 	xul.dll 	XPC_WN_Helper_HasInstance 	mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1016
2 	js3250.dll 	js_Interpret 	mozilla/js/src/jsinterp.c:6214
3 	js3250.dll 	js_Invoke 	mozilla/js/src/jsinterp.c:1299
4 	js3250.dll 	js_InternalInvoke 	mozilla/js/src/jsinterp.c:1355
5 	js3250.dll 	JS_CallFunctionValue 	mozilla/js/src/jsapi.c:5029
6 	xul.dll 	nsJSContext::CallEventHandler 	mozilla/dom/src/base/nsJSEnvironment.cpp:1962
7 	xul.dll 	nsGlobalWindow::RunTimeout 	mozilla/dom/src/base/nsGlobalWindow.cpp:7754

That matches dbaron's stack. So some setTimeout callback does an instanceof on an element and crashes?
For what it's worth, I hit this twice in less than an hour when I filed it.  Since I was about to leave for a trip, I then rolled back my tree to before the checkins that caused the bug.  I updated again after my return on Sunday (April 27), and haven't seen this crash since.  That said, I was probably seeing it on a specific page or with a specific UI action that I was doing then...
It's still responsible for a large percentage of crashes in yesterday's and today's nightly :(
Keywords: qawanted
Whiteboard: [regression range needed]
(In reply to comment #12)
> Based on my crash-stats, this regressed in 2008041500 build and wasn't present
> in 2008041400. See also comment 5 where I first mention this.

... or comment 0, where I mentioned it.

crash-stats chops the hours off, so the query should probably be something more like http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-14+02:00&maxdate=2008-04-15+05:00&cvsroot=%2Fcvsroot
The crash can be produced with NoScript enabled and by following the steps at  http://forums.mozillazine.org/viewtopic.php?t=652187
That looks extremely promising! Thanks!!!
(In reply to comment #14)

With noscript enabled refreshing this page (sometimes many times) is enough to crash:

http://lifehacker.com/384795/palbee-does-collaborative-video-conferencing 
Assignee: nobody → bent.mozilla
any thoughts on what a fix might look like?
almost got it nailed down
Attached patch Patch, v1 (obsolete) — Splinter Review
This fixes the crash for 1.9.

So we have 2 problems here.

* Adding script blockers created a failure path in nsHTMLPluginObjElementSH::PostCreate if GetScriptContextFromJSContext failed.

* We don't handle failed PostCreate very well in XPCWrappedNative. Here's the order of wacky events that led us to crash:

1. We try to wrap an HTMLEmbedElement really early in page load due to a content policy in place (I used NoScript in this case).
2. The wrapper is added to our map of wrapped natives using the HTMLEmbedElement as the key.
3. PostCreate is called, but a script blocker exists and so we tried to make a runnable for the plugin instantiation. GetScriptContextFromJSContext was failing and so PostCreate failed.
4. Because PostCreate failed we removed the wrapper from the map of wrapped natives and released the wrapper. However, the JS engine still has a reference to the wrapper's JS object so the wrapper doesn't actually die.
5. At a later time we try to wrap the HTMLEmbedElement again. This time everything goes fine and we add the wrapper to the map.
6. The first wrapper gets garbage collected. In its destructor it blindly removes an object from the wrapped native map - it assumes (incorrectly in this case) that a nice 1:1 mapping exists from identity->wrapper and that it will be removing itself from the map. Unfortunately it actually removes our second wrapper.
7. Now that the second wrapper is no longer in the map we don't protect its mSet from GC. A later GC kills the set and then we've got all the right conditions for this crash.

Yuck.

So this patch does a few things.

1. We make the nsHTMLElementSH::PostCreate function succeed by using XPConnect's safe context in the event that GetScriptContextFromJSContext fails. This entirely prevents the crash.
2. We mark wrappers whose PostCreate fails as invalid (using a bit in mSet). Invalid wrappers won't remove themselves (or some legitimate other wrapper) from the map when they die.
3. Added some assertions to make sure that we don't touch invalid wrappers after they are marked. Unfortunately this can cause crashes because the PostCreate function may either directly or indirectly hand out a reference to the soon-to-be-invalid wrapper (as nsHTMLEmbedElementSH::PostCreate does in SetupProtoChain). Crashing sucks, but the alternative is that callers get a half-constructed wrapper that will probably crash anyway :(
4. Added some assertions to make sure that PostCreate doesn't fail since we really can't handle that case at the moment. We need to do something post-1.9 to safeguard against this case.
5. Added some assertions in the map code to make sure we are always removing the wrapper we think we're removing to avoid having to debug this again in the future.
Attachment #319053 - Flags: review?(jst)
Version: unspecified → Trunk
Attached patch Patch, v2 (obsolete) — Splinter Review
After talking with sicking and jst we decided that we shouldn't change the behavior of GetSet at this late hour. Instead we'll assert but faithfully return the set on an uninitialized wrapper.
Attachment #319053 - Attachment is obsolete: true
Attachment #319083 - Flags: review?(jst)
Attachment #319053 - Flags: review?(jst)
Comment on attachment 319083 [details] [diff] [review]
Patch, v2

- In nsHTMLPluginObjElementSH::PostCreate():

+    rv = SetupProtoChain(wrapper, cx, obj);
 
-  nsCOMPtr<nsIScriptContext> scriptContext =
-    GetScriptContextFromJSContext(cx);
-  NS_ENSURE_TRUE(scriptContext, NS_ERROR_UNEXPECTED);
+    // If SetupProtoChain failed then we're in real trouble. We're about to fail
+    // PostCreate but it's more than likely that we handed our (now invalid)
+    // wrapper to someone already. Bug 429442 is an example of the kind of crash
+    // that can result from such a situation.
+    NS_WARN_IF_FALSE(NS_SUCCEEDED(rv),
+                     "SetupProtoChain failed! Random crashes in XPConnect may "
+                     "result!");
+    return rv;

I wonder if we should simply return NS_OK here. The worst that can happen if we do that is probably still better than what can happen if we fail here.

- In xpcmaps.h:

     inline void Remove(XPCWrappedNative* wrapper)
     {
         NS_PRECONDITION(wrapper,"bad param");
+#ifdef DEBUG
+        XPCWrappedNative* wrapperInMap = Find(wrapper->GetIdentityObject());
+        NS_ASSERTION(!wrapperInMap || wrapperInMap == wrapper,
+                     "About to remove a different wrapper with the same "
+                     "nsISupports identity! This will most likely cause serious"
+                     " problems!");

Move the space before "problems" to the end of the previous line to be consistent.

- In XPCWrappedNative::GetNewOrUsed():

+                NS_ERROR("PostCreate failed! This is known to cause "
+                         "inconsistent state for some class types and may even"
+                         "cause a crash in combination with a JS GC. Fix the "

Add a space after "even".

- In XPCWrappedNative::~XPCWrappedNative():

+    if (!(XPC_SET_WORD(mSet) & XPC_SET_UNINITIALIZED_WRAPPER)) {

I wonder if we should add an IsUninitialized() inline function that could hide this bit-twiddling here and in other tests?

r+sr=with that, but I'm wondering here... Given how late we are in the Firefox 3 cycle, should we Just fix the DOM PostCreate() hook for Firefox 3 (fixes this crash), and leave the XPConnect changes for post-fx3? I'm sort of leaning towards that myself...
Attachment #319083 - Flags: superreview+
Attachment #319083 - Flags: review?(jst)
Attachment #319083 - Flags: review+
> r+sr=with that, but I'm wondering here... Given how late we are in the Firefox
> 3 cycle, should we Just fix the DOM PostCreate() hook for Firefox 3 (fixes this
> crash), and leave the XPConnect changes for post-fx3? I'm sort of leaning
> towards that myself...
> 

Which one do we want to do Bent/JST?
Attached patch Patch, v3 (obsolete) — Splinter Review
(In reply to comment #21)
> I wonder if we should simply return NS_OK here. The worst that can happen if we
> do that is probably still better than what can happen if we fail here.

Done... hesitantly ;)

> Move the space before "problems" to the end of the previous line to be
> consistent.

It was that way to keep the line under 80 chars, but I don't really care.

> Add a space after "even".

Doh, done.

> I wonder if we should add an IsUninitialized() inline function that could hide
> this bit-twiddling here and in other tests?

Done.

> Given how late we are in the Firefox 3 cycle, should we Just fix the DOM
> PostCreate() hook for Firefox 3 (fixes this crash), and leave the XPConnect
> changes for post-fx3?

Done, added a comment saying we need to test IsUninitialized before removing the wrapped native from the map on deletion. Will file a followup to make sure we don't forget about it.
Attachment #319083 - Attachment is obsolete: true
Attachment #319403 - Flags: superreview?(jst)
Attachment #319403 - Flags: review?(jst)
Duplicate of this bug: 432258
Attached patch Patch, vParanoidSplinter Review
Super-paranoid patch that doesn't change anything except for this specific crash.
Attachment #319403 - Attachment is obsolete: true
Attachment #319470 - Flags: superreview?(jst)
Attachment #319470 - Flags: review?(jst)
Attachment #319403 - Flags: superreview?(jst)
Attachment #319403 - Flags: review?(jst)
Attachment #319470 - Attachment is patch: true
Attachment #319470 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 319470 [details] [diff] [review]
Patch, vParanoid

Yeah, I think this is what we should do for 1.9, more for post 1.9... r+sr=jst
Attachment #319470 - Flags: superreview?(jst)
Attachment #319470 - Flags: superreview+
Attachment #319470 - Flags: review?(jst)
Attachment #319470 - Flags: review+
Attachment #319470 - Flags: approval1.9?
Duplicate of this bug: 432295
Comment on attachment 319470 [details] [diff] [review]
Patch, vParanoid

a1.9=beltzner
Attachment #319470 - Flags: approval1.9? → approval1.9+
Whiteboard: [has patch][has review][has approval]
Landed.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9
> Will file a followup to make sure we don't forget about it.

Filed bug 432569.

Depends on: 439316
Flags: wanted1.9.0.x+
David, there are a couple of crash reports for Firefox 3.0.3 with the following stack. Do they depend on this bug or shall I better file a new one for that issue?

0  	xul.dll  	nsJSIID::HasInstance  	 mozilla/js/src/xpconnect/src/xpcjsid.cpp:591
1 	xul.dll 	XPC_WN_Helper_HasInstance 	mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1016
2 	js3250.dll 	js_Interpret 	mozilla/js/src/jsinterp.c:6237
3 	js3250.dll 	js_Invoke 	mozilla/js/src/jsinterp.c:1313
4 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp:1523
5 	xul.dll 	nsXPCWrappedJS::CallMethod 	mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp:559
6 	xul.dll 	PrepareAndDispatch 	mozilla/xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
7 	xul.dll 	SharedStub 	mozilla/xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
8 	xul.dll 	nsObserverService::NotifyObservers 	mozilla/xpcom/ds/nsObserverService.cpp:181
9 	xul.dll 	xul.dll@0x7b98fb
Crash Signature: [@ nsJSIID::HasInstance] [@ XPCNativeSet::FindInterfaceWithIID]
You need to log in before you can comment on or make changes to this bug.