Last Comment Bug 669903 - double unlink crashes [@ nsGenericHTMLElement::ClearDataset()] nsDOMStringMap's Unlink
: double unlink crashes [@ nsGenericHTMLElement::ClearDataset()] nsDOMStringMap...
Status: VERIFIED FIXED
[qa!][testday-20120203]
: crash, verified-beta
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla10
Assigned To: William Chen [:wchen]
:
Mentors:
Depends on:
Blocks: dataset
  Show dependency treegraph
 
Reported: 2011-07-07 07:59 PDT by Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( )
Modified: 2012-02-03 08:35 PST (History)
12 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
verified
verified


Attachments
Possible fix for crash (1.49 KB, patch)
2011-07-25 11:45 PDT, William Chen [:wchen]
jonas: review+
Details | Diff | Review
Possible fix for crash (1.62 KB, patch)
2011-07-25 12:16 PDT, William Chen [:wchen]
no flags Details | Diff | Review
Possible fix for crash (1.64 KB, patch)
2011-07-25 12:26 PDT, William Chen [:wchen]
jonas: review+
Details | Diff | Review
check for double unlink (1.58 KB, patch)
2011-12-07 17:02 PST, Andrew McCreight [:mccr8]
jonas: review+
bugzilla: approval‑mozilla‑aurora+
Details | Diff | Review

Description Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-07 07:59:06 PDT
This bug was filed from the Socorro interface and is 
report bp-66910cea-e075-42a7-919b-98a242110705 .
============================================================= 
0 	xul.dll 	nsGenericHTMLElement::ClearDataset 	content/html/content/src/nsGenericHTMLElement.cpp:394
1 	xul.dll 	nsDOMStringMap::cycleCollection::Unlink 	content/html/content/src/nsDOMStringMap.cpp:54
2 	xul.dll 	nsCycleCollector::CollectWhite 	xpcom/base/nsCycleCollector.cpp:1962
3 	xul.dll 	nsCycleCollector::FinishCollection 	xpcom/base/nsCycleCollector.cpp:2784
4 	xul.dll 	nsCycleCollectorRunner::Collect 	xpcom/base/nsCycleCollector.cpp:3437
5 	xul.dll 	nsCycleCollector_collect 	xpcom/base/nsCycleCollector.cpp:3503
6 	xul.dll 	nsJSContext::CycleCollectNow 	dom/base/nsJSEnvironment.cpp:3271
7 	xul.dll 	CCTimerFired 	dom/base/nsJSEnvironment.cpp:3311
8 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:424
9 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:520
10 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:617
11 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
12 	xul.dll 	xul.dll@0xb79167 	
13 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
14 	xul.dll 	xul.dll@0x37125f 	
15 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
16 	xul.dll 	nsEvent::nsEvent 	obj-firefox/dist/include/nsGUIEvent.h:569
17 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:189
18 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:256
19 		@0x7530ffff 

While testing, I'm seeing this crash occuring. Sorry, atm I don't have a testcase for this crash.
Comment 1 Olli Pettay [:smaug] 2011-07-14 11:33:07 PDT
Is this a null pointer crash? null slots (maybe even null element) + offset to
mDataset.

Is it possible that something is keeping nsDOMStringMap alive after
first unlink. Then it gets unlinked again, and mElement is already null.
Comment 2 Jonas Sicking (:sicking) 2011-07-14 12:16:43 PDT
Yeah, looking at the code I couldn't make sense of this.

As long as the dataset is alive, the element should be alive too. *Especially* in the places where ClearDataset is called since we only ever clear the strong mElement reference after calling ClearDataset.

And as long as the dataset is alive, the element should be pointing to it, as that pointer is set up as soon as the dataset is created, and only cleared once the dataset goes away.

And the slots are only deleted when the element is deleted, which again should only happen when the dataset is going away.


Or, at least that was my initial analysis. I just realized that there is one situation when this isn't true. Specifically, if we end up cycle collecting a dataset more than once we'll end up dereferencing a null mElement pointer in the unlink code. I suppose that can happen.

So what we should do is to add a if (mElement) check to the unlink code and see if that helps.


William, can you try that?
Comment 3 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-21 20:11:13 PDT
Sorry, I haven't been able to get a minimized or unminimized testcase yet.
I can tell you that in order for this crash to trigger is when I do something like this setTimeout(function(){document.write(document.documentElement.innerHTML)}, 1000);. And the testcase probably keeps a whole lot of dataset references somewhere.
Comment 4 William Chen [:wchen] 2011-07-25 11:45:16 PDT
Created attachment 548231 [details] [diff] [review]
Possible fix for crash

I added a check for the element prior to clearing the dataset. No test case because I was not able to reproduce the crash. I tried a few things but I was not able to get the unlink code to run more than once.
Comment 5 Jonas Sicking (:sicking) 2011-07-25 12:00:48 PDT
Comment on attachment 548231 [details] [diff] [review]
Possible fix for crash

Add an assertion as well, as there's definitely a bug somewhere if we unlink twice.
Comment 6 William Chen [:wchen] 2011-07-25 12:16:47 PDT
Created attachment 548246 [details] [diff] [review]
Possible fix for crash

Added assertion and comment.
Comment 7 William Chen [:wchen] 2011-07-25 12:26:37 PDT
Created attachment 548250 [details] [diff] [review]
Possible fix for crash

Forgot to add r=sicking
Comment 8 Peter Van der Beken [:peterv] 2011-09-28 06:12:56 PDT
Is this ready to land?
Comment 9 Andrew McCreight [:mccr8] 2011-11-29 17:42:49 PST
Jesse came up with a test case that looks very similar to this in bug 706281, that only triggers a crash after I landed the weak map cycle collector integration in bug 668855.
Comment 10 Andrew McCreight [:mccr8] 2011-11-29 17:55:11 PST
*** Bug 706281 has been marked as a duplicate of this bug. ***
Comment 11 Andrew McCreight [:mccr8] 2011-11-29 18:14:44 PST
In the first CC in Jesse's test case, the CC identifies the nsDOMStringMap as garbage, so it is unlinked.  There's only one reference to the DOMStringMap, an XPCWrappedNative, which is also identified as garbage and has a single reference.  But both of them show up again in the next CC.  I'm not sure why that is.  So you end up with a double unlink.  The attached patch seems like the way to go.  Double unlinks are bad, but they do happen.  I know of a few ways they can be caused.
Comment 12 Jesse Ruderman 2011-11-29 18:16:47 PST
How bad are double unlinks? Should the assertion be removed from the patch, or should a bug be filed on fixing the assertion?
Comment 13 Andrew McCreight [:mccr8] 2011-11-29 18:26:09 PST
They are kind of bad, but probably not anything worse than a controlled crash.  Unlinks usually just null out references, so I think the worst thing that will happen is a null deref.  But I don't think anybody has really audited unlinks.
Comment 14 Olli Pettay [:smaug] 2011-11-30 02:33:53 PST
I've always tried to write unlink code so that it can handle more than one unlinking -
I think most of our unlink handling is like that.
Comment 15 Andrew McCreight [:mccr8] 2011-11-30 11:26:45 PST
Maybe I'm reading the code wrong, but it looks like the wrappee of an XPCWrappedNative isn't unlinked by the XPCWN's unlink, so it will stay alive until the next GC.  Which means that double unlinks are probably fairly common.  I don't know.  One XPCWN I looked at disappeared from one log to another, but a second one didn't.
Comment 16 Andrew McCreight [:mccr8] 2011-12-07 17:02:12 PST
Created attachment 579901 [details] [diff] [review]
check for double unlink

I just removed the assertion from William's last patch.  Double unlinks will happen whenever a wrapped native is marked as garbage, and the cycle collector runs twice in a row without a GC in between, so I think it shouldn't be an assertion.
Comment 17 Andrew McCreight [:mccr8] 2011-12-08 09:24:55 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/271ba479719f
Comment 18 Andrew McCreight [:mccr8] 2011-12-08 09:34:14 PST
Comment on attachment 579901 [details] [diff] [review]
check for double unlink

Low risk fix (add a null check), but also an extremely rare crash, so I am flagging for Aurora but not Beta.  I could only find 9 crashes with this signature in the last week across all versions.
Comment 19 Ed Morley [:emorley] 2011-12-09 07:05:47 PST
https://hg.mozilla.org/mozilla-central/rev/271ba479719f
Comment 20 Andrew McCreight [:mccr8] 2011-12-14 09:06:41 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/28320cfb9cac
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-28 14:41:22 PST
Is this something QA can verify?
Comment 22 Andrew McCreight [:mccr8] 2011-12-28 20:50:20 PST
The steps in bug 706281 comment 0 triggered this crash.
Comment 23 Ioana (away) 2012-01-04 08:22:54 PST
Verified as fixed with the steps referenced in comment 22 on:
Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0) Gecko/20100101 Firefox/10.0

I have also verified the crash stats from the Socorro interface and this crash didn't reproduce on Firefox 10 anymore.
Comment 24 Ioana (away) 2012-02-03 08:25:49 PST
Verified as fixed with the steps referenced in comment 22 on:
Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:11.0) Gecko/20100101 Firefox/11.0
Comment 25 Ioana (away) 2012-02-03 08:35:13 PST
The crash status from the Socorro interface were also verified and this crash didn't happen on Firefox 10 or newer anymore.

Note You need to log in before you can comment on or make changes to this bug.