Closed Bug 877164 Opened 6 years ago Closed 6 years ago

crash in nsHTMLDocument::Release @ nsNodeUtils::LastRelease

Categories

(Core :: DOM: Core & HTML, defect, critical)

23 Branch
All
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla26
Tracking Status
firefox22 --- unaffected
firefox23 + affected
firefox24 + verified
firefox25 --- verified
firefox26 --- verified

People

(Reporter: scoobidiver, Assigned: gkrizsanits)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(3 files, 2 obsolete files)

It's #60 browser crasher in 23.0a2 and #49 in 24.0a1.
It started spiking in 23.0a1/20130403. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aae004a3c5d9&tochange=97cfc16ba5dc

Signature 	nsNodeUtils::LastRelease(nsINode*) More Reports Search
UUID	bca23133-ee4f-43c2-a0c2-991182130529
Date Processed	2013-05-29 13:06:28
Uptime	14364
Last Crash	1.8 weeks before submission
Install Age	4.0 hours since version was first installed.
Install Time	2013-05-30 01:09:07
Product	Firefox
Version	24.0a1
Build ID	20130528030942
Release Channel	nightly
OS	Windows NT
OS Version	6.2.9200
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x14
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x0116, AdapterSubsysID: 41401558, AdapterDriverVersion: 9.17.10.2932
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
Processor Notes 	sp-processor03_phx1_mozilla_com_19361:2012
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x0116
Total Virtual Memory	2147352576
Available Virtual Memory	1392750592
System Memory Use Percentage	36
Available Page File	5034209280
Available Physical Memory	1969000448

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsNodeUtils::LastRelease 	content/base/src/nsNodeUtils.cpp:223
1 	xul.dll 	nsHTMLDocument::Release 	content/html/document/src/nsHTMLDocument.cpp:218
2 	xul.dll 	NS_NewHTMLDocument 	content/html/document/src/nsHTMLDocument.cpp:179
3 	xul.dll 	NS_NewDOMDocument 	content/xml/document/src/XMLDocument.cpp:90
4 	xul.dll 	nsContentUtils::ConvertToPlainText 	content/base/src/nsContentUtils.cpp:4115
5 	xul.dll 	nsHTMLFormatConverter::ConvertFromHTMLToUnicode 	widget/xpwidgets/nsHTMLFormatConverter.cpp:241
6 	xul.dll 	nsHTMLFormatConverter::Convert 	widget/xpwidgets/nsHTMLFormatConverter.cpp:197
7 	xul.dll 	nsTransferable::GetTransferData 	widget/xpwidgets/nsTransferable.cpp:344
8 	xul.dll 	nsDataObj::GetText 	widget/windows/nsDataObj.cpp:1245
9 	xul.dll 	nsDataObj::GetData 	widget/windows/nsDataObj.cpp:503
10 	ole32.dll 	HandleFromHandle 	
11 	ole32.dll 	RenderCurrentFormat 	
12 	ole32.dll 	RenderFormat 	
13 	user32.dll 	_SEH_prolog4 	
14 		@0xd 	
15 	user32.dll 	InternalCallWinProc 	
16 	user32.dll 	UserCallWinProcCheckWow 	
17 	ole32.dll 	GetPrivateClipboardWindow 	
18 	user32.dll 	UserCallWinProcCheckWow 	
19 	user32.dll 	VRipOutput 	
20 	user32.dll 	VRipOutput 	
21 	ntdll.dll 	ntdll.dll@0x17060 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsNodeUtils%3A%3ALastRelease%28nsINode*%29
It's #8 top browser crasher in 23.0b7 with many duplicates. It was #28 in 23.0b6.
Keywords: topcrash
bsmedberg thinks the higher numbers in b7 could be a variant of the Radeon issue, but a lot of those are startup and don't contain graphics card info - and the ones I looked into that do have info show a wide spread of graphics adapter vendors.
Milan can you help us find an engineer to look into this?
Flags: needinfo?(milan)
Will do, but it'll take me probably until Friday, I'm away from regular work today and tomorrow.
Still #7 in 23.0b8 data. :(
Fwiw, I don't get the impression this is graphics related, although I'm not sure.
I don't see any particular reason why this would be graphics-related at all, really. Maybe Olli or Ehsan have some ideas.
Flags: needinfo?(ehsan)
So, is document->Init failing in http://mxr.mozilla.org/mozilla-central/source/content/html/document/src/nsHTMLDocument.cpp?rev=468b35185c44#170
so we end up releasing the newly created document and something odd happens...
That would hint https://bugzilla.mozilla.org/show_bug.cgi?id=874158 or
https://bugzilla.mozilla.org/show_bug.cgi?id=820170
Is it possible that we don't get JunkScope very early in the startup?

The stack is still odd. Why do we have null+offset in ::LastRelease.
And that crash has been there for ages. since FF14 at least
(In reply to Olli Pettay [:smaug] from comment #8)
> And that crash has been there for ages. since FF14 at least

Yes, but for some reason it spiked in the recent couple of betas on 23.
(In reply to Olli Pettay [:smaug] from comment #8)
> Is it possible that we don't get JunkScope very early in the startup?

Yes, that can happen. Typically when the hidden window is being created. So yeah, bug 874158 adds more cases where document init fails, and with that makes this crash a lot more frequent :( I have no idea yet why the crash happens in the Release call though...
Seems like I was away long enough not to have to act here.
Flags: needinfo?(milan)
Flags: needinfo?(ehsan)
Based on the stack this is not the reason for the crash, but the stack does not make much sense, so who knows... (null pointer dereference because NS_RELEASE nulls out the pointer)

https://tbpl.mozilla.org/?tree=Try&rev=07498ca5a36f
00:41]	smaug	gabor: ah, that way. Would doc.forget(aInstancePtrResult); work?
r+ either way, but forget without get would be nicer in this case.
Attachment #781933 - Flags: review?(bugs) → review+
Tree is closed... I attached the updated patch in case there is still a chance to land this. It would be nice I guess...
Attachment #781933 - Attachment is obsolete: true
Attachment #782105 - Flags: review+
Comment on attachment 782105 [details] [diff] [review]
crash if nsDocument Init fails

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Old bug, but it is hit more frequently because of bug 874158
User impact if declined: There is a chance that this is the reason of this frequent crash, but even if not, this would cause a certain crash on its own.
Testing completed (on m-c, etc.): Can not reproduce the crash, so testing is not really possible.
Risk to taking this patch (and alternatives if risky): This is a very simple and safe patch that fixes a trivial bug. 
String or IDL/UUID changes made by this patch: none
Attachment #782105 - Flags: approval-mozilla-beta?
Attachment #782105 - Flags: approval-mozilla-aurora?
Gabor, I'm still worried that the document creation setup is just broken during startup because
we can't guarantee JunkScope's existence.
Stuff like https://crash-stats.mozilla.com/report/index/c569e2d3-fbcb-4259-b3f7-66cfe2130727 is
node really expected to fail.

If we don't crash anymore with the patch, we may still end up to odd state, since
nsWebShellWindow::Initialize may fail.
(In reply to Olli Pettay [:smaug] from comment #19)
> Gabor, I'm still worried that the document creation setup is just broken
> during startup because
> we can't guarantee JunkScope's existence.

I'm worried to... problem is that I can neither reproduce nor have an idea why does the JunksScope creation fail. Spent quite some time to understand this issue or to reproduce it, but so far no luck...
Also, it's almost sure that the XPConnect initialization that fails. Maybe I could instead of a sandbox create some other global for the JunkScope, but if XPConnect fails to init, it will cause a problem somewhere else.
Well, is it possible we try to create a document before XPConnect initialization?
Does something in JunkScope getter ensure that XPConnect has been initialized?
(In reply to Olli Pettay [:smaug] from comment #22)
> Does something in JunkScope getter ensure that XPConnect has been
> initialized?

Currently the JunksScope is just a sandbox.
https://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCComponents.cpp#3253

xpc_CreateSandboxObject makes sure that xpconnect is ready by calling do_GetService on it before anything else. Something else can be the reason for the failure too, but that would mean out of memory case mostly, which is unlikely.
What thing I could do is avoid calling xpc_CreateSandboxObject and just create a non-sandbox global for the junk scope. This is less likely to fail.

Bobby: given the unending pain that the unreproducible but still frequent failure of the junk scope creation causes, what do you think about this approach?
Attachment #782187 - Flags: feedback?(bobbyholley+bmo)
Comment on attachment 782105 [details] [diff] [review]
crash if nsDocument Init fails

Low risk and worth uplifting to see if crashes decrease in our final beta.
Attachment #782105 - Flags: approval-mozilla-beta?
Attachment #782105 - Flags: approval-mozilla-beta+
Attachment #782105 - Flags: approval-mozilla-aurora?
Attachment #782105 - Flags: approval-mozilla-aurora+
I wouldn't call this fixed yet. The stack traces are odd, and the fix is possible fix for the crash, 
but we're not quite sure yet.
Assignee: nobody → gkrizsanits
Target Milestone: --- → mozilla25
Updated this patch because of a forgotten null check...

Anyway, I agree with smaug, I don't consider this bug fixed either. Also I have no idea what is the proper way to set the flags on this bug in the current case so I just leave it as they are.
Attachment #782187 - Attachment is obsolete: true
Attachment #782187 - Flags: feedback?(bobbyholley+bmo)
Attachment #782548 - Flags: feedback?(bobbyholley+bmo)
Attachment #782548 - Attachment is patch: true
Comment on attachment 782548 [details] [diff] [review]
part2: no more xpc_CreateSandboxObject call in junkscope. v2

Review of attachment 782548 [details] [diff] [review]:
-----------------------------------------------------------------

Unless we have a theory on why we'd fail in xpc_CreateSandboxObject but not xpc::CreateGlobalObject, I'm not wild about blindly complicating the code.
Attachment #782548 - Flags: feedback?(bobbyholley+bmo) → feedback-
(In reply to Olli Pettay [:smaug] from comment #27)
> I wouldn't call this fixed yet. The stack traces are odd, and the fix is
> possible fix for the crash, 
> but we're not quite sure yet.
There are indeed crashes in 23.0b10 with a different stack trace though:
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsNodeUtils::LastRelease(nsINode *) 	content/base/src/nsNodeUtils.cpp
1 	xul.dll 	nsHTMLDocument::Release() 	content/html/document/src/nsHTMLDocument.cpp
2 	xul.dll 	nsRefPtr<nsDOMSettableTokenList>::~nsRefPtr<nsDOMSettableTokenList>() 	obj-firefox/dist/include/nsAutoPtr.h
3 	xul.dll 	NS_NewHTMLDocument(nsIDocument * *,bool) 	content/html/document/src/nsHTMLDocument.cpp
4 		@0x802718 	
5 	xul.dll 	nsContentUtils::FindInternalContentViewer(char const *,nsContentUtils::ContentViewerType *) 	content/base/src/nsContentUtils.cpp
6 	xul.dll 	CallGetService(char const *,nsID const &,void * *) 	obj-firefox/xpcom/build/nsComponentManagerUtils.cpp
7 	xul.dll 	xul.dll@0x10abfd8 	
8 	xul.dll 	nsDocShell::CreateAboutBlankContentViewer(nsIPrincipal *) 	docshell/base/nsDocShell.cpp
9 	xul.dll 	nsWebShellWindow::Initialize(nsIXULWindow *,nsIXULWindow *,nsIURI *,int,int,bool,nsWidgetInitData &) 	xpfe/appshell/src/nsWebShellWindow.cpp
10 	xul.dll 	nsAppShellService::JustCreateTopWindow(nsIXULWindow *,nsIURI *,unsigned int,int,int,bool,nsWebShellWindow * *) 	xpfe/appshell/src/nsAppShellService.cpp
11 	xul.dll 	nsAppShellService::CreateHiddenWindowHelper(bool) 	xpfe/appshell/src/nsAppShellService.cpp
12 	xul.dll 	nsAppShellService::CreateHiddenWindow() 	xpfe/appshell/src/nsAppShellService.cpp
13 	xul.dll 	nsAppStartup::CreateHiddenWindow() 	toolkit/components/startup/nsAppStartup.cpp
14 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
15 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
16 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
17 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
If the junk scope creation is failing in nsDocument::init it means we return before mNodeInfo is set. So IsNodeTypeOf will work just fine but the first call to NodeType will crash: http://mxr.mozilla.org/mozilla-central/source/content/base/public/nsINode.h#508
(In reply to Bobby Holley (:bholley) from comment #29)
> Unless we have a theory on why we'd fail in xpc_CreateSandboxObject but not
> xpc::CreateGlobalObject, I'm not wild about blindly complicating the code.

I have very vague theories indeed... it's just that this would be the only way to validate it, since I have no way of reproducing this. Since this is a frequent crash, I find it very hard to believe that JS_NewGlobalObject fails that frequently, since it's just used so frequently, that it would cause a problem somewhere else too, if it could fail randomly.

On the other hand if you remember this one https://bugzilla.mozilla.org/show_bug.cgi?id=793370#c9 this can be the same story all over again. It's again the hidden window, and the very same early state, just xpc_CreateSandboxObject calls do_GetService(nsIXPConnect::GetCID() this time. What do you think?

First I would like to stay conservative and just fix the crash (see Comment 31), and let the nsDocument::init fail here, and see if that fixes the start-up problem as well, like the solution in bug 793370.
But if that were true, I'd think we'd crash (rather than return null) in XPCJSRuntime::GetJunkScope(), because AutoSafeJSContext looks like it'll crash if XPConnect is null.
(In reply to Bobby Holley (:bholley) from comment #33)
> But if that were true, I'd think we'd crash (rather than return null) in
> XPCJSRuntime::GetJunkScope(), because AutoSafeJSContext looks like it'll
> crash if XPConnect is null.

So there is no separable steps between xpconnect being null and xpconnect being registered as a service?
I looked into the other crashes with the same signature, but they are very rare and seems to be different from this one. So let's fix this first. I don't have a way to make junk scope creation infallible. Unfortunately I cannot even make sure that mNodeInfo is always set after the init, since a bunch of other steps can fail, including mNodeInfoManager->Init(this);

So I did the minimal fix here for this patch, as this should fix the majority of the crashes, and safe enough to uplift. What do you think?
Attachment #787410 - Flags: review?(bugs)
Attachment #787410 - Flags: review?(bugs) → review+
Whiteboard: leave-open
This patch is not a fix for all the crashes with the given signature, only for the most frequent case, hence I don't want to close this bug yet.
Whiteboard: leave-open → [leave open]
(In reply to Gabor Krizsanits [:krizsa :gabor] (moving to Berlin: in and out for a few weeks) from comment #34)
> (In reply to Bobby Holley (:bholley) from comment #33)
> > But if that were true, I'd think we'd crash (rather than return null) in
> > XPCJSRuntime::GetJunkScope(), because AutoSafeJSContext looks like it'll
> > crash if XPConnect is null.
> 
> So there is no separable steps between xpconnect being null and xpconnect
> being registered as a service?

I don't think so. They get set up when the XPConnect module is instantiated (see XPCModule.{cpp,h}). I'm not an expert on that stuff though.
(In reply to Gabor Krizsanits [:krizsa :gabor] (moving to Berlin: in and out for a few weeks) from comment #37)
> This patch is not a fix for all the crashes with the given signature, only
> for the most frequent case, hence I don't want to close this bug yet.

From a tracking POV, I usually prefer having a bug closed when a significant issue is closed, and using new bugs for other issues even if they have the same crash signature.
(In reply to Gabor Krizsanits [:krizsa :gabor] (moving to Berlin: in and out for a few weeks) from comment #37)
> This patch is not a fix for all the crashes with the given signature, only
> for the most frequent case, hence I don't want to close this bug yet.
Can you uplift it to Beta and Aurora because it's #16 crasher in 24.0b1 with many dupes (and #392 in 25.0a2!)?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> From a tracking POV, I usually prefer having a bug closed when a significant
> issue is closed, and using new bugs for other issues even if they have the
> same crash signature.

I'll keep that in mind. Note: I also plan to uplift this patch to aurora/beta.

(In reply to Scoobidiver from comment #41)
> Can you uplift it to Beta and Aurora because it's #16 crasher in 24.0b1 with
> many dupes (and #392 in 25.0a2!)?

Yeah, I just wanted to see the results first. Also if there is any frequent start-up issue pops up all of a sudden, this might be the root of it. I'm just saying this because I cannot test this, and this fix solves the crash, I'm not sure the hidden window will be created without any problems... and if not, will that cause a start-up failure or not is still a question to me. Anyway, probably fixing this crash is important anyway so I start the uplift process.
Comment on attachment 787410 [details] [diff] [review]
part3: creating junk scope later in nsDocument::init

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Old bug, but it is hit more frequently because of bug 874158
User impact if declined: The patch fixes most of the crashes with this signature.
Testing completed (on m-c, etc.): Can not reproduce the crash, so testing is not really possible.
Risk to taking this patch (and alternatives if risky): The patch might cause other start-up problems instead of the crash, as the hidden window might not be initialized correctly in the case where it crashes earlier (before the patch)... 
String or IDL/UUID changes made by this patch: none
Attachment #787410 - Flags: approval-mozilla-beta?
Attachment #787410 - Flags: approval-mozilla-aurora?
Blocks: 874158
Comment on attachment 787410 [details] [diff] [review]
part3: creating junk scope later in nsDocument::init

Let's get this uplifted to see if crash volume decreases over the next few 24.0 betas.
Attachment #787410 - Flags: approval-mozilla-beta?
Attachment #787410 - Flags: approval-mozilla-beta+
Attachment #787410 - Flags: approval-mozilla-aurora?
Attachment #787410 - Flags: approval-mozilla-aurora+
(In reply to Gabor Krizsanits [:krizsa :gabor] (moving to Berlin: in and out for a few weeks) from comment #42)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> > From a tracking POV, I usually prefer having a bug closed when a significant
> > issue is closed, and using new bugs for other issues even if they have the
> > same crash signature.
> 
> I'll keep that in mind. Note: I also plan to uplift this patch to
> aurora/beta.

Thanks. Esp. in the uplift case, we really want to be able to mark bugs as "fixed in this release" (so that it doesn't show up in queries on e.g. "what still needs to get into this release") and then have further work be done in a different bug - again, just for tracking things properly.
My hotel wifi is not very good at hg pull, so I likely be able to uplift this patch only tomorrow. If it's urgent someone else should do it... :(
Whiteboard: [leave open]
Target Milestone: mozilla25 → mozilla26
Marking as Resolved/Fixed for the topcrash - please reopen if that's not correct. For remaining crashes, a new bug can be opened for investigation.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
There are several crashes post fix for the signature of this bug recorded in the last 7 days:
24 beta 5 - 1 crash
24 beta 6 - 1 crash
24 beta 7 - 11 crashes
24 beta 8 - 17 crashes
24 beta 9 - 21 crashes

Based on Comment 48, filed Bug 914679.
Setting the status for Firefox 24 to verified.
I see no instances of this signature for Firefox 25 or 26 in crash-stats.
Status: RESOLVED → VERIFIED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.