Firefox Crash in nsDocShell::InternalLoad

RESOLVED FIXED in Firefox 8

Status

()

--
critical
RESOLVED FIXED
8 years ago
3 years ago

People

(Reporter: marcia, Assigned: justin.lebar+bug)

Tracking

({crash, regression})

8 Branch
mozilla10
x86
Windows 7
crash, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox8+ fixed, firefox9+ fixed, firefox10 fixed)

Details

(Whiteboard: [inbound][qa-], crash signature)

Attachments

(3 attachments)

Seen while reviewing crash stats and the explosive report. Seen across versions 7, 8 and 9. https://crash-stats.mozilla.com/report/list?signature=nsDocShell%3A%3AInternalLoad%28nsIURI%2A%2C%20nsIURI%2A%2C%20nsISupports%2A%2C%20unsigned%20int%2C%20wchar_t%20const%2A%2C%20char%20const%2A%2C%20nsIInputStream%2A%2C%20nsIInputStream%2A%2C%20unsigned%20int%2C%20nsISHEntry%2A%2C%20int%2C%20nsIDocShell%2A%2A%2C%20nsIRequest%2A%2A%29 to the report swhich are all Windows.

https://crash-stats.mozilla.com/report/index/7f591dd6-a803-4015-a49f-958ae2110825 

Frame 	Module 	Signature [Expand] 	Source
0 		@0x10244c8b 	
1 	xul.dll 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8330
2 	xul.dll 	nsDocShell::LoadURI 	docshell/base/nsDocShell.cpp:1471
3 	xul.dll 	nsFrameLoader::ReallyStartLoadingInternal 	content/base/src/nsFrameLoader.cpp:487
4 	xul.dll 	nsFrameLoader::ReallyStartLoading 	content/base/src/nsFrameLoader.cpp:428
5 	xul.dll 	nsDocument::MaybeInitializeFinalizeFrameLoaders 	
6 	xul.dll 	nsDocument::EndUpdate 	content/base/src/nsDocument.cpp:3990
7 	xul.dll 	nsHTMLDocument::EndUpdate 	content/html/document/src/nsHTMLDocument.cpp:2523
8 	xul.dll 	nsINode::ReplaceOrInsertBefore 	content/base/src/nsGenericElement.cpp:4038
9 	xul.dll 	nsIDOMNode_AppendChild 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:6863
10 	mozjs.dll 	CallCompiler::generateNativeStub 	js/src/methodjit/MonoIC.cpp:813
11 	mozjs.dll 	js::mjit::ic::NativeCall 	js/src/methodjit/MonoIC.cpp:1031
12 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:686
13 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:733
14 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4128
15 	mozjs.dll 	UncachedInlineCall 	js/src/methodjit/InvokeHelpers.cpp:320
16 	mozjs.dll 	js::mjit::stubs::UncachedCallHelper 	js/src/methodjit/InvokeHelpers.cpp:418
17 	mozjs.dll 	CallCompiler::update 	js/src/methodjit/MonoIC.cpp:963
18 	mozjs.dll 	js::mjit::ic::Call 	js/src/methodjit/MonoIC.cpp:1017
19 	mozjs.dll 	JSCompartment::wrap 	js/src/jscompartment.cpp:363
20 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:686
21 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:733
22 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:610
23 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:686
24 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:805
25 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5070
26 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1657
27 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:585
28 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
29 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
30 	xul.dll 	nsEventListenerManager::HandleEventSubType 	content/events/src/nsEventListenerManager.cpp:1083

Comment 1

8 years ago
Also seen on Windows XP. Observed today on 10.0a1 on a web site I've been using every day with Nightly builds.

User Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:10.0a1) Gecko/20110929 Firefox/10.0a1

Comment 2

8 years ago
(In reply to B.J. Herbison from comment #1)
> on a web site I've been using every day with Nightly builds.

It would help the devs if you actually identified the web site (with URL).

Comment 3

8 years ago
(In reply to IU from comment #2)
> It would help the devs if you actually identified the web site (with URL).

Sorry, but it's the internal development version of my employer's product. If it was repeatable I would build a test case, but after two quick crashes this morning I haven't had a crash since. (And I've been doing the same thing over and over -- I have my own hard-to-reproduce bug to track down.)
It's currently #5 top crasher in 8.0b1.
Comments say it's related to Google or Gmail, maybe using quotes.
tracking-firefox8: --- → ?

Updated

8 years ago
Keywords: regression
Version: Trunk → 8 Branch
The regression range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=009c64b64cf3&tochange=c440edd84f84
The likely culprit is bug 668019.
Blocks: 668019
Summary: Firefox Crash @ nsDocShell::InternalLoad(nsIURI*, → Firefox Crash in nsDocShell::InternalLoad
(In reply to Scoobidiver from comment #5)
> The likely culprit is bug 668019.

Why likely? Did any comment mention that the user was copying the location bar value?
(In reply to Dão Gottwald [:dao] from comment #6)
> Why likely? Did any comment mention that the user was copying the location
> bar value?
Amongst the three bugs, this is the only one that calls the URI component.
Where did you get that range from?
The first entry in that table appears to be from September 7, which is outside of that range?
(In reply to Dão Gottwald [:dao] from comment #10)
> The first entry in that table appears to be from September 7, which is
> outside of that range?
It is the regression range for the spike (second?) because the crash daily rate is:
7.0.1: 0.13 crash/1M ADU
8.0a2 before 9/20: 60 crashes/1M ADU
8.0a2 after 9/21: 180 crashes/1M ADU
8.0b1: 380 crashes/1M ADU
I can't find regression ranges older than 30 days, so I don't know when the first spike appeared.
The first comment in the table says 'typed "furbush" in the google page', which as far as I can tell clearly rules out bug 668019. Spikes don't necessarily imply a regression -- this one could be due to a change on google's side.
No longer blocks: 668019
(In reply to Scoobidiver from comment #13)
> The regression range is:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=f414db34c70b&tochange=87e3ea12ed5d

Bug 646641? I think this would support the theory that there was a second spike due to a change at google.

Updated

8 years ago
Blocks: 646641
Bug 646641 is a reasonable guess for this crash, although it's hard to tell from these stacks what might be going wrong.

B.J., can you give us a little more information about the internal site you were using?  In particular, does it use history.push/replaceState or modify the URL's hash?

Comment 16

8 years ago
(In reply to Justin Lebar [:jlebar] from comment #15)
> B.J., can you give us a little more information about the internal site you
> were using?

Dojo, Struts, DWR. Lots of JavaScript to asynchronously load and save data.

> In particular, does it use history.push/replaceState or modify
> the URL's hash?

I don't think so. The application uses Dojo, and the Dojo code modifies location.hash in some cases. From a few searches I don't think our code uses dojo.hash, but Dojo has outsmarted me before.

I can't find any uses of history.push or history.replaceState.

I also have not seen this crash since the day I made my report in this discussion. I hope some code change is responsible for that. Is this crash signature still as common as it was?
(In reply to B.J. Herbison from comment #16)
> Is this crash signature still as common as it was?
Yes. It's #8 top crasher and #2 top crasher w/o hangs in 8.0b1.
>> In particular, does it use history.push/replaceState or modify
>> the URL's hash?
> I don't think so.

Thanks.

This makes it somewhat less likely that bug 646641 is at fault.  But if it's still our best guess, then it sounds like we should back it out and see if it fixes the crashes.

If I'm reading the crashstats page correctly, it looks like we'd need to back out from beta, since there are only three crashes that aren't on version 8.0.  bz, does backing out from beta seem reasonable to you?
For faster feedback, it can be backed out first in 10.0a1 where it's #12 top crasher.
(In reply to Scoobidiver from comment #19)
> For faster feedback, it can be backed out first in 10.0a1 where it's #12 top
> crasher.

Oh, I see.  Why don't those 10.0a1 crashes show up in the table in comment 0?
(In reply to Justin Lebar [:jlebar] from comment #20)
> Why don't those 10.0a1 crashes show up in the table in comment 0?
This query announces 1,793 crash reports but displays only 344 results in 8 pages!
Try run for 0112605a8509 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0112605a8509
Results (out of 171 total builds):
    exception: 1
    success: 141
    warnings: 17
    failure: 12
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-0112605a8509
I'm having difficulty bzexporting my backout patches at the moment, but smaug, can you sign off on backing out bug 646641 to see if it eliminates this crash?
Created attachment 566557 [details] [diff] [review]
Backout part 2, v1
Attachment #566557 - Flags: review?(Olli.Pettay)
Created attachment 566559 [details] [diff] [review]
Backout part 1, v1
Attachment #566559 - Flags: review?(Olli.Pettay)
Attachment #566557 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 566559 [details] [diff] [review]
Backout part 1, v1

rs for the backout in m-c
Attachment #566559 - Flags: review?(Olli.Pettay) → review+
Backed out the backout because of mysterious moth permaorange which didn't appear on try.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9670f22f912d
Try run for 194a7b7c1eff is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=194a7b7c1eff
Results (out of 24 total builds):
    success: 19
    warnings: 3
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-194a7b7c1eff
Maybe correlations by add-on can help to reproduce:
  nsDocShell::InternalLoad(nsIURI*, nsIURI*, nsISupports*, unsigned int, wchar_t const*, char const*, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, int, nsIDocShell**, nsIRequest**)|EXCEPTION_ACCESS_VIOLATION_READ (247 crashes)
     33% (81/247) vs.   0% (84/22826) azhang@cloudacl.com (adult content filtering)
     27% (67/247) vs.  11% (2457/22826) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
Try run for ce38193e7b4b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ce38193e7b4b
Results (out of 24 total builds):
    success: 24
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-ce38193e7b4b
...and backed out the backout because it breaks Android.  I *will* get this right, eventually.
Try run for f297637346df is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=f297637346df
Results (out of 15 total builds):
    success: 12
    warnings: 3
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-f297637346df
Android looks ok.  This one's gonna work.

https://hg.mozilla.org/integration/mozilla-inbound/rev/8a34e6bed361
Whiteboard: [inbound]
https://hg.mozilla.org/mozilla-central/rev/8a34e6bed361

leaving open, as I understand this this is only backing out bug 646641, shouldn't that bug be reopened then?

Updated

7 years ago
Duplicate of this bug: 695848
Hm.  Well...now what?  I guess we start looking for places in InternalLoad which need null-checks?
Hm, maybe it's


+   // Link our new SHEntry to the old SHEntry's back/forward
+   // cache data, since the two SHEntries correspond to the
+   // same document.
+   mLSHE->AdoptBFCacheEntry(mOSHE);

This is the only place I see in [1] where we introduce a potential null-pointer exception.

Smaug, are you OK with me re-landing this patch with a null-check around this line?

[1] https://hg.mozilla.org/mozilla-central/diff/8a34e6bed361/docshell/base/nsDocShell.cpp
https://hg.mozilla.org/mozilla-central/rev/051b3d8ed545
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Assignee: nobody → justin.lebar+bug

Updated

7 years ago
tracking-firefox9: --- → ?
There are still no crashes from 10.0a1/20111023 when the new patch landed.
Created attachment 569090 [details] [diff] [review]
Patch for branches
(Assignee)

Updated

7 years ago
status-firefox10: --- → fixed
status-firefox8: --- → affected
status-firefox9: --- → affected
(Assignee)

Updated

7 years ago
Attachment #569090 - Flags: approval-mozilla-beta?
Attachment #569090 - Flags: approval-mozilla-aurora?

Comment 48

7 years ago
---------------------------------[ Triage Comment ]---------------------------------

We are approving for both aurora9 and beta8 as the patch is said to be safe and fixes a top 20 crasher (release) and the #9 crash by volume in Firefox 8 beta 4.

Please land ASAP.

Updated

7 years ago
Attachment #569090 - Flags: approval-mozilla-beta?
Attachment #569090 - Flags: approval-mozilla-beta+
Attachment #569090 - Flags: approval-mozilla-aurora?
Attachment #569090 - Flags: approval-mozilla-aurora+
status-firefox9 should be fixed too, yes?
(In reply to Boris Zbarsky (:bz) from comment #50)
> status-firefox9 should be fixed too, yes?

Yes, thanks.
status-firefox9: affected → fixed

Updated

7 years ago
tracking-firefox8: ? → +
tracking-firefox9: ? → +
qa? as QA needs a testcase to verify this fix. Can a testcase be provided? Is merely checking crashstats sufficient to verify this fix?
Keywords: testcase-wanted
Whiteboard: [inbound] → [inbound][qa?]
We don't have a testcase.  I'd love it if you could create one, but I think it would be Hard.
Without a testcase we will need to rely on Socorro for assumed verification.
Whiteboard: [inbound][qa?] → [inbound][qa+]
http://bit.ly/u5HBSx

There still seem to be some crashes for Firefox 9 with the crash signature after the fix has landed. 

Is this expected in any way?
Most (all?) of the recent crashes seem to be segfaults with a non-null pointer, which would be a different issue from the one fixed here (and could point to general memory corruption, for all I know).  But maybe I'm reading crash-stats wrong.

Comment 58

7 years ago
(In reply to Justin Lebar [:jlebar] from comment #56)
> Most (all?) of the recent crashes seem to be segfaults with a non-null
> pointer, which would be a different issue from the one fixed here (and could
> point to general memory corruption, for all I know).  But maybe I'm reading
> crash-stats wrong.

(In reply to Justin Lebar [:jlebar] from comment #57)
> The crashers are also on different lines than the ones fixed here. 
> (InternalLoad is a big function.)

In that case, could you file new bugs for those that are reasonable enough volume (on trunk a couple occurrences are enough for that, probably)?
Someone who's better at navigating crash-stats should do this.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> In that case, could you file new bugs for those that are reasonable enough
> volume (on trunk a couple occurrences are enough for that, probably)?
There are only crashes in 8.0* and 9.0 and, in both, it's beyond #300 crasher. I don't think they deserve a bug without a testcase.

Comment 61

7 years ago
(In reply to Scoobidiver from comment #60)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> > In that case, could you file new bugs for those that are reasonable enough
> > volume (on trunk a couple occurrences are enough for that, probably)?
> There are only crashes in 8.0* and 9.0 and, in both, it's beyond #300
> crasher. I don't think they deserve a bug without a testcase.

I agree to that. If we have a testcase or steps to reproduce, that changes things, but other than that, we should better concentrate on getting the top 50 or so signatures fixed.
qa- due to lack of steps/testcase -- please reset to qa+ if one can be provided.
Whiteboard: [inbound][qa+] → [inbound][qa-]
Depends on: 823187
Keywords: testcase-wanted
You need to log in before you can comment on or make changes to this bug.