Last Comment Bug 683777 - Firefox Crash in nsDocShell::InternalLoad
: Firefox Crash in nsDocShell::InternalLoad
Status: RESOLVED FIXED
[inbound][qa-]
: crash, regression
Product: Core
Classification: Components
Component: Document Navigation (show other bugs)
: 8 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla10
Assigned To: Justin Lebar (not reading bugmail)
:
:
Mentors:
: 695848 (view as bug list)
Depends on: 823187
Blocks: 646641 707395
  Show dependency treegraph
 
Reported: 2011-08-31 16:38 PDT by Marcia Knous [:marcia - use ni]
Modified: 2015-10-16 11:50 PDT (History)
12 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
fixed


Attachments
Backout part 2, v1 (84.04 KB, patch)
2011-10-12 09:02 PDT, Justin Lebar (not reading bugmail)
bugs: review+
Details | Diff | Splinter Review
Backout part 1, v1 (37.89 KB, patch)
2011-10-12 09:04 PDT, Justin Lebar (not reading bugmail)
bugs: review+
Details | Diff | Splinter Review
Patch for branches (1.30 KB, patch)
2011-10-24 09:32 PDT, Justin Lebar (not reading bugmail)
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2011-08-31 16:38:22 PDT
Seen while reviewing crash stats and the explosive report. Seen across versions 7, 8 and 9. https://crash-stats.mozilla.com/report/list?signature=nsDocShell%3A%3AInternalLoad%28nsIURI%2A%2C%20nsIURI%2A%2C%20nsISupports%2A%2C%20unsigned%20int%2C%20wchar_t%20const%2A%2C%20char%20const%2A%2C%20nsIInputStream%2A%2C%20nsIInputStream%2A%2C%20unsigned%20int%2C%20nsISHEntry%2A%2C%20int%2C%20nsIDocShell%2A%2A%2C%20nsIRequest%2A%2A%29 to the report swhich are all Windows.

https://crash-stats.mozilla.com/report/index/7f591dd6-a803-4015-a49f-958ae2110825 

Frame 	Module 	Signature [Expand] 	Source
0 		@0x10244c8b 	
1 	xul.dll 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8330
2 	xul.dll 	nsDocShell::LoadURI 	docshell/base/nsDocShell.cpp:1471
3 	xul.dll 	nsFrameLoader::ReallyStartLoadingInternal 	content/base/src/nsFrameLoader.cpp:487
4 	xul.dll 	nsFrameLoader::ReallyStartLoading 	content/base/src/nsFrameLoader.cpp:428
5 	xul.dll 	nsDocument::MaybeInitializeFinalizeFrameLoaders 	
6 	xul.dll 	nsDocument::EndUpdate 	content/base/src/nsDocument.cpp:3990
7 	xul.dll 	nsHTMLDocument::EndUpdate 	content/html/document/src/nsHTMLDocument.cpp:2523
8 	xul.dll 	nsINode::ReplaceOrInsertBefore 	content/base/src/nsGenericElement.cpp:4038
9 	xul.dll 	nsIDOMNode_AppendChild 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:6863
10 	mozjs.dll 	CallCompiler::generateNativeStub 	js/src/methodjit/MonoIC.cpp:813
11 	mozjs.dll 	js::mjit::ic::NativeCall 	js/src/methodjit/MonoIC.cpp:1031
12 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:686
13 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:733
14 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4128
15 	mozjs.dll 	UncachedInlineCall 	js/src/methodjit/InvokeHelpers.cpp:320
16 	mozjs.dll 	js::mjit::stubs::UncachedCallHelper 	js/src/methodjit/InvokeHelpers.cpp:418
17 	mozjs.dll 	CallCompiler::update 	js/src/methodjit/MonoIC.cpp:963
18 	mozjs.dll 	js::mjit::ic::Call 	js/src/methodjit/MonoIC.cpp:1017
19 	mozjs.dll 	JSCompartment::wrap 	js/src/jscompartment.cpp:363
20 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:686
21 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:733
22 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:610
23 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:686
24 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:805
25 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5070
26 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1657
27 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:585
28 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
29 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
30 	xul.dll 	nsEventListenerManager::HandleEventSubType 	content/events/src/nsEventListenerManager.cpp:1083
Comment 1 B.J. Herbison 2011-09-29 07:16:33 PDT
Also seen on Windows XP. Observed today on 10.0a1 on a web site I've been using every day with Nightly builds.

User Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:10.0a1) Gecko/20110929 Firefox/10.0a1
Comment 2 IU 2011-09-29 10:21:05 PDT
(In reply to B.J. Herbison from comment #1)
> on a web site I've been using every day with Nightly builds.

It would help the devs if you actually identified the web site (with URL).
Comment 3 B.J. Herbison 2011-09-29 11:16:18 PDT
(In reply to IU from comment #2)
> It would help the devs if you actually identified the web site (with URL).

Sorry, but it's the internal development version of my employer's product. If it was repeatable I would build a test case, but after two quick crashes this morning I haven't had a crash since. (And I've been doing the same thing over and over -- I have my own hard-to-reproduce bug to track down.)
Comment 4 Scoobidiver (away) 2011-10-03 02:26:15 PDT
It's currently #5 top crasher in 8.0b1.
Comments say it's related to Google or Gmail, maybe using quotes.
Comment 5 Scoobidiver (away) 2011-10-07 03:39:28 PDT
The regression range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=009c64b64cf3&tochange=c440edd84f84
The likely culprit is bug 668019.
Comment 6 Dão Gottwald [:dao] 2011-10-07 07:37:16 PDT
(In reply to Scoobidiver from comment #5)
> The likely culprit is bug 668019.

Why likely? Did any comment mention that the user was copying the location bar value?
Comment 7 Scoobidiver (away) 2011-10-07 07:44:18 PDT
(In reply to Dão Gottwald [:dao] from comment #6)
> Why likely? Did any comment mention that the user was copying the location
> bar value?
Amongst the three bugs, this is the only one that calls the URI component.
Comment 8 Dão Gottwald [:dao] 2011-10-07 07:47:34 PDT
Where did you get that range from?
Comment 10 Dão Gottwald [:dao] 2011-10-07 08:00:59 PDT
The first entry in that table appears to be from September 7, which is outside of that range?
Comment 11 Scoobidiver (away) 2011-10-07 08:26:10 PDT
(In reply to Dão Gottwald [:dao] from comment #10)
> The first entry in that table appears to be from September 7, which is
> outside of that range?
It is the regression range for the spike (second?) because the crash daily rate is:
7.0.1: 0.13 crash/1M ADU
8.0a2 before 9/20: 60 crashes/1M ADU
8.0a2 after 9/21: 180 crashes/1M ADU
8.0b1: 380 crashes/1M ADU
I can't find regression ranges older than 30 days, so I don't know when the first spike appeared.
Comment 12 Dão Gottwald [:dao] 2011-10-07 08:52:54 PDT
The first comment in the table says 'typed "furbush" in the google page', which as far as I can tell clearly rules out bug 668019. Spikes don't necessarily imply a regression -- this one could be due to a change on google's side.
Comment 14 Dão Gottwald [:dao] 2011-10-09 02:15:39 PDT
(In reply to Scoobidiver from comment #13)
> The regression range is:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=f414db34c70b&tochange=87e3ea12ed5d

Bug 646641? I think this would support the theory that there was a second spike due to a change at google.
Comment 15 Justin Lebar (not reading bugmail) 2011-10-09 11:33:36 PDT
Bug 646641 is a reasonable guess for this crash, although it's hard to tell from these stacks what might be going wrong.

B.J., can you give us a little more information about the internal site you were using?  In particular, does it use history.push/replaceState or modify the URL's hash?
Comment 16 B.J. Herbison 2011-10-11 08:37:30 PDT
(In reply to Justin Lebar [:jlebar] from comment #15)
> B.J., can you give us a little more information about the internal site you
> were using?

Dojo, Struts, DWR. Lots of JavaScript to asynchronously load and save data.

> In particular, does it use history.push/replaceState or modify
> the URL's hash?

I don't think so. The application uses Dojo, and the Dojo code modifies location.hash in some cases. From a few searches I don't think our code uses dojo.hash, but Dojo has outsmarted me before.

I can't find any uses of history.push or history.replaceState.

I also have not seen this crash since the day I made my report in this discussion. I hope some code change is responsible for that. Is this crash signature still as common as it was?
Comment 17 Scoobidiver (away) 2011-10-11 08:47:00 PDT
(In reply to B.J. Herbison from comment #16)
> Is this crash signature still as common as it was?
Yes. It's #8 top crasher and #2 top crasher w/o hangs in 8.0b1.
Comment 18 Justin Lebar (not reading bugmail) 2011-10-11 09:03:04 PDT
>> In particular, does it use history.push/replaceState or modify
>> the URL's hash?
> I don't think so.

Thanks.

This makes it somewhat less likely that bug 646641 is at fault.  But if it's still our best guess, then it sounds like we should back it out and see if it fixes the crashes.

If I'm reading the crashstats page correctly, it looks like we'd need to back out from beta, since there are only three crashes that aren't on version 8.0.  bz, does backing out from beta seem reasonable to you?
Comment 19 Scoobidiver (away) 2011-10-11 09:28:44 PDT
For faster feedback, it can be backed out first in 10.0a1 where it's #12 top crasher.
Comment 20 Justin Lebar (not reading bugmail) 2011-10-11 09:48:14 PDT
(In reply to Scoobidiver from comment #19)
> For faster feedback, it can be backed out first in 10.0a1 where it's #12 top
> crasher.

Oh, I see.  Why don't those 10.0a1 crashes show up in the table in comment 0?
Comment 21 Scoobidiver (away) 2011-10-11 10:06:27 PDT
(In reply to Justin Lebar [:jlebar] from comment #20)
> Why don't those 10.0a1 crashes show up in the table in comment 0?
This query announces 1,793 crash reports but displays only 344 results in 8 pages!
Comment 22 Mozilla RelEng Bot 2011-10-11 20:30:59 PDT
Try run for 0112605a8509 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0112605a8509
Results (out of 171 total builds):
    exception: 1
    success: 141
    warnings: 17
    failure: 12
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-0112605a8509
Comment 23 Justin Lebar (not reading bugmail) 2011-10-12 09:01:56 PDT
I'm having difficulty bzexporting my backout patches at the moment, but smaug, can you sign off on backing out bug 646641 to see if it eliminates this crash?
Comment 24 Justin Lebar (not reading bugmail) 2011-10-12 09:02:09 PDT
Created attachment 566557 [details] [diff] [review]
Backout part 2, v1
Comment 25 Justin Lebar (not reading bugmail) 2011-10-12 09:04:16 PDT
Created attachment 566559 [details] [diff] [review]
Backout part 1, v1
Comment 26 Olli Pettay [:smaug] 2011-10-12 09:22:49 PDT
Comment on attachment 566559 [details] [diff] [review]
Backout part 1, v1

rs for the backout in m-c
Comment 27 Justin Lebar (not reading bugmail) 2011-10-12 17:17:24 PDT
Backout pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/66f1de1d9cba
Comment 28 Justin Lebar (not reading bugmail) 2011-10-12 20:04:38 PDT
Backed out the backout because of mysterious moth permaorange which didn't appear on try.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9670f22f912d
Comment 29 Marco Bonardo [::mak] 2011-10-13 07:33:22 PDT
merged the backout https://hg.mozilla.org/mozilla-central/rev/66f1de1d9cba
Comment 30 Marco Bonardo [::mak] 2011-10-13 07:36:28 PDT
and the backout of the backout
https://hg.mozilla.org/mozilla-central/rev/9670f22f912d
Comment 31 Mozilla RelEng Bot 2011-10-13 12:40:50 PDT
Try run for 194a7b7c1eff is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=194a7b7c1eff
Results (out of 24 total builds):
    success: 19
    warnings: 3
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-194a7b7c1eff
Comment 32 Scoobidiver (away) 2011-10-14 14:35:26 PDT
Maybe correlations by add-on can help to reproduce:
  nsDocShell::InternalLoad(nsIURI*, nsIURI*, nsISupports*, unsigned int, wchar_t const*, char const*, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, int, nsIDocShell**, nsIRequest**)|EXCEPTION_ACCESS_VIOLATION_READ (247 crashes)
     33% (81/247) vs.   0% (84/22826) azhang@cloudacl.com (adult content filtering)
     27% (67/247) vs.  11% (2457/22826) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
Comment 33 Mozilla RelEng Bot 2011-10-17 13:01:20 PDT
Try run for ce38193e7b4b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ce38193e7b4b
Results (out of 24 total builds):
    success: 24
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-ce38193e7b4b
Comment 34 Justin Lebar (not reading bugmail) 2011-10-17 13:05:56 PDT
Backout pushed to inbound again: https://hg.mozilla.org/integration/mozilla-inbound/rev/571224f5f4e5
Comment 35 Justin Lebar (not reading bugmail) 2011-10-17 18:11:39 PDT
...and backed out the backout because it breaks Android.  I *will* get this right, eventually.
Comment 36 Mozilla RelEng Bot 2011-10-18 17:21:06 PDT
Try run for f297637346df is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=f297637346df
Results (out of 15 total builds):
    success: 12
    warnings: 3
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-f297637346df
Comment 37 Justin Lebar (not reading bugmail) 2011-10-18 20:06:36 PDT
Android looks ok.  This one's gonna work.

https://hg.mozilla.org/integration/mozilla-inbound/rev/8a34e6bed361
Comment 38 Marco Bonardo [::mak] 2011-10-19 03:19:43 PDT
https://hg.mozilla.org/mozilla-central/rev/8a34e6bed361

leaving open, as I understand this this is only backing out bug 646641, shouldn't that bug be reopened then?
Comment 39 Scoobidiver (away) 2011-10-21 05:03:35 PDT
*** Bug 695848 has been marked as a duplicate of this bug. ***
Comment 41 Justin Lebar (not reading bugmail) 2011-10-21 07:45:56 PDT
Hm.  Well...now what?  I guess we start looking for places in InternalLoad which need null-checks?
Comment 42 Justin Lebar (not reading bugmail) 2011-10-21 07:53:10 PDT
Hm, maybe it's


+   // Link our new SHEntry to the old SHEntry's back/forward
+   // cache data, since the two SHEntries correspond to the
+   // same document.
+   mLSHE->AdoptBFCacheEntry(mOSHE);

This is the only place I see in [1] where we introduce a potential null-pointer exception.

Smaug, are you OK with me re-landing this patch with a null-check around this line?

[1] https://hg.mozilla.org/mozilla-central/diff/8a34e6bed361/docshell/base/nsDocShell.cpp
Comment 43 Justin Lebar (not reading bugmail) 2011-10-21 08:28:12 PDT
Re-landed bug 646641, with an added null-check.

https://hg.mozilla.org/integration/mozilla-inbound/rev/051b3d8ed545
Comment 44 Ed Morley [:emorley] 2011-10-22 06:08:21 PDT
https://hg.mozilla.org/mozilla-central/rev/051b3d8ed545
Comment 45 Scoobidiver (away) 2011-10-24 03:43:24 PDT
There are still no crashes from 10.0a1/20111023 when the new patch landed.
Comment 46 Justin Lebar (not reading bugmail) 2011-10-24 09:32:28 PDT
Created attachment 569090 [details] [diff] [review]
Patch for branches
Comment 47 Marcia Knous [:marcia - use ni] 2011-10-25 15:03:28 PDT
Currently #9 top crash in Beta 4: https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/8.0b4/7
Comment 48 christian 2011-10-25 20:04:05 PDT
---------------------------------[ Triage Comment ]---------------------------------

We are approving for both aurora9 and beta8 as the patch is said to be safe and fixes a top 20 crasher (release) and the #9 crash by volume in Firefox 8 beta 4.

Please land ASAP.
Comment 49 Justin Lebar (not reading bugmail) 2011-10-25 20:28:54 PDT
Thanks, Christian.  Landed in Aurora and Beta:

https://hg.mozilla.org/releases/mozilla-beta/rev/318e2cc8d206
https://hg.mozilla.org/releases/mozilla-aurora/rev/47f6ba029ea4
Comment 50 Boris Zbarsky [:bz] (still a bit busy) 2011-10-25 20:37:39 PDT
status-firefox9 should be fixed too, yes?
Comment 51 Justin Lebar (not reading bugmail) 2011-10-25 20:40:10 PDT
(In reply to Boris Zbarsky (:bz) from comment #50)
> status-firefox9 should be fixed too, yes?

Yes, thanks.
Comment 52 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-10-26 09:26:02 PDT
qa? as QA needs a testcase to verify this fix. Can a testcase be provided? Is merely checking crashstats sufficient to verify this fix?
Comment 53 Justin Lebar (not reading bugmail) 2011-10-26 09:28:08 PDT
We don't have a testcase.  I'd love it if you could create one, but I think it would be Hard.
Comment 54 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-08 11:24:33 PST
Without a testcase we will need to rely on Socorro for assumed verification.
Comment 55 Virgil Dicu [:virgil] [QA] 2011-12-12 01:18:09 PST
http://bit.ly/u5HBSx

There still seem to be some crashes for Firefox 9 with the crash signature after the fix has landed. 

Is this expected in any way?
Comment 56 Justin Lebar (not reading bugmail) 2011-12-12 04:04:02 PST
Most (all?) of the recent crashes seem to be segfaults with a non-null pointer, which would be a different issue from the one fixed here (and could point to general memory corruption, for all I know).  But maybe I'm reading crash-stats wrong.
Comment 58 Robert Kaiser 2011-12-12 05:07:16 PST
(In reply to Justin Lebar [:jlebar] from comment #56)
> Most (all?) of the recent crashes seem to be segfaults with a non-null
> pointer, which would be a different issue from the one fixed here (and could
> point to general memory corruption, for all I know).  But maybe I'm reading
> crash-stats wrong.

(In reply to Justin Lebar [:jlebar] from comment #57)
> The crashers are also on different lines than the ones fixed here. 
> (InternalLoad is a big function.)

In that case, could you file new bugs for those that are reasonable enough volume (on trunk a couple occurrences are enough for that, probably)?
Comment 59 Justin Lebar (not reading bugmail) 2011-12-12 05:11:18 PST
Someone who's better at navigating crash-stats should do this.
Comment 60 Scoobidiver (away) 2011-12-12 05:39:27 PST
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> In that case, could you file new bugs for those that are reasonable enough
> volume (on trunk a couple occurrences are enough for that, probably)?
There are only crashes in 8.0* and 9.0 and, in both, it's beyond #300 crasher. I don't think they deserve a bug without a testcase.
Comment 61 Robert Kaiser 2011-12-12 07:46:30 PST
(In reply to Scoobidiver from comment #60)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> > In that case, could you file new bugs for those that are reasonable enough
> > volume (on trunk a couple occurrences are enough for that, probably)?
> There are only crashes in 8.0* and 9.0 and, in both, it's beyond #300
> crasher. I don't think they deserve a bug without a testcase.

I agree to that. If we have a testcase or steps to reproduce, that changes things, but other than that, we should better concentrate on getting the top 50 or so signatures fixed.
Comment 62 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-14 13:10:31 PST
qa- due to lack of steps/testcase -- please reset to qa+ if one can be provided.

Note You need to log in before you can comment on or make changes to this bug.