Closed Bug 542040 Opened 10 years ago Closed 7 years ago

Firefox crashes upon network reconnection [@ nsDocument::ContentAppended(nsIDocument*, nsIContent*, int) ]

Categories

(Core :: DOM: Core & HTML, defect, critical)

1.9.2 Branch
x86
All
defect
Not set
critical

Tracking

RESOLVED WORKSFORME

People

(Reporter: mozilla-bugzilla, Unassigned)

Details

(Keywords: crash)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)

When my laptop has been idle for a while, the wireless network gets switched off. After I log back in through the screen lock and the network connection is reestablished, the Firefox bug reporting dialog is shown.

Reproducible: Sometimes

Steps to Reproduce:
1. Run firefox showing several tabs
2. Let the pc idle, so the screen saver comes on and the wireless network is switched off
3. Log in again, wireless network is reestablished but firefox has crashed
Actual Results:  
Firefox crashes and bug reporting dialog appears

Expected Results:  
firefox is running normally

No problems with other applications, or firefox 3.5.7. 

Crash IDs:

http://crash-stats.mozilla.com/report/index/bp-49d03458-6088-4285-a794-f10ac2100125
http://crash-stats.mozilla.com/report/index/bp-817cd38a-3c41-44af-9145-ab17f2100124
http://crash-stats.mozilla.com/report/index/bp-01c21a89-ab55-4f0f-830f-a86d52100123
http://crash-stats.mozilla.com/report/index/bp-4160f4f1-a837-4874-88af-3b7072100123
http://crash-stats.mozilla.com/report/index/bp-502bb807-3b06-4608-b207-4ecdf2100123
http://crash-stats.mozilla.com/report/index/bp-2336bbc5-a2f3-4bb4-855b-f719f2100122
http://crash-stats.mozilla.com/report/index/bp-026fc9c1-8bab-43a8-971c-9203d2100122
http://crash-stats.mozilla.com/report/index/bp-7a76e05e-9194-403c-93ae-971802100122
Keywords: crash
Summary: Firefox crashes upon network reconnection → Firefox crashes upon network reconnection [@ nsDocument::ContentAppended(nsIDocument*, nsIContent*, int) ]
Version: unspecified → 3.6 Branch
Signature	nsDocument::ContentAppended(nsIDocument*, nsIContent*, int)
UUID	49d03458-6088-4285-a794-f10ac2100125
Version	3.6
Build ID	20100115144158
Branch	1.9.2
Crash Reason	EXCEPTION_ACCESS_VIOLATION
Crash Address	0x0

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsDocument::ContentAppended 	content/base/src/nsDocument.cpp:2396
1 	xul.dll 	nsNodeUtils::ContentAppended 	content/base/src/nsNodeUtils.cpp:134
2 	xul.dll 	nsGenericElement::doInsertChildAt 	content/base/src/nsGenericElement.cpp:3238
3 	xul.dll 	nsGenericElement::InsertChildAt 	content/base/src/nsGenericElement.cpp:3169
4 	xul.dll 	nsGenericElement::doReplaceOrInsertBefore 	content/base/src/nsGenericElement.cpp:3953
5 	xul.dll 	nsGenericElement::InsertBefore 	content/base/src/nsGenericElement.cpp:3488
6 	xul.dll 	nsGenericElement::AppendChild 	content/base/src/nsGenericElement.h:500
7 	xul.dll 	nsIDOMNode_AppendChild 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:3671
8 	js3250.dll 	js_Interpret 	js/src/jsops.cpp:2208
9 	js3250.dll 	js_Invoke 	js/src/jsinterp.cpp:1368
10 	js3250.dll 	js_InternalInvoke 	js/src/jsinterp.cpp:1423
11 	js3250.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5112
12 	xul.dll 	nsJSContext::CallEventHandler 	dom/base/nsJSEnvironment.cpp:2134
Assignee: nobody → bzbarsky
Blocks: 485808
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Version: 3.6 Branch → 1.9.2 Branch
What makes you think this has anything to do with bug 485808?  The stack looks like we're ending up with mCur null in the iterator, which would crash with the pre-iterator code too, no?
Assignee: bzbarsky → nobody
In particular, if we entered this code with mCur null, that means that there were no kids but a nonzero child count. This situation should never arise.

Looking at the modules in that crash, what are MSNChatHook.dll, xpsp2res.dll, MFC71ENU.DLL, HookDLL.dll, mingwm10.dll?  Those are the ones without a debug id.

Reporter, just to make sure... does this happen in safe mode?
xpsp2res is a library which has no code; it's from microsoft.
mfc71enu is the msvc 7.1 mfc english library, it should also have no code, it should be from microsoft. the reason for the lack of a debugid is that there's no code (afaiu)

hookdll seems dangerous
mingwm10 i have no idea

sorry about the bad blame, i misread the diff...
No longer blocks: 485808
I've been running for about a day now in safe mode, and so far, it hasn't crashed. I'll start bisecting addons to see if I can isolate it.
So far - it still crashes with adblockplus & noscript as the sole extensions:

http://crash-stats.mozilla.com/report/index/0014be71-3c1e-45ae-8b37-e8a812100129
I've not got it to crash using adblockplus on its own. Using noscript on its own was much more reliable; however, it did crash once at a different location:

nsContentUtils::ComparePosition(nsINode*, nsINode*)

http://crash-stats.mozilla.com/report/index/bp-5568f480-5833-4840-bdd0-ec2302100131

which I had seen before (see below). Note these crashes occurred after the wireless network had reconnected after a brief disconnect with the machine active throughout.

http://crash-stats.mozilla.com/report/index/01c21a89-ab55-4f0f-830f-a86d52100123

Maybe some bad interaction between this pair of addons after a network glitch?
Correlations with addons for these signatures under Firefox 3.6:

nsDocument::ContentAppended(nsIDocument*, nsIContent*, int)|EXCEPTION_ACCESS_VIOLATION (19 crashes)
79% (15/19) vs.  12% (4726/40185) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
58% (11/19) vs.   2% (604/40185) {ef4e370e-d9f0-4e00-b93e-a4f274cfdd5a} (FoxTab, https://addons.mozilla.org/addon/8879)
47% (9/19) vs.   3% (1129/40185) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
42% (8/19) vs.   3% (1167/40185) {1018e4d6-728f-4b20-ad56-37578a4de76b} (Flagfox, https://addons.mozilla.org/addon/5791)
42% (8/19) vs.   7% (2735/40185) {b9db16a4-6edc-47ec-a1f4-b86292ed211d} (Video DownloadHelper, https://addons.mozilla.org/addon/3006)
37% (7/19) vs.   1% (574/40185) {9AA46F4F-4DC7-4c06-97AF-5035170634FE} (ImTranslator, https://addons.mozilla.org/addon/2257)
37% (7/19) vs.   3% (1060/40185) {19503e42-ca3c-4c27-b1e2-9cdb2170ee34} (FlashGot, https://addons.mozilla.org/addon/220)
37% (7/19) vs.   4% (1596/40185) {D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389} (Download Statusbar, https://addons.mozilla.org/addon/26)
16% (3/19) vs.   2% (931/40185) {73a6fe31-595d-460b-a920-fcc0f8843232} (NoScript, https://addons.mozilla.org/addon/722)

nsContentUtils::ComparePosition(nsINode*, nsINode*)|EXCEPTION_ACCESS_VIOLATION (19 crashes)
79% (15/19) vs.   2% (931/40185) {73a6fe31-595d-460b-a920-fcc0f8843232} (NoScript, https://addons.mozilla.org/addon/722)
58% (11/19) vs.  12% (4726/40185) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
53% (10/19) vs.  37% (14824/40185) {20a82645-c095-46ed-80e3-08825760534b} (Microsoft .NET Framework Assistant, http://www.windowsclient.net/)
37% (7/19) vs.  10% (4059/40185) {CAFEEFAC-0016-0000-0015-ABCDEFFEDCBA} (Java Console, http://java.sun.com/javase/downloads/)
37% (7/19) vs.  22% (8672/40185) {CAFEEFAC-0016-0000-0017-ABCDEFFEDCBA}
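
As a rough check on the percentages above, the following JavaScript sketch recomputes two rows of the nsDocument::ContentAppended table and a simple over-representation ratio. The `lift` name and the row objects are illustrative only, not something crash-stats reports, and with only 19 crashes in the signature the ratios are indicative at best.

```javascript
// Figures copied from the nsDocument::ContentAppended correlation table.
const rows = [
  { addon: "Adblock Plus", inSignature: 15, signatureTotal: 19, inAll: 4726, allTotal: 40185 },
  { addon: "NoScript", inSignature: 3, signatureTotal: 19, inAll: 931, allTotal: 40185 },
];

// Share of crashes that had the add-on installed, as a rounded percentage.
function percent(part, total) {
  return Math.round((part / total) * 100);
}

// How over-represented the add-on is in this signature compared with all
// Firefox 3.6 crashes. "lift" is our own label, not a crash-stats term.
function lift(row) {
  return (row.inSignature / row.signatureTotal) / (row.inAll / row.allTotal);
}

for (const row of rows) {
  console.log(
    row.addon,
    percent(row.inSignature, row.signatureTotal) + "%",
    "vs.",
    percent(row.inAll, row.allTotal) + "%",
    "lift",
    lift(row).toFixed(1)
  );
}
```

A high ratio only says the add-on is common among these reports; it does not show that the add-on causes the crash.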
Hmm.  Noscript and adblockplus don't have binary components, right?
Nope, no binary components.
Yeah, I really don't see offhand how adblockplus would cause this sort of thing to happen, then...
Giorgio, any idea what might be going on here?
Status: UNCONFIRMED → NEW
Ever confirmed: true
NoScript has no binary component either.

If I'm not misreading the stack, this is happening during something like

window.setTimeout(function() {
  //...
  someContainer.appendChild(someNode);
}, n);

?

AFAIK, this appendChild() is guaranteed to be called from the UI thread. Under
which conditions can it be an unsafe pattern? And how could NoScript (or any
other JS-only extension) trigger such conditions?
> If I'm not misreading the stack, this is happening during something like

That's correct.

> AFAIK, this appendChild() is guaranteed to be called from the UI thread.

Yes, agreed.

> Under which conditions can it be an unsafe pattern?

Looking at the last set of stacks, every single one has nsNodeUtils::ContentAppended calling nsContentList::AddRef (which is expected) which then calls PresShell::ContentAppended (which is NOT expected).  After that point the stack makes no sense.

So my initial guess at a culprit would be a deleted nsContentList somehow.  This could happen if there's code touching the DOM from non-main threads somewhere earlier in the program run...  Other than that, I can't think of an obvious way it would happen.
That said, I just tried noscript in a debug build and other than the HTTPChannel asserts due to the semi-bogus onStartRequest calls on the channel that it makes I see no other asserts.  In particular, no threadsafety asserts.

mozilla-bugzilla@tart.net, is there any particular site you tend to have loaded when this happens?  Or any particular site you have whitelisted in noscript?
One other question.  Does manually going into offline mode via the file menu also trigger this crash?
> That said, I just tried noscript in a debug build and other than the
> HTTPChannel asserts due to the semi-bogus onStartRequest calls on the channel
> that it makes I see no other asserts.  In particular, no threadsafety asserts.

Correct. NoScript does not manipulate the DOM from threads different than the UI one.

On a side note, those strange onStartRequest calls come from an ugly but necessary hack to emulate internal redirections, needed by ABE in order to check requests after DNS resolution but before they hit the network and by STS to enforce HTTPS where needed according to the specification :(
so, the crash report indicates you're on the main thread.

634 nsContentList::ContentAppended(nsIDocument *aDocument, nsIContent* 
659   PRInt32 count = aContainer->GetChildCount();
672       if (nsContentUtils::PositionIsBefore(ourLastContent,
aContainer->GetChildAt(aNewIndexInContainer))) {

273   static PRBool PositionIsBefore(nsINode* aNode1,
274                                  nsINode* aNode2)
276     return (ComparePosition(aNode1, aNode2) &

1543 nsContentUtils::ComparePosition(nsINode* aNode1,
1544                                 nsINode* aNode2)
1546   NS_PRECONDITION(aNode1 && aNode2, "don't pass null");

1566   if (aNode2->IsNodeOfType(nsINode::eATTRIBUTE)) {

is the crash. i don't quite understand why GetChildAt is returning null.
> is the crash.

That doesn't look like any of the crashes in comment 12 to me.
bz, that's my reading of bp-2680da49-652b-4456-b519-0e64e2100222 - did i go wrong somewhere?
Oh, so you're assuming that the actual stack Socorro is listing is bogus, basically?
nah, "mostly accurate, with one frame suffering from the optimizer"
Which one frame?  The PositionIsBefore and ContentAppended frames both?  And looking like 3 frames?

In any case, if that's the presumed call path then comment 3 pretty much applies.
So far hasn't crashed when toggling the offline flag. I'll see if I can isolate the site(s) that trigger this - usually have 30-odd tabs open so might take a few days.
Attached patch: Patch
I hope to get some feedback on this proposed patch.  I have also come across the same crash:  http://crash-stats.mozilla.com/report/index/bp-a07d74f6-cc5d-48c8-98b0-a079b2100406 but you will notice that none of the extensions I have loaded are the same as previously reported.  

I believe that ComparePosition should be modified to prevent these access violations from occurring.  I am not familiar with any of this code, and I am not sure if the value I propose to return when any of the parameters are NULL is appropriate.

This crash is very hard to reproduce, so any help on creating a reliable test scenario for this crash and patch is also greatly appreciated.
murph: so, the problem w/ that patch is that it's too deep, the NS_PRECONDITION in that function means that the function expects a caller not to pass the arguments it passed, thus the fix should be in some caller(s).
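
A minimal sketch of the distinction drawn here, written in JavaScript with hypothetical stand-ins (`comparePosition`, `isAppendedAfterLast`, and the `order` field) rather than the real Gecko functions: the low-level comparison keeps its "don't pass null" contract, and the caller decides what a missing child means. Returning `false` for a missing child is an assumption made purely for illustration; it avoids the null dereference but says nothing about how the caller ended up with a missing child in the first place.

```javascript
// Hypothetical stand-in for a comparison with a "don't pass null" contract.
function comparePosition(node1, node2) {
  if (node1 === null || node2 === null) {
    throw new TypeError("don't pass null");
  }
  // Toy ordering: nodes carry a numeric "order" field (an assumption).
  if (node1.order < node2.order) return -1;
  if (node1.order > node2.order) return 1;
  return 0;
}

// Caller-side handling: look up the appended child and deal with a missing
// one here, instead of asking comparePosition to invent a return value.
function isAppendedAfterLast(lastNode, children, newIndex) {
  const inRange = newIndex >= 0 && newIndex < children.length;
  const appended = inRange ? children[newIndex] : null;
  if (appended === null) {
    return false; // inconsistent state: report "not after" rather than crash
  }
  return comparePosition(lastNode, appended) < 0;
}

const last = { order: 1 };
console.log(isAppendedAfterLast(last, [{ order: 0 }, { order: 2 }], 1));
console.log(isAppendedAfterLast(last, [{ order: 0 }], 1));
```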
timeless...thanks for the (VERY) quick response.  I have gone and built a debug version of ff and have caught this access violation in the Visual Studio debugger.  Is there anything I can do at this point to provide more info that might help determine the underlying cause of the crashes?
Yes!  What's the first point on the stack there where the null value appears?
The first place the null value appears is in:
nsContentList.cpp:612

In my stack trace, it looks like the aContainer variable is bogus (all 0's)...yet  two stack frames up in nsGenericElement.cpp:3238 aContainer (in this function aParent) looks valid.

Is it possible that the macro IMPL_MUTATION_NOTIFICATION in nsNodeUtils.cpp:133/134 is causing the issue?

Is there a way from Visual Studio that I can create the equivalent of a core file for someone to look at?  I tried saving things as a "minidump" but it was only about 100 KB (you must forgive my ignorance, I am not very familiar with development on windows).
windbg can create a more complete dump, if you're using a mozilla nightly then such dumps are useful.

in general it's easier to visit irc.mozilla.org and talk w/ us live instead.
I'll try hitting you all up over IRC this evening about this (IRC is blocked
from my office).

To give a bit more context here is a crude stack trace of the crash i have in
the debugger. 

1 xul.dll!nsContentUtils::ComparePosition(nsINode * aNode1=0x125ba408, nsINode
* aNode2=0x00000000)  Line 1580    C++

 2 xul.dll!nsContentUtils::PositionIsBefore(nsINode * aNode1=0x125ba408,
nsINode * aNode2=0x00000000)  Line 284 + 0xd bytes    C++

 3 xul.dll!nsContentList::ContentAppended(nsIDocument * aDocument=0x09028160,
nsIContent * aContainer=0x0e6157d0, int aNewIndexInContainer=72)  Line 612 +
0x11 bytes    C++

4 xul.dll!nsNodeUtils::ContentAppended(nsIContent * aContainer=0x0e6157e0, int
aNewIndexInContainer=72)  Line 134 + 0x75 bytes    C++

5 xul.dll!nsGenericElement::doInsertChildAt(nsIContent * aKid=0x00000000,
unsigned int aIndex=72, int aNotify=1, nsIContent * aParent=0x125ba408,
nsIDocument * aDocument=0x09028160, 
nsAttrAndChildArray & aChildArray={...})  Line 3238 + 0x9 bytes    C++

6 xul.dll!nsGenericElement::InsertChildAt(nsIContent * aKid=0x0dd445b8,
unsigned int aIndex=72, int aNotify=1)  Line 3169 + 0x1c bytes    C++

7 xul.dll!nsGenericElement::doReplaceOrInsertBefore(int aReplace=0, nsIDOMNode
* aNewChild=0x0dd445b8, nsIDOMNode * aRefChild=0x00000000, nsIContent *
aParent=0x125ba408, nsIDocument * aDocument=0x09028160, nsIDOMNode * *
aReturn=0x0017eda4)  Line 3953 + 0xf bytes    C++

8 xul.dll!nsGenericElement::InsertBefore(nsIDOMNode * aNewChild=0x0dd445d4,
nsIDOMNode * aRefChild=0x00000000, nsIDOMNode * * aReturn=0x0017eda4)  Line
3488 + 0x1c bytes    C++

9 xul.dll!nsGenericElement::AppendChild(nsIDOMNode * aNewChild=0x0dd445d4,
nsIDOMNode * * aReturn=0x0017eda4)  Line 501    C++

10 xul.dll!nsIDOMNode_AppendChild(JSContext * cx=0x05e7fdd8, unsigned int
argc=1, int * vp=0x1360dc0c)  Line 3673    C++

11 js3250.dll!js_Interpret(JSContext * cx=0x05e7fdd8)  Line 2217    C++

12 js3250.dll!js_Invoke(JSContext * cx=0x05e7fdd8, unsigned int argc=2, int *
vp=0x1360daa4, unsigned int flags=0)  Line 1368 + 0x6 bytes    C++

13 js3250.dll!js_fun_call(JSContext * cx=0x09c50a20, unsigned int argc=3, int *
vp=0x1360da60)  Line 1957    C++

14 js3250.dll!js_Interpret(JSContext * cx=)  Line 2217    C++

15 xul.dll!WrappedNative2WrapperMap::Find(JSObject * wrapper=0x03f12cc0)  Line
668    C++

16 xul.dll!XPC_XOW_WrapObject(JSContext * cx=0x05e7fdd8, JSObject *
parent=0x00000001, int * vp=0x584c4bd7, XPCWrappedNative * wn=0x05e7fdd8)  Line
504    C++

17 xul.dll!nsXPConnect::GetWrapperForObject(JSContext * aJSContext=0x00000001,
JSObject * aObject=0x0017f20c, JSObject * aScope=0x589d4cb8, nsIPrincipal *
aPrincipal=0x02421680, unsigned int aFilenameFlags=99089880, int *
_retval=0x0017f248)  Line 2478 + 0x15 bytes    C++

18 xul.dll!XPC_WN_JSOp_ThisObject(JSContext * cx=0x00000000, JSObject *
obj=0x00000000)  Line 1471    C++
BZ, here are the answers to your questions:

aParent is non-null in nsGenericElement::doInsertChildAt.
In that same frame, aKid is null.

In the frame above (nsGenericElement::InsertChildAt), aKid is not null.  In this frame, aKid->mParentPtrBits = 307995659

To answer how aKid->mParentPtrBits relates to aParent, I might need to talk to you on IRC, not sure what part of aParent I should compare that value to.
Also, I used the task manager (this is windows 7, btw) to dump the process.  The .dmp file is 325 MB, which I think is too large to attach to this bug report.  Would this file be helpful to you?  (I am able to view the entire call stack for the crashing thread.)  If so, is there an appropriate place for me to put it?
> not sure what part of aParent I should compare that value to

The value of the aParent pointer.  In particular, |aKid->mParentPtrBits & ~3| should be equal to aParent.
Well, here is what I see.  I have to admit, I am really confused by this.  Mainly, if aKid is 0x00000000 in doInsertChildAt, how does it even get this far into the code.  I am wondering if some buffer overrun or other memory corruption has occurred.  The rest of the stack trace looks fine....????....

nsGenericElement::doInsertChildAt
aParent: 0x125ba408 aParent->mParentPtrBits = 300665555
aKid: 0x00000000


nsGenericElement::InsertChildAt
aKid: 0x0dd445b8  aKid->mParentPtrBits = 307995659
this: 0x00000000  (passed in as aParent)
(307995659 & ~3) == 0x125ba408, so that part looks good.
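
The check above can be reproduced with a few lines of JavaScript. This is only a sketch of the arithmetic: the helper names are made up, and reading the low two bits as flag bits is an assumption based on the `& ~3` mask, not something confirmed against the Gecko source.

```javascript
// The mask from the thread: clear the low two bits to recover the pointer.
const LOW_BITS_MASK = 3;

// Pointer part of mParentPtrBits; ">>> 0" keeps it an unsigned 32-bit value.
function parentPointer(parentPtrBits) {
  return (parentPtrBits & ~LOW_BITS_MASK) >>> 0;
}

// The two low bits that the mask strips off (assumed here to be flags).
function lowBits(parentPtrBits) {
  return parentPtrBits & LOW_BITS_MASK;
}

// Format like the debugger output, e.g. 0x125ba408.
function toHex(value) {
  return "0x" + value.toString(16).padStart(8, "0");
}

const kidParentPtrBits = 307995659; // aKid->mParentPtrBits from the debugger
console.log(toHex(parentPointer(kidParentPtrBits)), lowBits(kidParentPtrBits));
```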

I wonder whether those null values have to do with things being optimized away (e.g. stored in registers and not on the stack) in a way the debugger can't figure out... or something.

David, do you know whether this is something that would be amenable to minidump debugging?
nsGenericElement::doInsertChildAt
EAX = 00000000 
EBX = 125BA408 
ECX = 00000000 
EDX = 00000000 
ESI = 00000000 
EDI = 00000000 
EIP = 586CA1D4 
ESP = 0017EC08 
EBP = 0017EC80 
EFL = 00000000 

nsGenericElement::InsertChildAt
EAX = 00000000 
EBX = 125BA408 
ECX = 00000000 
EDX = 00000000 
ESI = 00000000 
EDI = 00000000 
EIP = 586CB929 
ESP = 0017EC88 
EBP = 0017ED48 
EFL = 00000000
Okay, thanks BZ for helping me realize I needed different config parameters.  I
have reproduced the crash again and this stack trace appears to make more
sense.

The first frame I see a null is frame 2, nsContentUtils::PositionIsBefore. 
aNode2 is null (well, duh, we knew that).  So it looks like
aContainer->GetChildAt(aNewIndexInContainer) from
nsContentList::ContentAppended is giving a null value (wouldn't it be awesome if
we could just check that return before passing the value in to
PositionIsBefore?).

I assume you want info about aContainer:
nsContentList::ContentAppended
aContainer: 0x0add74e8
aContainer->mParentPtrBits: 182285707

nsGenericElement::doInsertChildAt:
aKid: 0x0aeb0870
aKid->mParentPtrBits: 182285547
OK, that's somewhat more believable, though still totally broken.

What's aContainer->GetChildCount() in this case?  What's the value of aNewIndexInContainer?
aNewIndexInContainer: 37
aContainer->GetChildCount(): 37
Murph and I just spent some time talking through what he's seeing in a debugger.

The basic upshot is that we called nsGenericElement::doInsertChildAt with aParent being the <head> and aKid being a <script src> (doesn't seem to be an inline script).  We passed aIndex == 37 to this call, which happened to be the number of kids the <head> has (not surprising; this is DOM appendChild).

But by the time we get into nsContentList::ContentAppended, the child count of the <head> is 37, not the 38 I would expect.  The 37th kid is the aKid passed to doInsertChildAt, so that did get inserted correctly, but one of its previous siblings was then removed.  The removal happened between our aChildArray.InsertChildAt call and the ContentAppended call on the nsContentList.

We do guard script inserts on IsSafeToRunScript.  And in any case, this <script> is not inline as far as I can tell, so wouldn't run sync anyway, right?
Talked to Murph some more and he has an extension installed that hooks into network loads and does some checks to see whether the load should be allowed.

In particular, this extension listens to web progress notifications (STATE_START ones in particular) and performs a sync XHR to query a web service for information on whether to cancel the load.

So in his case, what's happening is presumably this:

1)  The page calls appendChild and passes in a <script src>
2)  Under the BindToTree call of that <script src> we hand it to the
    scriptloader, which immediately starts the load (calls AsyncOpen on the
    channel).
3)  Calling AsyncOpen synchronously dispatches STATE_START notifications (we
    have existing bugs on the combination of 2 and 3 meaning that extension
    script can run at "unsafe" times as here).
4)  The extension script does a sync XHR, which spins the event loop.  In
    particular, during the XHR timeouts on the page can fire, and various other
    stuff can happen as well.  
5)  One of those things (my bet is on a setTimeout callback) removes a child of
    the <head>.
6)  We unwind out of BindToTree back to nsGenericElement::doInsertChildAt with
    our child count now lower than aIndex+1.

The rest follows.
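
The six steps above can be mimicked with a toy model in plain JavaScript. This is not Gecko code: `ToyParent`, `onBind`, `onAppended` and `simulate` are hypothetical names, and the "timeout" is reduced to a callback that removes the first child while the insertion is still in progress.

```javascript
// Toy model of the sequence above; every name here is hypothetical.
class ToyParent {
  constructor(kids) {
    this.kids = kids.slice();
  }

  getChildCount() {
    return this.kids.length;
  }

  // Returns null for an out-of-range index, like the null child seen in the debugger.
  getChildAt(index) {
    return index >= 0 && index < this.kids.length ? this.kids[index] : null;
  }

  appendChild(kid, onBind, onAppended) {
    const index = this.kids.length; // index chosen before anything else runs
    this.kids.push(kid);            // the child array insert
    onBind(this);                   // steps 2-5: arbitrary code may run here
    return onAppended(this, index); // step 6: observers get the old index
  }
}

function simulate(removeSiblingDuringBind) {
  const head = new ToyParent(["kid0", "kid1", "kid2"]);
  const onBind = (parent) => {
    if (removeSiblingDuringBind) {
      parent.kids.shift(); // stands in for a timeout removing an earlier child
    }
  };
  const onAppended = (parent, index) => ({
    index: index,
    count: parent.getChildCount(),
    child: parent.getChildAt(index),
  });
  return head.appendChild("script", onBind, onAppended);
}

console.log(simulate(false));
console.log(simulate(true));
```

With the removal enabled, the index handed to the observer equals the child count, which is the same shape as the 37/37 values from the debugger, and the lookup at that index finds nothing.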

Possible things we should think about doing on our end that might mitigate this:

* Don't run timeouts while IsSafeToRunScript() is false.
* Fix the longstanding bug about the combination of #2 and #3 above.

Things Murph may be able to do in his extension to mitigate:

* Do the webservice check on a different thread, blocking the UI thread while
  he does it.  Can't use XHR then, obviously.
* Use XHR from the content page, not the chrome window, assuming the content
  page can access the web service data.  This will freeze timeouts on the
  content page, which might be good enough.
* Explicitly freeze the content page using
  nsIDOMWindowUtils::SuppressEventHandling.  That doesn't stop timeouts, though.

jst, sicking, smaug, any other suggestions?

Does noscript do something similar to the above?  The bug was initially reported linked to noscript....
(In reply to comment #44)
> * Don't run timeouts while IsSafeToRunScript() is false.

This won't do much good given that there are so many other ways scripts can run when you spin the event loop. For example network related events such as onload might fire.

> * Fix the longstanding bug about the combination of #2 and #3 above.

Yes!

> Things Murph may be able to do in his extension to mitigate:
> 
> * Do the webservice check on a different thread, blocking the UI thread while
>   he does it.  Can't use XHR then, obviously.

Would work.

> * Use XHR from the content page, not the chrome window, assuming the content
>   page can access the web service data.  This will freeze timeouts on the
>   content page, which might be good enough.

Would not be good enough. See above.

> * Explicitly freeze the content page using
>   nsIDOMWindowUtils::SuppressEventHandling.  That doesn't stop timeouts,
> though.

I'd imagine other things could fire too. But timeouts seems like enough of a reason.

> jst, sicking, smaug, any other suggestions?

I'd say step #3 in your comment needs to be fixed.
(In reply to comment #44)
> In particular, this extension listens to web progress notifications
> (STATE_START ones in particular) and performs a sync XHR to query a web service
> for information on whether to cancel the load.

Sounds like bug 530747...
(In reply to comment #44)

> Does noscript do something similar to the above?  The bug was initially
> reported linked to noscript....

NoScript might perform a preflight request (spinning the event queue), but *in a specific non-default configuration*, i.e. if "NoScript Options|Advanced|ABE|Allow sites to push their own rulesets" is checked and you're loading a resource from a HTTPS site for the first time in a session.
I already had plans to make the queue spinning go away (using the same internal redirection hack I already mentioned in another comment), but it was low priority because of the non-default/experimental status of this feature. 

(In reply to comment #45)

> I'd say step #3 in your comment needs to be fixed.

Is there already a bug # about this? May I be CCed, please?
> I'd say step #3 in your comment needs to be fixed.

Pretty sure we have a bug on this, but can't find it offhand.  Both this bug and bug 512142 should depend on it.  Maybe we should just file one...
This is a simple javascript extension that registers a web progress listener and performs an XHR when the STATE_START state is reached.  This is to help replicate the crash we are seeing.  It is to be used in conjunction with crashTest.html, which will also be attached.
With attachment 443458, loading this page should reproduce the crash.  Hopefully it will reproduce the crash for everyone else as well.
Crash Signature: [@ nsDocument::ContentAppended(nsIDocument*, nsIContent*, int) ]
Note this is still occurring with 10.0.1 on a new laptop running Windows 7 - the original report was on XP. See for example:

https://crash-stats.mozilla.com/report/index/bp-a7e2d6ed-2d2f-4a70-9e7b-9a61e2120212
mozilla-bugzilla@tart
do you still see this crash?

In the past month there is nothing on crash-stats for version 4 and above. And going back to April 2012, there is nothing for at least a month, per https://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A10.0&version=Firefox%3A10.0.1&version=Firefox%3A11.0&range_value=4&range_unit=weeks&date=04%2F25%2F2012+16%3A29%3A28&query_search=signature&query_type=exact&query=nsDocument%3A%3AContentAppended%28nsIDocument*%2C+nsIContent*%2C+int%29&reason=&build_id=&process_type=any&hang_type=any&do_query=1
Flags: needinfo?(mozilla-bugzilla)
OS: Windows XP → All
Whiteboard: [closeme 2013-04-10 WFM]
Resolved per whiteboard
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(mozilla-bugzilla)
Resolution: --- → WORKSFORME
Whiteboard: [closeme 2013-04-10 WFM]
Component: DOM → DOM: Core & HTML