Closed Bug 897503 Opened 7 years ago Closed 7 years ago

crash in nsHttpConnectionMgr::OnMsgShutdown mainly with Leethax.net extension

Categories

(Core :: Networking: HTTP, defect)

25 Branch
All
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 + verified
firefox26 + verified

People

(Reporter: scoobidiver, Assigned: mcmanus)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(3 files, 2 obsolete files)

It first showed up in 25.0a1/20130724 and is currently #1 top crasher in this build. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5ceea82a79c7&tochange=2983ca6d4d1a
It's likely a regression from bug 884061.

It's correlated to the Leethax.net extension (see http://leethax.net/extension/).

Signature 	nsHttpConnection::~nsHttpConnection() More Reports Search
UUID 	51d73271-0f19-4ddb-9af2-84ae02130724
Date Processed	2013-07-24 14:55:22.526728
Uptime	55
Last Crash	62 seconds before submission
Install Age 	823 since version was first installed.
Install Time 	2013-07-24 14:41:13
Product 	Firefox
Version 	25.0a1
Build ID 	20130724030204
Release Channel 	nightly
OS 	Windows NT
OS Version 	6.1.7601 Service Pack 1
Build Architecture 	x86
Build Architecture Info 	GenuineIntel family 6 model 23 stepping 10 | 2
Crash Reason 	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 	0x0
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x68d8, AdapterSubsysID: 21ff1458, AdapterDriverVersion: 12.104.0.0
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsHttpConnection::~nsHttpConnection() 	netwerk/protocol/http/nsHttpConnection.cpp
1 	xul.dll 	nsHttpConnection::`scalar deleting destructor'(unsigned int) 	
2 	xul.dll 	nsHttpConnection::Release() 	netwerk/protocol/http/nsHttpConnection.cpp
3 	xul.dll 	nsHttpConnectionMgr::ShutdownPassCB(nsACString_internal const &,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry> &,void *) 	netwerk/protocol/http/nsHttpConnectionMgr.cpp
4 	xul.dll 	nsBaseHashtable<nsCStringHashKey,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>,nsHttpConnectionMgr::nsConnectionEntry *>::s_EnumStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *) 	obj-firefox/dist/include/nsBaseHashtable.h
5 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.cpp
6 	xul.dll 	nsBaseHashtable<nsStringHashKey,nsRefPtr<gfxFontFamily>,gfxFontFamily *>::Enumerate(PLDHashOperator (*)(nsAString_internal const &,nsRefPtr<gfxFontFamily> &,void *),void *) 	obj-firefox/dist/include/nsBaseHashtable.h
7 	xul.dll 	nsHttpConnectionMgr::OnMsgShutdown(int,void *) 	netwerk/protocol/http/nsHttpConnectionMgr.cpp
8 	xul.dll 	nsHttpConnectionMgr::nsConnEvent::Run() 	netwerk/protocol/http/nsHttpConnectionMgr.h
9 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
10 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	obj-firefox/xpcom/build/nsThreadUtils.cpp
11 	xul.dll 	nsSocketTransportService::Run() 	netwerk/base/src/nsSocketTransportService2.cpp
12 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
13 	xul.dll 	nsThread::ThreadFunc(void *) 	xpcom/threads/nsThread.cpp
14 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
15 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
16 	msvcr100.dll 	_callthreadstartex 	f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c
17 	msvcr100.dll 	_threadstartex 	f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c
18 	kernel32.dll 	BaseThreadInitThunk 	
19 	ntdll.dll 	__RtlUserThreadStart 	
20 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsHttpConnection%3A%3A~nsHttpConnection%28%29
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsHttpConnectionMgr%3A%3AnsConnectionEntry%3A%3A~nsConnectionEntry%28%29
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsHttpConnectionMgr%3A%3AnsHalfOpenSocket%3A%3AAbandon%28%29
Here are correlations for top signatures:
nsHttpConnection::~nsHttpConnection()|EXCEPTION_ACCESS_VIOLATION_READ (69 crashes)
     99% (68/69) vs.   5% (97/1931) leethax@leethax.net
nsHttpConnectionMgr::nsConnectionEntry::~nsConnectionEntry()|EXCEPTION_ACCESS_VIOLATION_READ (15 crashes)
    100% (15/15) vs.   5% (97/1931) leethax@leethax.net

There are some crashes without the Leethax.net extension.
Summary: crash in nsHttpConnectionMgr::OnMsgShutdown with Leethax.net extension → crash in nsHttpConnectionMgr::OnMsgShutdown mainly with Leethax.net extension
I looked at the source to this leethax extension. It appears to replace the nsHttpHandler with a version written in JavaScript, which is likely to work poorly given that nsHttpHandler is marked threadsafe and xpconnect is not. If people have time, bisecting to the exact revision that causes this would be helpful...
That will
Honza/Patrick/Nick,

This is a #1 topcrash.  I looked at the code briefly, and I'm puzzled as to how we're freeing the nsHttpConnection on the socket thread:  from a glance it looks like we explicitly send a nsConnEvent to the main thread in nsHttpConnectionMgr::OnMsgShutdown to make sure we free it there.  But since we're getting freed in ShutdownPassCB, I suppose we don't even get to that code.  Do we just need to add a KungFuDeathGrip(this) to the start of OnMsgShutdown?  That would be my first guess, but both Patrick and Honza (and maybe Nick?) know this code better than me.

Assigning to Honza for now--if someone else has cycles feel free to steal (let Honza know of course).
Assignee: nobody → honzab.moz
this is probly a regression from 884061. I know its probably a latent bug in necko at the root of it, but nonetheless as a top crasher it shouldn't be worked out on mozilla-central.
Crash Signature: nsHttpConnectionMgr::ConditionallyStopPruneDeadConnectionsTimer() ] → nsHttpConnectionMgr::ConditionallyStopPruneDeadConnectionsTimer() ] [@ mozilla::net::nsHttpChannel::UpdateInhibitPersistentCachingFlag() ]
Depends on: 901317
(In reply to Patrick McManus [:mcmanus] [afk until Aug 09] from comment #6)
> this is probly a regression from 884061. I know its probably a latent bug in
> necko at the root of it, but nonetheless as a top crasher it shouldn't be
> worked out on mozilla-central.

Given that, could we please get some action here? This situation has been uplifted to Aurora now.
Flags: needinfo?(honzab.moz)
Honza isn't available for a while.  Patrick, are you suggesting we back out 884061 from m-c and aurora?  It's a lot of code to back out and might be a mess context-wise, but it's one solution. 

Maybe we can try my one line fix and see if it works?
Flags: needinfo?(honzab.moz)
Never mind about my kungFuDeathGrip idea--I was misreading the code.  From comment 3 it sounds like we need to change

  http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#936

so that we arrange to release the connectionMsg on the main thread instead of the socket thread.
oh, and since Leetax is a JS object presumably we'd need to access it via a nsMainThreadPtrHolder
I can most likely get to this tomorrow or Mon. Hopefully nsMainThreadPtr will work.
Assignee: honzab.moz → sworkman
(In reply to Jason Duell (:jduell) from comment #8)
> Honza isn't available for a while.  Patrick, are you suggesting we back out
> 884061 from m-c and aurora?  It's a lot of code to back out and might be a
> mess context-wise, but it's one solution. 

well, it should have been done as soon as the problem came up right? That's the rule if you destablize m-c (its happened to me).


(In reply to Steve Workman [:sworkman] from comment #11)
> I can most likely get to this tomorrow or Mon. Hopefully nsMainThreadPtr
> will work.

Thanks steve.. lets checkpoint this on the tuesday meeting and I'll help if we don't seem to be on the verge of success.
First attempt: Added a new preprocessor macro to proxy the release of gHttpHandler to the main thread. Applies to nsHttpConnection, HttpBaseChannel, HttpChannelParent, and nsHttpChannelAuthProvider.

I'm going to try nsMainThreadPtrHandle/Holder now; that would reuse existing main thread proxy code, and we could enable main thread assertions for the * operator - i.e. force gHttpHandler to be used on the main thread only. I had some problems with this on Friday, however, so I'll play with it and see what happens.
This is an alternative patch which uses nsMainThreadPtrHandle/Holder for gHttpHandler. I had to make some changes to nsProxyRelease.h to allow ambiguous inheritance of nsISupports; otherwise, the compiler doesn't know how to do type conversion for NS_ProxyRelease. So, I added a couple of "nsUnambiguous~" classes which are new bases class for nsMainThreadPtrHandler and ~Holder. I assume ambiguity of inheritance will come up at some point again.

The big benefit of doing it this way is asserting the use of gHttpHandler/nsHttpHandler on the main thread. I haven't started testing or trying to remove all the cases yet, but I observed an assertion-caused crash in nsHttpConnectionMgr::OnMsgUpdateRequestTokenBucket almost immediately after launching FF.

So, we might want to think about a phased approach here. It'd be easy to tell nsMainThreadPtrHandle/Holder to allow off main thread pointer usage in this patch, thereby dealing with the urgency of the crash. And then we can follow up with changes to Necko code at a slower pace, to ensure that all gHttpHandler/nsHttpHandler accesses are on the main thread. Are there other known crashes caused by Necko trying to access plugin-based HTTP handlers?
(In reply to Steve Workman [:sworkman] from comment #13)
> Created attachment 789247 [details] [diff] [review]
> WIP 1.0 Proxy release of nsHttpHandler to the main thread
> 
> First attempt: Added a new preprocessor macro to proxy the release of
> gHttpHandler to the main thread. Applies to nsHttpConnection,
> HttpBaseChannel, HttpChannelParent, and nsHttpChannelAuthProvider.

I have some try results for the first patch - looks good on Linux and Linux64:
https://tbpl.mozilla.org/?tree=Try&rev=8af4f52801a5
hi steve, joshua, jdm - I'm afraid I don't understand this bug as described.

leethax appears to make a custom js based nsIHttpProtocolHander as a shim above the normal necko one (implemented by nsHttpHandler). That's ok - that new object is basically just held by nsIOService and all that work is done on the main thread. This is a significant difference from what is said in comment 3.

Specifically the crash here is in nsHttpConnection releasing gHttpHandler. gHttpHandler is always an nsHttpHandler (that's its only type) - so it can't be the JS object. It is commonly used safely on the networking thread - usually to call accessors that have stored pref values loaded on the main thread. 

This is all legacy code that has never been updated to use smart pointers.

so steve you won't be able to make it a nsMainThreadPtrHandle - nsHttpHandler is meant to have some methods be deref'd on the socket thread. And I don't understand why we would need to proxy its release to the main thread - we know this is a C++ object for certain. Why is it expected that it will address this problem?

the half dozen stack traces I looked at all had gHttpHandler being null while the connection manager was processing a shutdown event. calling RELEASE on gHttpHandler leads to the crash of a null pointer.

gHttpHandler wasn't null when nsHttpConnection was created because the ADDREF in the ctor isn't conditional either. That means it went null while a reference to it was being held, which obviously isn't supposed to happen.

do you guys see something here that I don't? Based on this I would want to step through the places the references are changed to make sure they are done atomically.
jason/jdm there is one strange bit of reference counting here - do you think it is related and/or a different bug?

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelParent.cpp#60

it is dropping a reference to ghttphandler that I believe it thinks it obtained in the ctor a few lines earlier in the CallGetService() incantation. However it has an assumpton that the GetService("http") object is the same as gHttpHandler - but in the case of leethax that's not going to be true (the GetService object is a piece of JS - while gHttpHandler is always an nsHttpHandler)

That kind of behavior would lead to this bug; but we don't expect this code to be running in this scenario, do we?
Flags: needinfo?(jduell.mcbugs)
(In reply to Patrick McManus [:mcmanus] from comment #16)
.
> And I don't understand why we would need to proxy its release to the main
> thread - we know this is a C++ object for certain. Why is it expected that
> it will address this problem?

to be fair to the proxy_release approach - nsHttpHandler does contain a bunch of nsCOMPtrs that don't look like they are great to release off the main thread. (the ioservice, the cookie service, etc..) Normally those are only used on the main thread, but the refs are dropped when the handler is destroyed on the socket thread.

But I don't think the stack traces are consistent with that.

has anyone reproduced it so that we can verify a fix?
I made this crash https://crash-stats.mozilla.com/report/index/bdf59f03-bf1c-4e6b-bf81-8da8f2130813 happen while playing around with the addon and some of its facebook games and then exiting nightly. I haven't gotten it to happen consistently yet.

What is interesting about that crash is that it is the same thing wrt RELEASE(gHttpHandler) (with a null deref) but this time it is from ~HttpBaseChannel on the main thread.. so that is pretty conclusive that proxying the release of all handlers to the main isn't going to help.
(In reply to Patrick McManus [:mcmanus] from comment #18)
> (In reply to Patrick McManus [:mcmanus] from comment #16)

> to be fair to the proxy_release approach - nsHttpHandler does contain a
> bunch of nsCOMPtrs that don't look like they are great to release off the
> main thread. (the ioservice, the cookie service, etc..) Normally those are
> only used on the main thread, but the refs are dropped when the handler is
> destroyed on the socket thread.
> 
> But I don't think the stack traces are consistent with that.
> 

that problem is 897503 (and indeed the stack is different)
> > 
> 
> that problem is 897503 (and indeed the stack is different)

whoops - that's this bug :)

I meant 899453
Duplicate of this bug: 899453
(In reply to Patrick McManus [:mcmanus] from comment #17)
> jason/jdm there is one strange bit of reference counting here - do you think

> http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> HttpChannelParent.cpp#60
> 
> 
[..]

> That kind of behavior would lead to this bug; but we don't expect this code
> to be running in this scenario, do we?

Experimentally I can show that code to be running and for both the cocoon and leethax extensions doing the wrong thing. I think its running now because of 870100 - which probably caused this to come to light.

I can even verify the fix. (coming)
Blocks: 870100
Assignee: sworkman → mcmanus
Comment on attachment 789628 [details] [diff] [review]
part 1 - HttpChannelParent reference count bug of nsHttpHandler

Review of attachment 789628 [details] [diff] [review]:
-----------------------------------------------------------------

This makes more sense. To be honest, I noticed this, but I made bad assumptions about the missing AddRef.

::: netwerk/protocol/http/HttpChannelParent.cpp
@@ +50,5 @@
>    // Ensure gHttpHandler is initialized: we need the atom table up and running.
> +  nsCOMPtr<nsIHttpProtocolHandler> handler =
> +    do_GetService(NS_NETWORK_PROTOCOL_CONTRACTID_PREFIX "http");
> +
> +  MOZ_ASSERT(gHttpHandler);

Since 'handler' is not used and the assert is for 'gHttpHandler', can you rename 'handler' to 'dummyInitializer' or add a comment or something?
Attachment #789628 - Flags: review?(sworkman) → review+
Comment on attachment 789627 [details] [diff] [review]
part 2 use smart pointers for nsHttpHandler references

Review of attachment 789627 [details] [diff] [review]:
-----------------------------------------------------------------

This is nicer code to read at the very least.

These member vars aren't going to be used; we're still going to use gHttpHandler. Can you add a comment above the mHttpHandler declaration in each class; something like the comment that was above the addref in the constructors, i.e. "grab a reference to the handler to ensure that it doesn't go away."
Attachment #789627 - Flags: review?(sworkman) → review+
Comment on attachment 789624 [details] [diff] [review]
part 3 several nsHttpHandler nsCOMPtrs need to be nsMainThreadPtrHandle

Review of attachment 789624 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

How many of these do we want to uplift to Aurora? Seems like part 1 fixes the crash, but parts 2 and 3 and relatively simple and are prob good to get out there quickly.
Attachment #789624 - Flags: review?(sworkman) → review+
Attachment #789247 - Attachment is obsolete: true
Attachment #789313 - Attachment is obsolete: true
(In reply to Steve Workman [:sworkman] from comment #30)
> Comment on attachment 789624 [details] [diff] [review]
> part 3 several nsHttpHandler nsCOMPtrs need to be nsMainThreadPtrHandle
> 
> Review of attachment 789624 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me.
> 
> How many of these do we want to uplift to Aurora? Seems like part 1 fixes
> the crash, but parts 2 and 3 and relatively simple and are prob good to get
> out there quickly.

part 1 fixes this bug, and part 3 fixes the stack trace in 899453

but part 2 is just a cleanup and can ride the trains the slow way.
Comment on attachment 789628 [details] [diff] [review]
part 1 - HttpChannelParent reference count bug of nsHttpHandler

[Approval Request Comment]
Bug caused by (feature/regressing bug #): latent bug since < FF4, exposed in FF25 by 870100
User impact if declined: #1 top crasher gets shipped :(
Testing completed (on m-c, etc.): reproduced locally and verified fix. on m-i
Risk to taking this patch (and alternatives if risky): pretty low
String or IDL/UUID changes made by this patch: none

(uplift requested for parts 1 and 3.. part 2 intentionally omitted as its purely cleanup)
Attachment #789628 - Flags: approval-mozilla-aurora?
Comment on attachment 789624 [details] [diff] [review]
part 3 several nsHttpHandler nsCOMPtrs need to be nsMainThreadPtrHandle

[Approval Request Comment]
Bug caused by (feature/regressing bug #): latent bug since < FF4, exposed in FF25 by 870100
User impact if declined: #1 top crasher gets shipped :(
Testing completed (on m-c, etc.): reproduced locally and verified fix. on m-i
Risk to taking this patch (and alternatives if risky): pretty low
String or IDL/UUID changes made by this patch: none
Attachment #789624 - Flags: approval-mozilla-aurora?
No longer blocks: 870100
Ioana, can you please see if someone on your team can reproduce this? If so, please test the latest Nightly to verify it has been fixed.
QA Contact: ioana.budnar
Keywords: verifyme
Yesterday I installed leethax.net extension and today the day after it is not working at all in a firefox game. It worked before when I had not silverlight installed, but after I installed silverlight it has stopped working as it should
do.
Flags: needinfo?(jduell.mcbugs)
facebook game not firefox game
Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0

Using Aurora (build ID: 20130818004007) I was able to crash twice with this signature.

STR used to reproduce this crash:
1. Open Firefox and install the leethax.net extension.
2. Log in to facebook.
3. Open https://apps.facebook.com/avengersalliance.
4. Open a new tab.

Also, verified on Nightly (build ID: 20130818030219) with these STR and several other facebook apps using this extension and could not reproduce this.
Keywords: verifyme
QA Contact: ioana.budnar → cornel.ionce
Comment on attachment 789624 [details] [diff] [review]
part 3 several nsHttpHandler nsCOMPtrs need to be nsMainThreadPtrHandle

Approving this topcrash regression fix for Aurora.
Attachment #789624 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 789628 [details] [diff] [review]
part 1 - HttpChannelParent reference count bug of nsHttpHandler

And thanks for the verification :)
Attachment #789628 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Cornel, please verify this in Firefox 25 when you get a chance, thanks.
Keywords: verifyme
User Agent: Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0

Verified several times on Aurora (build ID: 20130908004001) using the STR from comment 39 and several other apps.

Issue is no longer reproducing; marking as Verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.