Closed Bug 1233962 Opened 9 years ago Closed 9 years ago

crash in mozilla::net::InterceptedChannelBase::DoNotifyController

Categories

(Core :: General, defect)

44 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox43 --- disabled
firefox44 + fixed
firefox45 --- fixed
firefox46 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: philipp, Assigned: bkelly)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 4 obsolete files)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-653d439a-2022-4fcf-a46c-f238b2151219.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::InterceptedChannelBase::DoNotifyController() 	netwerk/protocol/http/InterceptedChannel.cpp
1 	xul.dll 	mozilla::net::InterceptedChannelChrome::NotifyController() 	netwerk/protocol/http/InterceptedChannel.cpp
2 	xul.dll 	mozilla::net::nsHttpChannel::OpenCacheEntry(bool) 	netwerk/protocol/http/nsHttpChannel.cpp
3 	xul.dll 	mozilla::net::nsHttpChannel::Connect() 	netwerk/protocol/http/nsHttpChannel.cpp
4 	xul.dll 	mozilla::net::nsHttpChannel::ContinueBeginConnectWithResult() 	netwerk/protocol/http/nsHttpChannel.cpp
5 	nss3.dll 	PR_Assert 	nsprpub/pr/src/io/prlog.c
6 	xul.dll 	mozilla::net::nsHttpChannel::BeginConnect() 	netwerk/protocol/http/nsHttpChannel.cpp
7 	xul.dll 	mozilla::net::nsHttpChannel::OnProxyAvailable(nsICancelable*, nsIChannel*, nsIProxyInfo*, nsresult) 	netwerk/protocol/http/nsHttpChannel.cpp
8 	xul.dll 	nsAsyncResolveRequest::DoCallback() 	netwerk/base/nsProtocolProxyService.cpp
9 	xul.dll 	nsAsyncResolveRequest::OnQueryComplete(nsresult, nsCString const&, nsCString const&) 	netwerk/base/nsProtocolProxyService.cpp
10 	xul.dll 	ExecuteCallback::Run() 	netwerk/base/nsPACMan.cpp
11 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
12 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
13 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
14 	xul.dll 	nsThreadManager::UnregisterCurrentThread(nsThread*) 	xpcom/threads/nsThreadManager.cpp
15 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp
16 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
17 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
18 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
19 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp

this signature is spiking in firefox 44 devedition (#4 with 2.88% of crashes) and beta. 
maybe it's related to the changes in bug 1206894, so i'm also cc'ing ehsan here on this bug.
We should probably gracefully fail interception if the mController provided to the intercepted channel is nullptr.
I have an idea for a patch.  NI to remind myself to look at this tomorrow.
Flags: needinfo?(bkelly)
Note, all crash stacks go through OnProxyAvailable in nsProtocolProxyService.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Still working on getting a local socks proxy running, but I'm fairly confident this is crashing due to a nullptr mController.  Its really the only possible nullptr in DoNotifyController().

Another possibility would be if DoNotifyController() was called twice for the same channel, but I don't think thats possibly.  Its only called immediately after construction of the InterceptedChannel.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=645bfe98267e
Attachment #8700641 - Flags: review?(josh)
Unfortunately we have no way to write a mochitest or web-platform-test to use a SOCKS proxy AFAICT.  We have xpcshell tests that implement a SOCKS proxy, but I'm not sure I can get service workers to run in xpcshell tests.
I've tried testing locally with manual proxy settings, a served proxy.pac, etc, but so far no luck reproducing.

I'm going to test various proxy addons as well.
Looking at the crash reports, there are no common extensions in use.

I'm not sure I will be able to reproduce this.  I think we should proceed with the patch as our best path forward.
Comment on attachment 8700641 [details] [diff] [review]
Don't create an InterceptedChannel if there is no controller. r=jdm

Review of attachment 8700641 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +3027,3 @@
>      // If this channel should be intercepted, we do not open a cache entry for this channel
>      // until the interception process is complete and the consumer decides what to do with it.
> +    if (mInterceptCache == MAYBE_INTERCEPT && controller) {

I am concerned that the state here is not actually well understood - in particular, the code in nsHttpChannel never expects to make a network request while mInterceptCache == MAYBE_INTERCEPT. I think we should store the controller in AsyncOpen and just use that, which will allow us to continue executing the regular code paths we already use.
Attachment #8700641 - Flags: review?(josh) → review-
I'm not sure if I should release the mController at some point or if its safe to hold until the channel is destroyed.  Lets see what try says.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2f3f570131b
Attachment #8700641 - Attachment is obsolete: true
Attachment #8700788 - Flags: review?(josh)
Attachment #8700788 - Flags: review?(josh) → review+
Mmm, I forgot about the nulling out part. I predict leaks will come of this; we should probably null it out in OnStartRequest.
Yea, try confirms this leaks the world:

  https://treeherder.mozilla.org/logviewer.html#?job_id=14794971&repo=try

I'll try nulling in OnStartRequest later tonight.
Updated to clear mController in HttpBaseChannel::ReleaseListeners().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=029926a3f3c1
Attachment #8700788 - Attachment is obsolete: true
Attachment #8700854 - Flags: review+
Unfortunately the last patch does not work in e10s mode.  There it seems the code actually depends on the controller changing from one HttpChannelParent to another HttpChannelParent during redirection.  By saving the first controller we ended up perma-failing some tests because we called back on the wrong actor.

I think it might just be safer for now to call ResetInterception() for a nullptr controller.  In theory this path should be sound since we already call that if the controller returns an error.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9280c79986d8
Attachment #8700854 - Attachment is obsolete: true
Attachment #8700900 - Flags: review?(josh)
Comment on attachment 8700900 [details] [diff] [review]
Call ResetInterception() if the controller is nullptr. r=jdm

Review of attachment 8700900 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.
Attachment #8700900 - Flags: review?(josh) → review+
The old code did not ResetInterception() if the dispatcher was nullptr.  I'm pretty sure that was a bug, but the necko xpcshell tests depended on it.

This patch makes the xpcshell tests return a dummy dispatcher from ChannelIntercepted().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a193354c753d
Attachment #8701050 - Flags: review?(josh)
Attachment #8701050 - Flags: review?(josh) → review+
Comment on attachment 8701048 [details] [diff] [review]
P1 Call ResetInterception() if the controller is nullptr. r=jdm

Approval Request Comment
[Feature/regressing bug #]: Service workers
[User impact if declined]: Crash under certain automatic proxy configurations when a site is using service workers.
[Describe test coverage new/current, TreeHerder]: Unfortunately we have not been able to produce the triggering configuration locally or in a test.  This fix is based on code inspection of where we can end up with a nullptr.  All existing service worker tests continue to pass with these changes, however.
[Risks and why]: This change only affects service workers.
[String/UUID change made/needed]: None.
Attachment #8701048 - Flags: approval-mozilla-aurora?
Comment on attachment 8701048 [details] [diff] [review]
P1 Call ResetInterception() if the controller is nullptr. r=jdm

Approval Request Comment
[Feature/regressing bug #]: Service workers
[User impact if declined]: Crash under certain automatic proxy configurations when a site is using service workers.
[Describe test coverage new/current, TreeHerder]: Unfortunately we have not been able to produce the triggering configuration locally or in a test.  This fix is based on code inspection of where we can end up with a nullptr.  All existing service worker tests continue to pass with these changes, however.
[Risks and why]: This change only affects service workers.
[String/UUID change made/needed]: None.
Attachment #8701048 - Flags: approval-mozilla-beta?
Attachment #8701050 - Flags: approval-mozilla-beta?
Attachment #8701050 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/76d24989a1b0
https://hg.mozilla.org/mozilla-central/rev/722472112f94
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Ritu, can we get this uplifted to beta immediately?  Unfortunately we have no instances of the crash on aurora or nightly due to user configuration, so bake time does not really help us.  I'd like to get this in for next Monday's beta build, but I will be on PTO starting tomorrow.
Flags: needinfo?(rkothari)
Comment on attachment 8701048 [details] [diff] [review]
P1 Call ResetInterception() if the controller is nullptr. r=jdm

Crash fix, taking it. Aurora45+, Beta44+
Flags: needinfo?(rkothari)
Attachment #8701048 - Flags: approval-mozilla-beta?
Attachment #8701048 - Flags: approval-mozilla-beta+
Attachment #8701048 - Flags: approval-mozilla-aurora?
Attachment #8701048 - Flags: approval-mozilla-aurora+
Attachment #8701050 - Flags: approval-mozilla-beta?
Attachment #8701050 - Flags: approval-mozilla-beta+
Attachment #8701050 - Flags: approval-mozilla-aurora?
Attachment #8701050 - Flags: approval-mozilla-aurora+
After a week there are no reports of this crash in 44.0b4, so the fix does appear to solve the problem.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: