Closed Bug 1256336 Opened 8 years ago Closed 8 years ago

Crash in mozilla::net::InterceptStreamListener::Cleanup

Categories

(Core :: Networking, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox45 --- disabled
firefox46 + fixed
firefox47 + fixed
firefox48 + fixed
firefox-esr38 --- disabled
firefox-esr45 --- disabled

People

(Reporter: bkelly, Assigned: dragana)

References

Details

(Keywords: crash, csectype-uaf, sec-critical, Whiteboard: [post-critsmash-triage][necko-active])

Crash Data

Attachments

(2 files)

The top of the stack is service worker related, but the Cleanup() method simply nulls two variables.  If its accessing 0xffffffff then I think the underlying object is garbage which suggests neck problem.

Note, there was a similar issue previously in bug 1089749.

Crash Reason 	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 	0xffffffffffffffff

0 	xul.dll 	mozilla::net::InterceptStreamListener::Cleanup() 	netwerk/protocol/http/HttpChannelChild.cpp
1 	xul.dll 	mozilla::net::HttpChannelChild::DoNotifyListenerCleanup() 	netwerk/protocol/http/HttpChannelChild.cpp
2 	xul.dll 	mozilla::net::HttpBaseChannel::DoNotifyListener() 	netwerk/protocol/http/HttpBaseChannel.cpp
3 	xul.dll 	mozilla::net::HttpAsyncAborter<mozilla::net::HttpChannelChild>::HandleAsyncAbort() 	netwerk/protocol/http/HttpBaseChannel.h
4 	xul.dll 	mozilla::net::HttpChannelChild::RecvFailedAsyncOpen(nsresult const&) 	netwerk/protocol/http/HttpChannelChild.cpp
5 	xul.dll 	mozilla::net::PHttpChannelChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PHttpChannelChild.cpp
6 	xul.dll 	mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PContentChild.cpp
7 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp
8 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const&) 	ipc/glue/MessageChannel.cpp
9 	xul.dll 	mozilla::ipc::MessageChannel::OnMaybeDequeueOne() 	ipc/glue/MessageChannel.cpp
10 	xul.dll 	RunnableMethod<mozilla::ipc::MessageChannel, void ( mozilla::ipc::MessageChannel::*)(void), mozilla::Tuple<> >::Run() 	ipc/chromium/src/base/task.h
11 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc
12 	xul.dll 	mozilla::ipc::DoWorkRunnable::Run() 	ipc/glue/MessagePump.cpp
13 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
14 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
15 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
16 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
17 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
18 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp
19 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
20 	xul.dll 	XRE_RunAppShell 	toolkit/xre/nsEmbedFunctions.cpp
21 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
22 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
23 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
24 	xul.dll 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp
25 	plugin-container.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
26 	plugin-container.exe 	__tmainCRTStartup 	f:/dd/vctools/crt/crtw32/startup/crt0.c:255
This is weird, because the InterceptStreamListener in HttpChannelChild is a strong reference.
(In reply to Josh Matthews [:jdm] from comment #1)
> This is weird, because the InterceptStreamListener in HttpChannelChild is a
> strong reference.

Right.  That's why I think the entire HttpChannelChild is bogus.
more e10s eyes
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(dd.mozilla)
Attached file stack
I got this crash on shutdown in a local Linux64 debug trunk build.
It looks like the FailedAsyncOpenEvent::mChild raw pointer is stale.
Group: core-security
Severity: normal → critical
Eep. I wonder why we chose to use raw pointers for those members :(
I am taking this to try to figure out what is happening.
Assignee: nobody → dd.mozilla
Flags: needinfo?(dd.mozilla)
Fixing crash signature link, tracking since this is sec-critical and affects 47 and 48.
Crash Signature: mozilla::net::InterceptStreamListener::Cleanup → [@ mozilla::net::InterceptStreamListener::Cleanup]
(In reply to Josh Matthews [:jdm] from comment #5)
> Eep. I wonder why we chose to use raw pointers for those members :(

It is fine to have raw pointers here. Calls are queues only if a divert to parents is in progress. At the end of a diversion the channelparent sends DeleteSelf, which deletes httpChannelChild, that is queued as well and called the last.

In the stack from comment #4 the channel is not in a diversion. AsyncOpen of a httpchannelchild is called which constructs a parent
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelChild.cpp#1946
The parent init fails which sends  SendFailedAsyncOpen.
At the moment when the RecvFailedAsyncOpen is called the httpchannelChild still exists it is not freed. A FailedAsyncOpenEvent is constructed and not queues just Run() is call which at some point calls DoNotifyListenerCleanup.
This function calls  PHttpChannelChild::Send__delete__(this) which frees httpChannelChild :) and then mInterceptListener->Cleanup() crashes :)
Attachment #8732809 - Flags: review?(jduell.mcbugs)
Thanks!
Whiteboard: [necko-active]
Group: core-security → network-core-security
Comment on attachment 8732809 [details] [diff] [review]
bug_1256336_v1.patch

Review of attachment 8732809 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, Send_delete() is treacherous.  Thanks for the catch.
Attachment #8732809 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 8732809 [details] [diff] [review]
bug_1256336_v1.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
If some check during channel asyncOpen fails (e.g. during shutdown it will fail, if host name cannot be parsed from url, in some cases if do not track blocks a uri) this code path will be triggered.
2 function calls are rearrange and one has a name Send_delete it is a bit obvious, IMO. 

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Which older supported branches are affected by this flaw?
All. The change is in since ff38.

If not all supported branches, which bug introduced the flaw? 1065216

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
The patch should apply to all affected branches or it would be very easy to create without risk.

How likely is this patch to cause regressions; how much testing does it need?
Very unlikely.
Attachment #8732809 - Flags: sec-approval?
[Tracking Requested - why for this release]:
Since this problem was introduced in bug 1065216 it exists in all versions containing service workers.
it does exist in them all, but it also requires e10s right? So that might limit the uplifts (e.g. not to esr).
[Tracking Requested - why for this release]:

Good point.  We probably don't need uplift to 45.  I think we are doing experiments with e10s in 46, though, right?

Also, I marked ESR45 disabled because service workers are disabled, there, but I think this bug can happen even in that case.  Right?
Tracking for 46. It would probably be good to uplift this to beta 46 for the e10s experiments, as Ben points out.  
Service workers isn't enabled for 45esr so we likely don't need this fix for 45esr.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> Service workers isn't enabled for 45esr so we likely don't need this fix for
> 45esr.

Its true service workers are disabled in esr45, but I think this crash can trigger even on sites not using service workers when e10s is enabled.
(In reply to Ben Kelly [:bkelly] from comment #19)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> > Service workers isn't enabled for 45esr so we likely don't need this fix for
> > 45esr.
> 
> Its true service workers are disabled in esr45, but I think this crash can
> trigger even on sites not using service workers when e10s is enabled.

This is not a service worker specific problem. If e10s is enable this can be triggered.
e10s is also disabled at the code level on ESR45.
sec-approval+. We'll want this on Aurora and Beta so please nominate it there as well.

Marking ESR45 as disabled based on comment 21.
Attachment #8732809 - Flags: sec-approval? → sec-approval+
Keywords: checkin-needed
Comment on attachment 8732809 [details] [diff] [review]
bug_1256336_v1.patch

Approval Request Comment
[Feature/regressing bug #]:1065216
[User impact if declined]:crash
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: No risk, just changing order of 2 functions. In current order it will always crash(probably almost always on that code path), because the first function frees a mChild which is used in the next function.
[String/UUID change made/needed]:none
Attachment #8732809 - Flags: approval-mozilla-beta?
Attachment #8732809 - Flags: approval-mozilla-aurora?
Comment on attachment 8732809 [details] [diff] [review]
bug_1256336_v1.patch

May decrease e10s crashes, let's try uplifting this to beta 5 since
the e10s experiment is still running.
Attachment #8732809 - Flags: approval-mozilla-beta?
Attachment #8732809 - Flags: approval-mozilla-beta+
Attachment #8732809 - Flags: approval-mozilla-aurora?
Attachment #8732809 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/364ea75abf26
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Group: network-core-security → core-security-release
I suspect this might not be fully fixed because I got a similar crash in bug 1262907.
(In reply to Mats Palmgren (:mats) from comment #29)
> I suspect this might not be fully fixed because I got a similar crash in bug
> 1262907.

I cannot see bug 1262907. Can it be dup of 1261070? Or is a separate issue?
Flags: needinfo?(mats)
I don't have access to bug 1261070 so I can't tell.
The relevant part of the stack in bug 1262907 is:
...
#5  <signal handler called>
#6  nsCOMPtr<nsILoadGroup>::get (this=0xe5e5e5e5e5e5e705) at nsCOMPtr.h:721
#7  nsCOMPtr<nsILoadGroup>::operator nsILoadGroup* (this=0xe5e5e5e5e5e5e705) at nsCOMPtr.h:729
#8  mozilla::net::HttpAsyncAborter<mozilla::net::HttpChannelChild>::HandleAsyncAbort (this=0x7fce516794a8) at mozilla/net/HttpBaseChannel.h:578
#9  mozilla::net::HttpChannelChild::HandleAsyncAbort (this=0x7fce51679000) at netwerk/protocol/http/HttpChannelChild.cpp:1080
#10 mozilla::net::HttpChannelChild::FailedAsyncOpen (this=0x7fce51679000, status=@0x7fce4e8694f0: NS_ERROR_NOT_AVAILABLE) at netwerk/protocol/http/HttpChannelChild.cpp:1091
#11 mozilla::net::FailedAsyncOpenEvent::Run (this=0x7fce4e8694e0) at netwerk/protocol/http/HttpChannelChild.cpp:1060
...
Flags: needinfo?(mats)
Flags: qe-verify-
Whiteboard: [necko-active] → [post-critsmash-triage][necko-active]
Dragana--re: comment 31.  Does that looks like a new bug, or still this one?
Flags: needinfo?(jduell.mcbugs) → needinfo?(dd.mozilla)
(In reply to Jason Duell [:jduell] (needinfo? me) from comment #32)
> Dragana--re: comment 31.  Does that looks like a new bug, or still this one?

This is resolved in bug 1261070.
Flags: needinfo?(dd.mozilla)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.