Closed Bug 924967 Opened 11 years ago Closed 11 years ago

Use after free of *this at nsHttpChannel::OnStopRequest, nsLoadGroup::Cancel not thread-safe

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 --- unaffected
firefox27 --- fixed
firefox-esr17 --- unaffected
firefox-esr24 --- unaffected
b2g18 --- unaffected

People

(Reporter: mayhemer, Assigned: sworkman)

Details

(Keywords: csectype-uaf, sec-critical)

Attachments

(3 files, 2 obsolete files)

Attached file mozilla.log.zip
In nsHttpChannel::OnStopRequest last references to the channel may be coming only from:
- the transaction pump, that may be destroyed in OnStopRequest (mTransaction{*} = null)
- the consumer, that may release the channel in mListener->OnStopRequest()

After mListener->OnStopRequest then the channel can already be gone.  In opt builds we get up to CloseCacheEntry() { ... mCacheEntry->AsyncDoom() } where it crashes.


Snippet of the log:

2013-10-09 16:06:43.604000 UTC - 0[12c310]: nsHttpChannel::OnStopRequest [this=de482e0 request=de18248 status=804b0002]
2013-10-09 16:06:43.604000 UTC - 0[12c310]: nsHttpTransaction::DeleteSelfOnConsumerThread [this=de5eef0]

>> transaction goes away (it's an already closed transaction, released from connman pools)

2013-10-09 16:06:43.604000 UTC - 0[12c310]: Destroying nsHttpTransaction @de5eef0
2013-10-09 16:06:43.604000 UTC - 0[12c310]: _OldCacheEntryWrapper::HasWriteAccess [this=de802b8, write-access=1]
2013-10-09 16:06:43.604000 UTC - 0[12c310]: nsHttpChannel::FinalizeCacheEntry [this=de482e0]
2013-10-09 16:06:43.604000 UTC - 0[12c310]:   calling OnStopRequest

>> channel is destroyed here!

2013-10-09 16:06:43.604000 UTC - 0[12c310]: Destroying nsHttpChannel [this=de482e0] 
2013-10-09 16:06:43.604000 UTC - 7972[321e0e0]: CACHE: disk BindEntry [d7ce6e8 5abef39c]

>> cache entry after the last ref released too

2013-10-09 16:06:43.604000 UTC - 0[12c310]: Destroying _OldCacheEntryWrapper de802b8 for descriptor de80a30
2013-10-09 16:06:43.604000 UTC - 7972[321e0e0]: CACHE: AddRecord [5abef39c]
2013-10-09 16:06:43.604000 UTC - 0[12c310]: Destroying HttpBaseChannel @de482e0
2013-10-09 16:06:43.604000 UTC - 7972[321e0e0]: CACHE: InvalidateCache
2013-10-09 16:06:43.604000 UTC - 0[12c310]: Destroying nsHttpConnectionInfo @dcff530
2013-10-09 16:06:43.604000 UTC - 7972[321e0e0]: CACHE: disk OpenOutputStreamForEntry [d7ce6e8 2 0]

>> here we operate on freed |*this|

2013-10-09 16:06:43.604000 UTC - 0[12c310]: nsHttpChannel::CloseCacheEntry [this=de482e0] mStatus=804b0002 mCacheEntryIsWriteOnly=1 
2013-10-09 16:06:43.604000 UTC - 7972[321e0e0]: CACHE: disk OnDataSizeChange [d7ce6e8 0]
2013-10-09 16:06:43.604000 UTC - 0[12c310]:   dooming cache entry!!

=> crash...


Reproduced by cycling |while ./mach mochitest-plain content/events/test/test_bug409604.html; do true; done| on an opt build (bug 922296)

Solution: kungFuDeathGrip(this) over nsHttpChannel::OnStopRequest
Attached patch v1 (obsolete) — Splinter Review
Attachment #814949 - Flags: review?(mcmanus)
See Also: → 924112
See Also: 924112
Fixing this will fix the bug tree of https://bugzilla.mozilla.org/show_bug.cgi?id=924112

(Stupid bugzilla syncs the |See also| field... grrr!!!!)
Attachment #814949 - Flags: review?(mcmanus) → review+
I was running the same test loop with this patch and it still crashes.  I also realized that the transaction pump is held by the callback event utility class, so it cannot go away.  The chain should live until the bottom of the stack, so there shouldn't be any crash.  We apparently somewhere call release twice on something.  I have to find out where.  It still could be something with the new cache.
Attachment #814949 - Attachment is obsolete: true
Attachment #814949 - Flags: review+
New clue: we call nsHttpChannel::Cancel() on two threads [main/NS_BINDING_ABORTED and html5-parser/NS_ERROR_FAILURE] at almost the same time - with high probability not a safe operation.

Adding Steve, this could be an ODA on non-mainthread issue.
Status: ASSIGNED → NEW
Err: not the html-5 parser, but the image parser (the url is a .png file).

Related to bug 867755 ?
Few data:
- NS_ERROR_FAILURE cancel is coming from imgRequest::ODA since ImageFactory::CreateImage fails
- NS_BINDING_ABORTED cancel is coming as the window is being closed (comes from docshell->loadgroup) // this is the last place I haven't look at, it is probably where the double release happens since in non-crashing logs I don't see this cancel
- doesn't matter which of the two cancels is called sooner, both cases crash // supports the theory
- the concurrent cancel on the channel is caught by |if (mCanceled) return| check
- the listener is imgRequest that releases the channel in imgRequest::OnStopRequest (that is in the bad case the last reference)
- the underlying input stream pump is alive at the moment of the crash (dtor not logged - I've added that log locally)
Caught up reading about this, and had to do a clobber build to investigate more.

The Cancel from imgRequest::ODA is what I was going to look for, so that's good. I think it's similar behavior to test_draggableprop.html, in which an image (happy.png) is referenced but doesn't exist. So, Cancel gets called.

I don't yet have any suggestions about why there are two releases, but it's good to have a codepath to start with.
Got it:

I think this will be cleaner from the two callstacks (that happen simultaneously on two threads:)

Image decoder

 	xul.dll!mozilla::net::nsHttpChannel::Cancel(tag_nsresult status=NS_ERROR_FAILURE)  Line 4309	C++
>	xul.dll!nsLoadGroup::Cancel(tag_nsresult status=NS_ERROR_FAILURE)  Line 290 + 0x18 bytes	C++
 	xul.dll!imgRequest::Cancel(tag_nsresult aStatus=NS_ERROR_FAILURE)  Line 269	C++
 	xul.dll!imgRequest::OnDataAvailable(nsIRequest * aRequest=0x0eb419b4, nsISupports * ctxt=0x00000000, nsIInputStream * inStr=0x15796a00, unsigned __int64 sourceOffset=0, unsigned int count=391)  Line 800	C++


Main thread

>	xul.dll!mozilla::net::nsHttpChannel::Cancel(tag_nsresult status=NS_BINDING_ABORTED)  Line 4309	C++
 	xul.dll!nsLoadGroup::Cancel(tag_nsresult status=NS_BINDING_ABORTED)  Line 290 + 0x18 bytes	C++
 	xul.dll!imgRequest::Cancel(tag_nsresult aStatus=NS_BINDING_ABORTED)  Line 269	C++
 	xul.dll!imgRequest::RemoveProxy(imgRequestProxy * proxy=0x156f60c0, tag_nsresult aStatus=NS_BINDING_ABORTED)  Line 222	C++
 	xul.dll!imgRequestProxy::DoCancel(tag_nsresult status=NS_BINDING_ABORTED)  Line 312	C++
 	xul.dll!imgRequestProxy::imgCancelRunnable::Run()  Line 130	C++


The important part here is that imgRequest->mRequest *is a load group* that does nasty NS_RELEASE() calls on the channel.  nsLoadGroup::Cancel is simply not thread safe.
Summary: Use after free of *this at nsHttpChannel::OnStopRequest → Use after free of *this at nsHttpChannel::OnStopRequest, nsLoadGroup::Cancel not thread-safe
all cancel methods should probably assert main threadedness. (loadgroup and channel too)
Honza, I know you'd prefer making nsLoadGroup::Cancel threadsafe, but after a quick chat with Pat it looks like we'd need to hunt down every implementation of nsIRequest to make this truly threadsafe. So, dispatching to the main thread may be the fastest, simplest option here.

I've set f? on this patch not so much for the code, but more to see if you can reproduce the crash with it (I hope not!). I tried reproducing without the patch locally on my own machine (Mac OSX 10.8), but nothing yet. I'll keep trying, but since you were able to reproduce it already, it seems like you're in a better place to test the patch.

If it does seem to solve the issue, I'll add in Pat's suggestion to assert main thread only on the Cancel methods. Hopefully that won't crash anything else.
Assignee: honzab.moz → sworkman
Status: NEW → ASSIGNED
Attachment #815653 - Flags: feedback?(honzab.moz)
Also, I'm taking PTO on Friday/tomorrow, so feel free to take the bug back if you want it done super quick. Otherwise, I'll get back to it on Monday.
Comment on attachment 815653 [details] [diff] [review]
v1.0 Dispatch nsIRequest::Cancel to main thread in imgRequest::Cancel

No crashes.  I'm also off today.  An image peer must carefully review this.
Attachment #815653 - Flags: feedback?(honzab.moz) → feedback+
No crashes == awesome.
Comment on attachment 815653 [details] [diff] [review]
v1.0 Dispatch nsIRequest::Cancel to main thread in imgRequest::Cancel

Seth: Honza discovered that nsIRequest::Cancel was being called in imgRequest::Cancel OFF the main thread, and concurrently ON the main thread, resulting in two ptr releases and a crash. Converting all the implementations of nsIRequest to be threadsafe is not really feasible, so this patch dispatches the call to the main thread. Can you take a look and see if you can think of any problems with this?

FYI: Here are the results from try:
https://tbpl.mozilla.org/?tree=Try&rev=71b2892eac3d
Attachment #815653 - Flags: review?(seth)
The problem is really not in how each nsIRequest::Cancel is implemented, but in particular only in how nsLoadGroup::RemoveRequest is implemented.

However, safer and more complete solution can be here to post to the main thread (however, if that would delay calling OnStopRequest for consumers that has redirected to a background thread, then I'd be against - it would be just a performance regression and ruin the ODA off main thread work).
Comment on attachment 815653 [details] [diff] [review]
v1.0 Dispatch nsIRequest::Cancel to main thread in imgRequest::Cancel

Review of attachment 815653 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me! r+ with the PR_Sleep gone.

::: image/src/imgRequest.cpp
@@ +260,5 @@
> +  {
> +    MOZ_ASSERT(NS_IsMainThread(), "I should be running on the main thread!");
> +    mImgRequest->ContinueCancel(mStatus);
> +    return NS_OK;
> +  } 

Whitespace.

@@ +290,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  RemoveFromCache();
> +  

Whitespace.

@@ +825,5 @@
>        if (mImage->HasError() && !mIsMultiPartChannel) { // Probably bad mimetype
>          // We allow multipart images to fail to initialize without cancelling the
>          // load because subsequent images might be fine; thus only single part
>          // images end up here.
> +        PR_Sleep(PR_MillisecondsToInterval(1000));

Is this just for debugging?

::: image/src/imgRequest.h
@@ +73,5 @@
>    void CancelAndAbort(nsresult aStatus);
>  
> +  // Called or dispatched by cancel for main thread only execution.
> +  void ContinueCancel(nsresult aStatus);
> +  

Whitespace.
Attachment #815653 - Flags: review?(seth) → review+
(In reply to Seth Fowler [:seth] from comment #16)
> Comment on attachment 815653 [details] [diff] [review]
> v1.0 Dispatch nsIRequest::Cancel to main thread in imgRequest::Cancel
> 
> Review of attachment 815653 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me! r+ with the PR_Sleep gone.

Oops!

> @@ +825,5 @@
> >        if (mImage->HasError() && !mIsMultiPartChannel) { // Probably bad mimetype
> >          // We allow multipart images to fail to initialize without cancelling the
> >          // load because subsequent images might be fine; thus only single part
> >          // images end up here.
> > +        PR_Sleep(PR_MillisecondsToInterval(1000));
> 
> Is this just for debugging?

It was; I didn't even use it because Honza figured it out before I got the change to run it :)
Pat, you mentioned having main thread assertions in the loadgroup and the channel. I added some to nsLoadGroup, nsHttpChannel, HttpChannelChild and nsInputStreamPump. I know there are more implemented nsIRequest sub-classes, but this seems like a decent core set to start with. Let me know what you think.

try: https://tbpl.mozilla.org/?tree=Try&rev=87dc59a0bd33
Attachment #817513 - Flags: review?(mcmanus)
Comment on attachment 817512 [details] [diff] [review]
v1.1 Dispatch nsIRequest::Cancel to main thread in imgRequest::Cancel

https://hg.mozilla.org/integration/mozilla-inbound/rev/dd2957ba1805
Setting [leave open] so I can add the main thread assertions once Pat gets a chance to look at it.
Whiteboard: [leave open]
fixed in mozilla central now https://hg.mozilla.org/mozilla-central/rev/dd2957ba1805https://hg.mozilla.org/mozilla-central/rev/dd2957ba1805
Flags: in-testsuite?
Target Milestone: --- → mozilla27
Comment on attachment 817513 [details] [diff] [review]
v1.0 Add main thread assertions for Cancel in nsLoadGroup, nsHttpChannel and nsInputStreamPump

Review of attachment 817513 [details] [diff] [review]:
-----------------------------------------------------------------

these are a good idea - here's an implementation of nsirequest in javascript - https://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/app/AppProtocolHandler.js#101 .. 
it has an empty cancel method, but clearly that's not going to work too well if called off the main thread either :)

it would be nice to make this free threaded, but that's a different (bigger) patch.

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +917,5 @@
>  NS_IMETHODIMP
>  HttpChannelChild::Cancel(nsresult status)
>  {
> +  MOZ_ASSERT(NS_IsMainThread());
> +  

whitespace issue
Attachment #817513 - Flags: review?(mcmanus) → review+
Does this affect 25 and 26? How about ESR24? If so, we probably want an Aurora patch even though it is too late for the Beta branch.
Thanks for checking Al, but this should only affect 27. It was caused by bug 867755, which is targeted for 27.
fixed on central https://hg.mozilla.org/mozilla-central/rev/4fcc79a6234b
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Group: network-core-security → core-security
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: