AddressSanitizer: heap-use-after-free [@ MOZ_Z_inflate_fast] with READ of size 1 through [@ mozilla::net::nsHTTPCompressConv::OnDataAvailable]
Categories
(Core :: Networking: HTTP, defect, P1)
People
(Reporter: decoder, Assigned: mayhemer)
References
(Blocks 1 open bug)
Details
(4 keywords, Whiteboard: [necko-triaged][post-critsmash-triage][adv-main68+][adv-esr60.8+])
Attachments
(3 files, 1 obsolete file)
16.92 KB, text/plain — Details
47 bytes, text/x-phabricator-request — abillings: sec-approval+ — Details | Review
6.09 KB, patch — RyanVM: approval-mozilla-esr60+ — Details | Diff | Splinter Review
The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 68.0a1-20190328131410-https://hg.mozilla.org/mozilla-central/rev/4e2ea1a75e878ae392e4775f2eddd9f83d1b008e.
For detailed crash information, see attachment.
Reporter
Comment 1•5 years ago
Updated•5 years ago
Comment 2•5 years ago
The "use" and "allocation" stacks look the same.
The "use" stack shows thread T35 in the middle of a call to HttpChannelChild::DoOnDataAvailable():
nsresult rv = mListener->OnDataAvailable(aRequest, aStream, offset, count);
The "free" stack is thread T0 calling ReleaseListeners() which I assume nulls out mListener. I don't know what sort of interleaving there is allowed for HttpChannelChild.
One possible fix would be to locally root mListener in HttpChannelChild::DoOnDataAvailable(), if somehow that is getting called simultaneously with ReleaseListeners(). But I don't know how this code is expected to work.
Comment 3•5 years ago
That's a good question...
It looks like we're doing background data delivery in an HTTP channel, with a gzip decoder in the data pipe. That's the HttpChannelChild::OnTransportAndData bits. That's coming off an event we queued in HttpBackgroundChannelChild::RecvOnTransportAndData.
While that's all going on, on the main thread we have what looks like an IPC error of some sort (MessageChannel::OnNotifyMaybeChannelError) and start tearing down actors: PContentChild::OnChannelClose -> PContentChild::DestroySubtree -> PNeckoChild::DestroySubtree -> HttpChannelChild::ActorDestroy.
That ends up calling into HttpBaseChannel::DoNotifyListener.
At this point, afaict, we have already lost. We are not done with the off-main-thread OnDataAvailable notification, and now we're sending an OnStopRequest on the main thread to the same listener. That's likely to cause races in the listener and its following listeners, even apart from the race we end up with ourselves on our mListener.
It seems to me that HttpChannelChild::DoOnDataAvailable should have a mutex protecting mCanceled(!) and once it takes that mutex it should do that check, then set a boolean indicating we are calling OnDataAvailable, release the mutex, do the call, take the mutex, unset the boolean, and return.
In HttpChannelChild::ActorDestroy we should then take the same mutex, check whether the "we are delivering data" boolean is set, and if so either queue a runnable to call HandleAsyncAbort (possibly after checking the same boolean) or signal DoOnDataAvailable to queue the runnable when it unsets its boolean or something.
Updated•5 years ago
Updated•5 years ago
Comment 5•5 years ago
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #3)
> That's a good question...
> It looks like we're doing background data delivery in an HTTP channel, with a gzip decoder in the data pipe. That's the HttpChannelChild::OnTransportAndData bits. That's coming off an event we queued in HttpBackgroundChannelChild::RecvOnTransportAndData.
> While that's all going on, on the main thread we have what looks like an IPC error of some sort (MessageChannel::OnNotifyMaybeChannelError) and start tearing down actors: PContentChild::OnChannelClose -> PContentChild::DestroySubtree -> PNeckoChild::DestroySubtree -> HttpChannelChild::ActorDestroy.
> That ends up calling into HttpBaseChannel::DoNotifyListener.
> At this point, afaict, we have already lost. We are not done with the off-main-thread OnDataAvailable notification, and now we're sending an OnStopRequest on the main thread to the same listener. That's likely to cause races in the listener and its following listeners, even apart from the race we end up with ourselves on our mListener.
> It seems to me that HttpChannelChild::DoOnDataAvailable should have a mutex protecting mCanceled(!) and once it takes that mutex it should do that check, then set a boolean indicating we are calling OnDataAvailable, release the mutex, do the call, take the mutex, unset the boolean, and return.
> In HttpChannelChild::ActorDestroy we should then take the same mutex, check whether the "we are delivering data" boolean is set, and if so either queue a runnable to call HandleAsyncAbort (possibly after checking the same boolean) or signal DoOnDataAvailable to queue the runnable when it unsets its boolean or something.
This code is not really built to handle IPC channel errors. We should make this behave more gracefully.
Can we find out what the IPC error is?
Comment 6•5 years ago
Bug 1542153 has a similar situation which is not obviously an IPC error. Looks like it might just be teardown of the PContentChild actor, which tears down the actors it manages...
Assignee
Comment 7•5 years ago
First, we must kungfudeathgrip mListener at https://searchfox.org/mozilla-central/rev/7944190ad1668a94223b950a19f1fffe8662d6b8/netwerk/protocol/http/HttpChannelChild.cpp#979
This is easy to do and fixes the UAF.
Second, we must enqueue the OnStopRequest call on the listener from the aborter in the same/similar way as we do at https://searchfox.org/mozilla-central/rev/7944190ad1668a94223b950a19f1fffe8662d6b8/netwerk/protocol/http/HttpChannelChild.cpp#1013
The contract is that onStart/onData/onStop are delivered in sequence when onData is retargeted; we don't call onData until onStart exits, and similarly, we don't call onStop until all onData calls are handled.
This second part is harder: we have to write a DoNotifyListener for the child (it doesn't need to be virtualized, because HttpAsyncAborter is templated on the implementor class type!) that will enqueue the onStop call.
Assignee
Updated•5 years ago
Assignee
Comment 8•5 years ago
Assignee
Comment 9•5 years ago
Assignee
Comment 10•5 years ago
Comment on attachment 9063545 [details]
Bug 1540759, r=valentin
Security Approval Request
- How easily could an exploit be constructed based on the patch?: One would need good knowledge of the code (even I can't easily see how this works ;)) and would have to force the parent process to be killed at the right time to invoke ActorDestroy. This is all highly racy, and the crash would be in the content process only anyway.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all of them
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: we have changed the code recently in bug 1530691, currently on beta; so backporting this to release/esr may be a bit tricky, but not impossible
- How likely is this patch to cause regressions; how much testing does it need?: Hard to say, but I gave myself a hard time to understand all the possible flows here, and this is the right fix; it should not cause any trouble under a normal run.
I hope that we cover this bit with our automated tests well enough, but hard to say.
Comment 11•5 years ago
Comment on attachment 9063545 [details]
Bug 1540759, r=valentin
sec-approval+ for trunk.
Assignee
Updated•5 years ago
Comment 12•5 years ago
Comment 13•5 years ago
Comment 14•5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #10)
> - If not, how different, hard to create, and risky will they be?: we have changed the code recently in bug 1530691, currently on beta; so backporting this to release/esr may be a bit tricky, but not impossible
Is this something we need on ESR60? Given the rating, I'm inclined to say yes unless this is prohibitively difficult to backport.
Assignee
Comment 15•5 years ago
I can try to have a patch. This has baked w/o problems long enough. Let's do it. Deadline for a patch?
Comment 16•5 years ago
60.8esr is due to ship on July 9, so there's time :)
Updated•5 years ago
Comment 20•5 years ago
I think you can do that ESR60 approval request now :)
Assignee
Comment 21•5 years ago
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- User impact if declined: UAF
- Fix Landed on Version: 68
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): sticks well and affects only older code.
- String or UUID changes made by this patch: none
Assignee
Comment 22•5 years ago
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Comment 21
- User impact if declined:
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Comment 23•5 years ago
Comment on attachment 9075207 [details] [diff] [review]
esr60
Fixes a Necko sec bug. Approved for 60.8esr.
Comment 24•5 years ago
Updated•5 years ago
Updated•4 years ago
Updated•4 years ago