Closed
Bug 1478849
Opened 6 years ago
Closed 6 years ago
AddressSanitizer: heap-use-after-free [@ IPC::Channel::Unsound_IsClosed] with READ of size 8
Categories
(Core :: IPC, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | - | unaffected |
firefox61 | --- | wontfix |
firefox62 | + | fixed |
firefox63 | + | fixed |
People
(Reporter: decoder, Assigned: jld)
References
(Blocks 1 open bug)
Details
(5 keywords, Whiteboard: [post-critsmash-triage][adv-main62+])
Attachments
(2 files)
31.61 KB,
text/plain
|
Details | |
759 bytes,
patch
|
mccr8
:
review+
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr60-
|
Details | Diff | Splinter Review |
The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 63.0a1-20180725103029-https://hg.mozilla.org/mozilla-central/rev/0b9c0b211eb3d84c5afdb64f4a91d079d0aa4c75.
For detailed crash information, see attachment.
Reporter | ||
Comment 1•6 years ago
|
||
Updated•6 years ago
|
Group: core-security → media-core-security
Reporter | ||
Updated•6 years ago
|
No longer blocks: asan-nightly-project
Comment 3•6 years ago
|
||
It looks like maybe ~GMPServiceChild is touching a channel after it was destroyed, via mContentParents. Maybe ~ContentParent needs to call RemoveGMPContentParent()?
Nils, any idea who could look at this?
Flags: needinfo?(drno)
Updated•6 years ago
|
Flags: needinfo?(drno)
Comment 4•6 years ago
|
||
Byron, could you please take a look at this?
Ask cpearce for advice if needed.
Flags: needinfo?(docfaraday)
Updated•6 years ago
|
Rank: 8
Priority: -- → P1
Comment 5•6 years ago
|
||
Wow, look at all those bare pointers in the IPC code.
Comment 6•6 years ago
|
||
So, it is extremely hard to tell what is going on here, but I have a guess. The only place we create a DeleteTask<IPC::Channel> is here:
https://searchfox.org/mozilla-central/source/ipc/glue/ProtocolUtils.cpp#691
GMPContentParent, the thing we're destroying on main (the crashing thread) happens to be an IToplevelProtocol. What I think has happened is we are destroying a GMPContentParent, and are currently running the d'tor code for the IToplevelProtocol. This dispatches a DeleteTask<IPC::Channel> to Chrome_~dThread. Most of the time, this is fine, because that runnable takes a little bit of time to run. But when it is executed before the GMPContentParent is done tearing down, we get a UAF.
This is a bug in the IPC code, I think.
Flags: needinfo?(docfaraday)
Comment 7•6 years ago
|
||
The quick-and-dirty fix to this particular crash would be to destroy |mState| before dispatching the DeleteTask, I think. But given the web of bare pointers in this code, I am not sure.
Comment 8•6 years ago
|
||
Jim, could you help me to find an IPC expert have a look at this and Byron guess work in comment #6?
Flags: needinfo?(jmathies)
Assignee | ||
Comment 9•6 years ago
|
||
I've looked at the code, and comment #6 looks right. The ToplevelState contains the MessageChannel, whose destructor looks at MessageChannel::mLink, which in this case is-a ProcessLink, which has a raw pointer (“weak reference”) to the Transport a.k.a. IPC::Channel.
So that means that IToplevelProtocol::mTrans (owning reference to the Transport) must outlive that object's IProtocol::mState (borrowed reference to the Transport). And it doesn't, but comment #6's suggestion of clearing mState before sending the runnable looks reasonable.
Also, IToplevelProtocol::mTrans itself doesn't seem to be used much; it looks doable to move the Transport ownership into ProcessLink. That might want to be a followup instead.
(Maybe not directly relevant, but I see that bug 1283744 has some historical context on Transport management.)
Assignee: nobody → jld
Component: Audio/Video → IPC
Flags: needinfo?(jmathies)
Assignee | ||
Comment 10•6 years ago
|
||
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #9)
> Also, IToplevelProtocol::mTrans itself doesn't seem to be used much; it
> looks doable to move the Transport ownership into ProcessLink. That might
> want to be a followup instead.
Not quite. The transport owner isn't always IToplevelProtocol::mTrans; for the initial actor (like PContent or PGPU) it seems to be
ChildThread::channel_ on the child side and ChildProcessHost::channel_ on the parent side. IToplevelProtocol::mTrans appears to be used only for toplevel actors created via Endpoint::Bind. Simplifying that is *definitely* outside the scope of this bug.
Assignee | ||
Comment 11•6 years ago
|
||
Attachment #9001807 -
Flags: review?(continuation)
Updated•6 years ago
|
Attachment #9001807 -
Flags: review?(continuation) → review+
Assignee | ||
Comment 12•6 years ago
|
||
About the possibility of regressions from other raw-pointer issues, I haven't noticed any while looking through the code, but to be a little more rigorous about it:
This patch causes the destruction of mState to happen-before the destruction of mTrans, when previously they were unordered, so if there's a vulnerability from that ordering then it was already possible without the patch (especially because that ordering seems to have been what usually happened in practice, e.g. in our ASan CI runs).
It also causes mState to be destroyed before the other members of IToplevelProtocol and IProtocol, but they're all POD (integers, enums, raw pointers not touched by destructors) so there shouldn't be any interactions there.
Assignee | ||
Comment 13•6 years ago
|
||
Also, don't rely on this without double-checking it, but I think this isn't usefully exploitable: the only thing we're doing to the freed `Channel` is reading the `channel_impl_` pointer, and then that's used only to read a `bool` which is used to decide whether or not to MOZ_CRASH.
There is a virtual call to MessageLink::Unsound_IsClosed in this, but the object it's called on is the ProcessLink, which isn't freed (MessageChannel::mLink) until later in MessageChannel::Clear. The methods that are called on the Channel and ChannelImpl are non-virtual.
Assignee | ||
Comment 14•6 years ago
|
||
Comment on attachment 9001807 [details] [diff] [review]
Patch
This bug doesn't have a severity rating yet, but I'm requesting sec-approval in case it needs it.
[Security approval request comment]
* How easily could an exploit be constructed based on the patch?
Not simple, but probably doable: it's a use-after-free race condition between two threads in the parent process, and it looks like content could trigger it by creating/destroying EME or WebRTC contexts but it's not trivial to delay the thread that needs to be slower. There'd also need to be some parent-process heap manipulation to do anything with it. And see also comment #13 about severity.
* Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
There are no comments in this version, but it's hard to hide: it's pretty obviously rearranging destruction order near something async/thread related.
* Which older supported branches are affected by this flaw?
All of them. This code hasn't changed much since bug 1283744 (and it may have been more broken before that).
* Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
The patch crossports cleanly to beta and esr60.
* How likely is this patch to cause regressions; how much testing does it need?
Not likely; see comment #12.
Attachment #9001807 -
Flags: sec-approval?
Reporter | ||
Comment 15•6 years ago
|
||
Fwiw, unlike all other use-after-free reports in the ASan Nightly Project, this one is actually frequent. I got multiple reports from different people, in total 6 reports. That might sound low, but given a total population of around 100 clients, this is very frequent for such an issue. So this might indeed have some stability impact as well.
:jld, what is the MOZ_CRASH that you spoke of in comment 13? Can we see that in crash-stats?
Comment 16•6 years ago
|
||
going with sec-moderate based on comment 13
Although it's not directly this crash, comment 5 is worrying.
Keywords: csectype-uaf,
sec-moderate
Assignee | ||
Comment 17•6 years ago
|
||
https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/ipc/glue/MessageChannel.cpp#804-831
There are a lot of "MessageChannel destroyed without being closed" crashes; all the ones I looked at were during shutdown, when the cycle collector destroyed a ContentParent, and I don't know if they're caused by this bug or because the channel is actually still open.
Assignee | ||
Comment 18•6 years ago
|
||
Comment on attachment 9001807 [details] [diff] [review]
Patch
sec-moderate, so doesn't need approval.
Landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/9c34c00e44d0b021fb1d5ac4bf464b9de4955374
Attachment #9001807 -
Flags: sec-approval?
Updated•6 years ago
|
Flags: sec-bounty+
Comment 19•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c34c00e44d0b021fb1d5ac4bf464b9de4955374
https://hg.mozilla.org/mozilla-central/rev/9c34c00e44d0
Group: media-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment 20•6 years ago
|
||
Please request Beta and ESR60 approval on this when you get a chance.
status-firefox61:
--- → wontfix
status-firefox62:
--- → affected
status-firefox-esr52:
--- → wontfix
status-firefox-esr60:
--- → affected
tracking-firefox62:
--- → +
tracking-firefox63:
--- → +
tracking-firefox-esr60:
--- → 62+
Flags: needinfo?(jld)
Assignee | ||
Comment 21•6 years ago
|
||
Comment on attachment 9001807 [details] [diff] [review]
Patch
Approval Request Comment
[Feature/Bug causing the regression]: Unclear, because it's a very old bug — at least since the bug 1283744 cleanup and probably before that.
[User impact if declined]: This is a use-after-free that can be triggered by content; I have doubts about its exploitability, but I wouldn't bet on that. See comment #16; also see comment #15 about frequency in real-world use.
[Is this code covered by automated tests?]: Yes; this is core IPC code.
[Has the fix been verified in Nightly?]: Landed in m-c in comment #19.
[Needs manual test from QE? If yes, steps to reproduce]: No. It's an intermittent bug and we don't have STR.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Low risk.
[Why is the change risky/not risky?]: This essentially just imposes an ordering on some things we were previously doing in parallel; see comment #12 for more analysis.
[String changes made/needed]: None.
Flags: needinfo?(jld)
Attachment #9001807 -
Flags: approval-mozilla-esr60?
Attachment #9001807 -
Flags: approval-mozilla-beta?
Comment 22•6 years ago
|
||
Comment on attachment 9001807 [details] [diff] [review]
Patch
Fixes an IPC security issue. Approved for 62.0b20 and ESR 60.2.
Attachment #9001807 -
Flags: approval-mozilla-esr60?
Attachment #9001807 -
Flags: approval-mozilla-esr60+
Attachment #9001807 -
Flags: approval-mozilla-beta?
Attachment #9001807 -
Flags: approval-mozilla-beta+
Comment 23•6 years ago
|
||
uplift |
Reporter | ||
Comment 24•6 years ago
|
||
When you find some time, it would be very helpful if you could answer the following questions so I can work on improving ASan Nightly further. In particular, many of the reports we saw about use-after-free were sent only once, not multiple times. So I'm asking myself what changes to ASan Nightly I can make to improve overall uaf detection rate:
1) Do you think more frequent GC/CC would have helped detecting this problem more frequently?
2) Do you think GC/CC on specific DOM events (e.g. beforeunload, pagehide, DOMContentLoaded) would have helped?
3) Do you think triggering the event loop in the specific child processes would have helped here? (e.g. dummy XMLHttpRequest onbeforeunload, useful in fuzzing).
4) Do you have any other ideas to reproduce this particular problem better or do you think any of the above points (or other approaches) might help to find other use-after-free issues in areas that you work in?
Any feedback would be greatly appreciated. Thanks!
Flags: needinfo?(jld)
Comment hidden (obsolete) |
Comment 26•6 years ago
|
||
backout |
Assignee | ||
Comment 27•6 years ago
|
||
60 predates bug 1451363 and the state factoring, but I forgot about that while digging through older history. Mea culpa.
Pre-1451363, the MessageChannel is an instance variable of each autogenerated abstract actor class (the PFooSide classes, which inherit from IToplevelProtocol), named mChannel but not the same as IProtocol::mChannel, and accessible through the virtual GetIPCChannel methods.
Which means that the MessageChannel and ProcessLink are already destroyed before entering ~IToplevelProtocol, and this bug is actually a regression caused by bug 1451363, and ESR 60 isn't affected.
Assignee | ||
Comment 28•6 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #24)
> 4) Do you have any other ideas to reproduce this particular problem better
> or do you think any of the above points (or other approaches) might help to
> find other use-after-free issues in areas that you work in?
For this UAF, an async runnable/task needs to run before something else that's done synchronously, which more or less means that the first thread needs to be preempted. This is the kind of thing that rr chaos mode approaches more generally, but for this specific case we might get some mileage out of doing a sched_yield() and/or a short sleep when posting a runnable/task to another thread.
Flags: needinfo?(jld)
Updated•6 years ago
|
Updated•6 years ago
|
Attachment #9001807 -
Flags: approval-mozilla-esr60+ → approval-mozilla-esr60-
Updated•6 years ago
|
Updated•6 years ago
|
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Updated•6 years ago
|
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main62+]
Updated•5 years ago
|
Group: core-security-release
Updated•5 years ago
|
Blocks: asan-maintenance
Updated•6 months ago
|
Keywords: reporter-external
You need to log in
before you can comment on or make changes to this bug.
Description
•