Closed Bug 898563 Opened 6 years ago Closed 6 years ago

Parent side of APZC messaging needs to be hardened against malicious children

Categories

(Core :: Graphics: Layers, defect)

All
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 --- wontfix
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
firefox-esr17 --- wontfix
firefox-esr24 --- wontfix
b2g18 --- wontfix
b2g-v1.2 --- wontfix
b2g-v1.3 --- wontfix
b2g-v1.3T --- wontfix
b2g-v1.4 --- fixed
seamonkey2.26 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

(Keywords: csectype-priv-escalation, sec-critical, Whiteboard: [qa-][adv-main30-])

Attachments

(1 file, 1 obsolete file)

Discussing with :bsmith and :jrmuizel the other day, we realized that there are probably entry points into the AsyncPanZoomController that could be exploited by malicious child processes. The messaging across the TabChild -> TabParent bridge is often passed directly through the RenderFrameParent and on to the AsyncPanZoomController and invalid data there could cause divide-by-zeros and other manner of crashes. I'm not sure to what extent we need to harden this code but it's definitely something I hadn't considered while working on it, and so deserves some attention.
Thanks for filing the bug. In the future, please make sure you check the "Security-Sensitive Core Bug" checkbox; sorry I forgot to remind you about this before.
Group: core-security
Do you think should this be rated sec-other because this is a preventative hardening measure or sec-high (or whatever) because this seems like something that can probably be exploited right now?
I think the worst that can happen right now from is crashing the parent process or causing it to render strange things. I don't think those are arbitrarily exploitable (but I could be wrong) so based on that I would go with sec-other for preventative hardening.
Keywords: sec-other
In email, sicking wrote:
"If you can draw over another application in such a way that you can
click-jack the dialer app, then that sounds like a sg:critical bug to
me. Crashing the parent process is something that we at least need to make
partners aware of. I think generally they would consider that a
blocker."
We need an owner.

Milan, any lucky person on your team?
Flags: needinfo?(milan)
We'd need B2G to explicitly confirm this as critical before we can staff it.
Flags: needinfo?(milan)
Setting koi? in the hopes that will get this on some B2G radar.
Lukas, another graphics issue that seems to not be getting any traction, but is potentially serious. Can you help get someone on this? Thanks.
Flags: needinfo?(ladamski)
Flags: needinfo?(ladamski)
Apologies, I meant that for Lucas. Should we ask someone else? Thank you.
Flags: needinfo?(ladamski)
Seems pretty serious to me if you can compromise the privileged process from content space.  Adding PaulJT for a second opinion, since he doesn't seem to be on here.
Flags: needinfo?(ladamski) → needinfo?(ptheriault)
blocking-b2g: koi? → 1.3?
So this is serious, but honestly until we have the kernel layer sandbox in place (bug 790923) there are probably lots of other ways a child can escalate privileges after gaining arbitrary code execution. I don;t have any specific guidance, but my gut feel on the timeline for sandboxing is that well currently it has landed on master, we are working on getting Qualcomm to enable for 1.3. Then we need to fix all the APIs (like this one) that need their IPC mechanisms hardended. So maybe 1.4 ? Thats a hopeful guess.


PS I am assuming here that a web page would first need an exploit to gain privileged to execute arbitrary code/chrome privileged js in order to interface with AsyncPanZoomController in a way to cause an issue.If however this is something exposed to content js, then this is much more serious.
Flags: needinfo?(ptheriault)
(In reply to Paul Theriault [:pauljt] On PTO back 28th Oct from comment #11)
> PS I am assuming here that a web page would first need an exploit to gain
> privileged to execute arbitrary code/chrome privileged js in order to
> interface with AsyncPanZoomController in a way to cause an issue.If however
> this is something exposed to content js, then this is much more serious.

Correct, the interface is not exposed to content JS.
(In reply to Milan Sreckovic [:milan] from comment #6)
> We'd need B2G to explicitly confirm this as critical before we can staff it.

Yes, this is critical.

Being able to draw outside of the <iframe> which renders a child process is very serious.

If you can crash the parent process then it's an sg:crit and should be backported to as many releases as we can.

The fact that we may have other bugs of the same magnitude doesn't mean that we should put less focus on this one.
Kats, let's start talking about what would be involved in doing this in detail.  It may need some design time, so even if we can't start coding now, we should start looking at it.
Flags: needinfo?(bugmail.mozilla)
I want to understand better the threat model here. What are we assuming the child process can do? What can it not do? I assume that at a minimum it can send arbitrary IPC messages across the TabChild/TabParent interface, and so can invoke any of the PBrowser interface APIs with any parameters. Are there more severe threats we need to guard against? one thing that comes to mind is that Randall Barker is working on some code that might end up adding a shmem buffer to share some data between the child and parent processes. Does that need extra protection?

The next thing I'd like to know is the scope of this bug. The APZC messaging between the child/parent process is only a small-ish part of the PBrowser interface, so is this bug limited in scope to those functions, or everything in PBrowser?
Flags: needinfo?(bugmail.mozilla)
Kats, I agree - you're the one that added the bug, so I was assuming you can give us those answers.
Unfortunately not. I can define the scope of the bug if you like (my second question) but somebody more familiar with the security model on B2G should answer the first. I filed the bug after getting pulled into a discussion that bsmith and jrmuizel were having in the office about attack vectors via these APIs.

Perhaps Paul can provide more info on the exact threat model we need to protect against? From comment 11 it sounds like there is already some work happening for sandboxing and such.
Flags: needinfo?(ptheriault)
We assume that an attacker has tricked the user to navigate to a website controlled by the hacker. For example by emailing the user a link to the website or by posting a link on a forum.

We then assume that the hacker has found a bug somewhere in gecko which allows the attacker to execute arbitrary binary ARM instructions. For example by finding a buffer overrun, a double-free, a dangling vtable pointer or similar. These are found on a pretty regular basis in most browsers.

However we assume that the bug that the attacker found only lets the attacker execute arbitrary binary code in the child process. I.e. while a webpage running in a child process can cause various gecko code paths to be executed in the parent process, this is a much smaller set of code paths. And we do some amount of checking in the parent process whenever we receive data from a child process, so it's harder to exploit bugs in these code paths.

In other words, we assume that the attacker has full control of a child process running on the user's device.

A practical implication of this is that the attacker can send arbitrary IPDL messages (including arbitrary messagemanager messages) from the child process to the parent process. The attacker can also access any memory in the child process to find identifier information used when communicating over IPDL.

In this situation we do not want the attacker to be able to read information from other applications. And we don't want the attacker to be able to trigger gecko bugs in the parent process which would allow the attacker to run arbitrary binary code in the parent process. And we don't want the attacker to do things like read/write data to the SD card or other "shared" areas, or read GPS information, etc, unless the hacked app itself already had that ability.

Obviously we can never make that 100% foolproof. However any and all bugs that we find that allow a hacked child process to do more than the app should be able to do is considered a critical bug and should be fixed as soon as possible and backported as far as possible. Just like with sg:crit bugs in desktop firefox.


We do expect the problem of hacked childprocesses to be much more common on phones than it is on desktop because updating phone software happens much more rarely than on desktop. This means that bugs that are found and fixed on desktop will linger much longer on users phones.


I hope that clarifies things?
Yup, that answers my questions. Thanks for the detailed response!
Flags: needinfo?(ptheriault)
Ok, so looking at the PBrowser.ipdl interface methods related to APZC, the only one that allows cross-content-process leakage is ContentReceivedTouch, because that allows a malicious child process to pass in a ScrollableLayerGuid with a layers id for a different content process. We can change that to only pass the presShellId and ViewID, or we can verify on the parent side that the layers id is what we expect it to be. The latter approach might even help us detect a bad child process.

I'll keep looking through the APIs and update this bug with things I find that would need to be fixed.
Assigning to myself so I don't forget about this, but I'm going on PTO so I'll continue when I get back. If anybody else wants to pick it up while I'm gone please do.
Assignee: nobody → bugmail.mozilla
Thanks Jonas, even I understand it now :)  Given the timing, this won't make it into 1.3.  Making 1.4 blocking to make sure it doesn't get lost, but if we can get it into 29, even better.
blocking-b2g: 1.3? → 1.4+
Any update here? (Happy Holidays!)
This guards against the one hole I described above.
Attachment #8350617 - Flags: review?(botond)
Comment on attachment 8350617 [details] [diff] [review]
Guard against bad layers id in ContentReceivedTouch

Actually wait, I want to think about this a bit more.
Attachment #8350617 - Flags: review?(botond)
My concern now is that if the child process hijacks the layer tree it sends to the parent process, then it claim the entire screen is in its hit-test region (once bug 945203 lands) and so the APZ code will direct all gestures to it. If we implement bug 920036 then the situation will be made worse because we won't be doing Gecko hit detection in the root process at all and so won't have any way to check for this (whereas now we can assert that mLayersId == aOutTargetGuid.mLayersId at [1]).

Roc, do you have any thoughts on this?

[1] http://mxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp?rev=8a8ecfe64d03#909
Flags: needinfo?(roc)
(In reply to David Bolter [:davidb] from comment #23)
> Any update here? (Happy Holidays!)

Happy Holidays indeed!  Unless this becomes a non-theoretical attack, our current evaluation of the priority of this feature work is low enough that it will not be done in the first half of 2014.
Assignee: bugmail.mozilla → nobody
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26)
> My concern now is that if the child process hijacks the layer tree it sends
> to the parent process, then it claim the entire screen is in its hit-test
> region (once bug 945203 lands) and so the APZ code will direct all gestures
> to it. If we implement bug 920036 then the situation will be made worse
> because we won't be doing Gecko hit detection in the root process at all and
> so won't have any way to check for this (whereas now we can assert that
> mLayersId == aOutTargetGuid.mLayersId at [1]).
> 
> Roc, do you have any thoughts on this?

I don't see a problem here.

The child process can construct a layer tree that claims to (or actually does) catch events everywhere. However, that layer tree gets plugged into the master layer tree at some RefLayer. If that RefLayer is in a container with a cliprect, then the child layer tree can't render or catch events outside that cliprect. Likewise, its rendering and event handling is restricted to the z-plane the RefLayer is in. So it can't hijack content above its z-order or outside the cliprect. All assuming compositing and APZC event handling are implemented correctly.

In practice, an <iframe> element clips its children (unless that's disabled using some very special hacks that Gaia should not be using). This means that a child process can render and catch events everywhere inside its <iframe> but not outside, even after bug 945203 and bug 920036 are fixed.
Flags: needinfo?(roc)
Do Roc's comments in comment 28 address the whole problem? Or are there still outstanding concerns that need to be addressed?

CritSmash: this is currently a sec-critical bug that has been won't-fixed for seven releases. We might want to re-evaluate the problem, the rating or both, based on our latest understanding.
Roc's comments address the thing I brought up in comment 26. Architecturally I think everything is fine; it's just the few entry points here and there that need hardening against bad input.
Thanks Kats. Are we worried enough about such entry points to continue rating this sec-critical? If so we need to find an owner.
Flags: needinfo?(bugmail.mozilla)
I'm not sure what qualifies as sec-critical but I wouldn't think this bug qualifies. The patch I attached already is probably sufficient, but I'll want to give it another once-over at some point to make sure.
Flags: needinfo?(bugmail.mozilla)
Assignee: nobody → bugmail.mozilla
With the current priorities, this is not going to get into Gecko 30.
blocking-b2g: 1.4+ → ---
Updated patch to harden a couple more possible failure points. From a manual inspection of the code I think this patch fixes everything but it might be nice to have somebody run a fuzzer to check.

Try push for this patch at https://tbpl.mozilla.org/?tree=Try&rev=86d19d27ab50
Attachment #8350617 - Attachment is obsolete: true
Attachment #8379854 - Flags: review?(botond)
Attachment #8379854 - Flags: review?(botond) → review+
Comment on attachment 8379854 [details] [diff] [review]
Guard against bad child->parent data in APZ functions

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
- Not very easily. It requires completely 0wning the child process first (which this patch doesn't consider at all) and then crafting messages to send over the IPC channel. Even in the worst case I don't think they could do anything to hijack the parent process because of this.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- A little bit, yeah. But even without the comments it's pretty obvious what can be done here.

Which older supported branches are affected by this flaw?
- Any release where we had content processes using APZ. In b2g 1.1 and 1.2 this was only in the browser content processes. In b2g 1.3 all apps are also affected.

If not all supported branches, which bug introduced the flaw?
- I think this has been in since the first b2g release.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- I haven't backported this patch but it should be pretty easy to do.

How likely is this patch to cause regressions; how much testing does it need?
- Unlikely. All the conditions that are caught are bad conditions; if they cause regressions they are just uncovering other latent bugs in the code that will need fixing anyway.
Attachment #8379854 - Flags: sec-approval?
Comment on attachment 8379854 [details] [diff] [review]
Guard against bad child->parent data in APZ functions

sec-approval+ for trunk. If you and the others want it on older branches, please nominate this. 

It isn't clear to me that we need this on older branches for non-B2G releases.
Attachment #8379854 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/mozilla-central/rev/21bbea0bdc12
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Marking [qa-] for verification based on lack of actionable test case.
Whiteboard: [qa-]
Whiteboard: [qa-] → [qa-][adv-main30+]
Whiteboard: [qa-][adv-main30+] → [qa-][adv-main30-]
Applied cleanly onto SeaMonkey 2.26.1 (Gecko 29 based)

https://hg.mozilla.org/releases/mozilla-release/rev/de4297e4ffcd
Group: core-security
You need to log in before you can comment on or make changes to this bug.