Closed Bug 1651705 Opened 4 years ago Closed 4 years ago

UAF in nsWindow::LayerViewSupport::OnDetach

Categories

(GeckoView :: General, defect, P1)

Unspecified
Android
defect

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox79 wontfix, firefox80 wontfix, firefox81 wontfix, firefox82+ wontfix, firefox83 fixed)

RESOLVED FIXED
83 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 --- wontfix
firefox82 + wontfix
firefox83 --- fixed

People

(Reporter: bugzilla, Assigned: bugzilla)

References

Details

(Keywords: csectype-race, csectype-uaf, sec-high, Whiteboard: [geckoview:m80][geckoview:m81][geckoview:m82][post-critsmash-triage][adv-main83+r])

Attachments

(7 files, 2 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

While working on bug 1637351 I discovered that this is pretty easy to reproduce when e10s-multi is turned on.

It's pretty clear to me that nsWindow::NativePtr and nsWindow::WindowPtr are not thread-safe as intended.

Here we try to instantiate WindowPtr::Locked but its mutex is the good old 0x5e pattern.

I am rewriting these smart pointers so that they actually work.

Whiteboard: [geckoview:m80] → [geckoview:m80][geckoview:m81]

:aklotz triaging as reo for 79, do you have an update on this bug?

Flags: needinfo?(aklotz)

Work in progress. This issue affects some very thorny code at the foundation of GeckoView, so it has been slow going. I've got it compiling again, but it still needs some touch-ups.

Flags: needinfo?(aklotz)

The current patchset is passing on try. I do not intend to push until after the next merge.

Given the access patterns involved on the native side, I think it is safest
to ensure that this field is access atomically by the VM.

  • We rename the existing NativePtr struct to NativePtrTraits, as that is
    more descriptive of what that code actually does;
  • We introduce NativeWeakPtr as a smart pointer type that holds a pointer
    to an object and allows its access in a thread-safe way. See comments.
  • We replace some explicit uses of template types with type deduction via
    auto and decltype(auto). This allows for more use of forward declarations.

Depends on D87360

  • Having AndroidView and GeckoViewSupport as nested classes inside of
    nsWindow make it impossible to forward declare them. We move those classes
    into their own headers. We also move WindowEvent into its own header.

  • We remove the old NativePtr and WindowPtr implementations from nsWindow
    and convert the class definitions in this patch to use the new NativeWeakPtr.

  • GeckoViewSupport had a unique quirk where it was owned by nsWindow
    instead of its Java counterpart. To make GeckoViewSupport's ownership work
    like the other classes that use NativeWeakPtr (and to substantially simplify
    the implementation of NativeWeakPtr itself), I have reversed that: now
    nsWindow holds a NativeWeakPtr to GeckoViewSupport, while
    GeckoViewSupport is owned by its Java counterpart and holds a strong ref to
    the nsWindow.

  • GeckoViewSupport no longer inherits from SupportsWeakPtr, since using it
    with NativeWeakPtr provides stronger and safer guarantees.

Depends on D87361

These conversions are pretty straight forward thanks to the type system.
Basically we take a NativeWeakPtr, call Access() on it, and if the
accessor is truthy, then we call whatever methods we need to call.

Creation of new pointers is done using NativeWeakPtrHolder::Attach() and
detaching of strong references is done by NativeWeakPtr::Detach().

Depends on D87362

This patch is similar to part 4 but for GeckoEditableSupport.

Conversions over to NativeWeakPtr are pretty straight forward thanks to the
type system. Basically we take a NativeWeakPtr, call Access() on it, and
if the accessor is truthy, then we call whatever methods we need to call.

Creation of new pointers is done using NativeWeakPtrHolder::Attach() and
detaching of strong references is done by NativeWeakPtr::Detach().

Depends on D87363

This patch is similar to part 4 but for Android a11y.

Conversions over to NativeWeakPtr are pretty straight forward thanks to the
type system. Basically we take a NativeWeakPtr, call Access() on it, and
if the accessor is truthy, then we call whatever methods we need to call.

Creation of new pointers is done using NativeWeakPtrHolder::Attach() and
detaching of strong references is done by NativeWeakPtr::Detach().

Depends on D87364

Whiteboard: [geckoview:m80][geckoview:m81] → [geckoview:m80][geckoview:m81][geckoview:m82]

This patch is similar to part 4 but for MediaSessionSupport.

Conversions over to NativeWeakPtr are pretty straight forward thanks to the
type system. Basically we take a NativeWeakPtr, call Access() on it, and
if the accessor is truthy, then we call whatever methods we need to call.

Creation of new pointers is done using NativeWeakPtrHolder::Attach() and
detaching of strong references is done by NativeWeakPtr::Detach().

Depends on D87365

Comment on attachment 9170527 [details]
Bug 1651705: Part 1 - Make JNIObject.mHandle volatile; r=#geckoview-reviewers

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: It would be really hard; this patch fixes a bunch of concurrency issues that are extremely timing sensitive. We have no evidence of them surfacing under single-process e10s (which is the current default on beta and release).
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: All supported
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: High risk. Given that we have not seen this bug without turning on e10s-multi, and given that we have no intention to do that on other branches, I do not think we should backport.
  • How likely is this patch to cause regressions; how much testing does it need?: Test suite runs green, however we do want as close to a full Nightly cycle as we can get with these patches, hence landing early in the 82 cycle.
Attachment #9170527 - Flags: sec-approval?
Attachment #9170528 - Flags: sec-approval?
Attachment #9170529 - Flags: sec-approval?
Attachment #9170530 - Flags: sec-approval?
Attachment #9170531 - Flags: sec-approval?
Attachment #9170532 - Flags: sec-approval?
Attachment #9171817 - Flags: sec-approval?

Comment on attachment 9170527 [details]
Bug 1651705: Part 1 - Make JNIObject.mHandle volatile; r=#geckoview-reviewers

Approved to land

Attachment #9170527 - Flags: sec-approval? → sec-approval+
Attachment #9170528 - Flags: sec-approval? → sec-approval+
Attachment #9170529 - Flags: sec-approval? → sec-approval+
Attachment #9170530 - Flags: sec-approval? → sec-approval+
Attachment #9170531 - Flags: sec-approval? → sec-approval+
Attachment #9170532 - Flags: sec-approval? → sec-approval+
Attachment #9171817 - Flags: sec-approval? → sec-approval+

This landed and got backed out mochitest failures on test_resizers_resizing_elements.html:

https://hg.mozilla.org/integration/autoland/rev/fbd07f0d19d2009779ebde0c676fcbce64607371

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=retry%2Ctestfailed%2Cbusted%2Cexception&revision=7fd32a3fb6e750ba944b7d1db9d688cc7cdcdfc7&selectedTaskRun=aulXGHHDRVSdlip_8XrhVg.0

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=314312882&repo=autoland

[task 2020-08-28T19:18:01.747Z] 19:18:01 INFO - 3698 INFO TEST-PASS | editor/libeditor/tests/test_resizers_resizing_elements.html | Resizers for <img>: The width should be increased by 0 pixels
[task 2020-08-28T19:18:01.747Z] 19:18:01 INFO - Buffered messages finished
[task 2020-08-28T19:18:01.747Z] 19:18:01 WARNING - 3699 INFO TEST-UNEXPECTED-FAIL | editor/libeditor/tests/test_resizers_resizing_elements.html | Resizers for <img>: The height should be increased by -10pixels - got 150, expected 140 epsilon: +/- 2
[task 2020-08-28T19:18:01.747Z] 19:18:01 INFO - SimpleTest.isfuzzy@SimpleTest/SimpleTest.js:513:14
[task 2020-08-28T19:18:01.748Z] 19:18:01 INFO - testResizer@editor/libeditor/tests/test_resizers_resizing_elements.html:159:14

etc.

It had landed as:
https://hg.mozilla.org/integration/autoland/rev/2d79f4348b36f4d6f7117b4ca2eedfc5cefdf35d
https://hg.mozilla.org/integration/autoland/rev/9ffca762753ccc17d8f00baa394a9176f2c728ae
https://hg.mozilla.org/integration/autoland/rev/7d9d2d44b2ee2274eb11e59901f77a7df3b3e2ab
https://hg.mozilla.org/integration/autoland/rev/84590e96de88128ecd3308e92c47da0cf979bb7c
https://hg.mozilla.org/integration/autoland/rev/531e71369f6842f64adbf4f89a070312ba497401
https://hg.mozilla.org/integration/autoland/rev/de4edbcb15c112562aa6a2556791f1b112842570
https://hg.mozilla.org/integration/autoland/rev/7fd32a3fb6e750ba944b7d1db9d688cc7cdcdfc7

Flags: needinfo?(aklotz)
Flags: needinfo?(aklotz)
Depends on: 1662313

James I know Aaron is off for a few weeks. Do you know if there was a plan to hand off this work?

Flags: needinfo?(snorp)

If there isn't I can pick it up and fix the test that's preventing this from landing.

Yeah, looks like Agi will try to finish it up.

Flags: needinfo?(snorp)

I rebased the patches and now the browser crashes after every test run in WindowEvent: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315775315&repo=try&lineNumber=2249 trying to figure out what's going on there.

Update: from what I can tell we're getting a SyncPauseCompositor for a dead window from here: https://searchfox.org/mozilla-central/rev/5efefd3ef214ed6d3234ba245c1da3004ead94e0/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/GeckoSession.java#5865

The window is dead because something calls disposeNative, which I think indicates that the object is GC'd?

I'll keep looking

OK I think I'm getting close, my current theory is that we call .close on the Session and then we get a onSurfaceDestroyed which causes us to call into a dead compositor.

Ok so Comment 21 seems to be correct, there's also another case where we call mWindow.close after we disposed of the window.

Ok so it looks like we get close and dispose in the right order but then we execute them out-of-order, so the close happens after dispose which seems wrong.

09-18 19:40:19.429 19984 19984 I Gecko   : sferrog: WindowEvent constructor nativeClose 0x7de045156e20
09-18 19:40:19.430 19984 19984 I Gecko   : sferrog OnNativeCall nativeDisposeNative
09-18 19:40:19.430 19984 19984 I Gecko   : sferrog: WindowEvent constructor nativeDisposeNative 0x7de045634dd0
09-18 19:40:19.537 19984 19999 I Gecko   : sferrog: IsStaleCall nativeClose 0x7de045156e20
09-18 19:40:19.537 19984 19999 I Gecko   : sferrog: GetHandle 0x2008de
09-18 19:40:19.537 19984 19999 I Gecko   : sferrog: CheckNativeHandle 0
09-18 19:40:19.537 19984 19999 I Gecko   : sferrog: CheckNativeHandle 0 exception

I think I know what's going on. There is a Get which checks for a valid native handle and throws if there isn't one and GetNativeHandle which doesn't (Get uses GetNativeHandle internally)

  • In the old code we used to use GetNativeHandle directly to determine if the window was still alive and ignored a null value
  • In the new code we use Get which throws if the handle is invalid/null

The method erroneously returned false when the object has been disposed.

After we close the window we dispose of the native object, which causes the
compositor to be disposed too.

Thanks Agi! I think what I'll do here is merge your follow-ups into the existing patchset and then land that.

Sounds good to me!

Attachment #9176712 - Attachment is obsolete: true
Attachment #9176711 - Attachment is obsolete: true
Flags: qe-verify-
Whiteboard: [geckoview:m80][geckoview:m81][geckoview:m82] → [geckoview:m80][geckoview:m81][geckoview:m82][post-critsmash-triage]
Whiteboard: [geckoview:m80][geckoview:m81][geckoview:m82][post-critsmash-triage] → [geckoview:m80][geckoview:m81][geckoview:m82][post-critsmash-triage][adv-main83+r]
No longer blocks: 1655077
Regressions: 1684967
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: