Closed Bug 1293709 (CVE-2017-5392) Opened 8 years ago Closed 8 years ago

nsWindow::LayerViewSupport's weak reference used on multiple threads

Categories

(Core Graveyard :: Widget: Android, defect, P1)

All
Android
defect

Tracking

(fennec51+, firefox51 fixed)

RESOLVED FIXED
mozilla51
Tracking Status
fennec 51+ ---
firefox51 --- fixed

People

(Reporter: mayhemer, Assigned: jchen)

References

Details

(Keywords: csectype-race, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main51+])

Attachments

(3 files, 6 obsolete files)

6.28 KB, patch
jchen
: review+
Details | Diff | Splinter Review
18.08 KB, patch
jchen
: review+
Details | Diff | Splinter Review
27.92 KB, patch
jchen
: review+
Details | Diff | Splinter Review
Based on bug 956338, see https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfc312760e8c4c6103e05b80db1b192d64fe337b for android, log: https://treeherder.mozilla.org/logviewer.html#?job_id=25388610&repo=try#L1704

Weak references are not thread safe.

 09:35:44     INFO -   0  libxul.so!mozilla::detail::WeakReference<nsWindow::GeckoViewSupport>::get [WeakPtr.h:cfc312760e8c : 137 + 0x0]
 09:35:44     INFO -       r0 = 0x5d7dd7d7    r1 = 0x96b33039    r2 = 0x96b33039    r3 = 0x5df31f08
 09:35:44     INFO -       r4 = 0x00000000    r5 = 0x61e24b60    r6 = 0x2a2964b0    r7 = 0x62bff498
 09:35:44     INFO -       r8 = 0x62bff4a8    r9 = 0x5347cedc   r10 = 0x2a295de0   r12 = 0x00000003
 09:35:44     INFO -       fp = 0x62bff4bc    sp = 0x62bff430    lr = 0x5cb4c86b    pc = 0x5cb4c86a
 09:35:44     INFO -      Found by: given as instruction pointer in context
 09:35:44     INFO -   1  libxul.so!mozilla::jni::detail::NativePtr<nsWindow::LayerViewSupport, true>::Get<mozilla::jni::LocalRef<mozilla::java::LayerView::Compositor> > [WeakPtr.h:cfc312760e8c : 246 + 0x5]
 09:35:44     INFO -       r4 = 0x4c7a3f78    r5 = 0x2a2964b0    r6 = 0x2a2964b0    r7 = 0x62bff498
 09:35:44     INFO -       r8 = 0x62bff4a8    r9 = 0x5347cedc   r10 = 0x2a295de0    fp = 0x62bff4bc
 09:35:44     INFO -       sp = 0x62bff448    pc = 0x5cb4f7c3
 09:35:44     INFO -      Found by: call frame info
 09:35:44     INFO -   2  libxul.so!mozilla::jni::detail::ProxyNativeCall<nsWindow::LayerViewSupport, mozilla::java::LayerView::Compositor, false, false>::operator() [Natives.h:cfc312760e8c : 345 + 0x5]
 09:35:44     INFO -       r3 = 0x0000001a    r4 = 0x4c7a3f78    r5 = 0x62bff490    r6 = 0x2a2964b0
 09:35:44     INFO -       r7 = 0x62bff498    r8 = 0x62bff4a8    r9 = 0x5347cedc   r10 = 0x2a295de0
 09:35:44     INFO -       fp = 0x62bff4bc    sp = 0x62bff458    pc = 0x5cb4f805
 09:35:44     INFO -      Found by: call frame info
 09:35:44     INFO -   3  libxul.so!nsWindow::LayerViewSupport::OnNativeCall<mozilla::jni::detail::ProxyNativeCall<nsWindow::LayerViewSupport, mozilla::java::LayerView::Compositor, false, false> > [nsWindow.cpp:cfc312760e8c : 958 + 0x5]
 09:35:44     INFO -       r4 = 0x4c7a3f78    r5 = 0x62bff490    r6 = 0x00000000    r7 = 0x5347cee4
 09:35:44     INFO -       r8 = 0x62bff4a8    r9 = 0x5347cedc   r10 = 0x2a295de0    fp = 0x62bff4bc
 09:35:44     INFO -       sp = 0x62bff478    pc = 0x5cb4f88f
 09:35:44     INFO -      Found by: call frame info
 09:35:44     INFO -   4  libxul.so!mozilla::jni::detail::NativeStubImpl<mozilla::java::LayerView::Compositor::SyncInvalidateAndScheduleComposite_t, nsWindow::LayerViewSupport, mozilla::jni::Args<>, false, true>::Wrap<&nsWindow::LayerViewSupport::SyncInvalidateAndScheduleComposite> [Natives.h:cfc312760e8c : 412 + 0x5]
 09:35:44     INFO -       r4 = 0x4c7a3f78    r5 = 0x2a295dd0    r6 = 0x00000000    r7 = 0x5347cee4
 09:35:44     INFO -       r8 = 0x62bff4a8    r9 = 0x5347cedc   r10 = 0x2a295de0    fp = 0x62bff4bc
 09:35:44     INFO -       sp = 0x62bff490    pc = 0x5cb4f979
 09:35:44     INFO -      Found by: call frame info

stack not copied completely, see the log.
Summary: nsWindow::LayerViewSupport's weak reference used on multiple threads → nsWindow::GeckoViewSupport's weak reference used on multiple threads
Blocks: 956338
Jim, you originally wrote this class (bug 1197957 and bug 1227706).  

The checks introduced in bug 956338 makes sure that the weak proxy object is always used only on a single thread.  It means, (1) taking weak references + (2) dereference of them + (3) destruction of the referee are all made on the same thread.

The crash shows that the weak reference is dereferenced on a different thread than it has been created on.  It shows potentially (likely) wrong usage of weak referencing.

The questions are: 
- are we dereferencing the weak ref only on a single thread where also the referee can be destroyed?
- can all the (1+2+3) operations mentioned above be protected by some single umbrella mutex?
- can we migrate away from using weak reference here and use a strong ref and some 'release the ref' notification on referrers?
- some other solution?

It's likely this is a highly critical bug.
Flags: needinfo?(nchen)
Flags: needinfo?(mark.finkle)
Passing to Sebastian. I am no longer working on Firefox.
Flags: needinfo?(mark.finkle) → needinfo?(s.kaspari)
tracking-fennec: --- → ?
OS: Unspecified → Android
Priority: -- → P1
Hardware: Unspecified → All
Everything in GeckoViewSupport happens on the Gecko thread, so it's most likely nsWindow::LayerViewSupport's WeakPtr.

(In reply to Honza Bambas (:mayhemer) from comment #1)
> 
> The questions are: 
> - are we dereferencing the weak ref only on a single thread where also the
> referee can be destroyed?

Unfortunately no, we have two threads involved here.

> - can all the (1+2+3) operations mentioned above be protected by some single
> umbrella mutex?

Possibly, but seems like that would still violate bug 956338 right?

> - can we migrate away from using weak reference here and use a strong ref
> and some 'release the ref' notification on referrers?

I think so. This is probably the better approach.

> - some other solution?
> 
> It's likely this is a highly critical bug.

These are long-lived objects (typically close to application lifetime), so I don't think it's quite expoitable, beyond possible startup/shutdown crashes. I do agree we should fix it though.
Flags: needinfo?(nchen)
Summary: nsWindow::GeckoViewSupport's weak reference used on multiple threads → nsWindow::LayerViewSupport's weak reference used on multiple threads
(In reply to Jim Chen [:jchen] [:darchons] from comment #3)
> Everything in GeckoViewSupport happens on the Gecko thread, so it's most
> likely nsWindow::LayerViewSupport's WeakPtr.
> 
> (In reply to Honza Bambas (:mayhemer) from comment #1)
> > 
> > The questions are: 
> > - are we dereferencing the weak ref only on a single thread where also the
> > referee can be destroyed?
> 
> Unfortunately no, we have two threads involved here.
> 
> > - can all the (1+2+3) operations mentioned above be protected by some single
> > umbrella mutex?
> 
> Possibly, but seems like that would still violate bug 956338 right?

We can try to implement something like ProtectedWeakReference that will keep reference to the mutex and assert it's held during all the operations, if the mutex can live longer than the weakref proxy.  Or we can just have an implementation that won't do the checks and just assume the correct usage.

> 
> > - can we migrate away from using weak reference here and use a strong ref
> > and some 'release the ref' notification on referrers?
> 
> I think so. This is probably the better approach.

Cool, who is going to implement that?  I have no idea at all how this works, so I'm just a last resort candidate.

> 
> > - some other solution?
> > 
> > It's likely this is a highly critical bug.
> 
> These are long-lived objects (typically close to application lifetime), so I
> don't think it's quite expoitable, beyond possible startup/shutdown crashes.
> I do agree we should fix it though.

Good to here.
Group: core-security → layout-core-security
I'll work on it.
Assignee: nobody → nchen
Status: NEW → ASSIGNED
(In reply to Jim Chen [:jchen] [:darchons] from comment #5)
> I'll work on it.

Thanks!
Add the NativePtr class for referencing a native object, and the
WindowPtr class for referencing back to the nsWindow inside a native
object class. NativePtr automatically clears its reference when the
native object is destroyed. WindowPtr automatically clears its
reference when its nsWindow is detached from the native object.
Attachment #8780665 - Flags: review?(snorp)
Use the new NativePtr and WindowPtr classes for NPZCSupport to replace
similar functionality in the exiting class.
Attachment #8780666 - Flags: review?(snorp)
Switch LayerViewSupport from using WeakPtr to using NativePtr/WindowPtr
in order to provide better thread safety. Also make some minor
improvements to code in LayerViewSupport.
Attachment #8780667 - Flags: review?(snorp)
Flags: needinfo?(s.kaspari)
Attachment #8780665 - Flags: review?(snorp) → review+
Attachment #8780666 - Flags: review?(snorp) → review+
Comment on attachment 8780667 [details] [diff] [review]
Use NativePtr and WindowPtr for LayerViewSupport (v1)

Review of attachment 8780667 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/nsWindow.cpp
@@ +1147,5 @@
> +        MOZ_ASSERT(AndroidBridge::IsJavaUiThread());
> +
> +        RefPtr<CompositorBridgeParent> bridge;
> +
> +        if (LockedWindowPtr window{mWindow}) {

This syntax is nice but looks so weird.
Attachment #8780667 - Flags: review?(snorp) → review+
(the patches don't apply to today's m-c)
Rebased patches.
Attachment #8781651 - Flags: review+
Attachment #8780665 - Attachment is obsolete: true
Attachment #8780666 - Attachment is obsolete: true
Attachment #8780667 - Attachment is obsolete: true
Interestingly, the problem disappeared with some recent changes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc9b997ac90bf87a80c80c9d8290ffb40d718894 (having my weka assertion patch + all other fixes except this bug's patch)

Patches in this big are still worth landing anyway.  Thanks for such a quick reaction!
(In reply to Honza Bambas (:mayhemer) from comment #15)
> Interestingly, the problem disappeared with some recent changes:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=fc9b997ac90bf87a80c80c9d8290ffb40d718894 (having my
> weka assertion patch + all other fixes except this bug's patch)
> 
> Patches in this big are still worth landing anyway.  Thanks for such a quick
> reaction!

Just for curiosity, I rather repushed to try with a slight loose in #ifdefs of the thread checking code (no !MOZ_PROFILING)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=11c3b4364874
tracking-fennec: ? → 51+
(In reply to Honza Bambas (:mayhemer) from comment #16)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=11c3b4364874

OK, problem detected.  Why does the Android 4.3 API15+ opt tests declare MOZ_PROFILING?
(In reply to Honza Bambas (:mayhemer) from comment #17)
> (In reply to Honza Bambas (:mayhemer) from comment #16)
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=11c3b4364874
> 
> OK, problem detected.  Why does the Android 4.3 API15+ opt tests declare
> MOZ_PROFILING?

We enable it in Nightly builds to not strip away debug info, so that profiler unwinding works.
Updated patch that adds locking on the nsWindow side.
Attachment #8782528 - Flags: review+
Attachment #8781651 - Attachment is obsolete: true
Updated patch that fixes a possible crash when reattaching GeckoView.
Attachment #8782529 - Flags: review+
Attachment #8781652 - Attachment is obsolete: true
Updated patch that adds more locking when getting a surface or when drawing.
Attachment #8782530 - Flags: review+
Attachment #8781653 - Attachment is obsolete: true
(In reply to Jim Chen [:jchen] [:darchons] from comment #18)
> (In reply to Honza Bambas (:mayhemer) from comment #17)
> > (In reply to Honza Bambas (:mayhemer) from comment #16)
> > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=11c3b4364874
> > 
> > OK, problem detected.  Why does the Android 4.3 API15+ opt tests declare
> > MOZ_PROFILING?
> 
> We enable it in Nightly builds to not strip away debug info, so that
> profiler unwinding works.

OK, I can see that "Android 4.3 API15+ debug" suite + all tests is run on inbound, just not on "try -a".  That should be sufficient to give the coverage.
https://hg.mozilla.org/mozilla-central/rev/54069ada44c7
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Group: layout-core-security → core-security-release
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main51+]
Alias: CVE-2017-5392
Group: core-security-release
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: