Closed Bug 1771934 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::a11y::AccessibleWrap::GetVirtualViewID]

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
104 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox101 --- unaffected
firefox102 + disabled
firefox103 + disabled
firefox104 + fixed

People

(Reporter: mccr8, Assigned: Jamie)

References

Details

(4 keywords, Whiteboard: [ctw-m1])

Crash Data

Attachments

(2 files, 2 obsolete files)

Crash report: https://crash-stats.mozilla.org/report/index/253aa567-8a43-45e2-93b9-1cccc0220531

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0 libxul.so mozilla::a11y::AccessibleWrap::GetVirtualViewID accessible/android/AccessibleWrap.cpp:565
1 libxul.so mozilla::a11y::SessionAccessibility::PopulateNodeInfo accessible/android/SessionAccessibility.cpp:932
2 libxul.so void mozilla::jni::NativeStub<mozilla::java::SessionAccessibility::NativeProvider::GetNodeInfo_t, mozilla::a11y::SessionAccessibility, mozilla::jni::Args<int, mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&> >::Wrap<&mozilla::a11y::SessionAccessibility widget/android/jni/Natives.h:1447
3 base.odex base.odex@0x00000000001dc890 
4 base.odex base.odex@0x00000000001dc890 
5 None @0x0000007a904b54f4 
6 None @0x0000007a9065f158 
7 None @0x0000007a90659820 
8 None @0x0000007a908cf9fc 
9 None @0x0000007a908da794 

Another high volume Fenix Nightly a11y regression. The fun twist here is that these crashes are all on poison values, indicating that this is some kind of use after free.

[Tracking Requested - why for this release]: reasonably high volume sec-high crash

Eitan, it looks like you wrote some patches for the files at the top of the crash stack recently, so maybe you could take a look. Thanks.

Flags: needinfo?(eitan)

There is also a spike in UAF crashes with the signature [@ mozilla::jni::NativeStub<T>::Wrap<T> ], which is maybe the same issue?

bp-4ebaa8a0-2459-4d7d-872a-507ba0220526

Whiteboard: [ctw-m1]

A UAF suggests there's somewhere we're not acquiring the Android lock. I can only think of two places:

  1. DocAccessibleParent::ActorDestroy. We already acquire the lock in BrowserParent::DestroyInternal (which destroys the DocAccessibleParent) and DocAccessibleParent::RecvShutdown. If the remote process crashes, we might not get RecvShutdown, but BrowserParent::DestroyInternal should still get called, so we should be fine. Nevertheless, I wonder whether it makes sense to acquire the lock in ActorDestroy to be safe.
  2. We don't acquire the lock in DocAccessibleParent::RecvBindChildDoc. That is rarely called (if ever) and it doesn't mutate the tree in a way that should really cause UAFs. Nevertheless, it does mutate the tree, so it should probably acquire the lock.

Another place we should probably acquire the Android lock (but don't currently) is BrowserBridgeParent::RecvSetEmbedderAccessible.

Assignee: nobody → jteh
Status: NEW → ASSIGNED

Ah. We can't acquire the lock in ActorDestroy because when RecvShutdown (which acquires the lock) calls Send__delete__, IPDL ends up calling ActorDestroy, so we crash trying to acquire the lock twice.

Edit: But we should be able to safely acquire it if mShutdown is false.

jesup, you might find this bug interesting. It looks like an example of how your Clang thread safety static analysis might be useful on Android.

Flags: needinfo?(rjesup)

Yeah, it looks like the accessible's subtree is destroyed while PopulateNodeInfo is running. I'm not sure how ActorDestroy is getting called in that case, assuming just the auto-generated protocol managers destroying their actors? In any case the patch looks sound, and worth trying. We should see the crashes go away if it works.

Flags: needinfo?(eitan)

Comment on attachment 9279076 [details]
Bug 1771934: Acquire the Android a11y lock in some additional places.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Uncertain. The fact that it deals with locks might hint at a thread safety issue.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?:
  • If not all supported branches, which bug introduced the flaw?: Bug 1758540
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely to cause regressions, but note that the pref that exposes this will soon be enabled only on Nightly.
  • Is Android affected?: Yes
Attachment #9279076 - Flags: sec-approval?

Comment on attachment 9279076 [details]
Bug 1771934: Acquire the Android a11y lock in some additional places.

Approved to land and request uplift

Attachment #9279076 - Flags: sec-approval? → sec-approval+
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch

Please request Beta approval on this when you get a chance.

Flags: needinfo?(jteh)

Comment on attachment 9279076 [details]
Bug 1771934: Acquire the Android a11y lock in some additional places.

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simply acquires an Android accessibility specific lock in a few extra places.
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(jteh)
Attachment #9279076 - Flags: approval-mozilla-beta?

Comment on attachment 9279076 [details]
Bug 1771934: Acquire the Android a11y lock in some additional places.

Approved for 102 beta 6, thanks.

Attachment #9279076 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Still crashing here, so we didn't actually fix this. :( The good news is that I think I know what might really be going on now.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #9281115 - Attachment is obsolete: true

Blah, nope, premature hope.

We do know that this is crashing while walking the children of a queried Accessible, not while getting the Accessible itself being queried. We've also realised that mChildren doesn't get cleared in DocAccessibleParent::Destroy. While that would certainly mean that a caller walking the children would touch destroyed Accessibles, a caller shouldn't be touching this DocAccessibleParent in the first place because Destroy calls ProxyDestroyed, which should unregister it from the id map.

It probably still makes sense to call mChildren.Clear() in DocAccessibleParent::Destroy anyway, especially since DocAccessibleParent is ref counted. However, this fix alone isn't sufficient.

We're wondering whether the PresShell died before the DocAccessibleParent, which would make it impossible to unregister the DocAccessibleParent. Clearing the id map when SessionAccessibility is detached might fix this. I guess that depends whether the PresShell is destroyed before or after SessionAccessibility is detached, though.

(In reply to James Teh [:Jamie] from comment #21)

We're wondering whether the PresShell died before the DocAccessibleParent, which would make it impossible to unregister the DocAccessibleParent. Clearing the id map when SessionAccessibility is detached might fix this. I guess that depends whether the PresShell is destroyed before or after SessionAccessibility is detached, though.

I think I found where that happens, but not 100% sure. In AppWindow::Destroy() we first shut down the DocShell which presumably destroys the doc's PresShell, and then we detach the BrowsingContext which I assume ultimately destroys the DocAccessibleParent. Since we don't have a PresShell in the doc accessible parent's shutdown phase, we can't unregister the remote accessibles because we use the owning doc's PresShell to retrieve the root widget.

It won't be enough to clear the registry upon detaching SessionAccessibility because that happens much later.

I think we need to add some fail safes to SessionAccessibility::GetAccessibleByID. Return null if

  1. Owning GeckoViewSupport is detached. or...
  2. GeckoViewSupport doesn't have an associated nsWindow (it is closed) or..
  3. nsWindow does not have an associated DOM document (GetDocument() is null).

Comment on attachment 9281267 [details]
Bug 1771934 - Return null accessible when SessionAcc is not fully attached. r?Jamie

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Uncertain. This seems like an intermittent shutdown issue.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?:
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: This should apply fine to beta. Unnecessary since we disabled caching in beta.
  • How likely is this patch to cause regressions; how much testing does it need?:
  • Is Android affected?: Yes
Attachment #9281267 - Flags: sec-approval?

Comment on attachment 9281267 [details]
Bug 1771934 - Return null accessible when SessionAcc is not fully attached. r?Jamie

Approved to land and, if needed, uplift

Attachment #9281267 - Flags: sec-approval? → sec-approval+
Attachment #9281267 - Attachment is obsolete: true

The associated PresShell of the root doc can be used for retrieving the
SessionAccessibility. If the PresShell is about to go away, we should
unregister all the accessibles.

Comment on attachment 9283524 [details]
Bug 1771934 - Unregister all SessionAccessibility accessibles when top doc is shut down. r?Jamie

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: I think it would be pretty difficult, especially as the patch only affects shutdown.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: None. The relevant area of the code is behind a disabled pref (only available in about:config) in other branches which contain it.
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely. It only affects shutdown and our tests should catch obvious regressions here.
  • Is Android affected?: Yes
Attachment #9283524 - Flags: sec-approval?
Flags: needinfo?(rjesup)

Comment on attachment 9283524 [details]
Bug 1771934 - Unregister all SessionAccessibility accessibles when top doc is shut down. r?Jamie

Approved to land and uplift if needed.

Attachment #9283524 - Flags: sec-approval? → sec-approval+
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: 103 Branch → 104 Branch
Regressions: 1778585
QA Whiteboard: [post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: