Crash in [@ NS_CycleCollectorSuspect3 | nsCycleCollectingAutoRefCnt::incr<T>], inside of mozilla::a11y::nsAccUtils::GetSelectableContainer on Android (with an a11y service enabled)
Categories
(Core :: Disability Access APIs, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox107 | --- | unaffected |
firefox108 | + | fixed |
firefox109 | --- | fixed |
People
(Reporter: dholbert, Assigned: Jamie)
References
(Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files, 1 obsolete file)
9.09 MB,
video/mp4
|
Details | |
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
|
Details | Review |
I'm hitting this crash reliably on Fenix nightly. STR coming up.
Crash report: https://crash-stats.mozilla.org/report/index/55666e6e-9c9e-4938-ad64-c34e40221122
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(data) (Cycle collected object used on a thread without a cycle collector.)
Top 10 frames of crashing thread:
0 libxul.so NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3797
1 libxul.so nsCycleCollectingAutoRefCnt::incr<&NS_CycleCollectorSuspect3> xpcom/base/nsISupportsImpl.h:248
1 libxul.so nsCycleCollectingAutoRefCnt::incr<&NS_CycleCollectorSuspect3> xpcom/base/nsISupportsImpl.h:234
1 libxul.so nsDocLoader::AddRef uriloader/base/nsDocLoader.cpp:193
1 libxul.so nsDocShell::AddRef docshell/base/nsDocShell.cpp:587
1 libxul.so {virtual override thunk} docshell/base/nsDocShell.cpp
2 libxul.so nsCOMPtr<nsIDocShell>::nsCOMPtr xpcom/base/nsCOMPtr.h:525
2 libxul.so nsCoreUtils::GetDocShellFor accessible/base/nsCoreUtils.cpp:330
3 libxul.so mozilla::a11y::DocAccessible::NativeRole const accessible/generic/DocAccessible.cpp:199
4 libxul.so mozilla::a11y::LocalAccessible::Role const accessible/generic/LocalAccessible-inl.h:26
Reporter | ||
Comment 1•1 year ago
•
|
||
STR:
- Have an app installed with an a11y service (BitWarden in my case, with BitWarden settings | Auto-fill services | Use Accessibility set to enabled)
- Start Firefox Nightly and go to https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3AFrameProperties%3A%3APropertyComparator%3A%3AEquals&date=%3E%3D2022-11-15T15%3A01%3A00.000Z&date=%3C2022-11-22T15%3A01%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1
- Click the "Reports" tab and click in the textbox that lets you pick which fields you'd like to use
- Type "comm" (or "a" probably or any character, really)
ACTUAL RESULTS: Firefox crashes
EXPECTED RESULTS: No crash
Reporter | ||
Comment 2•1 year ago
|
||
I can try to get a regression range later today.
Reporter | ||
Comment 3•1 year ago
|
||
Hmm, not sure I'll be able to get a regression range, actually. GVE doesn't seem to repro the bug, unfortunately. And mozregression only seems to support gve, not "real" Firefox Nightly for Android.
But I tested in Firefox Nightly on a separate device (a Pixel 4a, on a fresh Android 13 install, with a freshly installed / fresh-profile Firefox Nightly) and it reproduces reliably there.
Reporter | ||
Comment 4•1 year ago
|
||
FWIW the assertion was added in bug 1776243, in June of this year. Adding a dependency on that bug. But crash data suggests that this started being a problem more-recently than that; the first crash we have here was on Sept 15.
Comment 5•1 year ago
|
||
My patch just made it so we crashed with a nice message. Before that, we just crashed with a null deref in a way that was hard to understand. I think this is more likely a regression from recenty a11y Android work. I think I've seen some bugs on file but I don't have a bug number handy.
Reporter | ||
Comment 6•1 year ago
|
||
Our oldest crash from Sept 15 was bp-21cc0c5c-1e2c-4550-9c8c-50c130220915 and seems to have been built from http://hg.mozilla.org/mozilla-central/rev/e2ce8d3d4a4b01bf3ec52fb795c5f8c392c1476b (based on the source links in its backtrace).
Here's the pushlog for 3 days leading up to that commit, as a tentative regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2022-09-11+16%3A00%3A00&enddate=2022-09-14+16%3A00%3A00
(Probably it'd be something in there, since it looks like we've had a few crashes every day since this started happening, so it's likely that the guilty patch landed within the a day or so prior to the first crash report.)
Shot in the dark, looking at that range: maybe this was from bug 1788143, which was related to a11y and comboboxes? The actual component here isn't a combo-box, but it seems like it might be serving as one for a11y purposes; the entries in the autocomplete-dropdown haev role="option"
, e.g. they look like
<div class="select2-result-label" id="select2-result-label-860" role="option"><span class="select2-match"></span>abort message</div>
Nathan or Jamie, maybe you have ideas here.
Reporter | ||
Comment 7•1 year ago
|
||
(The only other a11y change I'm seeing in my tentative 3-day regression-range pushlog is bug 1788547, but that was just adding a null-check which seems unlikely to introduce a crash here.
I'm not seeing any Android or thread-related stuff there that feels like it could have a connection, but it's entirely possible I'm overlooking something.)
Reporter | ||
Comment 8•1 year ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #5)
I think this is more likely a regression from recenty a11y Android work.
Possibly related to bug 1765433 (and other cache-the-world a11y android work). That's what the hg-blame for the topmost a11y functions in the backtrace are pointing to, at least -- SessionAccessibility::GetNodeInfo, SessionAccessibility::PopulateNodeInfo. (Though that landed a while back, so there must've been some more recent triggering event as well.)
--> Adding eeejay to ni. Also actually-CC'ing Nathan who I meant to CC in comment 6. :)
Reporter | ||
Comment 9•1 year ago
|
||
Here's a screencast of the bug reproducing, in case it's useful. (starting out showing the options that I have enabled in Bitwarden.)
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 10•1 year ago
•
|
||
Side note: some of these crashes (e.g. the very first one bp-21cc0c5c-1e2c-4550-9c8c-50c130220915) are in SessionAccessibility::CachedPivot
, but my crash linked from comment 0 is in SessionAccessibility::GetNodeInfo
.
Bug 1765433 was about both sorts of methods, so I suspect it's all related. (That bug's title was Make GetNodeInfo and Pivot methods run safely in Android UI thread
)
Also, these crash reports are on the AndroidUI thread; the crash reports say "Crashing Thread (0), Name: AndroidUI". So this does indeed feel pretty closely related to bug 1765433, despite the time delay; and that means hopefully eeejay is the best person to point at this. So I'll clear my earlier ni=jamie to avoid needinfo inflation. :)
Assignee | ||
Updated•1 year ago
|
Reporter | ||
Comment 11•1 year ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #3)
Hmm, not sure I'll be able to get a regression range, actually. GVE doesn't seem to repro the bug, unfortunately. And mozregression only seems to support gve, not "real" Firefox Nightly for Android.
Turns out Fenix nightlies are available for manual download at https://archive.mozilla.org/pub/fenix/nightly/2022/ (that's the best we've got for Fenix bisecting). Using those with my STR in comment 1, I got this range:
Last good:
https://archive.mozilla.org/pub/fenix/nightly/2022/11/2022-11-10-05-01-35-fenix-108.0b1-android-arm64-v8a/
First bad:
https://archive.mozilla.org/pub/fenix/nightly/2022/11/2022-11-10-17-01-19-fenix-108.0b1-android-arm64-v8a/
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8baf918fe8e882b3826ddaef4b204f4c1ec63563&tochange=c49bf65cd85aa4925e2851f97abce61b071cf704
(This is much more recent than when we started getting crash data, so I suspect this pushlog contains a commit that just mad this bug easier to repro, or possible to repro using my particular scenario in commnet 1 at least.)
Assignee | ||
Comment 12•1 year ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #10)
Side note: some of these crashes (e.g. the very first one bp-21cc0c5c-1e2c-4550-9c8c-50c130220915) are in
SessionAccessibility::CachedPivot
, but my crash linked from comment 0 is inSessionAccessibility::GetNodeInfo
.
Your crash linked in comment 0 is caused by bug 1796733, so I'm using this bug to deal with that. There are various other bugs dealing with other instances of this crash. I don't think we have a bug on file for the CachedPivot case, though.
Bug 1765433 was about both sorts of methods, so I suspect it's all related.
It's all related, yes, but the fixes are unfortunately all very different.
Reporter | ||
Comment 13•1 year ago
•
|
||
Gotcha. Good news: my fenix nightly manual-download-and-test regression range (in comment 11) agrees with you about bug 1796733 being the culprit here.
Thanks for taking a look!
Comment 14•1 year ago
|
||
Set release status flags based on info from the regressing bug 1796733
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Comment 16•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on beta
For more information, please visit auto_nag documentation.
Updated•1 year ago
|
Assignee | ||
Comment 17•1 year ago
|
||
The fix is trivial. Getting an Android build environment and emulator set up and working and then writing a test to verify the fix is... very non-trivial.
Assignee | ||
Comment 18•1 year ago
|
||
Trying to access a local OuterDocAccessible from the Android UI thread was causing a crash.
We shouldn't be crossing document boundaries anyway.
Comment 19•1 year ago
|
||
This avoids situations in Android where we get parent process local
accessibles off the main thread.
Comment 20•1 year ago
|
||
Lol. I didn't realize this was in my review queue before I wrote the exact same fix..
Updated•1 year ago
|
Comment 21•1 year ago
|
||
Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/57434a88c266 Don't cross document boundaries in nsAccUtils::GetSelectableContainer. r=eeejay,geckoview-reviewers,owlish
Comment 22•1 year ago
|
||
Once this lands in mozilla-central, did you want to nominate this for uplift for fx108?
Comment 23•1 year ago
|
||
bugherder |
Assignee | ||
Comment 24•1 year ago
|
||
Comment on attachment 9305238 [details]
Bug 1801879: Don't cross document boundaries in nsAccUtils::GetSelectableContainer.
Beta/Release Uplift Approval Request
- User impact if declined: Crashes on Android.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Clear logic to restrict a tree walk covered by automated tests.
- String changes made/needed:
- Is Android affected?: Yes
Comment 25•1 year ago
|
||
Comment on attachment 9305238 [details]
Bug 1801879: Don't cross document boundaries in nsAccUtils::GetSelectableContainer.
Approved for 108.0b9
Comment 26•1 year ago
|
||
bugherder uplift |
Description
•