Firefox 106 freeze with Proxmox web UI
Categories
(Core :: Disability Access APIs, defect, P1)
Tracking
()
People
(Reporter: yann.papouin, Assigned: Jamie)
References
(Regressed 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
pascalc
:
approval-mozilla-release+
|
Details | Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0
Steps to reproduce:
- Created a new empty profile
- Going into my self-hosted Proxmox instance (6.2-4) after entering login/password
(Also tried with another instance (7.1-6) on my worksite and I got the same issue)
Actual results:
- Firefox UI freeze and entire process does not respond
- Need to kill the firefox.exe process
Expected results:
- The UI should load normally
- It works perfectly on Firefox <= 105, so that's a regression
Comment 1•8 months ago
|
||
The Bugbug bot thinks this bug should belong to the 'Toolkit::Password Manager' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Reporter | ||
Updated•8 months ago
|
Reporter | ||
Updated•8 months ago
|
Comment 2•8 months ago
|
||
Yann.papoin, could you try to find a regression range? - https://mozilla.github.io/mozregression/ could hopefully find when the issue was introduced.
For the time being, moving this to Toolkit -> Startup and Profile so that our developers could take a closer look - if this is not the right component, please set a more appropriate one.
Reporter | ||
Comment 3•8 months ago
|
||
I made some test with mozregression but I'm not able to reproduce the UI hang with any nightly version with binaries fetched from this tool, so I started a « manual binary bissect » using nightly installer archives. (oh please make the auto-update an opt-in option for nightly).
This is my results, the bug was introduced in nightly build 2022-08-31-09-32-58 and is still there in latest build.
- https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-29-09-45-51-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
✅ pass - https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-29-21-45-26-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
✅ pass - https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-30-09-27-50-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
✅ pass - https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-30-21-04-05-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
✅ pass - https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-31-09-32-58-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
❌ fail - https://archive.mozilla.org/pub/firefox/nightly/2022/08/2022-08-31-21-54-47-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
❌ fail - https://archive.mozilla.org/pub/firefox/nightly/2022/09/2022-09-01-09-54-52-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
❌ fail - https://archive.mozilla.org/pub/firefox/nightly/2022/09/2022-09-02-09-30-09-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
❌ fail - https://archive.mozilla.org/pub/firefox/nightly/2022/09/2022-09-13-21-30-20-mozilla-central/firefox-106.0a1.en-US.win64.installer.exe
❌ fail - https://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central-l10n/firefox-107.0a1.fr.win64.installer.exe
❌ fail
Updated•8 months ago
|
Comment 4•8 months ago
|
||
Based on the user's investigations this is a regression introduced in Nightly 106.0a1:
Last good: https://hg.mozilla.org/mozilla-central/rev/ecb328de1aafc36765b3bbf7f434ef84d93cad28
First bad: https://hg.mozilla.org/mozilla-central/rev/11e997d3cf78eb6a4f31a1e13a2509f4181f4b0a
Updated•8 months ago
|
Comment 5•8 months ago
|
||
Yann, if you're still able to reproduce this, would you be willing to create and attach a dump file of the hanging Firefox process? You can create a dump file by right-clicking on the hanging process in Task Manager and then selecting "Create dump file". If you don't feel comfortable attaching this dump file to the bug, you can email me it instead. This should allow us to determine what is going on with the hanging process.
Reporter | ||
Comment 6•8 months ago
|
||
Following is a link to an archive of the dump "firefox-bug-1793748.7z"
https://drive.google.com/file/d/10tlZ_R7qvbC7GPu_nlIQdczOQ1kKIihE/view?usp=sharing
Comment 7•8 months ago
|
||
(In reply to Yann Papouin from comment #6)
Following is a link to an archive of the dump "firefox-bug-1793748.7z"
https://drive.google.com/file/d/10tlZ_R7qvbC7GPu_nlIQdczOQ1kKIihE/view?usp=sharing
Thanks for the dump! Somehow Visual Studio couldn't extract a correct stack from it, but someone helped me get a stack using WinDbg, here is the stack where the main thread is blocked:
00 000000ec`62ffe090 00007ffa`afdb9cae mozglue!je_free+0x149 [/builds/worker/checkouts/gecko/memory/build/malloc_decls.h @ 54]
01 (Inline Function) --------`-------- xul!operator delete+0x2 [/builds/worker/workspace/obj-build/dist/include/mozilla/cxxalloc.h @ 51]
02 000000ec`62ffe0d0 00007ffa`afde07d7 xul!mozilla::a11y::AccIterator::Next+0x8e [/builds/worker/checkouts/gecko/accessible/base/AccIterator.cpp @ 38]
03 000000ec`62ffe120 00007ffa`afde076d xul!mozilla::a11y::TableAccessible::RowAt+0x47 [/builds/worker/checkouts/gecko/accessible/generic/TableAccessible.cpp @ 266]
04 000000ec`62ffe190 00007ffa`afdfe3af xul!mozilla::a11y::LocalAccessible::NativeAttributes+0xa0d [/builds/worker/checkouts/gecko/accessible/generic/ARIAGridAccessible.cpp @ 80]
05 000000ec`62ffe1c0 00007ffa`afdfe54e xul!mozilla::a11y::TableCellAccessible::PrevColHeader+0x10f [/builds/worker/checkouts/gecko/accessible/generic/TableCellAccessible.cpp @ 59]
06 000000ec`62ffe240 00007ffa`afdac1a3 xul!mozilla::a11y::TableCellAccessible::ColHeaderCells+0x1e [/builds/worker/checkouts/gecko/accessible/generic/TableCellAccessible.cpp @ 110]
07 000000ec`62ffe290 00007ffa`afe128de xul!mozilla::a11y::ia2AccessibleTableCell::get_columnHeaderCells+0xb3 [/builds/worker/checkouts/gecko/accessible/windows/ia2/ia2AccessibleTableCell.cpp @ 78]
08 000000ec`62ffe340 00007ffa`afe147ad xul!mozilla::a11y::HandlerProvider::BuildDynamicIA2Data+0x4de [/builds/worker/checkouts/gecko/accessible/ipc/win/HandlerProvider.cpp @ 394]
09 (Inline Function) --------`-------- xul!mozilla::detail::RunnableMethodArguments<_AccChildData **,unsigned long *,long *>::applyImpl+0x19 [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1147]
0a (Inline Function) --------`-------- xul!mozilla::detail::RunnableMethodArguments<_AccChildData **,unsigned long *,long *>::apply+0x19 [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1153]
0b 000000ec`62ffe510 00007ffa`adee0411 xul!mozilla::detail::RunnableMethodImpl<mozilla::a11y::HandlerProvider *,void (mozilla::a11y::HandlerProvider::*)(_AccChildData **, unsigned long *, long *),0,mozilla::RunnableKind::Standard,_AccChildData **,unsigned long *,long *>::Run+0x2d [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1203]
0c (Inline Function) --------`-------- xul!`anonymous namespace'::SyncRunnable::APCRun+0x28 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 60]
0d 000000ec`62ffe540 00007ffb`4abef178 xul!mozilla::mscom::MainThreadInvoker::MainThreadAPC+0x71 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 172]
0e 000000ec`62ffe5e0 00007ffb`4ac10a6e ntdll!RtlDispatchAPC+0x68
0f 000000ec`62ffe660 00007ffb`4ac10534 ntdll!KiUserApcDispatch+0x2e
10 000000ec`62ffeb98 00007ffa`adee4897 ntdll!NtTestAlert+0x14
11 000000ec`62ffeba0 00007ffa`ac20e76e xul!`anonymous namespace'::SyncRunnable::Run+0x17 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 53]
12 000000ec`62ffebd0 00007ffa`ad449f27 xul!mozilla::SchedulerGroup::Runnable::Run+0x3e [/builds/worker/checkouts/gecko/xpcom/threads/SchedulerGroup.cpp @ 140]
13 (Inline Function) --------`-------- xul!mozilla::RunnableTask::Run+0x11 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 538]
14 000000ec`62ffec10 00007ffa`ad449203 xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0x847 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 851]
15 (Inline Function) --------`-------- xul!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal+0xb [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 683]
16 000000ec`62ffefe0 00007ffa`ad1aec6c xul!mozilla::TaskController::ProcessPendingMTTask+0x63 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 461]
17 (Inline Function) --------`-------- xul!mozilla::TaskController::InitializeInternal::<lambda_2>::operator()+0xc [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 190]
18 (Inline Function) --------`-------- xul!mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:190:7'>::Run+0xc [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 531]
19 000000ec`62fff0b0 00007ffa`ad47269b xul!nsThread::ProcessNextEvent+0x1e6c [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1205]
1a (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x29 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 465]
1b 000000ec`62fff440 00007ffa`ac3a6aff xul!mozilla::ipc::MessagePump::Run+0x1cb [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 107]
1c (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 381]
1d 000000ec`62fff4f0 00007ffa`ab9ef5ee xul!MessageLoop::RunHandler+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 375]
1e 000000ec`62fff540 00007ffa`abb4bea8 xul!MessageLoop::Run+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 357]
1f 000000ec`62fff5a0 00007ffa`abb4ad57 xul!nsBaseAppShell::Run+0x28 [/builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp @ 152]
20 000000ec`62fff5e0 00007ffa`accd8e7b xul!nsAppShell::Run+0x37 [/builds/worker/checkouts/gecko/widget/windows/nsAppShell.cpp @ 614]
21 000000ec`62fff760 00007ffa`ac3a6aff xul!XRE_RunAppShell+0x4b [/builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp @ 880]
22 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 381]
23 000000ec`62fff7a0 00007ffa`ab9ef5ee xul!MessageLoop::RunHandler+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 375]
24 000000ec`62fff7f0 00007ffa`accd87b2 xul!MessageLoop::Run+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 357]
25 000000ec`62fff850 00007ff6`92f6ef96 xul!XRE_InitChildProcess+0x602 [/builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp @ 743]
26 (Inline Function) --------`-------- firefox!content_process_main+0x73 [/builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp @ 57]
27 (Inline Function) --------`-------- firefox!NS_internal_main+0x2fd [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 359]
28 000000ec`62fffaf0 00007ff6`92f7d248 firefox!wmain+0x5b6 [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 167]
29 (Inline Function) --------`-------- firefox!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
2a 000000ec`62fffe30 00007ffb`4a527034 firefox!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
2b 000000ec`62fffe70 00007ffb`4abc2651 kernel32!BaseThreadInitThunk+0x14
2c 000000ec`62fffea0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
The main thread is blocked on some accessibility code.
Assignee | ||
Comment 8•8 months ago
|
||
[Tracking Requested - why for this release]: Potential hang of the entire browser when an accessibility client is being used.
Distilled test case:
data:text/html,<div role="treegrid"><div role="blockquote" tabindex="0"><div role="row"><div role="gridcell">a
Comment 9•8 months ago
|
||
Set release status flags based on info from the regressing bug 1726124
Assignee | ||
Comment 10•8 months ago
|
||
Here's what's going on here:
- There's an Accessible between the table and the row which is neither a row group nor a generic text container.
- Before bug 1726124 part 1, nsAccUtils::TableFor would have returned null for this cell. Now, it returns the table.
- TableCellAccessible::PrevColHeader calls Table(), which calls TableFor(), which returns a table (instead of null), so we don't fail early like we did previously.
- PrevColHeader calls RowIdx(), which calls RowIndexFor(), which uses Filters::GetRow. That skips subtrees which aren't row groups nor generic text containers. That means we don't find a row.
- RowIndexFor() returns -1. But RowIdx() returns an unsigned int! So now we have a very large number in rowIdx.
- The loop searching for previous rows continues for billions of iterations:
for (uint32_t curRowIdx = rowIdx - 1; curRowIdx < rowIdx; curRowIdx--) {
I suspect this loop isn't truly infinite - it should eventually stop when we underflow curRowIdx - but that's going to take a looong time.
This is a bit tricky to fix:
- I tried making nsAccUtils::TableFor return null if it encounters something which isn't a row group nor a generic text container. That is closer to its previous behaviour without breaking its new behaviour. That causes a crash when a document containing a CachedTableAccessible shuts down because it calls Role() and we crash trying to access DOM data structures which I guess must be dying.
- RowIdx() really shouldn't be returning -1 in an unsigned int. However, making RowIdx() signed is a huge change across the code and seems rather overkill just to avoid this silliness. The size/risk of the code change is particularly relevant here because we'll definitely want to get this uplifted to 107.
I think I'm going to try to fix this in PrevColHeader somehow.
Assignee | ||
Comment 11•8 months ago
|
||
Previously, we were inconsistent in what we treated as a table row in various places.
This caused breakage after bug 1726124, since nsAccUtils::TableFor() started returning a table when an ARIA row had an invalid parent, but Filters::GetRow wouldn't find such a row
This was causing broken expectations (and consequently, an extremely long loop) in TableCellAccessible::PrevColHeader.
To fix this, avoid creating an ARIARowAccessible (and thus ARIAGridCellAccessibles) if the row's parent isn't valid.
This way, there aren't conflicting expectations because the TableCellAccessible code can never run.
Furthermore, clients don't get a broken table cell interface on invalid cells, which was a problem even before bug 1726124.
Reporter | ||
Comment 12•8 months ago
|
||
Thank you for your investigation and fixes.
As a separate issue, do you have an idea why I'm not able to reproduce this freeze with builds fetched by the mozregression tool ?
Updated•8 months ago
|
Comment 13•8 months ago
|
||
Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b25d099d04b1 Don't treat a role="row" as a table row if it has an invalid parent. r=morgan
Assignee | ||
Comment 14•8 months ago
|
||
(In reply to Yann Papouin from comment #12)
do you have an idea why I'm not able to reproduce this freeze with builds fetched by the mozregression tool ?
You have some accessibility client on your system. I'm not sure what; it might not be a traditional accessibility tool, but could be something else that's querying accessibility for information. Whatever it is, it is scanning the accessibility tree looking for something. We know apps like Lenovo Privacy Guard do this.
When Firefox is installed, we register an additional component with the system (AccessibleHandler.dll) which improves accessibility performance by sending some additional information when an accessibility node is retrieved. One of those pieces of information is table column headers, which is what causes the freeze due to this bug. So, even though your accessibility client isn't asking for that, we send it when AccessibleHandler is available, thus triggering the freeze.
Ideally, AccessibleHandler would always be used, as it significantly improves performance. Unfortunately, it has to be registered in the Windows registry due to Windows limitations. Since mozregression doesn't "install" Firefox - it just extracts and runs - AccessibleHandler doesn't get registered. This is normally a disadvantage, but in your particular case (due to this bug), it is an advantage.
Comment 15•8 months ago
|
||
(In reply to James Teh [:Jamie] from comment #14)
Since mozregression doesn't "install" Firefox - it just extracts and runs - AccessibleHandler doesn't get registered.
Thanks a lot for the detailed explanation! I was really puzzled about why mozregression wasn't helpful here.
Comment 16•8 months ago
|
||
bugherder |
Assignee | ||
Comment 17•8 months ago
|
||
Yann, Would you mind testing with the latest Firefox Nightly? While I'm fairly certain I figured out what was going on from the dump you provided (thank you for that), I can't be entirely certain because I don't have a Proxmox instance to test with.
Reporter | ||
Comment 18•8 months ago
|
||
I made a test with Proxmox and indeed the issue is fixed in latest nightly.
And thank you the accessibility dll registering explanation.
Comment 19•7 months ago
|
||
:jamie given this is an S2, would you submit a beta uplift request when ready?
Should we also try to include this in the 106 planned dot release?
Assignee | ||
Comment 20•7 months ago
|
||
Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.
Beta/Release Uplift Approval Request
- User impact if declined: Browser freeze with accessibility enabled on some sites; e.g. on Proxmox web UI.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Changes the code to gracefully handle a particular authoring error which isn't valid and failed previously anyway.
- String changes made/needed:
- Is Android affected?: No
Comment 21•7 months ago
|
||
Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.
Given that this is a P1/S2, that the fix was verified by the reporter and that this is estimated as low risk, I am taking it into our 106.0.2 dot release, thanks
Comment 22•7 months ago
|
||
bugherder uplift |
Comment 23•7 months ago
|
||
Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.
Approved for 107.0b5.
Comment 24•7 months ago
|
||
bugherder uplift |
Updated•7 months ago
|
Description
•