Closed Bug 1793748 Opened 2 months ago Closed 2 months ago

Firefox 106 freeze with Proxmox web UI

Categories

(Core :: Disability Access APIs, defect, P1)

Firefox 106
Desktop
Windows
defect

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
relnote-firefox --- 106+
firefox-esr102 --- unaffected
firefox106 --- fixed
firefox107 --- fixed
firefox108 + fixed

People

(Reporter: yann.papouin, Assigned: Jamie)

References

(Regressed 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0

Steps to reproduce:

  • Created a new empty profile
  • Going into my self-hosted Proxmox instance (6.2-4) after entering login/password
    (Also tried with another instance (7.1-6) on my worksite and I got the same issue)

Actual results:

  • Firefox UI freeze and entire process does not respond
  • Need to kill the firefox.exe process

Expected results:

  • The UI should load normally
  • It works perfectly on Firefox <= 105, so that's a regression

The Bugbug bot thinks this bug should belong to the 'Toolkit::Password Manager' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Password Manager
Product: Firefox → Toolkit
Component: Password Manager → General
Product: Toolkit → Firefox

Yann.papoin, could you try to find a regression range? - https://mozilla.github.io/mozregression/ could hopefully find when the issue was introduced.

For the time being, moving this to Toolkit -> Startup and Profile so that our developers could take a closer look - if this is not the right component, please set a more appropriate one.

Component: General → Startup and Profile System
Flags: needinfo?(yann.papouin)
Product: Firefox → Toolkit

I made some test with mozregression but I'm not able to reproduce the UI hang with any nightly version with binaries fetched from this tool, so I started a « manual binary bissect » using nightly installer archives. (oh please make the auto-update an opt-in option for nightly).

This is my results, the bug was introduced in nightly build 2022-08-31-09-32-58 and is still there in latest build.

Flags: needinfo?(yann.papouin)
Component: Startup and Profile System → General
Product: Toolkit → Firefox
Whiteboard: regression

Yann, if you're still able to reproduce this, would you be willing to create and attach a dump file of the hanging Firefox process? You can create a dump file by right-clicking on the hanging process in Task Manager and then selecting "Create dump file". If you don't feel comfortable attaching this dump file to the bug, you can email me it instead. This should allow us to determine what is going on with the hanging process.

Flags: needinfo?(yann.papouin)

Following is a link to an archive of the dump "firefox-bug-1793748.7z"
https://drive.google.com/file/d/10tlZ_R7qvbC7GPu_nlIQdczOQ1kKIihE/view?usp=sharing

Flags: needinfo?(yann.papouin)

(In reply to Yann Papouin from comment #6)

Following is a link to an archive of the dump "firefox-bug-1793748.7z"
https://drive.google.com/file/d/10tlZ_R7qvbC7GPu_nlIQdczOQ1kKIihE/view?usp=sharing

Thanks for the dump! Somehow Visual Studio couldn't extract a correct stack from it, but someone helped me get a stack using WinDbg, here is the stack where the main thread is blocked:

00 000000ec`62ffe090 00007ffa`afdb9cae     mozglue!je_free+0x149 [/builds/worker/checkouts/gecko/memory/build/malloc_decls.h @ 54] 
01 (Inline Function) --------`--------     xul!operator delete+0x2 [/builds/worker/workspace/obj-build/dist/include/mozilla/cxxalloc.h @ 51] 
02 000000ec`62ffe0d0 00007ffa`afde07d7     xul!mozilla::a11y::AccIterator::Next+0x8e [/builds/worker/checkouts/gecko/accessible/base/AccIterator.cpp @ 38] 
03 000000ec`62ffe120 00007ffa`afde076d     xul!mozilla::a11y::TableAccessible::RowAt+0x47 [/builds/worker/checkouts/gecko/accessible/generic/TableAccessible.cpp @ 266] 
04 000000ec`62ffe190 00007ffa`afdfe3af     xul!mozilla::a11y::LocalAccessible::NativeAttributes+0xa0d [/builds/worker/checkouts/gecko/accessible/generic/ARIAGridAccessible.cpp @ 80] 
05 000000ec`62ffe1c0 00007ffa`afdfe54e     xul!mozilla::a11y::TableCellAccessible::PrevColHeader+0x10f [/builds/worker/checkouts/gecko/accessible/generic/TableCellAccessible.cpp @ 59] 
06 000000ec`62ffe240 00007ffa`afdac1a3     xul!mozilla::a11y::TableCellAccessible::ColHeaderCells+0x1e [/builds/worker/checkouts/gecko/accessible/generic/TableCellAccessible.cpp @ 110] 
07 000000ec`62ffe290 00007ffa`afe128de     xul!mozilla::a11y::ia2AccessibleTableCell::get_columnHeaderCells+0xb3 [/builds/worker/checkouts/gecko/accessible/windows/ia2/ia2AccessibleTableCell.cpp @ 78] 
08 000000ec`62ffe340 00007ffa`afe147ad     xul!mozilla::a11y::HandlerProvider::BuildDynamicIA2Data+0x4de [/builds/worker/checkouts/gecko/accessible/ipc/win/HandlerProvider.cpp @ 394] 
09 (Inline Function) --------`--------     xul!mozilla::detail::RunnableMethodArguments<_AccChildData **,unsigned long *,long *>::applyImpl+0x19 [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1147] 
0a (Inline Function) --------`--------     xul!mozilla::detail::RunnableMethodArguments<_AccChildData **,unsigned long *,long *>::apply+0x19 [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1153] 
0b 000000ec`62ffe510 00007ffa`adee0411     xul!mozilla::detail::RunnableMethodImpl<mozilla::a11y::HandlerProvider *,void (mozilla::a11y::HandlerProvider::*)(_AccChildData **, unsigned long *, long *),0,mozilla::RunnableKind::Standard,_AccChildData **,unsigned long *,long *>::Run+0x2d [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 1203] 
0c (Inline Function) --------`--------     xul!`anonymous namespace'::SyncRunnable::APCRun+0x28 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 60] 
0d 000000ec`62ffe540 00007ffb`4abef178     xul!mozilla::mscom::MainThreadInvoker::MainThreadAPC+0x71 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 172] 
0e 000000ec`62ffe5e0 00007ffb`4ac10a6e     ntdll!RtlDispatchAPC+0x68
0f 000000ec`62ffe660 00007ffb`4ac10534     ntdll!KiUserApcDispatch+0x2e
10 000000ec`62ffeb98 00007ffa`adee4897     ntdll!NtTestAlert+0x14
11 000000ec`62ffeba0 00007ffa`ac20e76e     xul!`anonymous namespace'::SyncRunnable::Run+0x17 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 53] 
12 000000ec`62ffebd0 00007ffa`ad449f27     xul!mozilla::SchedulerGroup::Runnable::Run+0x3e [/builds/worker/checkouts/gecko/xpcom/threads/SchedulerGroup.cpp @ 140] 
13 (Inline Function) --------`--------     xul!mozilla::RunnableTask::Run+0x11 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 538] 
14 000000ec`62ffec10 00007ffa`ad449203     xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0x847 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 851] 
15 (Inline Function) --------`--------     xul!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal+0xb [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 683] 
16 000000ec`62ffefe0 00007ffa`ad1aec6c     xul!mozilla::TaskController::ProcessPendingMTTask+0x63 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 461] 
17 (Inline Function) --------`--------     xul!mozilla::TaskController::InitializeInternal::<lambda_2>::operator()+0xc [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 190] 
18 (Inline Function) --------`--------     xul!mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:190:7'>::Run+0xc [/builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h @ 531] 
19 000000ec`62fff0b0 00007ffa`ad47269b     xul!nsThread::ProcessNextEvent+0x1e6c [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1205] 
1a (Inline Function) --------`--------     xul!NS_ProcessNextEvent+0x29 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 465] 
1b 000000ec`62fff440 00007ffa`ac3a6aff     xul!mozilla::ipc::MessagePump::Run+0x1cb [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 107] 
1c (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 381] 
1d 000000ec`62fff4f0 00007ffa`ab9ef5ee     xul!MessageLoop::RunHandler+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 375] 
1e 000000ec`62fff540 00007ffa`abb4bea8     xul!MessageLoop::Run+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 357] 
1f 000000ec`62fff5a0 00007ffa`abb4ad57     xul!nsBaseAppShell::Run+0x28 [/builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp @ 152] 
20 000000ec`62fff5e0 00007ffa`accd8e7b     xul!nsAppShell::Run+0x37 [/builds/worker/checkouts/gecko/widget/windows/nsAppShell.cpp @ 614] 
21 000000ec`62fff760 00007ffa`ac3a6aff     xul!XRE_RunAppShell+0x4b [/builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp @ 880] 
22 (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 381] 
23 000000ec`62fff7a0 00007ffa`ab9ef5ee     xul!MessageLoop::RunHandler+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 375] 
24 000000ec`62fff7f0 00007ffa`accd87b2     xul!MessageLoop::Run+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 357] 
25 000000ec`62fff850 00007ff6`92f6ef96     xul!XRE_InitChildProcess+0x602 [/builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp @ 743] 
26 (Inline Function) --------`--------     firefox!content_process_main+0x73 [/builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp @ 57] 
27 (Inline Function) --------`--------     firefox!NS_internal_main+0x2fd [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 359] 
28 000000ec`62fffaf0 00007ff6`92f7d248     firefox!wmain+0x5b6 [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 167] 
29 (Inline Function) --------`--------     firefox!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
2a 000000ec`62fffe30 00007ffb`4a527034     firefox!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
2b 000000ec`62fffe70 00007ffb`4abc2651     kernel32!BaseThreadInitThunk+0x14
2c 000000ec`62fffea0 00000000`00000000     ntdll!RtlUserThreadStart+0x21

The main thread is blocked on some accessibility code.

Component: General → Disability Access APIs
Product: Firefox → Core

[Tracking Requested - why for this release]: Potential hang of the entire browser when an accessibility client is being used.

Distilled test case:
data:text/html,<div role="treegrid"><div role="blockquote" tabindex="0"><div role="row"><div role="gridcell">a

Assignee: nobody → jteh
Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
OS: Unspecified → Windows
Priority: -- → P1
Regressed by: 1726124
Hardware: Unspecified → Desktop

Set release status flags based on info from the regressing bug 1726124

Here's what's going on here:

  1. There's an Accessible between the table and the row which is neither a row group nor a generic text container.
  2. Before bug 1726124 part 1, nsAccUtils::TableFor would have returned null for this cell. Now, it returns the table.
  3. TableCellAccessible::PrevColHeader calls Table(), which calls TableFor(), which returns a table (instead of null), so we don't fail early like we did previously.
  4. PrevColHeader calls RowIdx(), which calls RowIndexFor(), which uses Filters::GetRow. That skips subtrees which aren't row groups nor generic text containers. That means we don't find a row.
  5. RowIndexFor() returns -1. But RowIdx() returns an unsigned int! So now we have a very large number in rowIdx.
  6. The loop searching for previous rows continues for billions of iterations:
    for (uint32_t curRowIdx = rowIdx - 1; curRowIdx < rowIdx; curRowIdx--) {
    I suspect this loop isn't truly infinite - it should eventually stop when we underflow curRowIdx - but that's going to take a looong time.

This is a bit tricky to fix:

  1. I tried making nsAccUtils::TableFor return null if it encounters something which isn't a row group nor a generic text container. That is closer to its previous behaviour without breaking its new behaviour. That causes a crash when a document containing a CachedTableAccessible shuts down because it calls Role() and we crash trying to access DOM data structures which I guess must be dying.
  2. RowIdx() really shouldn't be returning -1 in an unsigned int. However, making RowIdx() signed is a huge change across the code and seems rather overkill just to avoid this silliness. The size/risk of the code change is particularly relevant here because we'll definitely want to get this uplifted to 107.

I think I'm going to try to fix this in PrevColHeader somehow.

Previously, we were inconsistent in what we treated as a table row in various places.
This caused breakage after bug 1726124, since nsAccUtils::TableFor() started returning a table when an ARIA row had an invalid parent, but Filters::GetRow wouldn't find such a row
This was causing broken expectations (and consequently, an extremely long loop) in TableCellAccessible::PrevColHeader.
To fix this, avoid creating an ARIARowAccessible (and thus ARIAGridCellAccessibles) if the row's parent isn't valid.
This way, there aren't conflicting expectations because the TableCellAccessible code can never run.
Furthermore, clients don't get a broken table cell interface on invalid cells, which was a problem even before bug 1726124.

Thank you for your investigation and fixes.
As a separate issue, do you have an idea why I'm not able to reproduce this freeze with builds fetched by the mozregression tool ?

Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25d099d04b1
Don't treat a role="row" as a table row if it has an invalid parent. r=morgan

(In reply to Yann Papouin from comment #12)

do you have an idea why I'm not able to reproduce this freeze with builds fetched by the mozregression tool ?

You have some accessibility client on your system. I'm not sure what; it might not be a traditional accessibility tool, but could be something else that's querying accessibility for information. Whatever it is, it is scanning the accessibility tree looking for something. We know apps like Lenovo Privacy Guard do this.

When Firefox is installed, we register an additional component with the system (AccessibleHandler.dll) which improves accessibility performance by sending some additional information when an accessibility node is retrieved. One of those pieces of information is table column headers, which is what causes the freeze due to this bug. So, even though your accessibility client isn't asking for that, we send it when AccessibleHandler is available, thus triggering the freeze.

Ideally, AccessibleHandler would always be used, as it significantly improves performance. Unfortunately, it has to be registered in the Windows registry due to Windows limitations. Since mozregression doesn't "install" Firefox - it just extracts and runs - AccessibleHandler doesn't get registered. This is normally a disadvantage, but in your particular case (due to this bug), it is an advantage.

Regressions: 1796386

(In reply to James Teh [:Jamie] from comment #14)

Since mozregression doesn't "install" Firefox - it just extracts and runs - AccessibleHandler doesn't get registered.

Thanks a lot for the detailed explanation! I was really puzzled about why mozregression wasn't helpful here.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch

Yann, Would you mind testing with the latest Firefox Nightly? While I'm fairly certain I figured out what was going on from the dump you provided (thank you for that), I can't be entirely certain because I don't have a Proxmox instance to test with.

Flags: needinfo?(yann.papouin)

I made a test with Proxmox and indeed the issue is fixed in latest nightly.
And thank you the accessibility dll registering explanation.

Flags: needinfo?(yann.papouin)

:jamie given this is an S2, would you submit a beta uplift request when ready?
Should we also try to include this in the 106 planned dot release?

Flags: needinfo?(jteh)

Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.

Beta/Release Uplift Approval Request

  • User impact if declined: Browser freeze with accessibility enabled on some sites; e.g. on Proxmox web UI.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Changes the code to gracefully handle a particular authoring error which isn't valid and failed previously anyway.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(jteh)
Attachment #9299190 - Flags: approval-mozilla-release?
Attachment #9299190 - Flags: approval-mozilla-beta?

Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.

Given that this is a P1/S2, that the fix was verified by the reporter and that this is estimated as low risk, I am taking it into our 106.0.2 dot release, thanks

Attachment #9299190 - Flags: approval-mozilla-release? → approval-mozilla-release+

Comment on attachment 9299190 [details]
Bug 1793748: Don't treat a role="row" as a table row if it has an invalid parent.

Approved for 107.0b5.

Attachment #9299190 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.