Closed Bug 1736742 Opened 3 years ago Closed 3 years ago

MSIX: AccessibleHandler is not registered, causing severe performance issues for a11y

Categories

(Firefox :: Installer, defect, P2)

Desktop
Windows
defect

Tracking

()

VERIFIED FIXED
96 Branch
Accessibility Severity s2
Tracking Status
relnote-firefox --- 94+
firefox94 + verified
firefox95 + verified
firefox96 + verified

People

(Reporter: Jamie, Assigned: Jamie)

References

(Blocks 1 open bug)

Details

(Keywords: access, Whiteboard: [fidedi-tikka])

Attachments

(4 files, 3 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

E10s a11y depends on a COM component (AccessibleHandler.dll) which has to be registered in the registry. Without this component, a11y will work, but it will be unusably slow, possibly unreliable and OOP iframes won't work with the JAWS screen reader.

STR:

  1. Download and install the NVDA screen reader.
  2. Install Firefox from the Windows Store.
  3. Start NVDA.
  4. Start the Firefox Windows Store app.
  5. Go to about:support.
  6. Look for "Accessible Handler Used" and check its value.
    • Expected: true
    • Actual: false
  7. Visit https://en.wikipedia.org/wiki/World_War_I
    • Expected: NVDA will say "Loading document..." and then start reading the page after about 3 seconds.
    • Actual: NVDA says "Loading document..." and starts reading the page after about 15 seconds. Responsiveness is very poor for 10+ seconds thereafter.

AccessibleHandler is a COM lightweight client-side handler. This is a fairly obscure, rarely used piece of COM. As far as I know, these have to be registered in the registry; they are not supported in .manifest files. Our NSIS installer registers this in HKLM and our detection code looks in HKLM. I'm not sure whether COM supports this in HKCU, but I think it supports everything else there, so I guess it probably does.

Regardless of HKLM vs HKCU, an open question is whether MSIX even supports COM lightweight client-side handlers at all. If it doesn't, that is a big problem for us.

Note that we are currently working on a massive re-architecture (bug 1694563) which will eliminate the need for this dependency. Unfortunately, that is many months away from being usable given current resourcing.

Our MSIX installer does register AccessibleHandler.dll, but only as the proxy/stub for our internal IGeckoBackChannel interface. It does not register it as a lightweight client-side handler.

I looked through the schema for application manifests and I can't see any support for COM handlers, only interfaces and servers. :(

I was talking to Mick at NV Access today. He noted that CoRegisterClassObject takes a CLSCTX and there is CLSCTX_INPROC_HANDLER. That means it might be possible to register the handler in the parent process (and maybe content processes; I'm not sure if that's necessary or not) using CoRegisterClassObject at runtime instead of the registry. That would be good for other reasons as well: depending on system registration has always been a source of pain for us.

Computer says no. :( If I try to register the handler with CoRegisterClassObject, I get E_INVALIDARG.

The documentation for REGCLS seems to support this, saying that other values of CLSCTX not covered there (CLSCTX_INPROC_HANDLER isn't covered) return an error for all values of REGCLS.

I'll attach a patch here just so we don't lose it, but I think this is a dead end.

The idea was that this would hopefully eliminate the dependency on registering this in the registry.
Unfortunately, it seems CoRegisterClassObject doesn't support CLSCTX_INPROC_HANDLER at all and always returns E_INVALIDARG.

Attachment #9246796 - Attachment is obsolete: true

I can't devote any time to this issue at the moment, but there is a mechanism for including registry entries in MSIX packages. I tried to avoid it because it's very difficult to generate these registry hive files in the generality we'd want as part of our build. But it's possible that these registry hives can impact HKLM, allowing the MSIX to at least produce the registry entry. But it looks like there an additional registration process done by the installer, namely invoking regsvr32. That I don't have any suggestions for.

Whiteboard: [access-s2] → [access-s2] [fidedi-tikka]

(In reply to Nick Alexander :nalexander [he/him] from comment #6)

But it looks like there an additional registration process done by the installer, namely invoking regsvr32. That I don't have any suggestions for.

That ends up calling DllRegisterServer in our AccessibleHandler code.

Using HKLM seems like it will be pretty painful due to these registry.dat files. So, I looked into the possibility of using HKCU. Tl;dr: Works outside MSIX, but doesn't work inside, as I expected. I'm almost certain we'll get the same even if we try HKLM.

Outside of MSIX, I determined that COM does indeed support lightweight client-side handlers registered in HKCU. I did this by modifying our IsAccessibleHandlerRegistered code to check in HKCU and then tested with the Wikipedia article as described in comment 0. It loaded quickly as I would expect with the handler enabled.

I then made a bunch of changes to register the handler in HKCU at startup. I confirmed that code worked outside of MSIX.

Unfortunately, that doesn't seem to work inside MSIX. I don't know why yet and I'm having trouble debugging it because I can't seem to find a way to start Firefox with a console so I can get printf output (or with a debugger at startup so I can get debug output).

I then learned that you can run regedit inside the MSIX container using PowerShell:
Invoke-CommandInDesktopPackage -PackageFamilyName Mozilla.MozillaNightly_g0vjsq4eewpdt -AppId Mozilla .MozillaNightly -command regedit.exe
Using that, I manually added the necessary registry entries. After that, starting the Firefox MSIX and checking about:support showed that the handler was indeed detected.

Sadly, the Wikipedia article was super slow to load, as I would expect without the handler. Thus, I think my suspicion is correct: COM doesn't support handlers (even if specified in the container registry) inside an MSIX package.

Attachment #9247009 - Attachment is obsolete: true

As an aside (not really related to this bug), while I was looking at all this, I noticed our VFS stuff doesn't seem to appear reflected in c:\Program Files as I would expect; i.e. we always see the raw path even inside the container, not the virtualised path. This article suggests the path inside VFS should be ProgramFilesX64, but we call it ProgramFiles. Could this be the reason for that?

Attachment #9247010 - Attachment is obsolete: true

(In reply to James Teh [:Jamie] from comment #7)

Using HKLM seems like it will be pretty painful due to these registry.dat files.

For completeness, I tried HKLM too. I manually created Registry.dat with the necessary keys in regedit, ensured they were copied into the MSIX package, ran the MSIX app and verified that about:support showed Accessible Handler Used: true (so our code was detecting the correct HKLM entries). I then tested with the Wikipedia article as described in comment 0. It was very slow, indicating the handler was not being used.

In case it's useful for future reference for some other purpose, I found some useful info on Registry.dat on this page. Notably:

The .dat file format is a native registry file form, and if you need to view it, you can use a tool like RegEdit to import/export .dat files onto an existing key. The file is expected to contain a root key called “REGISTRY”, with sub keys for MACHINE and USER as needed for the equivalent to HKLM and HKCU on the native system.

This is the info I followed to place the correct HKLM entries in comment 11.

In conclusion, unless someone can come up with some other approach I've missed, I think we're out of options here. The only solution to this is going to be our (many months from being complete) caching re-architecture in bug 1694563, which will eliminate the need for this component.

I guess we could ask Microsoft whether there's some way to get this working that I've missed, but I highly doubt it'll be possible without changes to Windows.

Do other Windows MSIX apps have the same problem?

Priority: -- → P2

(In reply to Romain Testard [:RT] from comment #14)

Do other Windows MSIX apps have the same problem?

No. Firefox's current multi-process a11y implementation is unique in depending on this rather obscure Windows COM technology. Chromium doesn't use this, so it is not likewise impacted either.

After a brief investigation on the matter, I can confirm that this issue is only really observable on the reported page (or on pages with a huge amount of text). Furthermore, the Accessible Handler Used argument from about:support is different on different systems and I can't determine what activates it incorrectly.

This is a list of the values of this argument in different versions vs systems:
https://docs.google.com/spreadsheets/d/1iP8BnZJKK0A00w77eV2qZ9qv0GS9P9_z4DqVPr_qIMQ/edit#gid=0
Marked red if incorrect/unexpected.

Next, I'll attempt to perform a regression investigation in the change of this Accessible Handler Used value and come back with results.

(In reply to James Teh [:Jamie] from comment #9)

As an aside (not really related to this bug), while I was looking at all this, I noticed our VFS stuff doesn't seem to appear reflected in c:\Program Files as I would expect; i.e. we always see the raw path even inside the container, not the virtualised path. This article suggests the path inside VFS should be ProgramFilesX64, but we call it ProgramFiles. Could this be the reason for that?

Yes -- we got this wrong. We don't rely on this for anything but tracking it is Bug 1736113.

(In reply to Bodea Daniel [:danibodea] from comment #16)

After a brief investigation on the matter, I can confirm that this issue is only really observable on the reported page (or on pages with a huge amount of text).

It is very problematic for daily usage, though. In heavily dynamic web apps, for example, responsiveness will be unacceptably sluggish.

Furthermore, the Accessible Handler Used argument from about:support is different on different systems and I can't determine what activates it incorrectly.

Is this with an MSIX package or a normal Firefox install/zip archive? For MSIX packages, I don't see how it could ever be true.

Next, I'll attempt to perform a regression investigation in the change of this Accessible Handler Used value and come back with results.

This is not a regression. It is an issue associated with the Firefox MSIX package, which is a recent development.

Depends on: 1737192
  • I've managed to find the cause: When the user installs Firefox and selects "No" on the UserAccountControl prompt, then the "Accessible Handler Used" in about:support will be displayed as FALSE and the NVDA hang issue will reproduce.

Proof:
nightly + yes on UAC / standard location -> Accessible Handler Used=TRUE
nightly + no on UAC / standard location -> Accessible Handler Used=FALSE
nightly + yes on UAC / custom location -> Accessible Handler Used=TRUE
nightly + no on UAC / custom location -> Accessible Handler Used=FALSE
release + yes on UAC / standard location -> Accessible Handler Used=TRUE
release + no on UAC / standard location -> Accessible Handler Used=FALSE
release + yes on UAC / custom location -> Accessible Handler Used=TRUE
release + no on UAC / custom location -> Accessible Handler Used=FALSE

  • Moreover, I've attempted to find out if it is a regression and the result suggests that this issue has been happening since before the addition of the "Accessible Handler Used" argument (on builds installed with No UAC).

Proof:
First know build with the "Accessible Handler Used" argument:
Nightly v56.0a1 2017-07-20 -> the NVDA issue occurs.
Last known build before the addition of the "Accessible Handler Used" argument:
Nightly v56.0a1 2017-06-15 -> THE NVDA ISSUE OCCURS!
Mozregression was unable to find enough data to bisect (the addition/implementation of the "Accessible Handler Used")
Older builds wouldn't even load the Wikipedia page to attempt issue reproduction so it's safe to assume this issue is very old.

  • Other notes:
    -All mozregression loaded nightlies are affected (Accessible Handler Used=false).
    -MacOS and Ubuntu are not affected (Windows-specific)

  • The last worry would be why a build installed from the Windows Store would have "Accessible Handler Used" set as FALSE (like normal installations made with option "No" at UAC prompt)

Please NI me if you need more testing.

That is expected. Only installs performed as administrator can correctly register the handler. Until now, this has been the "official" way to install Firefox, so this has been okay. MSIX doesn't support registering handlers at all.

Severity: -- → S2

I did some additional digging into this, and I don't think it's going to be possible. But I really don't understand the Windows a11y architecture, so I may be incorrect -- let's hope!

I think the most helpful source of high-level information is https://blogs.windows.com/windowsdeveloper/2017/04/13/com-server-ole-document-support-desktop-bridge/, which clearly says that only OOP servers are supported: "Support is scoped to OOP servers". IIUC, the "lightweight client-side handlers" we use are not such OOP servers. The model here is that some Firefox code (here, AccessibilityHandler.dll) may be loaded into an Accessibility Tool's process space to avoid more expensive cross-process (remote to OOP server) communication. That's not possible in the MSIX/Appx/UWP model :(

Jamie: can you fact check me? Maybe I'm misunderstanding what these handlers are actually used for? In that document I also see, "If your application uses COM only for its own personal use, then you can rely on COM entries in the application’s private hive (Registry.dat) to support your app." So if it's only the Firefox process that needs to load these handlers, then we're probably okay. But I don't think that's the architecture.

Flags: needinfo?(jteh)

(In reply to Nick Alexander :nalexander [he/him] from comment #21)

I think the most helpful source of high-level information is https://blogs.windows.com/windowsdeveloper/2017/04/13/com-server-ole-document-support-desktop-bridge/,

Ah! I missed that document during my research. Thanks.

which clearly says that only OOP servers are supported: "Support is scoped to OOP servers". IIUC, the "lightweight client-side handlers" we use are not such OOP servers. The model here is that some Firefox code (here, AccessibilityHandler.dll) may be loaded into an Accessibility Tool's process space to avoid more expensive cross-process (remote to OOP server) communication.

That's how handlers are normally used. In our case, the handler is only used in the Firefox parent process to make queries from injected in-process a11y dlls faster. We had originally hoped we could make use of this in out-of-process a11y tool processes, but that didn't end up working out.

In that document I also see, "If your application uses COM only for its own personal use, then you can rely on COM entries in the application’s private hive (Registry.dat) to support your app." So if it's only the Firefox process that needs to load these handlers, then we're probably okay.

That's really interesting. In that case, we should actually be fine. Except the approaches I tried in comment 7 and comment 11 didn't work... and they should have if we can indeed rely on our private registry hive. I'm now wondering whether I missed something else.

Flags: needinfo?(jteh)

Some great progress! It looks like the handler is getting picked up by Windows once it's registered correctly in HKCU. However, we were failing to serialise the handler payload in the content process. It looks like that is due to some missing COM interface registrations. I manually registered those with regedit inside the container, which fixed the problem!

I still need to work out how to fix this properly, though. We do specify COM interfaces in the manifest, so either that only exposes them outside the container (and not inside) or it's just that some interfaces are missing.

I also figured out why my code to auto register the handler in HKCU was failing. For reasons unknown to me, RegCreateKeyTransacted fails with ERROR_RM_NOT_ACTIVE (Transaction support within the specified resource manager is not started or was shut down due to an error.). Using RegCreateKeyEx works as expected. I guess registry transactions aren't supported in MSIX containers?

(In reply to James Teh [:Jamie] from comment #23)

I still need to work out how to fix this properly, though. We do specify COM interfaces in the manifest, so either that only exposes them outside the container (and not inside) or it's just that some interfaces are missing.

Some interfaces were missing from the manifest. Adding them sorts that out.

I'm guessing the runtime HKCU patch is going to be preferred, since it avoids the horror that is registry.dat. I'll work on cleaning up my patch and getting it submitted.

Assignee: nobody → jteh
Status: NEW → ASSIGNED

We could do this in HKLM using registry.dat, but that file is difficult to manage.
Instead, we register in HKCU at runtime.
Also, RegCreateKeyTransacted doesn't work in MSIX containers, so we must use RegCreateKeyEx.

Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da069141e166
part 1: Add missing a11y COM interfaces to Appx Manifest. r=nalexander
https://hg.mozilla.org/integration/autoland/rev/d842a82ae3eb
part 2: Provide ability to register a COM handler inside an MSIX container. r=nalexander,jmathies
https://hg.mozilla.org/integration/autoland/rev/70846707f4d7
part 3: When running in an MSIX container, register AccessibleHandler in HKCU at startup if it isn't already registered. r=nalexander,jmathies

Please nominate this for Beta approval ASAP. We'd like to get it into tomorrow's b5 build to get some testing ahead of possible dot release uplift.

Flags: needinfo?(jteh)

Comment on attachment 9248463 [details]
Bug 1736742 part 1: Add missing a11y COM interfaces to Appx Manifest.

Beta/Release Uplift Approval Request

  • User impact if declined: Users of screen readers that interact with Firefox installed from the Microsoft Store/MSIX will have a markedly slower performance than users who interact with a "regular" Firefox installed with the installer.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: To test, follow the instructions in https://bugzilla.mozilla.org/show_bug.cgi?id=1736742#c0.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We believe this patch is not risky because the new functionality is MSIX-specific. However, there are some small structural changes (to DLLs, manifests, etc) that could impact a non-MSIX package.
  • String changes made/needed:
Attachment #9248463 - Flags: approval-mozilla-beta?
Attachment #9248464 - Flags: approval-mozilla-beta?
Attachment #9248465 - Flags: approval-mozilla-beta?
Flags: qe-verify+

(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)

Please nominate this for Beta approval ASAP. We'd like to get it into tomorrow's b5 build to get some testing ahead of possible dot release uplift.

I've done the nomination -- timezones -- but defer to Jamie in all matters here.

Comment on attachment 9248463 [details]
Bug 1736742 part 1: Add missing a11y COM interfaces to Appx Manifest.

Approved for 95 beta 5.

Attachment #9248463 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9248464 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9248465 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

We tried verifying the fix on 95.0b5 and 96.0a1. While the "Accessible Handler Used" is set to true on these builds, 95.0b5 has some hangs of about 10-15 seconds when accessing the wiki page (2 crashes occurred while hanging, see Bug 1740203) and 96.0a1 is much more stable with shorter hangs (~2-3 seconds) and then it stabilizes itself. The hangs for 95.0b5 can be seen in the attachment in Bug 1740203's description.

Backed out 3 changesets (Bug 1736742) for causing crashes (Bug 1740203) a=backout
https://hg.mozilla.org/releases/mozilla-beta/rev/42ccd24f9308

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED

Verified with an central-as-beta build 96.0b1 and it seems that this is affected as well so the issue is a Nightly vs. Beta issue that will affect future releases too.

Urgh. Because the MSIX repackager can't properly parse the branding configure.sh script, we end up with a broken configuration for beta builds. The repackager always uses the last UUID in configure.sh. However, beta and release use the same branding: official. The first UUID is the beta UUID, the second is release. Thus, we always use the release UUIDs even for beta. That's a problem because the build itself is using the beta UUIDs.

I'm reopening this because the current patch set doesn't properly fix the issue and because all the tracking for this (including beta) is happening here.

Status: RESOLVED → REOPENED
Flags: needinfo?(jteh)
Resolution: FIXED → ---

Previously, we always used the last UUID in configure.sh.
This is incorrect for beta because beta and release share the same "official" branding and the beta UUIDs are specified first.
We now decide whether to use the first or last UUID based on the channel.

Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5eae874d2025
part 1 fix: Correct COM UUIDs used for beta channel builds. r=nalexander
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED

Looked over a treeherder central 96.0a1 (with the patch included), and the ~2s hangs occur after the page has been fully loaded (back page, accessing a link from the wiki page, etc) in which the browser is frozen and after that 2-3 seconds the browser will start responding again without issues (NVDA reads everything in this time). Is this patch intended to resolve beta's longer hangs ?

Just tested on 96-as-Beta and 96-as-Release builds and they are much more stable with only a ~2 sec delay.

Comment on attachment 9250127 [details]
Bug 1736742 part 1 fix: Correct COM UUIDs used for beta channel builds.

Thanks for the testing, Catalin. Re-landing for 95.0b6. If the 2sec delay is something we want to continue trying to improve, let's do so in a new bug.

Attachment #9250127 - Flags: approval-mozilla-beta+

I don't think we can resolve the 2 sec hang with our current architecture. In any case, this is not specific to MSIX. Note that the major re-architecture we're working on in bug 1694563 will probably help with this (among many other things), but that's many months away from being done. We think it makes sense to focus our efforts there and get as many wins there as possible.

Ok, so looked over again on two different systems:

  1. An i7 lenovo laptop - both 95.0b6 and 96.0a1 builds work fine on Windows 10 & 11
  2. Ryzen 5 based system - both beta and nightly builds work fine on Windows 11, while on Windows 10 the ~10 sec hang seems to be occurring but only on 95.0b6, the nightly build work fine.
    Any idea of what could be happening on the Ryzen system?

Just to confirm, these were all MSIX builds?

This would seem to suggest I'm still missing something specific to beta, though I can't fathom what. Can you please do the following to get some extra info:

  1. Load the page and wait for it to fully load as normal.
  2. Press NVDA+n to open the NVDA menu. (The NVDA key is insert or caps lock depending on what you've configured.) Choose Preferences, Settings, then set the log level to debug and press OK.
  3. Back in the browser with the page loaded, press NVDA+f5. NVDA should eventually say "refreshed".
  4. NVDA+n, Tools, View log.
  5. Moving from the bottom of the log upwards, look for a line which says "Buffer load took". What is the value on that line?

Thanks.

Hello James! The tests were all performed on MSIX builds, yes. Regarding the issue on Windows 10x64, it seems that is no longer reproducible on my Ryzen system with 95.b6 MSIX. I don't understand why... Buffer load took has a value of 2.027 seconds which I understand is fine.

(In reply to Alexandru Trif, QA [:atrif] from comment #49)

Regarding the issue on Windows 10x64, it seems that is no longer reproducible on my Ryzen system with 95.b6 MSIX. I don't understand why... Buffer load took has a value of 2.027 seconds which I understand is fine.

I tested 95.0b7 MSIX locally here and it worked as expected for me. I'm not sure why there was a problem with that one installation. I guess we'll need to watch out for this, but for now, I don't think there's any more action we can take here.

Changing flags accordingly on 95 and 96 based on the above comments.

Did you want to nominate this for release approval as a dot release ride-along?

Flags: needinfo?(jteh)

Comment on attachment 9248463 [details]
Bug 1736742 part 1: Add missing a11y COM interfaces to Appx Manifest.

Beta/Release Uplift Approval Request

  • User impact if declined: Users of accessibility tools such as screen readers will experience unacceptably poor performance and may experience stability/reliability problems when using Firefox installed from the Windows Store.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See steps outlined in comment 0.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Specific to a11y and MSIX. It can't make the situation worse there than it already is.
  • String changes made/needed: None.
Flags: needinfo?(jteh)
Attachment #9248463 - Flags: approval-mozilla-release?
Attachment #9248464 - Flags: approval-mozilla-release?
Attachment #9248465 - Flags: approval-mozilla-release?
Attachment #9250127 - Flags: approval-mozilla-release?

Comment on attachment 9248463 [details]
Bug 1736742 part 1: Add missing a11y COM interfaces to Appx Manifest.

Approved for 94.0.2.

Attachment #9248463 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9248464 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9248465 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9250127 - Flags: approval-mozilla-release? → approval-mozilla-release+

Added to the 94.0.2 relnotes:

Improved hangs experienced by users of assistive technology such as NVDA when installing Firefox through the Microsoft Store

Verified the fix for Firefox 94.0.2 on two different systems both on Windows 10 and 11.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
See Also: → 1803496
Accessibility Severity: --- → s2
Whiteboard: [access-s2] [fidedi-tikka] → [fidedi-tikka]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: