Closed Bug 1755979 Opened 2 years ago Closed 2 years ago

Accessibility checks on 32-bit windows pre build 15063 cause user32.dll to load

Categories

(Core :: Disability Access APIs, defect, P1)

Desktop
Windows
defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: bobowen, Assigned: bobowen)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1378141 added checks that use shell32.dll that in turn loads user32.dll.
These are for 32-bit before win 10 build 15063.
This happens before CoInitializeSecurity and user32.dll being loaded causes that to fail when win32k lockdown is enabled.

If there aren't ways of doing these checks that are compatible with win32k lockdown, it looks like these checks could be just done in the parent, with result passed down to child.

Sending this down from the parent should work. We could do that in the various places where we call ContentParent::SendActivateA11y.

One caveat is that there are cases where the a11y engine is instantiated in a content process without being triggered by a request from the parent process. I'm actually not quite sure how that could happen, but we have code for it, so presumably it can. If we need to handle that, we can use the same sync request we use for the msaa id, but we'll need to be sure that happens before a11y::PlatformChild is initialised.

Frustratingly, we won't need any of this once we only have to worry about our new caching architecture, but that's months from being ready for daily usage, probably quite a bit longer before we could realistically drop support for our old architecture.

Bob, is this something you need the a11y team to fix, and if so, when?

Severity: -- → S3
Flags: needinfo?(bobowencode)
OS: Unspecified → Windows
Hardware: Unspecified → Desktop

(In reply to James Teh [:Jamie] from comment #1)

Sending this down from the parent should work. We could do that in the various places where we call ContentParent::SendActivateA11y.

One caveat is that there are cases where the a11y engine is instantiated in a content process without being triggered by a request from the parent process. I'm actually not quite sure how that could happen, but we have code for it, so presumably it can. If we need to handle that, we can use the same sync request we use for the msaa id, but we'll need to be sure that happens before a11y::PlatformChild is initialised.

Assuming they aren't needed before we have IPDL set-up, couldn't we just always send them down from the parent in an early PContent message?

Bob, is this something you need the a11y team to fix, and if so, when?

The issue we have here is that if user32.dll is loaded when we call CoInitializeSecurity it takes a different code path that fails when win32k lockdown is enabled.
For the moment I have restricted win32k lockdown to win10 build 16299+, because after that point we have a work around which allows CoInitializeSecurity to succeed even with user32.dll loaded.
I have done that because I am concerned that other things might cause user32.dll to be loaded (I'm thinking other security/privacy software here) and cause similar crashes. This way we can hopefully smooth the initial roll-out of win32k lockdown and still get it out to a large majority of users where win32k lockdown is supported.

So, for the moment that removes this crash, but we're going to want to roll out to other versions as soon as we can. That will mean removing any early loads we do (like this one) and then probably replacing the diagnostic assert for CoInitializeSecurity failure. I'll need to add telemetry first to try and assess the possible impact.

From what you say, I guess we could move this code until after CoInitializeSecurity (I wasn't sure if it needed to be before), but I haven't checked whether it will fail anyway with win32k lockdown. Some of the code almost certainly will fail with other protections we want to turn on.

(Typing all this gives me another idea for a work around for other loads of user32.dll, although it might be a little too crazy ...)

Frustratingly, we won't need any of this once we only have to worry about our new caching architecture, but that's months from being ready for daily usage, probably quite a bit longer before we could realistically drop support for our old architecture.

That's interesting, so are we moving back to this idea?
I thought that the fears were that something like this would be too slow, which was why we went with the COM solution.

Flags: needinfo?(bobowencode) → needinfo?(jteh)

(In reply to Bob Owen (:bobowen) from comment #2)

Assuming they aren't needed before we have IPDL set-up, couldn't we just always send them down from the parent in an early PContent message?

I guess we could, yeah. We don't want to initialise the a11y service if we don't need to, though. The resource id doesn't require the a11y service, so that's fine, but the msaa id might; I don't recall right now. So, we might need a separate message rather than adding it to the existing setup.

I have done that because I am concerned that other things might cause user32.dll to be loaded (I'm thinking other security/privacy software here)

Loaded into our content processes? Yuck.

So, for the moment that removes this crash, but we're going to want to roll out to other versions as soon as we can. That will mean removing any early loads we do (like this one) and then probably replacing the diagnostic assert for CoInitializeSecurity failure.

A bit off-topic here, but I'm pretty worried about removing that diagnostic assert. It also catches cases where COM was initialised by something else when it shouldn't have been, which would have caught bug 1714212 instead of that only showing up in debug builds and thus being missed until after release. I worry removing it might cause us to miss similar obscure bugs in future.

From what you say, I guess we could move this code until after CoInitializeSecurity (I wasn't sure if it needed to be before), but I haven't checked whether it will fail anyway with win32k lockdown. Some of the code almost certainly will fail with other protections we want to turn on.

Ug. That's a good point: I don't know whether this works after CoInitializeSecurity. ProcessRuntime does need the resource id, so we might already have a problem here. That is, if accessibility gets initialised after ProcessRuntime, we might not use the correct resource id.

That's interesting, so are we moving back to [the idea of a11y caching]?
I thought that the fears were that something like this would be too slow, which was why we went with the COM solution.

There's a lot of history here which I'd be glad to explain elsewhere if you're interested. Suffice it to say here that yes, after years of debate and struggling with our current architecture, we have made the decision to "Cache the World" in the parent process, abandoning our current COM architecture on Windows.

Flags: needinfo?(jteh)

(In reply to Bob Owen (:bobowen) from comment #2)

Assuming they aren't needed before we have IPDL set-up, couldn't we just always send them down from the parent in an early PContent message?

Does ContentChild get initialised before ProcessRuntime?

Flags: needinfo?(bobowencode)

Err, I mean, would we get an early PContent message before ProcessRuntime.

(In reply to James Teh [:Jamie] from comment #5)

Err, I mean, would we get an early PContent message before ProcessRuntime.

Ah no, so maybe the simplest thing would be to add it to the command line.
Particularly if it is going away at some point.

Flags: needinfo?(bobowencode)

Ignore this comment, hadn't realised the function it calls is in mozglue.

I'll look at adding this to the command line, given that we need it early on.

Jamie - is this actually needed anywhere other than the content process?
(I think we create and activate the activation context in all children that use ProcessRuntime at the moment.)

Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
Flags: needinfo?(jteh)
Priority: -- → P1

(In reply to Bob Owen (:bobowen) from comment #8)

I'll look at adding this to the command line, given that we need it early on.

Thanks.

Jamie - is this actually needed anywhere other than the content process?
(I think we create and activate the activation context in all children that use ProcessRuntime at the moment.)

It's needed in both the parent process and content processes (including things like WebExtension content). It is not needed in any other kind of process that can't render web content; e.g. GPU process.

Note that it also isn't needed if accessibility.cache.enabled (static startup pref) is set to true.

Flags: needinfo?(jteh)

This is being done because the way the ID is determined causes issue with the
sandbox and the ID is required very early in process start up.

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/540d7dd134db
Pass accessibilty resource ID down from the parent. r=Jamie,nika

Backed out changeset 540d7dd134db (Bug 1755979) for causing xpcshell failures on ContentParent.cpp.
Backout link
Push with failures
Failure Log

Flags: needinfo?(bobowencode)

(In reply to Marian-Vasile Laza from comment #13)

Backed out changeset 540d7dd134db (Bug 1755979) for causing xpcshell failures on ContentParent.cpp.
Backout link
Push with failures
Failure Log

Looks like that assertion I added was not valid. :-)

Flags: needinfo?(bobowencode)
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/99d20f4c8750
Pass accessibilty resource ID down from the parent. r=Jamie,nika
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Depends on: 1755734
Depends on: 1759167
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: