Closed Bug 1793972 Opened 2 years ago Closed 1 year ago

Enable an LPAC on the windows MF Media Engine utility process controlled by a pref.

Categories
Component: Core :: Security: Process Sandboxing
Type: enhancement
Priority: P1
Platform: All, Windows

Tracking
Status: RESOLVED FIXED
Target Milestone: 112 Branch
Tracking Status: firefox112 --- fixed

People
Reporter: bobowen
Assignee: bobowen

Attachments: 1 file

This will contain the same capabilities used by the MediaFoundationCdm utility process in Chromium, which is currently used by Edge.

Severity: -- → N/A
Priority: -- → P1
Depends on: 1797768
Depends on: 1797769
No longer depends on: 1797768, 1797769

FYI, you can run wmfme on the try server to see whether this LPAC would break the current implementation.

Just an update here.
When running the wmfme tests, I was hitting a crash during COM initialization that I couldn't reproduce anywhere other than in the actual try run.

No crash dump was produced because stack unwinding information was missing for the delay loads that were happening as part of the initialization.
(Bug 1801322 has been filed for this.)

Once Yannis provided some logging for this, I was able to guess at the issue and prevent the load (although investigating the crash caused by the load might also be useful).

https://treeherder.mozilla.org/jobs?repo=try&revision=78b15b1ad9eb70baece794d3d249f49b69880156

I am now investigating the subsequent issues.

Depends on: 1804724

Long overdue update ...
I couldn't reproduce it on the loaner, so I had to resort to debugging on the try servers (mainly through printf and hooking).
I discovered that the root of the problem seemed to be in the initial load of user32.dll, gdi32.dll and friends (their loading is always intertwined).
I then narrowed this down to GdiDllInitialize failing.
I audited the sandbox settings in Chromium and examined the token used in msedge to make sure our settings matched.

I tried removing SIDs from the access token locally and found that removing the logon SID caused a similar failure. The try server jobs weren't missing this SID, but it indicated that a problem with the token might be the cause.
Logging all of the TokenGroups and comparing the try runs with the loaner, I found that the only difference was that the loaner process had the REMOTE INTERACTIVE LOGON SID, whereas the try job process had the CONSOLE LOGON SID.
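The comparison described above amounts to a set difference over the SIDs in each process's TokenGroups. The following is a minimal Python sketch, assuming each run's groups were dumped as (SID string, account name) pairs; the function name `diff_token_groups` and the sample lists are hypothetical and illustrative, not data from the bug. Only the two logon SIDs use their real well-known values: CONSOLE LOGON (S-1-2-1) and REMOTE INTERACTIVE LOGON (S-1-5-14).

```python
def diff_token_groups(left, right):
    """Return (only_in_left, only_in_right) as sorted lists of SID strings."""
    left_sids = {sid for sid, _name in left}
    right_sids = {sid for sid, _name in right}
    return sorted(left_sids - right_sids), sorted(right_sids - left_sids)


# Hypothetical, illustrative dumps: identical except for the logon-type group.
loaner = [
    ("S-1-1-0", "Everyone"),
    ("S-1-5-32-545", "BUILTIN\\Users"),
    ("S-1-5-14", "NT AUTHORITY\\REMOTE INTERACTIVE LOGON"),
]
try_job = [
    ("S-1-1-0", "Everyone"),
    ("S-1-5-32-545", "BUILTIN\\Users"),
    ("S-1-2-1", "CONSOLE LOGON"),
]

only_loaner, only_try = diff_token_groups(loaner, try_job)
print("only on loaner:", only_loaner)
print("only in try job:", only_try)
```

Comparing SID strings rather than account names avoids false differences caused by localized or differently formatted names.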

Looking through the taskcluster code, I found that the try jobs are started from a service.
By setting up a service on the loaner (running as a normal user) to start Firefox, I can finally reproduce what looks like the same issue.
Disabling the LPAC removes the problem.

I can also reproduce it locally.
Interestingly, if I set up msedge to start from a service in a similar way, it seems to detect the issue and doesn't attempt to start its WMF CDM utility process.

Now that I can debug locally, I think I've narrowed it down further to win32u!NtGdiInit failing, which is a system call.
So it looks like I'm going to have to dive into some kernel debugging to find the problem.
I'll also delve through the Chromium code to see whether the detection of the issue is there, although it could easily be something added in the msedge-only code.

I finally tracked it down to the Window Station and Desktop not granting permissions to the LPAC, which means win32kfull!xxxResolveDesktop fails in the kernel code during win32u!NtGdiInit.

By adding these permissions, I can get the process to load user32.dll successfully when it is started from a service.
This didn't initially work on the try servers, because they actually run using the default Window Station (WinSta0) and Desktop (Default), and the main process doesn't have the rights to grant more access.
However, using a Window Station and Desktop that we've created (in the sandbox code) for that process worked.
Some tests passed, but then the run timed out.
The Chromium sandbox for Edge uses the default Window Station and Desktop for this process, so I suspect we'll need to do the same.

However, I've found that we already grant access to Everybody here for the taskcluster workers.

So I've filed bug 1815711 to get the general Application Container SIDs added.
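Why a grant to Everyone is not enough can be illustrated with a simplified model of the AppContainer access check. Access must be allowed both by an ordinary pass over the user and group SIDs and by a second pass over the package and capability SIDs, and an LPAC is not matched by ALL APPLICATION PACKAGES (S-1-15-2-1), only by ALL RESTRICTED APPLICATION PACKAGES (S-1-15-2-2). This is a Python sketch, not the Windows implementation: it assumes allow-only DACLs and ignores deny ACEs, ACE ordering, integrity levels and privileges. The `READ` bit, the user, package and capability SIDs, and the names `Token`, `granted` and `access_check` are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Well-known SIDs.
EVERYONE = "S-1-1-0"
ALL_APP_PACKAGES = "S-1-15-2-1"             # does not match LPACs
ALL_RESTRICTED_APP_PACKAGES = "S-1-15-2-2"  # matches LPACs too

READ = 0x1  # hypothetical access bit, for illustration only


@dataclass
class Token:
    groups: set        # user and group SIDs, used by the first pass
    package_sid: str   # AppContainer package SID, used by the second pass
    capabilities: set = field(default_factory=set)
    is_lpac: bool = False


def granted(dacl, sids):
    """OR together the masks of allow ACEs whose SID is in `sids`."""
    mask = 0
    for sid, ace_mask in dacl:
        if sid in sids:
            mask |= ace_mask
    return mask


def access_check(token, dacl, desired):
    """Allow only if both passes grant every requested bit (simplified)."""
    # Pass 1: the ordinary check against the user and group SIDs.
    normal = granted(dacl, token.groups)
    # Pass 2: the AppContainer check against package and capability SIDs.
    app_sids = {token.package_sid, ALL_RESTRICTED_APP_PACKAGES} | token.capabilities
    if not token.is_lpac:
        app_sids.add(ALL_APP_PACKAGES)
    app = granted(dacl, app_sids)
    return (normal & app & desired) == desired


# Hypothetical placeholder SIDs.
USER = "S-1-5-21-hypothetical-user"
PACKAGE = "S-1-15-2-hypothetical-package"
INSTALL_FILES_CAP = "S-1-15-3-hypothetical-lpacFirefoxInstallFiles"

lpac = Token(groups={USER, EVERYONE}, package_sid=PACKAGE, is_lpac=True)
regular = Token(groups={USER, EVERYONE}, package_sid=PACKAGE)
lpac_with_cap = Token(groups={USER, EVERYONE}, package_sid=PACKAGE,
                      capabilities={INSTALL_FILES_CAP}, is_lpac=True)

everyone_only = [(EVERYONE, READ)]
with_all_packages = everyone_only + [(ALL_APP_PACKAGES, READ)]
with_restricted = everyone_only + [(ALL_RESTRICTED_APP_PACKAGES, READ)]
with_capability = everyone_only + [(INSTALL_FILES_CAP, READ)]

for name, dacl in [("everyone_only", everyone_only),
                   ("with_all_packages", with_all_packages),
                   ("with_restricted", with_restricted)]:
    print(name, access_check(lpac, dacl, READ))
```

In this model only the second pass distinguishes an LPAC from a regular AppContainer, so a DACL that grants Everyone, or even ALL APPLICATION PACKAGES, still denies the LPAC until ALL RESTRICTED APPLICATION PACKAGES or one of its capability SIDs is added.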

Latest try push, with a grant of permissions to lpacFirefoxInstallFiles added to the LPAC init code:
https://treeherder.mozilla.org/jobs?repo=try&revision=4550e276d9f479848b607b259c7f90e3efa591f6

Try push with the LPAC enabled; tests fail because of bug 1815711, but permissions are granted to the bin dir:
https://treeherder.mozilla.org/jobs?repo=try&revision=3015418b127077fafb8af97bb8e09b8760c001b3

Duplicate of this bug: 1797768
Duplicate of this bug: 1793967
Duplicate of this bug: 1793968

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7e0174d704f0
Enable an LPAC on the windows MF Media Engine utility process controlled by a pref. r=handyman
https://hg.mozilla.org/integration/autoland/rev/a9fc5126c3ad
1793972: apply code formatting via Lando

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch
Group: mozilla-employee-confidential
Depends on: 1843153
No longer depends on: 1843153