Closed Bug 1770098 Opened 2 years ago Closed 2 years ago

Outdated Exploit Protection Security Baseline causes browser to fail to load any pages with win32k lockdown

Categories

(External Software Affecting Firefox :: Other, defect, P1)

Tracking

(firefox102 verified, firefox103 verified)

VERIFIED FIXED

People

(Reporter: tquan, Assigned: gcp)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0

Steps to reproduce:

launch firefox 100.0.1 x64 on Windows 11 build 22000.613 (April release) or 22000.675 (May release)

Actual results:

browser window comes up, but browser is unresponsive. trying to visit any site fails, browser hangs

Expected results:

able to view site

similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1769845 but I'm on windows 11 and not on a VM

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Win32' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Win32
Product: Firefox → Core

traced this down to win32k lockdown: https://hacks.mozilla.org/2022/05/improved-process-isolation-in-firefox-100/

if security.sandbox.content.win32k-disable is set to true, browser fails to render any site and multiple pieces of the browser fail as well. similar to what was described in https://bugzilla.mozilla.org/show_bug.cgi?id=1769845 and https://bugzilla.mozilla.org/show_bug.cgi?id=1769702. if security.sandbox.content.win32k-disable is set to false, browser works properly again.

Summary: firefox 100.0.1 hangs on windows 11, firefox 99.0.1 works fine → enabling win32k lockdown on firefox 100.0.1 causes browser to fail to load any pages

one other piece of information, the impacted device is running Microsoft Defender for Endpoint.

Component: Widget: Win32 → Security: Process Sandboxing
Flags: needinfo?(cmartin)

similar to what was described in https://bugzilla.mozilla.org/show_bug.cgi?id=1769845

The problem is that this was reported against Nightly, as a recent regression, and win32k Lockdown was enabled in Nightly for several months.

the impacted device is running Microsoft Defender for Endpoint.

Is Exploit Protection being applied with modified settings? This is known not to work: https://support.mozilla.org/nl/kb/compatibility-exploit-protection-windows-10

Can you attach about:support and about:third-party from an affected machine to this bug?

Flags: needinfo?(cmartin)

Thanks for taking the time to report this.
Please could you also see if you have any unreported crashes in about:crashes, report a couple if you do and post the links here, thanks.

Flags: needinfo?(tquan)

all of the about: options are broken for me in firefox 100.0.1 so I cannot provide the output of about:support, about:crashes and about:third-party.

we are running Exploit Protection with custom settings for our organization. If we remove all of those firefox works again, so we will try to isolate which setting is causing the issue.

unfortunately mozregression is not working for me as I explained to Gian-Carlo in email. every build it downloads fails to start so effectively the tool cannot bisect and find the commit that led to this issue.

Flags: needinfo?(tquan)

we are running Exploit Protection with custom settings for our organization. If we remove all of those firefox works again, so we will try to isolate which setting is causing the issue.

Ok, that's great news. If you can identify it, please let us know, so we can add support articles for this.

unfortunately mozregression is not working for me as I explained to Gian-Carlo in email. every build it downloads fails to start so effectively the tool cannot bisect and find the commit that led to this issue.

I think what you're seeing is that mozregression bisects on our development builds (Nightly), where some features are enabled before we enable them on release. So whatever is conflicting with Exploit Protection might have been introduced earlier than Firefox 99/100. You can go back to Firefox 91, for example, as a starting point (the basis for the last long-term stable/enterprise support build).

See Also: → 1769702

Gian-Carlo, we identified which settings in Exploit Protection are causing the failure.

We had to turn off both of these settings:

Simulate execution (SimExec) "Ensures that calls to sensitive functions return to legitimate callers"
Validate API invocation (CallerCheck) "Ensures that sensitive APIs are invoked by legitimate callers"

We are turning both of these off in our organization so that our users are able to run Firefox again.

Thanks for the suggestion on mozregression I will try it.

(In reply to tquan from comment #10)

We had to turn off both of these settings:

Simulate execution (SimExec) "Ensures that calls to sensitive functions return to legitimate callers"
Validate API invocation (CallerCheck) "Ensures that sensitive APIs are invoked by legitimate callers"

We are turning both of these off in our organization so that our users are able to run Firefox again.

Thanks for posting those settings. We have the same problem and I hope the admins can verify this tomorrow.
Microsoft says that both settings are only "configurable for 32-bit (x86) applications". Weird...
https://docs.microsoft.com/en-us/microsoft-365/security/defender-endpoint/customize-exploit-protection

(In reply to Tobi from comment #11)

Thanks for posting those settings. We have the same problem and I hope the admins can verify this tomorrow.
Microsoft says that both settings are only "configurable for 32-bit (x86) applications". Weird...
https://docs.microsoft.com/en-us/microsoft-365/security/defender-endpoint/customize-exploit-protection

Not sure how things are set up in your environment. In ours, users are allowed to turn them off temporarily (policy will turn them back on again, but you can keep them off long enough to test this).

(In reply to Gian-Carlo Pascutto [:gcp] from comment #9)

I think what you're seeing is that mozregression bisects on our development builds (Nightly), where some features are enabled before we enable them on release. So whatever is conflicting with Exploit Protection might have been introduced earlier than Firefox 99/100. You can go back to Firefox 91, for example, as a starting point (the basis for the last long-term stable/enterprise support build).

I ran mozregression this way (last good build 91, first bad build 100). Every build still failed to launch, but at least this time it output a commit that is win32k lockdown related. I'm not sure whether this is what caused our settings to fail.

2022-05-19T14:11:03.736000: DEBUG : Found commit message:
Bug 1701794 - Use nsOSHelperAppServiceChild with Windows r=nika,haik

Currently, the Windows implementation of nsOSHelperAppService uses COM to figure out MIME types and extension
associations. That is incompatible with Win32k lockdown.

Luckily, Haik already made a class that handles these calls in content process and either marks them as
unimplemented or-else forwards them to the main process. This seems to work fine on Windows, so it looks like all
that needs to be done is adding XP_WIN to a couple of #ifdefs and we're good to go!

Differential Revision: https://phabricator.services.mozilla.com/D118391

We have confirmed with more users that it is these two Exploit Protection settings which need to be disabled for Firefox:

Simulate execution (SimExec) "Ensures that calls to sensitive functions return to legitimate callers"
Validate API invocation (CallerCheck) "Ensures that sensitive APIs are invoked by legitimate callers"

Unfortunately, we still have the same problem even with those two features deactivated. Are you able to share your settings?

  <AppConfig Executable="firefox.exe">
    <SystemCalls DisableWin32kSystemCalls="false" Audit="true" />
    <ExtensionPoints DisableExtensionPoints="true" />
    <DynamicCode BlockDynamicCode="false" AllowThreadsToOptOut="false" Audit="false" />
    <SignedBinaries MicrosoftSignedOnly="false" AllowStoreSignedBinaries="true" Audit="true" AuditStoreSigned="true" EnforceModuleDependencySigning="false" />
    <Fonts DisableNonSystemFonts="true" AuditOnly="false" Audit="false" />
    <ImageLoad BlockRemoteImageLoads="false" AuditRemoteImageLoads="true" BlockLowLabelImageLoads="false" AuditLowLabelImageLoads="true" />
    <Payload EnableExportAddressFilter="false" AuditEnableExportAddressFilter="false" EnableExportAddressFilterPlus="false" AuditEnableExportAddressFilterPlus="false" EnableImportAddressFilter="false" AuditEnableImportAddressFilter="false" EnableRopStackPivot="true" AuditEnableRopStackPivot="false" EnableRopCallerCheck="false" AuditEnableRopCallerCheck="false" EnableRopSimExec="false" AuditEnableRopSimExec="false" />
    <ChildProcess DisallowChildProcessCreation="false" Audit="true" />
  </AppConfig>

(In reply to Tobi from comment #14)

Unfortunately we have still the same problem even with those two features deactivated. Are you able to share your settings?

Our configuration is posted below. I noticed you have "Validate stack integrity (StackPivot)" turned on, since EnableRopStackPivot is set to true. I did a quick test, and turning this on definitely breaks Firefox. You may want to turn it off and try again. That feature has a similar description to the other two that must be disabled.

So it looks like these 3 Exploit Protection Features must be turned off for Firefox to work:

Simulate execution (SimExec) "Ensures that calls to sensitive functions return to legitimate callers"
Validate API invocation (CallerCheck) "Ensures that sensitive APIs are invoked by legitimate callers"
Validate stack integrity (StackPivot) "Ensures that the stack has not been redirected for sensitive functions"

  <AppConfig Executable="firefox.exe">
    <DEP Enable="true" EmulateAtlThunks="false" />
    <ASLR ForceRelocateImages="false" RequireInfo="false" BottomUp="true" HighEntropy="false" />
    <StrictHandle Enable="false" />
    <ExtensionPoints DisableExtensionPoints="false" />
    <DynamicCode BlockDynamicCode="false" AllowThreadsToOptOut="false" Audit="false" />
    <ControlFlowGuard Enable="false" SuppressExports="false" StrictControlFlowGuard="false" />
    <SignedBinaries MicrosoftSignedOnly="false" AllowStoreSignedBinaries="false" Audit="false" AuditStoreSigned="false" EnforceModuleDependencySigning="false" />
    <Fonts DisableNonSystemFonts="false" AuditOnly="false" Audit="false" />
    <ImageLoad BlockRemoteImageLoads="false" AuditRemoteImageLoads="false" BlockLowLabelImageLoads="false" AuditLowLabelImageLoads="false" PreferSystem32="false" AuditPreferSystem32="false" />
    <Payload EnableExportAddressFilter="false" EnableImportAddressFilter="false" AuditEnableImportAddressFilter="false" EnableRopStackPivot="false" EnableRopCallerCheck="false" EnableRopSimExec="false" />
    <SEHOP Enable="true" TelemetryOnly="false" Audit="false" />
    <Heap TerminateOnError="true" />
    <ChildProcess DisallowChildProcessCreation="false" Audit="false" />
  </AppConfig>
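
A check like the one done by hand above can be sketched in Python. This is only an illustration, not a Mozilla or Microsoft tool: the function name `incompatible_flags` is hypothetical, and the list of flags is simply the three Payload attributes that commenters in this bug reported as breaking Firefox. It assumes the exported configuration uses the `AppConfig` / `Payload` layout shown in the snippets above.

```python
import xml.etree.ElementTree as ET

# Payload attributes that commenters in this bug found had to be off for
# Firefox (assumption: this list comes from the thread, not from any spec).
INCOMPATIBLE_PAYLOAD_FLAGS = (
    "EnableRopStackPivot",
    "EnableRopCallerCheck",
    "EnableRopSimExec",
)


def incompatible_flags(xml_text, executable="firefox.exe"):
    """Return the incompatible Payload flags set to "true" for `executable`."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() also yields the root element, so a bare <AppConfig> fragment
    # works as well as a full export with a wrapping element.
    for app in root.iter("AppConfig"):
        if app.get("Executable", "").lower() != executable.lower():
            continue
        for payload in app.iter("Payload"):
            for flag in INCOMPATIBLE_PAYLOAD_FLAGS:
                is_on = payload.get(flag, "false").lower() == "true"
                if is_on and flag not in found:
                    found.append(flag)
    return found
```

Run against the first configuration posted above it would report only EnableRopStackPivot; against the second it would report nothing. Audit-only attributes such as AuditEnableRopSimExec are deliberately ignored, since the thread only discusses the enforcing settings.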

(In reply to tquan from comment #15)
Thanks for your help, now it works!

See comment 13, this looks like a surprising result from bug 1701794.

Why don't
"Simulate execution (SimExec)" ("Ensures that calls to sensitive functions return to legitimate callers") and
"Validate API invocation (CallerCheck)" ("Ensures that sensitive APIs are invoked by legitimate callers")
like that change?

Maybe this is unfixable but maybe there's something we can do here.

"Validate Stack Integrity (StackPivot)"

It sounds like this would always break with JITs, though if we know the setting we can verify.

Flags: needinfo?(cmartin)
Summary: enabling win32k lockdown on firefox 100.0.1 causes browser to fail to load any pages → enabling Exploit Protection on firefox 100.0.1 causes browser to fail to load any pages

One thing that's mysterious to me is that the change above was made as far back as Firefox 92, and coincidentally both of you apparently had an Exploit Protection configuration change pushed around Firefox 100 that would've completely broken the past 8 versions of Firefox. (I mean, I see nothing in this bug that hints that anything in Firefox 100 caused this!)

Is this just a bad coincidence? Is there some Windows forum that you(r admins) frequent that recommended these 'improved' settings?

(In reply to Gian-Carlo Pascutto [:gcp] from comment #17)

See comment 13, this looks like a surprising result from bug 1701794.

Why don't
"Simulate execution (SimExec)" ("Ensures that calls to sensitive functions return to legitimate callers") and
"Validate API invocation (CallerCheck)" ("Ensures that sensitive APIs are invoked by legitimate callers")
like that change?

Maybe this is unfixable but maybe there's something we can do here.

"Validate Stack Integrity (StackPivot)"

It sounds like this would always break with JITs, though if we know the setting we can verify.

Gian-Carlo, as I mentioned, I still don't think the bisect I did was a good one, as every build mozregression provided failed to start. It's still not clear to me whether there is any check-in associated with the problem.

Component: Security: Process Sandboxing → Other
Product: Core → External Software Affecting Firefox
Version: Firefox 100 → unspecified
No longer blocks: win32k-lockdown
Summary: enabling Exploit Protection on firefox 100.0.1 causes browser to fail to load any pages → enabling Exploit Protection causes browser to fail to load any pages

Interestingly, it looks like Microsoft's default configuration overrides the settings you identified as broken on plugin-container.exe, which (I'm talking 5 years ago or so), used to be the content process:

https://github.com/gunnarhaslinger/Windows-Defender-Exploit-Guard-Configuration/blob/master/Windows10-v2104_ExploitGuard-Security-Baseline.xml#L92

<Payload OverrideEnableExportAddressFilter="false" OverrideEnableExportAddressFilterPlus="false" OverrideEnableImportAddressFilter="false" OverrideEnableRopStackPivot="false" OverrideEnableRopCallerCheck="false" OverrideEnableRopSimExec="false" />

But as far as I know they're not defaulted on anyway.

I think I understand what happened!

We identified in 2018 that the then just-released Windows-10-RS3-Security-Baseline (Windows 10 Version 1709 Security Baseline.zip) contained settings that would break a required component for win32k lockdown, and they match exactly those that you identified as enabled in your config:

See
https://bugzilla.mozilla.org/show_bug.cgi?id=1433065#c38
and
https://bugzilla.mozilla.org/show_bug.cgi?id=1433065#c52

I think we raised this with Microsoft, they fixed the configuration and then pulled the affected configuration file from their webpages, you can't find it there any more. And then we forgot about it after 4 years...

Now that we have enabled win32k lockdown, anyone who still has the broken settings from the "1709 baseline" applied is going to be broken.

Although it's removed from Microsoft's server, I managed to find a copy of the archive on the net, and indeed it contains:

Broken config:

  <AppConfig Executable="firefox.exe">
    <DEP Enable="true" EmulateAtlThunks="false" />
    <ASLR ForceRelocateImages="true" RequireInfo="false" BottomUp="true" HighEntropy="false" />
    <Payload EnableExportAddressFilter="true" EnableExportAddressFilterPlus="true" EnableImportAddressFilter="true" EnableRopStackPivot="true" EnableRopCallerCheck="true" EnableRopSimExec="true" />
  </AppConfig>

Whereas the current Microsoft Baseline configuration files contain (only):

  <AppConfig Executable="firefox.exe">
    <DEP OverrideDEP="false" />
    <ASLR ForceRelocateImages="true" />
  </AppConfig>
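
For anyone still carrying the old baseline, one possible way to patch an exported configuration is sketched below in Python. It is an assumption-laden illustration: `disable_rop_flags` is a hypothetical helper, and it only flips the three ROP-related Payload flags identified in this bug to "false". The current Microsoft baseline goes further and drops the Payload element for firefox.exe entirely, so replacing the file with the current baseline may be the simpler route.

```python
import xml.etree.ElementTree as ET

# The three Payload flags reported in this bug as incompatible with Firefox.
ROP_FLAGS = ("EnableRopStackPivot", "EnableRopCallerCheck", "EnableRopSimExec")


def disable_rop_flags(xml_text, executable="firefox.exe"):
    """Return the XML with the three ROP Payload flags forced to "false"."""
    root = ET.fromstring(xml_text)
    for app in root.iter("AppConfig"):
        if app.get("Executable", "").lower() != executable.lower():
            continue
        for payload in app.iter("Payload"):
            for flag in ROP_FLAGS:
                # Only rewrite attributes that are actually present, so the
                # rest of the entry is left exactly as it was.
                if flag in payload.attrib:
                    payload.set(flag, "false")
    return ET.tostring(root, encoding="unicode")
```

Other settings in the entry (DEP, ASLR, the export/import address filters) are left untouched, so the result should still be reviewed before it is deployed.
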
Severity: -- → S2
Regressed by: win32k-lockdown
Summary: enabling Exploit Protection causes browser to fail to load any pages → Outdated Exploit Protection Security Baseline causes browser to fail to load any pages with win32k lockdown
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(cmartin)

Assigning to myself to get our SUMO pages updated with a further warning.

Assignee: nobody → gpascutto
Priority: -- → P1
Has Regression Range: --- → yes

See comment 19 - regression range not clear (or not relevant).

Has Regression Range: yes → ---
Has Regression Range: --- → yes
See Also: → 1770994
Pushed by cmartin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19a5db06b72e
Disable Win32k Lockdown with incompatible mitigations r=handyman,Gankra
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Comment on attachment 9280773 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

Beta/Release Uplift Approval Request

  • User impact if declined: Some users who have been experiencing a non-functional Firefox since FF100 came out in May will have to wait until the end of August for FF to work again
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  1. Navigate to "about:support"
  2. Under Sandbox > Win32k Lockdown State for Content Process, verify the status is "Win32k Lockdown enabled ... "
  3. Open the Start Menu and type "Exploit Protection"
  4. Go to the "Program Settings" tab
  5. Click "Add program to customize" > "Add by program name"
  6. Type "firefox.exe"
  7. When the "Program settings" panel pops up, find "Simulate Execution (SimExec)"
  8. Check "Override system settings" and turn the slider to "On"
  9. Restart Firefox
  10. Check the same area in "about:support" and verify that Win32k Lockdown is now disabled due to incompatible exploit protection policies
  11. Don't forget to undo adding "firefox.exe" to Exploit Protection on your QA machine!
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Despite its large size, the change mostly does pretty mundane stuff (searching the registry for values and reducing them to a single yes/no answer), and the worst-case scenario is that Win32k Lockdown is incorrectly disabled on compatible machines (extremely unlikely), or that it's incorrectly enabled on incompatible machines (which is already the case anyhow)
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9280773 - Flags: approval-mozilla-beta?
Flags: qe-verify+
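
The "reducing them to a single yes/no answer" step described in the request above can be pictured with a small Python sketch. This is not the patch's code: the names, the dictionary inputs, and the rule that a per-program override wins over the system setting are all assumptions made for illustration, and how the real patch reads these values from the registry is not shown in this bug.

```python
# Short names for the three mitigations discussed in this bug (assumption:
# the actual patch may check a different or larger set).
INCOMPATIBLE = ("StackPivot", "CallerCheck", "SimExec")


def effective_setting(system_default, app_override):
    """A per-program override (True/False) wins; None means 'use system'."""
    return system_default if app_override is None else app_override


def win32k_lockdown_allowed(system, overrides):
    """Reduce per-mitigation settings to a single yes/no answer.

    `system` maps mitigation name -> bool (system-wide setting);
    `overrides` maps mitigation name -> bool for per-program overrides.
    """
    return not any(
        effective_setting(system.get(name, False), overrides.get(name))
        for name in INCOMPATIBLE
    )
```

With this model, overriding SimExec to "On" for firefox.exe (as in the QE steps) makes the function return False, which corresponds to Win32k Lockdown being reported as disabled in about:support.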

Comment on attachment 9280773 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

Beta/Release Uplift Approval Request

  • User impact if declined: Some users who have been experiencing a non-functional Firefox since FF100 came out in May will have to wait until the end of August for FF to work again
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See steps for Beta approval
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Despite its large size, the change mostly does pretty mundane stuff (searching the registry for values and reducing them to a single yes/no answer), and the worst-case scenario is that Win32k Lockdown is incorrectly disabled on compatible machines (extremely unlikely), or that it's incorrectly enabled on incompatible machines (which is already the case anyhow)
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9280773 - Flags: approval-mozilla-release?

Comment on attachment 9280773 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

The patch does not apply to the beta branch.

Flags: needinfo?(gpascutto)

Comment on attachment 9281885 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

Beta/Release Uplift Approval Request

  • User impact if declined: Some users who have been experiencing a non-functional Firefox since FF100 came out in May will have to wait until the end of August for FF to work again
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  1. Navigate to "about:support"
  2. Under Sandbox > Win32k Lockdown State for Content Process, verify the status is "Win32k Lockdown enabled ... "
  3. Open the Start Menu and type "Exploit Protection"
  4. Go to the "Program Settings" tab
  5. Click "Add program to customize" > "Add by program name"
  6. Type "firefox.exe"
  7. When the "Program settings" panel pops up, find "Simulate Execution (SimExec)"
  8. Check "Override system settings" and turn the slider to "On"
  9. Restart Firefox
  10. Check the same area in "about:support" and verify that Win32k Lockdown is now disabled due to incompatible exploit protection policies
  11. Don't forget to undo adding "firefox.exe" to Exploit Protection on your QA machine!
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Despite its large size, the change mostly does pretty mundane stuff (searching the registry for values and reducing them to a single yes/no answer), and the worst-case scenario is that Win32k Lockdown is incorrectly disabled on compatible machines (extremely unlikely), or that it's incorrectly enabled on incompatible machines (which is already the case anyhow)
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9281885 - Flags: approval-mozilla-beta?
Attachment #9280773 - Flags: approval-mozilla-release?
Attachment #9280773 - Flags: approval-mozilla-beta?

(In reply to Pascal Chevrel:pascalc from comment #32)

Comment on attachment 9280773 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

The patch does not apply to the beta branch.

Sorry about that! I have added attachment 9281885 [details], which should merge into Beta.

Flags: needinfo?(gpascutto) → needinfo?(pascalc)

Comment on attachment 9280773 [details]
Bug 1770098 - Disable Win32k Lockdown with incompatible mitigations

Approved for landing on the beta branch before Monday merge, it will be in the release candidate.

Flags: needinfo?(pascalc)
Attachment #9280773 - Flags: approval-mozilla-beta+
Attachment #9281885 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
QA Whiteboard: [qa-triaged]

Reproduced the issue with Firefox 102.0a1 (20220518214245) on Windows 10 x64 using the STR from comment 34. After enabling Simulate Execution (SimExec) in Exploit Protection, the Win32k Lockdown State was still enabled.

The issue is verified fixed with Firefox 102.0b10 (20220619084506) from comment 37 and Firefox 103.0a1 (20220619215938) on Windows 10 x64 and Windows 11 x64. After following the steps from comment 34, the Win32k Lockdown State is disabled due to incompatible Windows Exploit Protection policies. I also opened some random web pages, and they loaded as expected.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Duplicate of this bug: 1800869
See Also: → 1816095