Closed Bug 1515088 Opened 5 years ago Closed 5 years ago

Sandboxed process fails to start on ARM64 Windows when installed dir is blocked by the delayed access token level.

Categories

(Core :: Security: Process Sandboxing, defect, P1)

ARM64
Windows
defect

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox66 + verified
firefox67 --- verified

People

(Reporter: bobowen, Assigned: bobowen)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

From Bug 1481518 comment 17.

The problem here appears to be that it does some part of the DLL loading on a new thread.
The chromium sandbox sets an impersonation token on the main thread of the child process with more rights, so that it can do general start-up things like load DLLs. This is reverted early in the process when we call TargetServicesBase::LowerToken.

I can see the child process spinning up a new thread and calling NtQueryAttributesFile on mozglue.dll, which fails because that thread has the more restricted token.
The brokering won't work either because we haven't called TargetServicesBase::Init and it's possibly too early in the process start-up anyway.

I'm working on a patch to add the MaxLoaderThreads - Image File Execution Options that should fix this.
It should also fix the RDD process.

Depends on: 1519368
Summary: Content process fails to start when installed in user's dir and USER_LIMITED used in sandbox. → Sandboxed process fails to start on ARM64 Windows when installed dir is blocked by the delayed access token level.

In bug 1519368 I've landed a patch to set the Image File Execution Options (IFEO) in the installer when we have admin rights.

Following an idea from aklotz, I've been looking at returning a fake IFEO key for the situations where we can't create the registry keys.
My plan (thanks to jimm) was to create an alternative IFEO key at HKEY_CURRENT_USER\Software\Mozilla\IFEO and open and return that key when the process attempted to open the real one.

The problem was that it only seems to go and look for the real IEFO if an entry for the executable exists underneath it.
Something (I assume kernel side) sets a flag, which means it never goes to look for it when it doesn't exist.

On ARM64 this was coming from:

  • a pointer from "xpr"
  • offset by 0x60 to another pointer
  • offset by 0x20 to another pointer
  • offset by 0x8 to the flags (bit 14 is the relevant one)

I couldn't find any information about xpr on ARM64, it's not a register.
So, I looked at the similar code early in the x86-64 process start-up.
It does the same trick, but the original pointer comes from "gs:[30h]", so this the Thread Environment Block (xpr seems to be the ARM64 way of getting this).

From windbg:

0:000> dt ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
...
   +0x060 ProcessEnvironmentBlock : Ptr64 _PEB

0:000> dt ntdll!_PEB
   +0x000 InheritedAddressSpace : UChar
...
   +0x020 ProcessParameters : Ptr64 _RTL_USER_PROCESS_PARAMETERS

0:000> dt ntdll!_RTL_USER_PROCESS_PARAMETERS
   +0x000 MaximumLength    : Uint4B
   +0x004 Length           : Uint4B
   +0x008 Flags            : Uint4B

So, we have our Flags in the user process parameters.
If I unset bit 14 in those flags in the suspended child process from the parent process, then it looks for the IFEO, we can return our alternative one and it seems to successfully pick up the MaxLoaderThreads setting.

I also have to add in a dummy registry rule in the sandbox policy if it doesn't have one, so that the NtOpenKey interception is put in place.

As I said in bug 1519368 comment 7, because of this extra hackery that might get broken by subsequent releases, I'm only going to do this when the parent detects that the real IFEO keys aren't there.

[Tracking Requested - why for this release]:
ARM 66 related uplift candidate

After my investigations detailed in comment 3, dmajor pinged me on IRC to talk this over.
As part of that he had the idea that the process start-up code might store the MaxLoaderThreads setting somewhere in the PEB and maybe we could overwrite it there directly instead.

After a quick search I found a LoaderThreads field in the _RTL_USER_PROCESS_PARAMETERS structure.
This is normally 0 (which means it defaults to 4 loading threads), unless you have a MaxLoaderThreads IFEO registry setting, it then gets defaulted to that.
(What's a bit weird is that this is as soon as the process is created, even though it goes and reads the registry value as well.)

If I set that field to 1 in the suspended child process, then it honours that and things seem to work fine.
(A MaxLoaderThreads IFEO registry setting will override that, but that's going to be very rare and hopefully someone doing that would realise they broke things.)

This field is not officially documented, so there is some small chance that it could change or move, but it looks like that doesn't happen very often. For example the start of the structure is identical in Windows 7, with some extra fields added to the end on Windows 10.

I did think I would only use this when the registry entry wasn't there, but after writing the code to check for that, I realised I was still checking a pretty poorly documented registry key for an even less well documented setting.
So, I've decided to back-out the installer patch and just always set the LoaderThreads field to 1 from the parent, when the value is currently 0.
I think the risk of this breaking is pretty small and it makes the fix much simpler.

For the purpose of future-proofing, maybe you could create some kind of unit test that set the registry to a known value, and then you create a child process, and try to retrieve that value from the location in the TEB to make sure that they match.

Blocks: 1525981

(In reply to Aaron Klotz [:aklotz] from comment #6)

For the purpose of future-proofing, maybe you could create some kind of unit
test that set the registry to a known value, and then you create a child
process, and try to retrieve that value from the location in the TEB to make
sure that they match.

+1, I like this idea.

(In reply to David Major [:dmajor] from comment #7)

(In reply to Aaron Klotz [:aklotz] from comment #6)

For the purpose of future-proofing, maybe you could create some kind of unit
test that set the registry to a known value, and then you create a child
process, and try to retrieve that value from the location in the TEB to make
sure that they match.

+1, I like this idea.

I do too, we'd need tests with Administrator rights, but aklotz already has plans to create these.
One other thing I just thought though, is that I don't think our infra tends to be on the bleeding edge of updates or full releases, so would we ever catch the change early enough?
We would need machines that always kept bang up to date and also ones running insider builds, I guess.

One other idea I had over testing, was if there were a way to enumerate win 10 ntdll.pdb files after a certain date/version from the symbol server, then we could just search them.
The offset appears just before the field name in the files.
Generally the symbols are looked up from a reference in the DLL I think, I'm not sure if the symbol server API allows you to list them.

This change also includes temporary installer code to try and remove any
remaining Image File Execution Options.
Attachment #9042531 - Flags: review?(mhowell)
Comment on attachment 9042531 [details] [diff] [review]
Part 1: Backed out changeset 86ef2274deff (bug 1519368)

Review of attachment 9042531 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #9042531 - Flags: review?(mhowell) → review+

Just an NI, because you may not be monitoring bugzilla review queues anymore.

Flags: needinfo?(aklotz)

I'll be looking at this today.

Flags: needinfo?(aklotz)
Comment on attachment 9042532 [details] [diff] [review]
Part 2: Set LoaderThreads to 1 in the RTL_USER_PROCESS_PARAMETERS structure on child process start-up

Review of attachment 9042532 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, but pretty please look into that unit test that we discussed!
Attachment #9042532 - Flags: review?(aklotz) → review+
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bca59c2709d6
Part 1: Backed out changeset 86ef2274deff (bug 1519368). r=mhowell
https://hg.mozilla.org/integration/mozilla-inbound/rev/22e9a70db886
Part 2: Set LoaderThreads to 1 in the RTL_USER_PROCESS_PARAMETERS structure on child process start-up. r=aklotz
Blocks: 1528227

I've filed bug 1528227 over adding a test to check LoaderThreads field.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Hi lizzard - is this still needed for Fx66? I'd heard we were now aiming for Fx67 for arm64.

Flags: needinfo?(lhenry)

Chris is that the case? I had thought there was some testing of arm64 on beta 66. If not then this can ride with 67.

Flags: needinfo?(lhenry) → needinfo?(cpeterson)

We are testing Windows ARM64 on beta; we owe folks a build on March 20, and that will be 66.

Flags: needinfo?(cpeterson)

OK thanks I'll request uplift then.

Comment on attachment 9042532 [details] [diff] [review]
Part 2: Set LoaderThreads to 1 in the RTL_USER_PROCESS_PARAMETERS structure on child process start-up

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

None

User impact if declined

If Windows ARM64 firefox is not installed in location with general read access, then child processes may fail to start.
Note only part 2 should be uplifted.

Is this code covered by automated tests?

No

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

Yes

If yes, steps to reproduce

Don't give installer admin rights and firefox will install under user's AppData folder.
Open firefox from there and in about:config set security. sandbox.content.level to 5 and restart firefox.
Firefox will then not load pages in Fx66 prior to this patch and should work after it.

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Low risk to X86 builds as code change should only affect ARM64.
Simple change for ARM64, but it does set an undocumented internal windows field on the child process.

String changes made/needed

None

Attachment #9042532 - Flags: approval-mozilla-beta?

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #21)

Chris is that the case? I had thought there was some testing of arm64 on beta 66. If not then this can ride with 67.

(In reply to Marnie Pasciuto-Wood [:marnie] from comment #22)

We are testing Windows ARM64 on beta; we owe folks a build on March 20, and that will be 66.

Like Marnie said, Windows ARM64 is testing in Beta 66, but Android ARM64 is currently restricted to Fennec Nightly. (Android ARM64 support did land in Fennec 66 Nightly, but it will probably won't ride the trains until Fennec 68.)

Whiteboard: [qa-triaged]

This issue could not be reproduced with the aarch64 build from 13th of February (prior to the patch), on Lenovo Yoga ARM device.
I have downloaded the Nightly aarch64 installer build from:
http://archive.mozilla.org/pub/firefox/nightly/2019/02/2019-02-13-21-34-09-mozilla-central/
I have installed it without giving it the admin privileges (which means it got installed by default in the user/AppData/Local folder) and then I have modified the value of the "security.sandbox.content.level"pref to 5 and restarted the browser. After the browser restarted with the same (older/affected) build version, all and any page that I have attempted to access loaded correctly.

Considering that I could not reproduce this issue, I cannot verify the fix.

Am I doing something wrong?

Flags: needinfo?(bobowencode)

(In reply to Bodea Daniel [:danibodea] from comment #26)

This issue could not be reproduced with the aarch64 build from 13th of February (prior to the patch), on Lenovo Yoga ARM device.
I have downloaded the Nightly aarch64 installer build from:
http://archive.mozilla.org/pub/firefox/nightly/2019/02/2019-02-13-21-34-09-mozilla-central/
I have installed it without giving it the admin privileges (which means it got installed by default in the user/AppData/Local folder) and then I have modified the value of the "security.sandbox.content.level"pref to 5 and restarted the browser. After the browser restarted with the same (older/affected) build version, all and any page that I have attempted to access loaded correctly.

Considering that I could not reproduce this issue, I cannot verify the fix.

Am I doing something wrong?

Could Nightly be auto-updating on the restart?
Try re-installing firefox once you've set the sandbox level or you could even just not restart and open some new tabs and navigate to different sites. That should attempt to start new content processes at the new sandbox level.

It's also possible that you still have the remnants of a previous fix in the registry.
If you run regedit.exe and go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
If there are keys within that key for firefox.exe or plugin-container.exe then delete them.

Flags: needinfo?(bobowencode)

It is auto-updating at restart beyond normally; I had quite a fight with the browser to force it not to update at first lunch (A bug about retaining the update related settings is lurking around this build. We are investigating.)

(In reply to Bob Owen (:bobowen) from comment #24)

If yes, steps to reproduce

Don't give installer admin rights and firefox will install under user's AppData folder.
Open firefox from there and in about:config set security.sandbox.content.level to 5 and restart firefox.
Firefox will then not load pages in Fx66 prior to this patch and should work after it.

I have re-attempted to reproduce your issue. STR:

  1. Uninstalled all versions of Firefox ever installed on the Lenovo Yoga.
  2. I ran regedit.exe and went to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
    and removed any keys with the name "firefox.exe" or "plugin-container.exe".
  3. Download aarch64 build installer from:
    http://archive.mozilla.org/pub/firefox/nightly/2019/02/2019-02-13-21-34-09-mozilla-central/
  4. Installed it in the standard location (user/AppData/Local folder) after choosing "No" to the first pop-up displayed "Do you want to allow this app to make changes to your device?"
  5. Opened it without a connection to the internet to avoid auto-update at the first lunch of a new profile.
  6. Went to about:preferences#general and checked the "Check for updates but let you choose to install them" option.
  7. Re-connected Yoga to the internet.
  8. Went to about:config and modified the security.sandbox.content.level from 2 to 5.
  9. Loaded a bunch of pages to attempt issue reproduction, but no luck.
  10. Closed the browser and reopened it with the same profile.
  11. Loaded a bunch of pages to attempt issue reproduction. THE ISSUE REPRODUCED.
  12. I have manually chosen to download Nightly update, installed it and restarted the browser.
  13. The same pages were displayed and their "refresh" buttons could not be clicked/taped.
  14. I have reloaded the pages by tapping in the address bars and tapped "Enter".
  15. It appears that the issue is not reproducing anymore on this version (Nightly v67.0a1 fro 2019-02-20, aarch64 build).

Based on all of the above, I will mark this bug as "Verified".
Thank you!

I have also removed the "qe-verify+" tag as well, considering that the QA-Engineering involvement is at least temporarily completed. Please put it back id needed.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Thanks for fighting, Dani!

Comment on attachment 9042532 [details] [diff] [review]
Part 2: Set LoaderThreads to 1 in the RTL_USER_PROCESS_PARAMETERS structure on child process start-up

Fix for ARM64 issue, verified in nightly.
OK to uplift for beta 11.
Attachment #9042532 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

NI to myself to verify in Beta 11 when available.

Flags: needinfo?(daniel.bodea)
Blocks: 1529954

In order to verify this fix, I need a Firefox Beta 9 (or older) aarch64 build installer to confirm the issue's reproduction before validating the expected behavior after the fix. The installers I can find in Task Cluster are not working correctly. They appear to install correctly, but when checking the install folder, much of the browser is missing.

Furthermore, all the installers from there are not working, so I also need an installer of the build with the fix in place as well.

I expect to find builds here: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=win%2Caarch64%2Cns
Where could I get the needed (working) builds from? (Firefox Beta 9 and Firefox Beta 10 aarch64 build installers)

Flags: needinfo?(daniel.bodea) → needinfo?(bobowencode)

(In reply to Bodea Daniel [:danibodea] from comment #33)

In order to verify this fix, I need a Firefox Beta 9 (or older) aarch64 build installer to confirm the issue's reproduction before validating the expected behavior after the fix. The installers I can find in Task Cluster are not working correctly. They appear to install correctly, but when checking the install folder, much of the browser is missing.

Furthermore, all the installers from there are not working, so I also need an installer of the build with the fix in place as well.

I expect to find builds here: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=win%2Caarch64%2Cns
Where could I get the needed (working) builds from? (Firefox Beta 9 and Firefox Beta 10 aarch64 build installers)

I'm afraid I don't know too much about the Beta builds being used for ARM64/aarch64 testing.
I don't see anything here https://archive.mozilla.org/pub/firefox/releases/66.0b9/, which is the other place I would look.

Hopefully RyanVM will know.

Flags: needinfo?(bobowencode) → needinfo?(ryanvm)

The setup.exe artifact from those jobs is the stub installer, which probably doesn't work very well with TH builds :). Try target.installer.exe from the rs(N) jobs instead.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&searchStr=win%2Caarch64%2Crs

Flags: needinfo?(ryanvm)
Flags: qe-verify+

This fix has been verified on firefox66 as well using the procedure as the one in comment 28. The issue has reproduced on an affected Beta 10 build and it did not occur anymore after updating to the latest Beta 11.

This bug is now fully verified.
Thank you for your contribution!

Flags: qe-verify+
QA Whiteboard: [qa-triaged]
Whiteboard: [qa-triaged]
See Also: → 1723609
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: