Closed Bug 1538028 (CVE-2023-23597) Opened 5 years ago Closed 2 years ago

Privilege escalation from web to file process

Categories

(Core :: DOM: Navigation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
109 Branch
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox-esr102 --- wontfix
firefox66 + wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 - wontfix
firefox71 --- wontfix
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 + fixed

People

(Reporter: niklas.baumstark, Assigned: nika)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [reporter-external] [client-bounty-form] [verif?][post-critsmash-triage][adv-main109+])

Attachments

(5 files)

This was found via manual analysis of the sandbox code of Firefox 65. I am currently not able to get a Windows debug build for Firefox 66, but I assume the flaw still exists.

Given a renderer exploit, we can perform the following steps:

  1. Hook CheckLoadURIWithPrincipal to always return true.
  2. Hook PopupBlocker::CanShowPopupByPermission to always allow.
  3. Load file:///path/to/stage1.html in an iframe.
    stage1.html file must be readable from within the web sandbox,
    either drop it to temp dir sContentTempDir, or use a UNC path on Windows.
    Every action performed from this document will be associated with
    a security principal for a file:/// origin.
  4. In stage1.html, we can do window.open('file:///path/to/stage2.html')
    This will create a new FILE_REMOTE_TYPE process.
  5. Now re-exploit the new renderer from stage2.html.

Similar to my Pwn2Own bug, steps 1-4 can probably be replaced by appropriate IPC/message manager/XPCOM usage which I have not investigated in detail.

We escalated privileges, because a file process can:

  • Access all files 1
  • On Windows, access a USER_NON_ADMIN token, rather than USER_PROTECTED 2
Flags: sec-bounty?
Group: firefox-core-security → dom-core-security
Component: Security → DOM: Content Processes
Product: Firefox → Core

There's two pieces to fixing this I think:

a) Remove the thing where all file:// is SOP with all other file://
b) Change the way we implement file content process from being a special sandbox policy to having the same sandbox semantics and make the parent process responsible for doing file loads, then have the IPC for that enforce SOP correctly.

Flags: needinfo?(bobowencode)
Priority: -- → P2

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #1)

There's two pieces to fixing this I think:

a) Remove the thing where all file:// is SOP with all other file://
b) Change the way we implement file content process from being a special sandbox policy to having the same sandbox semantics and make the parent process responsible for doing file loads, then have the IPC for that enforce SOP correctly.

Can we prevent writing the file at all? Or can we at least prevent loading file from locations which are writable from the content process?

I think we're driving towards blocking writing files -- we already disallow this on macOS, but I don't know how far we off from it.

I think we could blocking loading file:// URIs that are in the content temp directory, though I'm not 100% sure how difficult that'd be.

The file same-origin-policy should be irrelevant because non-same-origin files will still be allowed to link to each other (an early step in CheckLoadURI is a scheme compare. Documents can always link to another document in the same scheme).

What needs to happen is moving the CheckLoadURI step out of the content process, as well as initiating of the navigation. The child should ask the parent to navigate it, and the parent can do the CheckLoadURI and start or block the load. I assume something like this is already in the plan for Fission because the parent will have to decide whether to spin up new processes if the new document is from a different site. (A fission MVP might still have these checks in the child because the 1st worry is NON-compromised child processes doing Spectre attacks, but ultimately we want to protect against compromised child processes too.)

I think we're driving towards blocking writing files
-- we already disallow this on macOS, but I don't know how far we off from it.

The technique would work on Linux as well because it also has a writable content-process-specific tmpdir. Aside from fontconfig wanting to write to T(E)MPDIR, there's things like the shader caches that we redirect into a writable dir.

https://searchfox.org/mozilla-central/source/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#466

I remember asking whether the parent enforced these checks in a discussion during that presentation in Hawaii. Looks like it never got filed.

I agree with Dan that checking the linking policy in the parent is best way to fix this.

Flags: needinfo?(bobowencode)
Depends on: 1379635

Nika, can you comment on the plans here (comment 4) and how they jive with our glorious Fission future?

Flags: needinfo?(nika)

FWIW, Nika's also got a good write up on bug 1538072. We shouldn't need fission to solve this, since file processes are already distinct.

No longer depends on: 1379635

see comment 8 and bug 1538072 :-)

Flags: needinfo?(nika)

I don't think we plan to fix this for 66 at this point.

Just a quick question: Will this likely be addressed in 67?

(In reply to Niklas Baumstark from comment #11)

Just a quick question: Will this likely be addressed in 67?

67 release candidates are being made now and there is no patch for this as of yet so, no, it isn't going to be addressed in 67.

Is there any reason (In reply to Niklas Baumstark from comment #0)

  1. Hook CheckLoadURIWithPrincipal to always return true.
  2. Hook PopupBlocker::CanShowPopupByPermission to always allow.
  3. Load file:///path/to/stage1.html in an iframe.
    stage1.html file must be readable from within the web sandbox,
    either drop it to temp dir sContentTempDir, or use a UNC path on Windows.
    Every action performed from this document will be associated with
    a security principal for a file:/// origin.
  4. In stage1.html, we can do window.open('file:///path/to/stage2.html')
    This will create a new FILE_REMOTE_TYPE process.
  5. Now re-exploit the new renderer from stage2.html.

On OSX, after performing 1 & 2 (by just editing the code), I can window.open a file:// URI directly from a website (e.g. https://ritter.vg/misc/ff/file2.html ) without needing to do Step 3.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: nobody → tom

Tom, are you intending to work on this for the 70 timeframe?

Flags: needinfo?(tom)

(In reply to Liz Henry (:lizzard) from comment #14)

Tom, are you intending to work on this for the 70 timeframe?

Blocked on Fission work; explained here: https://bugzilla.mozilla.org/show_bug.cgi?id=1538072#c20 I'll work on it whenever it becomes unblocked :)

Flags: needinfo?(tom)

Is the work outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1538072#c25 going to be finished soon? If not, is it possible to have some sort of bandaid fix here in the meantime?

Flags: needinfo?(tom)
Flags: needinfo?(nika)

:mwoodrow is working on trying to get more loads initiated by the parent process and going through the DocumentChannel code path, but that's still behind a pref right now (and also not used for file URI loads). Improvements to this code are a starting point for moving where trust is placed for starting navigations, which would allos for us to fix issues such as this one. More work would be required after DocumentChannel is complete to fix this issue as well, though.

I don't know of any easy band-aid fix here, due to how spread out the ways the parent process can ask content to navigate are, and some ugly edge cases. I worry that even DocumentChannel isn't enough to do all of the changes we need, and it may not be possible to block this until the fission session history changes are complete.

There are some easier cases where we could move the decision out of content. For example, we could theoretically check in RemoteWebNavigation::LoadURI if the URI is a file:// URI, and trigger the process switch from the parent in that case.

Unfortunately, there are also some really tricky cases, such as session history. We don't currently have a trusted copy of session history in the parent process, and won't until fission changes are complete, so the content process could lie to us about having a file:// URI as the previous history entry, and "navigate back in history" to that document as a history load. HIstory loads are fundamentally triggered by content right now, so as long as we want to support going back from web to file URIs, we have to somewhat support content triggering file URI loads.

If we did manage to finish doing this, we'd need to add checks in RedirectLoad and when creating a popup browser in RecvCreateWindowInDifferentProcess that the remote type we're going to switch into isn't file. I think by the time we've finished all of the infrastructure work needed to do that, though, we have a good chance of having a better solution right around the corner using DocumentChannel loads and friends.

Flags: needinfo?(nika)
Flags: needinfo?(tom)

Currently blocked on bug 1556493, fission milestone M5.

Assignee: tom → nobody
Keywords: stalled

That's not what the "stalled" keyword means with respect to security bugs. We know what the problem is and we know what the next steps for fixing it are. "stalled" is when the bug investigation is going nowhere, almost always for crashes filed from stack traces that don't have testcases or other steps to reproduce. "stalled" is often the last step before a bug is resolved INCOMPLETE; that's not the case here.

Keywords: stalled
Flags: sec-bounty? → sec-bounty+
Depends on: 1647550
No longer depends on: document-channel

Hi Olli, I was just wondering if the currently set dependency on bug 1647550 still reflects the state of play here. I assume that since nika's analysis from 2 years ago many things went ahead here?

Flags: needinfo?(bugs)

The current dependencies are still there, and we still have a lot of work left to do before we can fully move control over navigation out of the content process. :annyG has been doing some work on bug 1647550, which is definitely one of the big blockers, but there will be more work to do even after that (including shipping Fission & removing the non-SHIP codepath).

Flags: needinfo?(bugs)
Component: DOM: Content Processes → DOM: Navigation

Currently in our roadmap planning process.

Assignee: nobody → htsai

The severity field for this bug is set to normal. However, the bug is flagged with the sec-high keyword.
:hsinyi, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(htsai)
Severity: normal → S2
Flags: needinfo?(htsai)

I'm copying offline discussion notes and paste over here.

:bholley and Nika talked about this, and Nika suggested that we could simply keep a hashtable of all the file URIs that were explicitly navigated to by the parent process, and block the navigation if the target isn’t in that list.

So the next step here would be to investigate whether this idea would be workable.

Peter, my strategy for this involves requiring requests that don't pass CanLoad checks for the URL loaded in the current content process for a BC to originate in the parent process, or a child that they do pass CanLoad checks for, and have the pending request cached in the parent while we wait for the DocumentChannel to ask us to redirect.

The current sticking point is session history navigations, which can originate in the child via session history APIs. I can check them against the parent process session history. The only issue is that I'm not sure how trustworthy that data is. Is there any way a compromised content process could get us to insert an entry in the parent process session history that shouldn't be there, and navigate us there to bypass the extra CanLoad checks?

Flags: needinfo?(peterv)

(In reply to Olli Pettay [:smaug] from comment #27)

history.pushState may be able to do odd things.

I think I can deal with that part fairly easily.

So as Olly said, the pushState code is a path were we insert the data coming from the child into session history. I think anchor navigation is another.

Looking at the other ways to insert entries, for those the data should be coming from the parent in the first place, so those should be ok. We could tighten up same document navigations, seems like something we should be doing anyway.

Flags: needinfo?(peterv)
Assignee: htsai → kmaglione+bmo
Assignee: kmaglione+bmo → nika
Depends on: 1798986

With SHIP, we have a copy of the actual SessionHistoryEntry which is
responsible for a history load locally within the parent process. Using this,
we can validate any session history loads coming from the content process to
ensure that they correspond to actual outstanding history loads.

We can't do this for non-SHIP loads, as session history exists only within the
content process.

Depends on D161196

This is done using slightly different mechanisms for each of LoadInfo and
nsDocShellLoadState, and will be used in the next part to validate document
loads based on the RemoteType responsible for the load.

For subresource loads, the TriggeringRemoteType is fairly straightforward - it
is the process which created the channel. We can handle this by getting the
current remote type when creating the channel, and then using the remote type
of the sending process when receiving the LoadInfo over IPC to either replace
the triggering remote type, or validate it.

For document loads, the situation is a bit more complex, as there are at least
3 (potentially-)different processes responsible for different parts of the
navigation:

  1. The "Triggering Process" is the process which provided the URI to load.
    This is also the process which provides the Triggering Principal. This is
    the process being tracked in this patch.

  2. The "Loading Process" is the process which actually creates the channel and
    starts the load. This may be the same as the triggering process, or may be
    a different process starting the navigation on behalf of the triggering
    process. In general this is the process hosting the current docshell,
    though it may be the parent process in the case of parent-initiated loads.

  3. The "Final Process" is the process which receives the response and renders
    the final document. This isn't known at channel creation time, and is
    determined by the result principal and process isolation policy.

This change uses a serializer and special field on nsDocShellLoadState to track
the "Triggering Process" for the load, even as the load state is serialized
between processes by tracking which loads were sent into which content
processes, and matching them up when the parent process sees them again. The
information is then copied into the LoadInfo before configuring the real
channel, so it can be used for security checks.

The "Triggering Process" is overridden to be the parent process for history
loads, as history loads are often started in processes which wouldn't normally
be able to navigate to those pages. This is OK thanks to the changes in part 1
which validate history loads against the real session history when SHIP is
enabled.

Depends on D161197

The previous part introduced a new mechanism to track the triggering remote
type for a specific load in a reliable way. This adds some basic checks based
on the triggering remote type to the nsContentSecurityManager, while also
providing the potential infrastructure to expand these checks in the future.

As these checks are performed before some other content security checks (to
ensure that they are performed before InitialSecurityCheckDone() is checked),
they may reject a load which would otherwise have been rejected by a later
check. For this reason, the diagnostic assertions added in this part are only
fired if the check appears as though it would otherwise have succeeded. This
check is not fully accurate, however, so may miss some cases.

This is important, as we have some tests, such as service worker navigation
tests, which will try to load file:/// URIs in content processes, and only fail
in the later content security checks.

For now, no checks are performed for non-document loads, though that may change
in the future.

Depends on D161198

There's some more work which needs to be done here to make this better beyond the patches which I've already posted, but it will build upon that foundation, so will be added as additional parts.

  1. When Fission is disabled, we may start a load using CreateWindowInDifferentProcess (which is the original path which was fixed in bug 1538072). This code path already has a monkey-patch fix from that bug (https://searchfox.org/mozilla-central/rev/0b1543e85d13c30a13c57e959ce9815a3f0fa1d3/dom/ipc/ContentParent.cpp#5736-5759), but ideally it'd also set the triggering remote type like we do here. That being said, the only use of this is with fission disabled, which is only the case on Android, so we can perhaps get away with removing the CreateWindowInDifferentProcess codepath completely instead. Other codepaths will start the load in the triggering content process, so shouldn't be impacted. Even if we don't want to remove it, as Android uses it, we could make sure that it can't be called when UseRemoteSubframes is enabled.

  2. We need to validate loads using the context menu (e.g. from "Open link in new tab"). The easiest way to do this is probably to validate that the principals being used as the triggering principal in nsContextMenu.js (triggered by ContextMenuParent.sys.mjs) are actually from the content process by instead pulling them from this.manager.documentPrincipal, rather than trusting the values sent over IPC. Ideally we'd also set the triggering remote type in that situation, but once we block loading system principals in content processes, we should be able to stop the load at the principal level.

    In general, I think it should be possible to get a lot of the fields which are currently being pulled from the content process for nsContextMenu from the parent process actor, so perhaps that's another small project to work on at some point.

  3. We need to validate loads which are triggered by the content click handler (ClickHandlerChild.sys.mjs). These should have the source process set as the triggering process, and should also pull principals from the parent process, rather than the content process, as with nsContextMenu.js.

    Much like with nsContextMenu.js the bulk of this can probably be done by pulling the information from trusted sources rather than from a content process IPC message, but it'd still be good to set the triggering remote type.

  4. Also in ClickHandlerChild we have the middle-click paste navigation handler, which if enabled will paste from the clipboard and try to navigate to that URL when middle-clicking on a page. This is another potential issue, as content processes can write to the clipboard (as can arbitrary websites), meaning that they could start any navigation with it. Fortunately, it's been disabled by default on all platforms since bug 366945, so with a couple of extra checks for the middlemouse.contentLoadURL pref, we should be able to avoid it being an issue for most users. I'd prefer to avoid removing the about:config pref in this bug, as I imagine it'll upset some users, based on the comments in that bug.

  5. It is possible to navigate and open a content window using the service worker Client APIs. I've filed bug 1799692 for that issue.

I also looked at drag & drop, as I know dropping a link into a content area can cause a navigation, and we handle the drop event within content, but it looks like we already validate it against the parent process copy of the drag, and downgrade the principal if it was modified by content (https://searchfox.org/mozilla-central/rev/0b1543e85d13c30a13c57e959ce9815a3f0fa1d3/dom/ipc/BrowserParent.cpp#844-858), so we should be OK there.

Those are all of the cases which should be impacted off the top of my head, but I may have missed some which we'll need to fix as we find them. I am optimistic that this will block the vast majority of cases, and at the very least it offers us a framework for fixing other cases as well.

Looks like the contentLoadURL pref is already checked when instantiating the actor, so is handled: https://searchfox.org/mozilla-central/rev/0b1543e85d13c30a13c57e959ce9815a3f0fa1d3/browser/components/BrowserGlue.jsm#452

Propagate the ability to pass triggeringRemoteType through the desktop frontend
in various places, such that it is set when using the context menu or content
click handler.

Depends on D161199

I've put up patches in bug 1799692 as well, so I'm hoping to potentially land the patches in this bug and that bug at the same time right now, unless issues come up.

Comment on attachment 9302939 [details]
Bug 1538028 - Part 4: Pass TriggeringRemoteType through the frontend, r=gijs!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: This patch is only for a sandbox escape. A sufficiently knowledgeable person could probably figure out how to trigger it if they already had control of a content process, but it would require effort.

The patch doesn't super-paint a bullseye on the problem, but it probably wouldn't be too hard to figure out what it would be fixing.

  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: They will be somewhat difficult to implement. THis patch builds on top of other changes such as bug 1798986, which in turn depends on bug 1800545 to avoid test failures. We could potentially uplift this to beta with a lot of effort.

I also currently plan to land this at the same time as bug 1799692, as it is required to plug all of the holes I know of which could allow this kind of privilege escalation.

  • How likely is this patch to cause regressions; how much testing does it need?: This patch is fairly large, but I tried to keep it self-contained to keep regressions down. That being said, I had to take many tries to get all tests (hopefully) green, as this part of our codebase is somewhat fragile.

There is a risk that there is some codepath which we are not aware of which will try to start a file URI load from a content process which will break after this patch. We probably won't be able to find it until the patches land, and we get bug reports from users.

I'm marking android as not impacted, as I don't believe it distinguishes and gives "file" content processes a different privilege level, so it is largely unimpacted, though there may still be some impact due to gecko-only permission checks.

  • Is Android affected?: No
Attachment #9302939 - Flags: sec-approval?
Attachment #9301853 - Flags: sec-approval?
Attachment #9301854 - Flags: sec-approval?
Attachment #9301855 - Flags: sec-approval?

Comment on attachment 9302939 [details]
Bug 1538028 - Part 4: Pass TriggeringRemoteType through the frontend, r=gijs!

We discussed this and given the complexity of creating uplifts, and the impact of the vulnerability (it allows an attacker to read an arbitrary file on the filesystem, but not write one or plant malware), we are okay with letting it ride the trains out and not uplifting it to Beta or ESR.

Attachment #9302939 - Flags: sec-approval? → sec-approval+
Attachment #9301855 - Flags: sec-approval? → sec-approval+
Attachment #9301854 - Flags: sec-approval? → sec-approval+
Attachment #9301853 - Flags: sec-approval? → sec-approval+
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch
Group: dom-core-security → core-security-release
Flags: qe-verify-
Whiteboard: [reporter-external] [client-bounty-form] [verif?] → [reporter-external] [client-bounty-form] [verif?][post-critsmash-triage]
Whiteboard: [reporter-external] [client-bounty-form] [verif?][post-critsmash-triage] → [reporter-external] [client-bounty-form] [verif?][post-critsmash-triage][adv-main109+]
Alias: CVE-2023-23597
Regressed by: 1812356
No longer regressed by: 1812356
Regressions: 1812356
Regressions: 1838686
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: