Open Bug 862383 Opened 11 years ago Updated 23 days ago

nsFileChannel::nsFileChannel causes expensive lstats

Categories

(Core :: Networking: File, defect, P3)

x86
macOS
defect

Tracking

()

Performance Impact low

People

(Reporter: BenWa, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: main-thread-io, perf:responsiveness, Whiteboard: [snappy][necko-backlog])

Here's an example profile showing 173ms in lstats. This main thread IO signature show up very frequent when profiling.

Profile:
http://people.mozilla.com/~bgirard/cleopatra/#report=cdaaa5044a9bd73efa3e6473cd288485f15a5320

Looks like we do this to avoid a 'time of check, time of use' attack but the checks are not atomic so we're not defending from that 100% properly.

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/file/nsFileChannel.cpp#244

Is there anything we can do to mitigation this IO?
Flags: needinfo?(netzen)
Whiteboard: [snappy]
we could maybe implement some kind of caching but I think the check is somewhat important from a security point of view. CCing bz for more info.
Flags: needinfo?(netzen)
Unfortunately, yes.  :(
I don't think caching is right if we're trying to defend from a TOUTOK attack. We shouldn't be doing main thread IO at all.

Can we have a non main thread IO implementation and modify the users? The profile shows the users to be:
* nsDocShell::DoURILoad
* nsScriptLoader::StartLoad
Keywords: main-thread-io
How about just dropping support for lnk?
The users are anyone opening a file channel.

We could try to move the machinery off-thread and away from the AsyncOpen()...

> How about just dropping support for lnk?

And for symlinks on Unix-based OSes?
I'm not sure if the sec issue was ever confirmed with symlinks. I think it is a sec issue with symlinks but I'm not sure.  But I think it's acceptable to drop lnk on Windows and at least get a perf win on Windows.
(In reply to Boris Zbarsky (:bz) from comment #5)
> The users are anyone opening a file channel.

True. But each user we move away from it is killing main thread IO. We need to move towards no main thread IO if we want to be snappy.
My point was that there is nothing special about DoURILoad and the script loader.  They're just opening a network channel, which is an inherently async operation.  There's nothing they want to know about this thing synchronously.  So there is no need to modify them in any way; we just need to fix the necko behavior.
This shows up whenever we import a .jsm. I'll try and work on this.
I'd like to make these checks lazy (and move them OMT whenever we are doing things asynchronously):
1. nsFileChannel::nsFileChannel's entire |isSymLink| branch;
2. nsFileChannel::MakeFileInputStream's |file->isDirectory| check.

Now, a comment in nsFileChannel::nsFileChannel indicates that the symlink check must be performed immediately, but I do not see how it efficiently protects us against anything. Could someone spell this ot for me?
Flags: needinfo?(bzbarsky)
I assume you've already read bug 670514?
Flags: needinfo?(bzbarsky)
I have. However, my understanding of localhost security policies on localhost is somewhat limited, so I'm not sure I have extracted everything meaningful.
The file:// origin policy is that a file:// page may only access things in its directory and subdirectories.

The attack here is that you have the following sequence of steps:

1)  Browser loads baz.lnk which points to index.html in the same dir.
2)  baz.lnk is changed to point to C:\
3)  index.html asks the browser to do an XHR to baz.lnk.

Looking at bug 670514 some more, the important thing here is that index.html have the origin of index.html, not of baz.lnk, and that the security check we do when we try to do the XHR resolve whatever baz.lnk points to at that point.

So the real constraint here is twofold:

1)  We need to resolve the lnk for the index.html load to the actual place the data is coming from before we create the principal for that document.

2)  We need to use the post-resolve URI on the second load when deciding on the channel's principal.  What I don't understand is how that works right now, actually... nsContentUtils::SetUpChannelOwner uses the passed-in URI, not the channel's URI, for the CheckMayLoad check, no?  Daniel?
The key part is that the final principal of a load from lnk needs to be based on the lnk target, not
Flags: needinfo?(dveditz)
> 1)  We need to resolve the lnk for the index.html load to the actual place the data is coming from before we create the principal for that document.

Couldn't we perform the resolution lazily, i.e. when we create the principal for the document? This would at least remove the cost for instances of nsFileChannel that are not attached to a document.
> i.e. when we create the principal for the document?

Yes, we could.
That code is a bit obscure to me. Should we do this in nsNullPrincipal::Init()?
It would need to somehow happen in GetChannelPrincipal(), whether via the channel setting an owner on itself or some other mechanism.

Or if we really want to limit it to documents (which is not quite obvious to me, actually), in the place where document calls GetChannelPrincipal.

The first step, though, is writing down the _exact_ behavior we want here.  Randomly changing security code without clearly speccing out what it should be doing is not a good plan.  ;)
Sounds like a good idea, but by understanding of the security implications is not sufficient for this. So I guess I should drop that bug.
Still waiting on dveditz to respond.  :(
I'm not seeing the question for me in comment 13. Yes, in the current approach we want the principal to represent what the file actually is, not what it used to point at.

> The file:// origin policy is that a file:// page may only access things in its directory
> and subdirectories.

Can we safely change this? I believe Chrome gives each file:// URI a unique origin and I haven't heard a lot of complaints. It will break some things, but will it break anything we care about? Note that we couldn't use the URL as the origin, it'd have to be unique (e.g. nsNullPrincipal) since you could open foo.lnk (->page1) and then after a delay so it can switch the link it opens a frame to "itself", foo.lnk (->page2).
Flags: needinfo?(dveditz)
> I'm not seeing the question for me in comment 13.

It's this part:

> What I don't understand is how that works right now, actually...
> nsContentUtils::SetUpChannelOwner uses the passed-in URI, not the channel's URI, for the
> CheckMayLoad check, no?

As in, does the fix for bug 670514 actually work?  If so, why?

> but will it break anything we care about?

We kept our current behavior for things like HTML help systems, iirc...
Whiteboard: [snappy] → [snappy][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3

I was halfway through re-filing this and then found this bug again.

https://searchfox.org/mozilla-central/rev/d24696b5abaf9fb75f7985952eab50d5f4ed52ac/netwerk/protocol/file/nsFileChannel.cpp#256

On Windows, this doesn't do IO (unless you're fetching a .lnk file) because we bail out in nsIFile. But it's mainthread IO on Linux and macOS.

We've recently changed our file: security handling. It also seems like the original fix could in principle be restricted to any loads where we'd create URI-associated principals (or perhaps even push the check into the creation of the principals instead of the channels?) - I'd assume that would normally be document loads. I'm not sure if there are others, but certainly this check should be unnecessary for fetch/XHR, which we can detect based on the load info. Boris, is that right, or am I misunderstanding the original reason this check was added?

That'd probably get rid of all practical cases where we hit this and cause perf issues in Firefox itself. It'd still happen for users who manually load html files from disk, but that's certainly better than for any fetch() call of file: things we do internally...

Flags: needinfo?(bzbarsky)
Whiteboard: [snappy][necko-backlog] → [snappy][necko-backlog][fxperf]
See Also: → 1602434

Boris, is that right, or am I misunderstanding the original reason this check was added?

Consider an attacker that gets you to load a .lnk file (or Unix symlink) as a web page. What path should the resulting principal be for? Should it be able to do XHR to the .lnk file? To the thing the link resolves to? Something else?

Right now, afaict it would be able to do XHR to the .lnk file, because we would resolve the symlink up front (worth testing this via experiment). But if we moved the symlink resolution later, it would not be able to do that.

Flags: needinfo?(bzbarsky)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #26)

Boris, is that right, or am I misunderstanding the original reason this check was added?

Consider an attacker that gets you to load a .lnk file (or Unix symlink) as a web page. What path should the resulting principal be for? Should it be able to do XHR to the .lnk file? To the thing the link resolves to? Something else?

I think I'm pretty comfortable allowing only the XHR to the target (thing the link resolves to), tbh. But yeah, my suggestion would break loading the .lnk again via xhr.

Right now, afaict it would be able to do XHR to the .lnk file, because we would resolve the symlink up front (worth testing this via experiment). But if we moved the symlink resolution later, it would not be able to do that.

Right.

Perhaps there's a simpler solution here: drop the check if the triggering/loading principal is system (as it'd be able to load anything it liked anyway). Is that feasible?

(Basically, I really don't want us to incur this IO for builtin/chrome code trying to read files. I would prefer for us never to incur it, but I get that that seems like we can't do it.)

Flags: needinfo?(bzbarsky)
Whiteboard: [snappy][necko-backlog][fxperf] → [snappy][necko-backlog][fxperf:p3]

Sorry for the horrible lag here...

Perhaps there's a simpler solution here: drop the check if the triggering/loading principal is system (as it'd be able to load anything it liked anyway). Is that feasible?

I think if we skip this stuff for non-document (so DOCUMENT and SUBDOCUMENT) loads that have a system triggering (or loading?) principal that should be safe. For document loads we'll probably want to keep canonicalizing for now.

More long-term, I wonder whether we can move this symlink stuff off-thread to after the channel is opened and treat it kinda like a redirect or something.

Flags: needinfo?(bzbarsky)
Severity: normal → S3
Performance Impact: --- → ?
Whiteboard: [snappy][necko-backlog][fxperf:p3] → [snappy][necko-backlog]

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on site: Causes noticeable jank
Configuration: Rare
Websites affected: Rare
[x] Bug affects multiple sites
[x] Multiple reporters

This bug seems to occur in rare configurations when accessing a file locally. If the impact is larger than addressed here, please comment and we can re-triage. 1796514 was reviewed for context.

Performance Impact: ? → low
You need to log in before you can comment on or make changes to this bug.