862383 - nsFileChannel::nsFileChannel causes expensive lstats

Reporter

Description

•

11 years ago

Here's an example profile showing 173ms in lstats. This main thread IO signature show up very frequent when profiling.

Profile:
http://people.mozilla.com/~bgirard/cleopatra/#report=cdaaa5044a9bd73efa3e6473cd288485f15a5320

Looks like we do this to avoid a 'time of check, time of use' attack but the checks are not atomic so we're not defending from that 100% properly.

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/file/nsFileChannel.cpp#244

Is there anything we can do to mitigation this IO?

Benoit Girard (:BenWa)

Reporter

Updated

•

11 years ago

Flags: needinfo?(netzen)

Benoit Girard (:BenWa)

Reporter

Updated

•

11 years ago

Whiteboard: [snappy]

Brian R. Bondy [:bbondy]

Comment 1

•

11 years ago

we could maybe implement some kind of caching but I think the check is somewhat important from a security point of view. CCing bz for more info.

Flags: needinfo?(netzen)

Boris Zbarsky [:bzbarsky]

Comment 2

•

11 years ago

Unfortunately, yes.  :(

Benoit Girard (:BenWa)

Reporter

Comment 3

•

11 years ago

I don't think caching is right if we're trying to defend from a TOUTOK attack. We shouldn't be doing main thread IO at all.

Can we have a non main thread IO implementation and modify the users? The profile shows the users to be:
* nsDocShell::DoURILoad
* nsScriptLoader::StartLoad

Keywords: main-thread-io

Brian R. Bondy [:bbondy]

Comment 4

•

11 years ago

How about just dropping support for lnk?

Boris Zbarsky [:bzbarsky]

Comment 5

•

11 years ago

The users are anyone opening a file channel.

We could try to move the machinery off-thread and away from the AsyncOpen()...

> How about just dropping support for lnk?

And for symlinks on Unix-based OSes?

Brian R. Bondy [:bbondy]

Comment 6

•

11 years ago

I'm not sure if the sec issue was ever confirmed with symlinks. I think it is a sec issue with symlinks but I'm not sure.  But I think it's acceptable to drop lnk on Windows and at least get a perf win on Windows.

Benoit Girard (:BenWa)

Reporter

Comment 7

•

11 years ago

(In reply to Boris Zbarsky (:bz) from comment #5)
> The users are anyone opening a file channel.

True. But each user we move away from it is killing main thread IO. We need to move towards no main thread IO if we want to be snappy.

Boris Zbarsky [:bzbarsky]

Comment 8

•

11 years ago

My point was that there is nothing special about DoURILoad and the script loader.  They're just opening a network channel, which is an inherently async operation.  There's nothing they want to know about this thing synchronously.  So there is no need to modify them in any way; we just need to fix the necko behavior.

David Teller [:Yoric] - still alive but not very active

Updated

•

10 years ago

Blocks: 978079

David Teller [:Yoric] - still alive but not very active

Comment 9

•

10 years ago

This shows up whenever we import a .jsm. I'll try and work on this.

David Teller [:Yoric] - still alive but not very active

Comment 10

•

10 years ago

I'd like to make these checks lazy (and move them OMT whenever we are doing things asynchronously):
1. nsFileChannel::nsFileChannel's entire |isSymLink| branch;
2. nsFileChannel::MakeFileInputStream's |file->isDirectory| check.

Now, a comment in nsFileChannel::nsFileChannel indicates that the symlink check must be performed immediately, but I do not see how it efficiently protects us against anything. Could someone spell this ot for me?

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 11

•

10 years ago

I assume you've already read bug 670514?

Flags: needinfo?(bzbarsky)

David Teller [:Yoric] - still alive but not very active

Comment 12

•

10 years ago

I have. However, my understanding of localhost security policies on localhost is somewhat limited, so I'm not sure I have extracted everything meaningful.

Boris Zbarsky [:bzbarsky]

Comment 13

•

10 years ago

The file:// origin policy is that a file:// page may only access things in its directory and subdirectories.

The attack here is that you have the following sequence of steps:

1)  Browser loads baz.lnk which points to index.html in the same dir.
2)  baz.lnk is changed to point to C:\
3)  index.html asks the browser to do an XHR to baz.lnk.

Looking at bug 670514 some more, the important thing here is that index.html have the origin of index.html, not of baz.lnk, and that the security check we do when we try to do the XHR resolve whatever baz.lnk points to at that point.

So the real constraint here is twofold:

1)  We need to resolve the lnk for the index.html load to the actual place the data is coming from before we create the principal for that document.

2)  We need to use the post-resolve URI on the second load when deciding on the channel's principal.  What I don't understand is how that works right now, actually... nsContentUtils::SetUpChannelOwner uses the passed-in URI, not the channel's URI, for the CheckMayLoad check, no?  Daniel?
The key part is that the final principal of a load from lnk needs to be based on the lnk target, not

Flags: needinfo?(dveditz)

David Teller [:Yoric] - still alive but not very active

Comment 14

•

10 years ago

> 1)  We need to resolve the lnk for the index.html load to the actual place the data is coming from before we create the principal for that document.

Couldn't we perform the resolution lazily, i.e. when we create the principal for the document? This would at least remove the cost for instances of nsFileChannel that are not attached to a document.

Boris Zbarsky [:bzbarsky]

Comment 15

•

10 years ago

> i.e. when we create the principal for the document?

Yes, we could.

David Teller [:Yoric] - still alive but not very active

Comment 16

•

10 years ago

That code is a bit obscure to me. Should we do this in nsNullPrincipal::Init()?

Boris Zbarsky [:bzbarsky]

Comment 17

•

10 years ago

It would need to somehow happen in GetChannelPrincipal(), whether via the channel setting an owner on itself or some other mechanism.

Or if we really want to limit it to documents (which is not quite obvious to me, actually), in the place where document calls GetChannelPrincipal.

The first step, though, is writing down the _exact_ behavior we want here.  Randomly changing security code without clearly speccing out what it should be doing is not a good plan.  ;)

David Teller [:Yoric] - still alive but not very active

Comment 18

•

10 years ago

Sounds like a good idea, but by understanding of the security implications is not sufficient for this. So I guess I should drop that bug.

Boris Zbarsky [:bzbarsky]

Comment 19

•

10 years ago

Still waiting on dveditz to respond.  :(

David Teller [:Yoric] - still alive but not very active

Comment 20

•

10 years ago

dveditz ping?

David Teller [:Yoric] - still alive but not very active

Updated

•

10 years ago

Blocks: 859912

Daniel Veditz [:dveditz]

Comment 21

•

10 years ago

I'm not seeing the question for me in comment 13. Yes, in the current approach we want the principal to represent what the file actually is, not what it used to point at.

> The file:// origin policy is that a file:// page may only access things in its directory
> and subdirectories.

Can we safely change this? I believe Chrome gives each file:// URI a unique origin and I haven't heard a lot of complaints. It will break some things, but will it break anything we care about? Note that we couldn't use the URL as the origin, it'd have to be unique (e.g. nsNullPrincipal) since you could open foo.lnk (->page1) and then after a delay so it can switch the link it opens a frame to "itself", foo.lnk (->page2).

Flags: needinfo?(dveditz)

Boris Zbarsky [:bzbarsky]

Comment 22

•

10 years ago

> I'm not seeing the question for me in comment 13.

It's this part:

> What I don't understand is how that works right now, actually...
> nsContentUtils::SetUpChannelOwner uses the passed-in URI, not the channel's URI, for the
> CheckMayLoad check, no?

As in, does the fix for bug 670514 actually work?  If so, why?

> but will it break anything we care about?

We kept our current behavior for things like HTML help systems, iirc...

Patrick McManus [:mcmanus]

Updated

•

8 years ago

Whiteboard: [snappy] → [snappy][necko-backlog]

Firefox Bug Husbandry Bot

Comment 23

•

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P1

Firefox Bug Husbandry Bot

Comment 24

•

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: P1 → P3

:Gijs (he/him)

Comment 25

•

4 years ago

I was halfway through re-filing this and then found this bug again.

https://searchfox.org/mozilla-central/rev/d24696b5abaf9fb75f7985952eab50d5f4ed52ac/netwerk/protocol/file/nsFileChannel.cpp#256

On Windows, this doesn't do IO (unless you're fetching a .lnk file) because we bail out in nsIFile. But it's mainthread IO on Linux and macOS.

We've recently changed our file: security handling. It also seems like the original fix could in principle be restricted to any loads where we'd create URI-associated principals (or perhaps even push the check into the creation of the principals instead of the channels?) - I'd assume that would normally be document loads. I'm not sure if there are others, but certainly this check should be unnecessary for fetch/XHR, which we can detect based on the load info. Boris, is that right, or am I misunderstanding the original reason this check was added?

That'd probably get rid of all practical cases where we hit this and cause perf issues in Firefox itself. It'd still happen for users who manually load html files from disk, but that's certainly better than for any fetch() call of file: things we do internally...

Flags: needinfo?(bzbarsky)

:Gijs (he/him)

Updated

•

4 years ago

Whiteboard: [snappy][necko-backlog] → [snappy][necko-backlog][fxperf]

:Gijs (he/him)

Updated

•

4 years ago

Comment 26

•

4 years ago

Boris, is that right, or am I misunderstanding the original reason this check was added?

Consider an attacker that gets you to load a .lnk file (or Unix symlink) as a web page. What path should the resulting principal be for? Should it be able to do XHR to the .lnk file? To the thing the link resolves to? Something else?

Right now, afaict it would be able to do XHR to the .lnk file, because we would resolve the symlink up front (worth testing this via experiment). But if we moved the symlink resolution later, it would not be able to do that.

Boris Zbarsky [:bzbarsky]

Updated

•

4 years ago

Flags: needinfo?(bzbarsky)

:Gijs (he/him)

Comment 27

•

4 years ago

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #26)

Boris, is that right, or am I misunderstanding the original reason this check was added?

Consider an attacker that gets you to load a .lnk file (or Unix symlink) as a web page. What path should the resulting principal be for? Should it be able to do XHR to the .lnk file? To the thing the link resolves to? Something else?

I think I'm pretty comfortable allowing only the XHR to the target (thing the link resolves to), tbh. But yeah, my suggestion would break loading the .lnk again via xhr.

Right now, afaict it would be able to do XHR to the .lnk file, because we would resolve the symlink up front (worth testing this via experiment). But if we moved the symlink resolution later, it would not be able to do that.

Right.

Perhaps there's a simpler solution here: drop the check if the triggering/loading principal is system (as it'd be able to load anything it liked anyway). Is that feasible?

(Basically, I really don't want us to incur this IO for builtin/chrome code trying to read files. I would prefer for us never to incur it, but I get that that seems like we can't do it.)

Flags: needinfo?(bzbarsky)

Emma Malysz

Updated

•

4 years ago

Whiteboard: [snappy][necko-backlog][fxperf] → [snappy][necko-backlog][fxperf:p3]

Boris Zbarsky [:bzbarsky]

Comment 28

•

4 years ago

Sorry for the horrible lag here...

Perhaps there's a simpler solution here: drop the check if the triggering/loading principal is system (as it'd be able to load anything it liked anyway). Is that feasible?

I think if we skip this stuff for non-document (so DOCUMENT and SUBDOCUMENT) loads that have a system triggering (or loading?) principal that should be safe. For document loads we'll probably want to keep canonicalizing for now.

More long-term, I wonder whether we can move this symlink stuff off-thread to after the channel is opened and treat it kinda like a redirect or something.

Flags: needinfo?(bzbarsky)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Valentin Gosu [:valentin] (he/him)

Updated

•

2 years ago

Updated

•

29 days ago

Performance Impact: --- → ?

Keywords: perf:responsiveness

Whiteboard: [snappy][necko-backlog][fxperf:p3] → [snappy][necko-backlog]

Frank Doty [:fdoty]

Comment 29

•

23 days ago

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Impact on site: Causes noticeable jank
Configuration: Rare
Websites affected: Rare
[x] Bug affects multiple sites
[x] Multiple reporters

This bug seems to occur in rare configurations when accessing a file locally. If the impact is larger than addressed here, please comment and we can re-triage. 1796514 was reviewed for context.

Performance Impact: ? → low