Closed Bug 1558932 Opened 5 years ago Closed 3 years ago

Determine how storage and communications should be scoped in the third-party context

Categories

(Core :: Privacy: Anti-Tracking, defect, P2)

RESOLVED WORKSFORME

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

The existing first party isolation feature uses eTLD+1 as its scope for third-party content.

Currently we use the same scheme for dynamic first party isolation, but we have yet to decide whether that is the exact scheme we would like to use.

This bug is filed to track that discussion as well as any possible implementation work.

Also, as part of this we should probably discuss how to handle top-level documents that do not come from a URL with a registrable domain name. Examples include IP addresses used as host names, intranet/loopback domain names (localhost, company.corp), etc.

If we want to use the registrable domain as the key, I'd expect an algorithm that takes an origin origin and returns a key as follows:

  1. If origin is an opaque origin, then return origin.
  2. If origin's host's registrable domain is null, then return origin.
  3. Return (origin's scheme, origin's host's registrable domain).

That matches fission (and Safari?) I believe. However, there are a number of efforts underway (largely Chrome-led) to push things toward origin isolation. All things being equal, not depending on the Public Suffix List is probably better.
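
For illustration only, here is a minimal TypeScript sketch of that keying algorithm. The getRegistrableDomain helper is a toy stand-in for a Public Suffix List lookup, not an existing API, and the Origin type is a simplified model rather than Gecko's nsIPrincipal machinery:

    // Toy model of an origin.
    type Origin =
      | { opaque: true; value: string }
      | { opaque: false; scheme: string; host: string };

    type SiteKey = Origin | { scheme: string; registrableDomain: string };

    // Stand-in for a Public Suffix List lookup: returns null when the host has
    // no registrable domain (IP literals, single-label intranet/loopback names).
    function getRegistrableDomain(host: string): string | null {
      if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host) || !host.includes(".")) return null;
      return host.split(".").slice(-2).join("."); // real code must consult the PSL
    }

    function siteKeyFor(origin: Origin): SiteKey {
      if (origin.opaque) return origin;                          // step 1
      const registrableDomain = getRegistrableDomain(origin.host);
      if (registrableDomain === null) return origin;             // step 2
      return { scheme: origin.scheme, registrableDomain };       // step 3
    }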

Priority: -- → P2

It seems that Chrome is planning to use the triple-keying approach based on top-level/frame origin isolation. I suggest using a similar approach here. Anne, what do you think?

Flags: needinfo?(annevk)
Assignee: nobody → xeonchen

I discussed this briefly with Gary (and we should probably have a larger discussion about how this intersects with bug 1590107).

As Gary and I understand that document, Chrome wants the key to be the combination of the top-level and the respective frame. My intuition was the combination of the top-level and second-level frame.

Structure Google Key Anne Key
A         A          A
|-B       A+B        A+B
| |-C     A+C        A+B
|-C       A+C        A+C

(A has nested B and C, B has nested C.)

A gets to attack B and C's caches either way. Chrome's setup would allow second-level B to attack the caches of second-level C. My setup would allow third-level C to attack B.

If B is an ad and C is a bank, my setup seems preferable. (Edit: filed https://github.com/shivanigithub/http-cache-partitioning/issues/2 to see what Chrome thinks.)
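
A small sketch of the two policies as laid out above, purely illustrative; chain is the list of sites from the top-level document down to the frame in question:

    function googleKey(chain: string[]): string[] {
      // top-level site + the frame's own site
      return chain.length > 1 ? [chain[0], chain[chain.length - 1]] : [chain[0]];
    }

    function anneKey(chain: string[]): string[] {
      // top-level site + the site of the second-level frame it sits under
      return chain.length > 1 ? [chain[0], chain[1]] : [chain[0]];
    }

    // C nested in B nested in A:
    //   googleKey(["A", "B", "C"]) -> ["A", "C"]
    //   anneKey(["A", "B", "C"])   -> ["A", "B"]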

(Aside, per the latest comment in https://github.com/whatwg/fetch/issues/904 Chrome plans to use site rather than origin again.)

Flags: needinfo?(annevk)

Tim will be working on this.

Assignee: xeonchen → tihuang

In this bug, we want to add a separate cache key to all cache mechanisms to avoid side-channel attacks. We think the approaches mentioned in comment 3 are not sufficient; both of them still allow attacks in certain situations. So, we believe the cache key should take the whole tree structure into account, like this:

Structure Cache Key
A         A          
|-B       A+B        
| |-C     A+B+C
|-C       A+C

Right now, we use OriginAttributes to partition caches as well as storage. In this bug, we want to add a separate cache key to OAs that partitions caches without affecting how storage is partitioned. To achieve this, we propose adding extended fields to OAs that are only included when explicitly requested in certain situations. For this bug, we would add an extended field 'CacheKey' to OAs and attach it for caches. This means we would need two versions of OriginAttributes::CreateSuffix(): one that adds extended fields and one that doesn't. We would also need to modify all call sites that need extended fields, like caches. I think this 'CacheKey' should be used in both the NodePrincipal and the StoragePrincipal, since this should be an opt-out protection, and we would put it behind a pref first.
(Note that the extended field of OAs doesn't have to be the same across a DOM tree.)
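
To make the two-variant CreateSuffix() idea concrete, here is a conceptual TypeScript sketch only; the real OriginAttributes is C++ in Gecko, and the cacheKey field and withExtendedFields flag below are assumptions for illustration, not existing APIs:

    interface OriginAttributes {
      userContextId: number;
      privateBrowsingId: number;
      partitionKey: string;
      cacheKey?: string; // extended field, only serialized when asked for
    }

    function createSuffix(oa: OriginAttributes, withExtendedFields: boolean): string {
      const parts = [
        `userContextId=${oa.userContextId}`,
        `privateBrowsingId=${oa.privateBrowsingId}`,
        `partitionKey=${oa.partitionKey}`,
      ];
      // Cache consumers request the extended suffix; storage consumers do not,
      // so the cache key never changes how storage is partitioned.
      if (withExtendedFields && oa.cacheKey) {
        parts.push(`cacheKey=${oa.cacheKey}`);
      }
      return "^" + parts.join("&");
    }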

The cache key would be populated like this: during a top-level load, we add the first cache key to the OA, which is [top-level origin]. When the top-level document loads an iframe, we take the current cache key from the top level's OAs and mix in the origin of the iframe we are going to load, so the cache key of the second-level iframe is [top-level origin + second-level origin]. We do the same thing when the second-level iframe loads a third-level iframe: the cache key becomes [top-level + second-level + third-level], and so on as the nesting gets deeper.

So, the process looks like this: extract the current cache key, mix it with the origin we are going to load at the next level, and use the result as the cache key of that level. There are three places where we need to do this:

  • Loading a top-level document
  • Loading a sub-document (like an iframe)
  • Redirects (we need to use the final origin in the cache key)

Given the recent Fission change, we should not expose cross-origin information across processes. So, the cache key itself should not reveal the tree structure. Thus, we need to use a one-way hash (SHA-256, maybe?) to hide the real origins in the cache key.
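
A minimal sketch of the propagation and hashing described above; the function names are illustrative, not existing Gecko APIs, and a bare unsalted hash is still guessable (see the keyed-hash discussion later in this bug):

    import { createHash } from "node:crypto";

    function hashSite(site: string): string {
      // One-way hash so the key doesn't reveal the tree structure across
      // processes; in practice a secret key would be mixed in as well.
      return createHash("sha256").update(site).digest("hex");
    }

    // Loading a top-level document: the key starts with the top-level origin.
    function topLevelCacheKey(topLevelSite: string): string[] {
      return [hashSite(topLevelSite)];
    }

    // Loading a sub-document (or following a redirect, using the final origin):
    // take the parent's key and mix in the next level's origin.
    function childCacheKey(parentCacheKey: string[], childSite: string): string[] {
      return [...parentCacheKey, hashSite(childSite)];
    }

    // A embeds B, B embeds C:
    //   A     -> [h(A)]
    //   A>B   -> [h(A), h(B)]
    //   A>B>C -> [h(A), h(B), h(C)]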

Ehsan and Anne, am I missing something here, or do you have any suggestions? Thanks.

Flags: needinfo?(ehsan)
Flags: needinfo?(annevk)

We should also think about how to sanitize caches if we introduce this new cache key. The cache key won't be known without the tree structure, and we may not know the tree structure when clearing caches. But this may not be a problem for us because we clear caches by host.

Do all cache lookups go through the parent process? In that case there might not be a need for the content process to keep track of the origins (well, sites), right? That would also prevent spoofing in case of a compromised content process (assuming more hardening arrives at some point in the future).

(I also recall Andrea suggesting we might not need OriginAttributes for this, but that was near the end of the meeting so maybe I misunderstood.)

Flags: needinfo?(annevk) → needinfo?(amarchesini)

(In reply to Anne (:annevk) from comment #7)

Do all cache lookups go through the parent process? In that case there might not be a need for the content process to keep track of the origins (well, sites), right? That would also prevent spoofing in case of a compromised content process (assuming more hardening arrives at some point in the future).

This could be an even better approach since it is simpler and we don't have to mess with OriginAttributes. However, I have two concerns.

  • Do we access all caches in the parent process? I believe we access the network cache in the parent process, but I don't know if this is true for DOMCache or other caches.
  • How do we clear the cache for a given site? It seems to me that we cannot clear the cache for a site unless we record all possible cache keys for that site. So, we might end up clearing all caches instead of just one site's when doing 'Forget about site'.

Requests for quota manager data go through the parent process, so that ought to be okay.

As for clearing, if you consider the cache key an ordered list of sites (top-to-bottom), and a user wants to clear site A, you'd clear all data for which key[0] is A. That model makes the most sense to me as it also puts all the resource usage blame on the top-level site, the one the user can see in the address bar and presumably wanted to visit or at least ended up at somehow.
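
A sketch of that clearing model, assuming the cache key is stored as an ordered list of sites (illustrative only; the entry map stands in for whatever cache index the real code uses):

    type CacheKey = string[]; // e.g. ["a.example", "b.example"], top-level first

    function shouldClearForSite(key: CacheKey, siteToClear: string): boolean {
      // Clearing site A removes every entry whose top-level component is A.
      return key[0] === siteToClear;
    }

    function clearSite(entries: Map<string, CacheKey>, siteToClear: string): void {
      for (const [entryId, key] of entries) {
        if (shouldClearForSite(key, siteToClear)) {
          entries.delete(entryId);
        }
      }
    }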

The image and font caches live in the content process right now... And I think in general we may find other similar caches in the content process so it's better to make fewer assumptions about what process things live in.

About storing a list of sites, I agree with Anne's suggestion. How about a scheme like this:

  • let salt = generateUUID();, salt is a value that is never exposed to content processes.
  • storeInProfile(salt);
  • let siteHash = site => sha1(salt + site);
  • For each level of browsing context embedding, append siteHash(site) to the cacheKey.
  • When clearing site data, in the parent process, look at cacheKey and break apart each component, compare it against siteHash(site).

I'd still be very interested to hear baku's feedback on all of this...

Anne, two questions for you:

Flags: needinfo?(ehsan) → needinfo?(annevk)

(In reply to :ehsan akhgari from comment #10)

The image and font caches live in the content process right now... And I think in general we may find other similar caches in the content process so it's better to make fewer assumptions about what process things live in.

About storing a list of sites, I agree with Anne's suggestion. How about a scheme like this:

  • let salt = generateUUID();, salt is a value that is never exposed to content processes.
  • storeInProfile(salt);
  • let siteHash = site => sha1(salt + site);
  • For each level of browsing context embedding, append siteHash(site) to the cacheKey.
  • When clearing site data, in the parent process, look at cacheKey and break apart each component, compare it against siteHash(site).

I'd still be very interested to hear baku's feedback on all of this...

One small question here: would the cache key become too long as the tree gets deeper?

I thought the image cache is per-document, but yeah, the font cache would be problematic. If read gadgets become commonplace due to Spectre, this architecture seems problematic, though it won't be the most problematic thing we have to deal with.

I'm not really sure what Chrome's rationale is, exactly; I asked for further clarification. Since they are using sites (and Safari is as well), I think we should too, though it would be nice if the design allows for making this stricter down the line.

Flags: needinfo?(annevk)
  1. I think putting the entire tree structure into the cache key makes sense. (I also don't understand Chrome's documentation for the triple key stuff.)

  2. I am also uncertain how we will be able to clear caches if we turn the cache key into a cryptographic hash instead of domain strings. The proposal in comment 9 would work in some situations; but if I am on example.com with a logged-in iframe for mail.com, it might be unintuitive that going to mail.com and 'forgetting it' does not log me out.

  3. The potentially infinite length of the cache key if a site tries to DoS Firefox by recursively embedding seems concerning. But is the cache key the first thing that would break in that case? :)

  4. I think that this behavior would strengthen First Party Isolation, and Tor would be happy with it, and thus we would not need to maintain backwards compatibility for them.

  5. The notion of different behavior for caches and storage concerns me. It seems like there could be cross-site attacks that could happen when example.com embeds mail.com and ad.com; and ad.com embeds mail.com. The two mail.com iframes will share storage. This is currently the situation; yes - but if we are going to separate caches this way - what is the argument for not separating storage? Do we think we're going to break something?

  6. The crypto proposed in comment 10 is not safe; sorry Ehsan :) When mixing in a secret, use HMAC, not a simple append/prepend of the secret. SHA-1 is insecure. The proposal to use a global parent-process secret to obscure origins could work, as long as we never expose the ability for a content process to submit an arbitrary origin and get back the obscured value (aka a query oracle) - that would enable a simple guessing attack. If we end up using crypto for this solution, we should get it double-checked by the crypto team (and I'd like to look at it too) - trying to take shortcuts (using a CRC instead of a hash function, skipping HMAC, etc.) could lead to subtle bypasses.
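
A sketch of the keyed-hash variant described in point 6, assuming a per-profile secret that lives only in the parent process; the salt handling here is illustrative, and real code would persist the secret in the profile and route all hashing through the parent:

    import { createHmac, randomBytes } from "node:crypto";

    // Parent-process only: a per-profile secret, generated once and persisted.
    const salt: Buffer = randomBytes(32);

    function siteHash(site: string): string {
      // HMAC-SHA-256 with the secret as the key, instead of sha1(salt + site).
      return createHmac("sha256", salt).update(site).digest("hex");
    }

    // The parent must never let a content process submit an arbitrary origin and
    // read back siteHash(origin), or this becomes a query oracle for guessing.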

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #14)

  2. I am also uncertain how we will be able to clear caches if we turn the cache key into a cryptographic hash instead of domain strings. The proposal in comment 9 would work in some situations; but if I am on example.com with a logged-in iframe for mail.com, it might be unintuitive that going to mail.com and 'forgetting it' does not log me out.

If I understand comment 9 correctly, mail.com will be the first level of embedding on example.com in your example, so any cache key that looks like "mail.com" or "mail.com^foo.com" etc on example.com will be forgotten when forgetting about mail.com. That should result in a similar user experience to today. (We can even look at frames at other levels of the cache key besides just the first level.)

  3. The potentially infinite length of the cache key if a site tries to DoS Firefox by recursively embedding seems concerning.

There's no opportunity for an infinite length; the maximum depth for content iframes is 10. That still creates a long cache key, but there is an upper bound, thankfully.

But is the cache key the first thing that would break in that case? :)

  4. I think that this behavior would strengthen First Party Isolation, and Tor would be happy with it, and thus we would not need to maintain backwards compatibility for them.

Fantastic! Do we need to worry about porting over any data?

  5. The notion of different behavior for caches and storage concerns me. It seems like there could be cross-site attacks that could happen when example.com embeds mail.com and ad.com; and ad.com embeds mail.com. The two mail.com iframes will share storage. This is currently the situation; yes - but if we are going to separate caches this way - what is the argument for not separating storage? Do we think we're going to break something?

I think the argument is more that the caches are possible to probe cross-origin. Perhaps Anne can say more. (I do see the possibility of switching storageKey to be the same thing as cacheKey at some point, FWIW.)

  6. The crypto proposed in comment 10 is not safe; sorry Ehsan :) When mixing in a secret, use HMAC, not a simple append/prepend of the secret. SHA-1 is insecure. The proposal to use a global parent-process secret to obscure origins could work, as long as we never expose the ability for a content process to submit an arbitrary origin and get back the obscured value (aka a query oracle) - that would enable a simple guessing attack. If we end up using crypto for this solution, we should get it double-checked by the crypto team (and I'd like to look at it too) - trying to take shortcuts (using a CRC instead of a hash function, skipping HMAC, etc.) could lead to subtle bypasses.

You're completely right, my bad. I was mostly trying to sketch an idea there but I should have been more clear. I agree with everything you said above.

(In reply to :ehsan akhgari from comment #15)

  4. I think that this behavior would strengthen First Party Isolation, and Tor would be happy with it, and thus we would not need to maintain backwards compatibility for them.

Fantastic! Do we need to worry about porting over any data?

That depends. Do you want to annoy all the people who have FPI turned on in Firefox, or not? Tor doesn't need porting; but I and everyone else using FPI with Firefox would probably a) lose all of our cookies and b) orphan all the storage present in our browsers under cache keys that will never be accessed and probably never deleted unless we delete and re-create our profile. There are many vocal people using FPI...

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #16)

(In reply to :ehsan akhgari from comment #15)

  4. I think that this behavior would strengthen First Party Isolation, and Tor would be happy with it, and thus we would not need to maintain backwards compatibility for them.

Fantastic! Do we need to worry about porting over any data?

That depends. Do you want to annoy all the people who have FPI turned on in Firefox, or not? Tor doesn't need porting; but I and everyone else using FPI with Firefox would probably a) lose all of our cookies and b) orphan all the storage present in our browsers under cache keys that will never be accessed and probably never deleted unless we delete and re-create our profile. There are many vocal people using FPI...

Fair enough. I was asking because under this plan there is nothing we can do to preserve people's cache data. For storage data we can do something, depending on how much work we're willing to put into it. I think it's totally reasonable to port the cookie manager data. Porting all web storage data sounds a lot less reasonable to me... If you were to propose which data to port, what else (besides cookies) would you put on the list?

(In reply to :ehsan akhgari from comment #17)

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #16)

(In reply to :ehsan akhgari from comment #15)

  4. I think that this behavior would strengthen First Party Isolation, and Tor would be happy with it, and thus we would not need to maintain backwards compatibility for them.

Fantastic! Do we need to worry about porting over any data?

That depends. Do you want to annoy all the people who have FPI turned on in Firefox, or not? Tor doesn't need porting; but I and everyone else using FPI with Firefox would probably a) lose all of our cookies and b) orphan all the storage present in our browsers under cache keys that will never be accessed and probably never deleted unless we delete and re-create our profile. There are many vocal people using FPI...

Fair enough. I was asking because under this plan there is nothing we can do to preserve people's cache data. For storage data we can do something, depending on how much work we're willing to put into it. I think it's totally reasonable to port the cookie manager data. Porting all web storage data sounds a lot less reasonable to me... If you were to propose which data to port, what else (besides cookies) would you put on the list?

Yeah, I can't see blowing away the cache being a problem. Cookies seem like the most important. I don't have great insight into what else is being used, TBH (partly because of bugs like bug 1583891), but I don't think anything else is that important? Most other technologies seem like opportunistic improvements to the site (e.g. caching stuff in localStorage or offline experiences with service workers). Maybe non-exportable Web Crypto keys?

(In reply to :ehsan akhgari from comment #15)

  5. The notion of different behavior for caches and storage concerns me. It seems like there could be cross-site attacks that could happen when example.com embeds mail.com and ad.com; and ad.com embeds mail.com. The two mail.com iframes will share storage. This is currently the situation; yes - but if we are going to separate caches this way - what is the argument for not separating storage? Do we think we're going to break something?

I think the argument is more that the caches are possible to probe cross-origin. Perhaps Anne can say more. (I do see the possibility of switching storageKey to be the same thing as cacheKey at some point, FWIW.)

Yes, but there are still attacks that can be done when storage is only first-party isolated instead of frame-tree isolated. E.g. the window.length trick, which can detect whether you're logged in on a site that serves different frames when logged in vs. logged out, and the similar onload/onerror detection. It just seems like we already have indicators of some mildly concerning stuff, and I have to imagine more inventive attacks are possible, so we should just bite the bullet and do it (more) securely the first time. /shrug

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #18)

(In reply to :ehsan akhgari from comment #15)

  5. The notion of different behavior for caches and storage concerns me. It seems like there could be cross-site attacks that could happen when example.com embeds mail.com and ad.com; and ad.com embeds mail.com. The two mail.com iframes will share storage. This is currently the situation; yes - but if we are going to separate caches this way - what is the argument for not separating storage? Do we think we're going to break something?

I think the argument is more that the caches are possible to probe cross-origin. Perhaps Anne can say more. (I do see the possibility of switching storageKey to be the same thing as cacheKey at some point, FWIW.)

Yes, but there are still attacks that can be done when storage is only first-party isolated instead of frame-tree isolated. E.g. the window.length trick, which can detect whether you're logged in on a site that serves different frames when logged in vs. logged out, and the similar onload/onerror detection. It just seems like we already have indicators of some mildly concerning stuff, and I have to imagine more inventive attacks are possible, so we should just bite the bullet and do it (more) securely the first time. /shrug

Does using cacheKey for everything that we are currently thinking of using storageKey for fix all known existing xsleaks, though? For example, it doesn't fix the frames.length or onload/onerror leaks you're referring to, because as far as I understand those leaks are available through opener references as well as through embedded iframes. If my understanding is correct, each one of those attacks requires mitigations specific to the vector used to read the cross-origin data...

(In reply to :ehsan akhgari from comment #19)

Does using cacheKey for everything that we are currently thinking of using storageKey for fix all known existing xsleaks, though? For example, it doesn't fix the frames.length or onload/onerror leaks you're referring to, because as far as I understand those leaks are available through opener references as well as through embedded iframes. If my understanding is correct, each one of those attacks requires mitigations specific to the vector used to read the cross-origin data...

Perhaps right.

The opener bit is certainly true. Tor has theorized about treating things created via window.open as if they were framed by the originating first party as a way to fix broken login flows. That would inadvertently also solve the .length bit there because you wouldn't be logged in. Bit like using a sledgehammer for a finish nail though, and it'd break a bunch of other stuff probably.

The image onload/onerror tricks, however, would be fixed. I think. Presently, with no FPI:
a) ad.com is visited as a first party, and uses onload/onerror tricks to figure out if you're signed into target.com OR
b) example.com is visited as a first party, and frames ad.com, and ad.com uses the same trick.

With FPI:
a) ad.com is visited as a first party, and cannot use onload/onerror tricks to figure out if you're signed into target.com, because ad.com||target.com's cookie jar is empty. OR
b) example.com is visited as a first party, and frames ad.com, and ad.com is blocked the same way.

With a Partial-Tree StorageKey:
a) example.com is visited as a first party, it frames target.com and ad.com. You login to target.com (in the frame). ad.com can now use the onload/onerror trick.

With a Full-Tree StorageKey:
a) example.com is visited as a first party, it frames target.com and ad.com. You login to target.com (in the frame). ad.com cannot use the onload/onerror trick.

I think.
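
A schematic of those scenarios in TypeScript, keying on sites only (illustrative; "partial tree" here means keying on the top-level site alone, as FPI does today):

    // example.com frames target.com and ad.com; you log in inside the
    // target.com frame, and ad.com then probes target.com with an image load.
    const targetFrameChain = ["example.com", "target.com"];
    const adFrameChain = ["example.com", "ad.com"];

    const partialTreeKey = (chain: string[]) => [chain[0]]; // top-level site only
    const fullTreeKey = (chain: string[]) => chain;         // whole ancestor chain

    partialTreeKey(targetFrameChain); // ["example.com"]
    partialTreeKey(adFrameChain);     // ["example.com"]  -> same key, so the
                                      // login cookies are visible to the probe
    fullTreeKey(targetFrameChain);    // ["example.com", "target.com"]
    fullTreeKey(adFrameChain);        // ["example.com", "ad.com"]  -> different
                                      // key, so the probe sees an empty jar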

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #20)

The opener bit is certainly true. Tor has theorized about treating things created via window.open as if they were framed by the originating first party as a way to fix broken login flows. That would inadvertently also solve the .length bit there because you wouldn't be logged in. Bit like using a sledgehammer for a finish nail though, and it'd break a bunch of other stuff probably.

Unrelated to this bug, but related to that, see this discussion.

The image onload/onerror tricks, however, would be fixed. I think. Presently, with no FPI:
a) ad.com is visited as a first party, and uses onload/onerror tricks to figure out if you're signed into target.com OR
b) example.com is visited as a first party, and frames ad.com, and ad.com uses the same trick.

With FPI:
a) ad.com is visited as a first party, and cannot use onload/onerror tricks to figure out if you're signed into target.com, because ad.com||target.com's cookie jar is empty. OR
b) example.com is visited as a first party, and frames ad.com, and ad.com is blocked the same way.

With a Partial-Tree StorageKey:
a) example.com is visited as a first party, it frames target.com and ad.com. You login to target.com (in the frame). ad.com can now use the onload/onerror trick.

With a Full-Tree StorageKey:
a) example.com is visited as a first party, it frames target.com and ad.com. You login to target.com (in the frame). ad.com cannot use the onload/onerror trick.

Yes, this is all true. In practice, however, two things are unclear to me. One is how often ad.com is framed without a script from ad.com also being embedded in the embedding browsing context. The second is how much web compatibility we'll be trading off with a full-tree storage key vs a triple-keyed storage key. Especially without the second data point, having a concrete discussion about the trade-offs is very difficult.

The web storage key's impact on the behaviour of web APIs is also quite observable, and that is an area where it would be really nice to have cross-browser agreement on our behaviour.

What we should probably do, IMO, is add a pref for turning on the full-tree behaviour for the storage key and then do some experimentation based on that. I doubt we can come to anything more concrete here (if you disagree please let me know; perhaps I'm not taking something into account that I should?).

BTW, bug 1612652 is also talking about the interaction of the storage and cache keys, and there asuth is suggesting that we should use the same key for both to keep things like workbox working (bug 1612652 comment 9).

Well, he's saying that service workers and the Cache API (this is not the HTTP cache but instead a JavaScript API for a similar concept that's origin-bound and quite similar to localStorage or some such) cannot use the cache key. That raises the question of whether a cache key that's stronger than the storage key (in that it has more components) is a worthwhile endeavor, however, especially for v1.

Status: NEW → ASSIGNED
Assignee: tihuang → nobody
Status: ASSIGNED → NEW

Given comment 24 it seems reasonable to close this bug now, any objections?

Flags: needinfo?(tihuang)
Flags: needinfo?(senglehardt)
Flags: needinfo?(annevk)

I don't object to closing this and focusing on shipping scoping to the top-level site. I do think we ought to look into the various HTTP cache attacks further and do more to protect framed documents, but that's a separate primarily security-focused effort.

Flags: needinfo?(annevk)

(In reply to Anne (:annevk) from comment #26)

I don't object to closing this and focusing on shipping scoping to the top-level site. I do think we ought to look into the various HTTP cache attacks further and do more to protect framed documents, but that's a separate primarily security-focused effort.

I feel the same way.

Flags: needinfo?(senglehardt)

Ditto

Flags: needinfo?(tihuang)

Note that relative to comment 0 we made one change to the scoping, which is to take the scheme into account as well: bug 1637516.

Filed bug 1681036 as follow-up for using additional keys for the HTTP cache for security reasons.

Resolving this as WORKSFORME per above comments.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(amarchesini)
Resolution: --- → WORKSFORME