1730797 - (pool-party) Using capped, unpartitioned thread-pools for cross-site and / or cross-profile communication

Regarding the need to keep this issue under wraps: we are under the impression that global limit (resource cap) attacks are already conceptually known, as is the fact that various channels in the browser or OS could enable them, e.g. https://xsleaks.dev/docs/attacks/timing-attacks/connection-pool/. Would you agree with that? Is there anything especially sensitive about this issue that puts users at higher risk compared to other known attacks, including ones outside of global limits, such as fingerprinting?
What is the state on other browsers? It's hard to imagine that Firefox is the only browser vulnerable to these types of attacks. Are there any mitigations in e.g. Brave that would not make it vulnerable to (certain) global limit attacks?
Have you studied the practical application of this attack when dealing with interference from, say, 10 other tabs open in the background and connecting to various services? Does the use of WebSockets make this channel more robust against interference because they are less frequently used?

Thanks!

Flags: needinfo?(sor.karami)

Flags: needinfo?(snyderp)

Flags: needinfo?(jhofmann)

Peter Snyder

Reporter

Comment 8

•

4 years ago

Howdy!

Yep, the Chrome team brought that link to our attention too, which was something we'd missed. There are a couple of differences here though.
a. using the pool limits as the communication channel itself, instead of as a way of leaking a timing signal, means that in most cases the technique will be more robust, and not mitigated by techniques aimed at timing attacks (fuzzing or restricting timers)
b. the issue isn't limited to explicit network connection pools, but covers the broader category of cases where a pool is used as an implementation detail. The DNS case is one such example; the global locks on things like the full screen API and speech synthesis API are other examples
c. the bandwidth on the WebSocket case is high enough that the attack is trivial and real world (to the degree we think popular trackers might actually do it), which isn't the case for many timing side channels
d. Mostly though, speaking casually, I don't buy the "a determined attacker can always do this bc there are limited system resources at the end of the day" punt. That's true in principle, but in practice, web-scale trackers aren't going to bring an application or machine to its knees by, say, exhausting all network or file sockets the OS allows, or exhausting all disk space, just to pass an identifier across a partition boundary (setting aside that system noise would make that difficult). But web trackers absolutely might play games with a low-cap, application-managed pool that's unlikely to have side effects, like some of the ones we identified (WS, SSE, etc)
Every browser we tested is vulnerable to some attacks in this category. We found the most cases in Gecko, but a similar number in Blink. Two notes though:
a. WebKit has the fewest, mostly bc most of their pools have no caps. So if you wanted to, say, do the WebSocket attack on WebKit, you'd have to keep opening up connections until you hit some system limit, and that makes the attack impractical for lots of other reasons
b. Gecko has the unique problem of having an application-wide cap on these pools, which allows communication across profiles, including private windows and non-private windows. Blink seems to have per-profile limits, which prevents the cross-profile leaks
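
As a rough illustration of point (a) above (the pool cap itself as the channel, no timers involved), here is a toy simulation. It is only a sketch under stated assumptions: `CappedPool`, `send_bit`, `receive_bit` and `transmit` are hypothetical names, the cap of 200 is an assumed value, and nothing here is the PoC attached to this bug or any browser API.

```python
class CappedPool:
    """Toy stand-in for a browser-wide, unpartitioned pool (hypothetical)."""

    def __init__(self, cap):
        self.cap = cap
        self.in_use = 0

    def acquire(self):
        """Take one slot; return False if the pool is already exhausted."""
        if self.in_use >= self.cap:
            return False
        self.in_use += 1
        return True

    def release(self, n=1):
        """Give back n slots, never dropping below zero."""
        self.in_use = max(0, self.in_use - n)


def send_bit(pool, bit, held):
    """Sender site: exhaust the pool to signal 1, free its slots to signal 0."""
    if bit:
        while pool.acquire():
            held.append(1)
    else:
        pool.release(len(held))
        held.clear()


def receive_bit(pool):
    """Receiver site: probe with one slot; a refused probe reads as 1."""
    if pool.acquire():
        pool.release()
        return 0
    return 1


def transmit(bits, cap=200):
    """Run sender and receiver in lockstep over one shared pool.

    cap=200 is an assumed value for illustration only.
    """
    pool = CappedPool(cap)
    held = []
    received = []
    for bit in bits:
        send_bit(pool, bit, held)
        received.append(receive_bit(pool))
    pool.release(len(held))  # sender cleans up after the last bit
    return received
```

The receiver never measures time here; it only observes whether its own probe is refused, which is why timer fuzzing would not help. Real pages would also need to agree on when each bit starts, which this lockstep loop glosses over.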

We're not sure what we're going to do in Brave. If Chromium folks agree it's a problem they want to tackle, then we'll work with upstream. If not, then we'll probably remove caps, or partition pools, for at least the highest bandwidth channels, and possibly prevent some of these kinds of connections altogether in Tor tabs (none of this is decided, this is just my team's rough thoughts for plan b).

The main difficulty is that Brave is the only main chromium browser doing any kind of default partitioning for all users (storage or network state, and we'll soon do both). Since most Chromium browsers explicitly allow cross site communication, we might in the medium term have to go it alone. C'est la vie ¯_(ツ)_/¯

Yep, we're working on a paper that includes exactly those measurements. Different factors, like how much page-noticeable perf cost you assume an attacker would tolerate, affect which forms of the attack are practical. But, at the very least, the WebSocket one is extremely practical, bc
i) the pool is large, so high bandwidth, enough so that you can design a communication protocol to be robust against some noise
ii) the typical site on the web uses zero websockets (so relatively stable channel)

When randomly crawling the Alexa 5000 (IIRC) we found almost zero (maybe literally zero) contention for the WS channel.

The SSE version of the attack has #2 (ii) but not #1 (i), the DNS channel has #1 but not #2, etc. I'd guess having #2 is more important than #1 from an attacker's perspective

Hope that helps, very happy to share more if it's helpful

Flags: needinfo?(snyderp)

Daniel Veditz [:dveditz]

Comment 10

•

4 years ago

Is this report intended to be embargoed until some coordinated disclosure?

Group: core-security → core-security-release

Flags: needinfo?(snyderp)

Keywords: sec-audit

Whiteboard: [reporter-external] [client-bounty-form] [verif?] → [embargo maybe?][reporter-external] [client-bounty-form] [verif?]

Martin Thomson [:mt:]

Comment 11

•

4 years ago

I would want to keep this bug marked security until we have a mitigation plan in place at a minimum. I'm sure that other browsers would also. Though these types of attack are known to be possible, there is running code attached to this bug.

It seems like per-partition limits are worth investigating further as a mitigation. Aside from proxied HTTPS requests, there are no practical reasons for a server to need a large number of websockets or SSE channels. Small limits could make sense for those (~200 seems excessive). Capping DNS queries seems like it might be tricky, but it is also worthwhile. Any protection we apply could be undermined if OS-level limits are independently applied. It would require greater server-side coordination if OS limits are greater than per-partition limits.
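
To sketch why per-partition limits would close the channel, here is a toy model. It is a minimal sketch, not Gecko code: `PartitionedPool`, `receiver_sees_exhaustion`, the partition keys and the cap of 8 are all hypothetical, and OS-level limits (the caveat above) are not modeled.

```python
from collections import defaultdict


class PartitionedPool:
    """Toy pool whose cap applies per partition key (hypothetical)."""

    def __init__(self, per_partition_cap):
        self.cap = per_partition_cap
        self.in_use = defaultdict(int)  # partition key -> slots taken

    def acquire(self, partition):
        """Take a slot in this partition; False once the partition is full."""
        if self.in_use[partition] >= self.cap:
            return False
        self.in_use[partition] += 1
        return True

    def release(self, partition):
        """Give back one slot in this partition, if any is held."""
        if self.in_use[partition] > 0:
            self.in_use[partition] -= 1


def receiver_sees_exhaustion(pool, sender_partition, receiver_partition):
    """Sender fills its partition; report whether the receiver's probe fails."""
    while pool.acquire(sender_partition):
        pass
    probe_ok = pool.acquire(receiver_partition)
    if probe_ok:
        pool.release(receiver_partition)
    return not probe_ok
```

With distinct keys (say, one per top-level site) the receiver's probe succeeds no matter what the sender does, so the sender's state is not observable. With a shared key the model degenerates to the global cap and the probe fails.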

Peter Snyder

Reporter

Comment 12

•

4 years ago

:dveditz, I think your question was to me? (apologies if I'm reading that wrong, it's been a while since I've used Bugzilla).

Assuming I got that right, we plan on writing this up as part of a publication submission. Current hope was to submit this on October 12, but (assuming we hit that deadline) we can always request the paper be kept private during review as well.

Flags: needinfo?(snyderp)

Peter Snyder

Reporter

Comment 13

•

4 years ago

Hi folks, as mentioned, we've submitted our paper around this topic. The paper does not include the POC examples or code shared above (though it does include a description of the algorithm used).

The paper will be under review until March 4th, though we plan on publishing a pre-print of the paper, along with a blog post in the coming week or two, unless other vendors are working on a fix and would like us to briefly delay. (We're also happy to discuss more about posting the pre-print and blog post if you all think this would be harmful to your efforts. We'd like to share ASAP, but we don't want to do so in a way that would avoidably make anyone's lives, or anyone's users' lives, unpleasant).

FWIW, Brave plans on addressing at least the highest bandwidth channels by capping and partitioning. We're certain there will be ones we don't address, but we're less concerned about those bc they're either harder to exploit (eg. limits on DNS messages in flight) or lower bandwidth (eg. things like handles to the system text -> speech process).

Peter Snyder

Reporter

Comment 14

•

4 years ago

Attached file pool-party.pdf — Details

Peter Snyder

Reporter

Comment 15

•

4 years ago

Apologies for the dupe messages; just wanted to note that since the paper is under review, please do not circulate outside of your team(s). Thanks!

Andrew McCreight [:mccr8]

Comment 16

•

4 years ago

A blog post and paper about this issue are public now.

Peter Snyder

Reporter

Comment 17

•

4 years ago

I hope the above isn't a surprise or concern. We asked ~two weeks ago if your team had any objections and heard nothing

Bobby Holley (:bholley)

Comment 18

•

4 years ago

I'd like to thank Pete and the rest of the team for exploring what's possible in this space and thoughtfully coordinating with other vendors to ensure no one was surprised.

We do plan on migrating to partitioned limits, but we're not particularly concerned about these attacks in practice. Since the technique is neither persistent (it requires the two sites to be open at the same time) nor adversarial (it requires the two sites to run colluding code) we think the ROI on productizing and deploying it is probably quite low relative to other available options, and expect trackers and threat actors to come to the same conclusion.

Bobby Holley (:bholley)

Updated

•

4 years ago

Flags: needinfo?(sor.karami)

Bobby Holley (:bholley)

Updated

•

4 years ago

Whiteboard: [embargo maybe?][reporter-external] [client-bounty-form] [verif?] → [reporter-external] [client-bounty-form] [verif?]

Bobby Holley (:bholley)

Comment 20

•

4 years ago

Now that the paper and blog post are public, I'm opening up this bug. I scrubbed the details that aren't ours to share, including the PoC code, though the paper is clear enough about the algorithm that I wouldn't consider it particularly sensitive at this point.

Bobby Holley (:bholley)

Updated

•

4 years ago

Group: core-security-release

Tim Huang[:timhuang]

Updated

•

4 years ago

Severity: -- → S3

Priority: -- → P3

Anne (:annevk)

Updated

•

4 years ago

Blocks: xs-leaks

Anne (:annevk)

Updated

•

4 years ago

See Also: → https://bugs.chromium.org/p/chromium/issues/detail?id=1249658

Arthur Edelstein

Comment 22

•

3 years ago

Hi all -- I'm sharing an update on the pool party attack (the latest version is here: https://arxiv.org/abs/2112.06324)

In this new version of the paper, we managed to speed up the attack using the WebSocket pool in Firefox: it is now able to send 35 bits from one site to another (in open tabs) in 7.0 seconds. We have also developed a demonstration of the pool party attack using the Web Worker pool (whose global limit is 512 Web Workers), which takes 9.0 seconds to send 35 bits. (We chose 35 bits as a sufficiently large bit vector to enable the tracking of a unique client between websites.)
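
For a sense of scale, the figures above work out as follows. This is just arithmetic on the numbers quoted in this comment; the helper name `bits_per_second` is made up for the example.

```python
BITS = 35  # message length quoted above

# Seconds needed to send BITS bits over each channel, as quoted above.
SECONDS_PER_MESSAGE = {"WebSocket": 7.0, "Web Worker": 9.0}


def bits_per_second(bits, seconds):
    """Average throughput of one message transfer."""
    return bits / seconds


# Number of distinct identifiers a 35-bit vector can encode.
DISTINCT_IDS = 2 ** BITS

if __name__ == "__main__":
    for channel, seconds in SECONDS_PER_MESSAGE.items():
        print(f"{channel}: {bits_per_second(BITS, seconds):.2f} bits/s")
    print(f"{DISTINCT_IDS:,} possible identifiers")
```

That is 5 bits per second over WebSockets and a little under 4 over Web Workers, with roughly 34 billion distinct values available in 35 bits.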

In the case of Firefox, the paper describes how both of these attacks are not 100% successful. That is likely because of race conditions in the WebSocket and Web Worker implementations. However, message sending has a high enough success rate that repeating the message a few times is almost guaranteed to ensure the correct message is passed between websites.
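
A back-of-the-envelope sketch of why a few repeats go a long way, assuming each send succeeds independently with the same probability. The 0.9 used in the usage note is a made-up placeholder, not a rate from the paper, and `majority_message` and `p_majority_correct` are hypothetical helpers, not the paper's protocol.

```python
from collections import Counter
from math import comb


def majority_message(copies):
    """Receiver side: keep the message value seen most often."""
    return Counter(copies).most_common(1)[0][0]


def p_majority_correct(p_success, repeats):
    """Probability that more than half of the repeats arrive intact.

    Assumes independent sends with equal success probability
    (an assumption for illustration, not a measured property).
    """
    needed = repeats // 2 + 1
    return sum(
        comb(repeats, k) * p_success**k * (1 - p_success) ** (repeats - k)
        for k in range(needed, repeats + 1)
    )
```

For example, with an assumed per-message success rate of 0.9, `p_majority_correct(0.9, 3)` is about 0.972 and `p_majority_correct(0.9, 5)` is about 0.991, while `majority_message(["10110", "10110", "00110"])` picks `"10110"`.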

We're happy to provide any additional needed information that would help in resolving this issue.

Brave has already capped and partitioned WebSockets per-eTLD+1; we're looking into how to do the same for Server-Sent Events, which are vulnerable to the pool-party attack in Chromium-based browsers (but, as far as we can see, not Firefox).

Frederik Braun [:freddy]

Comment 23

•

3 years ago

Looks like this was accepted to a highly esteemed conference. Congratulations to the authors.
Is there anything else we need to be doing, still?

Alias: pool-party

Flags: needinfo?(tihuang)

Tim Huang[:timhuang]

Comment 24

•

3 years ago

We implemented WebSocket connection partitioning in Bug 1713748, which partitions the WebSocket connection queue by originAttributes. I guess this doesn't effectively protect against the pool-party attack, since that attack is based on the global limit. We will look into possible solutions or mitigations. We will probably use the same partitioning approach as Brave or, as an alternative, we can look into introducing noise into the global pool to mitigate the attack.

Flags: needinfo?(tihuang)

Giorgio Maone [:ma1]

Comment 25

•

2 years ago

•

Edited

What's the state of the art here now?
My understanding is that we're partitioning WebSocket connections, but there's still a global limit, and more importantly we still don't partition thread pools?
And how concerning in practice is the general problem of globally capped resources in the PBM-only scenario of the Tor Browser?

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Flags: sec-bounty? → sec-bounty-

David Lawrence [:dkl]

Updated

•

1 year ago

Keywords: reporter-external