WebSocketChannel accesses nsDocShell and nsDocument off the main thread

RESOLVED FIXED in Firefox 42, Firefox OS v2.2

Status

()

Core
Networking: WebSockets
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: mccr8, Assigned: michal)

Tracking

(Blocks: 1 bug, {csectype-race, regression, sec-high})

Trunk
mozilla43
csectype-race, regression, sec-high
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox39 unaffected, firefox40+ wontfix, firefox41 wontfix, firefox42+ fixed, firefox43 fixed, firefox-esr3845+ fixed, b2g-v2.0 wontfix, b2g-v2.0M wontfix, b2g-v2.1 wontfix, b2g-v2.1S wontfix, b2g-v2.2 fixed, b2g-v2.2r affected, b2g-master fixed)

Details

(Whiteboard: [post-critsmash-triage][adv-main42+][adv-esr38.7+], crash signature)

Attachments

(2 attachments)

(Reporter)

Description

3 years ago
Tracy pointed out that NS_CycleCollectorSuspect3 is the number #21 crash on Firefox 40 beta. I looked at a dozen of the crash reports and they all have stacks like this:

NS_CycleCollectorSuspect3
nsHTMLDocument::AddRef()
nsCOMPtr<nsNavHistoryQueryOptions>::nsCOMPtr<nsNavHistoryQueryOptions>(nsNavHistoryQueryOptions*)
nsDocShell::ShouldPrepareForIntercept(nsIURI*, bool, bool*)
mozilla::net::HttpBaseChannel::ShouldIntercept()
mozilla::net::nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*)
mozilla::net::WebSocketChannel::BeginOpenInternal()
mozilla::net::WebSocketChannel::BeginOpen(bool)
mozilla::net::WebSocketChannel::Notify(nsITimer*)

This is off the main thread.

Some reports:
https://crash-stats.mozilla.com/report/index/9ebb3224-6e42-40b2-887a-beea82150717
https://crash-stats.mozilla.com/report/index/727f35a4-122a-4f32-8582-1d8152150718
https://crash-stats.mozilla.com/report/index/ea6c598d-0794-483b-a6a2-deb252150716
https://crash-stats.mozilla.com/report/index/68840357-f4b8-4400-a1bc-afc342150717
(Reporter)

Updated

3 years ago
Flags: needinfo?(josh)
(Reporter)

Comment 1

3 years ago
This doesn't show up in any noticeable quantity on Aurora or Nightly.
Is this a regression from Bug 1170645?
Or hmm, more likely a variant of it?

Comment 4

3 years ago
[Tracking Requested - why for this release]:(In reply to Andrew McCreight [:mccr8] from comment #0)
> Tracy pointed out that NS_CycleCollectorSuspect3 is the number #21 crash on
> Firefox 40 beta.

Given that and the sec-high rating, I'm requesting tracking for 40.
status-firefox40: --- → affected
tracking-firefox40: --- → ?

Comment 5

3 years ago
This websocket code is totally bogus if it's being called off the main thread: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#1331
Flags: needinfo?(josh)

Comment 6

3 years ago
And more so, this timer expects to be notified from the main thread: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#2963

Updated

3 years ago
Blocks: 1170645
(Reporter)

Updated

3 years ago
Component: DOM → Networking: WebSockets
> This is off the main thread.

Why do you think it is? At a first glance of the code, it doesn't seem possible that this runs off the main thread.

Comment 9

3 years ago
In https://crash-stats.mozilla.com/report/index/9ebb3224-6e42-40b2-887a-beea82150717#allthreads the main thread is thread 0 and shows a different backtrace than the crashing thread.
(Reporter)

Updated

3 years ago
status-firefox39: --- → unaffected
status-firefox-esr38: --- → unaffected
Keywords: regression
Flags: needinfo?(michal.novotny)
Tracking 40+. I don't see any details on why this is rated sec-high. Is that the correct rating? 

Is 41 affected?
tracking-firefox40: ? → +
Flags: needinfo?(continuation)
(Reporter)

Comment 11

3 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #10)
> Tracking 40+. I don't see any details on why this is rated sec-high.
Using a main-thread-only object off the main thread could cause the object to get corrupted when multiple threads write to the same object at once.

> Is that the correct rating?
Yes.

> Is 41 affected?
I don't see any crashes at all with this signature on 41, which is weird. I also don't think I've seen any of these crashes on 42, so maybe this is 40-only somehow. There are also a huge number of crashes from bug 1185144 on 42 so it is hard to know what is really going on there. But maybe somehow this got fixed in 41.
status-firefox42: affected → ---
Flags: needinfo?(continuation)
(Assignee)

Comment 12

3 years ago
mReconnectDelayTimer is created in FailDelayManager::DelayOrBegin which should be always called on the main thread. I see several possible scenarios when it might be called off the main thread.

1) I need to investigate more the call coming from WebSocketChannel::StopSession, i.e. following call stack:

  FailDelayManager::DelayOrBegin
  nsWSAdmissionManager::ConnectNext
  nsWSAdmissionManager::OnStopSession
  WebSocketChannel::StopSession

I'm not sure how it is ensured that WebSocketChannel::StopSession is called always on the main thread when the channel is connecting.

2) If WebSocketChannel::AsyncOpen is called off the main thread FailDelayManager::DelayOrBegin can be called also on the same thread.

3) There could be also some extension that replaces ProtocolProxyService and/or DNSService which calls the callback on a wrong thread. I saw some weird extension 487f499a-039d-416f-90d7-8ee8d015cb39 at least in one crash report. It seems to be a kind of spyware. But I don't think this is the case since it would probably affect not only websocket channel.
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
status-firefox41: --- → unaffected
status-firefox42: --- → unaffected
41 and 42 are affected too.
it is too late for 40.
status-firefox41: unaffected → wontfix
status-firefox42: unaffected → affected
status-firefox43: --- → affected
Michal, any progress on this one? This is a rather bad sec-high bug.
Flags: needinfo?(michal.novotny)
(Assignee)

Comment 15

3 years ago
Created attachment 8652384 [details] [diff] [review]
patch v1

(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> 41 and 42 are affected too.

Are you sure? I didn't find any report for these versions. It seems to me this was fixed by bug #1170645.

Anyway, I had a deeper look at (1) from comment #12 and it seems that FailDelayManager::DelayOrBegin() can be in theory called off the main thread. If any message is enqueued between calls to WebSocketChannel::OnTransportAvailable() and WebSocketChannel::OnStartRequest() then WebSocketChannel::OnOutputStreamReady() is called and if it fails then WebSocketChannel::StopSession() is called off the main thread while the channel is still connecting.

This patch ensures that no message is enqueued until the channel is in NOT_CONNECTING state. I've also changed WebSocketChannel::AsyncOpen() so that it fails in release build when the method is not called on the main thread.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: review?(mcmanus)
(In reply to Michal Novotny (:michal) from comment #15)
> > 41 and 42 are affected too.
> 
> Are you sure? I didn't find any report for these versions. It seems to me
> this was fixed by bug #1170645.
Yes, If you have a look to
https://crash-stats.mozilla.com/report/list?signature=NS_CycleCollectorSuspect3
Open the product section, you will see  41.0b3  with 8 crashes and  42.0a2  with 16 over the last 7 days.
(Assignee)

Comment 17

3 years ago
This bug is about crashes coming from mozilla::net::WebSocketChannel and I couldn't find any new reports with stack like the one in the description of this bug.
Flags: needinfo?(sledru)
Therefor, the crash signature should be updated or moved somewhere else, right?
Flags: needinfo?(sledru)
(Assignee)

Comment 19

3 years ago
I don't know how to differentiate the signatures when the top frame is the same. Other bugs blocking bug #1151643 has the same signature and nobody cares.

Btw. is it possible to search reports in crash-stats by methods on frame different than frame 0? I.e. to find all reports where mozilla::net::WebSocketChannel::BeginOpen is anywhere in the call stack?

Comment 20

3 years ago
(In reply to Michal Novotny (:michal) from comment #19)
> I don't know how to differentiate the signatures when the top frame is the
> same.

We could add this frame to the "prefix skiplist", which means that the next frame would be appended (with a | separator) to form the signature.

> Btw. is it possible to search reports in crash-stats by methods on frame
> different than frame 0? I.e. to find all reports where
> mozilla::net::WebSocketChannel::BeginOpen is anywhere in the call stack?

Unfortunately, we do not have this functionality at this time, see bug 480503.
(Reporter)

Comment 21

3 years ago
Yeah, there's unfortunately a tradeoff between being able to find any new instances of this crash using the single signature Suspect3 and being able to see if a particular instance of it has gone away without looking at all of the reports.
(Reporter)

Comment 22

3 years ago
There's a test case for this in bug 1198500.
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Review of attachment 8652384 [details] [diff] [review]:
-----------------------------------------------------------------

this is tricky, but lgtm. thanks michal.
Attachment #8652384 - Flags: review?(mcmanus) → review+
Too late to do anything wrt 40.
status-firefox40: affected → wontfix
https://hg.mozilla.org/mozilla-central/rev/a084674ecc7e
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox43: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
This should have had sec-approval before landing, no?
Flags: needinfo?(michal.novotny)

Comment 27

3 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26)
> This should have had sec-approval before landing, no?

Yes. It is a high rated security bug that affects more than one version. This should have have sec-approval+ before landing. We'll need the sec-approval? form filled out after the fact to get the information we need and also branch (including ESR38 if applicable) patches nominated as well.

Updated

3 years ago
Group: core-security → core-security-release
(Assignee)

Comment 28

3 years ago
Comment on attachment 8652384 [details] [diff] [review]
patch v1

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
It's hard to tell. The scenario described in comment #15 is IMO just theoretical. It's probably very hard (if not impossible) to ensure that WebSocketChannel::Close() is called exactly between calls to OnTransportAvailable() and OnStartRequest() and then ensure that OnOutputStreamReady() fails.

It's worth to note that what's described in comment #15 and fixed by this patch could lead to the same crashes as reported in this bug, but I'm pretty sure that the cause of the reported crashes is different since there are no reports for 41 and newer. I.e. it has been fixed by some other patch but we don't know which one. My guess was bug 1170645 so I tried to back that patch out and reproduce the crash with steps from bug 1198500, but I wasn't able to reproduce it.


Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No.


Which older supported branches are affected by this flaw?
It was probably introduces by bug 711793, i.e. 3 years ago.


Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
It should be easy to backport this patch. But more important is to find the patch that actually fixed the reported crashes.


How likely is this patch to cause regressions; how much testing does it need?
It's unlikely. Existing mochitests should be enough.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: sec-approval?
status-b2g-v2.0: --- → wontfix
status-b2g-v2.0M: --- → wontfix
status-b2g-v2.1: --- → wontfix
status-b2g-v2.1S: --- → affected
status-b2g-v2.2: --- → affected
status-b2g-v2.2r: --- → affected
status-b2g-master: --- → fixed
status-firefox-esr38: unaffected → affected
tracking-firefox-esr38: --- → ?

Comment 29

3 years ago
Can we back port this to 42?
tracking-firefox42: --- → +

Updated

3 years ago
Attachment #8652384 - Flags: sec-approval? → sec-approval+
(Assignee)

Comment 30

3 years ago
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Approval Request Comment
[Feature/regressing bug #]: 711793
[User impact if declined]: using a main-thread-only object off the main thread
[Describe test coverage new/current, TreeHerder]: existing mochitests
[Risks and why]: low, the patch changes the code patch only for an edge case which maybe never happens
[String/UUID change made/needed]: none
Attachment #8652384 - Flags: approval-mozilla-aurora?
Comment on attachment 8652384 [details] [diff] [review]
patch v1

ok, let's take it.
Attachment #8652384 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/f4e37234ca80
status-firefox42: affected → fixed
Since this is wontfixed for 41, let's aim this for the esr that will come out with 42 (38.4.0).
status-b2g-v2.1S: affected → wontfix
Michal, could you please nominate the patch for uplift to mozilla-esr38? Do you believe it is safe for inclusion in ESR38.4.0? ESRs typically include fixes to sec-high and sec-critical bugs. Thanks!
Flags: needinfo?(michal.novotny)
for 2.2 this has conflict uplifting

grafting 260243:a084674ecc7e "Bug 1186160 - WebSocketChannel accesses nsDocShell and nsDocument off the main thread, r=mcmanus"
merging netwerk/protocol/websocket/WebSocketChannel.cpp
warning: conflicts during merge.
merging netwerk/protocol/websocket/WebSocketChannel.cpp incomplete! (edit conflicts, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue


could you take a look, thanks!
Whiteboard: [post-critsmash-triage]

Comment 36

3 years ago
We have missed this for ESR38 so far.
Flags: needinfo?(rkothari)
Jason, can you help with this or help find someone who can look at the merge conflict? Thanks.
Flags: needinfo?(jduell.mcbugs)
(Assignee)

Comment 38

3 years ago
Sorry, I somehow missed the comment about the conflict, I'll have a look at it today.
Flags: needinfo?(jduell.mcbugs)
(Assignee)

Comment 39

3 years ago
(In reply to Carsten Book [:Tomcat] from comment #35)
> for 2.2 this has conflict uplifting

What do you mean by 2.2? Where did you try to uplift the patch?
Flags: needinfo?(cbook)
(In reply to Michal Novotny (:michal) from comment #39)
> (In reply to Carsten Book [:Tomcat] from comment #35)
> > for 2.2 this has conflict uplifting
> 
> What do you mean by 2.2? Where did you try to uplift the patch?

http://hg.mozilla.org/releases/mozilla-b2g37_v2_2/ is the tree/repo i mean :)
Flags: needinfo?(cbook)
(Assignee)

Comment 41

3 years ago
Created attachment 8678881 [details] [diff] [review]
patch for mozilla-b2g37_v2_2 tree
Flags: needinfo?(michal.novotny)
(Assignee)

Comment 43

3 years ago
(In reply to Ritu Kothari (:ritu) from comment #34)
> Michal, could you please nominate the patch for uplift to mozilla-esr38? Do
> you believe it is safe for inclusion in ESR38.4.0? ESRs typically include
> fixes to sec-high and sec-critical bugs. Thanks!

Sorry I missed it for ESR38. But as I wrote in comment #28, I think this patch fixes a theoretical crash that might not happen at all. The reported crashes were fixed with some other patch but unfortunately we don't know which one :-/

Updated

3 years ago
Flags: needinfo?(rkothari)

Updated

3 years ago
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42+]
This patch looks straightforward enough that we should just land it on ESR and not worry about how easy or hard it would be to exploit there.
Blocks: 711793
Michal, could you please nominate the patch for uplift to esr38? This will be included in esr38.7.0
Flags: needinfo?(michal.novotny)
(Assignee)

Comment 46

3 years ago
Comment on attachment 8652384 [details] [diff] [review]
patch v1

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: using a main-thread-only object off the main thread
Fix Landed on Version: 42
Risk to taking this patch (and alternatives if risky): low, the patch is in the tree for quite a long time and we're not aware of any issue
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: approval-mozilla-esr38?
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Sec-high issue that has been fixed for a while now (since 42). Let's uplift to esr38.7
Attachment #8652384 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+

Updated

2 years ago
tracking-firefox-esr38: ? → 45+

Updated

2 years ago
Whiteboard: [post-critsmash-triage][adv-main42+] → [post-critsmash-triage][adv-main42+][adv-esr38.7+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.