Closed Bug 1186160 Opened 9 years ago Closed 9 years ago

WebSocketChannel accesses nsDocShell and nsDocument off the main thread

Categories

(Core :: Networking: WebSockets, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox39 --- unaffected
firefox40 + wontfix
firefox41 --- wontfix
firefox42 + fixed
firefox43 --- fixed
firefox-esr38 45+ fixed
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- fixed
b2g-v2.2r --- affected
b2g-master --- fixed

People

(Reporter: mccr8, Assigned: michal)

References

Details

(Keywords: csectype-race, regression, sec-high, Whiteboard: [post-critsmash-triage][adv-main42+][adv-esr38.7+])

Crash Data

Attachments

(2 files)

Tracy pointed out that NS_CycleCollectorSuspect3 is the number #21 crash on Firefox 40 beta. I looked at a dozen of the crash reports and they all have stacks like this:

NS_CycleCollectorSuspect3
nsHTMLDocument::AddRef()
nsCOMPtr<nsNavHistoryQueryOptions>::nsCOMPtr<nsNavHistoryQueryOptions>(nsNavHistoryQueryOptions*)
nsDocShell::ShouldPrepareForIntercept(nsIURI*, bool, bool*)
mozilla::net::HttpBaseChannel::ShouldIntercept()
mozilla::net::nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*)
mozilla::net::WebSocketChannel::BeginOpenInternal()
mozilla::net::WebSocketChannel::BeginOpen(bool)
mozilla::net::WebSocketChannel::Notify(nsITimer*)

This is off the main thread.

Some reports:
https://crash-stats.mozilla.com/report/index/9ebb3224-6e42-40b2-887a-beea82150717
https://crash-stats.mozilla.com/report/index/727f35a4-122a-4f32-8582-1d8152150718
https://crash-stats.mozilla.com/report/index/ea6c598d-0794-483b-a6a2-deb252150716
https://crash-stats.mozilla.com/report/index/68840357-f4b8-4400-a1bc-afc342150717
Flags: needinfo?(josh)
This doesn't show up in any noticeable quantity on Aurora or Nightly.
Is this a regression from Bug 1170645?
Or hmm, more likely a variant of it?
[Tracking Requested - why for this release]:(In reply to Andrew McCreight [:mccr8] from comment #0)
> Tracy pointed out that NS_CycleCollectorSuspect3 is the number #21 crash on
> Firefox 40 beta.

Given that and the sec-high rating, I'm requesting tracking for 40.
This websocket code is totally bogus if it's being called off the main thread: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#1331
Flags: needinfo?(josh)
And more so, this timer expects to be notified from the main thread: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#2963
Blocks: 1170645
Component: DOM → Networking: WebSockets
> This is off the main thread.

Why do you think it is? At a first glance of the code, it doesn't seem possible that this runs off the main thread.
In https://crash-stats.mozilla.com/report/index/9ebb3224-6e42-40b2-887a-beea82150717#allthreads the main thread is thread 0 and shows a different backtrace than the crashing thread.
Flags: needinfo?(michal.novotny)
Tracking 40+. I don't see any details on why this is rated sec-high. Is that the correct rating? 

Is 41 affected?
Flags: needinfo?(continuation)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #10)
> Tracking 40+. I don't see any details on why this is rated sec-high.
Using a main-thread-only object off the main thread could cause the object to get corrupted when multiple threads write to the same object at once.

> Is that the correct rating?
Yes.

> Is 41 affected?
I don't see any crashes at all with this signature on 41, which is weird. I also don't think I've seen any of these crashes on 42, so maybe this is 40-only somehow. There are also a huge number of crashes from bug 1185144 on 42 so it is hard to know what is really going on there. But maybe somehow this got fixed in 41.
Flags: needinfo?(continuation)
mReconnectDelayTimer is created in FailDelayManager::DelayOrBegin which should be always called on the main thread. I see several possible scenarios when it might be called off the main thread.

1) I need to investigate more the call coming from WebSocketChannel::StopSession, i.e. following call stack:

  FailDelayManager::DelayOrBegin
  nsWSAdmissionManager::ConnectNext
  nsWSAdmissionManager::OnStopSession
  WebSocketChannel::StopSession

I'm not sure how it is ensured that WebSocketChannel::StopSession is called always on the main thread when the channel is connecting.

2) If WebSocketChannel::AsyncOpen is called off the main thread FailDelayManager::DelayOrBegin can be called also on the same thread.

3) There could be also some extension that replaces ProtocolProxyService and/or DNSService which calls the callback on a wrong thread. I saw some weird extension 487f499a-039d-416f-90d7-8ee8d015cb39 at least in one crash report. It seems to be a kind of spyware. But I don't think this is the case since it would probably affect not only websocket channel.
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
41 and 42 are affected too.
it is too late for 40.
Michal, any progress on this one? This is a rather bad sec-high bug.
Flags: needinfo?(michal.novotny)
Attached patch patch v1Splinter Review
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> 41 and 42 are affected too.

Are you sure? I didn't find any report for these versions. It seems to me this was fixed by bug #1170645.

Anyway, I had a deeper look at (1) from comment #12 and it seems that FailDelayManager::DelayOrBegin() can be in theory called off the main thread. If any message is enqueued between calls to WebSocketChannel::OnTransportAvailable() and WebSocketChannel::OnStartRequest() then WebSocketChannel::OnOutputStreamReady() is called and if it fails then WebSocketChannel::StopSession() is called off the main thread while the channel is still connecting.

This patch ensures that no message is enqueued until the channel is in NOT_CONNECTING state. I've also changed WebSocketChannel::AsyncOpen() so that it fails in release build when the method is not called on the main thread.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: review?(mcmanus)
(In reply to Michal Novotny (:michal) from comment #15)
> > 41 and 42 are affected too.
> 
> Are you sure? I didn't find any report for these versions. It seems to me
> this was fixed by bug #1170645.
Yes, If you have a look to
https://crash-stats.mozilla.com/report/list?signature=NS_CycleCollectorSuspect3
Open the product section, you will see  41.0b3  with 8 crashes and  42.0a2  with 16 over the last 7 days.
This bug is about crashes coming from mozilla::net::WebSocketChannel and I couldn't find any new reports with stack like the one in the description of this bug.
Flags: needinfo?(sledru)
Therefor, the crash signature should be updated or moved somewhere else, right?
Flags: needinfo?(sledru)
I don't know how to differentiate the signatures when the top frame is the same. Other bugs blocking bug #1151643 has the same signature and nobody cares.

Btw. is it possible to search reports in crash-stats by methods on frame different than frame 0? I.e. to find all reports where mozilla::net::WebSocketChannel::BeginOpen is anywhere in the call stack?
(In reply to Michal Novotny (:michal) from comment #19)
> I don't know how to differentiate the signatures when the top frame is the
> same.

We could add this frame to the "prefix skiplist", which means that the next frame would be appended (with a | separator) to form the signature.

> Btw. is it possible to search reports in crash-stats by methods on frame
> different than frame 0? I.e. to find all reports where
> mozilla::net::WebSocketChannel::BeginOpen is anywhere in the call stack?

Unfortunately, we do not have this functionality at this time, see bug 480503.
Yeah, there's unfortunately a tradeoff between being able to find any new instances of this crash using the single signature Suspect3 and being able to see if a particular instance of it has gone away without looking at all of the reports.
There's a test case for this in bug 1198500.
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Review of attachment 8652384 [details] [diff] [review]:
-----------------------------------------------------------------

this is tricky, but lgtm. thanks michal.
Attachment #8652384 - Flags: review?(mcmanus) → review+
Too late to do anything wrt 40.
https://hg.mozilla.org/mozilla-central/rev/a084674ecc7e
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
This should have had sec-approval before landing, no?
Flags: needinfo?(michal.novotny)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26)
> This should have had sec-approval before landing, no?

Yes. It is a high rated security bug that affects more than one version. This should have have sec-approval+ before landing. We'll need the sec-approval? form filled out after the fact to get the information we need and also branch (including ESR38 if applicable) patches nominated as well.
Group: core-security → core-security-release
Comment on attachment 8652384 [details] [diff] [review]
patch v1

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
It's hard to tell. The scenario described in comment #15 is IMO just theoretical. It's probably very hard (if not impossible) to ensure that WebSocketChannel::Close() is called exactly between calls to OnTransportAvailable() and OnStartRequest() and then ensure that OnOutputStreamReady() fails.

It's worth to note that what's described in comment #15 and fixed by this patch could lead to the same crashes as reported in this bug, but I'm pretty sure that the cause of the reported crashes is different since there are no reports for 41 and newer. I.e. it has been fixed by some other patch but we don't know which one. My guess was bug 1170645 so I tried to back that patch out and reproduce the crash with steps from bug 1198500, but I wasn't able to reproduce it.


Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No.


Which older supported branches are affected by this flaw?
It was probably introduces by bug 711793, i.e. 3 years ago.


Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
It should be easy to backport this patch. But more important is to find the patch that actually fixed the reported crashes.


How likely is this patch to cause regressions; how much testing does it need?
It's unlikely. Existing mochitests should be enough.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: sec-approval?
Can we back port this to 42?
Attachment #8652384 - Flags: sec-approval? → sec-approval+
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Approval Request Comment
[Feature/regressing bug #]: 711793
[User impact if declined]: using a main-thread-only object off the main thread
[Describe test coverage new/current, TreeHerder]: existing mochitests
[Risks and why]: low, the patch changes the code patch only for an edge case which maybe never happens
[String/UUID change made/needed]: none
Attachment #8652384 - Flags: approval-mozilla-aurora?
Comment on attachment 8652384 [details] [diff] [review]
patch v1

ok, let's take it.
Attachment #8652384 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Since this is wontfixed for 41, let's aim this for the esr that will come out with 42 (38.4.0).
Michal, could you please nominate the patch for uplift to mozilla-esr38? Do you believe it is safe for inclusion in ESR38.4.0? ESRs typically include fixes to sec-high and sec-critical bugs. Thanks!
Flags: needinfo?(michal.novotny)
for 2.2 this has conflict uplifting

grafting 260243:a084674ecc7e "Bug 1186160 - WebSocketChannel accesses nsDocShell and nsDocument off the main thread, r=mcmanus"
merging netwerk/protocol/websocket/WebSocketChannel.cpp
warning: conflicts during merge.
merging netwerk/protocol/websocket/WebSocketChannel.cpp incomplete! (edit conflicts, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue


could you take a look, thanks!
Whiteboard: [post-critsmash-triage]
We have missed this for ESR38 so far.
Flags: needinfo?(rkothari)
Jason, can you help with this or help find someone who can look at the merge conflict? Thanks.
Flags: needinfo?(jduell.mcbugs)
Sorry, I somehow missed the comment about the conflict, I'll have a look at it today.
Flags: needinfo?(jduell.mcbugs)
(In reply to Carsten Book [:Tomcat] from comment #35)
> for 2.2 this has conflict uplifting

What do you mean by 2.2? Where did you try to uplift the patch?
Flags: needinfo?(cbook)
(In reply to Michal Novotny (:michal) from comment #39)
> (In reply to Carsten Book [:Tomcat] from comment #35)
> > for 2.2 this has conflict uplifting
> 
> What do you mean by 2.2? Where did you try to uplift the patch?

http://hg.mozilla.org/releases/mozilla-b2g37_v2_2/ is the tree/repo i mean :)
Flags: needinfo?(cbook)
Flags: needinfo?(michal.novotny)
(In reply to Ritu Kothari (:ritu) from comment #34)
> Michal, could you please nominate the patch for uplift to mozilla-esr38? Do
> you believe it is safe for inclusion in ESR38.4.0? ESRs typically include
> fixes to sec-high and sec-critical bugs. Thanks!

Sorry I missed it for ESR38. But as I wrote in comment #28, I think this patch fixes a theoretical crash that might not happen at all. The reported crashes were fixed with some other patch but unfortunately we don't know which one :-/
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42+]
This patch looks straightforward enough that we should just land it on ESR and not worry about how easy or hard it would be to exploit there.
Blocks: 711793
Michal, could you please nominate the patch for uplift to esr38? This will be included in esr38.7.0
Flags: needinfo?(michal.novotny)
Comment on attachment 8652384 [details] [diff] [review]
patch v1

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: using a main-thread-only object off the main thread
Fix Landed on Version: 42
Risk to taking this patch (and alternatives if risky): low, the patch is in the tree for quite a long time and we're not aware of any issue
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(michal.novotny)
Attachment #8652384 - Flags: approval-mozilla-esr38?
Comment on attachment 8652384 [details] [diff] [review]
patch v1

Sec-high issue that has been fixed for a while now (since 42). Let's uplift to esr38.7
Attachment #8652384 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Whiteboard: [post-critsmash-triage][adv-main42+] → [post-critsmash-triage][adv-main42+][adv-esr38.7+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: