Closed
Bug 805251
Opened 12 years ago
Closed 12 years ago
Crash [@ mozilla::DataChannelConnection::Close]
Categories
(Core :: WebRTC, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla20
Tracking | Status | |
---|---|---|
firefox19 | --- | disabled |
firefox20 | --- | fixed |
firefox-esr17 | --- | unaffected |
b2g18 | --- | unaffected |
People
(Reporter: florian, Assigned: jesup)
References
()
Details
(Keywords: crash, reproducible, sec-moderate, Whiteboard: [WebRTC], [blocking-webrtc+][adv-main20-])
Crash Data
Attachments
(3 files, 1 obsolete file)
3.54 KB,
patch
|
Details | Diff | Splinter Review | |
14.99 KB,
patch
|
ekr
:
review+
derf
:
review+
|
Details | Diff | Splinter Review |
18.36 KB,
patch
|
mcmanus
:
review+
jesup
:
review+
|
Details | Diff | Splinter Review |
https://crash-stats.mozilla.com/report/index/bp-c0c033e9-7534-45d1-813a-077612121024
https://crash-stats.mozilla.com/report/index/bp-2c4957dd-2388-450e-9ef1-5a6a62121024
This happened when restarting the browser, I was experimenting with adding a data channel to https://github.com/anantn/socialapi-demo
Updated•12 years ago
|
Crash Signature: [@ mozilla::DataChannelConnection::Close(unsigned short)]
Updated•12 years ago
|
Severity: normal → critical
Comment 1•12 years ago
|
||
Florian, do you have a testcase for it?
Reporter | ||
Comment 2•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #1)
> Florian, do you have a testcase for it?
Not really, but it seems fairly reproducible (it doesn't crash all the time, but I suspect running this under valgrind on linux would likely show errors all the time).
The code I was using is https://github.com/anantn/socialapi-demo/tree/312321224543ea0d249f697dd7c58708842100d9 with local changes (I'm attaching here the local changes I currently have).
I had established a call, and then typed something in the input box of the chat window. Restarting the browser (I'm using the restartless restart add-on for that, but I don't think it really matters, just quitting Firefox would probably give the same result).
Reporter | ||
Comment 3•12 years ago
|
||
Another slightly similar, but different stack: https://crash-stats.mozilla.com/report/index/bp-17c3ab7e-c69b-4b16-a9e9-cf6f52121025
And I got https://crash-stats.mozilla.com/report/index/bp-7bd712ef-21d0-4fed-b5d0-4ddf32121025 once in the same situation, but I think that's bug 805112.
Reporter | ||
Comment 4•12 years ago
|
||
The steps to reproduce are:
- use this version of the test social provider: https://github.com/anantn/socialapi-demo/tree/b7790d36f500c58a8278d1cd2f42efc6be7a411d
(reading the README.md file will definitely be required to understand how this works)
- Remove the "return false;" of the onsubmit handler (this line: https://github.com/anantn/socialapi-demo/commit/b7790d36f500c58a8278d1cd2f42efc6be7a411d#L0R178)
- Start a call
- Type something in the chat textbox (it will make the remote video disappear, that's "expected").
- Restart the browser.
In this situation, it crashes about 2 out of 3 times for me.
Assignee | ||
Comment 6•12 years ago
|
||
I'm not on facebook, and it appears I can't use this without a facebook account. Can someone catch a crash in gdb and give me the dumps of the relevant objects and params on the stack (and line number!)
I there any way to use this without Facebook?
Reporter | ||
Comment 7•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #6)
> I'm not on facebook, and it appears I can't use this without a facebook
> account.
This demo provider is completely unrelated to Facebook. If you are seeing Facebook instead of this, it's probably that you haven't restarted your browser after tweaking the pref (the social provider pref seems to only be read at startup).
Assignee | ||
Comment 8•12 years ago
|
||
Note you need the fix from bug 805871 to test this today
Comment 9•12 years ago
|
||
Florian - Can you help us get accurate STR for the crash?
Flags: needinfo?(florian)
Whiteboard: [WebRTC], [blocking-webrtc+]
Assignee | ||
Comment 10•12 years ago
|
||
I strongly believe this is a dup of bug 807929, which now has a patch.
Assignee | ||
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Assignee | ||
Updated•12 years ago
|
Flags: needinfo?(florian)
Comment 13•12 years ago
|
||
As given by bug 824711 this is not fixed yet.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → rjesup
Priority: -- → P1
Comment 14•12 years ago
|
||
Definitely still reproducible on 12/29 build. I managed to reproduce when playing around with the data channels test page.
https://crash-stats.mozilla.com/report/index/bp-1021fd0c-63b4-48e0-b271-8389b2121230
Keywords: reproducible
Updated•12 years ago
|
Comment 15•12 years ago
|
||
I'm still seeing this as well on the social api demo:
https://crash-stats.mozilla.com/report/index/bp-a4468847-996f-40bd-a2a6-ec9522130102
https://crash-stats.mozilla.com/report/index/bp-9a11b549-b5e3-423d-a0cb-429fa2130102
Comment 16•12 years ago
|
||
Actually, I think this crash has migrated onto windows as well now:
https://crash-stats.mozilla.com/report/index/bp-1021fd0c-63b4-48e0-b271-8389b2121230
https://crash-stats.mozilla.com/report/index/9e5d2c6d-33d8-40dc-811c-2a04d2130102
Example stack being:
0 xul.dll mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock<mozilla::Mutex> obj-firefox/dist/include/mozilla/Mutex.h:153
1 xul.dll mozilla::DataChannelConnection::Close netwerk/sctp/datachannel/DataChannel.cpp:2008
2 xul.dll mozilla::DataChannel::Close netwerk/sctp/datachannel/DataChannel.cpp:2075
3 xul.dll mozilla::DataChannel::~DataChannel netwerk/sctp/datachannel/DataChannel.cpp:2049
4 xul.dll mozilla::DataChannel::`scalar deleting destructor'
5 xul.dll mozilla::DataChannel::Release netwerk/sctp/datachannel/DataChannel.h:297
6 xul.dll nsRefPtr<mozilla::DataChannel>::~nsRefPtr<mozilla::DataChannel> obj-firefox/dist/include/nsAutoPtr.h:876
7 xul.dll nsDOMDataChannel::~nsDOMDataChannel
8 xul.dll nsDOMDataChannel::`scalar deleting destructor'
9 xul.dll nsDOMEventTargetHelper::Release content/events/src/nsDOMEventTargetHelper.cpp:74
10 xul.dll ReleaseSliceNow js/xpconnect/src/XPCJSRuntime.cpp:576
11 xul.dll XPCIncrementalReleaseRunnable::ReleaseNow js/xpconnect/src/XPCJSRuntime.cpp:631
12 winmm.dll timeGetTime
13 xul.dll XPCIncrementalReleaseRunnable::Run js/xpconnect/src/XPCJSRuntime.cpp:660
14 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:627
15 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:82
16 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:208
17 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:182
18 xul.dll nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163
19 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:232
20 xul.dll nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:288
21 xul.dll XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3823
22 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:3890
23 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4088
24 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:105
25 firefox.exe __tmainCRTStartup crtexe.c:552
26 kernel32.dll BaseThreadInitThunk
27 ntdll.dll __RtlUserThreadStart
28 ntdll.dll _RtlUserThreadStart
Assignee | ||
Comment 17•12 years ago
|
||
Assignee | ||
Comment 18•12 years ago
|
||
Comment on attachment 698279 [details] [diff] [review]
Clean up DataChannel close logic, fix creation refcount
Patrick - this is a high priority for review as the Social API guys are hitting problems with this frequently when reloading pages (bug 814882). The *main* problem was the incorrect type on NotifyDataChannel causing a refcount imbalance.
The rest is largely cleanup of Close(), safety checks (of mConnection), and handling the WAITING_TO_OPEN case in SCTP stream reset (which was hitting a MOZ_ASSERT).
Attachment #698279 -
Flags: review?(mcmanus)
Assignee | ||
Comment 19•12 years ago
|
||
Simplified patch that just fixes the refcount issue and transport release per ekr's instructions, and one small other bug (not including a state in the incoming reset code) - for enabling review from a non-netwerk peer. refcount issue was the some DataChannels (from ::Open) were created with an already_AddRefed<> pointer, and some (from NotifyDataChannel) weren't
Assignee | ||
Comment 20•12 years ago
|
||
Comment on attachment 698457 [details] [diff] [review]
Minimal fix for creation refcount and transport release
Asking for review from derf (for the straight XPCOM refcount issues) and ekr (for transport issues) on the simplified patch. (state issue both/either - it's simple/obvious as are teh debugs). If/when we get review on the main patch, we'll substitute that for this.
Attachment #698457 -
Flags: review?(tterribe)
Attachment #698457 -
Flags: review?(ekr)
Assignee | ||
Comment 21•12 years ago
|
||
update main patch with transportflow changes from ekr
Assignee | ||
Updated•12 years ago
|
Attachment #698279 -
Attachment is obsolete: true
Attachment #698279 -
Flags: review?(mcmanus)
Assignee | ||
Updated•12 years ago
|
Attachment #698459 -
Flags: review?(mcmanus)
Attachment #698459 -
Flags: review?(ekr)
Comment 22•12 years ago
|
||
Comment on attachment 698457 [details] [diff] [review]
Minimal fix for creation refcount and transport release
Review of attachment 698457 [details] [diff] [review]:
-----------------------------------------------------------------
I only reviewed the refcount parts.
Attachment #698457 -
Flags: review?(tterribe) → review+
Comment 23•12 years ago
|
||
Comment on attachment 698457 [details] [diff] [review]
Minimal fix for creation refcount and transport release
Review of attachment 698457 [details] [diff] [review]:
-----------------------------------------------------------------
Depending on the answer to the question below, this is either an r- or an r+
::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +27,5 @@
> #ifdef MOZ_PEERCONNECTION
> #include "mtransport/runnable_utils.h"
> #endif
> +
> +#define DATACHANNEL_LOG(args) LOG(args)
Why redefine this here? The defn. seems to be the same as in the .h file.
@@ +191,5 @@
> + MOZ_ASSERT(NS_IsMainThread());
> + RUN_ON_THREAD(mSTS, WrapRunnable(nsRefPtr<DataChannelConnection>(this),
> + &DataChannelConnection::disconnect_all),
> + NS_DISPATCH_NORMAL);
> + }
Can you guarantee that the DCConnection will be deleted before returning to the event loop?
The point of this code is to guarantee that the signals are disconnected on the STS
thread. However, if you are on the STS thread now and you don't disconnect
the signals and then you delete the DCConnection on the main thread, you are
going to have problems. Is there a reason to have this code be conditional on IsSTSThread?
@@ +2064,5 @@
> }
>
> // Clean up any pending opens for channels
> nsRefPtr<DataChannel> channel;
> + while (nullptr != (channel = dont_AddRef(static_cast<DataChannel *>(mPending.PopFront())))) {
Is there a reason not to just do !(channel = )
::: netwerk/sctp/datachannel/DataChannel.h
@@ +151,5 @@
> }
> int32_t SendBlob(uint16_t stream, nsIInputStream *aBlob);
>
> // Called on data reception from the SCTP library
> // must(?) be public so my c->c++ tramploine can call it
sp. trampoline
@@ +262,5 @@
> nsCOMPtr<nsIThread> mConnectThread;
> };
>
> +#define ENSURE_DATACONNECTION \
> + do { if (!mConnection) { DATACHANNEL_LOG(("%s: %p no connection!",__FUNCTION__, this)); return; } } while (0)
Not that I care, but why not have just ENSURE_DATA_CONNECTION_RET and use /**/ as the argument?
Comment 24•12 years ago
|
||
Comment on attachment 698457 [details] [diff] [review]
Minimal fix for creation refcount and transport release
Review of attachment 698457 [details] [diff] [review]:
-----------------------------------------------------------------
Depending on the answer to the question below, this is either an r- or an r+.
::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +27,5 @@
> #ifdef MOZ_PEERCONNECTION
> #include "mtransport/runnable_utils.h"
> #endif
> +
> +#define DATACHANNEL_LOG(args) LOG(args)
Why redefine this here? The defn. seems to be the same as in the .h file.
@@ +191,5 @@
> + MOZ_ASSERT(NS_IsMainThread());
> + RUN_ON_THREAD(mSTS, WrapRunnable(nsRefPtr<DataChannelConnection>(this),
> + &DataChannelConnection::disconnect_all),
> + NS_DISPATCH_NORMAL);
> + }
Can you guarantee that the DCConnection will be deleted before returning to the event loop?
The point of this code is to guarantee that the signals are disconnected on the STS
thread. However, if you are on the STS thread now and you don't disconnect
the signals and then you delete the DCConnection on the main thread, you are
going to have problems. Is there a reason to have this code be conditional on IsSTSThread?
Reading the wrapper code, it appears that this is only called on the main thread, so I would
suggest removing that conditional entirely and just making this clause unconditional.
@@ +2064,5 @@
> }
>
> // Clean up any pending opens for channels
> nsRefPtr<DataChannel> channel;
> + while (nullptr != (channel = dont_AddRef(static_cast<DataChannel *>(mPending.PopFront())))) {
Is there a reason not to just do !(channel = )
::: netwerk/sctp/datachannel/DataChannel.h
@@ +151,5 @@
> }
> int32_t SendBlob(uint16_t stream, nsIInputStream *aBlob);
>
> // Called on data reception from the SCTP library
> // must(?) be public so my c->c++ tramploine can call it
sp. trampoline
@@ +262,5 @@
> nsCOMPtr<nsIThread> mConnectThread;
> };
>
> +#define ENSURE_DATACONNECTION \
> + do { if (!mConnection) { DATACHANNEL_LOG(("%s: %p no connection!",__FUNCTION__, this)); return; } } while (0)
Not that I care, but why not have just ENSURE_DATA_CONNECTION_RET and use /**/ as the argument?
Attachment #698457 -
Flags: review?(ekr) → review+
Assignee | ||
Comment 25•12 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #24)
> Comment on attachment 698457 [details] [diff] [review]
> Minimal fix for creation refcount and transport release
>
> Review of attachment 698457 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Depending on the answer to the question below, this is either an r- or an r+.
>
> ::: netwerk/sctp/datachannel/DataChannel.cpp
> @@ +27,5 @@
> > #ifdef MOZ_PEERCONNECTION
> > #include "mtransport/runnable_utils.h"
> > #endif
> > +
> > +#define DATACHANNEL_LOG(args) LOG(args)
>
> Why redefine this here? The defn. seems to be the same as in the .h file.
I'm using this #define trickery because DataChannel.h is included by things that don't have DataChannelLog.h/etc, and have their own LOG() define.
>
> @@ +191,5 @@
> > + MOZ_ASSERT(NS_IsMainThread());
> > + RUN_ON_THREAD(mSTS, WrapRunnable(nsRefPtr<DataChannelConnection>(this),
> > + &DataChannelConnection::disconnect_all),
> > + NS_DISPATCH_NORMAL);
> > + }
>
> Can you guarantee that the DCConnection will be deleted before returning to
> the event loop?
>
> The point of this code is to guarantee that the signals are disconnected on
> the STS
> thread. However, if you are on the STS thread now and you don't disconnect
> the signals and then you delete the DCConnection on the main thread, you are
> going to have problems. Is there a reason to have this code be conditional
> on IsSTSThread?
>
> Reading the wrapper code, it appears that this is only called on the main
> thread, so I would
> suggest removing that conditional entirely and just making this clause
> unconditional.
Done.
>
> @@ +2064,5 @@
> > }
> >
> > // Clean up any pending opens for channels
> > nsRefPtr<DataChannel> channel;
> > + while (nullptr != (channel = dont_AddRef(static_cast<DataChannel *>(mPending.PopFront())))) {
>
> Is there a reason not to just do !(channel = )
No, not really, I think it came with some cargo-culted code, unless I was asked to do it in a review. Since that was already here (we just added the { at the end), I'd rather leave this issue for McManus' review.
>
> @@ +262,5 @@
> > nsCOMPtr<nsIThread> mConnectThread;
> > };
> >
> > +#define ENSURE_DATACONNECTION \
> > + do { if (!mConnection) { DATACHANNEL_LOG(("%s: %p no connection!",__FUNCTION__, this)); return; } } while (0)
>
> Not that I care, but why not have just ENSURE_DATA_CONNECTION_RET and use
> /**/ as the argument?
Someone just went through the tree removing all those sorts of tricks. :-)
Assignee | ||
Comment 26•12 years ago
|
||
Target Milestone: --- → mozilla20
Comment 27•12 years ago
|
||
> > > + while (nullptr != (channel = dont_AddRef(static_cast<DataChannel *>(mPending.PopFront())))) {
> >
> > Is there a reason not to just do !(channel = )
>
> No, not really, I think it came with some cargo-culted code,
That's just review bait :)
if you say (nullptr != (foo = blah()) the reviewer says "don't explicitly compare against null"
if you say (foo = blah()) you get compiler warnings suggesting you can't count = characters.
if you say ((foo = blah()) (my preference) you get review comments about extra parens
if you say (!!(foo = blah())) you get review comments about obtuseness.
I personally think the author should do what they want.
Comment 28•12 years ago
|
||
Comment on attachment 698459 [details] [diff] [review]
Clean up DataChannel close logic, fix creation refcount
Review of attachment 698459 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.h
@@ +136,5 @@
>
> // DataConnection observers
> void NotifyConnection();
> void NotifyClosedConnection();
> + void NotifyDataChannel(already_AddRefed<mozilla::DataChannel> aChannel);
this isn't a change request, because this code already uses this idiom so using it a little more is this is fine, but in general I really don't like these already_addrefed<> interfaces.. it makes using the smart pointers considerably harder imo.
this could just be a normal nsrefptr at the interface level and the implementation of NotifyDataChannel could do the forget().. That's really the logic of the NotifyDataChannel doing that work anyhow imo so it shouldn't be part of the calling interface.
Sure - it costs you an extra atomic inc/dec.
Attachment #698459 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 29•12 years ago
|
||
Thanks
The main problem with already_AddRefed (I suspect) is that it silently converts and loses the already_AddRefed property. I think it could be modified to not allow this sort of error (at the cost of some explicit typing if you want to drop the status, not that it would be common outside of passing to a LOG() message).
I tend to automatically be wary of AddRef/Release, *especially* for threadsafe objects. Overly so, I'm sure. I've also found lots of spurious addref/releases make tracing any sort of leak virtually impossible. And the original DOM code from khuey used already_AddRefed I think, as does the nsDOMMediaStream stuff which I also play with, and tons more in the tree, mostly for constructors, DOM elements and things that pull something out of a list/datastructure I think. SVG seems quite littered with them too.
These DataChannel objects (which don't have high traffic) are not places we really need to optimize like this. I believe however that we're losing a fair bit of perf in places from individually small silent addref/release costs (especially if they get inlined and don't show in profilers or each have an individual impl and so don't get aggregated, and doubly so for threadsafe).
Comment 30•12 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #27)
> > > > + while (nullptr != (channel = dont_AddRef(static_cast<DataChannel *>(mPending.PopFront())))) {
> > >
> > > Is there a reason not to just do !(channel = )
> >
> > No, not really, I think it came with some cargo-culted code,
>
> That's just review bait :)
>
> if you say (nullptr != (foo = blah()) the reviewer says "don't explicitly
> compare against null"
>
> if you say (foo = blah()) you get compiler warnings suggesting you can't
> count = characters.
>
> if you say ((foo = blah()) (my preference) you get review comments about
> extra parens
>
> if you say (!!(foo = blah())) you get review comments about obtuseness.
>
> I personally think the author should do what they want.
Obviously, I don't care what Jesup does here... I was just operating based on the style guidelines.
"When testing a pointer, use (!myPtr) or (myPtr); don't use myPtr != nsnull or myPtr == nsnull."
Comment 31•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/ef5b1c8bfd0a
Leaving open for the remaining patch.
status-firefox20:
--- → fixed
Assignee | ||
Comment 32•12 years ago
|
||
Comment on attachment 698459 [details] [diff] [review]
Clean up DataChannel close logic, fix creation refcount
Carrying forward r+ from minimal patch (note that there's a followup bug patch (caused by this code) with an r? to ekr)
Attachment #698459 -
Flags: review?(ekr) → review+
Comment 33•12 years ago
|
||
I'm not sure what security rating to give this.
Assignee | ||
Comment 34•12 years ago
|
||
GC-related refcount incorrect crash, so would normally be high or crit, but it's preffed off by default. Trying moderate. (Remaining patch issues are not a sec issue, just need to unbitrot and land.)
Keywords: sec-moderate
Assignee | ||
Comment 35•12 years ago
|
||
Remainder of full patch:
https://hg.mozilla.org/integration/mozilla-inbound/rev/08f1385d7f0f
Assignee | ||
Comment 36•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/08f1385d7f0f
Marking for uplift as this fixes Close() functionality. (The first half of this bug landed before the uplift)
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+][webrtc-uplift]
Updated•12 years ago
|
status-b2g18:
--- → unaffected
status-firefox-esr17:
--- → unaffected
Comment 37•12 years ago
|
||
Needs a clearer test case in order to do in-testsuite? - although writing data channel mochitests in separate bugs might actually cover this bug.
Flags: in-testsuite-
Comment 38•12 years ago
|
||
Not getting a reproduction of this crash specifically off of the data channels page, so marking as verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Updated•12 years ago
|
status-firefox19:
--- → disabled
Whiteboard: [WebRTC], [blocking-webrtc+][webrtc-uplift] → [WebRTC], [blocking-webrtc+][webrtc-uplift][adv-main20-]
Updated•12 years ago
|
Whiteboard: [WebRTC], [blocking-webrtc+][webrtc-uplift][adv-main20-] → [WebRTC], [blocking-webrtc+][adv-main20-]
Updated•11 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•