Closed Bug 853998 Opened 7 years ago Closed 7 years ago

Use SyncRunnable in WebRTC

Categories

(Core :: WebRTC, defect, P1, major)

defect

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: jesup, Assigned: abr)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, Whiteboard: [webrtc][blocking-webrtc+][qa-])

Attachments

(3 files, 13 obsolete files)

3.09 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
19.56 KB, patch
abr
: review+
Details | Diff | Splinter Review
2.53 KB, patch
abr
: review+
Details | Diff | Splinter Review
We have a problem with upconverting threads to nsThreads to do DISPATCH_SYNC.

Given this:
on a pthread or other non-nsThread:
   Dispatch(..., DISPATCH_SYNC);

internally Dispatch will call GetCurrentThread(), which "upconverts" the pthread to an nsThread.(See nsThreadManager::GetCurrentThread()).

Then Dispatch() creates an nsThreadSyncDispatch(this_nsthread, runnable), and sends it to the target thread, and it sits in this loop:

    while (wrapper->IsPending())
      NS_ProcessNextEvent(thread);

Note that IsPending() is this:

  bool IsPending() {
    return mSyncTask != nullptr;
  }

and nsThreadSyncDispatch::Run() is this:

nsThreadSyncDispatch::Run()
{
  if (mSyncTask) {
    mResult = mSyncTask->Run();
    mSyncTask = nullptr;
    // unblock the origin thread
    mOrigin->Dispatch(this, NS_DISPATCH_NORMAL);
  }
  return NS_OK;
}

Note that it sets mSyncTask to nullptr *in the target task* and before returning the event/wrapper.

So the calling Dispatch() can exit without calling ProcessNextEvent().  This leaves the wrapper event in the nsThread's event queue, unprocessed - and it holds a reference to the nsThread itself!  So, later, when the pthread exits, and we drop the ref to the nsThread (from TLS storage), the count doesn't go to 0 and it leaks.

This only happens sometimes; usually (and always when run locally) it ends up waiting and clearing the event queue.

There are two obvious solutions: Don't up-promote threads to nsThread if they're not processing their event queues  (upconverting seems inherently dangerous, though with this fixed it might not be.  (I checked, a simple run of the browser showed no up-conversions other than from media/webrtc/signaling, but I didn't even browse around a little.)  Or fix DISPATCH_SYNC to not leave "to-be-cleaned-up" events on the queue.

This would be something like:

nsThreadSyncDispatch::Run()
{
  if (!mRun) {
    MOZ_ASSERT(mSyncTask);
    mResult = mSyncTask->Run();
    mRun = true;
    // unblock the origin thread
    mOrigin->Dispatch(this, NS_DISPATCH_NORMAL);
  } else {
    // We must be on the originating thread (mOrigin) which still holds a ref to us
    mSyncTask = nullptr;
  }
  return NS_OK;
}

A side-effect would be you could include RefPtrs/COMPtrs in the DISPATCH_SYNC runnable that aren't threadsafe (since they'd be released on the originating thread).
jesup++ for tracking this down :-D
An alternative solution. Randell asked me to post it.
Whoops, meant to null pointer, not forget it (why is there a forget anyway)?

Try with alternate solution - https://tbpl.mozilla.org/?tree=Try&rev=750d9b157221

I think I vote for Randell's though fwiw.
Attachment #728436 - Attachment is obsolete: true
Actually I don't think there's a thread-safe way to do solution B, so please ignore.
Comment on attachment 728417 [details] [diff] [review]
make DISPATCH_SYNC safe from up-promoted threads

Marking for review - if the decision is to remove upconversion, we can revert this at that time (and that will require WebRTC to replace upconversion first, which we're working on anyways).

FYI, I think jib's latest patch may also be safe in fact, though I haven't thought deeply on it.  His patch will allow short-term run-ups of wrappers waiting to be destroyed, though in normal cases this won't happen or will be very rare, and I doubt more than a few can accumulate temporarily outside of abusive test situations.  His patch does cause less change in behavior of DISPATCH_SYNC, though I don't know that this matters at all.
Attachment #728417 - Flags: review?(ehsan)
> FYI, I think jib's latest patch may also be safe in fact,

On looking again, I agree it is.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #4)
> Try with original solution -
> https://tbpl.mozilla.org/?tree=Try&rev=4c6d14a0178e

I've had about 150ish retriggers of M2 and C across all desktop platforms with no leaks.  Not definitive given the random nature, but looking quite good.
Looks great!

Odd, my try of solution B is not looking good at all. Might I have made a mistake pushing it? Trying again: https://tbpl.mozilla.org/?tree=Try&rev=0334f791d163
Solution B still leaks the runnable itself:

00:02:48 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | 60 bytes leaked (nsRunnable) 

Makes sense since the underlying GSM prthread is gone, and the nsThread destructor apparently does not clean up events, but asserts:

00:02:44     INFO -  [Parent 3716] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file e:/builds/moz2_slave/try-w32-d-00000000000000000000/build/xpcom/threads/nsEventQueue.cpp, line 39

So Solution B would require more changes.
Assignee: nobody → rjesup
Priority: -- → P1
Whiteboard: [webrtc][blocking-webrtc+]
Comment on attachment 728417 [details] [diff] [review]
make DISPATCH_SYNC safe from up-promoted threads

Review of attachment 728417 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, except that it might still be racy on mHasRun, right?

And this is the type of thing that bsmedberg should review.  Although if I were going to review it, I'd ask you to add some comments about what mHasRun is being used for.
Attachment #728417 - Flags: review?(ehsan) → review?(benjamin)
Comment on attachment 728520 [details] [diff] [review]
make DISPATCH_SYNC safe from up-promoted threads (Solution B)

Review of attachment 728520 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is the right fix.
Attachment #728520 - Flags: review?(ehsan) → review-
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> Comment on attachment 728417 [details] [diff] [review]
> make DISPATCH_SYNC safe from up-promoted threads
> 
> Review of attachment 728417 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, except that it might still be racy on mHasRun, right?

mHasRun isn't asynchronously referenced from another thread; it's basically a state var that says that the primary Dispatch is done, and next time it passes through the event queue is should die.  It had been implicitly using mSyncTask for this before.

> And this is the type of thing that bsmedberg should review.  Although if I
> were going to review it, I'd ask you to add some comments about what mHasRun
> is being used for.

Sure.  bsmedberg was already drinking wine per IRC, and you had indicated interest, so I started with you.  :-)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> Comment on attachment 728417 [details] [diff] [review]
> make DISPATCH_SYNC safe from up-promoted threads
> 
> Review of attachment 728417 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, except that it might still be racy on mHasRun, right?


I'm not seeing the race you are. mHasRun is set in the constructor (on the dispatching thread). Once it gets dispatched, mHasRun's exclusive ownership then transfers to the dispatched-to thread, which reads and sets it. The same runnable is then dispatched back to the original thread, at which point mHasRun's exclusive ownership is transferred back to the original dispatching thread.

What problem do you envision?
Comment on attachment 728417 [details] [diff] [review]
make DISPATCH_SYNC safe from up-promoted threads

As noted on IRC: if we had to use NS_DISPATCH_SYNC, this would be ok. But in this case we don't need the loop-spinning deadlock-detecting behavior of DISPATCH_SYNC and can instead use a simpler runnable:

Please instead make webrtc use this base class for its runnables: http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/PSMRunnable.h#17

You can either move that class somewhere shared (xpcom/threads) or copy its impl into webrtc for now. Right now it only supports dispatch to the main thread, so if you need to dispatch to other threads you may just want to copy it and add an explicit target parameter.

I would like us to stop supporting upconversion of threads, but for now we can just MOZ_ASSERT when upconversion happens and make sure that we don't do it in our testsuites...
Attachment #728417 - Flags: review?(benjamin) → review-
Attached patch SyncRunnable helper, rev. 1 (obsolete) — Splinter Review
You should be able to use this to wrap your existing runnables with minimal code change.
Attached patch test (WIP) (obsolete) — Splinter Review
Attachment #729582 - Attachment is obsolete: true
Attachment #730015 - Attachment is obsolete: true
The above patches are WIP, the tests particularly so. I've been fighting with the build system and can't get it to build my gtest builds, so I temporarily stuffed the tests in mtransport/test, where I know how to make gtest work.

jesup, abr: this patch should be good enough to test to see if it fixes the bug.

bsmedberg: comments welcome on naming

ted: any chance you could hoist the test into xpcom/tests and update the patch?
I haven't actually done any work with the in-tree GTest stuff. BenWa did all that work.
CCing benwa for help with the gtest tests
(In reply to Eric Rescorla (:ekr) from comment #23)
> The above patches are WIP, the tests particularly so. I've been fighting
> with the build system and can't get it to build my gtest builds, so I
> temporarily stuffed the tests in mtransport/test, where I know how to make
> gtest work.
> 

Can you tell me what problems you ran into?
Comment on attachment 730014 [details] [diff] [review]
test (WIP)

Review of attachment 730014 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/Makefile.in
@@ +110,5 @@
>    nrappkit_unittest.cpp \
>    sockettransportservice_unittest.cpp \
>    transport_unittests.cpp \
>    runnable_utils_unittest.cpp \
> +  TestSyncRunnable.cpp \

You want GTEST_CPPSRCS = TestSyncRunnable.cpp like in gfx/layers/Makefile.in.

Note that when bug 844288 lands we will need a nested test directory for GTEST sources sadly because of build restriction.
Comment on attachment 730014 [details] [diff] [review]
test (WIP)

Review of attachment 730014 [details] [diff] [review]:
-----------------------------------------------------------------

Make sure to build with enable-gtest (temporary)

::: media/mtransport/test/TestSyncRunnable.cpp
@@ +1,4 @@
> +/* -*- Mode: C++; tab-width: 12; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

I was told we could CC0 Tests. I'm not sure if it's preferred.

@@ +63,5 @@
> +
> +#include "mtransport_test_utils.h"
> +MtransportTestUtils *test_utils;
> +
> +int main(int argc, char **argv)

You actually don't want a main method. We use XRE_main to call RUN_ALL_TESTS. TEST_F will register the test.
Attached patch Proper unit tests (obsolete) — Splinter Review
Since not everyone here is CC'ed on both. From bug 833769, comment 1402:
> FYI, bug 590422 increased the frequency of these failures. The patch there changes
> (improves) timing/timeouts accuracy. Might be a hint that the failures here depend
> on timing.
>
> Please fix this, it brings spamzilla to new levels...
Assignee: rjesup → adam
Comment on attachment 730284 [details] [diff] [review]
Convert sync dispatches from SIPCC threads to use SyncRunnable (WIP, bustage)

Review of attachment 730284 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits

::: media/webrtc/signaling/signaling.gyp
@@ +51,5 @@
>          './src/sipcc/cpr/include',
>          '../../../ipc/chromium/src',
>          '../../../ipc/chromium/src/base/third_party/nspr',
>          '../../../xpcom/base',
> +        '../../../xpcom/threads',

remove

@@ +269,5 @@
>          '../../../dom/base',
>          '../trunk/third_party/libsrtp/srtp/include',
>          '../trunk/third_party/libsrtp/srtp/crypto/include',
>          '$(DEPTH)/dist/include',
> +        '../../../xpcom/threads',

remove

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +22,5 @@
>  #include "transportlayerice.h"
>  #include "runnable_utils.h"
>  #include "cpr_stdlib.h"
>  #include "cpr_string.h"
> +#include "SyncRunnable.h"

mozilla/SyncRunnable.h

::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp
@@ +23,5 @@
>  #include "nsIObserverService.h"
>  #include "nsIObserver.h"
>  #include "mozilla/Services.h"
>  #include "StaticPtr.h"
> +#include "SyncRunnable.h"

mozilla/SyncRunnable.h

@@ +261,5 @@
>  void PeerConnectionCtx::onCallEvent(ccapi_call_event_e aCallEvent,
>                                        CSF::CC_CallPtr aCall,
>                                        CSF::CC_CallInfoPtr aInfo) {
>    // This is called on a SIPCC thread.
> +  // WARNING: Do not dispatch this with NS_DISPATCH_NORMAL.

This comment is confusing since we're using SyncRunnable

::: media/webrtc/signaling/test/Makefile.in
@@ +136,5 @@
>   -I$(topsrcdir)/media/webrtc/signaling/src/peerconnection \
>   -I$(topsrcdir)/media/webrtc/signaling/media-conduit\
>   -I$(topsrcdir)/media/webrtc/trunk/third_party/libjingle/source/ \
>   -I$(topsrcdir)/xpcom/base/ \
> + -I$(topsrcdir)/xpcom/threads/ \

remove

::: media/webrtc/signaling/test/mediapipeline_unittest.cpp
@@ +25,5 @@
>  #include "runnable_utils.h"
>  #include "transportflow.h"
>  #include "transportlayerprsock.h"
>  #include "transportlayerdtls.h"
> +#include "SyncRunnable.h"

mozilla/SyncRunnable.h

::: media/webrtc/signaling/test/signaling_unittests.cpp
@@ +32,5 @@
>  #include "nsIIOService.h"
>  #include "nsIDNSService.h"
>  #include "nsWeakReference.h"
>  #include "nricectx.h"
> +#include "SyncRunnable.h"

mozilla/SyncRunnable.h
Attachment #730284 - Flags: review+
Attachment #730284 - Attachment is obsolete: true
Attachment #730017 - Attachment is obsolete: true
Attachment #730291 - Attachment is obsolete: true
Attachment #730326 - Attachment is obsolete: true
Attachment #730322 - Flags: review?(benjamin)
Comment on attachment 730332 [details] [diff] [review]
Convert sync dispatches from SIPCC threads to use SyncRunnable

Carrying forward r+ from jesup
Attachment #730332 - Flags: review+
Attachment #730322 - Flags: review?(benjamin) → review+
Attachment #730154 - Attachment is obsolete: true
Attachment #728417 - Attachment is obsolete: true
Attachment #728520 - Attachment is obsolete: true
Depends on: 855462
Blocks: 855474
No longer depends on: 855462
Comment on attachment 730341 [details] [diff] [review]
Test for SyncRunnable using gtest framework (not working)

xpcom/tests/gtest test is now on bug 855474
Attachment #730341 - Attachment is obsolete: true
Comment on attachment 730014 [details] [diff] [review]
test (WIP)

Review of attachment 730014 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/TestSyncRunnable.cpp
@@ +48,5 @@
> +{
> +  nsRefPtr<TestRunnable> r(new TestRunnable());
> +  nsRefPtr<SyncRunnable> s(new SyncRunnable(r));
> +  s->DispatchToThread(gThread);
> +  

ws
Attachment #730014 - Flags: review+
Attachment #730014 - Attachment is obsolete: true
Comment on attachment 730488 [details] [diff] [review]
Test for SyncRunnable

Carrying forward r+ from jesup
Attachment #730488 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ba09dac7d26
Bustage fix for warnings-as-errors
Target Milestone: --- → mozilla22
Resummarizing so that this bug reflects what was changed.
Component: XPCOM → WebRTC
QA Contact: jsmith
Summary: Dispatch(...,DISPATCH_SYNC) from an "upconverted" nsThread is inherently leaky → Use SyncRunnable in WebRTC
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][qa-]
You need to log in before you can comment on or make changes to this bug.