Closed Bug 1151080 Opened 5 years ago Closed 5 years ago

Rewrite NR_async_schedule to not depend on timers

Categories

(Core :: WebRTC: Networking, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed
b2g-master --- fixed

People

(Reporter: ekr, Assigned: ekr)

Details

(Keywords: sec-audit, Whiteboard: [post-critsmash-triage][adv-main40-])

Attachments

(2 files, 7 obsolete files)

Byron has expressed concerns about whether 0-length timers are really sequential. Rewrite this so we are sure.

Marking security because it's a potential contributor to bug 1151046
I wrote a patch for this, but it looks like we do NR_ASYNC_TIMER_SET(0) a number of times where we then return the handle, so it looks like we instead need to special case NR_async_timer_set(0) and have it return a cancel handle. I'll do that, but probably over the weekend or Monday.
Assignee: nobody → ekr
Status: NEW → ASSIGNED
Attachment #8588641 - Flags: feedback?(martin.thomson)
Comment on attachment 8588641 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8588641 [details] [diff] [review]:
-----------------------------------------------------------------

juberti'd

::: media/mtransport/nr_timer.cpp
@@ +80,5 @@
>    /* additional members */
>    NR_async_cb cb_;
>    void *cb_arg_;
>    std::string function_;
>    int line_;

Unused...

@@ +98,5 @@
> +        timer_(timer) {}
> +
> +  virtual void Cancel() override {
> +    // The Cancel() call decrements the ref count. This keeps us
> +    // alive throught the end of the point.

Are you seriously permitting this to be called from a bare pointer?  Why not require that callers hold a RefPtr on this instead?

@@ +137,5 @@
> +    }
> +  }
> +
> +  virtual void Cancel() override {
> +    cb_ = nullptr;

This won't be guaranteed to be properly thread safe.  Assuming that Cancel() can be called from any thread.

@@ +161,5 @@
>    ASSERT_ON_THREAD(sts_thread);
>  }
>  
> +static int nr_async_timer_set_zero(NR_async_cb cb, void *arg,
> +                                   char *func, int l, void **handle) {

You don't _have_ to maintain the crazy void** on handle on these.  You could have this return an nrappkitCallback reference and assign the handle in the common function.

@@ +212,5 @@
>    return 0;
>  }
>  
> +int NR_async_timer_set(int timeout, NR_async_cb cb, void *arg,
> +                       char *func, int l, void **handle) {

Why is timeout signed?

const char *func ?

::: media/mtransport/test/nrappkit_unittest.cpp
@@ +125,5 @@
>  }
>  
> +TEST_F(TimerTest, CancelTimer0) {
> +  ArmCancelTimer(0);
> +  PR_Sleep(2000);

2 seconds?  The whole test suite should take less time than that.  100ms should be plenty, especially if you add an instruction telling the timer to fail the test if it pops.  Just define cb_fail for use in CancelTimer_w.
Attachment #8588641 - Flags: feedback?(martin.thomson)
Comment on attachment 8588641 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8588641 [details] [diff] [review]:
-----------------------------------------------------------------

Too bad you didn't catch the memory leak of nrappkitScheduledTimer.

::: media/mtransport/nr_timer.cpp
@@ +80,5 @@
>    /* additional members */
>    NR_async_cb cb_;
>    void *cb_arg_;
>    std::string function_;
>    int line_;

Useful for debugging. I'd be happy to add a log message if it would make you feel
better.

@@ +98,5 @@
> +        timer_(timer) {}
> +
> +  virtual void Cancel() override {
> +    // The Cancel() call decrements the ref count. This keeps us
> +    // alive throught the end of the point.

If you look upward, you'll see that nrappkitCallback isn't a refcounted
structure.

@@ +137,5 @@
> +    }
> +  }
> +
> +  virtual void Cancel() override {
> +    cb_ = nullptr;

It can't be. This is a single-threaded API.

@@ +212,5 @@
>    return 0;
>  }
>  
> +int NR_async_timer_set(int timeout, NR_async_cb cb, void *arg,
> +                       char *func, int l, void **handle) {

This API is defined in nrappkit, we're just implementing it.
(In reply to Eric Rescorla (:ekr) from comment #4)
> Comment on attachment 8588641 [details] [diff] [review]
> Rewrite NR_async_timer_set(0) to be in order
> 
> Review of attachment 8588641 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Too bad you didn't catch the memory leak of nrappkitScheduledTimer.

Also too bad that I didn't catch it.


> ::: media/mtransport/nr_timer.cpp
> @@ +80,5 @@
> >    /* additional members */
> >    NR_async_cb cb_;
> >    void *cb_arg_;
> >    std::string function_;
> >    int line_;
> 
> Useful for debugging. I'd be happy to add a log message if it would make you
> feel
> better.
> 
> @@ +98,5 @@
> > +        timer_(timer) {}
> > +
> > +  virtual void Cancel() override {
> > +    // The Cancel() call decrements the ref count. This keeps us
> > +    // alive throught the end of the point.
> 
> If you look upward, you'll see that nrappkitCallback isn't a refcounted
> structure.
> 
> @@ +137,5 @@
> > +    }
> > +  }
> > +
> > +  virtual void Cancel() override {
> > +    cb_ = nullptr;
> 
> It can't be. This is a single-threaded API.
> 
> @@ +212,5 @@
> >    return 0;
> >  }
> >  
> > +int NR_async_timer_set(int timeout, NR_async_cb cb, void *arg,
> > +                       char *func, int l, void **handle) {
> 
> This API is defined in nrappkit, we're just implementing it.
* * *
Update for some of Martin's comments
* * *
RefPtr magic
Comment on attachment 8588841 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8588841 [details] [diff] [review]:
-----------------------------------------------------------------

Byron, this needs another look by me, but here's the direction I am going in.

You may want to double-check RefPtr logic.
Attachment #8588841 - Flags: feedback?(docfaraday)
Comment on attachment 8588841 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8588841 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_timer.cpp
@@ +245,5 @@
>  
> +  // Make this a RefPtr because callback->Cancel() can decrement the
> +  // reference count on callback.
> +  RefPtr<nrappkitCallback> callback = static_cast<nrappkitCallback *>(handle);
> +  callback->Cancel();

I would get rid of this RefPtr, and deal with this in the impl of Cancel, so the weirdness is all in one place. I had some difficulty wrapping my head around it because it was spread around. I can tell we need to avoid |timer_| pointing at freed memory; that can be accomplished with a local variable that shadows it (with a comment explaining why we're doing that). Calling Release() on this local var will work, right (if not, we have bigger problems that we need to look at more closely)? We can also simplify the inheritance after doing this.
Attachment #8588841 - Flags: feedback?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #10)
> Comment on attachment 8588841 [details] [diff] [review]
> Rewrite NR_async_timer_set(0) to be in order
> 
> Review of attachment 8588841 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: media/mtransport/nr_timer.cpp
> @@ +245,5 @@
> >  
> > +  // Make this a RefPtr because callback->Cancel() can decrement the
> > +  // reference count on callback.
> > +  RefPtr<nrappkitCallback> callback = static_cast<nrappkitCallback *>(handle);
> > +  callback->Cancel();
> 
> I would get rid of this RefPtr, and deal with this in the impl of Cancel, so
> the weirdness is all in one place. 

This is like the opposite of MT's comment. Let's have a quick colloquy tomorrow.


> I had some difficulty wrapping my head
> around it because it was spread around. I can tell we need to avoid |timer_|
> pointing at freed memory; that can be accomplished with a local variable
> that shadows it (with a comment explaining why we're doing that). Calling
> Release() on this local var will work, right (if not, we have bigger
> problems that we need to look at more closely)? We can also simplify the
> inheritance after doing this.
Comment on attachment 8588841 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8588841 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_timer.cpp
@@ +119,1 @@
>    cb_(0, 0, cb_arg_);

Which thread is this run on again?
This is a single-threaded system. Everything here is always run on the STS thread.
Or at least should be.
The reason I ask is that I think that nsITimer maintains its own thread.  I thought that the timer callbacks happened on that thread.
(In reply to Martin Thomson [:mt] from comment #15)
> The reason I ask is that I think that nsITimer maintains its own thread.  I
> thought that the timer callbacks happened on that thread.

Timer has a thread, but timers fire on the thread that inited the timer via a runnable.
(In reply to Randell Jesup [:jesup] from comment #16)
> (In reply to Martin Thomson [:mt] from comment #15)
> > The reason I ask is that I think that nsITimer maintains its own thread.  I
> > thought that the timer callbacks happened on that thread.
> 
> Timer has a thread, but timers fire on the thread that inited the timer via
> a runnable.

IIRC it's actually the thread where the timer was constructed:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#288

But in any case, the timers fire on the target thread.
Attached patch Revert RefPtr changes (obsolete) — Splinter Review
Comment on attachment 8589356 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8589356 [details] [diff] [review]:
-----------------------------------------------------------------

Byron,

This reverts Martin's proposed RefPtr changes. Note that I do an explicit
AddRef and delete to force the callback to be deleted after the timer.
Would you prefer that I hold a raw pointer to the timer so that
you can delete the callback before the timer?
Attachment #8589356 - Flags: feedback?(docfaraday)
Comment on attachment 8589356 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8589356 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_timer.cpp
@@ +101,5 @@
> +  virtual void Cancel() override {
> +    AddRef();  // Cancel does Release() on this. AddRef() keeps us alive.
> +    timer_->Cancel();
> +    timer_->Release();
> +    Release(); // This will cause this to be deleted.

I'm ok with this way too.
Attachment #8589356 - Flags: feedback?(docfaraday) → feedback+
OK, I pushed this onto try to see if it breaks everything:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dfc6b163bcb
Try build looks good, modulo one red that appears to be in a different test.
Ah yes, I reintroduced the leak when I pulled the RefPtr off.
Attachment #8592318 - Flags: review?(martin.thomson)
Comment on attachment 8592318 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8592318 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_timer.cpp
@@ +88,5 @@
> +class nrappkitTimerCallback : public nrappkitCallback,
> +                              public nsITimerCallback {
> + public:
> +  // We're going to release ourself in the callback, so we need to be threadsafe
> +  NS_DECL_THREADSAFE_ISUPPORTS

If this really is confined to the STS thread, is this really necessary?

@@ +165,5 @@
> +static int nr_async_timer_set_zero(NR_async_cb cb, void *arg,
> +                                   char *func, int l,
> +                                   nrappkitCallback **handle) {
> +  nsIThread *thread;
> +  nsresult rv = NS_GetCurrentThread(&thread);

This is a different call to the one above: 
sts_thread = do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv);

nsIEventTarget has a dispatch method too.  That might save a few lines of code.

@@ +173,5 @@
> +  nrappkitScheduledCallback* callback(new nrappkitScheduledCallback(
> +      cb, arg, func, l));
> +
> +  rv = thread->Dispatch(WrapRunnable(
> +      nsAutoPtr<nrappkitScheduledCallback>(callback),

Not a comment, but I want to work out whether we can use MakeUnique and Move with WrapRunnable.  I'd like to move away from nsAutoPtr.

@@ +194,2 @@
>    nsresult rv;
>    CheckSTSThread();

Can this be moved to the common part?  The _zero variant doesn't call this.
Attachment #8592318 - Flags: review?(martin.thomson) → review+
Comment on attachment 8592318 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8592318 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_timer.cpp
@@ +88,5 @@
> +class nrappkitTimerCallback : public nrappkitCallback,
> +                              public nsITimerCallback {
> + public:
> +  // We're going to release ourself in the callback, so we need to be threadsafe
> +  NS_DECL_THREADSAFE_ISUPPORTS

That's a good question. I think this may be being manipulated on the timer thread as part of the timer internals. This is not new code despite what splinter thinks

I suggest we leave as-is for now.

@@ +165,5 @@
> +static int nr_async_timer_set_zero(NR_async_cb cb, void *arg,
> +                                   char *func, int l,
> +                                   nrappkitCallback **handle) {
> +  nsIThread *thread;
> +  nsresult rv = NS_GetCurrentThread(&thread);

So my claim here is that the hard rule is that this code runs in one thread and that we might not make it STS later. So, this code ought to do "same thread" and not insist that it be STS.

Was your thought that we would have a "get me the STS thread" call that both places used?

@@ +194,2 @@
>    nsresult rv;
>    CheckSTSThread();

Good catch. Yes.
(In reply to Eric Rescorla (:ekr) from comment #29)
> I suggest we leave as-is for now.

I can live with that.
 
> @@ +165,5 @@
> > +  nsresult rv = NS_GetCurrentThread(&thread);
> 
> So my claim here is that the hard rule is that this code runs in one thread
> and that we might not make it STS later. So, this code ought to do "same
> thread" and not insist that it be STS.

Whatever method you like, but it should at least be consistent throughout.  That means that these need to identify their thread and commit to it (like nsTimerImpl does), or just commit to STS and worry about the consequences of that if things need to change later.  I have a slight tendency toward the latter.

> Was your thought that we would have a "get me the STS thread" call that both
> places used?

Yeah, I figured that CheckSTSThread could return it, since it has already gone to the effort of acquiring it.
Comment on attachment 8592481 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

Review of attachment 8592481 [details] [diff] [review]:
-----------------------------------------------------------------

Carrying forward r+ from mt
Attachment #8592481 - Flags: sec-approval?
Attachment #8592481 - Flags: review+
>    How easily can the security issue be deduced from the patch? 

It's not clear that there is an issue. Code inspection suggests that this
patch shouldn't change the timing of events, but we want to stop depending
on what's not clearly documented timer behavior.

>    Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? 

No.

>    Which older supported branches are affected by this flaw? 

All with WebRTC if any.


>    If not all supported branches, which bug introduced the flaw? 
>   Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? 

We don't, but this patch should apply. However, I propose to just
land this on trunk and maybe uplift to Aurora.


>    How likely is this patch to cause regressions; how much testing does it need? 

Unlikely. Ordinary testing should be fine.
Comment on attachment 8592481 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to be in order

sec-approval=dveditz

You don't really need approval on a sec-audit bug, unless the audit uncovers a specific vulnerability (in which case we should also change the rating).
Attachment #8592481 - Flags: sec-approval? → sec-approval+
(In reply to Daniel Veditz [:dveditz] from comment #34)
> Comment on attachment 8592481 [details] [diff] [review]
> Rewrite NR_async_timer_set(0) to be in order
> 
> sec-approval=dveditz
> 
> You don't really need approval on a sec-audit bug, unless the audit uncovers
> a specific vulnerability (in which case we should also change the rating).

Thanks. Now I know.
Attachment #8588641 - Attachment is obsolete: true
Attachment #8588833 - Attachment is obsolete: true
Attachment #8588839 - Attachment is obsolete: true
Attachment #8588841 - Attachment is obsolete: true
Attachment #8589354 - Attachment is obsolete: true
Attachment #8589356 - Attachment is obsolete: true
Attachment #8592318 - Attachment is obsolete: true
Comment on attachment 8593348 [details] [diff] [review]
Rewrite NR_async_timer_set(0) to use direct dispatch

Review of attachment 8593348 [details] [diff] [review]:
-----------------------------------------------------------------

The version I committed (with improved title).
https://hg.mozilla.org/mozilla-central/rev/6013a3ee3fdf

Is this something we might want on the long-lived 37/38 branches?
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(ekr)
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Not sure. We don't know for sure it's causing any problems, so my thought would be to wait a week or two and see if Nightly crashstats are affected. Absent some strong evidence, suggest we let it ride the trains.
Flags: needinfo?(ekr)
We've never seen bug 1151046 on 40, so if it works we'll never have proof. *shrugs*
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main40-]
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.