Closed Bug 1059572 Opened 10 years ago Closed 9 years ago

We should close the racy window of opportunity around PostTimerEvent

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed
b2g-master --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

(Keywords: sec-moderate, Whiteboard: [post-critsmash-triage][adv-main42-])

Attachments

(4 files, 6 obsolete files)

Currently, while dispatching a timer, we unlock the mutex so that certain corner cases that require either re-adding the timer or removing the timer don't deadlock. We should just keep the mutex locked, and use lock-free variants of these functions. This will remove many race conditions.
Initial cut.
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Attachment #8480260 - Attachment is obsolete: true
See Also: → 1036515
A fuzz test-case I've been working on. It reveals that there are numerous problems related to rescheduling timers while PostTimerEvent is running.
Attachment #8631297 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8637328 - Attachment filename: . → move_posttimerevent.patch
Attachment #8480261 - Attachment is obsolete: true
Fix warning on non-debug builds.
Attachment #8637380 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8631297 - Attachment is obsolete: true
Attachment #8638105 - Attachment filename: . → move_posttimerevent.patch
Attachment #8638105 - Flags: review?(nfroyd)
Attachment #8637328 - Attachment is obsolete: true
Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

I've also converted NS_ASSERTION to MOZ_ASSERT in this test case, since hitting an NS_ASSERTION when running mach cppunittest doesn't cause tests to fail.
Attachment #8637380 - Flags: feedback?(nfroyd)
Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable overall.

::: xpcom/tests/TestTimers.cpp
@@ +322,5 @@
> +        InitRandomTimer(CancelRandomTimer().get());
> +      }
> +      // Reschedule some timers without a Cancel first.
> +      for (size_t i = 0; i < kNumRescheduled; ++i) {
> +        InitRandomTimer(RemoveRandomTimer().get());

That you can do this and nothing complains seems surprising to me, but I haven't looked closely at the timer documentation recently.

@@ +438,5 @@
> +  for (size_t i = 0; i < kNumThreads; ++i) {
> +    threadStates[i]->Stop();
> +  }
> +
> +  PR_Sleep(PR_MillisecondsToInterval(1000));

I guess this is to wait for all outstanding timers to stop?  Seems like it would be better to wait on them somehow, to avoid the possibility of intermittents on the b2g emulators.
Attachment #8637380 - Flags: feedback?(nfroyd) → feedback+
Comment on attachment 8638105 [details] [diff] [review]
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it

Review of attachment 8638105 [details] [diff] [review]:
-----------------------------------------------------------------

I will say that this division of labor makes a lot more sense than the previous setup.

For my own edification, and posterity's sake, can you describe a scenario or two where having the monitor covering more of PostTimerEvent is beneficial?  I think I can believe it, and it certainly seems better to not have to keep acquiring the monitor to manipulate the timer list, but I'm at a loss for constructing an actual scenario.
Attachment #8638105 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #9)
> Comment on attachment 8638105 [details] [diff] [review]
> Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to
> protect it
> 
> Review of attachment 8638105 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I will say that this division of labor makes a lot more sense than the
> previous setup.
> 
> For my own edification, and posterity's sake, can you describe a scenario or
> two where having the monitor covering more of PostTimerEvent is beneficial? 
> I think I can believe it, and it certainly seems better to not have to keep
> acquiring the monitor to manipulate the timer list, but I'm at a loss for
> constructing an actual scenario.

   There are a few races. One is on nsTimerImpl::mGeneration. If the target thread reschedules a timer while it is being processed by PostTimerEvent, the TimerEvent could end up initting with the new value of mGeneration (or an undefined value) here:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp?from=nsTimerEvent#157

   This can allow the rescheduled timer to fire anyway at its previously scheduled time, since this check will be true:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#689

   You can also get races on timer type, or the timer target, if for some bizarre reason something re-inits in a way that changes these things.

   I've observed these races (except for the timer target one) with the test-case in part 0.
Fix a race caused by reinitting a repeating precise timer while that timer is being processed by PostTimerEvent.
Attachment #8639934 - Attachment filename: . → ensure_removed.patch
Attachment #8639934 - Flags: review?(nfroyd)
(In reply to Byron Campen [:bwc] from comment #11)
> Created attachment 8639934 [details] [diff] [review]
> Part 2: Make absolutely sure a timer is removed before reinitting it
> 
> Fix a race caused by reinitting a repeating precise timer while that timer
> is being processed by PostTimerEvent.

   I should point out that the result of this race is that the timeout is modified while searching for an insert position in TimerThread::mTimers, which results in the timer being inserted into an undefined slot, which can then cause subsequent binary inserts to lead to undefined behavior too.
Attached patch Part 0: Fuzz test for timers (obsolete) — Splinter Review
Incorporate feedback.
Attachment #8639938 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8639938 - Flags: review?(nfroyd)
Attachment #8637380 - Attachment is obsolete: true
Comment on attachment 8639934 [details] [diff] [review]
Part 2: Make absolutely sure a timer is removed before reinitting it

Review of attachment 8639934 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray for dodgy code being removed.
Attachment #8639934 - Flags: review?(nfroyd) → review+
Comment on attachment 8639938 [details] [diff] [review]
Part 0: Fuzz test for timers

Review of attachment 8639938 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/TestTimers.cpp
@@ +399,5 @@
> +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> +      uint32_t type = GetRandomType();
> +      nsresult rv = aTimer->InitWithCallback(
> +          this,
> +          PR_MillisecondsToInterval(delay),

Timers don't use NSPR's intervals; they use straight-up milliseconds.

@@ +445,5 @@
> +  }
> +
> +  PRIntervalTime start = PR_IntervalNow();
> +
> +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {

This condition assumes that the loop always takes less than 10 seconds, right?  Why not write it as something like:

for (auto& state : threadStates) {
  while (state->HasTimersOutstanding()) {
    PR_Sleep(PR_MillisecondsToInterval(10));
  }
}

?  That way we're not dependent on how long the loop body takes to execute, and we never re-examine threads that are already done with their timers.
Attachment #8639938 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> Comment on attachment 8639938 [details] [diff] [review]
> Part 0: Fuzz test for timers
> 
> Review of attachment 8639938 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/tests/TestTimers.cpp
> @@ +399,5 @@
> > +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> > +      uint32_t type = GetRandomType();
> > +      nsresult rv = aTimer->InitWithCallback(
> > +          this,
> > +          PR_MillisecondsToInterval(delay),
> 
> Timers don't use NSPR's intervals; they use straight-up milliseconds.
> 

   Oh! I'll fix the pre-existing instances of this in this test-case then. I hope it doesn't cause any tests to start failing...

> @@ +445,5 @@
> > +  }
> > +
> > +  PRIntervalTime start = PR_IntervalNow();
> > +
> > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> 
> This condition assumes that the loop always takes less than 10 seconds,
> right?  Why not write it as something like:
> 
> for (auto& state : threadStates) {
>   while (state->HasTimersOutstanding()) {
>     PR_Sleep(PR_MillisecondsToInterval(10));
>   }
> }
> 
> ?  That way we're not dependent on how long the loop body takes to execute,
> and we never re-examine threads that are already done with their timers.

   I did not want to have this test hang, requiring it to be killed, if something went wrong.
(In reply to Byron Campen [:bwc] from comment #17)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > 
> 
>    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> hope it doesn't cause any tests to start failing...

Doh.  I'm OK with you just fixing this instance and filing a bug on the other instances.

> > @@ +445,5 @@
> > > +  }
> > > +
> > > +  PRIntervalTime start = PR_IntervalNow();
> > > +
> > > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> > 
> > This condition assumes that the loop always takes less than 10 seconds,
> > right?  Why not write it as something like:
> > 
> > for (auto& state : threadStates) {
> >   while (state->HasTimersOutstanding()) {
> >     PR_Sleep(PR_MillisecondsToInterval(10));
> >   }
> > }
> > 
> > ?  That way we're not dependent on how long the loop body takes to execute,
> > and we never re-examine threads that are already done with their timers.
> 
>    I did not want to have this test hang, requiring it to be killed, if
> something went wrong.

That's a fair point.  The original loop is fine, then, but please note why we're using this fixed interval in a comment.  Another option is to make the storage a std::deque or similar, so we can pop states off once we've determined they have no outstanding timers.
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> (In reply to Byron Campen [:bwc] from comment #17)
> > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > 
> > 
> >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > hope it doesn't cause any tests to start failing...
> 
> Doh.  I'm OK with you just fixing this instance and filing a bug on the
> other instances.

   Fixing these does not seem to have any ill effect locally. We'll see what try says.
(In reply to Byron Campen [:bwc] from comment #19)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> > (In reply to Byron Campen [:bwc] from comment #17)
> > > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > > 
> > > 
> > >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > > hope it doesn't cause any tests to start failing...
> > 
> > Doh.  I'm OK with you just fixing this instance and filing a bug on the
> > other instances.
> 
>    Fixing these does not seem to have any ill effect locally. We'll see what
> try says.

Hooray.  For avoidance of doubt, please put those changes in a separate patch, so we can back them out separately if more extensive testing once checked in causes problems.
Incorportate feedback
Comment on attachment 8640562 [details] [diff] [review]
Part 0: Fuzz test for timers

Carry forward r=nfroyd
Attachment #8640562 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8640562 - Flags: review+
Factor out fixes for pre-existing problems in TestTimers.
Comment on attachment 8640566 [details] [diff] [review]
Part 0.5: Fixes for pre-existing problems in TestTimers

I split out the s/NS_ASSERTION/MOZ_ASSERT/ too.
Attachment #8640566 - Attachment filename: . → fix_testtimers_problems.patch
Attachment #8640566 - Flags: review?(nfroyd)
Attachment #8639938 - Attachment is obsolete: true
Attachment #8640566 - Flags: review?(nfroyd) → review+
Check back on try
Flags: needinfo?(docfaraday)
Group: core-security → core-security-release
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: