We should close the racy window of opportunity around PostTimerEvent

RESOLVED FIXED in Firefox 42, Firefox OS master

Status

()

Core
XPCOM
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: bwc, Assigned: bwc)

Tracking

({sec-moderate})

Trunk
mozilla42
sec-moderate
Points:
---

Firefox Tracking Flags

(firefox42 fixed, b2g-master fixed)

Details

(Whiteboard: [post-critsmash-triage][adv-main42-])

Attachments

(4 attachments, 6 obsolete attachments)

(Assignee)

Description

4 years ago
Currently, while dispatching a timer, we unlock the mutex so that certain corner cases that require either re-adding the timer or removing the timer don't deadlock. We should just keep the mutex locked, and use lock-free variants of these functions. This will remove many race conditions.
(Assignee)

Comment 1

4 years ago
Created attachment 8480260 [details] [diff] [review]
(WIP) Keep the mutex locked while calling PostTimerEvent.

Initial cut.
(Assignee)

Updated

4 years ago
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
(Assignee)

Comment 2

4 years ago
Created attachment 8480261 [details] [diff] [review]
(WIP) Keep the mutex locked while calling PostTimerEvent.

Unrot.
(Assignee)

Updated

4 years ago
Attachment #8480260 - Attachment is obsolete: true
Keywords: sec-moderate
(Assignee)

Updated

4 years ago
See Also: → bug 1036515
(Assignee)

Comment 3

3 years ago
Created attachment 8631297 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

A fuzz test-case I've been working on. It reveals that there are numerous problems related to rescheduling timers while PostTimerEvent is running.
(Assignee)

Updated

3 years ago
Attachment #8631297 - Attachment filename: . → timer_fuzz_test.patch
(Assignee)

Comment 4

3 years ago
Created attachment 8637328 [details] [diff] [review]
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it
(Assignee)

Updated

3 years ago
Attachment #8637328 - Attachment filename: . → move_posttimerevent.patch
(Assignee)

Updated

3 years ago
Attachment #8480261 - Attachment is obsolete: true
(Assignee)

Comment 5

3 years ago
Created attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Fix warning on non-debug builds.
(Assignee)

Updated

3 years ago
Attachment #8637380 - Attachment filename: . → timer_fuzz_test.patch
(Assignee)

Updated

3 years ago
Attachment #8631297 - Attachment is obsolete: true
(Assignee)

Comment 6

3 years ago
Created attachment 8638105 [details] [diff] [review]
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it
(Assignee)

Updated

3 years ago
Attachment #8638105 - Attachment filename: . → move_posttimerevent.patch
Attachment #8638105 - Flags: review?(nfroyd)
(Assignee)

Updated

3 years ago
Attachment #8637328 - Attachment is obsolete: true
(Assignee)

Comment 7

3 years ago
Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

I've also converted NS_ASSERTION to MOZ_ASSERT in this test case, since hitting an NS_ASSERTION when running mach cppunittest doesn't cause tests to fail.
Attachment #8637380 - Flags: feedback?(nfroyd)
Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable overall.

::: xpcom/tests/TestTimers.cpp
@@ +322,5 @@
> +        InitRandomTimer(CancelRandomTimer().get());
> +      }
> +      // Reschedule some timers without a Cancel first.
> +      for (size_t i = 0; i < kNumRescheduled; ++i) {
> +        InitRandomTimer(RemoveRandomTimer().get());

That you can do this and nothing complains seems surprising to me, but I haven't looked closely at the timer documentation recently.

@@ +438,5 @@
> +  for (size_t i = 0; i < kNumThreads; ++i) {
> +    threadStates[i]->Stop();
> +  }
> +
> +  PR_Sleep(PR_MillisecondsToInterval(1000));

I guess this is to wait for all outstanding timers to stop?  Seems like it would be better to wait on them somehow, to avoid the possibility of intermittents on the b2g emulators.
Attachment #8637380 - Flags: feedback?(nfroyd) → feedback+
Comment on attachment 8638105 [details] [diff] [review]
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it

Review of attachment 8638105 [details] [diff] [review]:
-----------------------------------------------------------------

I will say that this division of labor makes a lot more sense than the previous setup.

For my own edification, and posterity's sake, can you describe a scenario or two where having the monitor covering more of PostTimerEvent is beneficial?  I think I can believe it, and it certainly seems better to not have to keep acquiring the monitor to manipulate the timer list, but I'm at a loss for constructing an actual scenario.
Attachment #8638105 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 10

3 years ago
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #9)
> Comment on attachment 8638105 [details] [diff] [review]
> Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to
> protect it
> 
> Review of attachment 8638105 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I will say that this division of labor makes a lot more sense than the
> previous setup.
> 
> For my own edification, and posterity's sake, can you describe a scenario or
> two where having the monitor covering more of PostTimerEvent is beneficial? 
> I think I can believe it, and it certainly seems better to not have to keep
> acquiring the monitor to manipulate the timer list, but I'm at a loss for
> constructing an actual scenario.

   There are a few races. One is on nsTimerImpl::mGeneration. If the target thread reschedules a timer while it is being processed by PostTimerEvent, the TimerEvent could end up initting with the new value of mGeneration (or an undefined value) here:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp?from=nsTimerEvent#157

   This can allow the rescheduled timer to fire anyway at its previously scheduled time, since this check will be true:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#689

   You can also get races on timer type, or the timer target, if for some bizarre reason something re-inits in a way that changes these things.

   I've observed these races (except for the timer target one) with the test-case in part 0.
(Assignee)

Comment 11

3 years ago
Created attachment 8639934 [details] [diff] [review]
Part 2: Make absolutely sure a timer is removed before reinitting it

Fix a race caused by reinitting a repeating precise timer while that timer is being processed by PostTimerEvent.
(Assignee)

Updated

3 years ago
Attachment #8639934 - Attachment filename: . → ensure_removed.patch
Attachment #8639934 - Flags: review?(nfroyd)
(Assignee)

Comment 12

3 years ago
(In reply to Byron Campen [:bwc] from comment #11)
> Created attachment 8639934 [details] [diff] [review]
> Part 2: Make absolutely sure a timer is removed before reinitting it
> 
> Fix a race caused by reinitting a repeating precise timer while that timer
> is being processed by PostTimerEvent.

   I should point out that the result of this race is that the timeout is modified while searching for an insert position in TimerThread::mTimers, which results in the timer being inserted into an undefined slot, which can then cause subsequent binary inserts to lead to undefined behavior too.
(Assignee)

Comment 13

3 years ago
Created attachment 8639938 [details] [diff] [review]
Part 0: Fuzz test for timers

Incorporate feedback.
(Assignee)

Updated

3 years ago
Attachment #8639938 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8639938 - Flags: review?(nfroyd)
(Assignee)

Updated

3 years ago
Attachment #8637380 - Attachment is obsolete: true
Comment on attachment 8639934 [details] [diff] [review]
Part 2: Make absolutely sure a timer is removed before reinitting it

Review of attachment 8639934 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray for dodgy code being removed.
Attachment #8639934 - Flags: review?(nfroyd) → review+
Comment on attachment 8639938 [details] [diff] [review]
Part 0: Fuzz test for timers

Review of attachment 8639938 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/TestTimers.cpp
@@ +399,5 @@
> +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> +      uint32_t type = GetRandomType();
> +      nsresult rv = aTimer->InitWithCallback(
> +          this,
> +          PR_MillisecondsToInterval(delay),

Timers don't use NSPR's intervals; they use straight-up milliseconds.

@@ +445,5 @@
> +  }
> +
> +  PRIntervalTime start = PR_IntervalNow();
> +
> +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {

This condition assumes that the loop always takes less than 10 seconds, right?  Why not write it as something like:

for (auto& state : threadStates) {
  while (state->HasTimersOutstanding()) {
    PR_Sleep(PR_MillisecondsToInterval(10));
  }
}

?  That way we're not dependent on how long the loop body takes to execute, and we never re-examine threads that are already done with their timers.
Attachment #8639938 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 17

3 years ago
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> Comment on attachment 8639938 [details] [diff] [review]
> Part 0: Fuzz test for timers
> 
> Review of attachment 8639938 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/tests/TestTimers.cpp
> @@ +399,5 @@
> > +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> > +      uint32_t type = GetRandomType();
> > +      nsresult rv = aTimer->InitWithCallback(
> > +          this,
> > +          PR_MillisecondsToInterval(delay),
> 
> Timers don't use NSPR's intervals; they use straight-up milliseconds.
> 

   Oh! I'll fix the pre-existing instances of this in this test-case then. I hope it doesn't cause any tests to start failing...

> @@ +445,5 @@
> > +  }
> > +
> > +  PRIntervalTime start = PR_IntervalNow();
> > +
> > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> 
> This condition assumes that the loop always takes less than 10 seconds,
> right?  Why not write it as something like:
> 
> for (auto& state : threadStates) {
>   while (state->HasTimersOutstanding()) {
>     PR_Sleep(PR_MillisecondsToInterval(10));
>   }
> }
> 
> ?  That way we're not dependent on how long the loop body takes to execute,
> and we never re-examine threads that are already done with their timers.

   I did not want to have this test hang, requiring it to be killed, if something went wrong.
(In reply to Byron Campen [:bwc] from comment #17)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > 
> 
>    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> hope it doesn't cause any tests to start failing...

Doh.  I'm OK with you just fixing this instance and filing a bug on the other instances.

> > @@ +445,5 @@
> > > +  }
> > > +
> > > +  PRIntervalTime start = PR_IntervalNow();
> > > +
> > > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> > 
> > This condition assumes that the loop always takes less than 10 seconds,
> > right?  Why not write it as something like:
> > 
> > for (auto& state : threadStates) {
> >   while (state->HasTimersOutstanding()) {
> >     PR_Sleep(PR_MillisecondsToInterval(10));
> >   }
> > }
> > 
> > ?  That way we're not dependent on how long the loop body takes to execute,
> > and we never re-examine threads that are already done with their timers.
> 
>    I did not want to have this test hang, requiring it to be killed, if
> something went wrong.

That's a fair point.  The original loop is fine, then, but please note why we're using this fixed interval in a comment.  Another option is to make the storage a std::deque or similar, so we can pop states off once we've determined they have no outstanding timers.
(Assignee)

Comment 19

3 years ago
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> (In reply to Byron Campen [:bwc] from comment #17)
> > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > 
> > 
> >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > hope it doesn't cause any tests to start failing...
> 
> Doh.  I'm OK with you just fixing this instance and filing a bug on the
> other instances.

   Fixing these does not seem to have any ill effect locally. We'll see what try says.
(In reply to Byron Campen [:bwc] from comment #19)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> > (In reply to Byron Campen [:bwc] from comment #17)
> > > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > > 
> > > 
> > >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > > hope it doesn't cause any tests to start failing...
> > 
> > Doh.  I'm OK with you just fixing this instance and filing a bug on the
> > other instances.
> 
>    Fixing these does not seem to have any ill effect locally. We'll see what
> try says.

Hooray.  For avoidance of doubt, please put those changes in a separate patch, so we can back them out separately if more extensive testing once checked in causes problems.
(Assignee)

Comment 21

3 years ago
Created attachment 8640562 [details] [diff] [review]
Part 0: Fuzz test for timers

Incorportate feedback
(Assignee)

Comment 22

3 years ago
Comment on attachment 8640562 [details] [diff] [review]
Part 0: Fuzz test for timers

Carry forward r=nfroyd
Attachment #8640562 - Attachment filename: . → timer_fuzz_test.patch
Attachment #8640562 - Flags: review+
(Assignee)

Comment 23

3 years ago
Created attachment 8640566 [details] [diff] [review]
Part 0.5: Fixes for pre-existing problems in TestTimers

Factor out fixes for pre-existing problems in TestTimers.
(Assignee)

Comment 24

3 years ago
Comment on attachment 8640566 [details] [diff] [review]
Part 0.5: Fixes for pre-existing problems in TestTimers

I split out the s/NS_ASSERTION/MOZ_ASSERT/ too.
Attachment #8640566 - Attachment filename: . → fix_testtimers_problems.patch
Attachment #8640566 - Flags: review?(nfroyd)
(Assignee)

Updated

3 years ago
Attachment #8639938 - Attachment is obsolete: true
Attachment #8640566 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 26

3 years ago
Check back on try
Flags: needinfo?(docfaraday)
https://hg.mozilla.org/mozilla-central/rev/d6ae2155496b
https://hg.mozilla.org/mozilla-central/rev/7793255f366e
https://hg.mozilla.org/mozilla-central/rev/88854d602d98
https://hg.mozilla.org/mozilla-central/rev/80677b385ce0
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-b2g-master: --- → fixed
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42

Updated

3 years ago
Group: core-security → core-security-release
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.