Closed Bug 1059572 Opened 10 years ago Closed 9 years ago

We should close the racy window of opportunity around PostTimerEvent

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla42

Tracking Flags:

Tracking

Status

firefox42

---

fixed

b2g-master

---

fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

(Keywords: sec-moderate, Whiteboard: [post-critsmash-triage][adv-main42-])

Attachments

(4 files, 6 obsolete files)

(WIP) Keep the mutex locked while calling PostTimerEvent. 10 years ago Byron Campen [:bwc] 7.50 KB, patch		Details \| Diff \| Splinter Review
(WIP) Keep the mutex locked while calling PostTimerEvent. 10 years ago Byron Campen [:bwc] 7.51 KB, patch		Details \| Diff \| Splinter Review
Part 0: (WIP) Fuzz test for timers 9 years ago Byron Campen [:bwc] 10.04 KB, patch		Details \| Diff \| Splinter Review
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it 9 years ago Byron Campen [:bwc] 20.60 KB, patch		Details \| Diff \| Splinter Review
Part 0: (WIP) Fuzz test for timers 9 years ago Byron Campen [:bwc] 10.06 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it 9 years ago Byron Campen [:bwc] 20.53 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 2: Make absolutely sure a timer is removed before reinitting it 9 years ago Byron Campen [:bwc] 1.69 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 0: Fuzz test for timers 9 years ago Byron Campen [:bwc] 10.60 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 0: Fuzz test for timers 9 years ago Byron Campen [:bwc] 9.69 KB, patch	bwc : review+	Details \| Diff \| Splinter Review
Part 0.5: Fixes for pre-existing problems in TestTimers 9 years ago Byron Campen [:bwc] 2.95 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review

Byron Campen [:bwc]

Assignee

Description

•

10 years ago

Currently, while dispatching a timer, we unlock the mutex so that certain corner cases that require either re-adding the timer or removing the timer don't deadlock. We should just keep the mutex locked, and use lock-free variants of these functions. This will remove many race conditions.

Byron Campen [:bwc]

Assignee

Comment 1

•

10 years ago

Attached patch (WIP) Keep the mutex locked while calling PostTimerEvent. (obsolete) — Details — Splinter Review

Initial cut.

Byron Campen [:bwc]

Assignee

Updated

•

10 years ago

Assignee: nobody → docfaraday

Status: NEW → ASSIGNED

Byron Campen [:bwc]

Assignee

Comment 2

•

10 years ago

Attached patch (WIP) Keep the mutex locked while calling PostTimerEvent. (obsolete) — Details — Splinter Review

Unrot.

Byron Campen [:bwc]

Assignee

Updated

•

10 years ago

Attachment #8480260 - Attachment is obsolete: true

Daniel Veditz [:dveditz]

Updated

•

10 years ago

Keywords: sec-moderate

Byron Campen [:bwc]

Assignee

Updated

•

10 years ago

Comment 3

•

9 years ago

Attached patch Part 0: (WIP) Fuzz test for timers (obsolete) — Details — Splinter Review

A fuzz test-case I've been working on. It reveals that there are numerous problems related to rescheduling timers while PostTimerEvent is running.

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8631297 - Attachment filename: . → timer_fuzz_test.patch

Byron Campen [:bwc]

Assignee

Comment 4

•

9 years ago

Attached patch Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it (obsolete) — Details — Splinter Review

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8637328 - Attachment filename: . → move_posttimerevent.patch

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8480261 - Attachment is obsolete: true

Byron Campen [:bwc]

Assignee

Comment 5

•

9 years ago

Attached patch Part 0: (WIP) Fuzz test for timers (obsolete) — Details — Splinter Review

Fix warning on non-debug builds.

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8637380 - Attachment filename: . → timer_fuzz_test.patch

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8631297 - Attachment is obsolete: true

Byron Campen [:bwc]

Assignee

Comment 6

•

9 years ago

Attached patch Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it — Details — Splinter Review

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8638105 - Attachment filename: . → move_posttimerevent.patch

Attachment #8638105 - Flags: review?(nfroyd)

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8637328 - Attachment is obsolete: true

Byron Campen [:bwc]

Assignee

Comment 7

•

9 years ago

Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

I've also converted NS_ASSERTION to MOZ_ASSERT in this test case, since hitting an NS_ASSERTION when running mach cppunittest doesn't cause tests to fail.

Attachment #8637380 - Flags: feedback?(nfroyd)

Nathan Froyd [:froydnj]

Comment 8

•

9 years ago

Comment on attachment 8637380 [details] [diff] [review]
Part 0: (WIP) Fuzz test for timers

Review of attachment 8637380 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable overall.

::: xpcom/tests/TestTimers.cpp
@@ +322,5 @@
> +        InitRandomTimer(CancelRandomTimer().get());
> +      }
> +      // Reschedule some timers without a Cancel first.
> +      for (size_t i = 0; i < kNumRescheduled; ++i) {
> +        InitRandomTimer(RemoveRandomTimer().get());

That you can do this and nothing complains seems surprising to me, but I haven't looked closely at the timer documentation recently.

@@ +438,5 @@
> +  for (size_t i = 0; i < kNumThreads; ++i) {
> +    threadStates[i]->Stop();
> +  }
> +
> +  PR_Sleep(PR_MillisecondsToInterval(1000));

I guess this is to wait for all outstanding timers to stop?  Seems like it would be better to wait on them somehow, to avoid the possibility of intermittents on the b2g emulators.

Attachment #8637380 - Flags: feedback?(nfroyd) → feedback+

Nathan Froyd [:froydnj]

Comment 9

•

9 years ago

Comment on attachment 8638105 [details] [diff] [review]
Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it

Review of attachment 8638105 [details] [diff] [review]:
-----------------------------------------------------------------

I will say that this division of labor makes a lot more sense than the previous setup.

For my own edification, and posterity's sake, can you describe a scenario or two where having the monitor covering more of PostTimerEvent is beneficial?  I think I can believe it, and it certainly seems better to not have to keep acquiring the monitor to manipulate the timer list, but I'm at a loss for constructing an actual scenario.

Attachment #8638105 - Flags: review?(nfroyd) → review+

Byron Campen [:bwc]

Assignee

Comment 10

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #9)
> Comment on attachment 8638105 [details] [diff] [review]
> Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to
> protect it
> 
> Review of attachment 8638105 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I will say that this division of labor makes a lot more sense than the
> previous setup.
> 
> For my own edification, and posterity's sake, can you describe a scenario or
> two where having the monitor covering more of PostTimerEvent is beneficial? 
> I think I can believe it, and it certainly seems better to not have to keep
> acquiring the monitor to manipulate the timer list, but I'm at a loss for
> constructing an actual scenario.

   There are a few races. One is on nsTimerImpl::mGeneration. If the target thread reschedules a timer while it is being processed by PostTimerEvent, the TimerEvent could end up initting with the new value of mGeneration (or an undefined value) here:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp?from=nsTimerEvent#157

   This can allow the rescheduled timer to fire anyway at its previously scheduled time, since this check will be true:

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#689

   You can also get races on timer type, or the timer target, if for some bizarre reason something re-inits in a way that changes these things.

   I've observed these races (except for the timer target one) with the test-case in part 0.

Byron Campen [:bwc]

Assignee

Comment 11

•

9 years ago

Attached patch Part 2: Make absolutely sure a timer is removed before reinitting it — Details — Splinter Review

Fix a race caused by reinitting a repeating precise timer while that timer is being processed by PostTimerEvent.

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8639934 - Attachment filename: . → ensure_removed.patch

Attachment #8639934 - Flags: review?(nfroyd)

Byron Campen [:bwc]

Assignee

Comment 12

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #11)
> Created attachment 8639934 [details] [diff] [review]
> Part 2: Make absolutely sure a timer is removed before reinitting it
> 
> Fix a race caused by reinitting a repeating precise timer while that timer
> is being processed by PostTimerEvent.

   I should point out that the result of this race is that the timeout is modified while searching for an insert position in TimerThread::mTimers, which results in the timer being inserted into an undefined slot, which can then cause subsequent binary inserts to lead to undefined behavior too.

Byron Campen [:bwc]

Assignee

Comment 13

•

9 years ago

Attached patch Part 0: Fuzz test for timers (obsolete) — Details — Splinter Review

Incorporate feedback.

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8639938 - Attachment filename: . → timer_fuzz_test.patch

Attachment #8639938 - Flags: review?(nfroyd)

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8637380 - Attachment is obsolete: true

Byron Campen [:bwc]

Assignee

Comment 14

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a300e42a5ad7

Nathan Froyd [:froydnj]

Comment 15

•

9 years ago

Comment on attachment 8639934 [details] [diff] [review]
Part 2: Make absolutely sure a timer is removed before reinitting it

Review of attachment 8639934 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray for dodgy code being removed.

Attachment #8639934 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 16

•

9 years ago

Comment on attachment 8639938 [details] [diff] [review]
Part 0: Fuzz test for timers

Review of attachment 8639938 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/TestTimers.cpp
@@ +399,5 @@
> +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> +      uint32_t type = GetRandomType();
> +      nsresult rv = aTimer->InitWithCallback(
> +          this,
> +          PR_MillisecondsToInterval(delay),

Timers don't use NSPR's intervals; they use straight-up milliseconds.

@@ +445,5 @@
> +  }
> +
> +  PRIntervalTime start = PR_IntervalNow();
> +
> +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {

This condition assumes that the loop always takes less than 10 seconds, right?  Why not write it as something like:

for (auto& state : threadStates) {
  while (state->HasTimersOutstanding()) {
    PR_Sleep(PR_MillisecondsToInterval(10));
  }
}

?  That way we're not dependent on how long the loop body takes to execute, and we never re-examine threads that are already done with their timers.

Attachment #8639938 - Flags: review?(nfroyd) → review+

Byron Campen [:bwc]

Assignee

Comment 17

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> Comment on attachment 8639938 [details] [diff] [review]
> Part 0: Fuzz test for timers
> 
> Review of attachment 8639938 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/tests/TestTimers.cpp
> @@ +399,5 @@
> > +      uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1);
> > +      uint32_t type = GetRandomType();
> > +      nsresult rv = aTimer->InitWithCallback(
> > +          this,
> > +          PR_MillisecondsToInterval(delay),
> 
> Timers don't use NSPR's intervals; they use straight-up milliseconds.
> 

   Oh! I'll fix the pre-existing instances of this in this test-case then. I hope it doesn't cause any tests to start failing...

> @@ +445,5 @@
> > +  }
> > +
> > +  PRIntervalTime start = PR_IntervalNow();
> > +
> > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> 
> This condition assumes that the loop always takes less than 10 seconds,
> right?  Why not write it as something like:
> 
> for (auto& state : threadStates) {
>   while (state->HasTimersOutstanding()) {
>     PR_Sleep(PR_MillisecondsToInterval(10));
>   }
> }
> 
> ?  That way we're not dependent on how long the loop body takes to execute,
> and we never re-examine threads that are already done with their timers.

   I did not want to have this test hang, requiring it to be killed, if something went wrong.

Nathan Froyd [:froydnj]

Comment 18

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #17)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > 
> 
>    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> hope it doesn't cause any tests to start failing...

Doh.  I'm OK with you just fixing this instance and filing a bug on the other instances.

> > @@ +445,5 @@
> > > +  }
> > > +
> > > +  PRIntervalTime start = PR_IntervalNow();
> > > +
> > > +  while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) {
> > 
> > This condition assumes that the loop always takes less than 10 seconds,
> > right?  Why not write it as something like:
> > 
> > for (auto& state : threadStates) {
> >   while (state->HasTimersOutstanding()) {
> >     PR_Sleep(PR_MillisecondsToInterval(10));
> >   }
> > }
> > 
> > ?  That way we're not dependent on how long the loop body takes to execute,
> > and we never re-examine threads that are already done with their timers.
> 
>    I did not want to have this test hang, requiring it to be killed, if
> something went wrong.

That's a fair point.  The original loop is fine, then, but please note why we're using this fixed interval in a comment.  Another option is to make the storage a std::deque or similar, so we can pop states off once we've determined they have no outstanding timers.

Byron Campen [:bwc]

Assignee

Comment 19

•

9 years ago

(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> (In reply to Byron Campen [:bwc] from comment #17)
> > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > 
> > 
> >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > hope it doesn't cause any tests to start failing...
> 
> Doh.  I'm OK with you just fixing this instance and filing a bug on the
> other instances.

   Fixing these does not seem to have any ill effect locally. We'll see what try says.

Nathan Froyd [:froydnj]

Comment 20

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #19)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18)
> > (In reply to Byron Campen [:bwc] from comment #17)
> > > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16)
> > > > Timers don't use NSPR's intervals; they use straight-up milliseconds.
> > > > 
> > > 
> > >    Oh! I'll fix the pre-existing instances of this in this test-case then. I
> > > hope it doesn't cause any tests to start failing...
> > 
> > Doh.  I'm OK with you just fixing this instance and filing a bug on the
> > other instances.
> 
>    Fixing these does not seem to have any ill effect locally. We'll see what
> try says.

Hooray.  For avoidance of doubt, please put those changes in a separate patch, so we can back them out separately if more extensive testing once checked in causes problems.

Byron Campen [:bwc]

Assignee

Comment 21

•

9 years ago

Attached patch Part 0: Fuzz test for timers — Details — Splinter Review

Incorportate feedback

Byron Campen [:bwc]

Assignee

Comment 22

•

9 years ago

Comment on attachment 8640562 [details] [diff] [review]
Part 0: Fuzz test for timers

Carry forward r=nfroyd

Attachment #8640562 - Attachment filename: . → timer_fuzz_test.patch

Attachment #8640562 - Flags: review+

Byron Campen [:bwc]

Assignee

Comment 23

•

9 years ago

Attached patch Part 0.5: Fixes for pre-existing problems in TestTimers — Details — Splinter Review

Factor out fixes for pre-existing problems in TestTimers.

Byron Campen [:bwc]

Assignee

Comment 24

•

9 years ago

Comment on attachment 8640566 [details] [diff] [review]
Part 0.5: Fixes for pre-existing problems in TestTimers

I split out the s/NS_ASSERTION/MOZ_ASSERT/ too.

Attachment #8640566 - Attachment filename: . → fix_testtimers_problems.patch

Attachment #8640566 - Flags: review?(nfroyd)

Byron Campen [:bwc]

Assignee

Updated

•

9 years ago

Attachment #8639938 - Attachment is obsolete: true

Nathan Froyd [:froydnj]

Updated

•

9 years ago

Attachment #8640566 - Flags: review?(nfroyd) → review+

Byron Campen [:bwc]

Assignee

Comment 25

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4eab8f744a1

Byron Campen [:bwc]

Assignee

Comment 26

•

9 years ago

Check back on try

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Assignee

Comment 27

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d6ae2155496b
https://hg.mozilla.org/integration/mozilla-inbound/rev/88854d602d98
https://hg.mozilla.org/integration/mozilla-inbound/rev/7793255f366e
https://hg.mozilla.org/integration/mozilla-inbound/rev/80677b385ce0

Flags: needinfo?(docfaraday)

Carsten Book [:Tomcat]

Comment 28

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/d6ae2155496b
https://hg.mozilla.org/mozilla-central/rev/7793255f366e
https://hg.mozilla.org/mozilla-central/rev/88854d602d98
https://hg.mozilla.org/mozilla-central/rev/80677b385ce0

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-b2g-master: --- → fixed

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

BMO Automation

Updated

•

9 years ago

Group: core-security → core-security-release

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

9 years ago

Whiteboard: [post-critsmash-triage]

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42-]

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.