Closed
Bug 1059572
Opened 10 years ago
Closed 9 years ago
We should close the racy window of opportunity around PostTimerEvent
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla42
People
(Reporter: bwc, Assigned: bwc)
References
Details
(Keywords: sec-moderate, Whiteboard: [post-critsmash-triage][adv-main42-])
Attachments
(4 files, 6 obsolete files)
20.53 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
1.69 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
9.69 KB,
patch
|
bwc
:
review+
|
Details | Diff | Splinter Review |
2.95 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Currently, while dispatching a timer, we unlock the mutex so that certain corner cases that require either re-adding the timer or removing the timer don't deadlock. We should just keep the mutex locked, and use lock-free variants of these functions. This will remove many race conditions.
Assignee | ||
Comment 1•10 years ago
|
||
Initial cut.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•10 years ago
|
||
Unrot.
Assignee | ||
Updated•10 years ago
|
Attachment #8480260 -
Attachment is obsolete: true
Updated•10 years ago
|
Keywords: sec-moderate
Assignee | ||
Comment 3•9 years ago
|
||
A fuzz test-case I've been working on. It reveals that there are numerous problems related to rescheduling timers while PostTimerEvent is running.
Assignee | ||
Updated•9 years ago
|
Attachment #8631297 -
Attachment filename: . → timer_fuzz_test.patch
Assignee | ||
Comment 4•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Attachment #8637328 -
Attachment filename: . → move_posttimerevent.patch
Assignee | ||
Updated•9 years ago
|
Attachment #8480261 -
Attachment is obsolete: true
Assignee | ||
Comment 5•9 years ago
|
||
Fix warning on non-debug builds.
Assignee | ||
Updated•9 years ago
|
Attachment #8637380 -
Attachment filename: . → timer_fuzz_test.patch
Assignee | ||
Updated•9 years ago
|
Attachment #8631297 -
Attachment is obsolete: true
Assignee | ||
Comment 6•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Attachment #8638105 -
Attachment filename: . → move_posttimerevent.patch
Attachment #8638105 -
Flags: review?(nfroyd)
Assignee | ||
Updated•9 years ago
|
Attachment #8637328 -
Attachment is obsolete: true
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8637380 [details] [diff] [review] Part 0: (WIP) Fuzz test for timers Review of attachment 8637380 [details] [diff] [review]: ----------------------------------------------------------------- I've also converted NS_ASSERTION to MOZ_ASSERT in this test case, since hitting an NS_ASSERTION when running mach cppunittest doesn't cause tests to fail.
Attachment #8637380 -
Flags: feedback?(nfroyd)
Comment 8•9 years ago
|
||
Comment on attachment 8637380 [details] [diff] [review] Part 0: (WIP) Fuzz test for timers Review of attachment 8637380 [details] [diff] [review]: ----------------------------------------------------------------- Seems reasonable overall. ::: xpcom/tests/TestTimers.cpp @@ +322,5 @@ > + InitRandomTimer(CancelRandomTimer().get()); > + } > + // Reschedule some timers without a Cancel first. > + for (size_t i = 0; i < kNumRescheduled; ++i) { > + InitRandomTimer(RemoveRandomTimer().get()); That you can do this and nothing complains seems surprising to me, but I haven't looked closely at the timer documentation recently. @@ +438,5 @@ > + for (size_t i = 0; i < kNumThreads; ++i) { > + threadStates[i]->Stop(); > + } > + > + PR_Sleep(PR_MillisecondsToInterval(1000)); I guess this is to wait for all outstanding timers to stop? Seems like it would be better to wait on them somehow, to avoid the possibility of intermittents on the b2g emulators.
Attachment #8637380 -
Flags: feedback?(nfroyd) → feedback+
Comment 9•9 years ago
|
||
Comment on attachment 8638105 [details] [diff] [review] Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to protect it Review of attachment 8638105 [details] [diff] [review]: ----------------------------------------------------------------- I will say that this division of labor makes a lot more sense than the previous setup. For my own edification, and posterity's sake, can you describe a scenario or two where having the monitor covering more of PostTimerEvent is beneficial? I think I can believe it, and it certainly seems better to not have to keep acquiring the monitor to manipulate the timer list, but I'm at a loss for constructing an actual scenario.
Attachment #8638105 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 10•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #9) > Comment on attachment 8638105 [details] [diff] [review] > Part 1: Move PostTimerEvent to TimerThread to allow TimerThread's monitor to > protect it > > Review of attachment 8638105 [details] [diff] [review]: > ----------------------------------------------------------------- > > I will say that this division of labor makes a lot more sense than the > previous setup. > > For my own edification, and posterity's sake, can you describe a scenario or > two where having the monitor covering more of PostTimerEvent is beneficial? > I think I can believe it, and it certainly seems better to not have to keep > acquiring the monitor to manipulate the timer list, but I'm at a loss for > constructing an actual scenario. There are a few races. One is on nsTimerImpl::mGeneration. If the target thread reschedules a timer while it is being processed by PostTimerEvent, the TimerEvent could end up initting with the new value of mGeneration (or an undefined value) here: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp?from=nsTimerEvent#157 This can allow the rescheduled timer to fire anyway at its previously scheduled time, since this check will be true: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#689 You can also get races on timer type, or the timer target, if for some bizarre reason something re-inits in a way that changes these things. I've observed these races (except for the timer target one) with the test-case in part 0.
Assignee | ||
Comment 11•9 years ago
|
||
Fix a race caused by reinitting a repeating precise timer while that timer is being processed by PostTimerEvent.
Assignee | ||
Updated•9 years ago
|
Attachment #8639934 -
Attachment filename: . → ensure_removed.patch
Attachment #8639934 -
Flags: review?(nfroyd)
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #11) > Created attachment 8639934 [details] [diff] [review] > Part 2: Make absolutely sure a timer is removed before reinitting it > > Fix a race caused by reinitting a repeating precise timer while that timer > is being processed by PostTimerEvent. I should point out that the result of this race is that the timeout is modified while searching for an insert position in TimerThread::mTimers, which results in the timer being inserted into an undefined slot, which can then cause subsequent binary inserts to lead to undefined behavior too.
Assignee | ||
Comment 13•9 years ago
|
||
Incorporate feedback.
Assignee | ||
Updated•9 years ago
|
Attachment #8639938 -
Attachment filename: . → timer_fuzz_test.patch
Attachment #8639938 -
Flags: review?(nfroyd)
Assignee | ||
Updated•9 years ago
|
Attachment #8637380 -
Attachment is obsolete: true
Assignee | ||
Comment 14•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a300e42a5ad7
Comment 15•9 years ago
|
||
Comment on attachment 8639934 [details] [diff] [review] Part 2: Make absolutely sure a timer is removed before reinitting it Review of attachment 8639934 [details] [diff] [review]: ----------------------------------------------------------------- Hooray for dodgy code being removed.
Attachment #8639934 -
Flags: review?(nfroyd) → review+
Comment 16•9 years ago
|
||
Comment on attachment 8639938 [details] [diff] [review] Part 0: Fuzz test for timers Review of attachment 8639938 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/tests/TestTimers.cpp @@ +399,5 @@ > + uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1); > + uint32_t type = GetRandomType(); > + nsresult rv = aTimer->InitWithCallback( > + this, > + PR_MillisecondsToInterval(delay), Timers don't use NSPR's intervals; they use straight-up milliseconds. @@ +445,5 @@ > + } > + > + PRIntervalTime start = PR_IntervalNow(); > + > + while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) { This condition assumes that the loop always takes less than 10 seconds, right? Why not write it as something like: for (auto& state : threadStates) { while (state->HasTimersOutstanding()) { PR_Sleep(PR_MillisecondsToInterval(10)); } } ? That way we're not dependent on how long the loop body takes to execute, and we never re-examine threads that are already done with their timers.
Attachment #8639938 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16) > Comment on attachment 8639938 [details] [diff] [review] > Part 0: Fuzz test for timers > > Review of attachment 8639938 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: xpcom/tests/TestTimers.cpp > @@ +399,5 @@ > > + uint32_t delay = rand() % (FUZZ_MAX_TIMEOUT + 1); > > + uint32_t type = GetRandomType(); > > + nsresult rv = aTimer->InitWithCallback( > > + this, > > + PR_MillisecondsToInterval(delay), > > Timers don't use NSPR's intervals; they use straight-up milliseconds. > Oh! I'll fix the pre-existing instances of this in this test-case then. I hope it doesn't cause any tests to start failing... > @@ +445,5 @@ > > + } > > + > > + PRIntervalTime start = PR_IntervalNow(); > > + > > + while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) { > > This condition assumes that the loop always takes less than 10 seconds, > right? Why not write it as something like: > > for (auto& state : threadStates) { > while (state->HasTimersOutstanding()) { > PR_Sleep(PR_MillisecondsToInterval(10)); > } > } > > ? That way we're not dependent on how long the loop body takes to execute, > and we never re-examine threads that are already done with their timers. I did not want to have this test hang, requiring it to be killed, if something went wrong.
Comment 18•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #17) > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16) > > Timers don't use NSPR's intervals; they use straight-up milliseconds. > > > > Oh! I'll fix the pre-existing instances of this in this test-case then. I > hope it doesn't cause any tests to start failing... Doh. I'm OK with you just fixing this instance and filing a bug on the other instances. > > @@ +445,5 @@ > > > + } > > > + > > > + PRIntervalTime start = PR_IntervalNow(); > > > + > > > + while (PR_IntervalToMilliseconds(PR_IntervalNow() - start) < 10000) { > > > > This condition assumes that the loop always takes less than 10 seconds, > > right? Why not write it as something like: > > > > for (auto& state : threadStates) { > > while (state->HasTimersOutstanding()) { > > PR_Sleep(PR_MillisecondsToInterval(10)); > > } > > } > > > > ? That way we're not dependent on how long the loop body takes to execute, > > and we never re-examine threads that are already done with their timers. > > I did not want to have this test hang, requiring it to be killed, if > something went wrong. That's a fair point. The original loop is fine, then, but please note why we're using this fixed interval in a comment. Another option is to make the storage a std::deque or similar, so we can pop states off once we've determined they have no outstanding timers.
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18) > (In reply to Byron Campen [:bwc] from comment #17) > > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16) > > > Timers don't use NSPR's intervals; they use straight-up milliseconds. > > > > > > > Oh! I'll fix the pre-existing instances of this in this test-case then. I > > hope it doesn't cause any tests to start failing... > > Doh. I'm OK with you just fixing this instance and filing a bug on the > other instances. Fixing these does not seem to have any ill effect locally. We'll see what try says.
Comment 20•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #19) > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #18) > > (In reply to Byron Campen [:bwc] from comment #17) > > > (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #16) > > > > Timers don't use NSPR's intervals; they use straight-up milliseconds. > > > > > > > > > > Oh! I'll fix the pre-existing instances of this in this test-case then. I > > > hope it doesn't cause any tests to start failing... > > > > Doh. I'm OK with you just fixing this instance and filing a bug on the > > other instances. > > Fixing these does not seem to have any ill effect locally. We'll see what > try says. Hooray. For avoidance of doubt, please put those changes in a separate patch, so we can back them out separately if more extensive testing once checked in causes problems.
Assignee | ||
Comment 21•9 years ago
|
||
Incorportate feedback
Assignee | ||
Comment 22•9 years ago
|
||
Comment on attachment 8640562 [details] [diff] [review] Part 0: Fuzz test for timers Carry forward r=nfroyd
Attachment #8640562 -
Attachment filename: . → timer_fuzz_test.patch
Attachment #8640562 -
Flags: review+
Assignee | ||
Comment 23•9 years ago
|
||
Factor out fixes for pre-existing problems in TestTimers.
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8640566 [details] [diff] [review] Part 0.5: Fixes for pre-existing problems in TestTimers I split out the s/NS_ASSERTION/MOZ_ASSERT/ too.
Attachment #8640566 -
Attachment filename: . → fix_testtimers_problems.patch
Attachment #8640566 -
Flags: review?(nfroyd)
Assignee | ||
Updated•9 years ago
|
Attachment #8639938 -
Attachment is obsolete: true
Updated•9 years ago
|
Attachment #8640566 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 25•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4eab8f744a1
Assignee | ||
Comment 27•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d6ae2155496b https://hg.mozilla.org/integration/mozilla-inbound/rev/88854d602d98 https://hg.mozilla.org/integration/mozilla-inbound/rev/7793255f366e https://hg.mozilla.org/integration/mozilla-inbound/rev/80677b385ce0
Flags: needinfo?(docfaraday)
Comment 28•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d6ae2155496b https://hg.mozilla.org/mozilla-central/rev/7793255f366e https://hg.mozilla.org/mozilla-central/rev/88854d602d98 https://hg.mozilla.org/mozilla-central/rev/80677b385ce0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-b2g-master:
--- → fixed
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Updated•9 years ago
|
Group: core-security → core-security-release
Updated•9 years ago
|
Whiteboard: [post-critsmash-triage]
Updated•9 years ago
|
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main42-]
Updated•8 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•