Closed Bug 1033121 Opened 6 years ago Closed 6 years ago
Race in ns
Timer Event destructor
I found this problem when debugging timeouts in test_playback_rate.html.  It looks like it is possible for |event|  to be executed in the target thread before nsTimerImpl::PostTimerEvent returns. Then nsTimerEvent will be deleted on exit of nsTimerImpl::PostTimerEvent in the timer thread. In the destructor of nsTimerEvent, it will call nsTimerImpl::Release in the timer thread. At the same time, it is also possible for nsTimerImpl::InitWithCallback to be called in the target thread. Therefore we might hit bug 997844 comment 0 again.  bug 1020707 comment 42  http://hg.mozilla.org/mozilla-central/file/3a28d3bb4f55/xpcom/threads/nsTimerImpl.cpp#l763
Hi Boris, Can you check if this is a problem of nsTimer?
Dupe of bug 1024765 maybe?
Doesn't seem so. Does Dispatch not take the reference of the passed Runnable when it succeeds? If not, then yeah we could end up back in the scenario described in bug 997844. Ideally, the behavior I would want from Dispatch is an optional reference transfer (ie; if it works, the reference is transferred, otherwise it is not), but this may not be feasible. If it isn't, I think we'd have to refactor so that we did not need to unlock the monitor here: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#253 This would prevent the target thread from getting to AddTimerInternal while all this is going on, which would also fix the race in bug 1024765 comment 33 I think.
The signature is Dispatch(nsIRunnable *event, uint32_t flags). The event target can only add reference but not take reference on success.
> It looks like it is possible for |event|  to be executed in the target thread before > nsTimerImpl::PostTimerEvent returns. Yes, this is always possible for event-dispatching situations. In particular, |event| can execute before Dispatch() has returned... You're right that this is a problem. I think the right thing to do here is to null out mTimer in nsTimerEvent::Run() before any of its return statements. Agreed?
Flags: needinfo?(bzbarsky) → needinfo?(jwwang)
That seems like it would work.
Is there anyone gonna take and work on this issue?
Please feel free to write the patch and request review?
OS: Android → All
Hardware: ARM → All
Release |mTimer| in nsTimerEvent::Run() instead of ~nsTimerEvent() to avoid race.
Assignee: nobody → jwwang
Status: NEW → ASSIGNED
Attachment #8451627 - Flags: review?(bzbarsky)
Comment on attachment 8451627 [details] [diff] [review] 1033121_fix_race_in_nsTimerEvent_destructor-v1.patch r=me. I can't think of as sane way to add a testcase for this. :(
Attachment #8451627 - Flags: review?(bzbarsky) → review+
How about adding assertions to check most nsITimer functions should be called in the target thread.
try: https://tbpl.mozilla.org/?tree=Try&rev=10b52bb4f343 most green.
> How about adding assertions That might work, yes. Want to give it a shot on try without this patch?
Does this need to go through sec-approval?
Is it exploitable and rated sec-high or sec-crit? Does it affect more than trunk? If yes to both, then it does. It seems kind of sec-high-ish to me, because it sounds like it could lead to a use after free or something.
(I was kind of hoping you would rate it ...)
So, I don't think this could lead to a UAF. The fundamental problem here is that an nsTimerImpl could fail to be scheduled if the user gets really unlucky. This could cascade and cause other problems, but likely not a UAF (more likely stuff like leaks and hangs, and asserts in debug builds probably). Coupled with the fact that this is an extremely tight race, it is probably somewhat difficult to trigger in an attack. @jwwang Were you seeing dangerous-looking crashes (ie; anything other than an assert) with this?
It seems theoretically possible to cause use-after-free or memory corruption if  is run in the timer thread after  is run in the target thread when 2 threads are calling nsTimerImpl::Release() at the same time.  http://hg.mozilla.org/mozilla-central/file/7f9db2379b3f/xpcom/threads/nsTimerImpl.cpp#l264  http://hg.mozilla.org/mozilla-central/file/7f9db2379b3f/xpcom/threads/nsTimerImpl.cpp#l230 Btw, what is UAF?
Flags: needinfo?(jwwang) → needinfo?(bzbarsky)
UAF = Use After Free
(In reply to JW Wang [:jwwang] from comment #20) > It seems theoretically possible to cause use-after-free or memory corruption > if  is run in the timer thread after  is run in the target thread when > 2 threads are calling nsTimerImpl::Release() at the same time. Ah, that's lovely. We've managed to take a threadsafe refcount, and make it non-threadsafe by overriding Release to do a whole bunch of extra stuff.
Btw, there is also a race in nsTimerImpl::SetDelayInternal() which is called by nsTimerImpl::PostTimerEvent which is called in the timer thread. We should really think about making nsTimerImpl thread-safe. Lockless code is just hard to get right.
(In reply to JW Wang [:jwwang] from comment #23) > Btw, there is also a race in nsTimerImpl::SetDelayInternal() which is called > by nsTimerImpl::PostTimerEvent which is called in the timer thread. We > should really think about making nsTimerImpl thread-safe. Lockless code is > just hard to get right. Yes, I believe we do have another race here, because we have no assurance that |Fire| will not be called on the target thread at the same time as |PostTimerEvent| is called on the timer thread, either because we're using repeating precise timers, or because something racy has happened with |mGeneration| (yes, there is a race here too, because it is written on the target thread and read on the timer thread with no sync barrier). IMO, What we really need to do is greatly expand unit-test for this, and run it through its paces with TSAN.
Shall we take this patch while re-design is in progress?
So the way forward is to request sec-approval, write comments about that, and then put "check-in-needed" again?
sec-moderates don't need approval. https://wiki.mozilla.org/Security/Bug_Approval_Process
https://hg.mozilla.org/integration/mozilla-inbound/rev/2f00e114a489 Probably worth nominating this for at least Aurora uplift. Not sure about beta/b2g30 given where we are in the release cycle. I can always disable my way to victory on any media tests being bitten by this if need-be.
Comment on attachment 8451627 [details] [diff] [review] 1033121_fix_race_in_nsTimerEvent_destructor-v1.patch Approval Request Comment [Feature/regressing bug #]:unknown [User impact if declined]:timer callback will cease to fire which will impact the overall system functionality. [Describe test coverage new/current, TBPL]: aurora - https://tbpl.mozilla.org/?tree=Try&rev=cb7a6367e147 beta - https://tbpl.mozilla.org/?tree=Try&rev=2e704d9d39d6 It already runs fine on inbound/central for a week. [Risks and why]: low for the change is simple and root cause is well known [String/UUID change made/needed]:none
Comment on attachment 8451627 [details] [diff] [review] 1033121_fix_race_in_nsTimerEvent_destructor-v1.patch Too late for 31. Not sure it is critical enough for ESR 31.
I don't think so. For now it seems only Android will hit this bug.
https://hg.mozilla.org/releases/mozilla-aurora/rev/86f331357783 Not sure if this is something that could affect real B2G devices in the wild or not.
Comment on attachment 8451627 [details] [diff] [review] 1033121_fix_race_in_nsTimerEvent_destructor-v1.patch [Approval Request Comment] Bug caused by (feature/regressing bug #): unknown User impact if declined: timer callback will cease to fire which will impact the overall system functionality. Testing completed: It has run fine on inbound/central for about 2 weeks. try: https://tbpl.mozilla.org/?tree=Try&rev=95651f209cc9 Risk to taking this patch (and alternatives if risky): low for the change is simple and root cause is well known String or UUID changes made by this patch:none
Add more Mozilla person
Let's land this in v1.4. Hi JW, Do we need to rebase the patch for v1.4. Thanks?
blocking-b2g: 1.4? → 1.4+
No, I could apply it cleanly on b2g30.
Comment on attachment 8451627 [details] [diff] [review] 1033121_fix_race_in_nsTimerEvent_destructor-v1.patch Taking it because ESR31 is also used for Android 2.2 and ARMv6 31 release...
Attachment #8451627 - Flags: approval-mozilla-esr31? → approval-mozilla-esr31+
Marking qe-verify- for verification purposes due to lack of test case or easier STR. Please feel free to provide if you'd like me to verify the fix. Thank you.
You need to log in before you can comment on or make changes to this bug.