Closed
Bug 1024765
Opened 9 years ago
Closed 9 years ago
Refcounting of nsTimerImpl looks wrong in in TimerThread when PostTimerEvent fails
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
People
(Reporter: bwc, Assigned: bwc)
References
Details
(Keywords: csectype-race, csectype-uaf, sec-high, Whiteboard: [adv-main31+][adv-esr24.7+][qa-])
Attachments
(4 files, 6 obsolete files)
8.25 KB,
patch
|
bwc
:
review+
Sylvestre
:
approval-mozilla-aurora+
abillings
:
approval-mozilla-beta+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
2.82 KB,
patch
|
bzbarsky
:
review+
Sylvestre
:
approval-mozilla-aurora+
abillings
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
2.83 KB,
patch
|
Sylvestre
:
approval-mozilla-esr24+
|
Details | Diff | Splinter Review |
8.38 KB,
patch
|
Sylvestre
:
approval-mozilla-esr24+
|
Details | Diff | Splinter Review |
Noticed this while working on bug 941751. In TimerThread, we call nsTimerImpl::PostTimerEvent (just after doing an addref/release while removing the timer from mTimers), which does the following: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#722 The constructor of nsTimerEvent does not addref (because we are essentially transferring a reference from mTimers to the nsTimerEvent): http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#116 It does set itself up to perform a release once it is destroyed, which makes sense: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#157 If we fail for any reason (other than OOM), the nsTimerEvent is destroyed prior to the return, and the refcount is decremented. Then, we decrement the refcount _again_ here, leading to a very misleading logging statement for yours truly, and probably a dangling nsITimer pointer somewhere: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#266
Assignee | ||
Comment 1•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Assignee | ||
Updated•9 years ago
|
Assignee: docfaraday → nobody
Assignee | ||
Comment 2•9 years ago
|
||
Debug builds are probably safe here most of the time (since we'll usually hit the assertion), but if that assertion doesn't fire I'm not sure what will happen.
Assignee | ||
Updated•9 years ago
|
Status: ASSIGNED → NEW
![]() |
||
Comment 3•9 years ago
|
||
Byron, did you plan to fix this?
Assignee | ||
Comment 4•9 years ago
|
||
I can take a crack at it I suppose.
Assignee | ||
Comment 5•9 years ago
|
||
Ok, I see a few ways that might make sense to do this: 1) Have PostTimerEvent do a brute-force forget().take() on the timer to relinquish the reference when an error is about to be returned. 2) Refactor to have PostTimerEvent() take an already_AddRefed<nsTimerImpl>& inout param that the function may or may not take (if not, the caller will still hold the reference). 3) Make nsresult an out param and return an already_AddRefed containing the timer in the event of failure. I kinda like option 2.
Assignee | ||
Comment 6•9 years ago
|
||
Option 1. Seems like the least amount of code and the smallest change, but it could be argued that some refactoring should be done here.
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8440133 [details] [diff] [review] Part 2: Do not take a reference in PostTimerEvent in the case of failure. Review of attachment 8440133 [details] [diff] [review]: ----------------------------------------------------------------- This seems like the most minimal fix, but would you prefer that I rewrite the code in TimerThread to stop using direct refcount manipulation?
Attachment #8440133 -
Flags: feedback?(bzbarsky)
Updated•9 years ago
|
status-firefox33:
--- → affected
![]() |
||
Comment 8•9 years ago
|
||
Comment on attachment 8440133 [details] [diff] [review] Part 2: Do not take a reference in PostTimerEvent in the case of failure. This looks fine, but I'd just make ForgetTimer() return void and move the take() into it. And maybe in the comments indicate that it's the destructor of nsTimerEvent that we don't want to decrement the refcount?
Attachment #8440133 -
Flags: feedback?(bzbarsky) → feedback+
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #8) > Comment on attachment 8440133 [details] [diff] [review] > Part 2: Do not take a reference in PostTimerEvent in the case of failure. > > This looks fine, but I'd just make ForgetTimer() return void and move the > take() into it. And maybe in the comments indicate that it's the destructor > of nsTimerEvent that we don't want to decrement the refcount? The reason I had it return already_AddRefed is to make it easier to implement the more explicit options 2 or 3, should we decide to go that route (now, or someday later). Also, the scary (void)event->ForgetTimer().take() makes it more obvious that some funky reference counting stuff is going on, and that the code in |PostTimerEvent| needs to be treated carefully. Just calling ForgetTimer() might not jump out in the same way. I should point out that I am quite willing to implement option 2 or 3, but I'm not all that familiar with the conventions that are used when implementing an optional reference transfer, so I don't know which fits the rest of the code better (or if there's another variant I haven't thought of). I worry that option 1 leaves |PostTimerEvent| fragile, and also paints a pretty obvious bullseye on the problem, whereas options 2 and 3 leave us in a better place going forward, and just look like code cleanup.
![]() |
||
Comment 10•9 years ago
|
||
Ah, ok. In that case, I think the best solution here is actually a combination of 2 and 3: an already_AddRefed in param and an already_AddRefed return value, with the return value null on success, non-null on failure. The caller ignores the actual return value anyway, so there's no point propagating out a particular nsresult.
Assignee | ||
Comment 11•9 years ago
|
||
Alright, I'll give that a go. Thanks!
Assignee | ||
Comment 12•9 years ago
|
||
An alternative approach.
Assignee | ||
Comment 13•9 years ago
|
||
Restore some ordering.
Assignee | ||
Updated•9 years ago
|
Attachment #8440784 -
Attachment is obsolete: true
Assignee | ||
Comment 14•9 years ago
|
||
Remove an outdated comment.
Assignee | ||
Updated•9 years ago
|
Attachment #8440819 -
Attachment is obsolete: true
Assignee | ||
Comment 15•9 years ago
|
||
Comment on attachment 8440835 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. Review of attachment 8440835 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/threads/TimerThread.cpp @@ +247,2 @@ > RemoveTimerInternal(timer); > + timer = nullptr; Setting this here to avoid use in the code below; should be using timerRef instead. @@ +267,5 @@ > + if (timerRef) { > + // We got our reference back due to an error. > + // Unhook the nsRefPtr, and release manually so we can get the > + // refcount. > + nsrefcnt rc = timerRef.forget().take()->Release(); Is there a simpler way to do this? ::: xpcom/threads/nsTimerImpl.cpp @@ +732,2 @@ > > + nsRefPtr<nsTimerEvent> event = new nsTimerEvent; If we wanted to pass the timer as an already_AddRefed here, we'd need to keep an extra reference around so we can do the stuff below. I can do that if it is preferable, just let me know.
Attachment #8440835 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 16•9 years ago
|
||
CC'ing maire and jesup because this bug is related to bug 941751 (one of our oranges).
Assignee | ||
Comment 17•9 years ago
|
||
Also CC'ing Nils and Shell by request.
![]() |
||
Comment 18•9 years ago
|
||
Comment on attachment 8440835 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. > Is there a simpler way to do this? No. :( >+nsTimerImpl::PostTimerEvent(already_AddRefed<nsTimerImpl> timerRef) aTimerRef? >+ nsRefPtr<nsTimerEvent> event = new nsTimerEvent; This can't return null. Does that make it simpler to just go ahead and pass aTimerRef to the constructor here? We could then use event->Timer() (which would return nsTimerImpl*) for whatever we need the timer for, and event->ForgetTimer() for error returns. Then we don't need a SetTimer(). r=me
Attachment #8440835 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #18) > Comment on attachment 8440835 [details] [diff] [review] > Part 2: Make refcounting logic around PostTimerEvent more explicit. > > >+ nsRefPtr<nsTimerEvent> event = new nsTimerEvent; > > This can't return null. Does that make it simpler to just go ahead and pass > aTimerRef to the constructor here? We could then use event->Timer() (which > would return nsTimerImpl*) for whatever we need the timer for, and > event->ForgetTimer() for error returns. Then we don't need a SetTimer(). > Actually, we override operator new to alloc from a memory pool for this, so it might not be infallible: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp?from=nsTimerEvent#138 That in itself probably merits a comment.
![]() |
||
Comment 21•9 years ago
|
||
Oh, ouch. Yes, that can definitely use a comment!
Assignee | ||
Comment 22•9 years ago
|
||
Update commit message.
Assignee | ||
Updated•9 years ago
|
Attachment #8439584 -
Attachment is obsolete: true
Assignee | ||
Comment 23•9 years ago
|
||
Incorporate feedback, and suppress an unused variable warning.
Assignee | ||
Updated•9 years ago
|
Attachment #8440835 -
Attachment is obsolete: true
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8440939 [details] [diff] [review] Part 1: Add test-case for timers firing when the target thread is not running. Review of attachment 8440939 [details] [diff] [review]: ----------------------------------------------------------------- Might be a good idea to land this too?
Attachment #8440939 -
Flags: feedback?(bzbarsky)
Assignee | ||
Updated•9 years ago
|
Attachment #8440133 -
Attachment is obsolete: true
![]() |
||
Comment 25•9 years ago
|
||
Comment on attachment 8440939 [details] [diff] [review] Part 1: Add test-case for timers firing when the target thread is not running. Yes, please!
Attachment #8440939 -
Flags: feedback?(bzbarsky) → feedback+
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 8440941 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. Review of attachment 8440941 [details] [diff] [review]: ----------------------------------------------------------------- Carry forward r=bz
Attachment #8440941 -
Flags: review+
Assignee | ||
Comment 27•9 years ago
|
||
Seems to be working, will give the test-case another scrub tomorrow morning since it was just a hacked-up copy/paste job. https://tbpl.mozilla.org/?tree=Try&rev=677d6e237448
Comment 28•9 years ago
|
||
It should be an assertable problem for a timer to stay alive when the target thread is not running. I think we should assert that there, but also consider whether we can assert it at nsThread::Shutdown also.
Assignee | ||
Comment 29•9 years ago
|
||
I don't know about that. Cancel() does not prevent TimerThread from trying to dispatch; it only prevents the timer from popping once it hits the target thread. So, if Cancel() races the TimerThread, it is very easy for a dispatch to the target thread to fail.
Comment 30•9 years ago
|
||
Why can't it prevent the dispatch? In particular I expect the following sequence should not attempt to dispatch to a dead thread: mTimer->Cancel(); mThread->Shutdown(); This pattern should really assert if we can make it: mThread->Shutdown(); // ASSERT: Active timer pointing at dying thread mTimer->Cancel();
Comment 31•9 years ago
|
||
Sounds bad, so I'm going to mark this sec-high. Please get sec-approval before landing.
Keywords: csectype-uaf,
sec-high
Assignee | ||
Comment 32•9 years ago
|
||
Looking at the code again, it does look like Cancel() should be removing the timer from TimerThread's set of timers, so I'll need to look into why we don't learn that our thread is shutting down until after dispatches are being denied.
Assignee | ||
Comment 33•9 years ago
|
||
Ok, there is still a race here. Let's say the TimerThread is cruising along, and decides to fire a timer. We end up going through the motions, and end up here: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#251 We have removed the timer from mTimers at this point, and we unlock. Then, on the target thread, we call Cancel(). This flips the mCancelled bit, and tries to remove from mTimers again (with no effect). Since the monitor is unlocked (see above), we return straightaway: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#434 Then, the target thread shuts down, and TimerThread continues execution where it left off, and attempts to post the timer.
Updated•9 years ago
|
Keywords: csectype-race
Assignee | ||
Comment 34•9 years ago
|
||
I should clarify; the race above is pretty much benign once the above patches are in place, since we recover from the failed dispatch. But if we add an assert, we're going to hit it occasionally even when the user of the timer is doing the right thing.
Assignee | ||
Comment 35•9 years ago
|
||
Pare down the test case a bit.
Assignee | ||
Updated•9 years ago
|
Attachment #8441653 -
Flags: review?(bzbarsky)
Assignee | ||
Updated•9 years ago
|
Attachment #8440939 -
Attachment is obsolete: true
![]() |
||
Comment 36•9 years ago
|
||
Comment on attachment 8441653 [details] [diff] [review] Part 1: Add test-case for timers firing when the target thread is not running. r=me. Does this test actually fail without the part 2 patch?
Attachment #8441653 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 37•9 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #36) > Comment on attachment 8441653 [details] [diff] [review] > Part 1: Add test-case for timers firing when the target thread is not > running. > > r=me. Does this test actually fail without the part 2 patch? Yes, we hit the rc != 0 assertion. I could do a non-debug ASAN build and verify that we double free, but that would take quite a bit of time. Needinfo self to check back on try push. https://tbpl.mozilla.org/?tree=Try&rev=638b6db666c1
Flags: needinfo?(docfaraday)
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(docfaraday) → sec-review?
Assignee | ||
Comment 38•9 years ago
|
||
Oh now this is interesting; the Marionette tests are very sad on windows with these patches.
Assignee | ||
Comment 39•9 years ago
|
||
Trying again with a more recent pull: https://tbpl.mozilla.org/?tree=Try&rev=5d297691df6a
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8440941 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. [Security approval request comment] How easily could an exploit be constructed based on the patch? For code that calls Cancel() on its timers before thread shutdown, it is an extremely tight race to hit this bug. If there was something content could do to cause very short lifetime timers to be scheduled, and then cause thread shutdown, it might be able to trigger the race in comment 33 with some frequency, but it is hard to say (this is my current hypothesis as to the cause of bug 941751). For code that does not call Cancel() when it is supposed to, it becomes much easier, but would require content to cause the target thread in question to shut down in some way (possibly by inducing the user to close the browser, for example). However, it seems doubtful that we have code that does this, or else we'd see this assertion being hit very frequently. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? The test-case might provide a hint, but the bugfix looks like code cleanup. Which older supported branches are affected by this flaw? Probably all; the patch on bug 744836 introduced this. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? I do not have backports, but given how little the code has changed, I do not think they will be particularly hard to construct. How likely is this patch to cause regressions; how much testing does it need? Well, this patch is a change in refcounting logic, on a widely-used feature. I think I've gotten it right, but there is some risk. At least we're making use of the safer nsRefPtr and already_AddRefed (the only departure from this is where we check the refcount when PostTimerEvent fails, but we know for sure the existing code would do something nasty here).
Attachment #8440941 -
Flags: sec-approval?
Assignee | ||
Updated•9 years ago
|
Flags: sec-review?
Comment 41•9 years ago
|
||
Comment on attachment 8440941 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. sec-approval+ for trunk. We should get Aurora, Beta, and ESR24 patches for this.
Attachment #8440941 -
Flags: sec-approval? → sec-approval+
Updated•9 years ago
|
status-firefox30:
--- → wontfix
status-firefox31:
--- → affected
status-firefox32:
--- → affected
status-firefox-esr24:
--- → affected
tracking-firefox31:
--- → +
tracking-firefox32:
--- → +
tracking-firefox33:
--- → +
tracking-firefox-esr24:
--- → 31+
Assignee | ||
Comment 42•9 years ago
|
||
I'll go ahead and cook those up.
Assignee | ||
Comment 43•9 years ago
|
||
Patches seem to apply cleanly to aurora. Verifying that they build and then running some tests.
Assignee | ||
Comment 44•9 years ago
|
||
Patches apply cleanly to beta. Checking build-tests there now.
Assignee | ||
Comment 45•9 years ago
|
||
Comment on attachment 8441653 [details] [diff] [review] Part 1: Add test-case for timers firing when the target thread is not running. aurora and beta seem to do what I'd expect with the patches as they are. Not sure if we want to land the test case too, but here's the request. [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 744836 User impact if declined: No unit test in CI for the bugfix. Testing completed (on m-c, etc.): This test is being run in CI. Risk to taking this patch (and alternatives if risky): None apart from test failures. String or IDL/UUID changes made by this patch: None.
Attachment #8441653 -
Flags: approval-mozilla-beta?
Attachment #8441653 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 46•9 years ago
|
||
Comment on attachment 8440941 [details] [diff] [review] Part 2: Make refcounting logic around PostTimerEvent more explicit. aurora and beta seem to work fine with the patches as-is [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 744836 User impact if declined: Rare UAF when shutting down a thread that uses timers. Testing completed (on m-c, etc.): CI testing on try, plus a new test-case in TestTimers. Risk to taking this patch (and alternatives if risky): There is some risk here, since we're messing with refcounts on a widely-used feature. I suppose an alternative would be to abort() if the dispatch fails, so we have a controlled crash. String or IDL/UUID changes made by this patch: None.
Attachment #8440941 -
Flags: approval-mozilla-beta?
Attachment #8440941 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 47•9 years ago
|
||
test case for esr24
Assignee | ||
Comment 48•9 years ago
|
||
Comment on attachment 8443829 [details] [diff] [review] Part 1: (patch for esr24) Add test-case for timers firing when the target thread is not running. [Approval Request Comment] See separate comment for aurora/beta approval.
Attachment #8443829 -
Flags: approval-mozilla-esr24?
Assignee | ||
Comment 49•9 years ago
|
||
bugfix for esr24
Assignee | ||
Comment 50•9 years ago
|
||
Comment on attachment 8443830 [details] [diff] [review] Part 2: (patch for esr24) Make refcounting logic around PostTimerEvent more explicit. [Approval Request Comment] See comment for aurora/beta approval.
Attachment #8443830 -
Flags: approval-mozilla-esr24?
Updated•9 years ago
|
Attachment #8443830 -
Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
Updated•9 years ago
|
Attachment #8443829 -
Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
Comment 51•9 years ago
|
||
Comment on attachment 8441653 [details] [diff] [review] Part 1: Add test-case for timers firing when the target thread is not running. Taking the 2 patches for aurora. If no issues are found, I will accept them for beta 5 (end of the week)
Attachment #8441653 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•9 years ago
|
Attachment #8440941 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 52•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/2210c3fbcf63 https://hg.mozilla.org/integration/mozilla-inbound/rev/0c09da2ef546
Flags: in-testsuite+
Keywords: checkin-needed
Updated•9 years ago
|
Attachment #8440941 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•9 years ago
|
Attachment #8441653 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/mozilla-central/rev/2210c3fbcf63 https://hg.mozilla.org/mozilla-central/rev/0c09da2ef546
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Comment 54•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/b1873b86994c https://hg.mozilla.org/releases/mozilla-aurora/rev/358992828820 https://hg.mozilla.org/releases/mozilla-beta/rev/7ba3110d484c https://hg.mozilla.org/releases/mozilla-beta/rev/e3522dec2b84 https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/14bfee369c2c https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/9117bdce9e48 https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/541fb0359578 https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/9e79ec1dd96f https://hg.mozilla.org/releases/mozilla-esr24/rev/a55d20b0230a https://hg.mozilla.org/releases/mozilla-esr24/rev/1ba65e0ee8b1
status-b2g-v1.3:
--- → fixed
status-b2g-v1.3T:
--- → affected
status-b2g-v1.4:
--- → fixed
status-b2g-v2.0:
--- → fixed
status-b2g-v2.1:
--- → fixed
Updated•9 years ago
|
Updated•9 years ago
|
Whiteboard: [adv-main31+][adv-esr24.7+]
Comment 56•9 years ago
|
||
Marking [qa-] for verification purposes, based on comments and lack of testable media / STR. Please feel free to provide if you'd like help in testing, thank you.
Whiteboard: [adv-main31+][adv-esr24.7+] → [adv-main31+][adv-esr24.7+][qa-]
Updated•9 years ago
|
Alias: CVE-2014-1553
Updated•9 years ago
|
Alias: CVE-2014-1553
Updated•8 years ago
|
Group: core-security → core-security-release
Updated•7 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•