Closed Bug 1036515 Opened 7 years ago Closed 6 years ago
Refcounting on ns
Timer Impl is not actually threadsafe
While the refcounting on nsTimerImpl is declared threadsafe, we override Release in a way that completely negates any threadsafety protection: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#218 If two threads call Release at the same time, it is possible that one thread will destroy the nsTimerImpl before the other thread returns from Release, leading to a UAF. (Credit to jwwang for spotting this)
Nathan, who do you think might be able to look at this?
I'm not even sure the scenario described in comment 0 is plausible. My impression was that timers have threadsafe refcounting because the arming/creating thread and the timer thread have to refcount a given timer, not so any two arbitrary threads can refcount the timer. And there's a scary amount of interplay between the timer thread and the arming/creating thread to ensure that we don't UAF. I could be reading or remembering things wrongly, though. I'll look at this a little more closely tomorrow.
There is a window of opportunity for both the timer and target thread to manipulate this refcount at the same time, when the timer is firing (we unlock the monitor while getting ready to dispatch the timer).
(In reply to Byron Campen [:bwc] from comment #4) > There is a window of opportunity for both the timer and target thread to > manipulate this refcount at the same time, when the timer is firing (we > unlock the monitor while getting ready to dispatch the timer). Hm, it looks to me like the timer thread is holding a strong ref over the entire firing process: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#260 I guess the concern is that if we fail to post the timer event, we wind up here: http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#282 and then we can race with the target thread? I'm not entirely sure that's right; please outline the sequence of steps necessary for the race if I'm off the mark.
Yes, that is the race I'm referring to. Note that the release on the target thread is not necessarily due to the target thread letting go of its nsCOMPtr; it could also be caused by an earlier TimerEvent popping (which also holds a ref). So, it doesn't really help if the target thread ensures that it has released its timers before its thread stops accepting messages.
I think bug 1036550 is the ultimate way to fix this. The short-term fix that comes to mind involves deleting the grotty code in nsTimerImpl::Release and TimerThread::Run (and maybe threading the locked monitor through the call to PostTimerEvent), but I'm unable to convince myself that actually solves anything. I think it just moves the problem someplace else...
I think that rewriting so the monitor is not unlocked is the fix here. I've even taken a crack at this, but after working around the obvious deadlocks, I ran into a deadlock here: http://dxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#1299 This stuff is just plain gross.
Is there somebody who can work on this in the near term, either the short term or long term fix? If not I can try to drum up some resources from jst or something.
Can Byron post his patch here and bent weight in on rewriting that WorkerPrivate.cpp code?
Let me try to track that patch down.
Byron, any luck with your patch?
I haven't messed with it since comment 8
For avoidance of doubt, is the patch in bug 1059572 the patch before or after fixing the "obvious deadlocks" mentioned in comment 8? (Testing it seems to reveal several hangs, mostly in what looks like IndexedDB-related things.) If it's not, could you provide an updated patch?
Yes, the patch on bug 1059572 has all of the deadlock fixes I have written.
I haven't redone all the wandering through the timer code I did for comment 3 and comment 5. But based on the potential race identified in comment 5 and Byron's positive response in comment 6, I believe this patch closes that particular hole. We only need to have the monitor unlocked during the timer posting process; we don't need to have it unlocked for twiddling with the refcount in the case of an error. I'm not convinced this solves all the threadsafety problems...the nsTimerImpl::Release implementation is still pretty scary...
Attachment #8561398 - Flags: review?(benjamin)
Comment on attachment 8561398 [details] [diff] [review] narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents [Security approval request comment] How easily could an exploit be constructed based on the patch? Not easily. The UAF has survived many rounds of ASAN/Valgrind testing and never manifested there; I suspect constructing an actual exploit would be difficult. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? All of them. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? I have not made branch-specific patches, but the creation of them should not be difficult (this is stable code), and their associated risk is no greater than this patch. How likely is this patch to cause regressions; how much testing does it need? The patch is handling a specific failure mode that is rarely seen. Our normal testing regimen should be sufficient.
Attachment #8561398 - Flags: sec-approval?
Assignee: nobody → nfroyd
Let's get this into trunk but also get patches for Aurora, Beta, and ESR31 nominated and in by the end of the week if possible.
Attachment #8561398 - Flags: sec-approval? → sec-approval+
It turns out the last Beta build goes to build tomorrow so we need this on the Beta branch by then if it is going to make Firefox 36 (and probably ESR31) then too.
(In reply to Al Billings [:abillings] from comment #20) > It turns out the last Beta build goes to build tomorrow so we need this on > the Beta branch by then if it is going to make Firefox 36 (and probably > ESR31) then too. No special futzing is needed to apply this patch to beta. But given the tight schedule, how does landing this work? Landing it on inbound, and then on aurora and beta straightaway, without waiting for a merge?
Flags: needinfo?(nfroyd) → needinfo?(abillings)
If you land on inbound now, you shouldn't have any issue getting it to aurora/beta on time by following the usual practices. Just please get it nominated ASAP.
Comment on attachment 8561398 [details] [diff] [review] narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents Approval Request Comment [Feature/regressing bug #]: Something lost in the mists of time [User impact if declined]: Possible exploitable security hole [Describe test coverage new/current, TreeHerder]: [Risks and why]: Low. It does change locking behavior, but not in unusual ways. [String/UUID change made/needed]: None.
Comment on attachment 8561398 [details] [diff] [review] narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents [Approval Request Comment] User impact if declined: Possibly exploitable security hole. Fix Landed on Version: Gecko 38. Risk to taking this patch (and alternatives if risky): Low. It does change locking behavior, but not in unusual ways. String or UUID changes made by this patch: None. See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8561398 - Flags: approval-mozilla-esr31+
Attachment #8561398 - Flags: approval-mozilla-beta?
Attachment #8561398 - Flags: approval-mozilla-beta+
Attachment #8561398 - Flags: approval-mozilla-aurora?
Attachment #8561398 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/bcab77a47096 https://hg.mozilla.org/releases/mozilla-beta/rev/78815ed2e606 https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/bcab77a47096 https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/e395bfad7bc9 https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/4464db272c61 https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/5b4b101f29cc https://hg.mozilla.org/releases/mozilla-esr31/rev/17de3730a0e8
Nathan, Al, this introduced bug 1133381 and it is blocking the 36 release. I would like to backout this patch (or get a fix for bug 1133381). Can you help?
(In reply to Sylvestre Ledru [:sylvestre] from comment #28) > Nathan, Al, this introduced bug 1133381 and it is blocking the 36 release. > I would like to backout this patch (or get a fix for bug 1133381). Can you > help? I personally don't understand how this could have caused bug 1133381, but I also have to admit that given the regression range there, this bug is the likeliest candidate. However, I don't know what the protocol for dealing with sec-sensitive bugs that cause crashes is. Do we just back the change out on beta, or all affected branches? (I'd really like some input from the security team, unless the release process has dealt with things like this before...)
Flags: needinfo?(nfroyd) → needinfo?(sledru)
If this bug is the cause, I'd just back it out on the Beta branch so it doesn't affect the release. We have more time on other branches.
Looks like Al answered to Nathan's question. Ryan, could you backout the patch ? (beta only for now). Thanks CC Lawrence to make sure he knows for 37.
Backed out from all Gecko <37 per IRC conversation with Sylvestre. https://hg.mozilla.org/releases/mozilla-beta/rev/ad8bd14634dd https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/da49bdf1cf41 https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/8fbfc80ec260 https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/4d96752782bf https://hg.mozilla.org/releases/mozilla-esr31/rev/b5228115a202
I have tracked bug 1133381. We'll need to decide whether we can fix that bug or need to back this out of 37 as well.
Whiteboard: [adv-main37+][adv-esr31.6+] → [adv-main37+][adv-esr31.6+][b2g-adv-main2.2+]
You need to log in before you can comment on or make changes to this bug.