Closed Bug 1036515 Opened 10 years ago Closed 9 years ago

Refcounting on nsTimerImpl is not actually threadsafe

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox35 --- wontfix
firefox36 + wontfix
firefox37 + fixed
firefox38 + fixed
firefox-esr31 37+ fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.1S --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: bwc, Assigned: froydnj)

References

Details

(Keywords: csectype-race, sec-high, Whiteboard: [adv-main37+][adv-esr31.6+][b2g-adv-main2.2+])

Attachments

(1 file)

While the refcounting on nsTimerImpl is declared threadsafe, we override Release in a way that completely negates any threadsafety protection:

http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsTimerImpl.cpp#218

If two threads call Release at the same time, it is possible that one thread will destroy the nsTimerImpl before the other thread returns from Release, leading to a UAF. (Credit to jwwang for spotting this)
Blocks: 1036531
Nathan, who do you think might be able to look at this?
Flags: needinfo?(nfroyd)
I'm not even sure the scenario described in comment 0 is plausible.  My impression was that timers have threadsafe refcounting because the arming/creating thread and the timer thread have to refcount a given timer, not so any two arbitrary threads can refcount the timer.  And there's a scary amount of interplay between the timer thread and the arming/creating thread to ensure that we don't UAF.

I could be reading or remembering things wrongly, though.  I'll look at this a little more closely tomorrow.
There is a window of opportunity for both the timer and target thread to manipulate this refcount at the same time, when the timer is firing (we unlock the monitor while getting ready to dispatch the timer).
(In reply to Byron Campen [:bwc] from comment #4)
> There is a window of opportunity for both the timer and target thread to
> manipulate this refcount at the same time, when the timer is firing (we
> unlock the monitor while getting ready to dispatch the timer).

Hm, it looks to me like the timer thread is holding a strong ref over the entire firing process:

http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#260

I guess the concern is that if we fail to post the timer event, we wind up here:

http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#282

and then we can race with the target thread?  I'm not entirely sure that's right; please outline the sequence of steps necessary for the race if I'm off the mark.
Yes, that is the race I'm referring to. Note that the release on the target thread is not necessarily due to the target thread letting go of its nsCOMPtr; it could also be caused by an earlier TimerEvent popping (which also holds a ref). So, it doesn't really help if the target thread ensures that it has released its timers before its thread stops accepting messages.
I think bug 1036550 is the ultimate way to fix this.

The short-term fix that comes to mind involves deleting the grotty code in nsTimerImpl::Release and TimerThread::Run (and maybe threading the locked monitor through the call to PostTimerEvent), but I'm unable to convince myself that actually solves anything.  I think it just moves the problem someplace else...
Flags: needinfo?(nfroyd)
I think that rewriting so the monitor is not unlocked is the fix here. I've even taken a crack at this, but after working around the obvious deadlocks, I ran into a deadlock here: http://dxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#1299

This stuff is just plain gross.
Is there somebody who can work on this in the near term, either the short term or long term fix?  If not I can try to drum up some resources from jst or something.
Can Byron post his patch here and bent weight in on rewriting that WorkerPrivate.cpp code?
Let me try to track that patch down.
See Also: → 1059572
Byron, any luck with your patch?
Flags: needinfo?(docfaraday)
I haven't messed with it since comment 8
Flags: needinfo?(docfaraday)
For avoidance of doubt, is the patch in bug 1059572 the patch before or after fixing the "obvious deadlocks" mentioned in comment 8?  (Testing it seems to reveal several hangs, mostly in what looks like IndexedDB-related things.)  If it's not, could you provide an updated patch?
Flags: needinfo?(docfaraday)
Yes, the patch on bug 1059572 has all of the deadlock fixes I have written.
Flags: needinfo?(docfaraday)
I haven't redone all the wandering through the timer code I did for comment 3
and comment 5.  But based on the potential race identified in comment 5 and
Byron's positive response in comment 6, I believe this patch closes that
particular hole.  We only need to have the monitor unlocked during the timer
posting process; we don't need to have it unlocked for twiddling with the
refcount in the case of an error.

I'm not convinced this solves all the threadsafety problems...the
nsTimerImpl::Release implementation is still pretty scary...
Attachment #8561398 - Flags: review?(benjamin)
Attachment #8561398 - Flags: review?(benjamin) → review+
Comment on attachment 8561398 [details] [diff] [review]
narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily.  The UAF has survived many rounds of ASAN/Valgrind testing and never manifested there; I suspect constructing an actual exploit would be difficult.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

All of them.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

I have not made branch-specific patches, but the creation of them should not be difficult (this is stable code), and their associated risk is no greater than this patch.

How likely is this patch to cause regressions; how much testing does it need?

The patch is handling a specific failure mode that is rarely seen.  Our normal testing regimen should be sufficient.
Attachment #8561398 - Flags: sec-approval?
We haven't landed the patch yet. :p
Let's get this into trunk but also get patches for Aurora, Beta, and ESR31 nominated and in by the end of the week if possible.
Attachment #8561398 - Flags: sec-approval? → sec-approval+
It turns out the last Beta build goes to build tomorrow so we need this on the Beta branch by then if it is going to make Firefox 36 (and probably ESR31) then too.
Flags: needinfo?(nfroyd)
(In reply to Al Billings [:abillings] from comment #20)
> It turns out the last Beta build goes to build tomorrow so we need this on
> the Beta branch by then if it is going to make Firefox 36 (and probably
> ESR31) then too.

No special futzing is needed to apply this patch to beta.  But given the tight schedule, how does landing this work?  Landing it on inbound, and then on aurora and beta straightaway, without waiting for a merge?
Flags: needinfo?(nfroyd) → needinfo?(abillings)
If you land on inbound now, you shouldn't have any issue getting it to aurora/beta on time by following the usual practices. Just please get it nominated ASAP.
Comment on attachment 8561398 [details] [diff] [review]
narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents

Approval Request Comment
[Feature/regressing bug #]: Something lost in the mists of time
[User impact if declined]: Possible exploitable security hole
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: Low.  It does change locking behavior, but not in unusual ways.
[String/UUID change made/needed]: None.
Attachment #8561398 - Flags: approval-mozilla-beta?
Attachment #8561398 - Flags: approval-mozilla-aurora?
Comment on attachment 8561398 [details] [diff] [review]
narrow the scope of unlocking mMonitor in nsTimerImpl::PostTimerEvents

[Approval Request Comment]
User impact if declined: Possibly exploitable security hole.
Fix Landed on Version: Gecko 38.
Risk to taking this patch (and alternatives if risky): Low.  It does change locking behavior, but not in unusual ways.
String or UUID changes made by this patch: None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8561398 - Flags: approval-mozilla-esr31?
https://hg.mozilla.org/mozilla-central/rev/faf3c7ba3985
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Attachment #8561398 - Flags: approval-mozilla-esr31?
Attachment #8561398 - Flags: approval-mozilla-esr31+
Attachment #8561398 - Flags: approval-mozilla-beta?
Attachment #8561398 - Flags: approval-mozilla-beta+
Attachment #8561398 - Flags: approval-mozilla-aurora?
Attachment #8561398 - Flags: approval-mozilla-aurora+
Nathan, Al, this introduced bug 1133381 and it is blocking the 36 release.
I would like to backout this patch (or get a fix for bug 1133381). Can you help?
Flags: needinfo?(nfroyd)
Flags: needinfo?(abillings)
(In reply to Sylvestre Ledru [:sylvestre] from comment #28)
> Nathan, Al, this introduced bug 1133381 and it is blocking the 36 release.
> I would like to backout this patch (or get a fix for bug 1133381). Can you
> help?

I personally don't understand how this could have caused bug 1133381, but I also have to admit that given the regression range there, this bug is the likeliest candidate.

However, I don't know what the protocol for dealing with sec-sensitive bugs that cause crashes is.  Do we just back the change out on beta, or all affected branches?  (I'd really like some input from the security team, unless the release process has dealt with things like this before...)
Flags: needinfo?(nfroyd) → needinfo?(sledru)
If this bug is the cause, I'd just back it out on the Beta branch so it doesn't affect the release. We have more time on other branches.
Flags: needinfo?(abillings)
Looks like Al answered to Nathan's question.
Ryan, could you backout the patch ? (beta only for now). Thanks
CC Lawrence to make sure he knows for 37.
Flags: needinfo?(sledru)
Flags: needinfo?(ryanvm)
Flags: needinfo?(lmandel)
Depends on: 1133381
I have tracked bug 1133381. We'll need to decide whether we can fix that bug or need to back this out of 37 as well.
Flags: needinfo?(lmandel)
No longer depends on: 1133381
Whiteboard: [adv-main37+][adv-esr31.6+]
Whiteboard: [adv-main37+][adv-esr31.6+] → [adv-main37+][adv-esr31.6+][b2g-adv-main2.2+]
Group: core-security → core-security-release
Group: core-security-release
Depends on: 1334701
You need to log in before you can comment on or make changes to this bug.