Closed Bug 1486400 Opened 3 years ago Closed 3 years ago

Make use-after-free due to thread races better detectable in ASan Nightly

Categories

(Firefox :: Security, defect)

x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- fixed

People

(Reporter: decoder, Assigned: decoder)

References

(Blocks 1 open bug)

Details

(Keywords: sec-other, sec-want, Whiteboard: [post-critsmash-triage][adv-main64-])

Attachments

(2 files, 1 obsolete file)

After the first few use-after-free bugs reported by the ASan Nightly project have been fixed, I reached out to developers to figure out what we can do to make thse bugs more likely to trigger. All of the bugs reported (with the exception of one) only had a single report associated, indicating that there might be many more similar bugs that we cannot detect because the userbase is too small. Before asking developers, I thought GC/CC was the key and filed bug 1484655, but now it seems that thread races or other issues around thread/task scheduling are the primary source of the uafs we have seen so far.

Some more detailed comments:

Bug 1478849:

> For this UAF, an async runnable/task needs to run before something else that's
> done synchronously, which more or less means that the first thread needs to be
> preempted.  This is the kind of thing that rr chaos mode approaches more
> generally, but for this specific case we might get some mileage out of doing a
> sched_yield() and/or a short sleep when posting a runnable/task to another
> thread.

Bug 1478575:


> This is influenced by thread timing, ordering, load, etc.  some form of thread
> 'chaos' mode may expose more (see rr's chaos mode, etc).  Especially timing
> around IPC; typically on multicore an IPC may not force an immediate yield by
> the caller; conversely a receiver may run immediately and finish before the
> caller has done "a lot".  Ditto Dispatch() (perhaps even more in Dispatch).
> Having the sender or receiver occasionally (randomly) yield for a short time
> might increase the odds of an IPC/Dispatch-related UAF.

In ASan Nightly, we have the option to implement the things suggested in these comments. It is perfectly acceptable to add any kinds of delays, even if it slows down the browser a bit. For the purpose of ASan Nightly, the crashes also don't have to be reproducible. We have a very high fix rate for use-after-free, typically the trace is enough to point out the problem.

Since I'm not very familiar with our IPC or our threading code, I won't be able to implement any of this on my own, so I'm asking any of you to help with suggestions or by implementing parts of these suggestions to make this happen.
Short, even random-ish delays on the dispatching thread for tasks/runnables are super-easy to implement.  It's less easy for me to imagine how to implement delays on the dispatched-to thread, but I probably just lack imagination here.
If you're willing to block the receiving thread, it isn't hard; if you want to not block it or the sender, you could send it to a "delay" thread which holds it for N ms and then sends it to the target thread.  the runnable to the delay thread would just take a runnable, a target thread, and a delay value, and on the delay thread would set up a timer to fire and Dispatch() it to the real target.
I see two action points here:


1) Enable MOZ_CHAOSMODE for ASan Nightly builds. I can make a patch for that, but should we just enable it unconditionally? I tested it locally and did not see any noticeable slowdown, so performance-wise I would think this is ok. If we feared that we would miss certain bugs because it is enabled, I can enable it 50:50 at random every time the browser starts. Opinions?


2) Implement comment 1 and comment 2 as an additional mode for MOZ_CHAOSMODE. Nathan, can you help with this?
Flags: needinfo?(nfroyd)
This is maybe not perfect, but it is at least a start.  Seems to compile at least.
Attachment #9008461 - Flags: review?(rjesup)
Flags: needinfo?(nfroyd)
Comment on attachment 9008461 [details] [diff] [review]
add task dispatch/run delays for ChaosMode

Review of attachment 9008461 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/nsThreadPool.cpp
@@ +242,5 @@
> +
> +      // Delay event processing to encourage whoever dispatched this event
> +      // to run.
> +      if (ChaosMode::isActive(ChaosFeature::TaskRunning)) {
> +        const uint32_t milliseconds = 100;

Not that I think it's a problem, but is there a reason this is 100ms and the others are all 1000ms?
Attachment #9008461 - Flags: review?(rjesup) → review+
(In reply to Randell Jesup [:jesup] from comment #6)
> Comment on attachment 9008461 [details] [diff] [review]
> add task dispatch/run delays for ChaosMode
> 
> Review of attachment 9008461 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/threads/nsThreadPool.cpp
> @@ +242,5 @@
> > +
> > +      // Delay event processing to encourage whoever dispatched this event
> > +      // to run.
> > +      if (ChaosMode::isActive(ChaosFeature::TaskRunning)) {
> > +        const uint32_t milliseconds = 100;
> 
> Not that I think it's a problem, but is there a reason this is 100ms and the
> others are all 1000ms?

The idea was to delay event running less, but I can change it to be consistent.

Also, thinking about it now, delaying for 1000ms means delays up to 1s, which is probably unacceptable.  What we really want are delays of up to 1000us, I think?  Will probably have to modify things because PR_Sleep is not usable with such intervals...
Comment on attachment 9007735 [details]
Bug 1486400 - Enable MOZ_CHAOSMODE by default for ASan Nightly Reporter builds. r?froydnj

Nathan Froyd [:froydnj] has approved the revision.
Attachment #9007735 - Flags: review+
probably a 10ms sleep is good enough.  We don't need this to have no impact on browser speed, and we may need it to service other code waiting to run, or to run "enough".
Nathan, is this ready to land or do we need to do something more here?
Flags: needinfo?(nfroyd)
OK, fixed up to do something reasonable on Windows and Unix.

decoder, are you able to try this locally and make sure it doesn't break too much?
Attachment #9010067 - Flags: review+
Attachment #9010067 - Flags: feedback?(choller)
Attachment #9008461 - Attachment is obsolete: true
(In reply to Christian Holler (:decoder) from comment #10)
> Nathan, is this ready to land or do we need to do something more here?

This is ready to land once decoder successfully browses with it for a bit.
Flags: needinfo?(nfroyd)
Comment on attachment 9010067 [details] [diff] [review]
add task dispatch/run delays for ChaosMode

I surfed around the whole day with the Linux try build and also started the build on a Windows machine for some quick surfing. Overall, I feel it has a somewhat noticable performance impact, but nothing that wouldn't be acceptable within the project. Also no crashes on any of the two platforms, so I think we're good to land this. \o/
Flags: needinfo?(nfroyd)
Attachment #9010067 - Flags: feedback?(choller) → feedback+
https://hg.mozilla.org/mozilla-central/rev/3a99736fa847
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 64
Depends on: 1492929
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main64-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.