Closed Bug 1480965 Opened 3 years ago Closed 3 years ago
Sanitizer: heap-use-after-free [@ mozilla::dom::Worker Private::Dispatch To Main Thread] with READ of size 8
The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 63.0a1-20180801100116-https://hg.mozilla.org/mozilla-central/rev/af6a7edf0069549543f2fba6a8ee3ea251b20829. For detailed crash information, see attachment.
Skimming the code, it looks like ServiceWorkerPrivate::TerminateWorker() does | mWorkerPrivate = nullptr;| which destroys something, then it goes through and cancels things, and somehow ends up touching a dead object. Should something hold alive the worker private? WorkerRunnable seems to hold only a weak pointer to the private. Andrew, can you take a look?
:mrbkap, was this the bug we were looking at tonight, or just the exact same signature? Either way, :mccr8's analysis is correct and we have a plan for an immediate fix and some potentially larger cleanup fixes. Oh, and I can't needinfo :mrbkap. Well, either way we'll iron this out tomorrow. Leaving needinfo up on me.
I'm guessing that this is not a new issue, but I could be wrong.
It looks like this showed up in 61.
Andrew will drive this in.
Assignee: nobody → bugmail
Comment on attachment 9001238 [details] Bug 1480965. r=asuth Andrew Sutherland [:asuth] has approved the revision.
Trying to convey that we disabled ServiceWorkers on ESR60 via preference; maybe unaffected instead of disabled? Not sure.
Comment on attachment 9001238 [details] Bug 1480965. r=asuth [Security approval request comment] Q: How easily could an exploit be constructed based on the patch? A: This is a short time-window UAF; if an attacker can get an allocation in another thread (main or other worker) to claim the former WorkerPrivate's memory space, it seems possible to exploit. (If there are platforms where we're using an allocator with per-thread free-lists, we might be okay there.) Q: Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? A: It'd be obvious to someone who works in UAF's. Q: Which older supported branches are affected by this flaw? A: ESR is unaffected because we disabled ServiceWorkers on ESR. All current non-ESR releases. Specifically, crash-stats has only seen this happening in Firefox 61 onwards (which shouldn't be an artifact of the great crash-stats wipe, I believe), but it's not immediately obvious why it couldn't reproduce in releases earlier than that. Q: If not all supported branches, which bug introduced the flaw? A: n/a Q: Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? A: I will attach those now. The hunks don't uplift to 62 cleanly due to unrelated changes close by. Q: How likely is this patch to cause regressions; how much testing does it need? A: Very little testing. The fix is intentionally designed to keep the WorkerPrivate alive while maintaining the existing life-cycle around mWorkerPrivate. That is, existing logic that dubiously holds onto non-refcounted WorkerPrivate pointers will be okay, but we're not giving any other sketchy logic a chance to make a bad decision.
Attachment #9001238 - Flags: sec-approval?
I'd love to take this but we only have a single beta left, building tomorrow. I need to ni? release management and get their approval.
Allow me. I'll take it.
Comment on attachment 9001238 [details] Bug 1480965. r=asuth Sec-approval+ for trunk. We'll need a beta patch nominated ASAP to try to get tomorrow's final beta.
Attachment #9001238 - Flags: sec-approval? → sec-approval+
Comment on attachment 9002827 [details] [diff] [review] Firefox 62 version. This is the beta patch version of the trunk patch whose approval request was comment 11.
Attachment #9002827 - Flags: sec-approval?
Comment on attachment 9002827 [details] [diff] [review] Firefox 62 version. Whoops, wrong approval type! Approval Request Comment [Feature/Bug causing the regression]: Longstanding bug. [User impact if declined]: Potential security risk. See comment 11. [Is this code covered by automated tests?]: Automated tests cover the general operation of ServiceWorker functionality and shutdown, but not this edge-case. We can't add a test at this time because it's a security bug. [Has the fix been verified in Nightly?]: Not quite yet, but imminently. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: We extend the lifetime of an object that potentially had a use-after-free but do so without introducing additional hazards. (We still clear mWorkerPrivate, but now we just forget() it into a strong local reference that provides the lifetime guarantees necessary.) [String changes made/needed]: None.
Attachment #9002827 - Flags: sec-approval? → approval-mozilla-beta?
https://hg.mozilla.org/integration/mozilla-inbound/rev/a7caa2f69479cee185e4a7caa9a9516ed1159298 Bug 1480965 - Fix review nit. r=asuth
Attachment #9002827 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
When you find some time, it would be very helpful if you could answer the following questions so I can work on improving ASan Nightly further. In particular, many of the reports we saw about use-after-free were sent only once, not multiple times. So I'm asking myself what changes to ASan Nightly I can make to improve overall uaf detection rate: 1) Do you think more frequent GC/CC would have helped detecting this problem more frequently? 2) Do you think GC/CC on specific DOM events (e.g. beforeunload, pagehide, DOMContentLoaded) would have helped? 3) Do you think triggering the event loop in the specific child processes would have helped here? (e.g. dummy XMLHttpRequest onbeforeunload, useful in fuzzing). 4) Do you have any other ideas to reproduce this particular problem better or do you think any of the above points (or other approaches) might help to find other use-after-free issues in areas that you work in? Any feedback would be greatly appreciated. Thanks!
Whiteboard: [adv-main62+] → [adv-main62+][[post-critsmash-triage]
Comment on attachment 9001238 [details] Bug 1480965. r=asuth Andrew Sutherland [:asuth] has been removed from the revision.
(In reply to Christian Holler (:decoder) from comment #21) > 1) Do you think more frequent GC/CC would have helped detecting this problem > more frequently? No. Not a GC issue but a deterministic worker shutdown related phenomenon. > 2) Do you think GC/CC on specific DOM events (e.g. beforeunload, pagehide, > DOMContentLoaded) would have helped? n/a > 3) Do you think triggering the event loop in the specific child processes > would have helped here? (e.g. dummy XMLHttpRequest onbeforeunload, useful in > fuzzing). No. > 4) Do you have any other ideas to reproduce this particular problem better > or do you think any of the above points (or other approaches) might help to > find other use-after-free issues in areas that you work in? This specific problem is part of a known footgun implementation that's in the process of being cleaned up, but even once cleaned up, there will still be interesting shutdown-related phenomenon. It could be interesting if we could provide a mechanism to intentionally stall PBackground IPC message delivery to the worker thread in a way that doesn't violate ordering invariants. WorkerPrivate implements its own custom dispatch logic, so there's already a place to put it in. This is probably something we'll consider as we clean up the implementation and better understand what's actually happening. (Workers have a complicated setup with "control" runnables and "normal" WorkerRunnables, which a hybrid event target that wraps runnables into control runnables when normal shutdown occurs because of the need for IPC delivery to continue, etc. It's not great.) > Any feedback would be greatly appreciated. Thanks! I love ASAN! Thank you so much for your efforts here. The additional details provided by the ASAN builds in the wild are invaluable. I have spent far too much of my life trying to mentally simulate and re-create weird crash scenarios that the ASAN builds illuminate immediately.
You need to log in before you can comment on or make changes to this bug.