Closed Bug 1720568 Opened 3 years ago Closed 3 years ago

Intermittent SUMMARY: AddressSanitizer: heap-use-after-free /builds/worker/workspace/obj-build/dist/include/mozilla/dom/WorkerPrivate.h:169:12 in IsDebuggerRegistered

Categories

(Core :: DOM: Workers, defect, P5)

defect

Tracking

()

RESOLVED FIXED
92 Branch
Tracking Status
firefox-esr78 91+ fixed
firefox90 --- wontfix
firefox91 --- fixed
firefox92 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jstutte)

References

Details

(4 keywords, Whiteboard: [sec-survey][adv-main91+r][adv-esr78.13+r])

Crash Data

Attachments

(2 files)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=345220415&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/G3j4Ni0eSyWTGl2v7x1Iow/runs/0/artifacts/public/logs/live_backing.log


[task 2021-07-14T19:47:53.527Z] 19:47:53     INFO - GECKO(9367) | =================================================================
[task 2021-07-14T19:47:53.528Z] 19:47:53    ERROR - GECKO(9367) | ==10355==ERROR: AddressSanitizer: heap-use-after-free on address 0x61b0000b65fc at pc 0x7f54edfed08d bp 0x7fff8b4c6190 sp 0x7fff8b4c6188
[task 2021-07-14T19:47:53.530Z] 19:47:53     INFO - GECKO(9367) | READ of size 1 at 0x61b0000b65fc thread T0 (Isolated Web Co)
[task 2021-07-14T19:47:53.695Z] 19:47:53     INFO - TEST-START | toolkit/components/reader/test/browser_readerMode_with_anchor.js
[task 2021-07-14T19:47:53.803Z] 19:47:53     INFO - GECKO(9367) | -----------------------------------------------------
[task 2021-07-14T19:47:53.804Z] 19:47:53     INFO - GECKO(9367) | Suppressions used:
[task 2021-07-14T19:47:53.805Z] 19:47:53     INFO - GECKO(9367) |   count      bytes template
[task 2021-07-14T19:47:53.805Z] 19:47:53     INFO - GECKO(9367) |      14        448 nsComponentManagerImpl
[task 2021-07-14T19:47:53.806Z] 19:47:53     INFO - GECKO(9367) |       2        288 libfontconfig.so
[task 2021-07-14T19:47:53.806Z] 19:47:53     INFO - GECKO(9367) | -----------------------------------------------------
[task 2021-07-14T19:47:53.982Z] 19:47:53     INFO - GECKO(9367) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-07-14T19:47:53.983Z] 19:47:53     INFO - GECKO(9367) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-07-14T19:47:55.289Z] 19:47:55     INFO - GECKO(9367) | -----------------------------------------------------
[task 2021-07-14T19:47:55.290Z] 19:47:55     INFO - GECKO(9367) | Suppressions used:
[task 2021-07-14T19:47:55.291Z] 19:47:55     INFO - GECKO(9367) |   count      bytes template
[task 2021-07-14T19:47:55.292Z] 19:47:55     INFO - GECKO(9367) |      14        448 nsComponentManagerImpl
[task 2021-07-14T19:47:55.293Z] 19:47:55     INFO - GECKO(9367) |       2        288 libfontconfig.so
[task 2021-07-14T19:47:55.294Z] 19:47:55     INFO - GECKO(9367) | -----------------------------------------------------
[...]
[task 2021-07-14T19:47:57.607Z] 19:47:57     INFO - GECKO(9367) | SUMMARY: AddressSanitizer: heap-use-after-free /builds/worker/workspace/obj-build/dist/include/mozilla/dom/WorkerPrivate.h:169:12 in IsDebuggerRegistered
[task 2021-07-14T19:47:57.608Z] 19:47:57     INFO - GECKO(9367) | Shadow bytes around the buggy address:
[task 2021-07-14T19:47:57.611Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ec60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.612Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ec70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.612Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ec80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.613Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ec90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.616Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000eca0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.617Z] 19:47:57     INFO - GECKO(9367) | =>0x0c368000ecb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]
[task 2021-07-14T19:47:57.620Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ecc0: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa
[task 2021-07-14T19:47:57.622Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ecd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[task 2021-07-14T19:47:57.623Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ece0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[task 2021-07-14T19:47:57.624Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ecf0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.629Z] 19:47:57     INFO - GECKO(9367) |   0x0c368000ed00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2021-07-14T19:47:57.630Z] 19:47:57     INFO - GECKO(9367) | Shadow byte legend (one shadow byte represents 8 application bytes):
[task 2021-07-14T19:47:57.631Z] 19:47:57     INFO - GECKO(9367) |   Addressable:           00
[task 2021-07-14T19:47:57.631Z] 19:47:57     INFO - GECKO(9367) |   Partially addressable: 01 02 03 04 05 06 07
[task 2021-07-14T19:47:57.632Z] 19:47:57     INFO - GECKO(9367) |   Heap left redzone:       fa
[task 2021-07-14T19:47:57.636Z] 19:47:57     INFO - GECKO(9367) |   Freed heap region:       fd
[task 2021-07-14T19:47:57.637Z] 19:47:57     INFO - GECKO(9367) |   Stack left redzone:      f1
[task 2021-07-14T19:47:57.638Z] 19:47:57     INFO - GECKO(9367) |   Stack mid redzone:       f2
[task 2021-07-14T19:47:57.639Z] 19:47:57     INFO - GECKO(9367) |   Stack right redzone:     f3
[task 2021-07-14T19:47:57.640Z] 19:47:57     INFO - GECKO(9367) |   Stack after return:      f5
[task 2021-07-14T19:47:57.641Z] 19:47:57     INFO - GECKO(9367) |   Stack use after scope:   f8
[task 2021-07-14T19:47:57.642Z] 19:47:57     INFO - GECKO(9367) |   Global redzone:          f9
[task 2021-07-14T19:47:57.643Z] 19:47:57     INFO - GECKO(9367) |   Global init order:       f6
[task 2021-07-14T19:47:57.643Z] 19:47:57     INFO - GECKO(9367) |   Poisoned by user:        f7
[task 2021-07-14T19:47:57.644Z] 19:47:57     INFO - GECKO(9367) |   Container overflow:      fc
[task 2021-07-14T19:47:57.645Z] 19:47:57     INFO - GECKO(9367) |   Array cookie:            ac
[task 2021-07-14T19:47:57.646Z] 19:47:57     INFO - GECKO(9367) |   Intra object redzone:    bb
[task 2021-07-14T19:47:57.648Z] 19:47:57     INFO - GECKO(9367) |   ASan internal:           fe
[task 2021-07-14T19:47:57.649Z] 19:47:57     INFO - GECKO(9367) |   Left alloca redzone:     ca
[task 2021-07-14T19:47:57.653Z] 19:47:57     INFO - GECKO(9367) |   Right alloca redzone:    cb
[task 2021-07-14T19:47:57.654Z] 19:47:57     INFO - GECKO(9367) |   Shadow gap:              cc
[task 2021-07-14T19:47:57.655Z] 19:47:57     INFO - GECKO(9367) | ==10355==ABORTING
[task 2021-07-14T19:47:58.852Z] 19:47:58     INFO - GECKO(9367) | -----------------------------------------------------
[task 2021-07-14T19:47:58.852Z] 19:47:58     INFO - GECKO(9367) | Suppressions used:
[task 2021-07-14T19:47:58.853Z] 19:47:58     INFO - GECKO(9367) |   count      bytes template
[task 2021-07-14T19:47:58.854Z] 19:47:58     INFO - GECKO(9367) |      14        448 nsComponentManagerImpl
[task 2021-07-14T19:47:58.854Z] 19:47:58     INFO - GECKO(9367) |       2        288 libfontconfig.so
[task 2021-07-14T19:47:58.855Z] 19:47:58     INFO - GECKO(9367) | -----------------------------------------------------
[task 2021-07-14T19:47:59.225Z] 19:47:59     INFO - GECKO(9367) | MEMORY STAT | vsize 20975155MB | residentFast 1066MB
[task 2021-07-14T19:47:59.227Z] 19:47:59     INFO - TEST-OK | toolkit/components/reader/test/browser_readerMode_with_anchor.js | took 5531ms
Group: core-security → dom-core-security
Keywords: csectype-uaf
Attached file ASan report

My guess would be that this is some kind of shutdown race involving workers. This failure is happening in toolkit/components/reader/test/ with Fission enabled, so we're likely starting tons of workers (for the reader mode stuff) and shutting down processes frequently (because of the Fission part). It looks like workers spin up some kind of unregister debugger thing with a weak reference to the worker private.

This testcase is in a Reader Mode test which would require user interaction, but the bug seems to be a race between a failure to create the worker (why? what failure) while clean-up comes along to see if there's a debugger to detach. Seems plausible it could be triggered from content? rating it on that basis.

So we dispatch TopLevelWorkerFinishedRunnable with the explicit intent to be dispatched with low priority:

    // Note, this uses the lower priority DispatchToMainThreadForMessaging for
    // dispatching TopLevelWorkerFinishedRunnable to the main thread so that
    // other relevant runnables are guaranteed to run before it.
    RefPtr<TopLevelWorkerFinishedRunnable> runnable =
        new TopLevelWorkerFinishedRunnable(this);
    if (NS_FAILED(DispatchToMainThreadForMessaging(runnable.forget()))) {
      NS_WARNING("Failed to dispatch runnable!");
    }

and we store a weak reference to the WorkerPrivate inside that runnable. That makes it hard to impossible to guarantee through other invariants that the WorkerPrivate is still alife when we execute the runnable, especially during shutdown we dispatch a storm of runnables to the main thread. And the probability of races is increased by the low priority dispatch (but to raise priority would not entirely solve the problem, I assume).

WorkerPrivate supports ref-counting so we might want to use RefPtr<WorkerPrivate> and instead look at mStatus more often before we use it.

Please note that the same pattern applies for WorkerFinishedRunnable.

Flags: needinfo?(echuang)
Assignee: nobody → jstutte
Status: NEW → ASSIGNED

Thanks to the ASAN stacks and even more a conversation with :asuth, we understood that we most probably see an early failure during compiler->Dispatch() which will schedule a TopLevelWorkerFinishedRunnable before we set aside the self-reference which we are supposed to clear only through ClearSelfAndParentEventTargetRef called by the exact same TopLevelWorkerFinishedRunnable. So we see an edge case in failure handling here.

Flags: needinfo?(echuang)

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: We do not have STR and the changed function's result is only modified in a (probably rare) edge case during worker startup. The comment and patch's title want to suggest more a cleanup than a fix, too.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: It is just a one line move up, so very easy and not risky.
  • How likely is this patch to cause regressions; how much testing does it need?: Very unlikely, the existing service worker tests should be enough to ensure the normal functioning.
Attachment #9232645 - Flags: sec-approval?

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

Approved to land and uplift

Attachment #9232645 - Flags: sec-approval? → sec-approval+
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Unclear, but it is a sec-high. The normal code-paths are covered by tests, this failure path is not known how to trigger.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It is just a one line move up, so very easy and not risky.
  • String changes made/needed:
Attachment #9232645 - Flags: approval-mozilla-beta?

Jens, does it require an uplift on esr78? (your sec-approval request mentioned the flaws affects all branches)

Flags: needinfo?(jstutte)

(In reply to Pascal Chevrel:pascalc from comment #11)

Jens, does it require an uplift on esr78? (your sec-approval request mentioned the flaws affects all branches)

Probably yes. I did not have the time to check if the patch applies cleanly, though, thus I did not ask for it, yet.

Flags: needinfo?(jstutte)
Flags: needinfo?(pascalc)

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

Approved for 91 beta 8, thanks.

Flags: needinfo?(pascalc)
Attachment #9232645 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: Have no big impact, but crash by accessing nullptr when a specific runnable sequence on worker.
  • Fix Landed on Version: 92
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It is not risky. The patch moves the self-reference setting point earlier. This is reasonable because the worker thread and worker should be ready for it at the moment.
  • String or UUID changes made by this patch: NO
Attachment #9232645 - Flags: approval-mozilla-esr91?

Comment on attachment 9232645 [details]
Bug 1720568: Be more explicit about the WorkerPrivate self-reference. r?#dom-worker-reviewers

I am morphing it into an esr78 uplift request, we are going to have 2 ESR branches overlapping for a while and esr91 is based on Firefox 91 so it will inherit from this patch at merge time.

Attachment #9232645 - Flags: approval-mozilla-esr91? → approval-mozilla-esr78+

(In reply to Pascal Chevrel:pascalc from comment #16)

I am morphing it into an esr78 uplift request, we are going to have 2 ESR branches overlapping for a while and esr91 is based on Firefox 91 so it will inherit from this patch at merge time.

Thanks!

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(jstutte)
Whiteboard: [sec-survey]
Flags: needinfo?(jstutte)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #19)

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Done.

QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Whiteboard: [sec-survey] → [sec-survey][adv-main90+r]
Whiteboard: [sec-survey][adv-main90+r] → [sec-survey][adv-main91+r]
Whiteboard: [sec-survey][adv-main91+r] → [sec-survey][adv-main91+r][adv-esr78.13+r]
Crash Signature: [@ mozilla::dom::WorkerDebuggerManager::UnregisterDebuggerMainThread]
Crash Signature: [@ mozilla::dom::WorkerDebuggerManager::UnregisterDebuggerMainThread] → [@ mozilla::dom::WorkerDebuggerManager::UnregisterDebuggerMainThread] [@ mozilla::dom::WorkerPrivate::AssertIsOnWorkerThread() const]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: