Closed Bug 1777574 Opened 2 years ago Closed 2 years ago

Hit MOZ_CRASH(JS holder AbortSignal contains pointers to GC things in more than one zone (found in mReason) ) at /xpcom/base/CycleCollectedJSRuntime.cpp:1345

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
106 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 105+ fixed
firefox102 --- wontfix
firefox103 --- wontfix
firefox104 --- wontfix
firefox105 + fixed
firefox106 + fixed

People

(Reporter: jkratzer, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed][adv-main105+r][adv-esr102.3+r])

Attachments

(6 files)

Testcase found while fuzzing mozilla-central rev 65e579f52525 (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 65e579f52525 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Hit MOZ_CRASH(JS holder AbortSignal contains pointers to GC things in more than one zone (found in mReason) ) at /xpcom/base/CycleCollectedJSRuntime.cpp:1345

    ==1462768==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f6458109625 bp 0x7fff3ca736c0 sp 0x7fff3ca73690 T1462768)
    ==1462768==The signal is caused by a WRITE memory access.
    ==1462768==Hint: address points to the zero page.
        #0 0x7f6458109625 in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3
        #1 0x7f6458109625 in checkZone /xpcom/base/CycleCollectedJSRuntime.cpp:1342:5
        #2 0x7f6458109625 in CheckZoneTracer::Trace(JS::Heap<JS::Value>*, char const*, void*) const /xpcom/base/CycleCollectedJSRuntime.cpp:1352:7
        #3 0x7f64580feb5a in CheckHolderIsSingleZone /xpcom/base/CycleCollectedJSRuntime.cpp:1411:17
        #4 0x7f64580feb5a in mozilla::CycleCollectedJSRuntime::TraceJSHolders(JSTracer*, mozilla::JSHolderMap::Iter&, js::SliceBudget&) /xpcom/base/CycleCollectedJSRuntime.cpp:1471:7
        #5 0x7f64580fe3b3 in mozilla::CycleCollectedJSRuntime::TraceNativeGrayRoots(JSTracer*, mozilla::JSHolderMap::WhichHolders, js::SliceBudget&) /xpcom/base/CycleCollectedJSRuntime.cpp:1452:19
        #6 0x7f645fb9e195 in traceEmbeddingGrayRoots /js/src/gc/RootMarking.cpp:403:10
        #7 0x7f645fb9e195 in js::gc::GCRuntime::traceEmbeddingGrayRoots(JSTracer*) /js/src/gc/RootMarking.cpp:390:3
        #8 0x7f645fb9d696 in js::gc::GCRuntime::traceRuntimeCommon(JSTracer*, js::gc::GCRuntime::TraceOrMarkRuntime) /js/src/gc/RootMarking.cpp:371:7
        #9 0x7f645fb9dd94 in js::gc::GCRuntime::traceRuntime(JSTracer*, js::gc::AutoTraceSession&) /js/src/gc/RootMarking.cpp:290:3
        #10 0x7f645fbe245c in HeapCheckTracerBase::traceHeap(js::gc::AutoTraceSession&) /js/src/gc/Verifier.cpp:864:12
        #11 0x7f645fbe3351 in check /js/src/gc/Verifier.cpp:1008:8
        #12 0x7f645fbe3351 in js::CheckGrayMarkingState(JSRuntime*) /js/src/gc/Verifier.cpp:1026:17
        #13 0x7f64580fefde in mozilla::CycleCollectedJSRuntime::CheckGrayBits() const /xpcom/base/CycleCollectedJSRuntime.cpp:1594:3
        #14 0x7f645812704d in nsCycleCollector::BeginCollection(mozilla::CCReason, ccIsManual, nsICycleCollectorListener*) /xpcom/base/nsCycleCollector.cpp:3582:19
        #15 0x7f6458126bca in nsCycleCollector::Collect(mozilla::CCReason, ccIsManual, js::SliceBudget&, nsICycleCollectorListener*, bool) /xpcom/base/nsCycleCollector.cpp:3412:9
        #16 0x7f6458129413 in nsCycleCollector_collectSlice(js::SliceBudget&, mozilla::CCReason, bool) /xpcom/base/nsCycleCollector.cpp:3934:21
        #17 0x7f6459e50c7c in nsJSContext::RunCycleCollectorSlice(mozilla::CCReason, mozilla::TimeStamp) /dom/base/nsJSEnvironment.cpp:1460:5
        #18 0x7f6459e51f62 in mozilla::CCGCScheduler::CCRunnerFired(mozilla::TimeStamp) /dom/base/nsJSEnvironment.cpp:1609:9
        #19 0x7f64581ebb82 in operator() /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
        #20 0x7f64581ebb82 in mozilla::IdleTaskRunner::Run() /xpcom/threads/IdleTaskRunner.cpp:125:14
        #21 0x7f64581ec6a5 in mozilla::IdleTaskRunnerTask::Run() /xpcom/threads/IdleTaskRunner.cpp:46:15
        #22 0x7f6458203b99 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:851:26
        #23 0x7f6458202849 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:725:15
        #24 0x7f6458202993 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:461:36
        #25 0x7f645822ecc6 in operator() /xpcom/threads/TaskController.cpp:187:37
        #26 0x7f645822ecc6 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
        #27 0x7f64582185df in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1205:16
        #28 0x7f645821ebed in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:465:10
        #29 0x7f6458dea1d6 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:85:21
        #30 0x7f6458d10777 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:380:10
        #31 0x7f6458d10682 in RunHandler /ipc/chromium/src/base/message_loop.cc:373:3
        #32 0x7f6458d10682 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:355:3
        #33 0x7f645cf8e538 in nsBaseAppShell::Run() /widget/nsBaseAppShell.cpp:150:27
        #34 0x7f645f0b3d8b in XRE_RunAppShell() /toolkit/xre/nsEmbedFunctions.cpp:875:20
        #35 0x7f6458deb0ca in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:235:9
        #36 0x7f6458d10777 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:380:10
        #37 0x7f6458d10682 in RunHandler /ipc/chromium/src/base/message_loop.cc:373:3
        #38 0x7f6458d10682 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:355:3
        #39 0x7f645f0b33ac in XRE_InitChildProcess(int, char**, XREChildData const*) /toolkit/xre/nsEmbedFunctions.cpp:734:34
        #40 0x55d226d2ff70 in content_process_main /browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
        #41 0x55d226d2ff70 in main /browser/app/nsBrowserApp.cpp:338:18
        #42 0x7f646ea7f082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
        #43 0x55d226d05d1c in _start (/home/jkratzer/builds/mc-debug/firefox-bin+0x15d1c) (BuildId: 8005d97f53a118b7a2223744b49a806eab077df4)
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3 in MOZ_Crash
    ==1462768==ABORTING
Attached file Testcase
Component: XPCOM → DOM: Core & HTML

I think this might be a security issue, but I could be misremembering...

Group: dom-core-security

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220630212430-65f99678a1ef.
The bug appears to have been introduced in the following build range:

Start: 11ecf2c5eb7b2e2c6461676ec28178ec5d2417e0 (20220421191935)
End: a440fb0e93f1f3b65c8b310466892aec98b14bf2 (20220421192030)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=11ecf2c5eb7b2e2c6461676ec28178ec5d2417e0&tochange=a440fb0e93f1f3b65c8b310466892aec98b14bf2

Keywords: regression
Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]

Setting regressed_by field after analyzing regression range found by bugmon.

Regressed by: 1734997

Set release status flags based on info from the regressing bug 1734997

:sefeng, since you are the author of the regressor, bug 1734997, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(sefeng)

(In reply to Andrew McCreight [:mccr8] from comment #2)

I think this might be a security issue, but I could be misremembering...

Yes, because we might miss tracing some gray roots.

The testcase just happens use use scheduler API to trigger this, but I don't think it really caused this.

We have these days probably tons of multizone stuff around.

No longer regressed by: 1734997

Kagami, it looks like you added AbortSignalImpl::mReason this field in bug 1737771. Could you take a look? Thanks.

Flags: needinfo?(krosylight)

I think we'll need to audit lots of code and annotate them being multizone.

Hmm, interesting. I guess all stream interfaces should use NS_IMPL_CYCLE_COLLECTION_MULTI_ZONE_JSHOLDER_CLASS instead.

https://searchfox.org/mozilla-central/rev/9ad01b5eae48436e3bdb8df01be3973e683d5242/xpcom/base/nsCycleCollectionParticipant.h#950-951

// Most JS holder classes should only contain pointers to JS GC things in a
// single JS zone (excluding pointers into the atoms zone), but there are some
// exceptions. Such classes should use this macro to tell the system about this.

Not quite sure this is true.

Edit: https://searchfox.org/mozilla-central/rev/9ad01b5eae48436e3bdb8df01be3973e683d5242/xpcom/base/CycleCollectedJSRuntime.cpp#1334-1348 (from comment #8) says that's a last resort, okay then.

I'm not very familiar with this multizone issue. Do we have some examples of the options mentioned in comment #8?

// - wrap all JS GC things into the same compartment
// - split GC thing pointers between separate cycle collected objects

Flags: needinfo?(sefeng)
Flags: needinfo?(krosylight)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(continuation)

It looks like you added a single field, so you should be able to wrap this field when it is assigned.

I'm not too familiar with DOM code, but something like: get the JSObject for your associated global object, use JSAutoRealm to enter the global's realm, and then use JS_WrapValue to wrap the value into the same realm as the global.

I couldn't find a good example of this but Andrew probably knows where this happens already.

Flags: needinfo?(jcoppeard)

We need to make this setup less error prone. We have tons of code having JS::Heap<Foo> member variables in js holders. Manual wrapping is way too easy to forget.

(In reply to Olli Pettay [:smaug] from comment #15)

We need to make this setup less error prone. We have tons of code having JS::Heap<Foo> member variables in js holders. Manual wrapping is way too easy to forget.

Absolutely. dom/streams have only one JS_WrapValue call (but without JSAutoReam, is it required?) even when most of the code is touched by a SpiderMonkey person.

I'd just go ahead and use the multi zone holder thing, unless you expect there to be a ton of these objects. I don't know off the top of my head where places do wrapping when they assign to a field.

Flags: needinfo?(continuation)
Assignee: nobody → smaug

Btw, the testcase itself is Nightly only by default, since .scheduler is nightly only.
But I'm still rather worried about non-Nightly cases. Trying to find some.
(but I'm also writing a patch to fix the zone handling)

Severity: -- → S2
Priority: -- → P2

The patch is massive, but just because of macro name changes.
The interesting stuff happens in xpcom/base/nsCycleCollectionParticipant.h and dom/base/nsWrapperCache.h

Attachment #9290666 - Attachment description: Bug 1777574, automate CC zone handling, r=mccr8,jonco → Bug 1777574, automate CC zone handling, r=mccr8

Comment on attachment 9290666 [details]
Bug 1777574, automate CC zone handling, r=mccr8

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Crashing because of accessing a JS::Heap value which has been already GCed might not be too hard, but the patch does possibly hide the issue quite well. Though, bug 1785109 landed recently to branches and thus pinpointing to this issue. However, the testcase in this bug relies on Nightly only APIs and as of now I'm not aware other similar issues.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: NA
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: They shouldn't be risky, but the patch is massive so might take a bit time to update it.
    But, as I said, the known issue is Nightly only. Bug 1785109 was another instance and it was fixed on branches.
  • How likely is this patch to cause regressions; how much testing does it need?: If this causes regressions, they should show up soon after landing, just by running all the tests.
  • Is Android affected?: Yes
Attachment #9290666 - Flags: sec-approval?

FYI, the current patch doesn't apply cleanly to m-c, but it does to Beta. It'll also need a rebased patch for ESR102. TBH, I'm a bit unsure if we want to land this patch now with only a week of Beta left before 105 goes to RC.

I wasn't thinking to land the patch to esr at all. I hope we don't need it there. But feel free to disagree :)
And I'll rebase the patch once it has been approved.

I think that if we don't want this on ESR, we need to re-evaluate the sec-rating of the bug. Generally, sec-high bugs fall into automatic uplift territory.

Yeah. The question is again that if one needs to be able to change a pref value in order to trigger the issue, is the issue sec-high?
(But if one can change pref values, I'm sure one can find quite a few issues)

Comment on attachment 9290666 [details]
Bug 1777574, automate CC zone handling, r=mccr8

I would like to land this this cycle and uplift to beta if we can.

Attachment #9290666 - Flags: sec-approval? → sec-approval+
Group: dom-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch

The patch landed in nightly and beta is affected.
:smaug, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox105 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(smaug)

Bugmon Analysis
Verified bug as fixed on rev mozilla-central 20220908123603-65e0896af25d.

Status: RESOLVED → VERIFIED

Comment on attachment 9293364 [details]
Bug 1777574, automate CC zone handling, for beta, r=mccr8

Beta/Release Uplift Approval Request

  • User impact if declined: The patch is making certain parts of cycle collector macro usage less error prone. Bug 1785109 is a sign that we have had issues even on release builds.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: To reproduce this easily one needs to use a debug build and then try 'testcase for debug builds' from the bug.
    I guess one could download a debug build from CI and verify this.
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This is changing small amount of code in lots of places. However the macros should make things safer by enforcing correct use in most cases.
  • String changes made/needed: NA
  • Is Android affected?: Yes
Flags: needinfo?(smaug)
Attachment #9293364 - Flags: approval-mozilla-beta?
Attachment #9293371 - Flags: approval-mozilla-beta?

Comment on attachment 9293364 [details]
Bug 1777574, automate CC zone handling, for beta, r=mccr8

Approved for 105.0b9 and 102.3esr.

Attachment #9293364 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9293371 - Flags: approval-mozilla-beta? → approval-mozilla-esr102+

I find these patches make the macros generated from webidl-example unbuildable. Should NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS in CGExampleClass be replaced by NS_DECL_CYCLE_COLLECTION_WRAPPERCACHE_CLASS as well?

Regressions: 1790481
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:bisected,confirmed][adv-main105+r]
Whiteboard: [bugmon:bisected,confirmed][adv-main105+r] → [bugmon:bisected,confirmed][adv-main105+r][adv-esr102.3+r]

I have been told that the ESR commit for this bug here is reverting our patch for bug 1785109, a sec-high use-after-free, ESR commit here.

Is that intentional?

Flags: needinfo?(smaug)

bug 1785109 was a special case, and this bug fixes the whole class of those issues. So yes, it is intentional.

Flags: needinfo?(smaug)

Setting Regressed by field after analyzing regression range found by bugmon in comment #3.

Regressed by: 1734997
Duplicate of this bug: 1788238
Group: core-security-release
Assignee: smaug → nobody
Keywords: bugmon
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: