Closed Bug 1800470 Opened 1 year ago Closed 1 year ago

Assertion failure: cached.Data()->mParticipant == aParticipant (nsCycleCollectionParticipant shouldn't change!), at /xpcom/base/nsCycleCollector.cpp:1987

Categories

(Core :: DOM: File, defect, P2)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
109 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox107 --- unaffected
firefox108 --- wontfix
firefox109 --- verified

People

(Reporter: jkratzer, Assigned: smaug)

References

(Depends on 1 open bug, Blocks 2 open bugs, Regression)

Details

(Keywords: regression, testcase, Whiteboard: [bugmon:bisected,confirmed][fuzzblocker])

Attachments

(2 files)

Testcase found while fuzzing mozilla-central rev 20e2590324e2 (built with: --enable-debug --enable-fuzzing).

Please note, that this appears to be a recent regression and is occurring at such a high rate that it is negatively impacting fuzzing.

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 20e2590324e2 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Assertion failure: cached.Data()->mParticipant == aParticipant (nsCycleCollectionParticipant shouldn't change!), at /xpcom/base/nsCycleCollector.cpp:1987

    ==805881==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f64cf0f33d2 bp 0x7f64c1f59200 sp 0x7f64c1f59180 T805919)
    ==805881==The signal is caused by a WRITE memory access.
    ==805881==Hint: address points to the zero page.
        #0 0x7f64cf0f33d2 in CCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) /xpcom/base/nsCycleCollector.cpp:1986:5
        #1 0x7f64cf0f35eb in CCGraphBuilder::AddPurpleRoot(void*, nsCycleCollectionParticipant*) /xpcom/base/nsCycleCollector.cpp:2024:22
        #2 0x7f64cf107bb5 in AddPurpleRoot /xpcom/base/nsCycleCollector.cpp:2270:19
        #3 0x7f64cf107bb5 in SelectPointersVisitor::Visit(nsPurpleBuffer&, nsPurpleBufferEntry*) /xpcom/base/nsCycleCollector.cpp:1051:9
        #4 0x7f64cf0f1f98 in void nsPurpleBuffer::VisitEntries<SelectPointersVisitor>(SelectPointersVisitor&) /xpcom/base/nsCycleCollector.cpp:954:27
        #5 0x7f64cf0f1b57 in nsPurpleBuffer::SelectPointers(CCGraphBuilder&) /xpcom/base/nsCycleCollector.cpp:1063:3
        #6 0x7f64cf0fbc7d in nsCycleCollector::BeginCollection(mozilla::CCReason, ccIsManual, nsICycleCollectorListener*) /xpcom/base/nsCycleCollector.cpp:3648:14
        #7 0x7f64cf0fb61a in nsCycleCollector::Collect(mozilla::CCReason, ccIsManual, js::SliceBudget&, nsICycleCollectorListener*, bool) /xpcom/base/nsCycleCollector.cpp:3440:9
        #8 0x7f64cf0fdc3e in nsCycleCollector_collect(mozilla::CCReason, nsICycleCollectorListener*) /xpcom/base/nsCycleCollector.cpp:3948:28
        #9 0x7f64d3cad30f in mozilla::dom::workerinternals::(anonymous namespace)::WorkerJSRuntime::CustomGCCallback(JSGCStatus) /dom/workers/RuntimeService.cpp:816:11
        #10 0x7f64cf0d2913 in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) /xpcom/base/CycleCollectedJSRuntime.cpp:1884:3
        #11 0x7f64d6eb63df in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) /js/src/gc/GC.cpp:3905:3
        #12 0x7f64d6eb7098 in ~AutoCallGCCallbacks /js/src/gc/GC.cpp:3878:32
        #13 0x7f64d6eb7098 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget const&, JS::GCReason) /js/src/gc/GC.cpp:3995:1
        #14 0x7f64d6eb8095 in js::gc::GCRuntime::collect(bool, js::SliceBudget const&, JS::GCReason) /js/src/gc/GC.cpp:4177:9
        #15 0x7f64d6e8be77 in js::gc::GCRuntime::gc(JS::GCOptions, JS::GCReason) /js/src/gc/GC.cpp:4254:3
        #16 0x7f64d6ed6e5b in JS::NonIncrementalGC(JSContext*, JS::GCOptions, JS::GCReason) /js/src/gc/GCAPI.cpp:297:21
        #17 0x7f64d3cac1b1 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2084:7
        #18 0x7f64cf1f0887 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1198:16
        #19 0x7f64cf1f6f3d in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:465:10
        #20 0x7f64cfddde43 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:330:5
        #21 0x7f64cfd02c88 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:381:10
        #22 0x7f64cfd02b91 in RunHandler /ipc/chromium/src/base/message_loop.cc:374:3
        #23 0x7f64cfd02b91 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:356:3
        #24 0x7f64cf1ebc37 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:383:10
        #25 0x7f64e1e71c86 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #26 0x7f64e2712b42 in start_thread nptl/pthread_create.c:442:8
        #27 0x7f64e27a49ff  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /xpcom/base/nsCycleCollector.cpp:1986:5 in CCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
    ==805881==ABORTING
Attached file Testcase

We are running the final GC/CC on a worker shutdown here. The stack looks pretty much like corrupted data inside the CC, at least I cannot see any non-GC/CC code, :smaug ?

Flags: needinfo?(smaug)

I think this would indicate an error in the implementation of a CC participant. Knowing the values of aParticipant and cached.Data()->mParticipant would help, because then we could audit it. It could be some kind of shutdown issue.

Jason, if you happen to be able to provide a pernosco session here, that would help to shed some light for sure.

Flags: needinfo?(jkratzer)

Incorrect CC participants could cause UAFs due to incorrect tracing of JS objects, so I'll hide this until we know more.

Group: dom-core-security

I added some local logging to CCGraphBuilder::AddNode. The old participant is WritableStream and the new one is FileSystemWritableFileStream.

    auto* parti1 = cached.Data()->mParticipant;
    auto* parti2 = aParticipant;
    if (parti1 != parti2) {
      printf("cached participant: %s; add node participant: %s\n",
             parti1 ? parti1->ClassName() : "null",
             parti2 ? parti2->ClassName() : "null");
    }

I'll see if I can figure out what might be going wrong with those classes.

FileSystemWritableFileStream inherits from WritableStream. It also defines its own cycle collector participant, so that it can traverse mManager and call Close() in unlink. WritableStream traces JS, so it calls mozilla::HoldJSObjects(this); in the ctor. HoldJSObjects calls QI to get the participant, but we still have the vtable of WritableStream, so we get the participant for WritableStream. Later after the FileSystemWritableFileStream is fully set up, we end up adding it to the CC graph again. This time, when we QI it to get the participant, we end up with FileSystemWritableFileStream.

I'm not sure what the right fix for this is. I tried adding HoldJSObjects to the ctor for FileSystemWritableFileStream, but that hits an assert I added in bug 1627213. In that bug, I could fix the issue there by removing the HoldJSObjects from the superclass (AuthenticatorResponse) because it is a "virtual" base class that is never directly instantiated, but people do construct WritableStream directly.

I'll unhide this, because FileSystemWritableFileStream doesn't define a different trace method, so from the perspective of worrying about keeping JS alive there's no issue.

Group: dom-core-security
Flags: needinfo?(jkratzer)

A possible fix would be to make the WritableStream ctor used by FileSystemWritableFileStream private, remove the call to HoldJSObjects, add a call to HoldJSObjects to the ctor of FileSystemWritableFileStream, then make a Create method on WritableStream that calls the ctor and does HoldJSObjects. Kind of messy.

I'm a bit surprised this hasn't happened before. It would be a bit cleaner if we could add some way in HoldJSObjects to lazily resolve the participant. So we'd have some magical initial value that is like "I dunno" and then later, which will presumably be after the ctor, we call QI there and fill in the real value.

Maybe the WritableStream could add a boolean argument that says whether to call HoldJSObjects and it defaults to true. Of course, it would be bad if somebody manually passed in false and then didn't call HoldJSObjects but I feel like people wouldn't go out of their way to do that.

Didn't we have this very same or very similar bug with some other FileSystem* classes too. Trying to find which one.

Flags: needinfo?(smaug)

https://searchfox.org/mozilla-central/rev/898dfc8ee404f8af365251a322318f7a9e2f7f09/dom/fs/api/FileSystemWritableFileStream.h#45 had the comment why we don't do CC stuff in that class, but it was just removed later.

Regressed by: 1798459

Yeah, but the removal of those macros caused a different assertion in worker shutdown. IIRC, it was about keeping the global scope alive. That was easy to see when I was running normal tests for OPFS. This bug was filed later during fuzzing OPFS.

We just can't avoid the CC macros because mManager needs to be traversed and we also need to do some other work in unlink.

Sure, but the comment was there explaining why the macros can't be used without doing some changes elsewhere (WritableStream)

I can take this.

Assignee: nobody → smaug

Set release status flags based on info from the regressing bug 1798459

Attachment #9303292 - Attachment mime type: text/plain → text/html

Olli, would you mind to triage this, too? Thanks

Flags: needinfo?(smaug)
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/779cf97ae5de
make it possible to extend WritableStream with another cycle collectable class, r=saschanaz
Severity: -- → S3
Flags: needinfo?(smaug)
Priority: -- → P2
Depends on: 1801281
Depends on: 1798773
Depends on: 1801288

(In reply to Andrew McCreight [:mccr8] from comment #9)

I'm a bit surprised this hasn't happened before.

I'm pretty sure it did, but it's been a while. One other solution is to make the constructor protected and pass in the right nsScriptObjectTracer* as an argument to that constructor (with a Create helper for when we really just need to create a WritableStream).

Verified bug as reproducible on mozilla-central 20221118154632-3b5a8f67189b.
The bug appears to have been introduced in the following build range:

Start: f3f4ef6894dce654450c2302d6fb5cdf08a9f800 (20221109170402)
End: 569f150007e7e7171369eff56a244831d3ac2b97 (20221109192947)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f3f4ef6894dce654450c2302d6fb5cdf08a9f800&tochange=569f150007e7e7171369eff56a244831d3ac2b97

Whiteboard: [bugmon:confirm][fuzzblocker] → [bugmon:bisected,confirmed][fuzzblocker]

This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:smaug, could you increase the severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(smaug)

This will land once the blocking bugs have been fixed.

Flags: needinfo?(smaug)
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c612a0971391
make it possible to extend WritableStream with another cycle collectable class, r=saschanaz
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch

Verified bug as fixed on rev mozilla-central 20221124212638-e12f31999d33.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: