Closed Bug 1769878 Opened 2 years ago Closed 2 years ago

crash at null in [@ NS_CycleCollectorSuspect3]

Categories

(Core :: DOM: Workers, defect, P3)

defect

Tracking

()

VERIFIED FIXED
103 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- wontfix
firefox100 --- wontfix
firefox101 --- wontfix
firefox102 --- wontfix
firefox103 --- verified

People

(Reporter: tsmith, Assigned: smaug)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, testcase, Whiteboard: [bugmon:bisected,confirmed])

Crash Data

Attachments

(4 files, 3 obsolete files)

Attached file testcase.html

Found while fuzzing m-c 20220517-8027f6771a74 (--enable-address-sanitizer --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -a --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
==23648==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f5d4a2006a7 bp 0x7f5d161fd850 sp 0x7f5d161fd780 T62)
==23648==The signal is caused by a READ memory access.
==23648==Hint: address points to the zero page.
    #0 0x7f5d4a2006a7 in operator! /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:311:36
    #1 0x7f5d4a2006a7 in NS_CycleCollectorSuspect3 /gecko/xpcom/base/nsCycleCollector.cpp:3764:7
    #2 0x7f5d4fceb523 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:248:7
    #3 0x7f5d4fceb523 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:234:12
    #4 0x7f5d4fceb523 in mozilla::DOMEventTargetHelper::AddRef() /gecko/dom/events/DOMEventTargetHelper.cpp:85:1
    #5 0x7f5d4fe59ce8 in mozilla::dom::Blob::Create(nsIGlobalObject*, mozilla::dom::BlobImpl*) /gecko/dom/file/Blob.cpp:81:59
    #6 0x7f5d4f6c9463 in mozilla::dom::OffscreenCanvas::CreateEncodeCompleteCallback(nsCOMPtr<nsIGlobalObject>&&, mozilla::dom::Promise*)::EncodeCallback::ReceiveBlobImpl(already_AddRefed<mozilla::dom::BlobImpl>) /gecko/dom/canvas/OffscreenCanvas.cpp:292:31
    #7 0x7f5d4d5b9101 in mozilla::dom::EncodingCompleteEvent::Run() /gecko/dom/base/ImageEncoder.cpp
    #8 0x7f5d4a3d3e8e in nsThread::ProcessNextEvent(bool, bool*) /gecko/xpcom/threads/nsThread.cpp:1174:16
    #9 0x7f5d4a3cd48f in NS_ProcessPendingEvents(nsIThread*, unsigned int) /gecko/xpcom/threads/nsThreadUtils.cpp:430:19
    #10 0x7f5d51faf8aa in mozilla::dom::WorkerPrivate::ClearMainEventQueue(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /gecko/dom/workers/WorkerPrivate.cpp:3760:5
    #11 0x7f5d51f9c804 in mozilla::dom::WorkerPrivate::ScheduleDeletion(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /gecko/dom/workers/WorkerPrivate.cpp:3579:3
    #12 0x7f5d51f83602 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /gecko/dom/workers/RuntimeService.cpp:2100:19
    #13 0x7f5d4a3d3e8e in nsThread::ProcessNextEvent(bool, bool*) /gecko/xpcom/threads/nsThread.cpp:1174:16
    #14 0x7f5d4a3dd7cc in NS_ProcessNextEvent(nsIThread*, bool) /gecko/xpcom/threads/nsThreadUtils.cpp:465:10
    #15 0x7f5d4baf30d1 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /gecko/ipc/glue/MessagePump.cpp:330:5
    #16 0x7f5d4b978131 in RunInternal /gecko/ipc/chromium/src/base/message_loop.cc:380:10
    #17 0x7f5d4b978131 in RunHandler /gecko/ipc/chromium/src/base/message_loop.cc:373:3
    #18 0x7f5d4b978131 in MessageLoop::Run() /gecko/ipc/chromium/src/base/message_loop.cc:355:3
    #19 0x7f5d4a3cba5b in nsThread::ThreadFunc(void*) /gecko/xpcom/threads/nsThread.cpp:378:10
    #20 0x7f5d6f8d857e in _pt_root /gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
    #21 0x7f5d70575608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
    #22 0x7f5d7013c132 in __clone /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Flags: in-testsuite?

A Pernosco session is available here: https://pernos.co/debug/-uCYdFryO8hFvxXKTu5zCA/index.html

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220518031437-1e98fd258975.
The bug appears to have been introduced in the following build range:

Start: ca296f209b269a1ea7b29a2a36929c1f05f6b6d4 (20220325152959)
End: 683ad09c90953c4a288c5e625c2e64a87fe1ae27 (20220325161857)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ca296f209b269a1ea7b29a2a36929c1f05f6b6d4&tochange=683ad09c90953c4a288c5e625c2e64a87fe1ae27

Keywords: regression
Whiteboard: [bugmon:bisected,confirmed]

(In reply to Bugmon [:jkratzer for issues] from comment #2)

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220518031437-1e98fd258975.
The bug appears to have been introduced in the following build range:

Start: ca296f209b269a1ea7b29a2a36929c1f05f6b6d4 (20220325152959)
End: 683ad09c90953c4a288c5e625c2e64a87fe1ae27 (20220325161857)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ca296f209b269a1ea7b29a2a36929c1f05f6b6d4&tochange=683ad09c90953c4a288c5e625c2e64a87fe1ae27

Jens, Olli, could this have been caused by your commits in bug 1756172?

Kelsey, given this is a null crash and not common, should we decrease the severity?

Flags: needinfo?(jstutte)
Flags: needinfo?(jgilbert)
Flags: needinfo?(bugs)

(In reply to Tyson Smith [:tsmith] from comment #1)

A Pernosco session is available here: https://pernos.co/debug/-uCYdFryO8hFvxXKTu5zCA/index.html

Thanks, always appreciated! I put some notes.

(In reply to Marco Castelluccio [:marco] from comment #3)

Jens, Olli, could this have been caused by your commits in bug 1756172?

Yes, I think so. It seems we anticipated the moment the cycle collector is shut down before the last round of event processing which happens inside WorkerPrivate::ScheduleDeletion and thus changing the refcount of cycle collected objects as happening here for the global (which is kept alive by a callback beyond the worker's lifetime) will just fail.

Actually I think we should inhibit any further dispatch to the worker thread starting from here before we do our last explicit round of processing before the cycle collector shuts down. And WorkerPrivate::ScheduleDeletion IMHO should not process any further events on the worker thread at all.

WDYT, Olli?

Severity: S2 → S3
Flags: needinfo?(jstutte)
Priority: -- → P3
Regressed by: 1756172

Yes. We seem to have yet another place where event loop is cleared.
https://searchfox.org/mozilla-central/rev/b462b11e71b500e084f51e61fbd9e19ea0122c78/dom/workers/WorkerPrivate.cpp#3575
We need to mark the thread as shutting down before that or something, or hmm, EncodingCompleteEvent should use Worker Refs to detect shutdown I think.

Flags: needinfo?(bugs)
Has Regression Range: --- → yes
Component: Canvas: WebGL → DOM: Workers
Assignee: nobody → jstutte
Status: NEW → ASSIGNED
See Also: → 1772084

Comment on attachment 9278488 [details]
Bug 1769878: Ensure WorkerThreadPrimaryRunnable will not run any events after the cycle collector is destroyed. r?#dom-worker-reviewers

Revision D147516 was moved to bug 1772084. Setting attachment 9278488 [details] to obsolete.

Attachment #9278488 - Attachment is obsolete: true
Severity: S3 → S4
Flags: needinfo?(jgilbert)

:jgilbert, sorry for not clearing your ni?, I already decreased severity.

Severity: S4 → S3
Assignee: jstutte → bugs
Attachment #9279083 - Attachment is obsolete: true
Attachment #9279081 - Attachment is obsolete: true
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b202476826d2 Add the reproducible crashtest. r=smaug https://hg.mozilla.org/integration/autoland/rev/851d0144fd91 Remove redundant RefPtr<nsIGlobalObject> from EncodeComplete callback. r=smaug https://hg.mozilla.org/integration/autoland/rev/cd4ca1691c0b ensure objects owned by the worker thread are cleared when the worker is shutting down, r=asuth,aosmond

Bugmon Analysis
Verified bug as fixed on rev mozilla-central 20220623035450-d26536dbf462.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon
Regressions: 1776878
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ NS_CycleCollectorSuspect3]

Per discussion with smaug, the relevant API affected by this bug is enabled on ESR, but the use cases are pretty limited and it's not clear how high-impact it will actually be there. Let's keep an eye on it for a couple cycles and we can decide later about a possible backport. The patches do graft cleanly.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: