Closed Bug 1947303 Opened 9 days ago Closed 7 days ago

crash in [@ mozilla::dom::ContentParent::AsyncSendShutDownMessage]

Categories

(Core :: DOM: Content Processes, defect)

Tracking

VERIFIED FIXED
137 Branch
Tracking Status
firefox-esr128 --- unaffected
firefox135 --- unaffected
firefox136 --- unaffected
firefox137 --- verified

People

(Reporter: tsmith, Assigned: nika)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: crash, regression, testcase, Whiteboard: [fuzzblocker][bugmon:bisected,confirmed])

Attachments

(2 files)

Attached file testcase.html

Found while fuzzing 20250208-053595a05e65 (--enable-address-sanitizer --enable-fuzzing)

This is the top fuzzblocker by far; fuzzers have reported it more than 15K times in ~48h.

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework --upgrade
$ python -m fuzzfetch -a --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>

==55884==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7244eb87c361 bp 0x7ffdaf51f750 sp 0x7ffdaf51f720 T0)
==55884==The signal is caused by a READ memory access.
==55884==Hint: address points to the zero page.
    #0 0x7244eb87c361 in mozilla::dom::ContentParent::AsyncSendShutDownMessage() /gecko/dom/ipc/ContentParent.cpp:1655:34
    #1 0x7244eb83a92c in mozilla::dom::ContentParent::MaybeBeginShutDown(bool, bool) /gecko/dom/ipc/ContentParent.cpp:2241:5
    #2 0x7244eb880c2f in mozilla::dom::ContentParent::RemoveKeepAlive(unsigned long) /gecko/dom/ipc/ContentParent.cpp:2132:3
    #3 0x7244eb986bc0 in operator() /gecko/dom/ipc/UniqueContentParentKeepAlive.cpp:15:14
    #4 0x7244eb986bc0 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:302:7
    #5 0x7244eb986bc0 in operator= /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:272:5
    #6 0x7244eb986bc0 in mozilla::dom::(anonymous namespace)::XpcomContentParentKeepAlive::cycleCollection::Unlink(void*) /gecko/dom/ipc/UniqueContentParentKeepAlive.cpp:98:19
    #7 0x7244e21168ec in nsCycleCollector::CollectWhite() /gecko/xpcom/base/nsCycleCollector.cpp:3270:26
    #8 0x7244e211aa1d in nsCycleCollector::Collect(mozilla::CCReason, ccIsManual, JS::SliceBudget&, nsICycleCollectorListener*, bool) /gecko/xpcom/base/nsCycleCollector.cpp:3678:26
    #9 0x7244e211a0ca in nsCycleCollector::ShutdownCollect() /gecko/xpcom/base/nsCycleCollector.cpp:3585:20
    #10 0x7244e211d166 in nsCycleCollector::Shutdown(bool) /gecko/xpcom/base/nsCycleCollector.cpp:3917:5
    #11 0x7244e211f7ab in nsCycleCollector_shutdown(bool) /gecko/xpcom/base/nsCycleCollector.cpp:4250:18
    #12 0x7244e2347275 in mozilla::ShutdownXPCOM(nsIServiceManager*) /gecko/xpcom/build/XPCOMInit.cpp:737:3
    #13 0x7244ee575613 in ScopedXPCOMStartup::~ScopedXPCOMStartup() /gecko/toolkit/xre/nsAppRunner.cpp:1992:5
    #14 0x7244ee586ed0 in operator() /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:460:5
    #15 0x7244ee586ed0 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:302:7
    #16 0x7244ee586ed0 in mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup>>::operator=(std::nullptr_t) /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:272:5
    #17 0x7244ee586525 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /gecko/toolkit/xre/nsAppRunner.cpp:6138:16
    #18 0x7244ee5874e3 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /gecko/toolkit/xre/nsAppRunner.cpp:6174:21
    #19 0x5c6f6e67a1e4 in do_main /gecko/browser/app/nsBrowserApp.cpp:232:22
    #20 0x5c6f6e67a1e4 in main /gecko/browser/app/nsBrowserApp.cpp:464:16
    #21 0x724504796d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #22 0x724504796e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #23 0x5c6f6e599bb8 in _start (/home/worker/builds/m-c-20250208091603-fuzzing-asan-opt/firefox+0xd3bb8) (BuildId: b05c56e37c14a1419ec6d2aa0bc6a00c5a13f19e)

==55884==Register values:
rax = 0x0000000000000000  rbx = 0x00005070001726f0  rcx = 0x000000000000003f  rdx = 0x00005c6f6f1a4c00  
rdi = 0x0000507000172700  rsi = 0x0000000000001898  rbp = 0x00007ffdaf51f750  rsp = 0x00007ffdaf51f720  
 r8 = 0x0000000000001890   r9 = 0x0000000000000002  r10 = 0x00007fffffffff01  r11 = 0x05504b6186f2fa01  
r12 = 0x00000fffb5ea3ef4  r13 = 0x0000000000000000  r14 = 0x0000000000000000  r15 = 0x0000000000000000  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /gecko/dom/ipc/ContentParent.cpp:1655:34 in mozilla::dom::ContentParent::AsyncSendShutDownMessage()

This has also been reported by live site testing.

I'm guessing this was caused by bug 1943648.

Keywords: regression
Regressed by: 1943648

Bug 1942128 also looks suspicious to me. That touched the keep alive stuff.

Actually, it even added XpcomContentParentKeepAlive which is in the stack.

Regressed by: 1942128
No longer regressed by: 1943648
Flags: needinfo?(nika)

Set release status flags based on info from the regressing bug 1942128

After the changes in bug 1942128, it is now possible for the final
KeepAlive for a process to be destroyed very late during shutdown,
during final cycle-collection. In this case, the main thread's event
target would already be dead, and the call to dispatch in
AsyncSendShutDownMessage crashes with a null pointer.

This adds some basic checks to make sure the ShutDownProcess function
will have an effect before trying to dispatch. This should ensure that
the dispatch is skipped when the actor is already dead, which will be
the case late during shutdown.
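
The guard described above can be illustrated with a minimal, self-contained sketch. This is not the landed patch and not the real ContentParent code: `EventTarget`, `ProcessActor`, `MarkDead`, and the member names are hypothetical stand-ins, and the exact conditions checked in the actual fix are not shown in this bug. The sketch only assumes that a dead actor no longer has a usable event target, and shows the idea of returning early before touching it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the main thread's event target.
struct EventTarget {
  std::vector<std::function<void()>> queue;
  void Dispatch(std::function<void()> task) { queue.push_back(std::move(task)); }
};

// Hypothetical, simplified model of a process actor (not the real ContentParent).
class ProcessActor {
 public:
  explicit ProcessActor(std::shared_ptr<EventTarget> target)
      : mTarget(std::move(target)) {}

  // Simulates late shutdown: the actor is dead and its event target is gone.
  void MarkDead() {
    mCanSend = false;
    mTarget.reset();
  }

  // Returns true only if a shutdown task was actually dispatched.
  bool AsyncSendShutDownMessage() {
    // Guard: if shutdown is already pending or the actor can no longer send,
    // dispatching would have no effect, so skip it. This also avoids
    // dereferencing a null event target late during shutdown.
    if (mShutdownPending || !mCanSend) {
      return false;
    }
    assert(mTarget && "assumption: a live actor always has an event target");
    mShutdownPending = true;
    mTarget->Dispatch([] { /* the real code would send the shutdown message */ });
    return true;
  }

 private:
  std::shared_ptr<EventTarget> mTarget;
  bool mCanSend = true;
  bool mShutdownPending = false;
};
```

In this model, a live actor dispatches exactly once, while an actor that has been marked dead returns early instead of dereferencing the missing target, which is the situation the crash stack in comment 0 corresponds to.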

Assignee: nobody → nika
Status: NEW → ASSIGNED

As a side note, I find it somewhat surprising that the test case from comment 0 apparently causes the "inference" process to be started, and something (presumably translations?) to be initialized and start running. It seems a bit inefficient to start the entire translations engine for that case. Perhaps that's something the inference folks should be aware of?

Flags: needinfo?(nika) → needinfo?(enordin)

I don't think this is Translations.

Here is a profile of the test case searching for Translations markers in the inference process (there are none):

I do see markers from ML, particularly for MLEngine:GetInferenceProcessInfo

:tarek, can you audit this and ensure that the MLEngine isn't being initialized in a context where it doesn't need to be?

Flags: needinfo?(enordin) → needinfo?(tziade)
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/50128bea153c Skip dispatching AsyncSendShutDownMessage if already shut(ting) down, r=ipc-reviewers,mccr8

(In reply to Erik Nordin [:nordzilla] from comment #8)
> I don't think this is Translations.
>
> Here is a profile of the test case searching for Translations markers in the inference process (there are none):
>
> I do see markers from ML, particularly for MLEngine:GetInferenceProcessInfo
>
> :tarek, can you audit this and ensure that the MLEngine isn't being initialized in a context where it doesn't need to be?

I can see actual inference calls in that profile -- we don't have more markers yet. We have a couple of features (autofill and suggest) that start the inference process early on when Firefox starts. We will have a look to see if anything is abnormal.

Verified bug as reproducible on mozilla-central 20250211214755-7f5fd9b4d345.
The bug appears to have been introduced in the following build range:

Start: 90f0b0004226da5025acc6f9b45d2d2371bf71cb (20250207205556)
End: b95de61de7638546897604c446045ce42782b388 (20250207205658)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=90f0b0004226da5025acc6f9b45d2d2371bf71cb&tochange=b95de61de7638546897604c446045ce42782b388

Whiteboard: [fuzzblocker] → [fuzzblocker][bugmon:bisected,confirmed]
Status: ASSIGNED → RESOLVED
Closed: 7 days ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch
Flags: needinfo?(tziade)

Verified bug as fixed on rev mozilla-central 20250212093207-11a45cb6835c.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon