crash near null in [@ CanSend]
Categories
(Core :: Graphics: WebGPU, defect, P5)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox126 | --- | disabled |
| firefox127 | --- | disabled |
| firefox128 | --- | verified |
People
(Reporter: tsmith, Assigned: bradwerth)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression, testcase, Whiteboard: [fuzzblocker][bugmon:bisected,confirmed])
Attachments
(2 files, 1 obsolete file)
|
348 bytes,
text/html
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta-
|
Details | Review |
Found while fuzzing m-c 20240423-4630feab4425 (--enable-address-sanitizer --enable-fuzzing)
To reproduce via Grizzly Replay:
$ pip install fuzzfetch grizzly-framework --upgrade
$ python -m fuzzfetch -a --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>
==38826==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000011 (pc 0x74f6f60fa2e1 bp 0x7ffc7e655b10 sp 0x7ffc7e655a20 T0)
==38826==The signal is caused by a READ memory access.
==38826==Hint: address points to the zero page.
#0 0x74f6f60fa2e1 in CanSend /builds/worker/workspace/obj-build/dist/include/mozilla/ipc/ProtocolUtils.h:206:45
#1 0x74f6f60fa2e1 in EndComputePass /builds/worker/checkouts/gecko/dom/webgpu/CommandEncoder.cpp:240:17
#2 0x74f6f60fa2e1 in End /builds/worker/checkouts/gecko/dom/webgpu/ComputePassEncoder.cpp:115:14
#3 0x74f6f60fa2e1 in Cleanup /builds/worker/checkouts/gecko/dom/webgpu/ComputePassEncoder.cpp:45:5
#4 0x74f6f60fa2e1 in mozilla::webgpu::ComputePassEncoder::cycleCollection::Unlink(void*) /builds/worker/checkouts/gecko/dom/webgpu/ComputePassEncoder.cpp:16:1
#5 0x74f6eec816dc in nsCycleCollector::CollectWhite() /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3161:26
#6 0x74f6eec85c28 in nsCycleCollector::Collect(mozilla::CCReason, ccIsManual, js::SliceBudget&, nsICycleCollectorListener*, bool) /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3531:26
#7 0x74f6eec8514a in nsCycleCollector::ShutdownCollect() /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3442:20
#8 0x74f6eec87e16 in nsCycleCollector::Shutdown(bool) /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3741:5
#9 0x74f6eec8a580 in nsCycleCollector_shutdown(bool) /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:4067:18
#10 0x74f6eef3aec7 in mozilla::ShutdownXPCOM(nsIServiceManager*) /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:718:3
#11 0x74f6ff60f0f2 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:651:16
#12 0x5ea4860c778c in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#13 0x5ea4860c778c in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:378:18
#14 0x74f718029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#15 0x74f718029e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#16 0x5ea485feba98 in _start (/home/user/workspace/browsers/m-c-20240509094442-fuzzing-asan-opt/firefox+0xdca98) (BuildId: 5d6ac57b84088a48be1f0ffb1be1ce03bb936176)
Comment 1•1 year ago
|
||
Verified bug as reproducible on mozilla-central 20240512212637-45d7400ced7e.
The bug appears to have been introduced in the following build range:
Start: ac4906fa040ce76ad5ea8f73caf83045f291b613 (20240126200508)
End: ffdd1c2161a9731cfbb79d08fec23f9c35b8d04f (20240126213944)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ac4906fa040ce76ad5ea8f73caf83045f291b613&tochange=ffdd1c2161a9731cfbb79d08fec23f9c35b8d04f
| Reporter | ||
Updated•1 year ago
|
Comment 2•1 year ago
•
|
||
Brad, based on the regression range it seems like bug 1865921 may be the cause?
Updated•1 year ago
|
| Assignee | ||
Comment 3•1 year ago
|
||
(In reply to Lee Salzman [:lsalzman] from comment #2)
Brad, based on the regression range it seems like bug 1865921 may be the cause?
Huh. I can't explain how that's a regressor here, but I'll see if I can fix it.
| Assignee | ||
Comment 4•1 year ago
|
||
This is happening because, during cycle collection, the parent object CommandEncoder is destroyed before the child object ComputePassEncoder. This is the unlink pattern for all webgpu objects. Is that good? I don't know. One fix for this problem is to destroy child before the parent. I think that's better semantics, so I'll provide a patch that does that.
As an alternative, I'll provide a patch that special-cases the implementations of CommandEncoder::EndComputePass and ::EndRenderPass to check for the existence of the bridge. I'll mark that as WIP until reviewers make judgment on the cycle collection change patch.
| Assignee | ||
Comment 5•1 year ago
|
||
This changes the unlinking order for all webgpu objects. It changes the
process to be depth-first, and to call Unlink/Destroy method on the
children before calling it on the parent. This gives an opportunity for
children to tell their parent about their destruction before the
parent loses the ability to send messages across the bridge.
| Assignee | ||
Comment 6•1 year ago
|
||
Because EndComputePass and EndRenderPass are called during cleanup, it's
necessary for the CommandEncoder to check for a non-null bridge. This is
noted as being necessary in dom/webgpu/ObjectModel.h.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 7•1 year ago
|
||
Set release status flags based on info from the regressing bug 1865921
Comment 9•1 year ago
|
||
| bugherder | ||
Comment 10•1 year ago
|
||
Verified bug as fixed on rev mozilla-central 20240515202418-b7871b7f2c05.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 11•1 year ago
|
||
The patch landed in nightly and beta is affected.
:bradwerth, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox127towontfix.
For more information, please visit BugBot documentation.
Comment 12•1 year ago
|
||
:bradwerth, to add to Comment 11.
Do you want to add a release uplift too?
| Assignee | ||
Comment 13•1 year ago
|
||
Comment on attachment 9401542 [details]
Bug 1896195: Make CommandEncoder check for bridge existence when ending compute passes and render passes.
Beta/Release Uplift Approval Request
- User impact if declined: Users viewing WebGPU content may crash under unusual circumstances.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Adds an early-out to some destruction methods.
- String changes made/needed:
- Is Android affected?: Yes
| Assignee | ||
Comment 14•1 year ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #12)
:bradwerth, to add to Comment 11.
Do you want to add a release uplift too?
I don't think it's important enough to uplift to release. WebGPU is still an experimental part of Firefox, only Windows users have it enabled by default, and there's not much WebGPU content out there to view. This can ride the trains to release.
Comment 15•1 year ago
•
|
||
(In reply to Brad Werth [:bradwerth] from comment #14)
(In reply to Donal Meehan [:dmeehan] from comment #12)
:bradwerth, to add to Comment 11.
Do you want to add a release uplift too?I don't think it's important enough to uplift to release. WebGPU is still an experimental part of Firefox, only Windows users have it enabled by default, and there's not much WebGPU content out there to view. This can ride the trains to release.
Sounds good
Updated•1 year ago
|
Comment 16•1 year ago
|
||
Comment on attachment 9401542 [details]
Bug 1896195: Make CommandEncoder check for bridge existence when ending compute passes and render passes.
Riding the trains to 128 release
Description
•