Closed Bug 1831548 Opened 1 year ago Closed 1 year ago

async RemoteTexture + accelerated Canvas on Linux: Tab crash (without crash report) occurs after a while on HTML5 Fish Bowl test page

Categories

(Core :: Graphics, defect)

Firefox 114
Desktop
Linux
defect

Tracking

()

VERIFIED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 --- disabled
firefox115 --- disabled
firefox116 --- verified

People

(Reporter: csasca, Assigned: sotaro)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, Whiteboard: [sp3])

Attachments

(5 files, 2 obsolete files)

Attached video tab crash fishbowl.webm

Found in

  • Firefox 114.0a1

Affected versions

  • Firefox 114.0a1

Tested platforms

  • Affected platforms: Ubuntu 22.04
  • Unaffected platforms: macOS 13, Windows 11

Steps to reproduce

  1. Launch Firefox
  2. Access this test page
  3. Select auto / 2000 fish and wait a bit

Expected result

  • The tab crash doesn't occur

Actual result

  • A tab crash occurs after a while

Regression range

  • Will look for a regression range; 113 doesn't seem to be affected

Additional notes

  • The issue can be seen in the attachment
  • Firefox 113 and 114 behave differently when "auto" is selected. In 113 it loads, for example, ~300 fish, and when the fps drops the number of fish stops increasing, while 114 reports a constant 60 fps until it loads ~2000 fish or more, and then the tab crashes (the fps seems to have suffered a bit while I was recording the tab crash)

Reproduced on KDE Wayland, Debian Testing.
My whole desktop slowed down (mouse pointer was frozen sometimes) and then I got a tab crash without crash report.

mozregression --good 2023-01-05 --bad 2023-05-05 -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/

28:53.52 INFO: Last good revision: 54334826f02ea9aea488b9b35e44409d9851e414
28:53.52 INFO: First bad revision: 2b90b458178fa4de234b11771cb670c65c0cea03
28:53.52 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=54334826f02ea9aea488b9b35e44409d9851e414&tochange=2b90b458178fa4de234b11771cb670c65c0cea03

2b90b458178fa4de234b11771cb670c65c0cea03 sotaro — Bug 1829052 - Enable async RemoteTexture on nightly except Android r=gfx-reviewers,lsalzman

Edit: Also reproducible on ASan Nightly, but it doesn't create a crash report.

$ firefox-asan/firefox -P fishbowlasan
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
Unsupported modifier, resource creation failed.
Unsupported modifier, resource creation failed.
Unsupported modifier, resource creation failed.
Unsupported modifier, resource creation failed.
Unsupported modifier, resource creation failed.
Unsupported modifier, resource creation failed.
[Parent 24751, IPC I/O Parent] WARNING: Message needs unreceived descriptors channel:61200005df40 message-type:11599877 header()->num_handles:1 num_fds:0 fds_i:0: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:489
Exiting due to channel error.

Tab crash does not occur with async RemoteTexture disabled:
mozregression --launch 2023-05-05 --pref webgl.out-of-process.async-present.force-sync:true -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/

Blocks: 1804233
Regressed by: 1829052
Summary: Tab crash occurs after a while on HTML5 Fish Bowl test page → async RemoteTexture: Tab crash (without crash report) occurs after a while on HTML5 Fish Bowl test page

Tab crash also occurs with dmabuf webgl disabled:
mozregression --launch 2023-05-05 --pref widget.dmabuf-webgl.enabled:false -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/
MOZ_ENABLE_WAYLAND=0 mozregression --launch 2023-05-05 --pref widget.dmabuf-webgl.enabled:false -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/

Tab crash does not seem to occur if accelerated canvas is disabled:
mozregression --launch 2023-05-05 --pref gfx.canvas.accelerated:false -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/
MOZ_ENABLE_WAYLAND=0 mozregression --launch 2023-05-05 --pref gfx.canvas.accelerated:false -a https://testdrive-archive.azurewebsites.net/Performance/FishBowl/

Is this fd exhaustion?

Blocks: gpu-canvas
Summary: async RemoteTexture: Tab crash (without crash report) occurs after a while on HTML5 Fish Bowl test page → async RemoteTexture + accelerated Canvas on Linux: Tab crash (without crash report) occurs after a while on HTML5 Fish Bowl test page

:sotaro, since you are the author of the regressor, bug 1829052, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(sotaro.ikeda.g)
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)

:csasca, can you attach about:support and crash report to this bug? I could not reproduce the problem on Ubuntu 22.04.

Flags: needinfo?(catalin.sasca)
Attached file about:support 114

Yes Sotaro, here's the about:support info. Unfortunately crash reports aren't generated for this particular tab crash (as mentioned by Darkspirit in Comment 1). If there is any other way to capture a tab crash error please let me know. Thanks!

Flags: needinfo?(csasca)
Attachment #9332152 - Attachment mime type: application/octet-stream → text/plain

Correction: You need Hardware WebRender, accelerated Canvas, async RemoteTexture.

Whiteboard: [sp3]
Blocks: 1832480

Judging from the following log, the problem might be caused by running out of file descriptors.

[Parent 24751, IPC I/O Parent] WARNING: Message needs unreceived descriptors channel:61200005df40 message-type:11599877 header()->num_handles:1 num_fds:0 fds_i:0: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:489
Exiting due to channel error.

I'm going to yank this to S2 because this is a crash that does not generate a crash report. If it's happening in the field, we're blind to it.

Severity: S4 → S2
Duplicate of this bug: 1835275

Can you run with MOZ_LOG="Dmabuf:5" to see how dmabuf is utilized and if we fail to create/release one?

Attached patch patch - Add log (obsolete) — Splinter Review

When the fish count was 10, the log output of RecvDispatchCommands() looked like the following.

WebGLParent::RecvDispatchCommands()_E shmemBytes 6816
RemoteTextureMap::PushTexture() aTextureId 980 aOwnerId 1 **************************************
WebGLParent::RecvDispatchCommands()_X

When the fish count was 2000, the log output of RecvDispatchCommands() looked like the following. The commands contained a lot of DrawArraysInstanced calls.

WebGLParent::RecvDispatchCommands()_E shmemBytes 100000
WebGLParent::RecvDispatchCommands()_X
WebGLParent::RecvDispatchCommands()_E shmemBytes 100000
WebGLParent::RecvDispatchCommands()_X
WebGLParent::RecvDispatchCommands()_E shmemBytes 99984
WebGLParent::RecvDispatchCommands()_X
WebGLParent::RecvDispatchCommands()_E shmemBytes 99968
WebGLParent::RecvDispatchCommands()_X
WebGLParent::RecvDispatchCommands()_E shmemBytes 100000
WebGLParent::RecvDispatchCommands()_X
WebGLParent::RecvDispatchCommands()_E shmemBytes 79984
RemoteTextureMap::PushTexture() aTextureId 1202 aOwnerId 1 **************************************
WebGLParent::RecvDispatchCommands()_X
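
To compare the two excerpts above more easily, here is a rough Python sketch that counts how many RecvDispatchCommands() calls (and how many shmem bytes) arrive before each RemoteTextureMap::PushTexture(). The regular expressions only match the ad-hoc lines quoted in this comment, which come from a local debugging patch; `summarize_frames` is a hypothetical helper name and not part of any Firefox tooling.

```python
import re

# These patterns match the ad-hoc log lines quoted in this bug comment.
# They are assumptions about a local debugging patch, not a stable log format.
ENTER = re.compile(r"RecvDispatchCommands\(\)_E shmemBytes (\d+)")
PUSH = re.compile(r"RemoteTextureMap::PushTexture\(\)")


def summarize_frames(log_text):
    """Return one (dispatch_count, total_shmem_bytes) tuple per PushTexture()."""
    frames = []
    dispatches = 0
    total_bytes = 0
    for line in log_text.splitlines():
        match = ENTER.search(line)
        if match:
            # A new batch of WebGL commands was received.
            dispatches += 1
            total_bytes += int(match.group(1))
        elif PUSH.search(line):
            # A texture was pushed: close the current group and start a new one.
            frames.append((dispatches, total_bytes))
            dispatches = 0
            total_bytes = 0
    return frames
```

Fed the two excerpts above, this gives one dispatch per pushed texture for 10 fish and six dispatches per pushed texture for 2000 fish.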

Attached patch patch - Add log

Added file descriptor limit and current file descriptor count.
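
For anyone who wants to watch the same two numbers without a custom build, a minimal sketch on Linux is to count the entries in /proc/<pid>/fd and compare against the soft RLIMIT_NOFILE. This is not what the attached patch does internally; `count_open_fds` and `fd_soft_limit` are hypothetical helper names, and the `proc_root` parameter exists only so the function can be pointed at a different directory.

```python
import os
import resource


def count_open_fds(pid="self", proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd (Linux procfs layout).

    For pid="self" the result includes the descriptor that listing the
    directory opens temporarily, so treat it as an approximation.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_soft_limit():
    """Return the soft RLIMIT_NOFILE of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Pass a content or GPU process pid instead of "self" to inspect Firefox.
    print(f"fd count {count_open_fds()} / soft limit {fd_soft_limit()}")
```

Note that the soft limit printed here belongs to the process running the script, which may differ from the limit of the Firefox process being inspected.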

Attachment #9338212 - Attachment is obsolete: true

(In reply to Catalin Sasca, Desktop QA [:csasca] from comment #0)

Created attachment 9331806 [details]
tab crash fishbowl.webm

Found in

  • Firefox 114.0a1

Affected versions

  • Firefox 114.0a1

Tested platforms

  • Affected platforms: Ubuntu 22.04

Hmm, I could not reproduce the problem on my Ubuntu 22.04 PC :(

Flags: needinfo?(csasca)
Attached file tab crash gfx

Sure thing, here's the log (bottom of the text where the tab crash happened) with the provided build.
One other thing I noticed: after selecting the dedicated Nvidia GTX 960M GPU in the Nvidia X server settings, I wasn't able to reproduce the tab crash, and performance was stable even above 2000 fish. As soon as I selected the integrated Intel GPU, performance went down and the tab crash came back. Maybe that's why you couldn't reproduce the issue, as you probably have a dedicated GPU in your PC (or a much more powerful integrated one than my laptop's).
Please let me know if I can help with anything else.

Flags: needinfo?(csasca)

I could reproduce the problem on Ubuntu 20.04 in VMWare with Attachment 9338214 [details] [diff]. When the crash happened, the fd count had increased from 133 to 4096.

It seemed that pending WebGL IPC messages increased the fd count.
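
As a toy illustration of that observation (not Firefox code), the sketch below models a queue in which every pending message owns one open descriptor, loosely like a handle attached to an async IPC message. `PendingQueue` is a hypothetical class; the point is only that the descriptor count tracks the queue length until the receiver catches up.

```python
import os
from collections import deque


class PendingQueue:
    """Toy stand-in for an IPC queue whose messages each own one fd."""

    def __init__(self):
        self._messages = deque()

    def send(self, payload):
        # Each queued message holds a real descriptor until it is received.
        fd = os.open(os.devnull, os.O_RDONLY)
        self._messages.append((payload, fd))

    def receive(self):
        # Consuming the message is what releases its descriptor.
        payload, fd = self._messages.popleft()
        os.close(fd)
        return payload

    def open_fds(self):
        """Number of descriptors currently held by pending messages."""
        return len(self._messages)
```

If the sender keeps calling `send()` faster than the receiver calls `receive()`, `open_fds()` grows without bound, which would eventually hit the process fd limit.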

Attachment #9339219 - Attachment description: WIP: Bug 1831548 - Force sync IPC if there are many flushed cmds between GetFrontBuffer() → Bug 1831548 - Force sync IPC if there are many flushed cmds between GetFrontBuffer()

D181033 addressed the problem for me.
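
The idea in the patch title can be sketched roughly as follows. This is an illustration only and not the code in D181033: `Channel`, `ThrottledClient`, and the threshold value are all hypothetical, and the toy channel is drained only by a sync round trip, which makes the bound easy to see.

```python
class Channel:
    """Toy channel: async sends pile up, a sync round trip drains them."""

    def __init__(self):
        self.pending = 0
        self.max_pending = 0

    def send_async(self):
        self.pending += 1
        self.max_pending = max(self.max_pending, self.pending)

    def sync_round_trip(self):
        # The sender blocks until the receiver has handled everything queued.
        self.pending = 0


class ThrottledClient:
    """Forces a sync once too many flushes happen between front-buffer reads."""

    def __init__(self, channel, max_flushes=4):
        self.channel = channel
        self.max_flushes = max_flushes  # hypothetical threshold
        self.flushes_since_front_buffer = 0

    def flush(self):
        self.channel.send_async()
        self.flushes_since_front_buffer += 1
        if self.flushes_since_front_buffer >= self.max_flushes:
            self.channel.sync_round_trip()
            self.flushes_since_front_buffer = 0

    def get_front_buffer(self):
        # A new frame boundary resets the flush counter.
        self.flushes_since_front_buffer = 0
```

With this kind of throttle, the number of in-flight messages (and any descriptors they carry) stays bounded by the threshold, at the cost of occasionally blocking the sender.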

Pushed by sikeda.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/698d86db485b Force sync IPC if there are many flushed cmds between GetFrontBuffer() r=gfx-reviewers,lsalzman
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

:csasca, can you check if the problem is addressed with the latest Nightly?

Flags: needinfo?(csasca)

Sure thing. Checked on Firefox 116.0a1 (2023-06-15) on Ubuntu 22.04 and the issue is no longer reproducible. The fps needle now works as expected as well and reports the correct fps.

Flags: needinfo?(csasca)

Great! Thank you.

No longer duplicate of this bug: 1835275
See Also: → 1835275

I could still reproduce the crash with multiple windows, so I filed Bug 1839314.

Attachment #9340042 - Attachment is obsolete: true
Flags: qe-verify+

I've reproduced this issue using Nightly 116.0a1(2023-06-15) following the STR from Comment 0 on Ubuntu 22.04.
Verified as fixed on the latest Nightly 117.0a1 and on Firefox 116.0 under the same configuration; the issue no longer occurs.

Status: RESOLVED → VERIFIED
Flags: qe-verify+