Open Bug 1835275 Opened 1 year ago Updated 2 months ago

[Intel Arc A770] rendertexturehost tab crash without crash report (Nightly: WR/EGL/fluxbox X11/Intel, VAAPI force-enabled)

Categories

(Core :: Graphics, defect, P3)

Firefox 115
x86_64
Linux
defect

Tracking

()

REOPENED
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- affected
firefox114 --- disabled
firefox115 --- disabled
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 --- fix-optional

People

(Reporter: zlice555, Assigned: sotaro)

References

(Regression)

Details

(Keywords: crash, regression)

Attachments

(5 files, 1 obsolete file)

Attached file about-support.txt

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0

Steps to reproduce:

Browse misc sites like Twitch.tv or redfin.com

Actual results:

Random tab crashes. Got this by running from terminal. Didn't get a trace

[GFX1-]: unexpected remote texture size: Size(0,0) expected: Size(256,256)
[GFX1-]: Failed to get RenderTextureHost for extId:32844
[Parent 20328, IPC I/O Parent] WARNING: Message needs unreceived descriptors channel:7feb16998ee0 message-type:11665413 header()->num_handles:1 num_fds:0 fds_i:0: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:489
Exiting due to channel error.
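When capturing terminal output over a longer session, it can help to count how often these two GFX1 messages appear before the channel error. The following is only a minimal sketch: the regular expressions are modeled on the exact lines quoted above, and the function name `scan_log` is hypothetical, not part of Firefox.

```python
import re
import sys

# Pattern modeled on the "unexpected remote texture size" line quoted in this report.
SIZE_MISMATCH = re.compile(
    r"\[GFX1-\]: unexpected remote texture size: "
    r"Size\((\d+),(\d+)\) expected: Size\((\d+),(\d+)\)"
)
# Pattern modeled on the "Failed to get RenderTextureHost" line quoted in this report.
MISSING_HOST = re.compile(
    r"\[GFX1-\]: Failed to get RenderTextureHost for extId:(\d+)"
)


def scan_log(lines):
    """Return (size_mismatches, missing_ext_ids) found in an iterable of log lines."""
    mismatches = []
    missing = []
    for line in lines:
        m = SIZE_MISMATCH.search(line)
        if m:
            got = (int(m.group(1)), int(m.group(2)))
            expected = (int(m.group(3)), int(m.group(4)))
            mismatches.append((got, expected))
            continue
        m = MISSING_HOST.search(line)
        if m:
            missing.append(int(m.group(1)))
    return mismatches, missing


if __name__ == "__main__":
    # Reads captured terminal output from standard input.
    found_mismatches, found_missing = scan_log(sys.stdin)
    print(f"size mismatches: {len(found_mismatches)}")
    print(f"missing RenderTextureHost: {len(found_missing)}")
```

Assuming the sketch is saved under a name of your choice, Firefox's stderr and stdout can be piped into it from the same terminal used to launch the browser.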

Expected results:

No crash

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

Thanks for the report!
Can this crash be prevented by setting webgl.out-of-process.async-present.force-sync to true on about:config and restarting Firefox? (bug 1831548)

Keywords: crash
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Summary: rendertexturehost tab crash → rendertexturehost tab crash without crash report (Nightly: WR/EGL/fluxbox X11/Intel, VAAPI force-enabled)

lol ya pretty sure that's it. force-sync true and i can look around for several minutes, turn it off and the tab crashes looking at houses on redfin almost instantly.

should be good to mark as dupe. thanks!

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1831548
Resolution: --- → DUPLICATE

I guess I spoke too soon? Still getting tab crashes, just not as often. Will try to capture terminal output =/

Status: RESOLVED → VERIFIED

(In reply to zlice from comment #4)

I guess I spoke too soon? Still getting tab crashes, just not as often. Will try to capture terminal output =/

To be absolutely sure: Please reboot after setting webgl.out-of-process.async-present.force-sync to true.

I have when I switch between true/false. (Obviously had to restart FF to run from terminal too)

Try rebooting Linux after the pref change in case the problem is somehow partly outside of Firefox.

Ahhh. Would a drop_cache 3 work ?

echo 3 > /proc/sys/vm/drop_caches && sync

Testing other things (restarting X several times) I have noticed FF thinks it is still running and opening a URI gives the "another process is running" message. Even though I will run FF once, close and give it time, then mess with things. Which does require full reboot iirc

Went a day without a crash, so force-sync looks like it's fixing it for now. Thanks!

I saw that bug was resolved and disabled force-sync and seem to be getting tab crashes again.

116.0a1 (2023-06-16)

Still seems to be crashing on the same sites without the force-sync item set to true.

Not sure how dupes and bugs work, should I open another bug?

Status: VERIFIED → REOPENED
Depends on: 1832480
No longer duplicate of bug: 1831548
Ever confirmed: true
Keywords: regression
Regressed by: 1829052
Resolution: DUPLICATE → ---
See Also: → 1831548
Blocks: 1832480
No longer depends on: 1832480

Set release status flags based on info from the regressing bug 1829052

:sotaro, since you are the author of the regressor, bug 1829052, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(sotaro.ikeda.g)
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)

(In reply to zlice from comment #0)

Steps to reproduce:

Browse misc sites like Twitch.tv or redfin.com

The steps were not clear to me. I tried browsing Twitch.tv and redfin.com at random, but the crash did not happen. I could, however, reproduce Bug 1839314, so I am going to look into Bug 1839314 for now.

It can take some time. Redfin is easier to reproduce for me, just zooming in/out or moving around with filters on houses seems to crash faster than twitch. It looks like the new bug has a similar reproduce step, just have to try several times. Thanks, I'll follow that!

Depends on: 1839314

Hi zlice, can you re-check if the problem is addressed with latest nightly? Thank you.

Flags: needinfo?(zlice555)
See Also: → 1841380

Sweet, that didn't take long lol. Ya still crashing, just applied updates and restarted nightly.

Flags: needinfo?(zlice555)
Component: Widget: Gtk → Graphics

Set release status flags based on info from the regressing bug 1829052

Similar to Bug 1841380 Comment 11, while visiting redfin.com I saw cases where more than 2000 DrawTargetWebgls and Framebuffers were created. It did not cause an out-of-fd problem for me, but I wonder if it might cause the problem with some drivers.

The fd count easily became very large for me when the pref widget.dmabuf.force-enabled was true.
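For anyone who wants to watch the descriptor count on their own machine, here is a rough sketch that groups a process's open file descriptors by kind, using the Linux `/proc/<pid>/fd` directory. The function name `fd_summary` and its `proc_root` parameter are hypothetical. It also assumes that dma-buf descriptors carry the string `dmabuf` in their link target, which may vary by kernel version.

```python
import os
from collections import Counter


def fd_summary(pid, proc_root="/proc"):
    """Count a process's open file descriptors, grouped by link-target kind."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if "dmabuf" in target:
            # Assumption: dma-buf descriptors mention "dmabuf" in the target.
            kinds["dmabuf"] += 1
        elif target.startswith("/"):
            kinds["file"] += 1
        else:
            # e.g. "socket:[1234]" -> "socket", "pipe:[99]" -> "pipe"
            kinds[target.split(":", 1)[0]] += 1
    return kinds


if __name__ == "__main__":
    import sys

    # Usage: pass the PID of the Firefox process to inspect.
    summary = fd_summary(int(sys.argv[1]))
    for kind, count in summary.most_common():
        print(f"{kind}: {count}")
    print(f"total: {sum(summary.values())}")
```

Running it repeatedly against the same PID while panning and zooming on redfin.com would show whether the total keeps growing toward the per-process limit.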

Depends on: 1841380

Hi zlice, can you check if the problem is addressed with latest nightly?

Flags: needinfo?(zlice555)

Ya =/ didn't take long for a crash. Saw big black/gray checkered squares for a split second this time before the tab crash page.

Built from https://hg.mozilla.org/mozilla-central/rev/196cda3a105202c8969a926a0637db0e0014c07d

Flags: needinfo?(zlice555)

Hi zlice, can you check if the following addresses the problem for you?

  • Set pref gfx.canvas.accelerated.max-draw-target-count = 20 from about:config
  • Restart firefox.

Can you also check if the following addresses the problem for you?

  • Set pref widget.dmabuf-webgl.enabled = false from about:config
  • Restart firefox.
Flags: needinfo?(zlice555)

I turned off the previous force-sync fix and am having trouble getting it at all with a quick 2min test, let alone with either of those 2 options. I'll leave it off and see how it goes this week. If it pops up I'll set 1 of those.

I left these custom options off yesterday and had only 1 tab crash and it took a while to happen. Previously it was very easy to poke around redfin to a crash. After that crash I set dmabuf false and didn't see any more crashes throughout the day. I can't really say if dmabuf or draw-target-count have an effect since it seems like it was harder to trigger to begin with. I have draw-target-count at 20 right now, if I see any crashes I'll let you know.

Flags: needinfo?(zlice555)

Great! Thank you.

Did have a crash with draw-target-count set to 20

Thank you for checking! Hmm, async RemoteTexture seems to have an "out of file descriptors" problem with Intel Arc A770 Graphics when dmabuf is enabled.

dmabuf could also be used for hardware video decoding. It might also be related to the problem.

The severity field is not set for this bug.
:bhood, could you have a look please?

Flags: needinfo?(bhood)
Severity: -- → S3
Flags: needinfo?(bhood)
Priority: -- → P2

I saw a comment on https://bugzilla.mozilla.org/show_bug.cgi?id=1839314 about nightly being fixed. But without any of the about:config variables set 117.0a1 (2023-07-29) is still crashing on redfin.

Depends on: 1845697

Hi zlice, can you check again if the problem is addressed with latest nightly? I wonder if Bug 1845697 addressed the problem. Thank you.

Flags: needinfo?(zlice555)

Still crashing =/

Flags: needinfo?(zlice555)

Thank you for checking. hmm.

No longer disabled since bug 1832480 (116). And bug 1777430 (115) shipped hardware decoding, it might or might not be related.

(In reply to zlice from comment #31)

Still crashing =/

Can the crash be prevented by disabling VAAPI and restarting Firefox?
media.hardware-video-decoding.enabled=false
media.hardware-video-decoding.force-enabled=false
media.ffmpeg.vaapi.enabled=false
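Prefs written as `name=value`, as in the lines above, can also be set through a `user.js` file in the Firefox profile directory instead of about:config. The sketch below converts such lines into `user_pref(...)` statements. The helper name `to_user_js` is hypothetical, and it assumes values are booleans, integers, or plain strings.

```python
def to_user_js(pref_lines):
    """Convert 'name=value' lines into user_pref(...) statements for user.js."""
    out = []
    for raw in pref_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if value in ("true", "false") or value.lstrip("-").isdigit():
            # Booleans and integers are written unquoted.
            literal = value
        else:
            # Anything else is treated as a string and quoted.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            literal = '"' + escaped + '"'
        out.append(f'user_pref("{name}", {literal});')
    return "\n".join(out)


if __name__ == "__main__":
    prefs = [
        "media.hardware-video-decoding.enabled=false",
        "media.hardware-video-decoding.force-enabled=false",
        "media.ffmpeg.vaapi.enabled=false",
    ]
    print(to_user_js(prefs))
```

Firefox reads `user.js` at startup, so a restart is still needed after changing it, and a value set there overrides any later edit in about:config on the next start.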

Still happens without vaapi

Actually had my auto-cfg set to vaapi true, need to re-test.

That was quick. Without vaapi there isn't a 'tab crash' but a clear freeze on redfin

(see attached)

the left is not loading or moving and the right scrolls instead of the left zooming. No pictures or loading taking place

Attached image freeze.png

Would setting webgl.threadsafe-gl.force-disabled to true and restarting Firefox help?

disable? not enable? is that similar to force-sync?

I can try later

(That feature/assumption is the opposite of what it says. If THREADSAFE_GL is enabled (default), different threads are used for WebRender and WebGL, which is problematic (bug 1845765, bug 1847822, GLX: bug 1777849 comment 21). If THREADSAFE_GL is disabled (default for Nouveau + proprietary Nvidia), then GL is used threadsafe by using the same thread for WebRender and WebGL.)

Got it. Did some short testing with

webgl.threadsafe-gl.force-disabled=true
media.hardware-video-decoding.enabled=true
media.hardware-video-decoding.force-enabled=false
media.ffmpeg.vaapi.enabled=true

didn't see anything weird happen. Leaving threadsafe disabled=true and I'll see if anything 'crashes' or I hit a page freeze

fwiw with those settings over the past few days i have noticed actual tab crashes that want to send a report, no freezes or non-report crashes.

:stransky, is there a case that hardware decoding with dma buf consumes a lot of file descriptors?

Flags: needinfo?(stransky)

(In reply to Sotaro Ikeda [:sotaro] from comment #43)

:stransky, is there a case that hardware decoding with dma buf consumes a lot of file descriptors?

The code is optimized to close unused file descriptors so under normal conditions it should work fine (unless there's a bug somewhere).

As this is Intel(R) Arc(tm) hardware which is new and a bit rare I'd expect we may see a driver bug or so. The code is the same for all Intel devices but we don't see such reports for regular Intel ones.

zlice, can you test a different VA-API client such as mpv? Run

mpv --hwdec=vaapi test_clip

You can use a direct YouTube URL here.

You can also flip media.ffmpeg.vaapi.force-surface-zero-copy pref to '1' to match firefox and mpv VA-API playback mode and test Firefox then.

Thanks.

Flags: needinfo?(stransky) → needinfo?(zlice555)
Summary: rendertexturehost tab crash without crash report (Nightly: WR/EGL/fluxbox X11/Intel, VAAPI force-enabled) → [Intel Arc A770] rendertexturehost tab crash without crash report (Nightly: WR/EGL/fluxbox X11/Intel, VAAPI force-enabled)

If you can reproduce it with mpv please file a bug at Mesa (https://gitlab.freedesktop.org/mesa/mesa/-/issues).

intel-gpu-top shows hwdec working with mpv and firefox.

i have filed bugs for intel's 'media-driver' for other issues before, but those were reproducible errors. and this started happening in firefox a bit before i posted this bug. before that everything was working fine and if there were crashes they were able to be sent to firefox with a bug-report-box popup. there haven't been any changes in ffmpeg or intel's 'media-driver' recently.

Flags: needinfo?(zlice555)
Attached image mpvvaapi.png

ofc the screenshot i took froze at the wrong spot, but the 'video' part has usage while playing back video w/ hwaccel

Depends on: 1848171

Hi zlice, can you check again if the problem still happens with the latest nightly? I wonder if bug 1848171 might affect the problem.

Flags: needinfo?(zlice555)

Still crashing =/

Flags: needinfo?(zlice555)

Hmm, thank you for checking.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #44)

As this is Intel(R) Arc(tm) hardware which is new and a bit rare I'd expect we may see a driver bug or so. The code is the same for all Intel devices but we don't see such reports for regular Intel ones.

It might be better to block DMABUF with mesa and with Intel Arc A.

Flags: needinfo?(zlice555)

Ended up doing the freezing thing on redfin like disabling vaapi did

attaching screenshots, looks like dma disabled in about:support and only webgl dma enabled in about:config

Flags: needinfo?(zlice555)
Attached image dmabuf-support.png
Attached image dma-cfg.png

(In reply to zlice from comment #55)

Ended up doing the freezing thing on redfin like disabling vaapi did

attaching screenshots, looks like dma disabled in about:support and only webgl dma enabled in about:config

Thank you for checking! Can you explain more about "like disabling vaapi did" part?

Flags: needinfo?(zlice555)

https://bugzilla.mozilla.org/show_bug.cgi?id=1835275#c36 the page freezes but there's no indication of a crashed tab, just blank squares where things should be and no loading or clicking actions

Flags: needinfo?(zlice555)

Is the tab crash reproducible with mozregression?
$ pip3 install mozregression
$ ~/.local/bin/mozregression --good 2023-04-01 --bad 2023-05-26 -P stdout -a https://twitch.tv -a https://redfin.com

i have swapped back to my amd gpu for the time being. not sure when/if i'll go back to the intel card as it seems they have very little linux priority (xe driver which may or may not have vaapi support apparently, vulkan sparse residency memory, av1 decode doesnt work still afaik)

Depends on: 1851377

Hi zlice, can you check again if the problem is addressed with the latest nightly, if you have time? Thank you.

Flags: needinfo?(zlice555)
Priority: P2 → P3

Hey. Sorry like I said in the last comment I switched GPUs back. The Intel GPU driver on Linux has other issues. If I get time i can try to swap for a bit to test, not sure when that'd be.

I switched back to my Intel GPU and didn't get any issues for some short testing.

I'll use it for a few and see if I get any 'silent' tab crashes.

Flags: needinfo?(zlice555)

Great! Thank you for checking.

no problem. fwiw i havent seen it happen on twitch. double checked my about:config and it looks like the same settings i had before.

Attachment #9350576 - Attachment is obsolete: true