Closed Bug 1632698 Opened 4 years ago Closed 3 years ago

Crash in [@ webrender::renderer::Renderer::draw_frame] (NVidia driver issue)

Categories

(Core :: Graphics: WebRender, defect, P3)


Tracking


RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- disabled
firefox84 --- wontfix
firefox85 --- fixed

People

(Reporter: mccr8, Assigned: aosmond)

References

(Blocks 2 open bugs)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-dd24f92c-53b3-45d6-a7ef-ad7450200422.

Top 10 frames of crashing thread:

0 libxul.so RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 libxul.so mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 libxul.so core::ops::function::Fn::call src/libcore/ops/function.rs:72
3 libxul.so std::panicking::rust_panic_with_hook src/libstd/panicking.rs:475
4 libxul.so std::panicking::begin_panic src/libstd/panicking.rs:404
5 libxul.so webrender::renderer::Renderer::draw_frame gfx/wr/webrender/src/renderer.rs
6 libxul.so webrender::renderer::Renderer::render_impl gfx/wr/webrender/src/renderer.rs:3440
7 libxul.so webrender::renderer::Renderer::render gfx/wr/webrender/src/renderer.rs:3192
8 libxul.so wr_renderer_render gfx/webrender_bindings/src/bindings.rs:600
9 libxul.so mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:155

assertion failed: dimensions.width >= self.max_dynamic_size.width

These started showing up on Linux on the 20200422093542 Nightly build. There are also some crashes on 76 on OSX.

Here are the changesets added in the 20200422093542 build: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=263963426b561b2aa687aeeaeddc4fd93fff9e57&tochange=d8eecc663784c8463af1d2bc3f91f8078c7e1940

Bug 1632002 is in that range. I don't know if these crashes are all happening on "Gen 7.5" or not.

Blocks: wr-stability

23 crashes over the last 3 days, all on Intel Graphics 630 (mobile).

Crashes here:
https://searchfox.org/mozilla-central/rev/b8fbb6ead517720daf0b0211115f407b4b951c74/gfx/wr/webrender/src/render_target.rs#276

@gw: Could you take a look at this? Side question: Would these asserts trigger in release builds?

Blocks: gfx-triage
Flags: needinfo?(gwatson)
Priority: -- → P1

It appears that some code path is trying to allocate a texture larger than the GPU can support.

This should be easy to fix, but we'll need a repro to find out which code path is triggering this - do we have any URL(s) or other repro steps available?

I think these will trigger in release builds.
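On the release-build question: Rust's assert! is checked in every build profile, while debug_assert! is compiled out when debug assertions are disabled, so an assert! like the one in render_target.rs panics in release builds too. Below is a minimal sketch of reporting the same condition as a recoverable error instead of panicking. Size, TargetSizeError and check_target_size are hypothetical names, not WebRender's actual types, and the height check is an assumption (only the width assertion appears in the crash report).

```rust
/// Hypothetical stand-in for a 2D size; WebRender uses its own size types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Size {
    pub width: i32,
    pub height: i32,
}

/// Hypothetical error describing why a render target cannot be used.
#[derive(Debug, PartialEq, Eq)]
pub enum TargetSizeError {
    /// The target is smaller than the largest dynamic size it must hold.
    TooSmall { requested: Size, required: Size },
}

/// Reports the condition that `assert!(dimensions.width >= max_dynamic_size.width)`
/// would panic on, as a Result the caller can handle.
/// The height comparison is an assumption added for symmetry.
pub fn check_target_size(
    dimensions: Size,
    max_dynamic_size: Size,
) -> Result<(), TargetSizeError> {
    if dimensions.width >= max_dynamic_size.width
        && dimensions.height >= max_dynamic_size.height
    {
        Ok(())
    } else {
        Err(TargetSizeError::TooSmall {
            requested: dimensions,
            required: max_dynamic_size,
        })
    }
}
```

A caller could then fall back to a smaller allocation or skip the frame instead of crashing; which recovery is appropriate is left open here.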

Flags: needinfo?(gwatson) → needinfo?(continuation)

No comments on any of the crash reports. The URLs include what looks like a Twitter DM, YouTube, Telegram, https://itsfoss.com/ubuntu-20-04-release-features/ , a GitHub repo, and a chat.mozilla.org room.

Flags: needinfo?(continuation)

I spent some time trying to reproduce this (with a reduced maximum texture size) on the specific URL above, and also manually created some test cases that I thought might trigger it, but without success so far.

It seems like this is possibly all one person, so probably not a P1.

Priority: P1 → P3
No longer blocks: gfx-triage
Severity: -- → normal

The severity field is not set for this bug.
:jbonisteel, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jbonisteel)
Severity: normal → S3
Flags: needinfo?(jbonisteel)

FWIW, I was able to reproduce this crash with a similar signature by making the browser window extremely wide. I did this by moving the window almost off screen, increasing its width, moving it almost off screen again, and repeating. I'm not sure if this is what caused the other instances of this assertion failure.

I don't imagine there is a real use-case for resizing the browser like this, but we should probably do something better than panicking in this case.

Edit: I repro'd this on stable 77 but can't on Nightly 79, so maybe the browser resizing case has been fixed somehow?

(First comment here, so if something's out of sorts, please point me in the right direction.)

I get the same crash rather frequently on OS X, but with a different panic message (called Option::unwrap() on a None value), whenever I access an internal page (like about:preferences) for the first time after launching Firefox. Does a separate bug need to be filed for that?

Associated report: 67c7e8a8-58a2-4fd2-9683-9cca90200924

Crash Signature: [@ webrender::renderer::Renderer::draw_frame] → [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ webrender::renderer::Renderer::draw_frame]

I've been getting this crash every time I resume from sleep on Ubuntu since Nov 10th. First crash encountered at https://crash-stats.mozilla.org/report/index/bd902c30-42ab-42dd-b526-a90000201110

That is, Firefox is running, the computer goes to sleep, I wake it up, and Firefox crashes immediately.

See Also: → 1677515
Crash Signature: [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ webrender::renderer::Renderer::draw_frame] → [@ __memcpy_sse2_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ webrender::renderer::Renderer::draw_frame]
Blocks: gfx-triage
Crash Signature: [@ __memcpy_sse2_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ webrender::renderer::Renderer::draw_frame] → [@ __memcpy_sse2_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame]
No longer blocks: wr-stability
Summary: Crash in [@ webrender::renderer::Renderer::draw_frame] → Crash in [@ webrender::renderer::Renderer::draw_frame] (NVidia driver issue)
Blocks: wr-stability
No longer blocks: gfx-triage
Flags: needinfo?(aosmond)

There are a bunch of questions I have looking at this:

  1. Do we use robustness with EGL today? EGL contexts expect CreateContextFlags::PREFER_ROBUSTNESS to be set in order to be configured with it:

https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/gl/GLContextProviderEGL.cpp#692

We don't appear to ask for it on the EGL path used by Android and Wayland:

https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/webrender_bindings/RenderThread.cpp#1064
https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/gl/GLContextProviderEGL.cpp#322

  2. We use NV_robustness_video_memory_purge with GLX but we don't appear to use it with EGL.

https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/gl/GLContextProviderGLX.cpp#207

  3. I wonder if RendererOGL::CheckGraphicsResetStatus() and RenderCompositor::IsContextLost() are conflicting, and the former swallows the resets that aren't for video memory:

https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/webrender_bindings/RendererOGL.cpp#259
https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/webrender_bindings/RenderCompositor.cpp#189

3) could explain the NV issues, but we should also fix 1) and 2).
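For question 1, a minimal sketch of what requesting robustness on an EGL context could look like, assuming the display advertises EGL_EXT_create_context_robustness. The constant values are from the Khronos EGL headers; context_attribs is a hypothetical helper, not Gecko's GLContextProviderEGL code, and it only builds the attribute list that would be passed to eglCreateContext.

```rust
// Values from the Khronos EGL headers (EGL core and EGL_EXT_create_context_robustness).
pub const EGL_NONE: i32 = 0x3038;
pub const EGL_TRUE: i32 = 1;
pub const EGL_CONTEXT_CLIENT_VERSION: i32 = 0x3098;
pub const EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT: i32 = 0x30BF;
pub const EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT: i32 = 0x3138;
pub const EGL_LOSE_CONTEXT_ON_RESET_EXT: i32 = 0x31BF;

/// Hypothetical helper: builds an EGL_NONE-terminated attribute list for an
/// OpenGL ES context. `has_robustness_ext` should reflect whether
/// EGL_EXT_create_context_robustness is in the display's extension string.
pub fn context_attribs(
    client_version: i32,
    prefer_robustness: bool,
    has_robustness_ext: bool,
) -> Vec<i32> {
    let mut attribs = vec![EGL_CONTEXT_CLIENT_VERSION, client_version];
    if prefer_robustness && has_robustness_ext {
        // Ask for robust buffer access and for reset notifications, so that
        // glGetGraphicsResetStatus can report a lost context.
        attribs.extend_from_slice(&[
            EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT,
            EGL_TRUE,
            EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT,
            EGL_LOSE_CONTEXT_ON_RESET_EXT,
        ]);
    }
    attribs.push(EGL_NONE);
    attribs
}
```

Without the reset notification strategy, the driver is not required to report resets at all, which fits the later observation that EGL crash reports carry no DeviceResetReason.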
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Flags: needinfo?(aosmond)
Blocks: wr-nv-linux
> 2. We use NV_robustness_video_memory_purge with GLX but we don't appear to use it with EGL.
>
> https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/gfx/gl/GLContextProviderGLX.cpp#207

When bug 1469496 added LOCAL_GL_PURGED_CONTEXT_RESET_NV handling, RenderCompositorEGL did not exist yet. If we want to handle context resets, it seems better to handle them on EGL as well.

For example, Chromium seems to handle it.
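A minimal sketch of classifying the value returned by glGetGraphicsResetStatus so that an NVIDIA video memory purge (GL_PURGED_CONTEXT_RESET_NV, from NV_robustness_video_memory_purge) is treated as a reset whether the context came from GLX or EGL. ResetKind, classify_reset and needs_full_reset are hypothetical names, and the policy that every status other than GL_NO_ERROR triggers a full reset is an assumption in the spirit of the attachment description "Always do a full context reset for NVIDIA video memory purges". Since the robustness extensions only report a reset status until the reset completes, querying it from a single place (rather than from both RendererOGL::CheckGraphicsResetStatus() and RenderCompositor::IsContextLost()) may also keep one caller from consuming a status the other needed.

```rust
// Values from the OpenGL registry (KHR/ARB_robustness and NV_robustness_video_memory_purge).
pub const GL_NO_ERROR: u32 = 0;
pub const GL_GUILTY_CONTEXT_RESET: u32 = 0x8253;
pub const GL_INNOCENT_CONTEXT_RESET: u32 = 0x8254;
pub const GL_UNKNOWN_CONTEXT_RESET: u32 = 0x8255;
pub const GL_PURGED_CONTEXT_RESET_NV: u32 = 0x92BB;

/// Hypothetical classification of a reset status. The purge case is kept
/// distinct so it can still be recorded as its own reset reason.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResetKind {
    NoReset,
    Guilty,
    Innocent,
    Unknown,
    VideoMemoryPurge,
}

pub fn classify_reset(status: u32) -> ResetKind {
    match status {
        GL_NO_ERROR => ResetKind::NoReset,
        GL_GUILTY_CONTEXT_RESET => ResetKind::Guilty,
        GL_INNOCENT_CONTEXT_RESET => ResetKind::Innocent,
        GL_PURGED_CONTEXT_RESET_NV => ResetKind::VideoMemoryPurge,
        // GL_UNKNOWN_CONTEXT_RESET and any unrecognized value are treated as
        // an unknown reset rather than silently ignored.
        _ => ResetKind::Unknown,
    }
}

/// Hypothetical policy: any reset, including a video memory purge, means the
/// context and all GPU resources must be recreated before drawing again.
pub fn needs_full_reset(kind: ResetKind) -> bool {
    kind != ResetKind::NoReset
}
```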

:nical, can you comment on comment 14?

Flags: needinfo?(nical.bugzilla)

With today's Nightly, I'm also seeing very frequent hangs when entering text in just about any web app (bmo, gdocs, matrix), and the hangs often end with this crash.

Perf profile while typing in a Google Doc: https://share.firefox.dev/37BlrA9

Attachment #9191211 - Attachment description: Bug 1632698 - Always do a full context reset for NVIDIA video memory purges. → Bug 1632698 - Better handle device resets when we don't have a GPU process.

I filed bug 1680759 to track the EGL robustness concerns and ni'd nical on that bug.

Flags: needinfo?(nical.bugzilla)
See Also: → 1680759
Crash Signature: [@ __memcpy_sse2_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] → [@ __memcpy_sse2_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memmove_avx_unaligned_erms | webrender::renderer::Renderer::draw_frame] [@ __memcpy_ssse3 | webrender::renderer::Renderer::draw_frame]
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b98b945b1c0a
Better handle device resets when we don't have a GPU process. r=sotaro,kvark,nical
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

There has been another crash report even after the patch in bug 1681671 landed. However, I note that the report in question:

https://crash-stats.mozilla.org/report/index/3fff07c7-3c95-4ee9-b90a-098650201215

is using EGL. There is no DeviceResetReason recorded in the crash report because robustness isn't turned on for EGL yet, and so we didn't even try to handle the reset.

Firefox 84 is consistently crashing after switching to and returning from the text console.

https://crash-stats.mozilla.org/report/index/e53eef89-b3eb-4e3b-ac85-93f190201215
https://crash-stats.mozilla.org/report/index/52d738db-306c-47ca-9ac8-0550e0201215

I have NVIDIA binary drivers 455.45.01, gfx.webrender.all=true and Firefox is run with MOZ_X11_EGL=1

(In reply to Artem S. Tashkinov from comment #21)

> Firefox 84 is consistently crashing after switching to and returning from the text console.
>
> https://crash-stats.mozilla.org/report/index/e53eef89-b3eb-4e3b-ac85-93f190201215
> https://crash-stats.mozilla.org/report/index/52d738db-306c-47ca-9ac8-0550e0201215
>
> I have NVIDIA binary drivers 455.45.01, gfx.webrender.all=true and Firefox is run with MOZ_X11_EGL=1

Yep, follow bug 1680759 for progress on that. If it continues to be broken after that is fixed and landed, then by all means please file a new bug :).

(In reply to Andrew Osmond [:aosmond] from comment #22)

> Yep, follow bug 1680759 for progress on that. If it continues to be broken after that is fixed and landed, then by all means please file a new bug :).

I'm not sure the bug you're referring to is relevant:

> We should probably just assume things work now (on linux) and validate it on Nightly for a train or two.

I have crashes with Firefox 84 right now (haven't had them for years) and the commenter made it sound like the bug wasn't going to be fixed any time soon.

@Artem, we are actively fixing bugs with WebRender and the NVIDIA binary driver. I hope to have whatever problems you are seeing addressed in 86; you are affected because you are forcing WebRender and EGL on in release (most users wouldn't see this because we don't ship this configuration).

Also, I suspect bug 1675453 is part of the issue here, so if you flip media.ffmpeg.dmabuf-textures.disabled to true, the crashes may go away. It would be helpful if you could confirm that :).

The proprietary NVIDIA driver can't take any DMABuf code paths; it doesn't support them (yet?):
ffmpeg-dmabuf-textures is for software-decoded video. A check should be added so that DMABuf is only used with Mesa.
DMABuf VAAPI isn't possible on proprietary NVIDIA either: bug 1669189
EGL-ffmpeg-dmabuf-textures and EGL-dmabuf-webgl are already enabled, although they require bug 1588904 to be fixed.

See Also: → 1683266

I'm hitting a crash with this signature on resume-from-suspend (sleep), and I filed bug 1683266.
