Open Bug 1692514 Opened 10 months ago Updated 4 days ago

Crash in [@ OOM | large | mozalloc_abort | webrender::renderer::upload::upload_to_texture_cache]

Categories

(Core :: Graphics: WebRender, defect)

Tracking

()

Tracking Status
thunderbird_esr78 --- unaffected
thunderbird_esr91 --- unaffected
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- unaffected
firefox87 --- wontfix
firefox88 --- wontfix

People

(Reporter: aryx, Unassigned)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: crash, regression)

Crash Data

11 crashes on 11 machines, all on Windows so far.

Crash report: https://crash-stats.mozilla.org/report/index/bcb23ef3-0d33-4515-b614-7ec090210206

MOZ_CRASH Reason: MOZ_CRASH()

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:134
3 xul.dll std::alloc::rust_oom ../e1884a8e3c3e813aada8254edfa120e85bf5ffca//library/std/src/alloc.rs:330
4 xul.dll alloc::alloc::__alloc_error_handler::__rg_oom ../e1884a8e3c3e813aada8254edfa120e85bf5ffca//library/alloc/src/alloc.rs:409
5 xul.dll _rust_alloc_error_handler 
6 xul.dll alloc::alloc::handle_alloc_error ../e1884a8e3c3e813aada8254edfa120e85bf5ffca//library/alloc/src/alloc.rs:363
7 xul.dll webrender::renderer::upload::upload_to_texture_cache gfx/wr/webrender/src/renderer/upload.rs:113
8 xul.dll webrender::renderer::Renderer::update_texture_cache gfx/wr/webrender/src/renderer/mod.rs:2396
9  @0xafb9bfe43f 
Flags: needinfo?(nical.bugzilla)
Blocks: wr-stability
Severity: -- → S3

This can happen when a page has to upload a large amount of texture data within a single frame through the batched direct upload code path (Windows), for example when it contains lots of large blob images. We end up having to create a large number of CPU-side staging buffers.

We could improve the situation by triggering the upload earlier. Instead of waiting for all updates to be written into staging buffers before kicking off the uploads, we could accumulate content into the staging buffer and trigger the upload as soon as an allocation into the atlas fails, so that we can reuse the staging memory. This would let us use a single staging memory buffer per texture type instead of an unbounded number that depends on the amount of content to upload.
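To make the idea concrete, here is a minimal Rust sketch of the flush-when-full approach. It is not WebRender code: `FakeTexture`, `StagingUploader`, and their fields are hypothetical names, and a linear byte buffer stands in for the atlas allocation. The only point it illustrates is that peak staging memory stays bounded by one buffer, however much content is staged.

```rust
/// Hypothetical stand-in for a GPU texture: records the size of each upload.
#[derive(Default)]
struct FakeTexture {
    uploads: Vec<usize>,
}

/// Hypothetical uploader that reuses one fixed-capacity staging buffer.
struct StagingUploader {
    staging: Vec<u8>,
    capacity: usize,
    /// Largest number of bytes ever held in the staging buffer at once.
    peak_staging_bytes: usize,
}

impl StagingUploader {
    fn new(capacity: usize) -> Self {
        StagingUploader {
            staging: Vec::with_capacity(capacity),
            capacity,
            peak_staging_bytes: 0,
        }
    }

    /// Stage one update. If it does not fit, flush first so the same
    /// staging memory can be reused instead of allocating another buffer.
    fn stage(&mut self, data: &[u8], texture: &mut FakeTexture) {
        if data.len() > self.capacity {
            // Assumption for this sketch: oversized updates bypass staging
            // and are uploaded directly from the source slice.
            self.flush(texture);
            texture.uploads.push(data.len());
            return;
        }
        if self.staging.len() + data.len() > self.capacity {
            // The "allocation failed" case: upload now, then reuse the buffer.
            self.flush(texture);
        }
        self.staging.extend_from_slice(data);
        self.peak_staging_bytes = self.peak_staging_bytes.max(self.staging.len());
    }

    /// Upload whatever is staged and empty the buffer, keeping its allocation.
    fn flush(&mut self, texture: &mut FakeTexture) {
        if self.staging.is_empty() {
            return;
        }
        texture.uploads.push(self.staging.len());
        self.staging.clear();
    }
}
```

With a 1024-byte buffer and ten 400-byte updates, this performs five uploads of 800 bytes each and never holds more than 800 staged bytes, whereas staging everything before uploading would need 4000 bytes of staging memory.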

Unfortunately the upload code is written with very tight Rust lifetime constraints: the texture needs to outlive the uploader, so we can't use a new texture with the uploader once we have started uploading. That makes it tricky to continue growing the number of textures in use after the uploads have started. I think that doing this would require rewriting the upload code entirely, so that the texture uploader is responsible for managing the staging memory buffers and can start uploads earlier.
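As a rough illustration of what such a rewrite might look like, the sketch below shows an uploader that owns its staging buffers and names target textures by id instead of borrowing them. Every name here (`TextureId`, `TextureStore`, `OwningUploader`) is hypothetical, and this is only one possible shape, not the existing WebRender design. Because the uploader holds no texture reference, nothing forces a texture to outlive it, and new textures can be created after uploads have begun.

```rust
use std::collections::HashMap;

/// Hypothetical texture handle: a plain id rather than a borrowed reference.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TextureId(u32);

/// Hypothetical owner of all textures; tracks bytes uploaded per texture.
#[derive(Default)]
struct TextureStore {
    next_id: u32,
    uploaded_bytes: HashMap<TextureId, usize>,
}

impl TextureStore {
    fn create_texture(&mut self) -> TextureId {
        let id = TextureId(self.next_id);
        self.next_id += 1;
        self.uploaded_bytes.insert(id, 0);
        id
    }
}

/// Hypothetical uploader that manages its own pool of staging buffers.
#[derive(Default)]
struct OwningUploader {
    free_buffers: Vec<Vec<u8>>,
    /// Number of staging buffers created so far.
    buffers_allocated: usize,
}

impl OwningUploader {
    /// Copy `data` into a pooled staging buffer, "upload" it to `target`,
    /// and return the buffer to the pool for reuse.
    fn upload(&mut self, store: &mut TextureStore, target: TextureId, data: &[u8]) {
        let mut buf = match self.free_buffers.pop() {
            Some(buf) => buf,
            None => {
                self.buffers_allocated += 1;
                Vec::new()
            }
        };
        buf.clear();
        buf.extend_from_slice(data);
        // Stand-in for the real GPU upload: just count the bytes.
        *store.uploaded_bytes.entry(target).or_insert(0) += buf.len();
        self.free_buffers.push(buf);
    }
}
```

The store is only borrowed for the duration of each `upload` call, so a caller can interleave `create_texture` and `upload` freely while a single staging buffer is recycled.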

That's not so bad, because the upload code is in rough shape and a rewrite could bring this critical part of WebRender to a more maintainable state. On the other hand, the crash volume is quite low, so this isn't very high on my priority list.

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jmathies) → needinfo?(nical.bugzilla)

I went through the commits in that period and I don't see suspicious changes in that area of the code, so I suspect that the overall OOM rate increased and the crashes consolidated under this signature.

I'm working on some texture upload changes that will hopefully decrease the need for allocating large chunks of memory here. They'll probably land in a couple of weeks.

Flags: needinfo?(nical.bugzilla)