Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new]
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
People
(Reporter: lizzard, Assigned: nical)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [tbird crash])
Crash Data
Attachments
(1 file)
This bug is for crash report bp-6623cfde-df3d-473a-ba11-96c2c0190228.
Looks like this started showing up in the 20190222081112 nightly builds.
Top 10 frames of crashing thread:
0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll static void gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:255
3 xul.dll static void std::alloc::rust_oom src/libstd/alloc.rs:220
4 xul.dll static void alloc::alloc::handle_alloc_error src/liballoc/alloc.rs:234
5 xul.dll struct webrender_bindings::bindings::WrState* webrender_bindings::bindings::wr_state_new gfx/webrender_bindings/src/bindings.rs:1887
6 xul.dll mozilla::wr::DisplayListBuilder::DisplayListBuilder gfx/webrender_bindings/WebRenderAPI.cpp:650
7 xul.dll mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer gfx/layers/wr/WebRenderLayerManager.cpp:259
8 xul.dll nsDisplayList::PaintRoot layout/painting/nsDisplayList.cpp:2615
9 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:3780
Updated•6 years ago
|
Comment 1•6 years ago
|
||
I wonder if this is caused by us OOMing with a really big display list
Comment 2•6 years ago
|
||
Nical, want to take a look?
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 3•6 years ago
|
||
The majority of these OOM crashes are caused by a failure to pre-allocate a display list buffer of less than 1.5 MB which isn't that high. It would be nice to have access to Vec::try_reserve, but that would likely just move the signature elsewhere without affecting the overall crash volume.
Asking for a MB of contiguous memory isn't unreasonable but large enough that some of our OOM crash signatures will move there.
There are a few reports with bigger OOM alloc size (39 MB and 18 MB in the last 7 days).
If reducing this signature is really important we could think about:
- We don't really need this data buffer to be contiguous. We could get by with a list of buffers which would be easier to allocate if the address space is very fragmented.
- We talked about implementing a "data pipe" for WebRender a while ago with the idea of serializing directly into shared memory instead of using an intermediate data buffer like this.
- It wouldn't hurt to compress the representation of the serialized displaylist a bit more, but I don't think it would make the crash volume go down that much (although it might be worth doing if only for performance reasons). For example our display lists have a lot of
ColorF
in them. Normalized u16 per channel should be more than enough precision and save about 1 KB on some of the pages I measured.
But really I doubt it will make a big difference in the crash volume overall. The browser has a tough time staying alive when it can't allocate 1MB.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 4•6 years ago
|
||
Nicolas, is there a way to mitigate part of these crashes? My worry is that we ship webrender v1 with 67 so the crash volume may be significant in the 67 release.
Assignee | ||
Comment 5•6 years ago
|
||
Gankro is currently working on a set of changes that will reduce the average size of display lists (the failing allocation in this signature), I don't have the bug number handy, sorry.
Other than that there isn't a lot we can do as we are basically OOM'ing on kinda-large-but-not-unreasonably-so temporary allocations, and this signature is really mostly stealing OOM crashes from other existing signatures (the browser don't usually stay alive very long when we can't allocate 1MB), so even with Gankro's changes, chances are that the overall crash volume will remain the same even if some portion of these OOMs will move back to other signatures.
Comment 6•6 years ago
|
||
Thanks nical for the detailed explanation, marking as wontfix for beta as there isn't anything actionable for us to do in the 67 time frame.
Comment 7•6 years ago
|
||
I'm a little concerned about the OOM situation still. I was just looking at the nvidia dashboard and according to the graphs we're OOMing almost 400% with WR enabled vs disabled on nightly, which is pretty bad. On Beta it's ~135% so it seems like something regressed on Nightly.
Comment 8•6 years ago
|
||
(Moving the general OOM discussion to bug 1546671 so we can leave this one focused on the specific stack that includes wr_state_new
)
Comment 9•5 years ago
|
||
Bulk change of P3 carryover bugs to wontfix for 68.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Reporter | ||
Comment 10•5 years ago
|
||
Crash volume fairly low, and this is set to P3.
Marking fix-optional to remove this from weekly regression triages.
Updated•5 years ago
|
Comment 12•5 years ago
•
|
||
A user posted about experiencing this bug here: https://www.reddit.com/r/firefox/comments/f4eiy5/is_it_normal_that_my_browser_crashes/ in case you need someone to try to reproduce issues.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 14•4 years ago
|
||
Comment 15•4 years ago
|
||
Comment 16•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Comment 17•4 years ago
|
||
The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 18•4 years ago
|
||
I should have set the right flags. The patch will hopefully reduce the amount of OOMs that land here but it won't fix it entirely, so it's good to keep the bug open as new crashes will come in.
I don't think it is worth uplifting unless we find that it has had a significant impact on overall OOM crashes. In practice I suspect that a portion of the OOMs from this signature that this patch avoids will move to another signature where we make siilarly large-ish allocations.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 20•3 years ago
|
||
Looks like the signature changed in 92.
Comment 21•3 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #20)
Looks like the signature changed in 92.
Opened https://github.com/mozilla-services/socorro/pull/5849 to address that.
Updated•3 years ago
|
Comment 22•2 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:nical, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 23•2 years ago
|
||
Not much to do for this OOM signature. We could silence it by using a fallible allocation when pre-allocating the display list, but it won't affect the crash volume.
Updated•2 years ago
|
Comment 24•2 years ago
|
||
The severity field for this bug is relatively low, S3. However, the bug has 3 duplicates.
:nical, could you consider increasing the bug severity?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•2 years ago
|
Comment 25•6 months ago
|
||
Closing because no crashes reported for 12 weeks.
Updated•5 months ago
|
Description
•