Closed Bug 1531819 Opened 5 years ago Closed 1 month ago

Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new]

Categories

(Core :: Graphics: WebRender, defect, P3)

66 Branch
Unspecified
Windows 10
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- disabled
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix

People

(Reporter: lizzard, Assigned: nical)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [tbird crash])

Crash Data

Attachments

(1 file)

This bug is for crash report bp-6623cfde-df3d-473a-ba11-96c2c0190228.

Looks like this started showing up in the 20190222081112 nightly builds.

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll static void gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:255
3 xul.dll static void std::alloc::rust_oom src/libstd/alloc.rs:220
4 xul.dll static void alloc::alloc::handle_alloc_error src/liballoc/alloc.rs:234
5 xul.dll struct webrender_bindings::bindings::WrState* webrender_bindings::bindings::wr_state_new gfx/webrender_bindings/src/bindings.rs:1887
6 xul.dll mozilla::wr::DisplayListBuilder::DisplayListBuilder gfx/webrender_bindings/WebRenderAPI.cpp:650
7 xul.dll mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer gfx/layers/wr/WebRenderLayerManager.cpp:259
8 xul.dll nsDisplayList::PaintRoot layout/painting/nsDisplayList.cpp:2615
9 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:3780

I wonder if this is caused by us OOMing with a really big display list

Nical, want to take a look?

Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Priority: -- → P2

The majority of these OOM crashes are caused by a failure to pre-allocate a display list buffer of less than 1.5 MB which isn't that high. It would be nice to have access to Vec::try_reserve, but that would likely just move the signature elsewhere without affecting the overall crash volume.
Asking for a MB of contiguous memory isn't unreasonable but large enough that some of our OOM crash signatures will move there.

There are a few reports with bigger OOM alloc size (39 MB and 18 MB in the last 7 days).

If reducing this signature is really important we could think about:

  • We don't really need this data buffer to be contiguous. We could get by with a list of buffers which would be easier to allocate if the address space is very fragmented.
  • We talked about implementing a "data pipe" for WebRender a while ago with the idea of serializing directly into shared memory instead of using an intermediate data buffer like this.
  • It wouldn't hurt to compress the representation of the serialized displaylist a bit more, but I don't think it would make the crash volume go down that much (although it might be worth doing if only for performance reasons). For example our display lists have a lot of ColorF in them. Normalized u16 per channel should be more than enough precision and save about 1 KB on some of the pages I measured.

But really I doubt it will make a big difference in the crash volume overall. The browser has a tough time staying alive when it can't allocate 1MB.

Flags: needinfo?(nical.bugzilla)
Priority: P2 → P3
Blocks: wr-67
No longer blocks: stage-wr-trains

Nicolas, is there a way to mitigate part of these crashes? My worry is that we ship webrender v1 with 67 so the crash volume may be significant in the 67 release.

Flags: needinfo?(nical.bugzilla)

Gankro is currently working on a set of changes that will reduce the average size of display lists (the failing allocation in this signature), I don't have the bug number handy, sorry.

Other than that there isn't a lot we can do as we are basically OOM'ing on kinda-large-but-not-unreasonably-so temporary allocations, and this signature is really mostly stealing OOM crashes from other existing signatures (the browser don't usually stay alive very long when we can't allocate 1MB), so even with Gankro's changes, chances are that the overall crash volume will remain the same even if some portion of these OOMs will move back to other signatures.

Flags: needinfo?(nical.bugzilla)

Thanks nical for the detailed explanation, marking as wontfix for beta as there isn't anything actionable for us to do in the 67 time frame.

I'm a little concerned about the OOM situation still. I was just looking at the nvidia dashboard and according to the graphs we're OOMing almost 400% with WR enabled vs disabled on nightly, which is pretty bad. On Beta it's ~135% so it seems like something regressed on Nightly.

(Moving the general OOM discussion to bug 1546671 so we can leave this one focused on the specific stack that includes wr_state_new)

Bulk change of P3 carryover bugs to wontfix for 68.

Blocks: wr-68
No longer blocks: wr-67
Blocks: wr-oom
No longer blocks: wr-68

Crash volume fairly low, and this is set to P3.
Marking fix-optional to remove this from weekly regression triages.

Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::…

A user posted about experiencing this bug here: https://www.reddit.com/r/firefox/comments/f4eiy5/is_it_normal_that_my_browser_crashes/ in case you need someone to try to reproduce issues.

Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook |
Crash Signature: std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] → std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ]
Crash Signature: std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ] → std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ mozalloc_abort | webrender_bindings::bindings::wr_state_new ]
Crash Signature: std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ mozalloc_abort | webrender_bindings::bindings::wr_state_new ] → std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ OOM | large | moz…
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e50e60b1dcbc
Don't preallocate more than 0.3MB for WR display lists. r=miko
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

I should have set the right flags. The patch will hopefully reduce the amount of OOMs that land here but it won't fix it entirely, so it's good to keep the bug open as new crashes will come in.

I don't think it is worth uplifting unless we find that it has had a significant impact on overall OOM crashes. In practice I suspect that a portion of the OOMs from this signature that this patch avoids will move to another signature where we make siilarly large-ish allocations.

Status: RESOLVED → REOPENED
Flags: needinfo?(nical.bugzilla)
Keywords: leave-open
Resolution: FIXED → ---
Status: REOPENED → NEW
Target Milestone: 85 Branch → ---
Crash Signature: mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] → mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | alloc::raw_vec::RawVec<T>::allocate_in<T>]
Crash Signature: mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | alloc::raw_vec::RawVec<T>::allocate_in<T>] → mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | alloc::raw_vec::RawVec<T>::allocate_in<T>]
See Also: → 1689956
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc:… → [@ mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new] [@ OOM | large…

Looks like the signature changed in 92.

Crash Signature: mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ OOM | large | mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | alloc::raw_vec::RawVec<T>::allocate_in<T>] → mozalloc_abort | webrender_bindings::bindings::wr_state_new ] [@ OOM | large | mozalloc_abort | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | alloc::raw_vec::RawVec<T>::allocate_in<T>] [@ OOM | large…

(In reply to Julien Cristau [:jcristau] from comment #20)

Looks like the signature changed in 92.

Opened https://github.com/mozilla-services/socorro/pull/5849 to address that.

Whiteboard: [tbird crash]

The leave-open keyword is there and there is no activity for 6 months.
:nical, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

Not much to do for this OOM signature. We could silence it by using a fallible allocation when pre-allocating the display list, but it won't affect the crash volume.

Flags: needinfo?(nical.bugzilla)
Severity: critical → S3

The severity field for this bug is relatively low, S3. However, the bug has 3 duplicates.
:nical, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(nical.bugzilla)

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 4 years ago1 month ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: