1531819 - Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | webrender_bindings::bindings::wr_state_new]

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Description

•

6 years ago

This bug is for crash report bp-6623cfde-df3d-473a-ba11-96c2c0190228.

Looks like this started showing up in the 20190222081112 nightly builds.

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll static void gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:255
3 xul.dll static void std::alloc::rust_oom src/libstd/alloc.rs:220
4 xul.dll static void alloc::alloc::handle_alloc_error src/liballoc/alloc.rs:234
5 xul.dll struct webrender_bindings::bindings::WrState* webrender_bindings::bindings::wr_state_new gfx/webrender_bindings/src/bindings.rs:1887
6 xul.dll mozilla::wr::DisplayListBuilder::DisplayListBuilder gfx/webrender_bindings/WebRenderAPI.cpp:650
7 xul.dll mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer gfx/layers/wr/WebRenderLayerManager.cpp:259
8 xul.dll nsDisplayList::PaintRoot layout/painting/nsDisplayList.cpp:2615
9 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:3780

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Blocks: stage-wr-trains

Jeff Muizelaar [:jrmuizel]

Comment 1

•

6 years ago

I wonder if this is caused by us OOMing with a really big display list

Jeff Muizelaar [:jrmuizel]

Comment 2

•

6 years ago

Nical, want to take a look?

Assignee: nobody → nical.bugzilla

Flags: needinfo?(nical.bugzilla)

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Priority: -- → P2

BugBot [:suhaib / :marco/ :calixte]

Updated

•

6 years ago

Keywords: regression

Nicolas Silva [:nical]

Assignee

Comment 3

•

6 years ago

The majority of these OOM crashes are caused by a failure to pre-allocate a display list buffer of less than 1.5 MB, which isn't that large. It would be nice to have access to Vec::try_reserve, but that would likely just move the signature elsewhere without affecting the overall crash volume.
Asking for a MB of contiguous memory isn't unreasonable, but it is large enough that some of our OOM crash signatures will move there.
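
To make the Vec::try_reserve idea above concrete, here is a minimal sketch of a fallible pre-allocation. The helper names `try_preallocate` and `preallocate_or_empty` are hypothetical and are not taken from the WebRender code base. The sketch assumes a Rust toolchain where `Vec::try_reserve` is stable (1.57 or later), which was not the case when this comment was written.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: try to reserve `capacity` bytes up front and
/// report failure to the caller instead of aborting the process.
fn try_preallocate(capacity: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut data: Vec<u8> = Vec::new();
    // try_reserve returns Err on allocation failure or capacity overflow
    // rather than invoking the global OOM handler.
    data.try_reserve(capacity)?;
    Ok(data)
}

/// Hypothetical fallback: if the up-front reservation fails, start from an
/// empty buffer. Later growth through the usual infallible methods (push,
/// extend_from_slice) can still abort on OOM.
fn preallocate_or_empty(capacity: usize) -> Vec<u8> {
    try_preallocate(capacity).unwrap_or_default()
}
```

As the comment notes, this would only keep the failure out of this particular signature; a process that cannot reserve about 1 MB will most likely hit an infallible allocation soon afterwards.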

There are a few reports with bigger OOM alloc size (39 MB and 18 MB in the last 7 days).

If reducing this signature is really important we could think about:

We don't really need this data buffer to be contiguous. We could get by with a list of buffers which would be easier to allocate if the address space is very fragmented.
We talked about implementing a "data pipe" for WebRender a while ago with the idea of serializing directly into shared memory instead of using an intermediate data buffer like this.
It wouldn't hurt to compress the representation of the serialized display list a bit more, but I don't think it would make the crash volume go down that much (although it might be worth doing if only for performance reasons). For example, our display lists have a lot of ColorF in them. Normalized u16 per channel should be more than enough precision and would save about 1 KB on some of the pages I measured.
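
The "list of buffers" option above can be sketched as a chunked byte buffer. `ChunkedBuffer` and its methods are hypothetical names used only for illustration; this is not how WebRender stores display list data.

```rust
/// Hypothetical chunked byte buffer: bytes live in fixed-size chunks, so no
/// single allocation has to be large and contiguous.
struct ChunkedBuffer {
    chunk_size: usize,
    chunks: Vec<Vec<u8>>,
}

impl ChunkedBuffer {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be non-zero");
        ChunkedBuffer { chunk_size, chunks: Vec::new() }
    }

    /// Append bytes, starting a new chunk whenever the current one is full.
    fn extend_from_slice(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            let needs_chunk = match self.chunks.last() {
                Some(last) => last.len() == self.chunk_size,
                None => true,
            };
            if needs_chunk {
                self.chunks.push(Vec::with_capacity(self.chunk_size));
            }
            let last = self.chunks.last_mut().expect("a chunk was just ensured");
            let room = self.chunk_size - last.len();
            let take = room.min(bytes.len());
            last.extend_from_slice(&bytes[..take]);
            bytes = &bytes[take..];
        }
    }

    /// Total number of bytes stored across all chunks.
    fn len(&self) -> usize {
        self.chunks.iter().map(|chunk| chunk.len()).sum()
    }

    /// Number of separate allocations currently backing the buffer.
    fn chunk_count(&self) -> usize {
        self.chunks.len()
    }

    /// Iterate over the stored bytes in insertion order.
    fn iter_bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.chunks.iter().flat_map(|chunk| chunk.iter().copied())
    }
}
```

The trade-off is that readers can no longer treat the payload as one slice, so any consumer that expects contiguous bytes would need to iterate chunk by chunk or copy.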

But really, I doubt it would make a big difference in the overall crash volume. The browser has a tough time staying alive when it can't allocate 1 MB.
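
The ColorF compression idea from the list above can be sketched as follows. `ColorF` here is a local stand-in with four f32 channels, and `ColorU16`, `quantize`, `dequantize`, `compress` and `expand` are hypothetical names. The sketch assumes channels are in the range 0.0 to 1.0 and clamps anything outside it, which may not suit every use of colors in a display list.

```rust
/// Stand-in for a floating-point color: four f32 channels, 16 bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColorF {
    r: f32,
    g: f32,
    b: f32,
    a: f32,
}

/// Hypothetical compact form: four normalized u16 channels, 8 bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColorU16 {
    r: u16,
    g: u16,
    b: u16,
    a: u16,
}

/// Map a channel in [0.0, 1.0] to [0, 65535]; out-of-range input is clamped.
fn quantize(channel: f32) -> u16 {
    (channel.clamp(0.0, 1.0) * 65535.0).round() as u16
}

/// Map a normalized u16 channel back to [0.0, 1.0].
fn dequantize(channel: u16) -> f32 {
    channel as f32 / 65535.0
}

fn compress(c: ColorF) -> ColorU16 {
    ColorU16 { r: quantize(c.r), g: quantize(c.g), b: quantize(c.b), a: quantize(c.a) }
}

fn expand(c: ColorU16) -> ColorF {
    ColorF { r: dequantize(c.r), g: dequantize(c.g), b: dequantize(c.b), a: dequantize(c.a) }
}
```

With these definitions each color shrinks from 16 to 8 bytes, at the cost of a round-trip error of at most 1/65535 per channel for in-range values.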

Flags: needinfo?(nical.bugzilla)

Calixte Denizet (:calixte)

Updated

•

6 years ago

status-firefox66: unaffected → affected

status-firefox68: --- → affected

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Priority: P2 → P3

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Blocks: wr-67
No longer blocks: stage-wr-trains

Tim Spurway [:tspurway]

Updated

•

6 years ago

status-firefox66: affected → wontfix

Pascal Chevrel:pascalc

Comment 4

•

6 years ago

Nicolas, is there a way to mitigate some of these crashes? My worry is that we ship WebRender v1 with 67, so the crash volume may be significant in the 67 release.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 5

•

6 years ago

Gankro is currently working on a set of changes that will reduce the average size of display lists (the failing allocation in this signature). I don't have the bug number handy, sorry.

Other than that, there isn't a lot we can do: we are basically OOMing on kinda-large-but-not-unreasonably-so temporary allocations, and this signature is mostly stealing OOM crashes from other existing signatures (the browser doesn't usually stay alive very long when it can't allocate 1 MB). So even with Gankro's changes, chances are that the overall crash volume will remain the same, even if some portion of these OOMs moves back to other signatures.

Flags: needinfo?(nical.bugzilla)

Pascal Chevrel:pascalc

Comment 6

•

6 years ago

Thanks, nical, for the detailed explanation. Marking as wontfix for beta, as there isn't anything actionable for us to do in the 67 time frame.

status-firefox67: affected → wontfix

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 7

•

6 years ago

I'm still a little concerned about the OOM situation. I was just looking at the NVIDIA dashboard, and according to the graphs we're OOMing at almost 400% with WR enabled vs. disabled on Nightly, which is pretty bad. On Beta it's ~135%, so it seems like something regressed on Nightly.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 8

•

6 years ago

(Moving the general OOM discussion to bug 1546671 so we can leave this one focused on the specific stack that includes wr_state_new)

Patricia Lawless

Comment 9

•

6 years ago

Bulk change of P3 carryover bugs to wontfix for 68.

status-firefox68: affected → wontfix

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Blocks: wr-68
No longer blocks: wr-67

Marcia Knous [:marcia]

Updated

•

6 years ago

status-firefox69: --- → affected

Jeff Muizelaar [:jrmuizel]

Updated

•

5 years ago

Blocks: wr-oom
No longer blocks: wr-68

Liz Henry (:lizzard) (relman/hg->git project)

Reporter

Comment 10

•

5 years ago

Crash volume fairly low, and this is set to P3.
Marking fix-optional to remove this from weekly regression triages.

status-firefox69: affected → fix-optional

status-firefox70: --- → fix-optional

Asif Youssuff

Comment 12

•

5 years ago

•

Edited

A user posted about experiencing this bug here: https://www.reddit.com/r/firefox/comments/f4eiy5/is_it_normal_that_my_browser_crashes/ in case you need someone to try to reproduce issues.

Darkspirit

Updated

•

4 years ago

Crash Signature: std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] → std::alloc::rust_oom | webrender_api::display_list::DisplayListBuilder::with_capacity] [@ OOM | large | mozalloc_abort | webrender_bindings::bindings::wr_state_new ]

Darkspirit

Updated

•

4 years ago

status-firefox69: fix-optional → wontfix

status-firefox70: fix-optional → wontfix

status-firefox78: --- → wontfix

status-firefox79: --- → wontfix

status-firefox80: --- → fix-optional

status-firefox-esr78: --- → disabled

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox80: fix-optional → wontfix

Nicolas Silva [:nical]

Assignee

Comment 14

•

4 years ago

Attached file Bug 1531819 - Don't preallocate more than 0.3MB for WR display lists. — Details

Pulsebot

Comment 15

•

4 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e50e60b1dcbc Don't preallocate more than 0.3MB for WR display lists. r=miko
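
For readers who want to see what capping the pre-allocation could look like, here is a minimal sketch. It is not the landed patch: `MAX_PREALLOC_BYTES`, `clamped_capacity` and `new_payload_buffer` are hypothetical names, and the constant is only an approximation of the 0.3 MB mentioned in the commit message.

```rust
/// Assumed cap of roughly 0.3 MB, based on the commit message; the constant
/// used by the real patch may differ.
const MAX_PREALLOC_BYTES: usize = 300 * 1024;

/// Clamp a size hint (for example, the size of the previous display list)
/// so that the up-front allocation never exceeds the cap.
fn clamped_capacity(size_hint: usize) -> usize {
    size_hint.min(MAX_PREALLOC_BYTES)
}

/// Hypothetical constructor for the payload buffer. The Vec can still grow
/// past the cap later; only the initial reservation is bounded.
fn new_payload_buffer(size_hint: usize) -> Vec<u8> {
    Vec::with_capacity(clamped_capacity(size_hint))
}
```

Under this sketch, a hint like the 39 MB allocation mentioned in comment 3 would be reduced to the cap, and the buffer would then grow incrementally only if the display list actually needs the space.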

Bogdan Tara[:bogdan_tara | bogdant]

Comment 16

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e50e60b1dcbc

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox85: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 85 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox83: --- → wontfix

status-firefox84: --- → affected

BugBot [:suhaib / :marco/ :calixte]

Comment 17

•

4 years ago

The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 18

•

4 years ago

I should have set the right flags. The patch will hopefully reduce the number of OOMs that land here, but it won't fix the problem entirely, so it's good to keep the bug open as new crashes will come in.

I don't think it is worth uplifting unless we find that it has had a significant impact on overall OOM crashes. In practice, I suspect that a portion of the OOMs that this patch avoids in this signature will move to another signature where we make similarly large-ish allocations.

Status: RESOLVED → REOPENED

status-firefox85: fixed → affected

Flags: needinfo?(nical.bugzilla)

Keywords: leave-open

Resolution: FIXED → ---

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

Status: REOPENED → NEW

status-firefox84: affected → wontfix

Target Milestone: 85 Branch → ---

Julien Cristau [:jcristau]

Updated

•

4 years ago

status-firefox85: affected → wontfix

Julien Cristau [:jcristau]

Comment 20

•

3 years ago

Looks like the signature changed in 92.

Julien Cristau [:jcristau]

Comment 21

•

3 years ago

(In reply to Julien Cristau [:jcristau] from comment #20)
> Looks like the signature changed in 92.

Opened https://github.com/mozilla-services/socorro/pull/5849 to address that.

Wayne Mery (:wsmwk)

Updated

•

3 years ago

Whiteboard: [tbird crash]

BugBot [:suhaib / :marco/ :calixte]

Comment 22

•

2 years ago

The leave-open keyword is there and there is no activity for 6 months.
:nical, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 23

•

2 years ago

Not much to do for this OOM signature. We could silence it by using a fallible allocation when pre-allocating the display list, but it won't affect the crash volume.

Flags: needinfo?(nical.bugzilla)

Glenn Watson [:gw]

Updated

•

2 years ago

Severity: critical → S3

BugBot [:suhaib / :marco/ :calixte]

Comment 24

•

2 years ago

The severity field for this bug is relatively low, S3. However, the bug has 3 duplicates.
:nical, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Updated

•

2 years ago

Flags: needinfo?(nical.bugzilla)

Andrew Osmond [:aosmond] (he/him)

Updated

•

2 years ago

Depends on: 1785673

BugBot [:suhaib / :marco/ :calixte]

Comment 25

•

6 months ago

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED

Closed: 4 years ago → 6 months ago

Resolution: --- → WORKSFORME

BugBot (nomail) [:suhaib / :marco/ :calixte]

Updated

•

6 months ago

Keywords: leave-open