Closed Bug 1640564 Opened 4 years ago Closed 4 years ago

Crash in [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer ]

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Windows
defect

Tracking


RESOLVED FIXED
81 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 --- fixed

People

(Reporter: gsvelto, Assigned: aosmond)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, perf-alert)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-e4802059-f714-46b0-84c6-e76460200523.

Top 9 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 xul.dll gkrust_shared::oom_hook::hook toolkit/library/rust/shared/lib.rs:221
3 xul.dll std::alloc::rust_oom ../4fb7144ed159f94491249e86d5bbd033b5d60550//src/libstd/alloc.rs:240
4 xul.dll alloc::alloc::handle_alloc_error ../4fb7144ed159f94491249e86d5bbd033b5d60550//src/liballoc/alloc.rs:268
5 xul.dll static hashbrown::raw::RawTable< /builds/worker/workspace/obj-build/toolkit/library/build/C:/Users/VssAdministrator/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.6.2/src/raw/mod.rs:393
6 xul.dll webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer gfx/webrender_bindings/src/moz2d_renderer.rs:729
7 xul.dll webrender_api::resources::ApiResources::update gfx/wr/webrender_api/src/resources.rs:159
8  @0x49b86bea57 

These are new OOM crashes which started with the introduction of hashbrown. The crashes aren't strange - these machines were running out of memory on their own - however it might be worth keeping an eye on them. Some of the OOM allocation sizes are several megabytes; in the case of this crash, over 50 MiB. I understand hashbrown uses open addressing and stores elements inline, so it's normal for it to allocate large arrays for tables with a lot of entries, but several megabytes still suggests the keys & values are either very large or we're storing a ton of them.
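For a sense of scale, here is a rough, hypothetical estimate of how a hashbrown-backed map's footprint grows with entry count. The entry type below is a made-up stand-in (the real types live in moz2d_renderer.rs), and the 7/8 load factor and power-of-two bucket count mirror hashbrown 0.6 but may differ in other versions:

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical stand-ins for the key/value types held in the blob commands map.
type BlobImageKey = u64;
#[allow(dead_code)]
struct BlobCommand {
    rect: [i32; 4],     // placeholder for a device rect
    tile_size: u16,     // placeholder for the tile size
    data: Arc<Vec<u8>>, // shared handle to the serialized drawing commands
}

/// Rough estimate of the heap allocation backing a hashbrown table with
/// `entries` occupied slots: assumes a 7/8 load factor, power-of-two bucket
/// counts (as in hashbrown 0.6), and one control byte per bucket.
fn estimated_table_bytes(entries: usize) -> usize {
    let buckets = (entries * 8 / 7).next_power_of_two();
    buckets * (size_of::<(BlobImageKey, BlobCommand)>() + 1)
}

fn main() {
    for &n in &[1_000usize, 100_000, 1_000_000] {
        println!("{:>9} entries -> ~{} KiB", n, estimated_table_bytes(n) / 1024);
    }
}
```

Even with entries of a few dozen bytes each, the inline storage means the table's single backing allocation reaches tens of megabytes once the entry count climbs into the hundreds of thousands.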

Blocks: wr-stability
See Also: → 1640942
Severity: -- → S4
Priority: -- → P5
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webren…
Summary: Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | gkrust_shared::oom_hook::hook | std::alloc::rust_oom | hashbrown::raw::RawTable<T>::new_uninitialized<T>] → Crash in [@ OOM | large | mozalloc_abort | hashbrown::raw::RawTable<T>::new_uninitialized<T> | webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer ]

Jeff - do you think there is anything we should look at here?

Flags: needinfo?(jmuizelaar)

I don't think there's much we can do.

Flags: needinfo?(jmuizelaar)
See Also: → 1651354
Blocks: wr-oom
Crash Signature: webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer] → webrender_bindings::moz2d_renderer::{{impl}}::create_blob_rasterizer] [@ OOM | large | mozalloc_abort | webrender_api::resources::ApiResources::update]
See Also: → 1553552

I think we can do better. We sometimes ask for huge allocations, >10 MB, for what is likely a blob update touching only a fraction of the entries. We don't need to clone the entire table, just the entries we need.

However, the entries themselves are relatively small: a rect, a size and an Arc. The table can only have a large footprint if there is a huge number of entries.
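A minimal sketch of the "copy only what we need" idea, with hypothetical type and function names rather than the actual create_blob_rasterizer code:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins; the real key/entry types live in moz2d_renderer.rs.
type BlobImageKey = u64;
#[derive(Clone)]
#[allow(dead_code)]
struct CachedBlob {
    data: Arc<Vec<u8>>, // shared handle to the recorded drawing commands
}

// Copy only the cached entries a particular blob update actually needs,
// instead of cloning the entire table for every rasterization request.
fn commands_for_update(
    all: &HashMap<BlobImageKey, CachedBlob>,
    requested: &[BlobImageKey],
) -> HashMap<BlobImageKey, CachedBlob> {
    requested
        .iter()
        .filter_map(|key| all.get(key).map(|cmd| (*key, cmd.clone())))
        .collect()
}
```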

From my own testing, it seems easy to bloat the number of entries:

E.g. go to https://www.cbc.ca/news and scroll down to the bottom of the page and back up. If you add a print to watch the number of entries, you can see that only a handful are ever updated, but the total can grow into the thousands quite quickly with only a few scrolling repetitions.

That seems more like a bug than something we should paper over without understanding more.

Assignee: nobody → aosmond
Blocks: wr-81
Severity: S4 → S3
Priority: P5 → P3

https://searchfox.org/mozilla-central/rev/d54210d490ef335b13fc1fcac817525120c8c46b/gfx/wr/webrender_api/src/resources.rs#108

We probably should be checking self.blob_image_handler and deleting the cached blob commands here. It appears nothing ever removes them, unless we clear the entire map by closing the tab, etc.

The patch to fix the leak is nominally trivial (notwithstanding any issues we may run into once we start cleaning up data that might have complicated life expectancies).
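As a rough illustration of the shape of that fix (simplified, hypothetical names throughout; this is not the landed patch): when the blob image template is deleted, drop its cached blob commands as well so the map cannot grow without bound.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for ApiResources and its blob image
// handler; the real code lives in gfx/wr/webrender_api/src/resources.rs.
type BlobImageKey = u64;

struct BlobImageHandler {
    cached_commands: HashMap<BlobImageKey, Vec<u8>>,
}

impl BlobImageHandler {
    fn delete(&mut self, key: BlobImageKey) {
        self.cached_commands.remove(&key);
    }
}

struct ApiResources {
    blob_image_handler: Option<BlobImageHandler>,
    blob_image_templates: HashMap<BlobImageKey, Vec<u8>>,
}

impl ApiResources {
    // When the blob image template goes away, its cached commands go with it.
    fn delete_blob_image(&mut self, key: BlobImageKey) {
        if let Some(handler) = self.blob_image_handler.as_mut() {
            handler.delete(key);
        }
        self.blob_image_templates.remove(&key);
    }
}
```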

I've noticed here:

https://searchfox.org/mozilla-central/rev/d54210d490ef335b13fc1fcac817525120c8c46b/gfx/wr/webrender_api/src/resources.rs#230

... that we often have duplicate keys in the array. I'm investigating this as well now.

Never mind, we get bailed out in the tile calculations.

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/54d73028c01f
Delete the blob commands when we delete the blob image template. r=nical
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch

Since the statuses are different for nightly and release, what's the status for beta?
For more information, please visit the auto_nag documentation.

== Change summary for alert #26799 (as of Wed, 26 Aug 2020 04:09:55 GMT) ==

Improvements:

35% Heap Unclassified windows10-64-shippable-qr opt 94,848,234.77 -> 61,557,475.02
34% Heap Unclassified windows10-64-shippable-qr opt 94,131,926.94 -> 61,727,578.04
10% Explicit Memory windows10-64-shippable-qr opt 401,881,782.15 -> 360,629,770.98
6% Resident Memory windows10-64-shippable-qr opt 606,783,131.93 -> 567,448,454.36
6% Heap Unclassified linux1804-64-shippable-qr opt 312,788,771.65 -> 293,997,667.17
5% Heap Unclassified windows10-64-shippable-qr opt tp6 76,177,792.89 -> 72,087,993.19
4% Explicit Memory linux1804-64-shippable-qr opt 620,928,900.77 -> 598,200,568.41

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=26799

Keywords: perf-alert

Wowza, those are some big improvements. Also, are we missing some memory reporters here given the massive reduction in heap unclassified?

Flags: needinfo?(aosmond)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)

Wowza, those are some big improvements. Also, are we missing some memory reporters here given the massive reduction in heap unclassified?

It wasn't a leak in the sense that we would never free the memory, but if you stay on a page, it will accumulate over time as you interact with it. So I'm not too surprised to see changes.

Flags: needinfo?(aosmond)

That didn't answer my question about missing memory reporters. Changes of that magnitude in heap unclassified would suggest to me that we're missing something here.

Flags: needinfo?(aosmond)

Sorry, bug 1655039 is supposed to address blob images in general in the memory report. It is probably true that, had that been completed, it would have identified this issue. In this case, it was a structure growing that should never have gotten that big. I will make a note on that bug to make sure the blob commands hash map is included in this effort.

Flags: needinfo?(aosmond)
See Also: → 1655039
Blocks: 1654013