1493177 - Crash Report [@ mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind ] @ webrender::texture_cache::TextureCache::get

Reporter

Description

•

6 years ago

Attached file stack — Details

Beginning with this morning's update to Nightly, I started crashing when maximizing or resizing a window containing several dozen tabs. Nightly is unusable for me with WebRender enabled.

MozCrashReason 	BUG: was dropped from cache or not updated!

bp-f005f684-1ae4-4d47-9218-8e7c70180921
bp-ca1f6ec0-6a9d-43d5-b2e8-ce00f0180921
bp-e1f4f2c7-fa18-407c-8eb8-2fa890180921
bp-43d1be5a-e550-430b-86d9-c6f990180921
bp-c7737268-740c-41b2-b09d-5cf580180921

The crash goes away when gfx.webrender.all is set to false.

Darkspirit

Updated

•

6 years ago

Blocks: wr-stability, stage-wr-trains

status-firefox64: --- → affected

Keywords: regression

Priority: -- → P3

Darkspirit

Comment 1

•

6 years ago

This looks like a regression from yesterday to today?

Calixte Denizet (:calixte)

Comment 2

•

6 years ago

Yes the crashes are in nightly with buildid 20180921100113.
Looking at the backtrace, the patch https://hg.mozilla.org/mozilla-central/rev?node=14c6b338e32c is the one that modified a line close to a line appearing in the backtrace (gfx/webrender/src/frame_builder.rs:416).

Calixte Denizet (:calixte)

Comment 3

•

6 years ago

What I said is for signature "core::option::expect_failed | webrender::texture_cache::TextureCache::get".
The other one ("mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind") appeared in 20180919123806.

Darkspirit

Comment 4

•

6 years ago

The unwind signature contains other crashes. I searched for the crash reason: https://crash-stats.mozilla.com/search/?moz_crash_reason=~BUG%3A%20was%20dropped%20from%20cache%20or%20not%20updated%21&product=Firefox&date=%3E%3D2018-08-21T17%3A20%3A41.000Z&date=%3C2018-09-21T17%3A20%3A41.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Darkspirit

Comment 5

•

6 years ago

bp-91d65bc4-a8d3-43d7-8c4e-4b5a90180921
Random Windows crash reports with this crash reason all seem to contain "ClipDataMarker", which was introduced by bug 1492566 (https://github.com/servo/webrender/pull/3075).

But all crashes began one WebRender update later with build 20180921100113 which could indicate that
bug 1492880 (https://github.com/servo/webrender/compare/f17f6a491d6ff3dc3e13e998dda788d8f5856338...a601f9c291cee83257241ef61aaf62353c613438) might have introduced the problem.

https://github.com/servo/webrender/pull/3079 touched the resource cache.

Flags: needinfo?(nical.bugzilla)

Toshihiro Yamada

Comment 6

•

6 years ago

Yes, with the mozilla-central changeset 8dc63538dff7 build, I haven't experienced any crashes yet (of course, still testing...).

Darkspirit

Updated

•

6 years ago

Keywords: topcrash

Priority: P3 → P1

Darkspirit

Comment 7

•

6 years ago

Changelog between 20180920220102 (no crashes) and 20180921100113 (crashes):
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5347c7e4811a&tochange=8b93a94b92c3

Darkspirit

Updated

•

6 years ago

Keywords: steps-wanted

Glenn Watson [:gw]

Comment 8

•

6 years ago

From the backtrace, it looks like this is occurring when trying to get the image mask for a clip item.

It's probably triggered by certain pages only (in combination with some other factors).

I've been browsing for an hour or so without any luck reproducing it so far.

Has anyone found a page or set of pages so far that are semi-reliable in reproducing this?

Bob Clary [:bc] (inactive)

Reporter

Comment 9

•

6 years ago

I re-enabled webrender after the afternoon nightly update and have been unable to reproduce either with my original profile/session nor a clean profile. I'll keep running with webrender and will report back if it reappears.

Darkspirit

Comment 10

•

6 years ago

https://crash-stats.mozilla.com/signature/?product=Firefox&moz_crash_reason=~BUG%3A%20was%20dropped%20from%20cache%20or%20not%20updated%21&signature=mozalloc_abort%20%7C%20abort%20%7C%20__rust_start_panic%20%7C%20rust_begin_unwind&date=%3E%3D2018-09-15T00%3A24%3A53.000Z&date=%3C2018-09-22T00%3A24%3A53.000Z#comments

Adam Gashlin (he/him) [:agashlin] (ex-moco)

Comment 11

•

6 years ago

I was hitting it regularly on Github when clicking on an issue from an issues list, but some phase has shifted and I can't repro it anymore.

Aaron Kaluszka

Comment 12

•

6 years ago

Similarly, I hit it multiple times while scrolling the tab list in the Sidebar tabs extension, but now it's not reproducing.

Aaron Kaluszka

Comment 13

•

6 years ago

I also hit it a couple times when closing a tab (including after typing that last comment in this bug).

Bob Clary [:bc] (inactive)

Reporter

Comment 14

•

6 years ago

new one bp-e6939930-1e8e-4275-98e4-608e20180921

I opened a bunch of pages from my history for the last couple of days and resized the page and maximized it but nothing happened. I was cleaning the pages out by deleting the tabs and moving stuff around.

Possibly relevant extensions: Sort Tabs by URL, Tabs Stats and
Tab Center Redux.

Just now the browser hung while scrolling the sidebar in Tab
Center Redux after I had used Tab Stats to delete duplicate tabs,
and Sort Tabs.

If someone wants to drive a remote debugging session, I'm in the process of building opt/debug Nightly.

Glenn Watson [:gw]

Comment 15

•

6 years ago

I still haven't reproduced this, but I have a couple of theories that might be relevant to either reproducing or debugging this.

Originally I suspected the clip interning change, but as mentioned above the crashes started after https://github.com/servo/webrender/pull/3079 landed. This change does modify request_image().

If request_image() doesn't correctly request an image when it needs to, the result will be the exact panic that we are seeing (that is: we try to get the image handle during batching, asserting that request_image was called and ensured the image was in the texture cache).

It seems like a lot of the crash reports occurred either during resizing, moving tabs or closing tabs.

I _think_ the gradient clip mask image that the tabs use is a blob image (which is affected by the PR above).

My current hunch is that the bug is related to that clip mask image not always being added / retained in the cache.
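
A minimal sketch of the request/get contract described above, using hypothetical stand-in types (`ToyTextureCache` and `ImageKey` here are illustrative, not WebRender's real definitions). It only shows why a skipped `request_image()` surfaces later as this exact panic during batching:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration; not WebRender's real types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ImageKey(u32);

#[derive(Default)]
struct ToyTextureCache {
    entries: HashMap<ImageKey, Vec<u8>>,
}

impl ToyTextureCache {
    // Frame building: make sure the image is resident before batching.
    fn request_image(&mut self, key: ImageKey, pixels: &[u8]) {
        self.entries.entry(key).or_insert_with(|| pixels.to_vec());
    }

    // Batching: assumes request_image() already ran for this key.
    // If it did not, the expect() below panics with the crash reason
    // seen in the reports.
    fn get(&self, key: ImageKey) -> &[u8] {
        self.entries
            .get(&key)
            .expect("BUG: was dropped from cache or not updated!")
    }
}
```

Calling `get()` for a key that was never passed to `request_image()` (or that was evicted in between) panics, which matches the MozCrashReason in the reports.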

Ludovic Hirlimann [:Usul]

Comment 17

•

6 years ago

I've managed to reproduce using youtube. Seeking a video once or twice and then pausing it. Unpause and boom.

Comment hidden (offtopic)

(In reply to Ludovic Hirlimann [:Usul] from comment #17)
> I've managed to reproduce using youtube. Seeking a video once or twice and then pausing it. Unpause and boom.

I've tried that and resizing the window and got a different crash reason than you. But I haven't managed to reproduce it again.

RUST_BACKTRACE=1 mozregression --launch 2018-09-21 -B debug --pref gfx.webrender.all:true -a https://www.youtube.com/watch?v=XxP8kxUn5bc
> Assertion failure: released, at /builds/worker/workspace/build/src/gfx/layers/wr/AsyncImagePipelineManager.cpp:630

The assertion was added by bug 1492925, which is within the possible regression range (comment 7).

Comment hidden (offtopic)

https://searchfox.org/mozilla-central/rev/21588b2a9824e0758fe11d10065e2c01ea9f32be/gfx/layers/wr/AsyncImagePipelineManager.cpp#628-630
> DebugOnly<bool> released = SharedSurfacesParent::Release(holder->mExternalImages.front().mImageId);
> MOZ_ASSERT(released);

Maybe I would have run into this bug's crash reason if I hadn't used a debug build?

Leonardo

Comment 20

•

6 years ago

STR

This issue always happens (for me) on this page: https://www.uol.com.br/ after scrolling up/down a few times with the mouse.

When this happens, the page and interface go fully white, without text or images, and freeze for some seconds. After that the text and images show again, without manually reloading the page.

Calixte Denizet (:calixte)

Comment 21

•

6 years ago

I can easily reproduce the crash with signature "mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind" by scrolling the page at the URL given by Leonardo:
https://crash-stats.mozilla.com/report/index/c0f085f6-df5d-46f3-a3a7-bdbff0180922

Darkspirit

Comment 22

•

6 years ago

Debian Testing, KDE, Xorg, GTX 1060
bp-fe0674b5-a53d-481c-9136-46e9f0180922

mozregression --good 2018-09-20 --bad 2018-09-21 --pref gfx.webrender.all:true -a https://www.uol.com.br/
> [...]
> https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8dc63538dff7755eb40ea9783962b513b09ced15&tochange=8b93a94b92c361e3a2e498bfdd7e7605cd8f7b5e
> [...]
> 5:39.76 INFO: Last good revision: 5ab8b903147a0cc97b21d278299840b9e38aa1f6
> 5:39.76 INFO: First bad revision: 096b1dc47d712a49daf361f17fa4f569cfae8050
> 5:39.76 INFO: Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5ab8b903147a0cc97b21d278299840b9e38aa1f6&tochange=096b1dc47d712a49daf361f17fa4f569cfae8050

> 096b1dc47d71	Jeff Muizelaar — Bug 1492880. Re-generate FFI header
> 4a4ccba9abe3	Jeff Muizelaar — Bug 1492880. Update Cargo lockfiles and re-vendor rust dependencies
> d6bea517aec2	Jeff Muizelaar — Bug 1492880. Update webrender to commit a601f9c291cee83257241ef61aaf62353c613438

Blocks: 1492880

Keywords: steps-wanted

Darkspirit

Updated

•

6 years ago

URL: https://www.uol.com.br/

Has Regression Range: --- → yes

Has STR: --- → yes

status-firefox62: --- → unaffected

status-firefox63: --- → unaffected

status-firefox-esr60: --- → unaffected

status-geckoview62: --- → unaffected

OS: Unspecified → All

Maire Reavy [:mreavy]

Comment 23

•

6 years ago

Glenn -- Can I give you this one since it's a top-crasher and we have a test case with a regression range?

Assignee: nobody → gwatson

Glenn Watson [:gw]

Comment 24

•

6 years ago

Yes, I'm going to take a look at it now - I'm now able to reproduce locally with the link in comment 20.

Glenn Watson [:gw]

Comment 25

•

6 years ago

The regression comes from this specific commit:

https://github.com/servo/webrender/pull/3079/commits/6f7ab2a848570089b4668cbb5f72fadc2a6dec3c

I'll add some debug logs to try and track down what is occurring. Nical, any ideas / thoughts on what may cause this?

Darkspirit

Comment 26

•

6 years ago

I don't think I can reproduce the crash with layers.async-pan-zoom.enabled set to false.

Glenn Watson [:gw]

Comment 27

•

6 years ago

As a temporary fix, we're reverting this patch in WR and pushing that through to m-c.

See: https://github.com/servo/webrender/pull/3108

Handing the bug to nical to investigate on Monday, with the following details:

- Opening https://www.uol.com.br/, then scrolling up and down a few times is a 100% repro for me.
- In this case, the assert comes during batching when it tries to resolve a BorderSource::Image.
- The image queue when processing the rasterized blob has length 0.
- Although it gets inserted into pending_image_requests, it never seems to get handled in add_rasterized_blob_images.

I assume that last bit of information is the cause, although I don't know why that's happening.

Glenn Watson [:gw]

Updated

•

6 years ago

Assignee: gwatson → nical.bugzilla

Darkspirit

Comment 28

•

6 years ago

I've created a chrome/userContent.css in the profile folder with * { text-shadow: none; } as content. Can't reproduce the crash with it so far.

Darkspirit

Updated

•

6 years ago

Depends on: 1493473

See Also: → https://github.com/servo/webrender/pull/3108

Darkspirit

Comment 29

•

6 years ago

Scrolling up & down on https://uebermedien.de/31902/maassens-irrefuehrende-vorwuerfe-gegen-die-tagesschau/.
Fixed with build 20180923100316.

Nicolas Silva [:nical]

Assignee

Updated

•

6 years ago

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 30

•

6 years ago

OK, that one was a little sneaky but turned out to be a simple bug. In the texture cache we have an automatic and a manual eviction policy. The manual eviction policy ensures blob images don't get automatically evicted from the cache, which could otherwise race with the asynchronous rasterization. The texture cache was missing a tiny piece of code to ensure the eviction policy was respected for shared cache entries (it does work for normal cache entries). I bet border images make use of these shared entries.

While scrolling, the image eventually gets discarded, and the next time we see it we request the image. The problem is that when we decide whether a blob image is missing, we only look at whether there is an entry in the rasterized_blob_image map, but we don't check whether that entry contains any actual data to upload in its queue.

Before the regressing commit, the bug would occur without crashing, but we wouldn't necessarily upload all of the image (only the last available dirty region). With that commit we end up requesting something we think we have but don't, and later panic on an empty-handed upload request.
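
To illustrate the "is this blob image missing?" decision described above, here is a rough sketch with hypothetical types (`BlobKey`, `RasterizedBlob` and both function names are made up for this example and are not the real WebRender code, nor necessarily the shape of the actual fix). It contrasts a check that only looks for a map entry with one that also requires queued data:

```rust
use std::collections::HashMap;

// Hypothetical types for illustration only; not WebRender's real definitions.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct BlobKey(u32);

struct RasterizedBlob {
    // Rasterized data waiting to be uploaded to the texture cache.
    queue: Vec<Vec<u8>>,
}

// Check as described in the bug: an entry with an empty queue still
// counts as "present", so nothing gets re-requested.
fn is_missing_entry_only(map: &HashMap<BlobKey, RasterizedBlob>, key: BlobKey) -> bool {
    !map.contains_key(&key)
}

// Stricter check (an assumption about one possible remedy): the entry
// must also hold data to upload, otherwise the image is treated as missing.
fn is_missing_checking_queue(map: &HashMap<BlobKey, RasterizedBlob>, key: BlobKey) -> bool {
    map.get(&key).map_or(true, |blob| blob.queue.is_empty())
}
```

With an entry whose queue is empty, the first function reports the image as present while the second reports it as missing, which is the mismatch that leads to requesting something we think we have but don't.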

Nicolas Silva [:nical]

Assignee

Comment 31

•

6 years ago

Fixed in https://github.com/servo/webrender/pull/3111

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Darkspirit

Updated

•

6 years ago

status-firefox64: affected → fixed

Depends on: 1494042

See Also: → https://github.com/servo/webrender/pull/3111