Crash Report [@ mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind ] @ webrender::texture_cache::TextureCache::get

Status

defect
P1
critical
RESOLVED FIXED
10 months ago
10 months ago

People

(Reporter: bc, Assigned: nical)

Tracking

(Blocks 2 bugs, {crash, regression, topcrash})

unspecified
Unspecified
All

Firefox Tracking Flags

(geckoview62 unaffected, firefox-esr60 unaffected, firefox62 unaffected, firefox63 unaffected, firefox64 fixed)

Details

Attachments

(1 attachment)

Posted file stack
Beginning with this morning's update to Nightly, I started crashing when maximizing or resizing a window containing several dozen tabs. Nightly is unusable for me with WebRender enabled.

MozCrashReason 	BUG: was dropped from cache or not updated!

bp-f005f684-1ae4-4d47-9218-8e7c70180921
bp-ca1f6ec0-6a9d-43d5-b2e8-ce00f0180921
bp-e1f4f2c7-fa18-407c-8eb8-2fa890180921
bp-43d1be5a-e550-430b-86d9-c6f990180921
bp-c7737268-740c-41b2-b09d-5cf580180921

The crash goes away when gfx.webrender.all is set to false.
Crash Signature: [@ mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind ] → [@ mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind ] [@ core::option::expect_failed | webrender::texture_cache::TextureCache::get ]
Keywords: regression
Priority: -- → P3
This looks like a regression from yesterday to today?
Yes the crashes are in nightly with buildid 20180921100113.
In looking at the backtrace, the patch https://hg.mozilla.org/mozilla-central/rev?node=14c6b338e32c is the one which modified a line close to a line appearing in the bt (gfx/webrender/src/frame_builder.rs:416).
What I said is for signature "core::option::expect_failed | webrender::texture_cache::TextureCache::get".
The other one ("mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind") appeared in 20180919123806.
bp-91d65bc4-a8d3-43d7-8c4e-4b5a90180921
Random Windows crash reports of this crash reason seem all to contain "ClipDataMarker" which has been introduced by bug 1492566 (https://github.com/servo/webrender/pull/3075).

But all crashes began one WebRender update later with build 20180921100113 which could indicate that
bug 1492880 (https://github.com/servo/webrender/compare/f17f6a491d6ff3dc3e13e998dda788d8f5856338...a601f9c291cee83257241ef61aaf62353c613438) might have introduced the problem.

https://github.com/servo/webrender/pull/3079 touched the resource cache.
Flags: needinfo?(nical.bugzilla)
Yes, with the build from mozilla-central changeset 8dc63538dff7, I haven't experienced any crash yet (of course, still testing...).
Keywords: topcrash
Priority: P3 → P1
Changelog between 20180920220102 (no crashes) and 20180921100113 (crashes):
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5347c7e4811a&tochange=8b93a94b92c3
From the backtrace, it looks like this is occurring when trying to get the image mask for a clip item.

It's probably triggered by certain pages only (in combination with some other factors).

I've been browsing for an hour or so without any luck reproducing it so far.

Has anyone found a page or set of pages so far that are semi-reliable in reproducing this?
I re-enabled webrender after the afternoon nightly update and have been unable to reproduce either with my original profile/session nor a clean profile. I'll keep running with webrender and will report back if it reappears.
I was hitting it regularly on Github when clicking on an issue from an issues list, but some phase has shifted and I can't repro it anymore.
Similarly, I hit it multiple times while scrolling the tab list in the Sidebar tabs extension, but now it's not reproducing.
I also hit it a couple times when closing a tab (including after typing that last comment in this bug).
new one bp-e6939930-1e8e-4275-98e4-608e20180921

I opened a bunch of pages from my history for the last couple of days and resized the page and maximized it but nothing happened. I was cleaning the pages out by deleting the tabs and moving stuff around.

Possibly relevant extensions: Sort Tabs by URL, Tabs Stats and
Tab Center Redux.

Just now the browser hung while scrolling the sidebar in Tab
Center Redux after I had used Tab Stats to delete duplicate tabs,
and Sort Tabs.

If someone wants to drive a remote debugging session, I'm in the process of building opt/debug Nightly.
I still haven't reproduced this, but I have a couple of theories, that might be relevant to either reproducing or debugging this.

Originally I suspected the clip interning change, but as mentioned above the crashes started after https://github.com/servo/webrender/pull/3079 landed. This change does modify request_image().

If request_image() doesn't correctly request an image when it needs to, the result will be the exact panic that we are seeing (that is: we try to get the image handle during batching, asserting that request_image was called and ensured the image was in the texture cache).
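The contract described above can be illustrated with a minimal Rust sketch. This is not WebRender's code: `ToyTextureCache`, `try_get` and the `rasterized` parameter are hypothetical names made up for illustration, and only the panic message is taken from the crash report.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a texture cache keyed by an image id.
struct ToyTextureCache {
    entries: HashMap<u32, Vec<u8>>,
}

impl ToyTextureCache {
    fn new() -> Self {
        ToyTextureCache { entries: HashMap::new() }
    }

    /// Frame-building step: make sure the image is resident in the cache.
    /// `rasterized` models the pixels the rasterizer has produced so far
    /// (an assumption of this sketch). If nothing is available and the
    /// image is not already cached, the request silently does nothing.
    fn request_image(&mut self, key: u32, rasterized: Option<Vec<u8>>) {
        if self.entries.contains_key(&key) {
            return;
        }
        if let Some(data) = rasterized {
            self.entries.insert(key, data);
        }
    }

    /// Non-panicking lookup, useful for checking the invariant.
    fn try_get(&self, key: u32) -> Option<&[u8]> {
        self.entries.get(&key).map(|v| v.as_slice())
    }

    /// Batching step: assumes request_image() already made the image
    /// resident, and panics if that assumption does not hold.
    fn get(&self, key: u32) -> &[u8] {
        self.try_get(key)
            .expect("BUG: was dropped from cache or not updated!")
    }
}
```

In this toy model, a request that arrives with no rasterized data leaves the cache empty for that key, so a later `get` for the same key panics in the same way the crash reports show.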

It seems like a lot of the crash reports occurred either during resizing, moving tabs or closing tabs.

I _think_ the gradient clip mask image that the tabs use is a blob image (which is affected by the PR above).

My current hunch is that the bug is related to that clip mask image not always being added / retained in the cache.
Duplicate of this bug: 1493379
I've managed to reproduce using youtube. Seeking a video once or twice and then pausing it. Unpause and boom.
This issue always happens (for me) on this page: https://www.uol.com.br/ after scrolling up/down, sometimes with the mouse.

When this happens, the page and interface go fully white, without text or images, and stay frozen for some seconds. After that the text and images show again, without manually reloading the page.
I can easily reproduce the crash with signature "mozalloc_abort | abort | __rust_start_panic | rust_begin_unwind" by scrolling the page at the URL given by Leonardo:
https://crash-stats.mozilla.com/report/index/c0f085f6-df5d-46f3-a3a7-bdbff0180922
Debian Testing, KDE, Xorg, GTX 1060
bp-fe0674b5-a53d-481c-9136-46e9f0180922

mozregression --good 2018-09-20 --bad 2018-09-21 --pref gfx.webrender.all:true -a https://www.uol.com.br/
> [...]
> https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8dc63538dff7755eb40ea9783962b513b09ced15&tochange=8b93a94b92c361e3a2e498bfdd7e7605cd8f7b5e
> [...]
> 5:39.76 INFO: Last good revision: 5ab8b903147a0cc97b21d278299840b9e38aa1f6
> 5:39.76 INFO: First bad revision: 096b1dc47d712a49daf361f17fa4f569cfae8050
> 5:39.76 INFO: Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5ab8b903147a0cc97b21d278299840b9e38aa1f6&tochange=096b1dc47d712a49daf361f17fa4f569cfae8050

> 096b1dc47d71	Jeff Muizelaar — Bug 1492880. Re-generate FFI header
> 4a4ccba9abe3	Jeff Muizelaar — Bug 1492880. Update Cargo lockfiles and re-vendor rust dependencies
> d6bea517aec2	Jeff Muizelaar — Bug 1492880. Update webrender to commit a601f9c291cee83257241ef61aaf62353c613438
Blocks: 1492880
Keywords: steps-wanted
Has Regression Range: --- → yes
Has STR: --- → yes
OS: Unspecified → All
Glenn -- Can I give you this one since it's a top-crasher and we have a test case with a regression range?
Assignee: nobody → gwatson
Yes, I'm going to take a look at it now - I'm now able to reproduce locally with the link in comment 20.
The regression comes from this specific commit:

https://github.com/servo/webrender/pull/3079/commits/6f7ab2a848570089b4668cbb5f72fadc2a6dec3c

I'll add some debug logs to try and track down what is occurring. Nical, any ideas / thoughts on what may cause this?
I think I can't reproduce the crash with layers.async-pan-zoom.enabled;false.
As a temporary fix, we're reverting this patch in WR and pushing that through to m-c.

See: https://github.com/servo/webrender/pull/3108

Handing the bug to nical to investigate on Monday, with the following details:

- Opening https://www.uol.com.br/, then scrolling up and down a few times is a 100% repro for me.
- In this case, the assert comes during batching when it tries to resolve a BorderSource::Image.
- The image queue when processing the rasterized blob has length 0.
- Although it gets inserted into pending_image_requests, it never seems to get handled in add_rasterized_blob_images.

I assume that last bit of information is the cause, although I don't know why that's happening.
Assignee: gwatson → nical.bugzilla
I've created a chrome/userContent.css in the profile folder with * { text-shadow: none; } as content. Can't reproduce the crash with it so far.
Flags: needinfo?(nical.bugzilla)
Ok, that one was a little sneaky but turned out to be a simple bug: in the texture cache we have an automatic and a manual eviction policy. The manual eviction policy ensures blob images don't get automatically evicted from the cache, which could otherwise race with the asynchronous rasterization. The texture cache was missing a tiny piece of code to ensure the eviction policy was respected for shared cache entries (it does work for normal cache entries). I bet border images make use of these shared entries. While scrolling, the image eventually gets discarded, and the next time we see it we request the image. The problem is that when we decide whether a blob image is missing, we only look at whether there is an entry in the rasterized_blob_image map, but we don't check whether that entry contains any actual data to upload in its queue.

Before the regressing commit, the bug would occur without crashing, but we wouldn't necessarily upload all of the image (only the last available dirty region). With that commit we end up requesting something we think we have but don't have, and later panic with an empty-handed upload request.
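The two problems described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the actual change in the WebRender pull request: the `Eviction` and `Entry` types, the `shared` flag, the frame-based threshold and all function names are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical eviction policies: Auto entries may be dropped when stale,
/// Manual entries must stay until explicitly removed.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Eviction {
    Auto,
    Manual,
}

/// Hypothetical cache entry; `shared` marks entries living in a shared texture.
struct Entry {
    eviction: Eviction,
    last_used_frame: u64,
    shared: bool,
}

/// Buggy variant: the Manual policy is only honoured for non-shared entries,
/// so a stale shared blob entry is evicted anyway.
fn evict_old_buggy(entries: &mut HashMap<u32, Entry>, threshold: u64) {
    entries.retain(|_, e| {
        let protected = !e.shared && e.eviction == Eviction::Manual;
        protected || e.last_used_frame >= threshold
    });
}

/// Variant that respects the Manual policy for every entry.
fn evict_old_fixed(entries: &mut HashMap<u32, Entry>, threshold: u64) {
    entries.retain(|_, e| {
        e.eviction == Eviction::Manual || e.last_used_frame >= threshold
    });
}

/// Buggy "is the blob missing?" test: an entry with an empty queue of
/// rasterized data is treated as if the data were available.
fn is_missing_buggy(rasterized: &HashMap<u32, VecDeque<Vec<u8>>>, key: u32) -> bool {
    !rasterized.contains_key(&key)
}

/// Stricter test: the blob counts as missing if there is no entry
/// or if the entry's queue has nothing to upload.
fn is_missing_checked(rasterized: &HashMap<u32, VecDeque<Vec<u8>>>, key: u32) -> bool {
    rasterized.get(&key).map_or(true, |queue| queue.is_empty())
}
```

In this sketch, a shared entry with the manual policy disappears under `evict_old_buggy` once it is stale, and a key whose queue is empty is reported as present by `is_missing_buggy`, which mirrors the "requesting something we think we have but don't have" situation.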
Fixed in https://github.com/servo/webrender/pull/3111
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED