1492241 - Crash in core::option::expect_failed | core::iter::{{impl}}::next<T>

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is report bp-3b62a536-27d4-4f60-a236-ccf0e0180915. ============================================================= Small volume Win 10 crash spotted during nightly crash triage: https://bit.ly/2NmprgM. Crashes started using 20180914220208. Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=efccb758c78cc4c42170c886294bf335d90e4383&tochange=750e71a8f79b3b745a6653e555e60acb122f1241 There was a webrender update in that changeset. Top 10 frames of crashing thread: 0 xul.dll static void std::panicking::rust_panic_with_hook src/libstd/panicking.rs:521 1 xul.dll static void std::panicking::continue_panic_fmt src/libstd/panicking.rs:426 2 xul.dll static void std::panicking::rust_begin_panic src/libstd/panicking.rs:337 3 xul.dll static void core::panicking::panic_fmt src/libcore/panicking.rs:92 4 xul.dll static void core::option::expect_failed src/libcore/option.rs:960 5 xul.dll static union core::option::Option<webrender_bindings::moz2d_renderer::{{impl}}::rasterize::Job> core::iter::{{impl}}::next<webrender_bindings::moz2d_renderer::{{impl}}::rasterize::Job, core::slice::Iter<webrender_api::image::BlobImageParams>, closure> src/libcore/iter/mod.rs:1326 6 xul.dll static struct alloc::vec::Vec< gfx/webrender_bindings/src/moz2d_renderer.rs:358 7 xul.dll static void webrender::resource_cache::ResourceCache::block_until_all_resources_added gfx/webrender/src/resource_cache.rs:1365 8 xul.dll static struct webrender::tiling::Frame webrender::frame_builder::FrameBuilder::build gfx/webrender/src/frame_builder.rs:368 9 @0x20f3fa83b87 =============================================================

Darkspirit

Comment 1

•

7 years ago

> no entry found for key

Blocks: wr-stability, stage-wr-trains

Priority: -- → P3

Updated

•

7 years ago

Priority: P3 → P4

Marcia Knous [:marcia]

Reporter

Comment 2

•

7 years ago

Last crashes were in the 10-10 build. No crashes seen since then.

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox64: affected → disabled

status-firefox65: --- → affected

status-firefox-esr60: --- → unaffected

Darkspirit

Updated

•

7 years ago

Updated

•

7 years ago

Crash Signature: [@ core::option::expect_failed | core::iter::{{impl}}::next<T>] → [@ core::option::expect_failed | core::iter::{{impl}}::next<T>] [@ core::option::expect_failed | <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next]

Priority: P4 → P2

Matt Woodrow (:mattwoodrow)

Comment 4

•

7 years ago

From bug 1508714: (In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #1) > According to Socorro (Bugzilla Crash Stop) and Mozregression: > last build without this crash signature: 20181119100448 > first build with this crash signature: 20181119220031 > https://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=e44bb5b4bc79be613d29b3f95d7b508e68e3d128&tochange=bd4c > ebdbed4bcd34d449f67d51f139f9bdf75edc

Matt Woodrow (:mattwoodrow)

Comment 5

•

7 years ago

Increased the priority here since it looks like this has spiked to the #1 GPU process crasher now.

Matt Woodrow (:mattwoodrow)

Comment 6

•

7 years ago

There are only 2 WR changes in that regression range, and one is just a version bump for a dependency (euclid). Seems like that this is due to https://github.com/servo/webrender/pull/3323

Matt Woodrow (:mattwoodrow)

Comment 7

•

7 years ago

Tried to untangle this, but the logic is pretty complicated. What I think is happening is: We're crashing trying to access Moz2DBlobRasterizer::blob_commands[key] [1], because there's a key in the requests parameter that we don't have in our hashmap. The callstack is coming from rasterize_missing_blob_images (inlined into block_until_all_resources_added), and that calls self.blob_image_handler.prepare_resources. prepare_resources checks the blob_commands HashMap on BlobImageHandler, so the relevant key must be in that one. Moz2DBlobRasterizer is created with a clone of the blob_commands from BlobImageHandler, so it seems like we must have added a new blob since we created the rasterizer, or the objects are actually unrelated. It appears that the normal process is that we get ApiMsg::UpdateDocument on the render_backend, that updates the BlobImageHandler with new blobs, creates a Moz2DBlobRasterizer, and then sends it all to the scene builder. The scene builder rasterizes blobs, and then returns the Moz2DBlobRasterizer [2] to the render_backend's resource cache. It also seems like ApiMsg::UpdateResources also adds new blobs to the BlobImageHandler, but doesn't do scene building, so it I think the Moz2DBlobRasterizer's internal map of blobs will now be out of date. Should we be recreating the blob rasterizing with a new clone of the blob_commands map? What if there's an existing scene build in-progress, how do we handle ordering of those and making sure we don't clobber our new rasterizer with an older one? It's not super obvious why Bobby's patch regressed this, though I guess just eviction timing changes resulting in us needing to rasterize missing blobs in different situations. [1] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/gfx/webrender_bindings/src/moz2d_renderer.rs#475 [2] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/gfx/wr/webrender/src/render_backend.rs#770

Flags: needinfo?(nical.bugzilla)

Matt Woodrow (:mattwoodrow)

Comment 8

•

7 years ago

It might be simpler to just persist a blob rasterizer on each of the render_backend and scene_builders, instead of trying to pass them around. Then each scene builder request can pass a clone of the current blob commands to be integrated into the existing rasterizer, and when done we can just discard the commands instead of trying to return them (since the render_backend already has them, and they're guaranteed to be the same, or newer).

Marcia Knous [:marcia]

Reporter

Updated

•

7 years ago

Keywords: topcrash

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

See Also: → https://github.com/servo/webrender/pull/3351

Maire Reavy [:mreavy]

Comment 9

•

7 years ago

Hi Nical -- This is an important one. Can you take this?

Assignee: nobody → nical.bugzilla

Nicolas Silva [:nical]

Assignee

Comment 10

•

7 years ago

> Hi Nical -- This is an important one. Can you take this? Yep > It's not super obvious why Bobby's patch regressed this, though I guess just eviction timing changes resulting in us needing to rasterize missing blobs in different situations. Blobs should not be affected by the eviction timings (they are always manually evicted from the cache), or if they are it's another bug to look at.

Flags: needinfo?(nical.bugzilla)

Bobby Holley (:bholley)

Comment 11

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #6) > There are only 2 WR changes in that regression range, and one is just a > version bump for a dependency (euclid). > > Seems like that this is due to https://github.com/servo/webrender/pull/3323 Those are the only two changes to upstream WebRender, and as has been noted, neither is a particularly likely candidate. However, it's quite a big range, and there's certainly some other interesting stuff in there that might have triggered this. Bug 1446309, bug 1477551, and bug 1486958 all seem like they could be related here.

Jeff Muizelaar [:jrmuizel]

Comment 12

•

7 years ago

I put up https://github.com/servo/webrender/pull/3371 to maybe help.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1511244

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1510882

Jeff Muizelaar [:jrmuizel]

Comment 13

•

7 years ago

So https://crash-stats.mozilla.com/report/index/fc9523ac-16a6-4b8a-84c3-3ec010181130 and https://crash-stats.mozilla.com/report/index/3762f238-2e31-4e9d-86f6-e64170181130 are crashes that happened after bug 1510882 / pull/3371 landed. However we're still crashing in the same place. This is very surprising as the crash should have moved earlier.

Jeff Muizelaar [:jrmuizel]

Comment 14

•

7 years ago

So it looks like the inconsistency probably comes from: https://searchfox.org/mozilla-central/rev/8f0db72fb6e35414fb9a6fc88af19c69f332425f/gfx/webrender_bindings/src/moz2d_renderer.rs#593 We clone the blob_commands table there and so it's believable that inconsistencies between the two arise.

Andrew Osmond [:aosmond] (he/him)

Comment 15

•

7 years ago

I poked around the push logs for when this signature first appeared, but we enabled gfx.webrender.qualified.all on nightly around this time (20180913100107) so I suspect it was already present. Earliest signatures for reference: https://crash-stats.mozilla.com/report/index/3b62a536-27d4-4f60-a236-ccf0e0180915 [20180914220208, Windows] https://crash-stats.mozilla.com/report/index/4e8df691-9d4f-4c9d-8bdb-cdbe20181123 [20181119220031, OSX] https://crash-stats.mozilla.com/report/index/af5bad9d-70f0-4ed1-8d73-cf83b0181120 [20181119220031, Linux]

Andrew Osmond [:aosmond] (he/him)

Comment 16

•

7 years ago

Here is a theory that kvark and I hashed out: On the first pass of prepare_transaction: 1) Resource update contains only an AddBlobImage, but it is not visible according to viewport_tiles (set in SetBlobImageVisibleArea) 2) We call ResourceCache::create_blob_scene_builder_requests but since the image is not visible, it doesn't create any BlobImageParams entries (containing the blob image key) to be stored in txn.blob_requests 3) Since the transaction has no blob image updates, we can take the shortcut to call RenderBackend::update_document directly 4) We didn't call ResourceCache::set_blob_rasterizer with the updated txn.blob_rasterizer. On the second pass of prepare_transaction: 1) Resource update contains only an SetBlobImageVisibleArea 2) We don't call ResourceCache::create_blob_scene_builder_requests since there is no Add/UpdateBlobImage entries. 3) Since the transaction has no blob image updates, we can take the shortcut to call RenderBackend::update_document directly 4) We are missing the blob we added in the previous step in ResourceCache::request_image, and it adds it to missing_blob_images. 5) We try to process missing_blob_images in ResourceCache::rasterize_missing_blob_images with the old blob_rasterizer. 6) Crash.

Darkspirit

Comment 17

•

7 years ago

bp-25efa768-2eff-4c1a-b7fe-bb3390181130 I marked text in a bugzilla comment box and pressed Del.

Andrew Osmond [:aosmond] (he/him)

Comment 18

•

7 years ago

Putting together a fix now, fingers crossed this is it.

Andrew Osmond [:aosmond] (he/him)

Comment 19

•

7 years ago

https://github.com/servo/webrender/pull/3378

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1511663

Andrew Osmond [:aosmond] (he/him)

Comment 20

•

7 years ago

(In reply to Andrew Osmond [:aosmond] from comment #18) > Putting together a fix now, fingers crossed this is it. Doesn't look like it. Sadness.

Andrew Osmond [:aosmond] (he/him)

Comment 21

•

7 years ago

I reproduced locally readily outside of rr, and with some effort, in as well. STR: 1) Comment out https://searchfox.org/mozilla-central/rev/3fdc51e66c9487b39220ad58dcee275fca070ccd/gfx/wr/webrender/src/scene_builder.rs#471-476 to force us to always create blobs on the RenderBackend thread. 2) MOZ_CHAOSMODE=1 ./mach run https://en.wikipedia.org/wiki/Fedora_(operating_system)#Releases 3) Ensure you are scrolled to the top of the page. 4) Resize vigorously until it crashes. This confirmed nical's theory that the a low priority and high priority request raced. They both created their own rasterizer, the order got inverted, and when we tried to rasterize the missing blob, our snapshot was missing the key added in the other transaction.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1512156

Bobby Holley (:bholley)

Updated

•

7 years ago

Blocks: 1510328

Dirkjan Ochtman (:djc)

Comment 23

•

7 years ago

Just got https://crash-stats.mozilla.com/report/index/bc16ca25-ebec-4259-ae08-a18a70181210 on macOS. Different or the same?

Andrew Osmond [:aosmond] (he/him)

Comment 24

•

7 years ago

I don't think what landed can solve the priority inversion I found. Trying to reproduce again... I have a patch to force the correct ordering by forcing high priority scene builder transactions follow the low priority route if it has blob images to rasterize. Probably not ideal however.

Dzmitry Malyshau [:kvark]

Comment 25

•

7 years ago

Linked to another solution (which is simple and appears to fix the issue).

See Also: → https://github.com/servo/webrender/pull/3398

Sotaro Ikeda [:sotaro]

Updated

•

7 years ago

Blocks: 1494164

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1512730

Andrew Osmond [:aosmond] (he/him)

Comment 26

•

7 years ago

\o/ Looks like bug 1512730 fixed the crash.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 27

•

7 years ago

Indeed! Let's mark it fixed until proven otherwise.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Darkspirit

Comment 28

•

7 years ago

Should bug 1512730 be uplifted to 65?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 29

•

7 years ago

Yeah, that would be good. :aosmond or :kvark, do you want to request uplift on that WR update?

status-firefox66: --- → fixed

Target Milestone: --- → mozilla66

Dzmitry Malyshau [:kvark]

Comment 30

•

7 years ago

kats, the request is up at https://bugzilla.mozilla.org/show_bug.cgi?id=1512730#c5

Darkspirit

Updated

•

7 years ago

status-firefox65: affected → fixed

Comment hidden (Intermittent Failures Robot)