Crash in webrender::clip_scroll_tree::ClipScrollTree::get_relative_transform

RESOLVED FIXED in Firefox 65

Status

defect
P2
critical
RESOLVED FIXED
9 months ago
7 months ago

People

(Reporter: calixte, Assigned: gw)

Tracking

(Blocks 3 bugs, {crash, regression})

Trunk
mozilla65
Unspecified
Windows 10
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox63 unaffected, firefox64 unaffected, firefox65 fixed)

Details

(crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is
report bp-a8e5a4cc-a547-4cdb-a51c-bc9240181027.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll static void std::panicking::rust_panic_with_hook src/libstd/panicking.rs:485
1 xul.dll static void std::panicking::continue_panic_fmt src/libstd/panicking.rs:390
2 xul.dll static void std::panicking::rust_begin_panic src/libstd/panicking.rs:325
3 xul.dll static void core::panicking::panic_fmt src/libcore/panicking.rs:77
4 xul.dll static void core::panicking::panic_bounds_check src/libcore/panicking.rs:59
5 xul.dll static union core::option::Option<euclid::transform3d::TypedTransform3D<f32, webrender_api::units::LayoutPixel, webrender_api::units::LayoutPixel>> webrender::clip_scroll_tree::ClipScrollTree::get_relative_transform gfx/webrender/src/clip_scroll_tree.rs:158
6 xul.dll static void webrender::prim_store::PrimitiveStore::update_picture gfx/webrender/src/prim_store.rs:1667
7 xul.dll static void webrender::prim_store::PrimitiveStore::update_picture gfx/webrender/src/prim_store.rs:1672
8 xul.dll static void webrender::prim_store::PrimitiveStore::update_picture gfx/webrender/src/prim_store.rs:1672
9 xul.dll static void webrender::prim_store::PrimitiveStore::update_picture gfx/webrender/src/prim_store.rs:1672

=============================================================

There are 45 crashes (from 32 installations) in nightly 65 starting with buildid 20181026100128. Analysis of the backtrace suggests the regression may have been introduced by patch [1], which fixed bug 1501675.

[1] https://hg.mozilla.org/mozilla-central/rev?node=13372afaba77
Flags: needinfo?(kats)
Priority: -- → P2
From the backtraces, this looks to be an array bounds indexing bug.

It's one of those bugs that is probably a trivial fix once we have a repro, but very difficult to work out what's going on otherwise.

Do we have any URL and/or repro steps for this?
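
For readers unfamiliar with the `panic_bounds_check` frame in the backtrace: this is what Rust emits when a slice or `Vec` is indexed past its length. The sketch below is only an illustration, not WebRender code; `Node`, `offset_indexed` and `offset_get` are hypothetical names. It contrasts direct indexing, which panics on a bad index, with a `get` lookup, which returns `None`.

```rust
/// Hypothetical stand-in for a per-node record; not WebRender's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Node {
    offset: f32,
}

/// Direct indexing: panics (via `core::panicking::panic_bounds_check`)
/// when `index >= nodes.len()`.
fn offset_indexed(nodes: &[Node], index: usize) -> f32 {
    nodes[index].offset
}

/// Lookup through `get`: returns `None` instead of panicking
/// when `index` is out of range.
fn offset_get(nodes: &[Node], index: usize) -> Option<f32> {
    nodes.get(index).map(|n| n.offset)
}
```

Using `get` only turns the crash into a recoverable error; it does not explain why the index was wrong in the first place, which is the open question here.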
Flags: needinfo?(gwatson)
I think it's also likely that this crash existed previously, but has just been moved around to a different part of the code, as a result of the patch referenced above.

I vaguely recall seeing this crash signature in a previous bug, but I don't see it anymore in the P2/P3 bug lists - I might just have missed it.
Unfortunately, there are neither URLs nor user comments in the reports.
All the crashes are in the GPU process (and it seems that we don't generally have URLs for GPU process crashes).
(In reply to Glenn Watson [:gw] from comment #3)
> I think it's also likely that this crash existed previously, but has just
> been moved around to a different part of the code, as a result of the patch
> referenced above.
> 
> I vaguely recall seeing this crash signature in a previous bug, but I don't
> see it anymore in the P2/P3 bug lists - I might just have missed it.

Bug 1486218 is probably it. It is blocked against wr-stage-next, not wr-stage-trains :).
See Also: → 1486218
I'll take this for now, and see if I can puzzle it out the hard way. It is the top crash WR users experience on nightly, so it is only going to get worse on the next beta cycle.
Assignee: nobody → aosmond
I have STR from nightly on macOS:

- Go to https://www.theburgerspriest.com/secret-menu/
- Click on "370 days"
- Click on the "Religious Hypocrite" or the "Jarge Style Shake"

https://crash-stats.mozilla.com/report/index/578e4e4b-4704-4bee-98d2-df64e0181109
(This is in my regular browsing profile, I didn't try on a clean one. Let me know if you can/cannot repro.. will check back after I pick up some burgers :))
Thanks kats! I was able to reproduce as well. Looks like it is the signature variant from bug 1486218, but I suspect the same root cause.
Examining it in gdb (unfortunately no luck with rr reproduction) has convinced me they are indeed the same bug. In the signature for this bug, we overflowed the buffer boundary. In the signature for the other bug, the underlying allocated buffer was large enough to hold the dereferenced element, but fewer values had actually been stored in it (i.e. we just read garbage).
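
To make the distinction between the two signatures concrete, here is a minimal sketch, assuming the buffer behaves like a `Vec` with separate length and capacity. `Slot` and `classify` are hypothetical names for illustration and do not exist in WebRender.

```rust
/// Where an index falls relative to a Vec's length and capacity.
#[derive(Debug, PartialEq)]
enum Slot {
    /// index < len: a properly stored value.
    Initialized,
    /// len <= index < capacity: memory is allocated, but no value was stored.
    AllocatedButUnset,
    /// index >= capacity: past the end of the allocation.
    OutOfAllocation,
}

/// Classifies `index` against `buf`. Takes `&Vec<T>` rather than a slice
/// because a slice does not expose the capacity.
fn classify<T>(buf: &Vec<T>, index: usize) -> Slot {
    if index < buf.len() {
        Slot::Initialized
    } else if index < buf.capacity() {
        Slot::AllocatedButUnset
    } else {
        Slot::OutOfAllocation
    }
}
```

Under this reading, the signature in this bug corresponds to `OutOfAllocation` and the one in bug 1486218 to `AllocatedButUnset`.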
Duplicate of this bug: 1507323
This fixes kats' STR, although I doubt it is the proper fix. I have a reproduction in rr by modifying the STR to iterate through all of the burgers if not hit immediately, until it happens. Unfortunately rr isn't cooperating and besides the one time it hit my breakpoint, I haven't been able to get back to where it adds the spatial nodes in the first place (to figure out why the IDs are bad).
So it looks like the spatial node hierarchy is correct and the invariant of the method is true relative to that -- it seems it is the coordinate system ID hierarchy which violates that assumption.
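
As a rough illustration of that kind of violated assumption (this is not the code from the WebRender patch, and `CoordSystem` and `path_to_ancestor` are hypothetical names): a walk up a parent chain is only safe if the target really is an ancestor of the starting point. The sketch below returns `None` rather than panicking when that does not hold.

```rust
/// Hypothetical coordinate-system record: each entry names its parent by index.
struct CoordSystem {
    parent: Option<usize>,
}

/// Walks from `child` up the parent chain until `ancestor` is reached.
/// Returns the visited indices (child first, ancestor excluded), or `None`
/// if `ancestor` is not on the chain, an index is out of range, or the
/// chain contains a cycle.
fn path_to_ancestor(
    systems: &[CoordSystem],
    child: usize,
    ancestor: usize,
) -> Option<Vec<usize>> {
    let mut path = Vec::new();
    let mut current = child;
    while current != ancestor {
        // Out-of-range index: give up instead of panicking.
        let system = systems.get(current)?;
        path.push(current);
        // Reached a root without meeting `ancestor`: the assumption was false.
        current = system.parent?;
        // More steps than entries means an index repeated, i.e. a cycle.
        if path.len() > systems.len() {
            return None;
        }
    }
    Some(path)
}
```

A caller that assumes the ancestor relationship from one hierarchy, then walks a second hierarchy that disagrees with it, would hit exactly the `None` cases above.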
Opened a PR with a fix - https://github.com/servo/webrender/pull/3316 - still needs a try run to see if it breaks anything, but it fixes the https://www.theburgerspriest.com/secret-menu/ case for me locally at least.
The try run is good, and this is merging now, so should be in the next WR update for nightly.
I'm going to land servo/webrender#3316 out-of-band as part of this bug, as it resolves the crashtest failures that are blocking the regular WR update process.
Assignee: aosmond → gwatson
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0f0b77b3d36e
Cherry pick webrender commit 7f889ccf165ef0bcbf3688ccb1c51bddd84a7b6f (WR PR #3316). r=kats
Duplicate of this bug: 1507705
https://hg.mozilla.org/mozilla-central/rev/0f0b77b3d36e
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Duplicate of this bug: 1508120
I am still getting this issue. As mentioned in bug 1507323, which was duped to this bug, visiting https://drive.google.com causes an instant crash that takes out the entire browser. Today's crash is https://crash-stats.mozilla.com/report/index/3d29810a-ba7d-4d77-b337-17cd30181114
Status: RESOLVED → REOPENED
Flags: needinfo?(gwatson)
Resolution: FIXED → ---
I'll un-dupe bug 1507323 to track the remaining crashes.
Status: REOPENED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(gwatson)
Resolution: --- → FIXED
Still crashes on https://www.rondomark.jp/

Firefox nightly 66.0a1 (2018-12-20) (64-bit), Build ID 20181220215605. Windows 10 1809 64-bit.
Same crash signature as resolved duplicate bug 1507705.
bp-c681d5be-233d-45e7-941c-351040181221
See Also: → 1515932