Closed Bug 1617524 Opened 4 years ago Closed 4 years ago

Crash in [@ core::option::expect_failed | webrender::spatial_tree::SpatialTree::get_relative_transform]

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- fixed

People

(Reporter: gsvelto, Assigned: gw)

References

(Regression)

Details

(Keywords: crash, regression, testcase)

Crash Data

Attachments

(5 files)

This bug is for crash report bp-4d67e4a7-c97a-4c22-8244-f7e6a0200223.

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:16
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 xul.dll core::ops::function::Fn::call<fn src/libcore/ops/function.rs:232
3 xul.dll std::panicking::rust_panic_with_hook src/libstd/panicking.rs:475
4 xul.dll std::panicking::begin_panic_handler src/libstd/panicking.rs:375
5 xul.dll core::panicking::panic_fmt src/libcore/panicking.rs:84
6 xul.dll core::option::expect_failed src/libcore/option.rs:1188
7 xul.dll webrender::spatial_tree::SpatialTree::get_relative_transform gfx/wr/webrender/src/spatial_tree.rs:297
8 xul.dll webrender::prim_store::SpaceMapper<webrender_api::units::LayoutPixel, webrender_api::units::PicturePixel>::set_target_spatial_node<webrender_api::units::LayoutPixel, webrender_api::units::PicturePixel> gfx/wr/webrender/src/prim_store/mod.rs:286
9 xul.dll webrender::picture::PicturePrimitive::post_update gfx/wr/webrender/src/picture.rs:5100

This seems to have started in buildid 20200130095303. The raw crash reason is: invalid parent!

I wonder if this is related to bug 1589800

Flags: needinfo?(mikokm)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #1)

I wonder if this is related to bug 1589800

It is difficult to say from just this information. It might be possible if the crashing sites use blend modes (or break RDL in other ways).

Flags: needinfo?(mikokm)
Blocks: wr-75
Priority: -- → P3

This looks like it might be fixed.

It looks like it might not be fixed, Miko - any other info that might help here?

Flags: needinfo?(mikokm)

(In reply to Jessie [:jbonisteel] pls NI from comment #4)

It looks like it might not be fixed, Miko - any other info that might help here?

The four crash reports we have received in the last week seem to be from the same person and within two minutes. Besides checking which site crashed, I do not think there is much we can do here.

Flags: needinfo?(mikokm)
Blocks: wr-76
No longer blocks: wr-75
Blocks: wr-77
No longer blocks: wr-76

Miko, it looks like this crash has increased a bit again

Flags: needinfo?(mikokm)
Flags: needinfo?(mikokm)

Let's continue to wait and see

Flags: needinfo?(mikokm)
Flags: needinfo?(mikokm)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)

Some more urls:
https://destroyallhumansgame.com/
https://www.crehana.com/home/
https://www.rickandmorty.com/videos/story-lord

Unfortunately none of these URLs crash the latest Nightly (WR enabled) on my Windows desktop (Nvidia GPU) or MBP.

Looking at the crash reports, it seems plausible that this is caused by bad interaction with WebRender and video decoding. For example, this crash1 has GraphicsCriticalError: "updating unknown shared surface" and it's followed by another crash2 four seconds later with "DXVA2D3D11? DXVA2D3D11+ " flags missing. More pairs like this: (3, 4), (5, 6), (7, 8).

Alternatively, this could be just GPU crash recovery not working properly.

I can repro on https://destroyallhumansgame.com/
https://crash-stats.mozilla.org/report/index/05c9f596-ac6a-4c90-86f5-7e3580200515#tab-bugzilla

The browser flashed white before the page fully loads. So to repro it 100%, the trick might be to keep the tab laoding that URL as he active tab(keep the tab in the foreground)

Edit: The page contents also end up on top of the URL bar. I tried to file a new bug, but bugzilla had some error)

Edit2: If I manually set gfx.webrender.compositor to false, I dont get the crash

Attached image bug.png

Regression points to :
2020-05-16T11:42:35.242000: DEBUG : Found commit message:
Bug 1631312. Move the check for UseWebRenderDCompWin(). r=sotaro

Bug 1630629 moved the check UseWebRenderDCompWin before it had been
initialized. This moves it below so that DirectComposition isn't
disabled everywhere.

Differential Revision: https://phabricator.services.mozilla.com/D71498

2020-05-16T11:42:35.242000: DEBUG : Did not find a branch, checking all integration branches
2020-05-16T11:42:35.242000: INFO : The bisection is done.
2020-05-16T11:42:35.242000: INFO : Stopped

Regressed by: 1631312
Attached file about:support
Flags: needinfo?(sotaro.ikeda.g)
Attached file testcase.html

This is the minimized testcase from https://destroyallhumansgame.com/ . This crashes reliably for me if I enable WR compositor (which is enabled default for me)
If I manually disable the WR compositor, the tab just... freezes
Edit: testcase fixed by https://phabricator.services.mozilla.com/D75465 , which landed today

Has Regression Range: --- → yes
Has STR: --- → yes

further regression with compositor force enabled, and gfx.webrender.quality.force-subpixel-aa-where-possible set to true


2020-05-17T17:45:07.048000: DEBUG : Found commit message:
Bug 1592891 - Disable picture caching when picture caches have complex transforms. r=kvark,nical

With the recent changes to compositing in WR, the scene must either
only produce produce cache tiles, or disable picture caching and
rasterize directly.

This patch removes the (currently broken) path where WR would attempt
to disable only the picture cache slice(s) that have a complex
transform.

In future, we should disable creating picture cache slices with
complex transforms at the API level, and remove this path completely.

Differential Revision: https://phabricator.services.mozilla.com/D51228

2020-05-17T17:45:07.048000: DEBUG : Did not find a branch, checking all integration branches
2020-05-17T17:45:07.048000: INFO : The bisection is done.
2020-05-17T17:45:07.064000: INFO : Stopped

Flags: needinfo?(gwatson)
Regressed by: 1592891

I wasn't able to reproduce this on any of the URLs or the attached test case.

I think I had all the config values set correctly, but it's possible I missed one, or there is something else different between our systems causing it to not occur here.

Could you change layout.display-list.retain in about:config to false, restart the browser and see if you can still reproduce the bug with that setting changed? That might help us identify if it's similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1589800.

Flags: needinfo?(gwatson)

(In reply to Glenn Watson [:gw] from comment #17)

I wasn't able to reproduce this on any of the URLs or the attached test case.

I think I had all the config values set correctly, but it's possible I missed one, or there is something else different between our systems causing it to not occur here.

Could you change layout.display-list.retain in about:config to false, restart the browser and see if you can still reproduce the bug with that setting changed? That might help us identify if it's similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1589800.

layout.display-list.retain = false or true has no effect on the crash. What makes Nightly crash are 2 options:

gfx.webrender.picture-caching = true (If I disable picture caching, I dont get any crash)
gfx.webrender.quality.force-subpixel-aa-where-possible = true (this pref wasn't needed before https://phabricator.services.mozilla.com/D75465)
and the compositor enabled.
Other factor could be intel iGPU notebook, screen resolution 1360x768

Flags: needinfo?(gwatson)

I've re-tested with the attached test case and URLs, and ensuring that those values are set. I tested at the 1366 x 768 resolution you listed, and some other resolutions, on both Linux and Windows. I also tried with both compositor enabled/disabled.

So far, it still doesn't reproduce locally for me, so there must be some other factor.

Miko, Jeff, could you see if you can reproduce locally?

Flags: needinfo?(mikokm)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(gwatson)

I was able to repro the crash using https://destroyallhumansgame.com/. The other URLs given produced no crashes for me.

(In reply to Mayank Bansal from comment #18)

gfx.webrender.picture-caching = true (If I disable picture caching, I dont get any crash)
gfx.webrender.quality.force-subpixel-aa-where-possible = true (this pref wasn't needed before https://phabricator.services.mozilla.com/D75465)
and the compositor enabled.
Other factor could be intel iGPU notebook, screen resolution 1360x768

This was great information. I can reproduce the crash with the minified testcase on my desktop machine with Nvidia/Win10, when I enable gfx.webrender.quality.force-subpixel-aa-where-possible = true.

Flags: needinfo?(mikokm)
Attached file about:support

I was able to repro the crash using the minimized testcase. I've attached my about:support output for debugging purposes.

I'm not sure what changed overnight, but I can now reproduce this on my current local build with the test case, investigating today.

Assignee: nobody → gwatson
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(jmuizelaar)

I can see what is occurring, and why it only triggers in these very specific circumstances.

For reference, the spatial tree for the test case is below.

We end up with an invalid state for picture caching, due to finding a scrollframe embedded within a reference frame that contains a perspective transform.

We can't detect this during scene building, as the value of the transform may be animated by property animation during frame building.

Normally, when we encounter this case we temporarily disable picture caching and draw straight to the framebuffer. However, when we have native OS compositor enabled, the code to disable picture caching fails, since native compositor mode relies on picture caching being enabled.

There are a number of potential fixes for this case, including:

  • At the start of the frame build, change the spatial node of the problematic picture cache to be the root spatial node. This would invalidate things but mean we don't hit the assertion in this case.
  • Support a mode with native compositor enabled where we create a single framebuffer sized tile, and draw to that with picture caching disabled, similar to how we currently do without native compositing.

I'll experiment with these and see if I run into any unforeseen issues.

┌ spatial tree
│  ├─ ReferenceFrame
│  │  ├─ kind: Transform
│  │  ├─ transform_style: Flat
│  │  ├─ source_transform: Value([I])
│  │  ├─ origin_in_parent_reference_frame: (0.0,0.0)
│  │  ├─ index: SpatialNodeIndex(0)
│  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  ├─ snapping_transform: Some(ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) })
│  │  ├─ coordinate_system_id: CoordinateSystemId(0)
│  │  ├─ ScrollFrame
│  │  │  ├─ viewport: Rect(1920.0×2099.0 at (0.0,0.0))
│  │  │  ├─ scrollable_size: 0.0×0.0
│  │  │  ├─ scroll offset: (0.0,0.0)
│  │  │  ├─ external_scroll_offset: (0.0,0.0)
│  │  │  ├─ kind: PipelineRoot
│  │  │  ├─ index: SpatialNodeIndex(1)
│  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  ├─ snapping_transform: Some(ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) })
│  │  │  ├─ coordinate_system_id: CoordinateSystemId(0)
│  │  │  ├─ ReferenceFrame
│  │  │  │  ├─ kind: Transform
│  │  │  │  ├─ transform_style: Flat
│  │  │  │  ├─ source_transform: Value([I])
│  │  │  │  ├─ origin_in_parent_reference_frame: (0.0,112.0)
│  │  │  │  ├─ index: SpatialNodeIndex(2)
│  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  ├─ snapping_transform: Some(ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) })
│  │  │  │  ├─ coordinate_system_id: CoordinateSystemId(0)
│  │  │  │  ├─ ScrollFrame
│  │  │  │  │  ├─ viewport: Rect(1920.0×1989.0 at (0.0,0.0))
│  │  │  │  │  ├─ scrollable_size: 0.0×0.0
│  │  │  │  │  ├─ scroll offset: (0.0,0.0)
│  │  │  │  │  ├─ external_scroll_offset: (0.0,0.0)
│  │  │  │  │  ├─ kind: PipelineRoot
│  │  │  │  │  ├─ index: SpatialNodeIndex(3)
│  │  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  │  ├─ snapping_transform: Some(ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) })
│  │  │  │  │  ├─ coordinate_system_id: CoordinateSystemId(0)
│  │  │  │  │  ├─ ScrollFrame
│  │  │  │  │  │  ├─ viewport: Rect(1920.0×1989.0 at (0.0,0.0))
│  │  │  │  │  │  ├─ scrollable_size: 0.0×0.0
│  │  │  │  │  │  ├─ scroll offset: (0.0,0.0)
│  │  │  │  │  │  ├─ external_scroll_offset: (0.0,0.0)
│  │  │  │  │  │  ├─ kind: Explicit
│  │  │  │  │  │  ├─ index: SpatialNodeIndex(4)
│  │  │  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) }
│  │  │  │  │  │  ├─ snapping_transform: Some(ScaleOffset { scale: (1.0,1.0), offset: (0.0,112.0) })
│  │  │  │  │  │  └─ coordinate_system_id: CoordinateSystemId(0)
│  │  │  │  │  ├─ ReferenceFrame
│  │  │  │  │  │  ├─ kind: Transform
│  │  │  │  │  │  ├─ transform_style: Flat
│  │  │  │  │  │  ├─ source_transform: Value([0.8660254, 0.5, 0.0, 0.0, -0.5, 0.8660254, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 133.58435, -478.66864, 0.0, 1.0])
│  │  │  │  │  │  ├─ origin_in_parent_reference_frame: (12.0,12.0)
│  │  │  │  │  │  ├─ index: SpatialNodeIndex(5)
│  │  │  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  ├─ snapping_transform: None
│  │  │  │  │  │  ├─ coordinate_system_id: CoordinateSystemId(1)
│  │  │  │  │  │  ├─ ScrollFrame
│  │  │  │  │  │  │  ├─ viewport: Rect(1920.0×19.875 at (0.0,0.0))
│  │  │  │  │  │  │  ├─ scrollable_size: 0.0×1908.0
│  │  │  │  │  │  │  ├─ scroll offset: (0.0,0.0)
│  │  │  │  │  │  │  ├─ external_scroll_offset: (0.0,0.0)
│  │  │  │  │  │  │  ├─ kind: Explicit
│  │  │  │  │  │  │  ├─ index: SpatialNodeIndex(6)
│  │  │  │  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  │  ├─ snapping_transform: None
│  │  │  │  │  │  │  └─ coordinate_system_id: CoordinateSystemId(1)
│  │  │  │  │  ├─ ReferenceFrame
│  │  │  │  │  │  ├─ kind: Transform
│  │  │  │  │  │  ├─ transform_style: Flat
│  │  │  │  │  │  ├─ source_transform: Value([0.8660254, 0.5, 0.0, 0.0, -0.5, 0.8660254, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 140.6156, -476.7846, 0.0, 1.0])
│  │  │  │  │  │  ├─ origin_in_parent_reference_frame: (12.0,12.0)
│  │  │  │  │  │  ├─ index: SpatialNodeIndex(7)
│  │  │  │  │  │  ├─ content_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  ├─ viewport_transform: ScaleOffset { scale: (1.0,1.0), offset: (0.0,0.0) }
│  │  │  │  │  │  ├─ snapping_transform: None
│  │  │  │  │  │  └─ coordinate_system_id: CoordinateSystemId(2)

Previously, WR would attempt to detect at the start of frame
building if the spatial node of any picture cache contained
a non-axis-aligned transform, and disable picture caching in
that edge case.

However, picture caching can't (currently) be disabled when the
native compositor is active. In this mode, picture caching was
force enabled, causing an assertion failure due to unexpected
coordinate systems when updating pictures.

This patch changes the way the detection of scroll root logic
works such that we don't consider any scroll frame inside a
reference frame to be a valid scroll root for picture caching
purposes. Thus it's not possible to create a picture cache
where the reference spatial node has a non-axis-aligned transform.

I think the attached patch is a good solution, although I need to do a try run and make sure it doesn't break anything else.

The only issue I can see this causing is if we ever insert zooming reference frame transforms as parents of scroll frames. From what I can tell, I don't think we do this (I think the zooming reference frame is a child of the scroll root). If we do, we can still use this method, but we'd need to add an additional tag to reference frames, which says "This is a zoom transform, it will always be axis-aligned" or similar.

kats, botond, jnicol, does that sound fine to you?

Flags: needinfo?(kats)
Flags: needinfo?(jnicol)
Flags: needinfo?(botond)
Severity: normal → S3

If I understand correctly (and it's quite possible that I don't), I think this is ok. My understanding is that the zooming reference frame is descendant of the root scrollframe for a pipeline, because there are things inside that scrollframe (e.g. scrollbars) that are outside the zooming reference frame and don't get affected by zooming. So your code that discards any scrollframes inside reference frames should be ok, as it will still find the root scrollframe for the pipeline as being outside any reference frames.

(In reply to Glenn Watson [:gw] from comment #26)

If we do, we can still use this method, but we'd need to add an additional tag to reference frames, which says "This is a zoom transform, it will always be axis-aligned" or similar.

This should be easy to do if it's needed, as we can easily identify (on the gecko side) the stacking context/reference frame that was inserted for the zoom transform.

Flags: needinfo?(kats)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)

If I understand correctly (and it's quite possible that I don't), I think this is ok. My understanding is that the zooming reference frame is descendant of the root scrollframe for a pipeline, because there are things inside that scrollframe (e.g. scrollbars) that are outside the zooming reference frame and don't get affected by zooming. So your code that discards any scrollframes inside reference frames should be ok, as it will still find the root scrollframe for the pipeline as being outside any reference frames.

(In reply to Glenn Watson [:gw] from comment #26)

If we do, we can still use this method, but we'd need to add an additional tag to reference frames, which says "This is a zoom transform, it will always be axis-aligned" or similar.

This should be easy to do if it's needed, as we can easily identify (on the gecko side) the stacking context/reference frame that was inserted for the zoom transform.

It does seem that on android there are reference frames inserted between the main scroll frame and the pipeline root. Below is the spatial tree for a reftest on android that is failing. It's not clear to me yet why there are two reference frames with property animations. I'll do some more investigations on the Gecko side of what's setting them.

ReferenceFrame
kind: Transform
transform_style: Flat
source_transform: Value([I])
origin_in_parent_reference_frame: (0.0,0.0)
index: SpatialNodeIndex(0)
  ScrollFrame
  viewport: Rect(800.0×1136.0 at (0.0,0.0))
  scrollable_size: 0.0×0.0
  scroll offset: (0.0,0.0)
  external_scroll_offset: (0.0,0.0)
  kind: PipelineRoot
  index: SpatialNodeIndex(1)
    ReferenceFrame
    kind: Transform
    transform_style: Flat
    source_transform: Value([I])
    origin_in_parent_reference_frame: (0.0,0.0)
    index: SpatialNodeIndex(2)
      ScrollFrame
      viewport: Rect(800.0×1000.0 at (0.0,0.0))
      scrollable_size: 0.0×0.0
      scroll offset: (0.0,0.0)
      external_scroll_offset: (0.0,0.0)
      kind: PipelineRoot
      index: SpatialNodeIndex(3)
        ReferenceFrame
        kind: Transform
        transform_style: Flat
        source_transform: Binding(PropertyBindingKey { id: PropertyBindingId { namespace: IdNamespace(11895), uid: 1 }, _phantom: PhantomData }, [I])
        origin_in_parent_reference_frame: (0.0,0.0)
        index: SpatialNodeIndex(4)
          ScrollFrame
          viewport: Rect(325.0×406.25 at (0.0,0.0))
          scrollable_size: 0.0×7837.783
          scroll offset: (0.0,0.0)
          external_scroll_offset: (0.0,0.0)
          kind: Explicit
          index: SpatialNodeIndex(5)
        ReferenceFrame
        kind: Transform
        transform_style: Flat
        source_transform: Binding(PropertyBindingKey { id: PropertyBindingId { namespace: IdNamespace(11895), uid: 2 }, _phantom: PhantomData }, [I])
        origin_in_parent_reference_frame: (0.0,0.0)
        index: SpatialNodeIndex(6)
    ReferenceFrame
    kind: Transform
    transform_style: Flat
    source_transform: Value([I])
    origin_in_parent_reference_frame: (0.0,1000.0)
    index: SpatialNodeIndex(7)
      ScrollFrame
      viewport: Rect(800.0×136.0 at (0.0,0.0))
      scrollable_size: 0.0×0.0
      scroll offset: (0.0,0.0)
      external_scroll_offset: (0.0,0.0)
      kind: PipelineRoot
      index: SpatialNodeIndex(8)
Flags: needinfo?(jnicol)
Flags: needinfo?(botond)

Patch is updated with support for Zoom reference frame kinds, which fixes the android test failures and retains the original bug fix.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b04340bad99d
Fix crash in get_relative_transform edge case. r=jnicol
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: