Closed Bug 1363347 Opened 3 years ago Closed 3 years ago

Crash [@ mozalloc_abort | abort | webrender::frame::Frame::flatten_items ]

Categories

(Core :: Graphics: WebRender, defect, P3)

55 Branch
x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: darkspirit, Assigned: aosmond)

References

Details

(Keywords: crash, Whiteboard: gfx-noted)

Crash Data

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170509100208

Steps to reproduce:

bp-cb24c36d-8e1d-4b5c-9ed4-f296d0170509

STR of bug 1360613

set gfx.webrender.enabled to true
open any youtube video (/watch?v=xxxxxxxxx)
click on the fullscreen button on the bottom right of the video
Firefox crashes, I hear the audio playing for the few seconds even the firefox window is already gone.
With webrender disabled, everything works as expected.
Blocks: 1360613
Crash Signature: [@ mozalloc_abort | abort | webrender::frame::Frame::flatten_items ]
Has STR: --- → yes
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Thanks for filing this!

@kvark, this looks like potentially an issue in WR. Do we have a strategy for tackling these kinds of problems? If needed I can try to repro this and get a recording, which might help determine if it's a bug in WR or in the gecko code talking to WR. Or if you want you can try to debug it directly.
Flags: needinfo?(kvark)
See Also: → 1344190
Blocks: webrender
@kats I don't think we have a strategy about this just yet. If you can repro it and reduce the case to something local to WR, that would be terrific. In any case, we may ask @Gankro to add some asserts to the new flattening code in order to catch the problem earlier.
Flags: needinfo?(kvark) → needinfo?(a.beingessner)
On that day I downloaded Servo for Linux64, openend some websites, even youtube, and at some point Servo crashed with the same reason Firefox crashed here: "called `Option::unwrap()` on a `None` value".
I feel ashamed that I didn't immediately take a screenshot. I was unable to click Servo's crash report button, everything was buggy.
Darkspirit, thanks for the STR in bug 1365009 comment 2. I am consistently able to reproduce this now. I don't think this is on the rust side -- the error just before the rust panic suggests we didn't have a TextureHost for the given external image ID.

[GPU 1476] ###!!! ASSERTION: TextureHost does not exist: 'Error', file /home/aosmond/dev/gecko-dev/gfx/layers/wr/WebRenderBridgeParent.cpp, line 438
thread 'RenderBackend' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:323
stack backtrace:
[GPU 1476] ###!!! ASSERTION: TextureHost does not exist: 'Error', file /home/aosmond/dev/gecko-dev/gfx/layers/wr/WebRenderBridgeParent.cpp, line 438
[GPU 1476] ###!!! ASSERTION: TextureHost does not exist: 'Error', file /home/aosmond/dev/gecko-dev/gfx/layers/wr/WebRenderBridgeParent.cpp, line 438
[GPU 1476] ###!!! ASSERTION: TextureHost does not exist: 'Error', file /home/aosmond/dev/gecko-dev/gfx/layers/wr/WebRenderBridgeParent.cpp, line 438
[GPU 1476] ###!!! ASSERTION: TextureHost does not exist: 'Error', file /home/aosmond/dev/gecko-dev/gfx/layers/wr/WebRenderBridgeParent.cpp, line 438
++DOCSHELL 0x7f95146d7800 == 3 [pid = 1555] [id = {553b9546-7a3d-445e-a828-f474bd17a944}]
++DOMWINDOW == 6 (0x7f95146da000) [pid = 1555] [serial = 6] [outer = (nil)]
++DOMWINDOW == 7 (0x7f95146de800) [pid = 1555] [serial = 7] [outer = 0x7f95146da000]
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:355
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:371
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:549
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:511
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:495
   7: rust_begin_unwind
             at /checkout/src/libstd/panicking.rs:471
   8: core::panicking::panic_fmt
             at /checkout/src/libcore/panicking.rs:69
   9: core::panicking::panic
             at /checkout/src/libcore/panicking.rs:49
  10: <core::option::Option<T>>::unwrap
             at /checkout/src/libcore/macros.rs:21
  11: webrender::resource_cache::ResourceCache::get_image_properties
             at ./gfx/webrender/src/resource_cache.rs:602
  12: webrender::frame::Frame::flatten_items
             at ./gfx/webrender/src/frame.rs:522
  13: webrender::frame::Frame::flatten_stacking_context
             at ./gfx/webrender/src/frame.rs:394
  14: webrender::frame::Frame::flatten_items
             at ./gfx/webrender/src/frame.rs:645
  15: webrender::frame::Frame::flatten_stacking_context
             at ./gfx/webrender/src/frame.rs:394
  16: webrender::frame::Frame::flatten_iframe
             at ./gfx/webrender/src/frame.rs:475
  17: webrender::frame::Frame::flatten_items
             at ./gfx/webrender/src/frame.rs:657
  18: webrender::frame::Frame::flatten_stacking_context
             at ./gfx/webrender/src/frame.rs:394
  19: webrender::frame::Frame::create
             at ./gfx/webrender/src/frame.rs:269
  20: webrender::render_backend::RenderBackend::build_scene
             at ./gfx/webrender/src/render_backend.rs:474
  21: webrender::profiler::TimeProfileCounter::profile
             at ./gfx/webrender/src/profiler.rs:154
  22: webrender::render_backend::RenderBackend::run
             at ./gfx/webrender/src/render_backend.rs:218
  23: webrender::renderer::Renderer::new::{{closure}}
             at ./gfx/webrender/src/renderer.rs:1072
  24: <std::panic::AssertUnwindSafe<F> as core::ops::FnOnce<()>>::call_once
             at /checkout/src/libstd/panic.rs:296
  25: std::panicking::try::do_call
             at /checkout/src/libstd/panicking.rs:454
  26: <unknown>
             at /checkout/src/libpanic_abort/lib.rs:40
Redirecting call to abort() to mozalloc_abort
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
Looks like WebRenderImageHost::mWrCompositableHolder is cleared, so that when it tries to get the next image/texture, the timestamp is null (since there is no holder to get it from), and the frame/producers in its queue don't match the last frame (because its a video, we don't keep them around).
Flags: needinfo?(a.beingessner)
It appears the following occurs:

1) External image ID "A" gets bound to handle 1 via WebRenderBridgeParent::RecvAddExternalImageId. Sets mWrCompositableHolder to that of the WRBP.
2) External image ID "B" gets bound to handle 1 via WebRenderBridgeParent::RecvAddExternalImageId. Sets mWrCompositableHolder to that of the WRBP (noop, since it was already set).
3) Remove external image ID "A". Clears mWrCompositableHolder.
4) Use external image ID "B" in rendering. mWrCompositableHolder is null when it should not be.
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Priority: -- → P3
Whiteboard: gfx-noted
To summarize, the video's CompositableHandle is created via ImageBridgeChild rather than WebRenderBridgeChild. In WebRenderImageLayer::RenderLayer, we typically create a new ImageClient for each layer, which creates its own CompositableHandle to pair with an external image ID, which effectively makes them one-to-one. However in the video / PImageBridge case, it needs to reuse the CompositableHandle that the video already has. This becomes a problem when a second image layer gets created containing the same video due to moving to full screen (overwriting WebRenderImageHost::mWrCompositableHolder, but it is the same in this case so it is "okay"), followed by the releasing of the first image layer (clearing WebRenderImageHost::mWrCompositableHolder when it releases its own external image ID and CompositableHandle pairing).
The GPU process also crashed on Windows 10 for me, although it did not produce a crash report for that, so I'm not sure if it is the same problem -- instead the crash report that was generated was for the reinit failure (see bug 1365009).
Note that bug 1364302 changed the details (WebRenderImageHost now holds a pointer to WebRenderBridgeParent rather than WebRenderCompositableHolder) but it still crashes.
(In reply to Andrew Osmond [:aosmond] from comment #10)
> Created attachment 8868129 [details] [diff] [review]
> Allow multiple external image IDs to bind to same image host if for same
> WRBridge, v1
> 
> try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=83e174ce540b024c9d426ac494684b4c8963fc65

I tried "Linux x64 debug" with webrender enabled and both gpu-process enabled+disabled: With that build I am able to open videos, switch tabs and make them fullscreen (and back to normal) as crazy. This seems to fix the STR of bug 1360613. I don't know if your patch from bug 1365009 comment 5 was also included in this try build.
(In reply to Darkspirit from comment #11)
> with webrender enabled and both gpu-process enabled
layers.gpu-process.enabled;true
layers.gpu-process.force-enabled;false
layers.gpu-process.max_restarts;1
The GPU process seems to not crash and doesn't get disabled by runtime anymore.
(In reply to Darkspirit from comment #11)
> (In reply to Andrew Osmond [:aosmond] from comment #10)
> > Created attachment 8868129 [details] [diff] [review]
> > Allow multiple external image IDs to bind to same image host if for same
> > WRBridge, v1
> > 
> > try:
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=83e174ce540b024c9d426ac494684b4c8963fc65
> 
> I tried "Linux x64 debug" with webrender enabled and both gpu-process
> enabled+disabled: With that build I am able to open videos, switch tabs and
> make them fullscreen (and back to normal) as crazy. This seems to fix the
> STR of bug 1360613. I don't know if your patch from bug 1365009 comment 5
> was also included in this try build.

Great! Yes, the patch from bug 1365009 was included in this build, but it shouldn't matter, as it is only relevant if the GPU process crashes. With this bug's patch, we avoid that.
Extreme test: you can even make a <div> fullscreen, ... with a video in it, then make the video fullscreen, then close video fullscreen, you are now back in the fullscreen <div>, then click on "toggle fullscreen" and you are back on your desktop. http://html5-demos.appspot.com/static/fullscreen.html (the left example). No crashes, nothing disabled by runtime.
Comment on attachment 8868129 [details] [diff] [review]
Allow multiple external image IDs to bind to same image host if for same WRBridge, v1

Sotaro, I'm not sure if this is the best way to fix it, but it does work. An alternative would be to see if the layer manager can free the original layer sooner since it appears to happen in succession after creating the new layer, but I wasn't up for untangling that last night (assuming it is feasible). Let me know if prefer another approach.
Attachment #8868129 - Flags: review?(sotaro.ikeda.g)
It seems that we could handle the situation simper way like ImageHost::Composite() does.
   https://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ImageHost.cpp#213

By the way, gecko media framework permits that async ImageContainer is used by multiple ImageLayers at the same time.
Comment on attachment 8868376 [details] [diff] [review]
patch - Call SetWrBridge() before GetAsTextureHostForComposite()

:aosmond, how about this patch?
Attachment #8868376 - Flags: feedback?(aosmond)
(In reply to Sotaro Ikeda [:sotaro] from comment #16)
> By the way, gecko media framework permits that async ImageContainer is used
> by multiple ImageLayers at the same time.

Sorry, correction:
  gecko media framework permits that TextureHost(Image) is used by multiple ImageLayers at the same time.
Comment on attachment 8868376 [details] [diff] [review]
patch - Call SetWrBridge() before GetAsTextureHostForComposite()

Review of attachment 8868376 [details] [diff] [review]:
-----------------------------------------------------------------

Sotaro, but what about this call path:

ImageBridgeParent::RecvUpdate
CompositableParentManager::ReceiveCompositableUpdate
WebRenderImageHost::UseTextureHost
mWrCompositableHolder->CompositeUntil

Combining this patch with the race happening in this bug means we have a period where mWrBridge is nullptr (after the second external image ID is created, and the first external image ID being removed, but before we call ProcessWebRenderCommands), and we may attempt to update the video.

This doesn't seem terribly important right now, as WebRenderCompositableHolder::GetCompositeUntilTime is unused (thus making mCompositeUntilTime effectively write only, as its value can only impact its own state through CompositeUtil and SetCompositionTime). However I note in bug 1359206 you added:

http://searchfox.org/mozilla-central/diff/288058eab96f2700e5bf5278721cefc5ce1a88e3/gfx/layers/wr/WebRenderBridgeParent.cpp#724

Which may become important later?
Attachment #8868376 - Flags: feedback?(aosmond)
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Andrew Osmond [:aosmond] from comment #20)
> 
> This doesn't seem terribly important right now, as
> WebRenderCompositableHolder::GetCompositeUntilTime is unused (thus making
> mCompositeUntilTime effectively write only, as its value can only impact its
> own state through CompositeUtil and SetCompositionTime). However I note in
> bug 1359206 you added:
> 
> http://searchfox.org/mozilla-central/diff/
> 288058eab96f2700e5bf5278721cefc5ce1a88e3/gfx/layers/wr/WebRenderBridgeParent.
> cpp#724
> 
> Which may become important later?

Yes, I forgot to thinks about it. Thanks!
Flags: needinfo?(sotaro.ikeda.g)
It seems that there seems no use case that WebRenderImageHost::mWrBridge is changed to another valid WebRenderBridgeParent for now. Then attachment 8868129 [details] [diff] [review] works good:)
Attachment #8868376 - Attachment is obsolete: true
Attachment #8868129 - Flags: review?(sotaro.ikeda.g) → review+
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/projects/graphics/rev/954c86fc79b0
Allow multiple external image IDs to be bound to the same image host if owned by the same WRBridge. r=sotaro
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Status: RESOLVED → VERIFIED
Depends on: 1529027
You need to log in before you can comment on or make changes to this bug.