Closed Bug 1265468 Opened 8 years ago Closed 8 years ago

[e10s] Dragging tab to new window with Youtube playing causes video to go blank

Categories

(Core :: Graphics: Layers, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla49
Tracking Status
e10s m9+ ---
firefox47 --- wontfix
firefox48 --- verified
firefox49 --- verified

People

(Reporter: adamopenweb, Assigned: mattwoodrow)

References

()

Details

(Whiteboard: [gfx-noted])

Attachments

(4 files)

STR:

- In OSX, open an updated version of nightly or dev edition with multi-process enabled, I'm using 48.0a1 (2016-04-18)
- Ensure the browser is not maximized
- Navigate to webcompat.com or any other site (doesn’t matter)
- Open a new tab
- Navigate to youtube.com and play a video
- Drag the youtube tab to an open area of the screen to create new window

Expected Behavior:
- Youtube video keeps playing

Actual behavior:
- Sound continues but video goes blank

Notes:
- I found that disabling multi-process fixes the problem
- Going full screen or resizing the window small enough for mobile, the video starts working. But going back to the original desktop size, video is blank
- The issue doesn't appear in Windows 10 for me
tracking-e10s: --- → ?
Component: General → Audio/Video: Playback
Product: Firefox → Core
Matt - Is this a graphics bug?
Flags: needinfo?(matt.woodrow)
Yes, but I can't reproduce it.

David, we're hitting the gfxCriticalError in GLTextureSource::SetCompositor because ImageHost::Composite calls SetCompositor directly on the TextureSource instead of via MacIOSurfaceTextureHostOGL.
Component: Audio/Video: Playback → Graphics: Layers
Flags: needinfo?(matt.woodrow)
I think this is a nical problem :)

ImageHost::Composite is calling SetCompositor on the MacIOSurfaceTextureHostOGL, which causes it to destroy its TextureSource.

We then call SetCurrentTextureHost, which calls through to PrepareTextureSource, which is a no-op for this TextureHost.

mCurrentTextureSource is still pointing to the texture source we used last time, and belong to the wrong compositor.

We call SetCompositor on this, but GLTextureSource doesn't support swapping compositors, so we hit a gfxCriticalError.

I think we need to make sure mCurrentTextureSource is cleared for things that don't support PrepareTextureSource.
Flags: needinfo?(nical.bugzilla)
Whiteboard: [gfx-noted]
(In reply to Matt Woodrow (:mattwoodrow) from comment #4)
> I think we need to make sure mCurrentTextureSource is cleared for things
> that don't support PrepareTextureSource.

I am wondering how much sense it makes to call SetCompositor on a TextureSource manually from the CompositableHost code since we just called it on the TextureHost and the latter is expected to forward it to its TextureSources.

TextureSource::BindTextureSource should then take care of making sure the ImageHost's current TextureSource is the proper one and is valid.

That said, if it's only that we should only miss a few video frames (the ones that are already produced and on the compositor side when the compositor change happens), which doesn't seem to be the case here.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
I didn't actually mean to assign myself the other day.

gfxCriticalError part should be easily fixable, but the rest (signaling the media pipeline that it cannot produce DXGI textures after we have switched to a BasicCompositor) is tricky. It involves more media code than gfx, and I don't have time to focus on it in the short term.
Assignee: nical.bugzilla → nobody
Flags: needinfo?(twalker)
Just want to point out that if you drag the affected window/tab back to the original window.  the video returns. 

regression hunt in progress....
nevermind previous regression range, form comment #8.  I went back and checked a build prior to the window above and found one that exhibits the bug. This is proving to be difficult to track down as I don't have 100% reliable steps.  working on that and will try to nail down correct regression range.
It seems having one tab with a youtube video in it isn't enough to reliably reproduce this bug.  Much more reliable steps are to have two tabs with youtube video. Both tabs don't have to be playing. However, the tab being dragged away to open a new window must have the youtube video playing. 

With those steps I was able to find the original regression.  In that case, the video goes black.  That regression range happened between Nightly builds of 2016-03-08 (working as expected) and 2016-03-09 (video goes black).  The broken behavior then changed slightly on Nightly build of 20160406 (video goes white instead of black).
Hard to say what might have caused this in here - 

3-8: 05c087337043dd8e71cc27bdb5b9d55fd00aaa26
3-9: af7c0cb0798f5425d5d344cbaf0ac0ecb1a72a86

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=05c087337043dd8e71cc27bdb5b9d55fd00aaa26&tochange=af7c0cb0798f5425d5d344cbaf0ac0ecb1a72a86
whatever is causing this was uplifted to Aurora.  regression range there is 20160331 (works as expected) and 20160401 (video goes white).  

Then something else was uplifted to make video go black (it's current state on dev-ed) That happened between 20160403 (video goes white) and 20160404 (video goes black)
(In reply to Tracy Walker [:tracy] from comment #12)
> whatever is causing this was uplifted to Aurora.  regression range there is
> 20160331 (works as expected) and 20160401 (video goes white).  
> 
> Then something else was uplifted to make video go black (it's current state
> on dev-ed) That happened between 20160403 (video goes white) and 20160404
> (video goes black)

http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=876a1f819d83ef8035e579dd9247693c9526875b&tochange=c9d25791f8219540e527c3d18aa919418fff4061

This puts that VP9 decoder work in the cross hairs of blame. Jean-Yves, any chance your work in this range could have impacted drag and drop tabs with video playing?
Flags: needinfo?(jyavenard)
Always returning false from VP9Benchmark::IsVP9DecodeFast() looks like a good way to test this.

http://mxr.mozilla.org/mozilla-central/source/dom/media/Benchmark.cpp#29
(In reply to Jim Mathies [:jimm] from comment #14)
> Always returning false from VP9Benchmark::IsVP9DecodeFast() looks like a
> good way to test this.
> 
> http://mxr.mozilla.org/mozilla-central/source/dom/media/Benchmark.cpp#29

This code only runs once and will never run again. So changing the return value is no different to letting the code run once and letting it return its cache value.
(In reply to Jim Mathies [:jimm] from comment #13)
> http://hg.mozilla.org/releases/mozilla-aurora/
> pushloghtml?fromchange=876a1f819d83ef8035e579dd9247693c9526875b&tochange=c9d2
> 5791f8219540e527c3d18aa919418fff4061
> 
> This puts that VP9 decoder work in the cross hairs of blame. Jean-Yves, any
> chance your work in this range could have impacted drag and drop tabs with
> video playing?

This is likely because following this change VP9 *may* be used in place of the default H264.

To test try one of the following:
1) on a build prior 20160331: set media.mediasource.vp9.enabled to true. Do you see the problem?
2) on a build after 20160401: set media.benchmark.vp9.fps to like 50. Do you see the problem?

1) will enable VP9. If you see the problem there, then it's VP9
2) disable VP9. If you don't see the problem there, then it's VP9

In any case, the issue is in gfx.
Flags: needinfo?(jyavenard)
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> Yes, but I can't reproduce it.
> 
> David, we're hitting the gfxCriticalError in GLTextureSource::SetCompositor
> because ImageHost::Composite calls SetCompositor directly on the
> TextureSource instead of via MacIOSurfaceTextureHostOGL.

when using h264, the decoder output a MacIOSurface with the decoded image.
when using VP9 however, it's purely software and the image output by the decoder is a plain YUV image.
Note that MSE/VP9 with YUV Image is in use for anyone with no h264 hardware acceleration or using basic layers ; which I believe means over 30% of our windows users are already exercising the code.

Disabling MSE/VP9 on mac won't fix the issue either, because it will still happen with any webm videos (or ogg, theora whatever)

It may reduce the likelihood of the gfx problem being exposed because youtube is well.. youtube.
> Always returning false from VP9Benchmark::IsVP9DecodeFast() looks like a
> good way to test this.
> 
> http://mxr.mozilla.org/mozilla-central/source/dom/media/Benchmark.cpp#29

adding an early return in my local build fixed the bug.

> 2) on a build after 20160401: set media.benchmark.vp9.fps to like 50. Do you see the problem?

Disabling VP9 in my dogfood Nightly also fixed the issue.
This looks like an e10s blocker. Milan, what do you think.. and if it blocks, can we get an assignee?
Flags: needinfo?(milan)
Note, that in case you're considering it, disabling the VP9 benchmark will *not* help or work for all macs. There are macs out there without h264 hardware decoders, in which case VP9 is always available (like mac pro late-2013 or any mac pre-2011).
ImageHost supports recycling TextureSources when we're using DataTextureSource.

When we switch the ImageHost over the new compositor, the cached TextureSource would still have the old compositor.

We'd attach the cached TextureSource to the new TextureHost, and then manually call SetCompositor on mCurrentTextureSource to update the compositor.

This only updates the Y channel TextureSource for YUV, so the other two would be still uploading to textures on the wrong compositor.
Assignee: nobody → matt.woodrow
Flags: needinfo?(milan)
Attachment #8745178 - Flags: review?(nical.bugzilla)
Attachment #8745178 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/2e9688cbb548
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Target Milestone: mozilla48 → mozilla49
Matt, should we uplift this e10s video fix to Aurora 48? The e10s team hopes to enable e10s for some release users with Firefox 48.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8745178 [details] [diff] [review]
update-compositor

Approval Request Comment
[Feature/regressing bug #]: Unknown, enabling vp9 on aurora made it visible though.
[User impact if declined]: Pages with video require a refresh when dragged between windows.
[Describe test coverage new/current, TreeHerder]: Tested manually.
[Risks and why]: Pretty low risk.
[String/UUID change made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8745178 - Flags: approval-mozilla-aurora?
Comment on attachment 8745178 [details] [diff] [review]
update-compositor

Fix for e10s tab dragging, let's uplift this so it can ride with 48.
Attachment #8745178 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: qe-verify+
I tried to verify this issue on:
- Mac OS X 10.11 
- Mac OS X 10.10.5 
- Mac OS X 10.8.5
using:
- 49.0a1 (2016-05-04)
- 48.0a2 (2016-05-05)
but the problem is still reproducing. 

These are the steps taken by me and my conclusions: 
1. Launch Firefox with a new profile (to be sure that the browser starts with its default dimensions - not maximized) and ensure that the e10s is enabled.
2. Open a random site, then open a new tab, go to https://www.youtube.com/ and play a random video.
3. Drag the youtube tab in a new window. 
- the new created window has the same (default) dimensions like the main one
- in most of the cases, the issue is not reproducing in this step
4. Resize the new created window (repeatedly shrink and enlarge it).
5. Drag the new created window back to its previous position - change it back in a tab in the main window.
6. Repeat the step 3.
- the video turns white but the sound is properly played.
- if shrinking the new created window, the video turns black or layout artifacts appear instead (only on Nightly the video starts working on the minimal window size)
- if enlarging the new created window, the video is properly rendered.

I can also confirm that the issue is not reproducing on Windows 10 x64 and Ubuntu 14.04 x86 using the latest Nightly and latest Aurora.
per comment #29
Flags: needinfo?(matt.woodrow)
Iulia, can you please test with one of these builds and see if it is fixed:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b5692307da9
Flags: needinfo?(matt.woodrow) → needinfo?(iulia.cristescu)
(In reply to Matt Woodrow (:mattwoodrow) from comment #31)
> Iulia, can you please test with one of these builds and see if it is fixed:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b5692307da9

Hello! I verified again this bug on Mac OS X 10.11  using the build from  http://archive.mozilla.org/pub/firefox/try-builds/mwoodrow@mozilla.com-6b5692307da9155ebd526152c60047db4100dd5b/try-macosx64/ and I couldn't reproduce anymore this issue. 
It is still reproducing on latest Nightly (2016-05-19).
Flags: needinfo?(iulia.cristescu)
We've had problems with this multiple times. It's not something that gets tested, and there's so many different layer types, texture types and backends that getting all of them to work is endlessly tricky.

Let's just invalidate layers and start fresh, I don't think dragging tabs between windows is that performance critical.
Attachment #8756146 - Flags: review?(nical.bugzilla)
Comment on attachment 8756146 [details] [diff] [review]
Invalidate layers when dragging tabs

Review of attachment 8756146 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/CompositorOGL.cpp
@@ +1004,5 @@
>      DrawVRDistortion(aRect, aClipRect, aEffectChain, aOpacity, aTransform);
>      return;
>    }
>  
> +  MakeCurrent();

nit: since MakeCurrent can fail, it's best to bail out of DrawQuad if it returns false for whatever reason.
Attachment #8756146 - Flags: review?(nical.bugzilla) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #33)
> 
> We've had problems with this multiple times. It's not something that gets
> tested, and there's so many different layer types, texture types and
> backends that getting all of them to work is endlessly tricky.

Matt, can you give us a sequence of steps that we should test?  I imagine the sequence of steps would be the same, matrixed with a types of backends (e.g., accelerated windows, os x, basic windows) and Anthony can probably translate it into a test run that we can pass on for repeated testing.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(anthony.s.hughes)
Blocks: 1275687
Case in point - bug 1275687 that we asked Adam to file as a separate one, because while the YouTube workflow is now fixed for him, the airmozilla live stream is busted.
We'd really need coverage of dragging tabs for all layers configurations, as well as all video formats.

I'll have to investigate what's different about the live stream, since there might be yet another variable.
Flags: needinfo?(matt.woodrow)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backed out for asserting in LayerManagerComposite.cpp on OS X 10.10 debug with e10s:

https://hg.mozilla.org/integration/mozilla-inbound/rev/df861a9e766d0beb9daeeb192bf5b6f4cfeb3f02

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4d417e2c21ff9a33aee9e53fd7ec7a8e3d96ec60
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=28758818&repo=mozilla-inbound

00:26:16     INFO -  Assertion failure: !aSubtree || aSubtree->Manager() == this, at /builds/slave/m-in-m64-d-0000000000000000000/build/src/gfx/layers/composite/LayerManagerComposite.cpp:103
00:26:16     INFO -  #01: mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:469]
00:26:16     INFO -  #02: mozilla::layers::PCompositorBridgeParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PCompositorBridgeParent.cpp:577]
00:26:16     INFO -  #03: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) [ipc/glue/MessageChannel.h:553]
00:26:16     INFO -  #04: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) [ipc/glue/MessageChannel.cpp:1593]
00:26:16     INFO -  #05: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1560]
00:26:16     INFO -  #06: nsRunnableMethodImpl<bool (mozilla::ipc::MessageChannel::*)(), false, true>::Run [xpcom/glue/nsThreadUtils.h:715]
00:26:16     INFO -  #07: mozilla::ipc::MessageChannel::DequeueTask::Run() [ipc/glue/MessageChannel.h:498]
00:26:16     INFO -  #08: MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) [mfbt/RefPtr.h:39]
00:26:16     INFO -  #09: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) [ipc/chromium/src/base/message_loop.cc:357]
00:26:16     INFO -  #10: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:432]
00:26:16     INFO -  #11: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_default.cc:36]
00:26:16     INFO -  #12: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:493]
00:26:16     INFO -  #13: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:183]
00:26:16     INFO -  #14: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:38]
00:26:16     INFO -  #15: libsystem_pthread.dylib + 0x405a
00:26:16     INFO -  #16: libsystem_pthread.dylib + 0x3fd7
00:26:17     INFO -  [Child 1801] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-m64-d-0000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 2050
00:26:17     INFO -  #01: mozilla::ipc::ProcessLink::OnChannelError() [xpcom/glue/Monitor.h:36]
00:26:17     INFO -  #02: event_base_loop [ipc/chromium/src/third_party/libevent/event.c:1355]
00:26:17     INFO -  #03: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_libevent.cc:364]
00:26:17     INFO -  #04: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:493]
00:26:17     INFO -  #05: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:183]
00:26:17     INFO -  #06: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:38]
00:26:17     INFO -  #07: libsystem_pthread.dylib + 0x405a
00:26:17     INFO -  #08: libsystem_pthread.dylib + 0x3fd7
00:26:17     INFO -  [Child 1801] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-m64-d-0000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 2050
00:26:17     INFO -  Hit MOZ_CRASH() at /builds/slave/m-in-m64-d-0000000000000000000/build/src/memory/mozalloc/mozalloc_abort.cpp:33
00:26:18     INFO -  TEST-INFO | Main app process: exit 1
00:26:18  WARNING -  TEST-UNEXPECTED-FAIL | ShutdownLeaks | process() called before end of test suite
00:26:18     INFO -  287 INFO checking window state
00:26:18     INFO -  288 INFO Entering test bound test_swap_frameloader_pagevisibility_events
00:26:18     INFO -  289 INFO Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "data:text/html,<html><body>A%20regular,%20everyday,%20normal%20page." line: 0}]
00:26:18     INFO -  290 INFO Console message: OpenGL compositor Initialized Succesfully.
00:26:18     INFO -  Version: 2.1 INTEL-10.6.33
00:26:18     INFO -  Vendor: Intel Inc.
00:26:18     INFO -  Renderer: Intel Iris OpenGL Engine
00:26:18     INFO -  FBO Texture Target: TEXTURE_2D
00:26:18     INFO -  291 INFO TEST-PASS | dom/base/test/browser_bug1058164.js | Got expected event -
00:26:18     INFO -  292 INFO TEST-PASS | dom/base/test/browser_bug1058164.js | Got expected event -
00:26:18     INFO -  293 INFO TEST-PASS | dom/base/test/browser_bug1058164.js | Got expected event -
00:26:18     INFO -  294 INFO TEST-PASS | dom/base/test/browser_bug1058164.js | Got expected event -
00:26:18  WARNING -  TEST-UNEXPECTED-FAIL | dom/base/test/browser_bug1058164.js | application terminated with exit code 1
00:26:18     INFO -  runtests.py | Application ran for: 0:00:12.324670
00:26:18     INFO -  zombiecheck | Reading PID log: /var/folders/zy/bdsjk76j7tl5nrllnsr13pf000000w/T/tmpwDVUIQpidlog
00:26:18     INFO -  mozcrash Copy/paste: /builds/slave/test/build/macosx64-minidump_stackwalk /var/folders/zy/bdsjk76j7tl5nrllnsr13pf000000w/T/tmpEN6w60.mozrunner/minidumps/0901EE6E-4290-46DD-ADA0-DE62DA48D839.dmp /builds/slave/test/build/symbols
00:26:30     INFO -  mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/0901EE6E-4290-46DD-ADA0-DE62DA48D839.dmp
00:26:30     INFO -  mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/0901EE6E-4290-46DD-ADA0-DE62DA48D839.extra
00:26:30  WARNING -  PROCESS-CRASH | dom/base/test/browser_bug1058164.js | application crashed [@ mozilla::layers::LayerManagerComposite::ClearCachedResources(mozilla::layers::Layer*)]
Flags: needinfo?(matt.woodrow)
Attachment #8745178 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/42d2e0afa3bc
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Is there something here that needs uplifted to aurora?
Comment on attachment 8756146 [details] [diff] [review]
Invalidate layers when dragging tabs

Approval Request Comment
[Feature/regressing bug #]: Unknown.
[User impact if declined]: Video stops working when a tab is dragged to a new window.
[Describe test coverage new/current, TreeHerder]: Tested mahually.
[Risks and why]: Very low risk, just makes sure we reset and redraw everything when dragging tabs.
[String/UUID change made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8756146 - Flags: approval-mozilla-beta?
Comment on attachment 8756146 [details] [diff] [review]
Invalidate layers when dragging tabs

Improve e10s, taking it.
Should be in 48 beta 2
Attachment #8756146 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Matt Woodrow (:mattwoodrow) from comment #37)
> (In reply to Milan Sreckovic [:milan] from comment #35)
> > (In reply to Matt Woodrow (:mattwoodrow) from comment #33)
> > > 
> > > We've had problems with this multiple times. It's not something that gets
> > > tested, and there's so many different layer types, texture types and
> > > backends that getting all of them to work is endlessly tricky.
> > 
> > Matt, can you give us a sequence of steps that we should test?  I imagine
> > the sequence of steps would be the same, matrixed with a types of backends
> > (e.g., accelerated windows, os x, basic windows) and Anthony can probably
> > translate it into a test run that we can pass on for repeated testing.
> 
> We'd really need coverage of dragging tabs for all layers configurations, as
> well as all video formats.
> 
> I'll have to investigate what's different about the live stream, since there
> might be yet another variable.

Matt can you please need-info me when you have more info about testing this?
Flags: needinfo?(anthony.s.hughes)
I verified this issue on latest Aurora 49.0a2 (2016-06-22) using 
- Mac OS X 10.10.5
- Mac OS X 10.11
The fix landed and it works as expected.
I will continue verifying on the next beta build, once the fix will land.
The bug is verified fixed for 48.0b5 build1 (20160630123429) using Mac OS X 10.11. I will set the flags accordingly.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.