Closed Bug 1247611 Opened 4 years ago Closed 4 years ago

[Mac OS X][e10s] Various black artifacts and flickering when drag&drop WebGL content

Categories

(Core :: Canvas: WebGL, defect, P1)

x86_64
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla48
Tracking Status
e10s + ---
firefox44 --- unaffected
firefox45 --- disabled
firefox46 + disabled
firefox47 + fixed
firefox48 + verified

People

(Reporter: adalucinet, Assigned: dvander)

References

Details

(Keywords: regression, Whiteboard: gfx-noted)

Attachments

(1 file)

Affected versions: 45 beta 4 (Build ID: 20160208194709), latest Aurora 46.0a2 and Nightly 47.0a1 (from 2016-02-09) with e10s enabled
Affected platforms: Mac OS X 10.9.5, 10.10.5

Steps to reproduce:
1. Launch Firefox.
2. Navigate to:
* http://webglsamples.googlecode.com/hg/aquarium/aquarium.html
or
* http://goo.gl/zz1rox
3. Drag the tab in a new window.
4. Drag it back into the main window.

Expected result: The content is properly displayed.

Actual result: Black artifacts and, intermittently, flickering are visible.

Additional notes:
1. Screen recording: https://goo.gl/dg5Wal
2. Not reproducible under Ubuntu 12.04 x64, Windows 10 x86 nor Windows 7 x64. Unable to reproduce with 44.0.2RC - 'browser.tabs.remote.autostart' pref has no effect.
3. With e10s disbabled I hit crash bug 1241875: bp-e6f5ab35-4f0c-4c0e-8419-f7b522160211
4. Regression range:
(m-c)
Last good revision: 0496b5b3e9ef (2015-06-05)
First bad revision: 4a07e1ac3cdf (2015-06-06)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0496b5b3e9ef&tochange=4a07e1ac3cdf 

(m-i)
Last good revision: c7720cbbe62ed6e3b56e0dafa794b94b720a0044
First bad revision: c4d1692d88ee675a949c325227324570c9c946aa
Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c7720cbbe62ed6e3b56e0dafa794b94b720a0044&tochange=c4d1692d88ee675a949c325227324570c9c946aa 

Looks like the following bug has the changes which introduced the regression:
https://bugzil.la/1144906
Jeff, can you take a look at this please?
Flags: needinfo?(jgilbert)
Whiteboard: gfx-noted,regression
Whiteboard: gfx-noted,regression → gfx-noted
Tracking for 46+ since it's a recent regression. Only visible with e10s enabled.
This is tracked for 46, with E10S - Jeff, is this actionable?
(In reply to Milan Sreckovic [:milan] from comment #3)
> This is tracked for 46, with E10S - Jeff, is this actionable?

This looks like a compositor issue, really. It looks like the compositor isn't re-binding textures/framebuffers when it should.

Historically, we don't bother doing save+restore for textures/framebuffers in the compositor, so it looks like this laxness is biting us here.
Waiting on e10s triage to see if I need to keep tracking this for 46.
no response to ni, kicking back into triage.
Flags: needinfo?(jgilbert)
Milan, do you think this should block e10s rollout?
Flags: needinfo?(milan)
If it's truly "just" OS X, then I would say no, it should not block e10s.  I can reproduce this on OS X, but not on Windows - can anybody reproduce the problem on Windows?

David, is this potentially related to what you mentioned yesterday?
Flags: needinfo?(milan) → needinfo?(dvander)
Some of the debug messages:
[Child 60197] WARNING: No inner window available!: file /Users/msreckovic/Repos/mozilla-central/dom/base/nsGlobalWindow.cpp, line 9776
[Parent 60195] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/msreckovic/Repos/mozilla-central/gfx/layers/Layers.h, line 2532
[Parent 60195] WARNING: Trying to query the format of an empty TextureSource.: file /Users/msreckovic/Repos/mozilla-central/gfx/layers/opengl/TextureHostOGL.cpp, line 274
Probably not WebGL specific?  I sometimes get the chrome only artifacts (as in, chrome partially or fully not painted at all) when dragging to a separate window.
(In reply to Milan Sreckovic [:milan] from comment #9)

> [Parent 60195] WARNING: ConnectReferentLayer failed - Incorrect
> LayerManager: file
> /Users/msreckovic/Repos/mozilla-central/gfx/layers/Layers.h, line 2532

This one should be OK.
Assigning to David on the suspicion it is not WebGL specific.
Assignee: nobody → dvander
(In reply to Milan Sreckovic [:milan] from comment #8)
> If it's truly "just" OS X, then I would say no, it should not block e10s.  I
> can reproduce this on OS X, but not on Windows - can anybody reproduce the
> problem on Windows?
> 
> David, is this potentially related to what you mentioned yesterday?

Sounds like it, I'll take a look.
Flags: needinfo?(dvander)
I can reproduce this on OS X, but not anywhere else. I think the LayerManager warning is a red herring, we would expect to get it if we try to composite before the remote layer tree is fully attached.

But I do see some other strange behavior: when I move the tab from window A to window B, the contents of window A are sometimes garbage. Sometimes there's a permanent black box over half the tab that stays there until I do a full refresh. Other times it's not invalidated properly, or everything is stretched way out of proportion. The problem goes back to at least Firefox 45.

I can also reproduce this with 2D canvas... for example, loading arewefastyet.com will do it. Other sites won't. If I set "gfx.canvas.azure.acceleration" to "false", the problem stops occurring.

This makes me believe the problem is specific to MacIOSurfaceTextureHostOGL.
Also, disabling tiling makes the problem way more obvious.
Jeff, do you know anything about sharing IOSurfaces across GLContexts? It looks like we have a separate context per window, and when a MacIOSurfaceTextureHostOGL changes compositors, we keep the old texture:

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/MacIOSurfaceTextureHostOGL.cpp#78

If I change this to something like:

  if (mCompositor != glCompositor) {
    mTextureSource = nullptr;
  }
  mCompositor = glCompositor;

This bug becomes almost impossible to reproduce. I still see the CG parts of the chrome theme get messed up once in a rare while, but I never get the flickering.
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(matt.woodrow)
We don't use GL context sharing at all on OSX [1], so texture ids from one GLContext are undefined integers (or possibly a different texture) in a different GLContext.

The IOSurface itself is shareable between contexts, but our TextureSource (GLTextureSource) is also retaining the GL texture that we bound it to.

GLTextureSource isn't responsible for allocating the texture, so there isn't really a way to implement GLTextureSource::SetCompositor correctly. I think we should just make this function assert that it is never called.

Your suggested fix for MacIOSurfaceTextureHostOGL sounds like the right way to go.


[1] http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContextProviderCGL.mm#235
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmuizelaar)
Nice.  Let's add a gfxCriticalError to the patch and see where we get.
Attached patch fixSplinter Review
Attachment #8736156 - Flags: review?(matt.woodrow)
Attachment #8736156 - Flags: review?(matt.woodrow) → review+
[Tracking Requested - why for this release]:
e10s gfx regression. tracking to current e10s rollout target.
Priority: -- → P1
https://hg.mozilla.org/mozilla-central/rev/7c4258b493b1
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Alexandra, can you still reproduce this on nightly? If not I'll request uplift.
Flags: needinfo?(alexandra.lucinet)
(In reply to David Anderson [:dvander] from comment #24)
> Alexandra, can you still reproduce this on nightly? If not I'll request
> uplift.

No longer reproducible with latest 48.0a1 (from 2016-04-05), under Mac OS X 10.11.1 and 10.10.5.
Status: RESOLVED → VERIFIED
Flags: needinfo?(alexandra.lucinet)
Great! Let's get this uplifted.
Comment on attachment 8736156 [details] [diff] [review]
fix

Approval Request Comment
[Feature/regressing bug #]: None (OMTC, E10S)
[User impact if declined]: Flickering and potentially unusable browser after dragging tabs with canvases. OS X only.
[Describe test coverage new/current, TreeHerder]: Nightly
[Risks and why]: Very low risk - we're clearing a cache that holds garbage data after dragging a tab.
[String/UUID change made/needed]:
Attachment #8736156 - Flags: approval-mozilla-beta?
Attachment #8736156 - Flags: approval-mozilla-aurora?
If this only happens with e10s enabled, we don't need to uplift it to beta since e10s will be disabled on release and the beta e10s experiments end today with beta 8.
Attachment #8736156 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 8736156 [details] [diff] [review]
fix

e10s regression with a fix that was verified on Nightly, Aurora47+
Attachment #8736156 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I'm not sure if this is the right place to ask but why SharedSurface_EGLImage constructor sets canRecycle false? 
This is for stability to rendering?
Flags: needinfo?(dvander)
Sorry, I don't know the answer to this question - but according to blame it was changed in bug 1144906.
Flags: needinfo?(dvander)
(In reply to Hiroshi Hatake [:cosmo0920] from comment #31)
> I'm not sure if this is the right place to ask but why
> SharedSurface_EGLImage constructor sets canRecycle false? 
> This is for stability to rendering?

Jeff?
Flags: needinfo?(jgilbert)
The comment is right on the same line as `false`:
>                       false) // Can't recycle, as mSync changes never update TextureHost.

The fences we use are not reusable.
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #34)
> The comment is right on the same line as `false`:
> >                       false) // Can't recycle, as mSync changes never update TextureHost.
> 
> The fences we use are not reusable.

Thanks for the information.
But I want to know these fences for Android EGL?

I'm working to port SharedSurface_EGLImage code into ARMv7 Linux Wayland environment on RZ/G1E and PowerVR SGX540 board.
This board's SGX540 does not support KHR_fence_sync or OES_EGL_sync extensions.
Flags: needinfo?(jgilbert)
If we can't fence, we glFinish.
Flags: needinfo?(jgilbert)
You need to log in before you can comment on or make changes to this bug.