Closed Bug 1180688 Opened 4 years ago Closed 4 years ago

On OSX 10.10 yosemite VM everything renders as black or white except popups/panels

Categories

(Core :: Graphics, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox40 + fixed
firefox41 + fixed
firefox42 + fixed

People

(Reporter: Gijs, Assigned: nical)

References

Details

(Keywords: regression)

Attachments

(8 files, 1 obsolete file)

Not sure if this is VM-specific or if I can fix it with a VMWare config setting, but I can't work on the frontend when my browser looks like this. This is the latest OSX nightly downloaded from https://nightly.mozilla.org/ less than 20 minutes ago. I tried using safe mode, that made no difference.

Markus, any idea what's going on here?
Flags: needinfo?(mstange)
We're probably attempting to use OMTC BasicCompositor again when HWA can't be enabled for some reason. We should be using main thread BasicLayers instead. We've regressed this before multiple times... it would be nice to know what regressed it this time.

And we should fix OMTC BasicCompositor to work in VMs.
Flags: needinfo?(mstange)
(fwiw, seems 39 release also breaks...)
(In reply to Markus Stange [:mstange] from comment #1)
> We're probably attempting to use OMTC BasicCompositor again when HWA can't
> be enabled for some reason. We should be using main thread BasicLayers
> instead. We've regressed this before multiple times... it would be nice to
> know what regressed it this time.

Running mozregression right now.
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8af276ab8636&tochange=37ddc5e2eb72

Maybe the OMTA stuff? Or maybe something else? It looks like I might still get an inbound window, with a bit of luck.
nical, could you take a look at this? :-)
Blocks: 1125848
Flags: needinfo?(nical.bugzilla)
Severity: normal → critical
Keywords: regression
Nical, I read through the patch but couldn't identify the change that could have caused use to fall back to BasicCompositor instead of BasicLayers. If you know how this happened, I think we should fix/revert just that change, and after that we can play on nightly with the change I pushed to try.
FYI, I have same problem.
In my case, with VirtualBox and Guest OS: Snowleopard 10.6.8
I remember timing of this issue...

Firefox Developer Edition 39.0a2 Build 20150423004012 [OK]
Firefox Developer Edition 39.0a2 Build 20150424004008 [NG]

And also Firefox 39.0 is white out now...
(In reply to Markus Stange [:mstange] from comment #1)
> We're probably attempting to use OMTC BasicCompositor again when HWA can't
> be enabled for some reason. We should be using main thread BasicLayers
> instead. We've regressed this before multiple times... it would be nice to
> know what regressed it this time.
> 
> And we should fix OMTC BasicCompositor to work in VMs.

Disabling OMTC is completely untested, at this point it's just a mine field. I don't think it is a good thing to have users fall back to that configuration ever and I'd much prefer that we fix the basic compositor on mac now.
(In reply to Nicolas Silva [:nical] from comment #12)
> (In reply to Markus Stange [:mstange] from comment #1)
> > We're probably attempting to use OMTC BasicCompositor again when HWA can't
> > be enabled for some reason. We should be using main thread BasicLayers
> > instead. We've regressed this before multiple times... it would be nice to
> > know what regressed it this time.
> > 
> > And we should fix OMTC BasicCompositor to work in VMs.
> 
> Disabling OMTC is completely untested,

Doesn't safe mode still use this? If so, can we change that (different bug, I suppose)? It sounds like it shouldn't...
(In reply to Nicolas Silva [:nical] from comment #12)
> Disabling OMTC is completely untested, at this point it's just a mine field.

This is just not true. We use BasicLayers for all popups and panels on OS X. And we haven't been using BasicCompositor on OS X at all.

> I don't think it is a good thing to have users fall back to that
> configuration ever and I'd much prefer that we fix the basic compositor on
> mac now.

I completely agree about doing that on mozilla-central, but I disagree that that's the fix we should uplift to 39.
[Tracking Requested - why for this release]: Regression in 39.

We know which patch broke this, we'll add that information in here; though it's not clear as to why that patch broke it.  There are some ideas as to how to deal with this complication, but we will first take care of the regression.
Assignee: nobody → nical.bugzilla
Tracked for 40, 41, 42, because of regression, and something we want to fix since users will expect color.
I confirmed the following test on the VirtualBox 4.3.28

1) layers.offmainthreadcomposition.enabled = true -> false

 - Firefox 39.0 [NG -> OK]
 - Firefox Developer Edition 41.0a2 [NG -> OK]
 - Nightly 42.0a1 [NG -> NG]

2) 1) + layers.acceleration.disabled = false -> true

 - Firefox 39.0 [OK -> OK]
 - Firefox Developer Edition 41.0a2 [OK -> OK]
 - Nightly 42.0a1 [NG -> OK]
↑ Guest OS: Mac OS X Snowleopard 10.6.8
(In reply to Markus Stange [:mstange] from comment #14)
> (In reply to Nicolas Silva [:nical] from comment #12)
> > Disabling OMTC is completely untested, at this point it's just a mine field.
> 
> This is just not true. We use BasicLayers for all popups and panels on OS X.

I did not know that. It's sad.

> And we haven't been using BasicCompositor on OS X at all.

We do use it on other platforms. It gets a lot more testing coverage than BasicLayerManager.

> I completely agree about doing that on mozilla-central, but I disagree that
> that's the fix we should uplift to 39.

I didn't expect that presenting a frame off the main thread without opengl would be as complicated as it turns out to be on Mac. But yeah, agreed, I don't want to uplift a complex solution either.
Flags: needinfo?(nical.bugzilla)
This should do for now.
Attachment #8630922 - Flags: review?(mstange)
There will be a build with the patch applied here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=612758d7e461

Jeff, you told me you have an environment that reproduces this, could you verify that this build fixes the issue?
forgot to qrefresh before submitting the previous patch, sorry.
Attachment #8630922 - Attachment is obsolete: true
Attachment #8630922 - Flags: review?(mstange)
Attachment #8630925 - Flags: review?(mstange)
Duplicate of this bug: 1181338
> 
> forgot to qrefresh before submitting the previous patch, sorry.

... and the build is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d842d9c71d05
Comment on attachment 8630925 [details] [diff] [review]
no BasicCompositor for you MacOSX!

Review of attachment 8630925 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! This certainly seems like the least risky approach. And sorry about the sadness.
Attachment #8630925 - Flags: review?(mstange) → review+
(In reply to Nicolas Silva [:nical] from comment #25)
> > 
> > forgot to qrefresh before submitting the previous patch, sorry.
> 
> ... and the build is here:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d842d9c71d05

This seems to still be broken for me.
(In reply to :Gijs Kruitbosch from comment #27)
> (In reply to Nicolas Silva [:nical] from comment #25)
> > > 
> > > forgot to qrefresh before submitting the previous patch, sorry.
> > 
> > ... and the build is here:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=d842d9c71d05
> 
> This seems to still be broken for me.

This works for me in a 10.6 VM. Are you sure it's still broken?
Comment on attachment 8630925 [details] [diff] [review]
no BasicCompositor for you MacOSX!

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Firefox is unusable if the host OS is Mac in a VM
[Describe test coverage new/current, TreeHerder]: None.
[Risks and why]: very low, we just make sure to avoid a configuration that is known to be broken.
[String/UUID change made/needed]:
Attachment #8630925 - Flags: approval-mozilla-beta?
Attachment #8630925 - Flags: approval-mozilla-aurora?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #28)
> (In reply to :Gijs Kruitbosch from comment #27)
> > (In reply to Nicolas Silva [:nical] from comment #25)
> > > > 
> > > > forgot to qrefresh before submitting the previous patch, sorry.
> > > 
> > > ... and the build is here:
> > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=d842d9c71d05
> > 
> > This seems to still be broken for me.
> 
> This works for me in a 10.6 VM. Are you sure it's still broken?

I just redownloaded the new trybuild, and it's still broken for me on my 10.10 VM (on a 10.9 host, vmware fusion 7.1.2).
looks like we are dealing with 2 issues? This bug doesn't have too many comments yet so let's keep investigating here.
Whiteboard: [leave-open]
Gijs, I spinned another build with some logging to understand in what configuration the browser ends up in your case when setting OMTC and the compositor backends up here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8d7bc271e43

Could you run it and forward me the standard output? I prefixed all of the extra logging with "[nical]" to help with grepping for the relevant parts.

Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Nicolas Silva [:nical] from comment #33)
> Gijs, I spinned another build with some logging to understand in what
> configuration the browser ends up in your case when setting OMTC and the
> compositor backends up here:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8d7bc271e43
> 
> Could you run it and forward me the standard output? I prefixed all of the
> extra logging with "[nical]" to help with grepping for the relevant parts.
> 
> Thanks!

Gijss-Mac:Desktop gijs$ Nightly.app/Contents/MacOS/firefox -P trunk
[nical] Attempting to create an OpenGL compositor
[nical] Failed to initialise the compositor
[nical] Failed to set OMTC up

(profile manager, it turns out I don't have a profile called "trunk" on this VM; but I can't see the list, so I'mma just hit enter and have it start whatever I do have)

Gijss-Mac:Desktop gijs$ 
Gijss-Mac:Desktop gijs$ [nical] Attempting to create an OpenGL compositor
[nical] Failed to initialise the compositor
[nical] Failed to set OMTC up
[nical] Attempting to create an OpenGL compositor
[nical] Failed to initialise the compositor
[nical] Failed to set OMTC up



(This last chunk is all one run; I assume it's giving info once for the main window and once for the e10s'ified content window, or something?)
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Nicolas Silva [:nical] from comment #30)
> [Risks and why]: very low, we just make sure to avoid a configuration that
> is known to be broken.

Does this negatively impact the standard OSX config (i.e. MacBook Air/Pro)?
Flags: needinfo?(nical.bugzilla)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #36)
> (In reply to Nicolas Silva [:nical] from comment #30)
> > [Risks and why]: very low, we just make sure to avoid a configuration that
> > is known to be broken.
> 
> Does this negatively impact the standard OSX config (i.e. MacBook Air/Pro)?

No, these always enjoy a nice hardware accelerated compositor backend, which works well unless mac runs in a VM.
Flags: needinfo?(nical.bugzilla)
Even in the safe mode?  Or, more to the point, any changes to the safe mode, or when the users turn off "use hardware acceleration when available"?
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #38)
> Even in the safe mode?  Or, more to the point, any changes to the safe mode,
> or when the users turn off "use hardware acceleration when available"?

Good question. For those, If I understood correctly, before the patch they will have a BasicCompositor that uses opengl to present the frame, and after the patch they will have a non-OMTC BasicLayerManager.

this new configuration is not very well tested so I would say it is a negative impact for users who change the pref. So perhaps it would be better to somehow detect that we are in a VM instead of disabling BasicCompositor for all mac users, Jeff do you know how to do that?
Flags: needinfo?(nical.bugzilla) → needinfo?(jmuizelaar)
Yes, that's what I thought.  We can't mess with the safe mode.  It's supposed to be safe.  I would much rather mess with VMs than the safe mode, and this includes nightly as well.  We should back this out until we figure out the right thing, and revisit the solution to bug 1125848.
(In reply to Nicolas Silva [:nical] from comment #39)
> (In reply to Milan Sreckovic [:milan] from comment #38)
> > Even in the safe mode?  Or, more to the point, any changes to the safe mode,
> > or when the users turn off "use hardware acceleration when available"?
> 
> Good question. For those, If I understood correctly, before the patch they
> will have a BasicCompositor that uses opengl to present the frame, and after
> the patch they will have a non-OMTC BasicLayerManager.
> 
> this new configuration is not very well tested so I would say it is a
> negative impact for users who change the pref. So perhaps it would be better
> to somehow detect that we are in a VM instead of disabling BasicCompositor
> for all mac users, Jeff do you know how to do that?

Sorry, why do we need to detect VM now and we didn't need to before?
Flags: needinfo?(jmuizelaar)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #41)
> 
> Sorry, why do we need to detect VM now and we didn't need to before?

Before we were in a situation where things worked out by luck rather than from handling the problematic case explicitly. A patch that wasn't supposed to have an impact on this stuff apparently broke that fragile equilibrium and it is not even obvious how. If BasicCompositor is broken on mac-inside-a-VM, i'd rather have code that explicitly handles it, than get back to something that will potentially break an untested platform (that we apparently care about) every time we touch this area of the code. I (naively) assume that detecting that we are in a VM is something that isn't too risky and that we can uplift all the way to beta without breaking a sweat. Long term I would rather have BasicCompositor always work, even in the most broken platforms.
(In reply to Nicolas Silva [:nical] from comment #42)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #41)
> > 
> > Sorry, why do we need to detect VM now and we didn't need to before?
> 
> Before we were in a situation where things worked out by luck rather than
> from handling the problematic case explicitly. A patch that wasn't supposed
> to have an impact on this stuff apparently broke that fragile equilibrium
> and it is not even obvious how. If BasicCompositor is broken on
> mac-inside-a-VM, i'd rather have code that explicitly handles it, than get
> back to something that will potentially break an untested platform (that we
> apparently care about) every time we touch this area of the code. I
> (naively) assume that detecting that we are in a VM is something that isn't
> too risky and that we can uplift all the way to beta without breaking a
> sweat. Long term I would rather have BasicCompositor always work, even in
> the most broken platforms.

Can't you just detect the situation by observing accelerated gl context creation failing then?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #43)
> 
> Can't you just detect the situation by observing accelerated gl context
> creation failing then?

I don't know, Maybe? You tell me, you know more about that stuff than I do. I thought we were in a hurry to get a safe fix and uplift it. If we are not, I can spend some time to get a mac with an extra osx license to run in the vm, set things up and figure that out.
(In reply to Nicolas Silva [:nical] from comment #44)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #43)
> > 
> > Can't you just detect the situation by observing accelerated gl context
> > creation failing then?
> 
> I don't know, Maybe? You tell me, you know more about that stuff than I do.
> I thought we were in a hurry to get a safe fix and uplift it. If we are not,
> I can spend some time to get a mac with an extra osx license to run in the
> vm, set things up and figure that out.

Sorry. I meant to imply that the thing that is different in VMs is that accelerated gl context creation fails. That should be all the detection that you need.
Comment on attachment 8630925 [details] [diff] [review]
no BasicCompositor for you MacOSX!

Very low risk, making sure to avoid a configuration that is known to be broken.
Attachment #8630925 - Flags: approval-mozilla-beta?
Attachment #8630925 - Flags: approval-mozilla-beta+
Attachment #8630925 - Flags: approval-mozilla-aurora?
Attachment #8630925 - Flags: approval-mozilla-aurora+
Nical, do you want to continue the discussion about VM support in a new bug?
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8630925 [details] [diff] [review]
no BasicCompositor for you MacOSX!

Sorry, I should have removed the flags earlier. We would like to not take the risk of regressing safe mode for mac users (that are not in a vm). And this patch makes safe mode go into non-OMTC which is not well tested so could have regressions.
I am looking into an alternative solution, and if it turns out to be too risky we can reconsider this one.
Flags: needinfo?(nical.bugzilla)
Attachment #8630925 - Flags: approval-mozilla-beta+
Attachment #8630925 - Flags: approval-mozilla-aurora+
Comment on attachment 8632757 [details] [diff] [review]
Fail to initialize the BasicCompositor if GLPresenter fails on mac.

Review of attachment 8632757 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/cocoa/nsChildView.mm
@@ +2610,5 @@
> +nsChildView::InitCompositor(Compositor* aCompositor)
> +{
> +  if (aCompositor->GetBackendType() == LayersBackend::LAYERS_BASIC) {
> +    if (!mGLPresenter) {
> +      mGLPresenter = GLPresenter::CreateForWindow(this);

Should we remove the existing call of GLPresenter::CreateForWindow in nsChildView::StartRemoteDrawing()? We could replace it with an assert, if that's not too risky to uplift.
Attachment #8632757 - Flags: review?(mstange) → review+
I added an assertion (without removing the CreateForWindow call for now to be extra safe with the uplift), backed the first fix out and landed on inbound. Let's see how it goes.
Hey Jeff, could you check again that nightly still works in the VM? thanks!
Flags: needinfo?(jmuizelaar)
Nightly is white for me.
Flags: needinfo?(jmuizelaar)
The reason the second patch did not work is that when the compositor creation, we still left the widget with references to some of the omtc object, even though they were not initialized. This patch makes sure we null these out and fixes the problem for me.

It would make more sense to null out the references in DestroyCompositor, but there is something tricky around this part when shutting down, related to some code running at shutdown potentially causing the re-creating of a layer manager, compositor, etc during the middle of shutdown, and I am not 100% certain how nulling out the refs in DestroyCompositor would affect this in the case of shutdown.
So for now it is safer to only null them out if the creation of the compositor failed. Once we have uplifted the patch we can take the time to look into moving that into DestroyCompositor and see if it breaks something.
Attachment #8635268 - Flags: review?(mstange)
On that last comment - let's open another bug for the problem above, perhaps add a "do not recreate, being destroyed" flag to properly fix this problem.  Don't like tricky and unknown things :)
Attachment #8635268 - Flags: review?(mstange) → review+
This patch folds the two patches that landed and applies cleanly on top of aurora.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Firefox is unusable in if the host os is mac in a vm.
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: low risk, affects very few users which are in completely broken state before the patch.
[String/UUID change made/needed]:
Attachment #8636618 - Flags: approval-mozilla-aurora?
This patch folds the two patches that landed and applies cleanly on top of beta.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Firefox unusable if the host os is mac in a vm
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: low, affect few users which are already in a broken state without the patch. 
[String/UUID change made/needed]:
Attachment #8636620 - Flags: approval-mozilla-beta?
I am seeing partially or totally black pages on resizing windows on pre-Yosemite Mac OS 10.8.5 starting in FF 39. 
http://www.zweigmedia.com/FirefoxBlackPage.png
Not sure if this is related.
(In reply to Stefan from comment #62)
> I am seeing partially or totally black pages on resizing windows on
> pre-Yosemite Mac OS 10.8.5 starting in FF 39. 
> http://www.zweigmedia.com/FirefoxBlackPage.png
> Not sure if this is related.

This looks similar to what I was seeing when I tried using CompositorOGL with a non-accelerated GLContext in a VM. But Firefox 39 shouldn't be able to get into that state - we currently refuse creating non-accelerated GLContexts on Mac. So you shouldn't be able to get into that state... this is very confusing.
See Also: Bug 1185321
magicp, could you please verify the fix works for you in 7/21 nightly build? Thanks!
Flags: needinfo?(magicp.jp)
(In reply to Ritu Kothari (:ritu) from comment #65)
> magicp, could you please verify the fix works for you in 7/21 nightly build?
> Thanks!

It works fine when multi-process is disabled. But, in case of multi-process as follows...

Case #1
layers.offmainthreadcomposition.enabled = true
layers.acceleration.disabled = false
- XUL documents      [OK]
- HTML documents     [NG] Please find the attached image "test case 1-2.png"

Case #2
layers.offmainthreadcomposition.enabled = false
layers.acceleration.disabled = false
- XUL documents      [OK]
- HTML documents     [NG]

Case #3
layers.offmainthreadcomposition.enabled = false
layers.acceleration.disabled = true
- XUL documents      [OK]
- HTML documents     [OK]

My environment:
- Nightly 42.0a1 (20150721030212) and later
- Mac OS X 10.6.8 (Guest OS) in VirtualBox 4.3.30
Flags: needinfo?(magicp.jp)
Attached image test case 1-2.png
If anything gets rendered at all then the issues that are left are different bugs, let's file new bugs for them.
Comment on attachment 8636620 [details] [diff] [review]
Patch rebased on top of beta (carrying r=mstange)

Although there are still failures with e10s, e10s is disabled by default in 40 and 41 and, as Nical said, we should file follow up bugs. Beta+
Attachment #8636620 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8636618 [details] [diff] [review]
Patch rebased on top of aurora (carrying r=mstange)

Aurora+
Attachment #8636618 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Whiteboard: [leave-open]
Target Milestone: --- → mozilla42
magicp, would you mind creating a follow up bug relating to the e10s issues?
(In reply to Kamil Jozwiak [:kjozwiak] from comment #73)
> magicp, would you mind creating a follow up bug relating to the e10s issues?

I have filed Bug 1187322. Please follow up.
Depends on: 1187322
(In reply to magicp from comment #17)
> I confirmed the following test on the VirtualBox 4.3.28
> 
> 1) layers.offmainthreadcomposition.enabled = true -> false
> 
>  - Firefox 39.0 [NG -> OK]
>  - Firefox Developer Edition 41.0a2 [NG -> OK]
>  - Nightly 42.0a1 [NG -> NG]
> 
> 2) 1) + layers.acceleration.disabled = false -> true
> 
>  - Firefox 39.0 [OK -> OK]
>  - Firefox Developer Edition 41.0a2 [OK -> OK]
>  - Nightly 42.0a1 [NG -> OK]

Confirmed that setting these prefs to false and true respectively fixed it! Thank you so much!!
In case anyone has to blindly run the code and cant view the preferences. Hit Ctrl Shift J to open borwser console (hopefully your dev prefs are enabled). paste and hit enter the nrestart:

Services.prefs.setBoolPref('layers.offmainthreadcomposition.enabled', false); Services.prefs.setBoolPref('layers.acceleration.disabled', true);
Just confirming that this was fixed in nightly. And also a side bug even after setting those prefs in my previous reply in order to make it work inFF39. So in FF39 even with those prefs set, things would work, however nsIAlertService would not show any alerts visually, although the alertshow notification triggered, so meaning it was just visual. so doing

    Cc['@mozilla.org/alerts-service;1'].getService(Ci.nsIAlertsService).showAlertNotification(null, 'title', 'msg');


would not work.

I downloaded nightly and alert service is working. profile manager is showing properly now too. real cool thanks for the fix all!
You need to log in before you can comment on or make changes to this bug.