Closed Bug 713305 Opened 13 years ago Closed 12 years ago

WebGL no longer triggers discrete graphics mode on dual GPU MacBook, affecting content performance

Categories

(Core :: Graphics: CanvasWebGL, defect)

x86
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla13
Tracking Status
firefox10 --- affected
firefox11 + wontfix
firefox12 --- verified
firefox13 --- verified
firefox-esr10 12+ fixed

People

(Reporter: reuben, Assigned: bjacob)

References

Details

(Keywords: regression, Whiteboard: [gfx.relnote.13])

Attachments

(1 file, 6 obsolete files)

Just noticed this after reinstalling gfxCardStatus and trying Google Maps GL.
I thought that WebGL was always supposed to trigger discrete mode. Strange.
Component: Graphics → Canvas: WebGL
QA Contact: thebes → canvas.webgl
Reuben, can you try finding the regression range?
What version did you see this behavior with? I tried beta-10 and the behavior I see is that WebGL switches to discrete and doesn't return to integrated even after leaving the page. Also double check that gfxCardStatus is set to 'Dynamic'.
(In reply to Boris Zbarsky (:bz) from comment #2)
Ay!

(In reply to Benoit Girard (:BenWa) from comment #3)
>What version did you see this behavior with?
Nightly 11 and 12.

> I tried beta-10 and the behavior I see is that WebGL switches to 
> discrete and doesn't return to integrated even after leaving the page.
It should switch back to integrated if your CFBundleIdentifier is the Firefox one (not Nightly or Aurora). If it doesn't, then that's another regression.

> Also double check that gfxCardStatus is set to 'Dynamic'.
I don't actually use its force GPU feature, it causes system instability.
QA: Let's try to reproduce this with FF10 and FF11 and help out with a regression range.

1) Install gfxCardStatus: http://codykrieger.com/gfxCardStatus
2) Make sure the menu item is set to Dynamic Switching
3) Note the graphics card currently in use in the menu item
4) Go to https://maps.google.com/, click "Experience MapsGL" if you haven't already
5) Note the graphics card currently in use in the menu item
6) Close the Google Maps tab, wait a couple of minutes, and again note the graphics card currently in use

Please report your results back here. Thanks!
Does this explicitly require dual-GPU Macbooks? I'd like to help with a regression range but I fear I might not have the hardware necessary to reproduce.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #6)
> Does this explicitly require dual-GPU Macbooks?

Yes :/

While you're here, can you point me where the nightlies between 9 and 10 are available?
(In reply to Boris Zbarsky (:bz) from comment #8)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/

Sorry, what I *actually* wanted was a time frame for the nightlies between Fx 9 and 10, but that wouldn't help me anyway, as my initial guess was quite far from the actual range I'm getting:

24-Nov-2011 works
26-Nov-2011 fails

It's late and my system is very unstable for some reason so I'll try to bisect further tomorrow.
> Sorry, what I *actually* wanted was a time frame for the nightlies

Ah, ok.  For that, you want https://wiki.mozilla.org/RapidRelease/Calendar which says Fx9 branched off trunk on 2011-09-27 and Fx10 did so on 2011-11-08...  Definitely before your regression range.
This was likely caused by bug 702058
Attached patch don't AllowOfflineRenderers for WebGL (obsolete) — — Splinter Review
This follows an idea from Jeff. Therefore, I expect a complacent review from Jeff.

The objective-c++ syntax in this patch is a shot in the dark.

try: https://tbpl.mozilla.org/?tree=Try&rev=ab309fe21296
Attachment #601156 - Flags: review?(jmuizelaar)
I don't like the name 'preferSpeedOverBatteryLife', how about 'forceDiscreteGPU' and we should add a comment along the lines of:
WebGL doesn't expose the correct mechanism to handle a GPU switch thus it should always be forced to run on the discrete.
Let's see what Jeff thinks of this. I was thinking that it was better to stay at the user level here ('preferSpeedOverBatteryLife') instead of the lower level 'forceDiscreteGPU' but this isn't a very strong opinion.

"WebGL doesn't expose the correct mechanism to handle a GPU switch" would be inaccurate: WebGL does expose the correct (really, the only possible) mechanism to handle GPU switch, in the form of the webglcontextlost / webglcontextrestored events, but 1) almost none of today's content is listening to these events and 2) we are not currently generating these events upon GPU switch (because we had already decided that it was only incidental that we'd allow GPU switch of WebGL content).
Comment on attachment 601156 [details] [diff] [review]
don't AllowOfflineRenderers for WebGL

Review of attachment 601156 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +423,5 @@
>          format.alpha = 0;
>          format.minAlpha = 0;
>      }
>  
> +    format.preferSpeedOverBatteryLife = true;

I'd prefer a better name here.

::: gfx/gl/GLContext.h
@@ +534,1 @@
>      int colorBits() const { return red + green + blue; }

If you change ContextFormat to context attributes I think this will make more sense.
Attachment #601156 - Flags: review?(jmuizelaar) → review+
This will not work because shared opengl contexts need the same pixel format attributes. We don't have this problem with pbuffers because we don't use a shared context in that case.
Assignee: nobody → bjacob
Here is an idea from Benoit Girard. Not sure if it's been written down before, so here it is.

The idea is that, except possibly in corner cases that we could detect if needed, only one GPU is used at a time. So if we create a dummy OpenGL context without AllowOfflineRenderers, we will switch immediately to the discrete GPU and stay there as long as this context is alive. The usefulness of such a dummy context comes from the fact that it doesn't have to share resources with any other context, so we are free to choose its attributes.

So the idea is: during WebGL context creation, just before creating the actual WebGL context's OpenGL context, create a dummy context without AllowOfflineRenderers. That should be a global refcounted object. Then for the actual WebGL context's OpenGL context, do not change anything (keep sharing, keep AllowOfflineRenderers).

Apple asked us to debounce GPU switching. That would be done by delaying the destruction of the dummy context when its refcount hits 0.
In the last WebGL confcall, Google informed us of a work-around they were doing on dual Intel/NVIDIA Macs:

http://code.google.com/p/chromium/issues/detail?id=113703

I didn't do anything about it as we were busy with worse issues, but now I've looked at their patch and the fix they're doing also consists in using a dummy object to force staying on the discrete GPU:

http://src.chromium.org/viewvc/chrome/trunk/src/ui/gfx/gl/gl_context_cgl.cc?r1=122013&r2=122012&pathrev=122013

This is really interesting as it means that the dummy object that we have to create doesn't have to be a OpenGL context, it can be a CGLPixelFormatObj. That should be cheaper. They are intentionally leaking it, which IIUC means they permanently switching to it. I would rather not do that, instead pursue our refcounted global object idea.
Please test this.

The reason why I didn't bother explicitly with debouncing is that this is controlled by the lifespan of WebGLContext objects, which is controlled by GC, so I'm hoping that in the short term we're getting the debouncing for free from there.

Design comments: I wanted to switch GPUs strictly before we create the new OpenGL context (seemed better to have WebGL on a GL context that was never switched), and I tried to avoid introducing complexity in cross-platform code since at this point there is no reason to believe that this problem will happen on non-Mac.
Forgot hg add.
Attachment #601156 - Attachment is obsolete: true
Attachment #601690 - Attachment is obsolete: true
Attachment #601693 - Attachment is obsolete: true
Attachment #601695 - Flags: review?(jmuizelaar)
I would rather enforce that we only have 0 or 1 CGLPixelFormatAttribute per process but I'm ok with not having that initially.
Attached patch Fixedup one (obsolete) — — Splinter Review
Attachment #601695 - Attachment is obsolete: true
Attachment #601711 - Flags: review+
Attachment #601695 - Flags: review?(jmuizelaar)
Attached patch Fixeduper — — Splinter Review
Attachment #601711 - Attachment is obsolete: true
Attachment #601719 - Flags: review+
With the previous patch we observed one corrupt canvas frame, upon WebGL context creation. It seems that the GPU switch, which was occuring during the WebGLContext ctor, was introducing a new frame, and the CanvasLayerOGL code is not prepared to deal with a half-constructed canvas context object.

So this patch defers the GPU switch to the SetDimensions call, just before we construct the actual GL context. A side benefit of this is that when WebGL is blacklisted (e.g. Mac OS 10.5), we avoid doing any work.
Attachment #601722 - Flags: review?(jmuizelaar)
Attachment #601719 - Attachment is obsolete: true
Attachment #601722 - Attachment is obsolete: true
Attachment #601722 - Flags: review?(jmuizelaar)
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

Let's do this for now, as deferring the gpu switch doesn't help with the corrupted frame.
Attachment #601719 - Attachment is obsolete: false
Depends on: 731767
Filed bug 731767 about the corrupt frame.
http://hg.mozilla.org/mozilla-central/rev/1c3b291d0830
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky):
String changes made by this patch:
Attachment #601719 - Flags: approval-mozilla-beta?
Attachment #601719 - Flags: approval-mozilla-aurora?
[Approval Request Comment]
Regression caused by (bug #): bug 702058, when we swiched to FBOs for CGL offscreen contexts
User impact if declined: risk of information leakage: exposing random video memory upon GPU switch. Also, poor WebGL performance as the integrated GPU is slower than the discrete one.
Testing completed (on m-c, etc.): just landed on m-c
Risk to taking this patch (and alternatives if risky): we managed to make it pretty minimal. Just a few lines of code creating a CGLPixelFormat object... doesn't seem risky to me.
String changes made by this patch: none
Removing qawanted since this appears to have been fixed.
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Triage Comment]
Approving for Aurora 12 but leaving this unfixed for FF11 given the risk versus reward with only one external report (with gfxCardStatus installed).
Attachment #601719 - Flags: approval-mozilla-beta?
Attachment #601719 - Flags: approval-mozilla-beta-
Attachment #601719 - Flags: approval-mozilla-aurora?
Attachment #601719 - Flags: approval-mozilla-aurora+
We'll want to fix this on ESR once this has gotten some testing.
CCing Gerv for the licencing part of the patch.
djcater: what's the licensing issue here?

Gerv
Gerv: look at the new file I added in this patch, it borrows some code from Chromium, I'm only crediting it in a comment in the function itself. Should I change the header at the top of the file? how?
bjacob: just the four lines in that function? I'd say it's on the borderline of copyrightability, given that I can't see many other ways you could have written "initialize a variable and call a function". I'd just leave it as it is. We have the Chromium license in about:license anyway.

Gerv
Right, that's why I didn't spontaneously ping you about it :) also why I didn't just rewrite it: the rewrite would indeed have been nearly identical. Thanks for checking this.
Setting wontfix for FF11 as per #32
I'm not a Big Bang type! I want my computer to work when I'm strongly advised to update.
Last time there were problems of some kind.
As soon as I see there'll be "issues"("unresolved",what's more!)I keep clear.
Can you not make things clearer/offer straightforward help for us neophytes who'd love to learn?
It's almost like Medieval Catholics having to rely on priests to get to god !
[Triage Comment]
Has this gotten enough testing to land on ESR yet?  Please nominate for approval-esr10 if it has. See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for details.
I think that this has received enough testing for ESR.

We have not seen any clear regression from it (it's not clear how reproducible bug 731767 is, and it's not clear at all that it's a regression).

But on the other hand, we know that this fixes a very significant bug. So, ESR users are better off with this patch.
Attachment #601719 - Flags: approval-mozilla-esr10?
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Triage Comment]
Thanks for the quick response, please go ahead and land.
Attachment #601719 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Whiteboard: [qa+]
Blocks: 741270
Whiteboard: [qa+] → [qa+][gfx.relnote.13]
I'm having trouble tracking down someone with a dual GPU Macbook. Reuben, can you help us verify this bug by seeing if the bug reproduces in Firefox 12, 13.0b6, or the latest-mozilla-esr10 nightly?
I've put one more call out to QA -- failing that we won't be able to verify this. Benoit, Joe, is there anywhere else we can turn to to get this fix verified?
I have a machine here, but I need some better expected results to test.  In running the latest beta I haven't seen my card switch over to discrete yet (discrete box checked) when running on Google Maps while using http://codykrieger.com/gfxCardStatus.

Also does it matter if multiple browsers are open when this test is initiated?
To verify this bug:

 - install gfxCardStatus (Marcia's link) and verify that your mac supports dynamic GPU switching (only recent Macs do). You can verify that by looking at the gfxCardStatus menu: only on supporting Macs does it offer the default/auto-switch option. On older macs, you can only force one GPU or the other.

 - Make sure you're in the default auto-switch mode, and that the currently in-use GPU is the integrated one (gfxCardStatus's icon should be 'i'). If the currently used GPU is the discrete one (icon is then 'd'), try quitting applications until it's switched back to 'i'.

 - Start Firefox on default start page. Should stay on 'i'.

 - Visit a page that uses WebGL, for instance http://webglsamples.googlecode.com/hg/aquarium/aquarium.html . Should switch to 'd'.

 - Close all tabs that uses WebGL but keep Firefox open. Wait until GC happens so the WebGL contexts actually go away: you can go to about:memory to check if it still mentions WebGL under 'other measurements', and use the GC/CC buttons to accelerate them going away.

 - As the last WebGL context goes away, gfxCardStatus should switch back to 'i'.

Regarding Google Maps: note that it only uses WebGL if you enabled 'MapsGL'. about:memory is the best way to tell if WebGL is actually being used.
Marcia, can you please try to test this before release tomorrow?
Verified fixed using the Firefox 13 candidate build and Firefox 12 and Benoit's STR from 51.

Using the latest 10esr nightly, the card immediately switches to "D" so it is difficult to test this - I cannot get it to switch back even with a GC/CC.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.6esrpre) Gecko/20120602 Firefox/10.0.6esrpre
Thanks Marcia. I think we can "trust" this is fixed for ESR given that it's now verified fixed in Firefox 12 and 13. Benoit, please let us know if there is more you want tested. Otherwise, I'm calling this VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Whiteboard: [qa+][gfx.relnote.13] → [qa+:marcia][gfx.relnote.13]
My understanding from re-reading the comments (the tracking flags might be a little off here) is that the bug fixed here was a regression in FF 11, so FF 10 ESR would still have the old behavior that it always uses the discrete GPU as Marcia observed in Comment 53. Not great, but not the bug handled here.
Going back through some old unverified ESR fixes, should this even be marked esr10:fixed based on comment 55? Benoit?
QA Contact: mozillamarcia.knous
Whiteboard: [qa+:marcia][gfx.relnote.13] → [gfx.relnote.13]
Based on comment 55, I would say esr10:unaffected.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: