Last Comment Bug 713305 - WebGL no longer triggers discrete graphics mode on dual GPU MacBook, affecting content performance
: WebGL no longer triggers discrete graphics mode on dual GPU MacBook, affectin...
Status: VERIFIED FIXED
[gfx.relnote.13]
: regression
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: mozilla13
Assigned To: Benoit Jacob [:bjacob] (mostly away)
: Marcia Knous [:marcia - use ni]
Mentors:
Depends on: 731767
Blocks: 741270
  Show dependency treegraph
 
Reported: 2011-12-23 15:33 PST by Reuben Morais [:reuben]
Modified: 2012-09-14 07:49 PDT (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
+
wontfix
verified
verified
12+
fixed


Attachments
don't AllowOfflineRenderers for WebGL (3.74 KB, patch)
2012-02-27 19:31 PST, Benoit Jacob [:bjacob] (mostly away)
jmuizelaar: review+
Details | Diff | Review
force discrete GPU while any WebGL context is alive (2.03 KB, patch)
2012-02-29 11:48 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
force discrete GPU while any WebGL context is alive (4.78 KB, patch)
2012-02-29 11:49 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
force discrete GPU while any WebGL context is alive (5.25 KB, patch)
2012-02-29 12:00 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
Fixedup one (1.49 KB, patch)
2012-02-29 12:33 PST, Jeff Muizelaar [:jrmuizel]
jmuizelaar: review+
Details | Diff | Review
Fixeduper (4.47 KB, patch)
2012-02-29 12:50 PST, Jeff Muizelaar [:jrmuizel]
jmuizelaar: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta-
lukasblakk+bugs: approval‑mozilla‑esr10+
Details | Diff | Review
defer the GPU switch to the SetDimensions call, just before the actual GL context creation (5.97 KB, patch)
2012-02-29 12:56 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review

Description Reuben Morais [:reuben] 2011-12-23 15:33:07 PST
Just noticed this after reinstalling gfxCardStatus and trying Google Maps GL.
Comment 1 Joe Drew (not getting mail) 2011-12-23 16:02:32 PST
I thought that WebGL was always supposed to trigger discrete mode. Strange.
Comment 2 Boris Zbarsky [:bz] 2011-12-23 19:47:17 PST
Reuben, can you try finding the regression range?
Comment 3 Benoit Girard (:BenWa) 2011-12-24 10:38:56 PST
What version did you see this behavior with? I tried beta-10 and the behavior I see is that WebGL switches to discrete and doesn't return to integrated even after leaving the page. Also double check that gfxCardStatus is set to 'Dynamic'.
Comment 4 Reuben Morais [:reuben] 2011-12-24 16:26:55 PST
(In reply to Boris Zbarsky (:bz) from comment #2)
Ay!

(In reply to Benoit Girard (:BenWa) from comment #3)
>What version did you see this behavior with?
Nightly 11 and 12.

> I tried beta-10 and the behavior I see is that WebGL switches to 
> discrete and doesn't return to integrated even after leaving the page.
It should switch back to integrated if your CFBundleIdentifier is the Firefox one (not Nightly or Aurora). If it doesn't, then that's another regression.

> Also double check that gfxCardStatus is set to 'Dynamic'.
I don't actually use its force GPU feature, it causes system instability.
Comment 5 Alex Keybl [:akeybl] 2012-02-13 14:16:42 PST
QA: Let's try to reproduce this with FF10 and FF11 and help out with a regression range.

1) Install gfxCardStatus: http://codykrieger.com/gfxCardStatus
2) Make sure the menu item is set to Dynamic Switching
3) Note the graphics card currently in use in the menu item
4) Go to https://maps.google.com/, click "Experience MapsGL" if you haven't already
5) Note the graphics card currently in use in the menu item
6) Close the Google Maps tab, wait a couple of minutes, and again note the graphics card currently in use

Please report your results back here. Thanks!
Comment 6 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-02-14 10:37:54 PST
Does this explicitly require dual-GPU Macbooks? I'd like to help with a regression range but I fear I might not have the hardware necessary to reproduce.
Comment 7 Reuben Morais [:reuben] 2012-02-14 15:09:35 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #6)
> Does this explicitly require dual-GPU Macbooks?

Yes :/

While you're here, can you point me where the nightlies between 9 and 10 are available?
Comment 8 Boris Zbarsky [:bz] 2012-02-14 16:11:15 PST
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
Comment 9 Reuben Morais [:reuben] 2012-02-14 17:53:06 PST
(In reply to Boris Zbarsky (:bz) from comment #8)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/

Sorry, what I *actually* wanted was a time frame for the nightlies between Fx 9 and 10, but that wouldn't help me anyway, as my initial guess was quite far from the actual range I'm getting:

24-Nov-2011 works
26-Nov-2011 fails

It's late and my system is very unstable for some reason so I'll try to bisect further tomorrow.
Comment 10 Boris Zbarsky [:bz] 2012-02-14 17:59:20 PST
> Sorry, what I *actually* wanted was a time frame for the nightlies

Ah, ok.  For that, you want https://wiki.mozilla.org/RapidRelease/Calendar which says Fx9 branched off trunk on 2011-09-27 and Fx10 did so on 2011-11-08...  Definitely before your regression range.
Comment 11 Jeff Muizelaar [:jrmuizel] 2012-02-27 15:26:07 PST
This was likely caused by bug 702058
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2012-02-27 19:31:16 PST
Created attachment 601156 [details] [diff] [review]
don't AllowOfflineRenderers for WebGL

This follows an idea from Jeff. Therefore, I expect a complacent review from Jeff.

The objective-c++ syntax in this patch is a shot in the dark.

try: https://tbpl.mozilla.org/?tree=Try&rev=ab309fe21296
Comment 13 Benoit Girard (:BenWa) 2012-02-27 20:03:30 PST
I don't like the name 'preferSpeedOverBatteryLife', how about 'forceDiscreteGPU' and we should add a comment along the lines of:
WebGL doesn't expose the correct mechanism to handle a GPU switch thus it should always be forced to run on the discrete.
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2012-02-28 05:28:26 PST
Let's see what Jeff thinks of this. I was thinking that it was better to stay at the user level here ('preferSpeedOverBatteryLife') instead of the lower level 'forceDiscreteGPU' but this isn't a very strong opinion.

"WebGL doesn't expose the correct mechanism to handle a GPU switch" would be inaccurate: WebGL does expose the correct (really, the only possible) mechanism to handle GPU switch, in the form of the webglcontextlost / webglcontextrestored events, but 1) almost none of today's content is listening to these events and 2) we are not currently generating these events upon GPU switch (because we had already decided that it was only incidental that we'd allow GPU switch of WebGL content).
Comment 15 Jeff Muizelaar [:jrmuizel] 2012-02-28 09:55:13 PST
Comment on attachment 601156 [details] [diff] [review]
don't AllowOfflineRenderers for WebGL

Review of attachment 601156 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +423,5 @@
>          format.alpha = 0;
>          format.minAlpha = 0;
>      }
>  
> +    format.preferSpeedOverBatteryLife = true;

I'd prefer a better name here.

::: gfx/gl/GLContext.h
@@ +534,1 @@
>      int colorBits() const { return red + green + blue; }

If you change ContextFormat to context attributes I think this will make more sense.
Comment 16 Jeff Muizelaar [:jrmuizel] 2012-02-28 17:26:52 PST
This will not work because shared opengl contexts need the same pixel format attributes. We don't have this problem with pbuffers because we don't use a shared context in that case.
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 08:03:51 PST
Here is an idea from Benoit Girard. Not sure if it's been written down before, so here it is.

The idea is that, except possibly in corner cases that we could detect if needed, only one GPU is used at a time. So if we create a dummy OpenGL context without AllowOfflineRenderers, we will switch immediately to the discrete GPU and stay there as long as this context is alive. The usefulness of such a dummy context comes from the fact that it doesn't have to share resources with any other context, so we are free to choose its attributes.

So the idea is: during WebGL context creation, just before creating the actual WebGL context's OpenGL context, create a dummy context without AllowOfflineRenderers. That should be a global refcounted object. Then for the actual WebGL context's OpenGL context, do not change anything (keep sharing, keep AllowOfflineRenderers).

Apple asked us to debounce GPU switching. That would be done by delaying the destruction of the dummy context when its refcount hits 0.
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 08:41:45 PST
In the last WebGL confcall, Google informed us of a work-around they were doing on dual Intel/NVIDIA Macs:

http://code.google.com/p/chromium/issues/detail?id=113703

I didn't do anything about it as we were busy with worse issues, but now I've looked at their patch and the fix they're doing also consists in using a dummy object to force staying on the discrete GPU:

http://src.chromium.org/viewvc/chrome/trunk/src/ui/gfx/gl/gl_context_cgl.cc?r1=122013&r2=122012&pathrev=122013

This is really interesting as it means that the dummy object that we have to create doesn't have to be a OpenGL context, it can be a CGLPixelFormatObj. That should be cheaper. They are intentionally leaking it, which IIUC means they permanently switching to it. I would rather not do that, instead pursue our refcounted global object idea.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 11:48:35 PST
Created attachment 601690 [details] [diff] [review]
force discrete GPU while any WebGL context is alive

Please test this.

The reason why I didn't bother explicitly with debouncing is that this is controlled by the lifespan of WebGLContext objects, which is controlled by GC, so I'm hoping that in the short term we're getting the debouncing for free from there.

Design comments: I wanted to switch GPUs strictly before we create the new OpenGL context (seemed better to have WebGL on a GL context that was never switched), and I tried to avoid introducing complexity in cross-platform code since at this point there is no reason to believe that this problem will happen on non-Mac.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 11:49:52 PST
Created attachment 601693 [details] [diff] [review]
force discrete GPU while any WebGL context is alive

Forgot hg add.
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 12:00:06 PST
Created attachment 601695 [details] [diff] [review]
force discrete GPU while any WebGL context is alive
Comment 22 Benoit Girard (:BenWa) 2012-02-29 12:09:01 PST
I would rather enforce that we only have 0 or 1 CGLPixelFormatAttribute per process but I'm ok with not having that initially.
Comment 23 Jeff Muizelaar [:jrmuizel] 2012-02-29 12:33:29 PST
Created attachment 601711 [details] [diff] [review]
Fixedup one
Comment 24 Jeff Muizelaar [:jrmuizel] 2012-02-29 12:50:49 PST
Created attachment 601719 [details] [diff] [review]
Fixeduper
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 12:56:11 PST
Created attachment 601722 [details] [diff] [review]
defer the GPU switch to the SetDimensions call, just before the actual GL context creation

With the previous patch we observed one corrupt canvas frame, upon WebGL context creation. It seems that the GPU switch, which was occuring during the WebGLContext ctor, was introducing a new frame, and the CanvasLayerOGL code is not prepared to deal with a half-constructed canvas context object.

So this patch defers the GPU switch to the SetDimensions call, just before we construct the actual GL context. A side benefit of this is that when WebGL is blacklisted (e.g. Mac OS 10.5), we avoid doing any work.
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 13:31:42 PST
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

Let's do this for now, as deferring the gpu switch doesn't help with the corrupted frame.
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 13:41:51 PST
Filed bug 731767 about the corrupt frame.
Comment 28 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 13:45:53 PST
http://hg.mozilla.org/mozilla-central/rev/1c3b291d0830
Comment 29 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 13:46:38 PST
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky):
String changes made by this patch:
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2012-02-29 13:50:03 PST
[Approval Request Comment]
Regression caused by (bug #): bug 702058, when we swiched to FBOs for CGL offscreen contexts
User impact if declined: risk of information leakage: exposing random video memory upon GPU switch. Also, poor WebGL performance as the integrated GPU is slower than the discrete one.
Testing completed (on m-c, etc.): just landed on m-c
Risk to taking this patch (and alternatives if risky): we managed to make it pretty minimal. Just a few lines of code creating a CGLPixelFormat object... doesn't seem risky to me.
String changes made by this patch: none
Comment 31 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-01 13:45:49 PST
Removing qawanted since this appears to have been fixed.
Comment 32 Alex Keybl [:akeybl] 2012-03-01 17:52:17 PST
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Triage Comment]
Approving for Aurora 12 but leaving this unfixed for FF11 given the risk versus reward with only one external report (with gfxCardStatus installed).
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2012-03-02 08:49:31 PST
http://hg.mozilla.org/releases/mozilla-aurora/rev/c9d984d834a8
Comment 34 Benoit Jacob [:bjacob] (mostly away) 2012-03-02 09:02:27 PST
fixed copyright year: http://hg.mozilla.org/integration/mozilla-inbound/rev/775c60c67e37
Comment 35 Daniel Veditz [:dveditz] 2012-03-02 14:09:12 PST
We'll want to fix this on ESR once this has gotten some testing.
Comment 36 Daniel Cater 2012-03-03 11:55:18 PST
CCing Gerv for the licencing part of the patch.
Comment 37 Ed Morley [:emorley] 2012-03-04 05:06:05 PST
https://hg.mozilla.org/mozilla-central/rev/775c60c67e37
Comment 38 Gervase Markham [:gerv] 2012-03-05 02:11:16 PST
djcater: what's the licensing issue here?

Gerv
Comment 39 Benoit Jacob [:bjacob] (mostly away) 2012-03-05 06:11:56 PST
Gerv: look at the new file I added in this patch, it borrows some code from Chromium, I'm only crediting it in a comment in the function itself. Should I change the header at the top of the file? how?
Comment 40 Gervase Markham [:gerv] 2012-03-05 06:15:53 PST
bjacob: just the four lines in that function? I'd say it's on the borderline of copyrightability, given that I can't see many other ways you could have written "initialize a variable and call a function". I'd just leave it as it is. We have the Chromium license in about:license anyway.

Gerv
Comment 41 Benoit Jacob [:bjacob] (mostly away) 2012-03-05 06:19:06 PST
Right, that's why I didn't spontaneously ping you about it :) also why I didn't just rewrite it: the rewrite would indeed have been nearly identical. Thanks for checking this.
Comment 42 JP Rosevear [:jpr] 2012-03-05 06:47:40 PST
Setting wontfix for FF11 as per #32
Comment 43 fletcher.j@videotron.ca 2012-03-17 17:00:09 PDT
I'm not a Big Bang type! I want my computer to work when I'm strongly advised to update.
Last time there were problems of some kind.
As soon as I see there'll be "issues"("unresolved",what's more!)I keep clear.
Can you not make things clearer/offer straightforward help for us neophytes who'd love to learn?
It's almost like Medieval Catholics having to rely on priests to get to god !
Comment 44 Lukas Blakk [:lsblakk] use ?needinfo 2012-03-20 10:14:52 PDT
[Triage Comment]
Has this gotten enough testing to land on ESR yet?  Please nominate for approval-esr10 if it has. See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for details.
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2012-03-21 07:23:05 PDT
I think that this has received enough testing for ESR.

We have not seen any clear regression from it (it's not clear how reproducible bug 731767 is, and it's not clear at all that it's a regression).

But on the other hand, we know that this fixes a very significant bug. So, ESR users are better off with this patch.
Comment 46 Lukas Blakk [:lsblakk] use ?needinfo 2012-03-21 14:01:43 PDT
Comment on attachment 601719 [details] [diff] [review]
Fixeduper

[Triage Comment]
Thanks for the quick response, please go ahead and land.
Comment 47 Benoit Jacob [:bjacob] (mostly away) 2012-03-26 13:03:12 PDT
http://hg.mozilla.org/releases/mozilla-esr10/rev/def29cb4e306
Comment 48 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-30 10:20:36 PDT
I'm having trouble tracking down someone with a dual GPU Macbook. Reuben, can you help us verify this bug by seeing if the bug reproduces in Firefox 12, 13.0b6, or the latest-mozilla-esr10 nightly?
Comment 49 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-03 11:20:32 PDT
I've put one more call out to QA -- failing that we won't be able to verify this. Benoit, Joe, is there anywhere else we can turn to to get this fix verified?
Comment 50 Marcia Knous [:marcia - use ni] 2012-06-03 12:05:24 PDT
I have a machine here, but I need some better expected results to test.  In running the latest beta I haven't seen my card switch over to discrete yet (discrete box checked) when running on Google Maps while using http://codykrieger.com/gfxCardStatus.

Also does it matter if multiple browsers are open when this test is initiated?
Comment 51 Benoit Jacob [:bjacob] (mostly away) 2012-06-03 14:53:07 PDT
To verify this bug:

 - install gfxCardStatus (Marcia's link) and verify that your mac supports dynamic GPU switching (only recent Macs do). You can verify that by looking at the gfxCardStatus menu: only on supporting Macs does it offer the default/auto-switch option. On older macs, you can only force one GPU or the other.

 - Make sure you're in the default auto-switch mode, and that the currently in-use GPU is the integrated one (gfxCardStatus's icon should be 'i'). If the currently used GPU is the discrete one (icon is then 'd'), try quitting applications until it's switched back to 'i'.

 - Start Firefox on default start page. Should stay on 'i'.

 - Visit a page that uses WebGL, for instance http://webglsamples.googlecode.com/hg/aquarium/aquarium.html . Should switch to 'd'.

 - Close all tabs that uses WebGL but keep Firefox open. Wait until GC happens so the WebGL contexts actually go away: you can go to about:memory to check if it still mentions WebGL under 'other measurements', and use the GC/CC buttons to accelerate them going away.

 - As the last WebGL context goes away, gfxCardStatus should switch back to 'i'.

Regarding Google Maps: note that it only uses WebGL if you enabled 'MapsGL'. about:memory is the best way to tell if WebGL is actually being used.
Comment 52 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-04 13:50:57 PDT
Marcia, can you please try to test this before release tomorrow?
Comment 53 Marcia Knous [:marcia - use ni] 2012-06-04 15:46:36 PDT
Verified fixed using the Firefox 13 candidate build and Firefox 12 and Benoit's STR from 51.

Using the latest 10esr nightly, the card immediately switches to "D" so it is difficult to test this - I cannot get it to switch back even with a GC/CC.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.6esrpre) Gecko/20120602 Firefox/10.0.6esrpre
Comment 54 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-04 15:55:31 PDT
Thanks Marcia. I think we can "trust" this is fixed for ESR given that it's now verified fixed in Firefox 12 and 13. Benoit, please let us know if there is more you want tested. Otherwise, I'm calling this VERIFIED FIXED.
Comment 55 Benoit Jacob [:bjacob] (mostly away) 2012-06-04 17:39:04 PDT
My understanding from re-reading the comments (the tracking flags might be a little off here) is that the bug fixed here was a regression in FF 11, so FF 10 ESR would still have the old behavior that it always uses the discrete GPU as Marcia observed in Comment 53. Not great, but not the bug handled here.
Comment 56 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-13 15:11:35 PDT
Going back through some old unverified ESR fixes, should this even be marked esr10:fixed based on comment 55? Benoit?
Comment 57 Benoit Jacob [:bjacob] (mostly away) 2012-09-14 07:49:55 PDT
Based on comment 55, I would say esr10:unaffected.

Note You need to log in before you can comment on or make changes to this bug.