Closed Bug 987497 Opened 6 years ago Closed 5 years ago

Elements flash white when mousing over them or scrolling the page

Categories

(Core :: Graphics: Layers, defect)

x86
macOS
defect
Not set

RESOLVED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 + verified
firefox32 + verified
firefox33 + fixed

People

(Reporter: harth, Assigned: mstange)

Attachments

(4 files, 2 obsolete files)

I've been putting up with a debilitating bug for a few weeks that I can't reproduce on my other systems.

Across all pages and including the Firefox and devtools UI, elements will turn completely white and/or their text will turn white. Sometimes the white just flashes, and sometimes it stays. It's usually triggered by a mousemove out of the element, or a page scroll, in which the entire page can go white. This happens all the time.

Regression range pushlog from mozregression:

Last good revision: 6e3ec93efe1d (2014-02-18)
First bad revision: bf0e76f2a7d4 (2014-02-19)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6e3ec93efe1d&tochange=bf0e76f2a7d4

Inbound:

Last good revision: 1a0927d0558b
First bad revision: c0e256be4775
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1a0927d0558b&tochange=c0e256be4775

I'm on OS X 10.8.2 on an Air. I can't reproduce this on my Pro.
From the inbound range, maybe bug 926128?
Component: General → Graphics: Layers
Almost certainly. Morris, can you take a look at this? Might be specific to some drivers/hardware.
Blocks: 926128
Flags: needinfo?(mtseng)
Hi harth,
Do you have any logs from when the issue happened, or any steps to reproduce it? More information would make debugging easier. Thanks.
Flags: needinfo?(mtseng) → needinfo?(fayearthur)
I just tried a debug build. Still occurs, but I don't see anything in the dump when the flashes occur, and in general don't see anything relevant.

If you use a build for more than a minute you'll definitely notice it. Right now on http://google.com if you hover over the "I'm feeling lucky" button, its text changes to "I'm feeling puzzled" and the text flashes white for about one second.

Some things only turn white some of the time, though. Sometimes the text of the tab bars will turn white, but that doesn't happen every time I use the browser. As I was typing this, this text area went white, stayed that way for a few seconds, then went back.
Flags: needinfo?(fayearthur)
Here's the debug log anyways, it says a couple things about OpenGL near the beginning: https://gist.github.com/harthur/9ef6d49be03cc71446bb
Wonder if this is related to the other depth texture unsupported bug we have...
This is a screenshot from when I was trying to search twitter for 'aurora white'. Clicking on the search field in twitter popped open a white autocomplete list and turned the top bar white as well. This textarea just turned white too.

This is on Aurora, and it's going to hit Beta in a couple of weeks. If this happens on any other computer, trust me, its owner won't be able to use Firefox. I'm only putting up with it because I have to in order to work on Firefox.
Any idea what's going on given the new information I added?
Flags: needinfo?(mtseng)
I still cannot reproduce this issue on my Mac using Firefox Aurora or the latest Firefox build. Can you provide a list of the changes to your about:config settings?
Flags: needinfo?(mtseng)
(In reply to Morris Tseng [:mtseng] from comment #9)
> I still cannot reproduce this issue on my Mac using Firefox Aurora or the
> latest Firefox build. Can you provide a list of the changes to your
> about:config settings?

Clean profile, mozregression runs a clean profile on every build as well. I've got a 2011 MacBook Air. Do these lines in the log help at all?:

OpenGL version detected: 210
OpenGL vendor: NVIDIA Corporation
OpenGL renderer: NVIDIA GeForce 320M OpenGL Engine
[97122] WARNING: depth_texture marked as unsupported: file /Users/user/hgs/fx-team/gfx/gl/GLContextFeatures.cpp, line 536
OpenGL version detected: 210
OpenGL vendor: NVIDIA Corporation
OpenGL renderer: NVIDIA GeForce 320M OpenGL Engine
[97122] WARNING: depth_texture marked as unsupported: file /Users/user/hgs/fx-team/gfx/gl/GLContextFeatures.cpp, line 536
[97122] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /Users/user/hgs/fx-team/dom/events/ContentEventHandler.cpp, line 101
Does that help at all Morris?
Flags: needinfo?(mtseng)
Is there anyone else that could take a look at this?
I have tested on another MacBook Air (Mid 2011), but my OS is 10.9.2, not 10.8. I use Firefox Aurora and everything is OK. From your log, your GPU is a GeForce 320M, so I think your model is Late 2010. The log also shows depth_texture marked as unsupported; maybe this is the problem. I'll try to investigate from here.
Flags: needinfo?(mtseng)
(In reply to Morris Tseng [:mtseng] from comment #13)
> I have tested on another MacBook Air (Mid 2011), but my OS is 10.9.2, not
> 10.8. I use Firefox Aurora and everything is OK. From your log, your GPU
> is a GeForce 320M, so I think your model is Late 2010. The log also shows
> depth_texture marked as unsupported; maybe this is the problem. I'll try to
> investigate from here.

Aha, it is Late 2010.
Just did an hg bisect to make sure, it is indeed regressed by bug 926128.

If it helps, it's almost always the text that goes white.
Are you in MV or SF? My patch originally. I can take a look.
(In reply to Andreas Gal :gal from comment #16)
> Are you in MV or SF? My patch originally. I can take a look.

I'm in SF myself.
The symptoms appear to be a swizzling bug (ie. ARGB/BGRA mismatch that corrects itself on the next refresh tick.) It does seem card-specific (NVIDIA GeForce 320M 256 MB VRAM) as my MBAir has Intel graphics and doesn't show the symptoms. Markus: have you seen this one before?
Flags: needinfo?(mstange)
I have not, unfortunately. Heather, does this still happen if you flip layers.componentalpha.enabled to false? (Probably needs a restart.)
Flags: needinfo?(mstange)
You could also try running a debug build with the environment variable MOZ_GL_DEBUG set, and attaching the console output here.
(In reply to Markus Stange [:mstange] from comment #19)
> I have not, unfortunately. Heather, does this still happen if you flip
> layers.componentalpha.enabled to false? (Probably needs a restart.)

That fixes it. I can use Firefox now! But I think it's still important to fix the underlying issue as other people might have my setup.
Duplicate of this bug: 1027598
We need to do something here. This affects Firefox 30 which is in Beta right now. Can we back out bug 926128 there? We could also try blacklisting layers acceleration on the affected graphics cards, but that seems tricky since we've already seen two different cards that fall over (NVIDIA GeForce 320M in this bug and NVIDIA GeForce 9400M in bug 1027598, and maybe another in bug 1029079).

We also have some complaints about this bug on https://input.mozilla.org/en-US/?date_start=2014-06-12&selected=10d&q=white&platform=OS+X&happy=0
Duplicate of this bug: 1029079
This reproduces for me on a Mid-2010 13" MBP with Nvidia 320M running OS X 10.8.4
(In reply to Markus Stange [:mstange] from comment #23)
> This affects Firefox 30 which is in Beta right now.

Oops, that's wrong, Firefox 30 was already released.
I don't think backing out bug 926128 is practical, but I agree this is a big deal. CJ, are there any Airs in the Taipei office that match this configuration and that Morris can borrow?
Flags: needinfo?(cku)
Attached file testcase
Heather, Stuart and Yuval, can you please do the following for me:
 1) Download the try build at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-86fceccd2a11/try-macosx64/firefox-33.0a1.en-US.mac.dmg
 2) Open attachment 8445545 [details] in it and take a screenshot of it
 3) Upload the screenshot to this bug

This will let us narrow down where things are going wrong.
I forgot to mention: Please re-enable component alpha (and hardware acceleration) before taking the screenshot.
Assignee: nobody → mtseng
Flags: needinfo?(cku)
Here's a screenshot of the test page, taken with a fresh profile.
Thanks!
Attached patch possible fix (obsolete) — Splinter Review
The screenshot shows two things:
 - When only one pass is drawn, the values we pass to program->SetTexturePass2 are respected properly.
 - When both passes are drawn, the second pass is drawn as if the TexturePass2 uniform was still set to false, regardless of whether we set it to true or false.

This makes me think that the uniform change isn't picked up after the first draw with the bound shader. A bit of googling brought me to http://stackoverflow.com/questions/16608998/glsl-uniform-only-being-updated-by-unrelated-calls which describes a very similar problem on Mac OS X 10.8.3 with a NVIDIA GeForce 9400M. This matches Stuart's configuration almost exactly (Stuart has 10.8.5 instead of 10.8.3). One of the workarounds suggested in the stack overflow post is to call glUseProgram again after the first draw, so that's what this patch is doing.
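
For readers following along, here is a minimal sketch of the workaround described above. It is not the actual compositor code: FakeGL and DrawComponentAlphaPasses are hypothetical names, FakeGL only records calls instead of talking to a driver, and placing the re-bind before the second SetTexturePass2 call is an assumption. The only thing taken from the comment is that glUseProgram is called again after the first draw.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a GL context: it only records the calls made,
// so the call order can be inspected without a real driver.
struct FakeGL {
  std::vector<std::string> calls;

  void UseProgram(unsigned program) {
    calls.push_back("UseProgram(" + std::to_string(program) + ")");
  }
  void SetTexturePass2(bool value) {
    calls.push_back(std::string("SetTexturePass2(") +
                    (value ? "true" : "false") + ")");
  }
  void DrawQuad() { calls.push_back("DrawQuad"); }
};

// Two-pass component alpha draw. When needsRebindWorkaround is true, the
// program is bound again after the first draw, which is the step the
// Stack Overflow post suggests for making the driver notice the new
// uniform value. (Assumption: the re-bind happens before the uniform is set.)
inline void DrawComponentAlphaPasses(FakeGL& gl, unsigned program,
                                     bool needsRebindWorkaround) {
  gl.UseProgram(program);
  gl.SetTexturePass2(false);
  gl.DrawQuad();  // first pass

  if (needsRebindWorkaround) {
    gl.UseProgram(program);  // redundant on a correctly behaving driver
  }

  gl.SetTexturePass2(true);
  gl.DrawQuad();  // second pass
}
```

Calling DrawComponentAlphaPasses(gl, 7, true) on a fresh FakeGL records six calls, with the extra UseProgram(7) sitting between the first DrawQuad and SetTexturePass2(true); with false as the last argument only five calls are recorded.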

In about an hour, a try build with this patch will appear at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-d3e4745dd993/try-macosx64/firefox-33.0a1.en-US.mac.dmg .
(In reply to Markus Stange [:mstange] from comment #33)
>  - When both passes are drawn, the second pass is drawn as if the
> TexturePass2 uniform was still set to false, regardless of whether we set it
> to true or false.

However, if both passes are drawn (last column in the screenshot) and the first pass is drawn with program->SetTexturePass2(true) (first and third row), drawing the second pass gives *different* results depending on whether true or false is passed to SetTexturePass2 for the second pass (it's true in third row and false in the first). This is unexpected and reduces my confidence in the patch somewhat.
The build at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-d3e4745dd993/try-macosx64/firefox-33.0a1.en-US.mac.dmg has finished.

Heather / Stuart / Yuval, can you please test whether the bug is fixed in this build?
(In reply to Markus Stange [:mstange] from comment #35)
> Heather / Stuart / Yuval, can you please test whether the bug is fixed in
> this build?

Appears to be fixed, yes.
Attached patch fix — Splinter Review
Yay!

This is the same patch as before, but it conditions the workaround on gl()->Vendor() == GLVendor::NVIDIA && !nsCocoaFeatures::OnMavericksOrLater() because that's what all three reports have in common.
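
As a rough illustration of that condition, here is a standalone sketch. The enum and the function name NeedsUseProgramWorkaround are hypothetical stand-ins; the real check uses gl()->Vendor() and nsCocoaFeatures::OnMavericksOrLater() as quoted above, and the OS check is passed in as a plain bool here so the logic can be read and tested in isolation.

```cpp
// Hypothetical mirror of the vendor enum named in the comment above.
enum class GLVendor { Intel, NVIDIA, ATI, Other };

// Returns true only for the configuration shared by all three reports:
// an NVIDIA GPU on an OS X release older than Mavericks (10.9).
constexpr bool NeedsUseProgramWorkaround(GLVendor vendor,
                                         bool onMavericksOrLater) {
  return vendor == GLVendor::NVIDIA && !onMavericksOrLater;
}
```

With this predicate, an NVIDIA GPU on 10.8 gets the extra glUseProgram call, while the same GPU on 10.9 or an Intel GPU on 10.8 does not.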
Assignee: mtseng → mstange
Attachment #8445801 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8445817 - Flags: review?(bjacob)
Sounds like a great find :-) but surprising enough that it seems worth double-checking that there isn't a bug on our side here. That this would be a driver work-around means that Apple OpenGL libraries would be forgetting the current program binding. If I understand correctly, program->Activate() is how we set that current program binding. We do call program->Activate() earlier in this DrawQuad function. Its code is:

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/OGLShaderProgram.cpp#526

void
ShaderProgramOGL::Activate()
{
  if (mProgramState == STATE_NEW) {
    if (!Initialize()) {
      NS_WARNING("Shader could not be initialised");
      return;
    }
  }
  NS_ASSERTION(HasInitialized(), "Attempting to activate a program that's not in use!");
  mGL->fUseProgram(mProgram);
}

So it seems that it does not unconditionally call glUseProgram. If we hit the "Shader could not be initialised" exit path, then glUseProgram isn't called. Could you please check if that is the case?
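
To spell out the control flow in question, here is a simplified model of the Activate() logic quoted above. FakeProgram, its State enum, and its members are hypothetical; the real class does much more, and the counter merely stands in for mGL->fUseProgram(mProgram). It only demonstrates that a failed initialisation on the first call returns before the program is bound.

```cpp
// Hypothetical, simplified program states.
enum class State { New, Ok, Error };

struct FakeProgram {
  explicit FakeProgram(bool initSucceeds) : mInitSucceeds(initSucceeds) {}

  // Pretend initialisation: succeeds or fails as configured.
  bool Initialize() {
    mState = mInitSucceeds ? State::Ok : State::Error;
    return mState == State::Ok;
  }

  void Activate() {
    if (mState == State::New) {
      if (!Initialize()) {
        return;  // early exit: the program is never bound on this call
      }
    }
    ++mUseProgramCalls;  // stands in for mGL->fUseProgram(mProgram)
  }

  bool mInitSucceeds;
  State mState = State::New;
  int mUseProgramCalls = 0;
};
```

Constructing FakeProgram(false) and calling Activate() once leaves mUseProgramCalls at 0, whereas FakeProgram(true) ends up at 1, which is the difference being asked about.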
Flags: needinfo?(mstange)
(In reply to Benoit Jacob [:bjacob] from comment #38)
> That this would be
> a driver work-around means that Apple OpenGL libraries would be forgetting
> the current program binding.

Or worse. From the stack overflow post I mentioned in comment 33:

"I have also checked GL_CURRENT_PROGRAM within the loop, and the shader remains bound throughout."

I'm going to trust this person's findings and rely on the assumption that we're hitting the same bug.

> If I understand correctly, program->Activate()
> is how we set that current program binding. We do call program->Activate()
> earlier in this DrawQuad function. Its code is:
> 
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/OGLShaderProgram.cpp#526
> 
> void
> ShaderProgramOGL::Activate()
> {
>   if (mProgramState == STATE_NEW) {
>     if (!Initialize()) {
>       NS_WARNING("Shader could not be initialised");
>       return;
>     }
>   }
>   NS_ASSERTION(HasInitialized(), "Attempting to activate a program that's not in use!");
>   mGL->fUseProgram(mProgram);
> }
> 
> So it seems that it does not unconditionally call glUseProgram. If we hit
> the "Shader could not be initialised" exit path, then glUseProgram isn't
> called. Could you please check if that is the case?

If the pre-existing call to program->Activate() in DrawQuad really did skip the call to mGL->fUseProgram(mProgram), the screenshot Stuart posted would show completely different results, because it wouldn't be using the component alpha shader. I'm very sure that that's not what's happening.
Flags: needinfo?(mstange)
Duplicate of this bug: 984939
Comment on attachment 8445817 [details] [diff] [review]
fix

Review of attachment 8445817 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me. Thanks!
Attachment #8445817 - Flags: review?(bjacob) → review+
https://hg.mozilla.org/mozilla-central/rev/ba54cde8bf7b
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Will this fix land in FF 31? Some pages (such as https://parse.com/docs/cloud_code_guide) are simply unreadable, making this a crucial fix IMHO.
Attachment #8445548 - Attachment is obsolete: true
(In reply to Yuval Adam from comment #44)
> Will this fix land in FF 31? Some pages (such as
> https://parse.com/docs/cloud_code_guide) are simply unreadable, making this
> a crucial fix IMHO.

At the moment it's only in Firefox 33, but I'll request approval for Firefox 31 and 32.
Comment on attachment 8445817 [details] [diff] [review]
fix

Approval Request Comment
[Feature/regressing bug #]: regression from bug 926128
[User impact if declined]: unreadable pages with certain GPUs on Mac OS 10.8
[Describe test coverage new/current, TBPL]: nothing specific to this bug
[Risks and why]: very low risk, this just adds an OpenGL call that is ignored by all properly functioning drivers (but happens to poke the buggy driver in the right way)
[String/UUID change made/needed]: none
Attachment #8445817 - Flags: approval-mozilla-beta?
Attachment #8445817 - Flags: approval-mozilla-aurora?
Comment on attachment 8445817 [details] [diff] [review]
fix

Some pages are unreadable. Accepting for 31 & 32.
Attachment #8445817 - Flags: approval-mozilla-beta?
Attachment #8445817 - Flags: approval-mozilla-beta+
Attachment #8445817 - Flags: approval-mozilla-aurora?
Attachment #8445817 - Flags: approval-mozilla-aurora+
(In reply to Sylvestre Ledru [:sylvestre] from comment #47)
> Some pages are unreadable. Accepting for 31 & 32.

Much obliged :)
Heather / Stuart / Yuval, could one of you please test whether the bug is fixed in the latest Firefox 31 Beta 7 (ftp://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/31.0b7-candidates/build1/)?

If you have the time maybe you can also confirm the fix on:
- latest Nightly (ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/)
- latest Aurora (ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/)
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #51)
> Heather / Stuart / Yuval, could one of you please test whether the bug is
> fixed in the latest Firefox 31 Beta 7
> (ftp://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/31.0b7-candidates/build1/)?
> 

Yes it is indeed fixed on the Beta channel.

One other thing I noticed is that while scrolling affected areas on a page, the CPU usage climbs to about 30-40%. (See attachment, the big block in the middle is when I was continuously scrolling an affected area) This isn't catastrophic, but it does seem weird.

I'm thinking this might be related to the original regression, since if you recall, my original report mentioned that the blanked-out area does appear after waiting for a few seconds. When testing again on FF 30, the exact timeline goes like this:

  1. Scroll for 5 seconds (+ bad rendering)
  2. CPU usage goes up as long as scrolling happens
  3. Stop scrolling
  4. CPU usage drops back to normal
  5. After 3 seconds the area is rendered normally again

Again, I'm not sure if this is relevant to this issue or not, but it's something to keep in mind.

In any case this shouldn't delay the fix since it is clearly an urgent one. Thanks again for all the work on it.
Looks fixed to me in Aurora.
Thanks guys for the quick feedback. I'm marking this as verified on Beta 31 and Aurora 32 based on comment 52 and comment 53.

Yuval, I think you can add a separate bug for the CPU usage issue.
I modified the workaround code in bug 1073039. Let me know if that causes any breakage.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #55)
> I modified the work around code in bug 1073039. Let me know if that causes
> any breakage.

I see that this was done in Firefox 35 Nightly. Heather / Stuart / Yuval, can you confirm that this does not cause any issues for you with the latest Nightly build (ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/)? Make sure the build is from 2014-09-26 or later.