Closed Bug 476557 Opened 12 years ago Closed 12 years ago

Flash of white (sometimes with corruption) when reloading drobo.com

Categories

(Core :: Layout, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: dholbert, Assigned: zwol)

References

Details

(Keywords: fixed1.9.1, regression)

Attachments

(5 files, 3 obsolete files)

STR:
 1. Load URL ( http://drobo.com/ )
 2. Hold shift + press reload button (or press Ctrl+Shift+R)

EXPECTED RESULTS: Background should remain gray while the page reloads.

ACTUAL RESULTS: Background flashes bright white for an instant, before returning to gray.

BROKEN VERSIONS:
 * Latest mozilla-central nightly
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090202 Minefield/3.2a1pre

WORKING VERSIONS:
 * latest 1.9.1 nightly
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b3pre) Gecko/20090202 Shiretoko/3.1b3pre
 * Firefox 3.0.5
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121711 Ubuntu/9.04 (jaunty) Firefox/3.0.5

NOTE: This looks related to bug 475548, though it's apparently not exactly the same, since that bug should already be fixed in the past few m-c nightlies.
I can reproduce this in Windows Vista, too:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2a1pre) Gecko/20090122 Minefield/3.2a1pre (.NET CLR 3.5.30729)

But *not* in Firefox 3.0.5 on Windows Vista:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5

So, regression there as well.   OS --> ALL.
OS: Linux → All
Regression range:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090121 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/b514c8318c7d

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090128 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/42b8e2c66c5b

 --> Looks like a regression from bug 474201.  Possibly a dupe of bug 474886, which has a not-yet-landed patch.
Blocks: 474201
(In reply to comment #2)
> Possibly a dupe of bug 474886,
> which has a not-yet-landed patch.

Nope, not a dupe -- I just tested that bug's patch, and it doesn't fix this.
So I *had* a big rant about the combination of Flash and Javascript being used on this site but then I noticed that I can see the same white flash if I just plain reload the page (ctrl-R, not shift-ctrl-R) with Javascript turned off.  Oddly, this does *not* happen on shift-ctrl-R if Javascript is off.  (If Javascript is on, I get it with both kinds of reload.)  And I can see it on other pages that have (a) a dark <BODY> background and (b) Flash embeds.

This does not leave me having any idea where the bug is, though.
Ah, yeah -- I meant to add a comment noting that I *do* see this bug with normal Ctrl-R as well, though I occasionally get an apparently bug-free reload.

Shift-Ctrl-R seems to reproduce it every time, though.  The difference could just be my imagination, or maybe it's just that a full reload takes longer and so the white flash is more noticeable.
We could use a simplified testcase.
Attached file testcase 1 (obsolete) —
Here's a testcase... I can easily reproduce the bug using this testcase in my normal profile, but it's a lot harder to reproduce using a fresh profile, (probably because Firefox is a lot snappier with a fresh profile).
Here's what might be a testcase using the test plugin.  I don't see the problem with this testcase, even with the patch in bug 469831 applied.  There are two possible explanations for that: either Flash is doing something evil, or my development build's profile has too little junk in it for the browser to be slow enough.  I would be very interested to know if anyone can reproduce the problem with this test case.  (And if you can't, perhaps try Java?  It might be that the test plugin is too simple.)
(In reply to comment #2)
>  --> Looks like a regression from bug 474201.

BTW -- I confirmed this with local backout of that bug's changeset.
Thanks for the verification.  I'm looking into this right now.
I can't seem to get reliable failures with testcase 1 anymore, so I took another shot at reducing the testcase, and I think I got something much better.  It looks like it all boils down to a call to the JS library function "AC_FL_RunContent", which *sets up* an embedded flash object.

Note that this testcase references the file at http://drobo.com/Scripts/AC_RunActiveContent.js -- I haven't reduced the JS from there yet.

This testcase reproduces the bug for me very reliably (even in a fresh profile) when I _hold_ Ctrl+R to force a chain of reloads.  A single reload generally shows no flash of white, but the continuous chain of reloads makes the background go solid white almost immediately. (whereas it stays black in pre-regression nightlies)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090209 Minefield/3.2a1pre
Attachment #360630 - Attachment is obsolete: true
Comment on attachment 361557 [details]
testcase 3: using "AC_FL_RunContent" (try saving locally)

Actually, when hosted on bugzilla, this file doesn't seem to reproduce the bug, but it does reproduce it when I save it locally, and it does reproduce it when I view a copy located on people.mozilla.org at http://people.mozilla.org/~dholbert/tests/476557/test3.html

Perhaps Bugzilla's HTTPS overhead changes the timing slightly, in such a way as to work around the bug, or something.  In any case, if you can't reproduce it using the attached testcase 3, try a local copy.
Attachment #361557 - Attachment description: testcase 3: using "AC_FL_RunContent" → testcase 3: using "AC_FL_RunContent" (try saving locally)
dholbert: While I'm waiting for stuff to recompile, do you still see the bug if you add <!doctype html> to the top of your test case, i.e. if you force standards mode?  (I don't think it should make a difference but I want to make sure.)
dholbert: Also, your test case does not appear to *execute* any Javascript; it only creates a whole bunch of functions.  Is that really correct?
(In reply to comment #13)
> do you still see the bug if
> you add <!doctype html> to the top of your test case

Yes: http://people.mozilla.org/~dholbert/tests/476557/test3_stds.html

(In reply to comment #14)
> dholbert: Also, your test case does not appear to *execute* any Javascript; it
> only creates a whole bunch of functions.  Is that really correct?

Whoah, you're right!  That was accidental -- I separated it into a function to test something, and then I forgot to un-separate it. :)

However, it does still reproduce the bug -- apparently it's just the <script src="..."> tag that's required.  Here's a version with an empty body, with the silly "go()" function removed:
  http://people.mozilla.org/~dholbert/tests/476557/test4.html

That testcase reproduces the bug very reliably for me when I hold ctrl+r, as described in comment 11.
Attached file testcase 4
I'm attaching testcase 4, which doesn't execute any JS, but just includes a <script> referencing a remote JS file.

It may not reproduce the bug when hosted on bugzilla, per comment 12, but I've got a copy which reproduces the problem here:
  http://people.mozilla.org/~dholbert/tests/476557/test4.html
and it also reproduces the problem when I view a local copy (and hold Ctrl+R).
Huh.  That doesn't reproduce the problem for me at all (I am seeing it on the actual drobo.com site, for the record) and if what you say is correct, it blows my hypothesis of the problem out of the water.
Ok, here's a bit more information to hopefully help you reproduce it.

* It helps if you slow down your internet connection via a bandwidth limiting tool like "wondershaper". (It's available in ubuntu repositories.  "sudo wondershaper eth0 5 5" limits the connection to 5kb/s up & down, and "sudo wondershaper clear eth0" will restore full connectivity.)

* After slowing down my connection, I can very reliably reproduce the problem via these steps:
  1) Open attachment 361565 [details], and let it load fully.
  2) Run "sudo wondershaper eth0 5 5" (where eth0 is your network interface)
  3) Hit ctrl-shift-r, and **watch the status bar**
  4) As soon as you see a message related to drobo.com ('looking up' / 'connecting to' / 'transferring data from' / etc), hit ctrl-R **at that point**

This makes the background go instantly white, right after step 3.

If I do the reload earlier than that point (i.e. if I reload when the status bar says something related to bugzilla.mozilla.org), the background stays black -- that might be why you haven't been able to reproduce it.

So, it looks like this is an issue with being interrupted *after* we've loaded the initial webpage, but while we're still loading an external JS resource, or something like that.
(In reply to comment #18)
> * After slowing down my connection, I can very reliably reproduce the problem
> via these steps:
>   1) Open attachment 361565 [details], and let it load fully.
>   2) Run "sudo wondershaper eth0 5 5" (where eth0 is your network interface)

Sorry, ignore the "After slowing down my connection" there.  (In the steps I provided, I don't actually slow down the connection until step 2, just for convenience so that the initial load doesn't take forever.)
This testcase references a 4-meg JS file on people.mozilla.org. I'm hoping this will easily reproduce the bug without needing to use "wondershaper".

Also -- it looks like we reset to white not just on reload-during-full-reload, but also on cancel-during-full-reload.

So, here are revised steps-to-reproduce, based on comment 14.

STEPS TO REPRODUCE:
 1. Fully load this testcase.  (this may hang Firefox for a few seconds as it parses the huge JS file)
 2. Force full reload with Ctrl-Shift-R
 3. As soon as you see mention of "people.mozilla.org" in the status bar, hit Escape key, or press "Stop" button.

On the 20090121 nightly, this leaves me with a black background.
On the 20090122 nightly (and later nightlies), this leaves me with a white background.
Sorry, one more followup note: you don't actually need to do a full reload in step 2 of comment 20;  a simple Ctrl+R reload should suffice.  A simple reload doesn't seem to re-fetch the JS, but it does hang for ~5 seconds while it's processing it, and if you press escape while it's hanging, you still end up with a white background. (the buggy behavior)
Darn -- so, the experiments in comment 18 and comment 20 actually result in white background in Firefox 3.0.6 as well.  So I guess the current mozilla-central behavior is actually correct in those cases -- i.e. it's indicative of bug 474201 being fixed, not of this bug being broken.

I'm going to experiment a bit more, comparing behaviors more closely to Firefox 3.0.x, and I'll report back when I've got better testcases & steps to reproduce for the *actual* bug here.  Sorry for the confusion & bugspam. :)
So now I've discovered that by holding Ctrl+R at drobo.com in Firefox 3.0.6 for about 5-10 seconds, I can occasionally reproduce the bug there (around 15-25% of the time).

However, it's less severe there than in current mozilla-central -- i.e. I've never been able to reproduce it in Firefox 3.0 with a single Ctrl+R, but that reproduces it in mozilla-central more than 50% of the time.

So, I'm guessing this is be a more subtle timing-related issue, rather than a background-painting issue.

Also, bug 474201's patch probably didn't *cause* this problem, nor did it cause the increased severity WRT Firefox 3.0 -- it's more that bug 474201 was masking this issue, and when that bug was fixed, this issue became visible once more.
On mozilla-central, I put a whole bunch of instrumentation printfs into the various functions that pass around and use either view manager or presentation context default background.  I then loaded up drobo.com and hit Ctrl-R, which does reliably reproduce the bug for me.  The trace, or anyway the parts that I think are most relevant, looks like this:

-- initial page load
##zw: new vm[0x7f8f1d9d9ee0], bg color transparent
##zw: InitPresentationStuff set vm[0x7f8f1d9d9ee0] default bg to ffffffff
##zw: PresShell::Paint default bg ffffffff
##zw: PaintBackground set vm[0x7f8f1d9d9ee0] default bg to ff333333
##zw: PresShell::Paint default bg ffffffff
... many more iterations of PresShell::Paint ...
-- ctrl-R pressed here --
##zw: nsDocShell looking for bg...
      +cv +dv +ps +vm[0x7f8f1d9d9ee0] bc=ff333333  (propagating)
##zw: new vm[0x7f8f1b0bafb0], bg color transparent
##zw: InitPresentationStuff set vm[0x7f8f1b0bafb0] default bg to ffffffff
##zw: nsDocShell propagating bg... +dv +ps +vm[0x7f8f1b0bafb0] set to ff333333
##zw: PresShell::Paint default bg ffffffff (*)
##zw: PaintBackground set vm[0x7f8f1d9d9ee0] default bg to ff333333
##zw: vm[0x7f8f1d9d9ee0] bg set to ff333333
##zw: PresShell::Paint default bg ffffffff

Conspicuously absent from this trace is any mention of nsViewManager::DefaultRefresh(), which post-bug 474201 is the *only* place to take the vm default background and draw it.  The white flash is happening because we call PresShell::Paint before there is a canvas frame, so PaintBackground is not being called, and PresShell::Paint's ultimate backstop is showing through.  (That's nsCSSRendering::PaintBackground btw.)

I do not see any way out of this maze that does not regress either bug 474201 or bug 469170.  (I would like to reiterate that I thought bug 469170 should have been WONTFIXed.)
I should maybe lay out explicitly what I see as "this maze" rather than make people deduce it from all the discussion in all the bugs related to this issue.

Bug 453566: when drawing normally, if the canvas frame is (semi-)transparent,
  we must paint an opaque color behind it, or we will get garbage in the
  window (on Linux) or mysterious black regions (on Windows).  Note that the
  default background color for all frames is transparent: this potentially
  affects any page that does not explicitly set background-color on the HTML
  or BODY elements.

Bug 75591 (yes, really): effectively the same as this bug: if we have to
  redraw (part of) the window background after a page has been torn down
  but before the next page has loaded enough that we know its background
  color, we should use the previous page's background color rather than
  white to avoid an ugly flash.

Bug 474201: after a page is fully loaded, if it has a (semi-)transparent
  background color, we should *not* paint the previous page's background
  color behind it.

Bug 469170: some extensions expect that if they pass "transparent" as the last
  argument to drawWindow, and the content in the window (taken as a whole) is
  (semi-)transparent, what is drawn into the <canvas> object will be
  (semi-)transparent.  I am not convinced it is possible to fulfill this
  requirement at the same time as all of the above.

What the trace in comment 24 is telling me is that under these circumstances, nsViewManager::DefaultRefresh() does not get involved in repaint "in between" pages.  (Looking at the code, the only time DefaultRefresh is called is when nsViewManager::DispatchEvent() has to handle an NS_PAINT event but we are inside a DisableRefresh()...EnableRefresh() block, which I guess is not the case when page rendering is stalled waiting for a script.)

Typing all of this out has helped me think of a possible fix, though: add a concept of "most recent canvas background color" to PresShell and use that in the ::Paint method when the view has no frames.  That *might* nail this case.  I'm pretty sure it won't break anything else.
> Typing all of this out has helped me think of a possible fix, though: add a
> concept of "most recent canvas background color" to PresShell and use that in
> the ::Paint method when the view has no frames.  That *might* nail this case.

Nope.  PresShell::Paint is never called with no frames.
Thanks for laying out that analysis.

In PresShell::Paint, we currently have
  if (needTransparency)
    backgroundColor = NS_RGBA(0,0,0,0);
  else
    backgroundColor = mPresContext->DefaultBackgroundColor();

The obvious fix for this bug would be to change the latter to mViewManager->GetDefaultBackgroundColor(), but that would cause a problem in one case: the first paint after the document's canvas frame has been set up, when the document's background color is transparent. We'll paint the previous document's background color underneath the new document's content, because the view manager's new background color is updated in PaintBackground, which is too late.

So how about this: instead of updating the view manager's default background color in PaintBackground, update it in PresShell::Paint whenever the document has a canvas background, and then set backgroundColor to the view manager's default background color. Basically we should set the view manager's default background color when FindCanvasBackground is going to find something, i.e., whenever aPresContext->PresShell()->FrameConstructor()->GetRootElementStyleFrame() is non-null, and we should set it to what FindCanvasBackground finds.

Make sense?
I think I understand what you're suggesting, roc, but unless I got it wrong, it doesn't work; see attached patch, with which there is still a white flash when one reloads drobo.com.  The weird thing is that now PresShell::Paint will not admit to ever passing along a white background color, and yet, the flash happens.
+    backgroundColor = NS_ComposeColors(bgStyle->mBackgroundColor,
+                                       backgroundColor);

Isn't this backwards? the bgStyle->mBackgroundColor should be over the prescontext's DefaultBackgroundColor.
+    backgroundColor = NS_ComposeColors(prevPageColor, backgroundColor);

Same here actually (in both places this occurs).

Instead of 'prevPageColor', I'd call it 'currentBackgroundColor' or something like that, since it's not necessarily derived from the previous page. It will usually be derived from the current page, in fact.
(In reply to comment #29)
> +    backgroundColor = NS_ComposeColors(bgStyle->mBackgroundColor,
> +                                       backgroundColor);
> 
> Isn't this backwards? the bgStyle->mBackgroundColor should be over the
> prescontext's DefaultBackgroundColor.

NS_ComposeColors' args are (fg, bg) ... oh wait, they're not.  *headdesk*
Don't worry, I had to check it about three times before I wrote that comment.
Attached patch working patch (obsolete) — Splinter Review
Looks like the reversed arguments to NS_ComposeColors were what was making this not work.  With this patch, no whole-window white flash on drobo.com.  (The area covered by the Flash movie still blinks white briefly, but I don't think there's anything we can do about that.)

I re-tested all the manual test cases for the bugs I listed above, and everything's still good.  Bug 469170 is safe as long as nsPresShell::RenderDocument() does its own painting rather than call nsPresShell::Paint (and it needs to do that for other reasons; it might be able to use nsLayoutUtils::PaintFrame instead of building the display list itself, but that would still be fine).

There is an XXX comment in here.  I realized how to fix that while I was in the shower this morning, but I'll file it as a follow-up bug, it's tangential and probably inappropriate for 1.9.1.
Assignee: nobody → zweinberg
Attachment #361695 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #361798 - Flags: review?(roc)
+    // XXX ideally we would set the view manager default to
+    // bgStyle->mBackgroundColor, and nsViewManager::DefaultRefresh would
+    // be able to cope with partial transparency.  But it can't so we can't.
+    // -- zwol 2009-02-11

I don't think this is worth fixing, to be honest.

+  } else {
+    backgroundColor = NS_ComposeColors(backgroundColor, viewDefaultColor);
+  }

You're passing 'frame' to FindBackground, which means we'll take this !isCanvas path for popups like the combobox dropdown. That's probably OK in practice since we won't draw the popup until the document has had a chance to set the view manager's default background color. However, I think it would be slightly better to pass GetRootElementStyleFrame as the frame for FindBackground. In that case isCanvas is sure to be true, unless GetRootElementStyleFrame is null, so we should replace "if (isCanvas)" with a null check on that frame. If we do that, we can call FindCanvasBackground directly (after exposing it in nsCSSRendering) and remove the isCanvas parameter from FindBackground.
Blocks: 478079
(In reply to comment #34)
> +    // XXX ideally we would set the view manager default to
> +    // bgStyle->mBackgroundColor, and nsViewManager::DefaultRefresh would
> +    // be able to cope with partial transparency.  But it can't so we can't.
> +    // -- zwol 2009-02-11
> 
> I don't think this is worth fixing, to be honest.

See bug 478079; my proposed "fix" is to kill off DefaultRefresh.

> You're passing 'frame' to FindBackground, which means we'll take this !isCanvas
> path for popups like the combobox dropdown. That's probably OK in practice
> since we won't draw the popup until the document has had a chance to set the
> view manager's default background color. However, I think it would be slightly
> better to pass GetRootElementStyleFrame as the frame for FindBackground. In
> that case isCanvas is sure to be true, unless GetRootElementStyleFrame is null,
> so we should replace "if (isCanvas)" with a null check on that frame. If we do
> that, we can call FindCanvasBackground directly (after exposing it in
> nsCSSRendering) and remove the isCanvas parameter from FindBackground.

I initially coded this with GetRootElementStyleFrame and it didn't work but that was probably because I had the ComposeColors arguments backward.  I'll try this.
Here's the revised patch, including updates to the nsCSSRendering::FindBackground API.  One other place did use the isCanvas out-parameter, but it really wanted *just* that, so I decided to export IsCanvasFrame() in nsCSSRendering.h as well as the new method for the pres shell code (which is not quite the same thing as FindCanvasBackground()).
Attachment #362054 - Flags: review?(roc)
Keywords: checkin-needed
Whiteboard: [needs landing]
Pushed http://hg.mozilla.org/mozilla-central/rev/335d535620a2
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 approval]
Keywords: checkin-needed
Target Milestone: --- → mozilla1.9.2a1
Attachment #361798 - Attachment is obsolete: true
Attachment #362054 - Attachment description: revised as requested → revised as requested [Checkin: Comment 37]
Could this have caused bug 478784?
Depends on: 478784
Attachment #362054 - Flags: approval1.9.1? → approval1.9.1+
Comment on attachment 362054 [details] [diff] [review]
revised as requested
[Checkin: Comment 37]

Looks like this is already approved per bug 478784 comment 19, but marking it here as well, although what's there is probably what should be landed.
Is this causing the regression in bug 485275?
Blocks: 485275
You need to log in before you can comment on or make changes to this bug.