Closed Bug 1211647 Opened 9 years ago Closed 6 years ago

Runtime graphics testing window visible in taskbar after update

Categories

(Core :: Graphics, defect, P3)

42 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla63
Tracking Status
firefox46 + wontfix
firefox47 - wontfix
firefox48 - wontfix
firefox49 + wontfix
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- verified

People

(Reporter: bugzilla, Assigned: jaws)

References

Details

(Keywords: regression, Whiteboard: [gfx-noted])

Attachments

(3 files, 1 obsolete file)

STR:

(1) Have Firefox configured to restore tabs from last session on startup;
(2) Have a session with lots of tabs (hundreds in my case) that will take a while to restore;
(3) Quit Firefox;
(4) Upgrade nVidia video driver to a newer version;
(5) Start Firefox;

Notice that the graphics test window is visible for however many seconds it takes to restore the session.
The test is supposed to not block normal browser startup so this might be intended behavior, but I'm not sure. How are you determining that the test window is visible? taskbar/alt+tab?
Whiteboard: [gfx-noted]
It is visible in the task bar, aero peek shows its contents. I think I could also click on it and bring it into the foreground.
[Tracking Requested - why for this release]:
This should have gotten tracking when it was filed. We are showing a window upon update, for release, beta, aurora, and nightly users. Users who have "Show my windows and tabs from last time" enabled, even with restore-on-demand set to true, will likely see this extra window in their taskbar.

Apparently this window also blocks startup as the test must run before we can show the first window. I do not see an explicit perf difference with my eyes, but there is a perceptible performance difference and a bit of confusion each time I restart as I wondered why there were two windows appearing in my taskbar when I previously only had one Firefox window open.
Depends on: 1178823
Summary: Runtime graphics testing visible during long running session restore → Runtime graphics testing visible during session restore
Assignee: nobody → jaws
Status: NEW → ASSIGNED
Version: unspecified → 42 Branch
Attachment #8746610 - Flags: review?(mchang) → review+
Comment on attachment 8746610 [details]
MozReview Request: Bug 1211647 - Runtime graphics testing visible during session restore. r?mchang

https://reviewboard.mozilla.org/r/49489/#review46319
Sorry, had to back this out https://hg.mozilla.org/integration/mozilla-inbound/rev/71610f1f6b7c

I can't reliably start Firefox with this patch due to another part of gecko crashing. It's probably not this patches fault as it is probably just changing timing enough to not always pass.
If you run a debug build with this patch, can you try loading firefox 3 times in a row? Does it stall for you on the 3rd time?
Flags: needinfo?(jaws)
Firefox loads for me successfully all three times when using the build from http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32-debug/1461879881/firefox-49.0a1.en-US.win32.zip

Linked to from https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=cadd77d38f6d&selectedJob=26837778&filter-searchStr=Windows%20XP%20debug%20Build%20(B)

Mason, what if we just add the popup=1 attribute on non-debug builds?
Flags: needinfo?(jaws) → needinfo?(mchang)
Sorry, I should of clarified. If you run with this patch and comment out this [1] so that the test runs every time, does it not load for you? Please do a local build.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/gfx/SanityTest.js#299
Flags: needinfo?(mchang) → needinfo?(jaws)
Attached patch broken.patchSplinter Review
I'm not quite sure what's going on. At the DOM / Window level, we are creating the global windows in the same order and loading the sanity test, then hidden window, then browser.xul correctly. Sometime after the test loads, we see the sanity test being destroyed in nsGlobalWindow and start to see loading of:

nsDocShell[1737E800]: loading data:application/vnd.mozilla.xul+xml;charset=utf-8,<window%20id='win'/> with flags 0x00000000 onto window id: 19

and 

nsDocShell[17381400]: loading https://self-repair.mozilla.org/en-US/repair with flags 0x00000000 onto window id: 22

but about:home never loads. You can try this patch to see if you can reproduce. If you can, can you check the front end to see if the browser window is never displayed for some reason?
Also, I disagree that this needs to be tracked. The window shows up once after a new release. We already show users "checking updates for add ons" every new release. I agree it's nice to have and it improves the UX, but I wouldn't want to uplift this due to the current bug we're having.
(In reply to Mason Chang [:mchang] from comment #14)
> We already show users "checking updates for add ons"
> every new release.

This was actually just removed by bug 1262880.
Regression from 41, tracking. This would be nice to fix and uplift to at least aurora. We should also verify the fix. 

But, wontfixing for 46 since we've had 5 releases with this problem already.
Given the possible fragility and difficulty in testing this, I doubt that we would chance an uplift.  To start, we should wontfix for 47 and 48, the best case scenario is that we do something in nightly.
[Tracking Requested - why for this release]: Not a new regression, and unclear as to the urgency.
I built a local debug build with the patch from this bug as well as line 299 commented out (the `if (!this.shouldRunTest()) return;` line).

My mozconfig had:
ac_add_options --enable-debug
ac_add_options --disable-optimize

The build started up just fine in three subsequent runs.
Flags: needinfo?(jaws)
Summary: Runtime graphics testing visible during session restore → Runtime graphics testing visible after update
Summary: Runtime graphics testing visible after update → Runtime graphics testing window visible in taskbar after update
https://hg.mozilla.org/integration/fx-team/rev/600249e3014e4626fd354460b49059e10aafde8b
Bug 1211647 - Runtime graphics testing window should not be visible after update. r=mchang
I can still reproduce this on master.  You should back this out.
Flags: needinfo?(jaws)
(In reply to Mason Chang [:mchang] from comment #22)
> I can still reproduce this on master.  You should back this out.

Why, if this doesn't reproduce on infra or on other folks' machines? Seems like we need a better diagnosis of what's actually broken and then we can fix it in a followup. But without a diagnosis, delaying this patch indefinitely for a problem the patch author can't reproduce, that doesn't occur on infra either, doesn't seem like the sensible thing to do.
I should add too that the patch only sets popup=1 on the window styles. This in turn is used by the widget layer to apply a different window style to the Window telling it to not appear in the taskbar. The window is still created by the OS and the code is still run.
Flags: needinfo?(jaws)
(In reply to :Gijs Kruitbosch from comment #23)
> (In reply to Mason Chang [:mchang] from comment #22)
> > I can still reproduce this on master.  You should back this out.
> 
> Why, if this doesn't reproduce on infra or on other folks' machines? Seems
> like we need a better diagnosis of what's actually broken and then we can
> fix it in a followup. But without a diagnosis, delaying this patch
> indefinitely for a problem the patch author can't reproduce, that doesn't
> occur on infra either, doesn't seem like the sensible thing to do.

Because the alternative is some users won't be able to start Firefox at all. Just because 3 machines can't reproduce it doesn't mean some user out there won't, which I'm sure they probably will. I'd feel more comfortable knowing what the issue is that's causing it before giving the patch an ok. I was hoping Jared could reproduce it as well so he could help diagnose something on the front end and I was waiting for him to reply before further investigation.
(In reply to Mason Chang [:mchang] from comment #25)
> (In reply to :Gijs Kruitbosch from comment #23)
> > (In reply to Mason Chang [:mchang] from comment #22)
> > > I can still reproduce this on master.  You should back this out.
> > 
> > Why, if this doesn't reproduce on infra or on other folks' machines? Seems
> > like we need a better diagnosis of what's actually broken and then we can
> > fix it in a followup. But without a diagnosis, delaying this patch
> > indefinitely for a problem the patch author can't reproduce, that doesn't
> > occur on infra either, doesn't seem like the sensible thing to do.
> 
> Because the alternative is some users won't be able to start Firefox at all.

AFAICT from the comments on this bug, this issue has only reproduced on debug builds. Am I missing something? Because no non-developer users run non-opt (and/or pgo) builds, so that doesn't seem like it should influence the decision here. Alternatively, if this is a timing issue as comment #9 suggests, and this makes it easier to reproduce, that would actually increase the likelihood that we'll fix this. Landing it on m-c seems perfectly reasonable given all of the above.
(In reply to :Gijs Kruitbosch from comment #26)
> (In reply to Mason Chang [:mchang] from comment #25)
> > (In reply to :Gijs Kruitbosch from comment #23)
> > > (In reply to Mason Chang [:mchang] from comment #22)
> > > > I can still reproduce this on master.  You should back this out.
> > > 
> > > Why, if this doesn't reproduce on infra or on other folks' machines? Seems
> > > like we need a better diagnosis of what's actually broken and then we can
> > > fix it in a followup. But without a diagnosis, delaying this patch
> > > indefinitely for a problem the patch author can't reproduce, that doesn't
> > > occur on infra either, doesn't seem like the sensible thing to do.
> > 
> > Because the alternative is some users won't be able to start Firefox at all.
> 
> AFAICT from the comments on this bug, this issue has only reproduced on
> debug builds. Am I missing something? Because no non-developer users run
> non-opt (and/or pgo) builds, so that doesn't seem like it should influence
> the decision here. Alternatively, if this is a timing issue as comment #9
> suggests, and this makes it easier to reproduce, that would actually
> increase the likelihood that we'll fix this. Landing it on m-c seems
> perfectly reasonable given all of the above.

Just because only I can reproduce it only on debug builds doesn't mean it won't happen on opt builds. And if it is a timing issue, then someone will indeed hit this bug on an opt build and won't be able to start firefox. We already know it's an issue now, why not just fix it now before we land this?
There's a race condition at [1]. If that flag is set, the window is not activated and prevents the browser.xul window from being shown. I still don't know who exactly is racing, but if that code is executed for the popup after some other windows are shown, a user won't be able to start firefox. We should really back this patch out please. A user will hit this.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1283
Flags: needinfo?(jaws)
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #15)
> (In reply to Mason Chang [:mchang] from comment #14)
> > We already show users "checking updates for add ons"
> > every new release.
> 
> This was actually just removed by bug 1262880.
That bug is for the add-ons compatibility check performed by app update before updating. The "checking updates for add-ons" every new release is the add-ons mgr check performed on startup after an app version change whether that version change is from app update or a new install. I already discussed this with jaws over irc.
[Dropping in at Jared's request...]

Ultimately I think this is the call of GFX (or perhaps specifically GFX::Widget, given comment 28). I also sorta feel like there's an implicit r- by the original reviewer at this point. But if we can't agree on an outcome here, the relevant module owner should make the call.

But I would say:

* While this (the extra window) is definitely a bug we should fix, a startup-hang is definitely worse. The conservative course of action would be to back this out until we fully understand what's going on. No need to rush here that I see.

* There's a pretty limited sample of where it's been tested. Breaks for Mason, works for Jared, works on Infra (which is a handful of configurations with many clones). So it's hard to say if Mason is the exception of if Jared/Infra are.

* It's not great to keep a patch in limbo for a problem that we can only (so far) reproduce on one system. So it would be most helpful if Mason can continue to help with debugging the problem as needed (assuming comment 28 isn't enough). If that's not possible, I'd suggest a next step would be to make a Try build available to folks on #fx-team / #developers, to see if anyone else can reproduce it. If an assortment of people can't reproduce it, then that's data that might suggest it's safe to land on Nightly.

* The standard for Nightly isn't perfection (otherwise nothing would ever land! ;), but we should also try to be respectful of our testers. Accidental breakage is one thing, but I can't say I'm super-thrilled at the prospect of landing a known startup-crash, which would force users to download an updated build to get things working again.
 
* I don't think we should dismiss even a debug-only crash, especially when it wasn't understood at all. And comment 28 does seem to indicate this might be a a Real Problem(tm)...

So let's back this out for now, and discuss it without the pressure of a pending Nightly?
Flags: needinfo?(jaws)
Flags: needinfo?(gijskruitbosch+bugs)
https://hg.mozilla.org/integration/fx-team/rev/242c458c4c61e94c8d12628612ef5e4e83992eaa
Backout patch for bug 1211647 to allow more time for investigating startup hang/crash on debug builds. r=me
(In reply to Mason Chang [:mchang] from comment #28)
> There's a race condition at [1]. If that flag is set, the window is not
> activated and prevents the browser.xul window from being shown.

Why does this prevent the browser.xul window from being shown? The lack of activation on the gfx test window shouldn't influence whether the browser.xul window is activated.
Flags: needinfo?(mchang)
(In reply to :Gijs Kruitbosch from comment #32)
> (In reply to Mason Chang [:mchang] from comment #28)
> > There's a race condition at [1]. If that flag is set, the window is not
> > activated and prevents the browser.xul window from being shown.
> 
> Why does this prevent the browser.xul window from being shown? The lack of
> activation on the gfx test window shouldn't influence whether the
> browser.xul window is activated.

I have no idea why yet. They should be independent.
Flags: needinfo?(mchang)
(In reply to Mason Chang [:mchang] from comment #33)
> (In reply to :Gijs Kruitbosch from comment #32)
> > (In reply to Mason Chang [:mchang] from comment #28)
> > > There's a race condition at [1]. If that flag is set, the window is not
> > > activated and prevents the browser.xul window from being shown.
> > 
> > Why does this prevent the browser.xul window from being shown? The lack of
> > activation on the gfx test window shouldn't influence whether the
> > browser.xul window is activated.
> 
> I have no idea why yet. They should be independent.

Can you own investigating this further, then, as you're the only person who can reproduce?
Flags: needinfo?(mchang)
In the good cases, we normally show the browser.xul window at [1]. The condition is based on CanTakeFocus(), which goes through all the HWNDs in the system and if a window is already showing in the same process, we don't show the window. In the bad case, the popup is visible to the system, so we don't show browser.xul. This actually isn't unique to the popup=1 case here, but to the general sanity test. I think the popup=1 just triggered this race condition for me on my system.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1258
Flags: needinfo?(mchang)
Attached patch Try to force visibility (obsolete) — Splinter Review
This is kind of evil. Due to a race condition, we can show the browser.xul window either at [2] or [1]. [1] Happens if we have no other visible windows in our process shown or if the current foreground process is already firefox, we can just show the new window (e.g. ctrl-n). Otherwise, we still want to create a new window, we just put it behind the current foreground window. The race condition happens depending on when browser.xul and when the sanity test show up. Most of the time, the sanity test closes before browser.xul loads, so there is no other window and we just show the browser.

However, if both the sanity test and browser.xul show up at the same time, one of the windows will open via [1] and the other via [2]. It just means that browser.xul would be "behind" the sanity test, but still visible, then regain focus once the sanity test closes. However, sometimes, SetWindowPos will report success but not actually put the WS_VISIBLE style on the shown window. I have no idea why, but this causes the Window to not be visible. Hat tip to :Gijs for spy++ recommendation! TIL.

This patch detects if we have a SetWindowPos that fails or fails to set the WS_VISIBLE style, we try and forcibly set the WS_VISIBLE style. I was able to run the test with the popup=1 patch successfully 10x in a row.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1258

[2] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1267
Attachment #8756642 - Flags: review?(jmathies)
Attachment #8756642 - Flags: review?(dvander)
> However, sometimes, SetWindowPos will report success but not actually
> put the WS_VISIBLE style on the shown window. I have no idea why, but
> this causes the Window to not be visible.

I can't find a reference to this on the web, which leads me to the conclusion that this is a bug in our code here.

I'm fine with landing a one-off fix but lets also file a widget bug on tracking down the cause of this problem.
Comment on attachment 8756642 [details] [diff] [review]
Try to force visibility

Review of attachment 8756642 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/windows/nsWindow.cpp
@@ +1189,5 @@
>   **************************************************************/
>  
> +// Sometimes, even though we explicitly set SWP_SHOWWINDOW, with ::SetWindowPos,
> +// the window won't be shown. In that case, try to force visibility.
> +static void VerifyVisibility(BOOL aSetWindowRet, HWND aHWND)

Not a fan of this extra verify call on top of SetWindowPos calls, lets clean this up. Lets move |showing a window| out into a helper which takes the various flags callers need to pass into SetWindowPos and call that from each of the uses below. In the helper lets use SetWindowPos and SetWindowLongPtrW and then validate window settings before we return (in debug builds only) with assertions if things fails.
Attachment #8756642 - Flags: review?(jmathies) → review-
I tried forcing a ::ShowWindow everywhere we currently do ::SetWindowPos such as [1], [2], [3] and inversely trying a ::SetWindowPos where we do ::ShowWindow. If at [1], we replace the ::ShowWindow with a :SetWindowPos and the SWP_SHOWWINDOW flag, the main browser window never shows up even without the sanity test running. Also, replacing ::SetWindowPos with ::ShowWindow, popups don't show or it hides the main browser window. From the docs, ::SetWindowPos should just move the window behind or in front of another window, not hide it completely, which is really odd. The semantics of it aren't clear to me.

On the other hand, I tried to get [4] to actually happen since things like "make a new window" already have firefox in focus and so go through [1]. I put a MOZ_CRASH in there and ran it through try (https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c30d50189b1&selectedJob=21536992) and could only get it to hit in 1 test. 

I didn't want to mess with the popup menu logic since forcing visibility might be the wrong thing there, but instead for toplevel windows, this patch uses ::ShowWindow without activating the window, which I think the current code is supposed to do.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1258
[2] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1285
[3] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1290
[4] https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?from=widget%2Fwindows%2FnsWindow.cpp#1262
Attachment #8756642 - Attachment is obsolete: true
Attachment #8756642 - Flags: review?(dvander)
Attachment #8757360 - Flags: review?(jmathies)
Comment on attachment 8757360 [details] [diff] [review]
Use ::ShowWindow instead of ::ShowWindowPos for toplevel windows widgets that can't take focus

Review of attachment 8757360 [details] [diff] [review]:
-----------------------------------------------------------------

Lets try it.
Attachment #8757360 - Flags: review?(jmathies) → review+
Pushed by mchang@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b0b44564325
Use ::ShowWindow instead of ::ShowWindowPos for toplevel windows widgets that can't take focus. r=jimm
(In reply to Pulsebot from comment #42)
> Pushed by mchang@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/7b0b44564325
> Use ::ShowWindow instead of ::ShowWindowPos for toplevel windows widgets
> that can't take focus. r=jimm

Please let this sit in Nightly for a few days before landing the popup=1 patch. Thanks!
https://hg.mozilla.org/mozilla-central/rev/7b0b44564325
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
This should stay open until the popup=1 patch lands.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8757360 - Flags: checkin+
Attachment #8757360 - Flags: checkin+
Pushed by jwein@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8d1d92fa59f
Runtime graphics testing visible during session restore. r=mchang
(In reply to Ryan VanderMeulen [:RyanVM] from comment #47)
> Backed out for causing Win7 wpt-e10s permafails.
> https://hg.mozilla.org/integration/mozilla-inbound/rev/192969a0735a

Jared, have you had any time to look at this?
Flags: needinfo?(jaws)
I haven't. Mason, did you have any ideas why this was failing? Who can we reach out to for help here?
Flags: needinfo?(jaws) → needinfo?(mchang)
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #50)
> I haven't. Mason, did you have any ideas why this was failing? Who can we
> reach out to for help here?

I haven't had time to look at it either. Either one of us should take a look.
Flags: needinfo?(mchang)
Assignee: jaws → nobody
Target Milestone: mozilla49 → ---
See Also: → 1417832
Jared, looks like Mason is gone - any chance you could try looking if the try issues are accidentally resolved given it's been a 2 years? If not, perhaps we can ping :jimm or :dbolter to see if someone could drive this over the line. It looks glitchy and should really be fairly easy to fix, plus it might have some minor perf improvement impact.
Flags: needinfo?(jaws)
This is all green on tryserver (a couple oranges but they went away on retriggers). I'll land this now.
Flags: needinfo?(jaws)
Pushed by jwein@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/067c4b7f2ed0
Runtime graphics testing visible during session restore. r=mchang
Assignee: nobody → jaws
Status: REOPENED → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/067c4b7f2ed0
Status: ASSIGNED → RESOLVED
Closed: 8 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
I've managed to reproduce this issue in Firefox Nightly 63.0a1 (2018-07-01)(64-bit) Windows 10 Pro.
I've verified the fix in Firefox Nightly 63.0a1 (2018-07-16) (64-bit) Windows 10 Pro.

Based on the above, I'm marking it  as verified.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Regressions: 1554490
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: