Last Comment Bug 981569 - [Australis] TitleBar active state is wrong after "Restore Previous Session" when multiple windows are restored (Classic)
: [Australis] TitleBar active state is wrong after "Restore Previous Session" w...
Status: RESOLVED FIXED
[Australis:P3]
: regression
Product: Core
Classification: Components
Component: Widget: Win32 (show other bugs)
: 29 Branch
: x86_64 Windows 7
-- normal with 1 vote (vote)
: mozilla31
Assigned To: Tim Abraldes [:TimAbraldes] [:tabraldes]
:
: Jim Mathies [:jimm]
Mentors:
Depends on:
Blocks: australis-merge 890105
  Show dependency treegraph
 
Reported: 2014-03-10 04:42 PDT by Alice0775 White
Modified: 2014-06-20 16:55 PDT (History)
11 users (show)
anthony.s.hughes: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard: [good first verify]
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
unaffected
+
wontfix
+
fixed
fixed
unaffected


Attachments
screenshot (100.56 KB, image/png)
2014-03-10 04:42 PDT, Alice0775 White
no flags Details
Workaround Idea 1 - v1 (3.29 KB, patch)
2014-03-31 08:08 PDT, Mike Conley (:mconley)
MattN+bmo: feedback-
Details | Diff | Splinter Review
Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update (1.67 KB, patch)
2014-04-02 02:35 PDT, Matthew N. [:MattN] (PM if requests are blocking you)
mconley: feedback+
Details | Diff | Splinter Review
Widget logging and some attempts at fixing (9.39 KB, patch)
2014-04-02 03:13 PDT, Matthew N. [:MattN] (PM if requests are blocking you)
no flags Details | Diff | Splinter Review
Attachment 8400535 output (2.97 KB, text/plain)
2014-04-02 03:30 PDT, Matthew N. [:MattN] (PM if requests are blocking you)
no flags Details
another workaround (2.22 KB, patch)
2014-04-02 04:44 PDT, Dão Gottwald [:dao]
mconley: feedback-
Details | Diff | Splinter Review
Widget patch v1 (1.85 KB, patch)
2014-04-09 18:19 PDT, Tim Abraldes [:TimAbraldes] [:tabraldes]
no flags Details | Diff | Splinter Review
Widget patch v2 (5.17 KB, patch)
2014-04-12 00:25 PDT, Tim Abraldes [:TimAbraldes] [:tabraldes]
no flags Details | Diff | Splinter Review
Widget patch v3 (2.77 KB, patch)
2014-04-12 00:34 PDT, Tim Abraldes [:TimAbraldes] [:tabraldes]
blassey.bugs: review+
mconley: feedback+
bkerensa: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description User image Alice0775 White 2014-03-10 04:42:08 PDT
Created attachment 8388452 [details]
screenshot

This problem does not happen in Firefoc28.0b9

Steps To Reproduce:
1. Enable Title Bar
2. Open multiple new windows (4 windows)
4. Exit and restart browser
5. Click "Restore Previous Session"

Actual Results:
TitleBar active state is wrong

Expected Results:
Foreground window should only be active state. other windows should be inactive state.
Comment 1 User image Alice0775 White 2014-03-10 04:59:46 PDT
it works properly in holly build
https://hg.mozilla.org/projects/holly/rev/b3eb8ad7d325
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 ID:20140305040204

This bug is definitely caused by Australis.
Comment 2 User image Alice0775 White 2014-03-10 05:10:16 PDT
And this problem happens since Australis merge day...
Comment 3 User image Alice0775 White 2014-03-10 10:00:10 PDT
Regression window(ux)
Good:
http://hg.mozilla.org/projects/ux/rev/bfb5c27c566b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130910040201
Bad:
http://hg.mozilla.org/projects/ux/rev/339479a60c1c
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911040201
Pushlog:
http://hg.mozilla.org/projects/ux/pushloghtml?fromchange=bfb5c27c566b&tochange=339479a60c1c

Suspected:
ed0e0347b778	Dão Gottwald — Bug 882009 - Invert updateTitlebarDisplay / TabsInTitlebar dependency. f=MattN r=Gijs
Comment 4 User image Dão Gottwald [:dao] 2014-03-10 10:30:45 PDT
Looks like a widget bug to me at first sight. I wouldn't expect sessionstore or other front-end code to be able to legitimately get into a state of title bars from different windows looking active simultaneously. Front-end code doesn't even have access to APIs that should allow this, does it?
Comment 5 User image :Gijs 2014-03-10 10:43:50 PDT
(In reply to Dão Gottwald [:dao] from comment #4)
> Looks like a widget bug to me at first sight. I wouldn't expect sessionstore
> or other front-end code to be able to legitimately get into a state of title
> bars from different windows looking active simultaneously. Front-end code
> doesn't even have access to APIs that should allow this, does it?

Well... interestingly, the menu bars are all colored appropriately (ie, text is grayed out everywhere except in the first one). And we do have :-moz-window-active and :-moz-window-inactive or whatever they're called. But IIRC the title bar is done natively, so I don't know why those look wrong. :-|
Comment 6 User image Jim Mathies [:jimm] 2014-03-10 10:51:02 PDT
(In reply to Dão Gottwald [:dao] from comment #4)
> Looks like a widget bug to me at first sight. I wouldn't expect sessionstore
> or other front-end code to be able to legitimately get into a state of title
> bars from different windows looking active simultaneously. Front-end code
> doesn't even have access to APIs that should allow this, does it?

Not that I'm aware of. Interesting here is that there's no fx button enabled, so I'm assuming the chrome margin is 0,0,0,0 and we aren't painting the titlebar in this case, Windows is?
Comment 7 User image Jim Mathies [:jimm] 2014-03-10 10:52:16 PDT
(In reply to Jim Mathies [:jimm] (away 4/1-4/21) from comment #6)
> (In reply to Dão Gottwald [:dao] from comment #4)
> > Looks like a widget bug to me at first sight. I wouldn't expect sessionstore
> > or other front-end code to be able to legitimately get into a state of title
> > bars from different windows looking active simultaneously. Front-end code
> > doesn't even have access to APIs that should allow this, does it?
> 
> Not that I'm aware of. Interesting here is that there's no fx button
> enabled, so I'm assuming the chrome margin is 0,0,0,0 and we aren't painting
> the titlebar in this case, Windows is?

er rather, the chromemargin prop is removed in this case? 0,0,0,0 would mean we paint everything. Been a while since I've worked with this code.
Comment 8 User image Dão Gottwald [:dao] 2014-03-11 08:21:56 PDT
There is no Firefox button on mozilla-central, but you're right that the chromemargin attribute is removed in this case.
Comment 9 User image :Gijs 2014-03-24 10:44:41 PDT
Jim, do you have time to look at this in more detail, or can you suggest someone else? It's tracking 29 and I'm out of my depth here. When I reproduced, I did notice drawing artifacts when moving other windows over the top of the background-but-looking-like-they-were-focused windows, but perhaps that was because this was a VM (VMWare) ?
Comment 10 User image Mike Conley (:mconley) 2014-03-25 08:30:21 PDT
I'm going to try to drill down Alice's regression range to a single changeset today, and either confirm or refute our suspicion of bug 882009.
Comment 11 User image Mike Conley (:mconley) 2014-03-25 14:33:34 PDT
Great call, Alice. Bisection has narrowed this down to:

http://hg.mozilla.org/projects/ux/rev/ed0e0347b778

I'll see if I can isolate which part is causing the problem.
Comment 12 User image Mike Conley (:mconley) 2014-03-28 14:58:46 PDT
So, I've been poking at this further, and I actually think bug 882009 is a red herring. That patch caused us to hide the XUL-drawn titlebar and show the native titlebar, and that _exposed_ the problem.

And if I expose the native titlebar like this on earlier changesets, I see the same behaviour - up to a point. There was a point _after_ the initial tab work landed that we were still showing the native titlebar when browser.tabs.drawInTitlebar was false, and this bug wasn't occurring. I'm attempting to bisect to it now.
Comment 13 User image Mike Conley (:mconley) 2014-03-29 18:30:12 PDT
So, did a better bisection, applying a patch each step of the way to expose the native titlebar as Dao's patch did in bug 882009.

The actual culprit is this changeset:

https://hg.mozilla.org/mozilla-central/rev/727132bea25c

from bug 890105.
Comment 14 User image Mike Conley (:mconley) 2014-03-29 18:33:49 PDT
A quick test, it appears that putting the tabsintitlebar attribute on the main-window is what's causing this. If I remove the lines that put that into the main-window by default, the problem disappears.
Comment 15 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-03-29 20:13:11 PDT
That's really weird as I don't know how that attribute would be causing this since it's a native titlebar being drawn.  I don't see native code checking the attribute directly and I don't know how our CSS would cause this. Perhaps it's something the TabsInTitlebar code does? If removing the attribute by default works and doesn't regress perf. then it's probably acceptable at this point but this seems really strange and I am curious about the underlying issue.
Comment 16 User image Mike Conley (:mconley) 2014-03-29 20:35:46 PDT
I agree, very weird - and in a more recent build, removing that attribute doesn't seem to fix the issue - but I'm pretty certain this patch is when the problem was introduced, or (again) exposed.

Let me dig into it further.
Comment 17 User image Mike Conley (:mconley) 2014-03-30 13:28:20 PDT
I suspect that Matt is right, and that this has less to do with CSS rules concerning tabsintitlebar, and more to do with the fact that the tabsintitlebar attribute is used to determine whether or not TabsInTitlebar.enabled is true or false. Specifically, I think this is related to the updateTitlebarDisplay function, which reads TabsInTitlebar.enabled, and adds / removes chromemargin based on its value.

I have a hypothesis that this problem might occur when adding, and then removing the chromemargin attribute on a background window. I'll see if I can show that.
Comment 18 User image Mike Conley (:mconley) 2014-03-31 08:08:37 PDT
Created attachment 8399448 [details] [diff] [review]
Workaround Idea 1 - v1

This seems to work for me. Basically, we persist the chromemargin attribute so that when tabs aren't in the titlebar, we don't set and then quickly remove that attribute (which I believe is contributing to this problem).

I've only tested this on Windows - I want to try it on the other two OS's before requesting a real review.

Thoughts, Matt?
Comment 19 User image Dão Gottwald [:dao] 2014-03-31 08:15:14 PDT
(In reply to Mike Conley (:mconley) from comment #17)
> I suspect that Matt is right, and that this has less to do with CSS rules
> concerning tabsintitlebar, and more to do with the fact that the
> tabsintitlebar attribute is used to determine whether or not
> TabsInTitlebar.enabled is true or false. Specifically, I think this is
> related to the updateTitlebarDisplay function, which reads
> TabsInTitlebar.enabled, and adds / removes chromemargin based on its value.
> 
> I have a hypothesis that this problem might occur when adding, and then
> removing the chromemargin attribute on a background window. I'll see if I
> can show that.

Why are we adding and then removing the attribute? Are you saying that updateTitlebarDisplay is called twice, first with TabsInTitlebar.enabled and then with !TabsInTitlebar.enabled?
Comment 20 User image Mike Conley (:mconley) 2014-03-31 08:57:08 PDT
(In reply to Dão Gottwald [:dao] from comment #19)
> Why are we adding and then removing the attribute? Are you saying that
> updateTitlebarDisplay is called twice, first with TabsInTitlebar.enabled and
> then with !TabsInTitlebar.enabled?

No, it's not called twice - it's called once per window on open, as one would hope / expect.

However, in the case of tabs not being in the titlebar, that means _removing_ the chromemargin attribute that is already applied to the main-window (since main-window has this attribute by default). I think removing that attribute in the midst of other windows being opened is what's causing this bug.

By using XUL persistence on that attribute, I think we "remove" that attribute sooner (my guess is it never comes out of the XUL parser applied in the first place).
Comment 21 User image Mike Conley (:mconley) 2014-03-31 09:09:39 PDT
Tested on OS X and Linux, and all seems well.
Comment 22 User image Dão Gottwald [:dao] 2014-03-31 09:41:20 PDT
(In reply to Mike Conley (:mconley) from comment #20)
> (In reply to Dão Gottwald [:dao] from comment #19)
> > Why are we adding and then removing the attribute? Are you saying that
> > updateTitlebarDisplay is called twice, first with TabsInTitlebar.enabled and
> > then with !TabsInTitlebar.enabled?
> 
> No, it's not called twice - it's called once per window on open, as one
> would hope / expect.
> 
> However, in the case of tabs not being in the titlebar, that means
> _removing_ the chromemargin attribute that is already applied to the
> main-window (since main-window has this attribute by default).

So this is bug 893682's fault? Can we stop doing that? I see no data in that bug proving that it's actually a perf win.

Also, it still seems that there's an underlying widget bug; removing or setting the chromemargin attribute should never have this effect.
Comment 23 User image Mike Conley (:mconley) 2014-03-31 10:20:17 PDT
(In reply to Dão Gottwald [:dao] from comment #22)
> (In reply to Mike Conley (:mconley) from comment #20)
> > (In reply to Dão Gottwald [:dao] from comment #19)
> > > Why are we adding and then removing the attribute? Are you saying that
> > > updateTitlebarDisplay is called twice, first with TabsInTitlebar.enabled and
> > > then with !TabsInTitlebar.enabled?
> > 
> > No, it's not called twice - it's called once per window on open, as one
> > would hope / expect.
> > 
> > However, in the case of tabs not being in the titlebar, that means
> > _removing_ the chromemargin attribute that is already applied to the
> > main-window (since main-window has this attribute by default).
> 
> So this is bug 893682's fault? Can we stop doing that? I see no data in that
> bug proving that it's actually a perf win.
> 

That's a good point. Let me see if backing that out fixes the problem (the evidence so far suggests that it should). If that's the case, I'll do a comparison to see how this affects our ts_paint and tpaint numbers (because I believe bug 893682 was landed to lower their values).

> Also, it still seems that there's an underlying widget bug; removing or
> setting the chromemargin attribute should never have this effect.

I would agree.
Comment 24 User image Mike Conley (:mconley) 2014-03-31 10:26:10 PDT
(In reply to Dão Gottwald [:dao] from comment #22)
> So this is bug 893682's fault? Can we stop doing that? I see no data in that
> bug proving that it's actually a perf win.
> 

There's no direct data, correct - but MattN _does_ say in https://bugzilla.mozilla.org/show_bug.cgi?id=893682#c0 that it showed a perf win on OS X, on ts_paint and / or tpaint.

And MattN is an honourable fellow. I imagine we can trust him. :)

However, I'll do the Try push anyway just to confirm.
Comment 25 User image Mike Conley (:mconley) 2014-03-31 10:40:54 PDT
Baseline: https://tbpl.mozilla.org/?tree=Try&rev=acb6167afeac
Backout:  https://tbpl.mozilla.org/?tree=Try&rev=1967bc972c85

Some observations, when I wrote this patch:

1) MattN did report an improvement for OS X in comment 0 of bug 893682... but from what I can tell, the attribute was already being set in the main-window for OS X. So I don't see what other advantages this gave OS X.

2) Removing the chromemargin attribute on main-window causes some pretty horrific effects when restoring multiple windows with tabs in titlebar (which I would bet heavily will be the more-used mode).

Example: http://screencast.com/t/sDYhCmJTjS

So even if it turns out that reverting bug 893682 doesn't result in a performance regression, I think we'll have traded a rather innocuous bug for one that's more visible and gross.

I think the happy compromise might be going with the approach in my patch, which is to persist the chromemargin attribute. That seems to give us best of both worlds.

But I'm open to argument, as always.
Comment 26 User image Mike Conley (:mconley) 2014-03-31 10:42:52 PDT
So, while we're waiting for that data to come in, I'll polish up the patch that I'm suggesting, and then I guess we'll weigh our options.
Comment 27 User image Dão Gottwald [:dao] 2014-03-31 10:46:22 PDT
(In reply to Mike Conley (:mconley) from comment #25)
> Example: http://screencast.com/t/sDYhCmJTjS
> 
> So even if it turns out that reverting bug 893682 doesn't result in a
> performance regression, I think we'll have traded a rather innocuous bug for
> one that's more visible and gross.

It looks like the chromemargin attribute is being set too late or something. In any case, this didn't happen before bug 893682 landed. Something must have regressed this in the meantime.

> I think the happy compromise might be going with the approach in my patch,
> which is to persist the chromemargin attribute. That seems to give us best
> of both worlds.

Except that it's a hack and permanently pollutes localstore.rdf (can only be undone with a migration step). Ideally, we'd just get the widget bug fixed.
Comment 28 User image Mike Conley (:mconley) 2014-03-31 10:46:39 PDT
Comment on attachment 8399448 [details] [diff] [review]
Workaround Idea 1 - v1

Actually, couldn't find any clean-up to do. I still want MattN to weigh in here, if possible.
Comment 29 User image Mike Conley (:mconley) 2014-03-31 10:58:17 PDT
(In reply to Dão Gottwald [:dao] from comment #27)
> (In reply to Mike Conley (:mconley) from comment #25)
> > Example: http://screencast.com/t/sDYhCmJTjS
> > 
> > So even if it turns out that reverting bug 893682 doesn't result in a
> > performance regression, I think we'll have traded a rather innocuous bug for
> > one that's more visible and gross.
> 
> It looks like the chromemargin attribute is being set too late or something.
> In any case, this didn't happen before bug 893682 landed. Something must
> have regressed this in the meantime.

Are we absolutely sure that didn't happen before bug 893682 landed? We should check.

> 
> > I think the happy compromise might be going with the approach in my patch,
> > which is to persist the chromemargin attribute. That seems to give us best
> > of both worlds.
> 
> Except that it's a hack and permanently pollutes localstore.rdf (can only be
> undone with a migration step). Ideally, we'd just get the widget bug fixed.

I agree, but I suspect that such a fix would be too risky to justify uplifting to the Beta branch (which, since this is a P3, we want to do).

So I've identified a hack-y workaround that might buy us some time until we fix the actual underlying issue. Maybe we can land the hack on Beta, and try to fix the widget issue on central / Aurora.

I'm just assuming that the widget fix would be too risky for Beta. I suppose I should ensure that assumption is justified.

Tim - a blame suggests you've poked around in the relevant widget code. Do you have any idea what the problem might be, and how extensive a fix might be required?

And if you know the answer to either of those questions, in your estimation, do you believe that a fix would be safe enough to uplift to the Beta branch?
Comment 30 User image Dão Gottwald [:dao] 2014-03-31 11:08:04 PDT
(In reply to Mike Conley (:mconley) from comment #29)
> Are we absolutely sure that didn't happen before bug 893682 landed? We
> should check.

99% sure, yes. Note that chromemargin isn't even new, we've been setting it for ages and I've never seen the effect recorded in your screencast.

> So I've identified a hack-y workaround that might buy us some time until we
> fix the actual underlying issue. Maybe we can land the hack on Beta, and try
> to fix the widget issue on central / Aurora.

Note again that persisting stuff entails more persistency than one might expect. Once put in localstore.rdf, it stays there for good and will be restored for eternity unless we manually remove it from the file. So landing on beta would in fact affect subsequent releases.
Comment 31 User image Mike Conley (:mconley) 2014-03-31 11:11:06 PDT
(In reply to Mike Conley (:mconley) from comment #29)
> (In reply to Dão Gottwald [:dao] from comment #27)
> > (In reply to Mike Conley (:mconley) from comment #25)
> > > Example: http://screencast.com/t/sDYhCmJTjS
> > > 
> > > So even if it turns out that reverting bug 893682 doesn't result in a
> > > performance regression, I think we'll have traded a rather innocuous bug for
> > > one that's more visible and gross.
> > 
> > It looks like the chromemargin attribute is being set too late or something.
> > In any case, this didn't happen before bug 893682 landed. Something must
> > have regressed this in the meantime.
> 
> Are we absolutely sure that didn't happen before bug 893682 landed? We
> should check.
> 

Dao appears to be correct: http://screencast.com/t/eGtLmehO4
Comment 32 User image Mike Conley (:mconley) 2014-03-31 11:11:54 PDT
(In reply to Dão Gottwald [:dao] from comment #30)
> Note again that persisting stuff entails more persistency than one might
> expect. Once put in localstore.rdf, it stays there for good and will be
> restored for eternity unless we manually remove it from the file. So landing
> on beta would in fact affect subsequent releases.

Yep, understood - we'd need a migration to remove that persisted attribute down the line.
Comment 33 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-03-31 13:47:15 PDT
For my own benefit, it sounds like the issue is (and please correct me if I'm wrong):
  If you remove the chromemargin attribute that is already applied to the main-window while opening other main windows, multiple of the main windows will look active (meaning that their title bars will show the active color)


> Tim - a blame suggests you've poked around in the relevant widget code. Do
> you have any idea what the problem might be, and how extensive a fix might
> be required?

I did at one point poke around in this code, but it's been a while so I'll need to refresh my knowledge and do some investigation to figure out what the problem is. I imagine that the fix will be fairly small code-wise - maybe a handful of lines changed - but will be complicated to test because of interactions with themes, OS version, and (based on the "while opening other main windows" part of the problem description) timing.

> And if you know the answer to either of those questions, in your estimation,
> do you believe that a fix would be safe enough to uplift to the Beta branch?

My gut instinct is "no, we probably wouldn't be able to test the widget patch extensively enough for uplift to Beta." Obviously that feeling might change if/when a widget patch actually exists.
Comment 34 User image Mike Conley (:mconley) 2014-03-31 13:50:56 PDT
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #33)
> For my own benefit, it sounds like the issue is (and please correct me if
> I'm wrong):
>   If you remove the chromemargin attribute that is already applied to the
> main-window while opening other main windows, multiple of the main windows
> will look active (meaning that their title bars will show the active color)
> 

Yes, that's a good summation of my current understanding of the problem.

> 
> > Tim - a blame suggests you've poked around in the relevant widget code. Do
> > you have any idea what the problem might be, and how extensive a fix might
> > be required?
> 
> I did at one point poke around in this code, but it's been a while so I'll
> need to refresh my knowledge and do some investigation to figure out what
> the problem is. I imagine that the fix will be fairly small code-wise -
> maybe a handful of lines changed - but will be complicated to test because
> of interactions with themes, OS version, and (based on the "while opening
> other main windows" part of the problem description) timing.
> 
> > And if you know the answer to either of those questions, in your estimation,
> > do you believe that a fix would be safe enough to uplift to the Beta branch?
> 
> My gut instinct is "no, we probably wouldn't be able to test the widget
> patch extensively enough for uplift to Beta." Obviously that feeling might
> change if/when a widget patch actually exists.

Great, thanks.
Comment 35 User image Mike Conley (:mconley) 2014-04-01 08:58:46 PDT
Hey Matt - do you have any preference / feeling on how to proceed for Beta?
Comment 36 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-01 15:00:31 PDT
(In reply to Mike Conley (:mconley) from comment #25)
> Baseline: https://tbpl.mozilla.org/?tree=Try&rev=acb6167afeac
> Backout:  https://tbpl.mozilla.org/?tree=Try&rev=1967bc972c85

So backing that out would cause ~5-6% regression on tpaint for Win7 (it makes sense not to see a change on XP on Linux since IIRC, chromemargin doesn't get set there after startup by default and as you pointed out, OS X already had the right value).
Comment 37 User image Dão Gottwald [:dao] 2014-04-01 15:05:53 PDT
(In reply to Matthew N. [:MattN] from comment #36)
> (In reply to Mike Conley (:mconley) from comment #25)
> > Baseline: https://tbpl.mozilla.org/?tree=Try&rev=acb6167afeac
> > Backout:  https://tbpl.mozilla.org/?tree=Try&rev=1967bc972c85
> 
> So backing that out would cause ~5-6% regression on tpaint for Win7

Possibly related to http://screencast.com/t/sDYhCmJTjS -- not only does it look laggy, I can also imagine how it's actually slower. Since this regressed in the meantime, I'm not sure we're looking at the right numbers.
Comment 38 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-01 16:10:04 PDT
Comment on attachment 8399448 [details] [diff] [review]
Workaround Idea 1 - v1

Review of attachment 8399448 [details] [diff] [review]:
-----------------------------------------------------------------

Should you also change https://mxr.mozilla.org/mozilla-central/source/toolkit/modules/LightweightThemeConsumer.jsm?rev=87e9f48141ea#157 in case extensions are modifying attributes?

It would be good to know what the proper fix is before judging whether to land this on Aurora or Beta since if we don't land this on m-c we're just deferring the problem until 31. I don't like polluting localstore since even if we do a migration with the normal migrateui stuff, it won't re-migrate users who ever downgrade to a build with persist set for the attribute. I realize it's not going to be a huge amount of users but it does affect regression hunting where you need to use a specific profile (rather than a fresh one on each build). If we run a custom migration for this, it slows down every startup unless we're smart and hook into mstone pref version/buildid changes (which migrateUI doesn't currently do IIRC).

There is a lot of information about possible causes in the bug but it's a bit hard for me to follow: Did we figured out what caused the painting regression since bug 893682 landed? If we find the regression range for that perhaps there is a nicer fix?
Comment 39 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-01 19:44:52 PDT
I'm doing some investigation now and I think I understand what the problem is (for the issue of the bug, not related to backouts). There is a difference in the order that chromemargin gets reset to -1 on all sides compared to when we get a Windows WM_NCACTIVATE message to tell us to change the caption state. 

The problem is that we handle WM_NCACTIVATE messages when we're drawing in the titlebar and the following happens on a Window that gets stuck in the wrong state:
1) Window is initially created with 0,2,2,2 chromemargin due to bug 893682.
2) WM_NCACTIVATE is received to indicate the window is active. Since we have a custom non-client region (i.e. the chromemargin isn't all -1), we handle the caption state ourselves in UpdateGetWindowInfoCaptionStatus and set it active.
3) The chromemargin of -1,-1,-1,-1 propagates to widget code due to TabsInTitlebar code.
4) WM_NCACTIVATE is received to indicate the window is no longer active (since a later window opened on top of it) but due to step #3 setting a default client margin, we think we don't need to update our state of whether the window is active so UpdateGetWindowInfoCaptionStatus doesn't get called.

On windows that don't have this bug, steps 3 & 4 are in reverse order.

Now I'm going to look into whether this is something I can figure out a fix for.
Comment 40 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-02 02:35:42 PDT
Created attachment 8400514 [details] [diff] [review]
Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update

While doing the debugging, I noticed the initial chromemargin value was passed to widget code and then getting replaced with the correct value. By initializing earlier (as part of the load handler), only the correct chromemargin gets passed to widget code. It's possible this will break something and we don't really have test coverage for TabsInTitlebar still :( This does avoid the localstore issue and seems like a worthwhile optimization if there aren't regressions from it. If we required a certain initialization path then we should probably enforce or at least document that.

I tested quickly on 10.9 and Win7 Glass & Classic (with and without the titlebar for all of them). I also tried some different window sizemodes and other tweaks. I'd like to look through more past TabsInTitlebar bugs before landing. I would definitely want bake time on this before Beta if it's acceptable. If we want to lower the risk, perhaps we could only do the change I added if we're using the classic theme (assuming this bug is actually limited to it).

Perf check:
Baseline:     https://tbpl.mozilla.org/?tree=Try&rev=e906cbf72a85
Workaround 2: https://tbpl.mozilla.org/?tree=Try&rev=f9026165749a
Comparison: http://compare-talos.mattn.ca/?oldRevs=e906cbf72a85&newRev=f9026165749a&submit=true
Comment 41 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-02 03:13:32 PDT
Created attachment 8400535 [details] [diff] [review]
Widget logging and some attempts at fixing

I wasn't able to come up with a proper fix. It seems we things are in the correct state but we need to tell Windows about it since we miss the appropriate WM_(NC?)ACTIVATE message due to the different ordering of changes. I tried to send a WM_NCACTIVATE but perhaps WM_ACTIVATE was needed too? It seems like we may need something like NeedsToHandleNCActivateDelayed/sPendingNCACTIVATE to get the right event order but I'm not sure exactly.

Tim, do you think you'll have time to look into a proper solution for this bug for 31 within the next week or so in order to know how to handle 29/30? (To know which releases we'll need a workaround on depending on the risk).

Thanks
Comment 42 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-02 03:30:50 PDT
Created attachment 8400543 [details]
Attachment 8400535 [details] [diff] output

Three windows using the Classic theme on Win7:

1st window (opens initially before restoring the session and gets re-used): 
hwnd: 1405e4
titlebar color is CORRECT (inactive since it's at the bottom of the stack after restoration)

2nd window:
hwnd: 6604e2
titlebar color is INCORRECT (has active color but it's not the foreground window)

3rd window:
hwnd: 1d0276
titlebar color is CORRECT (it's the topmost and active window after the restore and switching to another application causes it to look inactive).

Window 2 is the one to focus investigation on since it looks incorrect.
Comment 43 User image Dão Gottwald [:dao] 2014-04-02 04:03:28 PDT
(In reply to Matthew N. [:MattN] from comment #40)
> Created attachment 8400514 [details] [diff] [review]
> Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update
> 
> While doing the debugging, I noticed the initial chromemargin value was
> passed to widget code and then getting replaced with the correct value. By
> initializing earlier (as part of the load handler), only the correct
> chromemargin gets passed to widget code.

TabsInTitlebar.init() is already part of the load handler, your patch doesn't change this. Is _update called before the load handler?
Comment 44 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-02 04:07:06 PDT
(In reply to Dão Gottwald [:dao] from comment #43)
> (In reply to Matthew N. [:MattN] from comment #40)
> > Created attachment 8400514 [details] [diff] [review]
> > Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update
> > 
> > While doing the debugging, I noticed the initial chromemargin value was
> > passed to widget code and then getting replaced with the correct value. By
> > initializing earlier (as part of the load handler), only the correct
> > chromemargin gets passed to widget code.
> 
> TabsInTitlebar.init() is already part of the load handler, your patch
> doesn't change this. Is _update called before the load handler?

Yes, IIRC from the resize handler in tabbrowser.xml to updateAppearance
Comment 45 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-02 04:12:41 PDT
(In reply to Matthew N. [:MattN] from comment #44)
> (In reply to Dão Gottwald [:dao] from comment #43)
> > (In reply to Matthew N. [:MattN] from comment #40)
> > > Created attachment 8400514 [details] [diff] [review]
> > > Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update
> > > 
> > > While doing the debugging, I noticed the initial chromemargin value was
> > > passed to widget code and then getting replaced with the correct value. By
> > > initializing earlier (as part of the load handler), only the correct
> > > chromemargin gets passed to widget code.
> > 
> > TabsInTitlebar.init() is already part of the load handler, your patch
> > doesn't change this. Is _update called before the load handler?
> 
> Yes, IIRC from the resize handler in tabbrowser.xml to updateAppearance

Or actually now I'm thinking I meant to write that by having  *_update* complete (not init) earlier as part of the load handler, the correct chromemargin is always used. I'll need to double-check tomorrow what is initializing it earlier as I need to sleep now. I'm sure there was a change on Win7 Classic though (whether it's now too early, I don't know yet).
Comment 46 User image Dão Gottwald [:dao] 2014-04-02 04:41:26 PDT
(In reply to Matthew N. [:MattN] from comment #45)
> (In reply to Matthew N. [:MattN] from comment #44)
> > (In reply to Dão Gottwald [:dao] from comment #43)
> > > (In reply to Matthew N. [:MattN] from comment #40)
> > > > Created attachment 8400514 [details] [diff] [review]
> > > > Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update
> > > > 
> > > > While doing the debugging, I noticed the initial chromemargin value was
> > > > passed to widget code and then getting replaced with the correct value. By
> > > > initializing earlier (as part of the load handler), only the correct
> > > > chromemargin gets passed to widget code.
> > > 
> > > TabsInTitlebar.init() is already part of the load handler, your patch
> > > doesn't change this. Is _update called before the load handler?
> > 
> > Yes, IIRC from the resize handler in tabbrowser.xml to updateAppearance
> 
> Or actually now I'm thinking I meant to write that by having  *_update*
> complete (not init) earlier as part of the load handler, the correct
> chromemargin is always used.

Your patch doesn't do this either.
Comment 47 User image Dão Gottwald [:dao] 2014-04-02 04:44:42 PDT
Created attachment 8400579 [details] [diff] [review]
another workaround

How's this? This simply calls _update from init, thereby avoiding http://screencast.com/t/sDYhCmJTjS and removing the need for the chromemargin attribute to be hardcoded in browser.xul.
Comment 48 User image Mike Conley (:mconley) 2014-04-02 07:33:26 PDT
Comment on attachment 8400579 [details] [diff] [review]
another workaround

So, while this fixes the original bug, I think it introduces one that's just as glitchy / gross - the tabs in the restored windows don't appear right away when we have tabs in the titlebar:

http://www.screencast.com/t/uiOFFjQlD5qR
Comment 49 User image Dão Gottwald [:dao] 2014-04-02 07:41:48 PDT
(In reply to Mike Conley (:mconley) from comment #48)
> Comment on attachment 8400579 [details] [diff] [review]
> another workaround
> 
> So, while this fixes the original bug, I think it introduces one that's just
> as glitchy / gross - the tabs in the restored windows don't appear right
> away when we have tabs in the titlebar:

I didn't see this while testing on Windows and can't make much sense of it either. If something goes wrong in TabsInTitlebar / updateTitlebarDisplay, I guess the tab strip may be misaligned or something, but why would it be hidden at some point?
Comment 50 User image Mike Conley (:mconley) 2014-04-02 07:55:05 PDT
Seems to be related to the _update you added to TabsInTitlebar.init - removing that seems to put us back into a better state.

Unsure what's happening in this case - perhaps the nav-bar is somehow overlapping the tabstrip?

Removing the _update from init fixes the problem I mentioned, _and_ seems to address the active titlebar state bug as well. I'd be curious to know if there's a performance impact as well.
Comment 51 User image Dão Gottwald [:dao] 2014-04-02 08:00:08 PDT
I see it now after trying further, it's just faster and therefore less noticeable than in your screencast.

> Removing the _update from init fixes the problem I mentioned, _and_ seems to
> address the active titlebar state bug as well. I'd be curious to know if
> there's a performance impact as well.

Wouldn't this regress http://screencast.com/t/sDYhCmJTjS ?
Comment 52 User image Mike Conley (:mconley) 2014-04-02 08:10:03 PDT
(In reply to Dão Gottwald [:dao] from comment #51)
> I see it now after trying further, it's just faster and therefore less
> noticeable than in your screencast.
> 
> > Removing the _update from init fixes the problem I mentioned, _and_ seems to
> > address the active titlebar state bug as well. I'd be curious to know if
> > there's a performance impact as well.
> 
> Wouldn't this regress http://screencast.com/t/sDYhCmJTjS ?

Doesn't seem to be the same effect:

http://screencast.com/t/nhdaqIsT

We don't show the menu right away, but we also don't show the white boxes. We do, however, fail to put the tabs into the titlebar right away, so there's a bit of a jump when that kicks in, which isn't great. I also demonstrate that in this screencast.
Comment 53 User image Mike Conley (:mconley) 2014-04-02 08:12:18 PDT
(In reply to Mike Conley (:mconley) from comment #52)
> (In reply to Dão Gottwald [:dao] from comment #51)
> > I see it now after trying further, it's just faster and therefore less
> > noticeable than in your screencast.
> > 
> > > Removing the _update from init fixes the problem I mentioned, _and_ seems to
> > > address the active titlebar state bug as well. I'd be curious to know if
> > > there's a performance impact as well.
> > 
> > Wouldn't this regress http://screencast.com/t/sDYhCmJTjS ?
> 
> Doesn't seem to be the same effect:
> 
> http://screencast.com/t/nhdaqIsT
> 
> We don't show the menu right away, but we also don't show the white boxes.
> We do, however, fail to put the tabs into the titlebar right away, so
> there's a bit of a jump when that kicks in, which isn't great. I also
> demonstrate that in this screencast.

Bah, I was wrong again. I failed to remove that _update line from my Windows 7 machine. Yes, you're right, it regresses us to the point of http://screencast.com/t/sDYhCmJTjS.
Comment 54 User image Dão Gottwald [:dao] 2014-04-02 08:16:43 PDT
Maybe we should start with figuring out when http://screencast.com/t/sDYhCmJTjS started happening in the first place (despite the fact that it was masked by bug 893682).
Comment 55 User image Mike Conley (:mconley) 2014-04-02 10:58:38 PDT
Comment on attachment 8400514 [details] [diff] [review]
Workaround Idea 2 - v.1 - Initialize TabsInTitlebar on first update

Hm. This might be the least crappy option, if we're OK eating what appears to be a ~5ms ts_paint regression on Windows 7.

Even discounting the outlier in the patched try build, at 9 builds each, the patched build averages a regression of about 2ms compared to the baseline.

If that's something Perf is willing to eat while we work on fixing the Windows widgets, I'm OK with that too.
Comment 56 User image Dão Gottwald [:dao] 2014-04-02 11:47:03 PDT
I don't think attachment 8400514 [details] [diff] [review] is well enough understood at this point. We can't just run the TabsInTitlebar code at some random point in the window opening process and hope for the best. If the resize handler runs too early, then so will TabsInTitlebar._update, which is no good. This code really isn't expected to run before the load event, hence attachment 8400579 [details] [diff] [review].
Comment 57 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-02 12:50:57 PDT
(In reply to Matthew N. [:MattN] from comment #41)
...
> Tim, do you think you'll have time to look into a proper solution for this
> bug for 31 within the next week or so in order to know how to handle 29/30?
> (To know which releases we'll need a workaround on depending on the risk).

Yes I think I'll be able to take a look at this tomorrow
Comment 58 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-03 15:34:08 PDT
(In reply to Mike Conley (:mconley) from comment #50)
> Unsure what's happening in this case - perhaps the nav-bar is somehow
> overlapping the tabstrip?

I think this is just because the tabstrip is collapsed by default in browser.xul. I've argued that shouldn't be the case before since we can't autohide the tabstrip anymore with one tab and most common window opening is one with tabs. I believe Dão disagreed with removing collapsed="true" before (I think because of the popup window case where we don't want tabs visible) so we didn't do it. This was during perf work IIRC to help ts_paint/tpaint.

https://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.xul?rev=7baa06f4b49c&mark=548#539
Comment 59 User image Dão Gottwald [:dao] 2014-04-04 07:06:58 PDT
(In reply to Matthew N. [:MattN] from comment #58)
> (In reply to Mike Conley (:mconley) from comment #50)
> > Unsure what's happening in this case - perhaps the nav-bar is somehow
> > overlapping the tabstrip?
> 
> I think this is just because the tabstrip is collapsed by default in
> browser.xul. I've argued that shouldn't be the case before since we can't
> autohide the tabstrip anymore with one tab and most common window opening is
> one with tabs.

It should be unhidden before the first paint. This indicates that we're painting too early, which may also be the cause of http://screencast.com/t/sDYhCmJTjS.
Comment 60 User image :Gijs 2014-04-07 06:50:24 PDT
(In reply to Dão Gottwald [:dao] from comment #59)
> (In reply to Matthew N. [:MattN] from comment #58)
> > (In reply to Mike Conley (:mconley) from comment #50)
> > > Unsure what's happening in this case - perhaps the nav-bar is somehow
> > > overlapping the tabstrip?
> > 
> > I think this is just because the tabstrip is collapsed by default in
> > browser.xul. I've argued that shouldn't be the case before since we can't
> > autohide the tabstrip anymore with one tab and most common window opening is
> > one with tabs.
> 
> It should be unhidden before the first paint. This indicates that we're
> painting too early, which may also be the cause of
> http://screencast.com/t/sDYhCmJTjS.

This makes sense to me. However, how/why do you define "too early"? That is, by what definition is it 'too' early for a paint to happen, and can we influence this or are you essentially saying this is a gfx issue?

(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #57)
> (In reply to Matthew N. [:MattN] from comment #41)
> ...
> > Tim, do you think you'll have time to look into a proper solution for this
> > bug for 31 within the next week or so in order to know how to handle 29/30?
> > (To know which releases we'll need a workaround on depending on the risk).
> 
> Yes I think I'll be able to take a look at this tomorrow

Tim, did you figure something out on the widget/Windows side of this?
Comment 61 User image Dão Gottwald [:dao] 2014-04-07 07:14:46 PDT
(In reply to :Gijs Kruitbosch from comment #60)
> (In reply to Dão Gottwald [:dao] from comment #59)
> > (In reply to Matthew N. [:MattN] from comment #58)
> > > (In reply to Mike Conley (:mconley) from comment #50)
> > > > Unsure what's happening in this case - perhaps the nav-bar is somehow
> > > > overlapping the tabstrip?
> > > 
> > > I think this is just because the tabstrip is collapsed by default in
> > > browser.xul. I've argued that shouldn't be the case before since we can't
> > > autohide the tabstrip anymore with one tab and most common window opening is
> > > one with tabs.
> > 
> > It should be unhidden before the first paint. This indicates that we're
> > painting too early, which may also be the cause of
> > http://screencast.com/t/sDYhCmJTjS.
> 
> This makes sense to me. However, how/why do you define "too early"? That is,
> by what definition is it 'too' early for a paint to happen,

Before we're done handling the load event would be too early, since that's where we generally set up the window such that it's painted as expected.

> and can we
> influence this or are you essentially saying this is a gfx issue?

If I remember correctly, widget code kicks off the first paint, but I suppose there may be areas where something could go wrong as well (and I wouldn't immediately rule out front-end code here).
Comment 62 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-07 17:25:03 PDT
(In reply to :Gijs Kruitbosch from comment #60)
[...]
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #57)
> > (In reply to Matthew N. [:MattN] from comment #41)
> > ...
> > > Tim, do you think you'll have time to look into a proper solution for this
> > > bug for 31 within the next week or so in order to know how to handle 29/30?
> > > (To know which releases we'll need a workaround on depending on the risk).
> > 
> > Yes I think I'll be able to take a look at this tomorrow
> 
> Tim, did you figure something out on the widget/Windows side of this?

Thursday and Friday of last week got a little crazy.

I just tried reproducing this in a local build (changeset d8b2e3738785), a beta build from [1], and another beta build from [2]. I wasn't able to reproduce in any of those builds, and I'm guessing that the reason is that I'm using Windows 8.1. Stepping through in the debugger confirms that we early-return from the codepaths mentioned in comment 39 on Windows 8.1 because this - nsUXThemeData::CheckForCompositor() - returns true. I don't have a Windows 7 machine to test with so I won't be able to reproduce the issue, but I'll stare at the code a little longer to see if I can come up with an approach to test out.  

[1] https://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest-beta/win32/en-US
[2] https://www.mozilla.org/en-US/firefox/beta
Comment 63 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-07 22:37:25 PDT
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #62)
> I just tried reproducing this in a local build (changeset d8b2e3738785), a
> beta build from [1], and another beta build from [2]. I wasn't able to
> reproduce in any of those builds, and I'm guessing that the reason is that
> I'm using Windows 8.1. 

Right, this bug is specific to the Windows Classic theme which isn't available on Windows 8 AFAIK. If you're fine working through remote desktop (or similar), I can give you access to a Win7 setup.
Comment 64 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-08 14:09:22 PDT
(In reply to Matthew N. [:MattN] from comment #63)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #62)
> > I just tried reproducing this in a local build (changeset d8b2e3738785), a
> > beta build from [1], and another beta build from [2]. I wasn't able to
> > reproduce in any of those builds, and I'm guessing that the reason is that
> > I'm using Windows 8.1. 
> 
> Right, this bug is specific to the Windows Classic theme which isn't
> available on Windows 8 AFAIK. If you're fine working through remote desktop
> (or similar), I can give you access to a Win7 setup.

Thanks! I'm setting up a Win7 VM to test on. If that turns out to be too much trouble or if I can't repro on the VM, I'll ask you about RDP access to your Win7 machine.
Comment 65 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-09 18:19:22 PDT
Created attachment 8404372 [details] [diff] [review]
Widget patch v1

(In reply to Matthew N. [:MattN] from comment #63)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #62)
> > I just tried reproducing this in a local build (changeset d8b2e3738785), a
> > beta build from [1], and another beta build from [2]. I wasn't able to
> > reproduce in any of those builds, and I'm guessing that the reason is that
> > I'm using Windows 8.1. 
> 
> Right, this bug is specific to the Windows Classic theme [...]

I was able to repro this bug without using the Windows Classic theme. It affects all themes on Windows 7, including the default theme.


The following analysis builds on the excellent work done in comment 39.


First, some raw notes:

`nsWindow::SetNonClientMargins`
  If the margins we're setting are non-default, we set `mCustomNonClient` to `true`
  If the margins we're setting are default, we set `mCustomNonClient` to `false` AND remove `kManageWindowInfoProperty`

In our processing of WM_NCACTIVATE:
  If `!mCustomNonClient`, we let Windows do its default processing
  If `mCustomNonClient`, we call `UpdateGetWindowInfoCaptionStatus`. We DO NOT call DefWndProc to do default processing.

`UpdateGetWindowInfoCaptionStatus`
  The first time this is called, it sets up our function `GetWindowInfoHook` to be a "hook" on the function `GetWindowInfo`.  This means that, whenever something in our app tries to call `GetWindowInfo`, our function `GetWindowInfoHook` will be called instead.
  Sets `kManageWindowInfoProperty` to true or false depending on the arg passed in

`GetWindowInfoHook`
  Gets `kManageWindowInfoProperty`
  Calls the real `GetWindowInfo` function by loading it directly from user32.dll
  Gets a PWINDOWINFO from `GetWindowInfo` (called pwi)
  Sets pwi->dwWindowStatus according to value of `kManageWindowInfoProperty`
  returns pwi


And now the pathology:

One of our windows receives a call to `nsWindow::SetNonClientMargins` setting the margins to non-default value (0, 2, 2, 2), causing `mCustomNonClient` to become `true`. Later, that window receives a WM_NCACTIVATE message telling it to become active. It calls `UpdateGetWindowInfoCaptionStatus` which sets up our `GetWindowInfo` hook. Now every call to `GetWindowInfo` made from our application will instead call our `GetWindowInfoHook` function. Additionally, the `kManageWindowInfoProperty` of our window gets set based on whether the window is becoming active or inactive. We do not do default processing of the WM_NCACTIVATE message, so whatever mechanism usually keeps track of whether the window is active is now out of date. At some later time, our window receives a call to `nsWindow::SetNonClientMargins` setting the margins to default value (-1, -1, -1, -1). This causes `kManageWindowInfoProperty` to be removed, which means that calls to `GetWindowInfo` will now return the out-of-date information about whether the window is active. When the window gets a WM_NCACTIVATE message telling it to go inactive, we send the message to `DefWndProc` to do default processing. `DefWndProc` checks `GetWindowInfo`, sees that the window is already inactive, and doesn't re-paint the title bar.


The attached patch causes us to (almost) always rely on our own information about whether the window is active or not. It causes us never to remove the `kManageWindowInfoProperty` and to always update that value from our processing of WM_NCACTIVATE.

Please try out the attached patch and let me know if you run into any issues.
Comment 66 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-09 18:20:54 PDT
Comment on attachment 8400535 [details] [diff] [review]
Widget logging and some attempts at fixing

Review of attachment 8400535 [details] [diff] [review]:
-----------------------------------------------------------------

Good investigation! Not marking f+ because I don't think we want to pursue this approach
Comment 67 User image Mike Conley (:mconley) 2014-04-09 19:44:28 PDT
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #65)
> Created attachment 8404372 [details] [diff] [review]
> Widget patch v1
> 
> (In reply to Matthew N. [:MattN] from comment #63)
> > (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #62)
> > > I just tried reproducing this in a local build (changeset d8b2e3738785), a
> > > beta build from [1], and another beta build from [2]. I wasn't able to
> > > reproduce in any of those builds, and I'm guessing that the reason is that
> > > I'm using Windows 8.1. 
> > 
> > Right, this bug is specific to the Windows Classic theme [...]
> 
> I was able to repro this bug without using the Windows Classic theme. It
> affects all themes on Windows 7, including the default theme.
> 
> 
> The following analysis builds on the excellent work done in comment 39.
> 
> 
> First, some raw notes:
> 
> `nsWindow::SetNonClientMargins`
>   If the margins we're setting are non-default, we set `mCustomNonClient` to
> `true`
>   If the margins we're setting are default, we set `mCustomNonClient` to
> `false` AND remove `kManageWindowInfoProperty`
> 
> In our processing of WM_NCACTIVATE:
>   If `!mCustomNonClient`, we let Windows do its default processing
>   If `mCustomNonClient`, we call `UpdateGetWindowInfoCaptionStatus`. We DO
> NOT call DefWndProc to do default processing.
> 
> `UpdateGetWindowInfoCaptionStatus`
>   The first time this is called, it sets up our function `GetWindowInfoHook`
> to be a "hook" on the function `GetWindowInfo`.  This means that, whenever
> something in our app tries to call `GetWindowInfo`, our function
> `GetWindowInfoHook` will be called instead.
>   Sets `kManageWindowInfoProperty` to true or false depending on the arg
> passed in
> 
> `GetWindowInfoHook`
>   Gets `kManageWindowInfoProperty`
>   Calls the real `GetWindowInfo` function by loading it directly from
> user32.dll
>   Gets a PWINDOWINFO from `GetWindowInfo` (called pwi)
>   Sets pwi->dwWindowStatus according to value of `kManageWindowInfoProperty`
>   returns pwi
> 
> 
> And now the pathology:
> 
> One of our windows receives a call to `nsWindow::SetNonClientMargins`
> setting the margins to non-default value (0, 2, 2, 2), causing
> `mCustomNonClient` to become `true`. Later, that window receives a
> WM_NCACTIVATE message telling it to become active. It calls
> `UpdateGetWindowInfoCaptionStatus` which sets up our `GetWindowInfo` hook.
> Now every call to `GetWindowInfo` made from our application will instead
> call our `GetWindowInfoHook` function. Additionally, the
> `kManageWindowInfoProperty` of our window gets set based on whether the
> window is becoming active or inactive. We do not do default processing of
> the WM_NCACTIVATE message, so whatever mechanism usually keeps track of
> whether the window is active is now out of date. At some later time, our
> window receives a call to `nsWindow::SetNonClientMargins` setting the
> margins to default value (-1, -1, -1, -1). This causes
> `kManageWindowInfoProperty` to be removed, which means that calls to
> `GetWindowInfo` will now return the out-of-date information about whether
> the window is active. When the window gets a WM_NCACTIVATE message telling
> it to go inactive, we send the message to `DefWndProc` to do default
> processing. `DefWndProc` checks `GetWindowInfo`, sees that the window is
> already inactive, and doesn't re-paint the title bar.
> 
> 
> The attached patch causes us to (almost) always rely on our own information
> about whether the window is active or not. It causes us never to remove the
> `kManageWindowInfoProperty` and to always update that value from our
> processing of WM_NCACTIVATE.
> 
> Please try out the attached patch and let me know if you run into any issues.

Hey Tim, I tried your patch, and I'm still seeing the original problem occur... was I only supposed to apply your patch, or one of the others as well?
Comment 68 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-09 20:44:00 PDT
(In reply to Mike Conley (:mconley) from comment #67)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #65)
> > Created attachment 8404372 [details] [diff] [review]
> > Widget patch v1
> > 
> > (In reply to Matthew N. [:MattN] from comment #63)
> > > (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #62)
> > > > I just tried reproducing this in a local build (changeset d8b2e3738785), a
> > > > beta build from [1], and another beta build from [2]. I wasn't able to
> > > > reproduce in any of those builds, and I'm guessing that the reason is that
> > > > I'm using Windows 8.1. 
> > > 
> > > Right, this bug is specific to the Windows Classic theme [...]
> > 
> > I was able to repro this bug without using the Windows Classic theme. It
> > affects all themes on Windows 7, including the default theme.
> > 
> > 
> > The following analysis builds on the excellent work done in comment 39.
> > 
> > 
> > First, some raw notes:
> > 
> > `nsWindow::SetNonClientMargins`
> >   If the margins we're setting are non-default, we set `mCustomNonClient` to
> > `true`
> >   If the margins we're setting are default, we set `mCustomNonClient` to
> > `false` AND remove `kManageWindowInfoProperty`
> > 
> > In our processing of WM_NCACTIVATE:
> >   If `!mCustomNonClient`, we let Windows do its default processing
> >   If `mCustomNonClient`, we call `UpdateGetWindowInfoCaptionStatus`. We DO
> > NOT call DefWndProc to do default processing.
> > 
> > `UpdateGetWindowInfoCaptionStatus`
> >   The first time this is called, it sets up our function `GetWindowInfoHook`
> > to be a "hook" on the function `GetWindowInfo`.  This means that, whenever
> > something in our app tries to call `GetWindowInfo`, our function
> > `GetWindowInfoHook` will be called instead.
> >   Sets `kManageWindowInfoProperty` to true or false depending on the arg
> > passed in
> > 
> > `GetWindowInfoHook`
> >   Gets `kManageWindowInfoProperty`
> >   Calls the real `GetWindowInfo` function by loading it directly from
> > user32.dll
> >   Gets a PWINDOWINFO from `GetWindowInfo` (called pwi)
> >   Sets pwi->dwWindowStatus according to value of `kManageWindowInfoProperty`
> >   returns pwi
> > 
> > 
> > And now the pathology:
> > 
> > One of our windows receives a call to `nsWindow::SetNonClientMargins`
> > setting the margins to non-default value (0, 2, 2, 2), causing
> > `mCustomNonClient` to become `true`. Later, that window receives a
> > WM_NCACTIVATE message telling it to become active. It calls
> > `UpdateGetWindowInfoCaptionStatus` which sets up our `GetWindowInfo` hook.
> > Now every call to `GetWindowInfo` made from our application will instead
> > call our `GetWindowInfoHook` function. Additionally, the
> > `kManageWindowInfoProperty` of our window gets set based on whether the
> > window is becoming active or inactive. We do not do default processing of
> > the WM_NCACTIVATE message, so whatever mechanism usually keeps track of
> > whether the window is active is now out of date. At some later time, our
> > window receives a call to `nsWindow::SetNonClientMargins` setting the
> > margins to default value (-1, -1, -1, -1). This causes
> > `kManageWindowInfoProperty` to be removed, which means that calls to
> > `GetWindowInfo` will now return the out-of-date information about whether
> > the window is active. When the window gets a WM_NCACTIVATE message telling
> > it to go inactive, we send the message to `DefWndProc` to do default
> > processing. `DefWndProc` checks `GetWindowInfo`, sees that the window is
> > already inactive, and doesn't re-paint the title bar.
> > 
> > 
> > The attached patch causes us to (almost) always rely on our own information
> > about whether the window is active or not. It causes us never to remove the
> > `kManageWindowInfoProperty` and to always update that value from our
> > processing of WM_NCACTIVATE.
> > 
> > Please try out the attached patch and let me know if you run into any issues.
> 
> Hey Tim, I tried your patch, and I'm still seeing the original problem
> occur... was I only supposed to apply your patch, or one of the others as
> well?

Nope, just the one patch! Let me see if I can get a repro with the patch applied in my VM...
Comment 69 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-09 21:13:33 PDT
Well, shoot. I had been testing with the "Basic" theme, but now that I test with "Classic" the problem is still there. I'll debug some more.
Comment 70 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-09 22:32:00 PDT
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #69)
> Well, shoot. I had been testing with the "Basic" theme, but now that I test
> with "Classic" the problem is still there. I'll debug some more.

Ok this mystery is now solved. Apparently when the "Basic" theme is applied the fix works properly, but when the "Classic" theme is applied we fail to create the `GetWindowInfo` hook. Thus, the patch has no effect when the "Classic" theme is applied.

I can't think of a reason why our ability to create the `GetWindowInfo` hook would be dependent on the theme. I imagine that that call shouldn't be failing, but I'll have to keep investigating to see if maybe this is expected. If it _is_ expected, then the fact that our windows' active/inactive states get out of touch with reality should be an obvious consequence.
Comment 71 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-12 00:25:39 PDT
Created attachment 8405735 [details] [diff] [review]
Widget patch v2

(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #70)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #69)
> > Well, shoot. I had been testing with the "Basic" theme, but now that I test
> > with "Classic" the problem is still there. I'll debug some more.
> 
> Ok this mystery is now solved. Apparently when the "Basic" theme is applied
> the fix works properly, but when the "Classic" theme is applied we fail to
> create the `GetWindowInfo` hook. Thus, the patch has no effect when the
> "Classic" theme is applied.
> 
> I can't think of a reason why our ability to create the `GetWindowInfo` hook
> would be dependent on the theme. I imagine that that call shouldn't be
> failing, but I'll have to keep investigating to see if maybe this is
> expected. If it _is_ expected, then the fact that our windows'
> active/inactive states get out of touch with reality should be an obvious
> consequence.

It turns out that the hook is being created just fine. It appears that when the "basic" theme is applied, `DefWndProc` doesn't actually bother calling `GetWindowInfo`, so Windows never sees our custom tracking of whether the window is active. We can't change how `DefWndProc` handles the `WM_NCACTIVATE` message, so our best bet is to try to inform Windows of the window's active/inactive state when we go from custom margins to default margins (i.e. when we hand off processing of titlebar messages back to `DefWndProc`)

This patch ended up looking a lot more like MattN's original patch (attachment 8400535 [details] [diff] [review]) than I expected, so maybe I should have set f+ on that patch :)

This patch sends a `WM_NCACTIVATE` to our window when we switch from custom margins back to default margins.

Let me know if this patch causes any problems.
Comment 72 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-12 00:34:17 PDT
Created attachment 8405737 [details] [diff] [review]
Widget patch v3

This is the same as the v2 patch except this removes all the debug logging
Comment 73 User image Mike Conley (:mconley) 2014-04-12 13:37:17 PDT
Testing patch now...
Comment 74 User image Mike Conley (:mconley) 2014-04-12 14:06:01 PDT
Comment on attachment 8405737 [details] [diff] [review]
Widget patch v3

This definitely works for me!
Comment 75 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-13 04:16:47 PDT
Thanks for the patch Tim and thanks for testing Mike.

Tim, do you think this patch is low-enough risk for Beta 29? It would probably have to land on the beta branch by Thursday (ideally with bake time on m-c before that). Otherwise, we'll have to look at the workarounds again.
Comment 76 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-14 12:33:32 PDT
Comment on attachment 8405737 [details] [diff] [review]
Widget patch v3

(In reply to Matthew N. [:MattN] from comment #75)
> Thanks for the patch Tim and thanks for testing Mike.
> 
> Tim, do you think this patch is low-enough risk for Beta 29? It would
> probably have to land on the beta branch by Thursday (ideally with bake time
> on m-c before that). Otherwise, we'll have to look at the workarounds again.

I think this is a low-risk patch:
  It causes `UpdateGetWindowInfoCaptionStatus` to be called every time we process a `WM_NCACTIVATE`, and it causes our custom window-active/inactive tracking to never be removed. Both of these effects would happen anyway for windows with custom non-client margins. We've just expanded them to happen also for windows with default non-client margins).
  It causes a `WM_NCACTIVATE` message to be sent when we go from custom to default non-client margins. There is precedent already for us sending the `WM_NCACTIVATE` message in `nsWindow` (search for "WM_NCACTIVATE" in nsWindow.cpp) so this shouldn't cause unforeseen effects.

I'm not sure who should review this patch: jimm is on vacation and bbondy has left Mozilla. I'm not sure who of the other "Widget - Windows" peers is still actively reviewing patches. Brad - you've got the emptiest queue. Want to take a look?
Comment 77 User image Matthew N. [:MattN] (PM if requests are blocking you) 2014-04-14 18:39:35 PDT
Comment on attachment 8400579 [details] [diff] [review]
another workaround

It sounds like the widget patch is safe for beta so we won't need this.
Comment 78 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-14 20:14:19 PDT
Juuuuuuuuuust in case:
  https://tbpl.mozilla.org/?tree=Try&rev=f556fb1542d3
Comment 79 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-15 11:12:57 PDT
https://hg.mozilla.org/integration/fx-team/rev/f2cc37938434
Comment 80 User image Tim Abraldes [:TimAbraldes] [:tabraldes] 2014-04-15 15:48:59 PDT
Comment on attachment 8405737 [details] [diff] [review]
Widget patch v3

[Approval Request Comment]
Bug caused by (feature/regressing bug #): This issue has existed for years, but it is much more noticeable in Australis builds because, on window creation, they switch from custom non-client margins to default non-client margins.
User impact if declined: Anyone on Windows 7 who restores multiple windows of Firefox simultaneously will see some of them with the wrong active/inactive state (some of them will have active-looking borders even though they are not the active window)
Testing completed (on m-c, etc.): Some manual testing. Will hopefully bake on m-c for a couple days.
Risk to taking this patch (and alternatives if risky): This code touches Windows widget code, which can impact hundreds of millions of users. However, I think this patch is low-risk for the reasons identified in comment 76
String or IDL/UUID changes made by this patch: None
Comment 81 User image Ryan VanderMeulen [:RyanVM] 2014-04-15 19:20:35 PDT
https://hg.mozilla.org/mozilla-central/rev/f2cc37938434
Comment 82 User image Lukas Blakk [:lsblakk] use ?needinfo 2014-04-17 07:57:39 PDT
Triage drive-by, let's give this another day on Nightly.
Comment 83 User image Jared Wein [:jaws] (please needinfo? me) 2014-04-22 11:49:59 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/98e7ff38943a

Note You need to log in before you can comment on or make changes to this bug.