Closed Bug 849647 Opened 7 years ago Closed 7 years ago

It's better to remove message order optimization on Windows if it's possible

Categories

(Core :: Widget: Win32, defect)

x86_64
Windows 8
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: masayuki, Assigned: masayuki)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

https://bugzilla.mozilla.org/show_bug.cgi?id=840409#c39

I think that we should remove the message order optimization on Windows if it's possible. The reason is, some other applications such as Japanese IME ATOK may expect the message order which is described in MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ms644943%28v=vs.85%29.aspx). The order is,

1 Sent messages
2 Posted messages
3 Input (hardware) messages and system internal events
4 Sent messages (again)
5 WM_PAINT messages
6 WM_TIMER messages

See bug 466487, this bug is now back in TSF mode. And also it seems that CUAS (does something for TSF unaware applications automatically) is also confused by the reordered message. CUAS probably hooks some messages and they perform something in TSF. For example, swapping the order of WM_FOCUS and WM_KEYDOWN might cause confusing TSF.

Currently, we give the highest priority to #3 since bug 22979 comment 20 without any explanation.

And roc tried to remove it at bug 36849, but it's backed out at bug 36849 comment 69 due to performance issue during incremental layout.

However, MetroWidget doesn't do such message order optimization. And performance of Windows PC is very very improved since 2000. Additionally, Gecko is redesigned for better performance and architecture.

So, I *guess* that now, we can remove the message optimization for getting back standard behavior of Windows application.

# If a lot of strange message blocks user input, we can blacklist them and compare the time stamps of the message and the first input message.

If you know something additional information about this, let's share it.
If you can remove it, that'll be great, but it's risky.
Yeah, the risk is highest I have ever experienced. However, the return is very big if we'll succeed.

First shot of the patch:
https://hg.mozilla.org/try/rev/d7b6509c938d
Here is the test build:
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=25202386afcd

How do I test "incremental layout" which is the cause of backing out the patch of bug 36849?

And what regressions are possible?

I don't see difference between patched build and m-c except the ATOK behavior.
Attached patch Patch (obsolete) — Splinter Review
I think the key issue that caused the backout in bug 36849 was that PLEvents (which are now XPCOM events) were being prioritized over native paint events. So don't do that :-).
Thank you roc for the information. However, I don't know the PLEvents and the XPCOM events. PLEvents were defined as WM_USER+n or WM_APP+n? And XPCOM events are still so?

My current idea of the patch is, basically, we don't touch the event order. So, any messages must be given higher priority than WM_PAINT. However, for ensuring the responsiveness, if non-system message is too newer than the oldest system message, then, the patch gives priority to the system messages while non-system message isn't so newer than the oldest system message.
My idea for landing the patch is, we'll land the patch on m-c as far as possible. And after next merge, the patch is going to be backed out from Aurora for testing longer on Nightly. So, I think that the patch should be tested over 6 weeks on Nightly.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
> I think the key issue that caused the backout in bug 36849 was that PLEvents
> (which are now XPCOM events) were being prioritized over native paint
> events. So don't do that :-).

They will be I think, via nsAppShell's post message handler which calls NativeEventCallback() - 

http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.cpp#84
http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsBaseAppShell.cpp#63

Windows prioritizes posted messages, note the comments here - 

http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.cpp#170

Maybe this fix for modal dialogs will help solve the problem.
Here's a blurb on msdn about paints, they fall to the very bottom.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms644927%28v=vs.85%29.aspx

Since we don't pluck paints out and give them higher priority I'm not sure if this changes anything. Our posted message already have a higher priority than paints.
Attached patch Patch (obsolete) — Splinter Review
Let's venture this.

I believe that we should land this patch on weekend because check-in is fewer than weekdays, then, we can check the regression easier.

And we should back this out from Aurora after next merge for testing enough long time on Nightly.

I don't modify the comment of
http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.cpp#170
because I don't understand the comment enough. If you think we should update it too, I'd like you to suggest good comment.

I'm requesting the review to only jimm. I have no plan to request it to other developers. However, you are welcome to comment anything about this.
Assignee: nobody → masayuki
Attachment #723557 - Attachment is obsolete: true
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Jimm:

Could you review the patch? Bugzilla shows "You must enter a valid bug number!" error if I input reviewer's email address.
Flags: needinfo?(jmathies)
Comment on attachment 726087 [details] [diff] [review]
Patch

Review of attachment 726087 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/windows/nsAppShell.cpp
@@ +205,5 @@
> +          // Give priority to enough old system message in the queue rather than
> +          // newer posted user/app message.  This ensures responsiveness even
> +          // while somebody fills our thread's message queue with a lot of
> +          // user/app messages.
> +          if (msg.message >= WM_USER) {

Are we really concerned about this? I've never seen any bugs filed about it.
Attachment #726087 - Flags: review? → review?(jmathies)
Flags: needinfo?(jmathies)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #11)
> I don't modify the comment of
> http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.
> cpp#170
> because I don't understand the comment enough. If you think we should update
> it too, I'd like you to suggest good comment.


I'd suggest testing your patch against the heavy js overhead testcase in the following bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=575515
(In reply to Jim Mathies [:jimm] from comment #13)
> Comment on attachment 726087 [details] [diff] [review]
> Patch
> 
> Review of attachment 726087 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: widget/windows/nsAppShell.cpp
> @@ +205,5 @@
> > +          // Give priority to enough old system message in the queue rather than
> > +          // newer posted user/app message.  This ensures responsiveness even
> > +          // while somebody fills our thread's message queue with a lot of
> > +          // user/app messages.
> > +          if (msg.message >= WM_USER) {
> 
> Are we really concerned about this? I've never seen any bugs filed about it.

There is bug 378830. However, currently, Flash Player is always run in another process on Vista or later.
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsNPAPIPlugin.cpp#278

Additionally, Flash Player makes another process for protected mode on Vista or later.

So, currently, Flash Player must not need to post the message into the our message queue.

However, I guess that it could be a problem on XP if the users disables OOPP. I'll try to check this later.

If there is no problem, indeed, removing the hack is also great.

(In reply to Jim Mathies [:jimm] from comment #14)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #11)
> > I don't modify the comment of
> > http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.
> > cpp#170
> > because I don't understand the comment enough. If you think we should update
> > it too, I'd like you to suggest good comment.
> 
> 
> I'd suggest testing your patch against the heavy js overhead testcase in the
> following bug:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=575515

Thanks, I'll check it.

FYI: the bugzilla's problem was bug 852072, which is now fixed.
FWIW, adobe updated the behavior of flash related to their app messages, they throttle these better. I'm not sure what version this was in but it was fixed about two years ago.
Thank you, jimm. That's very good news. I'll remove the block and I'll test it again.
Attached patch PatchSplinter Review
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=71c86588e4eb

Looks like no problem even if it's too busy by running heavy script.

And I confirmed that even if OOPP is disabled on WinXP, I can type some text normally while playing high refresh rate flush contents.
Attachment #726087 - Attachment is obsolete: true
Attachment #726087 - Flags: review?(jmathies)
Attachment #726656 - Flags: review?(jmathies)
And this patch gets rid of calls of PeekMessage() with PM_NOREMOVE. Therefore, this patch reduce a lot of virtual call in WidgetUtils::PeekMessage() in TSF mode. That must improve the performance of TSF mode.
Comment on attachment 726656 [details] [diff] [review]
Patch

What are you plans, landing this now or waiting until after the next merge?
Attachment #726656 - Flags: review?(jmathies) → review+
My plans are:

1. landing this weekend for easier regression check.
2. backing out after next merge from Aurora and testing more 6 weeks on Nightly (I mean 23 is the first version of this fix included for testing this over 6 weeks on Nightly).
https://hg.mozilla.org/mozilla-central/rev/7ac778ed79a5
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Backed out from Aurora, see bug 857829.
Target Milestone: mozilla22 → mozilla23
We're trying to find the source of a regression in bug 878449 that is present on Aurora but not Beta. Would it be OK to back this out of aurora temporarily to see if the crash goes away?
Hmm, even if this patch causes the crash, it means that it shows another hidden bug. I strongly recommend that we should not backout this patch since the change risk is too high.
Depends on: 902869
You need to log in before you can comment on or make changes to this bug.