Closed Bug 962140 Opened 6 years ago Closed 2 years ago

[Meta] crash in mozilla::widget::NativeKey::RemoveFollowingCharMessage() or mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)

Categories

(Core Graveyard :: Widget: WinRT, defect, P1, critical)

26 Branch
x86
Windows 8.1
defect

Tracking

(firefox27 unaffected, firefox28+ fixed, firefox29+ fixed, firefox30+ fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
Tracking Status
firefox27 --- unaffected
firefox28 + fixed
firefox29 + fixed
firefox30 + fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: jimm, Assigned: masayuki)

References

Details

(4 keywords, Whiteboard: [release28][leave-open] p=0 s=it-30c-29a-28b.3 r=ff28)

Crash Data

Attachments

(7 files, 1 obsolete file)

11.20 KB, patch
jimm
: review+
jimm
: feedback+
Details | Diff | Splinter Review
3.82 KB, patch
jimm
: review+
Details | Diff | Splinter Review
17.23 KB, patch
jimm
: review+
Details | Diff | Splinter Review
1.65 KB, patch
jimm
: review+
Details | Diff | Splinter Review
4.26 KB, patch
jimm
: review+
Details | Diff | Splinter Review
7.83 KB, patch
jimm
: review+
Details | Diff | Splinter Review
4.37 KB, patch
jimm
: review+
Details | Diff | Splinter Review
This bubbled up into our top five recently. It's a forced crash due to a missing character message.

http://hg.mozilla.org/mozilla-central/annotate/171857e7e334/widget/windows/KeyboardLayout.cpp#l1288

This bug was filed from the Socorro interface and is 
report bp-8b71b401-771d-491a-9ad9-2a3a52140120.
=============================================================

0 	xul.dll 	mozilla::widget::NativeKey::RemoveFollowingCharMessage() 	widget/windows/KeyboardLayout.cpp
1 	xul.dll 	mozilla::widget::NativeKey::DispatchKeyPressEventForFollowingCharMessage() 	widget/windows/KeyboardLayout.cpp
2 	xul.dll 	mozilla::widget::NativeKey::HandleKeyDownMessage(bool *) 	widget/windows/KeyboardLayout.cpp
3 	xul.dll 	MetroWidget::WindowProcedure(HWND__ *,unsigned int,unsigned int,long)
Hmm, it's very odd.

DispatchKeyPressEventForFollowingCharMessage() is called only when IsFollowedByCharMessage() returns true.

I have no idea of the difference between IsFollowedByCharMessage() and RemoveFollowingCharMessage()'s check...
Keywords: topcrash-metro
Does this have to be a MOZ_CRASH or can we just warn?
I'm seeing this a *lot* in Nightly and Aurora.  I think it should block Metro Firefox 28.  It's now the #2 metro topcrash in Aurora 28.
Whiteboard: [release28] [defect] p=0
Hmm, it's a regression of 28? Did we change something message handling code only on Metro?
(In reply to Jim Mathies [:jimm] from comment #2)
> Does this have to be a MOZ_CRASH or can we just warn?

It's possible, but it should be the last resort. We should research what's the cause (and STR).
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #4)
> Hmm, it's a regression of 28? Did we change something message handling code
> only on Metro?

According to the breakdown this happens on all Windows versions and in all products, including Firefox and MetroFirefox.

Firefox reports with a lot of comments:

https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Awidget%3A%3ANativeKey%3A%3ARemoveFollowingCharMessage%28%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-01-31+11%3A00%3A00&range_value=1#tab-sigsummary

"This is the 3rd time in the last hour that Firefox has crashed on me. Each time I was trying to forward an email and it crashed while I was adding/writing a message to the recipient."

"I am working in Turbotax and twice the connection has crashed. I believe I lost the first data I entered"

"ONE THING THAT MAY BE CAUSING THE PROBLEMS is that I am using Dragon natural speaking, it seemed to have more problems since installing that. But, due to problems with my hands typing in need to use it."

"hit m key and it crashed"

"The whole application suddenly closed. I was using Dragon speak, version 11.5, to reply to e-mails that I had received."

"I was using DragonDictate when the machine crashed this is not the first time this has happened in fact it happens way too often."

"This is very annoying. Was in the middle of a purchase."
On desktop this happens on simple key downs:

0 	xul.dll 	mozilla::widget::NativeKey::RemoveFollowingCharMessage() 	widget/windows/KeyboardLayout.cpp
1 	xul.dll 	mozilla::widget::NativeKey::DispatchKeyPressEventForFollowingCharMessage() 	widget/windows/KeyboardLayout.cpp
2 	xul.dll 	mozilla::widget::NativeKey::HandleKeyDownMessage(bool *) 	widget/windows/KeyboardLayout.cpp
3 	xul.dll 	nsWindow::ProcessKeyDownMessage(tagMSG const &,bool *) 	widget/windows/nsWindow.cpp
4 	xul.dll 	nsWindow::ProcessMessage(unsigned int,unsigned int &,long &,long *) 	widget/windows/nsWindow.cpp
5 	xul.dll 	nsWindow::WindowProcInternal(HWND__ *,unsigned int,unsigned int,long) 	widget/windows/nsWindow.cpp
6 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp
7 	xul.dll 	nsWindow::WindowProc(HWND__ *,unsigned int,unsigned int,long) 	widget/windows/nsWindow.cpp
8 	user32.dll 	InternalCallWinProc 	
9 	user32.dll 	UserCallWinProcCheckWow 	
10 	user32.dll 	DispatchMessageWorker 	
11 	user32.dll 	DispatchMessageW 	
12 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool)
In HandleKeyDownMessage(), DispatchKeyPressEventForFollowingCharMessage() is called only this code:

> 1088   if (IsFollowedByCharMessage()) {
> 1089     return DispatchKeyPressEventForFollowingCharMessage();
> 1090   }

IsFollowedByCharMessage() checks with PeekMessage():

> 751     if (!WinUtils::PeekMessage(&nextMsg, mMsg.hwnd, WM_KEYFIRST, WM_KEYLAST,
> 752                                PM_NOREMOVE | PM_NOYIELD)) {
> 753       return false;
> 754     }
> 755   }
> 756   return (nextMsg.message == WM_CHAR ||
> 757           nextMsg.message == WM_SYSCHAR ||
> 758           nextMsg.message == WM_DEADCHAR);

And RemoveFollowingCharMessage() does:

> 1284   MSG msg;
> 1285   if (!WinUtils::GetMessage(&msg, mMsg.hwnd, WM_KEYFIRST, WM_KEYLAST) ||
> 1286       !(msg.message == WM_CHAR || msg.message == WM_SYSCHAR ||
> 1287         msg.message == WM_DEADCHAR)) {
> 1288     MOZ_CRASH("We lost the following char message");
> 1289   }
> 1290 
> 1291   return msg;

It seems that PeekMessage() finds a WM_*CHAR message, but GetMessage() doesn't find the same message. Is it possible?
Oh, GetMessage()'s result is BOOL but if there is no message, it returns -1. It's really different point of them. But I have no idea how it causes this because we expect return TRUE in this case.

Another difference of them is, GetMessage() always prefers WM_QUIT. But I don't think we can reach here when the queue has WM_QUIT...
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #9)
> Oh, GetMessage()'s result is BOOL but if there is no message, it returns -1.
> It's really different point of them. But I have no idea how it causes this
> because we expect return TRUE in this case.
> 
> Another difference of them is, GetMessage() always prefers WM_QUIT. But I
> don't think we can reach here when the queue has WM_QUIT...

Yeah I don't see that happening either.
I'm wondering if during the processing of non-queued events during the PeekMessage call in IsFollowedByCharMessage, we somehow end up processing the pending WM_CHAR message before the call to GetMessage to remove it.
Oh, WM_QUIT case is possible:
http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.cpp#202

We're using PeekMessage() in the message loop now. But I don't think it causes this bug because it's odd user to type text at quitting.
unwrapped:


MSG nextMsg;
if (mFakeCharMsgs) {
  nextMsg = mFakeCharMsgs->ElementAt(0).GetCharMsg(mMsg.hwnd);
} else {
  if (!WinUtils::PeekMessage(&nextMsg, mMsg.hwnd, WM_KEYFIRST, WM_KEYLAST,
                              PM_NOREMOVE | PM_NOYIELD)) {
    return false;
  }
}

if (nextMsg.message == WM_CHAR ||
    nextMsg.message == WM_SYSCHAR ||
    nextMsg.message == WM_DEADCHAR) {
  if (mFakeCharMsgs) {
    mFakeCharMsgs->ElementAt(0).mConsumed = true;
    return mFakeCharMsgs->ElementAt(0).GetCharMsg(mMsg.hwnd);
  }

  MSG msg;
  if (!WinUtils::GetMessage(&msg, mMsg.hwnd, WM_KEYFIRST, WM_KEYLAST) ||
      !(msg.message == WM_CHAR || msg.message == WM_SYSCHAR ||
        msg.message == WM_DEADCHAR)) {
    MOZ_CRASH("We lost the following char message");
  }
}

One possibility is  that WinUtils::GetMessage returns 0, and we call MOZ_CRASH? Maybe we should handle those cases separately?
It's possible only when GetMessage() retrieves WM_QUIT...
I'll be back 2 hours later, sorry.
Comment on attachment 8368509 [details] [diff] [review]
Use PeekMessage() with RM_REMOVE instead of GetMessage() (landed on m-c, aurora and beta)

Let's try this.

We should fix at least two things after this.

1. The result type of WinUtils::GetMessage() should be BOOL, not bool. bool cannot represent -1.

2. Use GetMessage() in the message loop in nsAppShell.cpp. WM_QUIT should be preferred than the other messages.
Attachment #8368509 - Flags: review?(jmathies)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #18)
> Comment on attachment 8368509 [details] [diff] [review]
> Use PeekMessage() with RM_REMOVE instead of GetMessage()
> 
> Let's try this.
> 
> We should fix at least two things after this.
> 
> 1. The result type of WinUtils::GetMessage() should be BOOL, not bool. bool
> cannot represent -1.


Did you miss a WinUtils change in your patch? I don't see this part.
Attachment #8368509 - Flags: review?(jmathies) → feedback+
(In reply to Jim Mathies [:jimm] from comment #19)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #18)
> > Comment on attachment 8368509 [details] [diff] [review]
> > Use PeekMessage() with RM_REMOVE instead of GetMessage()
> > 
> > Let's try this.
> > 
> > We should fix at least two things after this.
> > 
> > 1. The result type of WinUtils::GetMessage() should be BOOL, not bool. bool
> > cannot represent -1.
> 
> 
> Did you miss a WinUtils change in your patch? I don't see this part.

The 2 things should be landed *after* this bug. They're pretty risky for now.
Comment on attachment 8368509 [details] [diff] [review]
Use PeekMessage() with RM_REMOVE instead of GetMessage() (landed on m-c, aurora and beta)

Ok sounds good. This will be an interesting experiment.
Attachment #8368509 - Flags: review+
Whiteboard: [release28] [defect] p=0 → [release28] [defect] p=0 [leave-open]
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ded1c0760cc

Thanks! I can go to bed!!

Let's watch the result for a couple of days.
Whiteboard: [release28] [defect] p=0 [leave-open] → [release28] [defect] [leave-open]
Assignee: nobody → masayuki
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=d8399c123969

I hope that this would give us more hints.

I'm thinking the possibilities of this bug are:

Another thread/process post or send message to our main thread. Then, input message gives lower priority than it. If the found char message comes from device and somebody post/send a key message between IsFollowingCharMessage() and RemoveFollowingCharMessage(), this may cause this bug.

This sounds like not realistic scenario for me. However, if application such as a11y tools post multiple key messages to us sequentially, our thread may peek WM_(SYS)KEYDOWN message during the sequence. I guess that this might be possible. (I assume that WM_(SYS)(DEAD)CHAR messages are always queued as input message from a device.)

Anyway, I want to see more detail of what happens as far as possible.
According to the change by the previous landing, I believe that PeekMessage() doesn't fail but it retrieves KEYDOWN or KEYUP message.
Comment on attachment 8369457 [details] [diff] [review]
Append unexpected message log to crash report

Let's land this.

After checking the crash report, I'll create next patch.
Attachment #8369457 - Flags: review?(jmathies)
Comment on attachment 8369457 [details] [diff] [review]
Append unexpected message log to crash report

Oops, the logging message is odd. I'll post new patch.
Attachment #8369457 - Flags: review?(jmathies)
Attachment #8369857 - Flags: review?(jmathies) → review+
I think the first patch may have helped reduce these. We have two crashes in the nightly build of 02-01 which included the fix. But we haven't seen any since then.

This landed 
Fri Jan 31 15:51:32 2014
Next nightly:
Sat Feb 1 08:07:34 2014
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/02/2014-02-01-07-49-23-mozilla-central/firefox-29.0a1.en-US.win32.txt

https://crash-stats.mozilla.com/report/list?product=MetroFirefox&range_value=7&range_unit=days&date=2014-02-06&signature=mozilla%3A%3Awidget%3A%3ANativeKey%3A%3ARemoveFollowingCharMessage%28%29&version=MetroFirefox%3A29.0a1#tab-reports
Comment on attachment 8368509 [details] [diff] [review]
Use PeekMessage() with RM_REMOVE instead of GetMessage() (landed on m-c, aurora and beta)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): long standing bug
User impact if declined: crashy browser when entering form input data
Testing completed (on m-c, etc.): on mc and aurora, crash count reduced.
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #8368509 - Flags: approval-mozilla-beta?
Keywords: needURLs
sorry, wrong crash bug.
Keywords: needURLs
https://crash-stats.mozilla.com/report/index/a67fe373-0036-4d9f-83d4-db3ab2140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, InSendMessageEx()=ISMEX_NOSEND,
Next key message: Unknown (0x00000000), wParam: 0x00000000, lParam: 0x00000000

https://crash-stats.mozilla.com/report/index/734ad868-1822-4a73-b284-b8ad42140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, InSendMessageEx()=ISMEX_NOSEND,
Next key message: Unknown (0x00000000), wParam: 0x00000000, lParam: 0x00000000 

https://crash-stats.mozilla.com/report/index/1118149d-3df8-4101-a5b7-02e372140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, InSendMessageEx()=ISMEX_NOSEND, Next key message: Unknown (0x00000000), wParam: 0x00000000, lParam: 0x00000000 

These logs are odd. PeekMessage() succeeded but the got message is WM_NULL!
Came from desktop:

https://crash-stats.mozilla.com/report/index/00db1124-68ff-49b3-8205-ca9b42140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000033, lParam: 0x40040001, InSendMessageEx()=ISMEX_NOSEND,
Next key message: WM_KEYDOWN (0x00000100), wParam: 0x00000033, lParam: 0x40040001

https://crash-stats.mozilla.com/report/index/0953c2a9-c93e-45e8-a730-c14862140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000034, lParam: 0x00050001, InSendMessageEx()=ISMEX_NOSEND,
Next key message: WM_KEYUP (0x00000101), wParam: 0x00000034, lParam: 0xC0050001 

https://crash-stats.mozilla.com/report/index/8fd97970-6cec-40c4-8495-fbcaa2140206
Handling message: WM_KEYDOWN (0x00000100), wParam: 0x0000001B, lParam: 0x00010001, InSendMessageEx()=ISMEX_NOSEND,
Next key message: WM_SYSKEYDOWN (0x00000104), wParam: 0x00000012, lParam: 0x20380001

These logs are not so odd to me because these are possible as I said in comment 25.

I'll create a patch for this, first. I'm still thinking the reason why PeekMessage() returns WM_NULL on Metrofox.
> I'll create a patch for this, first. I'm still thinking the reason why
> PeekMessage() returns WM_NULL on Metrofox.

winrt is really aggressive about checking app event loops for hangs. WM_NULL is what it uses to do this. Is it possible these null events are getting injected at a point our keyboard handling doesn't expect?
(In reply to Jim Mathies [:jimm] from comment #38)
> > I'll create a patch for this, first. I'm still thinking the reason why
> > PeekMessage() returns WM_NULL on Metrofox.
> 
> winrt is really aggressive about checking app event loops for hangs. WM_NULL
> is what it uses to do this. Is it possible these null events are getting
> injected at a point our keyboard handling doesn't expect?

I guessed so. But I cannot reproduce the bug with this patch, see comment 36.
https://hg.mozilla.org/try/rev/cd4740da9560

I don't understand why the PeekMessage() returns WM_NULL since we pass WM_KEYFIRST and WM_KEYLAST to it.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #39)
> (In reply to Jim Mathies [:jimm] from comment #38)
> > > I'll create a patch for this, first. I'm still thinking the reason why
> > > PeekMessage() returns WM_NULL on Metrofox.
> > 
> > winrt is really aggressive about checking app event loops for hangs. WM_NULL
> > is what it uses to do this. Is it possible these null events are getting
> > injected at a point our keyboard handling doesn't expect?
> 
> I guessed so. But I cannot reproduce the bug with this patch, see comment 36.
> https://hg.mozilla.org/try/rev/cd4740da9560
> 
> I don't understand why the PeekMessage() returns WM_NULL since we pass
> WM_KEYFIRST and WM_KEYLAST to it.

Well PostMessage is a little different from SendMessage/SendMessageTimeout. Sent messages get priority in the event queue.
I agree though, it's really odd that WM_NULL would be peeked as a keyboard event.
Posted message has higher priority than input message from device. So, if PeekMessage() can return WM_NULL in that case, the test build should crash, but I cannot reproduce it.

We should confirm one more thing. That is, when PeekMessage() returns WM_NULL, whether the WM_*CHAR message is still in the queue or not. I guess that it's not removed. WM_NULL is inserted before WM_*CHAR. If so, PeekMessage() should retry until it returns WM_*CHAR.
This patch tries to remove found char message and pass it to handling method.
Attachment #8372896 - Flags: review?(jmathies)
I don't understand well about the issue of WM_NULL. But I guess that Metrofox shouldn't use WM_NULL. It might better to use user or app defined message or registered message instead.
Attachment #8372896 - Flags: review?(jmathies) → review+
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #44)
> I don't understand well about the issue of WM_NULL. But I guess that
> Metrofox shouldn't use WM_NULL. It might better to use user or app defined
> message or registered message instead.

Windows uses WM_NULL and SendMessageTimeout to check if our event loop is processing messages in a timely fashion. If it isn't, explorer kills the browser.
(In reply to Jim Mathies [:jimm] from comment #45)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #44)
> > I don't understand well about the issue of WM_NULL. But I guess that
> > Metrofox shouldn't use WM_NULL. It might better to use user or app defined
> > message or registered message instead.
> 
> Windows uses WM_NULL and SendMessageTimeout to check if our event loop is
> processing messages in a timely fashion. If it isn't, explorer kills the
> browser.

On Vista and up, this would result in a non-responding ghost window. I don't think this is specific to Metro.
Attachment #8368509 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/16a356b13057

Not sure if this means 28 is fixed for good or not, but setting the status to fixed gets it off my radar for now. Please set it back to affected if something else needs to land here at some point.
New query:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Awidget%3A%3ANativeKey%3A%3AGetFollowingCharMessage%28tagMSG%26%29&product=Firefox&product=MetroFirefox&query_type=contains&range_unit=weeks&process_type=any&platform=win&hang_type=any&date=2014-02-13+00%3A00%3A00&range_value=1#tab-reports

There is one report:
https://crash-stats.mozilla.com/report/index/a6fd640f-8028-474e-8cf6-7380f2140212
Crash Signature: [@ mozilla::widget::NativeKey::RemoveFollowingCharMessage()] → [@ mozilla::widget::NativeKey::RemoveFollowingCharMessage()] [@ mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)]
Summary: crash in mozilla::widget::NativeKey::RemoveFollowingCharMessage() → crash in mozilla::widget::NativeKey::RemoveFollowingCharMessage() or mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)
Looks like the third patch succeeded to reduce most cases of the crash!

However, there is still strange report:

Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, InSendMessageEx()=ISMEX_NOSEND, 
Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001

This means that PeekMessage() with PM_REMOVE failed to retrieve WM_CHAR which was found immediately before. This means that the WM_CHAR message has been really lost before we handle it.

What should we do???
jimm:

I have a question. On Metrofox, the main message loop which retrieves all messages and NativeKey run in same thread?
Flags: needinfo?(jmathies)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #53)
> jimm:
> 
> I have a question. On Metrofox, the main message loop which retrieves all
> messages and NativeKey run in same thread?

Yes, same thread. We do send events async though by bouncing them off runnables to keep the dispatcher off the stack. But all of that is handled after the keyboard code receives the events. We bounce the events in DispatchKeyboardEvent.

http://mxr.mozilla.org/mozilla-central/source/widget/windows/winrt/MetroWidget.cpp#828
http://mxr.mozilla.org/mozilla-central/source/widget/windows/winrt/MetroWidget.cpp#691
Flags: needinfo?(jmathies)
Thanks. But it's really odd. Then, I have no idea who removed the message from the queue.

If we completely lose the change to receive the message, we just return the found message at that time. However, otherwise, i.e., if we will receive the lost char message later, we shouldn't dispatch keypress event here, or, we should dispatch it with found message and ignore next message...
Why do we assume that WM_CHAR for printable keys will be in the event queue when we receive the WM_KEYDOWN event?
If I understand this correctly, we're crashing here - 

http://hg.mozilla.org/mozilla-central/annotate/802d87c77e76/widget/windows/KeyboardLayout.cpp#l1341

for this forced crash:

https://crash-stats.mozilla.com/report/index/3a22ca0b-9db0-4f7d-a843-783672140213

steps:

1) we've received a WM_KEYDOWN for virtual key code VK_SPACE (0x00000020)

2) we call peek message (PM_NOREMOVE) for window mMsg.hwnd looking for a WM_CHAR message that matches the WM_KEYDOWN, which we find in the event queue.

Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001

3) we try to remove the WM_CHAR message with peek message for window nextKeyMsg.hwnd, which fails.

4) we crash

am I following this right?
Note, on mc, the problem still seems to be present, but mostly on desktop. Not sure why that is.
(In reply to Jim Mathies [:jimm] from comment #57)
> Why do we assume that WM_CHAR for printable keys will be in the event queue
> when we receive the WM_KEYDOWN event?

Because if the key is a printable key, ::TranslateMessage() must generate one or more WM_*CHAR messages.

(In reply to Jim Mathies [:jimm] from comment #58)
> If I understand this correctly, we're crashing here - 
> 
> http://hg.mozilla.org/mozilla-central/annotate/802d87c77e76/widget/windows/
> KeyboardLayout.cpp#l1341
> 
> for this forced crash:
> 
> https://crash-stats.mozilla.com/report/index/3a22ca0b-9db0-4f7d-a843-
> 783672140213
> 
> steps:
> 
> 1) we've received a WM_KEYDOWN for virtual key code VK_SPACE (0x00000020)
> 
> 2) we call peek message (PM_NOREMOVE) for window mMsg.hwnd looking for a
> WM_CHAR message that matches the WM_KEYDOWN, which we find in the event
> queue.
> 
> Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001
> 
> 3) we try to remove the WM_CHAR message with peek message for window
> nextKeyMsg.hwnd, which fails.
> 
> 4) we crash
> 
> am I following this right?

Yes, but the actual crash line is here:
http://hg.mozilla.org/mozilla-central/annotate/802d87c77e76/widget/windows/KeyboardLayout.cpp#l1362
(according to the difference of the appending note)

(In reply to Jim Mathies [:jimm] from comment #59)
> Can we upload the last two patches to aurora and see if it helps there?

Yeah, perhaps, the last patch reduces a lot of crashes.

(In reply to Jim Mathies [:jimm] from comment #60)
> Note, on mc, the problem still seems to be present, but mostly on desktop.
> Not sure why that is.

Please note that the signature is changed by the last patch. RemoveFollowingCharMessage() is now GetFollowingCharMessage(tagMSG&) on m-c.
> (In reply to Jim Mathies [:jimm] from comment #60)
> > Note, on mc, the problem still seems to be present, but mostly on desktop.
> > Not sure why that is.
> 
> Please note that the signature is changed by the last patch.
> RemoveFollowingCharMessage() is now GetFollowingCharMessage(tagMSG&) on m-c.

ah, bummer. There are a bunch of these in the 2-13 build. :/
(In reply to Jim Mathies [:jimm] from comment #62)
> > (In reply to Jim Mathies [:jimm] from comment #60)
> > > Note, on mc, the problem still seems to be present, but mostly on desktop.
> > > Not sure why that is.
> > 
> > Please note that the signature is changed by the last patch.
> > RemoveFollowingCharMessage() is now GetFollowingCharMessage(tagMSG&) on m-c.
> 
> ah, bummer. There are a bunch of these in the 2-13 build. :/

or rather, 2-12 build. 2-13 isn't out yet.
metro and desktop seem to be crashing in two different places, which is interesting.

MetroFirefox:

Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020 (space), lParam: 0x00390001,
Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001

Firefox:

Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000080 (VK_F17), lParam: 0x00000001,
Found message: WM_CHAR (0x00000102), wParam: 0x00000938 ???, lParam: 0x00000001, 
Removed message: WM_CHAR (0x00000102), wParam: 0x00000938 ???, lParam: 0x00800001

^ kinda weird, isn't wParam for WM_CHAR supposed to be the character code?

http://msdn.microsoft.com/en-us/library/windows/desktop/ms646276(v=vs.85).aspx
Generally, the assumption here that a call to PeekMessage with PM_NOREMOVE guarantees a subsequent call to PeekMessage with PM_REMOVE for the same message will return it, seems flawed.
U+0938: स

The lParam change is odd, it changes the scancode. However, the important thing is the wParam value. We should omit the lParam check for the latter. At least for now, only when wParam is changed, it means we may lost a WM_CHAR.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #66)
> U+0938: स

Ah thanks.  

Can we go ahead flag these patches for uplift? That WM_NULL check really seems to have helped. We don't see many of these crashes on beta but there are a lot on aurora.
Comment on attachment 8369857 [details] [diff] [review]
Append unexpected message log to crash report (landed on m-c, aurora and beta)

[Approval Request Comment]
User impact if declined: This is necessary for applying following patch.
Testing completed (on m-c, etc.): landed.
Risk to taking this patch (and alternatives if risky): Nothing, this just adds the detail at crash to the log.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8369857 - Flags: approval-mozilla-aurora?
Comment on attachment 8372896 [details] [diff] [review]
Remove following char message of keydown message with found message (landed on m-c, aurora and beta)

[Approval Request Comment]
User impact if declined: Some users meet this crash a lot. Actually, this patch reduces some crash patterns on m-c according to the crash report of Nightly builds.
Testing completed (on m-c, etc.): landed, and we're collecting crash reports.
Risk to taking this patch (and alternatives if risky): Low risk. There are still crash scenarios even we land this, but this actually reduces the number of crash reports on Nightly.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8372896 - Flags: approval-mozilla-aurora?
Masayuki, can you confirm that it has landed in m-c ?
(I would like to see it in m-c before uplifting it aurora)
thanks
Flags: needinfo?(masayuki)
(In reply to Sylvestre Ledru [:sylvestre] from comment #70)
> Masayuki, can you confirm that it has landed in m-c ?
> (I would like to see it in m-c before uplifting it aurora)
> thanks

Yes, I did it. The last patch changes the crash signature since it changes the method name.
Flags: needinfo?(masayuki)
Attachment #8369857 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8372896 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This has dropped 16 positions in the last week in Firefox 28 and now sits at #91 accounting for 0.11% of our browser crashes on Windows. So while this crash isn't 100% solved it does seem to be occurring at much lower volume.
Marking this verified for Firefox 28 based on comment 72. Please revert this change if you think this is still an issue.
Logs of remaining crashes are:

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000080, lParam: 0x00000001, InSendMessageEx()=ISMEX_NOSEND,
> Found message: WM_CHAR (0x00000102), wParam: 0x0000092F, lParam: 0x00000001,
> Removed message: WM_CHAR (0x00000102), wParam: 0x0000092F, lParam: 0x00800001 

and

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000080, lParam: 0x00000001, InSendMessageEx()=ISMEX_NOSEND,
> Found message: WM_CHAR (0x00000102), wParam: 0x000000A0, lParam: 0x00000001,
> Removed message: WM_CHAR (0x00000102), wParam: 0x000000A0, lParam: 0x002A0001 

http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/scancode.doc
According to "Keyboard Scan Code Specification", scan code 0x80 means releasing undefined key. 0x2A is pressing left shift key. It might occur when new keydown or keyup message is inserted to the queue. Although, I have no idea to confirm it.
Let's ignore the scan code value at comparing found char message with removed char message.

That's all for the crash reasons which we know.
Attachment #8377472 - Flags: review?(jmathies)
Attachment #8377472 - Flags: review?(jmathies) → review+
Thank you, let's close this bug after watching crash reports for a week.
https://hg.mozilla.org/integration/mozilla-inbound/rev/efd7039a2ff2
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #74)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/687a3cf84d32
> https://hg.mozilla.org/releases/mozilla-aurora/rev/fe56f56eafbc

These crashes are showing up on beta now in higher numbers. Would you feel comfortable uplifting these?

https://crash-stats.mozilla.com/topcrasher/products/MetroFirefox/versions/28.0b
Sure, but I should confirm that the patches can be applied to beta tomorrow.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #72)
> This has dropped 16 positions in the last week in Firefox 28 and now sits at
> #91 accounting for 0.11% of our browser crashes on Windows. So while this
> crash isn't 100% solved it does seem to be occurring at much lower volume.

Just as a note, for seeing a clear picture for a specific beta on things like this, please looks at the topcrash pages for the specific betas - there's a reason why we have specific "28.0b3" etc. in the Socorro version dropdown next to the generic "28.0b". ;-)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #80)
> Just as a note, for seeing a clear picture for a specific beta on things
> like this, please looks at the topcrash pages for the specific betas -
> there's a reason why we have specific "28.0b3" etc. in the Socorro version
> dropdown next to the generic "28.0b". ;-)

Hmm, none of these display any crash reports for both Firefox and MetroFirefox.
(In reply to Jim Mathies [:jimm] from comment #81)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #80)
> > Just as a note, for seeing a clear picture for a specific beta on things
> > like this, please looks at the topcrash pages for the specific betas -
> > there's a reason why we have specific "28.0b3" etc. in the Socorro version
> > dropdown next to the generic "28.0b". ;-)
> 
> Hmm, none of these display any crash reports for both Firefox and
> MetroFirefox.

For me, https://crash-stats.mozilla.com/topcrasher/products/MetroFirefox/versions/28.0b has this crash at #2 right now and https://crash-stats.mozilla.com/topcrasher/products/MetroFirefox/versions/28.0b3 has it at #1.
Comment on attachment 8369857 [details] [diff] [review]
Append unexpected message log to crash report (landed on m-c, aurora and beta)

This will be overwritten by next patch. This is necessary only for applying next patch.
Attachment #8369857 - Flags: approval-mozilla-beta?
Comment on attachment 8372896 [details] [diff] [review]
Remove following char message of keydown message with found message (landed on m-c, aurora and beta)

[Approval Request Comment]
User impact if declined: Some users using something a11y tools for text input meet the crash.
Testing completed (on m-c, etc.): on m-c and on aurora.
Risk to taking this patch (and alternatives if risky): Low risk. This patch just reduces the cases calling MOZ_CRASH() since some causes of the unexpected cases are researched.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8372896 - Flags: approval-mozilla-beta?
Attachment #8372896 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8369857 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Masayuki, it looks like this has landed properly on Nightly & Aurora so can you update the status flags for 29 & 30 when you come back to put in your beta landing links?
Flags: needinfo?(masayuki)
The reason why I've not changed the flags yet is that we still watching if this bug is completely fixed by the last patch. So, it's not enough only the approved patches to fix this bug completely. Although, they actually reduce the number of crash reports.

But yes, I'll change the flags when I believe this bug completely fixed on each branch.
Flags: needinfo?(masayuki)
Attachment #8372896 - Attachment description: Remove following char message of keydown message with found message → Remove following char message of keydown message with found message (landed on m-c, aurora and beta)
Attachment #8369857 - Attachment description: Append unexpected message log to crash report → Append unexpected message log to crash report (landed on m-c, aurora and beta)
Attachment #8368509 - Attachment description: Use PeekMessage() with RM_REMOVE instead of GetMessage() → Use PeekMessage() with RM_REMOVE instead of GetMessage() (landed on m-c, aurora and beta)
Attachment #8377472 - Attachment description: NativeKey should not check scan code at comparing found char message and removed char message → NativeKey should not check scan code at comparing found char message and removed char message (landed on m-c)
Comment on attachment 8377472 [details] [diff] [review]
NativeKey should not check scan code at comparing found char message and removed char message (landed on m-c and aurora)

[Approval Request Comment]
User impact if declined: This avoids the remaining known case of this crash.
Testing completed (on m-c, etc.): Landed only on m-c. After the landing, we've not see this crash. (20140217030202 is the latest build of 30.0a1, this is landed on 2/18)
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Awidget%3A%3ANativeKey%3A%3AGetFollowingCharMessage%28tagMSG%26%29&#tab-reports
Risk to taking this patch (and alternatives if risky): Very low risk. The scan code difference has no problem.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8377472 - Flags: approval-mozilla-beta?
Attachment #8377472 - Flags: approval-mozilla-aurora?
Attachment #8377472 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8377472 [details] [diff] [review]
NativeKey should not check scan code at comparing found char message and removed char message (landed on m-c and aurora)

https://hg.mozilla.org/releases/mozilla-aurora/rev/744c8107734f
Attachment #8377472 - Attachment description: NativeKey should not check scan code at comparing found char message and removed char message (landed on m-c) → NativeKey should not check scan code at comparing found char message and removed char message (landed on m-c and aurora)
On Nightly, the 17th day's build is the last crashable build. So, this bug must be fixed completely with the last patch.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [release28] [defect] [leave-open] → [release28] [defect]
Oh...

https://crash-stats.mozilla.com/report/index/6df449ed-302f-49e6-bd0e-f86a12140224
https://crash-stats.mozilla.com/report/index/4b6872b7-f710-4a55-8361-dce462140224
https://crash-stats.mozilla.com/report/index/90952072-757c-4213-8149-2d79a2140224

These are reported only from Metrofox.

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001

This might indicate we need to work more for this bug...
# This log indicates PeekMessage(RM_REMOVE) failed due to no message.
Target Milestone: --- → mozilla30
Hey Kamil, would you tag this as [qa+] or [qa-]?
Flags: needinfo?(kamiljoz)
Whiteboard: [release28] [defect] → [release28] r=ff30
Attachment #8377472 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hmm, as I said in comment 96, we need more research and work.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [release28] r=ff30 → [release28][leave-open] r=ff30
Status: REOPENED → ASSIGNED
I guess that failure of PeekMessage(PM_REMOVE) is caused by focus move or exiting apps. I want to see state of the queue. Let's land this on both m-c and Aurora. This is reported only from Metrofox of Aurora. I guess that the remaining crash occurs in really rare environment.
Attachment #8381319 - Flags: review?(jmathies)
I don't know if we should take the found char message or just ignore the phantom char message because it might come later. This is reason why I want to see the message queue.
Attachment #8381319 - Flags: review?(jmathies) → review+
Flagging for verification via crashstats after the next Beta is on the wire for a few days.
Flags: needinfo?(kamiljoz)
Keywords: verifyme
QA Contact: kamiljoz
Comment on attachment 8381319 [details] [diff] [review]
Log more details in the queue when PeekMessage() fails to remove found char message

[Approval Request Comment]
User impact if declined: There is a case of remaining rare crash case. However, we don't know why that occurs. This patch logs more information of the Windows message queue at that time. We see the remaining crash's report only on Metrofox of Aurora. Therefore, we need to uplift this to Aurora for researching the cause.
Testing completed (on m-c, etc.): Landed on m-c.
Risk to taking this patch (and alternatives if risky): Not risky. This patch just adds additional information to the crash reporter before the call of MOZ_CRASH().
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8381319 - Flags: approval-mozilla-aurora?
Attachment #8381319 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Wow, new case comes:

https://crash-stats.mozilla.com/report/index/cce66a0e-f415-4c95-aa7b-0e3342140226

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00001300, lParam: 0x00000000, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_DEADCHAR (0x00000103), wParam: 0x00001300, lParam: 0x00240001, 
> Removed message: WM_DEADCHAR (0x00000103), wParam: 0x00000000, lParam: 0x00000000

Looks like the dead key state is cleared by somebody. I guess WM_KEYDOWN or something is inserted before the WM_DEADCHAR...
Here's a new one on beta6, 2-24 release - 

https://crash-stats.mozilla.com/report/index/d820391e-5a90-4a88-bd83-cfe282140226

Handling message: WM_KEYDOWN (0x00000100), wParam: 0x0000000D, lParam: 0x001C0001, InSendMessageEx()=ISMEX_NOSEND, 
Found message: WM_CHAR (0x00000102), wParam: 0x0000000D, lParam: 0x001C0001
https://crash-stats.mozilla.com/report/index/7aff1c12-ef74-4e4b-afc0-d63642140227

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000030, lParam: 0x000B0001, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_CHAR (0x00000102), wParam: 0x00000030, lParam: 0x000B0001, 
> Removed message: WM_CHAR (0x00000102), wParam: 0x00000065, lParam: 0x00120001

This report seems that another case of the crash of comment 107.

While 0 keydown is handling, WM_CHAR for 0 is found. However, the removed message is WM_CHAR for e generated by e key...

I have no idea in which case this occurs...
(In reply to Jim Mathies [:jimm] from comment #109)
> Here's a new one on beta6, 2-24 release - 
> 
> https://crash-stats.mozilla.com/report/index/d820391e-5a90-4a88-bd83-
> cfe282140226
> 
> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x0000000D, lParam:
> 0x001C0001, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_CHAR (0x00000102), wParam: 0x0000000D, lParam: 0x001C0001

We need 26 or later build's crash report on Nightly or Aurora... Unluckily, 25th build is the last report :-(
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #111)
> (In reply to Jim Mathies [:jimm] from comment #109)
> > Here's a new one on beta6, 2-24 release - 
> > 
> > https://crash-stats.mozilla.com/report/index/d820391e-5a90-4a88-bd83-
> > cfe282140226
> > 
> > Handling message: WM_KEYDOWN (0x00000100), wParam: 0x0000000D, lParam:
> > 0x001C0001, InSendMessageEx()=ISMEX_NOSEND, 
> > Found message: WM_CHAR (0x00000102), wParam: 0x0000000D, lParam: 0x001C0001
> 
> We need 26 or later build's crash report on Nightly or Aurora... Unluckily,
> 25th build is the last report :-(

This could have something to do with our funky async input handling, or it might have something to do with the soft keyboard. I'll try to find some time to do some investigating.
Priority: -- → P1
Whiteboard: [release28][leave-open] r=ff30 → [release28][leave-open] p=0 s=it-30c-29a-28b.3 r=ff28
Hmm...

https://crash-stats.mozilla.com/report/index/cad86ed7-093b-4e10-8d2b-16df12140228
https://crash-stats.mozilla.com/report/index/4108dcdb-aa41-4f2c-8958-5856d2140301
https://crash-stats.mozilla.com/report/index/6bc3ed50-5e98-4fb3-99a0-effd92140228
https://crash-stats.mozilla.com/report/index/e9519835-bb57-493c-9198-c4ff82140228
https://crash-stats.mozilla.com/report/index/29382059-998e-4bb5-a328-88bfe2140228
https://crash-stats.mozilla.com/report/index/ba22e1e1-756a-44ca-a85f-b0f2d2140302
https://crash-stats.mozilla.com/report/index/62bcadb9-5485-4586-be72-02c9b2140302

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000020, lParam: 0x00390001, hwnd=0x191258, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_CHAR (0x00000102), wParam: 0x00000020, lParam: 0x00390001, 
> WM_NULL has been removed: 1, 
> There is no key message in any window, 
> Next message in all windows: Unknown (0x0000C2ED), wParam: 0x00000000, lParam: 0x066F8970, hwnd=0x12124c, time=166724656

In these cases, found char message has gone after we found WM_NULL in the queue.

I think that we should just ignore the fact of found the message because it might be redirected to different thread or process or cancelled.

Only this report indicates such behavior:
https://crash-stats.mozilla.com/report/index/9b0156a9-cb0a-4324-80b0-ad6c02140302

> Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000008, lParam: 0x000E0001, hwnd=0x30360, InSendMessageEx()=ISMEX_NOSEND, 
> Found message: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, 
> WM_NULL has been removed: 0, 
> Next key message in all windows: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, hwnd=0x30360, time=10729640, 
> Next message in all windows: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, hwnd=0x30360, time=10729640
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #113)
> Only this report indicates such behavior:
> https://crash-stats.mozilla.com/report/index/9b0156a9-cb0a-4324-80b0-
> ad6c02140302
> 
> > Handling message: WM_KEYDOWN (0x00000100), wParam: 0x00000008, lParam: 0x000E0001, hwnd=0x30360, InSendMessageEx()=ISMEX_NOSEND, 
> > Found message: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, 
> > WM_NULL has been removed: 0, 
> > Next key message in all windows: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, hwnd=0x30360, time=10729640, 
> > Next message in all windows: WM_DEADCHAR (0x00000103), wParam: 0x00000020, lParam: 0x000E0001, hwnd=0x30360, time=10729640

Oops, I misread this log. This is really odd. There is the message which we want to remove from the queue. But PeekMessage() failed it.
I think that if PeekMessage(PM_REMOVE) fails due to losing message, we shouldn't dispatch keypress event because system/user expect that the key event shouldn't cause text input.

Even with this patch, there will be still remaining minor cases. However, I guess that it's enough for next release.

I believe that we need to redesign WM_CHAR message handling in other bug.
Attachment #8384548 - Flags: review?(jmathies)
Comment on attachment 8384548 [details] [diff] [review]
When PeekMessage() fails to remove found char message, just ignore it (not to dispatch keyprese event)

Review of attachment 8384548 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/windows/KeyboardLayout.cpp
@@ +1358,5 @@
>                                 nextKeyMsg.message, nextKeyMsg.message,
>                                 PM_REMOVE | PM_NOYIELD)) {
> +      MSG nextKeyMsgInAllWindows;
> +      // The char message is redirected to different thread's window by focus
> +      // move or someting or just calcelled by external application.

nit - comment needs cleanup: 'someting', 'calcelled'
Attachment #8384548 - Flags: review?(jmathies) → review+
I am updating the status-firefox28 & status-firefox29 since it was reopened.
Comment on attachment 8384548 [details] [diff] [review]
When PeekMessage() fails to remove found char message, just ignore it (not to dispatch keyprese event)

[Approval Request Comment]
User impact if declined: Some users still meet this crash due to strange behavior of something like a11y tools. This patch avoids the most crash cases of renaming cases.
Testing completed (on m-c, etc.): Landed on m-c.
Risk to taking this patch (and alternatives if risky): The risk is low. However, instead of that users will meet this crash, they may see different bug that they try to input some text but some characters may be not inputted actually. Anyway, we are not sure why we loses found char message... If such bug won't be filed, we have no change to fix such bug if there is.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8384548 - Flags: approval-mozilla-aurora?
Attachment #8384548 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8381319 [details] [diff] [review]
Log more details in the queue when PeekMessage() fails to remove found char message

[Approval Request Comment]
User impact if declined: This patch just appending additional information to the crash report. This is necessary for applying next patch.
Testing completed (on m-c, etc.): Landed on m-c and Aurora.
Risk to taking this patch (and alternatives if risky): Nothing.
String or IDL/UUID changes made by this patch: Noting.
Attachment #8381319 - Flags: approval-mozilla-beta?
Comment on attachment 8384548 [details] [diff] [review]
When PeekMessage() fails to remove found char message, just ignore it (not to dispatch keyprese event)

[Approval Request Comment]
User impact if declined: This patch avoids to crash with a cause which is the most frequent of remaining causes.
Testing completed (on m-c, etc.): Landed on m-c and Aurora. Although, the landing dates are in a couple of days. However, I recommend to uplift them as far as possible for QA.
Risk to taking this patch (and alternatives if risky): Low risk. But instead of avoiding the crash, users might see that text input sometimes fails in the case.
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8384548 - Flags: approval-mozilla-beta?
Although, we cannot land next fix in this cycle, we should try the last issue which has already known.

This patch collects the message queue information when we remove unexpected char message. We need to lift this up to beta because some beta users sometimes meet this case.
Attachment #8386593 - Flags: review?(jmathies)
Comment on attachment 8386593 [details] [diff] [review]
Collect more details of the queue status when we removed unexpected char message from the queue

In the case of metrofx, we only have one window. Not sure if the "other window" dumps will hit on anything. Be interesting to see the output.
Attachment #8386593 - Flags: review?(jmathies) → review+
Attachment #8384548 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8381319 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8386593 [details] [diff] [review]
Collect more details of the queue status when we removed unexpected char message from the queue

[Approval Request Comment]
User impact if declined: This patch collects more information when we remove unexpected char message which case is the last known reason of this crash.
Testing completed (on m-c, etc.): Landed on m-c.
Risk to taking this patch (and alternatives if risky): Nothing, this appends additional information to the crash report's note before a call of MOZ_CRASH().
String or IDL/UUID changes made by this patch: Nothing.
Attachment #8386593 - Flags: approval-mozilla-aurora?
Attachment #8386593 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Can this be marked fixed now? We'll need to have this flagged as fixed before it can be verified.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #131)
> Can this be marked fixed now? We'll need to have this flagged as fixed
> before it can be verified.

I believe that we succeeded to _reduce_ the number of crashes, however, we've not this bug completely yet...

This bug is not too long and fixed a lot of crash cases. So, this bug isn't good as a bug in bugzilla. I'll separate this bug as each issue, then, you can verify fix of each cases with crash reports.
Sounds good, thanks Masayuki. Marking this bug as such so we can verify the effect this bug has had on "fixed" branches.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
I think that this should be open as a meta bug of the signatures.
Status: RESOLVED → REOPENED
Keywords: meta
Resolution: FIXED → ---
Summary: crash in mozilla::widget::NativeKey::RemoveFollowingCharMessage() or mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&) → [Meta] crash in mozilla::widget::NativeKey::RemoveFollowingCharMessage() or mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)
Target Milestone: mozilla30 → ---
Status: REOPENED → ASSIGNED
A meta bug with checked-in patches is an absolute no-no usually but you made it into that. :(

I prefer closing the bug with the patches, try to verify there that they helped and exclusively only open followup bug for any remaining issues or things that didn't work correctly.

This here is a total mess now. :(
Also, if this bug stays open, please remove any tracking flags as a meta bug doesn't qualify for tracking per definition.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #134)
> I think that this should be open as a meta bug of the signatures.
Flags: needinfo?(jmathies)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #134)
> I think that this should be open as a meta bug of the signatures.
Flags: needinfo?(jmathies)
Crash Signature: [@ mozilla::widget::NativeKey::RemoveFollowingCharMessage()] [@ mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)] → [@ mozilla::widget::NativeKey::RemoveFollowingCharMessage()] [@ mozilla::widget::NativeKey::GetFollowingCharMessage(tagMSG&)] [@ mozilla::widget::NativeKey::RemoveFollowingCharMessage] [@ mozilla::widget::NativeKey::GetFollowingCharMessage]
Oops, Kato-san, I forgot to ni? to you at writing previous comment, please check it.
Flags: needinfo?(m_kato)
(In reply to Masayuki Nakano [:masayuki] from comment #139)
> Some stack traces are odd:
> 
> https://crash-stats.mozilla.com/report/index/23e5dc1c-b4c1-4258-91ca-
> 9bc1b2170127
> https://crash-stats.mozilla.com/report/index/1f493b94-32a1-4d65-b588-
> 2c26c2170127
> https://crash-stats.mozilla.com/report/index/451785c0-f555-403f-a1ac-
> c943a2170128
> https://crash-stats.mozilla.com/report/index/af28555b-b7f8-42b2-aac1-
> b72c32170128
> https://crash-stats.mozilla.com/report/index/6a670f6d-2b48-44dd-b102-
> f251d2170128
> 
> nsWindow::ProcessKeyDownMessage(tagMSG const&, bool*) is suddenly called
> from unexpected methods.
> 
> Makoto-san, do you know if this kind of stack traces are just broken result
> of Breakpad? Or they indicate crashes caused by already broken memory?

Until Firefox 51, we use omit-frame-poitner option to build Firefox.  So on x86 build, we cannot get correct stack trace well.  But from Firefox 52, we won't use it (but it might cause performance issue) by bug 1322735.

Also, since crash report server doesn't install Windows symbols of all version automatically now, some stack (into user32.dll etc) cannot be resolved.
Flags: needinfo?(m_kato)
Thank you for the information. But the reports are from 52 Beta, so, I still don't understand what occurs...
According to the result of bug 1335306, the keyboard layout of all crash reports with 53.0b3 is, "US". This means that the users are using hacky key input utilities which *should* be created as keyboard layout or IME. Instead of using Windows's multiple keyboard layouts feature, the hacky keyboard layouts _override_ US keyboard layout behavior at runtime.

One is C-DAC's product but still not sure which product is exactly the cause, though. The company releases at least a software keyboard:
https://cdac.in/index.aspx?id=mc_ilf_Keyboard
This supports to input Bengali, Assamese and Manipuri. So, around India users the most victims of this crash.

On the other hand, there are some crash reports which include Chinese character, '㕲'. This is bug 1336322 and bug 1336331. Oddly, these odd cases also caused by US keyboard layout, but there are no unknown DLL file in their reports. So, I have no idea what's the cause of these bugs, sigh.
Except some edge cases, we have no report with latest versions. So, we should mark this bug as FIXED for now. If you still hit this crash, please file a new bug and let us know what keyboard layout are you using and what result is expected at the crash. Because we cannot decide which information coming from your keyboard layout is respected due to the keyboard layout's bug.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago2 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.