Crash in mozilla::ContentCacheInParent::OnEventNeedingAckHandled

RESOLVED FIXED in Firefox 58

Status

()

defect
P2
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: philipp, Assigned: masayuki)

Tracking

({crash, inputmethod, regression})

56 Branch
mozilla59
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox56 wontfix, firefox57 wontfix, firefox58 fixed, firefox59 fixed)

Details

(crash signature)

Attachments

(4 attachments)

59 bytes, text/x-review-board-request
m_kato
: review+
Details
59 bytes, text/x-review-board-request
m_kato
: review+
Details
59 bytes, text/x-review-board-request
m_kato
: review+
Details
59 bytes, text/x-review-board-request
m_kato
: review+
Details
This bug was filed from the Socorro interface and is 
report bp-5758fe26-aa83-4245-8c02-883b80171004.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ContentCacheInParent::OnEventNeedingAckHandled(nsIWidget*, mozilla::EventMessage) 	widget/ContentCache.cpp:1222
1 	xul.dll 	mozilla::dom::TabParent::RecvOnEventNeedingAckHandled(mozilla::EventMessage const&) 	dom/ipc/TabParent.cpp:1862
2 	xul.dll 	mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PBrowserParent.cpp:2251
3 	xul.dll 	mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PContentParent.cpp:3137
4 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:2092
5 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:2018
6 	xul.dll 	mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) 	ipc/glue/MessageChannel.cpp:1887
7 	xul.dll 	mozilla::ipc::MessageChannel::MessageTask::Run() 	ipc/glue/MessageChannel.cpp:1920
8 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1446
9 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:97
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
12 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
13 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:278
14 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:287
15 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4620
16 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4784
17 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4879
18 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:309
19 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
20 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
21 	kernel32.dll 	BaseThreadInitThunk 	
22 	firefox.exe 	firefox.exe@0x6eef 	
23 	firefox.exe 	firefox.exe@0x6eef 	
24 	ntdll.dll 	RtlUserThreadStart

reports with this signature are regressing in volume in firefox 56 and happen across platforms.

Moz crash reason facet
1 	MOZ_RELEASE_ASSERT(mPendingCompositionCount > 0) 	757 	69.32 %
2 	MOZ_RELEASE_ASSERT(mPendingEventsNeedingAck > 0) 	334 	30.59 %

it looks like the problem originally started on 56.0a1 build 20170709030204 - this would have been the pushlog to the day before:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-07-08&tochange=a708f80289b20da80bea42ac95777ac766ee6845
Yeah, should be regression of bug 1377672.

The problem to fix the crashes is, we cannot know the STR of this or these crashes. These crashes detect our composition state management failure.

# It might be better to store methods called before and put them into crash reports, though.
Keywords: inputmethod
Too late for a fix in 56, may also be too late for 57.
Priority: -- → P2
https://crash-stats.mozilla.com/report/index/9f6272c2-217b-4028-a232-95a9d0171018
> There is no pending composition but received eCompositionCommitRequestHandled message from the remote child
> 
> Dispatched Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommit
>   eCompositionStart
>   eCompositionChange
>   eCompositionCommit
> 
> Received Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommitRequestHandled
>   eCompositionStart
>   eCompositionCommitRequestHandled
>   eCompositionChange
>   eCompositionCommitRequestHandled

https://crash-stats.mozilla.com/report/index/ce3ee9e0-e345-43d3-a12a-9cc770171017
> There is no pending composition but received eCompositionCommitRequestHandled message from the remote child
> 
> Dispatched Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionCommit
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommit
> 
> Received Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionCommit
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommitRequestHandled
>   eCompositionChange
>   eCompositionCommitRequestHandled
(In reply to Shawn Huang [:shawnjohnjr] from comment #4)
> I still hit the crash on nightly.
> 
> https://crash-stats.mozilla.com/report/index/a71cd7bd-a585-4272-8142-
> 948250171027
> I'm not sure it's the same issue.

Do you have some hits to reproduce this? This crash is perhaps an issue of race between parent process and remote process. So, this must look like a random crash.
Flags: needinfo?(shuang)
(In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #5)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #4) 
> Do you have some hits to reproduce this? This crash is perhaps an issue of
> race between parent process and remote process. So, this must look like a
> random crash.

I hit the crash when I click facebook messenger textbox to reply message (via www.facebook.com not www.messenger.com) on Linux.
Unfortunately I can't reproduce it again now.
Flags: needinfo?(shuang)
(In reply to Shawn Huang [:shawnjohnjr] from comment #6)
> (In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #5)
> > (In reply to Shawn Huang [:shawnjohnjr] from comment #4) 
> > Do you have some hits to reproduce this? This crash is perhaps an issue of
> > race between parent process and remote process. So, this must look like a
> > random crash.
> 
> I hit the crash when I click facebook messenger textbox to reply message
> (via www.facebook.com not www.messenger.com) on Linux.
> Unfortunately I can't reproduce it again now.

Did you have composing text with IME (i.e., underlined text or something) when you clicked?
Flags: needinfo?(shuang)
(In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #7) 
> Did you have composing text with IME (i.e., underlined text or something)
> when you clicked?
I do have ibus IME loaded, but I can't remember exactly whether I switched to ibus IME or not when I clicked textbox.
Flags: needinfo?(shuang)
Okay, thank you for your information. I'll keep looking for how to reproduce and fix this.
Hmm. I just hit it again.
https://crash-stats.mozilla.com/report/index/1f2a7bf9-0f2b-4dde-b9b9-09a110171103

This time when posting a new message to facebook wall, facebook will pop another data-editor, and focus will automatically change to that edtior textbox, then I quickly switch IME via (ctrl+space) key (en to ibus IME).
This time it happens on slack.
https://crash-stats.mozilla.com/report/index/6d4adebc-1da0-4b0c-8c87-4447d0171107
I seems to hit this randomly. Any log or assertion I can add locally to help to clarify further?
Although, still I cannot reproduce this bug but we should fix it on 58 ASAP.

Comment 17 does really make sense. Moving focus in content tab(s) may cause this bug.
Assignee: nobody → masayuki
Status: NEW → ASSIGNED
Looks like that most (all?) crashes occurs after "Commit request is handled with stored composition string because TabParent has already lost focus". In this case, ContentCacheInParent::RequestIMEToCommitComposition() sends latest composition string as commit string to the remote process.  Then, the remote process will send eCompositionCommitRequestHandled back to the main process. Looks like that this is redundant in some cases. Although, I still don't understand why this occurs.
Comment on attachment 8930315 [details]
Bug 1405832 - part 1: ContentCacheInParent::RequestIMEToCommitComposition() should increment mPendingEventsNeedingAck itself if it treat the request handled synchronously without actually requesting IME to commit composition

https://reviewboard.mozilla.org/r/201460/#review207256
Attachment #8930315 - Flags: review?(m_kato) → review+
Comment on attachment 8930318 [details]
Bug 1405832 - part 4: ContentCacheInParent::OnEventNeedingAckHandled() shouldn't crash in release build

https://reviewboard.mozilla.org/r/201466/#review207264
Attachment #8930318 - Flags: review?(m_kato) → review+
Comment on attachment 8930316 [details]
Bug 1405832 - part 2: TextComposition::RequestToCommit() should request IME to commit or cancel composition only when it hasn't been request it yet and hasn't received commit event yet

https://reviewboard.mozilla.org/r/201462/#review207258

::: dom/events/TextComposition.h:107
(Diff revision 1)
>  
>    /**
> +   * IsRequestingCommitOrCancelComposition() returns true if the instance is
> +   * requesting widget to commit or cancel composition.
> +   */
> +  bool IsRequestingCommitOrCancelComposition()

nits: const?
Attachment #8930316 - Flags: review?(m_kato) → review+
Comment on attachment 8930317 [details]
Bug 1405832 - part 3: ContentCacheInParent::RequestIMEToCommitComposition() should call nsIWidget::NotifyIME() via TextComposition::RequestToCommit()

https://reviewboard.mozilla.org/r/201464/#review207332
Attachment #8930317 - Flags: review?(m_kato) → review+
Pushed by masayuki@d-toybox.com:
https://hg.mozilla.org/integration/autoland/rev/215221275151
part 1: ContentCacheInParent::RequestIMEToCommitComposition() should increment mPendingEventsNeedingAck itself if it treat the request handled synchronously without actually requesting IME to commit composition r=m_kato
https://hg.mozilla.org/integration/autoland/rev/e8d7881af680
part 2: TextComposition::RequestToCommit() should request IME to commit or cancel composition only when it hasn't been request it yet and hasn't received commit event yet r=m_kato
https://hg.mozilla.org/integration/autoland/rev/3e4e219b8aeb
part 3: ContentCacheInParent::RequestIMEToCommitComposition() should call nsIWidget::NotifyIME() via TextComposition::RequestToCommit() r=m_kato
https://hg.mozilla.org/integration/autoland/rev/7a3e5d976499
part 4: ContentCacheInParent::OnEventNeedingAckHandled() shouldn't crash in release build r=m_kato
Comment on attachment 8930315 [details]
Bug 1405832 - part 1: ContentCacheInParent::RequestIMEToCommitComposition() should increment mPendingEventsNeedingAck itself if it treat the request handled synchronously without actually requesting IME to commit composition

Approval Request Comment
[Feature/Bug causing the regression]:
Perhaps, regression of bug 1377672.

[User impact if declined]:
A lot of users keep meeting this crash.

[Is this code covered by automated tests?]:
No because we're still not sure how to reproduce this bug.

[Has the fix been verified in Nightly?]:
No. However, this series of patches makes it won't crash at least in release channel nor beta channel except developer edition.

[Needs manual test from QE? If yes, steps to reproduce]:
No.

[List of other uplifts needed for the feature/fix]:
The following patches. This patch must fix the exact cause of this bug, but still not sure.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
This fixes wrong count management of expected event receive message from remote process.

[String changes made/needed]:
No.
Attachment #8930315 - Flags: approval-mozilla-beta?
Comment on attachment 8930316 [details]
Bug 1405832 - part 2: TextComposition::RequestToCommit() should request IME to commit or cancel composition only when it hasn't been request it yet and hasn't received commit event yet

Approval Request Comment
[Feature/Bug causing the regression]:
Not sure, but perhaps regression of e10s.

[User impact if declined]:
Not sure, but the main process may receive odd request from remote process.

[Is this code covered by automated tests?]:
No.

[Has the fix been verified in Nightly?]:
No.

[Needs manual test from QE? If yes, steps to reproduce]: 
No.

[List of other uplifts needed for the feature/fix]:
All patches for this bug.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
This fixes a simple bug of TextComposition.  TextComposition manages currently composing words with IME.  So, it should only request to commit/cancel composition to native IME only once.  However, in e10s mode, this may request twice or more for a composition.  This may cause commit/cancel another composition which has already started in the main process.

[String changes made/needed]:
No.
Attachment #8930316 - Flags: approval-mozilla-beta?
Comment on attachment 8930317 [details]
Bug 1405832 - part 3: ContentCacheInParent::RequestIMEToCommitComposition() should call nsIWidget::NotifyIME() via TextComposition::RequestToCommit()

Approval Request Comment
[Feature/Bug causing the regression]:
Regression of e10s.

[User impact if declined]:
Not sure.

[Is this code covered by automated tests?]:
No.

[Has the fix been verified in Nightly?]:
No.

[Needs manual test from QE? If yes, steps to reproduce]:
No.

[List of other uplifts needed for the feature/fix]:
The patches for this bug.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
This supports the previous patch.  By the previous patch, we need to request IME to commit/cancel composition via TextComposition.  However, there is only path to request it directly.  This patch fixes it.

[String changes made/needed]:
No.
Attachment #8930317 - Flags: approval-mozilla-beta?
Comment on attachment 8930318 [details]
Bug 1405832 - part 4: ContentCacheInParent::OnEventNeedingAckHandled() shouldn't crash in release build

Approval Request Comment
[Feature/Bug causing the regression]:
Regression of bug 1377672.

[User impact if declined]:
Users may meet crash in same method if we still have different bug of managing composition event count in the main process.

[Is this code covered by automated tests?]:
No.

[Has the fix been verified in Nightly?]:
No.

[Needs manual test from QE? If yes, steps to reproduce]:
No.

[List of other uplifts needed for the feature/fix]:
The patches for this bug.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
This patch just makes the crash by ourselves in the main process completely go away from release channel and beta channel except developer edition.

Even if we detect a bug we shouldn't crash the main process by ourselves because:
* Users don't report STR even if we crash.
* Users may lost some inputting data in the other tabs.
Even if we stop crashing the main process, remote process may crash if it detects odd composition events.  However, crashing only the remote process is better than crash of the main process.

[String changes made/needed]:
No.
Attachment #8930318 - Flags: approval-mozilla-beta?
Although, this is a new regression of 56. However, after releasing 57, the rate is being increased. So, it *might* be better to uplift to 57.
Hi :masayuki,
From the crash report, there are still crashes in nightly after the patch was landed. Do you have any thoughts?
Flags: needinfo?(masayuki)
(In reply to Gerry Chang [:gchang] from comment #36)
> Hi :masayuki,
> From the crash report, there are still crashes in nightly after the patch
> was landed. Do you have any thoughts?

Hmm. That means the patches are not enough.

Latest reports is this: https://crash-stats.mozilla.com/report/index/706f8f0f-0b3e-450d-860b-614c60171125

> There is no pending composition but received eCompositionCommitRequestHandled message from the remote child
> 
> Dispatched Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommit
> 
> Received Event Message Log:
>   eCompositionStart
>   eCompositionChange
>   eCompositionChange
>   eCompositionChange
>   eCompositionCommitRequestHandled
>   eCompositionChange
>   eCompositionCommitRequestHandled
> 
> Result of RequestIMEToCommitComposition():
>   Commit reqeust is handled synchronously
>   Commit request is handled with stored composition string because TabParent has already lost focus

We still receive odd eCompositionCommitRequestHandled message from the remote process.
Flags: needinfo?(masayuki)
Comment on attachment 8930315 [details]
Bug 1405832 - part 1: ContentCacheInParent::RequestIMEToCommitComposition() should increment mPendingEventsNeedingAck itself if it treat the request handled synchronously without actually requesting IME to commit composition

Aiming the patches here and in bug 1423456 and bug 1420849 at 58 beta 11 or 12 (depending on when we can land this and go to build today)
Attachment #8930315 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8930316 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8930317 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8930318 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.