Crash in mozilla::ContentCacheInParent::OnEventNeedingAckHandled

RESOLVED FIXED in Firefox 56

Status

()

Core
Event Handling
--
critical
RESOLVED FIXED
5 months ago
5 months ago

People

(Reporter: Tomcat, Assigned: masayuki)

Tracking

({crash, regression})

Trunk
mozilla56
crash, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 wontfix, firefox56 fixed)

Details

(crash signature)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Reporter)

Description

5 months ago
This bug was filed from the Socorro interface and is 
report bp-1f26c708-e23b-4dc1-a25d-370a00170712.
=============================================================

filed out of crash stats, seems this is a new bug - maybe related to bug 1377672 ? 

Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ContentCacheInParent::OnEventNeedingAckHandled(nsIWidget*, mozilla::EventMessage) 	widget/ContentCache.cpp:1207
1 	xul.dll 	mozilla::dom::TabParent::RecvOnEventNeedingAckHandled(mozilla::EventMessage const&) 	dom/ipc/TabParent.cpp:1858
2 	xul.dll 	mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PBrowserParent.cpp:2255
3 	xul.dll 	mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PContentParent.cpp:3151
4 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:2093
5 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:2019
6 	xul.dll 	mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) 	ipc/glue/MessageChannel.cpp:1888
7 	xul.dll 	mozilla::ipc::MessageChannel::MessageTask::Run() 	ipc/glue/MessageChannel.cpp:1921
8 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1437
9 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:489
10 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:125
Wow, really odd. It means that we receive compositionend notification from TabChild without pending composition in TabParent...
Shouldn't be related to bug 1377672, but perhaps, related to bug 1376424.
I hit this while typing in the search bar in Gmail. I was typing 目覚まし時計 but while the candidate window was still showing (i.e. before committing) it crashed.
Happens to me too on macOS. https://crash-stats.mozilla.com/report/index/2fbd1518-3ac2-4197-bfe6-1ad270170717

I am also using an IME (with uncommited characters) at the time, and the master password dialog shows up right before Firefox crashes.
OS: Windows 10 → All
I still cannot reproduce this bug.

I'd like to explain what happens.

Each ContentCacheInParent increases a counter when it sends eCompositionStart to remote process, then, after TabChild dispatched every composition event into its DOM tree, it sends "OnEventNeedingAckHandled" message to TabParent. Then, TabParent notifies ContentCacheInParent::OnEventNeedingAckHandled() of that with the event message.  If TabChild sent it with eCompositionCommit or eCompositionCommitAsIs, ContentCacheInTab decreases the counter.  So, the counter indicates that number of compositions which are sent to the remote process but hasn't been handled in the remote content.

Additionally, if remote process needs to request to IME to commit active composition, PuppetWidget sends "RequestIMEToCommitComposition" message to TabParent.  Then, TabParent requests it to native IME.  Finally, if it's handled by IME synchronously, it sends back the fact and commit string to TabChild.  Then, TabChild dispatches eCompositionCommit event with the commit string into its DOM tree.  Finally, TabChild sends "OnEventNeedingAckHandled" message to TabParent with eCompositionCommitRequestHandled event message and ContentCacheInParent decreases the counter.

So, when ContentCacheInTabParent::OnEventNeedingAckHandled receives eCompositionCommit, eCompositionCommitAsIs or eCompositionCommitRequestHandled, ContentCacheInParent decreases the counter.

The crash is, when ContentCacheInParent does it but the counter is already 0, it crashes with MOZ_RELEASE_ASSERT.

According to comment 4, some composition events may be sent to different TabParent than TabParent which sends eCompositionStart to TabChild. However, I have no idea why such case could occur.
Can we close some old TabParent and create a new one during the process?
Okay, perhaps, there are at least 2 cases to meet this crash:

#1, ContentCacheInTabParent::OnEventNeedingAckHandled() received eCompositionCommitRequestHandled, then, eCompositionCommit or eCompositionCommitAsIs. This could occur if request to commit composition is from the remote process during dispatching eCompositionCommit(AsIs) in the remote process. This is timing issue.

#2, ContentCacheInTabParent::RequestIMEToCommitComposition() returns true when the TabParent already lost focus, then, it may receive both eCompositionCommitRequestHandled and eCompositionCommit or eCompositionCommitAsIs. This is focus issue.

So, perhaps, comment 4 hits the second case, but not sure about comment 3's case.
Assignee: nobody → masayuki
Blocks: 1376424
Status: NEW → ASSIGNED
status-firefox54: --- → unaffected
status-firefox55: --- → affected
status-firefox56: --- → affected
Component: Widget → Event Handling
Keywords: regression
Hardware: Unspecified → All
Version: unspecified → Trunk
https://treeherder.mozilla.org/#/jobs?repo=try&revision=17aa5e947ed7e6a69bc5600d54b90295ccf28f12
Comment hidden (mozreview-request)
Comment on attachment 8888832 [details]
Bug 1380652 - ContentCacheInParent::RequestIMEToCommitComposition() shouldn't handle the request synchronously when its TabParent has already sent eCompositionCommit(AsIs) event of the composition

https://reviewboard.mozilla.org/r/159808/#review165600
Attachment #8888832 - Flags: review?(m_kato) → review+

Comment 11

5 months ago
Pushed by masayuki@d-toybox.com:
https://hg.mozilla.org/integration/autoland/rev/ef4d1461791e
ContentCacheInParent::RequestIMEToCommitComposition() shouldn't handle the request synchronously when its TabParent has already sent eCompositionCommit(AsIs) event of the composition r=m_kato
(Reporter)

Comment 12

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ef4d1461791e
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
status-firefox56: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Please nominate this for Beta approval when you're comfortable doing so.
status-firefox-esr52: --- → unaffected
Flags: needinfo?(masayuki)
Oh, I found different crash bug on macOS. That causes same stack trace but crashed by different MOZ_RELEASE_ASSERT in mozilla::ContentCacheInParent::OnEventNeedingAckHandled. I'll file a bug for it after checking on Linux. (FYI: it's not recent regression, perhaps.)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #13)
> Please nominate this for Beta approval when you're comfortable doing so.

Looks like that the patch is landed after 26th build. So, we need one more day to check if it's actually fixed this bug.
Okay, there is no reports with 26th build (there is a couple of remotes per build ID).

(Note that, https://crash-stats.mozilla.com/report/index/ba9e8455-c8fc-40b3-bcc6-a6eb30170727 is crashed in same method, but different point which I mentioned at the previous comment, so, this is my report.)
Flags: needinfo?(masayuki)
Hmm, increasing the crash rate was caused by both bug 1376424 and bug 1377672. However, the latter is too risky for uplift (actually, it still has known regressions). Therefore, only bug 1376424 was uplifted to Beta.

The landed patch really succeeded to decrease the rate. However, the patch depends on the latter risky fix. So, we cannot take the fix to Beta in this stage...
Blocks: 1377672
Fortunately, there is only one report with 55.0b.
https://crash-stats.mozilla.com/signature/?product=Firefox&version=55.0b12&version=55.0b11&version=55.0b10&version=55.0b9&signature=mozilla%3A%3AContentCacheInParent%3A%3AOnEventNeedingAckHandled&date=%3E%3D2017-06-27T12%3A55%3A59.000Z&date=%3C2017-07-27T12%3A55%3A59.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

I think that this is enough low rate for giving up to fix it on Beta.
status-firefox55: affected → wontfix

Comment 18

5 months ago
I encountered it using nightly buildid 20170803100352 on Linux.

Here is the crash report:
https://crash-stats.mozilla.com/report/index/415dbd3f-d204-4f2d-a57a-6baf00170804

Here is my STR:
1. Open several tabs(ex: 7 tabs) connecting to different sites(yahoo, facebook, etc.)
2. In one of the tabs, switch to Chinese IME and type some characters but don't commit
3. Switch back to English mode
4. Hold down Ctrl-PgDn to switch between tabs quickly
5. In 3 seconds the tab switching will stuck in one tab
6. Click mouse cursor at a different tab -> crash happened

If more information needed, please let me know.

Updated

5 months ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Shian-Yow Wu [:swu] (56 Regression Engineering support) from comment #18)
> I encountered it using nightly buildid 20170803100352 on Linux.
> 
> Here is the crash report:
> https://crash-stats.mozilla.com/report/index/415dbd3f-d204-4f2d-a57a-
> 6baf00170804
> 
> Here is my STR:
> 1. Open several tabs(ex: 7 tabs) connecting to different sites(yahoo,
> facebook, etc.)
> 2. In one of the tabs, switch to Chinese IME and type some characters but
> don't commit
> 3. Switch back to English mode
> 4. Hold down Ctrl-PgDn to switch between tabs quickly
> 5. In 3 seconds the tab switching will stuck in one tab
> 6. Click mouse cursor at a different tab -> crash happened
> 
> If more information needed, please let me know.

Crashed in same method but different assertion. Please file a new bug.
Status: REOPENED → RESOLVED
Last Resolved: 5 months ago5 months ago
Resolution: --- → FIXED

Comment 20

5 months ago
Bug 1387357 filed for comment 18.
You need to log in before you can comment on or make changes to this bug.