Closed
Bug 1128457
Opened 8 years ago
Closed 7 years ago
CookieServiceParent::RecvGetCookieString occasionally fails
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
VERIFIED
FIXED
Tracking | Status | |
---|---|---|
e10s | m5+ | --- |
People
(Reporter: jimm, Assigned: billm)
References
Details
Attachments
(2 files)
2.50 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
11.26 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
mozilla::net::PCookieServiceChild::SendGetCookieString occasionally fails, causing an abort of the child process. https://dxr.mozilla.org/mozilla-central/source/netwerk/cookie/CookieServiceParent.cpp#74
(In reply to Jim Mathies [:jimm] from comment #1) > This might be tear down related. It sounds like it's just some race where GetAppInfoFromParams is returning false: https://dxr.mozilla.org/mozilla-central/source/netwerk/cookie/CookieServiceParent.cpp#95
![]() |
Reporter | |
Updated•8 years ago
|
tracking-e10s:
--- → ?
![]() |
Reporter | |
Updated•8 years ago
|
Assignee: nobody → mrbkap
![]() |
Reporter | |
Comment 3•8 years ago
|
||
This is predominantly caused by these MsgNotAllowed errors: (msgtype=0x340001,name=???) Message not allowed: cannot be sent/recvd in this state
Ah, so they just timed out. http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#1187
Assignee | ||
Updated•8 years ago
|
Assignee: mrbkap → wmccloskey
Assignee | ||
Updated•8 years ago
|
![]() |
Reporter | |
Comment 5•8 years ago
|
||
noticed this in today's data, for reasons: '(msgtype=0x340001,name=???) Message not allowed: cannot be sent/recvd in this state' 'HangMonitor'
Assignee | ||
Comment 6•8 years ago
|
||
"HangMonitor" is the reason we give when the user kills a content process using the slow script/plugin hang UI.
Assignee | ||
Updated•8 years ago
|
Assignee: wmccloskey → mrbkap
![]() |
Reporter | |
Updated•8 years ago
|
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)]
![]() |
Reporter | |
Comment 7•8 years ago
|
||
oops, wrong bug.
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)]
Comment 8•7 years ago
|
||
Hey Jim, Is it possible to figure out if there are particular extensions that cause this or generally to get a list of crash reports that are caused by this indirect signature?
Flags: needinfo?(jmathies)
![]() |
Reporter | |
Comment 9•7 years ago
|
||
(In reply to Blake Kaplan (:mrbkap) (please use needinfo!) from comment #8) > Hey Jim, > > Is it possible to figure out if there are particular extensions that cause > this or generally to get a list of crash reports that are caused by this > indirect signature? Use supersearch and key in on a part of the ipc error message you're interested in, like '0x340001' - https://crash-stats.mozilla.com/search/?product=Firefox&version=39.0a1&ipc_channel_error=~0x340001&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports Note this hex message value I'm using can change over time: 0x34 is a message class, 0x01 is the message id. Message classes are defined in obj/ipc/ipdl/_ipdlheaders/IPCMessageStart.h, message ids are defined in the main auto-generated protocol header (like PContent.h and PPluginWidget.h).
Flags: needinfo?(jmathies)
Assignee | ||
Comment 10•7 years ago
|
||
I can reproduce this now. I'm in the middle of debugging it. The basic STR is to install LastPass and visit a page where JS hangs (which is pretty easy because of bug 1113369). Then... do some stuff after it hangs. I see crashes from SendGetInputContext and SendGetCookieString, both urgent messages.
Assignee: mrbkap → wmccloskey
Assignee | ||
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1142109
Assignee | ||
Comment 12•7 years ago
|
||
Re-opening because of https://bugzilla.mozilla.org/show_bug.cgi?id=1056018#c20.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee | ||
Comment 13•7 years ago
|
||
It's really hard to tell what's going on here, but I looked over the code again and found a potential problem. WaitForSyncNotify can return try (meaning a timeout) even if we received a message reply. Additionally, we can receive a reply during ShouldContinueFromTimeout since it releases the lock. So we should check for a reply before deciding to time out. Otherwise we'll never unset mTimedOutMessageSeqno.
Attachment #8586350 -
Flags: review?(dvander)
Comment on attachment 8586350 [details] [diff] [review] patch for possible problem Review of attachment 8586350 [details] [diff] [review]: ----------------------------------------------------------------- Good catch.
Attachment #8586350 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 15•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/966f9a3fa1f8
Keywords: leave-open
![]() |
Reporter | |
Comment 17•7 years ago
|
||
fixed: http://www.mathies.com/mozilla/client-abort-report.txt
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
![]() |
Reporter | |
Comment 18•7 years ago
|
||
Oops, sorry, I was looking at the wrong list.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 19•7 years ago
|
||
It looks like this fixed the MsgNotAllowed errors for non-urgent messages. It didn't fix the problem for urgent messages like GetCookieString since won't process incoming CPOWs in ProcessPendingMessages. So I think we need a separate solution for that.
Assignee | ||
Comment 20•7 years ago
|
||
This patch complements the patch in bug 1142109. They both address the problem where the parent is in the timed out state (i.e., it sent a CPOW that timed out and it hasn't gotten a reply for that CPOW yet). If we get a message in the timed out state, we need to decide what to do. Originally, we always returned an error for the message. That was to ensure that we didn't start processing the incoming message at the same time the child started processing the timed out CPOW. In general, both sides processing messages at the same time is bad. The patch in bug 1142109 ensured that we always process incoming messages before sending a message. That way, if the child sends something while a CPOW is in its input queue, it will handle the CPOW before sending its message. However, it won't handle the CPOW if it's sending an urgent messages, since we defer everything when waiting for an urgent response. However, there's no reason why the parent can't process the child's message in that case, since the child is guaranteed to not process anything, thus avoiding the problem where both sides dispatch at once. So this patch causes the parent to process incoming urgent messages when it has a timed out CPOW that it hasn't received a response for.
Attachment #8589358 -
Flags: review?(dvander)
Attachment #8589358 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 21•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/74b5bb9f1263
Assignee | ||
Comment 23•7 years ago
|
||
I think we can declare this fixed. There have been 9 crashes with "message not allowed" for the 2015-04-15 build so far. In a comparable time period, the 2015-04-14 build had 84 "message not allowed" crashes. If the crash volume is still too high, we can make "message not allowed" be non-fatal for the child. I was avoiding doing that so we could monitor how often it happens. But as long as it's not too common, it's fine to make it non-fatal.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
![]() |
Reporter | |
Comment 24•7 years ago
|
||
One hit in the last three days, definitely fixed!
Status: RESOLVED → VERIFIED
Updated•7 years ago
|
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•