Closed
Bug 1287044
Opened 8 years ago
Closed 8 years ago
Annotate crashes for pending messages in MessageChannel::OnMessageReceivedFromLink
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla50
Tracking | Status | |
---|---|---|
firefox50 | --- | fixed |
People
(Reporter: ting, Assigned: ting)
References
Details
Attachments
(1 file, 2 obsolete files)
5.34 KB,
patch
|
Details | Diff | Splinter Review |
The OOM from trying to push a message to mPending deque in MessageChannel::OnMessageReceivedFromLink() shows there're more than 10000 (could be far more) pending IPC messages. We should annotate the crashes so we can understand what are they.
Assignee | ||
Comment 1•8 years ago
|
||
The patch is for Windows because I can't get the code compiled with GCC (I'll figure out why, but I think on Windows should be enough to get us some data). And there's one thing I'm concerned that what if it's OOM again while annotating, is it possible to avoid using jemalloc at runtime?
Attachment #8771343 -
Flags: review?(continuation)
Comment 2•8 years ago
|
||
Comment on attachment 8771343 [details] [diff] [review] patch v1 Review of attachment 8771343 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for working on this. This patch should be reviewed by an IPC peer.
Attachment #8771343 -
Flags: review?(continuation) → review?(wmccloskey)
Assignee | ||
Comment 3•8 years ago
|
||
The reason why this can't be compiled with GCC 4.8.3: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61426 Though it's fixed in 2014, but the target milestone is 5.0.
Comment on attachment 8771343 [details] [diff] [review] patch v1 Review of attachment 8771343 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.h @@ +474,5 @@ > + T* allocate(size_t n) { > + void* p = ::operator new(n * sizeof(T), std::nothrow); > + if (!p && n) { > + // Count the number of each different pending message. > + std::map<std::string, uint32_t> m; I suspect this won't work, for a few reasons: 1. The message name is only assigned once it has been processed on the main thread (in PContentParent::OnMessageReceived, for example--you can look for set_name calls). If a message is in the queue, then that hasn't happened yet. So all the names will be "???". You can fix this by using the message type instead. This is in msg.type(). It will be number that we'll have to look up ourselves, but that's fine. 2. If we fail to allocate the mPending vector, then I suspect there's a good chance we won't have space for |m| either. So we should try to do this without allocating. One idea is to pick a random message in the queue and store its type in the crash report. If a given message type consumes most of the queue, then it will be much more likely to be chosen.
Attachment #8771343 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #4) > yet. So all the names will be "???". You can fix this by using the message > type instead. This is in msg.type(). It will be number that we'll have to > look up ourselves, but that's fine. I was considering whether to use the name or the type, and chose the name because I found it's not easy to translate a type to a message name on dxr for a specific revision which I can't do: https://dxr.mozilla.org/mozilla-central/rev/xxxxxx/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/_ipdlheaders/IPCMessageStart.h Do you know an easier way to look up the message type? Note the message through ThreadLink will have a valid name. > > 2. If we fail to allocate the mPending vector, then I suspect there's a good > chance we won't have space for |m| either. So we should try to do this > without allocating. One idea is to pick a random message in the queue and > store its type in the crash report. If a given message type consumes most of > the queue, then it will be much more likely to be chosen. If the assumption is "a given message type consumes most of the queue", then the chance of no space for |m|, |v|, or |buf| might be not that high because there won't be many different types of message. Pick a random message could avoid allocation, but we won't be able to know whether it is the one consumes most of the queue, the only thing we can tell is a message seems to be flood when there's a bunch of crashes report the same message type. :mccr8, what do you think about this?
Flags: needinfo?(continuation)
(In reply to Ting-Yu Chou [:ting] from comment #5) > (In reply to Bill McCloskey (:billm) from comment #4) > > yet. So all the names will be "???". You can fix this by using the message > > type instead. This is in msg.type(). It will be number that we'll have to > > look up ourselves, but that's fine. > > I was considering whether to use the name or the type, and chose the name > because I found it's not easy to translate a type to a message name on dxr > for a specific revision which I can't do: > > > https://dxr.mozilla.org/mozilla-central/rev/xxxxxx/source/obj-x86_64-pc- > linux-gnu/ipc/ipdl/_ipdlheaders/IPCMessageStart.h > > Do you know an easier way to look up the message type? No, I don't think there's an easy way. > Note the message through ThreadLink will have a valid name. OK, but since the crashes in bug 1266517 are happening in the content process, that's not really relevant.
Comment 7•8 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #5) > :mccr8, what do you think about this? I don't know, sorry.
Flags: needinfo?(continuation)
Assignee | ||
Comment 8•8 years ago
|
||
I got another idea, because we won't handle any pending message, we can sort the deque |mPending| and then walk through it to find the message with the largest quantity.
Assignee | ||
Comment 9•8 years ago
|
||
Traced std::sort, Microsoft implements it as a hybrid algorithm which uses quicksort, heapsort and insertion sort, also it looks like an in-place sorting.
Assignee | ||
Comment 10•8 years ago
|
||
Attachment #8771343 -
Attachment is obsolete: true
Attachment #8772719 -
Flags: review?(wmccloskey)
Comment on attachment 8772719 [details] [diff] [review] patch v2 Review of attachment 8772719 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.h @@ +491,5 @@ > + Message &msg = *it; > + if (msg.type() == curType) { > + ++curCount; > + } else { > + if (curCount > topCount) { You need to do this block after the loop has exited as well (in case the last type is the top one).
Attachment #8772719 -
Flags: review?(wmccloskey) → review+
Assignee | ||
Comment 12•8 years ago
|
||
Thanks for catching that.
Attachment #8772719 -
Attachment is obsolete: true
Assignee | ||
Comment 13•8 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=990b9a2383f4
Comment 15•8 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/50ff01c81e09 Annotate pending IPC messages when OOM due to too many of them. r=billm
Keywords: checkin-needed
Comment 16•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/50ff01c81e09
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Assignee | ||
Comment 18•8 years ago
|
||
Mind if I request uplift this after the first report comes in with the annotation?
Flags: needinfo?(janus926)
Assignee | ||
Comment 19•8 years ago
|
||
I am a bit worried as I haven't seen any report with the annotation. Though there're also only three _GrowMap() OOM from MessageLoop::PostTask_Helper() since the patch got landed.
You need to log in
before you can comment on or make changes to this bug.
Description
•