Closed
Bug 1271102
Opened 9 years ago
Closed 7 years ago
Release assert when sending large (>128MB) IPC messages
Categories
(Core :: IPC, defect, P1)
Tracking
RESOLVED
FIXED
People
(Reporter: dbaron, Unassigned)
References
Details
(4 keywords, Whiteboard: btpp-meta)
Crash Data
Attachments
(1 file)
58 bytes, text/x-review-board-request | billm: review+ | lizzard: approval-mozilla-aurora+ | Details
This bug was filed from the Socorro interface and is
report bp-5b3756e3-1086-4cd8-94b2-2a98e2160506.
=============================================================
A new topcrash started in build 20160506052823.
https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-04-01&signature=mozilla%3A%3Aipc%3A%3AProcessLink%3A%3ASendMessageW
It's clearly a regression from:
https://hg.mozilla.org/mozilla-central/rev/5cdbf605f972
It's not clear to me how the crashes should usefully be categorized, though.
Are initial segments of the stack (but with more than one function) a useful way to categorize them? (i.e., should we add things to prefix_signature_re?) Or something else?
FWIW, I looked at 5 reports, and there were two pairs of reports that were similar. Two reports had the stack top:
0 xul.dll mozilla::ipc::ProcessLink::SendMessageW(IPC::Message*) ipc/glue/MessageLink.cpp:161
1 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const&) ipc/glue/MessageChannel.cpp:1607
2 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp:1560
3 xul.dll nsRunnableMethodImpl<bool ( mozilla::ipc::MessageChannel::*)(void), 0, 1>::Run() obj-firefox/dist/include/nsThreadUtils.h:741
and two had:
0 xul.dll mozilla::ipc::ProcessLink::SendMessageW(IPC::Message*) ipc/glue/MessageLink.cpp:161
1 xul.dll mozilla::ipc::MessageChannel::Send(IPC::Message*) ipc/glue/MessageChannel.cpp:780
2 xul.dll mozilla::dom::PBrowserChild::SendInvokeDragSession(nsTArray<mozilla::dom::IPCDataTransfer> const&, unsigned int const&, nsCString const&, unsigned int const&, unsigned int const&, unsigned int const&, unsigned char const&, int const&, int const&) obj-firefox/ipc/ipdl/PBrowserChild.cpp:2127
3 xul.dll nsDragServiceProxy::InvokeDragSessionImpl(nsISupportsArray*, nsIScriptableRegion*, unsigned int) widget/nsDragServiceProxy.cpp:61
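For reference, here is a minimal standalone sketch of the check these stacks are hitting. The assert expression matches the one named in this bug's summary (MOZ_RELEASE_ASSERT(msg->size() < IPC::Channel::kMaximumMessageSize)); the stand-in types, the 128 MiB constant, and main() are illustrative, not the actual MessageLink.cpp code.

#include <cassert>
#include <cstddef>

// Stand-in for IPC::Message; only the size() accessor matters here.
struct FakeMessage {
  size_t payloadBytes;
  size_t size() const { return payloadBytes; }
};

// The limit in effect when this bug was filed (raised back to 256 MiB later
// in this bug).
constexpr size_t kMaximumMessageSize = 128 * 1024 * 1024;

void SendMessage(const FakeMessage& msg) {
  // In Gecko this is MOZ_RELEASE_ASSERT, which aborts even in release builds;
  // that abort is what the crash signature captures.
  assert(msg.size() < kMaximumMessageSize && "IPC message too large");
  // ... hand the serialized message to the I/O thread here ...
}

int main() {
  SendMessage(FakeMessage{200u * 1024 * 1024});  // >128 MiB: trips the assert
}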
Flags: needinfo?(erahm)
Comment 1•9 years ago
To the extent that we can't fix it quickly, we should classify these by the message type, I think. That's probably not available in any annotations, and perhaps not in the minidump at all currently.
Comment 2•9 years ago
We changed the behavior here to enforce the maximum message size on the sender side rather than the receiver side. In theory we're not causing more crashes, just crashing sooner.
I could either add the message name to the assertion message or I could add an annotation (I'm not sure what the process for that is though).
I also saw structured clone in some of the stacks:
> 0 xul.dll mozilla::ipc::ProcessLink::SendMessageW(IPC::Message*) ipc/glue/MessageLink.cpp:161
> 1 xul.dll mozilla::ipc::MessageChannel::Send(IPC::Message*) ipc/glue/MessageChannel.cpp:780
> 2 xul.dll mozilla::dom::PBrowserParent::SendAsyncMessage(nsString const&, nsTArray<mozilla::jsipc::CpowEntry> const&, IPC::Principal const&, mozilla::dom::ClonedMessageData const&) obj-firefox/ipc/ipdl/PBrowserParent.cpp:231
> 3 xul.dll nsFrameLoader::DoSendAsyncMessage(JSContext*, nsAString_internal const&, mozilla::dom::ipc::StructuredCloneData&, JS::Handle<JSObject*>, nsIPrincipal*) dom/base/nsFrameLoader.cpp:2681
Flags: needinfo?(erahm)
Comment 3•9 years ago
If you do add an annotation, it would be nice to have the size of the message in there, like we do for NS_ABORT_OOM.
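A hedged sketch of what comments 2 and 3 are asking for: annotate the crash report with the message name and size before the assert fires. CrashReporter::AnnotateCrashReport is Gecko's real crash-annotation entry point, but the exact signature here is from memory, and the key names ("IPCMessageName"/"IPCMessageSize") plus the assumption that IPC::Message exposes name() are illustrative; the annotation that eventually landed is bug 1274404 (see comment 41).

#include "nsExceptionHandler.h"  // CrashReporter::AnnotateCrashReport
#include "nsPrintfCString.h"
#include "nsString.h"

// Hypothetical helper: record which message was oversized and how big it was,
// so the crash report carries more than just the signature.
static void AnnotateOversizedIPCMessage(const IPC::Message* aMsg)
{
  CrashReporter::AnnotateCrashReport(NS_LITERAL_CSTRING("IPCMessageName"),
                                     nsDependentCString(aMsg->name()));
  CrashReporter::AnnotateCrashReport(NS_LITERAL_CSTRING("IPCMessageSize"),
                                     nsPrintfCString("%zu", size_t(aMsg->size())));
}

Note comment 4's caveat: annotations are main-thread-only in content processes, so a helper like this could not simply be called from the IPC I/O thread.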
Comment 4•9 years ago
Annotating has a problem in that it's main-thread-only in content processes. I would guess that the IPC messaging layer runs off the main thread (I need to confirm this); if that's the case, there's not much I can do.
Updated•9 years ago
tracking-e10s: --- → ?
Comment 5•9 years ago
Here is a deterministic test case that will crash Nightly:
https://dl.dropboxusercontent.com/u/40949268/dump/streamed_download_test.html
Some crash reports:
https://crash-stats.mozilla.com/report/index/8c86c26e-b1c6-4344-918b-b0a1e2160511
https://crash-stats.mozilla.com/report/index/eea27751-2b9b-4d88-8815-3072f2160511
https://crash-stats.mozilla.com/report/index/5317ab9a-666b-4b6a-9345-02c0a2160511
Comment 6•9 years ago
If the above page prints
"progress 1"
"The transfer is complete."
then the crash did not occur.
Running in non-e10s mode (set browser.tabs.remote.autostart.2 to false) avoids the crash, so this looks e10s-related.
Comment 7•9 years ago
Yes, this assertion is caused by an IPC message being very large, so they are mostly going to be e10s-only. Your particular crash is in message manager, for what it is worth. The other recognizable one I see in this thread is in drag and drop code, which sends images serialized as a string. I should file a bug about using shared memory for that.
Message manager will be harder to fix. Maybe the message manager could reject large messages somehow, and throw an exception.
(In reply to Eric Rahm [:erahm] from comment #2)
> We changed the behavior here to enforce the maximum message size on the
> sender side rather than the receiver side. In theory we're not causing more
> crashes, just crashing sooner.
You also decreased the message size limit. Maybe we should back that out until some of these cases are more under control? Though sending messages between 128 MB and 256 MB is often going to result in an OOM crash on 32-bit systems.
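A minimal standalone sketch of the reject-and-throw idea from earlier in this comment (reject an oversized payload in the message manager before it reaches IPC). The limit value, function name, and error handling are assumptions, not the actual message-manager code; bug 1272423 (comment 19) is where something along these lines was pursued for chrome JS.

#include <cstddef>
#include <stdexcept>
#include <string>

// Assumed soft limit, mirroring IPC::Channel::kMaximumMessageSize.
constexpr size_t kMaxAsyncMessagePayload = 128 * 1024 * 1024;

void SendAsyncMessageChecked(const char* aMessageName, size_t aPayloadBytes)
{
  if (aPayloadBytes >= kMaxAsyncMessagePayload) {
    // Surface an error the calling script can catch, instead of letting the
    // oversized payload reach the release assert in ProcessLink::SendMessage.
    throw std::runtime_error(std::string("oversized message: ") + aMessageName);
  }
  // ... serialize the structured clone and send it as before ...
}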
Comment 8•9 years ago
I'm hitting this crash when trying to analyze something with the gecko profiler addon, using custom profiler settings with an increased sample buffer.
Comment 9•9 years ago
Not blocking on this; we prefer to fix the individual cases where we allocate large blocks for individual messages.
Priority: -- → P1
Comment 10•9 years ago
The profiler sends all the profiling data in one message, encoded as JSON in an nsCString:
https://dxr.mozilla.org/mozilla-central/rev/3461f3cae78495f100a0f7d3d2e0b89292d3ec02/dom/ipc/ContentChild.cpp#2900
https://dxr.mozilla.org/mozilla-central/rev/3461f3cae78495f100a0f7d3d2e0b89292d3ec02/tools/profiler/public/GeckoProfiler.h#175
It looks like it could be adapted to send smaller messages (see also PSendStream from bug 1093357):
https://dxr.mozilla.org/mozilla-central/source/tools/profiler/core/GeckoSampler.cpp#395
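A rough sketch of the "smaller messages" direction from this comment: split the profile JSON into bounded chunks instead of one giant message. The chunk size, the callback, and the end-of-stream marker are assumptions; the longer-term answer is PSendStream from bug 1093357 rather than hand-rolled chunking.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// aSendChunk stands in for an IPC send of one small message per call.
void SendProfileInChunks(const std::string& aProfileJson,
                         const std::function<void(const char*, size_t)>& aSendChunk)
{
  constexpr size_t kChunkBytes = 1 * 1024 * 1024;  // well under the IPC limit
  for (size_t offset = 0; offset < aProfileJson.size(); offset += kChunkBytes) {
    const size_t len = std::min(kChunkBytes, aProfileJson.size() - offset);
    aSendChunk(aProfileJson.data() + offset, len);  // one small message each
  }
  aSendChunk(nullptr, 0);  // illustrative end-of-stream marker
}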
Comment 11•9 years ago
Yeah, I think bug 1093357 will let us avoid giant IPC messages for a lot of the various IPC users that are sending giant gobs of data.
Depends on: 1093357
Comment 12•9 years ago
(In reply to Andrew McCreight [:mccr8] from comment #11)
> Yeah, I think bug 1093357 will let us avoid giant IPC messages for a lot of
> the various IPC users that are sending giant gobs of data.
Keep in mind that bug 1093357 is only supporting pipes from child-to-parent at first. At least one of the stacks above seems to be sending from parent-to-child.
Updated•9 years ago
Summary: Crashes in mozilla::ipc::ProcessLink::SendMessageW, starting 2016-05-06, from MOZ_RELEASE_ASSERT(msg->size() < IPC::Channel::kMaximumMessageSize) → Release assert when sending large (>128MB) IPC messages
Comment 13•9 years ago
This crash seems to show up without the W on OSX. It is the number one crash on OSX for 05-11 builds (with only 5 crashes).
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW] → [@ mozilla::ipc::ProcessLink::SendMessageW] [@ mozilla::ipc::ProcessLink::SendMessage ]
Updated•9 years ago
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW] [@ mozilla::ipc::ProcessLink::SendMessage ] → [@ mozilla::ipc::ProcessLink::SendMessageW] [@ mozilla::ipc::ProcessLink::SendMessage] [@ xul.dll@0xccfa63 | strspn]
Comment 14•9 years ago
I'm crashing on this on Windows on a lot of different activities. Trying to view a JS file in the debugger, or trying to visit a page that stores asset files to IndexedDB cache, or trying to run other pages that call postmessage.
In comment 9, it is stated "we prefer to fix the individual cases ...". Does that mean we prefer to open a separate bug entry for each distinct crashing call stack leading to SendMessage?
Comment 15•9 years ago
Assuming yes to my above question, I went ahead and filed the two separate call stack crashes as
bug 1274074 - console.log()ging too much data can crash e10s via IPC call from mozilla::dom::ProcessGlobal::SendAsyncMessage
and
bug 1274075 - IndexedDB ObjectStore AddOrPut operation crashes when attempting to send too much data via an IPC pipe
The call stacks are separate from comment 0, so they likely warrant reports of their own.
Comment 16•9 years ago
Also marked down a third detected call site in addition to those two from comment 15:
bug 1275062 - IndexedDB IDB Request Send__delete__ operation crashes when attempting to send too much data via an IPC pipe
Comment 18•9 years ago
(In reply to Jim Mathies [:jimm] from comment #9)
> not blocking on this, we prefer to fix the individual cases where we
> allocate large blocks for individual messages.
I would like to make a second attempt at getting us to block on this.
The size of the data that the parent process sends to the child can be controlled from a web page, as the IndexedDB bugs that Jukka has mentioned demonstrate. This means that you can write a web page that causes the parent to send more than 128 MB of data and crash the whole browser, which seems very bad.
Also, this API is exposed to chrome JS as this call stack shows https://crash-stats.mozilla.com/report/index/ddafba04-04f3-41d5-a810-cde2c2160602 which means that add-ons can also crash Firefox in this way.
I think we need to fix this issue.
Comment 19•9 years ago
On branches without bug 1262671, which just landed, I can't imagine that many users on 32-bit systems will actually survive trying to send a 128MB message.
We could also revert to what should have been the prior behavior by bumping the limit to 256MB. This will still let a webpage crash the browser by sending a lot of data, but of course there are lots of other ways to OOM kill the browser from content.
(In reply to :Ehsan Akhgari (busy, don't ask for review please) from comment #18)
> Also, this API is exposed to chrome JS as this call stack shows
> https://crash-stats.mozilla.com/report/index/ddafba04-04f3-41d5-a810-
> cde2c2160602 which means that add-ons can also crash Firefox in this way.
That should be addressed in bug 1272423, by just throwing an exception.
Updated•9 years ago
status-firefox48: --- → unaffected
status-firefox49: --- → affected
Updated•9 years ago
Whiteboard: btpp-meta
It seems like IndexedDB blobs are the last remaining issue here. That will probably be hard to fix quickly. I think a good short-term fix here is to boost the limit back to 256 MB. That can be done as an M9 bug.
It appears to be hard to fix some sources of >128 MiB messages (e.g. IndexedDB),
so revert back to a 256MiB limit for the short term.
Review commit: https://reviewboard.mozilla.org/r/58308/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/58308/
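For reference, the attached patch is essentially a one-constant change along these lines; the 128 MiB → 256 MiB values come from this bug, while the exact header and surrounding declaration are approximations, not the actual diff.

#include <cstddef>

// Approximation of the IPC channel's size-limit constant (the real one lives
// in the chromium IPC glue under ipc/chromium/).
class Channel {
 public:
  // Previously: 128 * 1024 * 1024 (the limit this bug's crashes were hitting).
  static const size_t kMaximumMessageSize = 256 * 1024 * 1024;
};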
Attachment #8760913 - Flags: review?(wmccloskey)
Attachment #8760913 - Flags: review?(wmccloskey) → review+
Comment on attachment 8760913 [details]
Bug 1271102 - Revert back to 256 MiB message limit.
https://reviewboard.mozilla.org/r/58308/#review55196
Thanks Ryan!
Keywords: leave-open
Comment 23•9 years ago
Pushed by jryans@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/04beffefe3a7
Revert back to 256 MiB message limit. r=billm
Comment 24•9 years ago
bugherder
Updated•9 years ago
Comment 25•9 years ago
Comment on attachment 8760913 [details]
Bug 1271102 - Revert back to 256 MiB message limit.
Approval Request Comment
[Feature/regressing bug #]:
bug 1268616
[User impact if declined]:
occasional crashes in games and mail sites
[Describe test coverage new/current, TreeHerder]:
reverting to the previous setting. well tested.
[Risks and why]:
low, setting a maximum message size limit back to its original value.
[String/UUID change made/needed]:
none
Attachment #8760913 - Flags: approval-mozilla-beta?
Attachment #8760913 - Flags: approval-mozilla-aurora?
Comment 26•9 years ago
This isn't on beta.
Attachment #8760913 - Flags: approval-mozilla-beta?
Comment 27•9 years ago
Looks like this was fixed in 50, comment 24. Tracking since this looks like a recent regression.
Comment 28•9 years ago
Comment on attachment 8760913 [details]
Bug 1271102 - Revert back to 256 MiB message limit.
Revert to previous size limit to prevent crashes, please uplift to aurora
Attachment #8760913 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 29•9 years ago
bugherder uplift
Comment 30•9 years ago
Since uplift to aurora, there are still new crash reports coming in with this signature. Are there still remaining issues for a followup bug?
Flags: needinfo?(dbaron)
:dbaron is out, let's redirect comment 30.
Flags: needinfo?(dbaron) → needinfo?(continuation)
Comment 32•9 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #30)
> Since uplift to aurora, there are still new crash reports coming in with
> this signature. Are there still remaining issues for a followup bug?
The patch wasn't expected to fix all of these crashes. It just increases a limit from 128MB to 256MB. The open bugs blocking this one address some of the common remaining issues.
Flags: needinfo?(continuation)
Comment 33•9 years ago
(In reply to Andrew McCreight [:mccr8] from comment #32)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #30)
> > Since uplift to aurora, there are still new crash reports coming in with
> > this signature. Are there still remaining issues for a followup bug?
>
> The patch wasn't expected to fix all of these crashes. It just increases a
> limit from 128MB to 256MB. The open bugs blocking this one address some of
> the common remaining issues.
I guess this is why I'm not able to download files > 256 MB from mega.nz without crashing.
Do we know when we regressed this?
Comment 34•9 years ago
This bug makes Cleopatra (Gecko Profiler) cause reproducible tab crashes in many cases. It stops me from using the profiler. Should I file a bug against Cleopatra?
Comment 35•9 years ago
I wonder if there's been a regression in the last few days, I remember being able to download the files a few days ago, but I can't anymore.
kael, when did the crashes you're experiencing start to happen?
Comment 36•9 years ago
(In reply to Marco Castelluccio [:marco] from comment #33)
> (In reply to Andrew McCreight [:mccr8] from comment #32)
> > (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #30)
> > > Since uplift to aurora, there are still new crash reports coming in with
> > > this signature. Are there still remaining issues for a followup bug?
> >
> > The patch wasn't expected to fix all of these crashes. It just increases a
> > limit from 128MB to 256MB. The open bugs blocking this one address some of
> > the common remaining issues.
>
> I guess this is why I'm not able to download files > 256 MB from mega.nz
> without crashing.
> Do we know when we regressed this?
I haven't had any trouble with large file downloads. I've brought down ISOs from MSDN, for example, without issue. I'll check again to be sure. Can you post crash report IDs for this?
(In reply to K. Gadd (:kael) from comment #34)
> This bug makes Cleopatra (Gecko Profiler) cause reproducible tab crashes in
> many cases. It stops me from using the profiler. Should I file a bug against
> Cleopatra?
Yes, please do.
Comment 37•9 years ago
(In reply to Marco Castelluccio [:marco] from comment #35)
> I wonder if there's been a regression in the last few days, I remember being
> able to download the files a few days ago, but I can't anymore.
>
> kael, when did the crashes you're experiencing start to happen?
I hadn't used Cleopatra in months before today, sorry. If I don't have the GPU and Main Thread I/O options turned on, it doesn't crash, presumably because the profiles are smaller. And it seems to be intermittent, unfortunately. Bug 1294712 filed against it. Thanks.
Comment 38•9 years ago
(In reply to Jim Mathies [:jimm] from comment #36)
> I haven't had any trouble with large file downloads. I've brought down isos
> from msdn for example without issue. I'll check again to be sure. Can you
> post crash report ids for this?
I believe file downloads from mega.nz aren't "normal" downloads. They're stored in a blob first.
Here's a report: https://crash-stats.mozilla.com/report/index/e405c34e-e7fd-42fd-b30d-ef6a02160812.
Updated•9 years ago
Comment 39•8 years ago
Could we augment the mozilla::ipc::ProcessLink::SendMessage and mozilla::ipc::ProcessLink::SendMessageW signatures with the ipc_message_name?
It would give us a better matching between signatures and bugs blocking this bug.
Flags: needinfo?(ted)
Comment 40•8 years ago
Not if we don't have it available as a crash annotation, no. We'd have to add that first. Ting-Yu did something similar in bug 1301022, but I don't think it's exactly what you'd want here.
Flags: needinfo?(ted)
Comment 41•8 years ago
Marco pointed out that we did in fact add that annotation in bug 1274404. I don't see any reason why we couldn't use that for signatures. You'd want to file a Socorro bug on that.
Comment 42•8 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #41)
> You'd want to file a Socorro bug on that.
I've now filed bug 1271102 for that.
Comment 43•8 years ago
Bug 1306449 was fixed, so I've added the new signatures to the blocking bugs.
Comment 44•8 years ago
Note that we have a bunch of different IPC messages that could trigger this assert: https://crash-stats.mozilla.com/search/?ipc_message_name=!__null__&_facets=signature&_facets=ipc_message_name&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-ipc_message_name.
Only some of them are covered by bugs.
Comment 45•7 years ago
This was part of our memory reduction work for shipping e10s. Closing this meta out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 46•7 years ago
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open