Closed
Bug 1264820
Opened 9 years ago
Closed 9 years ago
Measure IPC reply size in telemetry
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: billm, Assigned: billm)
References
Details
(Whiteboard: btpp-active)
Attachments
(1 file)
2.03 KB,
patch
|
mccr8
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
I forgot to measure the size of the reply in bug 1260908.
Assignee | ||
Comment 1•9 years ago
|
||
I'll change this to capacity before landing. It's not in my tree right now.
Attachment #8741572 -
Flags: review?(continuation)
Comment 2•9 years ago
|
||
Comment on attachment 8741572 [details] [diff] [review]
patch
Review of attachment 8741572 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Histograms.json
@@ +10528,5 @@
> + "kind": "exponential",
> + "high": 8000000,
> + "n_buckets": 50,
> + "keyed": true,
> + "description": "Measures the size of IPC messages by message name"
"IPC reply messages" maybe?
Attachment #8741572 -
Flags: review?(continuation) → review+
Updated•9 years ago
|
Whiteboard: btpp-active
Updated•9 years ago
|
Blocks: 1262918
tracking-e10s:
--- → ?
Comment 3•9 years ago
|
||
Comment on attachment 8741572 [details] [diff] [review]
patch
Review of attachment 8741572 [details] [diff] [review]:
-----------------------------------------------------------------
::: ipc/glue/MessageChannel.cpp
@@ +1243,5 @@
> MOZ_RELEASE_ASSERT(reply->is_sync());
>
> *aReply = Move(*reply);
> + if (aReply->size() >= kMinTelemetryMessageSize) {
> + Telemetry::Accumulate(Telemetry::IPC_REPLY_SIZE, nsCString(aMsg->name()), aReply->size());
Can you make this nsDependentCString so that we're not allocating unnecessarily? Thanks!
Comment 5•9 years ago
|
||
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/051353392264
...for Werror build bustage:
> ipc/glue/MessageChannel.cpp:1247:27: error: comparison between signed
> and unsigned integer expressions [-Werror=sign-compare]
Here's a TreeHerder cycle showing the bustage:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=ae89c24b76920d2203353580091f4a9240b148af
Updated•9 years ago
|
Flags: needinfo?(wmccloskey)
Looks like you also caused some test failures: https://treeherder.mozilla.org/logviewer.html#?job_id=26259912&repo=mozilla-inbound
Comment 7•9 years ago
|
||
s/test failures/ASAN heap-use-after-free, in Linux64-ASAN e10s crashtest, jsreftest, & build/
(Not sure how that platform got past the build warning, actually -- maybe we build without warnings as errors in our ASAN builds for some reason.)
Anyway: probably worth doing a try run on that platform to ensure the ASAN issue is fixed before re-landing.
Comment 8•9 years ago
|
||
I think the Werror problem is because "if (aReply->size() >= kMinTelemetryMessageSize) {" should use ->capacity() instead. I assume it returns a different type. The UAF is because the telemetry is using aMsg->name() instead of aReply->name(). This is an argument for bug 1267326. Or at least, that would have prevented the UAF. I'll land the fix once I have tested a bit locally.
Updated•9 years ago
|
status-firefox46:
--- → wontfix
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
Comment 11•9 years ago
|
||
This should be low risk, so once it is on Nightly and I check that it is actually working with telemetry, it should get backported to Beta.
Comment 12•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Updated•9 years ago
|
Target Milestone: mozilla48 → mozilla49
Comment 13•9 years ago
|
||
One message that will be interesting to look at is the reply for ReadPrefsArray(). I found a crash where the message was being grown to 220MB and it results in an OOM crash: https://crash-stats.mozilla.com/report/index/01246d16-3c71-4bb1-a148-35b992160428
Comment 14•9 years ago
|
||
Telemetry data is starting to come in, but the message name is mostly "???". Not always, though! I can reproduce that locally. I'll look into why that is. It probably isn't the fault of this patch per se but it still makes the telemetry not useful.
Updated•9 years ago
|
Comment 15•9 years ago
|
||
We'll start to get names soon, but telemetry suggests this isn't a major source of large messages. There are around 22 thousand messages around 400kb or larger, compared to around a few million messages that size that are PBrowser::Msg_AsyncMessage in the same period.
Comment 16•9 years ago
|
||
Comment on attachment 8741572 [details] [diff] [review]
patch
Approval Request Comment
[Feature/regressing bug #]: e10s
[User impact if declined]: This collects data about one kind of IPC message, to help investigate e10s OOMs.
[Describe test coverage new/current, TreeHerder]: This runs frequently.
[Risks and why]: low, it just adds new telemetry
[String/UUID change made/needed]: none
Attachment #8741572 -
Flags: approval-mozilla-beta?
Attachment #8741572 -
Flags: approval-mozilla-aurora?
Comment on attachment 8741572 [details] [diff] [review]
patch
Telemetry data to investigate e10s OOMs, plus data was verified on Nightly, Beta47+, Aurora48+
Attachment #8741572 -
Flags: approval-mozilla-beta?
Attachment #8741572 -
Flags: approval-mozilla-beta+
Attachment #8741572 -
Flags: approval-mozilla-aurora?
Attachment #8741572 -
Flags: approval-mozilla-aurora+
Comment 18•9 years ago
|
||
bugherder uplift |
Comment 19•9 years ago
|
||
bugherder uplift |
Comment 20•9 years ago
|
||
One unfortunate thing that Bill and I noticed is that we don't have any telemetry for the reply to PContent::Msg_ReadPrefsArray, which as we know from bug 1268662 can sometimes be large enough to cause OOM crashes. In a local build, this is because TelemetryImpl::IsInitialized() is returning false, presumably because we set up prefs very early in the startup process. On the plus side, this seems to be the only message, reply or otherwise, that we don't record telemetry for, at least that I can see locally in a clean profile.
You need to log in
before you can comment on or make changes to this bug.
Description
•