Closed
Bug 1260908
Opened 9 years ago
Closed 9 years ago
Record message sizes in telemetry
Categories
(Core :: IPC, defect, P1)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: billm, Assigned: billm)
References
Details
Attachments
(4 files, 2 obsolete files)
5.71 KB,
patch
|
dvander
:
review+
ritu
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
2.35 KB,
patch
|
gfritzsche
:
review+
ritu
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
1.11 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
2.72 KB,
patch
|
billm
:
review+
ritu
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
The goal here is to find any IPC messages that are really big. Maybe we could break them into smaller pieces so that we don't need contiguous memory for them.
Assignee | ||
Comment 1•9 years ago
|
||
I'm trying to call Accumulate for a keyed histogram on a path that sometimes runs before telemetry has started up. Right now this asserts because the keyed histograms haven't been initialized. This patch just allows us to skip accumulating in that case.
Attachment #8736494 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 2•9 years ago
|
||
This patch records the size of normal IPC messages and message manager messages in keyed histograms. I set a lower bound of 8K to avoid recording too much data.
Attachment #8736496 -
Flags: review?(dvander)
Comment 3•9 years ago
|
||
Comment on attachment 8736494 [details] [diff] [review]
Allow keyed histograms to accumulate early in startup
Review of attachment 8736494 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Telemetry.cpp
@@ +3902,5 @@
> void
> Accumulate(ID aID, const nsCString& aKey, uint32_t aSample)
> {
> if (!TelemetryImpl::CanRecordBase()) {
> return;
I assume this is happening with sTelemetry==nullptr?
If so, let's just add this check here instead.
Attachment #8736494 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 4•9 years ago
|
||
That field is private. Hopefully this is okay.
Attachment #8736494 -
Attachment is obsolete: true
Attachment #8736778 -
Flags: review?(gfritzsche)
Comment 5•9 years ago
|
||
Comment on attachment 8736778 [details] [diff] [review]
Allow keyed histograms to accumulate early in startup, v2
Review of attachment 8736778 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, this looks fine if we can restore the assert.
::: toolkit/components/telemetry/Telemetry.cpp
@@ +3913,5 @@
> return;
> }
> const TelemetryHistogram& th = gHistograms[aID];
> KeyedHistogram* keyed = TelemetryImpl::GetKeyedHistogramById(nsDependentCString(th.id()));
> + if (keyed) {
Now we should be able to restore the MOZ_ASSERT() as before.
Attachment #8736778 -
Flags: review?(gfritzsche) → review+
Attachment #8736496 -
Flags: review?(dvander) → review+
Backed out for build bustage: https://hg.mozilla.org/integration/mozilla-inbound/rev/37ee5a42ddb5
https://treeherder.mozilla.org/logviewer.html#?job_id=24993518&repo=mozilla-inbound
Flags: needinfo?(wmccloskey)
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(wmccloskey)
Comment 9•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8d4ce12bda44
https://hg.mozilla.org/mozilla-central/rev/c8eea76ef144
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment 10•9 years ago
|
||
Some data has come in from this now: http://mzl.la/1XfxYN1
Nothing too surprising in there. Something like 0.02% of samples for PHttpChannel::Msg_OnTransportAndData are around 1MB or so. I don't know if that would be enough to kick up our OOM rates.
Comment 11•9 years ago
|
||
In case you are looking at the telemetry, it is just showing the first four messages. It isn't in any particular order. For instance, PBrowser::Msg_AsyncMessage has 1.2% of recorded samples between 2.21MB and 3MB...
Assignee | ||
Comment 12•9 years ago
|
||
Comment on attachment 8736496 [details] [diff] [review]
Record message sizes
We'd like to gather this telemetry to understand some OOM crashes we're seeing in the e10s beta experiment.
Approval Request Comment
[Feature/regressing bug #]: no bug
[User impact if declined]: none
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: very low risk, just telemetry
[String/UUID change made/needed]: none
Attachment #8736496 -
Flags: approval-mozilla-beta?
Attachment #8736496 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8736778 [details] [diff] [review]
Allow keyed histograms to accumulate early in startup, v2
Same as above.
Attachment #8736778 -
Flags: approval-mozilla-beta?
Attachment #8736778 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 14•9 years ago
|
||
Here's the data we have so far:
https://gist.github.com/bill-mccloskey/65b1115556dbbacdf020f749e4b3dd98
Here are all the IPC messages where we've recorded data of over 1 MB (with the max size to the right):
(8000000, u'PStorage::Msg_LoadItem'),
(8000000, u'PContent::Msg_AsyncMessage'),
(8000000, u'PBrowser::Msg_AsyncMessage'),
(8000000, u'PBackgroundIDBTransaction::Msg_PBackgroundIDBRequestConstructor'),
(8000000, u'PBackgroundIDBRequest::Msg___delete__'),
(8000000, u'PBackgroundIDBCursor::Msg_Response'),
(5802597, u'PHttpChannel::Msg_OnTransportAndData'),
(4208766, u'PStorage::Msg_OriginsHavingData'),
(4208766, u'PJavaScript::Msg_CallOrConstruct'),
(3052721, u'PWebSocket::Msg_OnMessageAvailable'),
(3052721, u'PContent::Msg_InvokeDragSession'),
(1164888, u'PContent::Msg_LoadAndRegisterSheet')
Here are the message manager messages:
(8000000, u'sdk/remote/process/message'),
(8000000, u'SessionStore:update'),
(8000000, u'SessionStore:restoreHistory'),
(4208766, u'debug:netmonitor:*.*.*/*:updateEvent'),
(4208766, u'AdblockPlus:Message'),
(3052721, u'wombat::core::response'),
(3052721, u'iescreenshot-download'),
(3052721, u'greasemonkey:scripts-update'),
(1164888, u'ublock0:sb:2236'),
(1164888, u'ublock0:sb:2235'),
(1164888, u'resource://kango-unity-extensionkeeebcom/message'),
(1164888, u'imagus:sb:398'),
(1164888, u'imagus:sb:175'),
(1164888, u'imagus:sb:127')
I had to collapse together a bunch of debug:netmonitor messages.
Assignee | ||
Comment 15•9 years ago
|
||
Andrew, would you mind filing bugs to investigate whether we can improve some of these? Aside from session store, I'm not too familiar with any of this code.
Flags: needinfo?(continuation)
Assignee | ||
Comment 16•9 years ago
|
||
Actually, this data is for the parent process only. I'll work on child process data.
Comment 17•9 years ago
|
||
Sure, I can look over them and figure out which ones are more frequent. I've already filed one for SessionStorage:update (bug 1262661). I think it makes sense to take into account how many of these large messages of each size there are when deciding what to fix. Some of the messages are around 1000 times more common than other ones.
Flags: needinfo?(continuation)
status-firefox46:
--- → affected
status-firefox47:
--- → affected
Comment on attachment 8736496 [details] [diff] [review]
Record message sizes
Gather more telemetry data to investigate e10s OOM crashes.
Attachment #8736496 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8736778 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 19•9 years ago
|
||
bugherder uplift |
Comment 20•9 years ago
|
||
Comment 21•9 years ago
|
||
Comment on attachment 8736496 [details] [diff] [review]
Record message sizes
Too late for the 46 e10s experiment.
Attachment #8736496 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•9 years ago
|
Attachment #8736778 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment 22•9 years ago
|
||
I'm working on a fix. I have no idea why this works on trunk and fails on Aurora, but the fix looks easy enough.
tracking-e10s:
--- → ?
Comment 23•9 years ago
|
||
Might as well fix this on trunk, too.
Olli, this is a tiny patch. Anybody could basically review it.
StructuredCloneData::DataLength() has type size_t, so change the constant to also be a constant. The constant in MessageChannel can stay as an int, because it is compared to Pickle::size(), which returns an int for some reason.
Attachment #8741360 -
Flags: review?(bugs)
Comment 24•9 years ago
|
||
> so change the constant to also be a constant.
This should be "so change the constant to also be a size_t".
Updated•9 years ago
|
Comment 25•9 years ago
|
||
One drawback of this patch that I noticed is that it measures the size() of the data in the message, not the capacity(), so it isn't measuring the full memory impact of these messages, and will miss improvements from bugs like bug 1263235 and bug 1263953. I'll file a followup for that.
Comment 26•9 years ago
|
||
Comment on attachment 8741360 [details] [diff] [review]
Fix type of kMinTelemetryMessageSize.
I guess I can just fix this as part of changing to capacity().
Attachment #8741360 -
Flags: review?(bugs)
Comment 27•9 years ago
|
||
Comment on attachment 8741360 [details] [diff] [review]
Fix type of kMinTelemetryMessageSize.
Never mind, that's separate. Sorry for the noise!
Attachment #8741360 -
Flags: review?(bugs)
Updated•9 years ago
|
Attachment #8741360 -
Flags: review?(bugs) → review+
Comment 28•9 years ago
|
||
Capacity includes internal fragmentation, while size does not.
This requires making capacity() public, but that seems benign.
Attachment #8741386 -
Flags: review?(wmccloskey)
Comment 29•9 years ago
|
||
Comment on attachment 8741386 [details] [diff] [review]
Record IPC message capacity instead of size.
Wrong bug, sorry.
Attachment #8741386 -
Attachment is obsolete: true
Attachment #8741386 -
Flags: review?(wmccloskey)
Comment 30•9 years ago
|
||
Ryan, could you land this on Aurora, with my additional patch folded into Bill's second patch? I'm sure it will fix the issue. Thanks.
Flags: needinfo?(ryanvm)
Comment 31•9 years ago
|
||
![]() |
||
Updated•9 years ago
|
Priority: -- → P1
Comment 32•9 years ago
|
||
bugherder uplift |
Updated•9 years ago
|
Flags: needinfo?(ryanvm)
Comment 33•9 years ago
|
||
bugherder |
Comment 34•9 years ago
|
||
I think we should stop recording MESSAGE_MANAGER_MESSAGE_SIZE on beta so we don't spam the telemetry when it goes to release (bug 1274442). This telemetry has already been disabled, and just stopping to send this seems lower risk than trying to fix it in place via bug 1275707. We're late enough in beta that there's not too much value to be gotten from collecting this telemetry.
Attachment #8756942 -
Flags: review?(wmccloskey)
Assignee | ||
Updated•9 years ago
|
Attachment #8756942 -
Flags: review?(wmccloskey) → review+
Comment 35•9 years ago
|
||
Comment on attachment 8756942 [details] [diff] [review]
Don't record MESSAGE_MANAGER_MESSAGE_SIZE on beta.
Thanks for the fast review.
Approval Request Comment
[Feature/regressing bug #]: bug 1260908
[User impact if declined]: This is causing some issues with telemetry, so I want to just remove it before we release.
[Describe test coverage new/current, TreeHerder]: this code runs very frequently
[Risks and why]: Very low. This just stops collecting one kind of telemetry by backing out part of a patch. I'm planning to fix this issue outside of beta in bug 1275707, but disabling the telemetry is lower risk than changing slightly how it is collected.
[String/UUID change made/needed]: none
Attachment #8756942 -
Flags: approval-mozilla-beta?
Comment on attachment 8756942 [details] [diff] [review]
Don't record MESSAGE_MANAGER_MESSAGE_SIZE on beta.
At this point in time, only fixes that might lead to a dot release (like critical regressions, stability or security issues) are being considered for uplift. 47.0b9 build is already in progress, RC1 gtb is Monday. I would prefer to not uplift this fix but let it ride the 48 train. Andrew is ok with that.
Attachment #8756942 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment 37•9 years ago
|
||
My impression is that this is only causing issues for telemetry in the website frontend side of things, and they've worked around that already so it isn't a huge deal to just leave this alone.
Comment 38•9 years ago
|
||
There was actually a patch landed in 47, it is just a partial backout that won't be landed there.
You need to log in
before you can comment on or make changes to this bug.
Description
•