Closed Bug 1337073 Opened 4 years ago Closed 3 years ago
Improve the IPC
_SYNC _LATENCY _MS probe
I have a difficult time interpreting this probe, and I think we can improve it a little bit. Firstly by not specifying a low: 1 value we always waste one bucket to hold 0 for 0% of the samples. Also the exponential bucketing makes it a pain to analyze these results. For example for PContent::Msg_GetGfxVars we know that 12% of samples had a 50-126ms pause, which can be 3 to 8 missed frames, which is quite a wide range. I suggest making the bucketing linear and increasing the number to 20. Chris, what do you think?
Sure. We'll need to delete the old probe and create a new one with a different name. Note that the documentation states: "low: Optional, the default value is 1. This field represents the minimum value expected in the histogram. Note that all histograms automatically get a bucket with label "0" for counting values below the "low" value." https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram So setting "low " to 1 won't help with the 0 bucket.
(In reply to Chris Pearce (:cpearce) from comment #1) > Sure. We'll need to delete the old probe and create a new one with a > different name. Sounds good. > Note that the documentation states: > > "low: Optional, the default value is 1. This field represents the minimum > value expected in the histogram. Note that all histograms automatically get > a bucket with label "0" for counting values below the "low" value." > > https://developer.mozilla.org/en-US/docs/Mozilla/Performance/ > Adding_a_new_Telemetry_probe#Declaring_a_Histogram > > So setting "low " to 1 won't help with the 0 bucket. Oh, OK, so ignore that part of my comment then!
It seems like this is still not going to be great. Won't the first box still be something like 1-99ms? Is that better than the 50-126ms that you were complaining about Ehsan?
(In reply to Bill McCloskey (:billm) from comment #4) > It seems like this is still not going to be great. Won't the first box still > be something like 1-99ms? Is that better than the 50-126ms that you were > complaining about Ehsan? No you're right, this won't be much better either. To me the key number that matters is 16ms, perhaps we should adjust our bucketing to get a similar size (or smaller) per bucket?
The problem is that we're going to be sending a lot of these histograms back (one per message type), and I'd rather not have too much data. Pings cost money (in S3 storage capacity). It probably makes sense to think about what buckets you want and then work backwards from there.
Perhaps we do in fact want an exponential histogram, but with more buckets so that we have more precision in the low end, and less in the high end? Is there a way we can predict the bucket sizes of an exponential? Or we could use an enumerated histogram, with manually defined bucket sizes?
(In reply to Chris Pearce (:cpearce) from comment #7) > Perhaps we do in fact want an exponential histogram, but with more buckets > so that we have more precision in the low end, and less in the high end? But don't we want less precision in the low end? I mean, a sync IPC message that takes 10ms vs 20ms is nowhere near as bad as one that takes 110ms vs 120ms. (I know this isn't a good example, but most of these probes will probably have most of their values in the first bucket. I think the interesting probes are those who don't adhere to that pattern -- aka have more probes distributed elsewhere.) > Is there a way we can predict the bucket sizes of an exponential? > > Or we could use an enumerated histogram, with manually defined bucket sizes? That could be another option, yes.
https://telemetry.mozilla.org/histogram-simulator/ lets you simulate and see buckets for various configs. Note that if you use something other than linear/exponential you'll lose the stddev/mean calculations. I don't know whether that matters in this case.
Thanks Benjamin. I think we probably should be setting the "low" value to something higher. Maybe 32ms or 40ms. That way we still get a 0<=x<40 bucket. And if we use an exponential histogram, we get a lot of precision in the range 40ms-75ms. I also think we should use a lower "high" value. I doubt we care about the difference between, say, 750ms and 1250ms. Anything higher than 500ms or so should probably just count as "too high".
I'm not really sure what we want the probe to look like now, but we probably want to fold this probe change into bug 1343729. Ehsan, do you feel strongly about the linear vs exponential issue? I think we probably want to increase the number of buckets and potentially bring down the high value to get more resolution, but I'm not as confident about whether we want linear or exponential bucketing.
Hmm. I think Bill's suggestion in comment 10 is pretty reasonable, and I don't think I have a better idea off the top of my head right now, so let's proceed with that...
This patch applies on top of the patch from bug 1343729 and performs the rename for it. MozReview-Commit-ID: 7JB7h06wCzu
Attachment #8845167 - Flags: review?(wmccloskey)
Attachment #8834174 - Attachment is obsolete: true
Attachment #8845167 - Flags: review?(wmccloskey) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/d2328234fe31 Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS, r=billm
Comment on attachment 8845167 [details] [diff] [review] Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Less useful sync IPC telemetry on beta/aurora [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Just landed. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: bug 1333489 must be uplifted to beta. Also depends on bug 1343729. [Is the change risky?]: No [Why is the change risky/not risky?]: Renames and updates parameters for a telemetry probe. [String changes made/needed]: None
Comment on attachment 8845167 [details] [diff] [review] Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS Improve precision of telemetry data. Beta53+ & Aurora54+.
Setting qe-verify- based on Michael's assessment on manual testing needs (see Comment 15).
You need to log in before you can comment on or make changes to this bug.