Closed Bug 1337073 Opened 8 years ago Closed 8 years ago

Improve the IPC_SYNC_LATENCY_MS probe

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox53

---

fixed

firefox54

---

fixed

firefox55

---

fixed

People

(Reporter: ehsan.akhgari, Assigned: nika)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1337073 - Rename IPC_SYNC_LATENCY_MS to IPC_LINEAR_SYNC_LATENCY_MS and make it linear. 8 years ago Chris Pearce [:cpearce (Not reading bugmail)] 59 bytes, text/x-review-board-request		Details
Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS 8 years ago Nika Layzell [:nika] (ni? for response) 2.76 KB, patch	billm : review+ gchang : approval-mozilla-aurora+ gchang : approval-mozilla-beta+	Details \| Diff \| Splinter Review

(no longer active)

Reporter

Description

•

8 years ago

I have a difficult time interpreting this probe, and I think we can improve it a little bit. Firstly by not specifying a low: 1 value we always waste one bucket to hold 0 for 0% of the samples. Also the exponential bucketing makes it a pain to analyze these results. For example for PContent::Msg_GetGfxVars we know that 12% of samples had a 50-126ms pause, which can be 3 to 8 missed frames, which is quite a wide range. I suggest making the bucketing linear and increasing the number to 20. Chris, what do you think?

Flags: needinfo?(cpearce)

(no longer active)

Reporter

Updated

•

8 years ago

Blocks: 1333489

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 1

•

8 years ago

Sure. We'll need to delete the old probe and create a new one with a different name. Note that the documentation states: "low: Optional, the default value is 1. This field represents the minimum value expected in the histogram. Note that all histograms automatically get a bucket with label "0" for counting values below the "low" value." https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram So setting "low " to 1 won't help with the 0 bucket.

(no longer active)

Reporter

Comment 2

•

8 years ago

(In reply to Chris Pearce (:cpearce) from comment #1) > Sure. We'll need to delete the old probe and create a new one with a > different name. Sounds good. > Note that the documentation states: > > "low: Optional, the default value is 1. This field represents the minimum > value expected in the histogram. Note that all histograms automatically get > a bucket with label "0" for counting values below the "low" value." > > https://developer.mozilla.org/en-US/docs/Mozilla/Performance/ > Adding_a_new_Telemetry_probe#Declaring_a_Histogram > > So setting "low " to 1 won't help with the 0 bucket. Oh, OK, so ignore that part of my comment then!

Comment hidden (mozreview-request)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 4

•

8 years ago

It seems like this is still not going to be great. Won't the first box still be something like 1-99ms? Is that better than the 50-126ms that you were complaining about Ehsan?

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 5

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #4) > It seems like this is still not going to be great. Won't the first box still > be something like 1-99ms? Is that better than the 50-126ms that you were > complaining about Ehsan? No you're right, this won't be much better either. To me the key number that matters is 16ms, perhaps we should adjust our bucketing to get a similar size (or smaller) per bucket?

Flags: needinfo?(ehsan)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 6

•

8 years ago

The problem is that we're going to be sending a lot of these histograms back (one per message type), and I'd rather not have too much data. Pings cost money (in S3 storage capacity). It probably makes sense to think about what buckets you want and then work backwards from there.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 7

•

8 years ago

Perhaps we do in fact want an exponential histogram, but with more buckets so that we have more precision in the low end, and less in the high end? Is there a way we can predict the bucket sizes of an exponential? Or we could use an enumerated histogram, with manually defined bucket sizes?

Chris Pearce [:cpearce (Not reading bugmail)]

Updated

•

8 years ago

Flags: needinfo?(cpearce)

Attachment #8834174 - Flags: review?(wmccloskey)

Attachment #8834174 - Flags: review?(francois)

(no longer active)

Reporter

Comment 8

•

8 years ago

(In reply to Chris Pearce (:cpearce) from comment #7) > Perhaps we do in fact want an exponential histogram, but with more buckets > so that we have more precision in the low end, and less in the high end? But don't we want less precision in the low end? I mean, a sync IPC message that takes 10ms vs 20ms is nowhere near as bad as one that takes 110ms vs 120ms. (I know this isn't a good example, but most of these probes will probably have most of their values in the first bucket. I think the interesting probes are those who don't adhere to that pattern -- aka have more probes distributed elsewhere.) > Is there a way we can predict the bucket sizes of an exponential? > > Or we could use an enumerated histogram, with manually defined bucket sizes? That could be another option, yes.

Benjamin Smedberg

Comment 9

•

8 years ago

https://telemetry.mozilla.org/histogram-simulator/ lets you simulate and see buckets for various configs. Note that if you use something other than linear/exponential you'll lose the stddev/mean calculations. I don't know whether that matters in this case.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 10

•

8 years ago

Thanks Benjamin. I think we probably should be setting the "low" value to something higher. Maybe 32ms or 40ms. That way we still get a 0<=x<40 bucket. And if we use an exponential histogram, we get a lot of precision in the range 40ms-75ms. I also think we should use a lower "high" value. I doubt we care about the difference between, say, 750ms and 1250ms. Anything higher than 500ms or so should probably just count as "too high".

(no longer active)

Reporter

Updated

•

8 years ago

Assignee: nobody → michael

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 11

•

8 years ago

I'm not really sure what we want the probe to look like now, but we probably want to fold this probe change into bug 1343729. Ehsan, do you feel strongly about the linear vs exponential issue? I think we probably want to increase the number of buckets and potentially bring down the high value to get more resolution, but I'm not as confident about whether we want linear or exponential bucketing.

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 12

•

8 years ago

Hmm. I think Bill's suggestion in comment 10 is pretty reasonable, and I don't think I have a better idea off the top of my head right now, so let's proceed with that...

Flags: needinfo?(ehsan)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 13

•

8 years ago

Attached patch Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS — Details — Splinter Review

This patch applies on top of the patch from bug 1343729 and performs the rename for it. MozReview-Commit-ID: 7JB7h06wCzu

Attachment #8845167 - Flags: review?(wmccloskey)

Nika Layzell [:nika] (ni? for response)

Assignee

Updated

•

8 years ago

Attachment #8834174 - Attachment is obsolete: true

Bill McCloskey [inactive unless it's an emergency] (:billm)

Updated

•

8 years ago

Attachment #8845167 - Flags: review?(wmccloskey) → review+

Pulsebot

Comment 14

•

8 years ago

Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d2328234fe31 Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS, r=billm

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 15

•

8 years ago

Comment on attachment 8845167 [details] [diff] [review] Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Less useful sync IPC telemetry on beta/aurora [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Just landed. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: bug 1333489 must be uplifted to beta. Also depends on bug 1343729. [Is the change risky?]: No [Why is the change risky/not risky?]: Renames and updates parameters for a telemetry probe. [String changes made/needed]: None

Attachment #8845167 - Flags: approval-mozilla-beta?

Attachment #8845167 - Flags: approval-mozilla-aurora?

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

8 years ago

status-firefox53: --- → affected

status-firefox54: --- → affected

status-firefox55: --- → affected

Carsten Book [:Tomcat]

Comment 16

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d2328234fe31

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Gerry Chang [:gchang]

Comment 17

•

8 years ago

Comment on attachment 8845167 [details] [diff] [review] Improve the precision of the IPC_SYNC_LATENCY_MS probe and rename it to IPC_SYNC_MAIN_LATENCY_MS Improve precision of telemetry data. Beta53+ & Aurora54+.

Attachment #8845167 - Flags: approval-mozilla-beta?

Attachment #8845167 - Flags: approval-mozilla-beta+

Attachment #8845167 - Flags: approval-mozilla-aurora?

Attachment #8845167 - Flags: approval-mozilla-aurora+

Carsten Book [:Tomcat]

Comment 18

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/2024ba23cf5b

status-firefox54: affected → fixed

Ryan VanderMeulen [:RyanVM]

Comment 19

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/7db4d272a316

status-firefox53: affected → fixed

Andrei Vaida [:avaida]

Comment 20

•

8 years ago

Setting qe-verify- based on Michael's assessment on manual testing needs (see Comment 15).

Flags: qe-verify-

You need to log in before you can comment on or make changes to this bug.