Closed Bug 1198196 Opened 4 years ago Closed 4 years ago

[e10s] Fix EVENTLOOP_UI_LAG_EXP_MS to report all event times, not just >50ms

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
e10s + ---
firefox43 + fixed
firefox44 --- fixed
firefox45 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: jimm, Assigned: chutten|PTO)

References

Details

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1198188 +++

Part of the e10s release criteria work.
tracking-e10s: --- → +
Blocks: 1222894
Tracking this since we need to land it in 43 to get the telemetry experiment up and running.
Blocks: 1198650
Assignee: nobody → chutten
I tested to see how much a single Telemetry::Accumulate costs in time by accumulating into a PRIntervalTime across the new unconditional Accumulate call in question (EVENTLOOP_UI_ACTIVITY_EXP_MS). I logged this out via Telemetry to get a histogram of how many millis every 100 calls took, but kept getting 0ms.

So I tried 1000. No 1000 accumulate calls took more than 4ms to run, which means the realtime cost for that Accumulate call is at most 4us. I'm still getting 0ms reports, so the value is usually lower. (on my machine, using my build, on this day, <rest of the usual perf-measuring caveats>)

I'm thinking this is probably an okay amount of overhead. However, looking at the collected data of said unconditional `Accumulate`s, at least 75% and up to 89.7% of the values are 0. So if we wanted to save some cycles, we could make the minimum 1.
Redid the measurement with microsecond resolution and its results bear out the previous tests. Math.

Going to keep the 0 samples, even though they're a little overwhelming, because we need them to compare populations.
Attachment #8686257 - Flags: review?(vladan.bugzilla)
Comment on attachment 8686257 [details] [diff] [review]
0001-bug-1198196-rework-EVENTLOOP_UI_LAG_EXP_MS-to-record.patch

Review of attachment 8686257 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +4248,1 @@
>      "alert_emails": ["perf-telemetry-alerts@mozilla.com"],

add the new bug_numbers field

@@ +4248,2 @@
>      "alert_emails": ["perf-telemetry-alerts@mozilla.com"],
>      "expires_in_version": "never",

can you file a follow up bug to make this histogram opt-out on release? we don't need it yet, but we will soonish. mark it a blocker of e10s-measurement

@@ +4251,2 @@
>      "high": "60000",
>      "n_buckets": 20,

make it an even 50 buckets?

@@ +4251,3 @@
>      "high": "60000",
>      "n_buckets": 20,
>      "extended_statistics_ok": true,

we don't need the extended stats, take this out
Attachment #8686257 - Flags: review?(vladan.bugzilla) → review+
any objections?
Flags: needinfo?(rvitillo)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #6)
> any objections?

No.
Flags: needinfo?(rvitillo)
Attachment #8686257 - Attachment is obsolete: true
Comment on attachment 8686264 [details] [diff] [review]
0001-bug-1198196-rework-EVENTLOOP_UI_LAG_EXP_MS-to-record.patch

Approval Request Comment
[Feature/regressing bug #]: N/A (new request)
[User impact if declined]: Without this, telemetry measures of UI lag will be blind below 50ms. Different plots will be uncomparable as they are specifically discarding some (unknown amounts of) data.
[Describe test coverage new/current, TreeHerder]: 
TreeHerder shows no regression.
Perfherder shows that the talos measures are all within variance.
I also concducted a manual test collecting stopwatch timings of the now-unconditional Accumulate call. It adds at most 4us to ui events on the main thread, but more often less than that.
[Risks and why]: Adds cycles to UI events on the main thread, which could slow things down in one of the worst possible places. Worth it for the data provided.
[String/UUID change made/needed]: None
Attachment #8686264 - Flags: approval-mozilla-beta?
Attachment #8686264 - Flags: approval-mozilla-aurora?
Given the risk, I'd like to stabilize this patch on Nightly for a week or even more before considering the uplift to Aurora.
sorry had to back this out since something in the push caused https://treeherder.mozilla.org/logviewer.html#?job_id=5712409&repo=fx-team so backing out in case this change caused this
Flags: needinfo?(chutten)
Probably wasn't my patch, as it got that test right: https://treeherder.mozilla.org/logviewer.html#?job_id=13649451&repo=try
Flags: needinfo?(chutten)
Keywords: checkin-needed
(In reply to Ritu Kothari (:ritu) from comment #9)
> Given the risk, I'd like to stabilize this patch on Nightly for a week or
> even more before considering the uplift to Aurora.

Chris's description of the risk is overstated. Chris measured the perfomrance impact and as expected found no issue. The Background Hang Reporter functionality already present on Aurora/Beta/Release does this exact same thing & additional work (with no performance impact).

I'd like to get this histogram recording change landed ASAP so it can go into Beta 4 and then we can use it a Telemetry Experiment next week.
Flags: needinfo?(rkothari)
I think we do want to take this for beta and aurora, let's get it landed on m-c first though. Wes can you help out?
Flags: needinfo?(wkocher)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #15)
> (In reply to Ritu Kothari (:ritu) from comment #9)
> > Given the risk, I'd like to stabilize this patch on Nightly for a week or
> > even more before considering the uplift to Aurora.
> 
> Chris's description of the risk is overstated. Chris measured the
> perfomrance impact and as expected found no issue. The Background Hang
> Reporter functionality already present on Aurora/Beta/Release does this
> exact same thing & additional work (with no performance impact).
> 
> I'd like to get this histogram recording change landed ASAP so it can go
> into Beta 4 and then we can use it a Telemetry Experiment next week.

Looks like there is a time pressure on this one to land on Beta and I can take it in Aurora sooner if it helps. Let's hope that it will not cause any disruptions.
Flags: needinfo?(rkothari)
Comment on attachment 8686264 [details] [diff] [review]
0001-bug-1198196-rework-EVENTLOOP_UI_LAG_EXP_MS-to-record.patch

Approved for uplift to Aurora, see my previous comment about my justification.
Attachment #8686264 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/94334352f9fa
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Comment on attachment 8686264 [details] [diff] [review]
0001-bug-1198196-rework-EVENTLOOP_UI_LAG_EXP_MS-to-record.patch

Approved for uplift to beta, aiming for beta 4 gtb on Monday.
Attachment #8686264 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(wkocher)
(In reply to Vladan Djeric (:vladan) -- please needinfo! PTO Nov 16/19/20 from comment #22)
> Aurora & beta don't have the bug_numbers field yet:
> 
> https://hg.mozilla.org/releases/mozilla-aurora/rev/f23bb8c457e0
> https://hg.mozilla.org/releases/mozilla-beta/rev/ec4b13420b71

setting the flags for this checkin
Blocks: 1235908
You need to log in before you can comment on or make changes to this bug.