Closed
Bug 1223780
Opened 9 years ago
Closed 8 years ago
[e10s][telemetry] regressions in EVENTLOOP_UI_LAG_EXP_MS (3 metrics: parent, children, parent + children)
Categories
(Firefox :: General, defect, P3)
RESOLVED
FIXED
People
(Reporter: jimm, Assigned: mconley)
References
(Blocks 1 open bug)
Details
https://github.com/vitillo/e10s_analyses/blob/master/aurora/e10s_experiment.ipynb

This is one of the metrics we should not regress.
Reporter
Comment 1•9 years ago
This is pending new telemetry data broken down by:
1) no addons
2) approved addons list
3) all addons

Bug 1223777 is similar. Vladan, can we make a request to get the experiment data broken down like this for telemetry-related regression bugs? Note there are other telemetry-related regression bugs besides this one and bug 1223777; we're still working through e10s-perf.
Flags: needinfo?(vladan.bugzilla)
Comment 2•9 years ago
Sure, whenever we report an e10s regression from Telemetry, we'll provide a separate report for each of those 3 configurations.

Jim: Which approved addons list are you referring to?

Roberto: can you & Birunthan update the e10s Telemetry regression bugs with numbers from the no-addons & at-least-one-addon variants of the analyses?
Flags: needinfo?(vladan.bugzilla) → needinfo?(rvitillo)
Comment 3•9 years ago
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #2)
> Sure, whenever we report an e10s regression from Telemetry, we'll provide a
> separate report for each of those 3 configurations.
>
> Jim: Which approved addons list are you referring to?

We could run the analysis on profiles that have only one (or a combination) of the top add-ons installed, but singling out the effect of individual add-ons for profiles with multiple add-ons installed might prove challenging; it might take some time to come up with a good model.

Analysis for profiles that have no extensions installed: [1]
Analysis for profiles with at least one extension installed: [2]
Analysis for profiles that have only Adblock Plus installed: [3]

[1] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_noext_experiment.ipynb
[2] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_onlyext_experiment.ipynb
[3] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_top_extension_experiment.ipynb
Flags: needinfo?(rvitillo)
Reporter
Comment 4•9 years ago
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #2)
> Sure, whenever we report an e10s regression from Telemetry, we'll provide a
> separate report for each of those 3 configurations.
>
> Jim: Which approved addons list are you referring to?

Here ya go - https://wiki.mozilla.org/Electrolysis/Addons

I would hope that most users who have addons installed have addons that fall into this set, since it's based on AMO's most popular among other criteria.
Flags: needinfo?(vladan.bugzilla)
Reporter
Comment 5•9 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> (In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #2)
> > Sure, whenever we report an e10s regression from Telemetry, we'll provide a
> > separate report for each of those 3 configurations.
> >
> > Jim: Which approved addons list are you referring to?
>
> We could run the analysis on profiles that have only one (or a combination)
> of the top add-ons installed but single out the effect of individual add-ons
> for profiles with multiple add-ons installed might prove challenging; it
> might take some time to come up with a good model.
>
> Analysis for profiles that have no extensions installed: [1]
> Analysis for profiles with at least one extensions installed: [2]
> Analysis for profiles that have only Adblock Plus installed: [3]
>
> [1] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_noext_experiment.ipynb
> [2] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_onlyext_experiment.ipynb
> [3] https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_top_extension_experiment.ipynb

The list I posted is the list we care about. Adblock Plus is certainly in it. Is there any way we can cross-match this list with a 'most popular' list via telemetry to see how in sync they are?
Reporter
Updated•9 years ago
Summary: [e10s][telemetry] regressions in EVENTLOOP_UI_LAG_EXP_MS → [e10s][telemetry] regressions in EVENTLOOP_UI_LAG_EXP_MS (parent and children)
Reporter
Updated•9 years ago
Summary: [e10s][telemetry] regressions in EVENTLOOP_UI_LAG_EXP_MS (parent and children) → [e10s][telemetry] regressions in EVENTLOOP_UI_LAG_EXP_MS (3 metrics: parent, children, parent + children)
Updated•9 years ago
Assignee: nobody → mconley
Priority: -- → P1
Updated•9 years ago
Blocks: e10s-measurement
Flags: needinfo?(vladan.bugzilla)
Comment 6•9 years ago
(In reply to Jim Mathies [:jimm] from comment #5)
> The list I posted is the list we care about. Adblock plus is certainly in
> it. Is there any way can cross match this list with a 'most popular' via
> telemetry to see how in sync they are?

Good suggestion. We should be able to do that easily.
Comment 7•9 years ago
The analyses mentioned in comment 3 have been updated. In particular, the Adblock Plus analysis was replaced with a whitelisted-addon experiment.

Profiles that have no extensions installed:
http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_no_extension_experiment.ipynb

Profiles that have at least one extension installed:
http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_with_extension_experiment.ipynb

Profiles that have at least one, and only whitelisted, extensions installed:
http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_whitelisted_extension_experiment.ipynb
Assignee
Comment 8•9 years ago
This is tricky: while I can see that there's a greater likelihood of UI lag in the e10s case with no add-ons installed, I have no data on where that lag is coming from.

My first strategy is to look at the evolution of the probe to see if there's an obvious place where the regression was introduced (that's not just when we flipped e10s on by default). Unfortunately, bug 1226337 prevents me from seeing that evolution.
Depends on: 1226337
Assignee
Comment 9•9 years ago
Unfortunately, I can no longer seem to access http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_no_extension_experiment.ipynb - I'm seeing:

Error 503 backend read error

backend read error
Guru Mediation:
Details: cache-lax1431-LAX 1449347668 2086734044

Varnish cache server

rvitillo, any idea what's up?
Flags: needinfo?(rvitillo)
Comment 10•9 years ago
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #9)
> Unfortunately, I can no longer seem to access
> http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_no_extension_experiment.ipynb - I'm seeing:
>
> Error 503 backend read error
>
> backend read error
> Guru Mediation:
> Details: cache-lax1431-LAX 1449347668 2086734044
>
> Varnish cache server
>
> rvitillo, any idea what's up?

There must be an issue with nbviewer, but you can still access the notebook directly from github, e.g. https://github.com/vitillo/e10s_analyses/blob/master/aurora/addons/e10s_no_extension_experiment.ipynb.
Flags: needinfo?(rvitillo)
Assignee
Comment 11•9 years ago
I talked with chutten and vladan during the work week, and I think we need more information here.

I _think_ what I want here is a measure of _how often_ (like, perhaps how many times per minute) we exceed a certain threshold on EVENTLOOP_UI_ACTIVITY_EXP_MS, and compare that between e10s and non-e10s.

chutten, am I remembering that correctly? If so, what is the threshold we should care about? Anything > 16ms? Or some other value that is considered our max time to service a system message?
Flags: needinfo?(chutten)
Comment 12•9 years ago
Generally, the threshold of user visibility is around 100ms. If the user sends us a discrete interaction event (click, scroll, key, whatever) and we service it (or appear to service it) within 100ms, the user perceives no jank. If the user sends us a continuous interaction event sequence (like touch scrolling), the threshold is much lower (can't remember it off-hand). This is probably why the BHR "hang" threshold is set around 100ms. (This is the hang duration after which we collect pseudo-stacks to try and pinpoint repeat offenders. Due to bucketing, it means we only really get the >255ms ones, which might be troublesome.)

The problem becomes: what's useful? For bug 1187069 I am developing an addon that reports when Gecko-main is hanging for > 127ms (that's that bucketing again) according to the threadHangStats measure. On e10s it pretty steadily ticks... but I perceive no jank. So I tested to see if the BHR hang monitor was borked by running a tight requestAnimationFrame loop and reporting when it wasn't being serviced for > 127ms. They line up almost perfectly across a couple-hundred samples (< 1% difference in count). (This is all parent-process measurement.)

I installed the addon in an e10s-disabled profile, and it doesn't tick over nearly as quickly. So the measure appears to be correct, yet as a user I don't really care. It doesn't _feel_ bad.

tl;dr - It depends what you're trying to measure. If you want a measurement of user-visible jank, I don't think we have one. If you want to know whether e10s events are being serviced more slowly in the parent process: they are. If you want to know why... well, I'm afraid I don't know. Hopefully this helps?
Flags: needinfo?(chutten)
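[Editor's note] The cross-check described in comment 12 (a tight polling loop that flags any tick arriving more than 127ms late) can be sketched generically. This is a Python illustration of the gap-polling technique, not the actual statuser addon code; the 16ms cadence is an assumption and the simulated workload is invented:

```python
import time

def measure_gaps(ticks, tick_ms=16, threshold_ms=127):
    """Gap-polling hang detector: run work at a fixed cadence and flag any
    tick-to-tick gap above threshold_ms, meaning the loop was blocked."""
    hangs = []
    last = time.monotonic()
    for work in ticks:
        work()                        # simulated main-thread work for this tick
        time.sleep(tick_ms / 1000)    # the timer's nominal cadence
        now = time.monotonic()
        gap_ms = (now - last) * 1000
        if gap_ms > threshold_ms:     # tick arrived late: record the hang
            hangs.append(round(gap_ms))
        last = now
    return hangs

# Ten cheap ticks plus one tick that blocks for ~200 ms:
ticks = [lambda: None] * 10 + [lambda: time.sleep(0.2)]
print(measure_gaps(ticks))  # one gap past the 127 ms threshold, e.g. [216]
```

A real hang monitor would run the detector off-thread or interleave it with the event loop, but the counting logic is the same.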
Comment 13•9 years ago
Here's the extension: https://github.com/chutten/statuser
Assignee
Comment 14•9 years ago
I'm able to reproduce this behaviour on my Windows Surface laptop. What's particularly frustrating is that the behaviour goes away as soon as I start profiling. Lovely.
Assignee
Comment 15•8 years ago
We fixed the issue that chutten and I were seeing in Mozlando in bug 1234618.

Looking at new experiment data in https://github.com/vitillo/e10s_analyses/blob/master/beta/addons/e10s_without_addons_experiment.ipynb, at first glance it looks like the EVENTLOOP_UI_ACTIVITY_EXP_MS readings are doing better with e10s than without - but vladan wants me to check the >50ms buckets (since those are basically being stomped out by the massive buckets at 0ms and 1ms).

rvitillo - I tried to be awesome like you and use a Spark instance to run an analysis to get a histogram of 50ms and higher for that probe, but I got nowhere (check the scrollback in #perf if you want the nitty-gritty). I'm planning on taking the Telemetry onboarding session at some point, but in the meantime, would it be possible for you to put together the histogram for me, or at least guide me through doing it, since I can't seem to do it on my own?
Flags: needinfo?(rvitillo)
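[Editor's note] The ">50ms buckets" check from comment 15 can be sketched on a toy histogram. Telemetry exponential histograms are modeled here as a mapping from each bucket's lower bound (in ms) to its sample count; the bucket boundaries and counts below are invented for illustration, and the real analysis would read EVENTLOOP_UI_ACTIVITY_EXP_MS histograms out of Spark rather than literals:

```python
def counts_at_or_above(histogram, cutoff_ms):
    """Total samples in buckets whose lower bound is at or above cutoff_ms."""
    return sum(n for lo, n in histogram.items() if lo >= cutoff_ms)

def tail_fraction(histogram, cutoff_ms):
    """Share of all samples in the >= cutoff_ms buckets, so the huge 0 ms and
    1 ms buckets no longer drown out the interesting tail."""
    total = sum(histogram.values())
    return counts_at_or_above(histogram, cutoff_ms) / total if total else 0.0

# Invented bucket data: {bucket_lower_bound_ms: sample_count}
e10s     = {0: 90_000, 1: 40_000, 16: 5_000, 50: 700, 127: 120, 512: 10}
non_e10s = {0: 80_000, 1: 45_000, 16: 6_000, 50: 900, 127: 200, 512: 25}

print(f"e10s tail fraction:     {tail_fraction(e10s, 50):.5f}")
print(f"non-e10s tail fraction: {tail_fraction(non_e10s, 50):.5f}")
```

Comparing tail fractions rather than raw counts also controls for the two populations having different total sample sizes.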
Comment 16•8 years ago
See http://nbviewer.ipython.org/gist/vitillo/30643f4de8c7fc33c372.
Flags: needinfo?(rvitillo)
Assignee
Comment 17•8 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #16)
> See http://nbviewer.ipython.org/gist/vitillo/30643f4de8c7fc33c372.

Thanks rvitillo!

vladan - is comment 16 enough to close this bug out for you?
Flags: needinfo?(vladan.bugzilla)
Comment 18•8 years ago
I would wait; EVENTLOOP_UI_ACTIVITY_EXP_MS doesn't report from child processes yet (bug 1235908).
Depends on: 1235908
Flags: needinfo?(vladan.bugzilla)
Assignee
Comment 19•8 years ago
Just so we're clear, the responsiveness of the _child process_ as reported by Telemetry is part of the e10s release criteria?
Flags: needinfo?(vladan.bugzilla)
Comment 20•8 years ago
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #19)
> Just so we're clear, the responsiveness of the _child process_ as reported
> by Telemetry is part of the e10s release criteria?

IMHO, the right way to think about this is to compare overall browser responsiveness in the e10s & non-e10s configurations. For the e10s configuration:

overall e10s jank rate = jank_rate_of_parent + jank_rate_of_child
Flags: needinfo?(vladan.bugzilla)
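[Editor's note] That additive model from comment 20 can be illustrated with a toy calculation. The lag samples below are hypothetical, and the ~100ms visibility threshold comes from comment 12; none of these numbers are real Telemetry data:

```python
def jank_events(lag_samples_ms, threshold_ms=100):
    """Count event-loop lags past the user-visibility threshold (~100 ms)."""
    return sum(1 for lag in lag_samples_ms if lag > threshold_ms)

# Hypothetical per-process lag samples from equal-length sessions. Under the
# additive model, a hang in either process counts against e10s overall.
parent_lags = [40, 130, 90, 250]        # e10s parent-process lags, ms
child_lags  = [60, 110, 500, 80, 105]   # e10s content-process lags, ms
single_lags = [45, 140, 95, 260, 115]   # non-e10s single-process lags, ms

e10s_overall = jank_events(parent_lags) + jank_events(child_lags)
non_e10s     = jank_events(single_lags)
print(e10s_overall, non_e10s)  # 5 3
```

The comparison only makes sense when the parent and child samples cover the same session length, which is why rates (hangs per unit time) rather than raw counts are what the model adds.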
Assignee
Updated•8 years ago
Priority: P1 → P3
Updated•8 years ago
Blocks: e10s-responsiveness
Comment 21•8 years ago
Well, here's a run of jankiness as measured by INPUT_EVENT_RESPONSE_MS, which measures the time from input event creation (by the OS) to the point when we're done handling the event in the reporting process (the plots graph only the values above the 100ms jank threshold): https://gist.github.com/chutten/9b9e29df10e0f7306f99

My casual interpretation is that the numbers are no different: some are higher, some are lower, with no clear pattern.

This is operating on preliminary data, so things might change after we account for early adopters being over-represented in the sample population.
Comment 22•8 years ago
I have now rerun the analysis on the entire beta45-experiment1 dataset, and the graphs look little different. To make triple sure, I'll rerun on beta45-experiment2 (bug 1244187) and reopen this if necessary.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED