Open Bug 1260769 Opened 9 years ago Updated 2 years ago

[e10s] SLOW_SCRIPT_PAGE_COUNT regressed under e10s

Categories: Firefox :: General, defect, P3
Version: 46 Branch

Tracking Status:
e10s + ---
firefox46 --- disabled
firefox47 --- disabled
firefox48 --- fix-optional
firefox49 --- fix-optional

Reporter: azhang
Assignee: Unassigned
References: Depends on 1 open bug
Keywords: regression
Attachments: 2 files

The data from the e10s-beta46-noapz experiment shows that the SLOW_SCRIPT_PAGE_COUNT histogram has a median of 0.47 slow scripts/hour on e10s and 0.36 slow scripts/hour on non-e10s [1]. This is a regression of 0.11 slow scripts/hour. In fact, slow scripts seem to be somewhat more common on e10s for most users [2]. More details can be found at [3].

[1]: https://github.com/vitillo/e10s_analyses/blob/master/beta46-noapz/e10s_experiment.ipynb
[2]: https://gist.github.com/Uberi/64be9b7946849fa45e50
[3]: https://wiki.mozilla.org/Electrolysis/Release_Criteria/Slow_Script
Assignee: nobody → jmathies
Priority: -- → P1
Wontfix for 46 since the beta experiment will end tomorrow with beta 8. While we aren't sure if 47/48 are affected, it seems likely.
Jim, what are the next steps for understanding these slow script regressions? Do we need to add more detailed telemetry probes for our Beta 47 testing? We have telemetry data but no STR or actionable bugs.
Flags: needinfo?(jmathies)
Jan, can you look into this?
Flags: needinfo?(jmathies) → needinfo?(jdemooij)
I don't think this is a SpiderMonkey issue. The JS engine itself behaves pretty much the same with or without e10s, in terms of performance. It seems more likely that some IPC/DOM/graphics/plugin code is slowing us down somewhere with e10s, just enough to push us over the slow script threshold. I'm more than willing to help out with e10s issues, but our e10s experts probably have a better idea where to look or what to measure next.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(jmathies)
Flags: needinfo?(jdemooij)
I hear nobody has an idea why this is happening. I'll think about it.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(jmathies)
Flags: needinfo?(jdemooij)
One thing I noticed: the non-e10s slow script dialog has a "Don't show this again" checkbox, which sets the dom.max_script_run_time pref to 0. Those users will never see the dialog again (and we won't collect telemetry from them). The e10s dialog only has "Stop it" and "Wait" buttons, so it is at a disadvantage.
Filed bug 1264647 to add that checkbox.
Not sure adding the checkbox makes it a fair comparison, though:

* For e10s we use a different pref (max_child_script_run_time), so if the user clicked "Don't show this again" *before* the e10s experiment started, the results will be skewed: with e10s enabled they will now see the dialog again, and with e10s disabled they won't. We could fix this by removing the separate max_child_script_run_time pref and always using max_script_run_time (current behavior is sketched below).
* The e10s slow script notification has a very different UI: it's nicer and disappears automatically if the page does finish. So the user may be less inclined to check "Don't show this again".
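For reference, a minimal sketch (chrome-privileged JS, not the actual Gecko implementation; the helper name is hypothetical) of how the two timeouts end up being consulted today:

// Sketch only -- the pref names are the real ones discussed above,
// everything else is illustrative.
Components.utils.import("resource://gre/modules/Services.jsm");

function slowScriptTimeoutSeconds(isE10sContentProcess) {
  // e10s content reads its own pref, so a user who set
  // dom.max_script_run_time to 0 ("Don't show this again") under
  // non-e10s will still be prompted once e10s is enabled.
  var prefName = isE10sContentProcess ? "dom.max_child_script_run_time"
                                      : "dom.max_script_run_time";
  return Services.prefs.getIntPref(prefName); // both default to 10 seconds
}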
Given the UI differences, we might want to rethink these release criteria in general. A fairer comparison would be to report slow scripts of over 10s through telemetry regardless of what the prefs are set to. NI to Barbara to reexamine the release criteria.
Flags: needinfo?(bbermes)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #9)
> Given the UI differences, we might want to rethink these release criteria in
> general. A fairer comparison would be to report slow scripts of over 10s
> through telemetry regardless of what the prefs are set to. NI to Barbara to
> reexamine the release criteria.

I need to ask some questions before fully commenting on any new release criteria:

- Given the 0.11 slow-scripts/hour difference, based on the data sample, is this statistically significant enough to be alarmed about? Who can answer this with confidence?
- What is the reporting threshold set to right now for slow scripts? Is it not 10s? Why do you see 10s as a more conclusive number to measure against?
- (In reply to Jan de Mooij [:jandem] from comment #6)
  > One thing I noticed: the non-e10s slow script dialog has a "Don't show this
  > again" checkbox, which sets the dom.max_script_run_time pref to 0. Those
  > users will never see the dialog again (and we won't collect telemetry).

  Can we get the number of users on non-e10s that have the dom.max_script_run_time pref set to 0, and subtract them from our comparison to be more "fair"?
Flags: needinfo?(bbermes)
(In reply to Barbara Bermes [:barbara] from comment #11)
> - Given the 0.11 slow-scripts/hour difference, based on the data sample, is this
> statistically significant enough to be alarmed about? Who can answer this with confidence?

SLOW_SCRIPT_PAGE_COUNT accumulates 1 hit if a loaded web page displays the slow script dialog or notification. We don't accumulate more than 1 hit per loaded web page [1]. The timeout for both e10s and non-e10s is 10 seconds [2]. From the report here, I guess we're breaking this down as "shows per hour" of browser use:

e10s: 0.47/hr (1 per 2:07)
non:  0.36/hr (1 per 2:46)

So it's a significant regression.

[1] https://wiki.mozilla.org/Electrolysis/Release_Criteria/Slow_Script
[2] http://mxr.mozilla.org/mozilla-beta/source/modules/libpref/init/all.js#2549

> (In reply to Jan de Mooij [:jandem] from comment #6)
> > One thing I noticed: the non-e10s slow script dialog has a "Don't show this
> > again" checkbox, which sets the dom.max_script_run_time pref to 0. Those
> > users will never see the dialog again (and we won't collect telemetry).
>
> Can we get the number of users on non-e10s that have the dom.max_script_run_time
> pref set to 0, and subtract them from our comparison to be more "fair"?

Yes, we can land a telemetry probe for this and uplift it; we won't know for sure about our beta population until the next experiment, though.
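For anyone double-checking the "1 per H:MM" conversions above, here is a throwaway snippet (plain JS, nothing Firefox-specific; rounding down to whole minutes is my assumption) that reproduces them:

// Convert a slow-script rate in hits/hour into "1 per H:MM" of browser use.
function ratePerHourToInterval(ratePerHour) {
  var totalMinutes = Math.floor(60 / ratePerHour);
  var hours = Math.floor(totalMinutes / 60);
  var minutes = ("0" + (totalMinutes % 60)).slice(-2);
  return "1 per " + hours + ":" + minutes;
}

console.log(ratePerHourToInterval(0.47)); // "1 per 2:07" (e10s)
console.log(ratePerHourToInterval(0.36)); // "1 per 2:46" (non-e10s)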
Flags: needinfo?(bbermes)
I like blassey's idea of merging the two timeouts into a single pref (bug 1264651). If a user sets the pref to 0 under non-e10s, e10s picks up the benefit when enabled. Continuing to poke at this metric for other flaws would probably be good too.
(In reply to Jim Mathies [:jimm] from comment #12)
> (In reply to Barbara Bermes [:barbara] from comment #11)
> > Can we get the number of users on non-e10s that have the dom.max_script_run_time
> > pref set to 0, and subtract them from our comparison to be more "fair"?
>
> Yes, we can land a telemetry probe for this and uplift it; we won't know for
> sure about our beta population until the next experiment, though.

Oh, we should be able to exclude them from the beta experiment groups. I'll file a bug on this.
Depends on: 1264796
(In reply to Jim Mathies [:jimm] from comment #13)
> I like blassey's idea of merging the two timeouts into a single pref (bug 1264651).
> If a user sets the pref to 0 under non-e10s, e10s picks up the benefit when enabled.

+1
Flags: needinfo?(bbermes)
(In reply to Jim Mathies [:jimm] from comment #14)
> (In reply to Jim Mathies [:jimm] from comment #12)
> > Yes, we can land a telemetry probe for this and uplift it; we won't know for
> > sure about our beta population until the next experiment, though.
>
> Oh, we should be able to exclude them from the beta experiment groups. I'll
> file a bug on this.

As I commented on the other bug, I don't think we should exclude them from the experiment. CHutten will re-run the analysis with these users excluded.
Adding my 2 cents (maybe totally wrong or already considered): perhaps switching tabs is a trigger for the increase you have been recording. non-e10s will hang the browser during the switch, while e10s "responds" to it. We would need to add telemetry recording the number of slow-script hits that happen within/after some threshold (10-30s) of any tab change.
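A rough sketch of what that measurement could look like in browser chrome JS; gBrowser, the TabSelect event, and Services.telemetry are real, but the probe name and the hook where the slow-script notification fires are hypothetical:

// Sketch only: assumes a new count histogram SLOW_SCRIPT_AFTER_TAB_SWITCH
// has been added to Histograms.json (it does not exist today).
var lastTabSwitch = 0;
gBrowser.tabContainer.addEventListener("TabSelect", function() {
  lastTabSwitch = Date.now();
});

// Hypothetical hook: call this wherever the slow-script prompt is shown.
function onSlowScriptPromptShown() {
  var THRESHOLD_MS = 30 * 1000; // the 10-30s window suggested above
  if (Date.now() - lastTabSwitch <= THRESHOLD_MS) {
    Services.telemetry.getHistogramById("SLOW_SCRIPT_AFTER_TAB_SWITCH").add(1);
  }
}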
We modified a pref that we think may have biased things in favor of non-e10s users. We will get results on May 4th to confirm whether this is the root cause of the problem. Expect an update in this ticket by May 5th or May 6th at the latest.
beta 46: Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT per hour is 0.11, (0.47, 0.36). The probability of this effect being purely by chance is 0.00.

beta 47 (build 1 / prelim): Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT is 0.07, (0.50, 0.44). The probability of this effect being purely by chance is 0.00.

https://gist.github.com/chutten/ef607dc7cc6cd835eca7f7cdd8ec45e3

This new data filters out non-default values.

e10s: 0.50/hr (1 per 2:00 hours of use)
non:  0.44/hr (1 per 2:16 hours of use)

Barbara, curious what you think. Should we wait for more data, continue to block on this, or move it to follow-up work?
Flags: needinfo?(jdemooij) → needinfo?(bbermes)
So looking at the results in comment #12, we've gotten worse on both ends, with less difference between non-e10s and e10s.

comment #12: ~23% regressed
comment #19: ~12% regressed

I'd suggest waiting until we get the updated data mentioned in comment #18 (if this is not already the updated data).
Flags: needinfo?(bbermes)
Actually-reviewed updated data from the Beta 47 experiment on b1: https://github.com/vitillo/e10s_analyses/blob/master/beta47-ex1/e10s_experiment.ipynb

Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT per hour is 0.08, (0.42, 0.34). The probability of this effect being purely by chance is 0.00.
Summary: [e10s] SLOW_SCRIPT_PAGE_COUNT regressed in e10s-beta46-noapz experiment → [e10s] SLOW_SCRIPT_PAGE_COUNT regressed under e10s
(In reply to Barbara Bermes [:barbara] from comment #20)
> So looking at the results in comment #12, we've gotten worse on both ends, with
> less difference between non-e10s and e10s.
>
> comment #12: ~23% regressed
> comment #19: ~12% regressed
>
> I'd suggest waiting until we get the updated data mentioned in comment #18 (if
> this is not already the updated data).

We improved between beta 46 and 47. The current regression is:

e10s: 0.42/hr (1 per 2:23)
non:  0.34/hr (1 per 2:56)
Flags: needinfo?(bbermes)
Also note that adding another week of data changed the results for 47:

build 1: 0.50, 0.44
builds 1 and 2: 0.42, 0.34

It'll be interesting to see what another week of data does to these numbers; we won't have that data until the middle of next week.
Please use the reviewed analysis in vitillo/e10s_analyses instead of unreviewed preliminary gists. In this case the results are largely the same, but that might not hold in future.
I took a look at the code that handles the reporting for this last week. Generally I didn't find any issues; both probes appear to count accurately for a single page that hangs.

That said, there are differences in behavior. With non-e10s, a hung script hangs the main browser for fifteen seconds. With e10s, the browser continues to operate while displaying a spinner for content. One theory I have is that users are adept at killing hung Firefox processes, and potentially do so when a script hangs the non-e10s browser. We wouldn't get reporting in that case, since the telemetry data would be lost with the terminated process. In the e10s case, it's unlikely users would kill the browser at all. This difference could account for the slight difference in the stats we see.
Attached file hangit.html
I ran a short experiment here to try to see whether the hang detection is equally sensitive between the two modes. What I found is that under e10s we are more likely to prompt the user after the timeout has elapsed compared to non-e10s. In particular, I noticed that if the browser frees up right around the timeout value, we'll always prompt under e10s, even if the content process is free. In the non-e10s case, if the browser frees up right around the time the timeout is reached, we never prompt.

Attached is a test case for measuring this. STR: Adjust the value in the text edit such that the browser frees up right around the second mark. Try this value in both e10s and non-e10s. I found that sometimes non-e10s could run upward of 12 seconds, the browser would free up, and there would be no prompt. On the other hand, the e10s browser always prompted right around 11 seconds.

Another oddity: hang prompting under e10s remains until the tab-modal 'stop' dialog is dismissed in this test case. That seems like a bug in our detection code. I didn't run into this with non-e10s.
s/right around the second mark/right around the *ten* second mark
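For context, a minimal sketch of what a test page like the attached hangit.html might look like (the real attachment may differ): it busy-waits on the content main thread for the number of milliseconds typed into the text edit, which is enough to trip the 10-second slow-script detection.

<!-- Sketch only: spin the main thread for a user-supplied duration. -->
<input id="ms" type="text" value="11000">
<button onclick="hang()">Hang it</button>
<script>
  function hang() {
    var end = Date.now() + Number(document.getElementById("ms").value);
    while (Date.now() < end) {
      // Busy-wait; the script never yields, so the slow-script prompt can fire.
    }
  }
</script>

Varying the value around 10000-12000 ms is what exposes the "frees up right around the timeout" behavior described in the STR.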
Flags: needinfo?(bbermes)
Depends on: 1271978
Depends on: 1272116
Waiting on some telemetry data.
The current probe we have (SLOW_SCRIPT_PAGE_COUNT) doesn't differentiate between content and chrome. I wonder if chrome hangs associated with IPC sends in the content process might account for some of these? The chrome timeout is 20 seconds, so this seems unlikely, but maybe there are cases where we do a sync or RPC send in an event handler that can take a really long time. billm, any thoughts here? Wondering if it would be worth splitting SLOW_SCRIPT_PAGE_COUNT up into two probes.
Flags: needinfo?(wmccloskey)
I don't think we'll record any chrome hangs in the content process here. There aren't any chrome nsGlobalWindows in the content process.
Flags: needinfo?(wmccloskey)
Current data: Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT per hour is 0.09, (0.42, 0.33).

first week:  0.50, 0.44
second week: 0.42, 0.34
third week:  0.42, 0.33

e10s: 0.42/hr (1 per 2:23)
non:  0.33/hr (1 per 3:02)
Week 3 results are in: https://github.com/vitillo/e10s_analyses/blob/master/beta47-week3/e10s_experiment.ipynb

Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT per hour is 0.11, (0.47, 0.36). The probability of this effect being purely by chance is 0.00.
Week 4 results are in: https://github.com/vitillo/e10s_analyses/blob/master/beta47-week4/e10s_experiment.ipynb

Median difference in payload/histograms/SLOW_SCRIPT_PAGE_COUNT per hour is 0.07, (0.36, 0.29). The probability of this effect being purely by chance is 0.00.

This is a bit of a difference, and it can likely be attributed to week4 analyzing only the most recent 7 days (so, more recent beta builds) whereas week3 analyzed 21 days.
Assignee: jmathies → nobody
Priority: P1 → P3
Severity: normal → S3