Closed Bug 1471511 Opened 6 years ago Closed 6 years ago

5.18% tp5o_webext responsiveness (linux64) regression on push 91f49041e74406c74d2e0e05f4dd51111d414b8f (Tue Jun 26 2018)

Categories

(Core :: JavaScript: GC, defect)

59 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox61 --- unaffected
firefox62 --- unaffected
firefox63 --- wontfix

People

(Reporter: igoldan, Assigned: pbone)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=91f49041e74406c74d2e0e05f4dd51111d414b8f

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  tp5o_webext responsiveness linux64 pgo e10s stylo     1.40 -> 1.48


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=14027

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: General → JavaScript: GC
Product: Testing → Core
This seems to reproduce on Windows 10 & 7 also. But I will properly confirm that after retriggering the jobs.
:pbone Can we work on a fix for this?
Flags: needinfo?(pbone)
Gecko profiles are on the way.
Thank you for providing the profiles, that's very helpful.

I am investigating and want to determine how my patch could have affected these tests, depending on that we'll know a bood next step (backout vs fix etc).  Note that I'm working on a another change that builds on this change and together should improve performance, but it is untested so-far.

Is there an easy way to find which subtests regressed?  Are they the same subtests on Linux and Windows?

Thanks.
Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)
Sorry, forgot to NI.

Is there an easy way to find which subtests regressed?  Are they the same subtests on Linux and Windows?
Flags: needinfo?(igoldan)
(In reply to Paul Bone [:pbone] from comment #5)
> Sorry, forgot to NI.
> 
> Is there an easy way to find which subtests regressed?  Are they the same
> subtests on Linux and Windows?

The regression was noticed by our responsiveness test. We have no way to find it on a subtest.
Flags: needinfo?(igoldan)
I found that I can download JSON data summarising the tests:

https://taskcluster-artifacts.net/YAyeQvXkSMuyNOS9Mx-qjQ/0/public/test_info/perfherder-data.json

I note that most of the values are written as milliseconds and seem to be in the 100-400ms range.  But the result of the test, described as "responsiveness" is 1.48, how is that computed?  I didn't see any information about this on the wiki page.

Should I be looking at the elapased time to load these pages or something else when I open these profiles?
Flags: needinfo?(igoldan)
(In reply to Paul Bone [:pbone] from comment #7)
> I found that I can download JSON data summarising the tests:
> 
> https://taskcluster-artifacts.net/YAyeQvXkSMuyNOS9Mx-qjQ/0/public/test_info/
> perfherder-data.json
> 
> I note that most of the values are written as milliseconds and seem to be in
> the 100-400ms range.  But the result of the test, described as
> "responsiveness" is 1.48, how is that computed?  I didn't see any
> information about this on the wiki page.
> 
> Should I be looking at the elapased time to load these pages or something
> else when I open these profiles?

I will redirect you to the owner of the test. :kmag, can you answer pbone's question? Thanks!
Flags: needinfo?(igoldan) → needinfo?(kmaglione+bmo)
I landed the change that introduced the regression _after_ the train for 62.
I wrote some scripts to help tease out any differences in the results from the subtests:

https://github.com/PaulBone/moz-utils/tree/master/analyse-talos

I did not find any statistical difference, in what I presume is the elapsed time of each subtest.

My guess is that "responsiveness" measures something completely different.  However it's not shown per subtest so I don't know which profiles are relevant (if any).  Nor is it documented anywhere or repsented with any units of measurement.  I have nothing to go on and cannot continue work on this bug without further information.
responsiveness is calculated by taking the values and applying this formula:
https://searchfox.org/mozilla-central/source/testing/talos/talos/filter.py#253

while it is hard to see on the graph, this shows up on regular tp5o_responsiveness for pgo:
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1641964,1,1&series=mozilla-inbound,1640937,1,1&series=mozilla-inbound,1641909,1,1&series=mozilla-inbound,1640992,1,1

it doesn't generate an alert as we require a 2% regression to generate an alert.  Another odd point is non pgo (opt) doesn't seem to have any change.

Given that this isn't webext related (although webext seems to be amplifying this), I would say this is responsiveness specifically.  And that this is PGO only, results in that there is something in the black box of PGO which has optimized things.

Possibly wontfix here due to the PGO only data?
Sorry for the slow reply. While I am the owner of this test, it's just a variant of the base tp5o responsiveness test, which I don't have much to do with, so I'm probably not the best person to reply.

That said, the number is fairly opaque. Basically, we have a background thread that constantly sends a native input event, waits for it to be processed by the main thread, records the time, and then repeats. Those numbers are used to compute a responsiveness score based on the formula Joel mentioned.


To be honest, I have some problems with the way that we currently calculate the numbers. We should really probably only look at numbers that are above some threshold (maybe 50ms, which is what we use for idle tasks). Given the way we currently calculate things, moving work to idle slices often results in a worse responsiveness score even when it dramatically improves actual responsiveness.
Flags: needinfo?(kmaglione+bmo)
Thank you Joel and Kris.  Kris answered the question (about the thread sending an input event and timing the response) and Joel provided other critical information.  And I guess _webext means "Web Extensions" so firefox using these sites with some extensions enabled?


Knowing what's being measured, plus the change that caused the regression: I avoid doing some work during each GC slice, but that work may be done later and could cause other processing delays if the input events were particularly unlucky.  Plus that as you said, this isn't webext related and it's only observed in PGO builds makes me think it's just some unlucky measurement.  The only remaining red flag is that it was observed on multiple platforms, which suggests that it's not just some unlucky variation if it affects different builds on different OSs.

I'm happy to close this bug and continue my work generally decreseing GC jank.

Kris, about considering delays above 50ms or so.  That seems reasonable, some threashold that's user perceptible.  However I think in addition it'd be good to create a test that does some animation (in JS) and/or transitions in CSS, and probably video and measure dropped frames.  What do you think?

Thanks.
Flags: needinfo?(kmaglione+bmo)
I've chatted with jonco and we both agree to close this as described in comment 13.

Thank you.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.