Closed Bug 1495065 Opened 6 years ago Closed 4 years ago

2.73 - 54% tp5n nonmain_startup_fileio / tp5o responsiveness regression on push 126409bdf326 (Wed Sep 26 2018)

Categories

(Testing :: Talos, defect, P3)

Version 3
defect

Tracking

(firefox69 wontfix, firefox70 wontfix, firefox71 wontfix)

RESOLVED WONTFIX
Tracking Status
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix

People

(Reporter: aswan, Unassigned)

References

(Regression)

Details

(4 keywords)

+++ This bug was initially created as a clone of Bug #1494882 +++

[Peeling off just the tp5 responsiveness and startup regressions from the original bug]

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1b268215517c906a22e5341b3978f5b551e92328&tochange=126409bdf326645e735ee147bd70bd7d09759165

As you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

 54%  tp5o responsiveness windows10-64 pgo e10s stylo           0.49 -> 0.76
 49%  tp5o responsiveness windows7-32 pgo e10s stylo            0.47 -> 0.70
 45%  tp5o responsiveness windows7-32 opt e10s stylo            0.52 -> 0.74
 42%  tp5o responsiveness windows10-64 opt e10s stylo           0.55 -> 0.79
 38%  tp5o responsiveness windows10-64-qr opt e10s stylo        0.57 -> 0.78
 28%  tp5o_webext responsiveness windows7-32 pgo e10s stylo     0.95 -> 1.21
 21%  tp5o_webext responsiveness windows10-64 opt e10s stylo    1.09 -> 1.33
 21%  tp5o_webext responsiveness windows10-64 pgo e10s stylo    1.01 -> 1.23
 17%  tp5n nonmain_startup_fileio windows7-32 opt e10s stylo    2,382,029.08 -> 2,795,380.50
 16%  tp5o responsiveness linux64-qr opt e10s stylo             1.16 -> 1.35
 14%  tp5o_webext responsiveness linux64-qr opt e10s stylo      1.73 -> 1.96

[other regressions elided]
Kris, can I persuade you to take a look at this?
Flags: needinfo?(kmaglione+bmo)
Depends on: 1495068
Depends on: 1495069
:aswan, I haven't seen any action here in a while. Is there someone else who can look into this?
Flags: needinfo?(aswan)
:vchin, we need your help concluding this bug; it has been stuck for more than two weeks.
Flags: needinfo?(vchin)
Hi Dave, can someone on your team investigate?
Flags: needinfo?(vchin) → needinfo?(dave.hunt)
Is there anything else we can do here Ionut? This looks to be a significant regression, and we also appear to have a lot less stability in the results since this push. :aswan and :kmaglione already have pending needinfos, but perhaps there's someone else that can help to get an answer here?
Flags: needinfo?(dave.hunt) → needinfo?(igoldan)
(In reply to Dave Hunt [:davehunt] [he/him/his] ⌚️UTC+1 from comment #6)
> Is there anything else we can do here Ionut? This looks to be a significant
> regression, and we also appear to have a lot less stability in the results
> since this push. :aswan and :kmaglione already have pending needinfos, but
> perhaps there's someone else that can help to get an answer here?

As :aswan mentioned in comment [1], these are only harness updates, so the owners of the affected tests should look over them. Unfortunately, tp5o responsiveness is one of the perf tests that has remained without an owner; we have tried multiple times to find someone to take care of this test, without luck. What we can do is resume that search and escalate this issue where we can.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1494882#c4
Flags: needinfo?(igoldan)
Did I read recently that tests without owners are being downgraded to tier 2? It looks like these tests were introduced by bug 631571. Have these tests regressed before? Perhaps there is some precedent that we can follow in the history of these tests.
(In reply to Dave Hunt [:davehunt] [he/him/his] ⌚️UTC+1 from comment #8)
> Did I read recently that tests without owners are being downgraded to
> tier 2?

That was just a proposal from Joel and me for the new Performance Test Validation policy. We need to re-examine those proposals at the management level and decide whether anything should be added or whether they are sufficient as they stand. After that, we can enforce them.
(In reply to Dave Hunt [:davehunt] [he/him/his] ⌚️UTC+1 from comment #8)
> It looks like these tests were introduced by bug 631571. Have these
> tests regressed before? Perhaps there is some precedent that we can follow
> in the history of these tests.

If you're referring to other updates to the harness or the test itself, I'll have to look that up.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #10)
> (In reply to Dave Hunt [:davehunt] [he/him/his] ⌚️UTC+1 from comment #8)
> > It looks like these tests were introduced by bug 631571. Have these
> > tests regressed before? Perhaps there is some precedent that we can follow
> > in the history of these tests.
>
> If you're referring to other updates to the harness or the test itself,
> I'll have to look that up.

I wasn't specifically referring to harness/test updates. Searching Bugzilla for 'responsiveness' with the 'talos-regression' keyword returns 28 bugs. Only 7 of these were resolved as fixed, and only one of those listed responsiveness as the sole regressing test. Of the others, 13 were resolved as wontfix, 3 as invalid, 1 as duplicate, and 1 as worksforme.
Please see bug 1444212 about replacing tp5 with tp6. Once tp6 measures input latency, it is reasonable to retire tp5, including the responsiveness test.
Flags: needinfo?(aswan)

Dave, do we have any estimate of when tp6 will be ready to replace tp5?

Flags: needinfo?(dave.hunt)

I think measuring latency/responsiveness in tp6 is likely to arrive in H2 2019. Even if it arrived sooner, it wouldn't help us with this regression. Given its magnitude, I don't feel comfortable ignoring it.

Joel: do you have any suggestions for ways to move forward on this?

Flags: needinfo?(dave.hunt) → needinfo?(jmaher)

We have already shipped with this regression, so in practice we have already treated it as low priority.

We could add the existing responsiveness measurement to tp6. It would be a hack: as today, the measurement is wanted, yet it is generally agreed that there is no real support for it and that what responsiveness actually measures is questionable.

here is what we do:

set an environment variable during the run:
https://searchfox.org/mozilla-central/source/testing/talos/talos/ttest.py#121

parse the stdout for messages:
https://searchfox.org/mozilla-central/source/testing/talos/talos/results.py#295

summarize the data:
https://searchfox.org/mozilla-central/source/testing/talos/talos/filter.py#253
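To make the parse and summarize steps concrete, here is a minimal Python sketch. It assumes the browser prints one line per sampled event-loop delay in the form "MOZ_EVENT_TRACE sample <timestamp> <lag_ms>", and that the suite summary is the sum of squared lags divided by 1,000,000. Both the log format and the formula are assumptions to verify against the linked results.py and filter.py; the function names are hypothetical, not the harness's own.

```python
import re

# ASSUMPTION: log lines look like "MOZ_EVENT_TRACE sample <timestamp> <lag_ms>".
# Verify against testing/talos/talos/results.py before relying on this.
TRACE_RE = re.compile(r"MOZ_EVENT_TRACE\s+sample\s+\d+\s+(\d+(?:\.\d+)?)\s*$")


def parse_event_lag(stdout_text):
    """Hypothetical helper: collect event-loop lag values (ms) from stdout."""
    lags = []
    for line in stdout_text.splitlines():
        match = TRACE_RE.search(line)
        if match:
            lags.append(float(match.group(1)))
    return lags


def responsiveness_metric(lags):
    """Hypothetical helper: sum of squared lags, scaled down by 1e6.

    ASSUMPTION: mirrors the summary in testing/talos/talos/filter.py.
    Squaring makes one long stall weigh more than many short ones.
    """
    return sum(lag * lag for lag in lags) / 1_000_000.0


if __name__ == "__main__":
    sample_stdout = "\n".join([
        "some unrelated browser output",
        "MOZ_EVENT_TRACE sample 1538000000000 120",
        "MOZ_EVENT_TRACE sample 1538000000050 45.5",
    ])
    lags = parse_event_lag(sample_stdout)
    print(lags)  # [120.0, 45.5]
    print(responsiveness_metric(lags))
```

Because the metric is a sum over the whole run rather than a per-page average, a few long stalls anywhere in the suite can move it substantially. This is one plausible reason a harness change could shift the baseline and add noise.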

One unknown is how to hack this into Raptor so that it appears in a logical place in the final results.

The advantage is that we would get responsiveness per page, whereas tp5 reports it for the entire suite. If we don't want to hack it in, we could review historical responsiveness regressions (the last year on the graph) to see whether we fixed any of them and where this test actually provides value.

Flags: needinfo?(jmaher)
Priority: -- → P3

Vicky, maybe this is something you can help with, since fixing it might take a bigger project than a single bug? It looks like a regression we accepted but might not want to live with long term.

Flags: needinfo?(vchin)
Flags: needinfo?(plawless)
Whiteboard: [perftest:triage]

:davehunt, can we mark this bug as incomplete or inactive?

Flags: needinfo?(dave.hunt)
Whiteboard: [perftest:triage]

Closing as WONTFIX as there's nothing we can realistically do here at this point. A harness change caused a baseline shift and introduced noise. Our work on establishing perf sheriffing criteria may cause us to revisit this test in the near future. The other bugs split from bug 1494882 have already been resolved as WONTFIX.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(vchin)
Flags: needinfo?(plawless)
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(dave.hunt)
Resolution: --- → WONTFIX
Has Regression Range: --- → yes