Closed Bug 1691898 Opened 3 years ago Closed 3 years ago

2.37 - 95.44% cnn-ampstories ebay-kleinanzeigen-search / reddit / web-de (android-hw-g5-7-0-arm7-api-16-shippable) regression on push cfc0f48add0096b2a5ebdf045a31316eb4de1998 (Thu February 4 2021)

Categories

(Core :: Security: PSM, defect)

defect

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- unaffected
firefox87 blocking fixed
firefox88 blocking fixed
firefox89 + fixed
firefox90 --- fixed

People

(Reporter: Bebe, Assigned: keeler)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [perf:alert:0])

Attachments

(1 file)

Perfherder has detected a browsertime performance regression from push cfc0f48add0096b2a5ebdf045a31316eb4de1998. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
95% cnn-ampstories loadtime android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 985.00 -> 1,925.12
75% cnn-ampstories loadtime android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 1,087.88 -> 1,904.04
50% cnn-ampstories fcp android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 1,606.04 -> 2,415.83
37% cnn-ampstories SpeedIndex android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 2,286.83 -> 3,126.58
37% cnn-ampstories PerceptualSpeedIndex android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 2,286.38 -> 3,125.58
36% cnn-ampstories FirstVisualChange android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 2,249.92 -> 3,055.42
35% cnn-ampstories ContentfulSpeedIndex android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 2,268.75 -> 3,071.17
34% cnn-ampstories LastVisualChange android-hw-g5-7-0-arm7-api-16-shippable cold live nocondprof webrender 2,371.29 -> 3,189.25
15% web-de fcp android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 578.85 -> 667.29
9% web-de android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 1,109.55 -> 1,212.53
6% reddit fcp android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 2,327.21 -> 2,474.88
5% reddit android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 2,770.23 -> 2,917.57
4% ebay-kleinanzeigen-search fcp android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 1,129.83 -> 1,179.04
4% reddit FirstVisualChange android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 3,282.12 -> 3,408.75
2% reddit SpeedIndex android-hw-g5-7-0-arm7-api-16-shippable cold nocondprof webrender 5,295.67 -> 5,421.00

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
14% netflix FirstVisualChange linux64-shippable-qr cold nocondprof webrender 280.00 -> 240.00
6% imgur FirstVisualChange linux64-shippable-qr cold nocondprof webrender 635.00 -> 600.00

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(dkeeler)

Set release status flags based on info from the regressing bug 1689729

Is there a way to get a profile of the hardware running this test? I don't have access to this particular hardware.

Flags: needinfo?(dkeeler) → needinfo?(fstrugariu)

The numbers here look pretty bad. Could we consider backing out bug 1689729 from at least beta87 next week?

Flags: needinfo?(dkeeler)

Bug 1689729 is only in Nightly so far. Do you mean not let it ride to beta? I think that would be a good idea for now.

Flags: needinfo?(dkeeler)

Oh, I see - that is what you're saying. Yes, let's do that.

Thanks, tracking so it's on my radar for the merge on Monday.

Rather than tracking this bug, can you just back out https://hg.mozilla.org/mozilla-central/rev/cfc0f48add00?

Flags: needinfo?(jcristau)

You mean back it out now on central, or back it out on beta after the merge? I'm using the blocker flag on this bug as a reminder for myself to do the latter on Monday.

Flags: needinfo?(jcristau)

Oh, I didn't realize that was what you were planning. I guess let's start with that.

Florin, I can't seem to run any test-android-hw-g5-7-0-arm7-api-16-shippable/opt-browsertime-tp6m-live-geckoview-cnn-ampstories-wr-e10s jobs when I push to try - is that expected?

Flags: needinfo?(fstrugariu)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #10)

Oh, I didn't realize that was what you were planning. I guess let's start with that.

Florin, I can't seem to run any test-android-hw-g5-7-0-arm7-api-16-shippable/opt-browsertime-tp6m-live-geckoview-cnn-ampstories-wr-e10s jobs when I push to try - is that expected?

Please try the following: ./mach try --full -q "'android-hw-g5 'live-geckoview-cnn". The --full is necessary to run tests against the limited pool of Android hardware.

Flags: needinfo?(fstrugariu)

Hmmm - that resulted in mach: error: unrecognized arguments for try auto: '--full', '-q', ''android-hw-g5 'live-geckoview-cnn'

Flags: needinfo?(dave.hunt)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #13)

Hmmm - that resulted in mach: error: unrecognized arguments for try auto: '--full', '-q', ''android-hw-g5 'live-geckoview-cnn'

Sorry, I think I was just missing the fuzzy argument. My environment is not currently working so I'm unable to test and confirm. I would suggest trying ./mach try fuzzy --full -q "'android-hw-g5 'live-geckoview-cnn" but if that's not working try using ./mach try fuzzy --full and looking for the appropriate jobs.

Flags: needinfo?(dave.hunt)

Hey :keeler let me know if this helps. Also, this bug needs a priority and assignee as we're tracking it for 88.

Flags: needinfo?(dkeeler)

The graphs are a bit noisy, but it looks like bug 1694542 has helped - is there a way to confirm this?

Flags: needinfo?(fstrugariu)

The change that introduced this, Bug 1689729, is now in Beta 88.

I'm measuring a pageload performance regression between Fenix Beta 87 and Fenix Beta 88.

Is it possible to re-run the tests that triggered the initial alert?

On a broad live pageload test, I can measure a ~7% drop in visual metric performance from fenix_beta_2021_03_15 (gv87) to fenix_beta_2021_03_24 (gv88) that results from the revision that triggered this regression alert.

https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=1689990234&range=AG87:AO87

Unless we can find a way to mitigate this, I don't think we want to ship this to Fenix users because of the performance impact for new users.

I took a look over the graphs and did not saw any significant improvements.

As you can see from these graphs there are no performance fixes/alerts after the 04Feb alert
https://treeherder.mozilla.org/perfherder/graphs?timerange=5184000&series=autoland,2891209,1,13
https://treeherder.mozilla.org/perfherder/graphs?timerange=5184000&series=autoland,2890593,1,13
https://treeherder.mozilla.org/perfherder/graphs?timerange=5184000&series=autoland,2891207,1,13

I will let you guys decide if we should accept this regression or not

Flags: needinfo?(fstrugariu)

Might be best to hold this back until the rest of the changes in this area have landed.

Pascal, I guess we should back this out of beta again for 89.

Dana, where do we stand here for 90?

Flags: needinfo?(pascalc)
Flags: needinfo?(dkeeler)

The initial patch does not backout cleanly to beta, Dana could you provide an update patch to back it out? Thanks

Flags: needinfo?(pascalc)

Because this change is affecting a couple of performance tests (android startup, android pageload), can we consider putting it behind a pref or backing it out of nightly until we have the improvement patches ready?

Rather than repeatedly backing out a patch that doesn't backout cleanly or using a pref, I'd rather just revert the specific changes that caused the performance issue until the rest of the code is ready.

Flags: needinfo?(dkeeler)

Bug 1689729 moved some certificate verification operations to the socket thread
using synchronous runnables. Unfortunately this caused a performance regression
that can't be addressed until all certificate verification operations that
involve NSS certificate resources happen on the socket thread. Until then, this
patch reverts that behavior.

Assignee: nobody → dkeeler
Status: NEW → ASSIGNED
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/783b7ceb7ff9
revert the parts of bug 1689729 that caused a performance regression r=bbeurdouche
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch

The patch landed in nightly and beta is affected.
:keeler, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(dkeeler)

Comment on attachment 9221691 [details]
Bug 1691898 - revert the parts of bug 1689729 that caused a performance regression r?bbeurdouche

Beta/Release Uplift Approval Request

  • User impact if declined: Performance regression.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch reverts to previous behavior, so it shouldn't be risky.
  • String changes made/needed: none
Flags: needinfo?(dkeeler)
Attachment #9221691 - Flags: approval-mozilla-beta?

Comment on attachment 9221691 [details]
Bug 1691898 - revert the parts of bug 1689729 that caused a performance regression r?bbeurdouche

Approved for 89 beta 14, thanks.

Attachment #9221691 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Regressions: 1711966
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: