Closed Bug 1909816 Opened 2 months ago Closed 23 days ago

18.32 - 7.37% stackoverflow ContentfulSpeedIndex / stackoverflow largestContentfulPaint + 4 more (Android) regression on Thu July 11 2024

Categories

(Fenix :: Translations, defect, P2)

All
Android
defect

Tracking

(firefox128 unaffected, firefox129 unaffected, firefox130 wontfix, firefox131 wontfix)

RESOLVED WORKSFORME
Tracking Status
firefox128 --- unaffected
firefox129 --- unaffected
firefox130 --- wontfix
firefox131 --- wontfix

People

(Reporter: afinder, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [fxdroid][foundation][translations:130][avocado sprint])

Perfherder has detected a browsertime performance regression from push 978d31629fa27390e9c519edfa1fce164ba3eb37. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
18% stackoverflow ContentfulSpeedIndex android-hw-a51-11-0-aarch64-shippable-qr warm webrender 328.75 -> 388.97 Before/After
18% stackoverflow FirstVisualChange android-hw-a51-11-0-aarch64-shippable-qr warm webrender 327.15 -> 386.53 Before/After
15% stackoverflow PerceptualSpeedIndex android-hw-a51-11-0-aarch64-shippable-qr warm webrender 342.03 -> 392.09 Before/After
11% stackoverflow SpeedIndex android-hw-a51-11-0-aarch64-shippable-qr warm webrender 384.49 -> 428.42 Before/After
9% stackoverflow fcp android-hw-a51-11-0-aarch64-shippable-qr warm webrender 202.01 -> 219.33 Before/After
7% stackoverflow largestContentfulPaint android-hw-a51-11-0-aarch64-shippable-qr warm webrender 212.65 -> 228.31 Before/After

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
11% bing FirstVisualChange android-hw-a51-11-0-aarch64-shippable-qr warm webrender 235.23 -> 208.85 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 1443

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(ohall)

Set release status flags based on info from the regressing bug 1905727

I'll take a look at this and see if it is related.

Perfherder has detected a browsertime performance regression from push 978d31629fa27390e9c519edfa1fce164ba3eb37. As author of one of the patches included in that push, we need your help to address this regression.

978d31629fa27390e9c519edfa1fce164ba3eb37 only lists bug 1906346, so not 100% sure this tagged the right regressor?

I'm not totally sure this patch could cause a regression like this, but possible since it turns on the translations UI on Android.

One reason it seems unlikly as the regressor is the translations engine is already turned on fully in Gecko for a long while (bug 1882841 v125). I'd think that would be when a performance change would happen. The only UI thing that seems particularly relevant is that an automatic popup happens, but I'm not seeing that impacting page load times. IIRC, there are checks on the Gecko side for the popup too.

Hi Alex,

Can you help me confirm that bug 1905727 is the regressor and suggest next steps?

Bug 1905727 was more of a UI unlock of translations on Android than anything. It possibly could have led to the regression, though. One of the reasons I'm not sure it is the regressor is because the translations engine has been running since bug 1882841 and the performance hit would have been more likely at that point.

Flags: needinfo?(ohall) → needinfo?(afinder)
Whiteboard: [fxdroid][foundation][translations:130]

Not sure if I did something wrong on the other ./mach try perf --alert 1443, the report seems not to be fully populated, but it did have a timeout in one of the runs and maybe is still completing?

Sent another one with a more recent base here, just in case the other one didn't work:

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3803bc45ba53355a45dc5a1146a31d41e553bd0b&newProject=try&newRevision=552b5f38148f1632540a2f7085861688d5918d59

Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=3803bc45ba53355a45dc5a1146a31d41e553bd0b
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=552b5f38148f1632540a2f7085861688d5918d59

Hi Alex,

When you get a chance could you help me interpret these reports and suggest next steps?

Just some background on the scenario:

  • The patch in question unlocks Andoid UI on Fenix for translations on Android:

    • The patch switches Nimbus from a default of all false to a default of true on all channels except release.
    • The patch only unlocks UI entry points.
      • It doesn't turn on the translations engine or anything like that. The engine has been running in the background for some time now.
    • The patch does have the potential to trigger native UI popups (to offer a translation); however, it should be turned off in automation.
      • The general criteria for a popup is the app language and page language need to be different.
    • The Android UI is 100% native UI and nothing in XUL
  • The much earlier bug 1882841 (v125) turned on the translations engine, which is running in the background:

    • The translations engine does do a lot of things, like checking for if it should translate the page and gathering information. However, this has been running on Android since that bug landed.
    • There is also the possibility for automatic translations of a page, but that scenario could have played out when the v125 bug landed. (A pref would also have to have flipped.)
    • Translations does connect to remote settings and can even download language model files on request or in an automatic translation scenario. That being said, unlocking the UI by itself doesn't change what the engine was doing in that regard before this patch landed.
  • We also plan on fully enabling translations by default in Release on Android at the end of v130 in bug 1909679.

Just an update on this bug. Several of us met today to look at the profile on this bug.

:jonalmeida's summary

Intent to open a page from the command-line for testing:
adb shell am start -a "android.intent.action.VIEW" -d "http://developer.android.com" -n "org.mozilla.firefox/org.mozilla.fenix.IntentReceiverActivity"

Profile with Translations UI on:
https://share.firefox.dev/3yheGVH
Profile with Translations UI off:
https://share.firefox.dev/4dpqNit

We hypothesize:

  1. Nimbus is in the menu is expensive, so we should test and see if removing the recordExposure makes it faster today (link).
  2. See if we can reduce the number of onTranslationsActionUpdated callbacks that trigger the menu invalidation by using some caching strategies before we invoke it.
  3. With the menu rewrite, that might improve it as well.

I plan on generating profiles to either confirm or eliminate (1) as a possibility.

(2) is captured in bug 1892463, which will gain priority if (1) is eliminated as a possibility.

(3) is currently underway independently under this bug 1863790.

Bug 1909679, which also touches this same code is queued for landing.

See Also: → 1892463, menu-redesign

Edit: Disregard this comment, the base is not correct, jump to Comment 10 for valid comparisons.

Added three new tests to narrow down some possibilities on the Nimbus side to better see if that is part of the root cause or something else.

Testing removing all FxNimbus.features.translations calls:

Revision removes all FxNimbus.features.translations code usage.

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3803bc45ba53355a45dc5a1146a31d41e553bd0b&newProject=try&newRevision=349f683c5aa1557d38d5556f3a3d9022ca57518b

Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=3803bc45ba53355a45dc5a1146a31d41e553bd0b
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=349f683c5aa1557d38d5556f3a3d9022ca57518b

Testing removing only FxNimbus.features.translations.recordExposure() calls:

Revision removes only FxNimbus.features.translations.recordExposure() usage.

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3803bc45ba53355a45dc5a1146a31d41e553bd0b&newProject=try&newRevision=526e9b9166147ea08c124a3d65fe48183a170537

Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=3803bc45ba53355a45dc5a1146a31d41e553bd0b
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=526e9b9166147ea08c124a3d65fe48183a170537

Testing switching Nimbus to default true on all channels (bug 1909679 change and new default) :

You'll be able to find a performance comparison here once the tests are complete (ensure you select the right framework): https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3803bc45ba53355a45dc5a1146a31d41e553bd0b&newProject=try&newRevision=03497b8266ae7939a472593cf50db9f1398598c4

Base revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=3803bc45ba53355a45dc5a1146a31d41e553bd0b
Local revision's try run: https://treeherder.mozilla.org/jobs?repo=try&revision=03497b8266ae7939a472593cf50db9f1398598c4

Blocks: 1909674

About comment 8:

It looks like the latest tests are using the wrong Base. From what I can tell, the Base on all of theses is the first base I had several days ago, but Local is latest base branch I have + local changes. (I'm seeing things like it being 140% worse on the new branch on these tests, which doesn't make sense, since I'm removing things. I'm thinking I picked up something on rebase.)

It sounds like it should have autodetected this based on the docs:

The comparison that is produced will be based on the base
revision in your local repository (i.e. the base revision your
patches, if any, are based on). If only specific tests need to
run, use --tests to specify them

I also switched the compare to the latest mozilla-central, which looks more reasonable, but still diverged enough that I think I need to find a way to send up a new base.

I'll see if I can find a way to rerun these with the latest base I have.

Figured it out, I needed to specify --rebuild when sending ./mach try perf --alert 1443 when I changed base branches.

NB: This base has bug 1909679 enabled, which changed the Nimbus setup to a full default of on on all channels, as well.

Testing removing all FxNimbus.features.translations calls:

Testing removing only FxNimbus.features.translations.recordExposure() calls:

Testing switching Nimbus to default true on all channels (bug 1909679 change and new default) :

Testing switching Nimbus back to before bug 1905727 :

Hi Olivia,
I triggered a side-by-side job to get a video comparison of the before and after revisions. The video comparison should include the ContentfulSpeedIndex and FirstVisualChange metrics results. I'll come back with the results later. Based on the testing results you included, is there any follow-up patch planned for this ?

Flags: needinfo?(afinder) → needinfo?(ohall)

Hi Alex,

Thanks for taking a look! I'm still looking for the root cause. I'll look at the profiles more today since idea (1) didn't seem to pan out based on the tests. That moves me to looking at idea (2), and the sentiment/work is captured in bug 1892463.

Based on the Nimbus testing results, I think we can eliminate Nimbus as the main cause. (One of the tests showed performance just slightly worse when Nimbus was pulled out of the path, which was a bit surprising.)

Most of the others Nimbus tests I ran showed no change or insignificant change, except testing turning the feature back to where it was before this started.

So, moving down to idea (2):

1. Nimbus is in the menu is expensive, so we should test and see if removing the recordExposure makes it faster today (link).
2. See if we can reduce the number of onTranslationsActionUpdated callbacks that trigger the menu invalidation by using some caching strategies before we invoke it.
3. With the menu rewrite, that might improve it as well.

Based on the testing results you included, is there any follow-up patch planned for this ?

Nothing forthcoming yet, I can work on idea (2), which is bug 1892463, to see if there is anything to be gained there.

One thing I find really strange though is why the testing shows an improvent on bing, but dramatically worse on stackoverflow as well. Latest run is showing stackoverflow FirstVisualChange opt warm webrender as improved as well. I'd expect everything to be blanketly worse.

Trying to determine if there is a connection to the toolbar work and the performance regression on bug 1909180, as well.

Flags: needinfo?(ohall)

Set release status flags based on info from the regressing bug 1905727

Whiteboard: [fxdroid][foundation][translations:130] → [fxdroid][foundation][translations:130][avocado sprint]
Assignee: nobody → ohall

The severity field is not set for this bug.
:olivia, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(ohall)
Severity: -- → S3
Flags: needinfo?(ohall)
Priority: -- → P2
Assignee: ohall → nobody

Marking this fix-optional for 130, since we are getting close to the Sept. 3 130 release date.

For posterity, bug 1909679 enabled this across all channels in 130+.

Regressed by: 1909679

I strongly suspect this bug might be a similar situation as bug 1909180. Exploring bug 1892463 is the only speculative lead right now.

I would like to propose WORKSFORME because:

  • This patch regression was an Android Compose UI button reveal.
    • Expect the root cause is either Nimbus or something going on with how we decide to show icons on the toolbar (bug 1892463).
  • The translations engine has been working in the background much longer and running in the background this whole time.
    • Likely nothing translations engine specific caused this regression, must be something much more subtle going on.
  • It is also a bit odd we are reporting both a regression and improvement of similar but opposite magnitude.
    • This improvement and regression shows up in subsequent tests as well.
    • Almost wonder if some ordering event has changed somehow in the test harness. For example, it could be that something is now preloaded for one of the tests and intialized for the other, so it swapped ordering and creates a regression/improvement depending on the order.
  • The performance guild has also looked at translations specifically as well.
  • Situation is similar to bug 1909180, which was resolved as WORKSFORME.
  • In addition, the new toolbar changes and makes adjustments to these code paths.

Please reopen if you think this bug should have additional investigation or ideas for action.

Status: NEW → RESOLVED
Closed: 23 days ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.