Open Bug 1722266 Opened 3 years ago Updated 4 months ago

9.77 - 2.2% espn fcp / imdb PerceptualSpeedIndex + 113 more (Android) regression on Wed July 21 2021

Categories

(GeckoView :: General, defect, P2)

Firefox 92
Unspecified
All

Tracking

(firefox-esr78 unaffected, firefox-esr91 unaffected, firefox92 wontfix, firefox93 wontfix, firefox94 wontfix, firefox95 fix-optional)

People

(Reporter: alexandrui, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [geckoview:m93?] [geckoview:2022h2?])

Perfherder has detected a browsertime performance regression from push 2f8bbf2478c7bf6e6f9d586cfa89e30a332a735b. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

| Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|---|
| 10% | espn | fcp | android-hw-g5-7-0-arm7-shippable | cold | 2,316.92 -> 2,543.38 |
| 10% | espn | fnbpaint | android-hw-g5-7-0-arm7-shippable | cold | 2,326.79 -> 2,553.38 |
| 9% | allrecipes | dcf | android-hw-g5-7-0-arm7-shippable | warm | 2,031.08 -> 2,218.83 |
| 9% | allrecipes | fcp | android-hw-g5-7-0-arm7-shippable | warm | 2,070.00 -> 2,257.17 |
| 9% | allrecipes | fnbpaint | android-hw-g5-7-0-arm7-shippable | warm | 2,089.50 -> 2,276.12 |
| 9% | espn | FirstVisualChange | android-hw-g5-7-0-arm7-shippable | cold | 2,681.17 -> 2,911.50 |
| 8% | allrecipes | FirstVisualChange | android-hw-g5-7-0-arm7-shippable | warm | 2,273.21 -> 2,458.83 |
| 7% | allrecipes | FirstVisualChange | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 2,178.71 -> 2,341.58 |
| 7% | booking | loadtime | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 1,290.40 -> 1,384.71 |
| 7% | youtube | dcf | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 840.69 -> 896.71 |
| ... | ... | ... | ... | ... | ... |
| 3% | amazon-search | FirstVisualChange | android-hw-g5-7-0-arm7-shippable | warm | 942.75 -> 967.25 |
| 3% | amazon-search | SpeedIndex | android-hw-g5-7-0-arm7-shippable | warm | 1,011.50 -> 1,036.75 |
| 2% | imdb | SpeedIndex | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 2,611.67 -> 2,676.50 |
| 2% | amazon-search | PerceptualSpeedIndex | android-hw-g5-7-0-arm7-shippable | warm | 1,018.50 -> 1,043.50 |
| 2% | imdb | PerceptualSpeedIndex | android-hw-g5-7-0-arm7-shippable-qr | warm webrender | 4,258.12 -> 4,351.67 |

Improvements:

| Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|---|
| 3% | booking | fnbpaint | android-hw-p2-8-0-android-aarch64-shippable-qr | warm webrender | 452.29 -> 439.96 |
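
The ratios listed in the alert can be reproduced from the absolute values. The following is a minimal sketch, assuming the ratio is the plain relative change from the old value to the new one; the helper names `parse_absolute` and `percent_change` are hypothetical and are not part of Perfherder.

```python
def parse_absolute(cell: str) -> tuple[float, float]:
    """Split an 'old -> new' cell such as '2,316.92 -> 2,543.38' into floats."""
    old_text, new_text = cell.split("->")
    # Remove thousands separators; float() tolerates surrounding whitespace.
    return float(old_text.replace(",", "")), float(new_text.replace(",", ""))


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (positive means slower)."""
    return (new - old) / old * 100.0


# Cells copied from the regression and improvement rows of the alert.
rows = {
    "espn fcp (cold)": "2,316.92 -> 2,543.38",
    "imdb PerceptualSpeedIndex (warm webrender)": "4,258.12 -> 4,351.67",
    "booking fnbpaint (warm webrender)": "452.29 -> 439.96",
}

for name, cell in rows.items():
    old, new = parse_absolute(cell)
    print(f"{name}: {percent_change(old, new):+.2f}%")
```

Under that assumption, the espn fcp and imdb PerceptualSpeedIndex rows come out at roughly 9.77% and 2.2%, which is the range quoted in the bug summary, and the booking fnbpaint row comes out negative (an improvement).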

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(agi)

Not sure which of the 2 bugs caused the regression. :agi, if you know which one it is, feel free to keep only the regressing bug. Thanks!

Flags: needinfo?(agi)
No longer regressed by: 1709640
Has Regression Range: --- → yes

It's interesting this only seems to affect 32bit builds, while 64bit are unaffected. I might have screwed up something in the 32bit pgo build pipeline.

Priority: -- → P2
Whiteboard: [geckoview:m93?]

(In reply to Agi Sferro | :agi | ni? for questions | ⏰ PST | he/him from comment #3)

> It's interesting this only seems to affect 32bit builds, while 64bit are unaffected. I might have screwed up something in the 32bit pgo build pipeline.

Agi, did you have time to look into this more?

My current theory is that we need to run the arm pgo profile on an arm CPU to get the performance back. When I talked to aklotz last month, he mentioned that he believes pgo profiles should be run on actual devices only. I'm going to look into that; it might get us better perf on arm64 too (which is the large majority of our users).

Flags: needinfo?(agi)

jmaher will get an estimate for running the profile-generate Android job on actual devices for aarch64 and arm.

Flags: needinfo?(jmaher)

ok, these jobs take ~27 minutes total to complete (I will assume same runtime on physical phones). Adding ~6 minutes to account for any reboots - I would round up to 35 minutes per job max.

In the last month we have had 719 64-bit jobs and 809 x86 jobs - accounting for down devices and peak loads, I would round up to 1000 profile runs/day. In the last month our load has been higher than in the 4 months prior; I assume we are having more pushes as we have fewer PTO days?

Doing the math:
27 minutes @800 runs/day = 15 devices x86_64 and 15 devices x86
35 minutes @1000 runs/day = 24 devices x86_64 and 24 devices x86
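
The device counts above can be checked with a short calculation. This is a minimal sketch, assuming each device runs jobs back to back for all 1440 minutes of the day with no idle time; the function name `devices_needed` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def devices_needed(minutes_per_run: float, runs_per_day: float) -> float:
    """Devices required per architecture if each device is busy all day."""
    return minutes_per_run * runs_per_day / MINUTES_PER_DAY


# The two scenarios from the estimate above.
for minutes, runs in [(27, 800), (35, 1000)]:
    count = devices_needed(minutes, runs)
    print(f"{minutes} min @ {runs} runs/day -> {count:.1f} devices per architecture")
```

The first scenario works out to exactly 15 devices. The second is slightly above 24, so the figure of 24 is a rounded value, and any idle time or downtime would push the real requirement higher.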

What I don't know:

  1. how long it takes to run on a physical device
  2. what the reboot/overhead is of the devices
  3. if there is a reason for higher load in the last month
  4. if we have other pgo types that are not represented in x86_64 and x86 (new versions upcoming?!?)

I would probably pick between 30 and 45 devices - rough math indicates that 30 devices would be ~$180K/year in infrastructure cost.

Flags: needinfo?(jmaher)
See Also: → 1663700
Whiteboard: [geckoview:m93?] → [geckoview:m93?] [geckoview:2022h2?]

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #7)

> ok, these jobs take ~27 minutes total to complete (I will assume same runtime on physical phones). Adding ~6 minutes to account for any reboots - I would round up to 35 minutes per job max.
>
> In the last month we have had 719 64-bit jobs and 809 x86 jobs - accounting for down devices and peak loads, I would round up to 1000 profile runs/day. In the last month our load has been higher than in the 4 months prior; I assume we are having more pushes as we have fewer PTO days?
>
> Doing the math:
> 27 minutes @800 runs/day = 15 devices x86_64 and 15 devices x86
> 35 minutes @1000 runs/day = 24 devices x86_64 and 24 devices x86
>
> What I don't know:
>
>   1. how long it takes to run on a physical device
>   2. what the reboot/overhead is of the devices
>   3. if there is a reason for higher load in the last month
>   4. if we have other pgo types that are not represented in x86_64 and x86 (new versions upcoming?!?)
>
> I would probably pick between 30 and 45 devices - rough math indicates that 30 devices would be ~$180K/year in infrastructure cost.

Maybe I'm missing something, but don't we only need to run these jobs for mozilla-central (and beta and release) builds? That should only be 8-ish runs per day, not 800-1000.

Good point, I think I overlooked the obvious. Rounding up to 10 to account for blue jobs (ones that fail in the middle and auto-retry) or higher load.

Given the math, then we have:
35 minutes/run * 10 runs/day = 350 minutes/day.

That is < 1/2 device/day.
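
To put the revised number in context, the sketch below computes how much of a single device's day 10 runs would occupy and how many 35-minute runs one device could fit in a day. It assumes a device is available for the full 24 hours; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def device_utilization(minutes_per_run: int, runs_per_day: int) -> float:
    """Fraction of one device's day spent running the jobs."""
    return minutes_per_run * runs_per_day / MINUTES_PER_DAY


def max_runs_per_device(minutes_per_run: int) -> int:
    """Whole runs that fit into one device's day, back to back."""
    return MINUTES_PER_DAY // minutes_per_run


print(f"utilization: {device_utilization(35, 10):.0%}")
print(f"max runs/day on one device: {max_runs_per_device(35)}")
```

At 35 minutes per run, 10 runs use about a quarter of one device's day, which leaves a single device with room for roughly four times that load.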

Joel, what is the next step for this bug?

In comment 2, Agi said he thinks his change to use an "instrumented build on x86_64" (https://hg.mozilla.org/integration/autoland/rev/59beb0677c0f) caused this page load regression. Is that "instrumented build" used to generate the profile data that is then used for PGO? Is this regression affecting real users, or is it only a perf regression for generating the profile data?

The regression in comment 0 (from July 2021) is for android-hw-g5-7-0-arm7-shippable, which I believe we have already retired.

Flags: needinfo?(jmaher)

it looks like the change switched from profiling on arm7 -> arm64. That means that we probably optimize on arm64 and not as well on arm7.

If there is a strong desire to look into this, I would suggest running tip on the a51 phones on the try server, then backing out the root cause and running a second push, then comparing the two to see the difference. The a51s are aarch64, and that is all we run on these days, so quite likely this won't be seen.

If this isn't seen, then we need to determine if it really is arm7 and if arm7 is a real concern for us and the marketplace.
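
The two-push comparison suggested above comes down to comparing replicate measurements from the push with the change against those from the push with the backout. A minimal sketch, assuming the comparison is done on medians; the function name is hypothetical and the sample values are placeholders for illustration only, not measurements from this bug.

```python
from statistics import median


def median_change_percent(baseline: list[float], candidate: list[float]) -> float:
    """Percent change of the candidate median relative to the baseline median."""
    base = median(baseline)
    return (median(candidate) - base) / base * 100.0


# Placeholder replicate timings in ms (made up for illustration, NOT real data).
with_backout = [1000.0, 990.0, 1005.0]  # push with the suspected change backed out
with_change = [1000.0, 1020.0, 1010.0]  # push on tip, including the change

delta = median_change_percent(with_backout, with_change)
print(f"tip vs backout: {delta:+.1f}%")
```

A delta near zero on the a51 devices would be consistent with the regression being specific to arm7.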

Flags: needinfo?(jmaher)

> If this isn't seen, then we need to determine if it really is arm7 and if arm7 is a real concern for us and the marketplace.

Thanks. I'll follow up with PM about the priority of arm7 vs arm64 performance.
