Closed Bug 1933244 Opened 1 year ago Closed 1 year ago

28 - 2.95% reddit-billgates-ama.members fcp / reddit-billgates-ama.billg-ama LastVisualChange + 22 more (Linux, OSX, Windows) regression on Wed November 13 2024

Categories

(Toolkit :: Startup and Profile System, defect)

Tracking

RESOLVED INVALID
Tracking Status
firefox-esr128 --- unaffected
firefox133 --- unaffected
firefox134 --- fix-optional
firefox135 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected a browsertime performance regression from push a3590cf454bc8d44e59090e2dde956723b76ca5d. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 28% | reddit-billgates-ama.billg-ama fcp | macosx1015-64-shippable-qr | cold fission webrender | 146.84 -> 187.96 | Before/After |
| 28% | reddit-billgates-ama.members fcp | macosx1015-64-shippable-qr | cold fission webrender | 146.84 -> 187.96 | Before/After |
| 18% | reddit-billgates-ama.billg-ama loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,012.55 -> 1,199.74 | Before/After |
| 18% | reddit-billgates-ama.members loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,012.55 -> 1,199.74 | Before/After |
| 18% | reddit-billgates-ama.billg-ama FirstVisualChange | macosx1015-64-shippable-qr | cold fission webrender | 223.10 -> 262.83 | Before/After |
| 16% | reddit-billgates-post-1.posts ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 220.43 -> 254.94 | Before/After |
| 15% | reddit-billgates-post-2.top loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,009.78 -> 1,157.87 | Before/After |
| 15% | reddit-billgates-post-2.billg loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,009.78 -> 1,157.87 | Before/After |
| 15% | reddit-billgates-post-2.hot loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,009.78 -> 1,157.87 | Before/After |
| 14% | reddit-billgates-post-1.billg loadtime | macosx1015-64-shippable-qr | cold fission webrender | 1,011.40 -> 1,157.23 | Before/After |
| ... | ... | ... | ... | ... | ... |
| 8% | reddit-billgates-post-1.posts PerceptualSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 234.20 -> 253.33 | Before/After |
| 7% | reddit ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,266.32 -> 1,359.22 | Before/After |
| 7% | reddit SpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,509.47 -> 1,609.51 | Before/After |
| 6% | reddit PerceptualSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,447.08 -> 1,537.03 | Before/After |
| 3% | reddit-billgates-ama.billg-ama LastVisualChange | macosx1015-64-shippable-qr | cold fission webrender | 11,168.62 -> 11,498.36 | Before/After |
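
The Ratio column appears to be the relative change between the old and new absolute values. The sketch below recomputes it for three rows of the regression list, assuming the ratio is (new - old) / old; that formula is inferred from the numbers in this bug, not taken from Perfherder's source, and `percent_change` is a hypothetical helper name.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old.

    Assumption: this is how the Ratio column is derived; the formula is
    inferred from the values in the alert, not from Perfherder itself.
    """
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# (test, old, new) copied from the regression list in comment 0.
rows = [
    ("reddit-billgates-ama.billg-ama fcp", 146.84, 187.96),
    ("reddit ContentfulSpeedIndex", 1266.32, 1359.22),
    ("reddit-billgates-ama.billg-ama LastVisualChange", 11168.62, 11498.36),
]

for test, old, new in rows:
    print(f"{test}: {percent_change(old, new):.2f}%")
```

Under that assumption the three rows come out at roughly 28%, 7% and 2.95%, which matches the Ratio column and the "28 - 2.95%" range in the bug title.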

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 5% | speedometer3 NewsSite-Nuxt/NavigateToPolitics/Sync | windows11-64-nightlyasrelease-qr | fission webrender | 14.54 -> 13.85 | Before/After |
| 5% | speedometer3 NewsSite-Nuxt/NavigateToUS/Sync | android-hw-a55-14-0-aarch64-shippable | fission webrender | 43.73 -> 41.75 | |
| 4% | speedometer3 TodoMVC-Lit-Complex-DOM/Adding100Items/total | windows11-64-shippable-qr | fission webrender | 11.70 -> 11.23 | Before/After |
| 4% | speedometer3 TodoMVC-WebComponents/DeletingAllItems/total | windows11-64-shippable-qr | fission webrender | 5.20 -> 5.01 | Before/After |
| 4% | speedometer3 TodoMVC-JavaScript-ES5/Adding100Items/Sync | android-hw-a55-14-0-aarch64-shippable | fission webrender | 76.51 -> 73.75 | |
| ... | ... | ... | ... | ... | ... |
| 2% | speedometer3 total | windows11-64-shippable-qr | fission webrender | 1,156.55 -> 1,133.08 | Before/After |

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 42596

The following documentation link provides more information about this command.

For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to afinder@mozilla.com.

Flags: needinfo?(jhirsch)

Just to clarify, only the following alerts are valid regressions:

Regressions:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 7% | reddit ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,266.32 -> 1,359.22 | Before/After |
| 7% | reddit SpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,509.47 -> 1,609.51 | Before/After |
| 6% | reddit PerceptualSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,447.08 -> 1,537.03 | Before/After |

The other regressions mentioned in comment 0 are all infra alerts (invalid) and can be discarded.

Sorry for the confusion.

Set release status flags based on info from the regressing bug 1924850

It has been over 7 days with no activity on this performance regression.

:jhirsch, since you are the author of the regressor, bug 1924850, which triggered this performance alert, could you please provide a progress update?

If this regression is something that fixes a bug, changes the baseline of the regression metrics, or otherwise will not be fixed, please consider closing it as WONTFIX. See this documentation for more information on how to handle regressions.

For additional information/help, please needinfo the performance sheriff who filed this alert (they can be found in comment #0), or reach out in #perftest, or #perfsheriffs on Element.

For more information, please visit BugBot documentation.

Flags: needinfo?(jhirsch)

IIUC, the reddit graphs for the tests from comment 1 show the same period of worse numbers (roughly Nov 13 to Nov 25) as the other graphs from comment 0.

Are we sure those are real?

Flags: needinfo?(afinder)

(In reply to Jens Stutte [:jstutte] from comment #4)

IIUC, the reddit graphs for the tests from comment 1 show the same period of worse numbers (roughly Nov 13 to Nov 25) as the other graphs from comment 0.

Are we sure those are real?

Hi Jens! Thanks for reaching out!

As mentioned previously, only the regressions listed in comment 1 are valid (ContentfulSpeedIndex, SpeedIndex and PerceptualSpeedIndex on linux1804-64-shippable-qr for reddit). The rest of the regressions from comment 0 that do not show up in comment 1 are marked as infra and should therefore be discarded.

Infra alerts are alerts whose graphs, upon retriggering or backfilling, align with the performance trend established after the culprit revision rather than with the previous trend. They are therefore invalid, and can be caused by various changes in the hardware infrastructure on which the tests run. The following graph is an example of a visible infra alert from the alerts linked in comment 0. For revisions e22456853973 and 3c174ea10f04, highlighted in the graph, the retriggers align with the performance trend established after a3590cf454bc8d44e59090e2dde956723b76ca5d. The infra alerts were included in comment 0 because the "File Bug" feature does not currently filter them out (this will be fixed eventually).
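
To make the infra check concrete, here is a minimal sketch of the comparison described in this comment: retriggers on a revision older than the culprit are compared with the trend before and the trend after it. This is not Perfherder's or the sheriffs' actual procedure; the function names, the use of a plain mean, and all sample values are assumptions for illustration (only the 146.84 and 187.96 levels echo the fcp row in comment 0).

```python
from statistics import mean


def closer_trend(before: list[float], after: list[float],
                 retriggers: list[float]) -> str:
    """Return 'before' or 'after', whichever trend mean the retriggers sit nearer to."""
    r = mean(retriggers)
    dist_before = abs(r - mean(before))
    dist_after = abs(r - mean(after))
    return "before" if dist_before <= dist_after else "after"


def looks_like_infra(before: list[float], after: list[float],
                     retriggers_on_older_revision: list[float]) -> bool:
    """Hypothetical heuristic: retriggers of a pre-culprit revision that land on
    the post-culprit trend suggest the shift came from the infrastructure,
    not from the patch."""
    return closer_trend(before, after, retriggers_on_older_revision) == "after"


# Illustrative values only (not real Perfherder data).
before = [146.0, 147.5, 146.8]   # trend before the culprit revision
after = [188.2, 187.5, 188.0]    # trend after the culprit revision

print(looks_like_infra(before, after, [186.9, 189.1]))  # retriggers match the new trend
print(looks_like_infra(before, after, [145.9, 148.0]))  # retriggers match the old trend
```

A real analysis would look at the whole graph and at noise in the data rather than at two means, so treat this only as a way to picture the rule.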

One thing I noticed in the graph, which was not visible when this performance regression bug was filed, is that starting with revision 61d11aa9346e, which also generated an improvement alert (later also marked as infra), the graphs reported in comment 1 reverted to their initial performance trends. This suggests they might also be infra (or that the performance regression was fixed in the meantime).

I started some retriggers before the original culprit revision and within the following range, and will return on Monday to check if they also turn out to be infra, or can still be considered valid regressions.

Flags: needinfo?(afinder)

(In reply to Alex Finder from comment #5)

Following up from the previous comment, I added some re-triggers and backfills before revision a3590cf454bc8 to get a clearer graph. Will revisit the results tomorrow and check.

Flags: needinfo?(afinder)

Having analyzed the tests after the retriggers, I can see that the graphs now match the same infra pattern as the other reported tests from comment 0. I'll mark the bug as Invalid and unlink it from the alert summary. Sorry for the confusion!

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(afinder)
Resolution: --- → INVALID
Flags: needinfo?(jhirsch)