Closed Bug 1555268 Opened 3 years ago Closed 2 years ago

4.99% tp5o_webext responsiveness (linux64-shippable) regression on push f41fab78af2f06d2ad86d49e7308ced2a39e1891

Categories

(Firefox Build System :: General, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression, Whiteboard: faq-candidate)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=085342ba416f464742b068edd32f20fae8e18e87&tochange=f41fab78af2f06d2ad86d49e7308ced2a39e1891

As you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

5% tp5o_webext responsiveness linux64-shippable opt e10s stylo 1.59 -> 1.67
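For reference, the percentage in the alert line can be reproduced from the two scores it reports. This is only a minimal sketch of the arithmetic, not the code Perfherder itself runs; the helper name `percent_change` is made up for illustration. The rounded scores give roughly 5.03%, so the 4.99% in the bug title was presumably computed from unrounded values.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Scores taken from the alert line: 1.59 -> 1.67.
# The alert reports the increase as a regression for this test.
change = percent_change(1.59, 1.67)
print(f"{change:.2f}%")  # prints "5.03%"
```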

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=21098

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → Toolbars and Customization
Product: Testing → Firefox

:gregtatum, this is a weird perf regression. When bug 1552565 first landed, no perf change was noticed. However, when it got backed out, we saw this 5% regression.

I don't have a solid explanation for this, other than that one of the patches that landed after bug 1552565 was somehow related to it. Could you skim this bug list and point out the related bugs, if any?

Flags: needinfo?(gtatum)

Or, of course, provide another explanation?

Please provide a treeherder link for the push range in question (including at least one commit before bug 1552565 landed and one after it was backed out), filtered on the job that is actually regressing here.
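A link of the kind requested here can be assembled from a repository name, the two ends of the push range, and a job filter. The sketch below assumes the query parameter names (`repo`, `fromchange`, `tochange`, `searchStr`) that appear in the treeherder links posted in this bug; the helper `jobs_url` is hypothetical, and whether Treeherder still accepts this URL shape is not verified here.

```python
from urllib.parse import urlencode

# Base of the job view, as it appears in the links posted in this bug.
TREEHERDER_JOBS = "https://treeherder.mozilla.org/#/jobs"


def jobs_url(repo, fromchange, tochange, search_terms):
    """Build a job-view URL for a push range, filtered by search terms."""
    params = {
        "repo": repo,
        "tochange": tochange,
        # Comma-separated filter terms; urlencode escapes the commas as %2C.
        "searchStr": ",".join(search_terms),
        "fromchange": fromchange,
    }
    return f"{TREEHERDER_JOBS}?{urlencode(params)}"


# Revisions and filter terms copied from the link posted in reply.
url = jobs_url(
    "autoland",
    fromchange="ddb7366f09a390b17b25984be1681026143ac679",
    tochange="b5656f3353165e991e563acd3a0b276df4de7e65",
    search_terms=["g5", "shippable", "linux"],
)
print(url)
```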

Flags: needinfo?(igoldan)
Flags: needinfo?(igoldan)

(In reply to :Gijs (he/him) from comment #4)

https://treeherder.mozilla.org/#/jobs?repo=autoland&tochange=b5656f3353165e991e563acd3a0b276df4de7e65&searchStr=g5%2Cshippable%2Clinux&fromchange=ddb7366f09a390b17b25984be1681026143ac679

https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1927254,1,1&zoom=1558723638479.2905,1558736569901.2415,1.5094566727599597,1.8011370161509457&selected=autoland,1927254,482399,819892708,1

This looks to me like it's bug 1542746 instead, which landed right before the backout. That would make a bunch more sense. Ionuț, can you confirm?

Hmmm... I agree with you, especially since this regression didn't affect any other platforms. Could we conclude that this so-called "regression" was actually caused by a temporarily misbehaving PGO build process, and close the bug as WONTFIX?

Flags: needinfo?(igoldan) → needinfo?(gijskruitbosch+bugs)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #5)

Hmmm... I agree with you, especially since this regression didn't affect any other platforms. Could we conclude that this so-called "regression" was actually caused by a temporarily misbehaving PGO build process, and close the bug as WONTFIX?

I'm confused - has the regression disappeared again? I thought it was still there. And bug 1542746 was a deliberate change to how we do PGO, and it's possible there's a real regression caused by that, right? Of course, given the comparatively small numbers here and the other improvements bug 1542746 got us we might still decide to wontfix this particular regression...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(igoldan)

(In reply to :Gijs (he/him) from comment #6)

I'm confused - has the regression disappeared again? I thought it was still there. And bug 1542746 was a deliberate change to how we do PGO, and it's possible there's a real regression caused by that, right?

No, the regression is still there & it's real. But bug 1542746 didn't cause it directly. As you said: it only changed the way we do PGO. With this in mind, when bug 1552565 got backed out, somehow the updated PGO build process produced binaries which regress on tp5o_webext responsiveness.

In my experience, PGO-induced regressions like these are just temporary. At some point, the produced binaries return to the original baseline by themselves.
Situations like these are infrequent but recurring (every 1-2 months); :jmaher & I call these broken PGO builds. Once we're confident that this is the case, we close these bugs as WONTFIX. In this case, I'm confident.
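The "comes back to baseline" judgement described above could be approximated with a simple check on the score series. This is a rough sketch only, not how the sheriffs or Perfherder actually decide: the function name, the window size, the 2% tolerance, and the sample values are all assumptions made up for illustration (only 1.59 and 1.67 come from the alert).

```python
from statistics import mean


def returned_to_baseline(before, after, window=3, tolerance=0.02):
    """Return True if the most recent scores are back near the old baseline.

    before:    scores recorded before the suspected regression
    after:     scores recorded since then, oldest first
    window:    how many of the latest scores to average (assumed value)
    tolerance: allowed relative deviation from the baseline (assumed value)
    """
    baseline = mean(before)
    recent = mean(after[-window:])
    return abs(recent - baseline) / baseline <= tolerance


# Illustrative values only, not real Talos data.
baseline_runs = [1.59, 1.60, 1.58]
still_regressed = [1.67, 1.68, 1.66]
recovered = [1.67, 1.66, 1.60, 1.59, 1.58]

print(returned_to_baseline(baseline_runs, still_regressed))
print(returned_to_baseline(baseline_runs, recovered))
```

With these sample values the first call reports that the scores are still elevated and the second that they have recovered; a real check would need a window and tolerance matched to the test's noise.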

Flags: needinfo?(igoldan)
Whiteboard: faq
Whiteboard: faq → faq-candidate

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)

No, the regression is still there & it's real. But bug 1542746 didn't cause it directly.

As you said: it only changed the way we do PGO. With this in mind, when bug 1552565 got backed out, somehow the updated PGO build process produced binaries which regress on tp5o_webext responsiveness.

In my experience, PGO-induced regressions like these are just temporary. At some point, the produced binaries return to the original baseline by themselves.
Situations like these are infrequent but recurring (every 1-2 months); :jmaher & I call these broken PGO builds. Once we're confident that this is the case, we close these bugs as WONTFIX. In this case, I'm confident.

I'm not sure I follow. I expect :froydnj could do a better job explaining the change in bug 1542746 than I could, but AIUI we fundamentally changed how we do PGO. So any regressions from that seem (to me!) more likely to have to do with "the new way we do PGO" than "something odd changed and it happens to impact PGO builds because PGO builds are funny like that". I'd needinfo :froydnj but they're out until later in June...

I'll add that there seems to be a slow upward trend in the 14-day overview graph at https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1927254,1,1&zoom=1558723638479.2905,1558736569901.2415,1.5094566727599597,1.8011370161509457&selected=autoland,1927254,482399,819892708,1 which seems a little worrying, though it's probably hard to tie to a specific change...

Moving this out, seeing as the consensus seems to be that this has something to do with PGO.

Component: Toolbars and Customization → General
Product: Firefox → Firefox Build System

WONTFIX per IRC discussion with froydnj. As noted elsewhere, changes to PGO can lead to some variation in results, but this specific change was more win than loss.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX