Closed Bug 1700313 Opened 4 years ago Closed 4 years ago

26.15 - 115.97% booking PerceptualSpeedIndex / imgur / imgur fcp (android-hw-g5-7-0-arm7-api-16-shippable, macosx1015-64-shippable-qr) regression on push e1fa9dc7464ba49efdf422116ac2c92e2d40a200 (Fri March 19 2021)

Categories

(Core :: DOM: Core & HTML, defect)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox87 --- unaffected
firefox88 + fixed
firefox89 + fixed

People

(Reporter: Bebe, Assigned: sfink)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected a browsertime performance regression from push e1fa9dc7464ba49efdf422116ac2c92e2d40a200. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
116% imgur fcp macosx1015-64-shippable-qr nocondprof warm webrender 68.62 -> 148.21
109% imgur fcp macosx1015-64-shippable-qr nocondprof warm webrender 69.17 -> 144.62
41% imgur macosx1015-64-shippable-qr nocondprof warm webrender 167.50 -> 235.47
26% booking PerceptualSpeedIndex android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 821.12 -> 1,035.83

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
15% booking fcp android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 560.65 -> 476.12
15% booking fcp android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 564.71 -> 480.75
12% booking FirstVisualChange android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 621.29 -> 546.83
10% booking android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 843.94 -> 756.49
3% booking loadtime android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 1,341.85 -> 1,306.38

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(sphink)

I am looking into this, but it should be backed out if I don't come up with something in the next day or two (in my opinion; you can take it out earlier if it's causing problems.)

Set release status flags based on info from the regressing bug 1692308

I think we can go ahead with the backout here.

Flags: needinfo?(fstrugariu)

:RyanVM yep we can back this out

Flags: needinfo?(fstrugariu)
Assignee: nobody → sphink
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(sphink)
Resolution: --- → FIXED

== Change summary for alert #29550 (as of Fri, 02 Apr 2021 05:03:55 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
15% booking fcp android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 494.58 -> 569.54
15% booking fnbpaint android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 501.62 -> 575.92
15% booking fnbpaint android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 502.02 -> 576.00
12% booking FirstVisualChange android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 565.42 -> 634.00
11% booking dcf android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 1,065.29 -> 1,177.96
7% amazon LastVisualChange android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 1,847.10 -> 1,968.25
4% reddit loadtime linux1804-64-shippable-qr nocondprof warm webrender 2,036.19 -> 2,114.71

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
22% booking PerceptualSpeedIndex android-hw-g5-7-0-arm7-api-16-shippable nocondprof warm webrender 1,013.88 -> 795.25

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29550

I'm confused, doesn't that point to mostly progressions? Was the imgur regression in comment 0 missclassified given we didn't see the relevant progression when reverting?

Flags: needinfo?(sphink)
Flags: needinfo?(ryanvm)
Flags: needinfo?(fstrugariu)

I'm not sure if I can fully answer that, but:

  • comment 0 had 5 regressions and 4 improvements, but the magnitude of the regressions was much larger
  • this was meant to be a nearly pure refactoring patch, so just about any change is unexpected
  • when I manually pushed the osx imgur jobs, I consistently saw the regression (of about the same magnitude)
  • the regression is one of those annoying ones where the results are and have always been bimodal, but before the patch it was rare to be in the slow mode and after the patch it was rare to be in the fast mode :(

I'm still intending to look into this, since it's holding up some other changes (and not just mine). But I got a little sidetracked into making it easy to push marker-only profiling runs, which I thought would be helpful for diagnosing this. ;-)

Flags: needinfo?(sphink)

I do agree that comment 6 implies that it might be better to leave it in at this point, rather than holding up stuff on my investigation. I was certainly not expecting those results.

Bebe, did we see any change in the numbers on Beta after this landed?

Flags: needinfo?(ryanvm)

== Change summary for alert #29555 (as of Fri, 02 Apr 2021 12:54:49 GMT) ==

This is the beta alert:

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
116% imgur FirstVisualChange macosx1015-64-shippable-qr nocondprof warm webrender 83.33 -> 180.00
13% imgur SpeedIndex macosx1015-64-shippable-qr nocondprof warm webrender 649.72 -> 734.42
11% imgur PerceptualSpeedIndex macosx1015-64-shippable-qr nocondprof warm webrender 753.92 -> 839.25

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
53% imgur fcp macosx1015-64-shippable-qr nocondprof warm webrender 145.96 -> 68.17
30% imgur macosx1015-64-shippable-qr nocondprof warm webrender 234.07 -> 164.78

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29555

Flags: needinfo?(fstrugariu) → needinfo?(ryanvm)

I'm so lost. On Nightly we only changed mostly Android numbers but on Beta it was all macOS? Not gonna lie, I'm having a really difficult time trusting the reliability of these test suites right now.

Flags: needinfo?(ryanvm)

Looking at the graph for booking on android we can see that the regression hit autoland and was backed out
on beta we can only fin the backout commit and not the original commit

https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=0&highlightedRevisions=b6b786aa47c5fc504aa2a8cbc5f510091626e1ac&series=mozilla-beta,2955667,1,13&series=autoland,2891214,1,13&series=autoland,3341809,1,13&timerange=2592000&zoom=1615285015325,1617891945658,374.94444444444446,671.5555555555555

on osx imgur we can see the data is there and we saw the backout regression on beta
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&series=autoland,3341232,1,13&series=autoland,3186474,1,13&series=mozilla-beta,3337152,1,13&series=mozilla-beta,2907877,1,13&timerange=2592000

As mac suffered a higher magnitude regression on beta should't we focus on that
Also is there any reason we did not seen the initial commit on beta?

Are there any beta flags that can hide this regression on andorid?

Flags: needinfo?(ryanvm)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #13)

As mac suffered a higher magnitude regression on beta should't we focus on that

I'm just trying to understand why we're seeing different effects on different platforms depending on which branch we're on. My concerns aren't about the priority of one platform vs. another as much as trying to understand the inconsistencies we're seeing.

Also is there any reason we did not seen the initial commit on beta?

The regressing bug landed on m-c during the Fx88 development cycle, so presumably it came to Beta on merge day when 88 migrated there.

Are there any beta flags that can hide this regression on andorid?

Not that I'm aware of, hence my confusion. Also, I don't know why we wouldn't have seen the macOS regressions on autoland either.

Flags: needinfo?(ryanvm)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)

Not that I'm aware of, hence my confusion. Also, I don't know why we wouldn't have seen the macOS regressions on autoland either.

I suspect this is due to the changes in the macOS platform (bug 1694185).

I'm having a really difficult time trusting the reliability of these test suites right now.

Is this because we didn't see the change on mozilla-beta for Android? I agree this is worth investigating but I don't think it's concerning enough to question the reliability of our test suites. :bebe could you file a bug to investigate autoland Android changes reaching beta? We should determine if this is an isolated incident.

Flags: needinfo?(fstrugariu)

I've spent some time looking at this.

Before: https://share.firefox.dev/2P2WUPk
After: https://share.firefox.dev/3edApQs

I hope that when I click the permalink button in the profiler I get a link that gives the view of what I'm zoomed in to.

The first non-blank paint is definitely slower in "After". The difference as far as I can tell is that there's extra JS code compiled and executed before the paint, I think that this code is normally executed after this paint. This is my first time diving into our task scheduler code, and the part that I don't understand is that both Before and After profiles create a LOW_PRIORITY_ONE_SHOT timer, the 2nd one does it indirectly through the IdleTaskRunner. There's something I'm missing. How could these timers that look the same (or maybe some other difference I haven't noticed) cause different tasks to be scheduled? (NI: Bas)

This is shuffling around of some tasks, so maybe it doesn't actually make an important difference to performance? Is it worth landing Bug 1692308 anyway? (NI: sfink).

Thanks.

Flags: needinfo?(sphink)
Flags: needinfo?(bas)

I stared at this for a while today too. As pbone describes, the patch really looks like it ought to keep the timing identical. It creates the same timer as before, with the same priority.

Here are my sample profiles.

Before (unpatched): https://share.firefox.dev/3dCsdKt
After (patched): https://share.firefox.dev/3tD0tef

Those are zoomed into the beginning portion of the time, roughly aligned. They show enough time for the unpatched version to finish the first nonblank paint, but the patched version is still going.

Differences I noticed (validated with another pair of profiles to remove some of the chance artifacts:

  • Both have a common section where they're doing roughly the same thing in terms of flushing/reflowing/painting, until the unpatched version does a reflow followed by SetNeedStyleFlush in "Task nsHtml5SVGLoadDispatcher". Then the next refresh tick does the paint that finishes the FirstNonBlankPaint. The patched version also does an SVGLoadDispatcher SetNeedsStyleFlush, but much later, well after the patched version is already done. Both do other SetNeedStyleFlushes.
  • Shortly before the above, the unpatched (fast) version gets "SVGLoad" and "overflow" DOMEvents, associated with the same URL. There are 3 SVGLoad runnables that fire at this time. None of these show up within the same time window in the patched version.
  • The unpatched (fast) profile fires a RunWillPaintObservers runnable right around the time of the FirstNonBlankPaint
  • The patched (slow) version compiles https://s.imgur.com/min/gallery.js?1591637802 offthread, runs it, and finishes much sooner than the unpatched.
  • The patched version does several more mainthread inline script compiles+runs

I don't know what all this means. I'm kind of out of my depth here. If I were to make up a story, it would be that the patched version is too busy running some scripts to bother loading the SVG, and that's the first thing drawn in the unpatched version (to make it a NonBlankPaint)?

I doubt that in the end this matters much for performance, so it would probably be best to land and take the (mysterious) hits on osx and Android. I would still like to understand what's going on, given that it doesn't change the set of timers as far as I know. I'm going to hope that Bas's more experienced eye can see what's going on. The difference is quite reliable, even if it is a result of chance reshuffling.

If there is some difference in scheduling decisions, is there a good place I could instrument the scheduling code with markers? Like, "we had 3 tasks to choose from and picked X".

mccr8 suggested we also NI smaug as he's done some work with scheduling too.

Flags: needinfo?(bugs)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #13)

Are there any beta flags that can hide this regression on andorid?

Not that I'm aware of, hence my confusion. Also, I don't know why we wouldn't have seen the macOS regressions on autoland either.

Note that there are some (different) Android improvements from the backout on Beta, on the order of 5%. See the reddit & espn results at https://treeherder.mozilla.org/perfherder/alerts?id=29555

Flags: needinfo?(sphink)

I commented in https://phabricator.services.mozilla.com/D104881 and matrix. The regressing patch does change behavior at least in one case. It might lead to running full GC more often.

Flags: needinfo?(bugs)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #15)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)

Not that I'm aware of, hence my confusion. Also, I don't know why we wouldn't have seen the macOS regressions on autoland either.

I suspect this is due to the changes in the macOS platform (bug 1694185).

I'm having a really difficult time trusting the reliability of these test suites right now.

Is this because we didn't see the change on mozilla-beta for Android? I agree this is worth investigating but I don't think it's concerning enough to question the reliability of our test suites. :bebe could you file a bug to investigate autoland Android changes reaching beta? We should determine if this is an isolated incident.

https://bugzilla.mozilla.org/show_bug.cgi?id=1722902

Flags: needinfo?(fstrugariu)
Has Regression Range: --- → yes

Hey :smaug has this been resolved, and if not can you provide additional context on the status of this?

Flags: needinfo?(bugs)

I guess that should be more a question to sfink.

Flags: needinfo?(bugs) → needinfo?(sphink)
See Also: → 1722902

(In reply to Kash [:kshampur] ⌚EST from comment #22)

Hey :smaug has this been resolved, and if not can you provide additional context on the status of this?

I think we can call this resolved. The regressing bug was re-landed with some fixes. I would have liked to have figured out the mystery here, but it does not appear to be an issue at this point.

Flags: needinfo?(sphink)
Flags: needinfo?(bas)
You need to log in before you can comment on or make changes to this bug.