Closed Bug 1745479 Opened 3 years ago Closed 3 years ago

306.06 - 6.43% outlook fnbpaint / twitch dcf + 25 more (Linux, OSX, Windows) regression on Wed December 8 2021

Categories

(Core :: DOM: Core & HTML, defect)

Firefox 97
defect

Tracking

()

RESOLVED INVALID
Tracking Status
firefox-esr91 --- unaffected
firefox95 --- unaffected
firefox96 --- unaffected
firefox97 --- affected

People

(Reporter: alexandrui, Assigned: hsivonen)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 obsolete file)

Perfherder has detected a browsertime performance regression from push e135fe933c4e31482e91c77617a27c04b3a44eef. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
306% outlook fnbpaint linux1804-64-shippable-qr warm webrender 70.15 -> 284.83
256% outlook dcf macosx1015-64-shippable-qr warm webrender 76.62 -> 272.83
251% outlook fnbpaint macosx1015-64-shippable-qr warm webrender 78.50 -> 275.42
250% outlook fcp macosx1015-64-shippable-qr warm webrender 78.71 -> 275.67
220% outlook dcf linux1804-64-shippable-qr warm webrender 87.42 -> 280.12
218% outlook fnbpaint linux1804-64-shippable-qr warm webrender 89.29 -> 284.33
218% outlook fcp linux1804-64-shippable-qr warm webrender 89.33 -> 284.42
154% outlook dcf windows10-64-shippable-qr warm webrender 86.71 -> 220.12
148% outlook fnbpaint windows10-64-shippable-qr warm webrender 89.71 -> 222.50
148% outlook fcp windows10-64-shippable-qr warm webrender 89.71 -> 222.38
... ... ... ... ...
35% outlook loadtime windows10-64-shippable-qr warm webrender 397.17 -> 535.08
17% outlook loadtime linux1804-64-shippable-qr cold webrender 1,091.46 -> 1,271.79
16% outlook loadtime macosx1015-64-shippable-qr cold webrender 1,216.54 -> 1,407.17
14% outlook loadtime windows10-64-shippable-qr cold webrender 1,033.46 -> 1,176.21
6% twitch dcf linux1804-64-shippable-qr cold webrender 626.06 -> 666.33

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
40% twitch FirstVisualChange macosx1015-64-shippable-qr warm webrender 78.33 -> 46.67
36% twitch fcp macosx1015-64-shippable-qr warm webrender 51.62 -> 33.17
31% twitch FirstVisualChange windows10-64-shippable-qr warm webrender 97.25 -> 67.00
29% twitch fnbpaint windows10-64-shippable-qr warm webrender 64.06 -> 45.62
29% twitch fcp windows10-64-shippable-qr warm webrender 59.27 -> 42.33
... ... ... ... ...
2% twitch loadtime macosx1015-64-shippable-qr warm webrender 504.69 -> 492.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hsivonen)

First of all, I'd like to ask that the patch not be backed out, because the patch also has performance benefits for other pages and is meant to have performance benefits. Notably, the patch eliminates a whole reload on some pages.

I don't have access to what exactly Raptor loads in the outlook case, but looking at the live site, it indeed looks like the top-level page (but not the subdocuments) goes through a code path that the patch changed.

The top-level document doesn't declare the encoding on the HTTP layer but declares it via <meta> very early on. Overall, this should result in the parser thread doing less work than previously, but the interaction between the parser thread and the main thread can be expected to differ.

Without actually looking at this in Pernosco, I expect the situation with the patch to result in flushing a small number of tree operations earlier than previously. That shouldn't cause layout to be started, though, so it's unclear to me what exactly in the flushing of operations causes the measured difference.

Perhaps the first flush starts a new timer delay for the next flush or something along those lines so that the next flush comes later than the single pre-patch flush encompassing the data from both post-patch flushes.

I'll investigate.

Flags: needinfo?(hsivonen)

Set release status flags based on info from the regressing bug 1701828

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

I expect the patch for bug 1745142 to address this.

Depends on: 1745142

New idea: Maybe taking the encoding speculation from the queue shouldn't involve calling Interrupt() and the interrupted state continues for too long.

Eliminating parser continuation runnables and using MutexAutoUnlock instead:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59b2ccdb1dc122ee564d3e04f92dadc865e38672

Hmm. Disappointing results.

Add NI here so it doesn't fall of the radar.

Flags: needinfo?(hsivonen)
Has Regression Range: --- → yes

This isn't quite the root root cause, but the problem here is that the time from the parser thread flushing the queue that contains that StartLayout operation got the time from flush on parser thread to StartLayout executing on the main thread extended from 6 milliseconds to 41 milliseconds (on my workstation). This suggests that whatever the parser flushed to the main thread previously made the main thread busy enough not to process the next flush in a timely manner.

try candidate:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ab1ca1d802bc215e234f9fdac6a057f8471db86

(In reply to Henri Sivonen (:hsivonen) from comment #15)

This isn't quite the root root cause, but the problem here is that the time from the parser thread flushing the queue that contains that StartLayout operation got the time from flush on parser thread to StartLayout executing on the main thread extended from 6 milliseconds to 41 milliseconds (on my workstation). This suggests that whatever the parser flushed to the main thread previously made the main thread busy enough not to process the next flush in a timely manner.

It seems that the RunFlushLoop throttling mechanism ends up throttling the flush that contains the StartLayout operation. The attached patch does three things:

  1. If a tree op flush is throttled, speculative loads are flushed anyway.
  2. Throttling is bypassed if layout hasn't started.
  3. The first speculation dequeued after an encoding commitment is flushed right away.

(If this is fixed by bypassing throttling, we should find out as a follow-up, why the throttling activates what's supposed to be the frontmost page in the harness.)

Further examination suggests that it's not the throttling mechanism in the tree op executor. But something seems to be making the relevant flush runnable run after quite a delay relative to when it was posted to the event queue.

What I was seeing locally seems to have no relation to the metrics on try. Also, I also tried a different case locally, and it was quite noisy.

On the topic of whether this metric change is representative of Outlook specifically or a larger set of similar pages: Comment 0 shows an improvement on Twitch. As it happens, Outlook and Twitch are the two sites in our measurement set that don't declare the character encoding on the HTTP layer and, therefore, were affected by bug 1701828. That Twitch improved suggests that we don't need to worry about a problem on all sites of this type.

Notably, telemetry on time to non-blank paint doesn't show any cliff.

smaug pointed out to me that on the day the patch that regressed the fnbpaint measurement on Outlook here, SpeedIndex for Outlook improved considerably on December 8 when the patch landed. Since that metric is more important than fnbpaint, it doesn't make sense to back out.

Attachment #9260351 - Attachment is obsolete: true

It looks like before this patch, we somehow measured dnf and fnb "twice" - we measured two values, e.g. 215ms and 60ms, and then only took the latter value. After the patch, there's only one dnf / fnb value for the page load, and it seems to be roughly equal to the sum of the two old values.

Here's a profile from "before" which shows the double measurement: https://share.firefox.dev/3onT8ib
In the marker chart, there are two DOMContentFlushed markers and two FirstNonBlankPaint markers.

Here's a profile from "after": https://share.firefox.dev/3g5zouM
You can see only one marker each, and it is roughly equal to the sum of the markers in the "before" profiles.

These profiles come from a ./mach try fuzzy --geckoProfile run, for the macosx1015-64-shippable-qr configuration, which had the following "regression" detected:

| 256% | outlook dcf | macosx1015-64-shippable-qr | warm webrender | 76.62 -> 272.83 |

before job, after job

(In reply to Markus Stange [:mstange] from comment #22)

It looks like before this patch, we somehow measured dnf and fnb "twice" - we measured two values, e.g. 215ms and 60ms, and then only took the latter value. After the patch, there's only one dnf / fnb value for the page load, and it seems to be roughly equal to the sum of the two old values.

Thank you!

The most obvious explanation for this would be that bug 1701828 eliminated a reload scenario and the reload scenario could easily have lead to two measurements. However, 1) one would expect the first measurement to be smaller than the second one in that case and 2) at least the current live version of Outlook doesn't appear to have the conditions that would have triggered a reload before bug 1701828.

In any case, I'm resolving this based on the finding that this is a measurement error.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(hsivonen)
Resolution: --- → INVALID

Thank you all for looking into this!

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: