Closed Bug 1773127 Opened 2 years ago Closed 1 year ago

8.62 - 5.95% tumblr LastVisualChange / tumblr fcp (Linux) regression on Mon June 6 2022

Categories

(Testing :: Raptor, defect, P2)

Firefox 103
defect

Tracking

(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox101 unaffected, firefox102 unaffected, firefox103 wontfix, firefox104 fix-optional)

RESOLVED FIXED
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox101 --- unaffected
firefox102 --- unaffected
firefox103 --- wontfix
firefox104 --- fix-optional

People

(Reporter: alexandrui, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected a browsertime performance regression from push 5718cf939e0d24b8e7324515f8ff9fc1867aeb7b. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
200% tumblr FirstVisualChange linux1804-64-shippable-qr fission warm webrender 56.67 -> 170.00
53% tumblr SpeedIndex linux1804-64-shippable-qr fission warm webrender 180.88 -> 277.00
36% tumblr PerceptualSpeedIndex linux1804-64-shippable-qr fission warm webrender 246.33 -> 335.67
21% tumblr ContentfulSpeedIndex linux1804-64-shippable-qr fission warm webrender 345.25 -> 417.67
9% tumblr LastVisualChange linux1804-64-shippable-qr fission warm webrender 773.33 -> 840.00
6% tumblr fcp linux1804-64-shippable-qr fission warm webrender 183.10 -> 194.00

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(gmierz2)

Set release status flags based on info from the regressing bug 1655554

Looking through the commits here: https://github.com/sitespeedio/browsertime/commits/main?before=d319b00e965e16d559abf7ce1f7c99dd989929d5+35&branch=main&qualified_name=refs%2Fheads%2Fmain

From there to May 4th, the one that sticks out most is this one: https://github.com/sitespeedio/browsertime/commit/e610bfb4f6f7231f73228e993bbebac8a5911f22

If this is the only regression we have though then I suggest closing it as wontfix (harness change).

Flags: needinfo?(gmierz2)
Has Regression Range: --- → yes
Assignee: nobody → gmierz2
Severity: -- → S2
Priority: -- → P1

I think this indicates that something isn't working as it should: The correct flow should be that the video is turned off before we start to collect JavaScript metrics, meaning independently of what metrics we collect, the video should be the same.

I wonder if we need to add an extra wait? I'll go through the code first thing tomorrow.

Is that test using Firefox window recorder? We have different code depending on which recorder that is used.

Flags: needinfo?(gmierz2)

Yes, it uses the Firefox window recorder, here's the full command we are running if it helps: /home/cltbld/tasks/task_165480080508932/fetches/node/bin/node /home/cltbld/tasks/task_165480080508932/fetches/browsertime/node_modules/browsertime/bin/browsertime.js --firefox.geckodriverPath /home/cltbld/tasks/task_165480080508932/fetches/geckodriver /home/cltbld/tasks/task_165480080508932/build/tests/raptor/raptor/browsertime/../../browsertime/browsertime_pageload.js --firefox.noDefaultPrefs --browsertime.page_cycle_delay 1000 --skipHar --pageLoadStrategy none --webdriverPageload true --firefox.disableBrowsertimeExtension true --pageCompleteCheckStartWait 5000 --pageCompleteCheckPollTimeout 1000 --timeouts.pageLoad 60000 --timeouts.script 120000 --browsertime.page_cycles 2 --pageCompleteWaitTime 5000 --browsertime.url https://www.tumblr.com/dashboard --browsertime.post_startup_delay 1000 --iterations 25 --browsertime.secondary_url https://www.tumblr.com/tagged/funny+cats?sort=top --browsertime.commands --viewPort 1280x1024 --browser firefox --firefox.binaryPath /home/cltbld/tasks/task_165480080508932/build/application/firefox/firefox --firefox.profileTemplate /tmp/tmp8wly_akf/profile --resultDir /home/cltbld/tasks/task_165480080508932/build/blobber_upload_dir/browsertime-results/tumblr --video true --visualMetrics true --visualMetricsContentful true --visualMetricsPerceptual true --visualMetricsPortable true --videoParams.keepOriginalVideo true --firefox.windowRecorder true

Flags: needinfo?(gmierz2)

I think the code then somehow is broken in the recorder? I've added https://github.com/sitespeedio/browsertime/pull/1805 where you can configure a delay to wait so that the video really stops before moving on (but if the async code would work in the recorder that should happen automatically?). The code work like this:

  1. The page complete check has finished
  2. Stop the video
  3. Start collect JavaScript metrics (as the one you noticed causing the regression).

Or at least should work like that :)

Ok, thanks for investigating Peter! I made a couple try runs here to see if there's still a regression on tumblr when we use FFMPEG instead: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=1cbf93477ee6fbe599ff343088ac6c2e877a9568&newProject=try&newRevision=55369df6954b1297d7e0dc500d9ff4c291f02f56&framework=13&page=1

I'll post again when results are available.

:sefeng, would you be able to look into this issue with the window recorder?

Flags: needinfo?(sefeng)

Sorry I don't think I am the best person to look into this as I am not familiar with window recorder..

Thanks!

Flags: needinfo?(sefeng)

:bas are you able to review?

Flags: needinfo?(bas)

Redirect a needinfo that is pending on an inactive user to the triage owner.
:sparky, could you please find another way to get the information or close the bug as INCOMPLETE if it is not actionable?

For more information, please visit auto_nag documentation.

Flags: needinfo?(peter) → needinfo?(gmierz2)
Flags: needinfo?(gmierz2)

Set release status flags based on info from the regressing bug 1655554

Assignee: gmierz2 → nobody
Priority: P1 → P2

@sparky Is there anything more to do here? Are we currently using the Firefox window recorder or FFMPEG? As this is not a regression that impacts users, I suggest closing this bug as WONTFIX and opening a bug against the Firefox window recorder. Perhaps also opening a meta bug to track switching to the Firefox window recorder if there are multiple blockers. Perhaps we even already have such a bug?

Flags: needinfo?(gmierz2)

Yes, no progress has been made on this bug and we're waiting on a ni? from :bas. We're currently using the window recorder, and that's what's broken here (or the reason why this regression occurred). For context, this regression doesn't exist when we use FFMPEG.

I disagree with closing this as WONTFIX unless we decide not to fix the issue (or unless we're changing our approach to handling these).

Flags: needinfo?(gmierz2)

Hmpf. So based on the comments on this bug it suggests that there can be JS code running before the Window Recorder has actually recorded its last frame. My theory here is that these websites have something that's visually changing continuously, and that because the Window Recorder operates asynchronously, we end up grabbing another frame that is delayed by that work before we actually stop it. Ideally we could just make sure the Window Recorder's promise here https://searchfox.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#4580 has resolved before we start collecting metrics.

Flags: needinfo?(bas)

Not sure if it's relevant, but sounds like it might be, in bug 1779953 we fixed a bug where the window recorder was actually recording the previous frame. So the time stamps of the frames would be off, but this also meant we would never get ahold of the very last snapshot of the window. The fix landed after this bug was filed.

Resolving as fixed based on :tnikkel's comment. We can open a new bug for this if it happens again.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.