Closed Bug 1769903 Opened 2 years ago Closed 2 years ago

45.93 - 15.86% espn ContentfulSpeedIndex / espn PerceptualSpeedIndex + 3 more (Windows) regression on Mon May 16 2022

Categories

(Core :: Audio/Video: Playback, defect)

Firefox 102
defect

Tracking


RESOLVED INVALID
Tracking Status
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 --- affected

People

(Reporter: alexandrui, Assigned: gerard-majax)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(5 files, 1 obsolete file)

Perfherder has detected a browsertime performance regression from push 769d05c30a7cd125690a133eb6892b71937dd081. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
46% espn ContentfulSpeedIndex windows10-64-shippable-qr cold fission webrender 847.00 -> 1,236.00
38% espn fcp windows10-64-shippable-qr cold fission webrender 962.90 -> 1,324.46
29% espn fcp windows10-64-shippable-qr cold fission webrender 1,030.25 -> 1,331.17
24% espn SpeedIndex windows10-64-shippable-qr cold fission webrender 957.75 -> 1,188.67
16% espn PerceptualSpeedIndex windows10-64-shippable-qr cold fission webrender 739.21 -> 856.42

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(lissyx+mozillians)
Assignee: nobody → lissyx+mozillians

I suspect it's because RDD is started very early and Utility AudioDecoder is not

Set release status flags based on info from the regressing bug 1767582

So far:

  • the data are quite noisy, so the issue takes a few retries to become visible on try
  • we used to start only RDD, which handled both audio and video
  • now we start RDD (video) and Utility (audio), which I think mechanically delays the point at which we are ready
  • are data decoders created in sequence?
  • data decoders are created by code paths that seem to differ between RDD and Utility

RDD seems to be created by a canplay event: https://searchfox.org/mozilla-central/rev/4719c903af605c2caeda3bb1ce4edb9ca58bf68e/dom/html/HTMLMediaElement.cpp#4848-4868
Utility seems to be created by a MediaSource.isTypeSupported() call (a static method): https://searchfox.org/mozilla-central/source/__GENERATED__/dom/bindings/MediaSourceBinding.cpp#779-806
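
To make the two triggers concrete, here is a rough page-side sketch in TypeScript (illustrative only, not the Gecko code paths themselves; the codec string and element lookup are made up):

// Hypothetical page-side code illustrating the two triggers.
// MediaSource.isTypeSupported() is a static method pages call to probe codec
// support (the path that appears to spin up the Utility process), while
// playing a media element until "canplay" exercises the decoding path
// (which appears to spin up RDD).
const audioSupported = MediaSource.isTypeSupported('audio/mp4; codecs="mp4a.40.2"');

const video = document.querySelector("video");
video?.addEventListener("canplay", () => {
  // By the time "canplay" fires, the decoders (and the processes hosting
  // them) have been set up for this resource.
  console.log("canplay fired; isTypeSupported returned", audioSupported);
});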

Also:

  • running Raptor on a Windows 10 VM was broken
  • I have not yet been able to get a profile of the issue with media threads on try

After discussing with Paul, we should be able to load the Utility process more eagerly, but I still need to figure out exactly where.

Attached image with_utility.png

I didn't know the definition of a "cold" test, but I assume it might be related to all threads being idle or something? This screenshot was captured from this try push.

The increased metrics could be explained by the fact that we now need more propagation than we did before, because there is one more process for decoding. However, those propagations are all async, and they won't block the main thread. Also, the time shown in the profiler includes the target thread's latency, so those tasks actually spent far less time than the profiler shows for them on the parent process.

Therefore, IMHO, these increased metric numbers don't really represent a performance problem, because the user won't be affected by those async tasks running in the background. (Also, they won't be executed again in the future, because we only need them once.)

Alexandre, is there a fix planned for 102 or are we OK shipping with this regression? Thanks

Flags: needinfo?(lissyx+mozillians)

(In reply to Pascal Chevrel:pascalc from comment #5)

Alexandre, is there a fix planned for 102 or are we OK shipping with this regression? Thanks

The assessment was given in the comment above, and I was about to ask people from the perf team to have a look, since nobody had checked.

Flags: needinfo?(lissyx+mozillians)

(In reply to Alastor Wu [:alwu] from comment #4)

Created attachment 9277399 [details]
with_utility.png

I didn't know the definition of a "cold" test, but I assume it might be related to all threads being idle or something? This screenshot was captured from this try push.

A cold test most probably runs right after the browser is started.

The increased metrics could be explained by the fact that we now need more propagation than we did before, because there is one more process for decoding. However, those propagations are all async, and they won't block the main thread. Also, the time shown in the profiler includes the target thread's latency, so those tasks actually spent far less time than the profiler shows for them on the parent process.

Therefore, IMHO, these increased metric numbers don't really represent a performance problem, because the user won't be affected by those async tasks running in the background. (Also, they won't be executed again in the future, because we only need them once.)

That's precisely the problem. fcp probably refers to "First Contentful Paint". Here, this might mean that painting the first frame of video is slower. If that is the case, then the regression is real, but there's no reason we can't fix it.

I agree with what Paul said. We restart the browser between each run for cold pageload tests, unlike warm pageload tests, where we reuse the same browser session.

When we build the display list, we check each display item to see whether it's contentful, and we report the time at which we see the first contentful item; this is FCP.

In the Gecko profile, there's a FirstContentfulComposite marker which is pretty much equivalent to FCP. I think one way to diagnose this could be to collect two profiles, before and after the patch, and compare them?
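
A minimal sketch of that reporting logic, in TypeScript-style pseudocode (the type, field names, and reporting function are placeholders; the real code lives in Gecko's display list machinery, this only illustrates the idea):

// Hypothetical sketch: while building the display list, the first item
// judged "contentful" determines FCP, and it is only reported once.
interface DisplayItem {
  isContentful: boolean; // e.g. text, images, non-trivial backgrounds
}

let firstContentfulPaintMs: number | null = null;

function onDisplayListBuilt(items: DisplayItem[], nowMs: number): void {
  if (firstContentfulPaintMs !== null) return; // already reported once
  if (items.some(item => item.isContentful)) {
    firstContentfulPaintMs = nowMs; // roughly what the FCP metric records
  }
}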

(In reply to Sean Feng [:sefeng] from comment #8)

I agree with what Paul said. We restart the browser between each run for cold pageload tests, unlike warm pageload tests, where we reuse the same browser session.

When we build the display list, we check each display item to see whether it's contentful, and we report the time at which we see the first contentful item; this is FCP.

In the Gecko profile, there's a FirstContentfulComposite marker which is pretty much equivalent to FCP. I think one way to diagnose this could be to collect two profiles, before and after the patch, and compare them?

Unfortunately, there is no FirstContentfulComposite marker in the "good" case.

(In reply to Paul Adenot (:padenot) from comment #7)

fcp probably refers to "First Contentful Paint". Here, this might mean that painting the first frame of video is slower.

From the image in comment 4, the increased time spent propagating support information to other processes (which contain scripts from different domains) shouldn't be related to painting, so I don't know whether the painting time really increased or not.

If it did increase, one possibility I could imagine is that espn starts asking for support information before we propagate the audio support information to the content process, which would delay the media resource creation/playback.


[update]

The above assumption is wrong, because we always return true when we don't yet have the support information from the utility process.
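
A small sketch of that fallback behaviour (hypothetical names and types; the real check is in Gecko's C++ media code, this is only to illustrate the idea described above):

// Hypothetical: until the Utility process has reported its codec support
// back to the content process, support queries answer optimistically, so a
// page probing support is never blocked on the propagation discussed above.
type SupportTable = Map<string, boolean>;

let utilitySupportInfo: SupportTable | null = null; // null until propagated

function isAudioTypeSupported(mimeType: string): boolean {
  if (utilitySupportInfo === null) {
    return true; // support info not received yet: assume supported
  }
  return utilitySupportInfo.get(mimeType) ?? false;
}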

(In reply to Alastor Wu [:alwu] from comment #11)

[...]

If it did increase, one possibility I could imagine is that espn starts asking for support information before we propagate the audio support information to the content process, which would delay the media resource creation/playback.

That would match what I mentioned in comment 3 about the moments when RDD and Utility get created?

Some more analysis:

Here is a push to try with the profiler enabled, https://treeherder.mozilla.org/jobs?repo=try&revision=e6a8e4f38fd324c95da66f1c92244d9229e44ce3&selectedTaskRun=eHxpNF5GQKaWIfI6piVjag.0, and utility disabled by default. Metrics look good, e.g., fcp opt cold is 913ms. However, looking at FirstContentfulComposite on the espn process in https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FeHxpNF5GQKaWIfI6piVjag%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_espn.zip/marker-table/?file=espn-cold%2F1-geckoProfile-1.json.gz&globalTrackOrder=kwm0wj&hiddenGlobalTracks=1wh&hiddenLocalTracksByPid=9356-1w6~8380-0w3~5056-0w3~5484-0w5~8464-0w5~1912-0w2~6492-0w3~2724-0w5~8780-0w3~364-0w5~5292-0w2~8652-0w5~10108-0w5~5408-0w5~5580-0w5~808-0w3~6440-0w3~1000-0w5~1132-0w5~4308-01&localTrackOrderByPid=9356-7w901a2w6~8380-2301~5056-2301~5484-2w4051~8464-2w4051~1912-120~6492-2301~2724-2w4051~8780-2301~364-2w4051~5292-120~8652-2w4051~10108-2w4051~5408-2w4051~5580-2w4051~808-2301~6440-2301~1000-2w4051~1132-67081w5~4308-2w40516&markerSearch=FirstContentful&thread=y0y2&timelineType=cpu-category&v=6 does not yield anything related to the 913ms: it starts at 3.02s with a duration of 1.77s. I'm not sure how much FirstContentfulComposite relates to fcp here.

After a lot of local profiling, I could see a noticeable difference in when RDD and Utility start:

Pushing this kind of improvement to try: https://treeherder.mozilla.org/jobs?repo=try&revision=773bf7a99888344d3d86629248e2b76cebf8583f and then comparing:

:sparky on #perf mentioned https://bugzilla.mozilla.org/show_bug.cgi?id=1755491, so I did a try push with this reverted as well, and compared it to the same previous versions:

This illustrates the original regression reported:

$ poetry run python generate_side_by_side.py --base-revision 338c1afa163526b8aab61dcddcb44db70130e2af --base-branch mozilla-central --new-revision 769d05c30a7cd125690a133eb6892b71937dd081 --test-name browsertime-tp6-firefox-espn-fis-e10s --platform test-windows10-64-shippable-qr --cold --overwrite --output bug1769903_original_regression/

This illustrates https://treeherder.mozilla.org/jobs?repo=try&revision=773bf7a99888344d3d86629248e2b76cebf8583f:

$ poetry run python generate_side_by_side.py --base-revision 338c1afa163526b8aab61dcddcb44db70130e2af --base-branch mozilla-central --new-revision 773bf7a99888344d3d86629248e2b76cebf8583f --new-branch try --test-name browsertime-tp6-firefox-espn-fis-e10s --platform test-windows10-64-shippable-qr --cold --overwrite --output bug1769903_run_utility_rdd_together/

This illustrates https://treeherder.mozilla.org/jobs?repo=try&revision=837c713395fa8287274286b05511be7655f22b0c:

$ poetry run python generate_side_by_side.py --base-revision 338c1afa163526b8aab61dcddcb44db70130e2af --base-branch mozilla-central --new-revision 837c713395fa8287274286b05511be7655f22b0c --new-branch try --test-name browsertime-tp6-firefox-espn-fis-e10s --platform test-windows10-64-shippable-qr --cold --overwrite --output bug1769903_run_utility_rdd_together_with_revert/

Testing a backout of both the utility audio decoder being enabled by default and the mitmproxy changes: https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=338c1afa163526b8aab61dcddcb44db70130e2af&newProject=try&newRevision=26b620fc0084f64cfb3d0c392ad128c5cfa3203f&framework=13&page=1&showOnlyImportant=1

It seems there are still regressions, which would suggest that my patch is not at fault?

Flags: needinfo?(sefeng)
Flags: needinfo?(sefeng)

After investigating, with help from Sean as well, I finally found out how to get the DOMContentFlushed (dcf) marker to show up in profiling. After a first round on Windows and Linux, I would consistently get values around ~1500ms, far from what is reported on try (linux1804 espn dcf is ~520ms).

That was until I figured out that raptor has a mitmproxy setup (and this backout ...). So I ended up setting it up as well, re-using records from tooltool. The mitmproxy setup is quite trivial: download the binaries, run it, set up your proxy in Firefox, and then browse to http://mitm.it to install the certificate.

Getting the ESPN dump was just:

$ ./mach artifact toolchain --tooltool-manifest "testing/raptor/raptor/tooltool-manifests/playback/mitm7-linux-firefox-espn.manifest" --cache-dir path/to/cache

Rename and extract the downloaded zip file; you get dump.mp.

Then on mitmproxy side:

$ ./mitmdump --set websocket=false --set upstream_cert=false --set connection_strategy=lazy --set alt_server_replay_nopop=true --set alt_server_replay_kill_extra=true --set alt_server_replay_order_reversed=true --set alt_server_replay=espn/dump.mp --scripts GECKO_PATH/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alt-serverplayback.py

Then browsing to the espn page https://www.espn.com/nba/story/_/page/allstarweekend25788027/the-comparison-lebron-james-michael-jordan-their-own-words should only hit the proxy.

Once that was done, I could finally get figures consistent with what is reported on try, as visible in the screenshot.

I did ~10 runs of each, first with the utility process pref set to false, then set to true. As visible, the mean/stddev are very close: (false) 523ms with 53ms stddev, and (true) 542ms with 65ms stddev. This was done on my laptop, so some noise is not unexpected.
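
A quick back-of-the-envelope check of that comparison (purely illustrative; only the reported means/stddevs are used, the per-run numbers are not in this bug):

// Illustrative only: given the reported means and standard deviations,
// check how the difference in means compares to the run-to-run noise.
const prefFalse = { mean: 523, stddev: 53, runs: 10 };
const prefTrue = { mean: 542, stddev: 65, runs: 10 };

const meanDiff = prefTrue.mean - prefFalse.mean; // 19 ms
// Rough standard error of the difference of the two means:
const se = Math.sqrt(
  prefFalse.stddev ** 2 / prefFalse.runs + prefTrue.stddev ** 2 / prefTrue.runs
); // ~26.5 ms
console.log(`difference ${meanDiff} ms vs ~${se.toFixed(1)} ms standard error`);
// 19 ms is below the ~26.5 ms standard error, so the difference is within the noise.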

Bug 1755491 has been backed out from central and beta (https://bugzilla.mozilla.org/show_bug.cgi?id=1755491#c8, https://bugzilla.mozilla.org/show_bug.cgi?id=1755491#c9), so this regression should be considered fixed now, and from comment 19 we should not expect other regressions?

Just to summarize, we have the audio decoder patch and the mitmproxy patch.

  • audio decoder patch + mitmproxy patch = regression
  • mitmproxy patch = regression (from comment 19)

Have we concluded that the audio decoder patch = no regression (now that Bug 1755491 has been backed out)? I just checked the latest ContentfulSpeedIndex graph and it looks like it's moving in the right direction. So this is very likely to be true?

(In reply to Sean Feng [:sefeng] from comment #22)

Just to summarize, we have the audio decoder patch and the mitmproxy patch.

  • audio decoder patch + mitmproxy patch = regression
  • mitmproxy patch = regression (from comment 19)

Have we concluded that the audio decoder patch = no regression (now that Bug 1755491 has been backed out)? I just checked the latest ContentfulSpeedIndex graph and it looks like it's moving in the right direction. So this is very likely to be true?

Yes, the dual backout (mitmproxy and utility enabled by default) test still shows regressions: https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=338c1afa163526b8aab61dcddcb44db70130e2af&newProject=try&newRevision=26b620fc0084f64cfb3d0c392ad128c5cfa3203f&framework=13&page=1&showOnlyImportant=1, so it points to an earlier regression.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID

Sorry, just to make sure we aren't missing anything...

I checked the above link... It doesn't show the same regression as the one specified in the description? I mean, it doesn't show regressions for fcp, ContentfulSpeedIndex, etc.

:sefeng, I think there were some comparison mix-ups. The issue is resolved and we're almost certain that the patch in question didn't cause a regression in our tests. We're going to continue looking at this once we get an alert from perfherder (1-3 days) about all the changes that the mitmproxy backout caused and go from there. But to be clear, so far, we have not found any regressions for this patch after the mitmproxy backout.

See Also: → 1772368
Attachment #9278930 - Attachment is obsolete: true