Closed Bug 1853147 Opened 1 year ago Closed 1 year ago

13.02 - 10.15% wikia fcp / wikia FirstVisualChange + 2 more (Windows) regression on Tue September 12 2023

Categories

(Testing :: web-platform-tests, defect)

Desktop
Windows 10
defect

Tracking

(firefox-esr102 unaffected, firefox-esr115 fixed, firefox117 unaffected, firefox118 unaffected, firefox119 fixed, firefox120 fixed)

RESOLVED FIXED
120 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- fixed
firefox117 --- unaffected
firefox118 --- unaffected
firefox119 --- fixed
firefox120 --- fixed

People

(Reporter: afinder, Assigned: whimboo)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a browsertime performance regression from push 6b6020b8d61f2caa4a75ff6719cbd9da74933650. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
13% wikia fcp windows10-64-shippable-qr cold fission webrender 219.13 -> 247.66
12% wikia fcp windows10-64-shippable-qr cold fission webrender 218.31 -> 244.85
11% wikia FirstVisualChange windows10-64-shippable-qr cold fission webrender 254.97 -> 284.07
10% wikia FirstVisualChange windows10-64-shippable-qr cold fission webrender 256.92 -> 282.99

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hskupin)

Set release status flags based on info from the regressing bug 1849972

This might be a remaining performance regression that I missed while fixing bug 1850950. There were basically too many things to obey.

The changeset that should have cause that is basically:
https://hg.mozilla.org/integration/autoland/rev/d418c3caa7dea5695a29750c5073179650ba68d1

Specifically the changes in the RecommendedPreferences.sys.mjs should be checked.

I'll push some try builds with preference changes reverted.

Flags: needinfo?(hskupin)
See Also: → 1850950
Assignee: nobody → hskupin
Status: NEW → ASSIGNED

The problem here is the preference for the captive portal (captivedetect.canonicalURL). When it's not set to an empty string the performance regression on Windows for wikia is gone.

It is unclear to me how this preference could influence the page load, and FirstVisualChange metrics especially when it is turned off by setting an empty string as value. Not sure who from the network team works on it but Valentin maybe you have an idea or can forward it to the right person?

Or maybe setting network.captive-portal-service.enabled to false might be the right thing to do otherwise? Here another try build to check for a different behavior:

https://treeherder.mozilla.org/jobs?repo=try&revision=efed1e78e5361ef429a3c081a9f1b9a17e1e2624

Flags: needinfo?(valentin.gosu)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #3)

Or maybe setting network.captive-portal-service.enabled to false might be the right thing to do otherwise? Here another try build to check for a different behavior:

https://treeherder.mozilla.org/jobs?repo=try&revision=efed1e78e5361ef429a3c081a9f1b9a17e1e2624

This doesn't help for these above metrics but shows a 5-7% improvement for LastVisualChange and PerceptualSpeedIndex.

It would like to wait for some feedback from the network team to know what should be done here. Maybe we should just remove setting at all for now. But I would still be interested why this feature affects these metrics.

Assuming we want the perf test to be as close to real life as possible, we probably want the canonical URL to be non-empty.
Not sure why the pref would be worse with the empty URL. Maybe some exception is being thrown when the URL is empty, and that's affecting performance. You could try setting the canonicalURL to data:text/html,<meta http-equiv="refresh" content="0;url=https://support.mozilla.org/kb/captive-portal"/>
This should act like a functioning captive portal check, but without the actual network request.
Let me know if this still has a perf hit.

Flags: needinfo?(valentin.gosu)

Thank you for the feedback Valentin!

Indeed, we throw an error which looks like JavaScript error: resource://gre/modules/CaptiveDetect.sys.mjs, line 287: NS_ERROR_FAILURE: No canonical URL set up.. Changing it to what you suggest doesn't work as well, given that it will still throw an error. But we could use http://localhost/dummy/captivedetect, which doesn't cause an error during startup of Firefox.

Lets see if this might be an option. Here a new try build:
https://treeherder.mozilla.org/jobs?repo=try&revision=7f90367c19c0cd75b7fa2445a700d6ed09d35523

Ah, the data URL throws a QueryInterface error because the data channel is not a HTTP channel here 🤦🏻‍♂️

http://localhost/dummy/captivedetect

This might be OK as long as we're sure there's nothing running on localhost:80 that could return an HTTP response. (which there could 🙂)
Maybe we should instead choose a port that we're sure we can't connect to, like port 1 (see gbadportlist )
So http://localhost:1/dummy/captivedetect might make a better choice. Let me know if that works.

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #6)

Lets see if this might be an option. Here a new try build:
https://treeherder.mozilla.org/jobs?repo=try&revision=7f90367c19c0cd75b7fa2445a700d6ed09d35523

There are no changes in the perf numbers for this try build. :/

(In reply to Valentin Gosu [:valentin] (he/him) from comment #7)

http://localhost/dummy/captivedetect

This might be OK as long as we're sure there's nothing running on localhost:80 that could return an HTTP response. (which there could 🙂)
Maybe we should instead choose a port that we're sure we can't connect to, like port 1 (see gbadportlist )
So http://localhost:1/dummy/captivedetect might make a better choice. Let me know if that works.

Thanks for the information. I've applied and triggered a try build:
https://treeherder.mozilla.org/jobs?repo=try&revision=1c217c5cba28126e8940c4eda2e422602edca8d1

If that all doesn't help we most likely cannot set the preference by default in Remote Agent but have to leave it and only set it for wpt tests in its default profile, which would be similar to the safebrowsing update one.

This didn't help neither. So I'm going forward with my idea from the last comment and only disable the captive portal service for web-platform tests where this is not needed.

Here a try build:
https://treeherder.mozilla.org/jobs?repo=try&revision=50cece9320ec33b6bc9c86cb97b196afa71b67f3

Simply disabling the portal seems fine then 👍‍‍

This build works! Performance numbers are back to normal and Wdspec tests also pass (the ones on Android are expected fail because they were run with Fission).

I'm going to upload the patch.

Set release status flags based on info from the regressing bug 1849972

Attachment #9355053 - Attachment description: Bug 1853147 - [remote] Disable captive portal service for web-platform tests. → Bug 1853147 - [remote] Disable captive portal service for web-platform tests only.
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9ebc1575e679 [remote] Disable captive portal service for web-platform tests only. r=webdriver-reviewers,Sasha
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 120 Branch

The patch landed in nightly and beta is affected.
:whimboo, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox119 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(hskupin)

Comment on attachment 9355053 [details]
Bug 1853147 - [remote] Disable captive portal service for web-platform tests only.

Beta/Release Uplift Approval Request

  • User impact if declined: The issue doesn't affect end-users but only WebDriver clients specifically measuring page load performance. Without the patch the values could be a bit higher than usual.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just unsets a preference for Remote Protocols.
  • String changes made/needed: no
  • Is Android affected?: Yes
Flags: needinfo?(hskupin)
Attachment #9355053 - Flags: approval-mozilla-beta?

Comment on attachment 9355053 [details]
Bug 1853147 - [remote] Disable captive portal service for web-platform tests only.

Approved for 119.0b4

Attachment #9355053 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Pulsebot from comment #14)

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ebc1575e679
[remote] Disable captive portal service for web-platform tests only.
r=webdriver-reviewers,Sasha

== Change summary for alert #39727 (as of Sat, 30 Sep 2023 04:31:26 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
11% wikia fcp windows10-64-shippable-qr cold fission webrender 242.71 -> 214.99 Before/After
9% wikia FirstVisualChange windows10-64-shippable-qr cold fission webrender 280.32 -> 254.06 Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=39727

FYI the last comment shows that performance numbers are back at normal values.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: