Closed Bug 1592975 Opened 1 year ago Closed 7 months ago

2.99 - 8.02% raptor-tp6-instagram-firefox-cold / raptor-tp6-instagram-firefox-cold fcp / raptor-tp6-twitch-firefox fcp (linux64-shippable, linux64-shippable-qr) regression on push e66da643d9bcbc594a9c09a99271ae4d9415e388 (Wed October 30 2019)

Categories

(Core :: DOM: Security, defect, P1)

defect

Tracking

()

RESOLVED INVALID
Tracking Status
firefox-esr68 --- unaffected
firefox70 --- unaffected
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix

People

(Reporter: Bebe, Assigned: sstreich)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [domsecurity-active][qf:p3:pageload])

Attachments

(1 file)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=e66da643d9bcbc594a9c09a99271ae4d9415e388

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

8% raptor-tp6-twitch-firefox fcp linux64-shippable opt 70.90 -> 76.58
5% raptor-tp6-instagram-firefox-cold fcp linux64-shippable-qr opt 188.96 -> 199.00
5% raptor-tp6-instagram-firefox-cold fcp linux64-shippable-qr opt 190.25 -> 198.83
3% raptor-tp6-instagram-firefox-cold linux64-shippable-qr opt 370.32 -> 381.40

Improvements:

26% raptor-tp6-twitch-firefox-cold fcp linux64-shippable-qr opt 115.46 -> 85.08
26% raptor-tp6-twitch-firefox-cold fcp linux64-shippable-qr opt 115.67 -> 85.42
9% raptor-tp6-twitch-firefox-cold linux64-shippable-qr opt 253.08 -> 231.46
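The percentages above are the relative change between the before and after values listed on each line; a quick sanity check on a few of them:

```python
# Sanity-check a few of the regression/improvement percentages reported
# above, using the before -> after values from the alert summary.
results = [
    ("raptor-tp6-twitch-firefox fcp", 70.90, 76.58),
    ("raptor-tp6-instagram-firefox-cold fcp", 188.96, 199.00),
    ("raptor-tp6-twitch-firefox-cold fcp", 115.46, 85.08),
]

for name, before, after in results:
    pct = (after - before) / before * 100
    kind = "regression" if pct > 0 else "improvement"
    print(f"{name}: {pct:+.0f}% {kind}")
```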

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23640

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/TestEngineering/Performance/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Blocks: 1592626
Component: Performance → DOM: Security
Flags: needinfo?(sstreich)
Product: Testing → Core
Regressed by: 1592651
Version: Version 3 → unspecified
Assignee: nobody → sstreich
Flags: needinfo?(sstreich)
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebc3ca33bc0c
Re-enable XTCO per default r=ckerschb
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Is this wontfix for 71 given the regressing bug?

Flags: needinfo?(sstreich)

I'm not sure this is the correct solution:
we just reverted the improvements from the last alert.

== Change summary for alert #23736 (as of Thu, 07 Nov 2019 07:38:30 GMT) ==

Regressions:

39% raptor-tp6-twitch-firefox-cold fcp linux64-shippable-qr opt 89.92 -> 125.42
33% raptor-tp6-twitch-firefox-cold fcp linux64-shippable-qr opt 94.33 -> 125.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23736

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Basti, are you looking into this one?

Priority: -- → P1
Whiteboard: [domsecurity-active]

Hey, I'm a bit confused here. In the regressing patch we only disabled a pref.
In the patch provided here we re-enabled it, so performance-wise we should be back to "pre-regression", right?

Flags: needinfo?(sstreich)

We have shipped our last beta for 71, so this is wontfix for 71.

I think the question in comment 7 is for Christoph, so adding n-i.

Flags: needinfo?(ckerschb)

(In reply to Liz Henry (:lizzard) from comment #9)

I think the question in comment 7 is for Christoph, so adding n-i.

I am slightly confused myself - the regressing bug causes us to do less work in terms of computation effort. Why that would affect performance I don't know; purely from a reasoning perspective it seems wrong that we would regress performance by not enforcing XCTO: nosniff for top-level loads.
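For context, XCTO: nosniff enforcement boils down to a per-load MIME check. The following is a minimal illustrative sketch, not Firefox's actual C++ implementation; the function name and MIME sets are assumptions:

```python
def should_block_response(headers, destination, content_type):
    """Illustrative XCTO: nosniff check (not Firefox's real logic).

    With nosniff set, script and style loads must carry an acceptable
    MIME type. The regressing patch stopped applying the check to
    top-level (document) loads, i.e. strictly less work per load,
    which is why a performance regression from it seems wrong.
    """
    xcto = headers.get("x-content-type-options", "").strip().lower()
    if xcto != "nosniff":
        return False  # header absent or malformed: nothing to enforce
    if destination == "script":
        return content_type not in {
            "text/javascript",
            "application/javascript",
            "application/ecmascript",
        }
    if destination == "style":
        return content_type != "text/css"
    return False  # other destinations (including top-level) pass through
```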

Flags: needinfo?(ckerschb)

I am sorry that this has not been addressed for such a long time. How critical is it at this point? Is it even still valid? If it still causes problems we could defer sending XCTO options to 74; we haven't been shipping it in 72, but it's on in 73.

Yes, this regression is still present and Beta also shows the regression occurring there after Fx73 was merged to it last week.

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,2092359,1,10&series=mozilla-beta,2092555,1,10&timerange=7776000

Status: REOPENED → NEW
Flags: needinfo?(ckerschb)
Target Milestone: mozilla72 → ---

Hey Ryan,
I'm not quite sure how to handle this.
-> We had the patch disabling the pref for XCTO, which caused the initial regression/improvements.
-> Then we reverted the causing patch, which in turn also removed our improvements.
-> Also, in Bug 1591932 we removed all code related to the pref, so since Fx72 the pref only flips a codepath for a telemetry probe.

I think there is nothing else I can do here, as the pref does nothing other than enable telemetry collection, which we currently need for Bug 1594766. I could use some input on how to move on from here.
:)
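Schematically, a pref that only gates a telemetry codepath looks like the following. This is an illustrative sketch; the pref name, probe names, and helper are hypothetical stand-ins, not the real Gecko API:

```python
# Illustrative sketch of a pref that only gates telemetry collection.
# The pref name and probe keys are hypothetical stand-ins.
PREFS = {"security.xcto.telemetry.enabled": True}

def record_xcto_telemetry(probes, response_headers):
    """Count XCTO: nosniff occurrences, but only if the pref is on."""
    if not PREFS.get("security.xcto.telemetry.enabled", False):
        return  # pref off: the entire codepath is a no-op
    has_nosniff = (
        response_headers.get("x-content-type-options", "").strip().lower()
        == "nosniff"
    )
    key = "xcto_nosniff" if has_nosniff else "xcto_absent"
    probes[key] = probes.get(key, 0) + 1
```

With the enforcement code removed, flipping this pref changes nothing about page loads, which is why the regression could no longer be reproduced via the pref.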

Flags: needinfo?(ckerschb) → needinfo?(ryanvm)

This whole bug is a mess at this point. Not knowing the specifics of this test, I'm not sure how to move this investigation forward. Randell, can you maybe provide some help diagnosing this, if we think it's even actionable at this point?

Flags: needinfo?(ryanvm) → needinfo?(rjesup)

The old code is gone, and per comment 13 the pref is basically worthless. The Perfherder results are strongly bimodal on Nightly, with occasional weeks-long stretches of only "slow" loads, not obviously connected to anything.

Understanding why we regressed is still interesting and still something we could hopefully fix. It'll be harder since the initial impact was confusing, and we can't easily A/B test it (though comparative profiles may work).
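The bimodal pattern described above can be made visible by splitting samples around the midpoint of the observed range. A rough sketch; the sample values are made up, loosely in the range of the twitch-cold fcp numbers earlier in this bug:

```python
# Rough bimodality check: split timing samples at the midpoint of
# their range and compare the two clusters. When both clusters hold
# substantial mass, the per-push median jumps between modes, which
# is what makes alerts on such a series noisy.
samples = [85, 86, 84, 125, 126, 85, 124, 86, 125, 84]  # made-up data

midpoint = (min(samples) + max(samples)) / 2
fast = [s for s in samples if s <= midpoint]
slow = [s for s in samples if s > midpoint]

print(f"fast: {len(fast)} samples around {sum(fast) / len(fast):.1f}")
print(f"slow: {len(slow)} samples around {sum(slow) / len(slow):.1f}")
```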

Flags: needinfo?(rjesup)
Whiteboard: [domsecurity-active] → [domsecurity-active][qf]
Whiteboard: [domsecurity-active][qf] → [domsecurity-active][qf:p3:pageload]
Severity: normal → S3

Is there anything left to do here or can we close this bug?

Flags: needinfo?(sstreich)

Yes, as we can't really pinpoint the cause of the regression, I think we can close this.

Status: NEW → RESOLVED
Closed: 1 year ago → 7 months ago
Flags: needinfo?(sstreich)
Resolution: --- → INVALID