Closed Bug 1623053 Opened 5 years ago Closed 5 years ago

2.09 - 2.72% tp5o_webext (windows10-64-shippable) regression on push a0508ae6c037928981ac2733860b6ec84d7069ec (Sat March 14 2020)

Categories

(Core :: DOM: Security, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- unaffected
firefox75 --- unaffected
firefox76 + disabled
firefox77 --- wontfix

People

(Reporter: alexandrui, Assigned: ckerschb)

References

(Regression)

Details

(4 keywords, Whiteboard: [domsecurity-active])

Attachments

(2 obsolete files)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a0508ae6c037928981ac2733860b6ec84d7069ec

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

3% tp5o_webext windows10-64-shippable opt e10s stylo 296.13 -> 304.18
2% tp5o_webext windows10-64-shippable opt e10s stylo 294.56 -> 300.72

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=25419

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Flags: needinfo?(ckerschb)
Component: Performance → DOM: Security
Product: Testing → Core
Target Milestone: --- → mozilla76
Version: Version 3 → unspecified

Hey Alexandru, thanks for letting me know - I was afraid this might happen.

Let's start with the good news: the Sec-Fetch-* implementation is only pref'd on in Nightly at the moment:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/modules/libpref/init/StaticPrefList.yaml#2351

A word about the algorithm:

  • For every request we add new HTTP request headers (sec-fetch-mode, sec-fetch-dest, sec-fetch-user, sec-fetch-site).
  • For every request we have to determine whether the request and all its redirections are same-origin or at least same-site (see the sketch below).
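
For reference, here is a minimal C++ sketch of the same-origin/same-site classification over the redirect chain. This is not the Gecko implementation; the Origin type, its fields, and the function names are invented for illustration, and the "none" value for user-triggered navigations is left out.

  #include <string>
  #include <vector>

  // Hypothetical, simplified origin: scheme + full host + registrable domain + port.
  struct Origin {
    std::string scheme;
    std::string host;      // e.g. "sub.example.com"
    std::string siteHost;  // registrable domain, e.g. "example.com"
    int port;
  };

  static bool SameOrigin(const Origin& a, const Origin& b) {
    return a.scheme == b.scheme && a.host == b.host && a.port == b.port;
  }

  static bool SameSite(const Origin& a, const Origin& b) {
    return a.scheme == b.scheme && a.siteHost == b.siteHost;
  }

  // Sec-Fetch-Site may only be "same-origin"/"same-site" if the initiator and
  // every redirect hop satisfy that relation with the final target, which is
  // why the whole redirect chain has to be walked for every request.
  std::string ComputeSecFetchSite(const Origin& initiator, const Origin& target,
                                  const std::vector<Origin>& redirectHops) {
    bool sameOrigin = SameOrigin(initiator, target);
    bool sameSite = SameSite(initiator, target);
    for (const Origin& hop : redirectHops) {
      sameOrigin = sameOrigin && SameOrigin(hop, target);
      sameSite = sameSite && SameSite(hop, target);
    }
    if (sameOrigin) return "same-origin";
    if (sameSite) return "same-site";
    return "cross-site";
  }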

What are our alternatives? At the moment I don't see how we could speed up that algorithm. I guess if we want to support Sec-Fetch-*, this is the price for it.

What are the next steps here in that case?

Flags: needinfo?(ckerschb) → needinfo?(aionescu)

The regression is just over the 2% threshold, so the magnitude is not worrying. I'm afraid the decision about keeping the algorithm is yours. Is the improvement more valuable than the regression it brings with it?

Flags: needinfo?(aionescu) → needinfo?(ckerschb)

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #2)

The regression is just over the 2% threshold, so the magnitude is not worrying. I'm afraid the decision about keeping the algorithm is yours. Is the improvement more valuable than the regression it brings with it?

I would say yes, but I will also assign this bug to myself and see if we can make some improvements to the algorithm. My gut feeling tells me that algorithmic changes will only yield marginal gains.

Assignee: nobody → ckerschb
Status: NEW → ASSIGNED
Flags: needinfo?(ckerschb)
Priority: -- → P2
Whiteboard: [domsecurity-active]
See Also: → 1623850

I did some performance measurements using a high-resolution clock, taking the time right before we call SecFetch and right after we return from it. In addition I used nice -n -20 to minimize operating-system scheduler effects. Measuring time is always a hairy operation; leaving potential timing issues aside, it seems any optimizations we do within SecFetch are only micro-optimizations and do not really buy us anything in the context of supporting Sec-Fetch.
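
(In case it helps to reproduce: a minimal sketch of the measurement approach, not the actual instrumentation; the call in the middle is a placeholder, not a real Gecko signature.)

  #include <chrono>
  #include <cstdio>

  // Take a steady-clock timestamp right before the call into SecFetch and
  // right after it returns, then log the elapsed microseconds. The whole
  // browser is run under `nice -n -20` to reduce scheduler noise.
  void MeasureSecFetchOnce() {
    auto start = std::chrono::steady_clock::now();
    // SecFetch::AddSecFetchHeaders(channel);  // placeholder for the real call
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::printf("SecFetch took %lld us\n", static_cast<long long>(us.count()));
  }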

E.g. in the attached patch I combined the checks for same-origin and same-site so we only have to loop through the redirect chain once. Comparing the numbers below, they are more or less the same once we factor in CPU timing inaccuracy.

In a third run I commented out all calls to SetRequestHeader(), which is now performed for every single request, and it seems that function introduces more performance overhead than the calculations that determine which Sec-Fetch headers to set.

Without the patch from this bug (baseline):
[1]: 23, 25, 23 -> 23.6
[2]: 21, 29, 30 -> 26.6
[3]: 32, 34, 23 -> 29.6
[4]: 27, 29, 22 -> 26
[5]: 27, 27, 21 -> 25
[6]: 26, 27, 20 -> 24.3
[7]: 32, 23, 28 -> 27.6
[8]: 52, 36, 32 -> 40
[9]: 52, 67, 51 -> 56.6
[10]: 22, 21, 28 -> 23.6

With the patch from this bug:
[1]: 24, 22, 26 -> 24
[2]: 21, 20, 23 -> 21.3
[3]: 35, 34, 34 -> 34.3
[4]: 53, 33, 34 -> 40
[5]: 36, 34, 32 -> 34
[6]: 34, 33, 32 -> 33
[7]: 50, 52, 33 -> 45
[8]: 41, 39, 40 -> 40
[9]: 68, 64, 52 -> 61.3
[10]: 42, 41, 42 -> 41.6

With the patch from this bug + commenting out 'SetRequestHeader' calls:
[1]: 13, 23, 11 -> 15.6
[2]: 11, 14, 25 -> 16.6
[3]: 27, 20, 25 -> 24
[4]: 25, 51, 24 -> 33.3
[5]: 26, 29, 24 -> 26.3
[6]: 24, 25, 35 -> 28
[7]: 21, 21, 25 -> 22.3
[8]: 32, 19, 24 -> 25
[9]: 49, 29, 38 -> 38.6
[10]: 26, 21, 20 -> 22.3

@baku: I don't think it's worth landing the attached patch, because it makes the code harder to read but does not really improve performance.
@Dragana: I don't think there is any other operation than SetRequestHeader for adding headers, right?
@Alexandru: I guess the fundamental performance problem is that we are adding request headers to every single request. As usual, it's a trade-off between security and performance.

@all: Any suggestions on how to move forward?

Flags: needinfo?(dd.mozilla)
Flags: needinfo?(amarchesini)
Flags: needinfo?(aionescu)

FWIW, we can obviously do perfherder runs and compare the two implementations on TRY, but I don't think the result will differ much from what I've shown above.

I would not land this patch if it doesn't resolve the problem entirely. Can you provide a comparison try link?

Flags: needinfo?(amarchesini)

(In reply to Andrea Marchesini [:baku] from comment #7)

I would not land this patch if it doesn't resolve the problem entirely. Can you provide a comparison try link?

Yeah, I totally agree. Again, I think everything we can do here is just micro-optimization. The fact that we are setting request headers on every single HTTP request is what causes the performance downgrade.

Base:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b081efe96fd749a49d4d0b37ac7e1a88fbf5d88d

With Optimization patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ccbd0bfcec7a5d9cd33ec383349974ad6c4ded3

Comparison:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b081efe96fd749a49d4d0b37ac7e1a88fbf5d88d&newProject=try&newRevision=1ccbd0bfcec7a5d9cd33ec383349974ad6c4ded3&framework=1

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #5)

@Alexandru: I guess the fundamental performance problem is that we are adding request headers to every single request. As usual, it's a trade-off between security and performance.

Christoph, I agree that sometimes we have to live with this trade-off. Please feel free to land the improvement if you can make one, or mark this as WONTFIX.

Flags: needinfo?(aionescu)

Sec-Fetch-* is Nightly-only for now (see comment 1).
If it's still Nightly-only when 76 hits Beta, we should set 76 as unaffected.

There is https://searchfox.org/mozilla-central/rev/4d9cd186767978a99dafe77eb536a9525980e118/netwerk/protocol/http/nsHttpRequestHead.cpp#152

But that is not exposed on nsIHttpChannel and is probably not easy to expose.
I would like to understand what the problem is; there are two options:

  1. SetRequestHeader does some sanity checking that the strings (header name and value) are sane (historically this was exposed to add-ons, and fetch is probably using it now; I am not sure whether fetch does any additional checking). These checks are not really computationally intensive; they only go through the two strings once (see the sketch at the end of this comment).
  2. nsHttpRequestHead needs a lock for each manipulation. (This is old code; we may look into it since we no longer expose this to add-ons, but it is very low priority, especially since we were chasing crashes caused by nsHttpRequestHead for a long time and we do not want to go back to that state.) I expect this is the problem. Can we verify it? (If this is the reason, we may look into prioritizing a fix, but at the moment, without really looking into the code, I cannot promise that we can fix it.) You could probably rerun the test with the locks commented out.

Other than that, this function does not do much.
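
(For illustration, the kind of linear scan meant in (1); this is not the Necko code, the function name and exact character rules are invented, and the point is only that the check is a cheap single pass over both strings.)

  #include <string>

  // Reject header names/values containing characters that are not legal in
  // an HTTP header; one pass over each string, nothing expensive.
  bool LooksLikeValidHeader(const std::string& name, const std::string& value) {
    for (unsigned char c : name) {
      // Header names are printable ASCII tokens without separators.
      if (c <= ' ' || c >= 127 || c == ':') return false;
    }
    for (unsigned char c : value) {
      // Values must not contain CR, LF or NUL (header injection / truncation).
      if (c == '\r' || c == '\n' || c == '\0') return false;
    }
    return true;
  }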

Flags: needinfo?(dd.mozilla)

Alexandru, can you help me evaluate and interpret these performance improvements (!?)

About the patch:
I followed Dragana's suggestions from comment 11: I exposed a new setHeader function that bypasses the sanity checks from (1) and acquires the lock for setting the headers from (2) only once instead of four times (a simplified sketch of that idea is below).
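
For illustration, a simplified sketch of the locking difference (this is not the actual patch or the nsHttpRequestHead code; the class, names, and types are invented):

  #include <mutex>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // A header table guarded by a lock. Setting the four Sec-Fetch-* headers
  // via SetHeader() acquires the lock four times; SetHeaders() acquires it
  // once for the whole batch, which is what the patch aims for.
  class HeaderTable {
   public:
    void SetHeader(const std::string& name, const std::string& value) {
      std::lock_guard<std::mutex> guard(mLock);
      mHeaders[name] = value;
    }

    void SetHeaders(
        const std::vector<std::pair<std::string, std::string>>& headers) {
      std::lock_guard<std::mutex> guard(mLock);
      for (const auto& h : headers) {
        mHeaders[h.first] = h.second;
      }
    }

   private:
    std::mutex mLock;
    std::unordered_map<std::string, std::string> mHeaders;
  };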

(a) base:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab31efc2e5830b012256817c3e0f615a5972f07a

(b) with patches applied
https://treeherder.mozilla.org/#/jobs?repo=try&revision=375030d7fb925a05388e4177f0a52c5ea52bc2d1

Perfherder:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=ab31efc2e5830b012256817c3e0f615a5972f07a&newProject=try&newRevision=375030d7fb925a05388e4177f0a52c5ea52bc2d1&framework=1

So we should see a performance improvement, right? Is that the case with respect to this bug and also bug 1623850? It's hard for me to tell; what's your interpretation of those numbers?

Flags: needinfo?(aionescu)

I am going to back out your patch and push to try to see if your patch really caused the regressions.

Flags: needinfo?(aionescu)

Alexandru, thanks for your help!

FWIW, please note that there is a pref dom.security.secFetch.enabled which would allow you to disable the feature for perf testing purposes.

Flags: needinfo?(aionescu)

There's a weird situation here. The results on try don't show that much of a regression and the noise is pretty high. Also, there's a regression for tp5o_webext responsiveness that Perfherder failed to associate with the alert. Given that the magnitude is at the lower limit of the threshold and the test is also noisy, you can close it as WONTFIX or WORKSFORME.

Flags: needinfo?(aionescu)

(In reply to Alexandru Ionescu (needinfo me) :alexandrui from comment #16)

Given that the magnitude is at the lower limit of the threshold and the test is also noisy, you can close it as WONTFIX or WORKSFORME.

Thank you for the update!

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #17)

Thank you for the update!

By this I don't want to say that this is OK. You still need to try to determine the problem until you're left without solutions. When a test is that noisy, we normally need to question whether it is sheriffable. I know you're not the owner, but it is valuable to know this.

(In reply to Alexandru Ionescu (needinfo me) :alexandrui from comment #18)

By this I don't want to say that this is OK. You still need to try to determine the problem until you're left without solutions. When a test is that noisy, we normally need to question whether it is sheriffable. I know you're not the owner, but it is valuable to know this.

I understand that. FWIW, please also see my comment in bug 1623850#c12. I know it doesn't address the root cause of the noisiness, but it at least tries to eliminate the performance downgrade.

Attachment #9135062 - Attachment is obsolete: true
Attachment #9143981 - Attachment is obsolete: true

I don't think regression bugs raised via alerts should be resolved as WORKSFORME. I'm going to mark this as a potential regression (considering the noise) that was accepted (WONTFIX).

Resolution: WORKSFORME → WONTFIX
Has Regression Range: --- → yes