Closed Bug 1623053 Opened 5 years ago Closed 5 years ago

2.09 - 2.72% tp5o_webext (windows10-64-shippable) regression on push a0508ae6c037928981ac2733860b6ec84d7069ec (Sat March 14 2020)

Categories

(Core :: DOM: Security, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- unaffected
firefox75 --- unaffected
firefox76 + disabled
firefox77 --- wontfix

People

(Reporter: alexandrui, Assigned: ckerschb)

References

(Regression)

Details

(4 keywords, Whiteboard: [domsecurity-active])

Attachments

(2 obsolete files)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a0508ae6c037928981ac2733860b6ec84d7069ec

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

3% tp5o_webext windows10-64-shippable opt e10s stylo 296.13 -> 304.18
2% tp5o_webext windows10-64-shippable opt e10s stylo 294.56 -> 300.72

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=25419

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Flags: needinfo?(ckerschb)
Component: Performance → DOM: Security
Product: Testing → Core
Target Milestone: --- → mozilla76
Version: Version 3 → unspecified

Hey Alexandru, thanks for letting me know - I was afraid this might happen.

Let's start with the good news: the Sec-Fetch-* implementation is only pref'd on in Nightly at the moment:
https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/modules/libpref/init/StaticPrefList.yaml#2351

A word about the algorithm:

  • For every request we add new HTTP request headers (sec-fetch-mode, sec-fetch-dest, sec-fetch-user, sec-fetch-site).
  • For every request we have to determine whether the request and all its redirections are same-origin or at least same-site (see the sketch below).
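
For reference, here is a minimal C++ sketch of the same-origin/same-site classification over the redirect chain. This is not the Gecko implementation; the Origin type, its fields, and the function names are invented for illustration, and the "none" value for user-triggered navigations is left out.

  #include <string>
  #include <vector>

  // Hypothetical, simplified origin: scheme + full host + registrable domain + port.
  struct Origin {
    std::string scheme;
    std::string host;      // e.g. "sub.example.com"
    std::string siteHost;  // registrable domain, e.g. "example.com"
    int port;
  };

  static bool SameOrigin(const Origin& a, const Origin& b) {
    return a.scheme == b.scheme && a.host == b.host && a.port == b.port;
  }

  static bool SameSite(const Origin& a, const Origin& b) {
    return a.scheme == b.scheme && a.siteHost == b.siteHost;
  }

  // Sec-Fetch-Site may only be "same-origin"/"same-site" if the initiator and
  // every redirect hop satisfy that relation with the final target, which is
  // why the whole redirect chain has to be walked for every request.
  std::string ComputeSecFetchSite(const Origin& initiator, const Origin& target,
                                  const std::vector<Origin>& redirectHops) {
    bool sameOrigin = SameOrigin(initiator, target);
    bool sameSite = SameSite(initiator, target);
    for (const Origin& hop : redirectHops) {
      sameOrigin = sameOrigin && SameOrigin(hop, target);
      sameSite = sameSite && SameSite(hop, target);
    }
    if (sameOrigin) return "same-origin";
    if (sameSite) return "same-site";
    return "cross-site";
  }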

What are our alternatives? At the moment I don't see how we could speed up that algorithm. I guess if we want to support Sec-Fetch-*, this is the price for it.

What are the next steps here in that case?

Flags: needinfo?(ckerschb) → needinfo?(aionescu)

The regression is just over the 2% threshold, so the magnitude is not worrying. I'm afraid the decision about keeping the algorithm is yours. Is the improvement more valuable than the regression it brings with it?

Flags: needinfo?(aionescu) → needinfo?(ckerschb)

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #2)

The regression is just over the 2% threshold, so the magnitude is not worrying. I'm afraid the decision about keeping the algorithm is yours. Is the improvement more valuable than the regression it brings with it?

I would say yes, but I will also assign this bug to myself and see if we can make some improvements to the algorithm. My gut feeling tells me that algorithmic changes will only yield marginal gains.

Assignee: nobody → ckerschb
Status: NEW → ASSIGNED
Flags: needinfo?(ckerschb)
Priority: -- → P2
Whiteboard: [domsecurity-active]
See Also: → 1623850

I did some performance measurements using a high-resolution clock, taking the time right before we call SecFetch and right after we return from it. In addition I used nice -n -20 to minimize operating-system scheduler effects. Measuring time is always a hairy operation; leaving potential timing issues aside, it seems any optimizations we do within SecFetch are only micro-optimizations and do not really buy us anything in the context of supporting Sec-Fetch.
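
(In case it helps to reproduce: a minimal sketch of the measurement approach, not the actual instrumentation; the call in the middle is a placeholder, not a real Gecko signature.)

  #include <chrono>
  #include <cstdio>

  // Take a steady-clock timestamp right before the call into SecFetch and
  // right after it returns, then log the elapsed microseconds. The whole
  // browser is run under `nice -n -20` to reduce scheduler noise.
  void MeasureSecFetchOnce() {
    auto start = std::chrono::steady_clock::now();
    // SecFetch::AddSecFetchHeaders(channel);  // placeholder for the real call
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::printf("SecFetch took %lld us\n", static_cast<long long>(us.count()));
  }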

E.g. in the attached patch I combined the checks for same-origin and same-site so we only have to loop through the redirect chain once. Comparing the numbers below, they are more or less the same once we factor in CPU timing inaccuracy.

In a third run I commented out all calls to SetRequestHeader(), which is now performed for every single request, and it seems that function introduces more performance overhead than the calculations that determine which Sec-Fetch headers to set.

Without the patch from this bug (baseline):
[1]: 23, 25, 23 -> 23.6
[2]: 21, 29, 30 -> 26.6
[3]: 32, 34, 23 -> 29.6
[4]: 27, 29, 22 -> 26
[5]: 27, 27, 21 -> 25
[6]: 26, 27, 20 -> 24.3
[7]: 32, 23, 28 -> 27.6
[8]: 52, 36, 32 -> 40
[9]: 52, 67, 51 -> 56.6
[10]: 22, 21, 28 -> 23.6

With the patch from this bug:
[1]: 24, 22, 26 -> 24
[2]: 21, 20, 23 -> 21.3
[3]: 35, 34, 34 -> 34.3
[4]: 53, 33, 34 -> 40
[5]: 36, 34, 32 -> 34
[6]: 34, 33, 32 -> 33
[7]: 50, 52, 33 -> 45
[8]: 41, 39, 40 -> 40
[9]: 68, 64, 52 -> 61.3
[10]: 42, 41, 42 -> 41.6

With the patch from this bug + commenting out 'SetRequestHeader' calls:
[1]: 13, 23, 11 -> 15.6
[2]: 11, 14, 25 -> 16.6
[3]: 27, 20, 25 -> 24
[4]: 25, 51, 24 -> 33.3
[5]: 26, 29, 24 -> 26.3
[6]: 24, 25, 35 -> 28
[7]: 21, 21, 25 -> 22.3
[8]: 32, 19, 24 -> 25
[9]: 49, 29, 38 -> 38.6
[10]: 26, 21, 20 -> 22.3

@baku: I don't think it's worth landing the attached patch, because it makes the code harder to read but does not really improve performance.
@Dragana: I don't think there is any other operation than SetRequestHeader for adding headers, right?
@Alexandru: I guess the fundamental performance problem is that we are adding request headers to every single request. As usual, it's a trade-off between security and performance.

@all: Any suggestions on how to move forward?

Flags: needinfo?(dd.mozilla)
Flags: needinfo?(amarchesini)
Flags: needinfo?(aionescu)

FWIW, we can obviously do perfherder runs and compare the two implementations on TRY, but I don't think the result will differ much from what I've shown above.

I would not land this patch if it doesn't resolve the problem entirely. Can you provide a comparison try link?

Flags: needinfo?(amarchesini)

(In reply to Andrea Marchesini [:baku] from comment #7)

I would not land this patch if it doesn't resolve the problem entirely. Can you provide a comparison try link?

Yeah, I totally agree. Again, I think everything we can do here is just micro-optimization. The fact that we are setting request headers on every single HTTP request is what causes the performance downgrade.

Base:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b081efe96fd749a49d4d0b37ac7e1a88fbf5d88d

With Optimization patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ccbd0bfcec7a5d9cd33ec383349974ad6c4ded3

Comparison:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b081efe96fd749a49d4d0b37ac7e1a88fbf5d88d&newProject=try&newRevision=1ccbd0bfcec7a5d9cd33ec383349974ad6c4ded3&framework=1

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #5)

@Alexandru: I guess the fundamental performance problem is that we are adding request headers to every single request. As usual, it's a trade-off between security and performance.

Christoph, I agree that sometimes we have to live with this trade-off. Please feel free to land the improvement if you can make one, or mark this as WONTFIX.

Flags: needinfo?(aionescu)

Sec-Fetch-* is Nightly-only for now (see comment 1).
If it's still Nightly-only when 76 hits Beta, we should set 76 as unaffected.

There is https://searchfox.org/mozilla-central/rev/4d9cd186767978a99dafe77eb536a9525980e118/netwerk/protocol/http/nsHttpRequestHead.cpp#152

But that is not exposed on nsIHttpChannel and is probably not easy to expose.
I would like to understand what the problem is; there are two options:

  1. SetRequestHeader does some sanity checking that the strings (header name and value) are sane (historically this was exposed to add-ons, and fetch is probably using it now; I am not sure whether fetch does any additional checking). These checks are not really computationally intensive; they only go through the two strings once (see the sketch at the end of this comment).
  2. nsHttpRequestHead needs a lock for each manipulation. (This is old code; we may look into it since we no longer expose this to add-ons, but it is very low priority, especially since we were chasing crashes caused by nsHttpRequestHead for a long time and we do not want to go back to that state.) I expect this is the problem. Can we verify it? (If this is the reason, we may look into prioritizing a fix, but at the moment, without really looking into the code, I cannot promise that we can fix it.) You could probably rerun the test with the locks commented out.

Other than that, this function does not do much.
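
(For illustration, the kind of linear scan meant in (1); this is not the Necko code, the function name and exact character rules are invented, and the point is only that the check is a cheap single pass over both strings.)

  #include <string>

  // Reject header names/values containing characters that are not legal in
  // an HTTP header; one pass over each string, nothing expensive.
  bool LooksLikeValidHeader(const std::string& name, const std::string& value) {
    for (unsigned char c : name) {
      // Header names are printable ASCII tokens without separators.
      if (c <= ' ' || c >= 127 || c == ':') return false;
    }
    for (unsigned char c : value) {
      // Values must not contain CR, LF or NUL (header injection / truncation).
      if (c == '\r' || c == '\n' || c == '\0') return false;
    }
    return true;
  }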

Flags: needinfo?(dd.mozilla)

Alexandru, can you help me evaluate and interpret these performance improvements (!?)

About the patch:
I followed Dragana's suggestions from comment 11: I exposed a new setHeader function that bypasses the sanity checks from (1) and acquires the lock for setting the headers from (2) only once instead of four times (a simplified sketch of that idea is below).
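
For illustration, a simplified sketch of the locking difference (this is not the actual patch or the nsHttpRequestHead code; the class, names, and types are invented):

  #include <mutex>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // A header table guarded by a lock. Setting the four Sec-Fetch-* headers
  // via SetHeader() acquires the lock four times; SetHeaders() acquires it
  // once for the whole batch, which is what the patch aims for.
  class HeaderTable {
   public:
    void SetHeader(const std::string& name, const std::string& value) {
      std::lock_guard<std::mutex> guard(mLock);
      mHeaders[name] = value;
    }

    void SetHeaders(
        const std::vector<std::pair<std::string, std::string>>& headers) {
      std::lock_guard<std::mutex> guard(mLock);
      for (const auto& h : headers) {
        mHeaders[h.first] = h.second;
      }
    }

   private:
    std::mutex mLock;
    std::unordered_map<std::string, std::string> mHeaders;
  };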

(a) base:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab31efc2e5830b012256817c3e0f615a5972f07a

(b) with patches applied
https://treeherder.mozilla.org/#/jobs?repo=try&revision=375030d7fb925a05388e4177f0a52c5ea52bc2d1

Perfherder:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=ab31efc2e5830b012256817c3e0f615a5972f07a&newProject=try&newRevision=375030d7fb925a05388e4177f0a52c5ea52bc2d1&framework=1

So we should see a performance improvement, right? Is that the case with respect to this bug and also bug 1623850? It's hard for me to tell; what's your interpretation of those numbers?

Flags: needinfo?(aionescu)

I am going to back out your patch and push to try to see if your patch really caused the regressions.

Flags: needinfo?(aionescu)

Alexandru, thanks for your help!

FWIW, please note that there is a pref dom.security.secFetch.enabled which would allow you to disable the feature for perf testing purposes.

Flags: needinfo?(aionescu)

There's a weird situation here. The results on try don't show that much of a regression and the noise is pretty high. Also, there's a regression for tp5o_webext responsiveness that Perfherder failed to associate with the alert. Given that the magnitude is at the lower limit of the threshold and the test is also noisy, you can close it as WONTFIX or WORKSFORME.

Flags: needinfo?(aionescu)

(In reply to Alexandru Ionescu (needinfo me) :alexandrui from comment #16)

Given that the magnitude is at the lower limit of the threshold and the test is also noisy, you can close it as WONTFIX or WORKSFORME.

Thank you for the update!

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #17)

Thank you for the update!

By this I don't want to say that this is OK. You still need to try to determine the problem until you're left without solutions. When a test is that noisy, we normally need to question whether it is sheriffable. I know you're not the owner, but it is valuable to know this.

(In reply to Alexandru Ionescu (needinfo me) :alexandrui from comment #18)

By this I don't want to say that this is OK. You still need to try to determine the problem until you're left without solutions. When a test is that noisy, we normally need to question whether it is sheriffable. I know you're not the owner, but it is valuable to know this.

I understand that. FWIW, please also see my comment in bug 1623850#c12. I know it doesn't address the root cause of the noisiness, but it at least tries to eliminate the performance downgrade.

Attachment #9135062 - Attachment is obsolete: true
Attachment #9143981 - Attachment is obsolete: true

I don't think regression bugs raised via alerts should be resolved as WORKSFORME. I'm going to mark this as a potential regression (considering the noise) that was accepted (WONTFIX).

Resolution: WORKSFORME → WONTFIX
Has Regression Range: --- → yes