<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 2

•

7 years ago

> *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Just FYI, Dennis is on PTO on Monday and there will be a holiday in Germany on Tuesday -- so don't get super trigger happy, he'll respond when he's back on Wednesday! :)

Priority: -- → P1

Robert Wood [:rwood]

Reporter

Comment 3

•

7 years ago

(In reply to Mike Taylor [:miketaylr] from comment #2) > > *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** > > Just FYI, Dennis is on PTO on Monday and there will be a holiday in Germany > on Tuesday -- so don't get super trigger happy, he'll respond when he's back > on Wednesday! :) Thanks Mike :)

Comment 4

•

7 years ago

Thanks Mike! And thanks Robert for waiting a bit, much appreciated. This is regression is actually a really hard one. First, it's worth noting that it only affects Linux. On macOS and Windows, we already run webextensions out of process, but we don't on Linux so far, see bug 1190679. Eventually, when we enable OOP WebExts on Linux, this issue will go away by itself. However, since there is no timeline to that, I feel a bit bad for leaving that as the state of this issue, and I'd not like us to get backed out, since it would heavily impact our ability to roll out some site-fixes we already have in our queue. I had a few looks around, and it looks like that I regress tp5o responsiveness as soon as I enable the background script. However, all the WebExtension that is responsible for this perf regression is actually doing is only registering some content scripts, see [0]. So the issue is probably deep within the WebExtension implementation, which is a bit hard for me to profile right now. :aswan, kindly asking you for some feedback here, as you are one of the people most familiar with WebExtensions, as far as I can tell. :) Given the background script linked, and the performance regression in this issue, do you have any idea? Is there something I can improve in the WebExtension so we don't regress this test? Do you have any hints for me on how to best profile this? Or do you think we should just wait for OOP WebExt on Linux and have this regression as "necessary" for our WebCompat work on Linux? [0]: https://searchfox.org/mozilla-central/rev/795575287259a370d00518098472eaa5b362bfa7/browser/extensions/webcompat/webextension/background.js

Flags: needinfo?(dschubert) → needinfo?(aswan)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

•

7 years ago

I'm not all that familiar with Talos, bear with me. Does tp5o measure anything during browser startup? From the description on the Talos wiki page I gather that it is supposed to measure the time to navigate to and load specific pages, after startup is done. Can you elaborate on "it looks like that I regress tp5o responsiveness as soon as I enable the background script"? That script should be idle other than at startup so either this test is measuring (part or all of) the cost of starting the background script during browser startup, or the regression comes from content script matching. The latter doesn't sound likely since that code is pretty well optimized and I believe it works the same on all platforms regardless of whether extension pages run out-of-process or not.

Flags: needinfo?(aswan) → needinfo?(dschubert)

Comment 6

•

7 years ago

tp5o (and the responsiveness metric) do not measure startup. The browser starts, we wait almost 10 seconds, then start loading tests and measuring metrics. Maybe there is some delayed initialization/lateStartTasks (like telemetry) which could be affecting things related to the web extensions.

Comment 7

•

7 years ago

No worries, I maybe didn't do a good job in explaining in the first place! :) That's not related to browser startup at all, yes. tp5o responsiveness measures the event loop lag in the parent process while loading a predefined set of pages. As soon as I enable the content scripts, we seem to delay the content script a bit in this case. It's not much, but given the high performance in general, it turns out to be almost 15% slower now. > Can you elaborate on "it looks like that I regress tp5o responsiveness as soon as I enable the background script"? As soon as I have a content script registered, regardless of whether it's registered for the current site or not, I seem to hit regressions. My assumption is that the methods checking whether or not the actual content scripts per domain/URI is to blame here, as I can see the regression as soon as I register a content script, even if it's empty. However, I'm not sure that this assumption is true, and I don't really know how to actually check. :/ By any chance, do you know someone who did WebExtension performance testing/optimization before? We probably have some logic somewhere that runs in the webextension process if available, but somehow ends up blocking the event loop otherwise. But I don't really know what that is.

Flags: needinfo?(dschubert) → needinfo?(aswan)

Comment 8

•

7 years ago

Well as I understand it, content script matching runs in all content processes including the extension process. And on Linux the extension process is still the main process, so I guess this is measuring that logic running in the main process? Adding Kris who is better acquainted with the details here. Is the content script being registered from the background page meant to be kept around permanently? It looks like unnecessary overhead for every content load. If this is just a placeholder to represent other "real" scripts to be added in the future then you can disregard that question...

Flags: needinfo?(aswan)

Comment 9

•

7 years ago

Just for the record: > If this is just a placeholder to represent other "real" scripts to be added in the future then you can disregard that question... Yes, it will be replaced by real scripts very soon, assuming we don't get backed out for perf issues. :)

Comment 10

•

7 years ago

Hm, OOP extensions on Linux are blocked on bug 1406533 which doesn't look particularly close to landing...

Panos Astithas (he/him) [:past] (please ni?)

Comment 11

•

7 years ago

Hey Panos, given the size of this regression (~14%), it seems a bit hard to justify that we live with it, even if webcompat is important. I see a few possible options: 1. Disable content script injections ("site patching") for everyone until bug 1406533 is landed. We have a pref to control this currently. We can still use the UA spoofing capabilities independent of this. 2. Disable site patching for Linux only. Awkward, maybe, but we can still target Windows and OSX, where most of our users are. 3. Eat the regression. Thoughts?

Flags: needinfo?(past)

Comment 12

•

7 years ago

Do we have profiles from the changeset that introduced this regression and its parent? This is just a matter of pushing to try with the syntax |mach try fuzzy --talos-profile|. This would give us a better understanding of where the problem lies and how it could be mitigated. From a somewhat recent experience in another team, do any of the ideas in bug 1415467 comment 12 apply here? I am not comfortable shipping such a regression and this is not a practice we want to encourage. Disabling it for Linux only, means that when we eventually ship a compatibility remediation, some of our users will not benefit from it, which is I think more confusing than helpful. Disabling content script injections seems like the best path forward, in the sense that it makes shipping our v2 a step forward from v1 (UA spoofing plus CSS injections), but I don't know that it helps us in the short term. Do we have plans for CSS injections currently? The last option is to back it all out, but I don't see much value in that over option 1.

Flags: needinfo?(past)

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

7 years ago

before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FaPcsqNcNTHKc6_XuwD-PBA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tp5o.zip after: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FFBVQbkDXS7yrZ2rCD9uPKw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tp5o.zip

Comment 14

•

7 years ago

ni? Dennis for the following question: > From a somewhat recent experience in another team, do any of the ideas in bug 1415467 comment 12 apply here?

Flags: needinfo?(dschubert)

Comment 15

•

7 years ago

> From a somewhat recent experience in another team, do any of the ideas in bug 1415467 comment 12 apply here? Well... point 1 is still true. This issue wouldn't be an issue with out-of-process web extensions. On Linux, we still run in the main process, and thus can regress the event loop lag. But as pointed out by Andrew, OOP-WebExt on Linux aren't really close to landing, unfortunately. Point 2 is a lot harder. I, for myself, don't set up tab listener. However, I don't know what WebExtensions are doing internally, and it may be that the content script registry is using them to check when they need to inject scripts. I wanted to figure that out before replying, but the WebExtension source is surprisingly complex... I'll do some more digging and look at the profile. > Disabling content script injections seems like the best path forward, in the sense that it makes shipping our v2 a step forward from v1 (UA spoofing plus CSS injections) Just for the record: CSS injections are using content scripts as well, as it is the easiest way to inject CSS files on a per-site basis, so it causes the same perf regression on Linux. Disabling JS injections also means disabling CSS injections. We have some site patches in the queue which we could handle with UA overrides, like bug 1452707, but in these cases, using JS injections would actually be a much safer way. If we spoof as Chrome, we'd get these sites to run all kinds of Chrome-specific code paths, while we could simply get the Firefox-detection to work again with a very simple JS injection, which makes me relatively certain we don't cause any bugs as a side effect.

Flags: needinfo?(dschubert)

Ryan VanderMeulen [:RyanVM]

Comment 16

•

7 years ago

Point 2 from the comment on bug 1415467 was about the browser.tabs.onUpdated event which the webcompat webextension does not use. I talked briefly to Kris about this last week on IRC and he was pretty adamant that content script matching should not be the cause of a regression like this. Needinfo to him to elaborate.

Flags: needinfo?(kmaglione+bmo)

Updated

•

7 years ago

Component: Go Faster → WebExtensions: General

Product: Web Compatibility Tools → Toolkit

Version: unspecified → 61 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox60: --- → unaffected

status-firefox61: --- → affected

status-firefox62: --- → affected

status-firefox-esr52: --- → unaffected

status-firefox-esr60: --- → unaffected

tracking-firefox61: --- → +

tracking-firefox62: --- → +

Kris Maglione [:kmag]

Comment 17

•

7 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13) > before: > https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster. > net%2Fv1%2Ftask%2FaPcsqNcNTHKc6_XuwD- > PBA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tp5o.zip > > after: > https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster. > net%2Fv1%2Ftask%2FFBVQbkDXS7yrZ2rCD9uPKw%2Fruns%2F0%2Fartifacts%2Fpublic%2Fte > st_info%2Fprofile_tp5o.zip I don't see any extension overhead at all in either of those profiles. (In reply to Andrew Swan [:aswan] from comment #16) > Point 2 from the comment on bug 1415467 was about the browser.tabs.onUpdated > event which the webcompat webextension does not use. > I talked briefly to Kris about this last week on IRC and he was pretty > adamant that content script matching should not be the cause of a regression > like this. Needinfo to him to elaborate. I don't really have anything to add. Content script matching is *extremely* cheap. And it doesn't really matter whether extensions are in-process or out-of-process. We'd see the same overhead on all platforms if that were the issue. The only place it would conceivably make a difference is in non-e10s runs. All of this said, I think we're making an issue out of something that is not an issue here. 15% is not a large regression when we're talking about baseline numbers as low as they currently are. Unless we're seeing occasionally large janks (which does not appear to be the case), the difference here is so far from being user-perceivable that it is barely even worth considering. We have better things to spend time on.

Kris Maglione [:kmag]

Updated

•

7 years ago

Flags: needinfo?(kmaglione+bmo)

Panos Astithas (he/him) [:past] (please ni?)

Comment 18

•

7 years ago

Kris makes a great point about the regression magnitude in absolute numbers, I can't believe I missed that. Joel, how important is this regression in your opinion? Since the only alternative is to essentially back the entire thing out (per comment 15), I'm inclined to just accept the regression. There will be further opportunities to investigate and improve this overhead as we ship real compatibility interventions like bug 1452707 and this feature is expected to help us a lot in tackling the long tail of web compatibility issues.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 19

•

7 years ago

yes, this is a small value- but it is a large percentage. if we accepted all of the small changes our responsiveness would be much higher. I don't think a user can distinguish the difference here, so that logic is correct. If there is no reasonable fix, and this is the only regression, I think it is a good choice to accept this.

Flags: needinfo?(jmaher)

Kris Maglione [:kmag]

Comment 20

•

7 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #19) > yes, this is a small value- but it is a large percentage. if we accepted > all of the small changes our responsiveness would be much higher. I don't > think a user can distinguish the difference here, so that logic is correct. I don't think it makes sense to try to deal with every small change in these metrics as they come up as long as our metrics are well below user-perceivable. We obviously need to stay on top of responsiveness metrics, but I think we're better off focusing on bigger issues during QF releases than on small issues like this.

Ryan VanderMeulen [:RyanVM]

Comment 21

•

7 years ago

Given Comments 17, 18 and 19, let's go ahead and WONTFIX this. I think the gains in ability to respond to webcompat bustage across platforms outweigh the (small) hit in perf here. Thanks everyone. (If there's a better resolution or formal process, someone please feel free to correct!)

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Updated

•

7 years ago

Status: RESOLVED → VERIFIED

status-firefox61: affected → wontfix

status-firefox62: affected → wontfix

BMO Automation

Updated

•

6 years ago

Product: Toolkit → WebExtensions