Closed Bug 1538631 Opened 3 years ago Closed 1 year ago

0.28 - 4.65% tp5n main_startup_fileio / tp5n nonmain_startup_fileio / tp5n time_to_session_store_window_restored_ms (windows7-32) regression on push 06e9cd204aacf5bb735a8f7370a409f64ec6e975

Categories

(Testing :: General, defect, P3)

x86
Windows 7
defect

Tracking

(firefox69 wontfix, firefox70 fix-optional, firefox71 ?)

RESOLVED WONTFIX
Tracking Status
firefox69 --- wontfix
firefox70 --- fix-optional
firefox71 --- ?

People

(Reporter: igoldan, Unassigned)

References

(Depends on 1 open bug)

Details

(5 keywords)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e2e2b325344fcda75fe015b3de5ba25bd53613ba&tochange=06e9cd204aacf5bb735a8f7370a409f64ec6e975

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

5% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 746.43 -> 781.13
4% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,596,034.46 -> 2,693,698.25
0% tp5n main_startup_fileio windows7-32 pgo e10s stylo 97,318,858.00 -> 97,592,742.33

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20060

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Valentin, sorry for filing this bug so late. I barely noticed it.
I know you moved some business logic from one thread type to another, so these results should be expected.
If that's the case, you can simply close this as WONTFIX.

Flags: needinfo?(valentin.gosu)

So what I did in bug 1435141 was go from this:

+-------------+------------------------------+--------------------------------------------+
|             | Startup                      | User action                                |
+-----------------------------------------------------------------------------------------+
| Main Thread |                              | net_EnsurePSMInit    HasUserCertsInstalled |
+-------------+------------------------------+--------------------------------------------+

to this:

+-------------+------------------------------+--------------------------------------------+
|             | Startup                      | User action                                |
+-----------------------------------------------------------------------------------------+
| Main Thread | net_EnsurePSMInit +          |                                            |
+---------------------------------|----------+--------------------------------------------+
| Bg thread   |                   +---> HasUserCertsInstalled                             |
+-------------+---------------------------------------------------------------------------+

So while the NSS initialization is delaying startup a bit more, it's not delaying the main thread in response to user interaction, which IMO is a lot better.
But indeed, it is delaying startup, which isn't that great.

There's bug 1370516, which would improve this.
There's also the option of not calling net_EnsurePSMInit at startup, but waiting for it to happen in response to a network load (session restore or user interaction), but that would be perceived as slowness during navigation, which I think is worse.

I'm inclined to close this as WONTFIX.
:keeler, what do you think?

Flags: needinfo?(valentin.gosu) → needinfo?(dkeeler)

At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?

I thought the previous behavior was to trigger NSS init during delayed startup (ie. right after browser first paint, but before any user action) when captive portal starts its HTTPS request to detect if we are behind a captive portal.

Flags: needinfo?(valentin.gosu)

(In reply to Florian Quèze [:florian] from comment #3)

At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?

From looking at the patch in bug 1435141, it seems it's triggered by browser-delayed-startup-finished, which is after the captive portal detection request is started: https://searchfox.org/mozilla-central/rev/56705678f5fc363be5e0237e1686f619b0d23009/browser/base/content/browser.js#1711,1722

So I don't see how this patch changes anything to startup. I'm just confused I guess.

(In reply to Florian Quèze [:florian] from comment #3)

At which point of startup is net_EnsurePSMInit called, and is it triggering main thread I/O?

Not sure about the I/O. Question for :keeler

I thought the previous behavior was to trigger NSS init during delayed startup (ie. right after browser first paint, but before any user action)

I now call it in response to browser-delayed-startup-finished.
Before it was called in SpeculativeConnectInternal, at which point it may have been a no-op if PSM had already been initialized by some other component, or it may have blocked the main thread otherwise.

when captive portal starts its HTTPS request to detect if we are behind a captive portal.

Captive portal detection is done over http, not https, so it wouldn't trigger it.

Flags: needinfo?(valentin.gosu)

(In reply to Valentin Gosu [:valentin] from comment #5)

(In reply to Florian Quèze [:florian] from comment #3)

when captive portal starts its HTTPS request to detect if we are behind a captive portal.

Captive portal detection is done over http, not https, so it wouldn't trigger it.

Oh indeed, it's http, I always thought it was HTTPS because it triggers the load of NSS libraries: https://perfht.ml/2JFKARB shows the network request for http://detectportal.firefox.com/success.txt triggers the load of softokn3.dll and freebl3.dll.

Initializing NSS can cause a number of files to be read. There's the dll files themselves, and then there's the the certificate/key/module DBs (cert9.db, key4.db, and pkcs11.txt in most cases, although cert8.db, key3.db, and secmod.db can also be read if they're present).

I don't see why bug 1435141 would cause startup to be slower. In any case, though, the real solution would be something like bug 1370516.

Flags: needinfo?(dkeeler)
Priority: -- → P3
Depends on: 1370516

:keeler Do you think we will get any progress on this soon?

Flags: needinfo?(dkeeler)

This is a non-trivial amount of work and our team's engineering capacity is allocated elsewhere, so unfortunately no, I don't think we'll get any progress on this soon.

Flags: needinfo?(dkeeler)

Marking the performance regression as WONTFIX as it's been over a year since it was reported and long-since shipped to user. Once bug 1370516 is resolved we should see a performance improvement but I don't see the value in keeping this open until then.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.