Initial tab in new window can do unnecessary process flip when opened from external application

VERIFIED FIXED in Firefox 66

Status

()

P3
normal
VERIFIED FIXED
3 months ago
18 days ago

People

(Reporter: igoldan, Assigned: mconley)

Tracking

(Blocks: 2 bugs, {perf, regression, talos-regression})

unspecified
Firefox 66
perf, regression, talos-regression
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox63 unaffected, firefox64 unaffected, firefox65+ disabled, firefox66+ fixed)

Details

Attachments

(2 attachments)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5c3e0fc819ba0da8cdcd85eb16cedcb9d2c8f81e&tochange=d835874be99afb421be275d5ed04b1a050458977

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 12%  sessionrestore linux64 pgo e10s stylo                     333.89 -> 373.67
  9%  sessionrestore_no_auto_restore linux64 pgo e10s stylo     385.56 -> 422.08
  7%  sessionrestore linux64 opt e10s stylo                     372.83 -> 400.00

Improvements:

 18%  tp5n nonmain_startup_fileio windows7-32 opt e10s stylo        2,641,408.71 -> 2,155,141.33
 18%  tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo        2,633,549.62 -> 2,156,848.83
  3%  sessionrestore windows7-32 pgo e10s stylo                     323.33 -> 314.85
  3%  sessionrestore windows10-64 pgo e10s stylo                    322.32 -> 314.00
  2%  sessionrestore_no_auto_restore windows7-32 pgo e10s stylo     372.71 -> 363.75


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=17292

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Component: General → Activity Streams: Newtab
Product: Testing → Firefox
Flags: needinfo?(mconley)
Here's the new AWSY perf status, which I believe results in a wontfix.

== Change summary for alert #17278 (as of Wed, 31 Oct 2018 07:14:47 GMT) ==

Regressions:

  6%  Heap Unclassified windows10-64 pgo stylo     41,277,894.62 -> 43,838,079.72
  6%  Resident Memory windows10-64 pgo stylo       472,781,554.38 -> 501,640,238.68
  6%  JS windows10-64 pgo stylo                    110,376,800.65 -> 116,663,221.14

Improvements:

 14%  Base Content Explicit linux64-qr opt stylo              13,645,994.67 -> 11,758,592.00
 14%  Base Content Heap Unclassified linux64-qr opt stylo     5,269,180.33 -> 4,555,003.33
 10%  Base Content JS linux64-qr opt stylo                    5,124,735.00 -> 4,588,253.33

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=17278
I'll post the Gecko profiles soon.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #2)
> I'll post the Gecko profiles soon.

:jmaher I'm experiencing failures when trying to get profiles from Linux 64bit PGO.
Flags: needinfo?(jmaher)
:igoldan, this appears to be broken, I tried again, and it failed, both on pgo and opt.  I would recommend pushing before/after to try and edit talos.json [1] to make other just be sessionrestore, then the profiles should work.

[1] https://searchfox.org/mozilla-central/source/testing/talos/talos.json#16
Flags: needinfo?(jmaher) → needinfo?(igoldan)
also, lets file a bug to fix this- we get profiles for ts_paint, tpaint, a11y, but fail on sessionrestore- looking at the logs, it appears that sessionrestore is broken in general- maybe there is an issue with sessionrestore and profile gathering, it was recently converted to a web extension.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #5)
> :igoldan, this appears to be broken, I tried again, and it failed, both on
> pgo and opt.  I would recommend pushing before/after to try and edit
> talos.json [1] to make other just be sessionrestore, then the profiles
> should work.
> 
> [1] https://searchfox.org/mozilla-central/source/testing/talos/talos.json#16

I did what you requested, but the profiling still fails [1].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&author=igoldan%40mozilla.com&fromchange=3d69ba3f3b57598d3fb6075b0785fc59c070bb1c&tochange=ed69700debffdeb2be28b64790b930fe58e7169f
Flags: needinfo?(igoldan)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)
> also, lets file a bug to fix this- we get profiles for ts_paint, tpaint,
> a11y, but fail on sessionrestore- looking at the logs, it appears that
> sessionrestore is broken in general- maybe there is an issue with
> sessionrestore and profile gathering, it was recently converted to a web
> extension.

sessionrestore fishiness was also reported in bug 1503490 [1].

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1503490#c2
Depends on: 1504133
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)
> also, lets file a bug to fix this-

I filed bug 1504133 for this.
status-firefox63: --- → unaffected
status-firefox64: --- → unaffected
status-firefox65: --- → affected
status-firefox-esr60: --- → unaffected
tracking-firefox65: --- → +
(Assignee)

Comment 10

3 months ago
Investigating.
Assignee: nobody → mconley
Flags: needinfo?(mconley)
Iteration: --- → 65.2 (Nov 16)
(Assignee)

Comment 11

2 months ago
With bug 1504133 fixed, it should be possible to get profiles for this now for analysis.

Try pushes:

Base (with separate process): https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=84408752945d0c2da7a65d1910ea408f124556af

Patch (separate process disabled): https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=b68150f95c1c65c2160bd5fb81af1fdd3a20f104
Flags: needinfo?(mconley)
Iteration: 65.2 (Nov 16) → 65.3 (Nov 30)
Comment hidden (obsolete)
Comment hidden (obsolete)
(Assignee)

Updated

2 months ago
Depends on: 1509571
Any updates on this matter?
(Assignee)

Updated

a month ago
Blocks: 1508577
No longer blocks: 1472212
Flags: needinfo?(mconley)
(Assignee)

Updated

a month ago
Blocks: 1513045
No longer blocks: 1508577
(Assignee)

Comment 16

a month ago
Still digging.
Flags: needinfo?(mconley)
(Assignee)

Comment 17

a month ago
It took basically forever to track this down, but I think there's a real regression here, and something actionable to fix.

My hypothesis is that we're doing an unnecessary process flip immediately on start-up. I suspect we're reaching this code: https://searchfox.org/mozilla-central/rev/3160ddc1f0ab55d230c595366662c62950e5c785/browser/base/content/tabbrowser.js#308, which selects E10SUtils.DEFAULT_REMOTE_TYPE as the initial browser type, and then we load about:newtab, which causes the process flip to the privileged content process.

The reason it took me so long to figure this out, is because as soon as I start profiling, all of the timings shift, and the process flip occurs _after_ the window of time that the sessionrestore tests measure. Super frustrating.

So I had to fallback to first principals - effectively, logging timestamps to try to find where time was going, and logging some events to see what was happening in between those timestamps.

And lo and behold, in the "bad" case, with the separate privileged process enabled, I noticed we were doing a process flip.

I'm going to throw together a patch to ensure this is where the regression is coming from and push to try.
Flags: needinfo?(mconley)
(Assignee)

Comment 18

a month ago
The other good news is that, since this is due to the privileged content process, and because the privileged content process is held on Nightly for now, this regression has not gone up to beta.
(Assignee)

Comment 19

a month ago
Hooray, hypothesis supported by evidence - here's a comparison between a try push that has no privileged content process, and one that avoids the flip:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=ed465463cbd0874d8190c3d80b772e4370696926&newProject=try&newRevision=4a3f685f69f918cf391925453a6e701a67b63232&framework=1

No difference, so I think we've found our culprit.
Summary: 7.29 - 11.91% sessionrestore / sessionrestore_no_auto_restore (linux64) regression on push d835874be99afb421be275d5ed04b1a050458977 (Wed Oct 31 2018) → Initial browser tab in each window does unnecessary process flip when using default home page with privileged content process enabled

Comment 20

a month ago
Wait, so isn't this already fixed per bug 1509906? Is the fix in that bug broken? Or is the way session restore and/or the talos version passes arguments "special"? We definitely do string comparisons agains the `uriToLoad` thing elsewhere... :-(

I guess this is just more evidence we should fix bug 1485961
(In reply to :Gijs (he/him) from comment #20)
> Wait, so isn't this already fixed per bug 1509906? Is the fix in that bug
> broken?

It works with session restore disabled from what I can tell.
(Assignee)

Comment 22

a month ago
I think this is further evidence that we need to fix bug 1485961. In the case of the sessionrestore_no_auto_restore (and presumably sessionrestore) Talos tests, what we're getting at[1] is an nsIMutableArray, containing a single nsISupportsString.

This appears to be from here:

https://searchfox.org/mozilla-central/rev/fd32b3a6fa3eff1468311f6fcf32b45c117136df/browser/components/nsBrowserContentHandler.js#244-262

[1]: https://searchfox.org/mozilla-central/rev/3160ddc1f0ab55d230c595366662c62950e5c785/browser/base/content/tabbrowser.js#301
Updating summary to reflect what I hope I understand is going on here based on previous comments... i.e. in the real world this doesn't affect the home page, right?
Component: Activity Streams: Newtab → Tabbed Browser
Priority: -- → P3
Summary: Initial browser tab in each window does unnecessary process flip when using default home page with privileged content process enabled → Initial tab in new window can do unnecessary process flip when opened from external application
(Assignee)

Updated

a month ago
Blocks: 1514778
(Assignee)

Comment 24

a month ago
Created attachment 9031898 [details]
Bug 1503796 - Convert XPCOM arrays and strings from nsBrowserContentHandler to JS arrays and strings earlier when opening browser windows. r?Gijs

This also makes it so that the initial browser tab setup code can handle a JS array
as the URI(s) to load during start-up. If it's an array, the first element of that
array is inspected to determine what process type the initial browser tab should
be in.
(Assignee)

Comment 25

a month ago
Created attachment 9031900 [details]
Bug 1503796 - Add a regression test to ensure that the initial process type is set correctly. r?Gijs

Depends on D14755

Comment 26

a month ago
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51dd9bb13e07
Convert XPCOM arrays and strings from nsBrowserContentHandler to JS arrays and strings earlier when opening browser windows. r=Gijs
https://hg.mozilla.org/integration/autoland/rev/d125adfbfc9f
Add a regression test to ensure that the initial process type is set correctly. r=Gijs

Comment 27

a month ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/51dd9bb13e07
https://hg.mozilla.org/mozilla-central/rev/d125adfbfc9f
Status: NEW → RESOLVED
Last Resolved: a month ago
status-firefox66: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 66
status-firefox65: affected → disabled
tracking-firefox66: --- → +
Flags: in-testsuite+
== Change summary for alert #18303 (as of Mon, 17 Dec 2018 21:23:13 GMT) ==

Regressions:

  5%  sessionrestore windows10-64-qr opt e10s stylo                     314.12 -> 328.83
  5%  sessionrestore_no_auto_restore windows10-64-qr opt e10s stylo     333.23 -> 348.42
  4%  sessionrestore_no_auto_restore windows7-32 pgo e10s stylo         322.33 -> 335.25
  4%  sessionrestore windows10-64 opt e10s stylo                        318.21 -> 329.83
  4%  sessionrestore windows7-32 opt e10s stylo                         310.25 -> 321.25
  3%  sessionrestore_no_auto_restore windows10-64 opt e10s stylo        337.71 -> 349.27
  3%  sessionrestore windows7-32 pgo e10s stylo                         304.29 -> 314.67
  3%  sessionrestore_no_auto_restore windows10-64 pgo e10s stylo        318.33 -> 328.58
  3%  sessionrestore windows10-64 pgo e10s stylo                        300.61 -> 309.20

Improvements:

 11%  sessionrestore_no_auto_restore osx-10-10 opt e10s stylo     756.54 -> 675.25
  9%  sessionrestore linux64 opt e10s stylo                       364.29 -> 331.92
  8%  sessionrestore_no_auto_restore linux64 opt e10s stylo       381.54 -> 349.50
  4%  sessionrestore linux64-qr opt e10s stylo                    828.75 -> 799.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18303
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.