WindowsPreviewPerTab.jsm is extremely expensive when restoring a large session

VERIFIED FIXED in Firefox 63

Status

()

defect
P2
normal
VERIFIED FIXED
2 years ago
11 months ago

People

(Reporter: zxspectrum3579, Assigned: Gijs)

Tracking

(Blocks 1 bug, {perf})

57 Branch
Firefox 63
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

(Whiteboard: [fxperf:p2])

Attachments

(5 attachments)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171112125346

Steps to reproduce:

Just launched the browser.


Actual results:

During the starting time FireFox 57 window's title shows "Not Responding" as well as broked rendering of the window's content. This state is short, but it is definitely not a classy way to present an application to a user.


Expected results:

FireFox should not show the window unless the main process is "sure" that the window can be rendered properly without becoming "Not Responding".
Component: Untriaged → General
Severity: normal → critical
Keywords: hang
Could you record a profile so that we can understand why the window is hang? Here are some instructions to the gecko profiler:

https://developer.mozilla.org/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Flags: needinfo?(zxspectrum3579)
Sorry, I forgot to mention the important part: the hang has happened during the first start up after upgrade from a previous version, not during start ups in regular use. During this time the browser is doing transomations of the session and other resources to a new, updated format. Since I have a lot of tabs (up to a thousand), it manifested in this way, what is usually unnoticable in small sessions.

This one-time process should not become super-modal to a point when the window is not responding. Maybe it should be moved to a point when window is not created yet?
Flags: needinfo?(zxspectrum3579)
Marco, is there a specific migration that you suspect might be causing this?
Flags: needinfo?(mak77)
Adjacent issues during regular startups of FireFox 57.03: 

1) the browser is showing blank address line for few seconds until it actually loads itself fully. It also shows blanket content area. While, of course, it is only temporary it still makes users momentary unconfortable because of the fear that the browser has lost a URL in the current tab. It is not a classy way for the browser to present itself.

WHAT TO DO: the browser should render the URL along with the address field, not some time later. The content area of the current tab has to show a big grey "wait" spinner, not a blank page.

2) the browser also does not show icons. It is not comfortable for users to fear, even shortly, that the browser has yanked all of the sites' icons.
Also, even after everything else is loaded and the browser is ready to be used, the icons that are not initially visible are loaded into the browser so slowly that user can see in real time how they appear in a pull-down menu with the full list of tabs and/or when user switches to a remote tab. Users do not have to wait for site icons to appear.

WHAT TO DO: the current way how things is an implementation of "lazy load" principle, but those icons are super tiny and it literally takes just something like a millisecond to render them. Maybe the current solution is just too "lazy" and needs a speed-up.
There are too many issues being reported in a single bug, we usually stick to 1 issue per bug, otherwise everything becomes untrackable.

(In reply to User Dderss from comment #4)
> 1) the browser is showing blank address line for few seconds until it
> actually loads itself fully. It also shows blanket content area. While, of
> course, it is only temporary it still makes users momentary unconfortable
> because of the fear that the browser has lost a URL in the current tab. It
> is not a classy way for the browser to present itself.

I suspect restore didn't finish yet and the browser isn't loaded, setting a url sooner may open the way to new spoofing security issues where we show a url different than the actual content for a brief time, and content may find ways to circumvent that.

> WHAT TO DO: the browser should render the URL along with the address field,
> not some time later. The content area of the current tab has to show a big
> grey "wait" spinner, not a blank page.

IIRC the wait spinners were removed because they made the browser look even slower.
Considering the two above problems, this part sounds like a wontfix.

> 2) the browser also does not show icons. It is not comfortable for users to
> fear, even shortly, that the browser has yanked all of the sites' icons.
> Also, even after everything else is loaded and the browser is ready to be
> used, the icons that are not initially visible are loaded into the browser
> so slowly that user can see in real time how they appear in a pull-down menu
> with the full list of tabs and/or when user switches to a remote tab. Users
> do not have to wait for site icons to appear.

We can't show icons that we don't have, so for those we should regardless wait for the page to load.
I'm not sure if there's a way for session restore to restore icons sooner, that requires someone with better restore knowledge, it may be a possibility.

> WHAT TO DO: the current way how things is an implementation of "lazy load"
> principle, but those icons are super tiny and it literally takes just
> something like a millisecond to render them. Maybe the current solution is
> just too "lazy" and needs a speed-up.

I'm not sure icons are ignorable, we have users with hundreds/thousands tabs, and loading and showing those is not an ignorable amount of work, wouldn't it be better to get the current page sooner, than the icons? Indeed we recently delayed icons loading because they were delaying content loading and other network fetches. Showing the icons sooner has a cost to be payed against something else, if that is loading content, I don't think we'd want to do that exchange.

(In reply to Panos Astithas [:past] (please ni?) from comment #3)
> Marco, is there a specific migration that you suspect might be causing this?

It's unclear to me which version was upgraded to 57, upgrading multiple versions at once can surely pay an higher cost than upgrading the immediately previous version. From a Places point of view the only expensive migration was to 55. 56 and 57 (from 55) were quite unexpensive.
But surely there have been other migrations around, for example I suppose photon customization did some kind of migration. I'm not aware of all the migrations that may have happened for each version, maybe having such a documentation around would be handy, even if hard to ensure people would update it constantly.
I'm cc-ing Florian since he's investigating startup times, but I don't think they are concentrating efforts on the single after-upgrade startup, it has a lower priority than everyday's startup in general.

Which version did you upgrade from?
Flags: needinfo?(mak77) → needinfo?(zxspectrum3579)
Sorry, I thought that the start-up issues are close enough to not generate more bug reports.

I was upgrading from version 56 to version 57. It looks like some start-up processes go in a super modal synchronous mode that makes the browser so busy that it fails to respond to Windows "is alive" pings. Does Mozilla have an analyzer utility that would automatically search the code for such super modal calls that could be suspect to cause such "not responding" effect?

As to the blank URL field: it can also show a spinner there just like the main area to make. Why can not FireFox check the size of the session and decide whether to render those spinners or not? This way it would not make the browser appear slower for the overwhelming majority of users that have small sessions as in those cases no spinners will be rendered while otherwise, they will.

On sites icons: I did not mean to load all of them simultaneously, I meant to load initially only those that will be visible -- it would be super fast. And then the rest of them as soon as the browser will be fully ready (just not as "lazy" as it is now when you can work in the browser for quite some time and the browser will not care to load the icons until user will click to pull down a list of all tabs or switch to a far away tab).

***

The attachment shows how, for example, the latest Opera (Chromium engine) deals with big sessions during start-up (I imported all of my tabs from FireFox): first of all, the session opens in Opera 51 much faster than in FireFox 57, and, second of all, it immediately shows URL in the address page as well as a spinner in the tab's label. FireFox can certainly do likewise, but now the first view it gives to users is a single (!) blank tab/page (I did not do screenshot of that), then a list of tabs with no icons and current tab with not URL or spinner anywhere, as the first attached screenshot shows.
Flags: needinfo?(zxspectrum3579)
The problem here isn't really what's shown on your screen during startup (your screenshots seem OK to me), but really that this stays on screen long enough for you to be able to take a screenshot of it.

As comment 1 said, we would need to see a profile of what's going on to be able to diagnose the problem.

To capture a startup profile, go to https://perfht.ml, install the add-on. Then quit Firefox.
Start it with these environment variables:
MOZ_PROFILER_STARTUP=1
MOZ_PROFILER_STARTUP_ENTRIES=30000000
and then as soon as Firefox has finished loading, press ctrl+shift+2.

In the profiler UI, wait for symbols to be downloaded, and then click the "Share" button in the top right corner.
Well, having this many tabs it is not surprising that it takes some time to load the session (even though it happens much faster in Opera Chromium). 

How do I anonymize my profile? I do not see such button in the profiler UI. Or the data is anonymized by default?
This new attachment is about how FireFox 57 looks initially during its startup, scaring the user into believing that it is loading an empty session. After few seconds, however, the view changes to "No site icons and blank page during startup.png" that I attached previously to this bug that shows that the session is not really yanked, but rather the current page is yanked, as well as all of the icons. 

Neither Opera nor Chrome have shown in my testing such two-stage change of blank/half blank renders (with the same big amount of tabs in the session; though Chrome was not quite good to test this as its lazy load mechanism does not really work, unlike Opera).
(In reply to User Dderss from comment #8)
> Well, having this many tabs it is not surprising that it takes some time to
> load the session (even though it happens much faster in Opera Chromium). 
> 
> How do I anonymize my profile? I do not see such button in the profiler UI.
> Or the data is anonymized by default?

There's no way to anonymize the profile. The profile will show URLs of the JS code that is executing. When restoring a session, it's likely that only the selected tab will actually load, so if you ensure the session will be restarted with a tab that doesn't reveal anything as the selected tab (eg. about:blank, or mozilla.org), you are probably fine. You can email me the link to the profile privately if you would like someone to double check that the profile contains no personally identifiable information before sharing it in the bug.
I looked at the file and did not find much other than URLs for YouTube JS scripts. However, my JSON file is 60 megabytes so I can not upload it to here, it fails (it took a long time for FF to load with the profiler activated, hence the resulting file is not small). How do I upload it?
(In reply to User Dderss from comment #11)
> I looked at the file and did not find much other than URLs for YouTube JS
> scripts. However, my JSON file is 60 megabytes so I can not upload it to
> here, it fails (it took a long time for FF to load with the profiler
> activated, hence the resulting file is not small). How do I upload it?

You could try: https://send.firefox.com/
Flags: needinfo?(zxspectrum3579)
Thanks.

This test system always failed when I tried to upload a 60 MB file, and failed also with a 12 MB compressed file (approached 99% and then aborted), however, it did finally finish:

https://send.firefox.com/download/9d56d594fd/#LPrINsed8js5jA3uVeqYLQ
Flags: needinfo?(zxspectrum3579)
Unfortunately the link has expired before we could take a look. My apologies for that. Could you please do it once again and then immediately needinfo myself to make sure we download it on time? Thanks!
Flags: needinfo?(zxspectrum3579)
Hi, Panos, here is an updated link: https://send.firefox.com/download/de2a0ea62f/#Ptmg1zURgicWSCuTura47A

The page has crashed on me and this is normal for a test stage, but the crash did leave anything in "about:crashes" in either sent or unsent parts -- is it normal?
Flags: needinfo?(zxspectrum3579)
(Ah, it looks I now see the reason why FF has crashed without leaving anything in the reports: it was an OOM crash: one of FireFox.exe processes has 29 GB of allocated memory, even though currently uses only 1.8 GB of it. Whether this is related to Send test site or to something else I do not know. How do I debug this and file a bug report?)
Florian, I have sent you an email with the profile to take a look.

A crash may not appear in about:crashes if breakpad couldn't record the crash for whatever reason. Often times when this happens you get an OS-level window suggesting to report the crash to the OS vendor. That window usually contains useful information, albeit less useful than what we get from within Firefox.
Flags: needinfo?(florian)
Thanks.

Just in case I filed a separate #bug 1431486 with all the details for those who deal with the memory allocation issues.
(In reply to Panos Astithas [:past] (please ni?) from comment #17)
> Florian, I have sent you an email with the profile to take a look.

Very interesting profile, thanks! It shows a 13s hang while restoring the tabs: https://perfht.ml/2mOdYq7

The vast majority of this time is spent in resource:///modules/WindowsPreviewPerTab.jsm, with O(n2) code where n is the tab count.

The indexOf call at https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/browser/modules/WindowsPreviewPerTab.jsm#636 is alone responsible for 36% of the time.

updateTabOrdering iterates over every tabs at https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/browser/modules/WindowsPreviewPerTab.jsm#510 and this function is called for each TabOpen event https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/browser/modules/WindowsPreviewPerTab.jsm#531
This is 10% of the time.

It seems to me that this code should run once after we are done restoring all the tabs, rather than once each time we open a tab. And this AeroPeak code should probably be started off of an idle callback once we are fully done with startup.

This is currently started from delayedStartup at https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/browser/base/content/browser.js#1492
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(florian)
Keywords: hangperf
Priority: -- → P2
Summary: FF 57's window is "Not Responding" during start up → WindowsPreviewPerTab.jsm is extremely expensive when restoring a large session
Whiteboard: [fx-perf]
Tentatively marking this fxperf:p2 as it falls within florian's startup area of concentration.
Whiteboard: [fxperf] → [fxperf:p2]
Initialize WindowsPreviewPerTab later, and make its favicon update code more efficient.
Component: General → Shell Integration
I've tested this on Windows 10 and I don't see any ill effects from my patch. Unfortunately, even prior to my change, I only see the default application icon (not Firefox's icon, *Windows* default application icon) for tabs when I turn this feature on. So I don't know if this makes anything worse on earlier versions of Windows (notably Win7), or if the favicon handling is just broken outright because we now have hidpi favicons, or what. I also noticed that like bug 1293692 and bug 1427629, the preview functionality doesn't interact well with lazy tabs (you get black previews).
Component: Shell Integration → General
Comment on attachment 8997131 [details]
Bug 1418793 - stop trying to get per-tab previews to work prior to session restore, r?florian

Florian Quèze [:florian] has approved the revision.
Attachment #8997131 - Flags: review+
(In reply to :Gijs (Not available 3-19 Aug; he/him) from comment #22)

This sounds like the feature would benefit from some QA exploratory testing (maybe including a few mozregression runs to check if some of this has always been broken, or broke when we introduced hidpi favicons). I'm not sure we currently have anybody available to fix the resulting bugs though.
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3d64c00c695a
stop trying to get per-tab previews to work prior to session restore, r=florian
(In reply to Florian Quèze [:florian] (PTO until August 27th) from comment #24)
> (In reply to :Gijs (Not available 3-19 Aug; he/him) from comment #22)
> 
> This sounds like the feature would benefit from some QA exploratory testing
> (maybe including a few mozregression runs to check if some of this has
> always been broken, or broke when we introduced hidpi favicons). I'm not
> sure we currently have anybody available to fix the resulting bugs though.

I concur. Ryan, who would we contact about this?
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/3d64c00c695a
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 63
Assignee: nobody → gijskruitbosch+bugs
Tania is the QA supervisor you're looking for :)
Flags: needinfo?(ryanvm) → needinfo?(tmaity)
(In reply to :Gijs (he/him) from comment #26)
> (In reply to Florian Quèze [:florian] (PTO until August 27th) from comment
> #24)
> > (In reply to :Gijs (Not available 3-19 Aug; he/him) from comment #22)
> > 
> > This sounds like the feature would benefit from some QA exploratory testing
> > (maybe including a few mozregression runs to check if some of this has
> > always been broken, or broke when we introduced hidpi favicons). I'm not
> > sure we currently have anybody available to fix the resulting bugs though.
> 
> I concur. Ryan, who would we contact about this?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #28)
> Tania is the QA supervisor you're looking for :)

Had a quick chat with :Gijs, this is on our radar now and will have a QA owner assigned to do some exploratory testing next week.
Flags: needinfo?(tmaity)
Depends on: 1485253
As a result of an exploratory session, there are a few things that I've noted in the link bellow:

https://docs.google.com/document/d/1jg02IxuhqT-jDWY_lST01Jzl1jt1nDEFtUGBzKCpWUw/edit

Would need a confirmation regarding a few and that the noted pref was indeed the one you mentioned about in comment 22.
As for the unrelated ones, will investigate around them and log issues separately if they turn out to be new issues.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Cristi Fogel [:cfogel] from comment #30)
> As a result of an exploratory session, there are a few things that I've
> noted in the link bellow:
> 
> https://docs.google.com/document/d/1jg02IxuhqT-
> jDWY_lST01Jzl1jt1nDEFtUGBzKCpWUw/edit
> 
> Would need a confirmation regarding a few and that the noted pref was indeed
> the one you mentioned about in comment 22.
> As for the unrelated ones, will investigate around them and log issues
> separately if they turn out to be new issues.

I'm very confused. This change is about the Windows-only taskbar preview feature. These are previews that appear if you hover the taskbar entry for Firefox on Windows. By default you get 1 preview per window. If you go into Firefox's Options, there is a checkbox labeled "Show tab previews in the Windows taskbar". If you turn this on, you get 1 preview per tab (limited to 20 by default, there's a hidden pref to change that limit, see also bug 1480686). It would be nice to have some testing to see whether this bug's code change regressed anything in the icon/preview used for those taskbar previews, and/or in switching to or closing tabs, on Win7/Win8/Win10.


From the document, it seems like you've done testing regarding the ctrl-tab preview feature and/or session restore. That's not really related here, the only connection is that the ctrl-tab code uses the same backend to generate previews, which is why they both get black thumbnails for restored tabs that have not been focused yet. And yes, the pref you list in the document ("browser.sessionstore.restore_tabs_lazily") is related to whether those restored tabs get loaded immediately and thus do or do not have black thumbnails. That's what comment 22 was about.

Anyway, at least item (7) and (13) seem like they want bugs filing. (7) is related to bug 1473595 (as are any other remarks about ctrl-tab previews; you may want to check with Dão in that bug to see if it's worth filing bugs on the other points in your doc that relate to ctrl-tab previews). (13) seems like it's a result of bug 1484275, so checking that and maybe needinfo'ing Paolo on a regression bug you file for that would be worthwhile.

For items (1) and (2), it's not clear to me if they happen only with taskbar or ctrl-tab previews turned on, or only if you turn off lazy session restore, or... ?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(cristian.fogel)
I appreciate the extra information and the feedback! Yeah, it seems I was a bit off.
You've mentioned the "hidden pref to change that limit", mind sharing it?  

Items (1) and (2) wasn't related to the pref being turned on/off, it is a note that the performance is affected with multiple tabs, there was one case so far that the script became unresponsive due to the high load.

For items (7) and (13) will try to contact the mentioned people asap.
I've added 2 new points to the list, (15) may be linked to bug 1403620, (16) regression from bug 1476224.

Your changes don't seem to have introduced anything new (checking with Win10/8 x 64, Win7x32 with both classic and default theme).
Flags: needinfo?(cristian.fogel)
(In reply to Cristi Fogel [:cfogel] from comment #32)
> I appreciate the extra information and the feedback! Yeah, it seems I was a
> bit off.
> You've mentioned the "hidden pref to change that limit", mind sharing it?

browser.taskbar.previews.max

in about:config is the pref I meant.

> Items (1) and (2) wasn't related to the pref being turned on/off, it is a
> note that the performance is affected with multiple tabs, there was one case
> so far that the script became unresponsive due to the high load.

Yeah, I'm not too surprised by this. I think we have bugs already on file, maybe to load 1 tab and create the other tabs as lazy tabs, to be loaded when the user selects them.

> For items (7) and (13) will try to contact the mentioned people asap.
> I've added 2 new points to the list, (15) may be linked to bug 1403620, (16)
> regression from bug 1476224.

Thanks! Yeah, I updated the summary of https://bugzilla.mozilla.org/show_bug.cgi?id=1403620, seems it's to do with svg icons. If you are seeing something specific to session-restored pages, even websites (rather than about:config / about:preferences and other internal Firefox pages) then it may be worth filing a separate bug for this.

The doorhanger one (16) is kind of interesting. Filing a bug in `Firefox :: Site Identity and Permissions Panels` and needinfo'ing :johannh might be useful. I don't know if it's something we can reasonably fix, but it'll be good to discuss explicitly in a separate bug.

> Your changes don't seem to have introduced anything new (checking with
> Win10/8 x 64, Win7x32 with both classic and default theme).

Great! Going to mark this verified then. :-)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.