Closed Bug 1105891 Opened 5 years ago Closed 5 years ago

[e10s] Closing a tab sometimes causes Firefox to hang

Categories

(Core :: General, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
e10s ? ---

People

(Reporter: rowbot, Assigned: jimm)

References

(Blocks 1 open bug)

Details

(Keywords: hang, steps-wanted)

Attachments

(4 files, 1 obsolete file)

Not sure where I should be filing this bug, so please move it to a more appropriate category if needed.

STR:
1) Open Hulu.com and another tab (so that you don't have to keep reopening the browser)
2) Close the Hulu.com tab.
3) Repeat steps 1 and 2 as necessary, as it doesn't occur all the time.

Actual Results:
With e10s enabled, closing tabs sometimes causes firefox to hang.  This hang can last as long as 30 seconds.  After the tab finally closes, all other tabs that are open fail to draw anything and display a blank white page.

Expected results:
The tab should close immediately.  The browser and other tabs should be functional and responsive.

I have seen this on a few other sites as well, but occurs most often for me on Hulu. Bug 1102302 seems similar, however, I can reproduce this with CSS filters disabled.

I captured the following profile [1], so I hope is sheds some light on this.  I can profile it again if needed.

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=e66f05376173e6c868427763c172340ea0f05e8e
Is this a plugin hang like bug 1071751?
tracking-e10s: --- → m5+
Keywords: hang
Summary: [e10s] Closing a tab sometimes causes firefox to hang → [e10s] Closing a tab sometimes causes Firefox to hang
I don't think so.  The only plugin it is trying to use is Flash, and I had that set to Ask to Activate and I had never activated the plugin so no plugin code should have been running.

I have been looking at this more, and this seems to be what generally happens:
1) The page seems to paint everything that should be there.
2) I cannot interact with the content process (mouse cursor doesn't change when hovering over things, can't scroll, can't click on anything)
3) The chrome process has been responsive thus far.
4) When I try to close the tab or the browser, that is when the chrome process becomes unresponsive as well.  It will stay unresponsive for up to 30 seconds before the tab or browser eventually closes.  I can minimize the browser during this time, but when I unminimize the page is displayed as a blank white page.

If I let the page sit there long enough (maybe a minute?) after loading, I can eventually interact with the content process, but it is slow.

Also, if you have multiple tabs open, the other tabs will display as blank white pages for about 10 seconds after the problem tab has closed.

My best guess is that the content process is getting flooded with events and it is taking up to 30 seconds for Firefox to process those events before it can respond to a request to close a tab.
Flags: needinfo?(lhenry)
Assignee: nobody → jmathies
Keywords: steps-wanted
Hey Trevor, are you still seeing these issues in latest nightly?
Flags: needinfo?(lhenry) → needinfo?(smokey101stair)
I'm trying this with hulu.com, youtube, and other sites running flash and not hitting this bug (with Win7 and today's Nightly).
I am still experiencing this bug, however, it is not as severe as when I first reported it.  It occurs less frequently now, and the delay for closing the tab is now probably no longer than 10 seconds rather then the 30 that I originally described.  I can try to get another profile of it again or capture my screen so you can see what is happening if you would find that helpful.
Flags: needinfo?(smokey101stair)
Thanks for the follow up. Another question, do you have any addons installed? Can you try reproducing with any addons you might have disabled?
Flags: needinfo?(smokey101stair)
I can reproduce this in a clean profile.
Flags: needinfo?(smokey101stair)
Attached video long tab close time.mp4
I have attached a short video which just shows that closing a tab can sometimes take a considerable amount of time.  In this video it takes a little over 11 seconds to close the tab containing www.hulu.com.  During those 11 seconds, the chrome process is completely unresponsive.  This was in a clean profile.
This is a video showing how long it sometimes takes firefox to recover after closing a tab.  In this instance, it takes Firefox 7 seconds before it can paint the www.google.com homepage after closing a tab containing www.hulu.com.

Ignore the fact that the doorhangers aren't showing up when I click on the UI in this video, I forgot to have my recording software capture layered windows.

The events shown in this video and the one from comment 8 often occur together which makes the total wait time before a user can successfully interact with the browser even longer, but I captured them separately so that it might be easier to analyze.  Like in the video from comment 8, this was done on a clean profile.

Hope you find these videos useful.
Any chance you can generate a fresh profile of this Trevor?
Flags: needinfo?(smokey101stair)
Also, please post your about:support text.
Status: UNCONFIRMED → NEW
Depends on: 1125471
Ever confirmed: true
Here is a fresh profile[1], the tab took about 7 seconds to close when this profile was captured.

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=eebdc055340c9d6a4ad864a23ab30812b0cacd17
Flags: needinfo?(smokey101stair)
about:support info as requested:

Application Basics
------------------

Name: Firefox
Version: 38.0a1
Build ID: 20150212030231
Update Channel: nightly
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:38.0) Gecko/20100101 Firefox/38.0
Multiprocess Windows: 1/1 (default: true)

Crash Reports for the Last 3 Days
---------------------------------

Report ID: bp-447f1df3-8924-4a96-a07c-15db12150211
Submitted: 1 day ago

Report ID: bp-7211d6d8-86df-4e9a-ab05-dd0472150210
Submitted: 2 days ago

All Crash Reports

Extensions
----------

Name: Gecko Profiler
Version: 1.15.7
Enabled: true
ID: jid0-edalmuivkozlouyij0lpdx548bc@jetpack

Graphics
--------

Adapter Description: NVIDIA GeForce GTX 660
Adapter Drivers: nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Adapter RAM: 2048
Device ID: 0x11c0
DirectWrite Enabled: false (6.2.9200.16492)
Driver Date: 2-5-2015
Driver Version: 9.18.13.4752
GPU #2 Active: false
GPU Accelerated Windows: 1/1 Direct3D 11 (OMTC)
Subsys ID: 30693842
Vendor ID: 0x10de
WebGL Renderer: Google Inc. -- ANGLE (NVIDIA GeForce GTX 660 Direct3D11 vs_5_0 ps_5_0)
windowLayerManagerRemote: true
AzureCanvasBackend: skia
AzureContentBackend: cairo
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0

Important Modified Preferences
------------------------------

browser.cache.disk.capacity: 358400
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size.use_old_max: false
browser.cache.frecency_experiment: 4
browser.download.importedFromSqlite: true
browser.places.smartBookmarksVersion: 7
browser.sessionstore.upgradeBackup.latestBuildID: 20150212030231
browser.startup.homepage_override.buildID: 20150212030231
browser.startup.homepage_override.mstone: 38.0a1
dom.mozApps.used: true
extensions.lastAppVersion: 38.0a1
gfx.direct2d.disabled: true
gfx.direct3d.last_used_feature_level_idx: 0
media.gmp-gmpopenh264.lastUpdate: 1423603763
media.gmp-gmpopenh264.version: 1.3
media.gmp-manager.lastCheck: 1423698683
network.cookie.prefsMigrated: true
places.database.lastMaintenance: 1423698685
places.history.expiration.transient_current_max_pages: 104858
plugin.disable_full_page_plugin_for_types: application/pdf
plugin.importedState: true
privacy.sanitize.migrateFx3Prefs: true

Important Locked Preferences
----------------------------

JavaScript
----------

Incremental GC: true

Accessibility
-------------

Activated: false
Prevent Accessibility: 0

Library Versions
----------------

NSPR
Expected minimum version: 4.10.8
Version in use: 4.10.8

NSS
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSMIME
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSSSL
Expected minimum version: 3.17.4 Basic ECC
Version in use: 3.17.4 Basic ECC

NSSUTIL
Expected minimum version: 3.17.4
Version in use: 3.17.4

Experimental Features
---------------------
(In reply to Trevor Rowbotham from comment #12)
> Here is a fresh profile[1], the tab took about 7 seconds to close when this
> profile was captured.
> 
> [1]
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=eebdc055340c9d6a4ad864a23ab30812b0cacd17

This is odd, there's no content process information here. Browser chrome is stuck in a cpow triggered in sessionstore code on the mouse down / tab close event.

Any chance you could generate another of these?
Depends on: 1109875
No longer depends on: 1125471
Here is another profile, hopefully it contains useful information this time.  If not, I'll try switching to the 32-bit nightly to profile this to see if it makes a difference, though I'm pretty sure I read last week that profiling in the 64-bit nightly was working now.

http://people.mozilla.org/~bgirard/cleopatra/#report=9421ff6baf93ea64b70fc88f78b60e9820db22c2
After rereading what you said in comment 14, I'm pretty sure I completely misunderstood what you were saying.  If the browser is getting stuck in a cpow in the tab close event, then I think that Bug 1065463 is probably pretty similar to this as I would assume the same code is run whether you close the tab by clicking the close tab button or clicking the tab with the middle mouse button.
Blocks: e10s-perf
(In reply to Trevor Rowbotham from comment #15)
> Here is another profile, hopefully it contains useful information this time.
> If not, I'll try switching to the 32-bit nightly to profile this to see if
> it makes a difference, though I'm pretty sure I read last week that
> profiling in the 64-bit nightly was working now.
> 
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=9421ff6baf93ea64b70fc88f78b60e9820db22c2

Unfortunately that profile is missing content data as well. :/ The main process is there, but the child is missing. It would be great if you could try again with 32-bit build, just to see if it helps.
Flags: needinfo?(smokey101stair)
Here are two profiles from the 32-bit nightly, which should have content data.  I'll try to capture another profile since the closing delay on the tab in these reports were only about 3 seconds.

http://people.mozilla.org/~bgirard/cleopatra/#report=5b03b7090533cbf18112b569e15df8c1a1df9428
http://people.mozilla.org/~bgirard/cleopatra/#report=141b0f7d560f5c533625bafb395894b4ead73a65
Flags: needinfo?(smokey101stair)
Attached file profile-2015-02-12-32-bit.zip (obsolete) —
I was having some difficulty reproducing this in today's nightly, so I tried the 32-bit version of the 2015-02-12 build.  Unfortunately, it doesn't seem to contain content data like the profiles from comment 12 and comment 15, which are from the 64-bit 2015-02-12 build. But, I wanted to post it anyways since it took about 32 seconds for the tab to close and you might be able to get some info out of it.
The theory is that the content data is in these profiles but there's a bug someplace in getting them to display properly in the cleopatra interface.
Is there anything else I can do on my end to help?
(In reply to Jim Mathies [:jimm] from comment #20)
> The theory is that the content data is in these profiles but there's a bug
> someplace in getting them to display properly in the cleopatra interface.

I just opened a few of these profiles in my text editor since they are just large json files and did a search.  The profiles that actually show a blank Content category in cleopatra have the following line:

{"name":"Content","samples":[]

This seems to indicate that the profile contains no content data since the samples array is empty.  The profiles in which cleopatra doesn't even show the blank Content category don't contain that line, which would explain why the Content category doesn't show up in cleopatra in some of the profiles.

It is odd that some of the profiles have an object for the content data with an empty samples array, while some of the other profiles don't contain an object for the content data at all.

Based on that information, it seems like this is a bug in the profiler itself rather than in cleopatra.  I could be wrong here, but what do you think?
Flags: needinfo?(jmathies)
Certainly possible, and from the data you pulled out, sounds right.
Flags: needinfo?(jmathies)
I finally discovered the problem that was causing profiles to be missing the content data.  Apparently, the Gecko Profiler fails to collect content data when the Windows User Account Control (UAC) setting is disabled, which it is on my machine.  I assume this is some sort of permissions problem between Windows and the Gecko Profiler.  I'll file a bug for this quirk in the Gecko Profiler.

I've set UAC back to its default setting and I will try to grab some profiles later today or over the weekend.
(In reply to Trevor Rowbotham from comment #24)
> I finally discovered the problem that was causing profiles to be missing the
> content data.  Apparently, the Gecko Profiler fails to collect content data
> when the Windows User Account Control (UAC) setting is disabled, which it is
> on my machine.  I assume this is some sort of permissions problem between
> Windows and the Gecko Profiler.  I'll file a bug for this quirk in the Gecko
> Profiler.
> 
> I've set UAC back to its default setting and I will try to grab some
> profiles later today or over the weekend.

Awesome! Thanks for digging in to this. Please post the bug # once you file it.
Attached file Profiles.zip
For anyone else reading this, in response to comment #25, the bug number is Bug 1135173.

Attached are 5 different profiles.  The tabs all closed with in about 5 seconds in these profiles.  I tried for an hour to get a longer one, but couldn't get it to happen.
(In reply to Trevor Rowbotham from comment #26)
> Created attachment 8574219 [details]
> Profiles.zip
> 
> For anyone else reading this, in response to comment #25, the bug number is
> Bug 1135173.
> 
> Attached are 5 different profiles.  The tabs all closed with in about 5
> seconds in these profiles.  I tried for an hour to get a longer one, but
> couldn't get it to happen.

We are so close here, I see content stacks now! Unfortunately the profiles are missing symbol information, are you still testing with 32-bit Nightly builds?
Attachment #8567153 - Attachment is obsolete: true
If I remember correctly, 2 of the profiles were 32-bit builds[1], and 3 were 64-bit builds[2].  I was using these builds to diagnose the missing content stack stuff and they were the builds that I had used for profiles in comment #0 and comment #15. Do I need to profile on a debug build or not use a zipped version?

[1] https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/11/2014-11-27-03-02-08-mozilla-central/firefox-36.0a1.en-US.win32.zip
[2] https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015-02-12-03-02-31-mozilla-central/firefox-38.0a1.en-US.win64.zip
Using a zipped build might be an issue, lets try a normal nightly install.
Attached file more_profiles.zip
Here are 2 more profiles I grabbed from today's nightly, which is a normal nightly install.
This is interesting, we're stuck in IPDL::PHttpChannel::RecvOnStopRequest on the child side doing a bunch of js on the hulu site. The parent is stuck in this session store flush call blocked on the child.

blake any ideas here?
Flags: needinfo?(mrbkap)
(In reply to Jim Mathies [:jimm] from comment #31)
> blake any ideas here?

My only thought is that we have to get rid of the CPOW in the session store code. Bug 1109875 (which already blocks this bug) is m8+, but is blocked on bug 1126089 (which is m5+). As annoying as this bug is, I wonder if it wouldn't make sense to push this to m6 and bring bug 1109875 to m6 instead of trying to do something crazy as a stop-gap.
Flags: needinfo?(mrbkap)
Hey Jim,

I've got another profile here for you.  I was doing some PHP work and ended up creating an infinite loop.  I tried to close the browser tab rather than waiting for the PHP script to timeout, but the browser locked up on me for about 15 seconds like all the other times in this bug, but looking at this new profile it seems to be different from the other ones.  I think I remember seeing a bug about not being able to close a tab while it was loading, but I can't find it right now.

http://people.mozilla.org/~bgirard/cleopatra/#report=0b2ed749d91277105c7af637f6344e16973be7c6
Flags: needinfo?(jmathies)
Thanks. Looks like the content process was locked up, unfortunately the profiler fails to collect data when this happens. Our tools are not helping much here. We do know what the parent thread is doing though, which helps.
Flags: needinfo?(jmathies)
Bug 1155974 may offer a clue to this.
Hey Trevor, still seeing this in latest nightly?
Flags: needinfo?(smokey101stair)
Hi Jim,

Well, I'm not really sure how to answer this. I test closing Hulu for 20 minutes and could not get the hang to reproduce, which is good.  Out of curiosity I also retested the inifinite PHP loop from comment #33 and the browser no longer locked up when closing the tab, which is great!

Now for the bad news... In the case of the inifinite PHP loop, although the tab closes now and the browser does not lock up, it now just shows the spinner of doom for the duration of the PHP script timeout (30 seconds in this case).  I did see the spinner a handful of times when closing the Hulu tabs, although its duration was no more than a few seconds.  Did the issue just get shifted and we show a spinner now rather than the browser locking up?  It's possible that the Hulu issue is now fixed and the spinner I see after closing those tabs are just Bug 1135719 and the PHP infinite loop thing is a completely different issue, but it's hard to say.

I tried to profile this, but the symbol servers for Windows are still down (Bug 1161720) so the profiler didn't produce any meaningful results.
Flags: needinfo?(smokey101stair) → needinfo?(jmathies)
(In reply to Trevor Rowbotham from comment #37)
> Hi Jim,
> 
> Well, I'm not really sure how to answer this. I test closing Hulu for 20
> minutes and could not get the hang to reproduce, which is good.  Out of
> curiosity I also retested the inifinite PHP loop from comment #33 and the
> browser no longer locked up when closing the tab, which is great!
> 
> Now for the bad news... In the case of the inifinite PHP loop, although the
> tab closes now and the browser does not lock up, it now just shows the
> spinner of doom for the duration of the PHP script timeout (30 seconds in
> this case).  I did see the spinner a handful of times when closing the Hulu
> tabs, although its duration was no more than a few seconds.  Did the issue
> just get shifted and we show a spinner now rather than the browser locking
> up?  It's possible that the Hulu issue is now fixed and the spinner I see
> after closing those tabs are just Bug 1135719 and the PHP infinite loop
> thing is a completely different issue, but it's hard to say.
> 
> I tried to profile this, but the symbol servers for Windows are still down
> (Bug 1161720) so the profiler didn't produce any meaningful results.

Can you tell me a bit about this php script? Maybe we can morph this into a bug on that which sounds kinda bad. The hang on close was clearly fixed though which is good to see.
Flags: needinfo?(jmathies)
Absolutely!  It is nothing special, it's just an infinite loop. I don't think you run into too many infinite loops out in the wild, but the infinite loop could represent a script that takes a long time to run.  I have uploaded the test script I made so you can try it[1].  The code in the script is nothing more than:

while (true) {
    echo 'Infinite loops! \o/ ';
}


[1] http://trowbotham.com/bug_tests/iloop.php

Steps:
1) Make sure the that the dom.ipc.processCount is set to the default setting of 1.
2) Open 2 tabs and in the second tab open [1].
3) Wait about 10 seconds for the page to load (You may or may not see stuff printed to the screen).
4) Close the second tab that has [1] open in it.

Actual Results:
After closing the tab, the spinner shows until the PHP script times out.  The browser chrome appears to be responsive, however, the browser may as well be hung since the spinner makes the browser useless.  If you try to exit the browser completely, it will hang until the PHP script has timed out.

Expected results:
No spinner. No browser hang when closing the browser.

I tested setting dom.ipc.processCount = 20 and I no longer get a spinner nor does the browser hang when trying to exit it completely.
Flags: needinfo?(jmathies)
Thanks for the test case. We'll take a look.
Flags: needinfo?(jmathies)
filed bug 1164189 for follow up.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.