Closed Bug 1471124 Opened 6 years ago Closed 23 days ago

frozen tab immediately after URL + enter

Categories

(Core :: DOM: Content Processes, defect, P3)

60 Branch
defect

Tracking

()

RESOLVED WORKSFORME
Performance Impact none

People

(Reporter: chris, Unassigned)

References

Details

(Keywords: hang, Whiteboard: dom-lws-bugdash-triage)

Attachments

(7 files)

Attached image firefox-tab-spinner.gif
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180607193927

Steps to reproduce:

 * Open a new tab
 * Paste an address in.
 * Press enter.


Actual results:

 * Spinner appears in tab.
 * No content ever loads in page (blank page).
 * Spinner never stops or times out.
 * Right-clicking window no menu appears.

This happens about 10% of the time.

Happens on 3 different networks (two separate WiFi networks at separate locations and via tethered phone network).

Loading the same URL via Curl instantly returns (see attached gif).

Disabled all add-ons, still happens.

Disabled "Use hardware acceleration when available", still happens.


Expected results:

 * Page should load.

Version 60.0.2 (64-bit)
Ubuntu 14.04.5 LTS

Happy to attach any other information you require.

Could be related to #1463851.
Is there any way to ask Firefox what it is using as the actual Proxy server and address when "Use system proxy settings" is selected?

I think the issue might go away when I select "No proxy" instead of "Use system proxy settings" however this makes no sense as I have no system proxy settings set. I would like to know what Firefox thinks are the system proxy settings so I can verify that they are unset.
(In reply to chris@mccormick.cx from comment #0)
> Version 60.0.2 (64-bit)
> Happy to attach any other information you require.

Please check if the issue is reproducible in a brand new profile in the latest Nightly.
https://support.mozilla.com/kb/profile-manager-create-and-remove-firefox-profiles
https://www.mozilla.org/firefox/nightly/all/

> Disabled "Use hardware acceleration when available", still happens.
> Could be related to bug 1463851.

Disabling hardware acceleration also disables WebRender. Apart from that, I don't think it can be enabled in the release version in the first place: in 60.0.2 about:support shows "unavailable by runtime: Build doesn't include WebRender".

(In reply to chris@mccormick.cx from comment #1)
> Is there any way to ask Firefox what it is using as the actual Proxy server
> and address when "Use system proxy settings" is selected?

Try entering about:config?filter=proxy into the address bar and see if any settings are retrieved and stored there.
If not, maybe try about:networking#logging and see if it offers any clues as to what's going on.
https://developer.mozilla.org/docs/Mozilla/Debugging/HTTP_logging
Component: Untriaged → Networking
Product: Firefox → Core
Thanks for the help!

I checked the about:networking tab and it is filled with entries for a bunch of sites that are definitely not open in my browser. some of them I haven't used for at least a week. Is that normal?

I managed to capture a log file when I tried to open a page: https://djangogigs.com/gigs/2111/

I will attach the log file here now.

One thing I notice is there are a lot of entries for sites I am not currently browsing. Is that normal?

Like this:

2018-06-27 03:25:40.752017 UTC - [28710:Socket Thread]: V/nsHttp nsHttpConnectionMgr::OnMsgPruneDeadConnections
2018-06-27 03:25:40.752047 UTC - [28710:Socket Thread]: V/nsHttp   pruning [ci=.S.....[tlsflags0x00000000]askubuntu.com:443]
2018-06-27 03:25:40.752055 UTC - [28710:Socket Thread]: V/nsHttp   pruning [ci=.S.....[tlsflags0x00000000]www.facebook.com:443]
...

I am not even a Facebook user so I don't know why that connection should exist.

And later in the log file a lot of these:

2018-06-27 03:25:41.754390 UTC - [28710:Socket Thread]: V/nsHttp nsHttpConnectionMgr::ProcessPendingQForEntry [ci=.S.....[tlsflags0x00000000]www.facebook.com:443 ent=0x7f4a28157eb0 active=0 idle=0 urgent-start-queue=0 queued=0]
2018-06-27 03:25:41.754394 UTC - [28710:Socket Thread]: V/nsHttp urgent queue [
2018-06-27 03:25:41.754398 UTC - [28710:Socket Thread]: V/nsHttp ] active urgent conns [
2018-06-27 03:25:41.754401 UTC - [28710:Socket Thread]: V/nsHttp ] active regular conns [
2018-06-27 03:25:41.754405 UTC - [28710:Socket Thread]: V/nsHttp ] idle urgent conns [
2018-06-27 03:25:41.754409 UTC - [28710:Socket Thread]: V/nsHttp ] idle regular conns [
2018-06-27 03:25:41.754413 UTC - [28710:Socket Thread]: V/nsHttp ]
Log was started before tab was opened. Tab opened. Infinite spinner for some seconds. Log stopped.
(In reply to chris@mccormick.cx from comment #3)
> I checked the about:networking tab and it is filled with entries for a bunch
> of sites that are definitely not open in my browser. some of them I haven't
> used for at least a week. Is that normal?

If the HTTP category lists the entries with Active:0 and Idle:0 then you're not connected to those sites. I'm also seeing entries that I haven't visited since startup. I don't know why about:networking works this way.

> I will attach the log file here now.

Thank you for making the effort. It'll probably be helpful once a developer has a look at this.

> I am not even a Facebook user so I don't know why that connection should
> exist.

It must've been loaded as third-party content, either as integration with another app or a "Like" button.
(In reply to chris@mccormick.cx from comment #4)
> Created attachment 8988051 [details]
> Network log during infinite spinner tab.
> 
> Log was started before tab was opened. Tab opened. Infinite spinner for some
> seconds. Log stopped.

Thanks for the log.  I don't see any channels created, hence this is not a networking issue, likely.  Could you please reproduce with full logging according [1] so we have child process logs as well?  Maybe it will tell us something.  Please state URLs that you managed to reproduce the issue with to look them up in the log.


Perfect would be to change the list of log modules to:

-MOZ_LOG=timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,LoadGroup:5,DocumentLeak:5,PresShell:5,DocLoader:5,nsDocShellLeak:5,nsChannelClassifier:5


Thanks!

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments
Flags: needinfo?(chris)
Just a quick update to say I tried this with a freshly generated profile and was still seeing the same behaviour.

I will try the logging above and post it here.
Flags: needinfo?(chris)
Flags: needinfo?(chris)
(In reply to chris@mccormick.cx from comment #7)
> Just a quick update to say I tried this with a freshly generated profile and
> was still seeing the same behaviour.
> 
> I will try the logging above and post it here.

Please do provide the logs
Whiteboard: [closeme 2018-10-01]
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2018-10-01]
Hi,

It took me several tries but I've managed to capture a log file with a tab failing to load.

I used this command line:

firefox -MOZ_LOG=timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,LoadGroup:5,DocumentLeak:5,PresShell:5,DocLoader:5,nsDocShellLeak:5,nsChannelClassifier:5 > firefox.log.txt 2>&1

The URL which failed to load was this:

https://duckduckgo.com/?q=squeeze+backports&t=canonical

I had several other tabs open at the time and the log is 201Mb (21Mb gzipped). Is it safe for me to upload the logfile here or will it expose me to hacks?

I noticed when I ran this in debug mode it showed a hanging name resolution to maps.yandex.ru. In the log when I search for "yandex" I see this:

2018-09-12 01:19:58.238107 UTC - [27922:Main Thread]: V/nsHttp host=vec01.maps.yandex.net port=-1

That "port=-1" looks strange to me but maybe it's not.
Status: RESOLVED → VERIFIED
Flags: needinfo?(chris)
Status: VERIFIED → RESOLVED
Closed: 6 years ago6 years ago
(In reply to chris@mccormick.cx from comment #9)
> I had several other tabs open at the time and the log is 201Mb (21Mb
> gzipped). Is it safe for me to upload the logfile here or will it expose me
> to hacks?

Thanks for capturing the log!

If the log is captured from your regular browsing profile then it's better to send it to me directly.  Maybe use https://send.firefox.com/, it generates a unique untracked link, you can also add a password.  Then just send the link+password to my bugzilla email.  Thank you.

Reopening the bug.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INCOMPLETE → ---
Chris, thanks for the log again.  However, I can't find anything wrong on the networking and request grouping level.  I don't see a request even made for that URL.  I also don't see any load group that would be hanging in general (regardless the failing URL.)

Could you please describe in more detail (the more the better) what exactly have you done and how exactly and when did the spinner start to spin?
Based on my findings from the log referred in comment 9 I don't think this is a networking bug.
Component: Networking → General
Product: Core → Firefox
Chris, it may also be interesting to look into the browser console (Ctrl-J) after you encountered the problem and maybe copy-paste the content here (may contain URLs)
@mayhemer yes you're quite right there don't seem to be any network requests going out to that URL as far as I can tell. When I run this command on the log:

zgrep "squeeze+mirror" firefox.log.txt.gz

The only hits I get are for that URL are the `page-icon` requests - see squeeze-mirror.txt above which is the result of grepping that log.

So this seems to indicate that Firefox's networking layer never sees a request to load that page. Something is going wrong higher up maybe in the UI?

The thing is that I definitely typed that into the URL bar and hit enter. When it did not start loading I hit enter several more times to no avail. The only thing that works is opening a new tab and pasting the same URL and hitting enter again. Sometimes I have to do this several times before one of the tabs "takes" and starts loading the page. Note that I have discovered that there isn't always a loading spinner. Sometimes the tab is just "frozen" in a non-loading, blank state.

You suggested "Ctrl-J" but unfortunately I am not able to open a console once a tab has entered the state described.

Very puzzling!
Have just had one of these frozen-URL situations (I had right-clicked "open in a new tab" from duckduckgo search) and I can confirm the following:

 * Putting my cursor in the URL bar and hitting "enter" does nothing.
 * Ctrl-J does nothing.
 * Back, forward, and reload button icons are greyed out.
 * Hitting the search magnifying glass on the left side of the URL bar does nothing.

The only thing that works is selecting the URL, hitting Ctrl-C to copy, pressing Ctrl-T to open a new tab, pasting the URL, and then hitting enter. After that the page loads correctly and the only thing I can do with the stuck tab is close it.
Thanks for all the details.

I have to apologize - the hot key to open the browser console is Ctrl-Shift-J.  Still, if Ctrl-J doesn't do anything either, then there is something wrong as that should open the Library window (downloads, bookmarks etc)
Summary: Infinite spinner in loading tab → frozen tab immediately after URL + enter
Attached image screenshot.png
Good news - I've managed to capture a log in an empty profile with the first tab I opened exhibiting the "freeze" behaviour. In the screenshot you can see the second tab says "New Tab" this actually has the url "https://duckduckgo.com/?q=hello&t=canonical" entered into the URL bar and enter has been hit (but after this nothing happens and the tab does not respond to right-click). I then tried the exact same URL in the next tab and as you can see this one has worked (the title is set in the screenshot).
I've also attached a full log of that session in "clean profile with frozen tab" above.
(In reply to Honza Bambas (:mayhemer) from comment #12)
> Based on my findings from the log referred in comment 9 I don't think this
> is a networking bug.

Moving to Firefox::Untriaged for further investigation. Please only use Firefox::General for bugs where it's known there is no more appropriate component.
Component: General → Untriaged
Setting Tabbed Browser as component in order to involve the development team.
Component: Untriaged → Tabbed Browser
(In reply to Honza Bambas (:mayhemer) from comment #17)
> Thanks for all the details.
> 
> I have to apologize - the hot key to open the browser console is
> Ctrl-Shift-J.  Still, if Ctrl-J doesn't do anything either, then there is
> something wrong as that should open the Library window (downloads, bookmarks
> etc)

Ctrl-J is supposed to focus the location bar on Linux.

> You suggested "Ctrl-J" but unfortunately I am not able to open a console
> once a tab has entered the state described.

Please open the console before you get into that state.

Back to Untriaged until we know more.
Component: Tabbed Browser → Untriaged
As stated above the console will not open with Ctrl-Shift-J and I also cannot right click when the tab is in this state.
You're saying you're unable to open the console *before* you get into the failure state?
No. I will see if I can replicate with the console open first.
Hello. I've managed to catch the frozen tab state after opening a JS console. The tab's loading spinner is loading and the JS console appears completely empty and frozen (does not respond to any input).
Keywords: hang
Hi, I've discovered that the tab is actually frozen before I even enter anything in the URL bar.

To replicate:

 * Ctrl-T to open a new tab.
 * Right-click in the empty document.

Some percentage of the time I can tell the tab is frozen because the context menu will not come up when right clicking. I've tried leaving these frozen tabs open for a while and they never recover. Another weird detail is that the frozen tabs have a slightly different background color to the non-frozen tabs: #F9F9FA instead of the usual #FFFFFF (I verified this with an independent color picker after I noticed the slightly off shade).

I'm very confused about what is going on here.
Attached image every fourth tab frozen
I did some more testing on a completely empty profile and I don't get the above behaviour with frozen tabs from the start but after I type in a random search *exactly* every fourth tab freezes with an infinite spinner upon hitting "enter" to begin the search.
(In reply to chris@mccormick.cx from comment #31)
> Created attachment 9019571 [details]
> every fourth tab frozen

This suggests to me that one of the content processes has hung, but the other three are okay.

What might be useful is to intentionally crash that hung content process, generate a crash report, and then post a link to it here.

Here's what you can do to crash the hung content process:

1) Switch to a non-frozen tab
2) Open the Developer Toolbox (Ctrl-Shift-K)
3) Press F1 to open the Developer Toolbox settings
4) Scroll down the settings, and ensure "Enable browser chrome and add-on debugging toolboxes" and "Enable remote debugging" are checked
5) Close the Developer Toolbox
6) Open the Browser Console (Ctrl-Shift-J). I'm hoping this works with a non-frozen tab selected. The console should come up, and there should be a text input at the bottom of it.
7) Switch back to your Firefox window, and switch to a frozen tab.
8) Switch back to the Browser Console, and type:

gBrowser.selectedBrowser.frameLoader.tabParent.osPid

and then press enter. This should hopefully spit out a number, which is the pid of the hung content process.

9) Open a Terminal, and then type in:

kill -SIGABRT 123456

Where 123456 should be the pid that you got from step 8.

Hopefully this causes the tab to crash.

10) Ensure the checkbox for "Send an automated crash report so we can fix issues like this" is checked, and then choose either to Close Tab or Restore This Tab to send it off.

11) Open a new tab, and go to about:crashes
12) Find the most recent submitted crash report in the list - it should have an associated date close to the current time, since that's the time of the crash. Click on that link.
13) Copy the URL in the URL bar for where it sent you, and paste it here.


I know that's a lot of steps, but I _think_ that'll give us a clearer sense of where we're hanging in the content process. Let me know if you have any questions!
Flags: needinfo?(chris)
Awesome, thanks for these instructions. So here's what I did: once I hit a frozen tab I did your trick with getting the osPid and then killing it from the command line.

Result:

 * Automated crash report did not fire.
 * Instead the tab URL changed to the same URL as the tab immediately to the right.
 * The tab remained crashed.
 * Verified the process is actually dead (tried to kill again and got an error about no such process).
 * Saw a bunch of errors in the console:


gBrowser.selectedBrowser.frameLoader.tabParent.osPid
16480


!!! Here is where I issued kill -SIGABRT 16480 in my terminal !!!

[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] NS_ERROR_NOT_INITIALIZED: remote-browser.xml:301
[Show/hide message details.] TelemetryStopwatch: key "FX_TAB_SWITCH_UPDATE_MS" was already initialized TelemetryStopwatch.jsm:352
[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] TelemetryStopwatch: requesting elapsed time for nonexisting stopwatch. Histogram: "FX_TAB_SWITCH_UPDATE_MS", key: "null" TelemetryStopwatch.jsm:373
[Show/hide message details.] NS_ERROR_NOT_INITIALIZED: remote-browser.xml:301
[Show/hide message details.] TelemetryStopwatch: key "FX_TAB_SWITCH_UPDATE_MS" was already initialized TelemetryStopwatch.jsm:352
[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] TelemetryStopwatch: requesting elapsed time for nonexisting stopwatch. Histogram: "FX_TAB_SWITCH_UPDATE_MS", key: "null" TelemetryStopwatch.jsm:373

!!! Here I tried to get the osPid of the tab again since it still exists !!!

gBrowser.selectedBrowser.frameLoader.tabParent.osPid
TypeError: gBrowser.selectedBrowser.frameLoader.tabParent is null[Learn More] debugger eval code:2:1
[Show/hide message details.] NS_ERROR_NOT_INITIALIZED: remote-browser.xml:301
NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIInterfaceRequestor.getInterface]
network-monitor.js:303
[Show/hide message details.] TelemetryStopwatch: key "FX_TAB_SWITCH_UPDATE_MS" was already initialized TelemetryStopwatch.jsm:352
[Show/hide message details.] TelemetryStopwatch: requesting elapsed time for nonexisting stopwatch. Histogram: "FX_TAB_SWITCH_UPDATE_MS", key: "null" TelemetryStopwatch.jsm:373
[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] NS_ERROR_NOT_INITIALIZED: remote-browser.xml:301
[Show/hide message details.] TelemetryStopwatch: key "FX_TAB_SWITCH_UPDATE_MS" was already initialized TelemetryStopwatch.jsm:352
[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] TelemetryStopwatch: requesting elapsed time for nonexisting stopwatch. Histogram: "FX_TAB_SWITCH_UPDATE_MS", key: "null" TelemetryStopwatch.jsm:373
[Show/hide message details.] NS_ERROR_NOT_INITIALIZED: remote-browser.xml:301
[Show/hide message details.] TelemetryStopwatch: key "FX_TAB_SWITCH_UPDATE_MS" was already initialized TelemetryStopwatch.jsm:352
[Show/hide message details.] this.frameLoader.tabParent is null remote-browser.xml:209
[Show/hide message details.] TelemetryStopwatch: requesting elapsed time for nonexisting stopwatch. Histogram: "FX_TAB_SWITCH_UPDATE_MS", key: "null" TelemetryStopwatch.jsm:373
Flags: needinfo?(chris)
(In reply to chris@mccormick.cx from comment #34)
> !!! Here is where I issued kill -SIGABRT 16480 in my terminal !!!
> 
> [Show/hide message details.] this.frameLoader.tabParent is null
> remote-browser.xml:209

Tangential question: Should we null-check this.frameLoader.tabParent when accessing this.frameLoader.tabParent.* in browser.xml to handle this kind of failure more gracefully? E.g. it seems reasonable for docShellIsActive and hasContentOpener to return false in that case.
Hi, I tried reproducing this issue in the latest version of Nightly 65.0a1 (2018-10-25) but without any success, Can someone help us set a component for this issue it's coming up in our Triage list.
(In reply to Rares Doghi from comment #36)
> Hi, I tried reproducing this issue in the latest version of Nightly 65.0a1
> (2018-10-25) but without any success, Can someone help us set a component
> for this issue it's coming up in our Triage list.

We're still in the process of figuring this out.

Core::DOM: Content Processes might be a good start, but depending on what's really going on here the bug might end up somewhere else.
Component: Untriaged → DOM: Content Processes
Product: Firefox → Core
Hi chris,

Thanks for running those steps. So there's nothing listed in about:crashes around the time that you killed the process?
Flags: needinfo?(chris)
(In reply to Dão Gottwald [::dao] from comment #35)
> Tangential question: Should we null-check this.frameLoader.tabParent when
> accessing this.frameLoader.tabParent.* in browser.xml to handle this kind of
> failure more gracefully? E.g. it seems reasonable for docShellIsActive and
> hasContentOpener to return false in that case.

Ideally, yes. Historically, our handling of crashed browser tabs in the front-end has been scattershot at best.
A new FF binary came in via apt update and now when I kill the hung tab on the terminal I get the "Send automated crash report" interface. \o/

https://crash-stats.mozilla.com/report/index/59a74d9c-0f5a-4e29-b4ac-50e1a0181026
Flags: needinfo?(chris)
Argh, so it seems as if we don't have symbols for Ubuntu builds. :(

We do, however, have symbols for our own builds. Are you able to reproduce the issue using a Mozilla-built version of Firefox?

Here's a Linux 64-bit build: https://download.mozilla.org/?product=firefox-latest-ssl&os=linux64&lang=en-US

If so, can you please (sorry) repeat the crashing experiment to get another crash report? Hopefully, that one will have symbols.
Flags: needinfo?(chris)

I finally got around to trying to reproduce this with the Linux 64-bit build you linked.

Unfortunately when I now get the stuck tab it does not trigger a crash report. However I managed a couple of times to reproduce it with only one or two other tabs open - once when the only other tab was this bug report URL itself. I killed the whole firefox instance triggering a crash report for Firefox (not just the tab) and have submitted those. I'm not sure where they go but I included a link back to this bug.

Something weird I have noticed is that there occasionally appear in my process list two firefox processes that don't seem to be associated with any tab. For example when I do ps auxwww | grep firefox I can see the tabs have processes that look like this:

chrism 336 0.1 0.9 1433452 78892 pts/13 Sl+ 14:03 0:00 /home/chrism/bin/contrib/firefox/firefox-bin -contentproc -childID 4 -isForBrowser -prefsLen 5819 -prefMapSize 182236 -schedulerPrefs 0001,2 -parentBuildID 20190211233335 -greomni /home/chrism/bin/contrib/firefox/omni.ja -appomni /home/chrism/bin/contrib/firefox/browser/omni.ja -appdir /home/chrism/bin/contrib/firefox/browser 32581 true tab

The two processes that I a not sure about sometimes appear after a while of Firefox running and they look like this in the same ps list:

chrism 31786 0.0 1.7 1983992 143604 pts/0 S 13:58 0:00 ./bin/contrib/firefox/firefox
chrism 31787 0.0 1.7 1983992 143604 pts/0 S 13:58 0:00 ./bin/contrib/firefox/firefox

So no arguments basically. If I run strace on these they always look like this:

$ strace -p 31786
Process 31786 attached
futex(0x7f0007e00018, FUTEX_WAIT_PRIVATE, 2, NULL

$ strace -p 31787
Process 31787 attached
read(147,

To me these look like hung processes. Is this normal?

Flags: needinfo?(chris) → needinfo?(mconley)

Note also:

  • After I exit firefox those two processes appear to hang around afterwards. I have to kill them individually on the command line.

  • I was no longer able to use the gBrowser.selectedBrowser.frameLoader.tabParent.osPid as the variable gBrowser does not seem to be defined any longer. So I may have been killing the wrong thing which is why I couldn't get a crash dump for the tab. Is there some other way I can get the osPid of the tab now?

I tried something new. When I got one of the hung tabs (newly opened) I checked ps auxwww | grep firefox and saw the two stuck processes as outlined above. So I killed those two processes and the stuck tab turned into a "Restart Required" tab:

Sorry. We just need to do one small thing to keep going.

We have just installed an update in the background. Click Restart Firefox to finish applying it.

We will restore all your pages, windows and tabs afterwards, so you can be on your way quickly.

Ok, after further testing I can confirm that these two "stuck" processes (with the same strace results as above) occur every time I get one of these frozen tabs.

One thing I notice is these stuck processes seem to have the same small set of file descriptors open:

$ ls -alh /proc/16099/fd/
total 0
dr-x------ 2 chrism chrism 0 Feb 21 16:18 .
dr-xr-xr-x 9 chrism chrism 0 Feb 21 16:18 ..
lrwx------ 1 chrism chrism 64 Feb 21 16:18 0 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 1 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 2 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 3 -> socket:[3067794]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 4 -> socket:[3058522]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 5 -> socket:[3058525]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 6 -> socket:[3072084]
l-wx------ 1 chrism chrism 64 Feb 21 16:18 7 -> pipe:[3067795]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 8 -> /run/shm/org.mozilla.ipc.15344.247 (deleted)
lr-x------ 1 chrism chrism 64 Feb 21 16:18 9 -> /run/shm/org.chromium.SQ6y4J (deleted)

Why would there be a file descriptor open to Chromium's shared memory interface?

Navigator.userAgent: Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0

Ok, so when I kill those two hung processes I see the following in the terminal window where I ran firefox:

[Parent 16561, Gecko_IOThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 718
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 16561, Gecko_IOThread] WARNING: pipe error (143): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349
[Parent 16561, Gecko_IOThread] WARNING: pipe error (174): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349

###!!! [Parent][MessageChannel] Error: (msgtype=0x1E0086,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

[Parent 16561, Gecko_IOThread] WARNING: pipe error (176): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349
[Parent 16561, Gecko_IOThread] WARNING: pipe error (178): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349
[Parent 16561, Gecko_IOThread] WARNING: pipe error (188): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349

What is going on with this Chromium IPC?

After reading around a bit I found that I can completely get rid of the frozen tabs issue by changing the config setting browser.tabs.remote.autostart.2 to false. What is the default in stable FF?

No I'm wrong, sorry. Taht didn't get rid of the frozen tabs.

(In reply to chris@mccormick.cx from comment #45)

One thing I notice is these stuck processes seem to have the same small set of file descriptors open:

$ ls -alh /proc/16099/fd/
total 0
dr-x------ 2 chrism chrism 0 Feb 21 16:18 .
dr-xr-xr-x 9 chrism chrism 0 Feb 21 16:18 ..
lrwx------ 1 chrism chrism 64 Feb 21 16:18 0 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 1 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 2 -> /dev/pts/0
lrwx------ 1 chrism chrism 64 Feb 21 16:18 3 -> socket:[3067794]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 4 -> socket:[3058522]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 5 -> socket:[3058525]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 6 -> socket:[3072084]
l-wx------ 1 chrism chrism 64 Feb 21 16:18 7 -> pipe:[3067795]
lrwx------ 1 chrism chrism 64 Feb 21 16:18 8 -> /run/shm/org.mozilla.ipc.15344.247 (deleted)
lr-x------ 1 chrism chrism 64 Feb 21 16:18 9 -> /run/shm/org.chromium.SQ6y4J (deleted)

Why would there be a file descriptor open to Chromium's shared memory interface?

Legacy code, still used for some things, and to be removed in bug 1479960; Firefox's IPC was originally a fork of Chromium's. (As for the pipe error messages, see bug 1428509.)

This looks like the process got stuck very early, before handling the initial pref data or initializing IPC (if nothing else there'd be an epoll descriptor for that).

(In reply to chris@mccormick.cx from comment #42)

chrism 31786 0.0 1.7 1983992 143604 pts/0 S 13:58 0:00 ./bin/contrib/firefox/firefox
chrism 31787 0.0 1.7 1983992 143604 pts/0 S 13:58 0:00 ./bin/contrib/firefox/firefox

So no arguments basically. If I run strace on these they always look like this:

$ strace -p 31786
Process 31786 attached
futex(0x7f0007e00018, FUTEX_WAIT_PRIVATE, 2, NULL

$ strace -p 31787
Process 31787 attached
read(147,

Process 31787 might be a sandbox chroot helper, which would explain why it's blocked in read and why it has the parent's argv. If so, it should have a smaller set of fds than the list in comment #45: just stdin/out/err and the socket it's reading.

Continuing with that assumption, then process 31786 would be the actual child process and, since it also has the parent's argv, it got stuck before exec. In futex, which would be consistent with trying to take a lock that was held by another thread in the parent process during the clone, for example by calling malloc. In theory we got rid of anything that would do that (see, e.g., bug 1480401), but maybe not.

The fd list in comment #45 suggests that it finished setting up its fds and is preparing to call execve. I'm suspicious of this call to std::string::c_str; it looks like GNU libstdc++ built with debug assertions could try to take a lock to dynamically check iterator invalidation, and it might be a global lock.

It would be useful to get a stack trace from the locked-up child process, but our crash reporter won't work for that; it would be necessary to get full debug symbols and attach a debugger.

Can you try to reproduce the bug with this build? (Link is directly to a .tar.bz2 from this Try run.) I'll probably want to land that patch anyway (with a better commit message), because it should be harmless and it's probably a bug, but it would be nice to get evidence on whether it's this bug.

Flags: needinfo?(mconley) → needinfo?(chris)

Thanks for the test build. I thought this build had fixed the issue but I ran into it again after about an hour of browsing.

It appears to be at a later point in the process of opening a tab and then visiting a URL, and again is intermittent. Most tabs open fine and work fine.

What I am now seeing is that the tab opens fine and I can right-click on the blank page and the menu pops up (this is different than before) however once I type a URL:

  • I see the loading spinner but the page never loads.
  • I can't right-click on the empty page any more and it appears frozen like before.
  • The 2 weird firefox processes are back with the FUTEX_WAIT problem I described above.
  • When I kill them the tab says "Restart Required" and I see the same IPC errors on the console the moment it dies as above.
  • The following is the list of open file descriptors for the hung processes:

chrism@diziet:~$ ls -alh /proc/8074/fd/*
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/0 -> /dev/pts/13
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/1 -> /dev/pts/13
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/2 -> /dev/pts/13
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/3 -> socket:[3477049]
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/4 -> socket:[3401113]
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/5 -> socket:[3401116]
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/6 -> socket:[3477922]
l-wx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/7 -> pipe:[3477050]
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/8 -> /run/shm/org.mozilla.ipc.4522.1808 (deleted)
lr-x------ 1 chrism chrism 64 Feb 22 14:11 /proc/8074/fd/9 -> /run/shm/org.chromium.ZNOYhq (deleted)

chrism@diziet:~$ ls -alh /proc/8075/fd/*
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8075/fd/0 -> /dev/pts/13
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8075/fd/1 -> /dev/pts/13
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8075/fd/102 -> socket:[3477923]
lrwx------ 1 chrism chrism 64 Feb 22 14:11 /proc/8075/fd/2 -> /dev/pts/13

Happy to do more testing! If I have to break out gdb I can do that too.

Flags: needinfo?(chris)

Just incase you need the line numbers here is the terminal output when I kill the frozen tabs now:

[Parent 4522, Gecko_IOThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 726
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: pipe error (100): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (102): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (104): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (106): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (216): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 726
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 23
[Parent 4522, Gecko_IOThread] WARNING: pipe error (219): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (224): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (231): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (236): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357
[Parent 4522, Gecko_IOThread] WARNING: pipe error (267): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357

Priority: -- → P3
Whiteboard: qf?

Hello, I have an update on this bug. I upgraded to 68.0.1 (64-bit) yesterday and I am now getting intermittent frozen tabs even with the browser.tabs.remote.autostart.2 setting set to false. The definition of "frozen tab" here is a tab which after opening will not load the URL, and right-clicking the main area of the browser window does not show a context menu.

Since opening new tabs is such a frequent pattern I am finding this quite frustrating. Happy to persevere and help find the source of this bug if it will mean I can keep using Firefox, which I love. Please let me know how I can help track down this bug.

Whiteboard: qf? → [qf?]
Whiteboard: [qf?] → [qf]

Performance triage team reviewed this bug and determined that it's not a performance bug and most likely a functional issue. Move this out of core:performance.

Component: DOM: Content Processes → General
Whiteboard: [qf] → [qf-]

(In reply to Jean Gong :jgong from comment #56)

Move this out of core:performance.

This wasn't in Core :: Performance. Moving back to Core :: DOM: Content Processes.

Component: General → DOM: Content Processes
Performance Impact: --- → -
Whiteboard: [qf-]
Severity: normal → S3

Is this something you're still seeing? (Our team has taken over this component and we are triaging older bugs and we don't want to just close the bug due to its age.)

Flags: needinfo?(chris)
Whiteboard: dom-lws-bugdash-triage

No, I don't see this problem any more in my current version of Ubuntu with Firefox. I haven't seen it for years but I don't remember when it went away or why. Probably after a Firefox update. You can close this, thanks for following up!

Flags: needinfo?(chris)

Thanks!

Status: REOPENED → RESOLVED
Closed: 6 years ago23 days ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: