Firefox becomes sluggish for a while after closing tab with Google cached Twitter page

RESOLVED FIXED in mozilla21

Status

()

Core
Networking: HTTP
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: Torbjörn Andersson, Assigned: mcmanus)

Tracking

(Blocks: 1 bug, {hang, regression})

18 Branch
mozilla21
x86
All
hang, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox18-, firefox19-, firefox20-)

Details

(Whiteboard: [spdy], URL)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

6 years ago
User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.11) Gecko/20100101 Firefox/10.0.11 Iceweasel/10.0.11
Build ID: 20121121003804

Steps to reproduce:

(I've also submitted this using the Firefox feedback mechanism a few days ago, but since space there is limited, and I've done some further testing now, I figured a proper Bugzilla report would be better. I hope that's ok.)

Using Firefox nightly (is that "20 branch" or "trunk"?), I opened the Google cache version of a Twitter page, e.g.

http://webcache.googleusercontent.com/search?q=cache:NDvdVnqCJngJ:https://twitter.com/mozilla+mozilla+twitter&cd=1&hl=sv&ct=clnk&gl=se

in a new tab. I then closed that tab.


Actual results:

Firefox became really sluggish for a while. Switching between two tabs, which was pretty much instantaneous before I closed the tab, now takes a noticeable amount of time. Sometimes several seconds. Oddly enough, I couldn't see that Firefox was using any undue amounts of CPU or memory at the time.

Eventually things settled down and went back to normal. At least most of the time. There was at least once where I waited a couple of minutes and it was still sluggish.

I have observed this with Firefox Nightly on my Debian Linux box, but also with Firefox Beta (18) under Windows 7.

I've tried restarting Firefox with add-ons enabled, but that didn't make any difference. With Windows Firefox 18, I even tried resetting my profile, to no avail.


Expected results:

I can understand if Firefox momentarily gets a bit bogged down when a tab is closed, but not like this. I would have assumed Firefox would go back to its old snappy self in seconds at most, not minutes.
Confirmed against Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121207 Firefox/20.0 ID:20121207030741.

Failed to get a proper WinDbg Log/SPS Profile :-/.
Status: UNCONFIRMED → NEW
Component: Untriaged → General
Ever confirmed: true
OS: Linux → All
Product: Firefox → Core
Last good nightly: 2012-10-06
First bad nightly: 2012-10-07

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2da1f2bde40e&tochange=ecd4c4304219

my STR:
* open given Link
* open a new Tab
* close first Tab with given Link

=> trying to open a new Tab using the +-Button or e.g. clicking on the Download Toolbar Button fails
Keywords: regression, regressionwindow-wanted
Version: Trunk → 18 Branch

Comment 3

6 years ago
Regression window of STR in comment#2 is as follows,

Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/b47091bd3144
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0 ID:20121006065933
Bad:
http://hg.mozilla.org/mozilla-central/rev/9f677c2bb33d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0 ID:20121006134717
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b47091bd3144&tochange=9f677c2bb33d


Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/56cf71c15ef3
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0 ID:20121005131335
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/b0431ae9cad5
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0 ID:20121005131934
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=56cf71c15ef3&tochange=b0431ae9cad5

Suspected : bug 798426 and bug 798243


Setting network.http.spdy.enabled = false helps.
Blocks: 798243, 798426
tracking-firefox18: --- → ?
tracking-firefox19: --- → ?
tracking-firefox20: --- → ?
Component: General → Networking: HTTP
Keywords: regressionwindow-wanted

Comment 4

6 years ago
Browser become almost hangs,  toolbar button, Location bar and Search bar do not response
Severity: normal → critical
Keywords: hang
(Assignee)

Comment 5

6 years ago
three odd things here.. I bet there are at least two bugs:

1] we send a SPDY request for that resource which the google server does not send a syn-reply for.. but it does send a GOAWAY frame with a last-good id of 1 and then sits there with the connection open.. it shouldn't do that according to spec last-good is "The last stream id which was replied to (with either a SYN_REPLY or RST_STREAM)" and stream 1 hasn't been replied to. So I think that is a google bug. The purpose of that is to create a soft shutdown of the session (no more streams will be admitted, but stream 1 can finish up) - so we wait for stream 1 to finish up. But no data ever comes. After a long wait a bunch of timers go off and we call the XHR onStopRequest() and things sort themselves out. I'll work with google to see what's going on there on the server side.

2] we don't need to take the server's word for the last good stream id.. we know a syn-reply hasn't been received for stream 1. so we can enhance our error handling.

3] XHR is doing a nested event loop spin.. its not clear to me why it is causing the weird UI behavior while waiting for OnStopRequest.. even if everything were working better than this that event could be a while in the making just due to slow servers or networks.. maybe bz would be interested in the STR independent of the spdy stuff. (awesome STR, btw - thanks!)

-Pat
(Assignee)

Comment 6

6 years ago
(In reply to Patrick McManus [:mcmanus] from comment #5)

> to. So I think that is a google bug. 

its actually a twitter.com bug if I'm right (not google). the spdysession that misbehaves is an options from the CORS to twitter.com on behalf of XHR on the google.com page.

I know some twitter folks too - so I'll ping them.

We definitely can improve our handling in the face of a non clean goaway however.

#3 in comment 5 is still the most interesting part of this that I don't grok yet.
> 3] XHR is doing a nested event loop spin..

That means it's a synchronous XHR.  Yes, the fact that such things exist is bad, but they're widely used on the web.

That said, simply spinning a nested event loop shouldn't hose the UI.  I would indeed be interesting in STR for that that don't involve spdy if you have them!
(Assignee)

Comment 8

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #7)
> > 3] XHR is doing a nested event loop spin..
> 
> That means it's a synchronous XHR.  Yes, the fact that such things exist is
> bad, but they're widely used on the web.
> 
> That said, simply spinning a nested event loop shouldn't hose the UI.  I
> would indeed be interesting in STR for that that don't involve spdy if you
> have them!

yes, synchronous xhr.

I don't have a non spdy case handy - but I think it it is pretty beside the point here.. the right events fire from the http channel - they just take so long because of the spdy issue but nothing is happening that couldn't also be caused by a very slow (or buggy) http/1 server. (turning off spdy makes it appear to work just beacuse the http/1 implementation of the server isn't bug and completes quickly.)

you can easily break it in gdb and see the nested loop working, but the ui is still hosed.. I wonder if something important lower in the nested stack is being prevented from completing?
Olli, any ideas?
(In reply to Boris Zbarsky (:bz) from comment #9)
> Olli, any ideas?

I thought it was on purpose to pass just some of the main thread events while looping inside sync XHR.  This behavior is very well known.
(Assignee)

Comment 11

6 years ago
For Tracking-Drivers: I wouldn't track this bug or backport fixes for it.. This is only significant in impact because of a server side problem, and so far doesn't apply to a broad set of cases.

We can fix the interaction with the server-side bug on the normal trains and then the sync xhr folks can separately figure out if some kind of snappy enhancement is needed as a feature.
(Assignee)

Comment 12

6 years ago
I've sent some mail to twitter.. basically I expect they should either be resetting the stream they don't like (99% this) or sending the goaway with last-stream == 0 instead of 1.

I do have a patch that gives us better goaway handling though. And that solves the major practical problem with the sync xhr on this site, though as we've said I think the problem persists with sites that are just slooooooooow.
(Assignee)

Comment 13

6 years ago
Created attachment 690403 [details] [diff] [review]
patch 0

We don't close a spdy session until all the streams are complete or another error happens. The go away frame tells us about some streams that aren't going to complete so we can handle those synchronous to the GA frame.

1] streams with id > last-good-stream weren't handled by the server so they can be restarted on a different session.
2] other streams by spec must be open, so if they aren't open its a bug and they should be implicitly reset. (this is the path that helps the twitter server in this bug.)
Assignee: nobody → mcmanus
Attachment #690403 - Flags: review?(honzab.moz)
Agree with comment 11.
tracking-firefox18: ? → -
tracking-firefox19: ? → -
tracking-firefox20: ? → -
(Assignee)

Comment 15

6 years ago
After I reached out to twitter, they have fixed their server problem with the OPTIONS request that left the spdy stream hanging. Quickly - thanks Jeff!

So this URL won't reproduce anymore - but the issue is equivalent to an HTTP/1 server hanging instead of responding to a OPTIONS request for the purposes of the XHR UI problem.
(Assignee)

Comment 16

6 years ago
Created attachment 691326 [details] [diff] [review]
patch 1

Honza - sorry to update a patch with an r? on it.. but spdy-dev has decided that the requirement that the last-good-stream-id reflect only opened connections was a mistake in the v3 text.. my patch implemented that, so I've updated it to take that part out. (I also had a bug because I implemented it in 2 and 3 and it is only a v3 item.)
Attachment #690403 - Attachment is obsolete: true
Attachment #690403 - Flags: review?(honzab.moz)
Attachment #691326 - Flags: review?(honzab.moz)
Comment on attachment 691326 [details] [diff] [review]
patch 1

Personaly, I'd rather put all the streams you want to close with reset to a temp array and iterate that array (and call Close(RESET)) on each stream out of the hashtable enumeration.  There is too much code nsHttpTransaction::Close triggers that I perceive it unsafe.  If we change something and the transaction could synchronously be re-added to the hashtable, we break it with unexpected behavior.

I know there is the mShouldGoAway flag set on the session that shouldn't allow it, but anyway.
Attachment #691326 - Flags: review?(honzab.moz)
(Assignee)

Comment 18

6 years ago
would you reconsider? I don't really see the value in the extra work.
I should rephrase, the comment might not be what I really wanted to suggest: please don't call stream->Close(RESET) from inside the enumerator function.  I.e. do the delete as you do, but collect streams to a local array and then iterate that array after you finished enumeration of the hash table and call Close(RESET) on each stream.

Stream Close() method calls ns(A)HttpTransaction::Close().  That is a code far from spdy and when we change something in the complicated mechanism of restarts or transaction closing we may get to a situation the same or different transaction would be added to the hash table you just iterate and break the hash table state.  Also don't forget you have two hash tables you need to keep in sync.
(Assignee)

Comment 20

6 years ago
(In reply to Honza Bambas (:mayhemer) from comment #19)
> I should rephrase, the comment might not be what I really wanted to suggest:
> please don't call stream->Close(RESET) from inside the enumerator function. 
> I.e. do the delete as you do, but collect streams to a local array and then
> iterate that array after you finished enumeration of the hash table and call
> Close(RESET) on each stream.
>

That won't work because the only hard ref to the stream is actually held by the transaction hash.. so removing it from the transaction hash deletes the stream.

I've updated the patch to have the iterator collect the streams to be closed and when that returns do the close() along with the two hash removals. That's a tad more expensive, but realistically the list is empty when goaway gets sent anyhow so its not really important.

I've also considered the case of streams that haven't ever sent a SYN.. that might be because they are queued due to concurrency limits or simply because they haven't fully buffered the request yet (which doesn't really happen in firefox proper, but an addon can do it). They are restartable inherently (they've never been sent anywhere after all).
(Assignee)

Comment 21

6 years ago
Created attachment 692639 [details] [diff] [review]
patch v2
Attachment #691326 - Attachment is obsolete: true
Attachment #692639 - Flags: review?(honzab.moz)
Comment on attachment 692639 [details] [diff] [review]
patch v2

Review of attachment 692639 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab.
Please check for white spaces.

::: netwerk/protocol/http/SpdySession2.cpp
@@ +115,5 @@
> +  // these streams were not processed by the server and can be restarted.
> +  // Do that after the enumerator completes to avoid the risk of
> +  // a restart event re-entrantly modifying this hash.
> +  if (stream->StreamID() > self->mGoAwayID || !stream->HasRegisteredID())
> +    self->mGoAwayStreamsToRestart.Push(stream);

You can forget() the stream here and immediately remove it.  That was my idea.

::: netwerk/protocol/http/SpdySession3.cpp
@@ +116,5 @@
> +  // these streams were not processed by the server and can be restarted.
> +  // Do that after the enumerator completes to avoid the risk of
> +  // a restart event re-entrantly modifying this hash.
> +  if (stream->StreamID() > self->mGoAwayID || !stream->HasRegisteredID())
> +    self->mGoAwayStreamsToRestart.Push(stream);

You should have a class specific for this enumerator to keep the necessary data.

::: netwerk/protocol/http/SpdyStream3.h
@@ +39,5 @@
>      NS_ABORT_IF_FALSE(!mFullyOpen, "SetFullyOpen already open");
>      mFullyOpen = 1;
>    }
>  
> +  bool HasRegisteredID() {return mSynFrameComplete;}

So its up to those who modify or review this code to check this flag is still set close to call of RegisterStreamID.  Could we use the stream id it self rather?  It is by default 0 but RegisterStreamID should never return 0.
Attachment #692639 - Flags: review?(honzab.moz) → review+
(Assignee)

Comment 23

6 years ago

> You can forget() the stream here and immediately remove it.  That was my
> idea.
> 

I don't like that because right now the presence of a stream in mStreamTransactionHash is what defines whether or not it is allocated. That's the only hard pointer that matters. Trying to keep it alive with forget() while removing it from the hash breaks that 'rule'.

And honestly, I'm not even sure what it means to .forget() the autoptr that is a stack argument here while also deleting it from the hash where it is a autoptr.. those aren't real refcounts they are just autoptrs. I'm not saying it wouldn't work - but I am saying it doesn't make it clearer to me.


> still set close to call of RegisterStreamID.  Could we use the stream id it
> self rather?  It is by default 0 but RegisterStreamID should never return 0.

sure
(Assignee)

Comment 24

6 years ago
bz - do you want to deal with the sync xhr issue by leaving this open or making a new bug? It will repro with a server that just doesn't send an http response header right away.

 https://hg.mozilla.org/integration/mozilla-inbound/rev/d0797dfcab56
(I still don't understand how sync XHR could cause UI badness here.)
(Assignee)

Updated

6 years ago
Blocks: 823030
Backed out on suspicion of causing Linux32 mochitest-4 leaks:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Rev3 Fedora 12 mozilla-inbound debug test mochitest-4&fromchange=d3848ee8c9da&tochange=d7a482019f54

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7a482019f54
(Note these are leaks of form bug 787312, but much more frequent)
(Assignee)

Updated

6 years ago
Depends on: 818833
(Assignee)

Comment 29

6 years ago
I'm going to try reland this unchanged on inbound' hoping that suspicion of it causing the leak was wrong, because I've got 13 linux32 debug mochi-4 runs that are green on try. (and getting linux32 runs on try is no small feat these days.)

https://tbpl.mozilla.org/?tree=Try&rev=98396a969ce9

https://hg.mozilla.org/integration/mozilla-inbound/rev/53ade07f41ce
(Assignee)

Comment 30

6 years ago
changeset:   116500:d0797dfcab56
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Tue Dec 18 22:28:24 2012 -0500
summary:     bug 819044 - better spdy stream cleanup when handling goaway r=honzab
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
The comment 30 changeset was the original that was backed out; the second landing hasn't merged. Reopening until it does :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 32

6 years ago
(In reply to Ed Morley (Away 20th Dec-2nd Jan) [UTC+0; email:edmorley@moco] from comment #31)
> The comment 30 changeset was the original that was backed out; the second
> landing hasn't merged. Reopening until it does :-)

sorry - I checked for that explicitly.. I just plain did it wrong!
No need for sorry :-) Just thought I'd reopen before anyone retested the next Nightly and thought there was an issue with the patch.
No longer blocks: 798243
https://hg.mozilla.org/mozilla-central/rev/53ade07f41ce
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Depends on: 787158
Causing spike of bug 787158.

Patrick, this may need a backout.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 36

6 years ago
I'm going to back this out in the next few minutes, because it is highly correlated with the crash activity of 787158.

but I can't find anything wrong with the patch - I suspect rather that this is related to state problems of 788001/829120 which tend to result in streams that are just ignored without the patch.

so I'll try and resolve those and then reland this.
Depends on: 788001, 829120
(Assignee)

Comment 37

6 years ago
Aurora (FF20) backout:
https://hg.mozilla.org/releases/mozilla-aurora/rev/332c4ccdfe5b

M-C (FF21) is CLOSED TREE right now so that backout will wait for tomorrow.
Target Milestone: mozilla20 → mozilla21
(Assignee)

Comment 38

6 years ago
M-C [FF21] backout
  https://hg.mozilla.org/integration/mozilla-inbound/rev/5e63610b9072
Severity: critical → normal
Target Milestone: mozilla21 → ---
Severity: normal → critical
(Assignee)

Comment 39

6 years ago
Hey smaug - I change the severity intentionally.. this patch is a fixup to improve handling around a server side bug (which has been fixed actually).. the server was doing the HTTP/1 equivalent of ignoring a request without even timing out or responding to it with an error.
Severity: critical → normal
https://hg.mozilla.org/mozilla-central/rev/5e63610b9072
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(Assignee)

Updated

6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Updated

5 years ago
Whiteboard: [spdy]
https://hg.mozilla.org/mozilla-central/rev/f06b4ca5c9a4
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.