Last Comment Bug 790184 - [SPDY] Firefox V15 & 15.0.1 fails with attachment upload on gmail with squid proxy
: [SPDY] Firefox V15 & 15.0.1 fails with attachment upload on gmail with squid ...
Status: RESOLVED FIXED
[spdy]
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 15 Branch
: x86 Linux
: -- normal (vote)
: mozilla18
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-11 03:24 PDT by Jerome
Modified: 2012-10-13 09:54 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
+
+
fixed
fixed


Attachments
patch 0 (9.41 KB, patch)
2012-10-05 10:00 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Splinter Review
patch v1 (15.77 KB, patch)
2012-10-05 14:35 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
cbiesinger: superreview-
Details | Diff | Splinter Review
patch 2 (16.85 KB, patch)
2012-10-05 15:29 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
cbiesinger: superreview+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Jerome 2012-09-11 03:24:00 PDT
User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20100101 Firefox/10.0.2 Iceweasel/10.0.2
Build ID: 20120217184109

Steps to reproduce:

Bonjour et merci de votre Fondation.
Depuis la mise à jour 15.0, 
connecté sur Gmail, j'essaye d'ajouter un fichier joint de 5Mo à un mail. 

Mon réseau utilise un proxy de type squid. 
Mon débit disponible en upload est de l'ordre de 300kbits/s .


Actual results:

Cela démarre très lentement et cela finit par un message "Échec de l'ajout de la pièce jointe". 
Différents essais de version et d'environnement:
J'essaye une piece jointe de 2Mo: cela fonctionne
Piece Jointe 5Mo , FF V15.0.1(windows XP SP3) , même erreur.
Piece Jointe 5Mo , FF V12 (windows XP SP3) : cela fonctionne
Piece Jointe 5Mo , FF V10 (Linux Debian Squeeze) : cela fonctionne.
Piece jointe 5Mo , Invalidant tous les plugins : cela ne change pas le comportement.

A signaler: sous IE8 on voit encore "fonctionnalité de base des pieces jointe" dans les parametres de gmail. On ne voit plus cette option dans Firefox.


Expected results:

La pièce jointe aurait dû se télécharger tranquillement.


Merci de vos efforts.

Bien amicalement
Jerome
Comment 1 Matthias Versen [:Matti] 2012-09-11 07:29:35 PDT
Hello Jerome !

I'm sorry but we usually English in bugzilla. I have translated your report using google translate but it makes the communication a little bit difficult if we use mixed languages. 

As a test, could you please open about:config (enter as URL in Firefox), confirm the warning, enter "spdy" in the top filter bar without the quotes and change the entry "network.http.spdy.enabled" to "false". Restart Firefox and try it again to upload the attachment. Does it work better ?

btw: I added Loic to this report. He should understand french and make it easier for you in case you don't understand english.
Comment 2 Loic 2012-09-11 07:49:53 PDT
@Jerome: Vous devriez utiliser le forum français de support sur Geckozone (http://www.geckozone.org/forum/) et reposter ici s'il y a vraiment un bug dans Firefox 15.

Translation: he says he's using a network with a Squid proxy and FF15 fails to attach 5-MB attachment to an email on Gmail but works with 2-MB attachment.
I redirected him to the French support board anyway.
Comment 3 Jerome 2012-09-11 09:38:14 PDT
Hi Matthias and Loic ..

Many apologies! Sorry for french speaking here!

I have just tried your tuning on V15.0.1
"network.http.spdy.enabled" to "false" , and it works  fine !

Now, how may I deploy this tuning over my LAN , because lots of  my users are very annoyed ?

Thanks a lot for your solution.

Jerome
Comment 4 Matthias Versen [:Matti] 2012-09-11 09:54:01 PDT
>"network.http.spdy.enabled" to "false" , and it works  fine !
Great, that confirms that i was right.

>Now, how may I deploy this tuning over my LAN
I apologize for the issues but I can't answer your question.
Bugzilla is a pure bug database and our missions is to fix bugs and not for support help. The french support forum can probably help. One way could be a Mozilla autoconfiguration file.

Anyway, would you help us with a log file if the networking developer requests it ?
He or I will post the instructions in that case.
Comment 5 Patrick McManus [:mcmanus] 2012-09-11 10:00:46 PDT
log instructions: https://developer.mozilla.org/en-US/docs/HTTP_Logging

this would be critical - though the squid notes are interesting.. as is http://code.google.com/p/chromium/issues/detail?id=69813#c62
Comment 6 Matthias Versen [:Matti] 2012-09-11 14:12:42 PDT
Jerome:
Could you please help us and generate a http log and attach it here using the "add an attachment" link in this report ?

Be sure that spdy is enabled again and follow the instructions posted in the link from Patrick in Comment#5
It would help our network developer /Patrick) to get this issue fixed.
Comment 7 Jerome 2012-09-12 04:58:36 PDT
OK 
-spdy enabled again
-problem repeated
-Log recorded.
-Gzipped log.txt size: 13Mo
I can upload it to the dedicated area you will show me, unless
you tell me what to grep in it. 
Jerome
Comment 8 Matthias Versen [:Matti] 2012-09-12 05:23:00 PDT
Send me the log via mail and i will host it on my server
Comment 9 Jerome 2012-09-12 09:34:15 PDT
Log sent to Matthias at his email adress.
Jerome
Comment 10 Matthias Versen [:Matti] 2012-09-12 10:39:15 PDT
http Logfile: http://mversen.de/mozilla/FirefoxBugSpdy120912log.txt.gz
Comment 11 Patrick McManus [:mcmanus] 2012-10-04 14:01:46 PDT
jerome - you got a ping timeout. interesting.

I wonder if squid isn't doing any flow control.

would you say that you have a much faster connection to the proxy than squid does to gmail? (e.g. squid is on your lan and squid has poorish internet connectivity)
Comment 12 Patrick McManus [:mcmanus] 2012-10-04 14:54:11 PDT
This is pretty interesting.

FF uploads for 44 seconds, still has a way to go, and then generates a spdy ping because it hasn't heard anything from the server in a while and it wants to make sure it isn't uploading into the wind.

If the buffering is too deep, either at squid or in the local socket buffers (or a combination of both), then that buffer bloat creates a huge round trip for that ping.. it doesn't make it back in 8 seconds and the session is aborted.

Our local socket buffers probably need to be tighter for spdy because of the inline ping.. but proxies will still play a role. We might need to extend the deadline considerably in the face of streams that have been uploading a lot of data - which makes dead connection detection much harder but most sessions are not woken up by this kind of large upload.
Comment 13 Patrick McManus [:mcmanus] 2012-10-05 07:15:55 PDT
this bug was filed on linux, and on non-windows platforms we generally use TCP auto tuned buffer sizes. Doing a 25MB gmail attachment I saw my local OS socket buffers grow as large as 4MByte. If I've only got 1 megabit of upstream bandwidth that's 32 seconds worth - way too much local buffering which prevents the ping from being answered quickly enough. Cancel's will also perform poorly.

I could see pings take several seconds to complete and that's on my pretty well connected desktop. The reported showed >200ms ping times, so his situation would perform worse.

Let's start with a change to get that under control - when doing large uploads set the buffer size to 128KB as we do for windows (which doesn't auto tune on older versions). Leave autotuning in tact when uploads aren't going on because that's a memory friendly thing to do.

That reduces the ping turnaround time I see to at most 1 second.

I'll make some try builds and ask jerome to try it out. Its possible that squid will still be adding too much buffering and we'll need to adjust the timeout period - but generally proxies are pretty aware of buffering too much [though admittedly they are more aware of it in the downstream direction]
Comment 14 Patrick McManus [:mcmanus] 2012-10-05 09:58:08 PDT
interestingly this problem will be mitigated by v3 in ff16 as the per stream flow control will typically have the effect of keeping the buffers short. However that is a per server setting so we shouldn't simply rely on it.
Comment 15 Patrick McManus [:mcmanus] 2012-10-05 10:00:39 PDT
Created attachment 668488 [details] [diff] [review]
patch 0

In my testing I confirmed that this kept buffer queues down, improved ping rtt times substantially, and didn't impact overall upload time.
Comment 16 Honza Bambas (:mayhemer) 2012-10-05 11:05:05 PDT
Maybe the spdy ping is designed wrong?  

I remember AllPeers multiplexer protocol was after a lot of research finally not using ping/pong reply, but just a regular ping, or no-op packet (with no expected reply to that ping) in 35 seconds after something has been sent out to the peer.  It was symmetric, from the peer it was expected to receive something in 35 seconds as well.  When for 60 seconds (if I remember the numbers well) nothing was heard from the peer, the connection closed.

We wouldn't need to play with buffers (what I'm not a big fan of, btw) when the ping wouldn't be of the request/reply type.

Thought for SPDY spec...
Comment 17 Honza Bambas (:mayhemer) 2012-10-05 11:28:55 PDT
Comment on attachment 668488 [details] [diff] [review]
patch 0

Review of attachment 668488 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/SpdyStream2.cpp
@@ +481,5 @@
> +  // the session and cap the send buffers at 128KB. (10Mbit/sec @ 100ms)
> +  //
> +  if ((mTotalSent > (128 * 1024)) && !mSetTCPSocketBuffer) {
> +    mSetTCPSocketBuffer = 1;
> +    mSocketTransport->SetSendBufferSize(128 * 1024);

Turn this to one or even two separate prefs please.
Comment 18 Patrick McManus [:mcmanus] 2012-10-05 11:53:05 PDT
(In reply to Honza Bambas (:mayhemer) from comment #16)
> Maybe the spdy ping is designed wrong?  
> 
> I remember AllPeers multiplexer protocol was after a lot of research finally
> not using ping/pong reply, but just a regular ping, or no-op packet (with no
> expected reply to that ping) in 35 seconds after something has been sent out
> to the peer.  It was symmetric, from the peer it was expected to receive
> something in 35 seconds as well.  When for 60 seconds (if I remember the
> numbers well) nothing was heard from the peer, the connection closed.
> 

its an interesting suggestion. It would have to be negotiated due to some implementation's concerns about battery - but that could be ok as the negotiation wouldn't be a pre-req to doing something useful.

> We wouldn't need to play with buffers (what I'm not a big fan of, btw) when
> the ping wouldn't be of the request/reply type.

as long as this stays tcp based relatively short buffers are still pretty interesting in order to effectively support prioritization and cancel semantics, right? We don't aggressively do a lot with that right now, but its potentially the biggest responsiveness benefit we can get and could forseeably mean tightening them further to match cwnd (which isn't easily knowable). Its also a reason that this shouldn't be tcp based in the long run.

> 
> Thought for SPDY spec...

Thanks!
Comment 19 Patrick McManus [:mcmanus] 2012-10-05 11:56:42 PDT
jerome, will you try out a build from

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-66f3e649c9a0/

using your proxy and let me know if the attachment succeeds?
Comment 20 Patrick McManus [:mcmanus] 2012-10-05 14:35:16 PDT
Created attachment 668624 [details] [diff] [review]
patch v1

small idl change to nsISocketTransport
Comment 21 Christian :Biesinger (don't email me, ping me on IRC) 2012-10-05 14:45:18 PDT
Comment on attachment 668624 [details] [diff] [review]
patch v1

nsSocketTransport2.cpp

Don't you have to do locking to access mFD, similar to GetSelfAddr?
Comment 22 Christian :Biesinger (don't email me, ping me on IRC) 2012-10-05 14:47:39 PDT
(the IDL change looks good though)
Comment 23 Patrick McManus [:mcmanus] 2012-10-05 15:29:04 PDT
Created attachment 668643 [details] [diff] [review]
patch 2

its a good point. is this what you had in mind?
Comment 24 Christian :Biesinger (don't email me, ping me on IRC) 2012-10-05 15:31:45 PDT
Comment on attachment 668643 [details] [diff] [review]
patch 2

yes. thanks.
Comment 25 Patrick McManus [:mcmanus] 2012-10-05 18:13:37 PDT
  https://hg.mozilla.org/integration/mozilla-inbound/rev/9e6567a9d1b2
Comment 26 Phil Ringnalda (:philor, back in August) 2012-10-05 19:39:19 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c71ac48601d9 - it built on Windows, which is nice, and pile-of-letters Armv7a ICS, which is whatever, but not on any other platforms.
Comment 27 Jeff Walden [:Waldo] (remove +bmo to email) 2012-10-05 20:34:09 PDT
Sigh, this probably needs bug 730805, which I've been neglecting for awhile.  I might pick up the little I've done for that bug and finish up a patch by the end of the weekend, but in the meantime, just casting to some natural type is probably the quickest fix.
Comment 28 Patrick McManus [:mcmanus] 2012-10-06 13:30:28 PDT
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #27)
> Sigh, this probably needs bug 730805, which I've been neglecting for awhile.
> I might pick up the little I've done for that bug and finish up a patch by
> the end of the weekend, but in the meantime, just casting to some natural
> type is probably the quickest fix.

thanks jeff, but we won't need it here.. this bounced because I screwed up and left in a diag line I slipped in to locally confirm a small sr requested change before pushing.
Comment 29 Patrick McManus [:mcmanus] 2012-10-06 17:49:04 PDT
 https://hg.mozilla.org/integration/mozilla-inbound/rev/ecd4c4304219

the first landing of this patch ended up in a range that triggered a talos regression warning (along with other patches).. so I took the backout as an opportunity to run it and a control run (i.e. same tip without the patch) on try and it compares fine on tp5.

https://tbpl.mozilla.org/?tree=Try&rev=a5bf591261ee
and
https://tbpl.mozilla.org/?tree=Try&rev=408ff9d50ddb
Comment 30 Phil Ringnalda (:philor, back in August) 2012-10-06 22:18:20 PDT
https://hg.mozilla.org/mozilla-central/rev/ecd4c4304219
Comment 31 Bill Gianopoulos [:WG9s] 2012-10-07 06:17:59 PDT
I added the tracking flags here because the original bug was filed on a release version and it seemed the filer was concerned about getting a fix to be used across a significant number of users.

Since Firefox 16 is coming out on Tuesday, it seemed lame to try to create a version 15 version with the fix.  For the same reason it is probably way too late to get this into the version 16 release.  However, if a 16.0.1 is required for some other reason this would be good to take as a tag-along.

I really think we should try to take this for the Firefox 17 beta however.
Comment 32 Bill Gianopoulos [:WG9s] 2012-10-07 06:19:12 PDT
OH should have said especially since it is my understanding that the next ESR release is going to be based on version 17 and this proxy messing up the ping time issue is highly likely to impact enterprise deployments.
Comment 33 Alex Keybl [:akeybl] 2012-10-07 13:51:03 PDT
I think the best justification for getting a fix in sooner rather than later is at http://www.squid-cache.org/:

"Squid is used by hundreds of Internet Providers world-wide to provide their users with the best possible web access."

Please nominate for Aurora 17 uplift (before Monday morning PT) or Beta 17 uplift (after Monday morning PT) to get the fix into that Firefox version. Please make sure to include a risk evaluation, since whether or not we take this as part of a 16.0.1 would depend upon risk/testing.
Comment 34 Patrick McManus [:mcmanus] 2012-10-08 06:33:13 PDT
Comment on attachment 668643 [details] [diff] [review]
patch 2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): original spdy issue
User impact if declined: long running gmail attachments or other spdy uploads can time out depending on network path properties. I have heard some generic reports of non-reproducible attachment problems so this may be relevant.
Testing completed (on m-c, etc.): generally manual testing. it has only been on m-c for 1 day. Fairly safe approach but we don't have feedback yet if it is totally effective outside of tested scenarios.
Risk to taking this patch (and alternatives if risky): the code change will only effect spdy sessions that have done uploads.
String or UUID changes made by this patch: none

This should probably be uplifted atomically with 798423
Comment 35 Patrick McManus [:mcmanus] 2012-10-08 20:03:04 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/e583ab0502f1
Comment 36 Jerome 2012-10-09 01:59:50 PDT
Hi Everybody,
 
Many thanks for all your fine work.

I tried out this nightly build, and it worked with a 10Mb attachment in my (old) squid environment (squid-2.6.STABLE16) .
Cool !
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-66f3e649c9a0/

Many friendly greets 

Jerome
Comment 37 Patrick McManus [:mcmanus] 2012-10-09 06:19:51 PDT
thanks jerome

Note You need to log in before you can comment on or make changes to this bug.