Closed Bug 454990 Opened 16 years ago Closed 16 years ago

Firefox 3/win32 file uploads (HTTP POST) are very slow compared to other browsers

Categories

(Core :: Networking, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: hakan.lindqvist, Assigned: mcmanus)

References

()

Details

(Keywords: fixed1.9.1, perf, Whiteboard: [fixed1.9.1b3])

Attachments

(2 files, 2 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 (.NET CLR 3.5.30729) Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 (.NET CLR 3.5.30729) We have found that uploading files (using plain HTTP POST) is significantly slower in Firefox 3.0.1 on win32 (takes about 3-5x as long) compared to some other win32 browsers such as Internet Explorer 7 (7.0.5730.13), Google Chrome beta (0.2.149.29), Opera 9.52. (All tested on the same machines.) This happens on Windows XP (SP3) as well as Windows Vista (SP1). Firefox UA strings for above mentioned versions: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 (.NET CLR 3.5.30729) Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 (.NET CLR 3.5.30729) However, Firefox 3.0.1 under MacOS X 10.5.4 as well as Firefox 3.0.1 under Ubuntu Linux 8.04 work as expected. Firefox UA strings for above mentioned versions: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008072820 Firefox/3.0.1 Tests were performed using this simple form, where uploadtest.php is a page that does nothing with the form data: <html> <body> <form method="post" enctype="multipart/form-data" action="uploadtest.php"> <input type="file" name="data"> <input type="submit"> </form> </body> </html> Timing has been done manually, but with a fairly large file, so we firmly believe that any timing error caused by the manual timing is negligable in comparison to the massive difference in upload time. Example: Uploading a file that takes about 1 minute with Internet Explorer or Google Chrome may typically take about 3-4 minutes with Firefox on win32. Except for our own testing, similar results were also achieved by Mardeg in #firefox @ irc.mozilla.org (log excerpt will be attached). One theory in particular from that discussion is: <%Mardeg> hawk: without jumping to any conclusions, let me jump to this conclusion: There are ways in windows to increase the QoS for an app which Chrome and Opera and IE and Arora are using <%Mardeg> and Safari, Hv3, Netscape and Firefox are not using Reproducible: Always Steps to Reproduce: 1. Navigate to the page with the http post upload form 2. Select the file to upload 3. Click submit button 4. Wait, timing how long the upload takes to finish Actual Results: Firefox 3 on Windows (XP, Vista) consistently seems to take about 3-5 times longer than some other Windows browsers (such as Internet Explorer, Opera, Google Chrome), as well as compared to Firefox 3 under MacOS X or Linux. Expected Results: Firefox should perform about the same as the faster browsers do on the same machine/network. I was discussing this matter in #firefox @ irc.mozilla.org before filing this bug. I will attach a log of that discussion for your reference.
I remember doing the test in Bug 452458 (not locally) and getting weird results. IE6's upload speed was 3 times faster. But the Firefox 2 and 3 download speed was considerably faster so I thought it could be coincidence.
Status: UNCONFIRMED → NEW
Component: General → Networking
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → networking
Version: unspecified → Trunk
(In reply to comment #2) > But the Firefox 2 and 3 download speed > was considerably faster Faster than IE6.
Adding Patrick. He might have an idea too.
Having encountered the same issue on http://flickr.com/photos/upload/basic/ and within XULRunner. Additional info: this is more noticeable on relatively fast connections.
Jerome - those traces are golden. They show the roughly 3MB upload finishing in around 8 sec on FF, and 3 seconds on IE. RTT is roughly 18ms. (Wow, you are well connected to flickr!). Each trace shows about a constant +5seconds after the upload for flickr to generate a response - but that is constant across browsers and uninteresting. The issue is related to congestion control. Flickr uses a pretty conservative 32KB rwin which should be the bottleneck here, but is not working that way for FF. Something else (unknown) is restricting the sending rate. If you look at the IE trace you'll see that IE, when it reaches steady state, gets 32KB out in front of the returning acks (i.e. a 32KB window) and then waits for an ack so it can send a little more.. that wait is usually just a ms or two which means that 32KB comes close to the bandwidth-delay product of the path. On the FF trace you'll see that the tcp stack never grows the window to more than 8KB. Specifically it seems to send in chunks of exactly 4KB (3*1260 + 316.. 1260 is the mss so that makes some sense), where IE does more what I would expect in that it always sends 1260 byte segments (the mss). Anyhow, FF gets up to 8KB and something decides not to send anymore until there is an ACK.. because the 8KB window is way too small compared to the bandwidth delay product of the path there isn't a lot of useful ACK pipelining going on, so it needs to wait a long time.. according to the traces it needs to wait ~10+ ms on average.. when it gets that ACK it dribbles out another 4KB and waits again for another ack. indeed, there are 732 4KB chunks in a 3MB upload.. spread over 8 seconds (the time the UL took), that's about 11ms each. The entire transfer time is dominated by that waiting and the RTT. A higher RTT would make it go considerably slower even if you had more bandwidth - good thing you're well connected to flickr or it would be worse :) So the question is, why is FF not growing that congestion window up to 32KB? I don't know windows very well at all, but I can try and look into it. Its no doubt related to a 4KB buffer in the network stack somewhere (though that does not seem to be a problem for the other platforms). This seems like a pretty significant bug.
Reading some more it appears this is clamped on windows by the DefaultSendWindow property.. which by default is 8192. So there you go. Search for that and firefox upload speed and you'll see lots of registry tweaking advice. http://technet.microsoft.com/en-us/library/cc781532.aspx it seems this is controllable through the SO_SNDBUF setsockopt() path, as it is on most OS's.. the windows default is just low. so IE probably just sets it locally to something more reasonable which is what I would think FF should do - Unlike RCVBUF there is very little reason to undersize this one in my opinion. It just sets the max amount of data the kernel will buffer for you - it doesn't really use any memory as much as shift it around (from userspace to kernelspace) and the amounts in question are small. My linux desktop defaults this to over 100KB (/proc/sys/net/core/wmem_default). it certainly seems worth giving a properly placed pr_setsocketoption(PR_SockOpt_SendBufferSize) a try. MXR indicates this isn't happening anywhere I can see. Take this all with a grain of salt - I've never compiled Firefox on Windows before, but I'll borrow my wife's laptop, cons up a patch and report back.
(In reply to comment #9) > Could it be related to the 4096 > http://code.flickr.com/trac/browser/trunk/uploadr/MacUploadr.app/Contents/Resources/chrome/content/uploadr/upload.js#L770 > ? no, its strictly a congestion control thing. This really simple patch (below) fixes it and results in much better congestion windows; I've tried it out and ff actually beats msie on my test with it applied. I'm going to build a better patch that is driven off a pref we only set for windows.. working on that now. iff --git a/netwerk/base/src/nsSocketTransport2.cpp b/netwerk/base/src/nsSocketTransport2.cpp --- a/netwerk/base/src/nsSocketTransport2.cpp +++ b/netwerk/base/src/nsSocketTransport2.cpp @@ -1121,17 +1121,22 @@ nsSocketTransport::InitiateSocket() PRStatus status; // Make the socket non-blocking... PRSocketOptionData opt; opt.option = PR_SockOpt_Nonblocking; opt.value.non_blocking = PR_TRUE; status = PR_SetSocketOption(fd, &opt); NS_ASSERTION(status == PR_SUCCESS, "unable to make socket non-blocking"); - + + opt.option = PR_SockOpt_SendBufferSize; + opt.value.send_buffer_size = 125000; + status = PR_SetSocketOption(fd, &opt); + NS_ASSERTION(status == PR_SUCCESS, "unable to set send buffer size"); + // inform socket transport about this newly created socket... rv = gSocketTransportService->AttachSocket(fd, this); if (NS_FAILED(rv)) { PR_Close(fd); return rv; } mAttached = PR_TRUE;
I believe this patch will resolve the issue. It creates a preference for the maximum size of the TCP sending window. It sets the pref on Windows, as it is too small by default there. It sets it to 128KB which is the default on Linux.. In testing IE out for this bug I saw it generate congestion windows of at least 95KB, so the value set by IE is at least that large - so 128 again seems reasonable. For my tests I was using a network path that had a pretty weak BDP so the 8KB windows default didn't kill me. Using either FF from my Linux desktop, or IE on windows I could complete my 3.7MB post in 34 seconds while FF 3.0.6 (and 3.1 beta) on windows took about 40 seconds consistently. With this patch applied, my local FF build also clocked in at 34 seconds - matching the others. Heck, it was even a debug build - but CPU isn't the choke here. Folks with bigger bandwidth-delay paths (and there are many people with that) will see much bigger improvements in overall speed. I also observed the patched FF busting up send windows to 90KB or so, where without this they stay fixed at 8KB. I would expect Jerome's flickr case to draw even with IE.
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Attachment #360988 - Flags: review?(bzbarsky)
Comment on attachment 360988 [details] [diff] [review] add preference for sending window size, and set it by default on windows - v1 Do we really need the preference observer here?
Great job on this, btw
(In reply to comment #12) > (From update of attachment 360988 [details] [diff] [review]) > Do we really need the preference observer here? Ah, I thought it was always a nice idea. Are there guidelines for when we want dynamic config and when not to bother?
I'll let Boris decide. I have been told that it could be a performance issue. This particular pref may not really need to be dynamic. Just wanted to bring it up
Attachment #360988 - Flags: review?(jduell.mcbugs)
Attachment #360988 - Flags: review?(bzbarsky)
Attachment #360988 - Flags: review-
Comment on attachment 360988 [details] [diff] [review] add preference for sending window size, and set it by default on windows - v1 >+++ b/browser/app/profile/firefox.js Shouldn't this go in all.js instead? This doesn't seem like a Firefox-specific thing, but a Gecko-specific one... >+++ b/netwerk/base/public/nsPISocketTransportService.idl >+ /** >+ * controls the TCP sender window clamp >+ */ >+ attribute long sendBufferSize; Is there a good reason this isn't a readonly attribute? >+++ b/netwerk/base/src/nsSocketTransport2.cpp >+ rv = gSocketTransportService->GetSendBufferSize(&sndBufferSize); >+ if (NS_SUCCEEDED(rv) && (sndBufferSize > 0)) { That call never actually fails, right? I guess the check doesn't hurt anything much... In any case, please remove the extra parens around the > 0 comparison. >+ status = PR_SetSocketOption(fd, &opt); Is |status| used for anything after this point? Or are you just ignoring it? Don't assign it if you don't plan to do anything with the value... >+++ b/netwerk/base/src/nsSocketTransportService2.cpp nsSocketTransportService::nsSocketTransportService() >+ tmpPrefService->AddObserver(SEND_BUFFER_PREF, this, PR_FALSE); This is actually kinda bad. You don't want to be passing an object with a zero refcount (before its first addref) to XPCOM methods. If someone happens to addref and then release it in there, your destructor will get called. The right way to do this is to have an Init() method that does the observer-adding. Also, you're telling the pref service to hold a strong ref to you, right? Shouldn't you RemoveObserver at some point (presumably xpcom shutdown)? It'd be good if Jason looks over this too; the actual talking-to-the-network part is a lot more his area of expertise than mine, though this looks reasonable offhand.
I can't claim any expertise about the framework/XPCOM code in this patch--I'm still learning all that stuff. As far as as the actual logic we're trying to achieve (use a bigger send buffer if the default is too small), I think using 128K is a sensible default. However, it seems like it would make our code port better to future platforms if we actually check the default send buffer size (ideally just once as opposed to for each and every socket we create: it's not going to ever change AFAIK), and bump it up to 128K for each new socket if it's smaller. This way we won't have to rediscover this bug for new platforms that come along with small send buffers. On the other hand, there may be some mobile platforms (Patrick probably knows this better than I) where allocating 128K per socket is going to be too memory-expensive? It's still only a MB for 8 sockets, so maybe not. Perhaps such platforms could set a lower default using the prefs. Alternatively, we could increase the send buffer size only for sockets where we know we'll be sending a lot of data (file uploads, etc.), but I don't know how easy that would be to do within our APIs. I'd add in this bug # to your comment where you actually set the sockopt.
hey jason -thanks for the comments. (In reply to comment #17) > However, it seems like it would make our code port better to future platforms > if we actually check the default send buffer size (ideally just once as opposed > to for each and every socket we create: it's not going to ever change AFAIK), > and bump it up to 128K for each new socket if it's smaller. I thought about that, but a 128K min isn't really necessarily a better policy than what the OS does - its just better than the fixed 8K default of xp and vista. Linux and Mac autotune this pretty effectively under most circumstances, and setting a specific value disables the autotune so I avoided it on purpose. However if someone wants to set the pref locally or per product they can enforce a value locally. In general I think this is the kind of thing where you want to trust the OS settings unless the platform has let you down.. we have a specific data point of being let down here with windows and it seems to be the exception rather than the rule, so I tried to target just that - by doing it with a pref it is easiest for indvidual installations or products to change it. the other thing I want to reiterate is that this does not result in a 128KB allocation per socket. It is simply the max amount of data that can be sent without ack (and therefore has to remain buffered in the kernel until the ack comes). Every implementation I am aware of keeps the real data in a linked list of packets that were sent. Even for large POSTs it generally just moves memory around (instead of being queued in userspace it is queued in kernelspace) and doesn't effectively bloat anything. Keeping the value as small as possible, while still being larger than the BDP, is useful for content generators that will take the flow control hint and stop producing stream information that they just queue in userspace.. I don't know if we have anything like that or not.. maybe an addon? Anyhow, that's where autotune would really shine and we want to preserve that where possible. So, barring strong objection, I think the best approach is as is (modulo fixing Boris's code comments - Can't believe I blew the RemoveObserver() bit).. is that ok by you?
> So, barring strong objection, I think the best approach is as is. Sounds good to me. I think you've also made the case that it's worth having these be tunable knobs, so that part seems good too.
(In reply to comment #16) > (From update of attachment 360988 [details] [diff] [review]) > >+++ b/browser/app/profile/firefox.js > > Shouldn't this go in all.js instead? This doesn't seem like a Firefox-specific > thing, but a Gecko-specific one... If we do that, I worry about the affect on Windows mobile. I suppose we could make the conditional for XP_WIN check also check for !WINCE at the same time
(In reply to comment #20) > If we do that, I worry about the affect on Windows mobile. I suppose we could > make the conditional for XP_WIN check also check for !WINCE at the same time That's fine. But I think with some testing we might want to turn it on (maybe to a different value) with wince as well - mobile sees lots of high latency paths which may require the larger window to function well. But I agree we should add it only if we identify that to be the case (and can do so in mobile.js if need be.)
This is a little lazy of me - I should just look it up. does anyone know how the file post code works? Does it: a] copy it all off disk and dump it into userspace buffers, trickling those into the network as send() accepts portions.. or b] copy chunks off the disk into userspace buffers and hand those to send() and then only copy more off the disk immediately if the non blocking send() succeeded I ask because (b) is nicely flow controlled and therefore increasing SO_SNDBUF will add a little memory usage in order to accommodate the larger flow. (But only on the socket doing the POST, not all sockets). Where (a) is just going to shift memory around because the whole thing is already buffered anyway. if its (b) then the windows mobile thing is more relevant.
nsFormSubmission contains the guts of our form submission logic (unsurprisingly): nsFormSubmission::SubmitTo is called when you submit a form: http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsFormSubmission.cpp#1187 There are separate implementations for the various form encodings, you probably care about nsFSMultipartFormData the most: http://mxr.mozilla.org/mozilla-central/source/content/html/content /src/nsFormSubmission.cpp#859 It uses a multiplexed stream to add simple data (name=value pairs) and file streams without loading them entirely into memory. (Read up a little from where I linked to see the full details.) I chased the load process down through webshell to docshell for a while and then got tired, bz could probably tell you more, but it passes the POST data stream off to the HTTP channel eventually: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/src/nsHttpChannel.cpp#4338 (and that's about where I ran out of interest)
Attached patch patch v2 (obsolete) — Splinter Review
This version should have boris's review comments addressed. Thanks for those. Ted, Thanks for the pointer.
Attachment #360988 - Attachment is obsolete: true
Attachment #361059 - Flags: review?(bzbarsky)
Attachment #360988 - Flags: review?(jduell.mcbugs)
Just to pick up from comment 23, we then pass the stream to nsHttpTransaction, which possibly wraps another multiplex stream around it, and then a buffered stream (with a 4KB buffer). At this point, no data has been moved from disk yet. The data moves when nsHttpTransaction::ReadSegments is called from nsHttpConnection::OnSocketWritable. This then calls nsHttpConnection::OnReadSegment, which calls Write() on the socket output stream that was opened in nsHttpConnection::CreateTransport. It looks like that immediately cals PR_Write, so at the moment we're probably not buffering more than 4KB of file data at any given point in time (the size of the buffered stream used). When SSL is in use, things might be more complicated, but I suspect the same thing will happen. So this is basically case (b) from comment 22. For everything but the file upload part of a POST, all the data is in memory already, so we're just shifting data around; that's case (a).
> so at the moment we're probably not buffering more than 4KB of > file data at any given point in time (the size of the buffered > stream used). Would it be worth trying to experiment with a larger file buffer, at least for large file POSTs?
basically the same as v2, just removed an unused assignment
Attachment #361059 - Attachment is obsolete: true
Attachment #361174 - Flags: review?(bzbarsky)
Attachment #361059 - Flags: review?(bzbarsky)
> Would it be worth trying to experiment with a larger file buffer Possibly yes, but note that the code reading data from the stream only reads it in 4KB chunks. If you changed both the buffered stream and OnSocketWritable you might actually see a behavior difference...
Or you could change NS_HTTP_SEGMENT_SIZE in general, as an experiment.
Comment on attachment 361174 [details] [diff] [review] patch v3 [Checkin: Comment 37 & 47] Looks great.
Attachment #361174 - Flags: superreview+
Attachment #361174 - Flags: review?(bzbarsky)
Attachment #361174 - Flags: review+
Is this something we could take on 1.9.1? Seems like it's something that users might actually notice in terms of perf.
Great to see the progress that has been made! To follow up on comment #31: The problem is very noticeable when uploading large files (eg waiting 5 minutes with for instance IE or waiting 20 minutes with Firefox). Assuming that this fix makes Firefox on win32 perform on par with the better performing browsers or Firefox on other platforms, it should be clearly noticeable for anyone uploading files regularly. Once I tracked down that the issues I was experiencing were limited to Firefox I personally have switched browsers when uploading larger files. I really hope this makes it to 1.9.1. (Personally I would be happy if it was considered for 1.9.0 too.)
(In reply to comment #31) > Is this something we could take on 1.9.1? Seems like it's something that users > might actually notice in terms of perf. I agree. To see this you need a certain kind of network path (high BDP) that is becoming more common.. so it probably becomes even more relevant to 1.9.1 over its lifecycle.
Flags: wanted1.9.1?
I'd just request approval on the patch once you check it into m-c and bake a bit. Requesting wanted isn't going to change anything at this point, since we already have a fix.
Oh, and this doesn't seem like a reasonable 1.9.0 change to me, but feel free to nominate if you want the branch drivers to evaluate it.
I wish having this patch part of the XULrunner binaries (mfinkle as well I guess?) since this is where I first encountered the issue.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Keywords: perf
(In reply to comment #36) > I wish having this patch part of the XULrunner binaries (mfinkle > as well I guess?) since this is where I first encountered the issue. Jerome - If you're building XULRunner from trunk, you have this fix now. If you are using the 1.9.1 branch, you'll have to wait until we see if it will landed there too. If you're using the 1.9.0.x branch, you might have to patch this yourself as I doubt it will land there (but you never know)
Would someone with the permissions to do so please nominate attachment 361174 [details] [diff] [review] for 1.9.1 approval? Or is it already "on the radar" for 1.9.1 inclusion even without that? (And possibly 1.9.0.8, but it seems the conclusion is that probably isn't going to happen.)
(In reply to comment #39) > Would someone with the permissions to do so please nominate attachment 361174 [details] [diff] [review] > for 1.9.1 approval? Or is it already "on the radar" for 1.9.1 inclusion even > without that? > > (And possibly 1.9.0.8, but it seems the conclusion is that probably isn't going > to happen.) I'm very confident in my patch, but I would also appreciate hearing from another source that it resolved the problem for them. Just to be sure.
I can do some testing/verification, but I don't have a suitable environment set up for building Firefox for Win32. Can I use a nightly build of trunk for this purpose (something like ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/), or would that be too many of other changes that could affect the results to be conclusive regarding this particular change?
(In reply to comment #41) > I can do some testing/verification, but I don't have a suitable environment set > up for building Firefox for Win32. > > Can I use a nightly build of trunk for this purpose (something like > ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/), or would > that be too many of other changes that could affect the results to be > conclusive regarding this particular change? assuming you have the dramatic reproducible test case from the original description, the nightly should be fine as long as it is in decent shape. (I've been running Shiretoko, so I don't really know what shape m-c is in.)
I'm running m-c daily, and it's fine. You could also test the two nightlies around this checkin (one before, one after) if you want to try to narrow the set of changes you're testing.
I have now done some quick tests comparing Mozilla-central "2009-02-07-03" with "2009-02-11-03" on Vista SP1, with a few test runs of each. There were some minor differences between the attempts, but the "2009-02-11-03" build consistently took about 40% to upload the file compared to the "2009-02-07-03" build. (With a higher latency path, I believe there could be an even bigger difference.)
Attachment #361174 - Flags: approval1.9.1?
Comment on attachment 361174 [details] [diff] [review] patch v3 [Checkin: Comment 37 & 47] asking for 1.9.1 approval
Attachment #361174 - Flags: approval1.9.1? → approval1.9.1+
Comment on attachment 361174 [details] [diff] [review] patch v3 [Checkin: Comment 37 & 47] a191=beltzner
Keywords: checkin-needed
Attachment #361174 - Attachment description: patch v3 → patch v3 [Checkin: Comment 37 & 47]
Whiteboard: [fixed1.9.1b3]
Target Milestone: --- → mozilla1.9.2a1
Flags: wanted1.9.1?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: