Closed Bug 147979 Opened 23 years ago Closed 23 years ago

Mac-SSL connections very slow

Categories

(Core :: Networking: HTTP, defect, P2)

PowerPC
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: johne, Assigned: darin.moz)

References

()

Details

(Keywords: regression, topembed+, Whiteboard: [adt1 RTM])

Attachments

(4 files)

All https:// connections I've used on MacOS X have been painfully slow. It can take in excess of 30 seconds to display pages. Works just fine from IE 5.1. MacOS X 10.1.4 Build ID: 2002052305 (rc3) The mac is a PowerBook G4 800MHz, just a few days old so there shouldn't be any cruft getting in the way.
do you have HTTP pipelining enabled? if so, does disabling it improve performance?
Pipelining was not enabled. I have enabled it, totally quit out of Mozilla, and restarted. There is no perceptual difference, things are still quite slow for https: connections. Doing things like banking sessions are still increadibly slow (http://www.commonwealthcu.org/ , will automatically switch to https: connection). Any other debugging you want me to do?
John, try using a new Mozilla user profile. Let us know if that behaves any differently.
-> PSM
Assignee: darin → ssaux
Component: Networking: HTTP → Client Library
Product: Browser → PSM
QA Contact: tever → junruh
Version: other → 2.0
I couldn't immediatly find a way in preferences to create a new user profile. I faked it by creating a whole new user account on the mac and trying mozilla from there. Same slowness. While viewing http://www.commonwealthcu.org/ (switches to https) Under the new account, the normal "You're switching to an encrypted page" dialog came up. However, it came up 'slowly' as well, after about 5 seconds and half the page was displayed. Don't know if this means anything. While watching `top`, mozilla is not CPU bound (1-4%). Sorry, I've only had the mac for a week. I don't know how to do the normal profiling and what not. :(
*** Bug 143483 has been marked as a duplicate of this bug. ***
See also bug 146950.
Confirming on 6/3 Mac OSX trunk build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
*** Bug 148854 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
Version: 2.0 → 2.3
javi, can you take a look at the TCP trace? Any idea what could be the matter on Mac?
Assignee: ssaux → javi
*** Bug 148243 has been marked as a duplicate of this bug. ***
*** Bug 148311 has been marked as a duplicate of this bug. ***
I looked through the first two SSL connections in the TCP trace. The first connection was a full handshake: 0.135s between the server's SYN ACK and the client's first handshake record 0.048s for the server to reply with its cert 2.985s for the client to respond to the server's cert 0.085s for the server to respond 0.006s for the client to respond 0.049s for the server to send its first data record The second connection was a resume: 0.052s between the server's SYN ACK and the client's first handshake record 0.032s for the server to reply (no cert in record) 0.420s for the client to respond 0.047s for the server to send its first data record
I timed a connection with my watch. Entering "www.commonwealthcu.org" in the address bar and hit return. 40 seconds for page to display. 70 seconds for the Mozilla logo to stop spinning. The bottom of the page, in the status bar, says Document: Done (65.798 secs)
Keywords: regression
*** Bug 115072 has been marked as a duplicate of this bug. ***
i've actually experienced a worsening of this problem since 0.9.9. i just got 1.0 (Build ID: 2002052918) and turned on pipelining, and the problem is the worst i've ever seen it. it was still pretty bad before i turned on pipelining, though. i'm using Mac OS 10.1.5 on a 400MHz G4.
ok, i just did a comparison with pipelining on and pipelining off to test SSL slowness. try https://webbanking.waterhousebank.com/ with pipelining and then without. i guess this could be a server problem rather than a browser problem? it seems like there might be too many caveats for pipelining to make it a truly primetime feature. is it worth filing another bug?
*** Bug 148295 has been marked as a duplicate of this bug. ***
It should be noted that this is a very bad problem on SOME SSL sites, but not all. At least that's how it is for me. Bankofamerica.com and Half.com are unusably slow for me in Mozilla 1.0 on OS X 10.1.5, but Amazon.com's SSL pages seem just as fast as non-SSL pages. A bug that's bad enough to force me to fire up IE at least once a day is a very annoying bug...
More clues. The 6.2.3 RTM build (May 9) which had PSM 2.1, is very fast. The same day's trunk build, with PSM 2.3, downloads SSL pages very slowly. At this in-house test page, it downloads the gifs one at a time, instead of all at once. https://pki.mcom.com/color/color.html. Times for the above URL 6.2.3 - .775 seconds page load time 5/9/02 trunk - 35.52 seconds
Blocks: 150046
*** Bug 147908 has been marked as a duplicate of this bug. ***
*** Bug 150330 has been marked as a duplicate of this bug. ***
Note that in bug 150330, [https://www.etrade.com/] seems to actually hang Mozilla.
I've narrowed it down. These are the dates when Mac SSL slowed way down. All builds are using PSM 2.3. Mac OSX 4-29-branch - fast Mac OSX 4-30-branch - slow Mac OSX 4-19-trunk - fast Mac OSX 4-20-trunk - slow
Summary: SSL connections very slow → Mac-SSL connections very slow
According to this query: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_0_0_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F29%2F2002+08%3A00%3A00&maxdate=04%2F30%2F2002+12%3A00%3A00&cvsroot=%2Fcvsroot The only change to network code was Darin's patch for 93054, the pipelining bug. Darin, do you think your patch could have an unexpected sideeffect on Mac, even when pipelining is disabled? It might worth trying to back out this patch for testing.
i have a hard time believing that the pipelining patch could have caused this regression unless pipelining was somehow enabled. disabling the pref should revert behavior back to the pre-pipelining days. the code path with pipelining disabled is almost identical to the code path without the pipelining patch! pipelining is known to not work well with IIS/4.0 servers (see bug 144442). TD Waterhouse runs IIS/4.0, so i'm not surprised by comment #18. the latest trunk and branch builds include code to disable pipelining when speaking to an IIS/4.0 server. junruh: when you ran those performance tests did you have pipelining enabled?
I have disabled pipelining on Mozilla 1.0 release. Same problem. Server: Apache/1.3.23 (Unix) mod_gzip/1.3.19.1a PHP/4.1.2 mod_perl/1.26 mod_ssl/2.8.6 OpenSSL/0.9.6c The page I test downloaded (intranet page) is nothing more than MRTG/RRD graphs, one right after another. Still takes forver to download.
All testing was done with new a profile for every build, so pipelining was disabled. Circumstancial evidence points to the pipelining code. Code was checked into the trunk on 4/19. The trunk Mac SSL connections slowed the next day. Code was checked into the branch on 4/29. The branch Mac ssl connections slowed the next day.
hmm.. very odd that this would have something to do with the pipelining patch, but it certainly does seem like a strong connection. anyhow, i captured a HTTP log while loading https://pki.mcom.com/color/color.html, and unfortunately i didn't see anything obviously wrong. we use 3 connections to load the page, so we are definitely reusing the connections. pipelining is indeed disabled, and enabling it doesn't seem to do much to improve performance. so pipelining per se doesn't seem to be the problem... something else having to do with my patch must be the culprit. hmm...
i discovered one interesting thing... only the first half of the images on https://pki.mcom.com/color/color.html load slowly. the last half loads very quickly. upon inspection of the log file this seems to correspond to the point at which the initial page load completes. for some reason, the first 8k of the first page loads and then the last ~2k doesn't load until much much later. there's nothing in the HTTP log to indicate why. why is it taking so long to fetch the toplevel document? are there any SSL tools that might help explain this?
Did the pipelining patch add any form of synchronization, to prevent concurrent calls to PR_Read() or PR_Write() that could have been concurrent before? Another possibility is that we're getting hit by Nagle's algorithm.
subsequent page loads are also very fast. the problem seems to only show up on the first page load. hmm... a tcpdump shows the toplevel document not finishing until halfway through the entire page load.
> Did the pipelining patch add any form of synchronization, to prevent concurrent > calls to PR_Read() or PR_Write() that could have been concurrent before? no. in fact, nothing should have changed when pipelining is disabled. when pipelining is enabled we add intermediate data structures, and there is some extra buffering, but none of that should be involved here. the HTTP log confirms that pipelining is being skipped each and every time. > Another possibility is that we're getting hit by Nagle's algorithm. hmm... but why now? we seem to consume all of the data that is in the socket whenever possible. so, the server should be encouraged to send the next packet. unless for some reason we aren't ACK'ing their packets. but now we're talking about the TCP stack. argh!
after staring at the tcpdump for a while, i noticed that it looks like there is a 1 second gap between the client request and the server response for each image. the ACK for the request happens immediately. this seems slow considering how fast the page loads under linux (whole thing in less than a second). and the client ACKs the response immediately as well. so, i doubt Nagle's algorithm is involved here. i'll upload my tcpdump in case anyone else wants to take a look.
another thing about my tcpdump is that ic a lot of these: [bad tcp cksum cef8!] from the client side. that seems bad, but then maybe it's just a red herring.
notice, no bad tcp cksum warnings!
FWIW: mach-o builds under OSX seem fine. this bug only seems to be a problem for the CFM build.
It is not the case for me that only the first page is slow. Every single SSL page in the online banking area of Bankofamerica.com, as well as all of the SSL pages on Half.com are excruciatingly slow. It doesn't change noticably between pages -- they're all equally slow. Where do I get a mach-o build? Do I just need to build it myself? It doesn't look like they exist in the nightly or 1.0 release directories.
In ethereal, I'm seeing a pattern like this: 111 36.040759 <myhostname> onlinebanking-nw.bank TCP 56672 > https [ACK] Seq=<seq num> Ack=<ack num> 112 65.814919 onlinebanking-nw.bank <myhostname> SSLv3 Encrypted Alert That is, my machine sends an ACK to bankofamerica, then about 30 seconds elapse, then BofA sends an encrypted alert and immediately closes the connection. Right after that, my machine makes another connection out and the cycle starts over. It seems like Mozilla is waiting for the remote end to close the connection before it starts getting the next file. And that's taking about 30 seconds. Is it possible that Mozilla is sending "Connection: keep-alive" with that request and then failing to re-use the connection? Am I way off track?
Oops. I also meant to mention that it looks like Mozilla isn't displaying the last bit of the file until that 30 second timeout happens and the connection is closed. One of the images partially displayed, and then after the timeout, the rest of it displayed. I can see from the packet dump that no more data was sent 30 seconds later, and I can see from the browser display that the whole image is there. So it seems that the whole image was sent right away, then the 30 second timeout happens, the connection closes and only then does mozilla stop spinning and display the rest of the image.
Also, I've confirmed that the server in question (onlinebanking-nw.bankofamerica.com) does support keep-alive connections and does time them out after 30 seconds -- the same amount of delay I'm seeing for each connection in my packet dump. If this is the cause, then surely the reason we're only seeing this on some sites is that not all servers allow persistent connections in the first place.
Does disabling keep-alive result in any improvement?
Yes, in fact that completely fixes the problem for me. I had forgotten that that was a settable preference.
I'll just add that disabling persistent connections fixed all of my slow SSL problems as well, on 1.0 release.
when i said that the initial page load was slow and the subsequent page load was faster, i meant that the subsequent load of the same page. for example, reload for me is fast, whereas initial load is very very slow.
ok, so i backed out my pipelining patch and suddenly everything loads quickly again... so, somehow that patch is truly the culprit here. hopefully i'll be able to figure out the connection... -> me
Assignee: javi → darin
Component: Client Library → Networking: HTTP
Product: PSM → Browser
QA Contact: junruh → tever
Target Milestone: --- → mozilla1.0.1
Version: 2.3 → other
Attached patch v1 patchSplinter Review
so i reviewed my checkin and noticed that i had accidentally removed a return rv; from nsHttpTransaction.cpp ... i put it back, and that fixed this bug!!! it actually seems like the missing return rv; should have screwed up other platforms as well. anyhow, this is a really simple fix ... i'm kicking myself for letting this bug slip into the tree :(
look here to see how the error was introduced into the source code: http://makeashorterlink.com/?F2A462C01 (differences between nsHttpTransaction.cpp v1.51 & v1.52) just scroll to the very bottom of the page and look for the call to HandleContent.
Comment on attachment 87437 [details] [diff] [review] v1 patch sr=beard
Attachment #87437 - Flags: superreview+
Comment on attachment 87437 [details] [diff] [review] v1 patch r=sdagley
Attachment #87437 - Flags: review+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 23 years ago
Keywords: mozilla1.0.1
Resolution: --- → FIXED
*** Bug 150271 has been marked as a duplicate of this bug. ***
*** Bug 146950 has been marked as a duplicate of this bug. ***
Verified on the 6/13 trunk. Mac OSX 10.1.5 and 9.2.2. This page, with 100 gifs https://pki.mcom.com/color/color.html downloads in less than a second. Client auth on Mac OSX is still slow, though. Please check this into the branch.
Status: RESOLVED → VERIFIED
Keywords: adt1.0.1
Whiteboard: [adt1 RTM]
*** Bug 111420 has been marked as a duplicate of this bug. ***
Keywords: topembed+
Depends on: 146950
please land on the 1.0.1 braqnch. once there, remove the "mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword.
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch. pls check this into the 1.0 branch, then add the "fixed1.0.1" keyword.
Keywords: adt1.0.1adt1.0.1+
fixed1.0.1
Verified on 6/14 branch.
Keywords: fixed1.0.1
Comment on attachment 87437 [details] [diff] [review] v1 patch Please land this on the 1.0.1 branch. Once there, replace the "mozilla1.0.1+" keyword with the "fixed1.0.1" keyword.
Attachment #87437 - Flags: approval+
Comment on attachment 87437 [details] [diff] [review] v1 patch Never mind. What ever happened to marking the patch as approved? :-)
*** Bug 152917 has been marked as a duplicate of this bug. ***
I confirm that client auth is still exceedingly slow, compare https://www.hilander.com with https://www.hilander.com/ssltest (the latter is the same page as the former, except it requires client authentication). It takes about 45 seconds for the software security module password prompt to appear. Note that this is with pipelining turned off (which did fix the problems I was having with non client-side auth pages).
ahp+mozilla@hilander.com : What do you mean with slow ? You use an old build without this fix ! (mozilla1.0) look in the keywords field : fixed1.0.1 it was checked into the tree at 06/14 (comment #61) and your build ID is 20020611..
client auth slowness is bug 139802.
I'll throw my $0.02 late into the game. Using 2002061908, I spent about 30 minutes going through HTTPS connections. Fast. Way fast now. Suprisingly better than IE 5.1 on OSX. Good job.
*** Bug 153109 has been marked as a duplicate of this bug. ***
*** Bug 147306 has been marked as a duplicate of this bug. ***
*** Bug 156358 has been marked as a duplicate of this bug. ***
*** Bug 158131 has been marked as a duplicate of this bug. ***
*** Bug 153780 has been marked as a duplicate of this bug. ***
*** Bug 160111 has been marked as a duplicate of this bug. ***
removing fixed1.0.1 keyword
Keywords: fixed1.0.1
No longer blocks: 150046
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: