Closed Bug 147979 Opened 22 years ago Closed 22 years ago

Mac-SSL connections very slow

Categories

(Core :: Networking: HTTP, defect, P2)

PowerPC
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: johne, Assigned: darin.moz)

References

()

Details

(Keywords: regression, topembed+, Whiteboard: [adt1 RTM])

Attachments

(4 files)

All https:// connections I've used on MacOS X have been painfully slow.  It can
take in excess of 30 seconds to display pages.  Works just fine from IE 5.1.

MacOS X 10.1.4
Build ID: 2002052305 (rc3)

The mac is a PowerBook G4 800MHz, just a few days old so there shouldn't be any
cruft getting in the way.
do you have HTTP pipelining enabled?  if so, does disabling it improve performance?
Pipelining was not enabled.  I have enabled it, totally quit out of Mozilla, and
restarted.  There is no perceptual difference, things are still quite slow for
https: connections.

Doing things like banking sessions are still increadibly slow
(http://www.commonwealthcu.org/ , will automatically switch to https: connection).

Any other debugging you want me to do?
John, try using a new Mozilla user profile. Let us know if that behaves any
differently.
-> PSM
Assignee: darin → ssaux
Component: Networking: HTTP → Client Library
Product: Browser → PSM
QA Contact: tever → junruh
Version: other → 2.0
I couldn't immediatly find a way in preferences to create a new user profile.  I
faked it by creating a whole new user account on the mac and trying mozilla from
there.

Same slowness.

While viewing http://www.commonwealthcu.org/ (switches to https)

Under the new account, the normal "You're switching to an encrypted page" dialog
came up.  However, it came up 'slowly' as well, after about 5 seconds and half
the page was displayed.  Don't know if this means anything.

While watching `top`, mozilla is not CPU bound (1-4%).

Sorry, I've only had the mac for a week.  I don't know how to do the normal
profiling and what not.  :(
*** Bug 143483 has been marked as a duplicate of this bug. ***
See also bug 146950.
Confirming on 6/3 Mac OSX trunk build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
*** Bug 148854 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
Version: 2.0 → 2.3
javi, can you take a look at the TCP trace? Any idea what could be the matter on
Mac?
Assignee: ssaux → javi
*** Bug 148243 has been marked as a duplicate of this bug. ***
*** Bug 148311 has been marked as a duplicate of this bug. ***
I looked through the first two SSL connections in the TCP trace.

The first connection was a full handshake:

0.135s between the server's SYN ACK and the client's first handshake record
0.048s for the server to reply with its cert
2.985s for the client to respond to the server's cert
0.085s for the server to respond
0.006s for the client to respond
0.049s for the server to send its first data record

The second connection was a resume:

0.052s between the server's SYN ACK and the client's first handshake record
0.032s for the server to reply (no cert in record)
0.420s for the client to respond
0.047s for the server to send its first data record

I timed a connection with my watch.

Entering "www.commonwealthcu.org" in the address bar and hit return.

40 seconds for page to display.
70 seconds for the Mozilla logo to stop spinning.

The bottom of the page, in the status bar, says

Document: Done (65.798 secs)
Keywords: regression
*** Bug 115072 has been marked as a duplicate of this bug. ***
i've actually experienced a worsening of this problem since 0.9.9. i just got
1.0 (Build ID: 2002052918) and turned on pipelining, and the problem is the
worst i've ever seen it. it was still pretty bad before i turned on pipelining,
though.

i'm using Mac OS 10.1.5 on a 400MHz G4.
ok, i just did a comparison with pipelining on and pipelining off to test SSL
slowness.

try

https://webbanking.waterhousebank.com/

with pipelining and then without.

i guess this could be a server problem rather than a browser problem? it seems
like there might be too many caveats for pipelining to make it a truly primetime
feature. is it worth filing another bug?
*** Bug 148295 has been marked as a duplicate of this bug. ***
It should be noted that this is a very bad problem on SOME SSL sites, but not
all. At least that's how it is for me. Bankofamerica.com and Half.com are
unusably slow for me in Mozilla 1.0 on OS X 10.1.5, but Amazon.com's SSL pages
seem just as fast as non-SSL pages.

A bug that's bad enough to force me to fire up IE at least once a day is a very
annoying bug...
More clues. The 6.2.3 RTM build (May 9) which had PSM 2.1, is very fast. The 
same day's trunk build, with PSM 2.3, downloads SSL pages very slowly. At this 
in-house test page, it downloads the gifs one at a time, instead of all at once. 
https://pki.mcom.com/color/color.html.
Times for the above URL
6.2.3 - .775 seconds page load time
5/9/02 trunk - 35.52 seconds
Blocks: 150046
*** Bug 147908 has been marked as a duplicate of this bug. ***
*** Bug 150330 has been marked as a duplicate of this bug. ***
Note that in bug 150330, [https://www.etrade.com/] seems to actually hang Mozilla.
I've narrowed it down. These are the dates when Mac SSL slowed way down. All 
builds are using PSM 2.3.
Mac OSX 4-29-branch - fast
Mac OSX 4-30-branch - slow

Mac OSX 4-19-trunk - fast
Mac OSX 4-20-trunk - slow
Summary: SSL connections very slow → Mac-SSL connections very slow
According to this query:
 
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_0_0_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F29%2F2002+08%3A00%3A00&maxdate=04%2F30%2F2002+12%3A00%3A00&cvsroot=%2Fcvsroot

The only change to network code was Darin's patch for 93054, the pipelining bug.
Darin, do you think your patch could have an unexpected sideeffect on Mac, even
when pipelining is disabled?

It might worth trying to back out this patch for testing.
i have a hard time believing that the pipelining patch could have caused this
regression unless pipelining was somehow enabled.  disabling the pref should
revert behavior back to the pre-pipelining days.  the code path with pipelining
disabled is almost identical to the code path without the pipelining patch!

pipelining is known to not work well with IIS/4.0 servers (see bug 144442).  TD
Waterhouse runs IIS/4.0, so i'm not surprised by comment #18.  the latest trunk
and branch builds include code to disable pipelining when speaking to an IIS/4.0
server.

junruh: when you ran those performance tests did you have pipelining enabled?
I have disabled pipelining on Mozilla 1.0 release.  Same problem.

Server: Apache/1.3.23 (Unix) mod_gzip/1.3.19.1a PHP/4.1.2 mod_perl/1.26
mod_ssl/2.8.6 OpenSSL/0.9.6c

The page I test downloaded (intranet page) is nothing more than MRTG/RRD graphs,
one right after another.  Still takes forver to download.
All testing was done with new a profile for every build, so pipelining was 
disabled. Circumstancial evidence points to the pipelining code. Code was 
checked into the trunk on 4/19. The trunk Mac SSL connections slowed the next 
day. Code was checked into the branch on 4/29. The branch Mac ssl connections 
slowed the next day.
hmm.. very odd that this would have something to do with the pipelining patch,
but it certainly does seem like a strong connection.  anyhow, i captured a HTTP
log while loading https://pki.mcom.com/color/color.html, and unfortunately i
didn't see anything obviously wrong.  we use 3 connections to load the page, so
we are definitely reusing the connections.  pipelining is indeed disabled, and
enabling it doesn't seem to do much to improve performance.  so pipelining per
se doesn't seem to be the problem... something else having to do with my patch
must be the culprit.  hmm...
i discovered one interesting thing... only the first half of the images on
https://pki.mcom.com/color/color.html load slowly.  the last half loads very
quickly.  upon inspection of the log file this seems to correspond to the point
at which the initial page load completes.  for some reason, the first 8k of the
first page loads and then the last ~2k doesn't load until much much later. 
there's nothing in the HTTP log to indicate why.

why is it taking so long to fetch the toplevel document?  are there any SSL
tools that might help explain this?
Did the pipelining patch add any form of synchronization, to prevent concurrent
calls to PR_Read() or PR_Write() that could have been concurrent before?

Another possibility is that we're getting hit by Nagle's algorithm.
subsequent page loads are also very fast.  the problem seems to only show up on
the first page load.  hmm...

a tcpdump shows the toplevel document not finishing until halfway through the
entire page load.
> Did the pipelining patch add any form of synchronization, to prevent concurrent
> calls to PR_Read() or PR_Write() that could have been concurrent before?

no.  in fact, nothing should have changed when pipelining is disabled.  when
pipelining is enabled we add intermediate data structures, and there is some
extra buffering, but none of that should be involved here.  the HTTP log
confirms that pipelining is being skipped each and every time.

> Another possibility is that we're getting hit by Nagle's algorithm.

hmm... but why now?  we seem to consume all of the data that is in the socket
whenever possible.  so, the server should be encouraged to send the next packet.
 unless for some reason we aren't ACK'ing their packets.  but now we're talking
about the TCP stack.  argh!
after staring at the tcpdump for a while, i noticed that it looks like there is
a 1 second gap between the client request and the server response for each
image.  the ACK for the request happens immediately.  this seems slow
considering how fast the page loads under linux (whole thing in less than a
second).  and the client ACKs the response immediately as well.  so, i doubt
Nagle's algorithm is involved here.

i'll upload my tcpdump in case anyone else wants to take a look.
another thing about my tcpdump is that ic a lot of these:

  [bad tcp cksum cef8!]

from the client side.  that seems bad, but then maybe it's just a red herring.
notice, no bad tcp cksum warnings!
FWIW: mach-o builds under OSX seem fine.  this bug only seems to be a problem
for the CFM build.
It is not the case for me that only the first page is slow. Every single SSL
page in the online banking area of Bankofamerica.com, as well as all of the SSL
pages on Half.com are excruciatingly slow. It doesn't change noticably between
pages -- they're all equally slow.

Where do I get a mach-o build? Do I just need to build it myself? It doesn't
look like they exist in the nightly or 1.0 release directories.
In ethereal, I'm seeing a pattern like this:

111 36.040759 <myhostname>          onlinebanking-nw.bank TCP    56672 > https
[ACK] Seq=<seq num> Ack=<ack num>
112 65.814919 onlinebanking-nw.bank <myhostname>          SSLv3  Encrypted Alert


That is, my machine sends an ACK to bankofamerica, then about 30 seconds elapse,
then BofA sends an encrypted alert and immediately closes the connection. Right
after that, my machine makes another connection out and the cycle starts over. 

It seems like Mozilla is waiting for the remote end to close the connection
before it starts getting the next file. And that's taking about 30 seconds. Is
it possible that Mozilla is sending "Connection: keep-alive" with that request
and then failing to re-use the connection? Am I way off track?
Oops. I also meant to mention that it looks like Mozilla isn't displaying the
last bit of the file until that 30 second timeout happens and the connection is
closed. One of the images partially displayed, and then after the timeout, the
rest of it displayed. I can see from the packet dump that no more data was sent
30 seconds later, and I can see from the browser display that the whole image is
there. So it seems that the whole image was sent right away, then the 30 second
timeout happens, the connection closes and only then does mozilla stop spinning
and display the rest of the image.
Also, I've confirmed that the server in question
(onlinebanking-nw.bankofamerica.com) does support keep-alive connections and
does time them out after 30 seconds -- the same amount of delay I'm seeing for
each connection in my packet dump.

If this is the cause, then surely the reason we're only seeing this on some
sites is that not all servers allow persistent connections in the first place.
Does disabling keep-alive result in any improvement?
Yes, in fact that completely fixes the problem for me.

I had forgotten that that was a settable preference.
I'll just add that disabling persistent connections fixed all of my slow SSL
problems as well, on 1.0 release.
when i said that the initial page load was slow and the subsequent page load was
faster, i meant that the subsequent load of the same page.  for example, reload
for me is fast, whereas initial load is very very slow.
ok, so i backed out my pipelining patch and suddenly everything loads quickly
again... so, somehow that patch is truly the culprit here.  hopefully i'll be
able to figure out the connection...

-> me
Assignee: javi → darin
Component: Client Library → Networking: HTTP
Product: PSM → Browser
QA Contact: junruh → tever
Target Milestone: --- → mozilla1.0.1
Version: 2.3 → other
Attached patch v1 patchSplinter Review
so i reviewed my checkin and noticed that i had accidentally removed a return
rv; from nsHttpTransaction.cpp ... i put it back, and that fixed this bug!!!

it actually seems like the missing return rv; should have screwed up other
platforms as well.  anyhow, this is a really simple fix ... i'm kicking myself
for letting this bug slip into the tree :(
look here to see how the error was introduced into the source code:

http://makeashorterlink.com/?F2A462C01

(differences between nsHttpTransaction.cpp v1.51 & v1.52)

just scroll to the very bottom of the page and look for the call to HandleContent.
Comment on attachment 87437 [details] [diff] [review]
v1 patch

sr=beard
Attachment #87437 - Flags: superreview+
Comment on attachment 87437 [details] [diff] [review]
v1 patch

r=sdagley
Attachment #87437 - Flags: review+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 22 years ago
Keywords: mozilla1.0.1
Resolution: --- → FIXED
*** Bug 150271 has been marked as a duplicate of this bug. ***
*** Bug 146950 has been marked as a duplicate of this bug. ***
Verified on the 6/13 trunk. Mac OSX 10.1.5 and 9.2.2. This page, with 100 gifs 
https://pki.mcom.com/color/color.html downloads in less than a second. Client 
auth on Mac OSX is still slow, though. Please check this into the branch.
Status: RESOLVED → VERIFIED
Keywords: adt1.0.1
Whiteboard: [adt1 RTM]
*** Bug 111420 has been marked as a duplicate of this bug. ***
Keywords: topembed+
Depends on: 146950
please land on the 1.0.1 braqnch. once there, remove the "mozilla1.0.1+" keyword
and add the "fixed1.0.1" keyword.
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch. pls check
this into the 1.0 branch, then add the "fixed1.0.1" keyword.
Keywords: adt1.0.1adt1.0.1+
fixed1.0.1
Verified on 6/14 branch.
Keywords: fixed1.0.1
Comment on attachment 87437 [details] [diff] [review]
v1 patch

Please land this on the 1.0.1 branch.  Once there, replace the
"mozilla1.0.1+" keyword with the "fixed1.0.1" keyword.
Attachment #87437 - Flags: approval+
Comment on attachment 87437 [details] [diff] [review]
v1 patch

Never mind.  What ever happened to marking the patch as approved? :-)
*** Bug 152917 has been marked as a duplicate of this bug. ***
I confirm that client auth is still exceedingly slow, compare
https://www.hilander.com with https://www.hilander.com/ssltest (the latter is
the same page as the former, except it requires client authentication).  It
takes about 45 seconds for the software security module password prompt to
appear.  Note that this is with pipelining turned off (which did fix the
problems I was having with non client-side auth pages).
ahp+mozilla@hilander.com : 
What do you mean with slow ?
You use an old build without this fix ! (mozilla1.0)
look in the keywords field : fixed1.0.1
it was checked into the tree at 06/14 (comment #61) and your build ID is 20020611..
client auth slowness is bug 139802.
I'll throw my $0.02 late into the game.

Using 2002061908, I spent about 30 minutes going through HTTPS connections. 
Fast.  Way fast now.  Suprisingly better than IE 5.1 on OSX.  Good job.
*** Bug 153109 has been marked as a duplicate of this bug. ***
*** Bug 147306 has been marked as a duplicate of this bug. ***
*** Bug 156358 has been marked as a duplicate of this bug. ***
*** Bug 158131 has been marked as a duplicate of this bug. ***
*** Bug 153780 has been marked as a duplicate of this bug. ***
*** Bug 160111 has been marked as a duplicate of this bug. ***
removing fixed1.0.1 keyword
Keywords: fixed1.0.1
No longer blocks: 150046
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: