559676 - nsSocketTransportService::DoPollIteration hogging the CPU while uploading a large file to an SSL web server

Reporter

Description

•

15 years ago

Attached file Shark profile — Details

I tried uploading a large file to an SSL server (the file was a sql text file being uploaded to a phpMyAdmin installation on an SSL web server), and I noticed that Firefox is hogging my CPU at 100%. The browser's UI was responsive, and when I tried to load some other pages in other tabs, the network loads for pages like google.com were extremely slow (like about 2-3 minutes), and network loads for other ssl pages (like AMO) were basically not finishing and timing out after some time. I created a shark profile which I'm attaching to this bug. The profile shows 94.9% of the time being spent in nsSocketTransportService::DoPollIteration. As soon as I canceled the page load on the original tab, everything went back to normal.

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Attachment #439367 - Attachment mime type: application/octet-stream → text/plain

Boris Zbarsky [:bzbarsky]

Comment 1

•

15 years ago

Ehsan, can you do an NSPR log with nsSocketTransport:5? Are we ending up in the busy-wait branch under nsSocketTransportService::Poll? It looks like the actual time in the profile is spent locking and unlocking, right?

WADA:World Anti-bad-Duping Agency

Comment 2

•

15 years ago

(In reply to comment #1) > It looks like the actual time in the profile is spent locking and unlocking, right? While I'm looking some ssl modules in order to try to analyze bug 538283, I met next code. It looks that set of ssl_ReleaseXmitBufLock , PR_Sleep(PR_INTERVAL_NO_WAIT), and ssl_GetXmitBufLock(ss) is executed for each data of MAX_FRAGMENT_LENGTH. Can it produce problem like bug 538283? > http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/ssl3con.c#2313 > 2313 ssl3_SendApplicationData(sslSocket *ss, const unsigned char *in, > 2314 PRInt32 len, PRInt32 flags) >(snip) > 2342 while (len > totalSent) { > 2343 PRInt32 sent, toSend; > 2344 > 2345 if (totalSent > 0) { > 2346 /* > 2347 * The thread yield is intended to give the reader thread a > 2348 * chance to get some cycles while the writer thread is in > 2349 * the middle of a large application data write. (See > 2350 * Bugzilla bug 127740, comment #1.) > 2351 */ > 2352 ssl_ReleaseXmitBufLock(ss); > 2353 PR_Sleep(PR_INTERVAL_NO_WAIT); /* PR_Yield(); */ > 2354 ssl_GetXmitBufLock(ss); > 2355 } > 2356 toSend = PR_MIN(len - totalSent, MAX_FRAGMENT_LENGTH); > 2357 sent = ssl3_SendRecord(ss, content_application_data, > 2358 in + totalSent, toSend, flags); >(snip) > 2373 }

(no longer active)

Reporter

Comment 3

•

15 years ago

Attached file nspr log — Details

(In reply to comment #1) > Ehsan, can you do an NSPR log with nsSocketTransport:5? I'm attaching a log that I produced here. Unfortunately I couldn't reproduce the exact problem, but when running the test again, Firefox was using ~50% of the CPU, so it's possible that the 100% CPU usage that I saw before was a combination of this and other stuff going on in other tabs. > Are we ending up in > the busy-wait branch under nsSocketTransportService::Poll? I'm not sure how I can tell that using this log. > It looks like the > actual time in the profile is spent locking and unlocking, right? Yes, that's what seems to be happening as far as I can tell. Comment 2 also seems relevant in this regard, but I don't know that code so I'm not sure.

Boris Zbarsky [:bzbarsky]

Comment 4

•

15 years ago

> I'm not sure how I can tell that using this log. The busy wait branch uses a 25ms timeout, so you could look for "timeout = 25" in the log. Doesn't look like we are.

Boris Zbarsky [:bzbarsky]

Comment 5

•

15 years ago

Yeah, comment 2 seems right on the money. There's got to be a better way to do this! Especially given that per comment 0 this "thread yield" is not succeeding at giving new handshakes a chance to run anyway.

Assignee: nobody → nobody

Component: Networking → Libraries

Product: Core → NSS

QA Contact: networking → libraries

Version: Trunk → unspecified

Boris Zbarsky [:bzbarsky]

Comment 6

•

15 years ago

If there's something the consumer can do to avoid this, by the way, let me know. It seems like if we used a synchronization primitive that tells us whether anyone else is waiting on it we could optimize most of this away in cases when no one else is, right?

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Version: unspecified → 3.12.6

WADA:World Anti-bad-Duping Agency

Comment 7

•

15 years ago

(In reply to comment #5) > Yeah, comment 2 seems right on the money. Boris, Can my comment #2 be a payback for your lecture to me about SGML's entity? :-) Can my comment #2 be apology to you for my noises in some other bugs? :-) (In reply to comment #6) > It seems like if we used a synchronization primitive that tells us whether anyone else is waiting on it we could optimize most of this away in cases when no one else is, right? Some ways; (i) Queue for lock/resource waiting among multiple processes/tasks relevant to some resources. If waitor of something exists in queue, communicate each other, and release lock for other lock requestor if required. (ii) Periodical lock/resource contention check to avoid dead lock or forever-wait, and notification to lock/resoure owner. In this case, (ii) type way is better than (i) because I think simulteneoues read access to same server&port of SSL with new handshaking while uploading is exceptional situation. Another way is probably use of new Socket/new SSL socket if someone already uses SSL connection with same server for different purpose(send or read). Even though SSL has capability of two way communication, I believe SSL doesn't prohibit to use a connection for send only and to use another connection for receive only. (Half is Off-Topic, half is question to architect/developer of Mozilla familiy) As written in the source, reason why current code is used is stated in bug 27740 #1 by Nelson Bolyard (:MisterSSL) in 2002. I was very surprized because such code, which was ordinal during co-operative multi-tasking era such as System 7/Mac OS 8/9, still remains in Mozilla familiy who should be inovative software of muti-processinng/multi-tasking on multi-CPU. I thought this kind of code is used only in deep level of OS code for specific purpose nowadays.

Boris Zbarsky [:bzbarsky]

Comment 8

•

15 years ago

> Can my comment #2 be apology to you for my noises in some other bugs? :-) Yep. ;)

Nelson Bolyard (seldom reads bugmail)

Comment 9

•

15 years ago

Guys, I think you misunderstand the code you see in comment 2. It is NOT a "busy wait". It actually SENDS a record each time through the loop. It merely also looks to see if there is other work to be done between each record sent. If you think this is a "busy wait" causing a 100% PCU loop when no data is being sent, you're barking up the wrong tree.

Boris Zbarsky [:bzbarsky]

Comment 10

•

15 years ago

We don't think that. We do think that this code is managing to saturate a modern processor with the locking/unlocking it does during the send, though. Which is pretty impressive.

WADA:World Anti-bad-Duping Agency

Updated

•

15 years ago

Blocks: 538283

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

15 years ago

Summary: nsSocketTRansportServuce::DoPollIteration hogging the CPU while uploading a large file to an SSL web server → nsSocketTransportService::DoPollIteration hogging the CPU while uploading a large file to an SSL web server

Nelson Bolyard (seldom reads bugmail)

Comment 11

•

15 years ago

You're sure it's the locking and unlocking, and not the hashing and encrypting that's using up the CPU? Recall that the reason the unlocking and locking was put there was to mitigate hogging the CPU that occurred before it was added. My guess is that removing the yield altogether will result in worse hogging. We could yield less often, say, once every 10 times through the loop, and we could yield for more time (than zero) when we do. Let me suggest you play with it and try some different things.

Boris Zbarsky [:bzbarsky]

Comment 12

•

15 years ago

> You're sure it's the locking and unlocking, and not the hashing and encrypting > that's using up the CPU? You _did_ read the profile information attached in comment 0, right? It's all lock access, with no hashing/encrypting in sight... On the other hand, if the hashing/encrypting is done on a separate thread, then it might not show up in that export depending on the profiling options used. Ehsan?

(no longer active)

Reporter

Comment 13

•

15 years ago

(In reply to comment #12) > On the other hand, if the hashing/encrypting is done on a separate thread, then > it might not show up in that export depending on the profiling options used. > Ehsan? I used the default shark profiling options. The profile seems to be captured from a single thread though (not the main thread, FWIW). Is there anything else that I should have done?

WADA:World Anti-bad-Duping Agency

Comment 14

•

15 years ago

Attached file Win's Performance Counter log of Thunderbird threads for SMTP sending — Details

Performance Counter log for next(interval=1 sec): - Processor utilization (Core2 Duo. So it should be doubled upon evaluation) - TCP Segments Receive, Send - Thunderbird's total CPU utilization - Thunderbird's CPU utilization of each Thread (Test) (0) No TCP connection by Thunderbird occurs except with a SMTP by (1) or (2) (1) Send a mail(1.25MB) via Gmil's SMTP(SSL,port=465) (2) Send same mail data via other SMTP(no StartTLS, no SSL/TLS, port=587) (Quick observation) (0) TCP\Connections Established=18 : No Tb/SMTP connection (SeaMonkey uses) TCP\Connections Established=19 : Tb/SMTP connection is added TCP\Segments Sent/sec is very large = Data is sent to SMTP server (1) SSL CPU is consumed by Thread(thunderbird/3) and Thread(thunderbird/8) Tb Total : 20% to 30% thunderbird/3 : 4% to 18%, Priority Currnet=12, sometimes 8/9/10 thunderbird/8 : 3% to 10%, Priority Currnet=11 TCP\Segments Sent/sec is 2000 to 4000 : from 05/12/2010 12:37:42.581 to 05/12/2010 12:38:05.581 (23 sec) TCP\Segments Received/sec during sending is 2000 to 4000 (data passed via loopback is counted?) (2) no SSL, no TLS, no StartTLS CPU is consumed by Thread(thunderbird/3) only Tb Total : 10% to 43% thunderbird/3 : 1% to 14% thunderbird/8 : ZERO TCP\Segments Sent/sec is around 1400 : from 05/12/2010 12:39:03.581 to 05/12/2010 12:39:10.581 (7 sec) TCP\Segments Received/sec during sending is around 400 => Thread(thunderbird/3) : Thread for main window Thread(thunderbird/8) : Thread for SSL processing Note: Tb uses a set of loopback conection when SSL is used. It's probably for communication between thunderbird/3 and thunderbird/8. (Over all observaton) If SSL is used, "rougly 10% higher CPU utilization" continues "roughly 3 times longer". Next may be a reason of non-responsiveness. > thunderbird/3 : 4% to 18%, Priority Currnet=12, sometimes lowered to 8/9/10 > thunderbird/8 : 3% to 10%, Priority Currnet=11 Because total CPU power required to send data using SSL is never reduced by adding unlock/sleep/lock, adding unlock/sleep/lock merely increases overheads. It's merely decreases displayed value of CPU utilization by reducing CPU power consumption speed. I think there is no need to bypass primitive OS's scheduling problme by application's frequent sleep request any more, because OS we supports are alredy has minimum scheduler of "Multilevel Feedback Queue". > http://en.wikipedia.org/wiki/Scheduling_(computing)#Summary I think solution of CPU hog + nonresponsiveness due to large CPU power requirement in short period shouldn't be application's frequent sleep request. I think it should be clever thread separation and correct thread priority setting(within an application, among application under same OS), and efficient smaller task scheduling/dispatching mechanism in an application if OS doesn't have such capability. Scheduling problem or contention problem between "send task" and "receive task" looks different problem from original CPU hog issue, because problem sounds still existent even after insertion of unlock/sleep/lock.

Nelson Bolyard (seldom reads bugmail)

Comment 15

•

15 years ago

The function name ssl3_SendApplicationData doesn't appear anywhere in the shark output attached to comment 0. All the PR_Lock and PR_Unlock calls are coming from PSM, not NSS, with one exception, which is NOT in ssl3_SendApplicationData. The only NSS function seen anywhere in that output is ssl_Poll, which is called by PR_Poll, which is called by PSM. In short, the evidence does not support the conclusion that the code in ssl3_SendApplicationData is in any way relevant to whatever problem you're seeing.

WADA:World Anti-bad-Duping Agency

Comment 16

•

15 years ago

(In reply to comment #15) > In short, the evidence does not support the conclusion that the code in > ssl3_SendApplicationData is in any way relevant to whatever problem you're > seeing. I understand it well. Attached data is never evidence of whole of this bug and is never evidence of ssl3_SendApplicationData is culprit of this bug. However; 1. CPU 100% is involved in original comment #0, and CPU 50% with NSPR logging is reported in this bug. NSPR logging effectively reduces CPU power consumption rate by SSL, by producing "no activity period of SSL". 2. There really are bug reports of CPU 100% of Tb with SSL(Bug 538283, Bug 544054) and unresponsiveness of Thunderbird and other applications. These are phenomenon even when data is simply passed to SMTP server or IMAP server which uses SSL. And, "CPU utilization of 100%/2" and "doubled data upload time with NSPR logging enabled" was also reported in Bug 538283. 3. Atached log is for phenomenon of 2. Total CPU power requirement when SSL is used(CPU utilization while data sending * time to take send whole data with SSL) was far higher than my guess. 4. Needless to say, apparently, adding unlock/sleep/lock merely increases overhead from view point of total CPU power usage when SSL is used. It merely reduces value displayed as CPU utilization, although it can protect from unresponsiveness of other Mozilla components and/or other applications on same PC, by reducing CPU power consumption rate by SSL. 5. I think unlock/lock is expensive work especially when milti-processing, milti-tasking, in multi-CPU environment. Please note that co-operative multi-tasking on single-CPU is never real multi-tasking environment from view point of resource locking. It's almost same as resource locking in non-multi-tasking on single-CPU. And, in the case, resuorce locking can be simply storing flag at a common memory location. Is there no possibility that "sleep of longer time" is similar to "enable NSPR logging"? I think next are better to be analyzed again; (i) What is cause of original CPU 100%/CPU hog/unresponsiveness of Mozilla (and other applications) when you added unlock/sleep/lock. (ii) Why other "handshaking request for receive" is not scheduled as you say in comment of source code and/or in bug 127740, even though you added unlock/sleep/lock as solution of the problem.

Boris Zbarsky [:bzbarsky]

Comment 17

•

15 years ago

> The function name ssl3_SendApplicationData doesn't appear anywhere in the > shark output That can actually happen depending on what the compiler does in terms of inlining and exported symbols, but it may well be that some other set of locks is the issue here. Ehsan, since you have the actual non-flat shark profile, can you look at the per-line blame for the PSM functions involved and see whether they're taking their own locks or calling to nss code that takes locks?

(no longer active)

Reporter

Comment 18

•

15 years ago

(In reply to comment #17) > > The function name ssl3_SendApplicationData doesn't appear anywhere in the > > shark output > > That can actually happen depending on what the compiler does in terms of > inlining and exported symbols, but it may well be that some other set of locks > is the issue here. Ehsan, since you have the actual non-flat shark profile, > can you look at the per-line blame for the PSM functions involved and see > whether they're taking their own locks or calling to nss code that takes locks? Unfortunately, I didn't save the full shark profile. :( That said, I used this form as a test case, so maybe someone who knows where to look can use it to see if they can reproduce the problem: https://people.mozilla.com/~eakhgari/dummyform.html The uncompressed form of attachment 444418 [details] should be way big to test this bug with the above URL.

Nelson Bolyard (seldom reads bugmail)

Comment 19

•

15 years ago

The details in the shark report attached to comment 0 clearly show that 63.5% of the time is being spent BUSY WAITING using PR_Poll, and only 26.7% of the time is being spent productively working on sockets. The code path about which comment 2 complains is in the productive part. I suggest that you focus on the busy wait. The base of the busy wait loop starts here: http://mxr.mozilla.org/mozilla1.9.2/source/netwerk/base/src/nsSocketTransportService2.cpp#524 See lines 538-539. Now, it's possible that you may eventually end up back in NSS code, such as in function ssl_Poll https://mxr.mozilla.org/security/source/security/nss/lib/ssl/sslsock.c#1726 because that code IS getting called within the busy wait loop, but if you do, at least you'll be looking at code that is potentially relevant to the problem.

WADA:World Anti-bad-Duping Agency

Comment 20

•

15 years ago

(In reply to comment #19) > 63.5% of the time is being spent BUSY WAITING using PR_Poll, and only > 26.7% of the time is being spent productively working on sockets. > The code path about which comment 2 complains is in the productive part. > I suggest that you focus on the busy wait. What I could observe in comment #14 looks "26.7% by poroductive work" part only. How can we know the "63.5% of BUSY WAITING" is involved in CPU 100% problem of bug 538283 with mail server(SMTP and IMAP)? Is there logging feature(such as NSPR logging) for SSL which we Thunderbird user can use? Can we force condition of "receive task is waiting for someting to handshake" while uploading large mail data to IMAP server or SMTP server?

Boris Zbarsky [:bzbarsky]

Comment 21

•

15 years ago

> https://people.mozilla.com/~eakhgari/dummyform.html I just tried that. My CPU usage for the Firefox process stayed at 10% during the upload.... Now maybe it was just gating on my slow (768Kb/s) upload bandwidth? My profile of that, in any case, shows a bunch of time painting, about 4% actually encrypting, not much locking at all. Moving to Core:General until we have more data.

Assignee: nobody → nobody

Component: Libraries → General

Product: NSS → Core

QA Contact: libraries → general

Version: 3.12.6 → Trunk

Damon Sicore (:damons)

Comment 22

•

15 years ago

Based on Comment 21 above, it looks like we need more info before we can decide to block on this. Clearing nom for now, but please re-nom when we have data that supports blocking.

blocking2.0: ? → ---

WADA:World Anti-bad-Duping Agency

Comment 23

•

15 years ago

CPU 100% due to retransmission of packet like bug 541367? In that case, timeout or error due to max retry count occurred. Cause was found router's bug. See bug 541367 comment #17. Similar phenomenon is reported to bug 566548, and analysis of problem is in progress. If retransmission is successful, session error doesn't occur. But if retransmission occurs frequently, CPU 100% due to retransmission can happen.

WADA:World Anti-bad-Duping Agency

Comment 24

•

15 years ago

Ehsan Akhgari [:ehsan](bug opner), can you check with network.tcp.sendbuffer = 32768, 65536, 262144 etc.?(default is 131072, See bug 566548, please) By the way, when you get nspr log next time, add timestamp to log, please. > nspr_log_modules=timestamp,xxx:5

(no longer active)

Reporter

Comment 25

•

15 years ago

Like I said in comment 3, I wasn't able to reproduce this problem after the initial report... Maybe our QA folks can help in reproducing this problem?

Keywords: qawanted

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

13 years ago

No longer blocks: 538283

Tetsuro Kato (tete)

Comment 26

•

10 years ago

I recreated this on Firefox 32.0 candidates build1 (win32) when I was uploading a file to VirusTotal. This did not occur on Firefox 31.0.

Wayne Mery (:wsmwk)

Updated

•

10 years ago

Comment 27

•

9 years ago

if the regressor of 1083930 is backed out this is worksforme.. given that this bug is older than that regression I think WFM is the right disposition. We'll fix the current problem in 1083930

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

Shark profile 15 years ago (no longer active) 25.95 KB, text/plain		Details
nspr log 15 years ago (no longer active) 174.45 KB, application/x-bzip2		Details
Win's Performance Counter log of Thunderbird threads for SMTP sending 15 years ago WADA:World Anti-bad-Duping Agency 112.19 KB, text/plain		Details