Closed
Bug 559676
Opened 15 years ago
Closed 9 years ago
nsSocketTransportService::DoPollIteration hogging the CPU while uploading a large file to an SSL web server
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: ehsan.akhgari, Unassigned)
References
Details
(Keywords: perf, qawanted)
Attachments
(3 files)
I tried uploading a large file to an SSL server (the file was a sql text file being uploaded to a phpMyAdmin installation on an SSL web server), and I noticed that Firefox is hogging my CPU at 100%. The browser's UI was responsive, and when I tried to load some other pages in other tabs, the network loads for pages like google.com were extremely slow (like about 2-3 minutes), and network loads for other ssl pages (like AMO) were basically not finishing and timing out after some time.
I created a shark profile which I'm attaching to this bug. The profile shows 94.9% of the time being spent in nsSocketTransportService::DoPollIteration.
As soon as I canceled the page load on the original tab, everything went back to normal.
![]() |
||
Updated•15 years ago
|
Attachment #439367 -
Attachment mime type: application/octet-stream → text/plain
![]() |
||
Comment 1•15 years ago
|
||
Ehsan, can you do an NSPR log with nsSocketTransport:5? Are we ending up in the busy-wait branch under nsSocketTransportService::Poll? It looks like the actual time in the profile is spent locking and unlocking, right?
Comment 2•15 years ago
|
||
(In reply to comment #1)
> It looks like the actual time in the profile is spent locking and unlocking, right?
While I'm looking some ssl modules in order to try to analyze bug 538283, I met next code. It looks that set of ssl_ReleaseXmitBufLock
, PR_Sleep(PR_INTERVAL_NO_WAIT), and ssl_GetXmitBufLock(ss) is executed for each data of MAX_FRAGMENT_LENGTH.
Can it produce problem like bug 538283?
> http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/ssl3con.c#2313
> 2313 ssl3_SendApplicationData(sslSocket *ss, const unsigned char *in,
> 2314 PRInt32 len, PRInt32 flags)
>(snip)
> 2342 while (len > totalSent) {
> 2343 PRInt32 sent, toSend;
> 2344
> 2345 if (totalSent > 0) {
> 2346 /*
> 2347 * The thread yield is intended to give the reader thread a
> 2348 * chance to get some cycles while the writer thread is in
> 2349 * the middle of a large application data write. (See
> 2350 * Bugzilla bug 127740, comment #1.)
> 2351 */
> 2352 ssl_ReleaseXmitBufLock(ss);
> 2353 PR_Sleep(PR_INTERVAL_NO_WAIT); /* PR_Yield(); */
> 2354 ssl_GetXmitBufLock(ss);
> 2355 }
> 2356 toSend = PR_MIN(len - totalSent, MAX_FRAGMENT_LENGTH);
> 2357 sent = ssl3_SendRecord(ss, content_application_data,
> 2358 in + totalSent, toSend, flags);
>(snip)
> 2373 }
Reporter | ||
Comment 3•15 years ago
|
||
(In reply to comment #1)
> Ehsan, can you do an NSPR log with nsSocketTransport:5?
I'm attaching a log that I produced here. Unfortunately I couldn't reproduce the exact problem, but when running the test again, Firefox was using ~50% of the CPU, so it's possible that the 100% CPU usage that I saw before was a combination of this and other stuff going on in other tabs.
> Are we ending up in
> the busy-wait branch under nsSocketTransportService::Poll?
I'm not sure how I can tell that using this log.
> It looks like the
> actual time in the profile is spent locking and unlocking, right?
Yes, that's what seems to be happening as far as I can tell. Comment 2 also seems relevant in this regard, but I don't know that code so I'm not sure.
![]() |
||
Comment 4•15 years ago
|
||
> I'm not sure how I can tell that using this log.
The busy wait branch uses a 25ms timeout, so you could look for "timeout = 25" in the log. Doesn't look like we are.
![]() |
||
Comment 5•15 years ago
|
||
Yeah, comment 2 seems right on the money. There's got to be a better way to do this! Especially given that per comment 0 this "thread yield" is not succeeding at giving new handshakes a chance to run anyway.
Assignee: nobody → nobody
Component: Networking → Libraries
Product: Core → NSS
QA Contact: networking → libraries
Version: Trunk → unspecified
![]() |
||
Comment 6•15 years ago
|
||
If there's something the consumer can do to avoid this, by the way, let me know.
It seems like if we used a synchronization primitive that tells us whether anyone else is waiting on it we could optimize most of this away in cases when no one else is, right?
![]() |
||
Updated•15 years ago
|
Version: unspecified → 3.12.6
Comment 7•15 years ago
|
||
(In reply to comment #5)
> Yeah, comment 2 seems right on the money.
Boris,
Can my comment #2 be a payback for your lecture to me about SGML's entity? :-)
Can my comment #2 be apology to you for my noises in some other bugs? :-)
(In reply to comment #6)
> It seems like if we used a synchronization primitive that tells us whether
anyone else is waiting on it we could optimize most of this away in cases when
no one else is, right?
Some ways;
(i) Queue for lock/resource waiting among multiple processes/tasks relevant to some resources.
If waitor of something exists in queue, communicate each other, and release lock for other lock requestor if required.
(ii) Periodical lock/resource contention check to avoid dead lock or forever-wait, and notification to lock/resoure owner.
In this case, (ii) type way is better than (i) because I think simulteneoues read access to same server&port of SSL with new handshaking while uploading is exceptional situation.
Another way is probably use of new Socket/new SSL socket if someone already uses SSL connection with same server for different purpose(send or read). Even though SSL has capability of two way communication, I believe SSL doesn't prohibit to use a connection for send only and to use another connection for receive only.
(Half is Off-Topic, half is question to architect/developer of Mozilla familiy)
As written in the source, reason why current code is used is stated in bug 27740 #1 by Nelson Bolyard (:MisterSSL) in 2002.
I was very surprized because such code, which was ordinal during co-operative multi-tasking era such as System 7/Mac OS 8/9, still remains in Mozilla familiy who should be inovative software of muti-processinng/multi-tasking on multi-CPU. I thought this kind of code is used only in deep level of OS code for specific purpose nowadays.
![]() |
||
Comment 8•15 years ago
|
||
> Can my comment #2 be apology to you for my noises in some other bugs? :-)
Yep. ;)
Comment 9•15 years ago
|
||
Guys, I think you misunderstand the code you see in comment 2.
It is NOT a "busy wait".
It actually SENDS a record each time through the loop. It merely also looks
to see if there is other work to be done between each record sent.
If you think this is a "busy wait" causing a 100% PCU loop when no data is
being sent, you're barking up the wrong tree.
![]() |
||
Comment 10•15 years ago
|
||
We don't think that. We do think that this code is managing to saturate a modern processor with the locking/unlocking it does during the send, though. Which is pretty impressive.
Updated•15 years ago
|
Summary: nsSocketTRansportServuce::DoPollIteration hogging the CPU while uploading a large file to an SSL web server → nsSocketTransportService::DoPollIteration hogging the CPU while uploading a large file to an SSL web server
Comment 11•15 years ago
|
||
You're sure it's the locking and unlocking, and not the hashing and encrypting
that's using up the CPU?
Recall that the reason the unlocking and locking was put there was to mitigate
hogging the CPU that occurred before it was added.
My guess is that removing the yield altogether will result in worse hogging.
We could yield less often, say, once every 10 times through the loop, and
we could yield for more time (than zero) when we do. Let me suggest you play
with it and try some different things.
![]() |
||
Comment 12•15 years ago
|
||
> You're sure it's the locking and unlocking, and not the hashing and encrypting
> that's using up the CPU?
You _did_ read the profile information attached in comment 0, right? It's all lock access, with no hashing/encrypting in sight...
On the other hand, if the hashing/encrypting is done on a separate thread, then it might not show up in that export depending on the profiling options used. Ehsan?
Reporter | ||
Comment 13•15 years ago
|
||
(In reply to comment #12)
> On the other hand, if the hashing/encrypting is done on a separate thread, then
> it might not show up in that export depending on the profiling options used.
> Ehsan?
I used the default shark profiling options. The profile seems to be captured from a single thread though (not the main thread, FWIW). Is there anything else that I should have done?
Comment 14•15 years ago
|
||
Performance Counter log for next(interval=1 sec):
- Processor utilization (Core2 Duo. So it should be doubled upon evaluation)
- TCP Segments Receive, Send
- Thunderbird's total CPU utilization
- Thunderbird's CPU utilization of each Thread
(Test)
(0) No TCP connection by Thunderbird occurs except with a SMTP by (1) or (2)
(1) Send a mail(1.25MB) via Gmil's SMTP(SSL,port=465)
(2) Send same mail data via other SMTP(no StartTLS, no SSL/TLS, port=587)
(Quick observation)
(0) TCP\Connections Established=18 : No Tb/SMTP connection (SeaMonkey uses)
TCP\Connections Established=19 : Tb/SMTP connection is added
TCP\Segments Sent/sec is very large = Data is sent to SMTP server
(1) SSL
CPU is consumed by Thread(thunderbird/3) and Thread(thunderbird/8)
Tb Total : 20% to 30%
thunderbird/3 : 4% to 18%, Priority Currnet=12, sometimes 8/9/10
thunderbird/8 : 3% to 10%, Priority Currnet=11
TCP\Segments Sent/sec is 2000 to 4000 :
from 05/12/2010 12:37:42.581
to 05/12/2010 12:38:05.581 (23 sec)
TCP\Segments Received/sec during sending is 2000 to 4000
(data passed via loopback is counted?)
(2) no SSL, no TLS, no StartTLS
CPU is consumed by Thread(thunderbird/3) only
Tb Total : 10% to 43%
thunderbird/3 : 1% to 14%
thunderbird/8 : ZERO
TCP\Segments Sent/sec is around 1400 :
from 05/12/2010 12:39:03.581
to 05/12/2010 12:39:10.581 (7 sec)
TCP\Segments Received/sec during sending is around 400
=> Thread(thunderbird/3) : Thread for main window
Thread(thunderbird/8) : Thread for SSL processing
Note: Tb uses a set of loopback conection when SSL is used. It's probably
for communication between thunderbird/3 and thunderbird/8.
(Over all observaton)
If SSL is used, "rougly 10% higher CPU utilization" continues "roughly 3 times longer".
Next may be a reason of non-responsiveness.
> thunderbird/3 : 4% to 18%, Priority Currnet=12, sometimes lowered to 8/9/10
> thunderbird/8 : 3% to 10%, Priority Currnet=11
Because total CPU power required to send data using SSL is never reduced by adding unlock/sleep/lock, adding unlock/sleep/lock merely increases overheads. It's merely decreases displayed value of CPU utilization by reducing CPU power consumption speed.
I think there is no need to bypass primitive OS's scheduling problme by application's frequent sleep request any more, because OS we supports are alredy has minimum scheduler of "Multilevel Feedback Queue".
> http://en.wikipedia.org/wiki/Scheduling_(computing)#Summary
I think solution of CPU hog + nonresponsiveness due to large CPU power requirement in short period shouldn't be application's frequent sleep request. I think it should be clever thread separation and correct thread priority setting(within an application, among application under same OS), and efficient smaller task scheduling/dispatching mechanism in an application if OS doesn't have such capability.
Scheduling problem or contention problem between "send task" and "receive task" looks different problem from original CPU hog issue, because problem sounds still existent even after insertion of unlock/sleep/lock.
Comment 15•15 years ago
|
||
The function name ssl3_SendApplicationData doesn't appear anywhere in the
shark output attached to comment 0. All the PR_Lock and PR_Unlock calls
are coming from PSM, not NSS, with one exception, which is NOT in
ssl3_SendApplicationData. The only NSS function seen anywhere in that
output is ssl_Poll, which is called by PR_Poll, which is called by PSM.
In short, the evidence does not support the conclusion that the code in
ssl3_SendApplicationData is in any way relevant to whatever problem you're
seeing.
Comment 16•15 years ago
|
||
(In reply to comment #15)
> In short, the evidence does not support the conclusion that the code in
> ssl3_SendApplicationData is in any way relevant to whatever problem you're
> seeing.
I understand it well. Attached data is never evidence of whole of this bug and is never evidence of ssl3_SendApplicationData is culprit of this bug.
However;
1. CPU 100% is involved in original comment #0, and CPU 50% with NSPR logging
is reported in this bug. NSPR logging effectively reduces CPU power
consumption rate by SSL, by producing "no activity period of SSL".
2. There really are bug reports of CPU 100% of Tb with SSL(Bug 538283,
Bug 544054) and unresponsiveness of Thunderbird and other applications.
These are phenomenon even when data is simply passed to SMTP server or
IMAP server which uses SSL. And, "CPU utilization of 100%/2" and
"doubled data upload time with NSPR logging enabled" was also reported
in Bug 538283.
3. Atached log is for phenomenon of 2.
Total CPU power requirement when SSL is used(CPU utilization while data
sending * time to take send whole data with SSL) was far higher than my
guess.
4. Needless to say, apparently, adding unlock/sleep/lock merely increases
overhead from view point of total CPU power usage when SSL is used.
It merely reduces value displayed as CPU utilization, although it can
protect from unresponsiveness of other Mozilla components and/or other
applications on same PC, by reducing CPU power consumption rate by SSL.
5. I think unlock/lock is expensive work especially when milti-processing,
milti-tasking, in multi-CPU environment.
Please note that co-operative multi-tasking on single-CPU is never real
multi-tasking environment from view point of resource locking.
It's almost same as resource locking in non-multi-tasking on single-CPU.
And, in the case, resuorce locking can be simply storing flag at a common
memory location.
Is there no possibility that "sleep of longer time" is similar to "enable NSPR logging"?
I think next are better to be analyzed again;
(i) What is cause of original CPU 100%/CPU hog/unresponsiveness of Mozilla
(and other applications) when you added unlock/sleep/lock.
(ii) Why other "handshaking request for receive" is not scheduled as you
say in comment of source code and/or in bug 127740, even though you added
unlock/sleep/lock as solution of the problem.
![]() |
||
Comment 17•15 years ago
|
||
> The function name ssl3_SendApplicationData doesn't appear anywhere in the
> shark output
That can actually happen depending on what the compiler does in terms of inlining and exported symbols, but it may well be that some other set of locks is the issue here. Ehsan, since you have the actual non-flat shark profile, can you look at the per-line blame for the PSM functions involved and see whether they're taking their own locks or calling to nss code that takes locks?
Reporter | ||
Comment 18•15 years ago
|
||
(In reply to comment #17)
> > The function name ssl3_SendApplicationData doesn't appear anywhere in the
> > shark output
>
> That can actually happen depending on what the compiler does in terms of
> inlining and exported symbols, but it may well be that some other set of locks
> is the issue here. Ehsan, since you have the actual non-flat shark profile,
> can you look at the per-line blame for the PSM functions involved and see
> whether they're taking their own locks or calling to nss code that takes locks?
Unfortunately, I didn't save the full shark profile. :(
That said, I used this form as a test case, so maybe someone who knows where to look can use it to see if they can reproduce the problem:
https://people.mozilla.com/~eakhgari/dummyform.html
The uncompressed form of attachment 444418 [details] should be way big to test this bug with the above URL.
Comment 19•15 years ago
|
||
The details in the shark report attached to comment 0 clearly show that
63.5% of the time is being spent BUSY WAITING using PR_Poll, and only
26.7% of the time is being spent productively working on sockets.
The code path about which comment 2 complains is in the productive part.
I suggest that you focus on the busy wait.
The base of the busy wait loop starts here:
http://mxr.mozilla.org/mozilla1.9.2/source/netwerk/base/src/nsSocketTransportService2.cpp#524
See lines 538-539.
Now, it's possible that you may eventually end up back in NSS code, such as
in function ssl_Poll
https://mxr.mozilla.org/security/source/security/nss/lib/ssl/sslsock.c#1726
because that code IS getting called within the busy wait loop, but if you do, at least you'll be looking at code that is potentially relevant to the problem.
Comment 20•15 years ago
|
||
(In reply to comment #19)
> 63.5% of the time is being spent BUSY WAITING using PR_Poll, and only
> 26.7% of the time is being spent productively working on sockets.
> The code path about which comment 2 complains is in the productive part.
> I suggest that you focus on the busy wait.
What I could observe in comment #14 looks "26.7% by poroductive work" part only. How can we know the "63.5% of BUSY WAITING" is involved in CPU 100% problem of bug 538283 with mail server(SMTP and IMAP)? Is there logging feature(such as NSPR logging) for SSL which we Thunderbird user can use? Can we force condition of "receive task is waiting for someting to handshake" while uploading large mail data to IMAP server or SMTP server?
![]() |
||
Comment 21•15 years ago
|
||
> https://people.mozilla.com/~eakhgari/dummyform.html
I just tried that. My CPU usage for the Firefox process stayed at 10% during the upload....
Now maybe it was just gating on my slow (768Kb/s) upload bandwidth?
My profile of that, in any case, shows a bunch of time painting, about 4% actually encrypting, not much locking at all.
Moving to Core:General until we have more data.
Assignee: nobody → nobody
Component: Libraries → General
Product: NSS → Core
QA Contact: libraries → general
Version: 3.12.6 → Trunk
Comment 22•15 years ago
|
||
Based on Comment 21 above, it looks like we need more info before we can decide to block on this. Clearing nom for now, but please re-nom when we have data that supports blocking.
blocking2.0: ? → ---
Comment 23•15 years ago
|
||
CPU 100% due to retransmission of packet like bug 541367?
In that case, timeout or error due to max retry count occurred. Cause was found router's bug. See bug 541367 comment #17. Similar phenomenon is reported to bug 566548, and analysis of problem is in progress.
If retransmission is successful, session error doesn't occur. But if retransmission occurs frequently, CPU 100% due to retransmission can happen.
Comment 24•15 years ago
|
||
Ehsan Akhgari [:ehsan](bug opner), can you check with network.tcp.sendbuffer = 32768, 65536, 262144 etc.?(default is 131072, See bug 566548, please)
By the way, when you get nspr log next time, add timestamp to log, please.
> nspr_log_modules=timestamp,xxx:5
Reporter | ||
Comment 25•15 years ago
|
||
Like I said in comment 3, I wasn't able to reproduce this problem after the initial report...
Maybe our QA folks can help in reproducing this problem?
Keywords: qawanted
Comment 26•10 years ago
|
||
I recreated this on Firefox 32.0 candidates build1 (win32) when I was uploading a file to VirusTotal.
This did not occur on Firefox 31.0.
Comment 27•9 years ago
|
||
if the regressor of 1083930 is backed out this is worksforme.. given that this bug is older than that regression I think WFM is the right disposition. We'll fix the current problem in 1083930
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•