Closed
Bug 153769
Opened 22 years ago
Closed 22 years ago
[PATCH] Race condition causing connection timeout when using pop3s (pop3 with ssl)
Categories
(Core Graveyard :: Security: UI, defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
Future
People
(Reporter: michael, Assigned: KaiE)
References
Details
Attachments
(1 file)
3.17 KB,
patch
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529 BuildID: 2002052918 When using using pop3s (connection to server on the local machine) the connection will more often than not time out while performing the login stages (it fails about 9 times out of 10). Reproducible: Sometimes Steps to Reproduce: 1. rm -rf .mozilla 2. Start mozilla, set up new mail account with secure connection (SSL) 3. Attemp to download emails 4. If successful, repeat step 3 Actual Results: Mozilla is apparently trying to connect to the pop3s server. It won't connect to anything else (pop3 or http). Eventually, the pop3s connection attempt times out. Expected Results: Connected to the pop3s server, checked for new mail and collected any that were waiting. In nsSSLIOLayerWrite (security/manager/ssl/src/nsNSSIOLayer.cpp) the socket is put into blocking mode for the first write (even if it has been set as a non-blocking socket). If a read occurs at the wrong time, it will block. (At the higher levels, reads always request more data than is available so the low level read will fetch the data that is there, then go round again to see if there is any more - this is where it is blocking.) I have got round this by adding a lock to nsNSSSocketInfo which is grabbed by the nsSSLIOLayerRead function (always) and the nsSSLIOLayerWrite function (only on the first write where it is changing the socket to blocking mode and only if was previously in non-blocking mode). I have no idea if this is the Right Thing (but it works for me :). I will attach the patch for this in case it is of any interest. (The files affected are nsNSSIOLayer.cpp and nsNSSIOLayer.h) As an aside, if anyone adds a comment to this and knows how to get hold of the output of the PR_LOG( .. ) calls I would find this very helpful.
Reporter | ||
Comment 1•22 years ago
|
||
Reporter | ||
Comment 2•22 years ago
|
||
Knew I'd forgotten something. The patch is for 1.0. Previous 0.9.x versions and 1.1a also display the same behaviour.
Comment 3•22 years ago
|
||
>POP
Assignee: mstoltz → naving
Component: Security: General → Networking: POP
QA Contact: junruh → sheelar
Comment 4•22 years ago
|
||
Marking NEW for patch loving. Reporter please email one of the owners of the module http://www.mozilla.org/owners.html and cc: reviewers@mozilla.org asking for a review and super review of that patch. Thanks again for the patch.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Race condition causing connection timeout when using pop3s (pop3 with ssl) → [PATCH] Race condition causing connection timeout when using pop3s (pop3 with ssl)
Assignee | ||
Comment 5•22 years ago
|
||
Michael, thanks for the patch, but in the future, please use "diff -u" to produce unified patches, they are easier to read and apply, and are usually used for patches attached in bugzilla. The patch itself looks good, but I want to have a closer look for potential side effects or deadlocks, before I give the reviewed attribute.
Comment 6•22 years ago
|
||
This bug should go to the owner of this code. Who owns this code ? looks like javi@netscape.com reassigning to javi.
Assignee: naving → javi
Comment 7•22 years ago
|
||
assigning to Kai.
Assignee: javi → kaie
Priority: -- → P2
Target Milestone: --- → Future
Assignee | ||
Comment 8•22 years ago
|
||
See also bug 87902. Before I review the patch, I want to review the SSL I/O layer code again.
Assignee | ||
Comment 9•22 years ago
|
||
over to PSM
Component: Networking: POP → Client Library
Keywords: nsbeta1
Product: MailNews → PSM
QA Contact: sheelar → junruh
Version: other → unspecified
Assignee | ||
Comment 10•22 years ago
|
||
Michael, first let me say, I've been using POP3/SSL on Linux every day for about a year for my private e-mail account, and I have never run into the problem you describe. I tried to understand what you are doing with your patch. Bascially one can describe Mozilla's I/O as layered functions, that generally pass read requests through to lower layer to do the real work, and can add their own special handling on certain events. Layer 1: Necko and Mozilla's network thread Layer 2: Mozilla's PSM logic to encapsulate security I/O logic from the rest of the Mozilla code. Layer 3: The NSS library which is a general purpose library to do SSL communication. In layer 2, we have a read and a write function. Normally, I/O is asynchronous. But still I assume that Layer 1 would never use multithreading to call Layer2Read and Layer2Write at the same time. On the first write in Layer2Write, we switch to blocked I/O to detect some special failure. The lock you added the following locks: Layer2Read { lock call Layer3Read unlock } Layer2Write { lock switch to blocking mode if necessary call Layer3Write do some error checks and data structure updates unlock } I can not understand how this could have helped. I only understood, if during our blocking call to Layer3Write, Layer 1 would be reached and executed a call to Layer2Read. I would assume that's not possible. My only guess is that your change caused a different timing. I see that Layer 2 (the code you were trying to change) is using obsolete workarounds, that I did not fully understand. Those workarounds included writing null bytes to trigger other activity. Actually, when I tried to change that workaround in an attempt to fix bug 87902, I broke the functionality. So I consider the existing code is broken. I have created a patch to bug 87902 that removes the workarounds and changes the code to a simpler logic. Our tests show, that SSL communication with the latest patch from 87902 still works. I will check in 87902 today, and I hope everything will continue to work. Please try out the patch from bug 87902 or just test tomorrow's trunk build and let us know whether you still see your problem. Before I check in I will make a final test with my patch and POP3/SSL, which I realized I have not yet tested.
Reporter | ||
Comment 11•22 years ago
|
||
Thanks for your detailed response, Kai. I tried the patch (on 1.1a) and also the 20020815 build but I still have the same problem. My assumption is that reads and writes are being handled on separate threads with the write thread being the one passing through the POP3 protocol state machine and the read thread waiting on select and sending a message to the write thread when data had been collected and put into the pipe. The order of events would then be: 1) Layer2Write puts socket in blocking mode 2) Layer2Write calls Layer3Write 3) Server responds and data becomes available to read 4) Read thread is woken up, calls down to Layer3Read and blocks (because read is always called with a request for more data than is available) 5) Layer3Write returns and Layer2Write puts socket back into non-blocking mode. 6) Read thread is now stuck in a blocking read so no message gets back to the write thread that data is available to process. The function of the lock was to hold up the read until the write had put the socket back into non-blocking mode. This would effectively be the situation that you describe with Layer 1 being responsible for setting off reads and writes at the same time. I have now managed to run the code in a debugger (with break points on the Layer2Write and Layer2Read) and when stepping over the call to Layer3Write, there is another thread that then hits the breakpoint in Layer2Read but I haven't managed to work out exactly how/where the two threads are being set up. Unfortunately, I don't have a remote POP3 account that can be accessed over SSL so I can't test this but I guess that that it would be OK and I assume, from your experience, that this isn't a very big problem.
Assignee | ||
Comment 12•22 years ago
|
||
Michael, you may want to track bug 163605. I have hope that bug will fix your problem.
Reporter | ||
Comment 13•22 years ago
|
||
Thanks for keeping me in touch. I have applied your patch from bug 163605 (v5) to Mozilla 1.2a (which displayed the same behaviour as previous versions) and the problem has gone away. Thankyou.
Assignee | ||
Comment 14•22 years ago
|
||
This should be fixed starting with tomorrow's build, by the patch for bug 163605. Please reopen if you still can reproduce the problem.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•8 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•