Closed Bug 100353 Opened 23 years ago Closed 23 years ago

Copy sent mail to IMAP servers "Sent" folder appears to hang NS6

Categories

(MailNews Core :: Networking: IMAP, defect)

PowerPC
All
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: scalkins, Assigned: sfraser_bugs)

References

Details

(Keywords: hang, perf, regression, Whiteboard: [PDT+], [ETA 10.03])

Attachments

(6 files, 6 obsolete files)

34.79 KB, application/octet-stream
Details
47.70 KB, application/octet-stream
Details
1.59 KB, patch
Details | Diff | Splinter Review
6.77 KB, text/plain
Details
34.30 KB, text/plain
Details
22.98 KB, patch
Details | Diff | Splinter Review
Saw this on Mac only Mac OSX 2001-09-17-04 0.9.4 Branch (10.x) Mac OS 9.1 2001-09-11-04 Branch (8.x) Steps to repro: 1) Login to Mail with an IMAP account. 2)Set Mail prefs to "Place a copy" of sent mail in the "Sent" folder on yourIMAP server. 3)Compose a new message, and attach either one or both of the files attached on this email. One file, is a bmp file which is 38k, and trhe other file is a .tiff file which is 48k. 4)Send the message Actual results: Mail status dialogs show the mail was sent, but when the "Copying to sent mail folder dialog" appears, it seems to hang mail for 10-15 minutes while this action is being performed. This is exacerbated when you attach both files, because it seems to take 10-15 minutes for each attachment, (20-30 minutes) to copy the file to the sent mail folder on the server. NOTE: Nominating nsbranch nsenterprise because most users will be led to believe the app has hung because you can't launch browser or do anything else till the action completes. They will force quit the app rather than wait this long. This gives a bad impression to users of our product. The .tiff and bmp files are not uncommon, and we saw the same behavior with .jpg files. Copying the files to the "Sent" mail folder locally does not exhibit this behavior, and is quick.(A few seconds)
Keywords: perf
P.S. When the action finally completes, the mail is copied to the sent mail folder. The recipient also does get the message with attachments.
There was bug logged for Mac before. I need to find out that this is dup bug or another regression again
Keywords: regression
Karen, There is bug 88489 but that is for large attachments. These are small files, so I logged a different bug.
Keywords: hang
this is bug needs to be fixed or users will hang up everytime they send a message that has an attachment with a size around 28kb. I tested this sending (1) .gif file 28kb as an attachment, it hangs the app. I tested this sending (1) .jpeg 4kb, it hangs the app. I tested this doing a Send Page (our home page) it hangs the app. In all of these cases I waited 1 hr then did a force quit. It fails only at saving to the IMAP Sent folder. If I use the Local folder it finished sending and saving within 15 seconds. On OS9.1 these same tests (copy to IMAP) takes 2-3 minutes which is bad, but not as bad as OSX. Somebody please bump this up to a "+", the work around forces users to change their way of organizing their messages.
Cavin, can you take a look at this tomorrow? Sounds like a stopper to me. nsBranch+.
Keywords: nsbranchnsbranch+
Target Milestone: --- → mozilla0.9.5
re-assinging to cavin for real this time. Cavin, can you take a look at this tomorrow for a possible branch fix?
Assignee: mscott → cavin
I am looking at it...
Here is what I have observed: I am using a PPP connection at 26K, I am generating a message that include the tiff attachment. When I sent it, SMTP take about 15 Sec, then the copy start but after 1-2 sec, the UI froze for about 10 secs and then the copy continue for about 8 secs, sometime more. The UI freeze occurs when we write the data to the output stream in nsImapProtocol::SendData. However, if I turn on PR_Log for nsSocketTransport, the UI freeze disappears!
Attached file IMAP Log
I added some debug code into nsImapProtocol::SendData to demonstrate how long nsIOutputStream::Write blocks. the interresting lines are: 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133722 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 36 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133723 [1] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133724 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133726 [2] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133727 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133730 [3] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133731 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 141486 [7755] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 141486 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149350 [7864] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149350 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2547 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [1] 48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149351 48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2 48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [0] the first number after START or END is the abosolute time in milliseconds the second number after END between [] is the time spend in the function in milliseconds. You can really see that twice we get stuck in the write call for about 8 seconds which correspond to the UI freeze.
darin, can you take a look at the imap log and give us some suggestions on what might have gone wrong for some of the stream write calls to complete in 7-8 seconds? The first and second 16K bytes took only a few milliseconds to complete while the 3rd and 4th 16K bytes took 7.7 and 7.8 seconds respectively to finish. We also changed NS_SOCKET_TRANSPORT_BUFFER_SIZE to 128K (from 8K) and tried the same scenario, but the performance did not improve at all. Let me know if you have questions on the log. Thanks.
Our initial theory was that the socket thread was getting backed up and couldn't send the incoming bits up fast enough. So as Cavin mentioned has a test of that theory we jacked up the value of NS_SOCKET_TRANSPORT_BUFFER_SIZE to make it large enough to handle the entire message we were trying to send to the socket. So the socket shouldn't have gotten backed up. But we still saw the same long lags.
is this method being called on the UI thread? you should never call a blocking read/write method from the UI thread... you are guaranteed to have unknown delays (possibly very long) before the methods return.
No this isn't being called from the UI thread. I'd like to think we know better than that =). Each imap connection runs in it's own thread and we frequently block that thread while waiting for new data or writing data using blocking writes to the socket.
whew :) so, then its not the UI that is hung, but something else? mail is not being responsive... is that because the particular IMAP thread is busy... and the user is effectively waiting for that thread to do something?
Actually the problem isn't with the responsive ness of the UI thread. Our problem is that on the mac, writing 8K chunks of data to a nsSocketTransport is taking 7 to 8 seconds. We are noticing that the first couple writes seem to happen pretty quickly. Then our blocking write call starts taking 7 to 8 seconds before the socket transport comes back to us. It shouldn't take that long to send 8K of data so something is getting backed up somewhere. Our inital guess was that the socket transport stream was just full and we were waiting for it to finish sending data before it could take more. So as a test of that theory we jacked up the size of the socket transport buffer to be larger than our 100K message. We still saw the same 7 to 8 second delays for a couple of the 8k writes we were performing. So something funky is going on for the Mac when writing to a socket. At least that's our best guess.
Oh and you'll love this Darin. JF says if he turns socket logging on for the Mac. Then these huge 7 to 8 seconds lags on a couple of the blocking writes go away! I haven't been able to corraborate that. But that's pretty wild.
FYI: I've tried to setup a smaller buffer and sebment size (512 bytes) but that had no effect at all! I'll try to look at the OpenTransport code in nspr...
I'm not sure I understand why making the socket buffer smaller would have had any effect on the problem? Well your results backed that up =).
I was trying to see if we were blocking during the physical transmission on the wire. Using a smaller buffer cause to send small blocks rather than big one, at least that's what I was hopping!
the socket writes are unbuffered, so changing the buffer sizes should indeed have no effect.
what are the chances of getting this fixed for 0.9.4 branch?
Whiteboard: [PDT], [ETA ?]
we still doesn't have a clue what's the matter here, therefore no ETA yet!
ETA for this bug is still unknown because we don't know what the cause is yet. Once we know what's causing the problem then we'll get a better ETA.
I did the same test under MacOS X with yesterday Mozilla debug build as well with the Official Netscape 6.1. In both case it hang during the copy to IMAP bug unfortunatly we stay frozen for ever, I had to kill the application. I get the following error on the console: Assertion failure: resultOT == T_GODATA, at macsockotpt.c:412
Re-assigning.
Assignee: cavin → ducarroz
Whiteboard: [PDT], [ETA ?] → [PDT], [ETA n/a]
*** Bug 102175 has been marked as a duplicate of this bug. ***
Here is so far what I have found: nsImapProtocol::SendData calls nsSocketBOS::Write. nsSocketBOS::Write calls PR_Write() which failed with rv=NS_BASE_STREAM_WOULD_BLOCK. Therefore nsSocketBOS::Write calls Poll() which succeed and then try again to write. When we are blocking, we get stuck in this loop PR_WRITE - POLL - PR_WRITE for about 500 times until we get out. This process takes couple seconds (3 with an ISDN connection, 7 with a dial-up) During the copy of the message, we block 1 or 2 times. Also, I have observed that when we are done uploading the data, the progress bar still spinning for a while (10-15 secs) until the copy operation is finished. I don't know what we are doing or waiting for!
Status: NEW → ASSIGNED
sorry if this is already mentioned, but does it matter if the IMAP connection is via SSL?
darin, we aren't using SSL in these examples. Hope that helps.
ok.. i suspect there is something funny happening in the mac PR_Poll code... it is clearly malfunctioning if it is setting PR_POLL_WRITE _and_ then returning PR_WOULD_BLOCK_ERROR from a subsequent PR_Write call. however, a quick check of the code reveals that we are not verifying that PR_POLL_WRITE is set. hmm... it would be good to log the pd.out_flags in nsSocketBS::Poll to ensure that PR_POLL_WRITE is getting set. though, i'm not sure what we could/should do if PR_POLL_WRITE is not set... i mean, if PR_POLL_EXCEPT is being set, then i'd think we'd need to abort the socket connection, but that won't help in getting the data uploaded :P so, i'm really suspecting the mac NSPR polling code. cc'ing some mac folks for help.
That would not surprise me.
I have modified the function nsSocketBS::Poll to not set anymore the flag PR_POLL_EXCEPT but that did not have any effect.
ok.. sounds like we need to do some NSPR debugging. anyone with mac experience want to help out?
kind of hard to debug, if I use break points to debug PR_Poll, the problem goes away :-(
Working on it now.
This is happening because OT returns a kOTFlowErr in response to the OTSnd, causing us to return an error from PR_Write(). It's indended that the caller will then call PR_Poll, which will be woken when the Notifier gets a T_GODATA. However, it's possible for the notifier to fire when both the polling thread and writing thread members on the socket have been cleared, so we fail to wake the polling thread. Working on a solution now.
IMAP logs won't tell me anything new, I think. Still debugging.
A couple of things that I've found: The GetState() function definately needs this change: resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd); switch (resultOT) { case T_DATAXFER: case T_INREL: - *writeReady = PR_TRUE; + *writeReady = fd->secret->md.writeReady; break; default: *writeReady = PR_FALSE; } This fixes the problem where PR_Poll says that the socket is writable when it's not. But with that change, things still don't work (especially on Mac OS X). A second problem that I can see is that SendReceiveStream() (called from PR_Read/PR_Write for sockets) fails to correctly handle write that are only partially successful. It loops, calling OTSnd() until no data is left to send. However, if the second OTSnd() fails (because of flow control) then we simply return -1, rather than returning the amount of data we did send.
with macosx commercial build 2001-10-01-04-0.9.4 sed web page didn't hang but it still fails to send the page correctly
Hardware: Macintosh → All
simon: good find! it seems like a common mistake of Read/Write impl's.
Simon patch remove the "hang" However, we are still very very slow to send IMAP data (IMAP is using non bloking mode). I try to make SendReiveStream return the amount of bytes currently send/read even in case of an error (any). I return the amount only when this one is not 0, else I return -1. This doesn't not help. It still took between 11 to 30 second to save a draft message on IMAP while it takes only 4 secs on Windows (same message, same connection). But, if I try to bee a little bit smarter and do the following: Index: mozilla/nsprpub/pr/src/md/mac/macsockotpt.c =================================================================== RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v retrieving revision 3.32.2.1 diff -w -u -5 -r3.32.2.1 macsockotpt.c --- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1 +++ macsockotpt.c 2001/10/01 19:57:44 @@ -1539,12 +1539,22 @@ ErrorExit: PR_ASSERT(opCode == kSTREAM_SEND ? fd->secret->md.write.thread == nil : fd->secret->md.read.thread == nil); macsock_map_error(err); + switch (err) + { + case kOTFlowErr: + case kOTNoDataErr: + case kEAGAINErr: + case kEWOULDBLOCKErr: + return amount - bytesLeft; + + default: return -1; } +} The send is not very fast but we have couple of hang that last 2-3 secs. The hang seems to occur into nsSocketBOS::Write because we loop without waiting. Also, should we apply the same changes to the function SendReceiveDgram? I presume yes!
For what I am seeing, PR_Poll block for too long! we are waiting and waiting while their is no network activity! When SendReceiveStream return 0 instead of -1 (see my patch above), nsSocketBOS::Write will just call PR_WRITE without calling Poll. That hang the UI but it's very fast!
"hang seems to occur into nsSocketBOS::Write because we loop without waiting" i'm not sure i follow this statement: without waiting for what? we call PR_Poll... isn't that appropriate?
in nsSocketBOS::Write, if PR_Write return 0 when no data has been written instead of returning -1 like it's currently the case, Poll will not be called and it will just try to write again. The hang could be cause by the fact I have couple of printf in the loop! What looks suspicious for me is that in SendReceiveStream, fd->secret->nonblocking is true but it nsSocketBOS::Write, we detect a NS_BASE_STREAM_WOULD_BLOCK error! does it make sense to be non blocking at one level and blocking in another!
Tracy, I was wondering why you changed OS to all. Are you seeing this on linux and windows as well? Using today's branch builds win98 and linux is working fine when copying to imap sent folder. The problem is only with MAC.
the socket is non-blocking, because there can be both a AsyncRead request as well as a open nsIOutputStream to the socket. so, we configure the socket to be non-blocking and fake a blocking operation by using the PR_Poll/PR_Write loop in nsSocketBOS::Write.
PR_Write should not return 0 bytes written... it should return some sort of error to indicate why nothing could be written. the code in nsSocketBOS::Write should probably detect 0 bytes written... but, then what should it do with that case? treat it as WOULD_BLOCK, or simply return 0 bytes written to its caller? bottom line.. i think PR_Write should not return 0 bytes written, because it leaves us in a rather ambiguous state.
I agree with you. ALl I did was to try to understand the problem. My conclusion so far is that we spent to much time into PR_POLL. We need to figure out why?
I'm all over this, but I need to spend some more time debugging. I ended up with something like ducarroz's fix in SendReceiveStream, but there is still a problem with a missed notification, that prevents the IMAP thread from being woken up to read the final server acknowledgement of the send.
Sheela, I'm seeing web page send failure on all platforms. That is why I changed the OS. But upon reviewing this bug further, I belive what I'm seeing is different behavior. Send a web page brings up an error message saying the file containing the web page could not be sent. If you click okay to the error message, the mail message is sent but upon receiving the message the web page is not attached, just the link to the page. If the fix to this bug doesn't correct that behavior as well, I will log a new bug. changing OS back to Mac
Hardware: All → Macintosh
Something confuses me here. With the Mac NSPR fixes described above, there is still a problem where, after sending the file attachments on save to IMAP Sent folder, we stall in nsImapProtocol::UploadMessageFromFile() while waiting for the server's response. We stall down in nsImapProtocol::CreateNewLineFromSocket(), waiting on the m_dataAvailableMonitor. The Mac NSPR code does get a notification that says that the data is available, but it's dropped on the floor. I suspect this may be because we have two threads doing nested PR_Polls on the same socket.
Darin pointed out that two threads should not both be polling for read at the same time, but it's ok for one to poll for reading, and one for writing. Fixing Mac NSPR to deal with this fixes the bug.
Assignee: ducarroz → sfraser
Status: ASSIGNED → NEW
This fix works (still needs testing on Mac OS X), but needs a bit of tidying up. I think we need to review Mac's PR_Poll to ensure that it's returning the correct flags in the various less common situations (connect, disconnect etc). The patch basically allows 2 threads to poll the same socket, as long as only one is reading or writing at a time.
Status: NEW → ASSIGNED
Tested the fix on a dual CPU Mac OS X box. It works fine there too.
what are the chances this will be ready by 10.03?
I expect to have a final patch by 10/03
Attachment #51648 - Attachment is obsolete: true
Attachment 51793 [details] [diff] fixes several problems in Mac NSPR that caused this: 1. PR_Write to a socket whose underlying endpoint is under OT flow control (i.e. is temporarily unable to accept data) now behaves correctly, only returning -1 if no data was written. If > 0 bytes were written, it returns the number of bytes written. 2. PR_Poll is fixed to only return the writable flag if the socket is writable. Previously, we'd say that a socket was writable when it was actually under flow control. 3. It's now OK to call PR_Poll on the same socket from two different threads, as long as one thread is reading, and one writing. Ignore whitespace diffs in the patch; the source has been detabbed somewhat, and lines up nicely. Looking for r= from wtc and gordon. Note that I spun a separate bug for fixing PR_Poll to report correct status on connect/disconnect (bug 102791).
Comment on attachment 51793 [details] [diff] [review] Final NSPR patch to fix polling on different threads etc. Simon, See my comments below. >Index: pr/src/md/mac/macsockotpt.c >=================================================================== >RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v >retrieving revision 3.32.2.1 >diff -b -u -2 -r3.32.2.1 macsockotpt.c >--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1 >+++ macsockotpt.c 2001/10/03 00:19:57 >@@ -306,4 +306,5 @@ > // Async callback routine. > // A5 is OK. Cannot allocate memory here >+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html > static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie) > { >@@ -311,6 +312,6 @@ > _MDFileDesc * md = &(secret->md); > EndpointRef endpoint = (EndpointRef)secret->md.osfd; >- PRThread * thread = NULL; >- PRThread * pollThread = md->poll.thread; >+ PRThread * readThread = NULL; >+ PRThread * writeThread = NULL; > OSStatus err; > OTResult resultOT; It would be nice to add a comment that says: readThread is also used for 'misc'. >@@ -366,22 +367,17 @@ > err = OTRcvDisconnect(endpoint, &discon); > PR_ASSERT(err == kOTNoError); >- secret->md.exceptReady = PR_TRUE; >+ secret->md.exceptReady = PR_TRUE; // XXX Check this If you mark the socket exceptReady, you are compatible with Winsock. If you mark the socket readReady and writeReady, you are compatible with most Unix socket implementations. I suggest that you go for Unix socket compatibility. >@@ -1514,7 +1511,12 @@ > case kEWOULDBLOCKErr: > if (fd->secret->nonblocking) { >+ if (bytesLeft == amount) { > err = result; > goto ErrorExit; > } >+ else { // return how much was sent >+ return amount - bytesLeft; >+ } >+ } > > WaitOnThisThread(me, timeout); Please make sure the indentation of the nested if-else under this 'case' looks right. I can't tell in Netscape 6.x, with tab stop set to 8. >@@ -1696,5 +1698,4 @@ > } > >- > static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady) > { >@@ -1709,10 +1710,18 @@ > *exceptReady = fd->secret->md.exceptReady; > >+ *writeReady = PR_FALSE; This is not necessary. > resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd); > switch (resultOT) { >- case T_DATAXFER: >- case T_INREL: >- *writeReady = PR_TRUE; >+ case T_DATAXFER: // data transfer >+ *writeReady = fd->secret->md.writeReady; >+ break; >+ >+ case T_INREL: // incoming orderly release >+ *writeReady = fd->secret->md.writeReady; > break; When the socket is in the T_INREL state, we should mark it readable and arrange for the next read to return a zero byte count. >+ >+ case T_OUTCON: // outgoing connection pending >+ case T_INCON: // incoming connection pending >+ case T_OUTREL: // outgoing orderly release > default: > *writeReady = PR_FALSE; It's not necessary to add these cases because 'default' is going to cover them. >@@ -1812,5 +1821,13 @@ > if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state)) > { >- bottomFD->secret->md.poll.thread = thread; >+ if (pd->in_flags & PR_POLL_READ) { >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL); >+ bottomFD->secret->md.read.thread = thread; >+ } >+ >+ if (pd->in_flags & PR_POLL_WRITE) { >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL); >+ bottomFD->secret->md.write.thread = thread; >+ } > } > } Are you sure that it is necessary to have 'thread == NULL' in the assertions?
Attachment #51793 - Flags: needs-work+
> >Index: pr/src/md/mac/macsockotpt.c > >=================================================================== > >RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v > >retrieving revision 3.32.2.1 > >diff -b -u -2 -r3.32.2.1 macsockotpt.c > >--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1 > >+++ macsockotpt.c 2001/10/03 00:19:57 > >@@ -306,4 +306,5 @@ > > // Async callback routine. > > // A5 is OK. Cannot allocate memory here > >+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html > > static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie) > > { > >@@ -311,6 +312,6 @@ > > _MDFileDesc * md = &(secret->md); > > EndpointRef endpoint = (EndpointRef)secret->md.osfd; > >- PRThread * thread = NULL; > >- PRThread * pollThread = md->poll.thread; > >+ PRThread * readThread = NULL; > >+ PRThread * writeThread = NULL; > > OSStatus err; > > OTResult resultOT; > > It would be nice to add a comment that says: > readThread is also used for 'misc'. Will do. > >@@ -366,22 +367,17 @@ > > err = OTRcvDisconnect(endpoint, &discon); > > PR_ASSERT(err == kOTNoError); > >- secret->md.exceptReady = PR_TRUE; > >+ secret->md.exceptReady = PR_TRUE; // XXX Check this > > If you mark the socket exceptReady, you are compatible with Winsock. > If you mark the socket readReady and writeReady, you are compatible > with most Unix socket implementations. I suggest that you go for > Unix socket compatibility. Yes. I'll fix this under bug 102791. > >@@ -1514,7 +1511,12 @@ > > case kEWOULDBLOCKErr: > > if (fd->secret->nonblocking) { > >+ if (bytesLeft == amount) { > > err = result; > > goto ErrorExit; > > } > >+ else { // return how much was sent > >+ return amount - bytesLeft; > >+ } > >+ } > > > > WaitOnThisThread(me, timeout); > > Please make sure the indentation of the nested if-else under this 'case' > looks right. I can't tell in Netscape 6.x, with tab stop set to 8. It's OK in the source. I diffed ignoring space changes. > >@@ -1696,5 +1698,4 @@ > > } > > > >- > > static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady) > > { > >@@ -1709,10 +1710,18 @@ > > *exceptReady = fd->secret->md.exceptReady; > > > >+ *writeReady = PR_FALSE; > > This is not necessary. Right, I'll fix. > > resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd); > > switch (resultOT) { > >- case T_DATAXFER: > >- case T_INREL: > >- *writeReady = PR_TRUE; > >+ case T_DATAXFER: // data transfer > >+ *writeReady = fd->secret->md.writeReady; > >+ break; > >+ > >+ case T_INREL: // incoming orderly release > >+ *writeReady = fd->secret->md.writeReady; > > break; > > When the socket is in the T_INREL state, we should mark it readable > and arrange for the next read to return a zero byte count. Yes. Another item for bug 102791. > >+ > >+ case T_OUTCON: // outgoing connection pending > >+ case T_INCON: // incoming connection pending > >+ case T_OUTREL: // outgoing orderly release > > default: > > *writeReady = PR_FALSE; > > It's not necessary to add these cases because 'default' > is going to cover them. Correct, but for me, it's clearer to see them all there, and I believe the compiled code will be the same. Compilers on other platforms would warn about missing enum values in the case ;) > >@@ -1812,5 +1821,13 @@ > > if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state)) > > { > >- bottomFD->secret->md.poll.thread = thread; > >+ if (pd->in_flags & PR_POLL_READ) { > >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL); > >+ bottomFD->secret->md.read.thread = thread; > >+ } > >+ > >+ if (pd->in_flags & PR_POLL_WRITE) { > >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL); > >+ bottomFD->secret->md.write.thread = thread; > >+ } > > } > > } > > Are you sure that it is necessary to have 'thread == NULL' in > the assertions? Yes, because we use this same function to null out the threads at the end of a Poll() call.
With this code, we're seeing two new assertions in NSPR: 1. The assertion that the reading thread is NULL just before the OTRcv() in SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(), which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread is polling. 2. We hit the PR_ASSERT(0) in _MD_poll, because thread->md.osErrCode is set to kECONNRESETErr by the NotifierRoutine after it gets a T_DISCONNECT message. Both these assertions seem benign.
Updating ETA per Simon's comments in the bug - ETA 10.03.
Whiteboard: [PDT], [ETA n/a] → [PDT], [ETA 10.03]
Thinking more about those assertions, I think we may need to modify how MD_Poll deals with errors. One side effect of the changes in this patch is that the notifier will now send along an error code when waking the polling thread; previously, we always woke the polling thread with kOTNoError. We thus have to be careful to ensure that poll returns the appropriate flags in this case; with the current patch, poll is always returning -1 on error.
Attachment #51793 - Attachment is obsolete: true
Attachment #51793 - Flags: needs-work+
Attached patch Final patch (obsolete) — Splinter Review
The latest patch fixes outstanding issues. It removes the code in _MD_Poll that causes poll() to return -1 when a socket disconnected, which we now get because we set the error from OTRcvDisconnect() on the thread that gets woken up. Instead, we correctly return status from Poll, and return errors on socket read/ write attempts, using existing code. I also removed some assertions that got hit when doing https, in SendReceiveStream(). The 'T_INREL' case in GetState no longer sets *readReady to true, because this is already handled by the conditional on fd->secret->md.orderlyDisconnect above in the same function. I've tested http, ftp, https, IMAP, secure IMAP, SMTP, secure SMTP, news, secure news, and gopher on both Mac OS and Mac OS X, and everything works well.
Note on removing the assertions: after further investigation, I can explain all assertions: 1. The assertion that the reading thread is NULL just before the OTRcv() in SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(), which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread is polling. Here, one thread is polling while another thread calls PR_Recv with the PR_MSG_PEEK flag. SendReceiveStream() could save and restore the read thread here, though there is the danger that we'd miss a notification that should have woken the polling thread. Maybe PR_Rcv() with the PR_MSG_PEEK flag should not actually read from the endpoint, but just call OTCountDataBytes() and check the endpoint state? There is a second case when this assertion fires, which is when someone does a blocking read, and OTRcv returned kOTNoDataErr. For this case, the assertion needs fixing. 2. We fail the assertion in SetDescPollThread() that the writing thread is null, which happens when someone is polling during a connect on a non-blocking socket. This assertion just needs fixing.
Attachment #51948 - Attachment is obsolete: true
Attached patch Patch with correct assertions (obsolete) — Splinter Review
Last patch now only asserts in one case, where we call PR_Rcv() with with the PR_MSG_PEEK flag on a different thread than PR_Poll, and I think that assertion correctly reveals a bad situation.
Attachment #52003 - Flags: review+
r=gordon.
pls get this on the branch, and see if drivers will allow it on the trunk. - PDT+
Whiteboard: [PDT], [ETA 10.03] → [PDT+], [ETA 10.03]
Comment on attachment 52003 [details] [diff] [review] Patch with correct assertions >Index: pr/src/md/mac/macsockotpt.c >=================================================================== >RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v >retrieving revision 3.32.2.1 >diff -b -u -5 -r3.32.2.1 macsockotpt.c >--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1 >+++ macsockotpt.c 2001/10/04 03:14:28 >@@ -303,17 +303,19 @@ > } > > // Notification routine > // Async callback routine. > // A5 is OK. Cannot allocate memory here >+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html >+// > static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie) > { > PRFilePrivate *secret = (PRFilePrivate *) contextPtr; > _MDFileDesc * md = &(secret->md); > EndpointRef endpoint = (EndpointRef)secret->md.osfd; >- PRThread * thread = NULL; >- PRThread * pollThread = md->poll.thread; >+ PRThread * readThread = NULL; >+ PRThread * writeThread = NULL; > OSStatus err; > OTResult resultOT; > TDiscon discon; > > switch (code) It would be nice to add a comment saying something like: readThread is also used for "misc". >@@ -1707,15 +1721,34 @@ > > *readReady = fd->secret->md.readReady && (availableData > 0); > *exceptReady = fd->secret->md.exceptReady; > > resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd); >- switch (resultOT) { >- case T_DATAXFER: >- case T_INREL: >+ switch (resultOT) >+ { It would be nice to emulate the original indentation style in the file (specifically, the placement of the curly braces). >+ case T_UNBND: >+ case T_IDLE: >+ // the socket is not connected. Emulating BSD sockets, >+ // we mark it readable and writable. The next PR_Read >+ // or PR_Write will then fail. Usually, in this situation, >+ // fd->secret->md.exceptReady is also set, and returned if >+ // anyone is polling for it. >+ *readReady = PR_TRUE; > *writeReady = PR_TRUE; > break; These two cases are wrong. When an endpoint is in the T_UNBND or T_IDLE state, it is neither readable nor writable. These two states are different from the "disconnected" state, which is considered an error condition and is reported by marking the endpoint both readable and writable. >+ case T_INREL: // incoming orderly release >+ *writeReady = PR_FALSE; >+ break; This is wrong. When an endpoint is in the T_INREL state, you can still write to the endpoint as long as flow control does not kick in. The only thing you can say is that there is no more data to be read from the endpoint. In Unix, this condition is reported by marking the socket readable and arranging for the next read to return a zero byte count. >@@ -1809,22 +1842,34 @@ > if (pd->fd) > { > PRFileDesc *bottomFD = PR_GetIdentitiesLayer(pd->fd, PR_NSPR_IO_LAYER); > if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state)) > { >- bottomFD->secret->md.poll.thread = thread; >+ if (pd->in_flags & PR_POLL_READ) { >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL); >+ bottomFD->secret->md.read.thread = thread; >+ } >+ >+ if (pd->in_flags & PR_POLL_WRITE) { >+ // it's possible for the writing thread to be non-null during >+ // a non-blocking connect, so we assert that we're on >+ // the same thread, or the thread is null. >+ PR_ASSERT(thread == NULL || >+ (bottomFD->secret->md.write.thread == NULL || >+ bottomFD->secret->md.write.thread == thread)); >+ bottomFD->secret->md.write.thread = thread; >+ } > } > } > } > } The second assertion is overly restrictive. It precludes having one thread initiating a non-blocking connect and another thread polling for the completion of the connect. Ideally, "bottomFD->secret->md.write.thread == thread" should be replaced by the condition "a non-blocking connect is in progress".
Attachment #52003 - Flags: needs-work+
After looking at Apple's OT docs more closely, I think the handling of an endpoint in the T_IDLE state is complicated by how that state is entered. I don't have a simple solution. The T_UNBND state should be reported as not readable and not writable.
Attached patch Patch addressing wtc's issues (obsolete) — Splinter Review
Attachment #52086 - Attachment is obsolete: true
Attachment #52003 - Attachment is obsolete: true
> It would be nice to add a comment saying something like: > readThread is also used for "misc". Damn, forgot to add to the diff. Will add before checking in. > It would be nice to emulate the original indentation > style in the file (specifically, the placement of the > curly braces). done. > These two cases are wrong. When an endpoint is in the > T_UNBND or T_IDLE state, it is neither readable nor > writable. These two states are different from the > "disconnected" state, which is considered an error > condition and is reported by marking the endpoint both > readable and writable. Fixed. > After looking at Apple's OT docs more closely, I > think the handling of an endpoint in the T_IDLE > state is complicated by how that state is entered. > I don't have a simple solution. I added a PR_ASSERT for this case. I've never hit it so far. > The second assertion is overly restrictive. It precludes having > one thread initiating a non-blocking connect and another thread > polling for the completion of the connect. > > Ideally, "bottomFD->secret->md.write.thread == thread" should be > replaced by the condition "a non-blocking connect is in progress". This is hard to know unless I add another state variable to the file desc struct or something. I think the current assertions will do for now; I'll try to come up with something more accurate for bug 102791.
Comment on attachment 52164 [details] [diff] [review] Patch addressing wtc's issues >+ case T_INREL: // incoming orderly release > *writeReady = PR_TRUE; > break; This is still not correct. This should say *writeReady = fd->secret->md.writeReady; I think you also need to set *readReady to PR_TRUE if the endpoint is in the T_INREL state.
Attachment #52164 - Flags: review+
Attachment #52164 - Flags: needs-work+
> This is still not correct. This should say > *writeReady = fd->secret->md.writeReady; Correct. I'll do that. > I think you also need to set *readReady to PR_TRUE if > the endpoint is in the T_INREL state. *readReady gets set to true in this state by the existing code just above: // hack to emulate BSD sockets; say that a socket that has disconnected // is still readable. size_t availableData = 1; if (!fd->secret->md.orderlyDisconnect) OTCountDataBytes((EndpointRef)fd->secret->md.osfd, &availableData); *readReady = fd->secret->md.readReady && (availableData > 0);
Did this make it on the 094 branch? Pls advise ...
This is not checked in anywhere yet. Still waiting for a final r= from wtc.
Simon: I was waiting for you to attach a new patch. You can check in your latest patch after fixing the case T_INREL problem and adding a comment about readThread also used for "misc".
Attachment #52164 - Attachment is obsolete: true
This is checked in on NSPR tip and branch, and on Mozilla 0.9.4 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified mac commercial build 2001-10-09-04-trunk
Status: RESOLVED → VERIFIED
Verified on Mac 9.1 10-09-03-0.9.4 brnach build Verified on Mac OSX 10-09-04-0.9.4 branch build No hang anymore when sending these two attachments.
*** Bug 88489 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: