Closed Bug 100353 Opened 23 years ago Closed 23 years ago

Copy sent mail to IMAP servers "Sent" folder appears to hang NS6

Categories

(MailNews Core :: Networking: IMAP, defect)

PowerPC
All
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: scalkins, Assigned: sfraser_bugs)

References

Details

(Keywords: hang, perf, regression, Whiteboard: [PDT+], [ETA 10.03])

Attachments

(6 files, 6 obsolete files)

34.79 KB, application/octet-stream
Details
47.70 KB, application/octet-stream
Details
1.59 KB, patch
Details | Diff | Splinter Review
6.77 KB, text/plain
Details
34.30 KB, text/plain
Details
22.98 KB, patch
Details | Diff | Splinter Review
Saw this on Mac only
Mac OSX 2001-09-17-04 0.9.4 Branch (10.x)
Mac OS 9.1 2001-09-11-04 Branch (8.x)

Steps to repro:
1) Login to Mail with an IMAP account.
2)Set Mail prefs to "Place a copy" of sent mail in the  "Sent" folder on
yourIMAP server.
3)Compose a new message, and attach either one or both of the files attached on
this email. One file, is a bmp file which is 38k, and trhe other file is a .tiff
file which is 48k.
4)Send the message

Actual results: Mail status dialogs show the mail was sent, but when the
"Copying to sent mail folder dialog" appears, it seems to hang mail for 10-15
minutes while this action is being performed. This is exacerbated when you
attach both files, because it seems to take 10-15 minutes for each attachment,
(20-30 minutes) to copy the file to the sent mail folder on the server. 

NOTE: Nominating nsbranch nsenterprise because most users will be led to believe
the app has hung because you can't launch browser or do anything else till the
action completes. They will force quit the app rather than wait this long.
This gives a bad impression to users of our product. The .tiff and bmp files are
not uncommon, and we saw the same behavior with .jpg files.
Copying the files to the "Sent" mail folder locally does not exhibit this
behavior, and is quick.(A few seconds)
Keywords: perf
P.S. When the action finally completes, the mail is copied to the sent mail
folder. The recipient also does get the message with attachments.
There was bug logged for Mac before.
I need to find out that this is dup bug or another regression again
Keywords: regression
Karen, There is bug 88489  but that is for large attachments. These are small
files, so I logged a  different bug.
Keywords: hang
this is bug needs to be fixed or users will hang up everytime they send a
message that has an attachment with a size around 28kb.  I tested this sending
(1) .gif file 28kb as an attachment, it hangs the app.  I tested this sending
(1) .jpeg 4kb, it hangs the app.  I tested this doing a Send Page (our home
page) it hangs the app.  In all of these cases I waited 1 hr then did a force
quit.  It fails only at saving to the IMAP Sent folder.  If I use the Local
folder it finished sending and saving within 15 seconds.   On OS9.1 these same
tests (copy to IMAP) takes 2-3 minutes which is bad, but not as bad as OSX.  
Somebody please bump this up to a "+", the work around forces users to change
their way of organizing their messages.
Cavin, can you take a look at this tomorrow? Sounds like a stopper to me. nsBranch+.
Keywords: nsbranchnsbranch+
Target Milestone: --- → mozilla0.9.5
re-assinging to cavin for real this time. Cavin, can you take a look at this
tomorrow for a possible branch fix?
Assignee: mscott → cavin
I am looking at it...
Here is what I have observed:

I am using a PPP connection at 26K, I am generating a message that include the
tiff attachment.

When I sent it, SMTP take about 15 Sec, then the copy start but after 1-2 sec,
the UI  froze for about 10 secs and then the copy continue for about 8 secs,
sometime more.

The UI freeze occurs when we write the data to the output stream in
nsImapProtocol::SendData.

However, if I turn on PR_Log for nsSocketTransport, the UI freeze disappears!
Attached file IMAP Log
I added some debug code into nsImapProtocol::SendData to demonstrate how long
nsIOutputStream::Write blocks.

the interresting lines are:
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133722
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 36
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133723 [1]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133724
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133726 [2]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133727
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133730 [3]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133731
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 141486 [7755]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 141486
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149350 [7864]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149350
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2547
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [1]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149351
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [0]

the first number after START or END is the abosolute time in milliseconds
the second number after END between [] is the time spend in the function in
milliseconds.

You can really see that twice we get stuck in the write call for about 8 seconds
which correspond to the UI freeze.
darin, can you take a look at the imap log and give us some suggestions on what 
might have gone wrong for some of the stream write calls to complete in 7-8 
seconds? The first and second 16K bytes took only a few milliseconds to complete 
while the 3rd and 4th 16K bytes took 7.7 and 7.8 seconds respectively to finish.

We also changed NS_SOCKET_TRANSPORT_BUFFER_SIZE to 128K (from 8K) and tried the 
same scenario, but the performance did not improve at all.

Let me know if you have questions on the log. Thanks.
Our initial theory was that the socket thread was getting backed up and couldn't
send the incoming bits up fast enough. So as Cavin mentioned has a test of that
theory we jacked up the value of NS_SOCKET_TRANSPORT_BUFFER_SIZE
 to make it large enough to handle the entire message we were trying to send to
the socket. So the socket shouldn't have gotten backed up. But we still saw the
same long lags.
is this method being called on the UI thread?  you should never call a blocking
read/write method from the UI thread... you are guaranteed to have unknown
delays (possibly very long) before the methods return.
No this isn't being called from the UI thread. I'd like to think we know better
than that =).

Each imap connection runs in it's own thread and we frequently block that thread
while waiting for new data or writing data using blocking writes to the socket.
whew :)  so, then its not the UI that is hung, but something else?  mail is not
being responsive... is that because the particular IMAP thread is busy... and
the user is effectively waiting for that thread to do something?
Actually the problem isn't with the responsive ness of the UI thread. Our
problem is that on the mac, writing 8K chunks of data to a nsSocketTransport is
taking 7 to 8 seconds. We are noticing that the first couple writes seem to
happen pretty quickly. Then our blocking write call starts taking 7 to 8 seconds
before the socket transport comes back to us. It shouldn't take that long to
send 8K of data so something is getting backed up somewhere. Our inital guess
was that the socket transport stream was just full and we were waiting for it to
finish sending data before it could take more. So as a test of that theory we
jacked up the size of the socket transport buffer to be larger than our 100K
message. We still saw the same 7 to 8 second delays for a couple of the 8k
writes we were performing. So something funky is going on for the Mac when
writing to a socket. At least that's our best guess.
Oh and you'll love this Darin. JF says if he turns socket logging on for the
Mac. Then these huge 7 to 8 seconds lags on a couple of the blocking writes go
away! I haven't been able to corraborate that. But that's pretty wild.
FYI: I've tried to setup a smaller buffer and sebment size (512 bytes) but that had no effect at all! I'll try to look at 
the OpenTransport code in nspr...
I'm not sure I understand why making the socket buffer smaller would have had
any effect on the problem? Well your results backed that up =). 
I was trying to see if we were blocking during the physical transmission on the
wire. Using a smaller buffer cause to send small blocks rather than big one, at
least that's what I was  hopping!
the socket writes are unbuffered, so changing the buffer sizes should indeed
have no effect.
what are the chances of getting this fixed for 0.9.4 branch?
Whiteboard: [PDT], [ETA ?]
we still doesn't have a clue what's the matter here, therefore no ETA yet!
ETA for this bug is still unknown because we don't know what the cause is yet.
Once we know what's causing the problem then we'll get a better ETA.
I did the same test under MacOS X with yesterday Mozilla debug build as well
with the Official Netscape 6.1. In both case it hang during the copy to IMAP bug
unfortunatly we stay frozen for ever, I had to kill the application. I get the
following error on the console:

Assertion failure: resultOT == T_GODATA, at macsockotpt.c:412

Re-assigning.
Assignee: cavin → ducarroz
Whiteboard: [PDT], [ETA ?] → [PDT], [ETA n/a]
*** Bug 102175 has been marked as a duplicate of this bug. ***
Here is so far what I have found:

nsImapProtocol::SendData calls nsSocketBOS::Write.

nsSocketBOS::Write calls PR_Write() which failed with 
rv=NS_BASE_STREAM_WOULD_BLOCK.
Therefore nsSocketBOS::Write calls Poll() which succeed and then try again to
write. When we are blocking, we get stuck in this loop PR_WRITE - POLL -
PR_WRITE for about 500 times until we get out. This process takes couple seconds
(3 with an ISDN connection, 7 with a dial-up)

During the copy of the message, we block 1 or 2 times. Also, I have observed
that when we are done uploading the data, the progress bar still spinning for a
while (10-15 secs) until the copy operation is finished. I don't know what we
are doing or waiting for!
Status: NEW → ASSIGNED
sorry if this is already mentioned, but does it matter if the IMAP connection is
via SSL?
darin, we aren't using SSL in these examples. Hope that helps.
ok.. i suspect there is something funny happening in the mac PR_Poll code... it
is clearly malfunctioning if it is setting PR_POLL_WRITE _and_ then returning
PR_WOULD_BLOCK_ERROR from a subsequent PR_Write call.

however, a quick check of the code reveals that we are not verifying that
PR_POLL_WRITE is set.  hmm... it would be good to log the pd.out_flags in
nsSocketBS::Poll to ensure that PR_POLL_WRITE is getting set.

though, i'm not sure what we could/should do if PR_POLL_WRITE is not set... i
mean, if PR_POLL_EXCEPT is being set, then i'd think we'd need to abort the
socket connection, but that won't help in getting the data uploaded :P

so, i'm really suspecting the mac NSPR polling code.  cc'ing some mac folks for
help.
That would not surprise me.
I have modified the function nsSocketBS::Poll to not set anymore the flag 
PR_POLL_EXCEPT but that did not have any effect.
ok.. sounds like we need to do some NSPR debugging.  anyone with mac experience
want to help out?
kind of hard to debug, if I use break points to debug PR_Poll, the problem goes
away :-(
Working on it now.
This is happening because OT returns a kOTFlowErr in response to the OTSnd, 
causing us to return an error from PR_Write(). It's indended that the caller will 
then call PR_Poll, which will be woken when the Notifier gets a T_GODATA. 
However, it's possible for the notifier to fire when both the polling thread and 
writing thread members on the socket have been cleared, so we fail to wake the 
polling thread. Working on a solution now.
IMAP logs won't tell me anything new, I think. Still debugging.
A couple of things that I've found:

The GetState() function definately needs this change:

     resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
     switch (resultOT)    {
         case T_DATAXFER:
         case T_INREL:
-            *writeReady = PR_TRUE;
+            *writeReady = fd->secret->md.writeReady;
             break;
         default:
             *writeReady = PR_FALSE;
     }

This fixes the problem where PR_Poll says that the socket is writable when it's 
not.

But with that change, things still don't work (especially on Mac OS X). A second 
problem that I can see is that SendReceiveStream() (called from PR_Read/PR_Write 
for sockets) fails to correctly handle write that are only partially successful. 
It loops, calling OTSnd() until no data is left to send. However, if the second 
OTSnd() fails (because of flow control) then we simply return -1, rather than 
returning the amount of data we did send.
with macosx commercial build 2001-10-01-04-0.9.4 sed web page didn't hang but it 
still fails to send the page correctly
Hardware: Macintosh → All
simon: good find!  it seems like a common mistake of Read/Write impl's.
Simon patch remove the "hang"
However, we are still very very slow to send IMAP data (IMAP is using non
bloking mode). I try to make SendReiveStream return the amount of bytes
currently send/read even in case of an error (any). I return the amount only
when this one is not 0, else I return -1. This doesn't not help. It still took
between 11 to 30 second to save a draft message on IMAP while it takes only 4
secs on Windows (same message, same connection).

But, if I try to bee a little bit smarter and do the following:

 Index: mozilla/nsprpub/pr/src/md/mac/macsockotpt.c
===================================================================
RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
retrieving revision 3.32.2.1
diff -w -u -5 -r3.32.2.1 macsockotpt.c
--- macsockotpt.c    2001/09/20 03:42:25    3.32.2.1
+++ macsockotpt.c    2001/10/01 19:57:44
@@ -1539,12 +1539,22 @@
 
 ErrorExit:
     PR_ASSERT(opCode == kSTREAM_SEND ? fd->secret->md.write.thread == nil :
                                        fd->secret->md.read.thread  == nil);
     macsock_map_error(err);
+    switch (err)
+    {
+                case kOTFlowErr:
+                case kOTNoDataErr:
+                case kEAGAINErr:
+                case kEWOULDBLOCKErr:
+                  return amount - bytesLeft;
+                
+                default:
     return -1;
 }                               
+}                               


The send is not very fast but we have couple of hang that last 2-3 secs. The
hang seems to occur into nsSocketBOS::Write because we loop without waiting.

Also, should we apply the same changes to the function SendReceiveDgram? I
presume yes!
For what I am seeing, PR_Poll block for too long! we are waiting and waiting
while their is no network activity! When SendReceiveStream return 0 instead of
-1 (see my patch above), nsSocketBOS::Write will just call PR_WRITE without
calling Poll. That hang the UI but it's very fast!
"hang seems to occur into nsSocketBOS::Write because we loop without waiting"

i'm not sure i follow this statement: without waiting for what?  we call
PR_Poll... isn't that appropriate?
in nsSocketBOS::Write, if PR_Write return 0 when no data has been written
instead of returning -1 like it's currently the case, Poll will not be called
and it will just try to write again.

The hang could be cause by the fact I have couple of printf in the loop!

What looks suspicious for me is that in SendReceiveStream,
fd->secret->nonblocking is true but it nsSocketBOS::Write, we detect a
NS_BASE_STREAM_WOULD_BLOCK error! does it make sense to be non blocking at one
level and blocking in another!
Tracy,
I was wondering why you changed OS to all. Are you seeing this on linux and 
windows as well? Using today's branch builds win98 and linux is working fine 
when copying to imap sent folder. The problem is only with MAC. 
the socket is non-blocking, because there can be both a AsyncRead request as
well as a open nsIOutputStream to the socket.  so, we configure the socket to be
non-blocking and fake a blocking operation by using the PR_Poll/PR_Write loop in
nsSocketBOS::Write.
PR_Write should not return 0 bytes written... it should return some sort of
error to indicate why nothing could be written.

the code in nsSocketBOS::Write should probably detect 0 bytes written... but,
then what should it do with that case?  treat it as WOULD_BLOCK, or simply
return 0 bytes written to its caller?

bottom line.. i think PR_Write should not return 0 bytes written, because it
leaves us in a rather ambiguous state.
I agree with you. ALl I did was to try to understand the problem. My conclusion
so far is that we spent to much time into PR_POLL. We need to figure out why?
I'm all over this, but I need to spend some more time debugging. I ended up with 
something like ducarroz's fix in SendReceiveStream, but there is still a problem 
with a missed notification, that prevents the IMAP thread from being woken up to 
read the final server acknowledgement of the send.
Sheela,

I'm seeing web page send failure on all platforms.  That is why I changed the 
OS.  But upon reviewing this bug further, I belive what I'm seeing is different 
behavior.  Send a web page brings up an error message saying the file containing 
the web page could not be sent.  If you click okay to the error message, the 
mail message is sent but upon receiving the message the web page is not 
attached, just the link to the page. If the fix to this bug doesn't correct that 
behavior as well, I will log a new bug.

changing OS back to Mac
Hardware: All → Macintosh
Something confuses me here. With the Mac NSPR fixes described above, there is 
still a problem where, after sending the file attachments on save to IMAP Sent 
folder, we stall in nsImapProtocol::UploadMessageFromFile() while waiting for the 
server's response. We stall down in nsImapProtocol::CreateNewLineFromSocket(), 
waiting on the m_dataAvailableMonitor. The Mac NSPR code does get a notification 
that says that the data is available, but it's dropped on the floor. I suspect 
this may be because we have two threads doing nested PR_Polls on the same socket.
Darin pointed out that two threads should not both be polling for read at the 
same time, but it's ok for one to poll for reading, and one for writing. Fixing 
Mac NSPR to deal with this fixes the bug.
Assignee: ducarroz → sfraser
Status: ASSIGNED → NEW
This fix works (still needs testing on Mac OS X), but needs a bit of tidying up. 
I think we need to review Mac's PR_Poll to ensure that it's returning the correct 
flags in the various less common situations (connect, disconnect etc).

The patch basically allows 2 threads to poll the same socket, as long as only one 
is reading or writing at a time.
Status: NEW → ASSIGNED
Tested the fix on a dual CPU Mac OS X box. It works fine there too.
what are the chances this will be ready by 10.03?
I expect to have a final patch by 10/03
Attachment #51648 - Attachment is obsolete: true
Attachment 51793 [details] [diff] fixes several problems in Mac NSPR that caused this:

1. PR_Write to a socket whose underlying endpoint is under OT flow control (i.e.
   is temporarily unable to accept data) now behaves correctly, only returning
   -1 if no data was written. If > 0 bytes were written, it returns the number
   of bytes written.

2. PR_Poll is fixed to only return the writable flag if the socket is writable.
   Previously, we'd say that a socket was writable when it was actually under
   flow control.

3. It's now OK to call PR_Poll on the same socket from two different threads, as
   long as one thread is reading, and one writing.

Ignore whitespace diffs in the patch; the source has been detabbed somewhat, and 
lines up nicely. Looking for r= from wtc and gordon.

Note that I spun a separate bug for fixing PR_Poll to report correct status on 
connect/disconnect (bug 102791).
Comment on attachment 51793 [details] [diff] [review]
Final NSPR patch to fix polling on different threads etc.

Simon,

See my comments below.

>Index: pr/src/md/mac/macsockotpt.c
>===================================================================
>RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
>retrieving revision 3.32.2.1
>diff -b -u -2 -r3.32.2.1 macsockotpt.c
>--- macsockotpt.c	2001/09/20 03:42:25	3.32.2.1
>+++ macsockotpt.c	2001/10/03 00:19:57
>@@ -306,4 +306,5 @@
> // Async callback routine.
> // A5 is OK. Cannot allocate memory here
>+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
> static pascal void  NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> {
>@@ -311,6 +312,6 @@
> 	_MDFileDesc * md       = &(secret->md);
> 	EndpointRef   endpoint = (EndpointRef)secret->md.osfd;
>-    PRThread *    thread   = NULL;
>-    PRThread *	  pollThread = md->poll.thread;
>+    PRThread *    readThread   = NULL;
>+    PRThread *    writeThread  = NULL;
> 	OSStatus      err;
> 	OTResult	  resultOT;

It would be nice to add a comment that says:
    readThread is also used for 'misc'.

>@@ -366,22 +367,17 @@
>             err = OTRcvDisconnect(endpoint, &discon);
>             PR_ASSERT(err == kOTNoError);
>-            secret->md.exceptReady     = PR_TRUE;
>+            secret->md.exceptReady = PR_TRUE;       // XXX Check this

If you mark the socket exceptReady, you are compatible with Winsock.
If you mark the socket readReady and writeReady, you are compatible
with most Unix socket implementations.  I suggest that you go for
Unix socket compatibility.

>@@ -1514,7 +1511,12 @@
> 				case kEWOULDBLOCKErr:
> 					if (fd->secret->nonblocking) {
>+					    if (bytesLeft == amount) {
> 						err = result;
> 						goto ErrorExit;
> 					}
>+						else {      // return how much was sent
>+						    return amount - bytesLeft;
>+						}
>+					}
> 
> 					WaitOnThisThread(me, timeout);

Please make sure the indentation of the nested if-else under this 'case'
looks right.  I can't tell in Netscape 6.x, with tab stop set to 8.

>@@ -1696,5 +1698,4 @@
> }                               
> 
>-
> static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady)
> {
>@@ -1709,10 +1710,18 @@
> 	*exceptReady = fd->secret->md.exceptReady;
> 
>+    *writeReady = PR_FALSE;

This is not necessary.

>     resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
>     switch (resultOT)    {
>-        case T_DATAXFER:
>-        case T_INREL:
>-            *writeReady = PR_TRUE;
>+        case T_DATAXFER:        // data transfer
>+            *writeReady = fd->secret->md.writeReady;
>+            break;
>+
>+        case T_INREL:           // incoming orderly release
>+            *writeReady = fd->secret->md.writeReady;
>             break;

When the socket is in the T_INREL state, we should mark it readable
and arrange for the next read to return a zero byte count.

>+
>+        case T_OUTCON:          // outgoing connection pending  
>+        case T_INCON:           // incoming connection pending
>+        case T_OUTREL:          // outgoing orderly release
>         default:
>             *writeReady = PR_FALSE;

It's not necessary to add these cases because 'default'
is going to cover them.

>@@ -1812,5 +1821,13 @@
>             if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
>             {
>-                bottomFD->secret->md.poll.thread = thread;
>+                if (pd->in_flags & PR_POLL_READ) {
>+                    PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
>+                    bottomFD->secret->md.read.thread = thread;
>+                }
>+
>+                if (pd->in_flags & PR_POLL_WRITE) {
>+                    PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL);
>+                    bottomFD->secret->md.write.thread = thread;
>+                }
>             }
>         }

Are you sure that it is necessary to have 'thread == NULL' in
the assertions?
Attachment #51793 - Flags: needs-work+
> >Index: pr/src/md/mac/macsockotpt.c
> >===================================================================
> >RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
> >retrieving revision 3.32.2.1
> >diff -b -u -2 -r3.32.2.1 macsockotpt.c
> >--- macsockotpt.c      2001/09/20 03:42:25     3.32.2.1
> >+++ macsockotpt.c      2001/10/03 00:19:57
> >@@ -306,4 +306,5 @@
> > // Async callback routine.
> > // A5 is OK. Cannot allocate memory here
> >+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
> > static pascal void  NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> > {
> >@@ -311,6 +312,6 @@
> >       _MDFileDesc * md       = &(secret->md);
> >       EndpointRef   endpoint = (EndpointRef)secret->md.osfd;
> >-    PRThread *    thread   = NULL;
> >-    PRThread *          pollThread = md->poll.thread;
> >+    PRThread *    readThread   = NULL;
> >+    PRThread *    writeThread  = NULL;
> >       OSStatus      err;
> >       OTResult          resultOT;
> 
> It would be nice to add a comment that says:
>     readThread is also used for 'misc'.

Will do.

> >@@ -366,22 +367,17 @@
> >             err = OTRcvDisconnect(endpoint, &discon);
> >             PR_ASSERT(err == kOTNoError);
> >-            secret->md.exceptReady     = PR_TRUE;
> >+            secret->md.exceptReady = PR_TRUE;       // XXX Check this
> 
> If you mark the socket exceptReady, you are compatible with Winsock.
> If you mark the socket readReady and writeReady, you are compatible
> with most Unix socket implementations.  I suggest that you go for
> Unix socket compatibility.

Yes. I'll fix this under bug 102791.

> >@@ -1514,7 +1511,12 @@
> >                               case kEWOULDBLOCKErr:
> >                                       if (fd->secret->nonblocking) {
> >+                                          if (bytesLeft == amount) {
> >                                               err = result;
> >                                               goto ErrorExit;
> >                                       }
> >+                                              else {      // return how much was sent
> >+                                                  return amount - bytesLeft;
> >+                                              }
> >+                                      }
> > 
> >                                       WaitOnThisThread(me, timeout);
> 
> Please make sure the indentation of the nested if-else under this 'case'
> looks right.  I can't tell in Netscape 6.x, with tab stop set to 8.

It's OK in the source. I diffed ignoring space changes.

> >@@ -1696,5 +1698,4 @@
> > }                               
> > 
> >-
> > static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady)
> > {
> >@@ -1709,10 +1710,18 @@
> >       *exceptReady = fd->secret->md.exceptReady;
> > 
> >+    *writeReady = PR_FALSE;
> 
> This is not necessary.

Right, I'll fix.

> >     resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
> >     switch (resultOT)    {
> >-        case T_DATAXFER:
> >-        case T_INREL:
> >-            *writeReady = PR_TRUE;
> >+        case T_DATAXFER:        // data transfer
> >+            *writeReady = fd->secret->md.writeReady;
> >+            break;
> >+
> >+        case T_INREL:           // incoming orderly release
> >+            *writeReady = fd->secret->md.writeReady;
> >             break;
> 
> When the socket is in the T_INREL state, we should mark it readable
> and arrange for the next read to return a zero byte count.

Yes. Another item for bug 102791.

> >+
> >+        case T_OUTCON:          // outgoing connection pending  
> >+        case T_INCON:           // incoming connection pending
> >+        case T_OUTREL:          // outgoing orderly release
> >         default:
> >             *writeReady = PR_FALSE;
> 
> It's not necessary to add these cases because 'default'
> is going to cover them.

Correct, but for me, it's clearer to see them all there, and I believe the 
compiled code will be the same. Compilers on other platforms would warn about 
missing enum values in the case  ;)

> >@@ -1812,5 +1821,13 @@
> >             if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
> >             {
> >-                bottomFD->secret->md.poll.thread = thread;
> >+                if (pd->in_flags & PR_POLL_READ) {
> >+                    PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
> >+                    bottomFD->secret->md.read.thread = thread;
> >+                }
> >+
> >+                if (pd->in_flags & PR_POLL_WRITE) {
> >+                    PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL);
> >+                    bottomFD->secret->md.write.thread = thread;
> >+                }
> >             }
> >         }
> 
> Are you sure that it is necessary to have 'thread == NULL' in
> the assertions?      

Yes, because we use this same function to null out the threads at the end of a 
Poll() call.
With this code, we're seeing two new assertions in NSPR:
1. The assertion that the reading thread is NULL just before the OTRcv() in
   SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(),
   which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread
   is polling.

2. We hit the PR_ASSERT(0) in _MD_poll, because thread->md.osErrCode is set
   to kECONNRESETErr by the NotifierRoutine after it gets a T_DISCONNECT
   message.

Both these assertions seem benign.
Updating ETA per Simon's comments in the bug - ETA 10.03.
Whiteboard: [PDT], [ETA n/a] → [PDT], [ETA 10.03]
Thinking more about those assertions, I think we may need to modify how MD_Poll 
deals with errors.

One side effect of the changes in this patch is that the notifier will now send 
along an error code when waking the polling thread; previously, we always woke 
the polling thread with kOTNoError. We thus have to be careful to ensure that 
poll returns the appropriate flags in this case; with the current patch, poll is 
always returning -1 on error.
Attachment #51793 - Attachment is obsolete: true
Attachment #51793 - Flags: needs-work+
Attached patch Final patch (obsolete) — Splinter Review
The latest patch fixes outstanding issues. It removes the code in _MD_Poll that 
causes poll() to return -1 when a socket disconnected, which we now get because 
we set the error from OTRcvDisconnect() on the thread that gets woken up. 
Instead, we correctly return status from Poll, and return errors on socket read/
write attempts, using existing code.

I also removed some assertions that got hit when doing https, in 
SendReceiveStream().

The 'T_INREL' case in GetState no longer sets *readReady to true, because this is 
already handled by the conditional on fd->secret->md.orderlyDisconnect above in 
the same function.

I've tested http, ftp, https, IMAP, secure IMAP, SMTP, secure SMTP, news, secure 
news, and gopher on both Mac OS and Mac OS X, and everything works well.
Note on removing the assertions: after further investigation, I can explain all 
assertions:

1. The assertion that the reading thread is NULL just before the OTRcv() in
   SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(),
   which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread
   is polling.

Here, one thread is polling while another thread calls PR_Recv with the 
PR_MSG_PEEK flag. SendReceiveStream() could save and restore the read thread 
here, though there is the danger that we'd miss a notification that should have 
woken the polling thread.

Maybe PR_Rcv() with the PR_MSG_PEEK flag should not actually read from the 
endpoint, but just call OTCountDataBytes() and check the endpoint state?

There is a second case when this assertion fires, which is when someone does a 
blocking read, and OTRcv returned kOTNoDataErr. For this case, the assertion 
needs fixing.

2. We fail the assertion in SetDescPollThread() that the writing thread is null,
   which happens when someone is polling during a connect on a non-blocking
   socket.

This assertion just needs fixing.
Attachment #51948 - Attachment is obsolete: true
Attached patch Patch with correct assertions (obsolete) — Splinter Review
Last patch now only asserts in one case, where we call PR_Rcv() with with the 
PR_MSG_PEEK flag on a different thread than PR_Poll, and I think that assertion 
correctly reveals a bad situation.
Attachment #52003 - Flags: review+
r=gordon.
pls get this on the branch, and see if drivers will allow it on the trunk. - PDT+
Whiteboard: [PDT], [ETA 10.03] → [PDT+], [ETA 10.03]
Comment on attachment 52003 [details] [diff] [review]
Patch with correct assertions


>Index: pr/src/md/mac/macsockotpt.c
>===================================================================
>RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
>retrieving revision 3.32.2.1
>diff -b -u -5 -r3.32.2.1 macsockotpt.c
>--- macsockotpt.c	2001/09/20 03:42:25	3.32.2.1
>+++ macsockotpt.c	2001/10/04 03:14:28
>@@ -303,17 +303,19 @@
> }
> 
> // Notification routine
> // Async callback routine.
> // A5 is OK. Cannot allocate memory here
>+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
>+//
> static pascal void  NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> {
> 	PRFilePrivate *secret  = (PRFilePrivate *) contextPtr;
> 	_MDFileDesc * md       = &(secret->md);
> 	EndpointRef   endpoint = (EndpointRef)secret->md.osfd;
>-    PRThread *    thread   = NULL;
>-    PRThread *	  pollThread = md->poll.thread;
>+    PRThread *    readThread   = NULL;
>+    PRThread *    writeThread  = NULL;
> 	OSStatus      err;
> 	OTResult	  resultOT;
>     TDiscon		  discon;
> 
>     switch (code)

It would be nice to add a comment saying something like:
    readThread is also used for "misc".

>@@ -1707,15 +1721,34 @@
> 
>     *readReady = fd->secret->md.readReady && (availableData > 0);
> 	*exceptReady = fd->secret->md.exceptReady;
> 
>     resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
>-    switch (resultOT)    {
>-        case T_DATAXFER:
>-        case T_INREL:
>+    switch (resultOT)
>+    {

It would be nice to emulate the original indentation
style in the file (specifically, the placement of the
curly braces).

>+        case T_UNBND:
>+        case T_IDLE:
>+            // the socket is not connected. Emulating BSD sockets,
>+            // we mark it readable and writable. The next PR_Read
>+            // or PR_Write will then fail. Usually, in this situation,
>+            // fd->secret->md.exceptReady is also set, and returned if
>+            // anyone is polling for it.
>+            *readReady = PR_TRUE;
>             *writeReady = PR_TRUE;
>             break;

These two cases are wrong.  When an endpoint is in the
T_UNBND or T_IDLE state, it is neither readable nor
writable.  These two states are different from the
"disconnected" state, which is considered an error
condition and is reported by marking the endpoint both
readable and writable.

>+        case T_INREL:           // incoming orderly release
>+            *writeReady = PR_FALSE;
>+            break;

This is wrong.  When an endpoint is in the T_INREL state,
you can still write to the endpoint as long as flow control
does not kick in.  The only thing you can say is that there
is no more data to be read from the endpoint.  In Unix,
this condition is reported by marking the socket readable
and arranging for the next read to return a zero byte count.

>@@ -1809,22 +1842,34 @@
>         if (pd->fd)
>         { 
>             PRFileDesc *bottomFD = PR_GetIdentitiesLayer(pd->fd, PR_NSPR_IO_LAYER);
>             if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
>             {
>-                bottomFD->secret->md.poll.thread = thread;
>+                if (pd->in_flags & PR_POLL_READ) {
>+                    PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
>+                    bottomFD->secret->md.read.thread = thread;
>+                }
>+
>+                if (pd->in_flags & PR_POLL_WRITE) {
>+                    // it's possible for the writing thread to be non-null during
>+                    // a non-blocking connect, so we assert that we're on
>+                    // the same thread, or the thread is null.
>+                    PR_ASSERT(thread == NULL ||
>+                        (bottomFD->secret->md.write.thread == NULL ||
>+                         bottomFD->secret->md.write.thread == thread));
>+                    bottomFD->secret->md.write.thread = thread;
>+                }
>             }
>         }        
>     }
> }

The second assertion is overly restrictive.  It precludes having
one thread initiating a non-blocking connect and another thread
polling for the completion of the connect.

Ideally, "bottomFD->secret->md.write.thread == thread" should be
replaced by the condition "a non-blocking connect is in progress".
Attachment #52003 - Flags: needs-work+
After looking at Apple's OT docs more closely, I
think the handling of an endpoint in the T_IDLE
state is complicated by how that state is entered.
I don't have a simple solution.

The T_UNBND state should be reported as not
readable and not writable.
Attached patch Patch addressing wtc's issues (obsolete) — Splinter Review
Attachment #52086 - Attachment is obsolete: true
Attachment #52003 - Attachment is obsolete: true
> It would be nice to add a comment saying something like:
>     readThread is also used for "misc".

Damn, forgot to add to the diff. Will add before checking in.

> It would be nice to emulate the original indentation
> style in the file (specifically, the placement of the
> curly braces).

done.

> These two cases are wrong.  When an endpoint is in the
> T_UNBND or T_IDLE state, it is neither readable nor
> writable.  These two states are different from the
> "disconnected" state, which is considered an error
> condition and is reported by marking the endpoint both
> readable and writable.

Fixed.

> After looking at Apple's OT docs more closely, I
> think the handling of an endpoint in the T_IDLE
> state is complicated by how that state is entered.
> I don't have a simple solution.

I added a PR_ASSERT for this case. I've never hit it so far.

> The second assertion is overly restrictive.  It precludes having
> one thread initiating a non-blocking connect and another thread
> polling for the completion of the connect.
> 
> Ideally, "bottomFD->secret->md.write.thread == thread" should be
> replaced by the condition "a non-blocking connect is in progress".

This is hard to know unless I add another state variable to the file desc
struct or something. I think the current assertions will do for now; I'll try to 
come up with something more accurate for bug 102791.
Comment on attachment 52164 [details] [diff] [review]
Patch addressing wtc's issues

>+        case T_INREL:           // incoming orderly release
>             *writeReady = PR_TRUE;
>             break;

This is still not correct.  This should say
              *writeReady = fd->secret->md.writeReady;
I think you also need to set *readReady to PR_TRUE if
the endpoint is in the T_INREL state.
Attachment #52164 - Flags: review+
Attachment #52164 - Flags: needs-work+
> This is still not correct.  This should say
>               *writeReady = fd->secret->md.writeReady;

Correct. I'll do that.

> I think you also need to set *readReady to PR_TRUE if
> the endpoint is in the T_INREL state.

*readReady gets set to true in this state by the existing code just above:

    // hack to emulate BSD sockets; say that a socket that has disconnected
    // is still readable.
    size_t   availableData = 1;
    if (!fd->secret->md.orderlyDisconnect)
        OTCountDataBytes((EndpointRef)fd->secret->md.osfd, &availableData);

    *readReady = fd->secret->md.readReady && (availableData > 0);
Did this make it on the 094 branch? Pls advise ...
This is not checked in anywhere yet. Still waiting for a final r= from wtc.
Simon: I was waiting for you to attach a new patch.
You can check in your latest patch after fixing the
case T_INREL problem and adding a comment about
readThread also used for "misc".

Attachment #52164 - Attachment is obsolete: true
This is checked in on NSPR tip and branch, and on Mozilla 0.9.4 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified mac  commercial build 2001-10-09-04-trunk
Status: RESOLVED → VERIFIED
Verified on Mac 9.1 10-09-03-0.9.4 brnach build
Verified on Mac OSX 10-09-04-0.9.4 branch build
No hang anymore when sending these two attachments.
*** Bug 88489 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: