Closed
Bug 100353
Opened 23 years ago
Closed 23 years ago
Copy sent mail to IMAP servers "Sent" folder appears to hang NS6
Categories
(MailNews Core :: Networking: IMAP, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla0.9.5
People
(Reporter: scalkins, Assigned: sfraser_bugs)
References
Details
(Keywords: hang, perf, regression, Whiteboard: [PDT+], [ETA 10.03])
Attachments
(6 files, 6 obsolete files)
34.79 KB,
application/octet-stream
|
Details | |
47.70 KB,
application/octet-stream
|
Details | |
1.59 KB,
patch
|
Details | Diff | Splinter Review | |
6.77 KB,
text/plain
|
Details | |
34.30 KB,
text/plain
|
Details | |
22.98 KB,
patch
|
Details | Diff | Splinter Review |
Saw this on Mac only
Mac OSX 2001-09-17-04 0.9.4 Branch (10.x)
Mac OS 9.1 2001-09-11-04 Branch (8.x)
Steps to repro:
1) Login to Mail with an IMAP account.
2)Set Mail prefs to "Place a copy" of sent mail in the "Sent" folder on
yourIMAP server.
3)Compose a new message, and attach either one or both of the files attached on
this email. One file, is a bmp file which is 38k, and trhe other file is a .tiff
file which is 48k.
4)Send the message
Actual results: Mail status dialogs show the mail was sent, but when the
"Copying to sent mail folder dialog" appears, it seems to hang mail for 10-15
minutes while this action is being performed. This is exacerbated when you
attach both files, because it seems to take 10-15 minutes for each attachment,
(20-30 minutes) to copy the file to the sent mail folder on the server.
NOTE: Nominating nsbranch nsenterprise because most users will be led to believe
the app has hung because you can't launch browser or do anything else till the
action completes. They will force quit the app rather than wait this long.
This gives a bad impression to users of our product. The .tiff and bmp files are
not uncommon, and we saw the same behavior with .jpg files.
Copying the files to the "Sent" mail folder locally does not exhibit this
behavior, and is quick.(A few seconds)
Keywords: nsbranch,
nsenterprise
P.S. When the action finally completes, the mail is copied to the sent mail
folder. The recipient also does get the message with attachments.
Comment 4•23 years ago
|
||
There was bug logged for Mac before.
I need to find out that this is dup bug or another regression again
Keywords: regression
Karen, There is bug 88489 but that is for large attachments. These are small
files, so I logged a different bug.
this is bug needs to be fixed or users will hang up everytime they send a
message that has an attachment with a size around 28kb. I tested this sending
(1) .gif file 28kb as an attachment, it hangs the app. I tested this sending
(1) .jpeg 4kb, it hangs the app. I tested this doing a Send Page (our home
page) it hangs the app. In all of these cases I waited 1 hr then did a force
quit. It fails only at saving to the IMAP Sent folder. If I use the Local
folder it finished sending and saving within 15 seconds. On OS9.1 these same
tests (copy to IMAP) takes 2-3 minutes which is bad, but not as bad as OSX.
Somebody please bump this up to a "+", the work around forces users to change
their way of organizing their messages.
Comment 7•23 years ago
|
||
Cavin, can you take a look at this tomorrow? Sounds like a stopper to me. nsBranch+.
Comment 8•23 years ago
|
||
re-assinging to cavin for real this time. Cavin, can you take a look at this
tomorrow for a possible branch fix?
Assignee: mscott → cavin
Comment 9•23 years ago
|
||
I am looking at it...
Comment 10•23 years ago
|
||
Here is what I have observed:
I am using a PPP connection at 26K, I am generating a message that include the
tiff attachment.
When I sent it, SMTP take about 15 Sec, then the copy start but after 1-2 sec,
the UI froze for about 10 secs and then the copy continue for about 8 secs,
sometime more.
The UI freeze occurs when we write the data to the output stream in
nsImapProtocol::SendData.
However, if I turn on PR_Log for nsSocketTransport, the UI freeze disappears!
Comment 11•23 years ago
|
||
Comment 12•23 years ago
|
||
Comment 13•23 years ago
|
||
I added some debug code into nsImapProtocol::SendData to demonstrate how long
nsIOutputStream::Write blocks.
the interresting lines are:
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133722
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 36
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133723 [1]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133724
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133726 [2]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133727
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 133730 [3]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 133731
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 141486 [7755]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 141486
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 16384
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149350 [7864]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149350
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2547
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [1]
48[c8f3394]: nsmail-1:S-Drafts:SendData:START: 149351
48[c8f3394]: nsmail-1:S-Drafts:SendData:buffer size: 2
48[c8f3394]: nsmail-1:S-Drafts:SendData:END: 149351 [0]
the first number after START or END is the abosolute time in milliseconds
the second number after END between [] is the time spend in the function in
milliseconds.
You can really see that twice we get stuck in the write call for about 8 seconds
which correspond to the UI freeze.
Comment 14•23 years ago
|
||
darin, can you take a look at the imap log and give us some suggestions on what
might have gone wrong for some of the stream write calls to complete in 7-8
seconds? The first and second 16K bytes took only a few milliseconds to complete
while the 3rd and 4th 16K bytes took 7.7 and 7.8 seconds respectively to finish.
We also changed NS_SOCKET_TRANSPORT_BUFFER_SIZE to 128K (from 8K) and tried the
same scenario, but the performance did not improve at all.
Let me know if you have questions on the log. Thanks.
Comment 15•23 years ago
|
||
Our initial theory was that the socket thread was getting backed up and couldn't
send the incoming bits up fast enough. So as Cavin mentioned has a test of that
theory we jacked up the value of NS_SOCKET_TRANSPORT_BUFFER_SIZE
to make it large enough to handle the entire message we were trying to send to
the socket. So the socket shouldn't have gotten backed up. But we still saw the
same long lags.
Comment 16•23 years ago
|
||
is this method being called on the UI thread? you should never call a blocking
read/write method from the UI thread... you are guaranteed to have unknown
delays (possibly very long) before the methods return.
Comment 17•23 years ago
|
||
No this isn't being called from the UI thread. I'd like to think we know better
than that =).
Each imap connection runs in it's own thread and we frequently block that thread
while waiting for new data or writing data using blocking writes to the socket.
Comment 18•23 years ago
|
||
whew :) so, then its not the UI that is hung, but something else? mail is not
being responsive... is that because the particular IMAP thread is busy... and
the user is effectively waiting for that thread to do something?
Comment 19•23 years ago
|
||
Actually the problem isn't with the responsive ness of the UI thread. Our
problem is that on the mac, writing 8K chunks of data to a nsSocketTransport is
taking 7 to 8 seconds. We are noticing that the first couple writes seem to
happen pretty quickly. Then our blocking write call starts taking 7 to 8 seconds
before the socket transport comes back to us. It shouldn't take that long to
send 8K of data so something is getting backed up somewhere. Our inital guess
was that the socket transport stream was just full and we were waiting for it to
finish sending data before it could take more. So as a test of that theory we
jacked up the size of the socket transport buffer to be larger than our 100K
message. We still saw the same 7 to 8 second delays for a couple of the 8k
writes we were performing. So something funky is going on for the Mac when
writing to a socket. At least that's our best guess.
Comment 20•23 years ago
|
||
Oh and you'll love this Darin. JF says if he turns socket logging on for the
Mac. Then these huge 7 to 8 seconds lags on a couple of the blocking writes go
away! I haven't been able to corraborate that. But that's pretty wild.
Comment 21•23 years ago
|
||
FYI: I've tried to setup a smaller buffer and sebment size (512 bytes) but that had no effect at all! I'll try to look at
the OpenTransport code in nspr...
Comment 22•23 years ago
|
||
I'm not sure I understand why making the socket buffer smaller would have had
any effect on the problem? Well your results backed that up =).
Comment 23•23 years ago
|
||
I was trying to see if we were blocking during the physical transmission on the
wire. Using a smaller buffer cause to send small blocks rather than big one, at
least that's what I was hopping!
Comment 24•23 years ago
|
||
the socket writes are unbuffered, so changing the buffer sizes should indeed
have no effect.
Comment 25•23 years ago
|
||
what are the chances of getting this fixed for 0.9.4 branch?
Whiteboard: [PDT], [ETA ?]
Comment 26•23 years ago
|
||
we still doesn't have a clue what's the matter here, therefore no ETA yet!
Comment 27•23 years ago
|
||
ETA for this bug is still unknown because we don't know what the cause is yet.
Once we know what's causing the problem then we'll get a better ETA.
Comment 28•23 years ago
|
||
I did the same test under MacOS X with yesterday Mozilla debug build as well
with the Official Netscape 6.1. In both case it hang during the copy to IMAP bug
unfortunatly we stay frozen for ever, I had to kill the application. I get the
following error on the console:
Assertion failure: resultOT == T_GODATA, at macsockotpt.c:412
Updated•23 years ago
|
Whiteboard: [PDT], [ETA ?] → [PDT], [ETA n/a]
Comment 30•23 years ago
|
||
*** Bug 102175 has been marked as a duplicate of this bug. ***
Comment 31•23 years ago
|
||
Here is so far what I have found:
nsImapProtocol::SendData calls nsSocketBOS::Write.
nsSocketBOS::Write calls PR_Write() which failed with
rv=NS_BASE_STREAM_WOULD_BLOCK.
Therefore nsSocketBOS::Write calls Poll() which succeed and then try again to
write. When we are blocking, we get stuck in this loop PR_WRITE - POLL -
PR_WRITE for about 500 times until we get out. This process takes couple seconds
(3 with an ISDN connection, 7 with a dial-up)
During the copy of the message, we block 1 or 2 times. Also, I have observed
that when we are done uploading the data, the progress bar still spinning for a
while (10-15 secs) until the copy operation is finished. I don't know what we
are doing or waiting for!
Status: NEW → ASSIGNED
Comment 32•23 years ago
|
||
sorry if this is already mentioned, but does it matter if the IMAP connection is
via SSL?
Comment 33•23 years ago
|
||
darin, we aren't using SSL in these examples. Hope that helps.
Comment 34•23 years ago
|
||
ok.. i suspect there is something funny happening in the mac PR_Poll code... it
is clearly malfunctioning if it is setting PR_POLL_WRITE _and_ then returning
PR_WOULD_BLOCK_ERROR from a subsequent PR_Write call.
however, a quick check of the code reveals that we are not verifying that
PR_POLL_WRITE is set. hmm... it would be good to log the pd.out_flags in
nsSocketBS::Poll to ensure that PR_POLL_WRITE is getting set.
though, i'm not sure what we could/should do if PR_POLL_WRITE is not set... i
mean, if PR_POLL_EXCEPT is being set, then i'd think we'd need to abort the
socket connection, but that won't help in getting the data uploaded :P
so, i'm really suspecting the mac NSPR polling code. cc'ing some mac folks for
help.
Assignee | ||
Comment 35•23 years ago
|
||
That would not surprise me.
Comment 36•23 years ago
|
||
I have modified the function nsSocketBS::Poll to not set anymore the flag
PR_POLL_EXCEPT but that did not have any effect.
Comment 37•23 years ago
|
||
ok.. sounds like we need to do some NSPR debugging. anyone with mac experience
want to help out?
Comment 38•23 years ago
|
||
kind of hard to debug, if I use break points to debug PR_Poll, the problem goes
away :-(
Assignee | ||
Comment 39•23 years ago
|
||
Working on it now.
Assignee | ||
Comment 40•23 years ago
|
||
This is happening because OT returns a kOTFlowErr in response to the OTSnd,
causing us to return an error from PR_Write(). It's indended that the caller will
then call PR_Poll, which will be woken when the Notifier gets a T_GODATA.
However, it's possible for the notifier to fire when both the polling thread and
writing thread members on the socket have been cleared, so we fail to wake the
polling thread. Working on a solution now.
Comment 41•23 years ago
|
||
Assignee | ||
Comment 42•23 years ago
|
||
IMAP logs won't tell me anything new, I think. Still debugging.
Assignee | ||
Comment 43•23 years ago
|
||
A couple of things that I've found:
The GetState() function definately needs this change:
resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
switch (resultOT) {
case T_DATAXFER:
case T_INREL:
- *writeReady = PR_TRUE;
+ *writeReady = fd->secret->md.writeReady;
break;
default:
*writeReady = PR_FALSE;
}
This fixes the problem where PR_Poll says that the socket is writable when it's
not.
But with that change, things still don't work (especially on Mac OS X). A second
problem that I can see is that SendReceiveStream() (called from PR_Read/PR_Write
for sockets) fails to correctly handle write that are only partially successful.
It loops, calling OTSnd() until no data is left to send. However, if the second
OTSnd() fails (because of flow control) then we simply return -1, rather than
returning the amount of data we did send.
Comment 44•23 years ago
|
||
with macosx commercial build 2001-10-01-04-0.9.4 sed web page didn't hang but it
still fails to send the page correctly
Hardware: Macintosh → All
Comment 45•23 years ago
|
||
simon: good find! it seems like a common mistake of Read/Write impl's.
Comment 46•23 years ago
|
||
Simon patch remove the "hang"
However, we are still very very slow to send IMAP data (IMAP is using non
bloking mode). I try to make SendReiveStream return the amount of bytes
currently send/read even in case of an error (any). I return the amount only
when this one is not 0, else I return -1. This doesn't not help. It still took
between 11 to 30 second to save a draft message on IMAP while it takes only 4
secs on Windows (same message, same connection).
But, if I try to bee a little bit smarter and do the following:
Index: mozilla/nsprpub/pr/src/md/mac/macsockotpt.c
===================================================================
RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
retrieving revision 3.32.2.1
diff -w -u -5 -r3.32.2.1 macsockotpt.c
--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1
+++ macsockotpt.c 2001/10/01 19:57:44
@@ -1539,12 +1539,22 @@
ErrorExit:
PR_ASSERT(opCode == kSTREAM_SEND ? fd->secret->md.write.thread == nil :
fd->secret->md.read.thread == nil);
macsock_map_error(err);
+ switch (err)
+ {
+ case kOTFlowErr:
+ case kOTNoDataErr:
+ case kEAGAINErr:
+ case kEWOULDBLOCKErr:
+ return amount - bytesLeft;
+
+ default:
return -1;
}
+}
The send is not very fast but we have couple of hang that last 2-3 secs. The
hang seems to occur into nsSocketBOS::Write because we loop without waiting.
Also, should we apply the same changes to the function SendReceiveDgram? I
presume yes!
Comment 47•23 years ago
|
||
For what I am seeing, PR_Poll block for too long! we are waiting and waiting
while their is no network activity! When SendReceiveStream return 0 instead of
-1 (see my patch above), nsSocketBOS::Write will just call PR_WRITE without
calling Poll. That hang the UI but it's very fast!
Comment 48•23 years ago
|
||
"hang seems to occur into nsSocketBOS::Write because we loop without waiting"
i'm not sure i follow this statement: without waiting for what? we call
PR_Poll... isn't that appropriate?
Comment 49•23 years ago
|
||
in nsSocketBOS::Write, if PR_Write return 0 when no data has been written
instead of returning -1 like it's currently the case, Poll will not be called
and it will just try to write again.
The hang could be cause by the fact I have couple of printf in the loop!
What looks suspicious for me is that in SendReceiveStream,
fd->secret->nonblocking is true but it nsSocketBOS::Write, we detect a
NS_BASE_STREAM_WOULD_BLOCK error! does it make sense to be non blocking at one
level and blocking in another!
Comment 50•23 years ago
|
||
Tracy,
I was wondering why you changed OS to all. Are you seeing this on linux and
windows as well? Using today's branch builds win98 and linux is working fine
when copying to imap sent folder. The problem is only with MAC.
Comment 51•23 years ago
|
||
the socket is non-blocking, because there can be both a AsyncRead request as
well as a open nsIOutputStream to the socket. so, we configure the socket to be
non-blocking and fake a blocking operation by using the PR_Poll/PR_Write loop in
nsSocketBOS::Write.
Comment 52•23 years ago
|
||
PR_Write should not return 0 bytes written... it should return some sort of
error to indicate why nothing could be written.
the code in nsSocketBOS::Write should probably detect 0 bytes written... but,
then what should it do with that case? treat it as WOULD_BLOCK, or simply
return 0 bytes written to its caller?
bottom line.. i think PR_Write should not return 0 bytes written, because it
leaves us in a rather ambiguous state.
Comment 53•23 years ago
|
||
I agree with you. ALl I did was to try to understand the problem. My conclusion
so far is that we spent to much time into PR_POLL. We need to figure out why?
Assignee | ||
Comment 54•23 years ago
|
||
I'm all over this, but I need to spend some more time debugging. I ended up with
something like ducarroz's fix in SendReceiveStream, but there is still a problem
with a missed notification, that prevents the IMAP thread from being woken up to
read the final server acknowledgement of the send.
Comment 55•23 years ago
|
||
Sheela,
I'm seeing web page send failure on all platforms. That is why I changed the
OS. But upon reviewing this bug further, I belive what I'm seeing is different
behavior. Send a web page brings up an error message saying the file containing
the web page could not be sent. If you click okay to the error message, the
mail message is sent but upon receiving the message the web page is not
attached, just the link to the page. If the fix to this bug doesn't correct that
behavior as well, I will log a new bug.
changing OS back to Mac
Hardware: All → Macintosh
Assignee | ||
Comment 56•23 years ago
|
||
Something confuses me here. With the Mac NSPR fixes described above, there is
still a problem where, after sending the file attachments on save to IMAP Sent
folder, we stall in nsImapProtocol::UploadMessageFromFile() while waiting for the
server's response. We stall down in nsImapProtocol::CreateNewLineFromSocket(),
waiting on the m_dataAvailableMonitor. The Mac NSPR code does get a notification
that says that the data is available, but it's dropped on the floor. I suspect
this may be because we have two threads doing nested PR_Polls on the same socket.
Assignee | ||
Comment 57•23 years ago
|
||
Darin pointed out that two threads should not both be polling for read at the
same time, but it's ok for one to poll for reading, and one for writing. Fixing
Mac NSPR to deal with this fixes the bug.
Assignee: ducarroz → sfraser
Status: ASSIGNED → NEW
Assignee | ||
Comment 58•23 years ago
|
||
Assignee | ||
Comment 59•23 years ago
|
||
This fix works (still needs testing on Mac OS X), but needs a bit of tidying up.
I think we need to review Mac's PR_Poll to ensure that it's returning the correct
flags in the various less common situations (connect, disconnect etc).
The patch basically allows 2 threads to poll the same socket, as long as only one
is reading or writing at a time.
Status: NEW → ASSIGNED
Assignee | ||
Comment 60•23 years ago
|
||
Tested the fix on a dual CPU Mac OS X box. It works fine there too.
Comment 61•23 years ago
|
||
what are the chances this will be ready by 10.03?
Assignee | ||
Comment 62•23 years ago
|
||
I expect to have a final patch by 10/03
Assignee | ||
Updated•23 years ago
|
Attachment #51648 -
Attachment is obsolete: true
Assignee | ||
Comment 63•23 years ago
|
||
Assignee | ||
Comment 64•23 years ago
|
||
Attachment 51793 [details] [diff] fixes several problems in Mac NSPR that caused this:
1. PR_Write to a socket whose underlying endpoint is under OT flow control (i.e.
is temporarily unable to accept data) now behaves correctly, only returning
-1 if no data was written. If > 0 bytes were written, it returns the number
of bytes written.
2. PR_Poll is fixed to only return the writable flag if the socket is writable.
Previously, we'd say that a socket was writable when it was actually under
flow control.
3. It's now OK to call PR_Poll on the same socket from two different threads, as
long as one thread is reading, and one writing.
Ignore whitespace diffs in the patch; the source has been detabbed somewhat, and
lines up nicely. Looking for r= from wtc and gordon.
Note that I spun a separate bug for fixing PR_Poll to report correct status on
connect/disconnect (bug 102791).
Comment 65•23 years ago
|
||
Comment on attachment 51793 [details] [diff] [review]
Final NSPR patch to fix polling on different threads etc.
Simon,
See my comments below.
>Index: pr/src/md/mac/macsockotpt.c
>===================================================================
>RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
>retrieving revision 3.32.2.1
>diff -b -u -2 -r3.32.2.1 macsockotpt.c
>--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1
>+++ macsockotpt.c 2001/10/03 00:19:57
>@@ -306,4 +306,5 @@
> // Async callback routine.
> // A5 is OK. Cannot allocate memory here
>+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
> static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> {
>@@ -311,6 +312,6 @@
> _MDFileDesc * md = &(secret->md);
> EndpointRef endpoint = (EndpointRef)secret->md.osfd;
>- PRThread * thread = NULL;
>- PRThread * pollThread = md->poll.thread;
>+ PRThread * readThread = NULL;
>+ PRThread * writeThread = NULL;
> OSStatus err;
> OTResult resultOT;
It would be nice to add a comment that says:
readThread is also used for 'misc'.
>@@ -366,22 +367,17 @@
> err = OTRcvDisconnect(endpoint, &discon);
> PR_ASSERT(err == kOTNoError);
>- secret->md.exceptReady = PR_TRUE;
>+ secret->md.exceptReady = PR_TRUE; // XXX Check this
If you mark the socket exceptReady, you are compatible with Winsock.
If you mark the socket readReady and writeReady, you are compatible
with most Unix socket implementations. I suggest that you go for
Unix socket compatibility.
>@@ -1514,7 +1511,12 @@
> case kEWOULDBLOCKErr:
> if (fd->secret->nonblocking) {
>+ if (bytesLeft == amount) {
> err = result;
> goto ErrorExit;
> }
>+ else { // return how much was sent
>+ return amount - bytesLeft;
>+ }
>+ }
>
> WaitOnThisThread(me, timeout);
Please make sure the indentation of the nested if-else under this 'case'
looks right. I can't tell in Netscape 6.x, with tab stop set to 8.
>@@ -1696,5 +1698,4 @@
> }
>
>-
> static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady)
> {
>@@ -1709,10 +1710,18 @@
> *exceptReady = fd->secret->md.exceptReady;
>
>+ *writeReady = PR_FALSE;
This is not necessary.
> resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
> switch (resultOT) {
>- case T_DATAXFER:
>- case T_INREL:
>- *writeReady = PR_TRUE;
>+ case T_DATAXFER: // data transfer
>+ *writeReady = fd->secret->md.writeReady;
>+ break;
>+
>+ case T_INREL: // incoming orderly release
>+ *writeReady = fd->secret->md.writeReady;
> break;
When the socket is in the T_INREL state, we should mark it readable
and arrange for the next read to return a zero byte count.
>+
>+ case T_OUTCON: // outgoing connection pending
>+ case T_INCON: // incoming connection pending
>+ case T_OUTREL: // outgoing orderly release
> default:
> *writeReady = PR_FALSE;
It's not necessary to add these cases because 'default'
is going to cover them.
>@@ -1812,5 +1821,13 @@
> if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
> {
>- bottomFD->secret->md.poll.thread = thread;
>+ if (pd->in_flags & PR_POLL_READ) {
>+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
>+ bottomFD->secret->md.read.thread = thread;
>+ }
>+
>+ if (pd->in_flags & PR_POLL_WRITE) {
>+ PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL);
>+ bottomFD->secret->md.write.thread = thread;
>+ }
> }
> }
Are you sure that it is necessary to have 'thread == NULL' in
the assertions?
Attachment #51793 -
Flags: needs-work+
Assignee | ||
Comment 66•23 years ago
|
||
> >Index: pr/src/md/mac/macsockotpt.c
> >===================================================================
> >RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
> >retrieving revision 3.32.2.1
> >diff -b -u -2 -r3.32.2.1 macsockotpt.c
> >--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1
> >+++ macsockotpt.c 2001/10/03 00:19:57
> >@@ -306,4 +306,5 @@
> > // Async callback routine.
> > // A5 is OK. Cannot allocate memory here
> >+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
> > static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> > {
> >@@ -311,6 +312,6 @@
> > _MDFileDesc * md = &(secret->md);
> > EndpointRef endpoint = (EndpointRef)secret->md.osfd;
> >- PRThread * thread = NULL;
> >- PRThread * pollThread = md->poll.thread;
> >+ PRThread * readThread = NULL;
> >+ PRThread * writeThread = NULL;
> > OSStatus err;
> > OTResult resultOT;
>
> It would be nice to add a comment that says:
> readThread is also used for 'misc'.
Will do.
> >@@ -366,22 +367,17 @@
> > err = OTRcvDisconnect(endpoint, &discon);
> > PR_ASSERT(err == kOTNoError);
> >- secret->md.exceptReady = PR_TRUE;
> >+ secret->md.exceptReady = PR_TRUE; // XXX Check this
>
> If you mark the socket exceptReady, you are compatible with Winsock.
> If you mark the socket readReady and writeReady, you are compatible
> with most Unix socket implementations. I suggest that you go for
> Unix socket compatibility.
Yes. I'll fix this under bug 102791.
> >@@ -1514,7 +1511,12 @@
> > case kEWOULDBLOCKErr:
> > if (fd->secret->nonblocking) {
> >+ if (bytesLeft == amount) {
> > err = result;
> > goto ErrorExit;
> > }
> >+ else { // return how much was sent
> >+ return amount - bytesLeft;
> >+ }
> >+ }
> >
> > WaitOnThisThread(me, timeout);
>
> Please make sure the indentation of the nested if-else under this 'case'
> looks right. I can't tell in Netscape 6.x, with tab stop set to 8.
It's OK in the source. I diffed ignoring space changes.
> >@@ -1696,5 +1698,4 @@
> > }
> >
> >-
> > static PRBool GetState(PRFileDesc *fd, PRBool *readReady, PRBool *writeReady, PRBool *exceptReady)
> > {
> >@@ -1709,10 +1710,18 @@
> > *exceptReady = fd->secret->md.exceptReady;
> >
> >+ *writeReady = PR_FALSE;
>
> This is not necessary.
Right, I'll fix.
> > resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
> > switch (resultOT) {
> >- case T_DATAXFER:
> >- case T_INREL:
> >- *writeReady = PR_TRUE;
> >+ case T_DATAXFER: // data transfer
> >+ *writeReady = fd->secret->md.writeReady;
> >+ break;
> >+
> >+ case T_INREL: // incoming orderly release
> >+ *writeReady = fd->secret->md.writeReady;
> > break;
>
> When the socket is in the T_INREL state, we should mark it readable
> and arrange for the next read to return a zero byte count.
Yes. Another item for bug 102791.
> >+
> >+ case T_OUTCON: // outgoing connection pending
> >+ case T_INCON: // incoming connection pending
> >+ case T_OUTREL: // outgoing orderly release
> > default:
> > *writeReady = PR_FALSE;
>
> It's not necessary to add these cases because 'default'
> is going to cover them.
Correct, but for me, it's clearer to see them all there, and I believe the
compiled code will be the same. Compilers on other platforms would warn about
missing enum values in the case ;)
> >@@ -1812,5 +1821,13 @@
> > if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
> > {
> >- bottomFD->secret->md.poll.thread = thread;
> >+ if (pd->in_flags & PR_POLL_READ) {
> >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
> >+ bottomFD->secret->md.read.thread = thread;
> >+ }
> >+
> >+ if (pd->in_flags & PR_POLL_WRITE) {
> >+ PR_ASSERT(thread == NULL || bottomFD->secret->md.write.thread == NULL);
> >+ bottomFD->secret->md.write.thread = thread;
> >+ }
> > }
> > }
>
> Are you sure that it is necessary to have 'thread == NULL' in
> the assertions?
Yes, because we use this same function to null out the threads at the end of a
Poll() call.
Assignee | ||
Comment 67•23 years ago
|
||
With this code, we're seeing two new assertions in NSPR:
1. The assertion that the reading thread is NULL just before the OTRcv() in
SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(),
which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread
is polling.
2. We hit the PR_ASSERT(0) in _MD_poll, because thread->md.osErrCode is set
to kECONNRESETErr by the NotifierRoutine after it gets a T_DISCONNECT
message.
Both these assertions seem benign.
Comment 68•23 years ago
|
||
Updating ETA per Simon's comments in the bug - ETA 10.03.
Whiteboard: [PDT], [ETA n/a] → [PDT], [ETA 10.03]
Assignee | ||
Comment 69•23 years ago
|
||
Thinking more about those assertions, I think we may need to modify how MD_Poll
deals with errors.
One side effect of the changes in this patch is that the notifier will now send
along an error code when waking the polling thread; previously, we always woke
the polling thread with kOTNoError. We thus have to be careful to ensure that
poll returns the appropriate flags in this case; with the current patch, poll is
always returning -1 on error.
Assignee | ||
Updated•23 years ago
|
Attachment #51793 -
Attachment is obsolete: true
Attachment #51793 -
Flags: needs-work+
Assignee | ||
Comment 70•23 years ago
|
||
Assignee | ||
Comment 71•23 years ago
|
||
The latest patch fixes outstanding issues. It removes the code in _MD_Poll that
causes poll() to return -1 when a socket disconnected, which we now get because
we set the error from OTRcvDisconnect() on the thread that gets woken up.
Instead, we correctly return status from Poll, and return errors on socket read/
write attempts, using existing code.
I also removed some assertions that got hit when doing https, in
SendReceiveStream().
The 'T_INREL' case in GetState no longer sets *readReady to true, because this is
already handled by the conditional on fd->secret->md.orderlyDisconnect above in
the same function.
I've tested http, ftp, https, IMAP, secure IMAP, SMTP, secure SMTP, news, secure
news, and gopher on both Mac OS and Mac OS X, and everything works well.
Assignee | ||
Comment 72•23 years ago
|
||
Note on removing the assertions: after further investigation, I can explain all
assertions:
1. The assertion that the reading thread is NULL just before the OTRcv() in
SendReceiveStream() is hit, because FTP uses nsSocketTransport::IsAlive(),
which calls PR_Recv(mSocketFD, &c, 1, PR_MSG_PEEK, 0) while another thread
is polling.
Here, one thread is polling while another thread calls PR_Recv with the
PR_MSG_PEEK flag. SendReceiveStream() could save and restore the read thread
here, though there is the danger that we'd miss a notification that should have
woken the polling thread.
Maybe PR_Rcv() with the PR_MSG_PEEK flag should not actually read from the
endpoint, but just call OTCountDataBytes() and check the endpoint state?
There is a second case when this assertion fires, which is when someone does a
blocking read, and OTRcv returned kOTNoDataErr. For this case, the assertion
needs fixing.
2. We fail the assertion in SetDescPollThread() that the writing thread is null,
which happens when someone is polling during a connect on a non-blocking
socket.
This assertion just needs fixing.
Assignee | ||
Updated•23 years ago
|
Attachment #51948 -
Attachment is obsolete: true
Assignee | ||
Comment 73•23 years ago
|
||
Assignee | ||
Comment 74•23 years ago
|
||
Last patch now only asserts in one case, where we call PR_Rcv() with with the
PR_MSG_PEEK flag on a different thread than PR_Poll, and I think that assertion
correctly reveals a bad situation.
Assignee | ||
Comment 75•23 years ago
|
||
Attachment #52003 -
Flags: review+
Comment 76•23 years ago
|
||
r=gordon.
Comment 77•23 years ago
|
||
pls get this on the branch, and see if drivers will allow it on the trunk. - PDT+
Whiteboard: [PDT], [ETA 10.03] → [PDT+], [ETA 10.03]
Comment 78•23 years ago
|
||
Comment on attachment 52003 [details] [diff] [review]
Patch with correct assertions
>Index: pr/src/md/mac/macsockotpt.c
>===================================================================
>RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
>retrieving revision 3.32.2.1
>diff -b -u -5 -r3.32.2.1 macsockotpt.c
>--- macsockotpt.c 2001/09/20 03:42:25 3.32.2.1
>+++ macsockotpt.c 2001/10/04 03:14:28
>@@ -303,17 +303,19 @@
> }
>
> // Notification routine
> // Async callback routine.
> // A5 is OK. Cannot allocate memory here
>+// Ref: http://gemma.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-100.html
>+//
> static pascal void NotifierRoutine(void * contextPtr, OTEventCode code, OTResult result, void * cookie)
> {
> PRFilePrivate *secret = (PRFilePrivate *) contextPtr;
> _MDFileDesc * md = &(secret->md);
> EndpointRef endpoint = (EndpointRef)secret->md.osfd;
>- PRThread * thread = NULL;
>- PRThread * pollThread = md->poll.thread;
>+ PRThread * readThread = NULL;
>+ PRThread * writeThread = NULL;
> OSStatus err;
> OTResult resultOT;
> TDiscon discon;
>
> switch (code)
It would be nice to add a comment saying something like:
readThread is also used for "misc".
>@@ -1707,15 +1721,34 @@
>
> *readReady = fd->secret->md.readReady && (availableData > 0);
> *exceptReady = fd->secret->md.exceptReady;
>
> resultOT = OTGetEndpointState((EndpointRef)fd->secret->md.osfd);
>- switch (resultOT) {
>- case T_DATAXFER:
>- case T_INREL:
>+ switch (resultOT)
>+ {
It would be nice to emulate the original indentation
style in the file (specifically, the placement of the
curly braces).
>+ case T_UNBND:
>+ case T_IDLE:
>+ // the socket is not connected. Emulating BSD sockets,
>+ // we mark it readable and writable. The next PR_Read
>+ // or PR_Write will then fail. Usually, in this situation,
>+ // fd->secret->md.exceptReady is also set, and returned if
>+ // anyone is polling for it.
>+ *readReady = PR_TRUE;
> *writeReady = PR_TRUE;
> break;
These two cases are wrong. When an endpoint is in the
T_UNBND or T_IDLE state, it is neither readable nor
writable. These two states are different from the
"disconnected" state, which is considered an error
condition and is reported by marking the endpoint both
readable and writable.
>+ case T_INREL: // incoming orderly release
>+ *writeReady = PR_FALSE;
>+ break;
This is wrong. When an endpoint is in the T_INREL state,
you can still write to the endpoint as long as flow control
does not kick in. The only thing you can say is that there
is no more data to be read from the endpoint. In Unix,
this condition is reported by marking the socket readable
and arranging for the next read to return a zero byte count.
>@@ -1809,22 +1842,34 @@
> if (pd->fd)
> {
> PRFileDesc *bottomFD = PR_GetIdentitiesLayer(pd->fd, PR_NSPR_IO_LAYER);
> if (bottomFD && (_PR_FILEDESC_OPEN == bottomFD->secret->state))
> {
>- bottomFD->secret->md.poll.thread = thread;
>+ if (pd->in_flags & PR_POLL_READ) {
>+ PR_ASSERT(thread == NULL || bottomFD->secret->md.read.thread == NULL);
>+ bottomFD->secret->md.read.thread = thread;
>+ }
>+
>+ if (pd->in_flags & PR_POLL_WRITE) {
>+ // it's possible for the writing thread to be non-null during
>+ // a non-blocking connect, so we assert that we're on
>+ // the same thread, or the thread is null.
>+ PR_ASSERT(thread == NULL ||
>+ (bottomFD->secret->md.write.thread == NULL ||
>+ bottomFD->secret->md.write.thread == thread));
>+ bottomFD->secret->md.write.thread = thread;
>+ }
> }
> }
> }
> }
The second assertion is overly restrictive. It precludes having
one thread initiating a non-blocking connect and another thread
polling for the completion of the connect.
Ideally, "bottomFD->secret->md.write.thread == thread" should be
replaced by the condition "a non-blocking connect is in progress".
Attachment #52003 -
Flags: needs-work+
Comment 79•23 years ago
|
||
After looking at Apple's OT docs more closely, I
think the handling of an endpoint in the T_IDLE
state is complicated by how that state is entered.
I don't have a simple solution.
The T_UNBND state should be reported as not
readable and not writable.
Assignee | ||
Comment 80•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #52086 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #52003 -
Attachment is obsolete: true
Assignee | ||
Comment 81•23 years ago
|
||
> It would be nice to add a comment saying something like:
> readThread is also used for "misc".
Damn, forgot to add to the diff. Will add before checking in.
> It would be nice to emulate the original indentation
> style in the file (specifically, the placement of the
> curly braces).
done.
> These two cases are wrong. When an endpoint is in the
> T_UNBND or T_IDLE state, it is neither readable nor
> writable. These two states are different from the
> "disconnected" state, which is considered an error
> condition and is reported by marking the endpoint both
> readable and writable.
Fixed.
> After looking at Apple's OT docs more closely, I
> think the handling of an endpoint in the T_IDLE
> state is complicated by how that state is entered.
> I don't have a simple solution.
I added a PR_ASSERT for this case. I've never hit it so far.
> The second assertion is overly restrictive. It precludes having
> one thread initiating a non-blocking connect and another thread
> polling for the completion of the connect.
>
> Ideally, "bottomFD->secret->md.write.thread == thread" should be
> replaced by the condition "a non-blocking connect is in progress".
This is hard to know unless I add another state variable to the file desc
struct or something. I think the current assertions will do for now; I'll try to
come up with something more accurate for bug 102791.
Comment 82•23 years ago
|
||
Comment on attachment 52164 [details] [diff] [review]
Patch addressing wtc's issues
>+ case T_INREL: // incoming orderly release
> *writeReady = PR_TRUE;
> break;
This is still not correct. This should say
*writeReady = fd->secret->md.writeReady;
I think you also need to set *readReady to PR_TRUE if
the endpoint is in the T_INREL state.
Attachment #52164 -
Flags: review+
Attachment #52164 -
Flags: needs-work+
Assignee | ||
Comment 83•23 years ago
|
||
> This is still not correct. This should say
> *writeReady = fd->secret->md.writeReady;
Correct. I'll do that.
> I think you also need to set *readReady to PR_TRUE if
> the endpoint is in the T_INREL state.
*readReady gets set to true in this state by the existing code just above:
// hack to emulate BSD sockets; say that a socket that has disconnected
// is still readable.
size_t availableData = 1;
if (!fd->secret->md.orderlyDisconnect)
OTCountDataBytes((EndpointRef)fd->secret->md.osfd, &availableData);
*readReady = fd->secret->md.readReady && (availableData > 0);
Comment 84•23 years ago
|
||
Did this make it on the 094 branch? Pls advise ...
Assignee | ||
Comment 85•23 years ago
|
||
This is not checked in anywhere yet. Still waiting for a final r= from wtc.
Comment 86•23 years ago
|
||
Simon: I was waiting for you to attach a new patch.
You can check in your latest patch after fixing the
case T_INREL problem and adding a comment about
readThread also used for "misc".
Assignee | ||
Updated•23 years ago
|
Attachment #52164 -
Attachment is obsolete: true
Assignee | ||
Comment 87•23 years ago
|
||
Assignee | ||
Comment 88•23 years ago
|
||
This is checked in on NSPR tip and branch, and on Mozilla 0.9.4 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 89•23 years ago
|
||
verified mac commercial build 2001-10-09-04-trunk
Status: RESOLVED → VERIFIED
Comment 90•23 years ago
|
||
Verified on Mac 9.1 10-09-03-0.9.4 brnach build
Verified on Mac OSX 10-09-04-0.9.4 branch build
No hang anymore when sending these two attachments.
Comment 91•23 years ago
|
||
*** Bug 88489 has been marked as a duplicate of this bug. ***
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•