Closed Bug 369036 Opened 13 years ago Closed 12 years ago

infinite loop in pt_SolarisSendFile when file truncated while sending


(NSPR :: NSPR, defect, P2)



(Not tracked)



(Reporter: nelson, Assigned: julien.pierre)



(1 file, 2 obsolete files)

A Sun colleague wrote to me:
---------------- Quote ----------------------
If a content file is truncated while [the server] is sending it via 
sendfilev(), it falls to infinite loop of pt_poll_now().

In current specification, sendfile() returns 0 when there is no more
data to send. So if content file becomes smaller after caller has checked
the size, sendfile() couldn't send as much as the previous size and
falls to infinite loop.

Now, simple fix is suggested. Just checking if return value is zero or not.

I think we can use same simple method to avoid this problem in
pt_solaris_sendfile_cont(). I.e, modifying conditions on line 1068 to
"(count == -1 || count == 0)" in following code.

1058 static PRBool pt_solaris_sendfile_cont(pt_Continuation *op, PRInt16 revents)
1059 {
1060     struct sendfilevec *vec = (struct sendfilevec *) op->arg2.buffer;
1061     size_t xferred;
1062     ssize_t count;
1064     count = SOLARIS_SENDFILEV(op->arg1.osfd, vec, op->arg3.amount, &xferred);
1065     op->syserrno = errno;
1066     PR_ASSERT((count == -1) || (count == xferred));
1068     if (count == -1) {
1069         if (op->syserrno != EWOULDBLOCK && op->syserrno != EAGAIN
1070                 && op->syserrno != EINTR) {
1071             op->result.code = -1;
1072             return PR_TRUE;
1073         }
1074         count = xferred;
1075     }
1076     PR_ASSERT(count <= op->nbytes_to_send);

Truss log
18223/142:       0.0583 poll(0xDDD8F5B0, 1, 5000)                       = 1
18223/142:              fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT
18223/142:       0.0587 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544)    = 0
                 sfv_fd=51       sfv_flag=0x0    sfv_off=262144  sfv_len=7650
18223/142:       0.0588 poll(0xDDD8F5B0, 1, 5000)                       = 1
18223/142:              fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT
18223/142:       0.0589 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544)    = 0
                 sfv_fd=51       sfv_flag=0x0    sfv_off=262144  sfv_len=7650
18223/142:       0.0590 poll(0xDDD8F5B0, 1, 5000)                       = 1
18223/142:              fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT
18223/142:       0.0591 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544)    = 0
                 sfv_fd=51       sfv_flag=0x0    sfv_off=262144  sfv_len=7650
18223/142:       0.0592 poll(0xDDD8F5B0, 1, 5000)                       = 1
18223/142:              fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT

Truss log with -u :: option
  -> libnspr4:pt_solaris_sendfile_cont(0xe0adf61c, 0x4, 0x1388, 0x0)
    -> libsendfile:sendfilev(0x88, 0xe0adf660, 0x1, 0xe0adf544)
      -> libc:__systemcall(0xe0adf4d8, 0x4e, 0x0, 0x88)
         sendfilev(0, 136, 0xE0ADF660, 1, 0xE0ADF544)    = 0
      <- libc:__systemcall() = 0
    <- libsendfile:sendfilev() = 0
    -> libc:___errno(0x0, 0xe0adf660, 0x1, 0xe0adf544)
      -> libthread:thr_main(0x0, 0x242f4, 0x0, 0x0)
      <- libthread:thr_main() = 0
      -> libthread:_thr_errnop(0xfe1434e4, 0xe0adf660, 0x1, 0xe0adf544)
    <- libc:___errno() = 0xe1a2a474
  <- libnspr4:pt_solaris_sendfile_cont() = 0
  -> libthread:poll(0xe0adf5b0, 0x1, 0x1388, 0x0)
    -> libthread:_save_nv_regs(0xe1a2a5c0, 0x0, 0x0, 0x0)
    <- libthread:_save_nv_regs() = 0xe1a2a5c0
    -> libc:_poll(0xe0adf5b0, 0x1, 0x1388, 0x0)
       poll(0xE0ADF5B0, 1, 5000)                       = 1
    <- libc:_poll() = 1
  <- libthread:poll() = 1
  -> libnspr4:pt_solaris_sendfile_cont(0xe0adf61c, 0x4, 0x1388, 0x0)
--------------- end quote -----------------------

I observe that sendfilev is also called from pt_SolarisSendFile,
so if the proposed change is correct, it needs to be applied in more
places than the one shown above.
Julien, since this is important to Sun, perhaps you should take this bug.
Priority: -- → P2
Target Milestone: 4.6.6 → 4.6.7
Assignee: wtc → julien.pierre.boogz
Target Milestone: 4.6.7 → 4.6.8
I looked at the man page for sendfile on Solaris. It doesn't state that sendfile will return 0 if there is no more data to send, such as for file truncation. In fact, the coding example about how to transfer a file to a socket first does an fstat to get the file size, and then goes into a senfile loop which can only be aborted when the total number of bytes have been sent, or if sendfile returns -1. This is the same that NSPR does. Based on that example and the current documentation, I would say that sendfile returning 0 is a bug in sendfile since this is not specified behavior, unless we asked to send 0 bytes. On the other hand, there is also no error specified for "no more data" or "end of file". I will talk to someone in the Solaris team about this case.

We have a similar loop for sendfile on other platforms (AIX, Linux, HP-UX). It is possible that the behavior of sendfile on those platforms is not consistent. 

If we find that we want to use a return value of zero to abort the loop, we should not treat it the same as -1, since that leads to a path that checks the OS error code, and there presumably would be none if sendfile returns 0.
Ever confirmed: true
I had a discussion with the Solaris engineer. The 0 return for sendfile/sendfilev is currently undocumented, but will be soon. It indicates an EOF condition on the input file descriptor. Thus, we should treat it as an error because it means the file changed size since the beginning of the sendfile operation. Unfortunately, not all versions of Solaris are currently consistent - some will return -1 and EINVAL, but the official take of the Solaris team is that zero is the expected return value for this case where the file was truncated/overwritten during the sendfile.

The engineer pointed out that this sendfile behavior exists on other platforms as well, such as Linux and FreeBSD . I checked the Linux man pages, and the zero is not documented there either. I haven't verified that this bug actually exists there.
Attachment #278123 - Flags: review?(wtc)
Attachment #278123 - Flags: superreview?(nelson)
Comment on attachment 278123 [details] [diff] [review]
Fix Solaris infinite loop problem

This patch appears to correctly do what you intended for it to do. But I think it is VERY unfortunate that it will return
EINVAL (which will be translated into PR_INVALID_ARGUMENTS_ERROR) in the end-of-file case.  EINVAL means "the programmer made a programming error, allowing an invalid argument to be passed to this function".  But in this
case, there is no invalid argument, and the programmer has not necessarily made any error.

NSPR has another error that seems perfect for this case:
PR_END_OF_FILE_ERROR.  I think NSPR should return it. 
I realize this means that NSPR may return different errors on different versions of Solaris, but nonetheless I think it is more important for us to deliver the most meaningful error we can in the places where we can, than to consistently deliver the wrong error code.  Returning PR_END_OF_FILE_ERROR will require changes to _MD_solaris_map_sendfile_error() to return that error code (it does not presently ever return that error code).  But I think this is the right thing to do.

So, this is r+, with a strong suggestion of returning 
another error code.
Attachment #278123 - Flags: superreview?(nelson) → superreview+

I agree that it would be nice to return PR_END_OF_FILE_ERROR for this 0 return case . _MD_solaris_map_sendfile_error just calls _MD_unix_map_default_error, whose role it is to translate an errno into a PRErrorCode and set it. Unfortunately, there is no Solaris error code I could find that maps to EOF . I didn't want to create a fake errno in order to be able to set PR_END_OF_FILE_ERROR . If I do that, there is the risk of conflict with a new error code in a later release of the OS later on. I wish there was an EOF I could just set. In its absence, I thought that EINVAL was the next best. But I can see how it would be confusing. I thought about using EIO, which indicates an unknown error reading the file system. Do you know that would be better ?
I meant to ask if you thought EIO would be better.
Comment on attachment 278123 [details] [diff] [review]
Fix Solaris infinite loop problem

r=wtc.  Please use the prevalent multi-line comment
formatting style in this file:

     * line 1
     * line 2

and change "sendfile" to "sendfilev".  Thanks for
fixing this bug.
Attachment #278123 - Flags: review?(wtc) → review+
Julien, look at _MD_unix_map_lockf_error for an example of how most 
function-specific error mapping functions work.  It handles special
cases for that function, and defers all other normal cases to the 
generic _MD_unix_map_default_error function.  I was proposing that 
_MD_solaris_map_sendfile_error be modified to handle the EOF case
as a special case, just as _MD_unix_map_lockf_error handles EACCES 
as a special case.

As seen at
The error mapping function is called when "count == -1", without 
regard to the value of "syserrno", so I was going to suggest using the
value zero to signify end-of-file.
Comment on attachment 278123 [details] [diff] [review]
Fix Solaris infinite loop problem

To avoid the issue with EINVAL, you can handle a
return of count == 0 as a success.

In pt_solaris_sendfile_cont, change
    if (count < op->nbytes_to_send) {
    /* your comment here */
    if (0 < count && count < op->nbytes_to_send) {

In pt_SolarisSendFile, change
    if (count != -1 && count < nbytes_to_send) {
    /* your comment here */
    if (0 < count && count < nbytes_to_send) {

It is a race condition to truncate a file while a
sendfilev on that file is in progress.  So it is
reasonable for PR_SendFile to succeed as if it
were called after the file had been truncated.
If you don't like what I suggested, you can also handle
count == 0 separately and call PR_SetError(PR_END_OF_FILE_ERROR, 0)
directly.  pt_solaris_sendfile_cont will need to use the
op.nbytes_to_send field to communicate to pt_SolarisSendFile
that we didn't send all the bytes in spite of a positive
return value from pt_Continue.  So you'll need to
make pt_solaris_sendfile_count set op.nbytes_to_send to
the proper value whenever it returns successfully.

OK. I didn't know you wanted to use the zero for the error code that gets passed into _MD_solaris_map_sendfile_error . My concern was finding a value that wouldn't conflict with anything else. Zero should not conflict with any other value for errno by definition, so it could be special-cased there. I will attach an updated patch to do that.


I would prefer not to handle this as a success. IMO, if a race condition was detected, the application should not silently succeed.
Attached patch patch update (obsolete) — Splinter Review
Fix comments.

Map 0 sendfile return to PR_END_OF_FILE_ERROR .
Attachment #278123 - Attachment is obsolete: true
Attachment #278158 - Flags: superreview?(nelson)
Attachment #278158 - Flags: review?(wtc)
Comment on attachment 278158 [details] [diff] [review]
patch update

looks good to me
Attachment #278158 - Flags: superreview?(nelson) → superreview+
Comment on attachment 278158 [details] [diff] [review]
patch update


The case 0 in  _MD_solaris_map_sendfile_error
could use a short comment.

In ptio.c, the multi-line comments should be formatted

     * line 1
     * line 2
     * line 3
Attachment #278158 - Flags: review?(wtc) → review+
Attached patch As checked inSplinter Review
Thanks for the reviews. I just fixed the comments.

Checked in to NSPR_4_6_BRANCH :

Checking in pr/src/md/unix/unix_errors.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/unix_errors.c,v  <--  unix_errors.c
new revision:; previous revision: 3.26
Checking in pr/src/pthreads/ptio.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptio.c,v  <--  ptio.c
new revision:; previous revision:

And to the NSPR trunk :

Checking in pr/src/md/unix/unix_errors.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/unix_errors.c,v  <--  unix_errors.c
new revision: 3.27; previous revision: 3.26
Checking in pr/src/pthreads/ptio.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptio.c,v  <--  ptio.c
new revision: 3.107; previous revision: 3.106
Attachment #278158 - Attachment is obsolete: true
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.