Closed
Bug 369036
Opened 18 years ago
Closed 17 years ago
infinite loop in pt_SolarisSendFile when file truncated while sending
Categories
(NSPR :: NSPR, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
4.6.8
People
(Reporter: nelson, Assigned: julien.pierre)
Details
Attachments
(1 file, 2 obsolete files)
3.23 KB,
patch
|
Details | Diff | Splinter Review |
A Sun colleague wrote to me: ---------------- Quote ---------------------- If a content file is truncated while [the server] is sending it via sendfilev(), it falls to infinite loop of pt_poll_now(). In current specification, sendfile() returns 0 when there is no more data to send. So if content file becomes smaller after caller has checked the size, sendfile() couldn't send as much as the previous size and falls to infinite loop. Now, simple fix is suggested. Just checking if return value is zero or not. I think we can use same simple method to avoid this problem in pt_solaris_sendfile_cont(). I.e, modifying conditions on line 1068 to "(count == -1 || count == 0)" in following code. 1058 static PRBool pt_solaris_sendfile_cont(pt_Continuation *op, PRInt16 revents) 1059 { 1060 struct sendfilevec *vec = (struct sendfilevec *) op->arg2.buffer; 1061 size_t xferred; 1062 ssize_t count; 1063 1064 count = SOLARIS_SENDFILEV(op->arg1.osfd, vec, op->arg3.amount, &xferred); 1065 op->syserrno = errno; 1066 PR_ASSERT((count == -1) || (count == xferred)); 1067 1068 if (count == -1) { 1069 if (op->syserrno != EWOULDBLOCK && op->syserrno != EAGAIN 1070 && op->syserrno != EINTR) { 1071 op->result.code = -1; 1072 return PR_TRUE; 1073 } 1074 count = xferred; 1075 } 1076 PR_ASSERT(count <= op->nbytes_to_send); Truss log ========== 18223/142: 0.0583 poll(0xDDD8F5B0, 1, 5000) = 1 18223/142: fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT 18223/142: 0.0587 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544) = 0 sfv_fd=51 sfv_flag=0x0 sfv_off=262144 sfv_len=7650 18223/142: 0.0588 poll(0xDDD8F5B0, 1, 5000) = 1 18223/142: fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT 18223/142: 0.0589 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544) = 0 sfv_fd=51 sfv_flag=0x0 sfv_off=262144 sfv_len=7650 18223/142: 0.0590 poll(0xDDD8F5B0, 1, 5000) = 1 18223/142: fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT 18223/142: 0.0591 sendfilev(0, 351, 0xDDD8F660, 1, 0xDDD8F544) = 0 sfv_fd=51 sfv_flag=0x0 sfv_off=262144 sfv_len=7650 18223/142: 0.0592 poll(0xDDD8F5B0, 1, 5000) = 1 18223/142: fd=351 ev=POLLPRI|POLLOUT rev=POLLOUT Truss log with -u :: option =========================== -> libnspr4:pt_solaris_sendfile_cont(0xe0adf61c, 0x4, 0x1388, 0x0) -> libsendfile:sendfilev(0x88, 0xe0adf660, 0x1, 0xe0adf544) -> libc:__systemcall(0xe0adf4d8, 0x4e, 0x0, 0x88) sendfilev(0, 136, 0xE0ADF660, 1, 0xE0ADF544) = 0 <- libc:__systemcall() = 0 <- libsendfile:sendfilev() = 0 -> libc:___errno(0x0, 0xe0adf660, 0x1, 0xe0adf544) -> libthread:thr_main(0x0, 0x242f4, 0x0, 0x0) <- libthread:thr_main() = 0 -> libthread:_thr_errnop(0xfe1434e4, 0xe0adf660, 0x1, 0xe0adf544) <- libc:___errno() = 0xe1a2a474 <- libnspr4:pt_solaris_sendfile_cont() = 0 -> libthread:poll(0xe0adf5b0, 0x1, 0x1388, 0x0) -> libthread:_save_nv_regs(0xe1a2a5c0, 0x0, 0x0, 0x0) <- libthread:_save_nv_regs() = 0xe1a2a5c0 -> libc:_poll(0xe0adf5b0, 0x1, 0x1388, 0x0) poll(0xE0ADF5B0, 1, 5000) = 1 <- libc:_poll() = 1 <- libthread:poll() = 1 -> libnspr4:pt_solaris_sendfile_cont(0xe0adf61c, 0x4, 0x1388, 0x0) --------------- end quote ----------------------- I observe that sendfilev is also called from pt_SolarisSendFile, so if the proposed change is correct, it needs to be applied in more places than the one shown above.
Reporter | ||
Comment 1•17 years ago
|
||
Julien, since this is important to Sun, perhaps you should take this bug.
Priority: -- → P2
Target Milestone: 4.6.6 → 4.6.7
Assignee | ||
Updated•17 years ago
|
Assignee: wtc → julien.pierre.boogz
Updated•17 years ago
|
Target Milestone: 4.6.7 → 4.6.8
Assignee | ||
Comment 2•17 years ago
|
||
I looked at the man page for sendfile on Solaris. It doesn't state that sendfile will return 0 if there is no more data to send, such as for file truncation. In fact, the coding example about how to transfer a file to a socket first does an fstat to get the file size, and then goes into a senfile loop which can only be aborted when the total number of bytes have been sent, or if sendfile returns -1. This is the same that NSPR does. Based on that example and the current documentation, I would say that sendfile returning 0 is a bug in sendfile since this is not specified behavior, unless we asked to send 0 bytes. On the other hand, there is also no error specified for "no more data" or "end of file". I will talk to someone in the Solaris team about this case. We have a similar loop for sendfile on other platforms (AIX, Linux, HP-UX). It is possible that the behavior of sendfile on those platforms is not consistent. If we find that we want to use a return value of zero to abort the loop, we should not treat it the same as -1, since that leads to a path that checks the OS error code, and there presumably would be none if sendfile returns 0.
Assignee | ||
Updated•17 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 3•17 years ago
|
||
I had a discussion with the Solaris engineer. The 0 return for sendfile/sendfilev is currently undocumented, but will be soon. It indicates an EOF condition on the input file descriptor. Thus, we should treat it as an error because it means the file changed size since the beginning of the sendfile operation. Unfortunately, not all versions of Solaris are currently consistent - some will return -1 and EINVAL, but the official take of the Solaris team is that zero is the expected return value for this case where the file was truncated/overwritten during the sendfile. The engineer pointed out that this sendfile behavior exists on other platforms as well, such as Linux and FreeBSD . I checked the Linux man pages, and the zero is not documented there either. I haven't verified that this bug actually exists there.
Assignee | ||
Comment 4•17 years ago
|
||
Attachment #278123 -
Flags: review?(wtc)
Assignee | ||
Updated•17 years ago
|
Attachment #278123 -
Flags: superreview?(nelson)
Reporter | ||
Comment 5•17 years ago
|
||
Comment on attachment 278123 [details] [diff] [review] Fix Solaris infinite loop problem This patch appears to correctly do what you intended for it to do. But I think it is VERY unfortunate that it will return EINVAL (which will be translated into PR_INVALID_ARGUMENTS_ERROR) in the end-of-file case. EINVAL means "the programmer made a programming error, allowing an invalid argument to be passed to this function". But in this case, there is no invalid argument, and the programmer has not necessarily made any error. NSPR has another error that seems perfect for this case: PR_END_OF_FILE_ERROR. I think NSPR should return it. I realize this means that NSPR may return different errors on different versions of Solaris, but nonetheless I think it is more important for us to deliver the most meaningful error we can in the places where we can, than to consistently deliver the wrong error code. Returning PR_END_OF_FILE_ERROR will require changes to _MD_solaris_map_sendfile_error() to return that error code (it does not presently ever return that error code). But I think this is the right thing to do. So, this is r+, with a strong suggestion of returning another error code.
Attachment #278123 -
Flags: superreview?(nelson) → superreview+
Assignee | ||
Comment 6•17 years ago
|
||
Nelson, I agree that it would be nice to return PR_END_OF_FILE_ERROR for this 0 return case . _MD_solaris_map_sendfile_error just calls _MD_unix_map_default_error, whose role it is to translate an errno into a PRErrorCode and set it. Unfortunately, there is no Solaris error code I could find that maps to EOF . I didn't want to create a fake errno in order to be able to set PR_END_OF_FILE_ERROR . If I do that, there is the risk of conflict with a new error code in a later release of the OS later on. I wish there was an EOF I could just set. In its absence, I thought that EINVAL was the next best. But I can see how it would be confusing. I thought about using EIO, which indicates an unknown error reading the file system. Do you know that would be better ?
Assignee | ||
Comment 7•17 years ago
|
||
I meant to ask if you thought EIO would be better.
Comment 8•17 years ago
|
||
Comment on attachment 278123 [details] [diff] [review] Fix Solaris infinite loop problem r=wtc. Please use the prevalent multi-line comment formatting style in this file: /* * line 1 * line 2 */ and change "sendfile" to "sendfilev". Thanks for fixing this bug.
Attachment #278123 -
Flags: review?(wtc) → review+
Reporter | ||
Comment 9•17 years ago
|
||
Julien, look at _MD_unix_map_lockf_error for an example of how most function-specific error mapping functions work. It handles special cases for that function, and defers all other normal cases to the generic _MD_unix_map_default_error function. I was proposing that _MD_solaris_map_sendfile_error be modified to handle the EOF case as a special case, just as _MD_unix_map_lockf_error handles EACCES as a special case. As seen at http://lxr.mozilla.org/nspr/source/nsprpub/pr/src/pthreads/ptio.c#2454 The error mapping function is called when "count == -1", without regard to the value of "syserrno", so I was going to suggest using the value zero to signify end-of-file.
Comment 10•17 years ago
|
||
Comment on attachment 278123 [details] [diff] [review] Fix Solaris infinite loop problem To avoid the issue with EINVAL, you can handle a return of count == 0 as a success. In pt_solaris_sendfile_cont, change if (count < op->nbytes_to_send) { to /* your comment here */ if (0 < count && count < op->nbytes_to_send) { In pt_SolarisSendFile, change if (count != -1 && count < nbytes_to_send) { to /* your comment here */ if (0 < count && count < nbytes_to_send) { It is a race condition to truncate a file while a sendfilev on that file is in progress. So it is reasonable for PR_SendFile to succeed as if it were called after the file had been truncated.
Comment 11•17 years ago
|
||
If you don't like what I suggested, you can also handle count == 0 separately and call PR_SetError(PR_END_OF_FILE_ERROR, 0) directly. pt_solaris_sendfile_cont will need to use the op.nbytes_to_send field to communicate to pt_SolarisSendFile that we didn't send all the bytes in spite of a positive return value from pt_Continue. So you'll need to make pt_solaris_sendfile_count set op.nbytes_to_send to the proper value whenever it returns successfully.
Assignee | ||
Comment 12•17 years ago
|
||
Nelson, OK. I didn't know you wanted to use the zero for the error code that gets passed into _MD_solaris_map_sendfile_error . My concern was finding a value that wouldn't conflict with anything else. Zero should not conflict with any other value for errno by definition, so it could be special-cased there. I will attach an updated patch to do that. Wan-Teh, I would prefer not to handle this as a success. IMO, if a race condition was detected, the application should not silently succeed.
Assignee | ||
Comment 13•17 years ago
|
||
Fix comments. Map 0 sendfile return to PR_END_OF_FILE_ERROR .
Attachment #278123 -
Attachment is obsolete: true
Assignee | ||
Updated•17 years ago
|
Attachment #278158 -
Flags: superreview?(nelson)
Attachment #278158 -
Flags: review?(wtc)
Reporter | ||
Comment 14•17 years ago
|
||
Comment on attachment 278158 [details] [diff] [review] patch update looks good to me
Attachment #278158 -
Flags: superreview?(nelson) → superreview+
Comment 15•17 years ago
|
||
Comment on attachment 278158 [details] [diff] [review] patch update r=wtc. The case 0 in _MD_solaris_map_sendfile_error could use a short comment. In ptio.c, the multi-line comments should be formatted /* * line 1 * line 2 * line 3 */
Attachment #278158 -
Flags: review?(wtc) → review+
Assignee | ||
Comment 16•17 years ago
|
||
Thanks for the reviews. I just fixed the comments. Checked in to NSPR_4_6_BRANCH : Checking in pr/src/md/unix/unix_errors.c; /cvsroot/mozilla/nsprpub/pr/src/md/unix/unix_errors.c,v <-- unix_errors.c new revision: 3.26.2.1; previous revision: 3.26 done Checking in pr/src/pthreads/ptio.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptio.c,v <-- ptio.c new revision: 3.101.2.3; previous revision: 3.101.2.2 done And to the NSPR trunk : Checking in pr/src/md/unix/unix_errors.c; /cvsroot/mozilla/nsprpub/pr/src/md/unix/unix_errors.c,v <-- unix_errors.c new revision: 3.27; previous revision: 3.26 done Checking in pr/src/pthreads/ptio.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptio.c,v <-- ptio.c new revision: 3.107; previous revision: 3.106 done
Attachment #278158 -
Attachment is obsolete: true
Assignee | ||
Updated•17 years ago
|
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•