Closed Bug 281222 Opened 20 years ago Closed 17 years ago

The file copy under unix can fail and produce zero sized file in some circumstances

Categories

(Core :: XPCOM, defect)

Other
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: wbonnet, Unassigned)

Details

(Whiteboard: thunderbird only)

Attachments

(4 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.7.5) Gecko/20041108 Firefox/1.0 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.7.5) Gecko/20041108 Firefox/1.0 Function CrudeFileCopy in nsFileSpecUnix.cpp show some bugs. In some cases it can fail and generate a zero sized file. Analysis will be provided with patch proposal coming with this bug report Reproducible: Always Actual Results: Output file is zero sized Expected Results: File should have some data
The CrudeFileCopy was buggy in some case the copied data can be lost and a zero sized file is produced Here is the actual code, comments are added inline 430 stat_result = stat (in, &in_stat); returned value should be tested, no need to continue and try to read the file if we cannot stat it 432 ifp = fopen (in, "r"); 433 if (!ifp) 434 { 435 return -1; 436 } 437 438 ofp = fopen (out, "w"); 439 if (!ofp) 440 { 441 fclose (ifp); 442 return -1; 443 } 444 The following loop is buggy 445 while ((rbytes = fread (buf, 1, sizeof(buf), ifp)) > 0) fread returns zero for FEOF and errors. In this code feof is not tested, thus all errors are treated as eof and loop is skipped without error handling. Next instruction after loop is fclose, thus a zero sized file is created in case of error read can fail on NFS mount, and it will fail from time to time... 446 { 447 while (rbytes > 0) this while loop is useless since fwrite will buffer the data and write all the 1024 bytes without loop. 448 { 449 if ( (wbytes = fwrite (buf, 1, rbytes, ofp)) < 0 ) buf has to be replaced by an index pointer if we loop on remaining rbytes ! or the first data are always written, index is not moving 450 { 451 fclose (ofp); 452 fclose (ifp); 453 unlink(out); 454 return -1; 455 } 456 rbytes -= wbytes; buf pointer should be incremented 457 } 458 } 459 fclose (ofp); fclose can fail, data are actually written after a fsync call was successful this code works in 99.9% of the time. But if the path is a NFS mount then it will fail from time to time and buffer won't be flushed 460 fclose (ifp); 461 462 if (stat_result == 0) 463 chmod (out, in_stat.st_mode & 0777); This algorithm was buggy and it can explain why from time to time we have zero sized filespec (.msf) in mailbox stored on /home mount by NFS. But maybe it is not the only reason why...
Attachment #173498 - Flags: review?(mscott)
Summary: The file copy under unix can fail and produce zero sized file in some circomstances → The file copy under unix can fail and produce zero sized file in some circumstances
Comment on attachment 173498 [details] [diff] [review] This patch fix some bug in CrudeCopyFile on unix platform changing to Darin for his linux expertise :-)
Attachment #173498 - Flags: review?(mscott) → review?(darin)
Confirming, we should consider for tb1.0.1. /be
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking-aviary1.0.1+
darin, brendan, who should this bug be assign to. i will have cycles next week.
This is needed for tbird 1.0.1 as I understand it, and I'm not sure what the schedule looks like for that. If you have cycles to spend on this next week, then that sounds great to me :)
william, did you have any test cases that show the failure(s) or did you discover these problems through code inspection?
why don't we just reuse the nsLocalFile code? for example, just create two nsLocalFile's and copy one to the other. Something like (ignoring error checking): nsCOMPtr <nsILocalFile> inFile; NS_NewLocalFile(in, PR_TRUE, getter_AddRefs(inFile)); nsCOMPtr <nsILocalFile> outFile; NS_NewLocalFile(out, PR_TRUE, getter_AddRefs(outFile)); nsCOMPtr <nsIFile> source = do_QueryInterface(inFile); source.CopyTo(outFile, EmptyString());
I agree with dougt. Let's leverage nsIFile/nsILocalFile as much as possible here. nsFileSpec is an obsolete deprecated API. Its implementation doesn't have to be performant or anything like that.
I discovered it trough code inspection after i fixed the bug on drag and drop loosing folders. The problem was that this function was failing on zero sized files, so i had a close look to it. I think you can reproduce it easily if you make a simple test program which use this function to copy files on a nfs mount. Maybe you will have to had some extra load on your network depending on bandwith and so on. File access through NFS has to be "difficult" some time. I do agree that if there is already a function to copy files under unix, it's better to use it since this code is in obsolete part. But i am a newbie to mozilla code and i don't know yet all the function available :) that's why i choosed to fix it ;)
I just had a quick look to the nsLocalFile i am almost sure it has some bugs also... so be careful if you just replace old code by a call to the new function CopyTo. The code will not work properly on NFS drive for the some of the reasons why CrudeFileCopy was failing, syncing data on disk. In the copy loop : while ((bytesRead = PR_Read(oldFD, buf, BUFSIZ)) > 0) { there is no test for errors and eof, thus if an error occur at first read, it will be handled as an eof, and an empty file will be created. After the files are closed, an error code is returned (NS_ERROR_OUT_OF_MEMORY which not the real error reason). In this implementation, the caller has to handle the error code and destroy the new file. I think it would be better if the method would "unlink" the new file since the copy failed. Otherwise some file may stay on the file system without ever been deleted. PRInt32 bytesWritten = PR_Write(newFD, buf, bytesRead); if (bytesWritten < 0) { bytesRead = -1; break; } Here we test if bytesWritten is < 0 to detect errors. That's ok according to the man, since PR_Write is mapped to a write call, it will return -1 if it fails. But in case of success it returns the number of written bytes. If for some reasons it does not write all data we asked to be written, we have to handle it as an error. Thus the test should be : PRInt32 bytesWritten = PR_Write(newFD, buf, bytesRead); if (bytesWritten < bytesRead) { The following lines : // close the files PR_Close(newFD); PR_Close(oldFD); has the same problem as in CrudeFileCopy. On an NFS mount sometimes close can fail, return code should be tested. We can be really sure that the data are stored on disk only after calling successfully PR_Sync before PR_Close. Thode should be something like : int l_nRet; // close the files PR_Close(oldFD); l_nRet = PR_Sync(newFD); if (l_nRet != 0) { return NS_ERROR_OUT_OF_MEMORY; // or some more accurate error code } PR_Close(newFD);
Whiteboard: thunderbird only
If you want to write a clean UNIX copy routine on top of NSPR and call it from both places that would be good. It is important that we make nsLocalFileUnix.cpp work well since it is the preferred class now. We could put the implementation in there and call it from nsFileSpec.
Comment on attachment 173498 [details] [diff] [review] This patch fix some bug in CrudeCopyFile on unix platform removing review request in anticipation of new and improved patch :) btw, please use NSPR methods instead of stdio methods, thanks! (we don't need the buffering provided by fread & fwrite.)
Attachment #173498 - Flags: review?(darin)
Ok i'll submit a new patch based on nspr this week end What is the expected release date of thunderbird 1.0.1 ?
This patch is a rewrite of the previous on with the removal of the f* function from stdio. Patch is now using PR_* functions.
Attachment #173498 - Attachment is obsolete: true
Attachment #176394 - Flags: review?(mscott)
Comment on attachment 176394 [details] [diff] [review] This patch fix some bug in CrudeCopyFile on unix platform using NSPR >+ int stat_result = -1, l_nRet; >+ PRFileDesc * l_pFileDescIn, * l_pFileDescOut; // Src and dest filespec pointers the "l_" prefix is non-standard. please try to stick with existing code style in the modified file. that means no "hungarian" notation. >+ while(l_rbytes > 0) // While there is data to copy >+ { >+ // Data buffer size is only 1K ! fwrite function is able to write it in >+ // one call. According to the man fwrite returns a number of elements >+ // written if an error occured. Thus this is no need to handle this case >+ // as a normal case. Written data cannot be smaller than rbytes >+ // than input buffer. >+ l_wbytes = PR_Write(l_pFileDescOut, l_cBuf, l_rbytes); // Copy data to output file >+ if (l_wbytes != l_rbytes) // Test if all data were written >+ { // No the is an error >+ PR_Close(l_pFileDescIn); // Close input file >+ PR_Close(l_pFileDescOut); // Close output file >+ unlink(out); // Destroy output file >+ return -1; // Return an error >+ } >+ >+ // Write is done, we try to get more data >+ l_rbytes = PR_Read(l_pFileDescIn, l_cBuf, sizeof(l_cBuf)); >+ if (l_rbytes < 0) // Test if read was unsuccessfull >+ { // Error case >+ PR_Close(l_pFileDescOut); // Close input file >+ PR_Close(l_pFileDescIn); // Close input file >+ unlink(out); // Destroy output file >+ return -1; // Return an error given all the different places where you need to return and cleanup resources, i think it would make sense to move the bulk of this code (the loop) into a subroutine. then you can write close like this: // open files do_copy(); // close files // unlink out if necessary that way you end up with much smaller and simpler code.
This patch includes Darin's comments : . removal of l_ hungarian notation, . add a DoCopy method to get a cleaner code)
Attachment #176394 - Attachment is obsolete: true
Attachment #176454 - Flags: review?(mscott)
is there an enduser test we could run to verify this as fixed (once it is fixed)? or is this more of a backend fix?
moving to 1.1 for investigation.
Flags: blocking-aviary1.0.1+ → blocking-aviary1.1+
Attachment #176394 - Flags: review?(mscott)
Comment on attachment 176454 [details] [diff] [review] Fix the in CrudeCopyFile on unix platform using NSPR including Darin's comments >+static int CrudeFileCopy_DoCopy(PRFileDesc * pFileDescIn, PRFileDesc * pFileDescOut, char * pBuf, long bufferSize) >+ // If EOF is reached and no data available, PR_Read returns 0. A negative return no data _is_ ... >+ // value means error occured means _an_ ... >+ rbytes = PR_Read(pFileDescIn, pBuf, bufferSize); >+ if (rbytes < 0) // Test if read was unsuccessfull >+ { // Error case >+ return -1; // Return an error >+ } drop the else (after return) [and unindent the codeblock]: >+ else // Success case also, don't bother commenting like this ^ we try to stick to 80 cols, if you're going to stick comments, i'd rather they be above the lines of code instead of to their right. >+ { ^ please don't include trailing whitespace >+ while(rbytes > 0) // While there is data to copy ^ please make sure you're following house style for spaces before ()s, personally i try to be consistent between if/for/while >+ { >+ // Data buffer size is only 1K ! fwrite function is able to write it in >+ // one call. According to the man fwrite returns a number of elements the man _page,_ ... >+ // written if an error occured. Thus this is no need to handle this case this => there ? >+ // as a normal case. Written data cannot be smaller than rbytes >+ // than input buffer. than => i.e., the size of the (?) >+ wbytes = PR_Write(pFileDescOut, pBuf, rbytes); // Copy data to output file >+ if (wbytes != rbytes) // Test if all data were written were => was >+ { // No the is an error the is => this (?) >+ return -1; // Return an error >+ } >+ >+ // Write is done, we try to get more data >+ rbytes = PR_Read(pFileDescIn, pBuf, bufferSize); >+ if (rbytes < 0) // Test if read was unsuccessfull unsuccessful >+ { // Error case >+ return -1; // Return an error >+ } >+ } >+ } >- while ((rbytes = fread (buf, 1, sizeof(buf), ifp)) > 0) >- while (rbytes > 0) style clearly likes a space before condition parens. [let's ignore the spaces after fread ...] >+static int CrudeFileCopy(const char* in, const char* out) >+//---------------------------------------------------------------------------------------- >+{ >+ char buf[1024]; // Used as buffer for the copy loop >+ PRInt32 rbytes, wbytes; // Number of bytes read and written in copy loop >+ struct stat in_stat; // Stores informations from the stat syscall information >+ int stat_result = -1, ret; >+ PRFileDesc * pFileDescIn, * pFileDescOut; // Src and dest filespec pointers 'filespec' smells like nsFileSpec, can you avoid reminding us of that? :) >+ // Check if the pointers to filenames are valid NS_ASSERTION(in && out, "CrudeFileCopy should be called with pointers to filenames..."); >+ if (!in || !out) >+ return -1; how about: NS_ENSURE_TRUE(in && out, -1); >+ // Check if the pointers are the sames, if so, no need to copy A to A same (please fix all occurrences) >+ if (in == out) >+ return 0; >+ >+ // Check if content of the pointers are the sames, if so, no need to copy A to A >+ if (strcmp(in,out) == 0) people tend to complain about strcmp :) >+ return 0; >+ >+ // Retrieve the 'in' file attributes >+ stat_result = stat (in, &in_stat); i believe this is the first time i've seen you use a space after a function name >+ if(stat_result != 0) >+ { >+ // test if stat failed, it can happens if file doesnt exist, is not either "can happen" or "happens" "the file" "does not" (or doesn't if you use isn't, which sounds bad...) >+ // readable or not available ( can happen with NFS mounts ) "is not" >+ return(-1); // Return an error and the first time you parenthesized a return value (please don't) >+ } >+ >+ // Open the input file for binary read >+ pFileDescIn = PR_Open(in, PR_RDONLY, 0444); >+ if (pFileDescIn == 0) >+ { >+ return -1; // Open failed, return an error >+ } >+ >+ // Open the output file for binary write >+ pFileDescOut = PR_Open(out, PR_WRONLY | PR_CREATE_FILE, 0666); please put a comment explaining why you're using 0666. or better, don't :). i just wrote a patch to remove some 0666's, i don't feel like writing another. most likely you should be getting the value from the original file. since you're setting it later, how about using 0600 now, this should prevent someone from poisoning it. >+ // Copy the data >+ if (CrudeFileCopy_DoCopy(pFileDescIn, pFileDescOut, buf, sizeof(buf)) != 0) >+ { // Error case >+ PR_Close(pFileDescOut); // Close input file >+ PR_Close(pFileDescIn); // Close input file >+ unlink(out); // Destroy output file >+ return -1; // Return an error >+ } >+ >+ // There is no need for error handling here. This should have worked, but even >+ // if it fails, the data are now copied to destination, thus there is no need >+ // for the function to fail on the input file error >+ PR_Close(pFileDescIn); >+ >+ // It is better to call fsync and test return code before exiting to be sure >+ // data are actually written on disk. Data loss can happen with NFS mounts >+ if (PR_Sync(pFileDescOut) != PR_SUCCESS) >+ { // Test if the fsync function succeeded >+ PR_Close(pFileDescOut); // Close output file >+ unlink(out); // Try to destroy the file not PR_Delete? >+ return(-1); // Return an error >+ } >+ >+ // Copy is done, close both file >+ if (PR_Close(pFileDescOut) != PR_SUCCESS) // Close output file >+ { // Output file was not closed :( >+ unlink(out); // Try to destroy the file >+ return(-1); // Return an error >+ } >+ >+ ret = chmod(out, in_stat.st_mode & 0777); // Set the new file file mode >+ if (ret != 0) // Test if the chmod function succeeded >+ { >+ unlink(out); // Try to destroy the file >+ return(-1); // Return an error >+ } > >- if (stat_result == 0) >- chmod (out, in_stat.st_mode & 0777); >+ return 0; // Still here ? ok so it worked :) >+} // nsFileSpec::CrudeFileCopy > >@@ -497,24 +575,24 @@ nsresult nsFileSpec::MoveToDir(const nsF >- ((nsFileSpec*)this)->Delete(PR_FALSE); >+ ((nsFileSpec*)this)->Delete(PR_FALSE); thank you for killing these tabs
Attachment #176454 - Flags: review?(mscott) → review-
This patche fixes the issues identified by Timeless (including my English mistakes ;) )
Attachment #176454 - Attachment is obsolete: true
Attachment #183076 - Flags: review?(mscott)
Flags: blocking-aviary1.5+
Hi There was no activity on this bug for several months from now. Is it possible to know if it is planned to change its status and accept the patch ? Thx in advance
Whenever it gets review. Scott, you seem to be the one the review was requested of, presumably because mailnews is the only consumer of this code...
Please Scott do you when it will be possible to have a review for this bug ?
Hi it seems i had no answer yet :( would it be possible to have a review please ? Kind regards, William
Assignee: dougt → nobody
QA Contact: xpcom
Hi This bug is waiting for a review since May 2005. That's more than a year now that patch has been submitted, and faulty code documented and bug explanation provided. It's a long time for a critical bug. Would it be possible please to have a review of the patch ? Regards, William
Comment on attachment 183076 [details] [diff] [review] Fix the previous patch, including Timeless comments Maybe timeless or darin can take another look at this...
Attachment #183076 - Attachment description: Fix the previous patche, including Timeless comments → Fix the previous patch, including Timeless comments
Attachment #183076 - Flags: superreview?(darin)
Attachment #183076 - Flags: review?(timeless)
Comment on attachment 183076 [details] [diff] [review] Fix the previous patch, including Timeless comments // Check if content of the pointers are the sames, if so, no need to copy A to A sames => same > return(-1); please be consistent w/ parens or lack for a statement, in this function that's return w/o ()s. (which is also the general mozilla style)
Attachment #183076 - Flags: review?(timeless)
Attachment #183076 - Flags: review?(dougt)
Attachment #183076 - Flags: review+
Attachment #183076 - Flags: superreview?(darin) → superreview+
Comment on attachment 183076 [details] [diff] [review] Fix the previous patch, including Timeless comments mozilla/xpcom/obsolete/nsFileSpecUnix.cpp 1.10
Attachment #183076 - Attachment is obsolete: true
Attachment #183076 - Flags: review?(mscott)
Attachment #183076 - Flags: review?(dougt)
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: