Closed Bug 1174500 Opened 9 years ago Closed 8 years ago

[META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

Categories

(MailNews Core :: Networking, defect)

defect
Not set
major

Tracking

(firefox41 affected)

RESOLVED DUPLICATE of bug 1242030
Tracking Status
firefox41 --- affected

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: meta)

Attachments

(1 obsolete file)

Attached patch More error processing. (obsolete) — — Splinter Review
+++ This bug was initially created as a clone of Bug #1121842 +++ This is step-4 suggested in the original 1121842. > Step-4. More error processing - Removing a Seek that interferes with buffering write to Inbox. - Added a few comments about failures during CIFS mount failures. - Add sched_yield() (in an failed attempt to make GUI more responsive.) - Tried to use faster buffered write along with more checking for I/O errors in message download code - Added error checks to |Seek()| calls. - Checking the return value of Close() and other wrapper functions of low-level I/O system calls to catch file system issues ASAP. There seems to be a design failure: the code does not handle such errors gracefully. At least, we should tell the user that the file system error(s) are corrupting user data. Some actions mentioned avoce have been subsequently moved to the other patches in the series. > Step 1 is addressed by Bug 1122698. > Step 2 is addressed by Bug 1134527. > Step 3 is addressed by Bug 1134529. Step 4 is this one. Step 5 is actually split into two: one part now needs to be applied before Step 0 (for sanity check of seek operation), (Bug 1116055) and enabiling buffering of write is now enabled in another one liner patch. --- begin quote of the origina 1121842 --- This is to be a meta bug entry for the following issues. I depend on TB for the workflow at the office and my personal correspondence on several PCs (under linux and Windows). So its correct operation (and smooth operation hopefully) as a cross-platform mailer is very important for me. So I files this bugzilla entry. Problem: There is a confusion as to which routine should be responsible for closing the file stream associated with a variable m_outFileStream in nsPop3Sink.cpp. Because of the confusion (?), there are multiple instances of unnecessary extra bogus Close calls across a few files. Also, Pop3 code is full of unchecked |Close|, |Flush|, and |Write| calls. (Imap code, too. That will be needed to be taken care of eventually. Sorry I am not using imap right now and not feel motivated to tackle that group of files immediately until nsPops3Sink.cpp is fixed. But someone can take the lead after seeing this.) If the underlying file system experiences a glitch, then it is quite likely that thunderbird treats failed download as success and delete the message on the POP3 server happily. Totally unacceptable behavior. [ The glitch of the file system: almost filled-up file system, transient network problems for remotely mounted file system, incorrect file system permission caused by administrative error an an NFS server, USB memory stick where file is to be stored fell of the PC (!) etc. I have personally experienced and confirmed the first three error scenarios with TB before in different context and experienced data loss. I know some people seemed to have suffered from the last one according to bugzilla entries. ] Background of discovery: I tried to enable output buffering in nsPop3Sink.cpp for performance reasons. [ Bug 1116055 - Performance issue: Failure to use buffered write (comm-central thunderbird) ] But When I enabled output buffering in nsPop3Sink.cpp in addition to the patch in bug 1116055, it caused the failure to incorporate download messages. It looks timing dependent. Anyway, I always found a stream closed prematurely before writing finishes. That prompted me to debug and study the code carefully. Then I found that there are bogus Close() calls to already closed file streams in the code. This bogus |Close()| became apparent as soon as I added error checking of returned value of |Close()| in several places. Such extra bogus |Close| calls are RAMPANT during execution, and I confirm it after static code analysis. But then, when I started to think of how to fix the situation, I noticed there seemed to be a confusion about which function should call |Close()| on the buffer stream variable, m_outFileStream in nsPop3Sink.cpp. So I traced the history of m_outFileStream (where it is set, where it is used as parameter to external functions [which in turn may Close it], when the file associated with it are Opened, Closed, etc.) After the analysis, I came up with a plan to clean up the current incorrect code that invokes bogus |Close| on already closed streams. (Such bogus calls interfere with smooth gdb debugging. There are simply too many such calls that would return NS_BASE_STREAM_CLOSED during execution. Oh yes, come to think of it buffered output routine does not report error situation completely to my taste. (See Bug 1120046 - RFC mozilla/netwerk/base/src/nsBufferedStreams.cpp: better error reporting and maybe adding thread-race lock ) And the source code is not quite right statically, too, even if calling Close on already closed stream should be NOOP in principle. It is too confusing to see an already closed stream closed only several lines down while reading the code. ) We should also check the return value of Close, Flush and Write properly. ======================================== Plan for Improvement ======================================== My Plan to clean up the code is as follows. There are four steps. Step 1. Remove Extra ->Close() calls (and Flush() before Close().). Let us remove extra / unnecessary / bogus |Close| calls. The removal is based on the two proposals below. PROPOSAL-1: DiscardNewMessage should not close the file stream passed as the 1st argument. It is caller's responsibility to close it.. PROPOSAL-2: FinishNewMessage should not close the file stream passed as the first argument. It is caller's responsibility to close it. The reasoning is given in a crude write-up I created from checking the code at the end. (This will be posted as the next comment.) I checked the usage of two functions above, and with a few patches, the proposals ought to work. This patch will be filed as a different bugzilla on which this meta bugzilla entry depends. After the removal of unnecessary calls of |Close|, we go to Step-2. Step-2. Add error value checking of Close() and Flush(); First step. Simply add NS_ERROR(). Better than nothing. At least, we will see the error printed during testing of DEBUG version of TB. (For better error handling, we will wait for Step 4.) The patch for this will be posted as a separate bugzilla entry. After Close() and Flush() are taken care of, we check the error return of Write() and the mismatch of the # of written bytes and requested bytes. Step-3. Add error value checking of Write(); Check the return value of Write and if the requested # of bytes matches the # of really written bytes. First step. Simply add NS_ERROR(). Better than nothing. At least, we will see the error printed during testing of DEBUG version of TB. (For better error handling, we will wait for Step 4.) The patch for this will be posted as a separate bugzilla entry. Step-4. More error processing I understand that some places may need elaborate error handling, better than NS_ERROR(), and error return path. Take care of them in Step-4. The patch for this will be posted as a separate bugzilla entry. I think I will need a few different bugzilla entries since in a few places, the error handling seems to be necessarily complex. Step-5. Enabling buffered output. (This is not a correctness fix. This is a performance improvement.) Additionally, introduce buffering output to the file stream created in these files appropriately. I will post the crude memo/write-up in the next comment. --- end comment
Assignee: nobody → ishikawa
This is going to be in a series of patches: ======================================== ORDER ======================================== 1 A check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it bug 1116055 This was considered a necessary step so that we can know that if any user experiences a strange seek position change. 2 A fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series) bug 1122698 3 A fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series) bug 1134527 4 A fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series) bug 1134529 5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) bug 1117450 6 A 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) I have created a separate bug for this since it will be necessary to apply this patch only after other patches landed. (This is because enabling the buffering requires proper error checking of |Close()| and |Flush()|. These are only taken care of AFTER 1-5 patches above are applied.) Maybe about a month after the first patch (bug 1116055) above is applied, and other patches 2-5 have been applied, I can enable this patch. (Assuming there is NO ONE who reports strange seek position change on their computer.) 7 A add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read Bug 1170606 This is theoretically independent of 1-6, but my local patch assumes the changes in 1-6 to be in place. (The patch may not apply cleanly without 1-6 in advance.) There are some additional shor-read issues UNDER M-C portion of the source tree. They can be applied indepdent of this series. TIA
Attachment #8622043 - Flags: review?(neil)
Sorry there was a typo. > 5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked > Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) > > bug 1117450 This should read "bug 1174500"
Blocks: 1176857
Blocks: 1121842
No longer depends on: 1121842
Summary of the patch in this bugzilla First off, I am sorry that I deleted trailing blanks in a file (nsImapService.cpp). This results in many whitespace-change-only diff hunks. I thought *I* added the trailing space myself and deleted them wholesale and only many weeks later, I realized that it was not me, but the trailing white spaces which I deleted had been there all along. So the patch is unnecessarily large due to the white space change of the particular file. I appreciate any tips regarding this. Should I split the patch for nsImapService.cpp and somehow reduce the whitespace only changes to minimum (manually)? Anyway the summary of the patch and a few remaining issues. To be frank, I have realized that imap I/O error handling is still a little weak [because I started from pop3 user's point of view], but it has to wait. - more error message for write/close/flush errors. - Disabling |Seek()| that offended buffering. line 793 of nsPop3Sink.cpp - added buffering to local file copy. line 2127 of nsMsgLocalMailFolder.cpp (This has to be followed by the next patch in the series, Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) so that buffering is truly effective. With the offending |Seek()|, output is flushed at EVERY SINGLE LINE of a message and buffering is rendered useless.) Well, I forgot exactly why there are a few places where we need to enable buffering, but they must be related to downloading/saving of attachments, filtering to folder, etc., I think.) - Introduction of the following member variable. This became necessary when it was not easy to create an input stream from a *BUFFERED* output stream directly. It is possible to create an input stream from an non-buffered output stream as has been done in TB code. But, after enabling output buffering, the control flow dictated that the available output stream was *buffered* when I need to create an input stream, and so I decided to save the original input stream for later use. The variable: nsCOMPtr <nsIInputStream> m_inboxInputStream; // we read from the temporary file above. - (IMAP) I need to check the buffering of offline case in nsImapMailFolder.cpp. This has to wait until the whole patch series is applied. - nsImapOfflineSync.cpp. Line 367, a function ProcessAppendMsgOperation() is declared void and does not seem to pass the error status to caller very well. If we have encountered an error in I/O below this function, the user does not seem to receive any indication of it! Correct me if I am wrong here. Yes, it is true that a particular error (the error of CopyFileMessage) is passed. (line 456) // CopyFileMessage tries to communicate error via // nsIMsgCopyServiceListener passed as the 7th parameter but there are many other errors that can happen in this function which may not be reported in the current code framework to the user. This type of failure to tell the user that the error at low-level has occurred is one more reason to print low-level I/O error messages to error console as I noted in the post to bug 1134529. I am quoting it below. Such error message should help because often we ask "Do you see anything in error console?" and the answer is usually "No." --- begin quote --- Before the whole patch series was approved and applied, I may want to output the error messages added in my patch here in this bugzilla entry into error console using the macros introduced in bug 1116055. This is because of the observation below. Seeing the error in real-life usage will help the developers at analyzing bug reports. Learning what type of I/O errors the user may have encountered, developers can co-relate this to the buggy behavior the user reports, and come up with better fixes, etc. AND MORE TO THE POINT: it was not utterly obvious what error recovery ought to be taken when these write errors occur. This is because the current code simply ignored such I/O errors in many places :-( Thus I could not put in sensible error recovery unless there is already premature "return" with error code or NS_ENSURE_SUCCESS() in place. So I am afraid that seeing the error message in release-build using error console is the only way to learn the errors happened (that is, the type of I/O errors detected my patch). And probably we failed to take recovery action soonish until something went wrong later and the user noticed it then :-( In this sense, my patch is not complete yet, but at least IT IS MUCH BETTER THAN BEFORE. It handles the major error modes in Pop3 (and some in imap code). Perfect is the enemy of good as the saying goes. --- end quote --- PS: I have noticed that return values of |Seek()| and |Tell()| are not checked very well yet. I will add the check in a planned patch to pick up these left-over failures to check low-level I/O function return values (mainly Seek and Tell) after the patch [7] add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read so that existing patches do not have to be modified heavily. PPS: It might be a good idea to merge the patches of - bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series) - bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series) - bug 1134529 Check Write failures (part-3 of a series) - bug 1174500 C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4) into one big patch. (Yet, still keeping the following patches separate. - bug 1116055 Performance issue: Failure to use buffered write (comm-central thunderbird) - bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation) This may make it easier to check what I/O error is being checked, what action is taken, etc. in one pass.
Depends on: 1242030
In bug 1242030, I have uploaded merged patches from the bug 1122698, bug 1134527, bug 1134529, bug 1174500 The reason for consolidation is that the patch as a whole is much easier to read when consolidated. However, at the same time, I have split the resulting gigantic patch into a few smaller chunks that address files under only certain directories each. I will upload the latest road map that explains the patches and application order shortly. TIA PS: I took care of the space-only changes and revet them so that the patch is much smaller now.
Comment on attachment 8622043 [details] [diff] [review] More error processing. >+++ b/mailnews/local/src/nsPop3Sink.cpp I took a look at the changes in this file. >+ NS_WARNING("(debug): m_downloadingToTempFile path"); >+ >+ Nit: doubled blank line. >+#if 1 Necessary? >+ // CI: why are we disabling filters if (m_uidlDownload) This is to deal with the option to fetch only headers or the first part of a message. There is then UI to download the rest of the message, but the message would already have been filtered, so there's no need to filter it again. >+// CI: set # of messages to download (in advance?). It's used for progress notifications (downloading 1 of N). >+ // Is there any timing window of vulnerability when other phantom >+ // thread performs Seek on the stream before we get to Write >+ // below? I think the POP3 code is all main-thread only. >+ // CI: What are we doing here? >+ // We are extracting a msgKey from m_newMsgHdr and create a local message URI >+ // whatever that is. A message URI is what the front end uses to actually view a message. This is because it has to go though all the gubbins used for displaying web pages. >+ // We can see that m_outFileStream should NOT be nullptr here since > // We can see that m_outFileStream should not be nullptr here since Duplicate line? >+ if (NS_FAILED(rv4)) >+ { >+ NS_WARNING("rv4 m_tmpDownloadFile->SetFileSize(0) failed. rv4 was not NS_OK"); >+ } I'd prefer NS_WARN_IF_FALSE(NS_SUCCEEDED(rv4), "..."); Also you don't need to say "rv4 was not NS_OK", we can assume that's why there's a warning. >+ // CI what does PublishMsgHeader does? It updates the UI. >+ // CI XXX : I see "else if" below, but shouldn't it be a simple "if"? doSelect is not set if aSize is. >+// CI what does "FE" stand for? Folder E*? >+// FE = Front End, i.e. UI, I mean, User Interface >+ // CI XXX Can't we be more imaginative with error code, NS_ERROR_FAILURE? NS_ERROR_NOT_INITIALIZED perhaps?
(In reply to neil@parkwaycc.co.uk from comment #5) > Comment on attachment 8622043 [details] [diff] [review] > More error processing. > Neil, thank you for your comment. I have been on a summer holiday week: I just came back home yesterday. I will try to incorporate your comment into comments in my patch set. Hopefully, I can refresh some parts tonight and tomorrow. I am glad that at least I can move forward with the patch set even in small increment (!) Thank you again.
Neil, thank you again for the review. However, I made a mistake of not obsoleting this bugzilla entry by duping this to bug 1242030. There the patches to files under imap directory is consolidated: fix-close-MERGED-imap-dir.patch Your comment to the file /mailnews/local/src/nsPop3Sink.cpp is now handled in it. (In reply to neil@parkwaycc.co.uk from comment #5) > Comment on attachment 8622043 [details] [diff] [review] > More error processing. > > >+++ b/mailnews/local/src/nsPop3Sink.cpp > I took a look at the changes in this file. > >+ NS_WARNING("(debug): m_downloadingToTempFile path"); > >+ > >+ > Nit: doubled blank line. Was already taken care of. > >+#if 1 > Necessary? Taken out already. > >+ // CI: why are we disabling filters if (m_uidlDownload) > This is to deal with the option to fetch only headers or the first part of a > message. There is then UI to download the rest of the message, but the > message would already have been filtered, so there's no need to filter it > again. I put your comment as a comment in the code after a slight editing. > >+// CI: set # of messages to download (in advance?). > It's used for progress notifications (downloading 1 of N). Again, I put your comment as a comment in the code after a slight editing. > >+ // Is there any timing window of vulnerability when other phantom > >+ // thread performs Seek on the stream before we get to Write > >+ // below? > I think the POP3 code is all main-thread only. Ditto. > >+ // CI: What are we doing here? > >+ // We are extracting a msgKey from m_newMsgHdr and create a local message URI > >+ // whatever that is. > A message URI is what the front end uses to actually view a message. This is > because it has to go though all the gubbins used for displaying web pages. Ditto. > >+ // We can see that m_outFileStream should NOT be nullptr here since > > // We can see that m_outFileStream should not be nullptr here since > Duplicate line? Already taken care of. > >+ if (NS_FAILED(rv4)) > >+ { > >+ NS_WARNING("rv4 m_tmpDownloadFile->SetFileSize(0) failed. rv4 was > not NS_OK"); > >+ } > I'd prefer NS_WARN_IF_FALSE(NS_SUCCEEDED(rv4), "..."); Also you don't need > to say "rv4 was not NS_OK", we can assume that's why there's a warning. I think NS_WARN_IF_FALSE macro is gone? No. But I changed the message by simplifying. I needed to see which return value triggered the warning message for helping me debug error behavior during debugging, I think. > >+ // CI what does PublishMsgHeader does? > It updates the UI. Put this as a slightly edited comment. > >+ // CI XXX : I see "else if" below, but shouldn't it be a simple "if"? > doSelect is not set if aSize is. Put this as a slightly edited comment. > >+// CI what does "FE" stand for? Folder E*? > >+// > FE = Front End, i.e. UI, I mean, User Interface Put this as a comment. You see we have technology transfer problem between the original programmers who wrote Mozilla codebase and those who came late and try to figure out the code to debug it after noticing fatal errors experienced by user community. > >+ // CI XXX Can't we be more imaginative with error code, NS_ERROR_FAILURE? > NS_ERROR_NOT_INITIALIZED perhaps? Changed it as suggested. Again, can you please look at the patches in bug 1242030. Sorry for the confusion. I should have obsoleted all the patches I sent out when I hit upon this brilliant idea of consolidating patches into manageable patches per directory. TIA
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
I think I meant to say patches for the files under local directory was merged into fix-close-MERGED-local-dir-Rev02.patch in bug 1242030
Depends on: 1306914
No longer depends on: 1306914
Attachment #8622043 - Attachment is obsolete: true
Attachment #8622043 - Flags: review?(neil)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: