Closed Bug 1174500 Opened 5 years ago Closed 3 years ago

[META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

Categories

(MailNews Core :: Networking, defect, major)

defect
Not set
major

Tracking

(firefox41 affected)

RESOLVED DUPLICATE of bug 1242030
Tracking Status
firefox41 --- affected

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: meta)

Attachments

(1 obsolete file)

Attached patch More error processing. (obsolete) — Splinter Review
+++ This bug was initially created as a clone of Bug #1121842 +++


This is step-4 suggested in the original 1121842.

> Step-4. More error processing

- Removing a Seek that interferes with buffering write to Inbox.
- Added a few comments about failures during CIFS mount failures.
- Add sched_yield() (in an failed attempt to make GUI more responsive.)
- Tried to use faster buffered write along with more checking for I/O errors in message download code
- Added error checks to |Seek()| calls.
- Checking the return value of Close() and other wrapper functions of
  low-level I/O system calls to catch file system issues ASAP.
  There seems to be a design failure: the code does not handle
  such errors gracefully.
  At least, we should tell the user that the file system error(s)
  are corrupting user data.

Some actions mentioned avoce have been subsequently moved to the other patches in the series.

> Step 1 is addressed by Bug 1122698.

> Step 2 is addressed by Bug 1134527.

> Step 3 is addressed by Bug 1134529.

Step 4 is this one.

Step 5 is actually split into two: one part
now needs to be applied before Step 0 (for sanity check of seek operation), (Bug 1116055)

and enabiling buffering of write is now enabled in another one liner patch.



--- begin quote of the origina 1121842 --- 
This is to be a meta bug entry for the following issues.

I depend on TB for the workflow at the office and my personal correspondence
on several PCs (under linux and Windows).
So its correct operation (and smooth operation hopefully) as a cross-platform mailer is very important for me. So I files this bugzilla entry.

Problem:

There is a confusion as to which routine should be responsible for
closing the file stream associated with a variable m_outFileStream in
nsPop3Sink.cpp.

Because of the confusion (?), there are multiple instances of
unnecessary extra bogus Close calls across a few files.

Also, Pop3 code is full of unchecked |Close|, |Flush|, and |Write| calls.

(Imap code, too. That will be needed to be taken care of
eventually. Sorry I am not using imap right now and not feel motivated
to tackle that group of files immediately until nsPops3Sink.cpp is
fixed. But someone can take the lead after seeing this.)

If the underlying file system experiences a glitch, then it is quite
likely that thunderbird treats failed download as success and delete
the message on the POP3 server happily. Totally unacceptable behavior.

[ The glitch of the file system: almost filled-up file system,
transient network problems for remotely mounted file system, incorrect
file system permission caused by administrative error an an NFS server,
USB memory stick where file is to be stored fell of the PC (!) etc.
I have personally experienced and confirmed the first three error
scenarios with TB before in different context and experienced data loss.
I know some people seemed to have suffered from the last one
according to bugzilla entries. ]

Background of discovery:

I tried to enable output buffering in nsPop3Sink.cpp for performance
reasons.
[  Bug 1116055 - Performance issue: Failure to use buffered write (comm-central thunderbird) ]

But When I enabled output buffering in nsPop3Sink.cpp in addition to
the patch in bug 1116055, it caused the failure to incorporate download
messages. It looks timing dependent.  Anyway, I always found a stream closed
prematurely before writing finishes.

That prompted me to debug and study the code carefully.  Then I found
that there are bogus Close() calls to already closed file streams in
the code.

This bogus |Close()| became apparent as soon as I added error checking
of returned value of |Close()| in several places.

Such extra bogus |Close| calls are RAMPANT  during execution, and I
confirm it after static code analysis.

But then, when I started to think of how to fix the situation,
I noticed there seemed to be a confusion about which function
should call |Close()| on the buffer stream variable, m_outFileStream
in nsPop3Sink.cpp.

So I traced the history of m_outFileStream (where it is set, where it
is used as parameter to external functions [which in turn may Close
it], when the file associated with it are Opened, Closed, etc.)

After the analysis, I came up with a plan to clean up the current
incorrect code that invokes bogus |Close| on already closed
streams. (Such bogus calls interfere with smooth gdb debugging. There
are simply too many such calls that would return
NS_BASE_STREAM_CLOSED during execution. Oh yes, come to think of it
buffered output routine does not report error situation completely to my taste.
(See Bug 1120046 - RFC mozilla/netwerk/base/src/nsBufferedStreams.cpp: better error reporting and maybe adding thread-race lock )
And the source code is not quite right statically, too, even if calling Close on already closed stream should be NOOP in principle.
It is too confusing to see an already closed
stream closed only several lines down while reading the code. )

We should also check the return value of Close, Flush and Write
properly.

========================================
Plan for Improvement
========================================

My Plan to clean up the code is as follows.

There are four steps.

Step 1. Remove Extra ->Close() calls (and Flush() before Close().).

Let us remove extra / unnecessary / bogus |Close| calls.

The removal is based on the two proposals below.

PROPOSAL-1: DiscardNewMessage should not close the file stream passed
as the 1st argument. It is caller's responsibility to close it..

PROPOSAL-2: FinishNewMessage should not close the
file stream passed as the first argument.
It is caller's responsibility to close it.

The reasoning is given in a crude write-up I created from checking the
code at the end. (This will be posted as the next comment.)

I checked the usage of two functions above, and with a few patches,
the proposals ought to work.

This patch will be filed as a different bugzilla on which this meta
bugzilla entry depends.

After the removal of unnecessary calls of |Close|,
we go to Step-2.

Step-2. Add error value checking of Close() and Flush();

   First step. Simply add NS_ERROR(). Better than nothing.

   At least, we will see the error printed during testing of
   DEBUG version of TB.
  (For better error handling, we will wait for Step 4.)
   
The patch for this will be posted as a separate bugzilla entry.

After Close() and Flush() are taken care of, we check
the error return of Write() and the mismatch of
the # of written bytes and requested bytes.
   
Step-3. Add error value checking of Write();
   Check the return value of Write and
   if the requested # of bytes matches the # of really written bytes.
   
   First step. Simply add NS_ERROR(). Better than nothing.
   At least, we will see the error printed during testing of
   DEBUG version of TB.
  (For better error handling, we will wait for Step 4.)

The patch for this will be posted as a separate bugzilla entry.

Step-4. More error processing
   I understand that some places may need elaborate error handling,
   better than NS_ERROR(),  and
   error return path.  Take care of them in Step-4.

   The patch for this will be posted as a separate bugzilla entry.  I
   think I will need a few different bugzilla entries since in a few
   places, the error handling seems to be necessarily complex.

Step-5. Enabling buffered output.

  (This is not a correctness fix. This is a performance improvement.)

  Additionally, introduce buffering output to the file stream
  created in these files appropriately.

I will post the crude memo/write-up in the next comment.
--- end comment
Assignee: nobody → ishikawa
This is going to be in a series of patches:

========================================
ORDER
========================================

1 A check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it

   bug 1116055 

   This was considered a necessary step so that we can know that if
   any user experiences a strange seek position change.

2 A fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of  a series)

   bug 1122698

3 A fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series)

   bug 1134527

 4 A fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series)

   bug 1134529

 5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

   bug 1117450

 6 A 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman

   Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation)

   I have created a separate bug for this since it will be necessary
   to apply this patch only after other patches landed. (This is
   because enabling the buffering requires proper error checking of
   |Close()| and |Flush()|. These are only taken care of AFTER 1-5
   patches above are applied.)

   Maybe about a month after the first patch (bug 1116055) above is
   applied, and other patches 2-5 have been applied, I can enable this
   patch. (Assuming there is NO ONE who reports strange seek position
   change on their computer.)

 7 A add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read

   Bug 1170606

   This is theoretically independent of 1-6, but my local patch
   assumes the changes in 1-6 to be in place. (The patch may not apply
   cleanly without 1-6 in advance.)

  There are some additional shor-read issues UNDER M-C portion of the
  source tree.  They can be applied indepdent of this series.

TIA
Attachment #8622043 - Flags: review?(neil)
Sorry there was a typo.
> 5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked 
> Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)
>
>   bug 1117450 
This should read "bug 1174500"
Blocks: 1176857
Blocks: 1121842
No longer depends on: 1121842
Summary of the patch in this bugzilla

First off, I am sorry that I deleted trailing blanks in a file
(nsImapService.cpp).
This results in many whitespace-change-only diff hunks. 

I thought *I* added the trailing space myself and deleted them
wholesale and only many weeks later, I realized that it was not me,
but the trailing white spaces which I deleted had been there all along.

So the patch is unnecessarily large due to the white space
change of the particular file. I appreciate any tips regarding
this. Should I split the patch for nsImapService.cpp and
somehow reduce the whitespace only changes to minimum (manually)?

Anyway the summary of the patch and a few remaining issues.  To be
frank, I have realized that imap I/O error handling is still a little
weak [because I started from pop3 user's point of view], but it has to
wait.

- more error message for write/close/flush errors.

- Disabling |Seek()| that offended buffering.
  line 793 of nsPop3Sink.cpp

- added buffering to local file copy.
  line 2127 of nsMsgLocalMailFolder.cpp

  (This has to be followed by the next patch in the series,

   Bug 1176857 - C-C TB Enabling buffering for pop3 download
   (and possibly part of imap file operation)

   so that buffering is truly effective. With the offending
   |Seek()|, output is flushed at EVERY SINGLE LINE of a message
   and buffering is rendered useless.)

  Well, I forgot exactly why there are a few places where we
  need to enable buffering, but they must be related to
  downloading/saving of attachments, filtering to folder, etc., I think.)

-  Introduction of the following member variable.

   This became necessary when it was not easy to create an input
   stream from a *BUFFERED* output stream directly.

   It is possible to create an input stream from an non-buffered
   output stream as has been done in TB code. But, after enabling
   output buffering, the control flow dictated that the available
   output stream was *buffered* when I need to create an input stream,
   and so I decided to save the original input stream for later use.

   The variable:
   nsCOMPtr <nsIInputStream> m_inboxInputStream; // we read from the temporary file above.

- (IMAP) I need to check the buffering of offline case in
  nsImapMailFolder.cpp. This has to wait until the whole patch series
  is applied.

- nsImapOfflineSync.cpp.

  Line 367, a function ProcessAppendMsgOperation() is declared
  void and does not seem to pass the error status to caller very
  well.  If we have encountered an error in I/O below this
  function, the user does not seem to receive any indication of
  it!  Correct me if I am wrong here.

  Yes, it is true that a particular error (the error of
  CopyFileMessage) is passed.  (line 456)

  // CopyFileMessage tries to communicate error via
  // nsIMsgCopyServiceListener passed as the 7th parameter

  but there are many other errors that can happen in this function
  which may not be reported in the current code framework to the user.

  This type of failure to tell the user that the error at low-level
  has occurred is one more reason to print low-level I/O error messages
  to error console as I noted in the post to bug 1134529.  I am
  quoting it below.

  Such error message should help because often we ask "Do you see
  anything in error console?" and the answer is usually "No."
  
  --- begin quote ---
Before the whole patch series was approved and applied, I may want to
output the error messages added in my patch here in this bugzilla
entry into error console using the macros introduced in bug 1116055.

This is because of the observation below.

Seeing the error in real-life usage will help the developers at
analyzing bug reports. Learning what type of I/O errors the user may
have encountered, developers can co-relate this to the buggy behavior
the user reports, and come up with better fixes, etc.

AND MORE TO THE POINT: it was not utterly obvious what error recovery
ought to be taken when these write errors occur.
This is because the current code simply ignored such I/O errors in
many places :-(
Thus I could not put in sensible error recovery unless there is
already premature "return" with error code or NS_ENSURE_SUCCESS() in
place.

So I am afraid that seeing the error message in release-build using
error console is the only way to learn the errors happened (that is,
the type of I/O errors detected my patch).  And probably we failed to
take recovery action soonish until something went wrong later and the
user noticed it then :-(

In this sense, my patch is not complete yet, but at least IT IS MUCH
BETTER THAN BEFORE. It handles the major error modes in Pop3 (and some
in imap code). Perfect is the enemy of good as the saying goes.

--- end quote ---  

PS: I have noticed that return values of |Seek()| and |Tell()|
are not checked very well yet.  I will add the check in a
planned patch to pick up these left-over failures to check
low-level I/O function return values (mainly Seek and Tell)
after the patch

[7] add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes
to take care of short read

so that existing patches do not have to be modified heavily.

PPS: It might be a good idea to merge the patches of

- bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of  a series)
- bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series)
- bug 1134529 Check Write failures (part-3 of a series)
- bug 1174500 C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

into one big patch.
(Yet, still keeping the following patches separate.
- bug 1116055 Performance issue: Failure to use buffered write (comm-central thunderbird)
- bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation)

This may make it easier to check what I/O error is being checked, what
action is taken, etc. in one pass.
Depends on: 1242030
In bug 1242030, I have uploaded merged patches from the bug 1122698, bug 1134527, bug 1134529, bug  1174500 
The reason for consolidation is that the patch as a whole is much easier to 
read when consolidated.
However, at the same time, I have split the resulting gigantic patch into a few smaller chunks that address files under only certain directories each.

I will upload the latest road map that explains the patches and application order shortly.

TIA

PS: I took care of the space-only changes and revet them so that the patch is much smaller now.
Comment on attachment 8622043 [details] [diff] [review]
More error processing.

>+++ b/mailnews/local/src/nsPop3Sink.cpp
I took a look at the changes in this file.
>+    NS_WARNING("(debug): m_downloadingToTempFile path");
>+
>+
Nit: doubled blank line.
>+#if 1
Necessary?
>+  // CI: why are we disabling filters if (m_uidlDownload)
This is to deal with the option to fetch only headers or the first part of a message. There is then UI to download the rest of the message, but the message would already have been filtered, so there's no need to filter it again.
>+// CI: set # of messages to download (in advance?).
It's used for progress notifications (downloading 1 of N).
>+    // Is there any timing window of vulnerability when other phantom
>+    // thread performs Seek on the stream before we get to Write
>+    // below?
I think the POP3 code is all main-thread only.
>+  // CI: What are we doing here?
>+  // We are extracting a msgKey from m_newMsgHdr and create a local message URI
>+  // whatever that is.
A message URI is what the front end uses to actually view a message. This is because it has to go though all the gubbins used for displaying web pages.
>+      // We can see that m_outFileStream should NOT be  nullptr here since
>       // We can see that m_outFileStream should not be  nullptr here since
Duplicate line?
>+        if (NS_FAILED(rv4))
>+        {
>+            NS_WARNING("rv4 m_tmpDownloadFile->SetFileSize(0) failed.  rv4 was
not NS_OK");
>+        }
I'd prefer NS_WARN_IF_FALSE(NS_SUCCEEDED(rv4), "..."); Also you don't need to say "rv4 was not NS_OK", we can assume that's why there's a warning.
>+    // CI what does PublishMsgHeader does?
It updates the UI.
>+    // CI XXX : I see "else if" below, but shouldn't it be a simple "if"?
doSelect is not set if aSize is.
>+// CI what does "FE" stand for? Folder E*?
>+//
FE = Front End, i.e. UI, I mean, User Interface
>+  // CI XXX Can't we be more imaginative with error code, NS_ERROR_FAILURE?
NS_ERROR_NOT_INITIALIZED perhaps?
(In reply to neil@parkwaycc.co.uk from comment #5)
> Comment on attachment 8622043 [details] [diff] [review]
> More error processing.
> 


Neil, thank you for your comment.
I have been on a summer holiday week: I just came back home yesterday.
I will try to incorporate your comment into comments in my patch set.
Hopefully, I can refresh some parts tonight and tomorrow.

I am glad that at least I can move forward with the patch set even in small increment (!)

Thank you again.
Neil,
thank you again for the review.
However, I made a mistake of not obsoleting this bugzilla entry by duping this
to bug 1242030.
There the patches to files under imap directory is consolidated:

 fix-close-MERGED-imap-dir.patch

Your comment to the file /mailnews/local/src/nsPop3Sink.cpp is now handled in it.

(In reply to neil@parkwaycc.co.uk from comment #5)
> Comment on attachment 8622043 [details] [diff] [review]
> More error processing.
> 
> >+++ b/mailnews/local/src/nsPop3Sink.cpp
> I took a look at the changes in this file.
> >+    NS_WARNING("(debug): m_downloadingToTempFile path");
> >+
> >+
> Nit: doubled blank line.

Was already taken care of.

> >+#if 1
> Necessary?

Taken out already.

> >+  // CI: why are we disabling filters if (m_uidlDownload)
> This is to deal with the option to fetch only headers or the first part of a
> message. There is then UI to download the rest of the message, but the
> message would already have been filtered, so there's no need to filter it
> again.

I put your comment as a comment in the code after a slight editing.


> >+// CI: set # of messages to download (in advance?).
> It's used for progress notifications (downloading 1 of N).

Again, I put your comment as a comment in the code after a slight editing.

> >+    // Is there any timing window of vulnerability when other phantom
> >+    // thread performs Seek on the stream before we get to Write
> >+    // below?
> I think the POP3 code is all main-thread only.

Ditto.

> >+  // CI: What are we doing here?
> >+  // We are extracting a msgKey from m_newMsgHdr and create a local message URI
> >+  // whatever that is.
> A message URI is what the front end uses to actually view a message. This is
> because it has to go though all the gubbins used for displaying web pages.

Ditto.

> >+      // We can see that m_outFileStream should NOT be  nullptr here since
> >       // We can see that m_outFileStream should not be  nullptr here since
> Duplicate line?

Already taken care of.

> >+        if (NS_FAILED(rv4))
> >+        {
> >+            NS_WARNING("rv4 m_tmpDownloadFile->SetFileSize(0) failed.  rv4 was
> not NS_OK");
> >+        }
> I'd prefer NS_WARN_IF_FALSE(NS_SUCCEEDED(rv4), "..."); Also you don't need
> to say "rv4 was not NS_OK", we can assume that's why there's a warning.

I think NS_WARN_IF_FALSE macro is gone? No.
But I changed the message by simplifying.
I needed to see which return value triggered the warning message for helping me debug
error behavior during debugging, I think.

> >+    // CI what does PublishMsgHeader does?
> It updates the UI.

Put this as a slightly edited comment.

> >+    // CI XXX : I see "else if" below, but shouldn't it be a simple "if"?
> doSelect is not set if aSize is.

Put this as a slightly edited comment.

> >+// CI what does "FE" stand for? Folder E*?
> >+//
> FE = Front End, i.e. UI, I mean, User Interface

Put this as a comment.
You see we have technology transfer problem between the original programmers who wrote Mozilla codebase
and those who came late and try to figure out the code to debug it after noticing fatal errors experienced by user community.

> >+  // CI XXX Can't we be more imaginative with error code, NS_ERROR_FAILURE?
> NS_ERROR_NOT_INITIALIZED perhaps?

Changed it as suggested.

Again, can you please look at the patches in bug 1242030.
Sorry for the confusion.
I should have obsoleted all the patches I sent out when I hit upon this brilliant idea of
consolidating patches into manageable patches per directory.

TIA
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1242030
I think I meant to say
patches for the files under local directory was merged into
fix-close-MERGED-local-dir-Rev02.patch
in bug 1242030
Depends on: 1306914
No longer depends on: 1306914
Attachment #8622043 - Attachment is obsolete: true
Attachment #8622043 - Flags: review?(neil)
You need to log in before you can comment on or make changes to this bug.