Closed Bug 1134529 Opened 5 years ago Closed 3 years ago

C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-3)

Categories

(MailNews Core :: Networking: POP, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1242030

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(1 file, 3 obsolete files)

+++ This bug was initially created as a clone of Bug #1134527 +++

+++ This bug was initially created as a clone of Bug #1122698 +++

This is a branch of meta Bug 1121842 which depends on this
entry.

This is Step-3.
--- begin quote from Bug 1121842
After Close() and Flush() are taken care of, we check
the error return of Write() and the mismatch of
the # of written bytes and requested bytes.
   
Step-3. Add error value checking of Write();
   Check the return value of Write and
   if the requested # of bytes matches the # of really written bytes.
   
   First step. Simply add NS_ERROR(). Better than nothing.
   At least, we will see the error printed during testing of
   DEBUG version of TB.
  (For better error handling, we will wait for Step 4.)

The patch for this will be posted as a separate bugzilla entry.
--- end quote 

This is the work-in-progress patch for step-3.
I did several tests and fixed incorrect handling and added a few band-aid
error processing code snipets and comments.
But this will be needed to work on more. (Maybe step-1 in Bug 1122698 can be checked in, and also step-2 in Bug 1134527 after clean up.)

However, the error processing here needs more testing since
I encountered a crash of TB due to the memory allocation area corruption while
doing error recovery. It seems already freed area is freed again or something like this. We now handle error conditions better (more strictly: before the errors were ignored and silently corrupted data in some cases), but error handling path seems to release already released data:
There are problems of similar pattern in the code.
Case in point: Close against already closed stream.
In step-1 (bug 1122698), I fixed this issue. The original code superficially
tried to avoid |Close()| against already closed stream by setting the stream variable to nullptr hoping that the next caller of Close() will not call it when the stream is nullptr.
However, in the original call hierarchy, the higher-level code passed the global variable that contains the stream value to a leaf function, which copies it to a local copy and the leaf function closed the stream and set the local copy of the stream to nullptr. HOWEVER, it DOES NOT MODIFY the global stream variable. The result is that the higher-level code does not realize the stream is closed when it tries to call |Close()| since the global stream variable is not nullptr. I suspect that the failure to check the return value of |Close()| was that there were so many bogus |Close()| calls to already closed streams and checking the return code would have revealed the series of bogus |Close()| on already closed streams, and simply the checking was dropped to the detriment of
data integrity :-(

Now, I think I see a pattern here. It seems that some error handling path seems to release listener/observer during error recovery, but I think there are
places where the data already released during error recovery is again released higher-level code which does not realize that the data is already is released.
I need to investigate more here.
So the attached patch is definitely a work-in-progress: at least, it does not introduce new bugs (although it uncovered a few issues in the original error handling), and print messages where some error conditions that were ignored before are encountered.

TIA

PS: The following is a comment from Bug 1134527 (Step-2)

This is step-2:
--- begin quote from bug 1121842

After the removal of unnecessary calls of |Close|,
we go to Step-2.

Step-2. Add error value checking of Close() and Flush();

   First step. Simply add NS_ERROR(). Better than nothing.

   At least, we will see the error printed during testing of
   DEBUG version of TB.
  (For better error handling, we will wait for Step 4.)
   
The patch for this will be posted as a separate bugzilla entry.

After Close() and Flush() are taken care of, we check
the error return of Write() and the mismatch of
the # of written bytes and requested bytes.
--- end quote

This is the work-in-progress patch for step-2.


Below is the initial comment from bug 1122698.
This is the step-1
(from bug 1121842)   <=== typo fixed when cloned.
--- quote ---
Step 1. Remove Extra ->Close() calls (and Flush() before Close().).

Let us remove extra / unnecessary / bogus |Close| calls.

The removal is based on the two proposals below.

PROPOSAL-1: DiscardNewMessage should not close the file stream passed
as the 1st argument. It is caller's responsibility to close it..

PROPOSAL-2: FinishNewMessage should not close the
file stream passed as the first argument.
It is caller's responsibility to close it.

The reasoning is given in a crude write-up I created from checking the
code at the end. (This will be posted as the next comment.)

I checked the usage of two functions above, and with a few patches,
the proposals ought to work.

This patch will be filed as a different bugzilla on which this meta
bugzilla entry depends.

--- end quote

Actually, not many |Close| are removed statically, but
their calls during execution will be decreased by 
- setting stream value to nullptr after |Close|, and
- the checking of stream value before calling |Close|.

(Well, that is the theory. We may find out more
bogus calls once we check the return value of Close().
But this is a start.)

Local |make mozmill| test passed, and |make xpcshell-tests| did not introduce any new bugs with the attached patch.
I will still mark this as Work-in-Progress in the series so that
people in the know can figure out where any modifications should be made, etc. by looking at the whole picture.
 
(Hmm. I wonder why there is component for Mail Transfer. This is a mail client.)
Here is an updated patch to match the current tree (as of March 16).
Attachment #8566392 - Attachment is obsolete: true
> 
> Here is an updated patch to match the current tree (as of March 16).
Actually, this applies to the current tree.
This is an update for the current C-C tree
and a patch for checking the file pointer position in  Bug 1116055 - Performance issue: Failure to use buffered write
(comm-central thunderbird)
Assignee: nobody → ishikawa
Attachment #8589759 - Attachment is obsolete: true
This is an updated patch to match the current tree.

This is going to be in a series of patches.

========================================
PLANNED ORDER
========================================

* 1 A check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it

   bug 1116055 

   This was considered a necessary step so that we can know that if
   any user experiences a strange seek position change.

* 2 A fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of  a series)

   bug 1122698

* 3 A fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series)

   bug 1134527

* 4 A fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part-3 of a series)

   bug 1134529 (This entry)

5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

   bug 1117450

6 A 1116055-acelist.patch: One liner patch posted in Bug 1116055 to return a buffered stream from mbox::getNewOutputStream by aceman

   Bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation)

   I have created a separate bug for this since it will be necessary
   to apply this patch only after other patches landed. (This is
   because enabling the buffering requires proper error checking of
   |Close()| and |Flush()|. These are only taken care of AFTER 1-5
   patches above are applied.)

   Maybe about a month after the first patch (bug 1116055) above is
   applied, and other patches 2-5 have been applied, I can enable this
   patch. (Assuming there is NO ONE who reports strange seek position
   change on their computer.)

7 A add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take care of short read

   Bug 1170606

   This is theoretically independent of 1-6, but my local patch
   assumes the changes in 1-6 to be in place. (The patch may not apply
   cleanly without 1-6 in advance.)

  There are some additional shor-read issues UNDER M-C portion of the
  source tree.  They can be applied indepdent of this series.

Please feel free to take up the review or move it to somebody else
based on the workload of the potential reviewers.

TIA
Attachment #8600193 - Attachment is obsolete: true
Attachment #8625465 - Flags: review?(Pidgeot18)
Sorry there was a typo of a bugzilla #.
>5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked 
> Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)
>
>   bug 1117450

This should have read "bug 117450".
(In reply to ISHIKAWA, Chiaki from comment #5)
> Sorry there was a typo of a bugzilla #.
> >5 A removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked 
> > Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)
> >
> >   bug 1117450
> 
> This should have read "bug 117450".

Ouch. I hate an uncooperating keyboard and a mouse.
It *is* bug 1174500.
As a serious user of TB, I appreciate that TB has been released to the
public for free.
At the same time, I noticed serious I/O issues, and thus
am posting series of patches.

I hope TB will be more robust, and does not eat my message even in the
face of I/O errors :-) [Well, it did before :-( ]

Summary of the patch in this bugzilla.

Main purpose of this patch is to add
the checks to error return of Write() and the mismatch of
the # of written bytes and requested bytes.

First step. Simply add NS_ERROR(). Better than nothing.
At least, we will see the error printed during testing of
DEBUG version of TB.
(For better error handling, we will wait for Step 4, but see my
comment regarding placing the error message into error console using
macros proposed in bug 1116055.)

- a few random debug message printing in nsMsgDBFolder.cpp to
  print error value encountered in simulated network outage.

- XXX comment on line 437 of nsImapOfflineSync.cpp and
  line 3752 of nsImapProtocol.cpp suggests
  imap testing is not quite complete yet.
  (This has to wait. I want to clear the current local patch queue first.)

- in light of short-read issues that I became aware later,
  xxx comment on line 9232-9235 in nsImapMockChannel() needs
  handling and it is indeed handled in the later patch in
  Bug 1170606.
  xxx comment on line 324 of nsMsgMailboxParse.cpp was analyzed in Bug
  1170606 and it was determined that we don't need handle the
  short-read issue here

Printing serious I/O error message in error console:

Before the whole patch series was approved and applied, I may want to
output the error messages added in my patch here in this bugzilla
entry into error console using the macros introduced in bug 1116055.

This is because of the observation below.

Seeing the error in real-life usage will help the developers at
analyzing bug reports. Learning what type of I/O errors the user may
have encountered, developers can co-relate this to the buggy behavior
the user reports, and come up with better fixes, etc.

AND MORE TO THE POINT: it was not utterly obvious what error recovery
ought to be taken when these write errors occur.
This is because the current code simply ignored such I/O errors in
many places :-(
Thus I could not put in sensible error recovery unless there is
already premature "return" with error code or NS_ENSURE_SUCCESS() in
place.

So I am afraid that seeing the error message in release-build using
error console is the only way to learn the errors happened (that is,
the type of I/O errors detected my patch).  And probably we failed to
take recovery action soonish until something went wrong later and the
user noticed it then :-(

In this sense, my patch is not complete yet, but at least IT IS MUCH
BETTER THAN BEFORE. It handles the major error modes in Pop3 (and some
in imap code). Perfect is the enemy of good as the saying goes.

PS: I have noticed that return values of |Seek()| and |Tell()| are not
checked very well. I will add the check in a planned patch to pick
up these left-over failures to check low-level I/O function return
values (mainly Seek and Tell) after the patch

[7] add-short-read-handling.patch: Bug 1170606 - C-C T-B fixes to take
care of short read

so that existing patches do not have to be modified heavily.

PPS: It might be a good idea to merge the patches of

- bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of  a series)
- bug 1134527 Add error value checking of Close() and Flush() (part-2 of a series)
- bug 1134529 Check Write failures (part-3 of a series)
- bug 1174500 C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

into one big patch.
(Yet, still keeping the following patches separate.
- bug 1116055 Performance issue: Failure to use buffered write (comm-central thunderbird)
- bug 1176857 - C-C TB Enabling buffering for pop3 download (and possibly part of imap file operation)

This may make it easier to check what I/O error is being checked, what
action is taken, etc. in one pass.
In bug 1242030, I have uploaded merged patches from the bug 1122698, bug 1134527, bug 1134529, bug  1174500 
The reason for consolidation is that the patch as a whole is much easier to 
read when consolidated.
However, at the same time, I have split the resulting gigantic patch into a few smaller chunks that address files under only certain directories each.

I will upload the latest road map that explains the patches and application order shortly.

TIA
As per comment 8, the patch discussed here is now merged and split per directories in bug 1242030.
So I am duping this bugzilla entry to the new bug.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1242030
No longer depends on: 1306914
Comment on attachment 8625465 [details] [diff] [review]
( step-3) work-in-progress patch to add error checks to |Write| (in addtition to step-1 and step 2) [take 3, update to match current tree]

No chance that Joshua will review this. The patch most likely has rot anyway.
Attachment #8625465 - Flags: review?(Pidgeot18)
(In reply to Jorg K (GMT+2) from comment #10)
> Comment on attachment 8625465 [details] [diff] [review]
> ( step-3) work-in-progress patch to add error checks to |Write| (in
> addtition to step-1 and step 2) [take 3, update to match current tree]
> 
> No chance that Joshua will review this. The patch most likely has rot anyway.

You are wright. I am working on updating the patch. It now compiles and succeeds with |make mozmill| test
with April source tree. I need to re-upgrade my local source tree to make sure the patch works with the current tree.
You need to log in before you can comment on or make changes to this bug.