Open
Bug 454359
Opened 16 years ago
Updated 2 years ago
confusing message with "quarantine option" disabled if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete.
Categories
(MailNews Core :: Networking: POP, defect)
MailNews Core
Networking: POP
Tracking
(Not tracked)
NEW
People
(Reporter: hiro, Unassigned)
References
(Depends on 1 open bug)
Details
(Whiteboard: [patchlove])
Attachments
(1 file, 1 obsolete file)
3.27 KB,
patch
|
Bienvenu
:
review-
Bienvenu
:
superreview-
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.15) Gecko/20080702 Ubuntu/8.04 (hardy) Firefox/2.0.0.15 Kazehakase/0.5.4
Build Identifier:
confusable message if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete.
If "quarantine option" is disabled, AppendMsgFromFile in nsPop3Sink::IncorporateComplete() fails only if memory deficit or writing data to Inbox so the error message "This message may contain a virus or there is not enough disk space. Skip this message?" is confusable because the error is not concern to temporary file(i.e. the message).
Moreover, if user press OK button on the message dialog, the message will be never received even if the message does not have any virus codes.
Reproducible: Always
Steps to Reproduce:
1.
2.
3.
Reporter | ||
Comment 1•16 years ago
|
||
Just return the original error code.
Attachment #337607 -
Flags: superreview?(bienvenu)
Attachment #337607 -
Flags: review?(bienvenu)
Reporter | ||
Comment 2•16 years ago
|
||
(In reply to comment #0)
> If "quarantine option" is disabled, AppendMsgFromFile in
I meant If "quarantine option" is enabled.
Updated•16 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 3•16 years ago
|
||
I'm not sure that's true - a virus checker might prevent us from reading the temp file, in which case AppendMsgFromFile would fail.
Reporter | ||
Comment 4•16 years ago
|
||
Ah, I see.
AppendMsgFromFile should return error code if reading fails. I will revise the patch.
Reporter | ||
Comment 5•16 years ago
|
||
AppendMsgFromFile returns NS_ERROR_FAILURE if reading from file fails.
Caller of AppendMsgFromFile (in this case, IncorporateComplete) returns NS_MSG_ERROR_WRITING_MAIL_FOLDER if the error code of AppendMsgFromFile is NS_MSG_ERROR_WRITING_MAIL_FOLDER otherwise call HandleTempDownloadFailed.
Attachment #337607 -
Attachment is obsolete: true
Attachment #337787 -
Flags: superreview?(bienvenu)
Attachment #337787 -
Flags: review?(bienvenu)
Attachment #337607 -
Flags: superreview?(bienvenu)
Attachment #337607 -
Flags: review?(bienvenu)
Updated•16 years ago
|
Assignee: nobody → poincare
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Comment 6•14 years ago
|
||
Comment on attachment 337787 [details] [diff] [review]
revised patch
sorry for the delay - we have to treat errors both writing to and reading from the temporary quarantine file as potentially caused by a virus checkers, and I believe this patch breaks that. Do you have a particular reproducible case of an error that you'd like to fix?
Attachment #337787 -
Flags: superreview?(bienvenu)
Attachment #337787 -
Flags: superreview-
Attachment #337787 -
Flags: review?(bienvenu)
Attachment #337787 -
Flags: review-
Updated•13 years ago
|
Assignee: hiikezoe → nobody
Component: Backend → Networking: POP
QA Contact: backend → networking.pop
Summary: confusable message if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete. → confusing message with "quarantine option" disabled if AppendMsgFromFile fails in nsPop3Sink::IncorporateComplete.
Updated•13 years ago
|
Assignee: nobody → hiikezoe
Comment 7•13 years ago
|
||
Hiroyuki Ikezoe seems to be gone
Whiteboard: [patchlove][has draft patch][needs new assignee]
Comment 10•10 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #8)
> ishikawa, might you be able to finish the patch?
Let me look at this.
Flags: needinfo?(ishikawa)
Comment 11•10 years ago
|
||
The problem here is that it is not sure whether the proposed change is actually wanted. Bienvenu says all writes or reads can fail due to the antivirus checkers. Hiro wants to detect failures that are caused by other reasons. But it is not sure if that is possible. So yes, the message the user gets may be confusing but maybe we can't do better.
Assignee: hiikezoe → nobody
Comment 12•10 years ago
|
||
It does feel like we should do something different. I'm not awake enough ATM to suggest what.
Comment 13•10 years ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #10)
> (In reply to Wayne Mery (:wsmwk) from comment #8)
> > ishikawa, might you be able to finish the patch?
>
> Let me look at this.
Many Japanese organizations including schools, government offices have a fiscal year that starts in April. That means March is the end of fiscal year and people are busy doing the end of the final report, accounting, etc.
My office also was quite busy in March.
Only now, with the message of Wayne, I had the chance to look at this.
Now, today I look at this, and I realize I have been working on
this part of code for some time to make detection of I/O errors better.
I started to work on this because I noticed the lack of error code checking of
so many I/O routines, read/write/flush/close, etc. Awful :-(
I have discovered that the changes were necessary to detect the errors properly
so that I can enable buffering without fear of possible regression errors
that may corrupt message folder. Today, pop3 download does not use bufering and so
downloading 1-2MB mime-encoded file is done by ~70 octets per line WITHOUT
I/O buffering at all: each line invokes WRITE of such smallish octets,
quite awful in terms of I/O performance.
If the user's profile is on a remote server (as in a diskless corporate setting,
it kills performance very badly. Obviously from a few reports of imap users
do not seem to handle network error gracefully at all.)
I have tried to
- detect errors better: current read/write calls often fail to check errors
well. Also, checking on the return value of Close() are missing in many places.
Once we enable buffering, Close() is where the error from the final flush(), such
as network error, file system overflow error, etc. are reported.
- Once the errors are detected, what are the good error recover strategies?
At least for a major issue of closing file streams related to the failure
of return value of Close() I mention above, I have at least tried to
sanitize major use of file streams, and figured that a few routines need to changed semantically.
- There are other places where I am not entirely sure of what should be done.
- The changes I made are not so significant in terms of code complexity, but numerous and scattered all over the place
related to pop3 and imap and file handling of messages.
Interestingly, I found that the checking of file stream being null or not
is not quite correct for this case of saving a message to temporary file
which is a major topic of Hiro's patch.
This is nsPop3Sink.cpp
775 if (m_downloadingToTempFile)
776 {
777 // close file to give virus checkers a chance to do their thing...
778 m_outFileStream->Flush();
779 m_outFileStream->Close();
780 m_newMailParser->FinishHeader();
781 // need to re-open the inbox file stream.
782 bool exists;
783 m_tmpDownloadFile->Exists(&exists);
784 if (!exists)
785 return HandleTempDownloadFailed(aMsgWindow);
786
787 nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream);
788 rv = MsgReopenFileStream(m_tmpDownloadFile, inboxInputStream);
789 NS_ENSURE_SUCCESS(rv, HandleTempDownloadFailed(aMsgWindow));
790 if (m_outFileStream)
791 {
792 int64_t tmpDownloadFileSize;
793 uint32_t msgSize;
794 hdr->GetMessageSize(&msgSize);
795 // we need to clone because nsLocalFileUnix caches its stat result,
796 // so it doesn't realize the file has changed size.
797 nsCOMPtr <nsIFile> tmpClone;
798 rv = m_tmpDownloadFile->Clone(getter_AddRefs(tmpClone));
799 NS_ENSURE_SUCCESS(rv, rv);
800 tmpClone->GetFileSize(&tmpDownloadFileSize);
801
802 if (msgSize > tmpDownloadFileSize)
803 rv = NS_MSG_ERROR_WRITING_MAIL_FOLDER;
804 else
805 rv = m_newMailParser->AppendMsgFromStream(inboxInputStream, hdr,
806 msgSize, m_folder);
807 if (NS_FAILED(rv))
808 return HandleTempDownloadFailed(aMsgWindow);
809
810
After a download to a temporary file,
TB closes the file (and this is the timing when anti-virus checker of the yesteryear may
decide to delete the whole file: thus the necessity of having this option of
saving to a temprary file. Otherwise, scuh anti-virus program removes WHOLE INBOX!).
TB then re-opens the file now as read source using
a different filestream.
There is a bug.
See line 790 above.
http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsPop3Sink.cpp#790
TB checks for the null-ness of the original output filestream instead of the newly created input filestream which may be nullptr if anti-virus program deleted the
temporary file.
It took me a full couple of months of looking at the code and try to see what it was supposed to do.
The logic dictates we should check for the inputstream, but the code is a little convoluted especially switching of writing to reading. It was not quite clear what TB wished to perform. But just last week, I was checking the affected code (yeah, I am doing the review of the changes I made to a few files before final patch proposal.)
It became crystal clear what it was trying to do and why the check was incorrect.
So there can be a few improvement in this area definitely.
I have the patches ready.
- the proper error checks where I think the check is needed for I/O errors during
downloading.
(I think the read error check in Hiro's is also mine, but in a slightly
different form. I was not concerned with the appropriateness of the
error code, but uniform error handling code/style, etc.)
- |Close| were properly checked for return values.
- SOME new error handlings were introduced.
- Coupled with |Close| check, and new error handling, I found issues of
where filestreams ought to be cleared and made an improvement.
- But in many places where errors are detected now (where they were not before),
I only inserted a DEBUG-eanbled message for
debugging only and return error code.
That is, I am not sure what the proper error recovery was.
*BUT*, do note that it is already a great improvement and
detect errors as such.
Also, the overall framework of TB message handling seems to
guarantee that, on the next restart of TB, the improper partial message
is removed, etc.
- I also inserted a line of code to enable buffering, and
removed an offending Seek() which should not be there IMHO.
The patches have been extensively tested
with pop3 and buffering for pop3 download.
(imap testing still not good.)
I used my local mail profile and remote profile on a linux CIFS server, NFS,
and even a CIFS server available on a small router: a USB-stick memory now holds
a mail directory for test mail account.
What I did was to simulate network outage by pulling the cable off (actually
tested TB ran inside linux guest inside VirtualBox, and so toggling virtual network
card did it.)
I noticed a few corner cases and fixed them.
Thus, I could test the network outage cases rather well, and
as far as pop3 is concerned, I am ready with the patch.
The patches and descriptions are as follows.
Meta bug is
Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends.
Step 1 is addressed by Bug 1122698.
Step 2 is addressed by Bug 1134527.
Step 3 is addressed by Bug 1134529.
Bug 1122698 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-1)
Bug 1134527 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-2)
Bug 1134529 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-3)
I am afraid that I have not uploaded the bug fix of the inboxInputStream issue
mentioned above.
Also, after these months of effors, I think it would be good idea
to merge the changes into a single patch set.
It really does not make much sense to separate them as I did any more.
I need a few cleanups definitely, but the local patches have been tested and running locally
(under linux 64-bit build.)
But like I said the subtle changes are everywhere, and due to the lack of
better understanding of error recovery strategy which are clearly missing
today [after all, before my changes I/O errors were not caught properly so there were no needs
for error recover before :-( ]
So I was hesitant to push the changes into the tree until the release in May takes place.
Error recovery strategy can wait. After all TB has worked without good
error recover all the time.
However, regression that was noticed before when buffering was enabled an issue.
This has blocked my pushing the patch
because I could not (or did not have the time to) create an xpctest or mozmill test to simulate a heavy download I/O cases that seemed
to trigger a message corruption when buffering was enabled before.
My manually simulated test cases send a few dozen e-mails with
particular heard strings to a test account. TB downloads these n e-mails
that are sorted into a few different holders by filters, AND while that is happening,
I try to move/copy a dozen or so messages during the download from a folder to another folder that are involved in filtering.
Enabling buffering caused a regression before when it was attempted, and I think such a
test that seems to simulate the busted cases back then would be an important addition.
(See some discussions in Bug 1116055 - Performance issue: Failure to use buffered write (comm-central thunderbird if someone is interested in this topic.)
At least a simple non-disturbed performance of TB under such simulation
has to be in the test case.
(Disturbing the operation by simulated network I/O error
is too much to ask today, I think.)
BTW, why the error about incorrect checking of null-ness of wrong filestream
(output stream before closing the temporary download vs input stream to read from
that temporary file, which the AV program may delete completely.)
is not noticed now is probably today's AV software becomes aware of fiasco of
deleting the whole Inbox file of mail programs
and careful not to do that, etc. I am not sure.
TIA
PS: In parallel to think of writing the test for heavy I/O that caused
regression errors when buffering was enabled, or tyring to have someone write it :-)
I was doing the code audit of my change. This is the final step. 1st round
done: mailnews/local/src/nsPop3Sink.cpp
half finished: mailnews/local/src/nsPop3Service.cpp
mailnews/local/src/nsPop3Protocol.cpp
mailnews/local/src/nsMovemailService.cpp
mailnews/local/src/nsParseMailbox.cpp
mailnews/local/src/nsLocalMailFolder.cpp
mailnews/local/src/nsMailboxService.cpp
IMAP
...
You can see that maybe I would have to wait for the next round of release(?)
Anyway, just last week I was looking at these files and was convinced that
the inputstream and outputstream usage is incorrect in one if-statement and
if it works for everybody today, it is because AV programs have wised up and
not delete the temporary file but probably mark it and only when its content is
executable do something clever about it when it is going to be executed, I suppose.
PPS: Another reason I was not so sure of is the error recovery issues.
There does not seem to exist an overall description of such strategy for TB.
Ok, people may say, I have to make sure that I keep the database(s) intact and
roll back them to the previous states after an error.
But WHAT databases exist in TB?
I found a neat summary in a post, but can't find it. Grrrr :-(
Is there a good documentation?
Comment 14•10 years ago
|
||
"Ok, people may say, I have to make sure that I keep the database(s) intact and
roll back them to the previous states after an error.
But WHAT databases exist in TB?
I found a neat summary in a post, but can't find it. Grrrr :-(
Is there a good documentation?"
The main databases are:
1) the mork message summary file
2) panacea.dat for the folder information
3) gloda for global searching
But in addition to these, there are myriad other local variables, caching schemes, etc. that cause issues. Plus, the variables often have conflated meanings. Example: size of database file is used for both information display to the user, as well as a flag of file validity. Update requirements for these two uses are very different. And don't get me started on the New flag conflation issues!
It is all of this legacy stuff that is part of the "technical debt" that makes Thunderbird code so hard to work on.
Comment 15•10 years ago
|
||
(In reply to Kent James (:rkent) from comment #14)
> "Ok, people may say, I have to make sure that I keep the database(s) intact
> and
> roll back them to the previous states after an error.
> But WHAT databases exist in TB?
> I found a neat summary in a post, but can't find it. Grrrr :-(
> Is there a good documentation?"
>
> The main databases are:
Thank you for the pointers.
>
> 1) the mork message summary file
> 2) panacea.dat for the folder information
> 3) gloda for global searching
>
> But in addition to these, there are myriad other local variables, caching
> schemes, etc. that cause issues.
I noticed this caching scheme that is not quite well updated sometimes.
Someone has written down a very summary about the database in a dozen lines or so for
my purpose. I think it is by WADA. I wonder if he is reading this. I found it in a thread (the particular comment was posted some time ago) in a bugzilla
entry to which I am a subscriber, and which had a new comment in the last 10 days or so.
(Oh well, I wish computers are clever enough to find the comment with this type of vague description. The hitch is that WADAS's comment was in the thread and posted some time ago. It could be even a year ago or so.)
> Plus, the variables often have conflated
> meanings. Example: size of database file is used for both information
> display to the user, as well as a flag of file validity. Update requirements
> for these two uses are very different. And don't get me started on the New
> flag conflation issues!
Thank you for reminding me ! :-)
>
> It is all of this legacy stuff that is part of the "technical debt" that
> makes Thunderbird code so hard to work on.
Yes, I agree. The debt is huge. I really feel as I were trying to chip on the edge of a glacier when I try to change something in TB.
I will try to merge some local patches for hardening the error handling (maybe not so much for making the error message user-friendlier as Hiro's original intention here was) into four big patches, and make sure they build and pass |make mozmill| and |make xpcshell-test|
The patches would be in
Bug 1122698 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-1)
Bug 1134527 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-2)
Bug 1134529 - C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (Step-3)
And then another patch for the removal of |Seek| that interferes with true buffering write and a few other error check for imap-related code.
[I may decide to merge all of them after a few feedback cycles with reviewers.
The idea of decomposition is to make sure that it is easy to see that
I am changing the code for adding checks for error return of close/flush, etc., then adding checks for read/write, etc, and then
adding a few error handling that became necessary like properly closing filestream
to Inbox for pop3 download and cosmetic changes and leaving comment for future work. But once each real change is checked and verified, it may be easier to merge them and keep them as a big patch before eventual inclusion in the tree to avoid bitrot.]
Oh yes, I need to have the automated test for the heavy I/O to simulate the
frequent download with filtering to a multiple folders,
and moving/copying of messages at the same time to make sure that the messages folders do not get corrupted. I have done manual testing, but I do need automated testing.
Testing manually takes time and boring although it certainly makes me aware of some issues like "downloading can run only in main thread because a few databases are not
thread-safe, thus GUI-operaton suffers a lot during downloading!", and a few other performance issues, which when someone is forced to work with slow operations is bound to become aware of.
TIA
PS: BTW, Hiro's patch was concerned with 0-byte read error.
With a remote file system, a read may be interrupted and can return
0 octet read, but with |errno| being EINTR (or something) that indicates
that we should re-try read since the error is temporary.
This has not been addressed very well in my code even.
This issue of short read/write should be addressed.
OTOH, HOWEVER, after a couple of months testing,
it looks that the ->Read(), ->Write used in
message download code do not seem to pass the low-level error code as such.
It seems the issue is either taken care of inside the particular
I/O library they use and we don't have to face the issue (or testing is not complete. But causing it in real-world is rather difficult. We may need the LD_PRELOAD type of fake I/O routine to cause such short read/write intentionally to check the processing.).
But I can certainly say that routines, which are defined
under ./mozilla M-C portion of the tree,
that are called from the download code do seem to suffer from the lack
of EINTR-like error handling for retry.
Well, at least the versions of functions defined for POSIX (linux, mac os x, solaris, maybe.) to move/copy files DO suffer from the lack of EINTR-like processing.
I figured it out because there was a report that there were I/O errors when a user profile (with Mail directory) is in a remote CIFS mount and the user runs linux TB client. The user used imap.
However, there were no such errors with Windows TB client with the same remote
server.
Now I realize the routines to move/copy a message file after download of a message is
finished (move/copy may be invoked by filtering or after download is performed to a temporary file instead of Inbox directly)
is implemented in OS-dependent library files.
Basically a SINGLE Win32 system call is used for move/copy under Windows.
Yes a single system call handles move a file to somewhere (even to a remote file system).
It seems these WIN32 system calls handle short read/write,
the network errors and retries under the hood.
So no I/O errors are experienced by Windows TB user with CIFS-mounted user profile and Mail. Great.
Whereas the routines for POSIX OS to move/copy files were
handcoded AND THEY LACK EINTR processing. I know. I fixed the move file routine for error handling a year or so ago,
and wondered aloud why EINTR was not handled in the code. It still is not, I think.
So if excessive number of short read/write occurs in a corporate network. linux TB is toast in such environment.
The problem of linux TB failing miserably with CIFS-mounted user profile and mail directory lies there.
With 20/20 hindsight it may be crystal clear.
But it took me a while to realize the true cause of the issue: I needed to fix the
error handling issues in the download code first,
and only then I came to realize the cause of
the issue experienced by linux TB user with CIFS-mounted user profiel and mail directory.
That windows users don't experience the issue was a clue in the first place, but
even Windows users will suffer from the deficiency in the error handling framework in the message download code. So my work was not a complete waste, but actually
very important to make TB more robust in the face of failures.
Whew. All I want is a rock solid mail client (tm) as a mere mortal user.
It is a long hard climb to reach the nirvana, eh?
Comment 16•9 years ago
|
||
so we should revisit this after Chiaki's patches are done for the issues blocking bug 1121842
Depends on: 1121842
Whiteboard: [patchlove][has draft patch][needs new assignee] → [patchlove]
Comment 17•9 years ago
|
||
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #16)
> so we should revisit this after Chiaki's patches are done for the issues
> blocking bug 1121842
Thank you for reminding. I am trying to build OS X and Windows versions on TryServer and
making a small progress now.
Updated•7 years ago
|
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•