Open Bug 1242030 Opened 9 years ago Updated 3 days ago

Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500

Categories

(MailNews Core :: Networking, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

People

(Reporter: ishikawa, Assigned: ishikawa)

References

(Blocks 4 open bugs)

Details

(Keywords: perf)

Attachments

(13 files, 113 obsolete files)

187.21 KB, application/pdf
124.55 KB, text/plain
18.96 KB, text/plain
34.82 KB, image/png
36.70 KB, image/png
39.73 KB, image/png
36.79 KB, image/png
3.02 MB, application/zip
2.43 MB, application/zip
8.27 KB, patch
163.16 KB, text/plain
48 bytes, text/x-phabricator-request
17.27 KB, text/plain
I am uploading the merged patches from bug 1122698, bug 1134527, bug 1134529, and bug 1174500. The reason for the consolidation is that the patch as a whole is much easier to read when consolidated; however, I have split the resulting patch into a few smaller chunks, each of which only touches files under certain directories. More rationale and a timeline will be posted shortly.

The first patch addresses the following issue. To enhance error processing, I had to change the semantics of a couple of functions:

- |DiscardNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)
- |FinishNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)

This required changes to files under the mailnews/import directory as well. The first patch in the following list addresses this need.

Nulling-filestream-after-close-under-IMPORT-directory.patch: Patches for...
fix-close-MERGED-base-dir.patch: Bug 1122698 Fixing unchecked Close, unc...
fix-close-MERGED-imap-dir.patch: Fixing unchecked Close, unchecked Flush...
fix-close-MERGED-local-dir.patch: Fixing unchecked Close, unchecked Flus...

and another patch that is explained later:

enable-buffering-1st-step.patch (explained later in a different bugzilla.)

TIA
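P.S. To make the changed semantics concrete, here is a minimal sketch of the caller-side pattern this implies. The member names are hypothetical and this is not an excerpt from the patches; it only illustrates that, once the store no longer closes the stream, the caller must close it and check the result (which is where a buffered stream's deferred write error finally surfaces):

  nsresult rv = m_msgStore->FinishNewMessage(m_outFileStream, m_newHdr);
  // The store no longer closes the stream; the caller does, and checks it.
  if (m_outFileStream) {
    nsresult rvClose = m_outFileStream->Close();
    if (NS_FAILED(rvClose)) {
      // With buffered output a Write() error may only be reported here,
      // so this is where higher-level recovery can hook in.
      rv = rvClose;
    }
    m_outFileStream = nullptr;
  }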
Attached patch fix-close-MERGED-base-dir.patch (obsolete) — Splinter Review
fix for files under mailnews/base directory.
Assignee: nobody → ishikawa
Attached patch fix-close-MERGED-imap-dir.patch (obsolete) — Splinter Review
fixes to files under mailnews/imap directory
Attached patch fix-close-MERGED-local-dir.patch (obsolete) — Splinter Review
fixes to files under mailnews/local directory
I will post the rationale and roadmap of the patches shortly. TIA
Summary: Conslidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 → Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500
Component: Untriaged → Networking
Product: Thunderbird → MailNews Core
A full explanation of the patches and road map is in bug 1116055 comment 101:
https://bugzilla.mozilla.org/show_bug.cgi?id=1116055#c101
I am excerpting the relevant part from that post regarding the consolidation.

... [omission] ...

========================================
PLANNED ORDER of landing patches
========================================
bug 1116055
-> bug 1242030 (consolidation of 1122698, 1134527, 1134529, 1174500)
-> bug 1242042
-> bug 1176857
-> bug 1242046
-> bug 1242050
-> bug 1170606

An overview of the patches in the bugzilla entries.

... [omitted] ...

[2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500

The |Seek| before writing each line was introduced, I think, when a function that enables buffering inadvertently reset the file position to the beginning. This may be natural behavior for such a function, but it was not documented well and people were caught by surprise. I suspect the excessive use of |Seek| was an attempt to put a band-aid over this issue, and when the attempt was made to enable buffering, the subtle issue of rewinding the file position seemed to cause file corruption. (I experienced such corruption myself, but thanks to the patch in [1] I could see that the file position was being reset, trace it to a function that enables buffering, and fix the issue.) A rough sketch of this per-line |Seek| pattern is appended below.

I have merged the four patches from bug 1122698, 1134527, 1134529, 1174500, mentioned below, into one patch set in bug 1242030, consisting of small patches that each touch the files under a directory. Each patch is smaller than the whole gigantic consolidated patch, since it now only addresses changes under a particular directory or two. The reason for the consolidation is that the patch as a whole is much easier to understand this way. (Error checks are uniformly visible in the consolidated patch, whereas in the former separate patches, low-level I/O functions whose return values were not checked in an earlier patch were only checked in a later one. This sometimes made the reader of the earlier patch wonder why we don't check the error status of a particular low-level I/O function when, in fact, that function is checked in a later patch. The consolidated patch avoids this issue.)

To enhance error processing, I had to change the semantics of a couple of functions:

- |DiscardNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)
- |FinishNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)

This required changes to files under the mailnews/import directory, too. The first patch in the following list addresses this need.

The consolidated patch set in bug 1242030 is split into the following smaller patches:

Nulling-filestream-after-close-under-IMPORT-directory.patch
fix-close-MERGED-base-dir.patch
fix-close-MERGED-imap-dir.patch
fix-close-MERGED-local-dir.patch

and another patch that is explained later:

enable-buffering-1st-step.patch (explained later in a different bugzilla, bug 1242042.)

[3] (Now merged in bug 1242030) Bug 1122698
fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series)

[4] (Now merged in bug 1242030) Bug 1134527
fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part 2 of a series)

[5] (Now merged in bug 1242030) Bug 1134529
fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part 3 of a series)

[6] (Now merged in bug 1242030, but then the essential part of bug 1174500 [mistyped as 1117450 in a few bugzilla entries :-(], removing a |Seek|, was spun off as a separate patch: bug 1242046)
removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

... [omitted] ...

TIA
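The per-line |Seek| pattern mentioned above, roughly sketched (hypothetical member names, not an excerpt from the tree; nsISeekableStream and its NS_SEEK_SET constant are the real interfaces involved):

  // Band-aid: re-establish the expected position before every single line,
  // which also defeats any buffering layered on top of the stream.
  nsCOMPtr<nsISeekableStream> seekable = do_QueryInterface(m_outFileStream);
  if (seekable)
    seekable->Seek(nsISeekableStream::NS_SEEK_SET, m_outputPos);
  uint32_t written = 0;
  rv = m_outFileStream->Write(line, lineLen, &written);
  m_outputPos += written;

Removing the per-line Seek (bug 1242046) only becomes safe once the file position is no longer silently reset, which is what the rest of this patch series takes care of.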
Attachment #8711165 - Flags: review?(neil)
Attachment #8711166 - Flags: review?
Attachment #8711166 - Flags: review? → review?(neil)
Attachment #8711167 - Flags: review?(neil)
Attachment #8711168 - Flags: review?(neil)
Part of updated patch set (Feb 18)
Attachment #8711165 - Attachment is obsolete: true
Attachment #8711165 - Flags: review?(neil)
Attached patch fix-close-MERGED-base-dir.patch (obsolete) — Splinter Review
Attachment #8711166 - Attachment is obsolete: true
Attachment #8711166 - Flags: review?(neil)
Attached patch fix-close-MERGED-imap-dir.patch (obsolete) — Splinter Review
Attachment #8711167 - Attachment is obsolete: true
Attachment #8711167 - Flags: review?(neil)
Attachment #8711168 - Attachment is obsolete: true
Attachment #8711168 - Flags: review?(neil)
This is a new addition on Feb 18. See the next comment for full details.
A full explanation of the updated patches and road map is in bug 1116055 comment 107:
https://bugzilla.mozilla.org/show_bug.cgi?id=1116055#c107
I am excerpting the part relevant to this bugzilla entry.

========================================
PLANNED ORDER of landing patches
========================================
bug 1116055
-> bug 1242030 (consolidation of 1122698, 1134527, 1134529, 1174500)
-> bug 1242042
-> bug 1176857
-> bug 1242046
-> bug 1242050
-> bug 1170606
========================================

An overview of the patches in the bugzilla entries.
----------------------------------------

[In February 2016, I fixed the errors observed ONLY UNDER WINDOWS when we use the Maildir format for storage. Thus I needed to tweak my patch set somewhat and added a patch to take care of this issue, and so I updated this write-up.]

...

[2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500

The |Seek| before writing each line was introduced, I think, when a function that enables buffering inadvertently reset the file position to the beginning. This may be natural behavior for such a function, but it was not documented well and people were caught by surprise. I suspect the excessive use of |Seek| was an attempt to put a band-aid over this issue, and when the attempt was made to enable buffering, the subtle issue of rewinding the file position seemed to cause file corruption. (I experienced such corruption myself, but thanks to the patch in [1] I could see that the file position was being reset, trace it to a function that enables buffering, and fix the issue.)

I have merged the four patches from bug 1122698, 1134527, 1134529, 1174500, mentioned below, into one patch set in bug 1242030, consisting of small patches that each touch the files under a directory. Each patch is smaller than the whole gigantic consolidated patch, since it now only addresses changes under a particular directory or two. The reason for the consolidation is that the patch as a whole is much easier to understand this way. (Error checks are uniformly visible in the consolidated patch, whereas in the former separate patches, low-level I/O functions whose return values were not checked in an earlier patch were only checked in a later one. This sometimes made the reader of the earlier patch wonder why we don't check the error status of a particular low-level I/O function when, in fact, that function is checked in a later patch. The consolidated patch avoids this issue.)

To enhance error recovery processing, I had to change the semantics of a couple of functions:

- |DiscardNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)
- |FinishNewMessage| does not close aOutputStream. (Clarification/rationale in bug 1121842 and bug 1122698.)

(Feb 2016) EXCEPT for the versions of DiscardNewMessage and FinishNewMessage for the Maildir storage format. The original analysis and reasoning is given in bug 1121842, in a verbose, stream-of-consciousness comment.

This change required modifications to files under the mailnews/import directory, too. The first patch in the following list addresses this need:
Nulling-filestream-after-close-under-IMPORT-directory.patch

*** begin Feb 2016 addition ***

Additional comment about this design: I had to change one thing in Feb 2016 to fix errors observed only under Windows on try-comm-central.

(Feb 2016) I said we don't close the file stream, EXCEPT in the versions of DiscardNewMessage and FinishNewMessage for the Maildir storage format. DiscardNewMessage() and FinishNewMessage() DO close the stream in that case. This is because renaming/removing a file cannot be done under Windows while an open file descriptor exists for that file; this was the cause of the failures of the WINDOWS BUILD on try-comm-central. (Under Linux or OSX it is possible to do so.) My patch set was originally created under Linux.

However, my desire, explained in bug 1121842, to keep the handling of the Close() of the file stream for the newly downloaded/copied/moved message file (or mbox, etc.) on the caller side of Finish/DiscardNewMessage(), so that yet-to-be-devised better error handling can be done in the logically higher-level routines, necessitated adding a third argument to the pair of functions: an indicator of whether the function closed the passed file stream (the 1st argument) before returning. By looking at the indicator, the callers can decide whether to close the file stream and check the error code on their own. (With the change, the Windows DEBUG build on try-comm-central ran without a hitch!) A rough sketch of this caller-side check is appended below.

The addition of the third argument to Finish/DiscardNewMessage() necessitated an additional patch to modify the code under mailnews/import that imports messages from OTHER e-mail clients such as OE5, etc.

**** end Feb 2016 addition ****

The consolidated patch set in bug 1242030 is split into the following smaller patches:

Nulling-filestream-after-close-under-IMPORT-directory.patch
fix-close-MERGED-base-dir.patch
fix-close-MERGED-imap-dir.patch
fix-close-MERGED-local-dir.patch
(The next one is an addition in Feb 2016)
*fix-close-MERGED-add-closedflag-IMPORT-directory.patch

and another patch that is explained later:

enable-buffering-1st-step.patch (explained later in a different bugzilla, bug 1242042.)

[3] (Now merged in bug 1242030) Bug 1122698
fix-close-step-1.patch: Bug 1122698 Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends (part 1 of a series)

[4] (Now merged in bug 1242030) Bug 1134527
fix-close-part-2.patch: Bug 1134527 Add error value checking of Close() and Flush() (part 2 of a series)

[5] (Now merged in bug 1242030) Bug 1134529
fix-close-part-3-write-error-check.patch: Bug 1134529: Check Write failures (part 3 of a series)

[6] (Now merged in bug 1242030, but then the essential part of bug 1174500 [mistyped as 1117450 in a few bugzilla entries :-(], removing a |Seek|, was spun off as a separate patch: bug 1242046)
removing-a-Seek.patch: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends. (step 4)

[6.5] (New in Feb 2016) An addition to bug 1242030 in February.
fix-close-MERGED-add-closedflag-IMPORT-directory.patch
This patch modifies the files under the mailnews/import directory to take care of the addition of the third argument to |FinishNewMessage| and |DiscardNewMessage|.

...
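The caller-side use of the new third argument, roughly sketched (hypothetical member names; the real signatures live in the patches and nsIMsgPluggableStore, so treat this only as an illustration of the intent):

  bool streamClosed = false;
  m_msgStore->FinishNewMessage(m_outFileStream, m_newHdr, &streamClosed);
  if (!streamClosed && m_outFileStream) {
    // The store left the stream open (mbox case), so the caller closes it
    // and gets to see any deferred write error.
    nsresult rvClose = m_outFileStream->Close();
    if (NS_FAILED(rvClose)) {
      // caller-side error handling / reporting goes here
    }
  }
  // On the maildir path under Windows the store already closed the stream
  // so that the file can be renamed; the flag tells the caller to skip Close().
  m_outFileStream = nullptr;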
This is a replacement to fix the bit rot of the fix-close-MERGED-add-closedflag-IMPORT-directory.patch that was uploaded previously. (There has been a Becky import addition, and this patch addresses the necessary change in that source code.)

This patch is part of the series of patches to enable buffered I/O (the output side being the most interesting). The patches ought to be applied in this order. Patches marked with "*" are newly updated uploads to cope with bit rot. (June 9th, 2016)

*[1] Bug 1116055
check_fileptr_change.patch: Bug 1116055: Check the unexpected change of file seek position and ask the user to report it

[2] Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500
The consolidated patch set in bug 1242030 is split into the following smaller patches:
Nulling-filestream-after-close-under-IMPORT-directory.patch
fix-close-MERGED-base-dir.patch
fix-close-MERGED-imap-dir.patch
fix-close-MERGED-local-dir.patch
*fix-close-MERGED-add-closedflag-IMPORT-directory.patch

[7] (New in 2016) Bug 1242042 - Enabling buffering for file stream to write message for C-C TB
enable-buffering-1st-step.patch: Enabling buffering, first step.

[8] Bug 1176857
1116055-acelist.patch: One-liner patch posted in bug 1116055 by aceman to return a buffered stream from mbox::getNewOutputStream

[9] (NEW in 2016) Bug 1242046: Removing unnecessary |Seek| that caused C-C TB to operate slowly in terms of I/O
removing-a-Seek-rev02.patch: removing a Seek that is a performance pig! This was extracted into a new patch from the patches above ([5]-[8]).

[10] (New in 2016) Bug 1242050 - C-C TB: use a default output stream buffer size of 16KB instead of 4KB
buffer-by-16K-by-default.patch: output buffer of 16KB by default

The patches have been tested locally under Linux (with |make mozmill|, etc.) and on the c-c try server. Please check the following tryserver job for the testing results, etc.:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=2a082a9b933239bb49f22bab3f8bf6b445326127
Aside from the known existing errors noticed by others, there are no new errors introduced.

TIA
Attachment #8721169 - Attachment is obsolete: true
(In reply to ISHIKAWA, Chiaki from comment #12)
>
> See for example:
> https://treeherder.mozilla.org/#/jobs?repo=try-comm-
> central&revision=2a082a9b933239bb49f22bab3f8bf6b445326127

Chiaki, given that the patches will likely take some weeks to get reviewed, how comfortable would you feel with me offering this try build to a few properly coached technical testers who are *imap* only, so that we can get some feedback?
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #13)
> (In reply to ISHIKAWA, Chiaki from comment #12)
> >
> > See for example:
> > https://treeherder.mozilla.org/#/jobs?repo=try-comm-
> > central&revision=2a082a9b933239bb49f22bab3f8bf6b445326127
>
> Chiaki, given the probability that patches likely will take some weeks to
> get reviewed, how comfortable would you feel with me offering this try build
> to a few properly coached technical testers who are *imap* only, so that we
> can get some feedback?

That would be great. I am quite confident that this does not introduce any new bugs, although it may not solve some known issues (aside from the use of larger buffers and the confirmation that the legacy error handling works with the patch!). For that matter, I am not sure who the reviewers ought to be. Thank you for the help!
Flags: needinfo?(ishikawa)
Comment on attachment 8721165 [details] [diff] [review]
Nulling-filestream-after-close-under-IMPORT-directory.patch

>+ // XXX : I wonder what happens if reusable stream is returned
>+ // in the following GetNewMsgOutputStream.
>+ // In the next while loop,
>+ // wouln't the previous outStream with open file be simply
>+ // overwritten and we leak file descriptor and such?
>+ // It looks that if outStream is not nullptr
>+ // new File stream is NOT created and assigned. Old file stream is used.

getter_Addrefs actually clears its variable first, so you don't actually pass the previous outStream in. Instead what happens is that GetNewMsgOutputStream saves the old file stream internally. If it's still open, then it just returns it again, otherwise it opens a new stream.

>- if (!reusable)
>+ if (!reusable) {
> outStream->Close();
>+ outStream = nullptr;
>+ }
> }
>- if (outStream)
>+ if (outStream) {
> outStream->Close();
>+ outStream = nullptr;
>+ }

[A non-reusable stream is already closed at this point, so it doesn't matter that we're trying to close it again, although there are some cases such as m_dstOutputStream below where it does get set to null, so the consistency would be good for clarity.]

>+ // XXX: I believe m_dstOutputStream should be closed here.
>+ // Otherwise, a reusable stream won't be closed at all. Maybe we
>+ // leak file descriptor.

No, it gets closed in cMbxScanner::CleanUp.
(In reply to neil@parkwaycc.co.uk from comment #15)
> Comment on attachment 8721165 [details] [diff] [review]
> Nulling-filestream-after-close-under-IMPORT-directory.patch

Neil, thank you for your comment. I have been on a week of summer holiday; I just came back home yesterday. I will try to incorporate your comment into the comments in my patch set. TIA
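For reference, a rough sketch of the import loop pattern Neil describes above (hypothetical variable names, based on my reading of nsIMsgPluggableStore::getNewMsgOutputStream; not an excerpt from the tree): a reusable stream is handed back on every iteration and closed once at the end, while a non-reusable stream is closed (and nulled) per message.

  nsCOMPtr<nsIOutputStream> outStream;
  nsCOMPtr<nsIMsgDBHdr> msgHdr;
  bool reusable = false;
  while (HaveMoreMessages()) {   // hypothetical loop condition
    rv = msgStore->GetNewMsgOutputStream(folder, getter_AddRefs(msgHdr),
                                         &reusable, getter_AddRefs(outStream));
    if (NS_FAILED(rv))
      break;
    // ... write one message to outStream ...
    msgStore->FinishNewMessage(outStream, msgHdr);
    if (!reusable) {
      outStream->Close();
      outStream = nullptr;   // don't touch the closed stream again below
    }
  }
  if (outStream) {
    outStream->Close();      // the reusable stream is closed exactly once here
    outStream = nullptr;
  }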
Trying to accommodate Neil's comments from comment 15.
Attachment #8721165 - Attachment is obsolete: true
Attachment #8790812 - Flags: review?(neil)
Attached patch fix-close-MERGED-base-dir.patch (obsolete) — Splinter Review
Fixing the recent bitrot due to the change of NS_WARN_IF_FALSE to NS_WARNING_ASSERTION. (Some uses of NS_WARN_IF_FALSE may not be appropriate to change to NS_WARNING_ASSERTION, since I want the error/warning message printed even in a normal distributed build, but for now getting the build done is the utmost priority.)
Attachment #8721166 - Attachment is obsolete: true
Attached patch fix-close-MERGED-imap-dir.patch (obsolete) — Splinter Review
Fix bitrot. See previous comment.
Attachment #8721167 - Attachment is obsolete: true
Fix bitrot. See previous comment.
Attachment #8721168 - Attachment is obsolete: true
fix bitrot. See previous comment.
Attachment #8761282 - Attachment is obsolete: true
Attached patch fix-close-MERGED-base-dir.patch (obsolete) — Splinter Review
Fixed bitrot and a few other merge issues since Sept 21.
Attachment #8790814 - Attachment is obsolete: true
Attached patch fix-close-MERGED-imap-dir.patch (obsolete) — Splinter Review
Fixed bitrot and a merge issue since Sep 21.
Attachment #8790816 - Attachment is obsolete: true
Fixed bitrot and merge issue since Sep 13.
Attachment #8790820 - Attachment is obsolete: true
Fixed bitrot and a merge issue since Sep 13.
Attachment #8790824 - Attachment is obsolete: true
Depends on: 1116055
Blocks: 1242042
There are many references to maildir. Should this bug block the re-enabling of maildir? Also, it would be great to drive this in within the next couple of weeks, before we lose the window for version 52. Are there other potential reviewers for the patches where Neil is not referenced?
Flags: needinfo?(acelists)
Also, thinking out loud, I'm not sure we'd want approvals for the import patches to block progress on the dependent bugs. (unless not having them presents a significant danger)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> Also, thinking out loud, I'm not sure we'd want approvals for the import
> patches to block progress on the dependent bugs.
> (unless not having them presents a significant danger)

There are a few places where a function signature changed and the callers need to be updated accordingly.
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #31)
> There are many references to maildir. Should this bug block re-enabling of
> maildir?
>
> Also, it would be great to drive this in in the next couple weeks before we
> lose the window of version 52. Are there other potential reviewers for the
> patches where Neil is not referenced?

I am wondering who the reviewers would be. If you can suggest some and assign them as reviewers, I would appreciate it.
Depends on: 939548
I missed a patch that was created last year, because of a hard disk problem last summer. So the order of patch application is now:
bug 1116055
-> bug 939548 (this was missed until this weekend)
-> bug 1242030 (consolidation of 1122698, 1134527, 1134529, 1174500)
-> bug 1242042
-> bug 1176857
-> bug 1242046
-> bug 1242050
-> bug 1170606
Attached patch fix-close-MERGED-base-dir.patch (obsolete) — Splinter Review
Fixed bitrot: #if DEBUG -> #ifdef DEBUG, and NS_WARNING_ASSERTION -> open coding, since I want behavior that is in effect in both debug and optimized builds.
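What "open coding" means here, as a minimal sketch (the generic stream variable and message text are my own, not from the patch): NS_WARNING_ASSERTION only produces output in debug builds, so a check that should also be visible in an optimized (release) build is written out explicitly.

  nsresult rvClose = stream->Close();
  // Debug-only variant: silent in an optimized build.
  // NS_WARNING_ASSERTION(NS_SUCCEEDED(rvClose), "Close() failed");
  // Open-coded variant: visible in both debug and optimized builds.
  if (NS_FAILED(rvClose))
    fprintf(stderr, "Close() failed: 0x%08x\n", static_cast<unsigned>(rvClose));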
Attachment #8796871 - Attachment is obsolete: true
Attached patch fix-close-MERGED-imap-dir.patch (obsolete) — Splinter Review
Bitrot. One line change, etc.
Attachment #8796872 - Attachment is obsolete: true
Bitrot, rebase, etc.
Attachment #8796873 - Attachment is obsolete: true
Bitrot, rebase, etc.
Attachment #8796874 - Attachment is obsolete: true
Actually it is not great to split patches according to directories, unless it is a global change (like refactoring some construct (e.g. removing Iterator()), or adding checking of return values from Close() all over the place).

Maybe at least the changes needed for DiscardNewMessage could have been in a single patch so we could see what that actually does across the tree. But it is probably too late to rearrange the patches after all the work you did already. And maybe some reviewer asked for the consolidation in the past.

What I am more concerned with is that you add a ton of rv1...rv4 variables and print out debugging messages if there are errors. Those variable names are unusual in the current tree. But you do that to actually NOT handle the errors returned. The function is not exited and some other rv from another operation is returned. So what is the point of this? The I/O errors are not handled, only printed out in messages that only developers will maybe see. I thought you wanted to handle the errors to prevent data corruption, which you say may be made worse by more buffering.
Flags: needinfo?(acelists)
Version: unspecified → Trunk
(In reply to :aceman from comment #40)
> Actually it is not great to split patches according to directories, unless
> it is a global change (like refactoring some constructs (e.g. removing
> Iterator()), or adding checking of return values from Close() all over the
> place).
>
> Maybe at least the changes needed for DiscardNewMessage could have been in a
> single patch so we could see what that actually does across the tree.

Let me think about it. If reproducing the patches in this fashion is not difficult, I may do so over this weekend.

> But it is probably too late to rearrange the patches around after all the
> work you did already. And maybe some reviewer asked for the consolidation in
> the past.
>
> What I am more concerned with is that you add a ton of rv1...rv4 variables
> and print out debugging messages if there are errors. Those variable names
> are unusual in the current tree. But you do that to actually NOT handle the
> errors returned. The function is not exited and some other rv from other
> operation is returned. So what is the point of this? The I/O errors are not
> handled, only printed out in messages that only developers will maybe see.

You noticed a very good point. Let me explain here again.

First off, I am not claiming my patches are perfect, but they were necessary to recover from download errors when I simulated such errors (by unplugging the network cable) while my mail store was on a remote file system.

Basically, the reason an error return value of a low-level I/O function is checked but no serious action is taken is as follows. [In a nutshell: there is no clear-cut, agreed-upon error recovery scheme in place, nor does TB seem to have been designed and coded with such error recovery in mind in the first place, as far as I can tell (except for the crude transaction-like handling of message download). Simply put, I DON'T KNOW WHAT TO DO with the error return values in many places in this legacy code base.]

Let me explain. During my debugging efforts to make sure that an error return delayed by buffering from |Write| until |Close| is handled at least minimally sanely, I tested mail download errors. Then I noticed this:

- All I could effectively test was mail download failures. For those, there is a very crude transaction mechanism that uses DiscardNewMessage when there is an error (which unfortunately did not work until proper I/O error checks were introduced by my patches). So the error returns related to this processing *ARE* checked and used for such error-recovery actions. (A rough sketch of that transaction mechanism follows below.)

- There *are* other I/O errors that I tried to check (I mean I inserted the "rv =" and dumped the value there). But since there is no overall, well-described error recovery strategy, all I could do was note that there was an error and print the error value (and this helped me notice the issue with Maildir under Windows, which would have been impossible without such dumps). My intention was to place the dumps inside "#ifdef DEBUG" / "#endif" so that they are seen in a DEBUG build. (Many serious errors are also dumped in a non-DEBUG build, so that someone using TB would notice.) *If* and *when* such an error recovery scheme is devised and agreed upon, the checks (rv = ...) can then be used to guide such error recovery actions.

In the meantime, the dumps in the DEBUG build should help us detect where errors occur, and we may even be able to ask particular users who suffer errors due to a flaky network connection to remote file servers (this actually matters more for the patch series in bug 1170606 that handles short reads) to tell us what errors occur in the field.

So my short answer is: for many cases, frankly speaking, I have no idea what error recovery actions should be taken [just understanding how some copy operations are done asynchronously is beyond this casual user of TB, and devising an error recovery strategy even more so], but for future development I marked where the error return should be checked so that people can work on such improvements easily. Once the error recovery is well defined, or some quick advice is given, I will use the error return values to guide the suggested actions. In the meantime, the dumps should help diagnose erratic behavior due to I/O errors in the field (assuming people test their environment using a DEBUG version once they realize their environment is causing problems).

As I think hard about it, there are probably more users who get bitten by the issue addressed by the patches in the next pending bug, bug 1170606, but reaching that bugzilla is still a long way off. Again, my patches are not complete, and I wish I could wrap up the unhandled cases, but I simply cannot, for now.

> I thought you
> wanted to handle the errors to prevent data corruption, which you say may be
> made worse by more buffering.

This part is addressed by my patches, as far as the major error modes that I could induce by artificial network errors against CIFS/SAMBA and NFS network file systems are concerned. I am confident this alone will save some users in the field (and without my patches, the crude transaction handling does NOT work at all; that is what I found out. In this sense alone, the patches are worthwhile). So it is a small improvement. We have to improve more in the future if there is enough man-power, but for now it is a humble start.

I am hoping that the next major patch series, in bug 1170606, will interest some corporate users who need to use a remote network file system to store their profile, including the mail store [Linux's I/O does not handle low-level error retry and short read/write, whereas Windows's does, so Linux TB has many problems there], enough to throw some resources into TB development, but maybe I am having a pipe dream. [And I also noticed that we need to fix the M-C tree as well. Ugh...]

TIA
Chiaki
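The crude transaction around message download mentioned above, roughly sketched (hypothetical helper and member names; the real code is in nsPop3Sink.cpp and friends): the new message is only committed when the body was written successfully, and the rollback path is exactly the one that used to fail silently while Write/Flush/Close results were unchecked.

  nsresult rv = WriteMessageBody(m_outFileStream);   // hypothetical helper
  if (NS_SUCCEEDED(rv)) {
    rv = m_msgStore->FinishNewMessage(m_outFileStream, m_newHdr);   // commit
  } else {
    m_msgStore->DiscardNewMessage(m_outFileStream, m_newHdr);       // roll back
  }
  // Per the semantics change in this patch set, the caller then closes
  // m_outFileStream and checks the result (unless the maildir/Windows
  // path already closed it).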
I realize that some of the description of the patch content in comment 11 and comment 12 needs updating due to other recently landed patches, bitrot, and my better understanding of the code. I will try to update it in the next few days *IF* I have the time. TIA
I have been updating the patches to address the bitrot. I found a couple of incorrect merges/consolidations while doing it. That is good. I will post them Sunday evening after running them through try-comm-central.

While updating the patches, I realized there is a very pressing need for using rv2, rv3, rv4, etc. - at least different error variables - instead of just the single rv. Namely, the legacy code uses |rv| for error processing. Sometimes it uses |rv| in a contorted manner which I dislike very much, and I think rkent dislikes it, too:

  rv = someFunction();
  if (NS_SUCCEEDED(rv)) {
    ...
    rv = other_function();
  }
  ...
  if (NS_SUCCEEDED(rv)) {
    ...
    rv = still_other_function();
    ...
  }
  return rv;

What |rv| holds at the end is anyone's guess. We can only learn it at runtime. Also, trying to modify the code by introducing other error checks is made difficult by the above rigid use of the single variable |rv| in conditional expressions: as soon as we assign some new error return value to |rv|, we break the logic of the legacy code. But I needed to introduce checks for many hitherto unchecked error returns, so I needed new variables to hold the newly checked return values. Yes, it is unfortunate that I used generic, unimaginative names. (But so is |rv|.) At least, by not touching |rv| where I only checked and dumped error return values [hoping that we will come to handle proper error recovery in the future], the legacy error handling based on |rv| (or the lack of it) is not disturbed. That is:

  rv = someFunction();
  if (NS_SUCCEEDED(rv)) {
    ...
    rv = other_function();
  }
  ...
  // By using rv2 below, I don't disturb the legacy error processing
  // (or the lack of it) based on rv.
  nsresult rv2 = formerly_unchecked_IO_operation();
  if (NS_FAILED(rv2)) {
    // do something, including "return rv2;" if necessary.
  }
  ...
  if (NS_SUCCEEDED(rv)) {
    ...
    rv = still_other_function();
    ...
  }
  return rv;

[Of course, you will find that some parts DO mix and match |rv|, |rv1|, etc. for what looked to me like proper error operation.]

Also, some I/O errors in a file-level operation occur independently of each other (sometimes they are related and occur in a cascade). For good error reporting, we need to record each of these errors in a different variable, and may decide which error to report based on a predefined rule of priority. A case in point is a function called CopyToNative(). My memory is hazy, but either this function or a friend of it was responsible for failing to report the error when moving/copying a message to another folder. That was a bad dataloss error. While debugging the problem, I noticed the function was responsible for not reporting the error and that there are a few possible error cases, so I introduced different variables to record each of the errors (well, back then I had the composure to come up with aptly named variables instead of |rv1|, |rv2|, etc.) and used the variables to report the errors, favoring some errors over others. See
https://dxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileUnix.cpp#778
especially the code starting around line 887 until the function returns an error value at the end.

I know the use of |rv1|, |rv2|, |rv3|, etc. is ugly, but there were reasons for it. (And I didn't feel the necessity to come up with fancy variable names just for error reporting in a code base that lacks proper error checks in so many places.)

TIA
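The "separate variables, report by priority" idea from the CopyToNative() example, as a generic sketch (hypothetical helper name; the real example is the nsLocalFile code linked above): each independent failure is recorded in its own variable, and the function decides at the end which one to return.

  nsresult rvWrite    = CopyLoop(inStream, outStream);            // hypothetical copy loop
  nsresult rvOutClose = outStream ? outStream->Close() : NS_OK;   // deferred buffered error shows up here
  nsresult rvInClose  = inStream ? inStream->Close() : NS_OK;
  // Report the most serious problem: a write failure trumps a close
  // failure, and the output close failure trumps the input one.
  if (NS_FAILED(rvWrite))
    return rvWrite;
  if (NS_FAILED(rvOutClose))
    return rvOutClose;
  if (NS_FAILED(rvInClose))
    return rvInClose;
  return NS_OK;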
Fixed bitrot and modified some comments so that they are more "appropriate" in C-C source tree.
Attachment #8790812 - Attachment is obsolete: true
Attachment #8790812 - Flags: review?(neil)
Attachment #8807617 - Flags: review?(neil)
Fixed bitrot and modified some comments so that they are more "appropriate" in C-C source tree.
Attachment #8798559 - Attachment is obsolete: true
Fixed bitrot and modified some comments so that they are more "appropriate" in C-C source tree.
Attachment #8798545 - Attachment is obsolete: true
Fixed bitrot and modified some comments so that they are more "appropriate" in C-C source tree.
Attachment #8798560 - Attachment is obsolete: true
Fixed bitrot and modified some comments so that they are more "appropriate" in C-C source tree.
Attachment #8798562 - Attachment is obsolete: true
If you're waiting for Neil, you can wait until Christmas 2029 ;-)
(In reply to Jorg K (GMT+1) from comment #49)
> If you're waiting for Neil, you can wait until Christmas 2029 ;-)

Will you take up the review then, Jorg? I need someone's input. I don't have a strong opinion on coding style, etc. All I want is to get the essential patches in for enabling buffered writes. It doesn't make sense to overwhelm the local OS or a remote file server with 75-octet writes over and over again for a 5 MB PDF attachment. (Each December we have a large technology exhibition, and to prepare the handout for the visitors I need to look over the drafts; about 1000 e-mails with attachments fly in during November alone. The failure to use buffered writes *IS* noticeable when you need to save so many relatively large attachments.)

The patches have been in my local queue for a very long time now, and I want to get rid of them soon because they interfere with working on the new bugs I find.

TIA
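For what it's worth, here is a sketch of what enabling buffered writes amounts to. It assumes the three-argument form of NS_NewBufferedOutputStream from nsNetUtil.h that was current around this time (the parameter convention has changed since, so treat this as illustrative rather than a drop-in): the raw file stream is wrapped so that many small line writes are coalesced into few large OS-level writes. With a 16 KB buffer (bug 1242050), a 5 MB attachment written in ~75-octet lines goes from roughly 70,000 OS writes to a few hundred.

  nsCOMPtr<nsIOutputStream> rawStream;       // e.g. the mbox/maildir file stream
  nsCOMPtr<nsIOutputStream> bufferedStream;
  nsresult rv = NS_NewBufferedOutputStream(getter_AddRefs(bufferedStream),
                                           rawStream, 16384 /* 16 KB */);
  if (NS_SUCCEEDED(rv)) {
    // ... Write() each line to bufferedStream instead of rawStream ...
    rv = bufferedStream->Close();   // the buffered data is flushed here; check the result!
  }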
(In reply to ISHIKAWA, Chiaki from comment #50)
> Will you take up the review, Jorg, then?

Yes, I'll do the first pass, then over to Aceman ;-)
(In reply to Jorg K (GMT+1) from comment #51)
> (In reply to ISHIKAWA, Chiaki from comment #50)
> > Will you take up the review, Jorg, then?
> Yes, I'll do the first pass, then over to Aceman ;-)

Thanks a lot! My day job gets in the way due to some big meetings and a conference in the next 6-7 weeks, but I will try.
Comment on attachment 8807617 [details] [diff] [review]
Nulling-filestream-after-close-under-IMPORT-directory.patch

What about the other four patches in the bug? We do one by one?
Attachment #8807617 - Flags: review?(neil) → review?(jorgk)
Comment on attachment 8807617 [details] [diff] [review]
Nulling-filestream-after-close-under-IMPORT-directory.patch

Review of attachment 8807617 [details] [diff] [review]:
-----------------------------------------------------------------

Please address the nits below.

::: mailnews/import/applemail/src/nsAppleMailImport.cpp
@@ +564,5 @@
> + // internally. If it's still open, then it just returns it
> + // again, otherwise it opens a new stream.
> + // In a nutshell, if the old outStream was still open, a new
> + // File stream is NOT created and assigned. Old file stream is
> + // used instead.

Please remove comment. This is not the place to capture this information.

@@ +586,2 @@
> outStream->Close();
> + outStream = nullptr;

I'm OK with this and all the ones that follow.

@@ +591,5 @@
> + // A non-resuable stream is already closed at this point, so it
> + // doesn't matter that we're trying to close it again, although
> + // there are some cases such as m_dstOutputStream below where it
> + // does get set to null, so the consistency would be good for
> + // clarity. That is why outStream is set to nullptr above and below.

Remove comment.

::: mailnews/import/oexpress/nsOE5File.cpp
@@ +321,5 @@
> + // it's still open, then it just returns it again, otherwise it
> + // opens a new stream.
> + // In a nutshell, if the old outStream was still open, a new File
> + // stream is NOT created and assigned. Old file stream is used
> + // instead.

Remove comment.

::: mailnews/import/oexpress/nsOEMailbox.cpp
@@ +273,5 @@
> + if (m_mbxFileInputStream) {
> + rv = m_mbxFileInputStream->Close();
> + if (NS_FAILED(rv)) {
> + IMPORT_LOG1("CleanUp: m_mbxFileInputStream failed: 0x%08x\n", (unsigned) rv);
> + }

We don't need to log this.

@@ +280,5 @@
> + if (m_dstOutputStream) {
> + rv = m_dstOutputStream->Close();
> + if (NS_FAILED(rv)) {
> + IMPORT_LOG1("CleanUp: m_dstOutputStream failed: 0x%08x\n", (unsigned) rv);
> + }

And this one either.

@@ +353,5 @@
> m_dstOutputStream->Close();
> m_dstOutputStream = nullptr;
> }
> +
> + // cf. reusable m_dstOutputStream does not have to be closed here.

cf. meaning?

::: mailnews/import/outlook/src/nsOutlookMail.cpp
@@ +442,2 @@
> outputStream->Close();
> + outputStream = nullptr; // code uniformity.

Remove comment and trailing space.
(In reply to Jorg K (GMT+1) from comment #53)
> Comment on attachment 8807617 [details] [diff] [review]
> Nulling-filestream-after-close-under-IMPORT-directory.patch
>
> [W]hat about the other four patches in the bug? We do one by one?

Indeed. I suggest we treat the larger, non-import patches with higher importance than the import patches.
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #55)
> (In reply to Jorg K (GMT+1) from comment #53)
> > Comment on attachment 8807617 [details] [diff] [review]
> > Nulling-filestream-after-close-under-IMPORT-directory.patch
> >
> > [W]hat about the other four patches in the bug? We do one by one?
>
> Indeed. I suggest we treat the larger, non-import patches with higher
> importance than the import patches.

They have to be checked in parallel, sort of, since the changes need to be applied in one bunch. I can't do a lot until Thursday, but the comments in comment 54 are noted. TIA
Comment on attachment 8807617 [details] [diff] [review] Nulling-filestream-after-close-under-IMPORT-directory.patch Ready when you are.
Attachment #8807617 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #54)
> Comment on attachment 8807617 [details] [diff] [review]
> Nulling-filestream-after-close-under-IMPORT-directory.patch
>
> Review of attachment 8807617 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Please address the nits below.
>
> ::: mailnews/import/applemail/src/nsAppleMailImport.cpp
> @@ +564,5 @@
> > + // internally. If it's still open, then it just returns it
> > + // again, otherwise it opens a new stream.
> > + // In a nutshell, if the old outStream was still open, a new
> > + // File stream is NOT created and assigned. Old file stream is
> > + // used instead.
>
> Please remove comment. This is not the place to capture this information.

DONE

> @@ +586,2 @@
> > outStream->Close();
> > + outStream = nullptr;
>
> I'm OK with this and all the ones that follow.

Thanks.

> @@ +591,5 @@
> > + // A non-resuable stream is already closed at this point, so it
> > + // doesn't matter that we're trying to close it again, although
> > + // there are some cases such as m_dstOutputStream below where it
> > + // does get set to null, so the consistency would be good for
> > + // clarity. That is why outStream is set to nullptr above and below.
>
> Remove comment.

DONE

> ::: mailnews/import/oexpress/nsOE5File.cpp
> @@ +321,5 @@
> > + // it's still open, then it just returns it again, otherwise it
> > + // opens a new stream.
> > + // In a nutshell, if the old outStream was still open, a new File
> > + // stream is NOT created and assigned. Old file stream is used
> > + // instead.
>
> Remove comment.

I removed the whole comment, including the preceding lines.

> ::: mailnews/import/oexpress/nsOEMailbox.cpp
> @@ +273,5 @@
> > + if (m_mbxFileInputStream) {
> > + rv = m_mbxFileInputStream->Close();
> > + if (NS_FAILED(rv)) {
> > + IMPORT_LOG1("CleanUp: m_mbxFileInputStream failed: 0x%08x\n", (unsigned) rv);
> > + }
>
> We don't need to log this.

Removed.

> @@ +280,5 @@
> > + if (m_dstOutputStream) {
> > + rv = m_dstOutputStream->Close();
> > + if (NS_FAILED(rv)) {
> > + IMPORT_LOG1("CleanUp: m_dstOutputStream failed: 0x%08x\n", (unsigned) rv);
> > + }
>
> And this one either.

Removed, but I left an XXX TODO comment.

> @@ +353,5 @@
> > m_dstOutputStream->Close();
> > m_dstOutputStream = nullptr;
> > }
> > +
> > + // cf. reusable m_dstOutputStream does not have to be closed here.
>
> cf. meaning?

Originally I had thought that I needed to create an else-clause and close the reusable m_dstOutputStream there:

  else {
    close m_dstOutputStream
  }

However, it was pointed out that we don't have to close the reusable one here since it is closed in CleanUp. Either I put

  else {
    // no action since reusable m_dstOutputStream is closed in CleanUp.
  }

explicitly, or I modify the comment?

> ::: mailnews/import/outlook/src/nsOutlookMail.cpp
> @@ +442,2 @@
> > outputStream->Close();
> > + outputStream = nullptr; // code uniformity.
>
> Remove comment and trailing space.

DONE

TIA
Attachment #8807617 - Attachment is obsolete: true
Attachment #8809861 - Flags: review?(jorgk)
Handle bitrot caused by the change in Nulling-filestream-after-close-under-IMPORT-directory.patch. Once 8809861: Nulling-filestream-after-close-under-IMPORT-directory.patch is checked in, all four others, including this one, need to be checked in simultaneously. Should I set the review flag to Jorg myself?
Attachment #8807622 - Attachment is obsolete: true
Flags: needinfo?(vseerror)
Flags: needinfo?(jorgk)
It might be easier to merge the patches in fix-close-MERGED-add-closedflag-IMPORT-directory.patch and Nulling-filestream-after-close-under-IMPORT-directory.patch; then it will be a single patch, along with the three other patches that must go in at the same time. (That way we can avoid uploading a modified fix-close-MERGED-add-closedflag-IMPORT-directory.patch for the bitrot caused by the change in Nulling-filestream-after-close-under-IMPORT-directory.patch.) The split between Nulling-filestream-after-close-under-IMPORT-directory.patch and fix-close-MERGED-add-closedflag-IMPORT-directory.patch is purely historical. But if Nulling-filestream-after-close-under-IMPORT-directory.patch can go in soon, that is OK. TIA
Comment on attachment 8809863 [details] [diff] [review]
bug 1242030-e fix-close-MERGED-add-closedflag-IMPORT-directory.patch

> Should I set the review flag to Jorg myself?

In future, yes. AIUI Jorg has graciously agreed to review your patches, and the two of you (with aceman) are doing a great job.
Flags: needinfo?(vseerror)
Flags: needinfo?(jorgk)
Attachment #8809863 - Flags: review?(jorgk)
Comment on attachment 8809861 [details] [diff] [review]
Nulling-filestream-after-close-under-IMPORT-directory.patch

Review of attachment 8809861 [details] [diff] [review]:
-----------------------------------------------------------------

You didn't do what you said you'd be doing ;-(

::: mailnews/import/applemail/src/nsAppleMailImport.cpp
@@ +559,5 @@
> + // outStream with open file is not overwritten and we do NOT
> + // leak file descriptor and such. Details: getter_Addrefs
> + // actually clears its variable first, so you don't actually
> + // pass the previous outStream in. Instead GetNewMsgOutputStream
> + // saves the old file stream internally.

I think we agreed to remove the comment since this is not the place to capture this information. Besides, there are two implementations of this, mbox and maildir, and I'm not sure that the comment is actually correct.

::: mailnews/import/oexpress/nsOE5File.cpp
@@ +317,5 @@
> + // file descriptor and such.
> + // Details: getter_Addrefs actually clears its variable first, so
> + // you don't actually pass the previous outStream in. Instead
> + // GetNewMsgOutputStream saves the old file stream internally.
> +

Remove.

::: mailnews/import/oexpress/nsOEMailbox.cpp
@@ +275,5 @@
> + m_mbxFileInputStream = nullptr;
> + }
> + if (m_dstOutputStream) {
> + rv = m_dstOutputStream->Close();
> + // XXX TODO we should check write error

Remove. One comment above is enough.

@@ +350,5 @@
> }
> +
> + // cf. reusable m_dstOutputStream does not have to be closed here.
> + // It gets closed in CMbxScanner::CleanUp.
> +

The comment is fine if you remove the empty line.

I asked what "cf." means, but you haven't answered. So I'm asking again. I think it would be best not to use abbreviations which are uncommon. See:
https://en.wikipedia.org/wiki/Cf.
Attachment #8809861 - Flags: review?(jorgk)
Comment on attachment 8809863 [details] [diff] [review]
bug 1242030-e fix-close-MERGED-add-closedflag-IMPORT-directory.patch

Review of attachment 8809863 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/import/applemail/src/nsAppleMailImport.cpp
@@ +572,5 @@
>
> // add the data to the mbox stream
> if (NS_SUCCEEDED(nsEmlxHelperUtils::AddEmlxMessageToStream(currentEntry, outStream))) {
> mProgress++;
> + msgStore->FinishNewMessage(outStream, msgHdr, &closed);

I'm not sure how this would compile since the function takes two parameters only.

Do you intend to modify this function? Not in this patch though.
Attachment #8809863 - Flags: review?(jorgk)
Modified patch.

(In reply to Jorg K (GMT+1) from comment #62)
> Comment on attachment 8809861 [details] [diff] [review]
> Nulling-filestream-after-close-under-IMPORT-directory.patch
>
> Review of attachment 8809861 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> You didn't do what you said you'd be doing ;-(
>
> ::: mailnews/import/applemail/src/nsAppleMailImport.cpp
> @@ +559,5 @@
> > + // outStream with open file is not overwritten and we do NOT
> > + // leak file descriptor and such. Details: getter_Addrefs
> > + // actually clears its variable first, so you don't actually
> > + // pass the previous outStream in. Instead GetNewMsgOutputStream
> > + // saves the old file stream internally.
>
> I think we agreed to remove the comment since this is not the place to
> capture this information.

Sorry, a misunderstanding on my part. I thought you wanted only the latter half of the comment paragraph removed.

Previous comment:
>> ::: mailnews/import/applemail/src/nsAppleMailImport.cpp
>> @@ +564,5 @@
>> > + // internally. If it's still open, then it just returns it
>> > + // again, otherwise it opens a new stream.
>> > + // In a nutshell, if the old outStream was still open, a new
>> > + // File stream is NOT created and assigned. Old file stream is
>> > + // used instead.
>>
>> Please remove comment. This is not the place to capture this information.

All right, I will remove the comment entirely. But where do you think would be the appropriate place for the comment, which I gleaned from a remark by someone in the know? It was helpful in understanding that the code in question does not leak a file descriptor.

> Besides, there are two implementations of this, mbox and maildir, and I'm
> not sure that the comment is actually correct.

Maybe my original comment was not quite clear enough, but neither were the comments I received. Basically:
mbox: returns a reusable stream.
imap: seems to return a reusable stream.
maildir: does not return a reusable stream.

So there is a subtle issue with reusable vs. non-reusable streams. From an old comment:

+ // : A non-resuable stream is already closed at this point, so it
+ // doesn't matter that we're trying to close it again, although
+ // there are some cases such as m_dstOutputStream below where it
+ // does get set to null, so the consistency would be good for
+ // clarity.

I thought there was some additional comment somewhere about this subtle issue, added later based on additional comments in the bugzilla. (Not sure where I put that comment, though...)

I am afraid, though, that if we need to explain this with such a verbose comment to a NEWCOMER who has to maintain the code, either the original design was broken or we did not leave enough documentation. (I think the latter definitely adds to the difficulty of maintenance, and I feel we should record all the relevant comments we can glean from the seasoned hands during the discussion, for long-term maintenance reasons. Yes, such additions cannot be uniform and are anecdotal, but what else can we do with the limited man-power?)

> ::: mailnews/import/oexpress/nsOE5File.cpp
> @@ +317,5 @@
> > + // file descriptor and such.
> > + // Details: getter_Addrefs actually clears its variable first, so
> > + // you don't actually pass the previous outStream in. Instead
> > + // GetNewMsgOutputStream saves the old file stream internally.
> > +
>
> remove.

I removed the whole comment, including the preceding lines.

> ::: mailnews/import/oexpress/nsOEMailbox.cpp
> @@ +275,5 @@
> > + m_mbxFileInputStream = nullptr;
> > + }
> > + if (m_dstOutputStream) {
> > + rv = m_dstOutputStream->Close();
> > + // XXX TODO we should check write error
>
> Remove. One comment above it enough.

DONE.

> @@ +350,5 @@
> > }
> > +
> > + // cf. reusable m_dstOutputStream does not have to be closed here.
> > + // It gets closed in CMbxScanner::CleanUp.
> > +
>
> The comment is fine if you remove the empty line.
> I asked what "cf." means, but you haven't answered. So I'm asking again. I
> think it would be best not to use abbreviations which are uncommon. See:
> https://en.wikipedia.org/wiki/Cf.

Now I see you asked what "cf." means. I think I originally used cf. in the following sense, but I was not entirely sure how to include the line location information a la mxr (now dxr) and failed to put that location after cf.:

> From Wikipedia, the free encyclopedia
> The abbreviation cf. (short for the Latin: confer, meaning
> "compare")[1] is used in writing to refer the reader to other
> material to make a comparison with the topic being discussed. It is
> used to form a contrast, for example: "Abbott (2010) found supportive
> results in her memory experiment, unlike those of previous work
> (cf. Zeller & Williams, 2007)."[2] Many usage guides recommend
> against using "cf." in place of "see also" to indicate sources of
> additional examples or supporting evidence.[3][4]

Back when I was a grad student many years ago, not many usage guides discouraged the use of "cf.". So times change :-)

Anyway, since I now realize CMbxScanner::CleanUp is in the SAME file, I changed the comment lines to:

  // reusable m_dstOutputStream does not have to be closed here.
  // It gets closed in CMbxScanner::CleanUp in this file.

TIA
Attachment #8809861 - Attachment is obsolete: true
Attachment #8810174 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #63)
> Comment on attachment 8809863 [details] [diff] [review]
> fix-close-MERGED-add-closedflag-IMPORT-directory.patch
>
> Review of attachment 8809863 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: mailnews/import/applemail/src/nsAppleMailImport.cpp
> @@ +572,5 @@
> >
> > // add the data to the mbox stream
> > if (NS_SUCCEEDED(nsEmlxHelperUtils::AddEmlxMessageToStream(currentEntry, outStream))) {
> > mProgress++;
> > + msgStore->FinishNewMessage(outStream, msgHdr, &closed);
>
> I'm not sure how this would compile since the function takes two parameters
> only.
>
> Do you intend to modify this function? Not in this patch though.

Yes, I do intend to change this function. The order of patch application ought to be:

1. Nulling-filestream-after-close-under-IMPORT-directory.patch

then the following four patches, which MUST be applied simultaneously:

fix-close-MERGED-base-dir.patch
fix-close-MERGED-imap-dir.patch
fix-close-MERGED-local-dir.patch
*fix-close-MERGED-add-closedflag-IMPORT-directory.patch

The change to the said function is done in fix-close-MERGED-base-dir.patch.

TIA
Change to reflect the change in 8810174: Nulling-filestream-after-close-under-IMPORT-directory.patch
Attachment #8810175 - Flags: review?(jorgk)
Comment on attachment 8810174 [details] [diff] [review]
bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch

OK, this doesn't do any damage, and nulling 'outputStream' after closing it for good housekeeping seems like a good idea that adds robustness.

Now regarding the subject of comments and documentation. Sadly we have a lot of poorly documented "legacy" code. What would help is to document it. What does *not* help is interspersing some half-true comments which originated from hearsay.

Take GetNewMsgOutputStream. If DXR serves me well, there are two implementations: nsMsgBrkMBoxStore::GetNewMsgOutputStream() and nsMsgMaildirStore::GetNewMsgOutputStream(), to cater for the two stores we support: mbox and maildir. In comment #64 you got yourself confused: you mention mbox, *imap* and maildir. Imap is not a store type at all, it is a protocol. And yes, we retrieve messages via imap and then store them offline locally in one of the two stores. The concept of a "reusable" stream is documented here:
https://dxr.mozilla.org/comm-central/rev/82f47febc327f5487158ed2a09331e195ba71ae2/mailnews/base/public/nsIMsgPluggableStore.idl#120

You see what I'm saying? Anecdotal half-truth interspersed in a comment at some call site far away from the function doing the (obscure) processing does not help. I appreciate your concern for the newcomers, and we are all newcomers when we first visit code we haven't seen before, but that type of comment will only confuse the newcomer. If you want to do a good deed, go and analyse the functions in question and document them. And then you can refer to that, saying, for example: We don't leak file descriptors here, see ...

As for these file descriptors: Where are they? Usually file descriptors come from Unix file operations, but there are none here. Due to many layers of M-C abstraction we are far removed from the file operations. Are you referring to the output streams?

Another thing: The source you are fixing here is:
nsAppleMailImport.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp
As far as I know, Apple import has been broken for other reasons since 2011 (bug 688166). OE worked but has been dead since 2014; it died with XP. Outlook import is permanently disabled (bug 1176748), and I don't know what OE5 is. So whilst we appreciate your work, this actually doesn't address any issues in any active code.
Attachment #8810174 - Flags: review?(jorgk) → review+
Comment on attachment 8810175 [details] [diff] [review]

As I said before: I *cannot* review this since the code will positively *not* compile. We *must* review the patches in the order they need to be applied. In fact, it makes no sense to split dependent changes into multiple patches. Each patch in itself must apply, compile and work. The patch that changes the parameters of a function *must* also fix all the call sites.

Once again, we are investing a lot of time and effort into what is basically *dead* code:
nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp
Of these, only Becky is live and working.
Attachment #8810175 - Flags: review?(jorgk)
Attachment #8807620 - Attachment description: fix-close-MERGED-base-dir.patch → bug 1242030-b fix-close-MERGED-base-dir.patch
(In reply to Jorg K (GMT+1) from comment #68) > Comment on attachment 8810175 [details] [diff] [review] > fix-close-MERGED-add-closedflag-IMPORT-directory.patch > > As I said before: > I *cannot* review this since the code will positively *not* compile. We > *must* review the patches in the order they need to be applied. In fact, it > makes no sense to split dependent changes into multiple patches. Each patch > in itself must apply and compile and work. The patch that changes the > parameters to a function *must* also fix all the call sites. > > Once again, we are investing a lot of time and effort into what is basically > *dead* code: > nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp | > nsOutlookMail.cpp > > Of these, only Becky is live and working.
I can certainly create a gigantic patch that compiles: From my original post: > I am loading merged patches from the bug ug 1122698, bug 1134527, bug 1134529, bug 1174500 > The reason for consolidation is that the patch as a whole is much easier to > read when consolidated. > however, I have split the resulting patch into a few smaller chunks that address files under only > certain directories [omission] > This required the changes to files under mailnews/import directory as well. > The first patch in the following list addresses this need. > A Nulling-filestream-after-close-under-IMPORT-directory.patch: Patches for... > A fix-close-MERGED-base-dir.patch: Bug 1122698 Fixing unchecked Close, unc... > A fix-close-MERGED-imap-dir.patch: Fixing unchecked Close, unchecked Flush... > A fix-close-MERGED-local-dir.patch: Fixing unchecked Close, unchecked Flus... > and another patch that is explained later: > A enable-buffering-1st-step.patch (explained later in a different bugzilla.)
(I added fix-close-MERGED-add-closedflag-IMPORT-directory.patch to take care of "the changes to files under mailnews/import directory as well": these files are not tested locally because I use linux for testing the code, and these files can be compiled only on try-comm-central for me. These files under mailnews/import could not be tested for quite some time since try-comm-central had an infra issue that broke the OSX and Windows builds in 2015, and thus I did not realize that I had to modify these files until much later.)
I can certainly create a single big patch that compiles. You can see that these patches as a group have been tested on try-comm-central and ran fine. E.g.:
b36b0c6cb71a ISHIKAWA, Chiaki — bug 1242030-e Add third argument to (Finish|Discard)NewMessage().
230a83bc2cd6 ISHIKAWA, Chiaki — bug 1242030-d (consolidation of 1122698, 1134527, 1134529, 1174500)
122b1a7d92b2 ISHIKAWA, Chiaki — bug 1242030-c (consolidation of 1122698, 1134527, 1134529, 1174500)
1f528b25cb13 ISHIKAWA, Chiaki — bug 1242030-b (consolidation of 1122698, 1134527, 1134529, 1174500)
325f6e8485ec ISHIKAWA, Chiaki — bug 1242030-a: (consolidation of 1122698, 1134527, 1134529, 1174500)
Whether one wants to review such a big patch is another question. However, I am attaching one FYI. AHA, I was wondering why you chose to read fix-close-MERGED-add-closedflag-IMPORT-directory.patch first. After modifying and uploading the patches so many times, I simply decided to use the Mercurial qpatch file name as the comment for the patch. I am editing the comment a little bit to add "1242030-{a,b,c,d,e}".
Attachment #8807621 - Attachment description: fix-close-MERGED-local-dir-Rev02.patch → bug 1242030-d fix-close-MERGED-local-dir-Rev02.patch
Attachment #8809863 - Attachment description: fix-close-MERGED-add-closedflag-IMPORT-directory.patch → bug 1242030-e fix-close-MERGED-add-closedflag-IMPORT-directory.patch
Attachment #8807618 - Attachment description: fix-close-MERGED-imap-dir.patch → bug 1242030-c fix-close-MERGED-imap-dir.patch
Attachment #8810175 - Attachment description: fix-close-MERGED-add-closedflag-IMPORT-directory.patch → bug 1242030-e fix-close-MERGED-add-closedflag-IMPORT-directory.patch
Attachment #8810174 - Attachment description: Nulling-filestream-after-close-under-IMPORT-directory.patch → bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch
Attachment #8809863 - Attachment is obsolete: true
This is a big consolidated patch excluding bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch. Although it is big, it may be much easier to see the changes at a glance (browsing might be more like it). If this is preferred, let us go with this patch. TIA
Flags: needinfo?(jorgk)
(In reply to Jorg K (GMT+1) from comment #67) > Comment on attachment 8810174 [details] [diff] [review] > bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch > Thank you for the review. One down, four more (or a big consolidated patch) to go. > OK, this doesn't do any damage and nulling 'outputStream' after closing it > for good housekeeping seems like a good idea that adds robustness. > > Now regarding the subject of comments and documentation. Sadly we have a lot > of poorly documented "legacy" code. What would help is to document it. What > does *not* help is interspersing some half-true comments which originated > from some hearsay. > > Take GetNewMsgOutputStream. If DXR serves me well, there are two > implementations: > nsMsgBrkMBoxStore::GetNewMsgOutputStream() and > nsMsgMaildirStore::GetNewMsgOutputStream() to cater for the two stores we > support: mbox and maildir. In comment #64 you got yourself confused: you > mention mbox, *imap* and maildir. Imap is not a store type at all; it is a > protocol. And yes, we retrieve messages via imap and then store them offline > locally in one of the two stores. Yes, I was confused. Now that you mention it, and considering the confusion I had, I think it would be better to split the local directory: local -> pop3, mbox, maildir, common (common operations for both the mbox and maildir formats, i.e., ones that would work with any future pluggable storage format). When I was writing the patch the year before and last year, I was confused when I had to look for pop3-related code in the local directory, and further confused when there was maildir-related code in local. (OK, maybe "local" is meant to say we have code for handling "local" storage as in "local vs. remote", but then pop3 still ought to be in a separate directory IMHO, since imap is in its own directory. But I digress.) > > The concept of a "reusable" stream is documented here: > https://dxr.mozilla.org/comm-central/rev/ > 82f47febc327f5487158ed2a09331e195ba71ae2/mailnews/base/public/ > nsIMsgPluggableStore.idl#120 > Thanks for the pointer. I was wrong in looking only at the implementations of the code themselves. The CPP code does not carry any comments about this. E.g.: NS_IMETHODIMP nsMsgMaildirStore::GetNewMsgOutputStream(nsIMsgFolder *aFolder, nsIMsgDBHdr **aNewMsgHdr, bool *aReusable, nsIOutputStream **aResult) I am so used to reading code with comments at the place of implementation that I did not realize I should go back to the .idl. Maybe I should ask the DXR developers for a feature that automatically opens a separate window showing the IDL definition whenever a user searches for a function name that is defined in an IDL. (That would go a long way toward saving developers many clicks: today we have to ask for the "definition" and then visit the IDL file named at the beginning of the automatically generated header file.) > You see what I'm saying? Anecdotal half-truth interspersed in a comment > at a call site somewhere far away from the function doing the > (obscure) processing does not help. Generally speaking, yes, I agree. But in this particular instance, I hate to disagree, but the comment was useful to me, and this particular comment was not an anecdotal half-truth to me. (See comment 15.) > I appreciate your concern for the > newcomers, and we are all newcomers when we first visit code we haven't seen > before, but that type of comment will only confuse the newcomer. > > If you want to do a good deed, go and analyse the functions in question > and document them.
> And then you can refer to that, saying, for example: > We don't leak file descriptors here, see ... Hmm, maybe I should expand the .idl comment in more detail, then. > As for these file descriptors: Where are they? Usually file descriptors come > from Unix file operations, but there are none here. Due to many layers of > M-C abstraction we are far removed from the file operations. Are you > referring to the output streams? Yes. I mean the descriptor for the output stream; until it was clear to me that the output stream is kept around as a reusable stream, the processing in the code was very obscure. (True, there is an abstraction, but IMHO the particular abstraction may not have been that great. (Do we have to know whether a particular stream is reusable or not? For performance reasons, maybe yes, but still we could have gone a little further or something. I still need to worry about the file descriptor and file seek position, etc. We may never get away from the file seek position, of course, but I will check what the .idl says about the file position when we make a file stream buffered.) I don't see great merit in the I/O handling abstraction in the TB code except for portability alone. As far as I know, all the modern OSes have the notion of a file descriptor/file handle, etc. It is hard to do away with it. Of course, the function names and the like are different, so a wrapper for portability is good, but I am not sure if the particular abstraction was good or not. Maybe it was in FF, since it has to deal with network streams all the time and there are UDP/TCP differences, etc., so the abstraction may have been good.) But again, I digress. > > Another thing: The source you are fixing here is: > nsAppleMailImport.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp > As far as I know, Apple import has been broken for other reasons since 2011 (bug > 688166). OE worked but has been dead since 2014; it died with XP. Outlook import > is permanently disabled (bug 1176748) and I don't know what OE5 is. So > whilst we appreciate your work, this actually doesn't address any issues in > any active code. As I mentioned in comment 69, I only modified these files when I realized that I had not tested OSX and Windows very much, due to try-comm-central infrastructure issues in 2015 and early 2016, and ran build jobs there early this year. (Of course, it helped me to realize the subtle issue of the timing of Close for the maildir operation under Windows, and eventually I could fix it.) But the patches to these files came rather late, and I may want to add a few more checks of the return value of Close(); given the status of the files, they can wait. From what you describe, I take it that there are no import tests for OSX and Windows using some prepared data(?). That is a disappointment. Surely there was test data for the import operation when the code in question was created. TIA
Let me see whether I can make any sense of this. As far as I can see, this is a giant clean-up patch that focusses on error handling. That of course can be split into smaller chunks. Also, you propose to add a parameter to DiscardNewMessage() and FinishNewMessage(). Once again, there are two implementations, one for mbox, the other for maildir. Your issue is the closing of the output stream. Well for maildir I can tell you that it *is* closed: https://dxr.mozilla.org/comm-central/rev/dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/nsMsgMaildirStore.cpp#642 https://dxr.mozilla.org/comm-central/rev/dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/nsMsgMaildirStore.cpp#670 For mbox it is also closed here https://dxr.mozilla.org/comm-central/rev/dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/nsMsgBrkMBoxStore.cpp#695 but not here, most likely because the stream is reusable: https://dxr.mozilla.org/comm-central/rev/dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/nsMsgBrkMBoxStore.cpp#705 I can't approve the change of the IDL and frankly, I don't think it's necessary. Maildir should just close the stream and make a lot of noise when closing fails. I see no point communicating the close-failure to the caller so it can attempt the close again? Why would it work this time? Mbox only closes the stream in 'discard' but not in 'finish'. Your patch is not going to change this, just look at this snippet:
  if (!reusable) {
    if(!closed)  <<== new code.
      outputStream->Close();
    outputStream = nullptr;
  }
So if I see it correctly, all this will do is attempt again to close a stream that failed to close before in the maildir case. I don't think I want to approve this, but let's see what Kent says. Once we decide on the general way forward, we can see how to organise the patches. If it's just improving the error handling, it can certainly be cut into chunks. Also, I noticed that at least some of this relies on the new code from bug 939548. So let's get that landed first.
Flags: needinfo?(jorgk) → needinfo?(rkent)
(In reply to Jorg K (GMT+1) from comment #72) > Let me see whether I can make any sense of this. > > As far as I can see, this is a giant clean-up patch that focusses on error > handling. That of course can be split into smaller chunks. > > Also, you propose to add a parameter to DiscardNewMessage() and > FinishNewMessage(). This is for reasons related to error handling. First principle: I wanted to make DiscardNewMessage() and FinishNewMessage() not close the filestream that is pointed to by the argument passed to them. The existing crude error recovery scheme in the pop3 code tries to close the stream and sets the filestream variable to nullptr to detect that the filestream was already closed (or to avoid closing an already closed stream). [My memory is hazy here. The bulk of the patch was created about 15 months ago.] In the existing code, when I began adding error return value checks, I was horrified to see Close() returning errors. But they turned out to be NS_BASE_STREAM_CLOSED in many places: Close() was being called on an already closed filestream during error processing. I was puzzled, and at the same time realized that someone in the past may have tried to check the return value of Close() for errors but, due to this spaghetti-like convoluted code that spews NS_BASE_STREAM_CLOSED whenever one checks the Close() value, might have given up. This is a very plausible scenario. Anyway, I chased down why, and realized that the two routines are to blame: DiscardNewMessage() and FinishNewMessage() close the filestream that is pointed at by the passed argument, so the callers who hold the original filestream variables don't know whether they are closed or not and then can't use the "set the filestream variable to nullptr to signal that it is closed" scheme. So basically, I tried to move the "close" operation out of DiscardNewMessage() and FinishNewMessage() so that other functions can use the filestream variable status effectively (null or not) and check the Close() return value at appropriate times without seeing NS_BASE_STREAM_CLOSED in normal cases. Yes, the change is mainly for making the existing crude error recovery work at all. Without my patches, code that checks Close() has to special-case NS_BASE_STREAM_CLOSED in many places, which would have been ugly, and the checking of Close() is necessary to make the crude existing error handling work. 2nd principle: For Maildir under Windows, I had to close the stream despite the first principle. The reason for adding the "closed" flag is to make the Maildir operation work under Windows. Under Windows, for Maildir to work, I cannot seem to defer the closing operation (meaning NOT closing in DiscardNewMessage() or FinishNewMessage()) to a later point in the call sequence during mail message processing. From a lot of dumping on try-comm-central and dumping on linux locally, I figured out that this is because the filestream ended up pointing to a file and/or directory that is being renamed (moved) or deleted. Under linux and OSX (basically a derivative of FreeBSD), a file or directory can be renamed even if there is an open descriptor to it (I think most POSIX-like OSes allow it; most of them, anyway). However, under Windows, it cannot. (I don't know if MS still claims Windows is POSIX-compliant.) Thus I was forced to change the semantics a bit despite my first principle. That is basically how the patch was developed into its current form. I think the next paragraph needs to be read/understood from the viewpoint of the above. However ...
> Once again, there are two implementations, one for mbox > the other one for maildir. > > Your issue is the closing of the output stream. Well for maildir I can tell > you that it *is* closed: > https://dxr.mozilla.org/comm-central/rev/ > dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/ > nsMsgMaildirStore.cpp#642 > https://dxr.mozilla.org/comm-central/rev/ > dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/ > nsMsgMaildirStore.cpp#670 > > For mbox it is also closed here > https://dxr.mozilla.org/comm-central/rev/ > dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/ > nsMsgBrkMBoxStore.cpp#695 > but not here, most likely because the stream is reusable: > https://dxr.mozilla.org/comm-central/rev/ > dd4587a78cfd200881956427c8d3fd468aaf9758/mailnews/local/src/ > nsMsgBrkMBoxStore.cpp#705 > > I can't approve the change of the IDL and frankly, I don't think it's > necessary. Maildir should just close the stream and make a lot of noise when > closing fails. I see no point communicating the close-failure to the caller > so it can attempt the close again? Why would it work this time? > > Mbox only closes the stream in 'discard' but not in 'finish'. Your patch is > not going to change this, just look at this snippet: > if (!reusable) { > if(!closed) <<== new code. > outputStream->Close(); > outputStream = nullptr; > } > > So if I see it correctly, all this will do is attempt again to close a > stream that failed to close before in the maildir case. I don't think I want > to approve this, but let's see what Kent says. However, ... the original patch was created almost 15 months ago and I have to check this by re-reading it again myself. I am not sure what you refer to by "this". It may be true for non-error cases, but once an error crops up, the current code cannot detect it properly, and the crude error recovery does not work at all when buffered writes are enabled. But the basic principle was as I explained in the preceding paragraph(s). There *IS* a possibility of a copy&paste error during the maintenance of the patches that may lead you to the above summary, but at least such a copy&paste error does not ADD any new errors to |make mozmill| and |make xpcshell-test|; that is what I have checked from time to time on try-comm-central to make sure that my patches, which have been changed to cope with bitrot, do not introduce new errors. If only the builds for all three OSes (Linux, OSX, and Windows) worked fine all the time. I could only find and fix the issue with Maildir under Windows when the Windows build finally became operational early this year. Initially I was tempted to dismiss it as a temporary glitch, like many other timing-related(?) bugs. But the issue persisted, so I checked the operation of TB by adding dumps to the affected code, and finally I figured out that it is the open file descriptor to a file or directory that is being renamed or deleted. If try-comm-central had not left Windows and OSX broken for a prolonged time, I could have created the whole patch in the three-four months early in 2015, noticed the issue with Maildir under Windows sooner, and might have come up with a better solution. If anyone is interested, be my guest. > Once we decide on the general way forward, we can see how to organise the > patches. If it's just improving the error handling, it can certainly be cut > into chunks. > > Also, I noticed that at least some of this relies on the new code from bug > 939548. So let's get that landed first. Yes. Thank you again.
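Roughly, the close-once scheme described above amounts to the following pattern (a simplified sketch; the member name is illustrative, not the literal patch):

  nsresult rv = NS_OK;
  if (m_outFileStream) {            // still open as far as this object knows
    rv = m_outFileStream->Close();  // first and only Close(); a failure here is a real error
    if (NS_FAILED(rv))
      NS_WARNING("Close() of the message store stream failed");
    m_outFileStream = nullptr;      // record "closed" so no later path calls Close() again
  }
  // A second Close() on an already closed stream would just return
  // NS_BASE_STREAM_CLOSED, which is the noise the new error checks keep running into.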
I see what you're saying. Trying to track errors of Close() (which IMHO should never fail unless you call it twice) is not helped by overall confused logic which in fact tries to close the same thing twice, à la comment #15: "so it doesn't matter that we're trying to close it again". That's true only if there is no error checking. IMHO the program logic should always track whether it expects a file to be open or not. "This" referred to the approach you're taking by changing the IDL and adding another parameter. What I'd like to see is this: Look at the logic and find the spots where we try to close things twice. Then fix this logic. Once again, I don't see that what you're proposing is meaningful, take: nsMsgMaildirStore::DiscardNewMessage( ... *closedflag = true; nsMsgMaildirStore::FinishNewMessage( ... *closedflag = true; So here you're always returning true on the new parameter, so why have it? nsMsgBrkMBoxStore::DiscardNewMessage( ... *closedflag = false; nsMsgBrkMBoxStore::FinishNewMessage( ... *closedflag = false; So here you're always returning true on the new parameter, so why have it? This code makes no sense:
  if (!reusable) { // !reusable means mbox.
    if(!closed) <<== new code, always returns 'false' for mbox.
      outputStream->Close();
    outputStream = nullptr;
  }
You don't need the 'closed' flag here if you really want to close the stream (which I haven't looked at). So please present a short and succinct patch that addresses things not being closed or closed twice. Also please lose all the debugging you added. We can leave debugging in some "hotspots", but let's not clutter the source code with debugging which won't be useful for the next problem. I believe we can do it without interface changes, so nothing is stopping us from splitting the clean-up into multiple chunks.
I'm not really sure what I am being requested to look at here. I think this cleanup is important work, but I really do not have the time nor (in most cases) the knowledge to give good opinions. Yes it is good if the same method that opened a file closes it, but that is difficult to accomplish in C++ code sometimes. Yes the existing code is really hard to work with and needs refactoring. But I do not have the time to investigate myself.
Flags: needinfo?(rkent)
The question is whether you are in favour of changing the IDL:
Old: void discardNewMessage(in nsIOutputStream aOutputStream, in nsIMsgDBHdr aNewHdr);
New: void discardNewMessage(in nsIOutputStream aOutputStream, in nsIMsgDBHdr aNewHdr, out boolean closeflag);
Especially looking at the planned changes, see comment #74, I don't think it makes sense.
Flags: needinfo?(rkent)
Well the planned change in the IDL just seems to codify what at least on the surface appears to be a poor design. Like I see code like this:
  // non-reusable streams will get closed by the store.
  if (reusable)
    destOutputStream->Close();
  return store->FinishNewMessage(destOutputStream, aHdr);
which really sounds like it is unclear who is responsible for closing the file. Yes I would prefer a design where the responsibility is clear rather than amorphous like this, and that would say that adding a parameter about whether the stream was closed or not is a bad idea. But I have not investigated whether fixing that design problem is straightforward. That is not a clear answer, but to give more would require me to dive in, understand all the issues, and propose a workable clear plan.
Flags: needinfo?(rkent)
I am a little confused. > What I'd like to see is this: > Look at the logic and find the spots where we try to close things twice. Then > fix this logic. Once again, I don't see that what you're proposing is > meaningful, take: > nsMsgMaildirStore::DiscardNewMessage( > ... > *closedflag = true; > nsMsgMaildirStore::FinishNewMessage( > ... > *closedflag = true; > So here you're always returning true on the new parameter, so why have it? > > nsMsgBrkMBoxStore::DiscardNewMessage( > ... > *closedflag = false; > nsMsgBrkMBoxStore::FinishNewMessage( > ... > *closedflag = false; > So here you're always returning true on the new parameter, so why have it? You mean 'false' on the last line, don't you? For Mbox, I don't close the stream in DiscardNewMessage() and FinishNewMessage(). And this closedflag business was necessitated by the difficult-to-fathom interaction with an open file descriptor to a file and/or directory that is RENAMED/MOVED in the case of Maildir: I needed to close it at the early timing (as in the existing legacy code), and that was why DiscardNewMessage() and FinishNewMessage() needed to close it, albeit reluctantly. rkent's comment: >which really sounds like it is unclear who is responsible for closing the file. Yes I would prefer a >design where the responsibility is clear rather than amorphous like this, I would very much prefer that the caller is responsible for closing the stream (so that the caller side, rather than the leaf function, closes it WITHOUT setting the class variable that refers to the filestream to null). This approach was the only way the crude error recovery in the existing code could work, I think. The crude error handling code relied on the filestream variable being set to null after an early close due to error detection and an error recovery attempt, but DiscardNewMessage() and FinishNewMessage() closing the stream without setting the class variable to nullptr caused a massive screw-up during error processing. I am not even sure if the code worked at all as far as the Close() error is concerned. Given that there is no new storage format in sight within the lifetime of TB, I agree there can be some simplification of the code, as Jorg pointed out. (I never bothered to apply logic such as "mbox is reusable and maildir is not reusable", etc.; I changed the code mechanically to avoid errors and to show how the code was changed from the original.)
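For the Windows constraint described above, the ordering issue looks roughly like this (a sketch assuming maildir's tmp-to-cur rename; the file and directory variables are illustrative and assumed to be set up elsewhere):

  // Maildir finishes a message by renaming the tmp file into cur.
  nsresult rv = outputStream->Close();                  // must happen first on Windows
  NS_ENSURE_SUCCESS(rv, rv);
  rv = tmpFile->MoveToNative(curDir, EmptyCString());   // rename tmp/<name> -> cur/<name>
  // On Linux/OSX the rename also succeeds while a descriptor is still open;
  // on Windows it fails with a sharing violation as long as a handle is open,
  // which is why the close cannot be deferred past this point.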
Hi, I have updated the big consolidated patch (excluding bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch) to fix the bitrot caused by the changes since last November, when it was first uploaded. It was last modified back in November, but I could not follow it up due to my day job. Now I have requested a review, noting that there has been about a two-month lapse. Here is what I would request for the initial feedback/review. Point 1: I still don't grok the concern you raised in comment 74, and I wondered about it in comment 78. Can you elaborate a bit on that? Point 2: Other than this issue, I would appreciate general guidance on how errors ought to be reported, using what function, etc. TB as a mailer is getting short shrift as far as error notification to the user is concerned. What would the Firefox browser tell the user if the page the user requested does not exist at a site? A big error 404 in the main browser window. What would the Firefox browser tell the user if the site the user tried to access does not seem to exist, or seems to exist but does not return a response in a reasonable time? A big error page that says "Server not found" in the main browser window in the first case, or some error message regarding a timeout or transient network error with a "try again later" type of message in the second case. Again in the main window. Now contrast this with what happens when the download of an e-mail message fails. I am not sure exactly what error would be presented to the user clearly. Is it shown in the Error Console, which is NOT NECESSARILY OPEN all the time (it is not open by default)? I am debugging the I/O errors and how TB copes with them by dumping the error messages to the TTY console where TB is manually invoked. This should not be necessary for a user-friendly, robust mail client. The above is what I mean by saying that TB is getting short shrift. We really should think of using the Error Console or other means of output to show errors more clearly to the users. Don't listen to the browser developers here. They have it so easy: the main web window is always there for them to report the error clearly, in a visible manner. Back to the patch. I will try to grok point 1 and point 2, modify the patch, and then go through the review cycle several times. (Given the size of this patch, I don't expect a quick turnaround nor a review of the whole patch. A partial review and comments from time to time should give me a clue how to proceed, and I will upload a modified patch from time to time.) BTW, the patch was originally created as a large patch like this one, but then it was mentioned that it was difficult to read through such a big patch (how could one review such a big patch?), so I tried to separate it per directory. But here it is again, in a big chunk, as a standalone patch that applies cleanly on its own. This patch and a few follow-up patches passed the treeherder tests rather well (except for the known issues outside the realm of this patch). https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=44fc60a362f9510ecddcd5107f4a39d599c0a3d7 This patch appears as https://hg.mozilla.org/try-comm-central/rev/07061f4eda60 in the above submission. TIA PS: I hasten to add that this patch alone, WITHOUT the follow-up patches, works.
Attachment #8810249 - Attachment is obsolete: true
Attachment #8826640 - Flags: review?(jorgk)
Looks like I have to repeat what I said in comment #74, so here goes: You propose to change the IDL and add a parameter 'closeflag' to both discardNewMessage() and finishNewMessage(). nsMsgBrkMBoxStore::DiscardNewMessage() and nsMsgBrkMBoxStore::FinishNewMessage() always return 'false' (since they keep the mbox file open for further operations), nsMsgMaildirStore::DiscardNewMessage() and nsMsgMaildirStore::FinishNewMessage() always return 'true' (since they close the file, or at least try). I pointed out that if (!reusable && !closed) outputStream->Close(); makes no sense. For mbox, reusable==true and closed==false, so for mbox, the Close() will not run. For maildir, reusable==false and closed==true, so for maildir, the Close() will not run. Before I review this, can you please do the following:
1) Remove all trailing white-space since I don't want to complain about them individually.
2) Take a good hard look at all the chatty comments and reduce them to a bare minimum.
3) Use XXX, not xxx where required.
4) MOZ_ASSERT() can have a second string argument. Use it.
5) Use MOZ_ASSERT() if possible instead of NS_ERROR().
6) Use do { } while false, like we did in bug 939548, instead of goto.
7) When calling xx->Close(), set xx to nullptr after issuing a warning. Currently you're undecided which comes first. It should be consistent.
8) Make sure you always use two spaces of indentation.
And do split the patch into multiple parts, at least.
1) nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp
2) nsImapIncomingServer.cpp | nsImapMailFolder.cpp | nsImapOfflineSync.cpp | nsImapProtocol.cpp | nsImapService.cpp | nsImapUndoTxn.cpp
3) nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp
4) nsLocalMailFolder.cpp | nsMailboxProtocol.cpp | nsMovemailService.cpp | nsMsgBrkMBoxStore.cpp | nsMsgLocalStoreUtils.cpp | nsMsgMaildirStore.cpp | nsParseMailbox.cpp
5) nsPop3Protocol.cpp | nsPop3Service.cpp | nsPop3Sink.cpp | nsPop3Sink.h
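A compact sketch of what requests 4) through 7) amount to in practice (an illustrative fragment, not taken from the patch; buf and len stand for whatever message data is assumed to be in scope):

  uint32_t written = 0;
  nsresult rv = NS_OK;
  do {
    MOZ_ASSERT(m_tempMessageStream, "stream must still be open at this point");  // 4), 5)
    rv = m_tempMessageStream->Write(buf, len, &written);
    if (NS_FAILED(rv))
      break;                                   // 6) do { } while (false) instead of goto
    rv = m_tempMessageStream->Flush();
    if (NS_FAILED(rv))
      break;
  } while (false);
  // Cleanup is reached on both the success and the error path.
  if (m_tempMessageStream) {
    if (NS_FAILED(m_tempMessageStream->Close()))
      NS_WARNING("Close() failed");            // 7) warn first ...
    m_tempMessageStream = nullptr;             //    ... then null the pointer
  }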
Attachment #8826640 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #80) > Looks like I have to repeat what I said in comment #74, so here goes: > > You propose to change the IDL and add a parameter 'closeflag' to both > discardNewMessage() and finishNewMessage(). > > nsMsgBrkMBoxStore::DiscardNewMessage() and > nsMsgBrkMBoxStore::FinishNewMessage() always return 'false' (since they keep > the mbox file open for further operations), > nsMsgMaildirStore::DiscardNewMessage() and > nsMsgMaildirStore::FinishNewMessage() always return 'true' (since they close > the file, or at least try). > > I pointed out that > if (!reusable && !closed) > outputStream->Close(); > makes no sense. > For mbox, reusable==true and closed==false, so for mbox, the Close() will > not run. > For maildir, reusable==false and closed==true, so for maildir, the Close() > will not run. Now I see it. > > Before I review this, can you please do the following: > 1) Remove all trailing white-space since I don't want to complain about them > individually. Fine. But there are white-spaces left over from the original files: I suppose you don't mind the wholesale removal of trailing blanks from the original files. I found the trailing blanks in the original files rather annoying myself. > 2) Take a good hard look at all the chatty comments and reduce them to a > bare minimum. Will do. > 3) Use XXX, not xxx where required. Will do. > 4) MOZ_ASSERT() can have a second string argument. Use it. I will. Not sure if the second argument was available a year ago or so. > 5) Use MOZ_ASSERT() if possible instead of NS_ERROR(). Yes, I will. I was moving in this direction after realizing NS_ERROR() will abort? > 6) Use do { } while false, like we did in bug 939548, instead of goto. OK. Will do. > 7) When calling xx->Close(), set xx to nullptr after issuing a warning. > Currently you're undecided which comes first. It should be > consistent. I was not aware where I was inconsistent, but will look them up. > 8) Make sure you always use two spaces of indentation. I hope you won't mind my fixing the EMACS modelines in the original files, because basically I am just trying to follow what the EMACS modeline of each file dictates to my emacs session. Unfortunately it seems there are files with 2- or 4-space indentation. I hope OTHERS won't mind changing these modelines. It is hard to cater to different people's tastes. Yes, there ARE files where the EMACS modeline setting and the indentation in the file do not match at all :-( > > And do split the patch into multiple parts, at least. > 1) nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp > 2) nsImapIncomingServer.cpp | nsImapMailFolder.cpp | nsImapOfflineSync.cpp | > nsImapProtocol.cpp | nsImapService.cpp | nsImapUndoTxn.cpp > 3) nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp > | nsOutlookMail.cpp > 4) nsLocalMailFolder.cpp | nsMailboxProtocol.cpp | nsMovemailService.cpp | > nsMsgBrkMBoxStore.cpp | nsMsgLocalStoreUtils.cpp | nsMsgMaildirStore.cpp | > nsParseMailbox.cpp > 5) nsPop3Protocol.cpp | nsPop3Service.cpp | nsPop3Sink.cpp | nsPop3Sink.h I can certainly do this, but then these separate patches are likely to be applied as a group at the same time. I will keep busy. TIA
Yes, I do mind a wholesale removal of trailing spaces since it makes a review absolutely impossible. Please don't do that. I was referring to trailing spaces you added. Please click on "Review", they are all clearly visible in *pink*. Even non-reviewers are allowed to look at their own patch in review mode. Yes, all the parts will land together, but I can review one at a time and we don't have to touch a once finished patch again.
I am going to upload five patches that try to address the requests. 1) nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp 2) nsImapIncomingServer.cpp | nsImapMailFolder.cpp | nsImapOfflineSync.cpp | nsImapProtocol.cpp | nsImapService.cpp | nsImapUndoTxn.cpp 3) nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp 4) nsLocalMailFolder.cpp | nsMailboxProtocol.cpp | nsMovemailService.cpp | nsMsgBrkMBoxStore.cpp | nsMsgLocalStoreUtils.cpp | nsMsgMaildirStore.cpp | nsParseMailbox.cpp 5) nsPop3Protocol.cpp | nsPop3Service.cpp | nsPop3Sink.cpp | nsPop3Sink.h Comments about your requests and how I tried to meet them. (In reply to ISHIKAWA, Chiaki from comment #81) > (In reply to Jorg K (GMT+1) from comment #80) > > Looks like I have to repeat what I said in comment #74, so here goes: > > > > You propose to change the IDL and add a parameter 'closeflag' to both > > discardNewMessage() and finishNewMessage(). > > > > nsMsgBrkMBoxStore::DiscardNewMessage() and > > nsMsgBrkMBoxStore::FinishNewMessage() always return 'false' (since they keep > > the mbox file open for further operations), > > nsMsgMaildirStore::DiscardNewMessage() and > > nsMsgMaildirStore::FinishNewMessage() always return 'true' (since they close > > the file, or at least try). > > > > I pointed out that > > if (!reusable && !closed) > > outputStream->Close(); > > makes no sense. > > For mbox, reusable==true and closed==false, so for mbox, the Close() will > > not run. > > For maildir, reusable==false and closed==true, so for maildir, the Close() will > > not run. > > Now I see it. > These redundant or can't-happen code paths are all in the import directory. They were done much later than the other parts, and when I modified the code I was not quite sure whether these strange conditions happen or not. You showed that the condition does not. For now, I have turned the code into a MOZ_ASSERT() to see if my change ever triggers this strange condition (it should not). For a big change like the one suggested in this bugzilla entry, this sanity check should not hurt. If the patches stabilize and we become confident, we may remove the MOZ_ASSERT(). > > Before I review this, can you please do the following: > > 1) Remove all trailing white-space since I don't want to complain about them > > individually. > > Fine. But there are white-spaces left over from the original file: I suppose > you don't mind > the wholesale removal of trailing blanks from the original files. > I found the trailing blanks in the original files rather annoying myself. > I did not do the wholesale removal. I tried to delete the extra trailing blanks my patch introduced, and also reverted the whitespace-only changes my patch had introduced. [In the long run, we may want to set up a "flag day" when only whitespace changes are applied to the source tree, to fix the strange indentations left in the code and remove trailing blanks.] > > 2) Take a good hard look at all the chatty comments and reduce them to a > > bare minimum. > Will do. I have minimized the comments. But it is up to you to judge whether they are minimal or can still be trimmed. I left some hints for future error recovery, etc., hoping for new blood to work on the remaining issues. But maybe I am too optimistic. If you, aceman and I are the only people who would touch the code, maybe we can further trim down the comments since we all know what is attempted. > > 3) Use XXX, not xxx where required. > Will do.
I turned xxx into XXX, but is there any guideline about - describing missing features, - describing remaining buggy "features", - suggesting future improvements (even possible algorithms), etc., other than simply using "XXX"? > > 4) MOZ_ASSERT() can have a second string argument. Use it. > I will. Not sure if the second argument was available a year ago or so. I used MOZ_ASSERT(). In the long run, someone who is familiar with the code might want to turn these into warning messages printed via the NS_WARN_IF() construct(?), but I found the use of "<<" to silence the "value not used" warning extremely UGLY. Can't we have a macro or something for C-C TB? > > > 5) Use MOZ_ASSERT() if possible instead of NS_ERROR(). > Yes, I will. I was moving in this direction after realizing NS_ERROR() will > abort? Yes, I tried to change all NS_ERROR()'s introduced in the patch to MOZ_ASSERT()'s. > > > 6) Use do { } while false, like we did in bug 939548, instead of goto. > OK. Will do. > There is only one place where I introduced goto (where a semaphore/lock was acquired and needs to be released in the error exit path). I used the |do { } while (false)| construct and replaced |goto| with |break|. > > 7) When calling xx->Close(), set xx to nullptr after issuing a warning. > > Currently you're undecided which comes first. It should be > > consistent. > I was not aware where I was inconsistent, but will look them up. I followed the suggestion. Now the error/warning message is generated first and then the pointer is set to nullptr. > > 8) Make sure you always use two spaces of indentation. > I hope you won't mind my fixing the EMACS modelines in the original files, because > basically I am just trying to follow what the EMACS modeline of each file > dictates to my > emacs session. Unfortunately it seems there are files with 2- or 4-space > indentation. > I hope OTHERS won't mind changing these modelines. It is hard to cater to > different people's tastes. > Yes, there ARE files where the EMACS modeline setting and the indentation in > the file do not match at all :-( I tried. I modified the emacs modeline setting in the affected files, too. There is a place where my new modification was inside a strangely indented block (not following the two-space indentation), which I re-indented using the two-space rule (including my modification). > > > > And do split the patch into multiple parts, at least. > > 1) nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp > > 2) nsImapIncomingServer.cpp | nsImapMailFolder.cpp | nsImapOfflineSync.cpp | > > nsImapProtocol.cpp | nsImapService.cpp | nsImapUndoTxn.cpp > > 3) nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp > > | nsOutlookMail.cpp > > 4) nsLocalMailFolder.cpp | nsMailboxProtocol.cpp | nsMovemailService.cpp | > > nsMsgBrkMBoxStore.cpp | nsMsgLocalStoreUtils.cpp | nsMsgMaildirStore.cpp | > > nsParseMailbox.cpp > > 5) nsPop3Protocol.cpp | nsPop3Service.cpp | nsPop3Sink.cpp | nsPop3Sink.h > Five patches are grouped according to the file grouping above. > I can certainly do this, but then these separate patches are likely to be > applied in a group at the same time. > > I will keep busy. > > TIA The patches have been tested extensively under Debian GNU/Linux 64-bit, and the latest one saw only three errors in |make mozmill|, which also appear in the comm-central tryserver jobs. I could not get the patches built on try-comm-central due to some strange build issues. I took care of the Windows-only syntax error noticed earlier, etc.
TIA PS: I took the liberty of MERGING bug 1242030-a Nulling-filestream-after-close-under-IMPORT-directory.patch into these patches. (It has been merged into the 3rd patch now.)
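On the "value not used" point above, the two alternative idioms look roughly like this (illustrative only; whether either form is wanted in these spots is exactly the open question, and only one of the two would be used at a given call site):

  // Variant A: deliberately discard the nsresult (the "<<" idiom found ugly above).
  mozilla::Unused << m_tempMessageStream->Close();

  // Variant B: keep the failure visible; the warning fires in debug builds.
  if (NS_WARN_IF(NS_FAILED(m_tempMessageStream->Close()))) {
    // nothing further to do here; the stream is considered gone either way
  }
  m_tempMessageStream = nullptr;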
Patch for the following files. 1) nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp
Attachment #8807618 - Attachment is obsolete: true
Attachment #8807620 - Attachment is obsolete: true
Attachment #8807621 - Attachment is obsolete: true
Attachment #8810174 - Attachment is obsolete: true
Attachment #8810175 - Attachment is obsolete: true
Attachment #8826640 - Attachment is obsolete: true
Attachment #8830774 - Flags: review?(jorgk)
Attached patch 1242030-new-part-2-imap.patch (obsolete) — Splinter Review
Patch for the following files under mailnews/imap directory: 2) nsImapIncomingServer.cpp | nsImapMailFolder.cpp | nsImapOfflineSync.cpp | nsImapProtocol.cpp | nsImapService.cpp | nsImapUndoTxn.cpp
Attachment #8830778 - Flags: review?(jorgk)
Attached patch 1242030-new-part-3-import.patch (obsolete) — Splinter Review
Patch for the following files under mailnews/import directory: 3) nsAppleMailImport.cpp | nsBeckyMail.cpp | nsOE5File.cpp | nsOEMailbox.cpp | nsOutlookMail.cpp I am creating the patches locally under Debian GNU/Linux and so can't test the Windows binaries locally (especially the import from the other mail programs).
Attachment #8830782 - Flags: review?(jorgk)
Patches for the following files under mailnews/local (except for pop3-related files, which are changed in part 5): 4) nsLocalMailFolder.cpp | nsMailboxProtocol.cpp | nsMovemailService.cpp | nsMsgBrkMBoxStore.cpp | nsMsgLocalStoreUtils.cpp | nsMsgMaildirStore.cpp | nsParseMailbox.cpp
Attachment #8830785 - Flags: review?(jorgk)
Attached patch 1242030-new-part-5-pop3.patch (obsolete) — Splinter Review
This is the patch for the following POP3-related files under mailnews/local: 5) nsPop3Protocol.cpp | nsPop3Service.cpp | nsPop3Sink.cpp | nsPop3Sink.h
Attachment #8830787 - Flags: review?(jorgk)
Comment on attachment 8830774 [details] [diff] [review] 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp Review of attachment 8830774 [details] [diff] [review]: ----------------------------------------------------------------- Great. In this size I can review this patches. Thanks for the persistence. Just a few nits here. ::: mailnews/base/util/nsMsgDBFolder.cpp @@ +780,4 @@ > nsCOMPtr<nsIMsgDBHdr> hdr; > rv = mDatabase->GetMsgHdrForKey(msgKey, getter_AddRefs(hdr)); > + NS_ENSURE_TRUE(hdr, NS_OK); // XXX is returning NS_OK in existing code correct? > + NS_ENSURE_SUCCESS(rv, rv); // XXX why are we returning ERROR value here and not NS_OK? Remove comment. Returning error is normal. You're already questioning the two 'ENSURE's above. Most likely, we're done when we can't find the database or the key. @@ +788,5 @@ > rv = GetMsgInputStream(hdr, &reusable, aFileStream); > + > + // XXX: Ugh, error check was missing previously. 2015 Dec. > + // FURTHERMORE, we need to set the message status to OFFLINE before early error return! > + // March, 2016. Remove comment. We don't need to know about 2015 and 2016 ;-) @@ +795,5 @@ > + fprintf(stderr,"(debug) nsMsgDBFolder::GetOfflineFileStream: GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%08xn", (unsigned int) rv); > +#endif > + > + if (mDatabase) > + mDatabase->MarkOffline(msgKey, false, nullptr); Explain WHY you need to do this. @@ +903,5 @@ > +#ifdef DEBUG > + // During testing of running TB under linux using another linux CIFS server > + // as mail store, we get error here. > + // Quite likely a READ error at low-level. > + // This error is observed again on March 15, 2015 Remove date. @@ +915,5 @@ > nsCOMPtr<nsISeekableStream> seekableStream(do_QueryInterface(*aInputStream)); > if (seekableStream) > rv = seekableStream->Seek(PR_SEEK_SET, offset); > + if (!(seekableStream || !offset)) { > + MOZ_ASSERT(0, "non-zero offset w/ non-seekable stream"); That's not how MOZ_ASSERT() should be used. You use it like: MOZ_ASSERT(this condition must be true, ...); @@ +1722,5 @@ > m_bytesAddedToLocalMsg = result.Length(); > > nsCOMPtr <nsISeekableStream> seekable; > > + MOZ_ASSERT(m_tempMessageStream); Add second argument here. @@ +1788,5 @@ > if (seekable) > { > mDatabase->MarkOffline(messageKey, true, nullptr); > + > + MOZ_ASSERT(m_tempMessageStream); Add second argument here. @@ +1818,5 @@ > mDatabase->MarkOffline(messageKey, false, nullptr); > // we should truncate the offline store at messageOffset > ReleaseSemaphore(static_cast<nsIMsgFolder*>(this)); > + if (msgStore) { > + // DiscardNewMessage may not close the stream all the time. See bug 1121842 I don't think the reference to the bug helps much. I'd prefer to remove it. @@ +1824,5 @@ > + // Close m_tempMessageStream here if not closed by DiscardNewMessage. > + if(!closed && m_tempMessageStream) { > + rv2 = m_tempMessageStream->Close(); > + } > + m_tempMessageStream = nullptr; // catch reuse with SIGSEGV What? We catch a reuse with a SIGSEGV?? Put: Avoid access of closed stream, or similar. @@ +1827,5 @@ > + } > + m_tempMessageStream = nullptr; // catch reuse with SIGSEGV > + } > + // We should check for errors of rv1 and rv2 but for now only > + // prints error messages. English derailed here: ..., but for now we only print ...
Attachment #8830774 - Flags: review?(jorgk) → feedback+
(In reply to Jorg K (GMT+1) from comment #89) > Comment on attachment 8830774 [details] [diff] [review] > 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp > > Review of attachment 8830774 [details] [diff] [review]: > ----------------------------------------------------------------- > > Great. In this size I can review this patches. Thanks for the persistence. > Just a few nits here. > Thank you for the comment. I will see what I can do to clean up my patch. TIA
Attachment #8830774 - Attachment is obsolete: true
Attachment #8832545 - Flags: review?(jorgk)
Attachment #8832545 - Attachment description: (take 2) 12421242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp030-new-part-1.patch → (take 2) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp
Attachment #8830778 - Attachment is obsolete: true
Attachment #8830778 - Flags: review?(jorgk)
Attachment #8832546 - Flags: review?(jorgk)
Attachment #8830782 - Attachment is obsolete: true
Attachment #8830782 - Flags: review?(jorgk)
Attachment #8832548 - Flags: review?(jorgk)
Attachment #8830785 - Attachment is obsolete: true
Attachment #8830785 - Flags: review?(jorgk)
Attachment #8832549 - Flags: review?(jorgk)
Attachment #8832548 - Attachment description: 1242030-new-part-3-import.patch → (take 2) 1242030-new-part-3-import.patch
Attachment #8830787 - Attachment is obsolete: true
Attachment #8830787 - Flags: review?(jorgk)
Attachment #8832550 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #89) > Comment on attachment 8830774 [details] [diff] [review] > 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp > > Review of attachment 8830774 [details] [diff] [review]: > ----------------------------------------------------------------- > > Great. In this size I can review this patches. Thanks for the persistence. > Just a few nits here. > Thank you again for your comment. I hope the atom vs string and folder release issues will be resolved soon. In the meantime, I updated the patches. Here are some comments. > ::: mailnews/base/util/nsMsgDBFolder.cpp > @@ +780,4 @@ > > nsCOMPtr<nsIMsgDBHdr> hdr; > > rv = mDatabase->GetMsgHdrForKey(msgKey, getter_AddRefs(hdr)); > > + NS_ENSURE_TRUE(hdr, NS_OK); // XXX is returning NS_OK in existing code correct? > > + NS_ENSURE_SUCCESS(rv, rv); // XXX why are we returning ERROR value here and not NS_OK? > > Remove comment. Returning error is normal. You're already questioning the > two 'ENSURE's above. Most likely, we're done when we can't find the database > or the key. I removed the two comments mentioned here, but also removed the other ENSURE comments as well. I am not sure if that was the intention, but it is back to the original no-comment status. > > @@ +788,5 @@ > > rv = GetMsgInputStream(hdr, &reusable, aFileStream); > > + > > + // XXX: Ugh, error check was missing previously. 2015 Dec. > > + // FURTHERMORE, we need to set the message status to OFFLINE before early error return! > > + // March, 2016. > > Remove comment. We don't need to know about 2015 and 2016 ;-) I removed the comment. > @@ +795,5 @@ > > + fprintf(stderr,"(debug) nsMsgDBFolder::GetOfflineFileStream: GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%08xn", (unsigned int) rv); > > +#endif > > + > > + if (mDatabase) > > + mDatabase->MarkOffline(msgKey, false, nullptr); > > Explain WHY you need to do this. I added a comment thusly: + // We are exiting early without doing the processing in the latter half of this + // function. + // One thing the latter half does must be repeated here before early return. + // "If we could not find a offline stream, clear the offline flag on the msag and return false, + // which will fall back to reading the message from the server. > > @@ +903,5 @@ > > +#ifdef DEBUG > > + // During testing of running TB under linux using another linux CIFS server > > + // as mail store, we get error here. > > + // Quite likely a READ error at low-level. > > + // This error is observed again on March 15, 2015 > > Remove date. I removed the date line, but retained the first three lines. > > @@ +915,5 @@ > > nsCOMPtr<nsISeekableStream> seekableStream(do_QueryInterface(*aInputStream)); > > if (seekableStream) > > rv = seekableStream->Seek(PR_SEEK_SET, offset); > > + if (!(seekableStream || !offset)) { > > + MOZ_ASSERT(0, "non-zero offset w/ non-seekable stream"); > > That's not how MOZ_ASSERT() should be used. You use it like: > > MOZ_ASSERT(this condition must be true, ...); > OK, in many places in this patch file and the other four patch files, I tried to follow this suggestion. However, there are cases where MOZ_ASSERT() is used for "must not happen" / "cannot happen" type sanity checks after if () { } else if () { } ... else { MOZ_ASSERT(0, "..."); } In those cases, I left the usage of MOZ_ASSERT(0, "... string to explain the situation...") since there is NO single succinct condition for the first argument of MOZ_ASSERT().
Also, in many places I left the construct of if (!expr) { MOZ_ASSERT(expr, "failure message"); } since we want to check the condition in |if (!expr)| to report some error conditions or, in the long term, invoke some recovery, etc. Use of MOZ_ASSERT(expr, "...") is a short-term solution for this and may be replaced in the future. In that sense, keeping the if (!expr) { } construct surrounding it is useful for coding purposes (and makes it easier to merge the changes in the planned future patch sets on my local PC; sometimes |diff| produces really ugly output if |{ }| is added after |if (expr)|, and having the |{ }| from the beginning makes it easier to deal with conflicting patches). The follow-on patch I am thinking of is: Bug 1170606 - C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read). > @@ +1722,5 @@ > > m_bytesAddedToLocalMsg = result.Length(); > > > > nsCOMPtr <nsISeekableStream> seekable; > > > > + MOZ_ASSERT(m_tempMessageStream); > > Add second argument here. This is more like a sanity check, and so I modified this as + MOZ_ASSERT(m_tempMessageStream, "Temporary message stream must not be nullptr. (Sanity check)."); > > @@ +1788,5 @@ > > if (seekable) > > { > > mDatabase->MarkOffline(messageKey, true, nullptr); > > + > > + MOZ_ASSERT(m_tempMessageStream); > > Add second argument here. Ditto as above. > > @@ +1818,5 @@ > > mDatabase->MarkOffline(messageKey, false, nullptr); > > // we should truncate the offline store at messageOffset > > ReleaseSemaphore(static_cast<nsIMsgFolder*>(this)); > > + if (msgStore) { > > + // DiscardNewMessage may not close the stream all the time. See bug 1121842 > > I don't think the reference to the bug helps much. I'd prefer to remove it. I removed it. So the comment reads as + // DiscardNewMessage may not close the stream all the time. > > @@ +1824,5 @@ > > + // Close m_tempMessageStream here if not closed by DiscardNewMessage. > > + if(!closed && m_tempMessageStream) { > > + rv2 = m_tempMessageStream->Close(); > > + } > > + m_tempMessageStream = nullptr; // catch reuse with SIGSEGV > > What? We catch a reuse with a SIGSEGV?? Put: > Avoid access of closed stream, or similar. Changed. (The original intention was to discover places where an already closed stream was illegally/unwantedly reused, by crashing the code.) + m_tempMessageStream = nullptr; // avoid accessing closed stream This change is also performed in the other patch files. > > @@ +1827,5 @@ > > + } > > + m_tempMessageStream = nullptr; // catch reuse with SIGSEGV > > + } > > + // We should check for errors of rv1 and rv2 but for now only > > + // prints error messages. > > English derailed here: ..., but for now we only print ... Fixed this. I modified the other patch files to reflect the suggested changes to MOZ_ASSERT() and changed the comment about SIGSEGV. So I refreshed the other patches, too. TIA
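For clarity, the construct being kept for now is, schematically (a sketch; the member name follows the snippets quoted above, and the early return is hypothetical):

  if (!m_tempMessageStream) {
    // In a debug build this always fires when the branch is reached; that is the
    // point: it flags a "cannot happen" state while the if-block stays in place,
    // so real error reporting/recovery can be dropped in later without re-indenting.
    MOZ_ASSERT(m_tempMessageStream,
               "Temporary message stream must not be nullptr. (Sanity check).");
    return NS_ERROR_UNEXPECTED;  // hypothetical early return for release builds
  }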
Comment on attachment 8832545 [details] [diff] [review] (take 2) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp Review of attachment 8832545 [details] [diff] [review]: ----------------------------------------------------------------- Nice! ::: mailnews/base/util/nsMsgDBFolder.cpp @@ +782,5 @@ > + > + // We are exiting early without doing the processing in the latter half of this > + // function. > + // One thing the latter half does must be repeated here before early return. > + // "If we could not find a offline stream, clear the offline flag on the msag and return false, Nits: Remove opening quote. "an offline stream". Typo: msag. @@ +1005,5 @@ > NS_ENSURE_ARG_POINTER(aMsgDatabase); > nsresult rv = OpenBackupMsgDatabase(); > +#ifdef DEBUG > + // Mounting linux CIFS-server and using it as mail store produced > + // errors with TB running under linux when network error is encountered. Nit: Linux. @@ +1826,1 @@ > #ifdef _DEBUG Not your fault, but should this be DEBUG instead of _DEBUG?
Attachment #8832545 - Flags: review?(jorgk) → review+
Comment on attachment 8832548 [details] [diff] [review] (take 2) 1242030-new-part-3-import.patch Review of attachment 8832548 [details] [diff] [review]: ----------------------------------------------------------------- This is fine. If you really want the paranoia, make it all the same. You realise that this is 80% dead code. Becky works, Apple, Outlook Express (x2) and Outlook are all dead. Sorry, I'm not doing the patches in order since I'm looking for some easy wins on a Saturday night ;-) ::: mailnews/import/applemail/src/nsAppleMailImport.cpp @@ +573,5 @@ > } > + if (!reusable) { > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 > + outStream = nullptr; > + } I think we can lose this paranoia. ::: mailnews/import/becky/src/nsBeckyMail.cpp @@ +536,5 @@ > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed); > + // Sanity check > + // This condition should never happen as of Jan, 2017. > + // See Bug 1242030 comment 80 > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); Or at least make all the paranoias the same. Here's a different variation with no setting of the stream to nullptr. @@ +560,2 @@ > if (NS_FAILED(rv)) > + msgStore->DiscardNewMessage(outputStream, msgHdr, &closed); And here you don't have paranoia. I didn't check whether 'reusable' is available here.
Attachment #8832548 - Flags: review?(jorgk) → feedback+
Thank you for the comments. I was away from my PC for the weekend. So I will look into the issues you raised tomorrow.
(In reply to Jorg K (GMT+1) from comment #97)
> Comment on attachment 8832545 [details] [diff] [review]
> (take 2) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl |
> nsMsgDBFolder.cpp
>
> Review of attachment 8832545 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Nice!

Thanks!

> ::: mailnews/base/util/nsMsgDBFolder.cpp
> @@ +782,5 @@
> > +
> > + // We are exiting early without doing the processing in the latter half of this
> > + // function.
> > + // One thing the latter half does must be repeated here before early return.
> > + // "If we could not find a offline stream, clear the offline flag on the msag and return false,
>
> Nits: Remove opening quote. "an offline stream". Typo: msag.

Fixed.

> @@ +1005,5 @@
> > NS_ENSURE_ARG_POINTER(aMsgDatabase);
> > nsresult rv = OpenBackupMsgDatabase();
> > +#ifdef DEBUG
> > + // Mounting linux CIFS-server and using it as mail store produced
> > + // errors with TB running under linux when network error is encountered.
>
> Nit: Linux.

OK, I thought that since Linux is not an acronym, it starts with a lowercase letter, unlike acronyms such as FORTRAN, which stands for FORmula TRANslation. I changed this and the other instances of "linux" that I had inserted into the patches to "Linux". (So I am updating all the patch files together.) However, do note that the predefined C macro under Linux is "linux". How confusing...

> @@ +1826,1 @@
> > #ifdef _DEBUG
>
> Not your fault, but should this be DEBUG instead of _DEBUG?

I think DEBUG is what is intended, so I changed it to that form. At the same time, I noticed there are "#ifdef DEBUG_bienvenu1" constructs in this file, and it could have been a mistyped form of that. But I have no idea.

TIA
Attachment #8832545 - Attachment is obsolete: true
Attachment #8834473 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #98)
> Comment on attachment 8832548 [details] [diff] [review]
> (take 2) 1242030-new-part-3-import.patch
>
> Review of attachment 8832548 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This is fine. If you really want the paranoia, make it all the same.

Sorry I am not being paranoid uniformly :-)

> You realise that this is 80% dead code. Becky works, Apple, Outlook Express
> (x2) and Outlook are all dead.
> Sorry, I'm not doing the patches in order since I'm looking for some easy
> wins on a Saturday night ;-)

That is fine! I only changed this file at the last moment, when I modified the parameters of a couple of functions, and ONLY AFTER I realized that these would not build under Windows (!) on try-central.

> ::: mailnews/import/applemail/src/nsAppleMailImport.cpp
> @@ +573,5 @@
> > }
> > + if (!reusable) {
> > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80
> > + outStream = nullptr;
> > + }
>
> I think we can lose this paranoia.

Fixed.

> ::: mailnews/import/becky/src/nsBeckyMail.cpp
> @@ +536,5 @@
> > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed);
> > + // Sanity check
> > + // This condition should never happen as of Jan, 2017.
> > + // See Bug 1242030 comment 80
> > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen.");
>
> Or at least make all the paranoias the same. Here's a different variation
> with no setting of the stream to nullptr.

Well, since this is near the end of the function in which this code snippet appears, I thought it was a little overkill to set the stream to nullptr. But for the sake of code uniformity (which should make review easier), I inserted the assignment to nullptr. I also did this in another place for the sake of code uniformity. In that place the stream is actually set to another value, but again code uniformity shall prevail.

> @@ +560,2 @@
> > if (NS_FAILED(rv))
> > + msgStore->DiscardNewMessage(outputStream, msgHdr, &closed);
>
> And here you don't have paranoia. I didn't check whether 'reusable' is
> available here.

Well, in reviewing this myself, I could not stand the failure to check the return value of ->Close() near that line. So I modified the code quite a bit from the viewpoint of error checking. I am not sure whether you will like it or not.

TIA
Attachment #8832548 - Attachment is obsolete: true
Attachment #8834478 - Flags: review?(jorgk)
"linux" -> "Linux" change, etc.
Attachment #8832549 - Attachment is obsolete: true
Attachment #8832549 - Flags: review?(jorgk)
Attachment #8834480 - Flags: review?(jorgk)
"linux" -> "Linux" change, etc.
Attachment #8832550 - Attachment is obsolete: true
Attachment #8832550 - Flags: review?(jorgk)
Attachment #8834482 - Flags: review?(jorgk)
The imap patch did not contain any instances of "linux" introduced by me.
Comment on attachment 8834473 [details] [diff] [review] (take 3) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp #ifdef DEBUG_bienvenu are sadly leftovers from the quirky old past. Could you please go through your patch again and make all the debug output the same. In nsPop3Sink.cpp we settled for: fprintf(stderr,"(seekdebug) ... so here I'd like to have: fprintf(stderr,"(debug) ... You have a mix in your patch: fprintf(stderr,"(debug) nsMsgDBFolder ... fprintf(stderr,"{debug} %s:%d Strange ... fprintf(stderr, "nsMsgDBFolder.cpp:%d: ... fprintf(stderr,"OpenBackupMsgDatabase(); returns ... And inherited from D/B: #ifdef DEBUG nsAutoCString message("Offline message too small: messageSize="); nsAutoCString message("Offline message too small: messageSize="); message.AppendInt(messageSize); Please convert this into a print as well. Otherwise a neat function to do this sort of thing is nsPrintfCString. r+ with this addressed.
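(A sketch of the suggested conversion, assuming |messageSize|, |m_numOfflineMsgLines| and |m_bytesAddedToLocalMsg| are the variables used in nsMsgDBFolder.cpp and have the usual 32-bit integer types; nsPrintfCString is declared in "nsPrintfCString.h".)

fprintf(stderr, "(debug) %s\n",
        nsPrintfCString("Offline message too small: messageSize=%u"
                        " numOfflineMsgLines=%d bytesAdded=%d",
                        messageSize, m_numOfflineMsgLines,
                        m_bytesAddedToLocalMsg).get());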
Attachment #8834473 - Flags: review?(jorgk) → review+
(In reply to Jorg K (GMT+1) from comment #105) > Comment on attachment 8834473 [details] [diff] [review] > (take 3) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | > nsMsgDBFolder.cpp > > #ifdef DEBUG_bienvenu are sadly leftovers from the quirky old past. > > Could you please go through your patch again and make all the debug output > the same. In nsPop3Sink.cpp we settled for: > fprintf(stderr,"(seekdebug) ... > so here I'd like to have: > fprintf(stderr,"(debug) ... > You have a mix in your patch: > fprintf(stderr,"(debug) nsMsgDBFolder ... > fprintf(stderr,"{debug} %s:%d Strange ... > fprintf(stderr, "nsMsgDBFolder.cpp:%d: ... > fprintf(stderr,"OpenBackupMsgDatabase(); returns ... > And inherited from D/B: > #ifdef DEBUG > nsAutoCString message("Offline message too small: messageSize="); > nsAutoCString message("Offline message too small: messageSize="); > message.AppendInt(messageSize); > Please convert this into a print as well. > Otherwise a neat function to do this sort of thing is nsPrintfCString. > > r+ with this addressed. I will convert the printed strings to along the line of > fprintf(stderr,"(debug) ... TIA
Comment on attachment 8834478 [details] [diff] [review] (Take 3) 1242030-new-part-3-import.patch Review of attachment 8834478 [details] [diff] [review]: ----------------------------------------------------------------- Please make this consistent. You haven't noticed that OE import has been removed in the meantime? ::: mailnews/import/applemail/src/nsAppleMailImport.cpp @@ +590,3 @@ > } > + > + // XXX we may want to report more detailed error according to the values of |rv| or |rv2| Nit. Uppercase after the XXX and full stop at the end. @@ +592,5 @@ > + // XXX we may want to report more detailed error according to the values of |rv| or |rv2| > + if (NS_FAILED(rv) || NS_FAILED(rv2)) { > + ReportStatus(u"ApplemailImportMailboxConvertError", mailboxName, errorLog); > + SetLogs(successLog, errorLog, aSuccessLog, aErrorLog); > + return NS_ERROR_FAILURE; I'm not so happy to change existing behaviour here. If there was an error, we should still go through to the end. Besides, this is dead code. So this will never be tested. ::: mailnews/import/becky/src/nsBeckyMail.cpp @@ +535,5 @@ > + bool closed = false; > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed); > + // Sanity check > + // This condition should never happen as of Jan, 2017. > + // See Bug 1242030 comment 80 Lose the comment. @@ +536,5 @@ > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed); > + // Sanity check > + // This condition should never happen as of Jan, 2017. > + // See Bug 1242030 comment 80 > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); Personally, I'd lose the paranoia, too. @@ +537,5 @@ > + // Sanity check > + // This condition should never happen as of Jan, 2017. > + // See Bug 1242030 comment 80 > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); > + outputStream = nullptr; // for code uniformity This isn't right. The old code was: if (!reusable) outputStream->Close(); That's pretty much nonsense since if it's not reusable, was already closed in FinishNewMessage(). So as above this should read: if (closed) outputStream = nullptr; // for code uniformity Or am I missing something? ::: mailnews/import/oexpress/nsOE5File.cpp @@ +474,4 @@ > > + if (!reusable) { > + MOZ_ASSERT(closed, "FinishNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 > + outputStream = nullptr; Above you had |if (closed) outputStream = nullptr;|. And the paranoia looked differently, too: MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); ::: mailnews/import/oexpress/nsOEMailbox.cpp @@ +344,5 @@ > IMPORT_LOG0( "Mbx CopyMbxFileBytes failed\n"); > } > if (!reusable) > { > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 Same comment as above. ::: mailnews/import/outlook/src/nsOutlookMail.cpp @@ +435,3 @@ > } > + if (!reusable) { > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 Same comment as above.
Attachment #8834478 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #107)
> Comment on attachment 8834478 [details] [diff] [review]
> (Take 3) 1242030-new-part-3-import.patch
>
> Review of attachment 8834478 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Please make this consistent. You haven't noticed that OE import has been
> removed in the meantime?

I will revise the patch. Indeed, I did not realize that OE import is gone. I refreshed the source and only then did I realize that the files are gone; the patch won't apply. Strangely, DXR still shows the files. I guess it is the delay between the servers and the original HG repository, but it surely is confusing, and I can't find a log entry that says anything about the oexpress removal (I searched for "oexpress" case-insensitively.)

Anyway, I will upload the modified patch.
(In reply to ISHIKAWA, Chiaki from comment #106) > (In reply to Jorg K (GMT+1) from comment #105) > > Comment on attachment 8834473 [details] [diff] [review] > > (take 3) 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | > > nsMsgDBFolder.cpp > > > > #ifdef DEBUG_bienvenu are sadly leftovers from the quirky old past. > > > > Could you please go through your patch again and make all the debug output > > the same. In nsPop3Sink.cpp we settled for: > > fprintf(stderr,"(seekdebug) ... > > so here I'd like to have: > > fprintf(stderr,"(debug) ... > > You have a mix in your patch: > > fprintf(stderr,"(debug) nsMsgDBFolder ... > > fprintf(stderr,"{debug} %s:%d Strange ... > > fprintf(stderr, "nsMsgDBFolder.cpp:%d: ... > > fprintf(stderr,"OpenBackupMsgDatabase(); returns ... > > And inherited from D/B: > > #ifdef DEBUG > > nsAutoCString message("Offline message too small: messageSize="); > > nsAutoCString message("Offline message too small: messageSize="); > > message.AppendInt(messageSize); > > Please convert this into a print as well. > > Otherwise a neat function to do this sort of thing is nsPrintfCString. > > > > r+ with this addressed. > > I will convert the printed strings to along the line of > > fprintf(stderr,"(debug) ... > > TIA I did this for relevant patches in a wholesale manner. Also, I applied the changed mentioned in the comment below. (In reply to Jorg K (GMT+1) from comment #107) > Comment on attachment 8834478 [details] [diff] [review] > (Take 3) 1242030-new-part-3-import.patch > > Review of attachment 8834478 [details] [diff] [review]: > ----------------------------------------------------------------- > > Please make this consistent. You haven't noticed that OE import has been > removed in the meantime? > OE import has been deprecated and so the patch does not mention the OE files any more. (Probably the people who used old OE can now first import the old e-mails to the latest MS mailer and then import the messages to TB now. But I don't use MS mailer, and so can't comment on the suitability of such usage.) > ::: mailnews/import/applemail/src/nsAppleMailImport.cpp > @@ +590,3 @@ > > } > > + > > + // XXX we may want to report more detailed error according to the values of |rv| or |rv2| > > Nit. Uppercase after the XXX and full stop at the end. > Done. > @@ +592,5 @@ > > + // XXX we may want to report more detailed error according to the values of |rv| or |rv2| > > + if (NS_FAILED(rv) || NS_FAILED(rv2)) { > > + ReportStatus(u"ApplemailImportMailboxConvertError", mailboxName, errorLog); > > + SetLogs(successLog, errorLog, aSuccessLog, aErrorLog); > > + return NS_ERROR_FAILURE; > > I'm not so happy to change existing behaviour here. If there was an error, > we should still go through to the end. Fixed. We now go on processing without returning. > Besides, this is dead code. So this will never be tested. (Is Apple Mail, too? But now at least it compiles.) > ::: mailnews/import/becky/src/nsBeckyMail.cpp > @@ +535,5 @@ > > + bool closed = false; > > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed); > > + // Sanity check > > + // This condition should never happen as of Jan, 2017. > > + // See Bug 1242030 comment 80 > > Lose the comment. Done. > > @@ +536,5 @@ > > + rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed); > > + // Sanity check > > + // This condition should never happen as of Jan, 2017. 
> > + // See Bug 1242030 comment 80 > > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); > > Personally, I'd lose the paranoia, too. Done. > @@ +537,5 @@ > > + // Sanity check > > + // This condition should never happen as of Jan, 2017. > > + // See Bug 1242030 comment 80 > > + MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); > > + outputStream = nullptr; // for code uniformity > > This isn't right. The old code was: > if (!reusable) outputStream->Close(); > That's pretty much nonsense since if it's not reusable, was already closed > in FinishNewMessage(). > So as above this should read: > if (closed) > outputStream = nullptr; // for code uniformity > Or am I missing something? This part and other parts have been changed somewhat especially the part that was touched in a hurry about a week ago. > ::: mailnews/import/oexpress/nsOE5File.cpp > @@ +474,4 @@ > > > > + if (!reusable) { > > + MOZ_ASSERT(closed, "FinishNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 > > + outputStream = nullptr; > > Above you had |if (closed) outputStream = nullptr;|. > And the paranoia looked differently, too: > MOZ_ASSERT(reusable || closed, "!reusable && !closed should not happen."); Fixed/Modified. > ::: mailnews/import/oexpress/nsOEMailbox.cpp > @@ +344,5 @@ > > IMPORT_LOG0( "Mbx CopyMbxFileBytes failed\n"); > > } > > if (!reusable) > > { > > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 > > Same comment as above. Fixed/modified. > ::: mailnews/import/outlook/src/nsOutlookMail.cpp > @@ +435,3 @@ > > } > > + if (!reusable) { > > + MOZ_ASSERT(closed, "DiscardNewMessage ought to have closed the stream for non-reusable stream"); // See Bug 1242030 comment 80 > > Same comment as above. Fixed/Modified. I am going to post the updated patches now.
Fixed |fprintf(stderr, "(debug) ..."| style.
Attachment #8834473 - Attachment is obsolete: true
Attachment #8838451 - Flags: review+
Fixed some flawed logic/modifications according to comment 107.
Attachment #8834478 - Attachment is obsolete: true
Attachment #8838457 - Flags: review?(jorgk)
"(debug) " string modification. I also added missing " ": |stderr,"| => |stderr, "|
Attachment #8834480 - Attachment is obsolete: true
Attachment #8834480 - Flags: review?(jorgk)
"(debug) " string modification. I also added missing " ": |stderr,"| => |stderr, "| (so now I changed a few existing lines as well.)
Attachment #8834482 - Attachment is obsolete: true
Attachment #8834482 - Flags: review?(jorgk)
Attachment #8838460 - Flags: review?(jorgk)
Attachment #8838459 - Flags: review?(jorgk)
Comment on attachment 8838451 [details] [diff] [review] (take 4: 1242030-new-part-1.patch: nsIMsgPluggableStore.idl | nsMsgDBFolder.cpp r=jorgk Review of attachment 8838451 [details] [diff] [review]: ----------------------------------------------------------------- I had to revisit this, sorry. ::: mailnews/base/util/nsMsgDBFolder.cpp @@ +776,5 @@ > rv = GetMsgInputStream(hdr, &reusable, aFileStream); > + > + if (NS_FAILED(rv)) { > +#ifdef DEBUG > + fprintf(stderr, "(debug) nsMsgDBFolder::GetOfflineFileStream: GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%08xn", (unsigned int) rv); Sadly I have to come back to part 1. After landing of bug 1340755 you need to print this differently, see: https://hg.mozilla.org/comm-central/rev/541879fbcbe89ae7827850d6807e17ca95013eb2 for examples. @@ +842,5 @@ > +#ifdef DEBUG > + // DEBUG: the error happened while checking network outage condition. > + // XXX TODO We may need to check if dovecot and other imap > + // servers are returning funny "From " line, etc. > + fprintf(stderr, "(debug) %s:%d Strange startOfMsg: <<%s>> \n", __FILE__, __LINE__, startOfMsg); I just noticed another inconsistent debug print. Either put file/line everywhere or remove it everywhere. @@ +896,5 @@ > + // Quite likely a READ error at low-level. > + if (NS_FAILED(rv)) > + { > + fprintf(stderr, "(debug) nsMsgDBFolder.cpp:%d: nsMsgDBFolder::GetMsgInputStream: msgStore->GetMsgInputStream(this, ...) returned error rv=%08x\n", > + __LINE__, (unsigned int) rv); fix print here. @@ +1009,5 @@ > + // errors with TB running under Linux when network error is encountered. > + // 80550006: Mailnews, failure, no = 6 > + if (NS_FAILED(rv)) > + { > + fprintf(stderr, "(debug) OpenBackupMsgDatabase(); returns error=0x%08x\n", (unsigned int) rv); and here. @@ +1709,5 @@ > m_bytesAddedToLocalMsg = result.Length(); > > nsCOMPtr <nsISeekableStream> seekable; > > + MOZ_ASSERT(m_tempMessageStream, "Temporary message stream must not be nullptr. (Sanity check)."); While you're here, remove "(Sanity check)". @@ +1775,5 @@ > if (seekable) > { > mDatabase->MarkOffline(messageKey, true, nullptr); > + > + MOZ_ASSERT(m_tempMessageStream, "Temporary message stream must not be nullptr. (Sanity check)."); and here. @@ +1810,5 @@ > + rv1 = msgStore->DiscardNewMessage(m_tempMessageStream, m_offlineHeader, &closed); > + // Close m_tempMessageStream here if not closed by DiscardNewMessage. > + if(!closed && m_tempMessageStream) { > + rv2 = m_tempMessageStream->Close(); > + } You've changed the behaviour here. Can it happen that msgStore==false? The old code closed the stream in this case. Also, DiscardNewMessage() only closes the stream for maildir, not mbox. With your new code, the Close() will now always happen, in the old code the stream wasn't closed for mbox (since DiscardNewMessage() didn't close it). Is this change intentional? I guess so, since further down you're exiting with an error. Perhaps you could add a short comment saying what you want to do and then make the code reflect this. Like: if (msgStore) { // DiscardNewMessage may not close the stream all the time. rv1 = msgStore->DiscardNewMessage(m_tempMessageStream, m_offlineHeader, &closed); } // Always close stream since we're inside an error branch. 
if (!closed && m_tempMessageStream) { rv2 = m_tempMessageStream->Close(); } @@ +1831,5 @@ > message.Append(" numOfflineMsgLines="); > message.AppendInt(m_numOfflineMsgLines); > message.Append(" bytesAdded="); > message.AppendInt(m_bytesAddedToLocalMsg); > + fprintf(stderr, "(debug) %s\n", message.get()); I think we can lose the complicated assembly of the message like we did in the other patch.
Attachment #8838451 - Flags: review+
Comment on attachment 8838457 [details] [diff] [review]
(take 4) 1242030-new-part-3-import-rev04.patch

Review of attachment 8838457 [details] [diff] [review]:
-----------------------------------------------------------------

I really have a problem here. Apple and Outlook are currently dead, so there is no way you tested this. If I understand it correctly, you must leave streams open for mbox to write more imported messages. Only for maildir can you close the stream. You've changed that behaviour, so I assume Becky import on mbox will not work any more. r- for that.

::: mailnews/import/applemail/src/nsAppleMailImport.cpp
@@ +529,5 @@
>
> + bool closed = false;
> +
> + // Handle |Close| error in the following.
> + nsresult rv2;

Can you move this to where it's used?

@@ +535,3 @@
> while (NS_SUCCEEDED(directoryEnumerator->HasMoreElements(&hasMore)) && hasMore) {
> + rv = NS_OK;
> + rv2 = NS_OK;

Is this necessary? I don't think so, see below.

@@ +577,5 @@
> }
> + if (outStream) {
> + if (!closed)
> + rv2 = outStream->Close();
> + outStream = nullptr;

Sorry, this seems to be wrong. You close the output stream here in all cases. You should only close it for the !reusable case as the original code did.

@@ -573,3 @@
> }
> - if (outStream)
> - outStream->Close();

You need this close after the while loop in the reusable case.

::: mailnews/import/becky/src/nsBeckyMail.cpp
@@ +538,1 @@
> outputStream->Close();

You're changing the behaviour here. The original code left the stream open in the mbox case.

::: mailnews/import/outlook/src/nsOutlookMail.cpp
@@ +439,1 @@
> outputStream->Close();

Same problem here.

@@ -438,3 @@
>
> - if (outputStream)
> - outputStream->Close();

You need this.
Attachment #8838457 - Flags: review?(jorgk) → review-
(In reply to Jorg K (GMT+1) from comment #115)
> Comment on attachment 8838457 [details] [diff] [review]
> (take 4) 1242030-new-part-3-import-rev04.patch
>
> Review of attachment 8838457 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I really have a problem here. Apple and Outlook are currently dead, so there
> is no way you tested this.

Hi,
I will revert my latest mods (done in January) to the files under the /import directory to the older ones that were compiled on tryserver from 2015 to 2016 [and presumably they were tested there, no? After reading your comment, I am now not so sure whether there are tests for importing e-mails under Windows and OSX.]

I will revert the gratuitous changes I threw in last month and only address the comments, debug print statements, and other relevant changes you suggested.

TIA
(In reply to ISHIKAWA, Chiaki from comment #116) > (In reply to Jorg K (GMT+1) from comment #115) > > Comment on attachment 8838457 [details] [diff] [review] > > (take 4) 1242030-new-part-3-import-rev04.patch > > > > Review of attachment 8838457 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > I really have a problem here. Apple and Outlook are currently dead, so there > > is no way you tested this. > > > > HI, > I will revert my latest mods (done in January) to the files under /import > directory to the older one that have been compiled on tryserver from 2015 to > 2016 [and presumably they are tested there, no? After reading your comment, > now I am not so sure if there are tests for importing e-mails under Windows > and OSX.] > > I will revert the gratuitous changes I threw in last mongth and only address > the comment, debug print statements, and other relevant changes you > suggested. > > TIA This is the mods I mentioned earlier. TIA
Attachment #8842485 - Flags: review?(jorgk)
This is actually more like a work-in-progress patch based on your earlier feedback, but I have to upload it to ask you a couple of questions.

(In reply to Jorg K (GMT+1) from comment #114)
> Comment on attachment 8838451 [details] [diff] [review]
> (take 4: 1242030-new-part-1.patch: nsIMsgPluggableStore.idl |
> nsMsgDBFolder.cpp r=jorgk
>
> Review of attachment 8838451 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I had to revisit this, sorry.
>
> ::: mailnews/base/util/nsMsgDBFolder.cpp
> @@ +776,5 @@
> > rv = GetMsgInputStream(hdr, &reusable, aFileStream);
> > +
> > + if (NS_FAILED(rv)) {
> > +#ifdef DEBUG
> > + fprintf(stderr, "(debug) nsMsgDBFolder::GetOfflineFileStream: GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%08xn", (unsigned int) rv);
>
> Sadly I have to come back to part 1. After landing of bug 1340755 you need
> to print this differently, see:
> https://hg.mozilla.org/comm-central/rev/
> 541879fbcbe89ae7827850d6807e17ca95013eb2 for examples.

OK, I changed this patch to go with the cast used in the said patch.

*HOWEVER*, I had already applied proper casts where I wanted to print error return values. I verified that they compile and work on tryserver (linux, linux64, OSX, win32, and win64). Do I HAVE to modify the already working casts a la the mentioned patch? They don't look neat and add nothing to the sanity of the code as far as I can tell. If the change in the mentioned patch is just to placate the C compiler's format check, my patch already accommodates that, albeit slightly differently. If it is OK, I would rather not bother with this format string issue now that my patch files already *work*.

> @@ +842,5 @@
> > +#ifdef DEBUG
> > + // DEBUG: the error happened while checking network outage condition.
> > + // XXX TODO We may need to check if dovecot and other imap
> > + // servers are returning funny "From " line, etc.
> > + fprintf(stderr, "(debug) %s:%d Strange startOfMsg: <<%s>> \n", __FILE__, __LINE__, startOfMsg);
>
> I just noticed another inconsistent debug print. Either put file/line
> everywhere or remove it everywhere.

Actually, it would be great if I could print the function name in addition to FILE and LINE. Is __func__ or __FUNCTION__ an admissible macro? I think I can create a macro along the lines of the following:

#if defined(__GNUC__)
#define CODEPOS _CODEPOS(__FILE__, __LINE__, " in " __func__)
#elif defined(_MSC_VER)
#define CODEPOS _CODEPOS(__FILE__, __LINE__, " in " __FUNCTION__)
#else
#define CODEPOS _CODEPOS(__FILE__, __LINE__, "")
#endif
#define _CODEPOS(a, b, c) #a ":" #b c

which I hope will produce something like

filepath:1234 in functionname

depending on the compiler. I will use this STRING instead of __FILE__ and __LINE__ separately, printing the location with a single "%s" in the format string. Do you have any preference for the FILE, LINE, FUNCNAME format?

My other question is: do you want me to change all the other patches so that the debug prints using fprintf() look like

fprintf(stderr, "(debug) %s ...", CODEPOS ...

that is, file, line, etc. information for ALL the "(debug) ..." messages? I can certainly do that, I suppose, although it would take a little time (since I want to make sure that my other patches still apply after such a sweeping change).

> @@ +896,5 @@
> > + // Quite likely a READ error at low-level.
> > + if (NS_FAILED(rv))
> > + {
> > + fprintf(stderr, "(debug) nsMsgDBFolder.cpp:%d: nsMsgDBFolder::GetMsgInputStream: msgStore->GetMsgInputStream(this, ...)
returned error rv=%08x\n", > > + __LINE__, (unsigned int) rv); > > fix print here. > I would handle the filepath, line number issue. (the format string issue is already taken care of.) > @@ +1009,5 @@ > > + // errors with TB running under Linux when network error is encountered. > > + // 80550006: Mailnews, failure, no = 6 > > + if (NS_FAILED(rv)) > > + { > > + fprintf(stderr, "(debug) OpenBackupMsgDatabase(); returns error=0x%08x\n", (unsigned int) rv); > > and here. > Ditto. > @@ +1709,5 @@ > > m_bytesAddedToLocalMsg = result.Length(); > > > > nsCOMPtr <nsISeekableStream> seekable; > > > > + MOZ_ASSERT(m_tempMessageStream, "Temporary message stream must not be nullptr. (Sanity check)."); > > While you're here, remove "(Sanity check)". Removed. > > @@ +1775,5 @@ > > if (seekable) > > { > > mDatabase->MarkOffline(messageKey, true, nullptr); > > + > > + MOZ_ASSERT(m_tempMessageStream, "Temporary message stream must not be nullptr. (Sanity check)."); > > and here. Done. > > @@ +1810,5 @@ > > + rv1 = msgStore->DiscardNewMessage(m_tempMessageStream, m_offlineHeader, &closed); > > + // Close m_tempMessageStream here if not closed by DiscardNewMessage. > > + if(!closed && m_tempMessageStream) { > > + rv2 = m_tempMessageStream->Close(); > > + } > > You've changed the behaviour here. Can it happen that msgStore==false? The > old code closed the stream in this case. > > Also, DiscardNewMessage() only closes the stream for maildir, not mbox. With > your new code, the Close() will now always happen, in the old code the > stream wasn't closed for mbox (since DiscardNewMessage() didn't close it). > Is this change intentional? I guess so, since further down you're exiting > with an error. > > Perhaps you could add a short comment saying what you want to do and then > make the code reflect this. > > Like: > if (msgStore) { > // DiscardNewMessage may not close the stream all the time. > rv1 = msgStore->DiscardNewMessage(m_tempMessageStream, m_offlineHeader, > &closed); > } > // Always close stream since we're inside an error branch. > if (!closed && m_tempMessageStream) { > rv2 = m_tempMessageStream->Close(); > } > I added the comment per your suggestion. > @@ +1831,5 @@ > > message.Append(" numOfflineMsgLines="); > > message.AppendInt(m_numOfflineMsgLines); > > message.Append(" bytesAdded="); > > message.AppendInt(m_bytesAddedToLocalMsg); > > + fprintf(stderr, "(debug) %s\n", message.get()); > > I think we can lose the complicated assembly of the message like we did in > the other patch. I dumped them into a single fprintf() statement, but again you want me to add filepath, linenumber, etc. information if I want to add to "(debug) ..." strings uniformly, correct? TIA
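(For what it's worth, a hedged sketch of how such a position string can be built portably; the macro names are made up. __FILE__ and __LINE__ can be pasted into a string literal at preprocessing time only through a two-level stringification macro, and __func__ (C++11) is not a macro but a predefined identifier, so it has to be passed as a separate run-time argument rather than concatenated by the preprocessor.)

#define CODEPOS_STR2(x) #x
#define CODEPOS_STR(x)  CODEPOS_STR2(x)
#define CODEPOS __FILE__ ":" CODEPOS_STR(__LINE__)

// Example use in one of the debug prints:
fprintf(stderr, "(debug) %s in %s: Strange startOfMsg\n", CODEPOS, __func__);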
Attachment #8842495 - Flags: review?(jorgk)
Attachment #8838457 - Attachment is obsolete: true
Attachment #8838451 - Attachment is obsolete: true
Comment on attachment 8838459 [details] [diff] [review] (take 4) 1242030-new-part-5-pop3.patch Wrong patch: The file you've enclosed here it for part 4.
Attachment #8838459 - Flags: review?(jorgk)
I did a little clean-up run here to avoid any more review cycles. Please provide part 5 (which was missing since you uploaded part 4 twice).
Attachment #8838459 - Attachment is obsolete: true
Attachment #8842495 - Attachment is obsolete: true
Attachment #8842495 - Flags: review?(jorgk)
Attachment #8843616 - Flags: review+
This is how I think it should be. Note that all the code is consistent across the three importers now.
Attachment #8842485 - Attachment is obsolete: true
Attachment #8842485 - Flags: review?(jorgk)
Attachment #8843622 - Flags: review+
Damn, I got this wrong. !reusable==true means maildir and closed==true. reusable==true means mbox and closed==false. So |if (!reusable)| the file is already closed, no need to close it again, all we can do is set the stream pointer to null. If reusable==true, we don't want to do anything in the loop. And I already said so in comment #107: === if (!reusable) outputStream->Close(); That's pretty much nonsense since if it's not reusable, was already closed in FinishNewMessage(). So as above this should read: if (closed) outputStream = nullptr; // for code uniformity Or am I missing something? ===
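(Summarising the above as code, a sketch only; |reusable| is the flag handed out when the stream was obtained, |closed| is the out-parameter of the patched FinishNewMessage().)

rv = msgStore->FinishNewMessage(outputStream, msgHdr, &closed);
// maildir: reusable == false, closed == true  -> drop our reference to the closed stream
// mbox:    reusable == true,  closed == false -> keep the stream open for the next message
if (closed)
  outputStream = nullptr;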
Attachment #8843622 - Attachment is obsolete: true
Attachment #8843731 - Flags: review+
Oops, wrong indentation.
Attachment #8843731 - Attachment is obsolete: true
Attachment #8843733 - Flags: review+
Attached patch 1242030-new-part-5-pop3.patch (obsolete) — Splinter Review
(In reply to Jorg K (GMT+1) from comment #120) > Created attachment 8843616 [details] [diff] [review] > 1242030-new-part-1.patch (JK v1). > > I did a little clean-up run here to avoid any more review cycles. Please > provide part 5 (which was missing since you uploaded part 4 twice). Thank you for checking the time while there are other pressing bugs. Here is part-5. TIA
Attachment #8844023 - Flags: review?(jorgk)
(In reply to ISHIKAWA, Chiaki from comment #124) > Thank you for checking the time while there are other pressing bugs. Sadly there were a few fires lately, and also the preparation for TB 52. Maybe on the weekend.
Make it even more consistent ;-) Also removed needless resetting of local variable which will go out of scope immediately.
Attachment #8843733 - Attachment is obsolete: true
Attachment #8846441 - Flags: review+
I had time to clean up half the IMAP patch. There is a spot where I was unsure. In nsImapMailFolder::CopyMessagesOffline() you can't close the stream, can you? For mbox you need to keep it open, right? See XXX Chiaki.

After you've answered this I'll finish off the second part. Take a look at the interdiff to see what I've changed.
Attachment #8832546 - Attachment is obsolete: true
Attachment #8832546 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #127) > Created attachment 8846451 [details] [diff] [review] > 1242030-new-part-2-imap.patch (JK v1). > > I had time to clean up half the IMAP patch. There is a spot were I was > unsure. In nsImapMailFolder::CopyMessagesOffline() you can't close the > stream, can you? For mbox you need to keep it open, right? See XXX Chiaki. > > After you answered this I'll finish off the second part. Take a look at the > interdiff to see what I've changed. Thank you very much! (I guess the dust settles down with the broken build on try servers?) I will look into the issue and report back.
(In reply to ISHIKAWA, Chiaki from comment #129) > I guess the dust settles down with the broken build on try servers? I don't understand. C-C is building most of the time and I'm working hard to keep it that way. So before you do a try build, take a look at the C-C treeherder. If it's green and the builds are working, *rebase* locally before pushing to try, then the build will work.
(In reply to Jorg K (GMT+1) from comment #130)
> (In reply to ISHIKAWA, Chiaki from comment #129)
> > I guess the dust settles down with the broken build on try servers?
> I don't understand. C-C is building most of the time and I'm working hard to
> keep it that way. So before you do a try build, take a look at the C-C
> treeherder. If it's green and the builds are working, *rebase* locally
> before pushing to try, then the build will work.

Thank you for keeping C-C TB in good shape so that its build succeeds. I understand that; I was referring to the broken build last weekend, when something went haywire (a header or something) and the tree was closed. I think it is over now, correct?

I will update the patch after checking the question you posed.

TIA
I tried to keep tree closures to a minimum, so far I'm doing well. No need to update anything, just answer the question, see XXX Chiaki.
Thank you again for your clean-up and comment.
Back to your original question in comment 127:

(In reply to Jorg K (GMT+1) from comment #127)
> Created attachment 8846451 [details] [diff] [review]
> 1242030-new-part-2-imap.patch (JK v1).
>
> I had time to clean up half the IMAP patch. There is a spot where I was
> unsure. In nsImapMailFolder::CopyMessagesOffline() you can't close the
> stream, can you? For mbox you need to keep it open, right? See XXX Chiaki.
>
> After you've answered this I'll finish off the second part. Take a look at the
> interdiff to see what I've changed.

I think we can "close the stream".

First of all, the original FinishNewMessage() [before my modification] always closed the first argument stream no matter what. So the stream was closed before in the spot where the issue was raised.

So then, my question would be: was the original code buggy in that it did not re-create the output stream on the next loop iteration? It DOES re-create the output stream on every iteration, I think, by using GetOfflineStoreOutputStream():

      if (inputStream && hasMsgOffline && !isLocked) {
-- Here -->  rv = GetOfflineStoreOutputStream(newMailHdr, getter_AddRefs(outputStream));
-- Oh? --->  NS_ENSURE_SUCCESS(rv, rv);
        CopyOfflineMsgBody(srcFolder, newMailHdr, mailHdr, inputStream, outputStream);
        nsCOMPtr<nsIMsgPluggableStore> offlineStore;
        (void) GetMsgStore(getter_AddRefs(offlineStore));
        if (offlineStore) {
          bool closed = false;
          nsresult rv2 = offlineStore->FinishNewMessage(outputStream, newMailHdr, &closed);
          if (NS_FAILED(rv2))
            NS_WARNING("msgStore->FinishNewMessage(...) failed");
          if (closed)
            outputStream = nullptr;
          // XXX Chiaki - If this is mbox we don't want to close the stream, do we??
          // This runs in a loop and we can't close the stream??
        }
      }

So maybe we can change the comment to read

// We close the stream here; on the next loop iteration it will be re-created by GetOfflineStoreOutputStream().

or similar.

Then it dawned on me: oh well, does this mean we now leak the OUTPUT STREAM when outputStream is not null upon the next loop iteration? It seems so. Before my change to FinishNewMessage(), the stream was always closed. So we should always close the stream (we just need to avoid the double close). I have now taken care of this issue in my local patch and am uploading a patch since I verified it compiles locally.

*HOWEVER*, as I look at the code I have an uneasy feeling. This has nothing to do with MY patch. Near line 7270, we have a comment that reads

// N.B. We must not return out of the for loop - we need the matching
// end notifications to be sent.

I really hate to see such a very big open-coded loop. The loop does not fit in a proverbial screenful (even on my 4K display!). After reading the loop carefully, I found that the current code seems to obey this "not return out of the for loop" thingy EXCEPT for the NS_ENSURE_SUCCESS() I marked with "-- Oh? --->" above.

Maybe we have to open-code the NS_ENSURE_SUCCESS() to become something like

if (NS_FAILED(rv)) {
  // print some error stuff or something.
  continue;
}

But at this stage I am not sure what we need to do in the failure case at all. So we can probably simply insert

// XXX TODO WARNING. We are breaking out of the loop despite the earlier comment.
// We may want to print an error/warning message and continue the loop.

or similar, and let someone else figure this out later? This is what I inserted in the patch which I am uploading now.

TIA
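(A sketch of what open-coding that NS_ENSURE_SUCCESS() could look like, so that the loop keeps going and the matching end notifications are still sent; the warning text is illustrative only.)

rv = GetOfflineStoreOutputStream(newMailHdr, getter_AddRefs(outputStream));
if (NS_FAILED(rv)) {
  NS_WARNING("GetOfflineStoreOutputStream() failed; skipping the offline copy of this message");
  continue;  // do not return out of the for loop
}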
Attachment #8847162 - Flags: review?(jorgk)
(In reply to ISHIKAWA, Chiaki from comment #133)
> I think we can "close the stream".
> First of all, the original FinishNewMessage() [before my modification]
> always closed the first argument stream no matter what.
> So the stream was closed before in the spot where the issue was raised.

I think you confused yourself here but you won't confuse me.

Please check nsMsgBrkMBoxStore::FinishNewMessage() as currently implemented and after your change in part 4. The stream passed in is *NOT* closed. That's the whole idea of mbox.

nsMsgDBFolder::GetOfflineStoreOutputStream() calls nsMsgBrkMBoxStore::GetNewMsgOutputStream() for mbox and that function retrieves the stream from a hash table of open streams with |if (m_outputStreams.Get(URI, aResult))|.

Note the comment here:
https://dxr.mozilla.org/comm-central/rev/d03f6d1f69374c85d4939d0c7cb92021614600f4/mailnews/local/src/nsMsgBrkMBoxStore.h#39

So I believe the answer is: You shouldn't close the stream.

As you know, I was never a great friend of changing the IDL to pass out the 'closedflag'. I've just realised that nsMsgBrkMBoxStore::DiscardNewMessage() previously closed the output stream, so I see no reason to change that and to replicate closing the stream into the respective callers.

The next question I have is why FinishNewMessage() needs to change. Typically it is called on a stream obtained from GetNewMsgOutputStream(), and that already passes out the reusable flag. In some cases it's obtained from GetOfflineStoreOutputStream(), which sadly doesn't pass out the reusable flag. Since there are only a few calls, I suggest changing that interface instead and passing out the reusable flag there.
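(A sketch of what the proposed change could look like at a call site; the exact parameter order of the extended GetOfflineStoreOutputStream() is a guess, not the reviewed interface.)

bool reusable = false;
rv = GetOfflineStoreOutputStream(newMailHdr, getter_AddRefs(outputStream), &reusable);
// ... write the message body ...
msgStore->FinishNewMessage(outputStream, newMailHdr);
if (!reusable)
  outputStream = nullptr;  // the per-message (maildir) stream was closed by FinishNewMessage()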
Pass out 'reusable' also from GetOfflineStoreOutputStream().
Let's see where the journey will take us when we make the reusable flag available at all call sites of DiscardNewMessage() and FinishNewMessage(). This is part 1 adopted to the new scheme.
Part 3 adapted to the new scheme.
OK, here is part 2. The new approach works well so far. So, Chiaki, can you please adapt parts 4 and 5 to the new approach while I keep working on the clean-up of the IMAP part in part 2 which is about 50% done.
Attachment #8838460 - Attachment is obsolete: true
Attachment #8843616 - Attachment is obsolete: true
Attachment #8844023 - Attachment is obsolete: true
Attachment #8846441 - Attachment is obsolete: true
Attachment #8846451 - Attachment is obsolete: true
Attachment #8847162 - Attachment is obsolete: true
Attachment #8838460 - Flags: review?(jorgk)
Attachment #8844023 - Flags: review?(jorgk)
Attachment #8847162 - Flags: review?(jorgk)
Sorry, I have been tied up with some family business and a big event at the office. I will do the cleanup this week. TIA
OK, I finished the IMAP part. Let's keep in mind that we're improving file/stream I/O here. So I removed a lot of changes from nsImapProtocol.cpp which were interfering with the IMAP protocol. In particular nsImapMockChannel::Close() does not fail and also nsSocketTransport::Close() does not fail. The patch is much smaller now. So I'm ready to move onto part 4 and 5.
Attachment #8847363 - Attachment is obsolete: true
(In reply to Jorg K (GMT+1) from comment #140) > Created attachment 8852092 [details] [diff] [review] > 1242030-new-part-2-imap.patch (new approach - JK v2). > > OK, I finished the IMAP part. Let's keep in mind that we're improving > file/stream I/O here. So I removed a lot of changes from nsImapProtocol.cpp > which were interfering with the IMAP protocol. > > In particular nsImapMockChannel::Close() does not fail and also > nsSocketTransport::Close() does not fail. > > The patch is much smaller now. > > So I'm ready to move onto part 4 and 5. Sorry, I am in the middle of hardware transition (due to bad memory sticks, I think, but not sure). It has not been a smooth sailing. [The first attempt to move linux image somehow failed...] I hope I can get this HW issue sorted out this weekend. (That March 31st is the end of budget/fiscal year for many organizations does not help ...) Once the hardware is in good condition, I can work on the patches. Sorry that it took so long for me to get up to speed. TIA
Blocks: 1354011
My next PC (a second-hand PC, but with many virtual cores) is getting ready and I am moving files to it, but the move is not complete yet. Anyway, I can now devote my free time to cleaning up patch part-4 and part-5 while I am still fiddling with the Linux tuneup (valgrind does not work on the stock Debian GNU/Linux kernel for a mysterious reason, so I needed to build 3.19.6 on my own). I have started to modify patch part-4 and part-5. But before that, I need to swap my local patches for part-1, part-2, and part-3 with Jorg's. Stay tuned.

The multi-core CPU (16 cores!) reduced my build time from 50+ minutes to 35 minutes, although the CPU clock is almost 2/3 of that of the previous (and still used) PC. Not bad. But the testing, |make mozmill|, is basically a single-threaded job and so it hurts. I am thinking of giving the to-be-next PC a second CPU with a fast clock but few cores so that I can run |make mozmill| on the faster CPU. It would be interesting to report the performance. Stay tuned on that front, too!
Hi, Jorg,

I had to revert some code changes that you and aceman did. Basically, the following try jobs show the problem (please click on the linux opt and OS X opt builds):

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=d061b32b36748dde216b7abdeb634a04b66f877e&selectedJob=98862127
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=b62d0899641aa7a599de1e2c87ad2b238f02e25f&selectedJob=98841448

The idiom

MOZ_ASSERT( expr, "error message");

where |expr| contains NS_SUCCEEDED(variable_that_holds_a_return_value_from_IO_operation), is not compiled in OPTIMIZE builds, so the attempt to check the error value is not performed, and the C++ compiler options on tryserver are now such that the warning for the resulting unused variable is flagged as an error.

Because I could not compile the source in this state of affairs, I had to open-code the above construct as

if (! (expr) ) { // surrounding "()" is not necessary if |expr| is as simple as NS_SUCCEEDED(rvN)
  NS_WARNING("error message");
}

Well, this is ugly and an interim measure, but if an I/O error happens, I, as a user, ought to be reminded of the situation so that I can look into LAN/WAN issues, and server issues if my mail store is on a remote file system, etc.

I am testing the modifications for part-4 and part-5 on my local PC now, but I would like to let you know of the code idiom change in advance.

TIA
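(The interim idiom above as a concrete sketch; the Flush() call and the warning text are placeholders. The point is that |rv| is referenced in the if-condition, so optimized builds, where MOZ_ASSERT() compiles away, no longer trip over an unused variable.)

nsresult rv = m_tempMessageStream->Flush();  // placeholder I/O call
if (!NS_SUCCEEDED(rv)) {
  NS_WARNING("Flush() failed; the mail store may be on a failing file system");
}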
Flags: needinfo?(jorgk)
> I am testing the modifications for part-4 and part-5 on my local PC now, but
> I would like to let you know of the code idiom change in advance.

To clarify, the use of MOZ_ASSERT() in part-2 and part-3 had to be modified so that the code is BUILDABLE on tryserver.
I am trying to change part-4 and part-5 according to your suggested change in comment 135:

> Pass out 'reusable' also from GetOfflineStoreOutputStream().

But I hit a few difficult parts.

(In reply to Jorg K (GMT+2) from comment #134)
> (In reply to ISHIKAWA, Chiaki from comment #133)
> > I think we can "close the stream".
> > First of all, the original FinishNewMessage() [before my modification]
> > always closed the first argument stream no matter what.
> > So the stream was closed before in the spot where the issue was raised.
> I think you confused yourself here but you won't confuse me.
>
> Please check nsMsgBrkMBoxStore::FinishNewMessage() as currently implemented
> and after your change in part 4. The stream passed in is *NOT* closed.
> That's the whole idea of mbox.
>
> nsMsgDBFolder::GetOfflineStoreOutputStream() calls
> nsMsgBrkMBoxStore::GetNewMsgOutputStream() for mbox and that function
> retrieves the stream from a hash table of open streams with |if
> (m_outputStreams.Get(URI, aResult))|.

OK, I followed this far. But I am not entirely sure which part / who is responsible for the eventual |Close()| of the stream. I am not entirely sure that TB is handling the closing correctly here. Isn't there a leak of open descriptors?

However, let us assume that |Close()| is eventually handled somewhere magically. I still need to do the |Flush()| at the time of FinishNewMessage(), so that what has been written up to that point using the (now proposed) buffered write is flushed to the disk (and so that we can see whether there are any errors caused by file system issues: file system full, can't write to a remote file system, etc.). Otherwise the error that should have been caught at the message boundary is not caught, and that makes the whole idea of calling |fsync| (which I think should be replaced by |fdatasync|) to make sure the data is written to the disk at each message's end pointless. Without calling |Close()|, we cannot guarantee that the buffered output is written to the disk unless we call |Flush()|.

I will take a fresh look at patch-1, patch-2, and patch-3, and further at patch-4 and patch-5, in this light.

> Note the comment here:
> https://dxr.mozilla.org/comm-central/rev/
> d03f6d1f69374c85d4939d0c7cb92021614600f4/mailnews/local/src/
> nsMsgBrkMBoxStore.h#39
>
> So I believe the answer is: You shouldn't close the stream.

Like I said, I was a little surprised at this. I never thought C-C TB was doing this under the rug. This is not what I thought I had observed during debugging by tracing system calls, but maybe I was not paying enough attention to this issue from this angle.

> As you know, I was never a great friend of changing the IDL to pass out the
> 'closedflag'. I've just realised that nsMsgBrkMBoxStore::DiscardNewMessage()
> previously closed the output stream, so I see no reason to change that and
> to replicate closing the stream into the respective callers.

Actually, I think this was one place where I missed the proper error handling of Close(), thinking that we are already aborting the inclusion of the new message. DiscardNewMessage() does not report an error from Close() [say, a full file system hit by the pending buffered write() that Close() triggers]. This is bad, but I did not realize it in my previous patch. [If the download is done directly to an |mbox| we definitely have a problem here. Wait, do we download this to a temporary file and then append it to the |mbox|? I think this is done only when a pref is set.]

> The next question I have is why FinishNewMessage() needs to change.
> Typically it is called on a stream obtained from GetNewMsgOutputStream() and > that already passes out the reusable flag. In some cases it's obtained from > GetOfflineStoreOutputStream() which sadly don't pass out the reusable flag. > Since there are only a few calls, I suggest to change that interface instead > and pass out the reusable flag there. I found the passing of reusable flag is a little bit hard to cope with because the value generated and its use is a little far away. For example, For Pop3 (part-5), I found that I needed to create a reusable flag that is associated with Pop3 structure because |m_tempMessageStreamReusable| which you introduced is not visible in the source file, but I am not entirely sure if the introduction is of the new field |m_tempMessageStreamReusablePOP3| is acceptable or not. diff --git a/mailnews/local/src/nsPop3Sink.h b/mailnews/local/src/nsPop3Sink.h --- a/mailnews/local/src/nsPop3Sink.h +++ b/mailnews/local/src/nsPop3Sink.h @@ -57,16 +57,18 @@ protected: int32_t m_numMsgsDownloaded; bool m_senderAuthed; nsCString m_outputBuffer; nsCOMPtr<nsIPop3IncomingServer> m_popServer; //Currently the folder we want to update about biff info nsCOMPtr<nsIMsgFolder> m_folder; RefPtr<nsParseNewMailState> m_newMailParser; nsCOMPtr <nsIOutputStream> m_outFileStream; // the file we write to, which may be temporary + nsCOMPtr <nsIInputStream> m_inboxInputStream; // we used to read from the temporary file above, i.e. |m_outFileStream|, which was confusing AND buggy when m_outFileStream became buffered. + bool m_tempMessageStreamReusablePOP3; nsCOMPtr<nsIMsgPluggableStore> m_msgStore; bool m_uidlDownload; bool m_buildMessageUri; bool m_downloadingToTempFile; nsCOMPtr <nsIFile> m_tmpDownloadFile; nsCOMPtr<nsIMsgWindow> m_window; nsCString m_messageUri; nsCString m_baseMessageUri; So things are not as smooth sailing, and given the extensive testing I did with the old patch set recreating network outage against remote file system, I think it would be some time before I can be comfortable with the changes which you introduced and necessary modifications of the patches part-4 and part-5 and the introduction of |Flush()| in place of the original places for |Close()|, and come to think of it, my original patch should have called Flush() no matter what immediately after FinishNewMessage(). However, encouraging thing is that the current local patch set I have make C-C TB pass |make mozmill| beautifully. So aside from possible leaks, and maybe some unforeseen incomplete error-case handling, the code works. (at least as of the following revision.) hg log | head -100 Revision: 21529:9edb175c6e5f Tag: tip fxtree: comm User: Edmund Wong <ewong@pw-wspx.org> Date: Mon May 08 14:29:16 2017 +0800 Summary: Bug 1362913 - Update clang and sccache2 to newest package r=IanN Revision: 21528:35cd2a0110f3 User: Jorg K <jorgk@jorgk.com> Date: Mon May 08 07:23:47 2017 +0200 ...) INFO | (runtestlist.py) | Directories Run: 35, Passed: 1195, Failed: 0 <=== Note NO ERROR! Wow Subsequently it went downhill due to some incompatible changes in M-C or something. I think I need to dig the code more and test it with network errors after mounting a remote file system for mail store before I can upload the patch with some confidence. [It would be really great if mozilla test harness will allow such error-condition testing on the test infrastructure for anyone to see the errors and possible error recovery in actin.] TIA
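(A sketch of the flush-at-message-boundary idea for the reusable (mbox) case, where the stream must stay open across messages; the member names follow the nsPop3Sink excerpt above, |hdr| is a placeholder, |reusable| is the flag from the approach under discussion, and whether the flush belongs in the sink or inside FinishNewMessage() is exactly the open question.)

rv = m_msgStore->FinishNewMessage(m_outFileStream, hdr);
if (NS_SUCCEEDED(rv) && reusable) {
  // Push buffered data out now so that a "disk full" or remote-FS error is
  // reported against this message rather than a later one.
  rv = m_outFileStream->Flush();
}
if (NS_FAILED(rv)) {
  // abort/record the failure for this message
}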
Flags: needinfo?(jorgk)
Flags: needinfo?(jorgk)
I am afraid that I canceled the request for information from Jorg by mistake, or something. Anyway, I would like to keep Jorg updated on what is going on with the pending patches.

TIA
(In reply to ISHIKAWA, Chiaki from comment #143)
> The idiom
> MOZ_ASSERT( expr, "error message");
Sure, if only used in debug mode, use DebugOnly<nsresult> rv = ...

I read all bugmail, so no need to NI me unless you want to get immediate attention.

In part 1 I had to create a |bool m_tempMessageStreamReusable;| to get the flag to where I needed it, so I don't think there's a problem repeating the trick elsewhere.

I agree that flushing is a good idea so an error would happen associated with the message and not later.
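(The DebugOnly<> suggestion as a sketch; mozilla::DebugOnly is declared in "mozilla/DebugOnly.h". The value is only inspected in debug builds, the wrapper avoids the unused-variable warning in non-debug builds, and the Close() call itself still executes; the Close() call here is a placeholder.)

#include "mozilla/DebugOnly.h"

mozilla::DebugOnly<nsresult> rv = m_tempMessageStream->Close();
MOZ_ASSERT(NS_SUCCEEDED(rv), "Close() failed");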
Flags: needinfo?(jorgk)
Hi,

(In reply to Jorg K (GMT+2) from comment #147)
> (In reply to ISHIKAWA, Chiaki from comment #143)
> > The idiom
> > MOZ_ASSERT( expr, "error message");
> Sure, if only used in debug mode, use DebugOnly<nsresult> rv = ...

I am afraid that I wanted to check the I/O failure even in non-debug, i.e., release mode. Anyway, I will produce a complete set of part-{1,2,3,4,5} that compiles fine and withstands my local I/O-error test.

> I read all bugmail, so no need to NI me unless you want to get immediate
> attention.

I thought that you were too busy with firefighting, sort of, and might not have the time to read my comments, and I wanted to make sure that the changes I am going to make are OK with you.

Thank you for taking care of the build failures due to M-C issues. As I mentioned, I was quite impressed several days ago when I saw the following line after |make mozmill|:

INFO | (runtestlist.py) | Directories Run: 35, Passed: 1195, Failed: 0 <=== Note NO ERROR!

> In part 1 I had to create a |bool m_tempMessageStreamReusable;| to get the
> flag to where I needed it, so I don't think there's a problem repeating the
> trick elsewhere.

Let me see if I can get this coding right. (The only uncomfortable feeling I have is that the setting and the use of this flag are so far apart that I am not even sure I get the pairing of the setting and the use correct. There are places where FinishNewMessage() appears suddenly [to my eyes], and I have to wonder where the stream was created...)

> I agree that flushing is a good idea so an error would happen associated
> with the message and not later.

I will insert Flush() in the code.

TIA
Progress report.

The locally updated patch set compiles, but I have a big problem. The output stream is accidentally closed SOMEWHERE when an object is destroyed. The closing is done implicitly at object destruction as far as I can tell, and so it is rather difficult to track. It looks to me as if our attempt to keep the stream open for reusable streams backfires.

After laboriously tracing the close and subsequently |PR_Close()|, I found that |PR_Close()| (|close|) is called from within object destruction and NOT by an explicit |->Close()| call, which makes the debugging rather difficult. I think keeping the stream value around longer than before (as Jorg's patch does, to correctly honor the reusable vs. non-reusable nature of the output stream) backfires somewhere. It is as though (pure guess) a member of an object holds a copy of that value (instead of nullptr) and the stream gets closed behind our back when the object is destroyed, and on the next message during message download we are faced with a file stream pointer whose underlying stream is already closed (error NS_BASE_STREAM_CLOSED), which TB tries to use for output, and fails. Duh.

I can do two things.

1. Try to figure out where this happens and think of a fix.
However, I do not have high hopes for this approach because this is a totally new error with which I am unfamiliar. This error mode never happened in the last 2 years while I worked on this issue of trying to use buffered writes. (It *could* be a simple copy&paste issue, but I don't think so.) This mode of bug is totally new to me and I am not sure I can find and fix it over the course of the next few months. I have a few conferences/events I need to be involved in, starting next month and running all the way to December this year, and I can't afford to spend much time on debugging. Instead, I am tempted to go with the following somewhat drastic approach.

2. Rip out this reusable-stream thingy, and remove all the quasi-optimization around reusable streams, etc.
PROS: the code will be simpler and less cluttered. A big win for long-term maintenance.
CONS: a *potential* slowdown, which I now believe is actually bogus.

Old timers recall Knuth's dictum: "Premature optimization is the root of all evil (at least most of it) in programming." (Turing Award Lecture: Computer Programming as an Art, D. E. Knuth, pp. 667-673, Vol 17, No. 12, CACM, December 1974). Yes, I am embarrassed to admit I read his pieces about premature optimization in Software Theory and Practice and maybe a British Computer Society publication in real time.

Why do I think the potential optimization of using "reusability" in C-C TB is bogus? Because, from my observation, it does not buy us much. While trying to figure out where the unwanted |close| is called in my local patches, by running TB under the gdb debugger and placing a tracepoint on |close| and related routines, I found that there are SO MANY close/open calls aside from the particular SINGLE |close| which the "reusable" flag tries to avoid per message (and the subsequent open for the following message, if any). I have also seen that |fsync| (which I now think should rather be |fdatasync|) is called per message. From this knowledge, I conclude that the SINGLE close (and the extra open for the subsequent message, if any) is NOT a big performance slowdown as far as I can see.
I observe that the reduction of the number of |write| calls by the use of buffered output ALONE (which this patch set provides) makes the operation of TB much smoother even with the |close| (and subsequent open for the next message, if any). (This has also been observed by someone who uses Linux TB against an office Samba server, where the TB profile including the mail folder was stored, during the development of the earlier patch set.) For this reason, I would put more weight on the simplification of the code (for ease of maintenance in the future) by ripping out the "reusability" check and the associated |->Close()| handling. So what do people think? I can upload the tidied-up patch set which compiles [and, to my embarrassment, it seems to pass most of the |make mozmill| tests]. Then someone can look into the issue if I don't have the time to do so. This last point about |make mozmill| not catching this issue is very grave IMHO. (Maybe I should raise a bugzilla about this issue?) I was counting on |make mozmill| to catch this type of error (failure to download a message under normal conditions due to a coding bug) while I was happily incorporating Jorg's latest patches into my local tree and modifying my patches accordingly. I tested C-C TB with |make mozmill| all the time while updating patches! So I thought everything was A-OK! But obviously it does not catch this simple error :-( Only this week, I realized something was wrong when I started the TB binary on my PC. - IMAP can receive e-mail subject lines without issue. - But POP3 with a Berkeley mailbox can't. It can't receive a single e-mail message at all due to NS_BASE_STREAM_CLOSED when a |->Seek()| is called (the only |seek| per message we need to call). (<--- so naturally one wants to see where the |close| is done, but I realized that the |close| is done when an object is destroyed behind my back. C-C is not calling |->Close()| explicitly. Tough.) - POP3 with Maildir somehow manages to download and store the message, although I see the error messages about an already closed stream (NS_BASE_STREAM_CLOSED) in the console when I run the full debug version of C-C TB. Obviously, since the stream is not reusable with Maildir, the impact is minimal. However, that I see the error even in the Maildir case suggests that something *IS* wrong. :-( (Too bad that I had to fix some hardware issues of my PCs in my spare time and thus could not check the patches for like 2 months (late Feb, March, and April). Although I now have a powerful standby machine, if the source code is buggy, it is no use. I am not entirely sure where the issue crept in. As far as I could see, there were not many changes in the C-C tree related to this. However, due to my day-job workload, I don't have the luxury of going back over the last several months of changes, since the patch was working fine until about January.) Argh, the small size of the developer community hurts at times like this. Anyway, I would appreciate hearing people's opinions here before drifting into either seemingly endless debugging and/or ripping out this "reusable" thing to simplify the code. TIA
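For context, the whole buffering change boils down to wrapping the raw message stream once per message, roughly as in the NS_BufferOutputStream() line quoted from the patch later in this bug. A minimal sketch with an illustrative function name (not the literal patch):

static nsresult WriteBuffered(nsIOutputStream* aRawStream)
{
  // Wrap the raw stream in a 64 KiB buffer so that the many small
  // per-line Write() calls are coalesced into large writes to the OS
  // (or the remote file system).
  nsCOMPtr<nsIOutputStream> buffered = NS_BufferOutputStream(aRawStream, 64 * 1024);

  // ... many small Write() calls go through |buffered| here ...

  // Before anything reads the file back, and before fsync() means anything,
  // the buffer has to be pushed out, and the result must be checked.
  nsresult rv = buffered->Flush();
  NS_ENSURE_SUCCESS(rv, rv);
  return NS_OK;
}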
Flags: needinfo?(jorgk)
Flags: needinfo?(acelists)
I am comparing the source trees: a) the one which I updated in the last two weeks or so, and b) the one before that. a) can download messages by POP3; b) fails to do so. Creating a whole source tree with one patch set, then creating another with the other patch set, and comparing them seems much easier than trying to grok each patch alone. Anyway, hopefully I can come up with the working patch in a few days, and if NOT, I will post the diff for someone to investigate while I am tied up with other things. TIA
I am preparing THREE different patch sets now. 1. The patch set before Jorg introduced the "reusable" flag for output stream creation. I needed to revisit this because the C-C TB tryserver added a few more restrictive -Werror checks of warnings, and I needed to modify/revert some changes because of this. This works. It compiles locally and on the C-C tryserver. It can download messages (and was tested well before). 2. The patch set that fully accommodates the "reusable" flag Jorg introduced in a few places. Unfortunately, the code does not work. TB with this patch refuses to download the second e-mail onward, saying that there is an I/O error. The first e-mail message is downloaded correctly, though. I am afraid that the TB code is built in such a way that the stream kept alive (not closed) across multiple message downloads gets destroyed after a message download. "Destroying a stream" means closing a stream, and the processing of the second message encounters an already closed stream and thus fails to download a new message from the server. NS_BASE_STREAM_CLOSED is the error. This behavior is not quite expected. Maybe my previous patch was quite eager to close streams explicitly and so did not encounter this issue. I had to slightly edit Jorg's patch to accommodate the strict -Werror check on the C-C tryserver. This compiles locally and on the C-C tryserver. But it fails the message download as noted above. I am afraid either there is a fundamental issue in the TB code and/or there is a grave copy/paste error somewhere. (However, I checked carefully. I think I cannot find such errors myself even if they exist. I need the proverbial "many eyes" scrutiny. If Jorg can spot an obvious issue right away, that is great.) But the |close| occurs as a result of the destruction of a stream class variable, which I did not expect to see at all (!) 3. The patch set based on the first one, but with all usage of the "reusable" flag for the OUTPUT stream taken out. (The reusable flag for the INPUT stream is left intact.) This simplifies the code very much. (Now I always |Close| streams in FinishNewMessage() and DiscardNewMessage().) This works. It compiles locally and on the C-C tryserver. I can download messages without problems, and the local |make mozmill| test passes cleanly. After struggling with debugging patch set 2, and looking at my own patch set 1 above, I am inclined to propose going with No. 3. This is because the optimization based on the "reusable" flag for the OUTPUT stream is not that great. I think it is misplaced. Rationale: Appending a message to an mbox mailbox always occurs at the end, so a SINGLE seek call is enough to place the file pointer for correct writing. Maildir operation requires opening/closing for each message anyway. One |close| for the mbox mailbox format per message download (and |open| for the next message) is NOT a big deal. We call |fsync()| per message at message download anyway (which I think should be replaced with |fdatasync|), and this call alone is a large overhead compared to a single |close| call. I think the |close| is negligible. Besides, |fsync()| requires us to make sure that buffered output (in the Mozilla library) is flushed to disk by calling |Flush| under the buffered output scheme [or by calling |Close()|]. So I realize now that I have to call |Flush()| anyway where |Close()| was called before in FinishNewMessage(), etc., and check the return value of that |Flush()| for potential I/O errors. With this consideration, I think calling |Close()| and being done with it is a much simpler and saner approach.
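In code, the simplification of patch set 3 amounts to something like the following end-of-message epilogue (a conceptual sketch only, not the literal patch; |m_outFileStream| as used elsewhere in this bug, inside a hypothetical sink method):

// Patch set 3 style: always flush and close, and treat a failure of
// either call as a failed message download.
nsresult rv = m_outFileStream->Flush();   // buffered data reaches the file here
if (NS_SUCCEEDED(rv))
  rv = m_outFileStream->Close();
m_outFileStream = nullptr;                // never leave a stale pointer behind
NS_ENSURE_SUCCESS(rv, rv);                // caller takes the error path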
But doing so makes the shuffling of the |reusable| flag for the OUTPUT stream meaningless. OTOH, |reusable| for the INPUT stream *MAY* be meaningful, since there is a chance of searching for strings in multiple messages, say, and if we can reuse the file stream as we scan the messages, it certainly is useful. And as far as I can tell, there does not seem to be a problem with that in patch set 3. (The Maildir format requires us to open/close per message searched, though.) Anyway, as soon as I make sure that the three patch sets build on the C-C tryserver, including OSX and Windows [I am testing the linux64 version locally], I will upload the patches for everybody to see what the code complexity is. (Patch set 3 was prepared in a hurry, so the comments are not tidied up yet. It will be a work in progress.) Of course, I will test patch set 3 with POP3 vs. IMAP, mbox vs. maildir, and copying/moving messages between different accounts. (I encountered a subtle issue with such copy/move testing with the older version of patch 1.) I will be able to produce the elapsed time of, say, downloading 1000 short messages with patch 1 and patch 3 to show that there is not much difference between these versions. (To be frank, the introduction of buffering alone makes the execution speed very fast. Also, a single |open/close| per message pales in terms of execution time when the message is large and many |write| calls need to be issued, and a short photo attachment or PDF attachment is "large" enough to cause that many writes.) One other reason why I prefer patch 3 over the others is as follows (this will probably be another patch that follows this bugzilla issue; see the sketch after this comment): I realize that the Close() in FinishNewMessage() may fail due to flushing of data to an almost full file system and/or an erratic file system (say, an unreachable remote file system after a sudden network outage). In that case, instead of proceeding with FinishNewMessage(), TB really ought to switch to the processing done by DiscardNewMessage(). I only realized this very-low-probability, but definitely real, possibility lately, when I was pondering how to take care of the error code returned by the |Flush()| that is called at the end of each message download. (The probability is very low. We have to hit the file-full error, for example, at exactly the moment when the |Flush| or |Close| is finally called at the end of a message download. But this error scenario is real.) I can only say that the error handling of message download has not been thoroughly investigated and implemented. No wonder we see reports of broken message folders so often (too often for my taste, that is). Without the clutter of "reusable" handling, I think patch set 3 will make it easy for me to handle this rare error scenario. TIA
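The fallback idea mentioned above (switch to the discard path when finishing the message fails) could look roughly like this; a sketch only, with |newHdr| as an illustrative header variable and the nsIMsgPluggableStore methods discussed in this bug:

// Finish the message; if the final Flush/Close inside fails (disk full,
// network file system gone away), throw the partial message away instead
// of pretending it was stored.
nsresult rv = m_msgStore->FinishNewMessage(m_outFileStream, newHdr);
if (NS_FAILED(rv)) {
  // Best-effort cleanup; its own failure is secondary to |rv|.
  m_msgStore->DiscardNewMessage(m_outFileStream, newHdr);
  m_outFileStream = nullptr;
  return rv;
}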
My current standpoint: I would like to fix the failure to use buffering, and then move on to the short-read issues (Bug 1170606) as development reasonably allows, because the current bug and Bug 1170606 are the two fatal bugs for TB users under Linux who need to store their email messages on a remote Samba/CIFS file system. I want to bring in good-enough features: with the current undocumented code base, I can't hope for perfection. That said, today I am posting three sets of patches, variations of the patches posted to this bugzilla entry before. I am posting these three sets to solicit opinions regarding which way to go. I hasten to add that I now prefer to go with the third patch set. It works and has fewer lines than the others. THREE PATCH SETS: 1. The first one is based on the use of the |reusable| flag and on the recent patches from Jorg. I cleaned up the last two patches so that it compiles with Jorg's patches. But unfortunately this does not work as expected (more on the failure later). NOTE: In order to make the comparison with the following patch sets 2 and 3 easier, I have resurrected a few possibly unnecessary checks for |Close()|-related functions. If we go with patch set 1, I would remove them once and for all. 2. The second patch set is the old patch set I presented initially here, cleaned up somewhat so that the comparison with the first one is easy. This one works, albeit with the use of the kludgy "closed" flag. 3. The third one is entirely new. This one does away with the "reusable" flag for the OUTPUT stream completely. It removes the handling of reusable from the first patch set (and does not use the "closed" flag either). It still uses "reusable" for the INPUT stream. This works as expected. It is simpler than the first and the second patch sets, as you can see by comparing the code.
file 1 of patch set 1. (USE REUSABLE)
file 2 of patch 1 (USE REUSABLE)
file 4 of patch set 1 (USE REUSABLE)
file 5 of patch set 1 (USE REUSABLE)
Sorry this must come at the beginning of patch set 1. file 0 of patch set 1 (USE REUSABLE)
file 1 of patch set 2 (USE CLOSED)
file 2 of patch set 2 (USE CLOSED)
file 3 of patch set 2 (USE CLOSED)
file 4 of patch set 2 (USE CLOSED)
file 5 of patch set 2 (USE CLOSED)
file 1 of patch set 3 (DON'T USE REUSABLE)
file 2 of patch set 3 (DON'T USE REUSABLE)
file 3 of patch set 3 (DON'T USE REUSABLE)
file 4 of patch set 3 (DON'T USE REUSABLE)
file 5 of patch set 3 (DON'T USE REUSABLE)
General comment: The commit message requires slight modification before we proceed further (some of the r=jorgk is no longer valid, for example). I will do so once we decide on the proper direction to proceed. The second and the third patch sets both work. They take care of, say, 95% of the serious file I/O bug issues (aside from the TBD error recovery issues). I believe they are good enough (better than the current state, where error recovery simply didn't work cleanly even in the case of a single message failure during a series of POP3 downloads). I said |cleanly|: the supposed error recovery handling did not work as expected. (My patch sets still do not handle the errors of |Seek/Tell|. That can wait. Given that the errors caused by |Write()| and |Close()| are the majority of errors during simulated remote file system errors, we are better off with the new patch sets.) --- - Background (if the reader already knows this, please skip this part until the "- Up to the development early this year" heading, which is in the next comment). Buffering output introduces the chance of |Close()| failing because the pending output is flushed before the final close action and then experiences a file system error (file system full, write error due to media or network errors). So there is a need to check the return value of |Close()|, which has previously been ignored. I realized that |Close()| returns NS_BASE_STREAM_CLOSED in many places once the check is done and an error is artificially introduced. I found out that |FinishNewMessage()| closes a stream, but the variable that holds the pointer to the stream structure is NOT cleared to nullptr, so the subsequent code believes that the stream is still live, tries to close the stream AGAIN, and experiences NS_BASE_STREAM_CLOSED. (Yes, the non-nullness of this stream pointer is treated as equivalent to the stream being still open. An implicit assumption in many parts of the code base. On the other hand, the simple error handling scheme depended on the variable being nullptr once an error was detected in the lower-level code and the stream was closed.) --- So major restructuring was necessary. Initially, my approach was that clearing the variable that holds the pointer to the stream structure must now be DONE outside |FinishNewMessage()| and |DiscardNewMessage()| to avoid the stale file stream being referenced later. Clearing the stream variable to nullptr outside the functions needs to be done only when |Close()| is called within these functions. This made the handling difficult. So initially I removed Close() entirely from within the functions and called Close() outside them instead. That was my initial approach. Basically the change was done with this approach, and the code worked OK during my extensive manual testing with artificially introduced network errors (mimicking file system errors by mounting remote CIFS/Samba and NFS servers and storing mail there), except for the case of using Maildir under Windows. The issue of incorrect Maildir behavior under Windows was noticed in C-C tryserver test runs. It turns out that there is a timing issue. Using Maildir, with my patch, there was a point in time when a file with an open stream to it was deleted/renamed/moved. Under POSIX-like systems such as Linux, Solaris, FreeBSD, etc. this was OK. But under Windows, you cannot delete/rename/move a file with an open file stream against it. So I had to revert to the position of the |Close()| calls in the original code in the case of Maildir.
This means, though, that I have to propagate the information about whether |Close()| was called inside the functions (FinishNewMessage(), for example), and reset/clear the variable that held the stream pointer only if the stream was closed. So much for the background.
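On the caller side, the "closed" flag kludge described above boils down to this kind of pattern (a sketch with an illustrative out-parameter and method name, not the real interface; snippet from inside a hypothetical sink method):

// Hypothetical variant of FinishNewMessage() that reports whether it had
// to close the stream (Maildir on Windows must close before rename/move).
bool streamClosed = false;
nsresult rv = m_msgStore->FinishNewMessageReportClose(m_outFileStream, newHdr,
                                                      &streamClosed);
NS_ENSURE_SUCCESS(rv, rv);
if (streamClosed) {
  // The pointer is stale now; clear it so later code does not Write() to
  // or Close() an already-closed stream (NS_BASE_STREAM_CLOSED).
  m_outFileStream = nullptr;
}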
- Up to the development early this year. I used a "closed" flag returned by these functions to propagate the information about whether |Close()| was called in FinishNewMessage() et al., and modified the code again. With this approach, the patch set worked for Maildir under Windows (checked on tryserver), too. All was well. I tested the binary extensively against a remote file system and introduced network errors to mimic file system errors. --- Later development early this year: Now, instead of my introducing this "closed" flag to handle the special case when |Close| is called in |FinishNewMessage| et al., Jorg preferred the use of a "reusable" flag returned by the functions for streams opened for writing. If the stream is "reusable" for writing subsequent messages, close/open are not performed before the subsequent writing (POP3 case). The stream remains open. If the stream is NOT "reusable" for subsequent messages (Maildir case), it is closed and a different stream (using a different filename) is opened later for the subsequent writing. I thought this suggestion was reasonable and tried to modify the code according to his preferred interface change. I began modifying the code, but around late February (or March) my PC began behaving erratically (I think it was hardware-related), and I spent weeks finding the cause and even obtained replacement hardware, but moving the development platform on a local PC, not a cloud service, takes time. I could finally begin refocusing on this patch after the instability was weeded out, more or less, around the end of April. But then, when I started to test the operation of TB modified in this manner, I found that the new approach based on the "reusable" flag does not work :-( With the latest patch I created based on Jorg's idea of using "reusable" instead of "closed", I observed a bug: the output stream is closed behind my back. It is not closed by an explicit call of |Close()| directly in the source code. Rather, it is closed somewhere as part of the stream variable's destruction (in a destructor, that is). I never saw this problem of a stream getting closed in an implicit manner during my 1+1/2 years of coding/debugging for the initial approach. I am very perplexed. Trying to keep the stream open as long as desired with the "reusable" flag caused this unexpected side effect, and TB is not usable with the patch. If I have a series of messages on the POP3 server, then I can download the initial message all right, but for the second message onward I get a file system error and an error popup is shown. Basically, the second message cannot be written to the already closed stream (!). I am not sure why this problem of the class object containing the file stream being destroyed did not get noticed before. Maybe because the "reusable" flag had not been used extensively before: as Jorg noticed, he had to modify a function which did NOT return the information properly before. In order to use the suggested new interface throughout the code, Jorg had to modify a function which did not return such a flag before. Obviously, because of this lack of universality, the "reusable" information was not used and tested extensively before. This is my guess. (I could be wrong, of course.) My tentative guess about the unexpected |Close|: after the initial debugging efforts, I have a theory. Somehow, somewhere, a variable that holds the value of the pointer to the file stream, which in my previous patch must have held nullptr due to my code's preference to close the stream(?)
and nullify it, gets destroyed due to function exit and/or because the class object of which the variable is a member is destroyed. The destruction seems to happen either at the beginning or at the end of the incorporation of one message. AND during the destruction, the stream is closed: this is not what I want. As a result of this implicit closing, the first message is downloaded and inserted into the Inbox correctly, but when downloading and inserting the second message, C-C fails since the output stream (for MBox) has now been closed and yet m_outFileStream remains non-null (due to its supposed reusability), and so it tries to access the already-closed stream and eventually experiences NS_BASE_STREAM_CLOSED during the necessary |Seek()| and |Write()|. You can find the error in https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=23581054b5c31e03f1bab81d3ed7648e8f62601e by looking at the X1 log of the linux64 debug test. Looking at the raw log and searching for 'reusable', you will find a line that looks like this: 18:18:39 INFO - PID 7464 | {debug} IncorporateBegin : m_tempMessageStreamReusablePOP3 = true (This output is written to STDERR by a patch in temp.patch for debugging purposes.) The line is printed while the first message is being downloaded and incorporated. However, TB encounters an error during the second message download: NS_BASE_STREAM_CLOSED --- Quote --- 18:18:39 INFO - PID 7464 | [7464] WARNING: (info) nsBufferedOutputStream::Write returns NS_BASE_STREAM_CLOSED immediately (mStream==null).: file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/netwerk/base/nsBufferedStreams.cpp, line 795 18:18:39 INFO - PID 7464 | (debug) m_outFileStream->Flush() returned 0x80470002 18:18:39 INFO - PID 7464 | [7464] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80470002: file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mailnews/local/src/nsPop3Sink.cpp, line 1040 18:18:39 INFO - PID 7464 | [7464] WARNING: (info) nsBufferedOutputStream::Write returns NS_BASE_STREAM_CLOSED immediately (mStream==null).: file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/netwerk/base/nsBufferedStreams.cpp, line 795 --- end quote --- There is no explicit debug dump statement regarding |Close()| being called. This is because |Close()| is called as part of class object destruction. I found this out by tracing the |close| system call manually. Hmm...
Hmm... The debugging of this bug in patch set 1 looks tough, and I do not have, and will not have for the next few months, the time to focus on the root cause. (I have to confess I spent maybe a couple of months tracking down the issue of Maildir under Windows in late 2015 and early 2016. I spent more than a year and a half on the whole debugging effort for this buffered-output issue. I am afraid the current bug is probably as time-consuming as that old bug from the viewpoint of an occasional patch contributor. Just cleaning up the patch sets so that they compile and run tests reasonably well, before uploading them here for comparison purposes, already took me more than a month, thanks to the finicky C-C TB tryserver which fails builds from time to time for reasons not related to my patches per se :-( ) [Note that the version with the "reusable" flag may be failing the tests due to the issues mentioned above. However, some tests pass locally if I use the |make mozmill SOLO_TEST=....| syntax.] My wish: If someone can figure out the bug in the FIRST patch set from the description above soon, that will be great. I think I have given enough clues about which corners of the code people may want to take a look at. Sadly, I don't have the time to do this extensively this month and the next, and quite likely till the end of the year. Cleaning up the three patch sets so that they look very similar apart from the essential differences, and making sure that they compile on tryserver today, has been all I could do. Given the difficulty I had debugging in my spare time, or rather the long duration of that debugging step previously in 2015 and early 2016, I would rather see the resurrection of my old patch, a kludge indeed, or the adoption of the simplified version without "reusable". I think I and many people would prefer the third version, i.e. the simplified version without "reusable". Reason behind my thinking: I want to move on to the next series of patches, which is *ESSENTIAL* for using the Linux C-C TB client against a Samba/CIFS share across a LAN: handling short reads, Bug 1170606. Otherwise, we won't see the benefit of the patches before the TB code is modified to move away from the Mozilla core somewhat (?). I think with the patch in Bug 1170606 applied to the source AFTER the patch set in this bugzilla, most of the network file system issues [a Linux TB client storing messages on a remote Samba/CIFS share] will disappear, based on the observations of a couple of testers in the past. They saw that the errors in message copy, etc. disappeared with my earlier patches, including both this bugzilla and Bug 1170606. Without buffering, a single C-C TB client pesters a remote file server with MANY SHORT WRITES (like many 75-octet writes for a few MB of photo, PDF, ppt, etc.). This is a nightmare scenario for performance and a possible reason for short packets being returned, due to the excessive workload, when C-C tries to READ something instead. Again, if someone can find the bug in the latest version based on Jorg's preferred IDL interface (the first patch set), and FIX IT (more important) in the latest patch, that will be great, since Jorg is likely to be the code shepherd for the next few years (right, Jorg?). But at this time, I do not have high hopes for the fix, especially if we need to restructure the class object destruction order, etc. I hate to think we have to modify the existing POP3 protocol handler much. My take on the reusable flag: the reusable flag is handy.
But it has the tendency of keeping the place where the value is used very far from the place where the value is obtained. I feel very uncomfortable about that. In the current spaghetti code, I can't easily convince myself that the value I am using for the "reusable" flag is the correct one unless both places, obtaining the value and using it, fit in one screenful of lines. (At least, I dumped the value and checked the POP3 case. The "reusable" flag is |true| as expected and the code did not close the file stream explicitly. The problem is that the code destroys the stream implicitly behind my back when the class variable is destroyed, either due to exiting the scope or due to implicit destruction by the surrounding class objects(?). I never saw such implicit closing during my 1+1/2 years of debugging the original code.) In this sense, my use of the "closed" flag tends to keep obtaining the value and using it very near each other, probably adjacent or only a few lines apart at most. This proximity makes it more convincing and comfortable to use. In code like C-C TB that gets changed often, this fact alone favors the use of the "closed" flag, albeit a kludge, IMHO. Having to check the matching of the value's generation and its usage, without any formal methodology [like using an assertion checker], is a waste of developer time over the life of C-C TB maintenance. So I admit the |closed| flag is a kludge, but I argue it is a very good kludge indeed for the problem at hand. (I also threw in sanity checks in the form of some extra assertions that should never trigger in a normal state of operation. This is to make sure that we can detect subtle bugs if my patch set(s) introduced a chance of incorrect behavior. These are some of the "NEVER HAPPEN" code snippets which Jorg noticed and took out. Sorry, I still think it was a good idea to throw them in to keep the code sane; C-C TB code is convoluted in some places.) BTW, maybe I was closing the stream too often without duly checking "reusable"? Maybe. But that, by sheer luck, seems to have masked the problem where the destruction of a variable results in closing the stream, which I don't favor.
So can you offer some guidance regarding which way to go? That is, which patch set should we pick up and proceed with? For patch set 1, somebody has to find the root cause of the mysterious stream closing done behind our back. It could be due to a simple copy&paste error or something. But the original author (me) of the patch could not find such an obvious error easily. Someone else may. TIA
So we have three options now: Option 1: Pass out a reusable flag from getOfflineStoreOutputStream() and do great things with that new information. I coded this in attachment 8847340 [details] [diff] [review] and I believe attachment 8879207 [details] [diff] [review] is identical. We agree that this patch is "stand-alone", it simply adjusts the interface and fixes all the callers. So I believe that that cannot possibly cause any problems or things that are now mysteriously closed. Right? Why don't you run the tests with that patch only, nothing else. Is there now a POP incorporate problem on the second message since someone closed the stream? I don't think so. If so, I'll debug it personally. So what I'm saying is the this general idea cannot be the cause of problems. If you see a mysterious close problem, I suggest you've caused it yourself. Option 2: That's the old ugly "put closed flags everywhere". Let's not discuss this further. Option 3: Apparently a new smart approach without any 'reusable' flag. So DiscardNewMessage() and FinishNewMessage() will *always* close the output stream. So you're basically ignoring the optimisation built into the mbox code that has an array of open URIs and will just try to seek to the end of the appropriate one. If the seek fails, it discards the URI from the list and tries again. Code here: https://dxr.mozilla.org/comm-central/rev/484beb60833c61561f1c447b972e7d4f55e52633/mailnews/local/src/nsMsgBrkMBoxStore.cpp#642 We'd have to get approval for removing the optimisation. Personally, I don't think it's desirable to do so. So to summarise: I think that the sadly non-working option 1 is best.
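For readers who don't want to chase the DXR link, the mbox optimisation described in option 3 follows roughly this pattern (a rough paraphrase only, not the actual nsMsgBrkMBoxStore code; the cache lookup helpers are hypothetical):

// Reuse a cached output stream for this folder if seeking to its end
// still works; otherwise drop it from the cache and open a fresh one.
nsCOMPtr<nsIOutputStream> stream = LookupCachedStream(folderURI);  // hypothetical helper
if (stream) {
  nsCOMPtr<nsISeekableStream> seekable = do_QueryInterface(stream);
  if (!seekable ||
      NS_FAILED(seekable->Seek(nsISeekableStream::NS_SEEK_END, 0))) {
    RemoveCachedStream(folderURI);  // hypothetical helper
    stream = nullptr;               // fall through and reopen below
  }
}
if (!stream) {
  // open a new stream on the mbox file and put it back into the cache
}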
Flags: needinfo?(jorgk)
(In reply to Jorg K (GMT+2) from comment #173) > So we have three options now: > > Option 1: > Pass out a reusable flag from getOfflineStoreOutputStream() and do great > things with that new information. > I coded this in attachment 8847340 [details] [diff] [review] and I believe > attachment 8879207 [details] [diff] [review] is identical. > > We agree that this patch is "stand-alone", it simply adjusts the interface > and fixes all the callers. So I believe that that cannot possibly cause any > problems or things that are now mysteriously closed. Right? > > Why don't you run the tests with that patch only, nothing else. Is there now > a POP incorporate problem on the second message since someone closed the > stream? I don't think so. If so, I'll debug it personally. So what I'm > saying is the this general idea cannot be the cause of problems. If you see > a mysterious close problem, I suggest you've caused it yourself. > I will do this. I may be too optimistic to incorporate all the changes in one step. > Option 2: > That's the old ugly "put closed flags everywhere". Let's not discuss this > further. Agreed. > Option 3: > Apparently a new smart approach without any 'reusable' flag. So > DiscardNewMessage() and FinishNewMessage() will *always* close the output > stream. So you're basically ignoring the optimisation built into the mbox > code that has an array of open URIs and will just try to seek to the end of > the appropriate one. If the seek fails, it discards the URI from the list > and tries again. Code here: > https://dxr.mozilla.org/comm-central/rev/ > 484beb60833c61561f1c447b972e7d4f55e52633/mailnews/local/src/ > nsMsgBrkMBoxStore.cpp#642 > We'd have to get approval for removing the optimisation. Personally, I don't > think it's desirable to do so. Re optimization, please recall that I am still using the reusable flag for INPUT. I think for the WRITING to output stream for mail download, this optimization is NOT THAT USEFUL since we call fsync (we better call fdatasync) anyway per message. That fysnc (better be fdatasync) call alone is a big I/O operation and a pair of open/close pales in terms of resource usage. I can probably show the timing for 1K (that is one thousand) message downloads which does not show much difference when this optimization is taken out. (Honestly, the optimization afforded by buffering-output alone is much better optimization in terms of its contribution than this removal of a pair of open/close. I noticed this while I was trying to catch where the closing was done using gdb debugger. There are simply too many system calls including |close| during the operation of C-C TB that incur context switch overhad, and because of the added overhead of fsync(or fdatasync: I compile TB with this locally) makes this removal of open/close insignificant. > > So to summarise: I think that the sadly non-working option 1 is best. So I am not that quick to relegate option 3 to dust bin. Anyway, who should we discuss this with when you say: > We'd have to get approval for removing the optimisation. Personally, I don't > think it's desirable to do so. Anyway, I will do the option 1 testing, but give me a couple of weeks. I will be on an extended business trip next week and already I got tied up with the preparation. Thank you again for you comment. TIA
Re. Option 3: Yes, there are many "optimisations" that don't have any benefit, yet cause a lot of headache. Keeping track of a bunch of open streams/files is complicating program logic as we know. It would of course be much simpler to do this: Maildir: Create message file, write file, close. The end. Mbox: Open or create mailbox file, seek to the end, write file, close. The end. Repeat for next message. If we could do any measurements that prove that saving an open/close has little effect, then we really might do option 3.
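The simple per-message mbox flow sketched above would look roughly like this (a sketch only; the function name is illustrative, error checks are minimal):

static nsresult AppendOneMessage(nsIFile* aMboxFile /*, message data ... */)
{
  nsCOMPtr<nsIOutputStream> raw;
  nsresult rv = NS_NewLocalFileOutputStream(getter_AddRefs(raw), aMboxFile,
                                            PR_WRONLY | PR_CREATE_FILE, 0600);
  NS_ENSURE_SUCCESS(rv, rv);

  nsCOMPtr<nsISeekableStream> seekable = do_QueryInterface(raw, &rv);
  NS_ENSURE_SUCCESS(rv, rv);
  rv = seekable->Seek(nsISeekableStream::NS_SEEK_END, 0);  // append position
  NS_ENSURE_SUCCESS(rv, rv);

  nsCOMPtr<nsIOutputStream> out = NS_BufferOutputStream(raw, 64 * 1024);
  // ... write one complete message through |out| ...
  rv = out->Close();   // flushes the buffer; the result must be checked
  return rv;           // repeat open / seek / write / close for the next message
}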
(In reply to Jorg K (GMT+2) from comment #175) > Re. Option 3: Yes, there are many "optimisations" that don't have any > benefit, yet cause a lot of headache. Keeping track of a bunch of open > streams/files is complicating program logic as we know. It would of course > be much simpler to do this: > Maildir: Create message file, write file, close. The end. > Mbox: Open or create mailbox file, seek to the end, write file, close. The > end. Repeat for next message. > > If we could do any measurements that prove that saving an open/close has > little effect, then we really might do option 3. That is what I intend to show. Will the comparison of the performance of patch set 2 and patch set 3 be interesting for this purpose? [Actually, with patch set 2 and patch set 3 we already use buffered output, so the performance is already very good, and the overhead of a pair of open/close per message counts slightly more as a fraction of the total (the total execution time is much shorter).] I *think* with the kludge, patch set 2 at least tries to reuse the output stream. Patch set 3 opens/closes the stream per message download. TEST CASE: (*1) What about comparing the execution time of [in the case of Mbox, obviously] - Downloading 1K empty-body messages (to highlight the overhead of a pair of open/close per message in the case of MBox), and - Downloading 1K sample messages from real life (without attachments), say, copies of bugzilla e-mails (resent locally), to show that the overhead of handling message bodies alone pretty much masks the open/close. I can count the number of |open|/|close| calls on top of the execution time comparison to make sure I am measuring something meaningful. In both cases, I will use a local e-mail account to send and receive the 1K messages. WHY: If I use remote servers, the benefit of optimizing away a single pair of open/close is ALREADY moot. You get network timing overhead and fluctuation that I cannot control. That is the whole point. TB's download is rather slow to begin with. A single open/close does not count for much in real life, especially with today's unbuffered output. Of course, I store the local e-mail messages on a local disk. WHY: I could use a remote file system to store the "local" e-mails to artificially heighten the overhead of a single open/close per message. But then the read/write and fsync() overhead will also be very large and eclipse the overhead of a single open/close per message. (Initially, I thought I wouldn't do this scenario unless someone was interested. Now I have second thoughts: I will measure this case too, just for the sake of comparison.) As a matter of fact, the overhead of long and large sets of headers (coming from remote servers such as office365, gmail, etc. instead of a server on the same host) and e-mails with large attachments in real life really makes the "optimization" of a pair of open/close per message very moot, especially with today's unbuffered output. Like I said, the overhead of a remote-system download is already big enough to make the "optimization" not so useful. BTW, has anyone checked the header length of mails downloaded from office365 lately? It is huge(!). So case (*1) is something I can think of to make the overhead of a pair of open/close (for Berkeley Mbox) stand out, for checking the usefulness of the "optimization" in the current code.
My whole point in this bugzilla is to use buffered output, and while I was trying to introduce buffering I noticed this rather premature "optimization" which does not seem to think hard about the real-world performance profile. - I found it complicates the logic and makes modification difficult during coding, and - I noticed that TB's other run-time overhead makes this open/close for mbox rather insignificant. Enough statements without data. I will post the performance figures for (*1). I could be totally wrong! :-) In particular, I am assuming that the time for |seek| after |open| is O(1). If it is O(log(S)), it would still be acceptable (S being the size of the mbox file). But if it is O(S), then the performance *MAY* SUFFER. HOWEVER, this is something one must realize: the current run-time function of C-C TB, which we call to make an output stream buffered, REWINDS the stream to the top, and so we needed to re-seek to the end AFTER buffering is enabled. This was done in my patch series, and this |seek| is called ONCE per message download in my patch set. So we ALREADY suffer from the |seek| overhead, if any, but I have not noticed any bad effect (of course, buffered output wins). [I bet modern OSes, including MS Windows, have reasonably designed file systems that make seeking to the end rather efficient.] (BTW, unbelievably, |seek| is called per each output, and without buffering it is basically called for EVERY SINGLE LINE today...) Yes, the current C-C TB code was not clear enough for me to figure out where we can make the output stream buffered (and then buffering REWINDS the stream back to the beginning (!?). Oh well. There are so many issues I wish I could fix before the move to the new code base begins in the next 24 months(?).) Anyway, obtaining the performance figures for (*1) should be doable reasonably soon. Oh well, it may not be doable within this week: it may have to wait until after July 1st. So the measurement scenario would be:
for mail-storage in ("local disk" "remote file system")
do
  for collection in ("1K empty messages" "1K real world messages without attachment")
  do
    measure the execution time of downloading the messages in the collection
    (also measure the number of open/close calls during that time)
  done
done
I will do the evaluation under Linux. If you have a suggestion for a modified test scenario, please let me know. PS: I HAVE KEPT reusable for the INPUT stream because I *THINK* random access to previous e-mail messages happens in C-C TB, and in that case keeping an opened stream seems to be a win. Again, this may not be the case. But I wanted to keep my change minimal. (With Maildir this won't be the case, obviously.) I really wish developers paid attention to performance figures with measured data during the development of C-C. Just wading through system calls using gdb made me realize that C-C code today spends wasted effort on premature "optimization", which Knuth pointed out many, many years ago.
(In reply to ISHIKAWA, Chiaki from comment #176) > Will the comparison of the performance of patch set 2 and patch set 3 be > interesting for this purpose? Sure, particularly since option 1) doesn't work right now. > I really wish developers paid attention to performance figures with measured > data during the development of C-C. Well, the fact is that I spend more time writing(!) and reading mail than downloading it since this happens in the background while I do other things. So while it's great that we improve and clean up this area, I don't think it will make users go "Uh, ah, the new TB, blazing fast". The problem I have with TB is that every morning when I start it, it freezes for a few seconds while checking POP mail, etc. I don't know what it's doing. So there is room for improvement (but that's for another discussion, preferably NOT to be had here).
(In reply to Jorg K (GMT+2) from comment #177) > (In reply to ISHIKAWA, Chiaki from comment #176) > > Will the comparison of the performance of patch set 2 and patch set 3 be > > interesting for this purpose? > Sure, particularly since option 1) doesn't work right now. Jorg, I did a preliminary performance comparison on my local PC. I am glad I did. There was a performance bug in my patch set 2 (the ugly closed-flag thing): basically it is too eager to close the file stream. Background: I tried to fix the failure to rename/move a message file (using Maildir under Windows) when TB does not close the written message file in FinishNewMessage(). TB tried to move/rename it while there was still an open file stream to it, and the Windows OS failed the operation. We have to close the stream before move/rename can succeed. My fix was basically to close the file inside FinishNewMessage() when we use Maildir. Well, either at the time of fixing it thusly (or even before), my code was already eager to close the file stream even for the Berkeley Mbox format! [And I had a feeling that my patch set 2 was too eager to close the file stream when I noticed that patch set 1 does not work.] As soon as I compared patch sets 2 and 3 and compared the # of open/close calls, I noticed there was NOT that much difference in the number of open/close calls, which surprised me very much. I investigated and, sure enough, my final patch set 2 somehow closes the file stream even for a Berkeley MBox file. So all along in the last 12 months or so, I was testing in this mode, and that is why I did not see the strange file stream class variable being destroyed and, as a side effect, closing the stream. I was setting the file stream variable to nullptr all the time (!), even for MBox storage, after closing it. So it is a performance issue, and it somehow masked the issue of the class variable's destruction implicitly closing the file stream in patch set 1. Oh well. Another thing: when I noticed this issue, I tried to compare the performance against the native TB binary available as icedove under Debian. Native TB ought to have this "optimization" enabled. I use Debian GNU/Linux for my local development. I found icedove to be blazingly fast. Then I realized my local builds of TB are FULL DEBUG builds and so produce copious debug output to the console, and that slows down the operation completely. So I need to create non-debug local builds (with my patch sets) to compare the speed against C-C TB without my patches anyway. This will take time. So maybe we have to wait until the week after. But I will upload a small PDF that shows the comparison I did last evening. It shows that there are about twice (2 x 1K) as many open/close calls. I think 1K open/close calls correspond to the open/close per message (I downloaded 1K messages) and the other approximately 1K open/close calls correspond to the internal caching of headers and other database operations (mork?). Because of the asynchronous nature of the DB's caching/flushing of stale cache entries, etc., I see a sudden decrease of open/close calls in some runs, and coupled with network issues and server hiccups(?), the execution time for downloading 1K messages changes randomly. Although the test was done against a local dovecot POP3 server instance on the same PC, there are enough fluctuations to make the comparison only statistically meaningful. [The Linux image uses 4 cores inside VirtualBox running under Windows 10. The native CPU is a XEON 4 real core / 8 vcore type.
I have 8GB of memory allocated to it, but I do see occasional heavy paging, etc. I am running emacs, etc. So it is hard to see a pristine comparison, but I have to stress that this is the type of typical user desktop environment running other applications, and unless we see a dramatic effect, "premature optimization" that makes the code tricky and hard to maintain should be thrown away IMHO.] > > I really wish developers paid attention to performance figures with measured > > data during the development of C-C. > Well, the fact is that I spend more time writing(!) and reading mail than > downloading it since this happens in the background while I do other things. > So while it's great that we improve and clean-up this area, I don't think it > will make users go "Uh, ah, the new TB, blazing fast". I understand the different workloads people may encounter. In my case, I noticed the slow writing of big messages because I receive tons of e-mails with large PDF/PPT/doc attachments when a conference/exhibition deadline is approaching, and I just could not believe how slow the downloading of these e-mails with attachments was: so I monitored the system calls and noticed the small writes caused by unbuffered output. A 2 MiB attachment at 75 octets per write results in 26667 write system calls. Ugh, too many. It is slow for local disk writes. If your mail store is on a remote system, you are basically dead. I receive like 100 of them a day at least during the October-November time frame. A fast buffered write makes me happy. > The problem I have > with TB is that every morning when I start it, it freezes for a few seconds > while checking POP mail, etc. I don't know what it's doing. So there is room > for improvement (but that's for another discussion, preferably NOT to be had > here). I noticed this too. Let's discuss this somewhere else. I am uploading the comparison PDF in the next comment. Actually it is not a comparison. It just shows that patch set 2 and patch set 3 perform about the same due to the bug I mentioned here.
(In reply to ISHIKAWA, Chiaki from comment #179) > Created attachment 8880043 [details] > tb-performance.pdf: patch set 2 and 3 execution time (sec) for downloading 1 > K messages (FULL DEBUG BUILD). We should not dwell too much on the large number of open/close calls for the first run of the icedove test. It is based on older TB code and, if I understand correctly, there has been much change in the cache code, etc. Again, I will have more performance tests done with non-DEBUG builds, hopefully in July. TIA
OK, I managed to track down where a funny thing happens. I applied the patch one by one and figured that there is a funny thing going on in nsPop3Sink.cpp code. But my code, assuming the sane behavior of called functions, ought to be correct. So called functions are suspect. Bingo. I found an issue with GetNewMsgOutputStream(). (BTW, I found that file stream's being closed by class object destruction happens for INPUT stream, but that is not the cause of the bug I see. I am experiencing a strange OUTPUT stream issue. Please read on.) I will be in a communication black hole during a trip next week. Problem: For POP3 download using BrkMbox, reusable flag is returned as true by GetNewMsgOutputStream(). Optimization Design background: The optimization of output file stream is like this. Begin Message Delivery IncorporateBegin ^ <- we obtain an output file stream 1st message download IncorporateComplete | IncorporateBegin | <- we want to use the same file stream here 2nd message download IncorporateComplete | IncorporateBegin | <- we want to use the same file stream here again. 3rd message download IncorporateComplete | ... IncorporateBegin | <- we want to use the same file stream here, too. N-th message dowload IncorporateComplete V End Message Delivery Reality of code now: However, when I call GetNewMsgOutputStream() in IncorporateBegin, it somehow fails to return the previously valid m_outFileStream value and returns a different value. This seems to be the cause of the bug I see. Question: How should I call GetNewMsgOutputStream call properly so that the same stream is returned in the case of Berkeley Mbox? We now have an issue since it returns DIFFERENT (!) filepointer (?) although I want it to return the same pointer when the file is reusable?. After seeing this bug , as a bandage, I thought of NOT calling GetNewMsgOutputStream() if reusable flag is true and I already have a valid m_outFileStream so that I can use the old m_outFileStream intact. But that seems INCORRECT since GetNewMsgOutputStream() seems to set/modify an internal MsgHdr DB during its processing. This DB processing must be done for each message downloaded, correct? More on the analysis below. Dump log. A dump from a local run. (This dump [16807] WARNING: You are trying to use the deprecated attribute 'prettiestName'.: file /NREF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgDBFolder.cpp, line 3615 (seekdebug) Seek was necessary in IncorporateBegin() for folder Inbox. (seekdebug) first_pre_seek_pos = 0x0000000000000000, first_post_seek_pos=0x0000000002296264 {debug} m_tempMessageStreamReusablePOP3 = true [16807] WARNING: (debug): ApplyForwardAndReplyFilter getting called.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 1269 Incorporate message complete. Incorporate message begin: uidl string: 000106c2533eeb2c [16807] WARNING: (debug): IncorporateBegin: !m_downloadingToTempFile path: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 550 {debug} IncorporateBegin : m_tempMessageStreamReusablePOP3 = true m_outFileStream before GetNewMsgOutputStream = 0xb767b00 <=== Here m_outFileStream after GetNewMsgOutputStream = 0xb7d6910 <=== Oops, changed!? 
(debug) nsCOMPtr <nsIInputStream> inboxInputStream = do_QueryInterface(m_outFileStream, &rv);failed with rv=0x80004002 [16807] WARNING: (debug) We are enabling buffering for m_outFileStream in nsPop3Sink::IncorporateBegin in nsPop3Sink.cpp.: file /NREF-COMM-CENTRAL/comm-central/mailnews/local/src/nsPop3Sink.cpp, line 630 (debug) 639: m_outFileStream->Flush() returned 0x00000000 [16807] WARNING: (debug) Error: ras->Seek() returned error within nsBufferedStream::Seek, so we exit early.: file /NREF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/nsBufferedStreams.cpp, line 194 (debug) Seek in nsPop3Sink.cpp failed: rv=0x80470002 The dump above is created by the modified nsPop3Sink.cpp and a few other files. The change is in the attached temp.patch: This is applied to AFTER the patch set 1 is applied. 5 A 1242030-part0-pass-out-reusable.patch: Bug 1242030 - Pass out 'reusable' from GetOfflineStoreOutputStream(). 6 A 1242030-new-part-1.patch: bug 1242030: Improve error handling in file/stream I/O. Part 1. r=jorgk. 7 A 1242030-new-part-2-imap.patch: bug 1242030: Improve error handling in file/stream I/O. Part 2. r=jorgk. 8 A 1242030-new-part-3-import.patch: bug 1242030: Improve error handling in file/stream I/O. Part 3. r=jorgk. 9 A 1242030-new-part-4-local-less-pop3.patch: bug 1242030: new part-4 new splitting of (consolidation of 1122698, 1134527, ... 10 A 1242030-new-part-5-pop3.patch: bug 1242030: new part-5 new splitting of (consolidation of 1122698, 1134527, 1134529, 11... 11 A temp.patch: temp patch 12 U enable-buffering-1st-step.patch: Bug 1242042: Enabling buffering for file stream to write message for C-C TB (Enabling ... 13 U removing-a-Seek-rev02.patch: Bug 1242046 - Removing unnecessary |Seek| that caused the C-C TB to operate slowly in term... 14 U add-check.patch: add check for pop3 reusability Modified nsPop3Sink.cpp Part of nsPop3Sink::IncorporateBegin() ... else { NS_WARNING("(debug): IncorporateBegin: !m_downloadingToTempFile path"); rv = server->GetMsgStore(getter_AddRefs(m_msgStore)); // bool reusable; NS_ENSURE_SUCCESS(rv, rv); #ifdef DEBUG fprintf(stderr,"{debug} IncorporateBegin : m_tempMessageStreamReusablePOP3 = %s\n", m_tempMessageStreamReusablePOP3 ? "true" : "false"); fprintf(stderr,"m_outFileStream before GetNewMsgOutputStream = %p\n", (void *) m_outFileStream); #endif rv = m_msgStore->GetNewMsgOutputStream(m_folder, getter_AddRefs(newHdr), &m_tempMessageStreamReusablePOP3, getter_AddRefs(m_outFileStream)); #ifdef DEBUG fprintf(stderr,"m_outFileStream after GetNewMsgOutputStream = %p\n", (void *) m_outFileStream); #endif #ifdef DEBUG if (NS_FAILED(rv)) { fflush(stdout); fprintf(stderr, "(debug) IncorporateBegin: m_msgStore->GetNewMsgOutputStream(....) failed. " "rv = 0x%08x\n", (unsigned int) rv); } #endif The above debug dump shows the stream returned after GetNewMsgOutputStream() differs from the previous value. Do we have to modify the code of |GetNewMsgOuptutStream()|? But then, how did the original code work at all? Now that I think about it, maybe when we create buffered output file stream from an original unbuffered sream, have we broken thestream caching scheme? For example we may change the value of m_outFileStream inadvertedly when we enable buffering? 1242030-new-part-5-pop3.patch:579:+ m_outFileStream = NS_BufferOutputStream(m_outFileStream, 64 * 1024 ); Then hash table used inside GetNewMSgOutputStream() ought to be updated but not updated??? From the comment of NS_BufferOutputStream(), it seems so. 
:-( If buffering succeeds, it seems to return a different file stream pointer. The broken optimization of the output file stream is like this.
Begin Message Delivery
  IncorporateBegin      ^   <- we obtain an output file stream from within a hash table.
  1st message download      <- we enable buffering, and the file stream value changes
  IncorporateComplete   |   <- and we do a lot on the new stream with side-effects.
  IncorporateBegin      |   <- We want to use the same file stream here and obtain
  2nd message download      <- the stream from the hash table, but it is NOT
  IncorporateComplete   |   <-- the BUFFERED stream used for the 1st, and thus error!
  ...
End Message Delivery
Does anyone have an idea for fixing this cleanly? Like I said, I won't return until next month. BTW, my patch set 2 always closed the stream by mistake and thus did not use the optimization feature. My patch set 3 does away with the optimization altogether. Only patch set 1 tries to use the optimization feature and so uncovers this issue now. TIA BTW, when I look at the code, I think it might be better to make the stream buffered WITHIN GetNewMsgOutputStream(), now that I have noticed this bug. *BUT* please recall that we cannot generate an input stream to a file created by a BUFFERED output file stream with a single line of code. This generation of an input stream to the file created by an output stream is necessary for the case of downloading to a temporary file to work with old-fashioned virus checkers. But unfortunately, we cannot generate an input file stream from a BUFFERED output file stream this way easily. Since a BUFFERED output stream cannot be used to produce an input stream to the file by a one-liner, I had to produce an input stream from the original UNBUFFERED output stream BEFORE making the output stream a BUFFERED one, and save the input file stream value in a variable to use LATER for reading back from the temporary file. A kludge, but it was necessary. Creating a buffered output file stream inside GetNewMsgOutputStream() makes the later generation of an input stream to the same file very difficult. (I think the low-level file class needs to be more orthogonal to avoid this failure to produce an input stream to a file from the BUFFERED output stream that created the file.)
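To make the identity problem concrete, here is a tiny illustration (an assumed helper shape, not actual TB code): wrapping the stream that GetNewMsgOutputStream() hands out produces a brand-new object, so the store's cache and the sink no longer agree on what "the" output stream is.

static nsCOMPtr<nsIOutputStream> WrapForWriting(nsIOutputStream* aFromStore)
{
  // aFromStore is what GetNewMsgOutputStream() handed out and what the
  // store keeps in its internal hash table.
  nsCOMPtr<nsIOutputStream> buffered =
      NS_BufferOutputStream(aFromStore, 64 * 1024);
  // |buffered| is normally a new wrapper, so buffered.get() != aFromStore.
  // From now on the sink writes through the wrapper while the store still
  // caches (and later hands out) the raw stream, and the two views of the
  // stream drift apart: seek position, pending buffered data, and close
  // state are no longer shared.
  return buffered;
}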
OK, I need someone's help to move forward. I changed |GetNewMsgOutputStream()| to return a BUFFERED stream instead of an unbuffered stream as in the original. This solved the failure of POP3 download when the reusable flag is honored. Now C-C TB properly reuses the buffered output file stream for POP3 download, so the number of observed open/close system calls during a 1K-message download is about half now. Good so far. However, my concern about the following issue turns out to be real, and many tests fail now because of it. > Creating a buffered output file stream inside GetNewMsgOutputStream() > makes the later generation of an input stream to the same file very > difficult. (I think the low-level file class needs to be more > orthogonal to avoid this failure to produce an input stream to a file > from a BUFFERED output stream that created the file.) For example, updating a folder label generates an input stream to READ the existing label field from the output file stream, then modifies it to add a new label, and WRITEs the label using the original output stream. The generation of the input file stream from the output stream is done using the following one-liner: nsCOMPtr<nsIInputStream> inputStream = do_QueryInterface(fileStream, &rv); This is at https://dxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgLocalStoreUtils.cpp?q=nsMsgLocalStoreUtils.cpp&redirect_type=direct#204 Here, |fileStream| is an output file stream. The above do_QueryInterface() works for an UNBUFFERED file stream. But it does NOT work for a BUFFERED file stream; the above statement fails. Since a BUFFERED stream is now automatically returned from |GetNewMsgOutputStream()| by my patch, we have a problem in at least a couple of places. So here is my question, which I asked a couple of years ago when I worked on this buffering issue (my memory is coming back); at the time I simply did not bother to figure out the answer (and instead treated the BrkMbox handling as special to introduce buffering, and used an UNBUFFERED stream as before to handle updating of labels for folders). QUESTION: 1. How can we find out, from the stream itself, the file path name against which an output file stream, e.g. |fileStream| above, has been created? (Is there a simple function call?) [I will use this path name to open an input stream against the path.] 2. Assuming we succeed in creating an input stream to the path found in step 1, what should the file seek position be before other actions are performed with this input stream? (Should it be at the beginning or at the end? Or at the same position pointed to by the file seek position of the output file stream?) A BUFFERED output stream involves an extra complexity: before generating an input stream, we need to FLUSH the output stream so that all data written to the stream has really reached the file, so that reading it from the generated input stream (if necessary, and after a proper |Seek()|) is possible. Now that I think about it, a few test failures I observed lately are related to the lack of a |Flush| before such reading is attempted. Writing things up in bugzilla increases one's understanding of the code. TIA
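A sketch of the workaround described above (order matters; error checks trimmed; the function name is illustrative): obtain the nsIInputStream view from the raw file stream before wrapping it, because the buffered wrapper no longer QIs to nsIInputStream, and flush before any read-back.

static nsresult PrepareStreams(nsIOutputStream* aRawFileStream,
                               nsCOMPtr<nsIOutputStream>& aOut,
                               nsCOMPtr<nsIInputStream>& aIn)
{
  nsresult rv;
  // Works on the raw (unbuffered) file stream, as in nsMsgLocalStoreUtils.cpp.
  aIn = do_QueryInterface(aRawFileStream, &rv);
  NS_ENSURE_SUCCESS(rv, rv);
  // Only now wrap the output side for buffering.
  aOut = NS_BufferOutputStream(aRawFileStream, 64 * 1024);
  return NS_OK;
}

// Later, before reading through |aIn| any data written through |aOut|:
//   rv = aOut->Flush();   // otherwise the reader sees stale file contents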
I'll look through the new comments on the weekend (or when I find the time).
Flags: needinfo?(jorgk)
(In reply to Jorg K (GMT+2) from comment #184)
> I'll look through the new comments on the weekend (or when I find the time).

Thank you. In the meantime, I may post the last comment/question to the dev-platform mailing list or a similar venue. TIA
(In reply to Jorg K (GMT+2) from comment #184)
> I'll look through the new comments on the weekend (or when I find the time).

Jorg,

Good news and bad news.

The good news is that I think I finally found the last bug [famous last words] that prevented the use of the reusable stream. (Unfortunately the reusable stream affects the file stream usage for user tags, folder labels, and the X-Mozilla-Status flag inserted by the compact code, in addition to the simple POP3 download, so the debugging took a long time.)

The bad news is that the change is rather invasive (mostly in nsMsgBrkMBoxStore.cpp, but a few other files are affected too), and it will take me a week to clean the files up into a tidy patch.

More to the point, the change mostly copes with the fact that converting a buffered output stream into an input stream is not possible. So I changed the code to create the input stream at the same time the output stream to a file is created (most likely a folder file in the case of the mbox format, but there are other cases too). This means passing an input file stream variable wherever an output file stream variable is assigned in a function, etc. I also had to make sure the internal hash table for input streams is maintained as correctly as the output stream hash table has been. So the change is relatively large, although straightforward. There are also a few places where I needed to call Flush() explicitly before reading back data that had been written to a folder file. (A rough sketch of this paired input/output stream caching appears at the end of this comment.)

Compact, which renames the folder file, must invalidate the input stream hash. The old input stream was still alive after the folder was renamed and overwritten, and I saw errors. This is what I found and fixed today.

I think at least |make mozmill| is clean now. There are errors that I observe almost always on my PC if I run the whole suite in one go; they usually disappear if I run the tests one by one. I still need to check xpcshell-tests, but I have already fixed the major failures, so I think the goal is in sight.

Once I remove the final bug, if any, and clean up the patch, we can compare the performance figures with and without the reusable flag. I would like to measure the combinations of the following conditions and present them:

a) POP3 mail server location: local vs. remote.
b) Mail store: local disk file system vs. remote file system. [Obviously the latter suffers more from the lack of buffering.]
c) Message length: very short messages vs. long ones.

The last factor needs an explanation. A few weeks ago, when I measured the download of 1K messages (with no body at all, from a local POP3 server), using the reusable stream was certainly very fast: without it, TB took about 80 seconds to handle the download, while with it the download finished in roughly 10 seconds. Yes, it was fast, although I am afraid I may have included some extraneous overhead in the non-reusable version. But in reality you do not usually download 1K very short messages [with very short headers, for that matter] from a mail server on the local host. That is why I want to compare real-world performance figures across the combinations of a) x b) x c).

I have a problem, though.

For a) the mail server: it seems I cannot create and send 1K messages all at once to my local ISP, because of its throttling mechanism. Is there a test POP3 server I can use for this kind of remote testing?
For c) the message size: anyone who has checked the headers from a mail service like Gmail or an MS Office 365 mail host knows that these days message headers are LONG. It is not easy to come up with a practical mix of typical messages of varied length. Does anyone have an idea of the typical workload (in terms of message length) of an average TB user?

To be honest, I could not care less about the reusable flag itself, because my original reason for introducing buffering was to cope with the large attachments that arrive in my mail account every day during the preparation of an exhibition or conference. In that case, the overhead of writing so many octets to the file system basically dilutes the benefit of the cached file streams enabled by the reusable flag. With today's long headers and commercial messages (read: HTML e-mails), I think messages are long enough to make the advantage of stream caching moot, especially when downloading from a remote host (and even more so when the mail is stored on a remote file system). However, TB users' experiences will obviously vary. So I wanted to use a "TYPICAL message length mixture" for a TYPICAL user to let people decide which way to go.

I thought of using posts from one of the Mozilla discussion newsgroups or something like that, but even those would not reflect the very lengthy headers I often see these days at the office. For privacy reasons, I cannot easily use real office e-mails. Sorry about that. In this day and age of "Open Data", I hope there is some kind of data set I can use for this purpose. TIA
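(Sketch referenced earlier in this comment.) A rough outline of the paired output/input stream caching described above; m_outputStreams is the existing cache in nsMsgBrkMBoxStore, while the input-side table, the folder-URI key, the class name and the hash-table types are assumptions for illustration only, not the actual patch.

  // Sketch only: input and output streams are created and dropped as a
  // pair, and the writer is flushed before the reader is handed out.
  class MboxStreamCacheSketch
  {
    nsInterfaceHashtable<nsCStringHashKey, nsIOutputStream> m_outputStreams;
    nsInterfaceHashtable<nsCStringHashKey, nsIInputStream>  m_inputStreams;

  public:
    // Both streams are created together, because once the output stream is
    // buffered an input stream can no longer be derived from it.
    void Cache(const nsACString& aFolderURI,
               nsIOutputStream* aOut, nsIInputStream* aIn)
    {
      m_outputStreams.Put(aFolderURI, aOut);
      m_inputStreams.Put(aFolderURI, aIn);
    }

    // Flush the writer before anyone reads, so buffered data is in the file.
    already_AddRefed<nsIInputStream> GetReadable(const nsACString& aFolderURI)
    {
      nsCOMPtr<nsIOutputStream> out = m_outputStreams.Get(aFolderURI);
      if (out)
        out->Flush();
      nsCOMPtr<nsIInputStream> in = m_inputStreams.Get(aFolderURI);
      return in.forget();
    }

    // Compact renames/overwrites the folder file: both cached streams now
    // point at the old file and must be dropped (and closed by the caller).
    void Invalidate(const nsACString& aFolderURI)
    {
      m_outputStreams.Remove(aFolderURI);
      m_inputStreams.Remove(aFolderURI);
    }
  };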
Flags: needinfo?(jorgk)
Flags: needinfo?(acelists)
> For a) mail server: It seems that I cannot create and send 1K messages all at once to my local ISP. This is because there is a throttling mechanism. Is there a test pop3 server I can use for this kind of remote testing service?

No, I should rather have asked: "If I choose Gmail POP3 for this test, would people agree that it is a good performance measurement target?" Or, for corporate users, if there are such TB users (maybe on Windows), would they prefer testing against a mail server on the LAN rather than on a WAN?

My take is that any overhead on the side of the mail server or the mail storage file system, together with larger message sizes, would dilute and obscure the advantage of the reusable flag, and people may not want a large patch for that.

Anyway, my patch should be available the weekend of 29 and 30 July, barring major bugs uncovered by the last few failing xpcshell tests. If someone can suggest a good mixture of messages (of varied length) and a remote/LAN mail host to use for the performance figures by then, I will be able to run the test quickly and post the numbers. TIA
Good news and bad news.

First the good news. I still have a few (actually many) cleanups to do for the reusable patch set, but I am glad that the Linux and OS X tests look more or less good. OS X passes A-OK!
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ab119318130c500bd1bd03134beb4b8aee7e45a2
(The patches I used are not the cleaned-up versions. I don't think they apply one by one.)

The bad news is that the Windows tests fail. Why? This is related to the fact that under Windows, renaming/deleting a file FAILS while an open file stream/handle to that file exists. Remember the |closedflag| kludge? It was introduced to handle this Windows semantics, more or less by special-casing it. The same issue is biting me again. (I am developing my patch under Linux.)

I have been able to reduce the failed test cases to the following.

xpcshell tests under Windows (I actually managed to remove a few failures yesterday, which is why I am writing this):

TEST-UNEXPECTED-FAIL | mailnews/base/test/unit/test_emptyTrash.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_compactOfflineStore.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | mailnews/imap/test/unit/test_largeOfflineStore.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | mailnews/local/test/unit/test_over4GBMailboxes.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | mailnews/local/test/unit/test_nsIMsgLocalMailFolder.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | mailnews/local/test/unit/test_nsIMsgLocalMailFolder.js | subtest_folder_operations - [subtest_folder_operations : 191] false == true
TEST-UNEXPECTED-FAIL | mailnews/local/test/unit/test_over2GBMailboxes.js | xpcshell return code: 0

Mozmill under Windows (it has been a while since I could compile a Windows build, and I was surprised that there are so many failures :-():

TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-message-commands-on-msgstore.js | test-message-commands-on-msgstore.js::teardownModule
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-message-commands.js | test-message-commands.js::test_thread_delete_prompt
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-message-window.js | test-message-window.js::test_delete_single_message
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-message-window.js | test-message-window.js::test_del_collapsed_thread
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-message-window.js | test-message-window.js::test_next_unread
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-pane-focus.js | test-pane-focus.js::test_after_delete
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-recent-menu.js | test-recent-menu.js::test_delete_message
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-right-click-middle-click-messages.js | test-right-click-middle-click-messages.js::test_right_click_deletion_nothing_selected
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-right-click-middle-click-messages.js | test-right-click-middle-click-messages.js::test_right_click_deletion_one_other_thing_selected
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-right-click-middle-click-messages.js | test-right-click-middle-click-messages.js::test_right_click_deletion_many_other_things_selected
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-right-click-middle-click-messages.js | test-right-click-middle-click-messages.js::test_right_click_deletion_of_one_selected_thing
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-right-click-middle-click-messages.js | test-right-click-middle-click-messages.js::test_right_click_deletion_of_many_selected_things
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-savedsearch-reload-after-compact.js | test-savedsearch-reload-after-compact.js::test_setup_virtual_folder_and_compact
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test-selection.js::test_selection_extension
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test-selection.js::test_selection_last_message_deleted

I am ignoring a few test cases (which seem to fail even without my patch).

Given that it took me about two months in 2016 to figure out the special Windows issues back then and fix them by introducing "closedflag", I am afraid it would take another two months or so before I can fix the Windows-specific issues I face now. This time I am not so sure I can complete the fix. The reason is a fundamental flaw/feature: the TB code was written for unbuffered file streams, which allow the creation of an INPUT stream from an OUTPUT stream on the fly. This cannot be done with the current buffered file stream implementation. So I had to create a kludge to cope with this, in a peculiar manner, using a stream cache: when an output stream is created (and an input stream is expected to be used later), I create an input stream at the SAME TIME in anticipation of that later use and store it in persistent storage. (This is for the Berkeley mailbox format only, BTW.)

Unfortunately, the original code is written (it seems to me) so that the open input stream is released later, when the variable holding the stream is destroyed; there is NO CLEAR explicit closing of that input stream. Now that a global persistent cache holds the input stream, the closing is no longer automatic, and I have to figure out where the input stream needs to be closed. That is the principle, but finding the right place takes time and is not always obvious. A similar difficulty took me two months before: I had to close files early to avoid the failure of delete/rename under Windows, and that needed special handling elsewhere. Thus |closedflag|.

Anyway, given the long development time expected to solve the Windows-specific issues now, I would like to ask the parties concerned whether
(Option 2) - the current approach of using the reusable stream, i.e. honoring the reusability, should be pursued, or
(Option 3) - the approach should be abandoned: forget about reusability and simplify the code.
I discussed this in comment 151; options 1 and 3 are from that comment. (Option 1, which uses closedflag, is ugly although it works, and so was abandoned.)
I could hack the code by going the way of Option 2, but given the Windows-specific issues, I am afraid I might end up with another ugly hack like Option 1 :-(

(In reply to Jorg K (GMT+2) from comment #175)
> Re. Option 3: Yes, there are many "optimisations" that don't have any
> benefit, yet cause a lot of headache. Keeping track of a bunch of open
> streams/files is complicating program logic as we know. It would of course
> be much simpler to do this:
> Maildir: Create message file, write file, close. The end.
> Mbox: Open or create mailbox file, seek to the end, write file, close. The
> end. Repeat for next message.
>
> If we could do any measurements that prove that saving an open/close has
> little effect, then we really might do option 3.

Now that a patch set for the reusable stream works for Linux (more or less) and OS X, I think I can do a meaningful comparison: the real-world performance of the reusable version (Option 2) versus a version that ignores the reusability optimization completely (Option 3). Sorry, we cannot do the comparison under Windows... What I need now is an agreed set of test conditions.

(In reply to ISHIKAWA, Chiaki from comment #186)
> (In reply to Jorg K (GMT+2) from comment #184)
> > I'll look through the new comments on the weekend (or when I find the time).
>
> Jorg,
>
> Good news and bad news.
>
> The good news is that I think I finally found the last bug [famous last
> words] that prevented the use of the reusable stream. (Unfortunately the
> reusable stream affects the file stream usage for user tags, folder labels,
> and the X-Mozilla-Status flag inserted by the compact code, in addition to
> the simple POP3 download, so the debugging took a long time.)
> ... [omission] ....
>
> Once I remove the final bug, if any, and clean up the patch,

I still need to do more cleanup, but the basic functionality is there (except for Windows).

> we can compare the performance figures with and without the reusable flag.

This is what I would like to do now.

> I would like to measure the combinations of the following conditions and
> present them:
>
> a) POP3 mail server location: local vs. remote.
> b) Mail store: local disk file system vs. remote file system. [Obviously the
> latter suffers more from the lack of buffering.]
> c) Message length: very short messages vs. long ones.
>
> The last factor needs an explanation. A few weeks ago, when I measured the
> download of 1K messages (with no body at all, from a local POP3 server),
> using the reusable stream was certainly very fast: without it, TB took
> about 80 seconds to handle the download, while with it the download
> finished in roughly 10 seconds.
> Yes, it was fast, although I am afraid I may have included some extraneous
> overhead in the non-reusable version.
>
> But in reality you do not usually download 1K very short messages [with
> very short headers, for that matter] from a mail server on the local host.
>
> That is why I want to compare real-world performance figures across the
> combinations of a) x b) x c).
>
> I have a problem, though.
>
> For a) the mail server: it seems I cannot create and send 1K messages all
> at once to my local ISP, because of its throttling mechanism. Is there a
> test POP3 server I can use for this kind of remote testing?
>
> For c) the message size: anyone who has checked the headers from a mail
> service like Gmail or an MS Office 365 mail host knows that these days
> message headers are LONG.
> It is not easy to come up with a practical mix of typical messages of
> varied length. Does anyone have an idea of the typical workload (in terms
> of message length) of an average TB user?
> [omission]
>
> So I wanted to use a "TYPICAL message length mixture" for a TYPICAL user to
> let people decide which way to go.
... [omission]...

What is the typical user expectation regarding a mail download?

At the office, I often have to deal with many PDF attachments; downloading those messages (and saving the attachments later) will certainly be faster with buffered output, and because of the sheer size of the messages, saving a few open/close calls doesn't matter [that is my opinion, but it needs measurement to back it up]. At the same time, there will be users who only get short messages now and then. Come to think of it, home users may also benefit from buffered output, because many photos are (or at least used to be) exchanged via e-mail. Any ideas?

I usually get 100-200 messages a day at the office and 200-300 at home, many of which are newsletters. But I also get a dozen or so PDF attachments, and during the few months of conference/exhibition preparation such big e-mails come in at a rate of about 50 each day :-( People's workloads vary. TIA
I posted this to bug 1307085, but I should have posted the bulk of the content to this bug.

--- begin quote

I will polish the patch as you suggested in comment #11. (<= bug 1307085 comment 11)

Right now, I am in the process of gathering statistics on how the patch(es) in Bug 1242030 - Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500 affect the performance (elapsed time, number of invoked system calls) of downloading a "typical mix" of e-mails under various settings (local mail host vs. remote mail host, mail storage on local disk vs. a remote file system). This will guide us in deciding whether to use the cache of file streams to save a few open/close pairs per downloaded message, or to forget about the cache and simplify the code for easier maintenance. [I would think that, to save the precious man-hours of a small developer community, the latter is better, but let's wait for the statistics.]

Well, gathering meaningful data is slow going :-( I finally picked about 200 e-mails from my own real-world usage [from a day last winter] and figured out how to anonymize the data so that I can share the messages if need be. [I can't really simulate the effect of long header lines from a remote host, so I simply decided to INCLUDE the original header lines of my messages as part of the main message body when I send and receive these sample messages during data gathering.] I need a few more steps to create scripts that collect and summarize the interesting data. (I will repeat this step a few times until I get a satisfactory data set. If someone needs MORE data, I will let them do the job themselves by releasing the script files (used under Linux) and the sample messages. The characters are transliterated so that the original text cannot be guessed; MIME attachments are therefore no longer attachments but part of the plain-text message.)

I am cleaning up the patches, including the patch in this bug, as part of the process, so hopefully once the holiday(s) are over (circa Aug 22) and I have the data, I can upload the cleaned patch here and in Bug 1242030 with some statistics. TIA

--- end quote
(This is a copy of a message I am going to post to the dev-apps-thunderbird mailing list.)

Does anyone have a mail host that can be used for remote download performance testing? I am looking for a mail host that can be used for POP3 download benchmarking and that won't block my IP address as a spam source even though the e-mail messages contain illegible text such as the following:

Az Azz, Azz 1, 1111 zz 11:11 AA, Azzzzzz Azzzzz <zzzzzzz@zzzzzz.zzz> zzzzz:
> Az 11/11/1111 11:11 AA, A.A. Az zzzzz:
>> Az zzzzz zz zz zzzzzz/zzzzzzz zzzz zzzzzz AAAA_AAAAA_AAAA (),
>> zz zzz zzz AAAAA zzzzzz zzz zzzzzzz zz zzzzzz zzzzzzzzzz.
>> Azzzz zzz zzzz zzzz zz zz AAAA_AAAAA_AAAA () zz zzz zzzz zzz
>> zzz zzzzzz zzzzzzzzzzz, zzz zzzzzz zzzzzzzzzz zzzzz zzzzzz zzzzz zzz
>> zzzzzzzzzz zz zzzzzz zzzz zz zz zzzz zz zz zz.
>
> Az, zzzz zz z zzzz zzzzz. Azzzzz.
> A zz zzzzzzzz zz zz zzzzz.
-- A.A.

Background:

I am trying to investigate the performance of using a cache of open file streams for POP3 download (as is done today) versus NOT using it. My guess is that for remote POP3 download, the network overhead is large enough to mask the saving of one open/close per message.

Details:

I am creating a patch to use buffered output for POP3 download: bug 1242030. TB does not use buffered output currently, which causes an excessive number of write system calls when TB downloads, say, a MIME attachment containing a JPEG file. Typically TB writes 75 octets at a time via a write system call, and this is a performance killer from the system's perspective. So I am creating a patch set to reduce the number of write system calls by using buffered output.

While creating the patch for bug 1242030, I found that the use of cached open streams is a source-code maintenance burden. I can make the tentative patch work for Linux and OS X, but not for Windows as of now. (There is an issue of streams not being closed early enough for moving/removing a file to succeed: under Linux and OS X it is OK to move or delete a file that still has an open file stream to it; under Windows it is not.) I solved this earlier with a kludge, which was ugly. It was so ugly that jorgk wanted me to re-create the patch without it, while still honoring the reusable cached open stream. I agreed, since the original kludge was certainly ugly. Unfortunately, the reusable cached open stream part causes the Windows problem, which has not been solved in the last month or two.

So I created an alternate patch that does away with the cache of open streams. This alternate patch works under Windows, Linux, and OS X as far as I could check using the tryserver. (I only develop locally under Linux.)

I wonder whether I should pursue the patch that uses the cache of open streams a bit further to see if I can make it work under Windows. But that is going to be a few-months project. So, to see whether further development would be worthwhile in terms of performance, I wanted to compare the performance of the two patch sets. At least I can compare the performance of the two different patches under Linux NOW.

I found that my patch set using the cached open streams for POP3 download (tested under Linux only) is certainly faster than not using it when I download messages from a mail server on the SAME Linux PC where I run TB. This is understandable, since there is not much network overhead in that setup.
But in the real world we download e-mail messages from a REMOTE mail host, and my guess is that the network overhead is large enough to mask any saving from using a cache of open streams during POP3 download. (The cache saves one open/close pair per message.) So I want to see whether the speed difference between using the cached open streams and not using them is large when a REMOTE mail host is used. Thus, I am searching for a free mail host that lets me create an account and test the performance as described above.

Readers may wonder why I am not using Gmail or other popular free mail services. A good question.

CAVEAT: To make benchmarking with a predefined set of messages reproducible by others, I created a set of e-mail messages based on real e-mail messages I received on a day last year, but ANONYMIZED them by replacing the characters in the mail text and mail headers:
[A-Z] -> A
[a-z] -> z
[0-9] -> 0
This is the strange text you see near the beginning of this message. I benchmark the performance by sending and receiving these messages, and I can make them available once I publish the benchmark results. I use real-world messages to reflect the very lengthy header lines attached by services like Office 365 or Gmail, and I anonymized them so that others can check my benchmark using the same set of messages if they want to.

But, as you can see, the e-mail messages contain illegible text, so I am afraid commercial e-mail hosts would treat them as strange spam and might block my PC (or rather my IP address) as a spam source. That is something I want to avoid. That is why I am looking for a mail host other than Gmail and the other well-known e-mail services: a host against which I can test the performance by sending 200+ messages at a time and then downloading them.

I want a flexible mail host that won't throttle my injection of 200+ messages in succession. This is vital. Obtaining a meaningful benchmark for a C-C binary by repeating the test three times already takes time; if injecting the messages is throttled and can't be done quickly, it will take a LONG TIME to obtain a benchmark result. TIA
Thunderbird now maintains email accounts on thunderbird.net (with Fastspring as the provider). We could probably get a test account set up for you there. Not sure if that meets your needs or not.
(In reply to Kent James (:rkent) from comment #191)
> Thunderbird now maintains email accounts on thunderbird.net (with Fastspring
> as the provider). We could probably get a test account set up for you there.
> Not sure if that meets your needs or not.

Thank you for the suggestion, Kent. That sounds nice.

I wonder about one thing: does the mail host throttle the injection of messages in quick succession? For benchmarking, I would like to send about 200 messages in one session and then retrieve them using C-C TB. If message injection is throttled to discourage spam-like behavior, that would indeed hamper the benchmarking; I would just have to wait a little longer. Oh well, I can live with that. Anyway, I would like to sign up for the service, so how do I go about it?

PS: I have found that simulating a real-world remote host with toxiproxy on a local PC, using the mail daemon on the same host, is a little tricky. To simulate a real-world situation I have to simulate the round-trip time of packets and the effective bandwidth, for example. I measured the ping round-trip time (about 5 milliseconds) and the bandwidth to the ISP mail host I use as the simulated host; the bandwidth seems to be about 720 KB/sec max (but usually a lot lower). The round-trip time to pop.gmail.com is about 42 milliseconds, but I am not sure what the maximum bandwidth to pop.gmail.com is. (I suspect something similar to the local ISP server, maybe up to 1 MB/sec, but such a guesstimate may never reflect reality unless we have many measured examples.) So if I could use a real server (to which others could subscribe to verify my benchmarking if they want), that would be much better than the simulated network conditions of a simulated remote server built from a real local server plus a network tool that artificially injects delays and limits bandwidth. TIA
I have decided to use toxiproxy under Linux to simulate a remote POP3 server. The tool allows me to simulate the RTT and bandwidth limit of a real WAN. I used an RTT of 5 milliseconds and a bandwidth of 720 KB/sec, the values I observed for my local ISP's POP3 server, and found the simulation very realistic: the download time feels about right, i.e., comparable to the real download.

Using this simulated WAN, through which I retrieved messages via POP3, I obtained some measurement data. I have uploaded a few statistical summaries of the number of open, write, and read system calls, measured under Linux using the strace command.

The data are measured over two intervals of TB operation. One interval covers starting TB and quitting it as soon as the main window is shown in the standard 3-pane layout (the quitting is done manually). The other interval covers TB downloading messages from the POP3 server: I hit the "Get Messages" button and wait until the transient window near the lower corner shows that the download has completed. I download 210 prepared messages for each test. The message set contains a few JPEG photos and document attachments; the total size of the 210 messages is about 12 MB as text.

The following set of system call (open, write, and read) summaries is for the original released Thunderbird 52.x (32-bit) under Linux.

https://zephyrus00jp.pastebin.mozilla.org/9030976 : to-start3.summary
The above is the system call summary for starting and quitting only.

https://zephyrus00jp.pastebin.mozilla.org/9030975 : to-download3.summary
The above is the system call summary for downloading 210 messages.

The following set of summaries is for an incomplete patch that introduces buffered output for downloading. This patch uses the reusable flag as suggested by Jorg, but currently does not work on Windows because of the Windows semantics that renaming and deleting a file fail while an open stream to the file exists. (So we can't test this patch under Windows.) I have proposed deciding whether we should honor the reusable flag (i.e., use the cache of opened streams) or do away with it and simplify the code. This decision requires the performance comparison, and I am reporting my findings.

- tr-startup.summary (preliminary), for starting and quitting only, could not be uploaded to pastebin (strange, but pastebin tries to parse the text and seems to fail, perhaps thinking a line is too long), so I will upload it to bugzilla in the next attachment. BTW, I say "preliminary" because the data was collected manually in an ad-hoc manner before the systematic script-based approach was adopted.

https://zephyrus00jp.pastebin.mozilla.org/9030977 : tr-download.summary (preliminary)
The above is the system call summary for downloading 210 messages.

Finally, the following set of summaries is for a patch that introduces buffered writing for downloading but does not use the reusable flag, so the code is simpler.

https://zephyrus00jp.pastebin.mozilla.org/9030979 : tn-start.summary (preliminary)
The above is the system call summary for starting and quitting only.

https://zephyrus00jp.pastebin.mozilla.org/9030981 : tn-download.summary
The above is the system call summary for downloading 210 messages.

In the subsequent comments, I will upload:
- tr-startup.summary (preliminary), which could not be posted to the pastebin service.
- the rough framework for obtaining the statistics used to decide whether the reusable flag can be ignored and the code simplified while maintaining reasonable performance.
- graphs showing that the patch that introduces buffered writing but does not use the reusable flag performs rather well compared to the original 52.x (32-bit).

Graphs are for:
- a simulated network of RTT 5 milliseconds, 720 KB/sec bandwidth, and a local-disk profile (the Inbox is located there).
- a simulated network of RTT 35 milliseconds, 720 KB/sec bandwidth, and a local-disk profile (the Inbox is located there).
- a simulated network of RTT 35 milliseconds, 720 KB/sec bandwidth, with the mail storage on a LAN router that supports CIFS mounting of USB memory (the Inbox is there).
- a simulated network of RTT 35 milliseconds, 720 KB/sec bandwidth, with the mail storage on a Windows 10 Pro host on the LAN (the Inbox is there). That Windows 10 Pro machine is actually the host of my VirtualBox.

There is about 30-35 milliseconds of overhead per downloaded message in my PC setup (explained in the rough framework above) if the Inbox is opened and closed for each download. Downloading 210 messages thus adds 6-7 seconds of overhead for the patch that does not use the cache of open streams. To use buffered writing, however, I had to cache the opened INPUT streams too, whenever a stream is opened for writing; so there is an overhead of extra open(s) (I am not entirely sure how often this happens) to honor the reusable flag. Please keep this 6-7 second overhead in mind when you look at the graphs for the local-disk mail store; it is clearly there.

However, when downloading from a simulated remote server over a simulated WAN with an RTT of 5 milliseconds and 720 KB/sec bandwidth [the observed values for my local ISP's mail host], the download takes about 60+ seconds with the mail store on local disk, so saving 6 seconds by using the cache of opened streams (assuming the patch can be fixed for Windows cleanly at all) is moot, considering the added source-code complexity. I personally do not think the added complexity is worthwhile, especially considering that the download takes even longer when the simulated WAN's RTT is 35 milliseconds (the RTT to Gmail's POP3 server from my office, which has a very fast connection; from my home it is 42 msec, so the download would be slower still. I have also read that an RTT of 50 milliseconds is common on LTE connections and larger still, around 125 milliseconds, on 3G. Obviously, on such phone connections TB's downloads would be slow.)

Furthermore, if the local mail store is on a remote file server on the LAN, the download time is significantly larger for the original TB, which does NOT use buffered writes and therefore issues MANY write calls. This flood of write system calls alone makes unmodified TB unacceptable to many enterprises that use a centralized server to store users' profiles, where the Mail Inbox etc. are located. (I am trying to fix this issue because my office is moving toward storing one's profile, including the mail store, on a centralized server in the long run.)

For the LAN remote file system test, I used two setups: the CIFS service of Windows 10 Pro on the same PC (I run Linux inside VirtualBox on that PC), and the USB disk file system support of a Wi-Fi router (IO DATA WN-AC1167DGR) on my home LAN. The latter is a typical router that runs Linux on a not-so-powerful CPU.
The less CPU power the CIFS server has, the more significant the overhead of handling the MANY write calls from the client(s) becomes. This is clearly visible in the graph.

I would say the general trend is clear, and IMHO the writing is on the wall: we should do away with the reusable flag to simplify the code. Real-world workloads do not seem to favor the caching of opened streams very much. That is my tentative conclusion. YMMV.

I will answer questions about the summaries in my spare time. Sorry, collecting and summarizing statistics is very time-consuming, and I am not sure if and when I can fix some measurement errors. (However, since I repeated each measurement four times, at least two valid data points were obtained even when a mistake was made, so the general trend is clearly visible despite a few human errors and one mysteriously out-of-the-ordinary value.)

I will upload the test messages used for the download test and the scripts I used for measuring and summarizing the data. TIA
tr-startup.summary (preliminary): system call summary for starting and quitting only, for the patched C-C TB that uses buffered writes and tries to honor the reusable flag (but does not currently work on Windows).
This memo explains the intention of the performance study and the methodology used. It is not finished, especially the part about the uploaded summaries, but it gives the reader a clear idea of how to interpret the data I am uploading.
A graph showing the elapsed time for starting TB, doing nothing, and quitting, and for the download of 210 messages. This uses a simulated network of RTT 5 milliseconds and 720 KB/sec bandwidth, with a local-disk profile (the Inbox is located there).
A graph showing the elapsed time for starting TB, doing nothing, and quitting, and for the download of 210 messages. This uses a simulated network of RTT 35 milliseconds and 720 KB/sec bandwidth, with a local-disk profile (the Inbox is located there). In the middle download measurement, I failed to download the sent messages in the first run and somehow ended up downloading twice the intended number of messages (210 x 2) in the second run. Generally speaking, the effect of the 6-7 second difference is diluted by the download time, which is prolonged by the larger RTT (35 msec instead of 5 msec).
A graph showing the elapsed time for starting TB, doing nothing, and quitting, and for the download of 210 messages. This uses a simulated network of RTT 35 milliseconds and 720 KB/sec bandwidth, with the mail files located on a remote router's USB memory accessible via CIFS. Generally speaking, the 6-7 second difference is no longer noticeable because the current unmodified version of TB issues far too many write system calls against the remote file system and takes a very long time to download the messages. The 6-7 seconds saved by avoiding open/close is therefore no longer an important factor here.
A graph showing the elapsed time for starting TB, doing nothing, and quitting, and for the download of 210 messages. This uses a simulated network of RTT 35 milliseconds and 720 KB/sec bandwidth, with the mail files located on a Windows 10 Pro CIFS mount on the LAN. Generally speaking, the 6-7 second difference is no longer noticeable because the current unmodified version of TB issues far too many write system calls against the remote file system and takes a very long time to download the messages. The 6-7 seconds saved by avoiding open/close is therefore no longer an important factor here.
Interesting findings about the cache, and the results all around. Thanks for doing all this work. Do you want anyone's response on these findings before proceeding?
Flags: needinfo?(ishikawa)
Yes, I forgot to set the requests to jorgk and aceman, for example; I am setting the request flags now.

I wonder whether there could be a consensus to forget about reusing opened streams. On my setup, reusing saves 25-30 milliseconds per downloaded message, so for 210 messages it would save 5-6 seconds in total. However, depending on the network conditions, the whole download takes more than a minute, and if the local mail store is on a remote file system on a LAN it can be slower still. So the effect of the saving is diluted by the WAN slowness and all that. Any comments will be appreciated. TIA
Flags: needinfo?(jorgk)
Flags: needinfo?(ishikawa)
Flags: needinfo?(acelists)
BTW, I am trying to create a reasonably stable binary (linux64) so that people can retry my findings on their computers (I want binaries to be made available on the tryserver), but I am afraid something is wrong with either the tryserver itself or the latest M-C source, etc. :-(
It would be great if we could land the patch (corrected and cleaned up): I need to know which way to go, and I am afraid September is the last month in which I can do meaningful programming before a local annual tech conference gets in the way.

I can clean up the patch if the reusable flag can be ignored. (Fixing the Windows version with the reusable flag honored is very hard, in the sense that it will require a somewhat ad-hoc kludge a la |closedflag|, and I won't feel comfortable because the current |make mozmill| and xpcshell-tests do not seem good enough for regression testing.) If the reusable flag can be ignored, I can post a cleaned-up patch.

Hopefully, that will be followed by Bug 1170606 - C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read). The patch here helps the Linux, OS X and Windows versions alike. The patch in Bug 1170606 will be essential for Linux/OS X clients using a mail store on a remote file system. Without it, there will be many errors when the server returns a short read, to which TB needs to respond by issuing more READ requests, which it currently does not. (At least for a Windows CIFS/SAMBA mount, reads seem to be retried automagically by the Windows OS itself: for example, a system call used for file copy/move seems to take care of the retry [and even error retries such as timeouts] automatically.)

The patch here will help keep TB from pounding the remote file system (when the mail store is on it) by reducing the number of WRITE system calls significantly. I suspect the large number of WRITE system calls may be the cause of the short-READ issues, when the remote file system host is overwhelmed by so many small WRITE requests. TIA
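For reference, the "respond by issuing more READ requests" part amounts to the classic short-read loop. A generic sketch follows (this is not the actual patch in Bug 1170606; the helper name is illustrative):

  // Keep calling Read() until the requested count is satisfied or EOF;
  // a single Read() on a remote (CIFS/NFS) file may legitimately return
  // fewer bytes than asked for.
  static nsresult
  ReadFully(nsIInputStream* aStream, char* aBuf, uint32_t aCount,
            uint32_t* aActual)
  {
    uint32_t total = 0;
    while (total < aCount) {
      uint32_t n = 0;
      nsresult rv = aStream->Read(aBuf + total, aCount - total, &n);
      if (NS_FAILED(rv))
        return rv;
      if (n == 0)          // real EOF; let the caller see a short result
        break;
      total += n;          // short read: ask again for the remainder
    }
    *aActual = total;
    return NS_OK;
  }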
(In reply to ISHIKAWA, Chiaki from comment #202)
> BTW, I am trying to create a reasonably stable binary (linux64) so that
> people can retry my findings on their computers (I want binaries to be made
> available on the tryserver), but I am afraid something is wrong with either
> the tryserver itself or the latest M-C source, etc. :-(

Hi, thanks for your efforts. There were some problems yesterday with c-c needing to be adapted to changes in m-c, but right now all should be fine. Please update your tree and push to try again (we see you often push to try with an outdated tree). Also, tests are not working on try for Linux, so try specifying only the win32 platform when pushing. That should get you a green run.
(In reply to :aceman from comment #204)
> Hi, thanks for your efforts. There were some problems yesterday with c-c
> needing to be adapted to changes in m-c, but right now all should be fine.
> Please update your tree and push to try again (we see you often push to try
> with an outdated tree). Also, tests are not working on try for Linux, so try
> specifying only the win32 platform when pushing. That should get you a green run.

Thank you, aceman, for your guidance.

It is strange, though: sometimes I am advised that the C-C/M-C trees have been updated to fix problems, yet when I push to try-comm-central (NOT comm-central) the build fails, and I look at my local C-C tree, which has been updated (I think), and wonder why. It could be that try-comm-central picks up M-C changes with a delay of a few hours (or several; my guess is a 5-6 hour lag), or something like that, and it could even be transient. Anyway, it seems I don't update the trees often enough, but at the same time it is a little difficult to follow all the changes needed to accommodate M-C changes. (If we update the local tree prematurely, the local build may then fail. Oh well, tough luck.)

Anyway, I will try building the win32 version of the binary that does not use the cache of REUSABLE open streams. (The version that uses the cache of reusable open streams fails under Windows. Linux/OS X should work, and the Linux version is what I tested locally.) TIA
Part 1 of the test messages. (Part 2 follows.)

Once these anonymized messages are placed in the /tmp/ directory, I used the following scripts to send them to the mail test account on my PC, |mtest2|.

--- send-to-mtest2-2017-08-19.sh :
#
ADDR=mtest2
for i in $( seq 0 250)
do
    anonfile=$( printf "/tmp/t-output.%03d" $i)
    if [ -s $anonfile ]
    then
        FROM=$( grep ^From: $anonfile | head -1 | gawk -f pick-2nd-field-and-up.awk )
        SUBJECT=$( grep ^Subject: $anonfile | head -1 | gawk -f pick-2nd-field-and-up.awk)
        echo "anonfile=" $anonfile
        echo From: $FROM
        echo Subject: $SUBJECT
        mail -s "$SUBJECT" -r "$FROM" $ADDR < $anonfile
    else
        break;
    fi
done
echo "Done"

---- pick-2nd-field-and-up.awk
#
#
#
{
    firstField=$1;
    len=length($1);
    print substr($0, len + 1);
}

Yes, these use awk, bash, etc., so they would need to be tweaked to run under Windows. Maybe the recent "Linux on Windows" initiative by MS can help.
This is part 2 of the messages used for benchmarking. (If I bundled all the messages together, the resulting archive might be too large for bugzilla.)
It's been a while since I looked at this, so there's a lot of catching up to do. Sadly there is no "executive summary" ;-(

Looking at the charts (attachment 8902467 [details], attachment 8902468 [details], attachment 8902472 [details] and attachment 8902473 [details]) we can see from the latter two that buffered write is much faster for writing to "remote" file systems connected somewhere on the network.

So no doubt we want to move to buffered write. The task at hand is to decide whether to keep the twisted "reusable" logic or to remove it. I assume all measurements were done on mbox storage, since that's the one reusing the open streams.

Attachment 8902467 [details] shows that on a faster network for download (RTT 5ms, attachment 8902467 [details]), the "new" way might be a little slower. I don't understand, though, why the experiment with reuse and buffered write is slower. On a slow network (RTT 35ms, attachment 8902468 [details]) there is no difference.

I think the results will also depend on the mix of the test data. If you download many small messages, then doing more open/close will make things worse. If you download a few big messages, then a few additional open/close calls won't make a difference. You don't tell us how your 210 messages of "real life data" are distributed. Maybe you could rerun at least the "local disk" case with 2000 short text messages.

Can you also give us a *summary* of the technical issues? I'm a bit confused by comment #193:
> ... does not work for windows due to the semantics of windows that
> fails the renaming and deletion of a file while an open stream exists
> to the file.
Deleting and renaming is mainly for maildir storage, and the reuse applies to mbox. So why is there an issue? And why would you rename or delete a file which is still "open"?

In comment #183 you mentioned that we're QI'ing an output stream to an input stream in nsMsgLocalStoreUtils::UpdateFolderFlag() and that, no surprise, doesn't work for buffered output streams. But why would we use a buffered stream for writing a few bytes to a fixed location in a file? The buffered stream should be reserved for writing messages to the mbox and maildir single files. Are you saying that we have only one function, GetNewMsgOutputStream(), that gets the stream for both cases? Maybe it needs another parameter to request a buffered stream on some but not all occasions.

So my summary is:
- Using buffered write is desirable.
- I'm not against removing the twisted reusable logic, yet I haven't seen any data that confirms that removing it won't degrade performance in cases where many open/close calls would be introduced. On the other hand, downloading 2000 small messages is not a real-life case. When I start TB in the morning I get 50 new messages, so I don't think I would notice the difference. That said, does copying messages from one local mbox folder to another reuse the streams? That's an operation a user would do, and we don't want to degrade performance there. Can you please look and test: move 1000 bugmails, they are small, to another folder.

I'd also need a summary of the technical issues. We already found the QI issue in nsMsgLocalStoreUtils::UpdateFolderFlag().
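To illustrate the "another parameter" idea, a purely hypothetical sketch; the function name and signature are invented for illustration (the real change would have to go through the pluggable-store interface), and the two helpers are the existing MsgGetFileStream()/MsgNewBufferedFileOutputStream() quoted elsewhere in this bug.

  // Hypothetical: callers that write whole messages ask for a buffered
  // stream; callers that patch a few bytes in place (ChangeFlags(),
  // ChangeKeywords(), UpdateFolderFlag()) keep the plain stream that can
  // still be QI'd back to an input stream.
  nsresult
  GetMboxOutputStreamSketch(nsIFile* aMboxFile, bool aBuffered,
                            nsIOutputStream** aResult)
  {
    if (aBuffered)
      return MsgNewBufferedFileOutputStream(aResult, aMboxFile,
                                            PR_WRONLY, 00600);
    return MsgGetFileStream(aMboxFile, aResult);
  }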
Flags: needinfo?(jorgk)
I've done one experiment myself: I moved 150 messages between local mbox folders. nsMsgBrkMBoxStore::GetNewMsgOutputStream() gets called 150 times, but it only allocates one stream. If we stop reusing the stream, we would do it like nsMsgMaildirStore::GetNewMsgOutputStream() and create a new stream every time. I have the feeling that not reusing streams will cause performance degradation when moving messages.
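For context, the reuse that lets those 150 calls hit a single allocation is essentially a get-or-create lookup on the stream pool. A simplified sketch follows; the key, helper name and error handling are illustrative, not the real member function.

  // Sketch of the mbox-style reuse: only the first call for a folder pays
  // for an open(); later calls get the cached stream and just seek/append.
  static nsresult
  GetOrCreatePooledStream(
      nsInterfaceHashtable<nsCStringHashKey, nsIOutputStream>& aPool,
      const nsACString& aFolderURI, nsIFile* aMboxFile,
      nsIOutputStream** aResult)
  {
    nsCOMPtr<nsIOutputStream> stream = aPool.Get(aFolderURI);
    if (!stream) {
      nsresult rv = MsgGetFileStream(aMboxFile, getter_AddRefs(stream));
      NS_ENSURE_SUCCESS(rv, rv);
      aPool.Put(aFolderURI, stream);
    }
    // The maildir-style alternative (no pool) would run MsgGetFileStream()
    // here unconditionally, i.e. one extra open/close pair per message.
    stream.forget(aResult);
    return NS_OK;
  }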
Dear Jorg,

Thank you for your comment and for the experiment concerning GetNewMsgOutputStream().

(In reply to Jorg K (GMT+2) from comment #208)
> It's been a while since I looked at this, so there's a lot of catching up
> to do. Sadly there is no "executive summary" ;-(
>
> Looking at the charts (attachment 8902467 [details], attachment 8902468
> [details], attachment 8902472 [details] and attachment 8902473 [details]) we
> can see from the latter two that buffered write is much faster for writing
> to "remote" file systems connected somewhere on the network.
>
> So no doubt we want to move to buffered write. The task at hand is to decide
> whether to keep the twisted "reusable" logic or to remove it. I assume all
> measurements were done on mbox storage, since that's the one reusing the
> open streams.

Yes. mbox is the storage format used for testing.

> Attachment 8902467 [details] shows that on a faster network for download
> (RTT 5ms, attachment 8902467 [details]), the "new" way might be a little
> slower. I don't understand, though, why the experiment with reuse and
> buffered write is slower.

Yes, I noticed that it is rather strange. I have not investigated this part much, for lack of time.

> On a slow network (RTT 35ms, attachment 8902468 [details]) there is no
> difference.
>
> I think the results will also depend on the mix of the test data. If you
> download many small messages, then doing more open/close will make things
> worse. If you download a few big messages, then a few additional open/close
> calls won't make a difference. You don't tell us how your 210 messages of
> "real life data" are distributed. Maybe you could rerun at least the "local
> disk" case with 2000 short text messages.

The last two attachments are the messages used for the tests:
https://bugzilla.mozilla.org/attachment.cgi?id=8903870
https://bugzilla.mozilla.org/attachment.cgi?id=8903871

I found that downloading 2000 short text messages (with no main text body at all) is definitely faster with the buffered version. However, as I explained (or tried to explain), this is hardly a real-world example. These days, mail servers add all sorts of additional non-standard headers for spam handling, internal priority handling, etc.; actually most of the headers are of this nature, and they alone amount to at least a few KB. The messages I used for downloading from the local mail host (which goes through a simulated WAN with a simulated RTT and bandwidth limit) reflect these additional header lines: they are included in the message text, so that even though the local mail host under Debian GNU/Linux does not add strange spam-handling headers, the message as a whole mimics the size of messages that pass through real-world mail hosts such as Gmail or Office 365. They are real messages from last November, which is why I had to make the content illegible by substitution.

> Can you also give us a *summary* of the technical issues? I'm a bit confused
> by comment #193:
> > ... does not work for windows due to the semantics of windows that
> > fails the renaming and deletion of a file while an open stream exists
> > to the file.
> Deleting and renaming is mainly for maildir storage, and the reuse applies to
> mbox. So why is there an issue? And why would you rename or delete a file
> which is still "open"?
> In comment #183 you mentioned that we're QI'ing an output stream to an input
> stream in nsMsgLocalStoreUtils::UpdateFolderFlag() and that, no surprise,
> doesn't work for buffered output streams. But why would we use a buffered
> stream for writing a few bytes to a fixed location in a file? The buffered
> stream should be reserved for writing messages to the mbox and maildir single
> files. Are you saying that we have only one function, GetNewMsgOutputStream(),
> that gets the stream for both cases? Maybe it needs another parameter to
> request a buffered stream on some but not all occasions.

Hmm. That last sentence makes sense. I may try that. But please read the comment about "closedflag" below.

> So my summary is:
> - Using buffered write is desirable.
> - I'm not against removing the twisted reusable logic, yet I haven't seen
> any data that confirms that removing it won't degrade performance in cases
> where many open/close calls would be introduced. On the other hand,
> downloading 2000 small messages is not a real-life case. When I start TB in
> the morning I get 50 new messages, so I don't think I would notice the
> difference. That said, does copying messages from one local mbox folder to
> another reuse the streams? That's an operation a user would do, and we
> don't want to degrade performance there. Can you please look and test:
> move 1000 bugmails, they are small, to another folder.
>
> I'd also need a summary of the technical issues. We already found the QI
> issue in nsMsgLocalStoreUtils::UpdateFolderFlag().

I will do an executive summary and a few more tests.

Regarding the use of a buffered stream only for message writes and NOT for UpdateFolderFlag: remember the ugly hack called the "closed" flag? That version did exactly that, and I was quite comfortable with it as far as the tests go. The only reason I had to use "closedflag" was to accommodate the Windows version, which somehow needed it when we used maildir. The patch, especially to make the crude error handling work [it did not before my patch], moved the timing of |Close| as late as possible, and as a result the low-level maildir implementation ended up calling rename/delete while a stream was still open to that file. It was not entirely clear how to fix this, so I resorted to "closedflag". You may wonder why, and I also wonder why now, but that was the case, and the previous sentence is the best explanation I can come up with right now. (If it had not been for the Windows issue, my earlier patch did not have "closedflag" and was rather simple.)

Anyway, I will do a few more tests and write an executive summary. TIA

PS: Thank you for fixing the |make mozmill| bustage.
> Fixed in bug 1395481. Mozmill was completely broken due to M-C changes. You need to rebase, as always.
I will do so and try.
Sadly there are too many issues mixed into this bug. The ultimate aim is to switch to buffered write; this was attempted already here
https://hg.mozilla.org/comm-central/rev/a9f658106adb#l1.28
in bug 769346 and backed out later here
https://hg.mozilla.org/comm-central/rev/1a577ed6eb9a#l1.28.

To reach this aim, we need to solve these issues first:
1) Clean up bad error handling and bad Close() handling on output streams.
2) Decide whether reusing output streams gives a performance advantage at the price of complicating the code.

This bug set out to do point 1) but drifted a little towards point 2), so we need to recap.

The first attempt to clean up the Close() business was to pass out a "closed" flag from discardNewMessage() and finishNewMessage(). I observed that this flag is the opposite of the "reusable" flag, since if the stream is reusable (mbox), finishNewMessage() doesn't close it, and vice versa. The only problem is that nsMsgDBFolder::GetOfflineStoreOutputStream() eats the "reusable" flag and doesn't pass it out to its callers, hence they don't know which type of stream they are dealing with. Attachment 8847340 [details] [diff] fixes that and makes the "reusable" flag known to all callers who need to know, since they need to do the closing or not. Sadly, the patch set introducing the "closed" flag also changed the behaviour of discardNewMessage() for mbox, since for some reason you stopped closing the stream there.

Now coming to the second issue, the reuse. As I mentioned earlier, the performance degradation from additional open/close calls won't be noticeable when receiving messages, but it will be noticeable when copying messages in bulk between local folders. So unless proven otherwise beyond doubt, I'm not in favour of removing the reuse.

Let's come to the problems you mentioned. One problem was that a buffered stream doesn't QI to an input stream in nsMsgLocalStoreUtils::UpdateFolderFlag(). Looking at https://hg.mozilla.org/comm-central/rev/a9f658106adb#l1.28, the plan there was to "promote" the output stream to buffered only in nsMsgLocalMailFolder::InitCopyMsgHdrAndFileStream(), which is called only twice in nsLocalMailFolder.cpp, in nsMsgLocalMailFolder::BeginCopy() and nsMsgLocalMailFolder::StartMessage(). It was never the idea for nsMsgBrkMBoxStore::GetNewMsgOutputStream() to always pass back a buffered stream:
https://hg.mozilla.org/try-comm-central/rev/2f798c3fef0ce6ef41e07eec94f7c9b663554ef9#l4.109
Of course you'll run into problems when you need such a stream for unbuffered write or read. I've just checked: StartMessage() is called when copying; it's not called when incorporating received mail.

I'm really confused by the code here:
https://hg.mozilla.org/try-comm-central/rev/2f798c3fef0ce6ef41e07eec94f7c9b663554ef9
I can see that nsMsgBrkMBoxStore::DiscardNewMessage() and nsMsgBrkMBoxStore::FinishNewMessage() close the stream now. However, you left the hashtable in m_outputStreams in place, now handing out potentially closed streams, since the callers might have closed the streams as they are no longer reusable. Luckily |rv = seekable->Seek(nsISeekableStream::NS_SEEK_END, 0);| saves us, since you can't seek on a closed stream. I tested it.

I'll attach a patch with which I tested copying 150 messages (about 500KB) between local folders. With reuse: 4 sec, without: 6 sec. That was on a local debug build. Please repeat the test as you deem fit. Once we've determined whether the reuse is necessary, we'll discuss further steps.
P.S.: The maildir issue where a file is moved or deleted when there was still a stream open is an issue that must have been introduced in some reshuffle. This issue doesn't exist in the current code base, or does it?
Attached patch disable-reuse.patch (obsolete) — Splinter Review
OK, this bug hasn't seen enough patches yet, so here comes another one.

This patch disables the reusing of output streams for mbox. Please do the following:
Copy a large amount of messages between local folders and compare the times with and without the patch. I've used 600 bugmails of about 2MB and I can move those in about 20 seconds. I don't see much difference with and without the patch.

So perhaps reusing file streams is just a myth and we can terminate it here and now.
Attachment #8904354 - Flags: feedback?(acelists)
Comment on attachment 8904354 [details] [diff] [review]
disable-reuse.patch

Review of attachment 8904354 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/local/src/nsMsgBrkMBoxStore.cpp
@@ +654,2 @@
>  {
>    rv = MsgGetFileStream(mboxFile, aResult);

You can repeat the experiment by replacing the preceding line with:
rv = MsgNewBufferedFileOutputStream(aResult, mboxFile, PR_WRONLY, 00600);
Dear Jorg, (In reply to Jorg K (GMT+2) from comment #212) > Created attachment 8904354 [details] [diff] [review] > disable-reuse.patch > > OK, this bug hasn't see enough patches yet, so here comes another one. > > This patch disables reusing of output streams for mbox. Please do the > following: > Copy a large amount of messages between local folders and compare the times > with and without the patch. I've used 600 bugmails of about 2MB and I can > move those in about 20 seconds. I don't see much difference with and without > the patch. > > So perhaps reusing file streams is just a myth and we can terminate it here > and now. (In reply to Jorg K (GMT+2) from comment #213) > Comment on attachment 8904354 [details] [diff] [review] > disable-reuse.patch > > Review of attachment 8904354 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mailnews/local/src/nsMsgBrkMBoxStore.cpp > @@ +654,2 @@ > > { > > rv = MsgGetFileStream(mboxFile, aResult); > > You can repeat the experiment by replacing the preceding line with: > rv = MsgNewBufferedFileOutputStream(aResult, mboxFile, PR_WRONLY, 00600); I will repeat the experiments you have done locally and report back. Too much workload for the day job this week, hopefully, I can report back before the weekend... TIA
I hope there is no misunderstanding. The data we need is: copy a decent amount of messages between local folders, making sure that the start conditions are always the same, e.g. source and target folder compacted:
1) With TB as it is, so reusing streams.
2) With the patch, so allocating a new stream for each message.
3) Not using the patch but changing the line to MsgNewBufferedFileOutputStream().
4) With the patch and the line changed.
I've done a bit more reading on this, just capturing my findings:

For maildir, nsMsgMaildirStore::GetNewMsgOutputStream() returns a buffered stream, MsgNewBufferedFileOutputStream(). Before calling UpdateFolderFlag(), it calls nsMsgMaildirStore::GetOutputStream() which returns an unbuffered stream.

For mbox, nsMsgBrkMBoxStore::GetNewMsgOutputStream() returns an unbuffered stream from the "pool". Before calling UpdateFolderFlag(), it calls nsMsgBrkMBoxStore::GetOutputStream() which returns an unbuffered stream from the same pool.

So if we switch nsMsgBrkMBoxStore::GetNewMsgOutputStream() to buffered, we need to make changes in the way nsMsgBrkMBoxStore::ChangeFlags() and nsMsgBrkMBoxStore::ChangeKeywords() work, since they need direct read/write.

There are other "beauties" in the code. This is a snippet from nsMsgBrkMBoxStore::ChangeFlags():

  if (restoreStreamPos != -1)
    seekableStream->Seek(nsISeekableStream::NS_SEEK_SET, restoreStreamPos);
  else if (outputStream)
    outputStream->Close();

If nsMsgBrkMBoxStore::GetOutputStream() returned a stream from the pool, we seek back to the position it came with (I wonder why), and if the stream was created and then added to the pool, we close it, thus leaving a closed stream in the pool, which will be detected on next use.
Inspired by Chiaki's patches I looked at where else we QI an output stream to an input stream https://dxr.mozilla.org/comm-central/search?q=regexp%3AIInputStream.*do_QueryInterface.*Stream&redirect=false and found https://dxr.mozilla.org/comm-central/rev/71094089bd57db8e1263df37da210a8df38f3c2b/mailnews/local/src/nsPop3Sink.cpp#862 in nsPop3Sink::IncorporateComplete() which Chiaki has covered already here: https://hg.mozilla.org/try-comm-central/rev/afb1dffc84369174345a45a3eee803ca8b991612#l3.272 By the looks of it this only runs for mbox, since the block is controlled by |if (m_downloadingToTempFile)| a few lines up from the QI.
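(For illustration only — a minimal sketch of the pattern under discussion, not code taken from any attachment here: the caller QIs the output stream it was handed in order to read back through the same object. This only works if the concrete stream class also implements nsIInputStream; a buffered output stream does not, so the QI yields null and the read path breaks.)

  nsCOMPtr<nsIInputStream> inputStream = do_QueryInterface(outputStream);
  if (!inputStream) {
    // A buffered output stream ends up here: it cannot be read through,
    // so a separate (unbuffered) stream would be needed for reading.
    return NS_ERROR_FAILURE;
  }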
** I recall Chiaki saying he doesn't have much time remaining this year, so I'm worried about the recent two weeks of inactivity. What are the next steps to keep this moving quickly? **

I am also concerned about the timeline to Thunderbird 59 and having a smooth, thoughtful landing and testing process should we decide that this is doable for Thunderbird 59 (otherwise this work must wait yet another year before hitting release). If this is to ship in 59, the minimum I would like to see for a set of patches this invasive and with potential for data loss is:

1. Minimize risk to nightly users. So before landing on comm-central, we should have targeted testing using try builds for at least 2-3 weeks. I propose: 1) first using interested guinea pigs from our hard-core everyday developers, QA and support people who have a strong technical understanding of Thunderbird, then 2) using admins/users who have issues or have volunteered in the past AND who can accept the risk and are able to commit time and effort to giving quick feedback. (I can find these people.)

2. Landing on comm-central with timing such that
   a) We minimize possible risk to beta users. So it must happen at least 4-6 weeks prior to merge, i.e. 4 weeks prior to beta.
   b) Facilitating regression hunting is a goal. So no other major patch landings for ~2 days after the commits (and possibly also before), so that bug reporting for this bug(set) and regression hunting can't be confused with other patches.
   c) The patches go through two betas, 58 and 59, meaning landings on comm-central should happen during the 58 cycle.

Working the timeline backwards... If all of a, b, and c are to be met, then the patches need to land on comm-central no later than 2017-10-13, which is 4 weeks away. But preferably sooner. Testing with try builds would need to begin very soon.

If c) is not possible, then landing on comm-central should happen within the first two weeks of the 59 nightly cycle, around 2017-11-13. (But this is not my first choice.)

I'd be interested in any other ideas which help us minimize risk.

So, how much more is there to do?
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #218) > So, how much more is there to do? A lot since we haven't even decided which approach to take. Frankly, it's a great effort to get up to speed with this every couple of weeks :-(
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #218) > ** I recall Chiaki saying he doesn't have much time remaining this year, so > I'm worried about the recent two weeks of inactivity. What are the next > steps to keep this moving quickly? > > (In reply to Jorg K (GMT+2) from comment #219) > (In reply to Wayne Mery (:wsmwk, NI for questions) from comment #218) > > So, how much more is there to do? > A lot since we haven't even decided which approach to take. Frankly, it's a > great effort to get up to speed with this every couple of weeks :-( Sorry, I probably can use two weekends' worth of time before the end of October :-( Already, a big one week meeting of about 10 organizations is looming at the beginning of October, and then a big annual technology symposium in the middle of December :-( I can't even find the time to check the short performance issue Jorg raised a couple of weeks ago...
Bear in mind, we don't need perfection in the performance. We only need perfection in stability and reliability. It seems to me the primary initial performance goal should be a noticeable IMPROVEMENT. If we have something that's improved enough to ship, we can iterate for further improvements from there. In other words, "Don't Let Perfection Get in the Way of Progress". Otherwise it may be 2020 till we land something. By then, some of us may be dead.
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #221) > Bear in mind, we don't need perfection in the performance. We only need > perfection in stability and reliability. Before we remove the "reusable" flag, that is, open/close mbox files for every message we incorporate/copy, we really need to ascertain the performance impact.
OK. But it seems like we keep on adding new bits to this process. The idea of removing reusable wasn't originally part of the picture iirc. Can we not land what we know should work on nightly so it gets testing, and handle reusable investigations and removal (or not) and other follow on issues later?
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #223)
> OK. But it seems like we keep on adding new bits to this process. The idea
> of removing reusable wasn't originally part of the picture iirc.

Wayne, you need to understand what this is about. We have two different stores, maildir and mbox. Maildir stores each message in a single file, mbox stores many messages in the same file. The reusable flag indicates which storage we are using; in the case of mbox, the output streams are "reusable", so they need to be kept open for writing the next message via the same stream.

Having to cater for the two different cases complicates the code greatly. Therefore Chiaki suggested removing this processing and always opening/closing the mbox file for each message processed. This might come at a performance cost which we need to determine first. There is of course the option to leave the reusable processing in place and proceed with a more complex approach which we previously considered.

> Can we not land what we know should work on nightly so it gets testing, and
> handle reusable investigations and removal (or not) and other follow on
> issues later?

Chiaki has run the latest tests assuming the removal of the reusable processing. So if we were to abandon that option, we'd have to return to another option previously under consideration.

Frankly, I don't think trying to do micro-management or trying to accelerate the process will help. Unless we take the issue off Chiaki's hands, we can't accelerate here.
I will *try* to get the performance figures for moving/copying messages from one folder to another, as suggested by Jorg, over this weekend unless some events get in the way (well, actually, I am already quite tied up with some family business on Sunday, and this is already Saturday morning...), and I have about 10 papers/documents to take care of before Monday :-) / :-(

At least the performance figures would help us (or at least give some indication, hopefully).
> Frankly, I don't think trying to do micro-management or trying to accelerate the process will help. If someone says it doesn't make sense, that's fine - it is what it is and that's all I need to hear. I asked because sometimes there are ways to accelerate or take a different path for perfectly valid reasons. And if one doesn't ask the question, one doesn't know. Unfortunately my comment 222 was not in the form of a question (in my strange head it was a question but didn't come out that way) and I'm sorry if it was taken as mere criticism. That's not the intention. OTOH I'm clearly not micromanaging - if I were micromanaging you would see me with lots of bug comments and in people's faces, but the history of this bug and others doesn't show that. This was just a question from someone who has been very involved with Chiaki's work, starting years ago.
Blocks: 484119
Blocks: 1121842
If we look at the action in bug 1403771 where we had to adapt Mailnews to some Necko changes (which, BTW, will have invalidated all the patches here), we can see that reusing streams is just a terrible maintenance burden. Just look at
https://dxr.mozilla.org/comm-central/search?q=m_multipleMsgMoveCopyStream&redirect=false
(and make sure the index has updated with today's changes) to see what we need to do to support that concept.

"Use once, close and forget" is far simpler than "open, make a clone to use and make sure the clone closes, hang on to the original for further cloning and make sure we close the original in the end", and that extra complication comes so far without an empirically determined performance benefit.

If we got rid of re-usability, we could streamline our code, make mbox and maildir behave the same and get rid of chunks of code, like all the cloning and nsMailboxProtocol::OpenMultipleMsgTransport(). I'll attach a sample patch written for bug 1403771 that shows what code can be removed in nsMailboxProtocol.cpp if we get rid of re-usability.
Just as an illustration of what could be removed. I've never used this patch, I've written it as alternative approach in bug 1403771.
I have finally managed to start compiling the latest C-C source tree under Debian GNU/Linux: there were a few hurdles I needed to overcome after four months of silence.

The patch I proposed used a function NS_BufferOutputStream() which was removed in Oct 2017, so I have to rework that part. Once that is done [I am still figuring out how to use the new NS_NewBufferedOutputStream, which has a slightly different parameter in the second position], and once I have tested the binary with |make mozmill| and the xpcshell-tests, I will check the performance figures; cf. comment 222:

> Before we remove the "reusable" flag, that is, open/close mbox files for every message we incorporate/copy, we really need to ascertain the performance impact.

Then, after a direction is clear, I will clean up the patch set.

Due to a hiccup on tryserver last year, I had to remove Jorg's modifications from the patch set I have. It was a compiler issue that did not like the particular manner in which the error code was printed (I think it was the MSC compiler that did not grok the modification for a while; I have no idea why). The error seems transient now that I think about it, and I probably should not have removed the particular changes (since now I have to re-apply the changes selectively. Argh.)

Anyway, there is a lot of cleanup work to do since the C-C source tree has changed greatly in the last four months. I have to make sure the patch applies cleanly, including the further patches I have for short-read handling in Bug 1170606 - C-C T-B fixes to take care of short read (part of [META] Failure to deal with short read).

TIA
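PS: For reference, here is a minimal sketch of how I understand the new NS_NewBufferedOutputStream is called nowadays (my assumption from reading the current headers, not yet the code in my patch): the second parameter is an already_AddRefed<nsIOutputStream>, so ownership of the raw stream is handed over with .forget().

  nsCOMPtr<nsIOutputStream> raw;
  nsresult rv = NS_NewLocalFileOutputStream(getter_AddRefs(raw), mboxFile,
                                            PR_WRONLY | PR_CREATE_FILE, 00600);
  NS_ENSURE_SUCCESS(rv, rv);

  // Wrap the raw stream; the buffered wrapper now owns it.
  nsCOMPtr<nsIOutputStream> buffered;
  rv = NS_NewBufferedOutputStream(getter_AddRefs(buffered), raw.forget(),
                                  64 * 1024);  // buffer size (assumed value)
  NS_ENSURE_SUCCESS(rv, rv);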
(In reply to ISHIKAWA, Chiaki from comment #230) > Anyway, there are a lot of cleanup work to do since the C-C source tree has > changed greatly in the last four months. Absolutely, there have been huge changes as far as the processing of streams is concerned, see comment #227. Bug 1403771 mentioned there has quite a few friends which all brought changes around September to October 2017.
(In reply to Jorg K (GMT+1) from comment #231) > (In reply to ISHIKAWA, Chiaki from comment #230) > > Anyway, there are a lot of cleanup work to do since the C-C source tree has > > changed greatly in the last four months. > Absolutely, there have been huge changes as far as the processing of streams > is concerned, see comment #227. Bug 1403771 mentioned there has quite a few > friends which all brought changes around September to October 2017. Thank you, Jorg, for heads-up. I am reading 1403771 now. The removal of NS_BufferOutputStream did happen in October timeframe, and so all these changes that busted some codes are related. TIA
(In reply to ISHIKAWA, Chiaki from comment #232)
> (In reply to Jorg K (GMT+1) from comment #231)
> > (In reply to ISHIKAWA, Chiaki from comment #230)
> > > Anyway, there are a lot of cleanup work to do since the C-C source tree has
> > > changed greatly in the last four months.
> > Absolutely, there have been huge changes as far as the processing of streams
> > is concerned, see comment #227. Bug 1403771 mentioned there has quite a few
> > friends which all brought changes around September to October 2017.
>
> Thank you, Jorg, for heads-up. I am reading 1403771 now. The removal of
> NS_BufferOutputStream did happen in October timeframe, and so all these
> changes that busted some codes are related.
>
> TIA

If the closing of streams is done properly so that, under Windows, Move does not fail at a certain place, THEN the special handling for Windows that was necessary when the reusable flag was used may no longer be necessary (!). That simplifies the code very much...

Even if ditching the reusable flag turns out to cause a significant performance issue, the above observation is still good news. It is too early to say anything with confidence.

I will tidy up my patch. After reading Bug 1403771 comment 227 and all the other comments, I am not even sure whether the patch still works without significant changes.

TIA
Many moons ago, there was a question of whether we should rip out the code for caching open file streams to make the source easier to maintain. Of course, we need to check that performance does not suffer from doing this. The suggestion from Jorg was that I check the operation of copying, say, 100 messages from one folder to another. (I think it was mentioned in a mail post, or somewhere, but I am a little hazy and could not find the original reference.)

Well, family matters prevented me from doing this quickly. But now I have produced the performance data for copying 1001 messages from one folder to another, for the version built from the TB hg source tree with the buffered output feature but WITHOUT caching of open file streams, and for the version WITHOUT buffered output but WITH caching. Since the only method available to me for measuring the timing is a wristwatch, I copied enough messages so that I can measure the time meaningfully. The following is the summary of what I found.

Summary:

Based on the timing data for copying 1001 messages, which takes about 3-6 seconds with my patched TB (no caching of open file streams, but buffered output streams) and about 215 seconds (!) with the unpatched TB (caching of open file streams, but unbuffered output streams) in one typical test case, the open/close of the message folder per message copied is not a performance bottleneck of TB at all. Rather, it was found that unbuffered output is a killer overhead. This observation is backed up by an analysis of the number of system calls involved.

It is hard to believe the timing difference of 3-6 seconds vs 215 seconds. However, you might want to check how long it takes for you to copy a hundred messages from one folder to another. I think it will be a dozen seconds or so. (It was on my Windows 10 released TB version, too.) I didn't realize it was so slow since I don't do bulk copying often in the real world, and I have used the locally built buffered version for local testing for quite a while last year and the year before.

Background:

Shall we rip out the code for caching open file streams from the TB source code to simplify the code for easier maintenance? For example, using buffered output everywhere in TB requires the caching of buffered open file streams. This caching turns out to be not quite easy or straightforward with the changes to some buffered file creation functions since the fall of 2017. As a matter of fact, I could not cope with this caching when I tried to adapt my patches for buffered output to the later hg source tree, after I could not program for a while in the fall of 2017 and early 2018 due to family matters. So I decided to rip out the caching of open file streams in order to enable buffered output in the spring of 2018. The patches were eventually modified to work with the mid-April source tree.

Of course the big question pops up: does doing away with the caching of open file streams affect performance significantly?

ANSWER: Definitely NO, based on performance data and system call statistics. (And actually, buffered output turns out to be a huge win for the message copying case.) Here is the report of the detailed performance analysis.

Conclusion: we can safely rip out the code for caching of open file streams to make the source tree easier to maintain. We can introduce buffered output for much better TB performance even after ripping out the caching of open file streams.
My point here is that maintaining the caching of open file streams has become rather difficult and complicates code maintenance when we want to introduce new features such as using buffered streams as often as possible. And as the performance data and system call analysis show, the caching actually didn't buy much in terms of performance gain, and the gain obtained by using buffered output far outweighs the merit of caching, if any.

Detailed statistics.

I obtained the elapsed time and system call statistics for the following combinations of source folder and target folder. Note the 215 seconds (marked with a leading *) for Binary-2 in the case of copying 1001 messages. That clearly shows that the current TB is very slow, and buffered output can bring about a significant win.

Source folder                      Target Folder
---------------------------------+-----+------+-------+------------
                                 |  0  | gdb  |TeXLive| Libc-alpha
---------------------------------+-----+------+-------+------------
1 |emacs-org-mode-DIR            | 1A  | 1B   | 1C    | 1D
  |copy 1001 msgs (sec) Binary-1 | 4   | 3    | 3.5-4 | 3.5-4
  |copy 2000 msgs (sec) Binary-1 |5-5.5|      | 6     |
1'|                              |     |      |       |
  |copy 1001 msgs (sec) Binary-2 |*215 |      |       |
---------------------------------+-----+------+-------+------------
2 |libc-alpha-dir 387MB          | 2A  | 2B   | 2C    | n/a
  |copy 1001 msgs (sec) Binary-1 | 2-3 | 2-3  | 5     |
---------------------------------+-----+------+-------+------------
4 |Fix-compiler-warnings 8MB     | 4A  | 4B   | 4C    | 4D
  |copy 1001 msgs (sec) Binary-1 | 3.5 |3.5-4 | 3-4   | 3
  |copy 2000 msgs (sec) Binary-1 | 6   |      | 6-7   |
---------------------------------+-----+------+-------+------------

For example, the tag 1A in the table refers to the case of
  source folder: emacs-org-mode-DIR
  target folder: newly-created empty zero-folder

Regarding syscall statistics:

(First) This is for the case of copying 1001 messages from emacs-org-mode-DIR to zero-folder (a newly created empty folder) using Binary-1 (buffered output, but without caching of open file streams). The wall-clock elapsed time was approximately 4 sec. I notice that there is one fsync, but 24 fdatasync calls. There is a large number of openat/close calls.

                 # of calls   time spent   avg. time spent (microseconds)
clone                     1          198       198.000000
exit                      1            0         0.000000
fsync                     1       118148    118148.000000  <- fsync once
getpriority               1           84        84.000000
mkdir                     1           12        12.000000
prctl                     1           87        87.000000
quotactl                  1          101       101.000000
set_robust_list           1           50        50.000000
setpriority               1           85        85.000000
statfs                    1          104       104.000000
rename                    2          166        83.000000
uname                     2          152        76.000000
epoll_wait                3            0         0.000000
ftruncate                 4          292        73.000000
shmat                     4          339        84.750000
shmget                    4          548       137.000000
geteuid                   6          241        40.166667
shmdt                     6          543        90.500000
shmctl                    8          593        74.125000
restart_syscall          10      4848284    484828.400000
unlink                   11         2776       252.363636
fdatasync                24      2373405     98891.875000  <- 24 fdatasync calls?
brk                      60        11547       192.450000
getrusage                86         8377        97.406977
dup                     123        11352        92.292683
munmap                  149        12001        80.543624
sendmsg                 180        15179        84.327778
getpid                  286        20527        71.772727
mmap                    489        81128       165.905930
getuid                  711        47481        66.780591
fstat                   928        68061        73.341595
stat                   2039       171133        83.929868
lstat                  2159       223204       103.383048
times                  2772       291825       105.275974
fcntl                  2781       221920        79.798634
close                  3574       212760        59.529938  <- close
openat                 4159       434382       104.443857  <- openat
writev                 4774       517728       108.447424
access                 5978       467610        78.221813
mprotect               6322       533449        84.379785
lseek                  8883       602492        67.825284
read                  10544       851908        80.795524
madvise               10636       454616        42.743137
gettid                11268       666666        59.164537
write                 11945      1145337        95.884219  <- write
poll                  23582     96770841      4103.589221
recvmsg               38291      3232226        84.412160
futex                 43569    703213974     16140.236728

(Second) This is for the case of copying 1001 messages from emacs-org-mode-DIR to zero-folder (a newly created empty folder) using Binary-2 (unbuffered output, but with caching of open file streams). The elapsed time was 3 minutes 35 seconds, i.e. 215 sec.

Why is this so slow compared with the above? Obviously the sheer number of context switches caused by write is the killing factor:

> write              168947     14689604        86.948001  <- write is the killer
> Too many context switches.

We notice 24 fdatasync calls. We see that openat and close are called less often due to caching:

> openat               2224       375128       168.672662  yes, open and close
> close                2233       107400        48.096731  are called less often.

Without caching in the previous case, it was:

> close                3574       212760        59.529938  <- close
> openat               4159       434382       104.443857  <- openat

                 # of calls   time spent   avg. time spent (microseconds)
clone                     1          560       560.000000
getpriority               1           60        60.000000
prctl                     1           98        98.000000
readlink                  1           68        68.000000
set_robust_list           1          119       119.000000
setpriority               1           95        95.000000
exit                      2            0         0.000000
quotactl                  2          142        71.000000
rt_sigreturn              2          131        65.500000
shmdt                     2          229       114.500000
statfs                    2          131        65.500000
tgkill                    2          183        91.500000
epoll_wait                3            0         0.000000
mkdir                     4          198        49.500000
rename                    4          483       120.750000
uname                     4          388        97.000000
shmat                     6           80        13.333333
shmget                    6           96        16.000000
ftruncate                 7          535        76.428571
getdents                  8          350        43.750000
restart_syscall          10      8756919    875691.900000
shmctl                   12           64         5.333333
geteuid                  20         1037        51.850000
unlink                   32       186171      5817.843750
brk                      48        13646       284.291667
fdatasync                80     10857559    135719.487500  <--- Wow
sendmsg                  84        10231       121.797619
getrusage                90         9931       110.344444
getuid                  350        23665        67.614286
dup                     401        54583       136.117207
munmap                  413        34947        84.617433
mmap                    738        66095        89.559621
getpid                  860        81509        94.777907
fstat                  1001        71634        71.562438
fsync                  1001    146870325    146723.601399
lstat                  1155       116723       101.058874
madvise                1335        34313        25.702622
fcntl                  1638       187943       114.739316
openat                 2224       375128       168.672662  yes, open and close
close                  2233       107400        48.096731  are called less often.
stat                   2591       280157       108.126978  Less by 1001.
writev                 3348       373862       111.667264
times                  3392       370686       109.282429
access                 3521       347793        98.776768
lseek                  4858       378373        77.886579
mprotect               5551       611719       110.199784
read                   8276       668608        80.788787
poll                  19455    478942793     24617.979594
recvmsg               28682      2631934        91.762569
gettid                54866      3151040        57.431561
futex                119854   2759027240     23019.901213
write                168947     14689604        86.948001  <- write is the killer
                                                              Too many context switches.

The text file that contains the details is posted to bugzilla 1242030.
Blocks: 1432879

I am going to upload a new set of patches that are now compatible with the reformatted source tree and are known to work.
(Finally, I could get rid of a couple of errors that showed up: one in the Linux and OSX builds, and an additional one in the Windows build on tryserver tests.
After carefully checking, I found that there was a subtle copy&paste error introduced during the update to match the clang-formatted source tree.)

All is well: passes mozmill and the xpcshell-tests.

The patch set rips out the unnecessary caching as far as buffered writing is concerned. (That is, as far as the code I touched is concerned. Maybe not everywhere. It may have to be addressed comprehensively in another bugzilla IMHO.)

The following patches will be uploaded:

1242030-DONT-USE-REUSABLE-new-part-1.patch: bug 1242030: DONT-USE-REUSABLE new part-1 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)
1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev02.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/stream I/O. Part 2. r=jorgk.
1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev03.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/stream I/O. Part 3 (was 2). r=jorgk. revision 2
1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3.patch: bug 1242030: DONT-USE-REUSABLE new part-4 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)
1242030-DONT-USE-REUSABLE-new-part-5-pop3.patch: bug 1242030: DONT-USE-REUSABLE new part-5 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)

Additionally, following two patches posted in respective bugzilla completes the buffered output finally (!).

enable-buffering-1st-step.patch: Bug 1242042: Enabling buffering for file stream to write message for C-C TB (Enabling buffering first step.)
removing-a-Seek-rev02.patch: Bug 1242046 - Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O

clang formatted. I obsoleted attachment 8879215 [details] [diff] [review].

Attachment #8879215 - Attachment is obsolete: true

clang-formatted. I obsoleted attachment 8879217 [details] [diff] [review] (1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev02.patch)

Attachment #8879217 - Attachment is obsolete: true

clang-formatted. I obsoleted 8879218: 1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev02.patch.

Attachment #8879218 - Attachment is obsolete: true

clang-formatted. I obsoleted 8879219: 1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3.patch.

clang-formatted. I obsoleted 8879221: 1242030-DONT-USE-REUSABLE-new-part-5-pop3.patch.

Attachment #8879219 - Attachment is obsolete: true
Attachment #8879221 - Attachment is obsolete: true

Folks,
Can somebody give me an idea how I should go about landing these patches?
It has taken so many years and I am a bit at a loss myself.

I am cc: ing to Jorg, aceman and wayne for now.

Thank you in advance.

Flags: needinfo?(vseerror)
Flags: needinfo?(jorgk)

Hi, thank you. Yes, it is a pity this took so long, but if the patch set works now, we should be able to accept it.
You did not obsolete some of the older patch sets named "USE REUSABLE" and "USE CLOSED". Are those still valid in addition to the new "DONT USE REUSABLE" patch set?

Flags: needinfo?(acelists)

Thanks for continuing to push this through. I defer to Jorg in these code matters. I look forward to seeing this in a release!

Flags: needinfo?(vseerror)

(In reply to :aceman from comment #244)

Hi, thank you. Yes, it is a pity this took so long, but if the patch set works now, we should be able to accept it.
Thank you for your comment.

That will be great, but we need to review them first.
(There were a few modifications that were necessary to get the local compiler and tryserver MS STUDIO compiler build the code.
The latter happened in the summer of 2017 for about 10 days if I recall correctly.)

You did not obsolete some of the older patch sets named "USE REUSABLE" and "USE CLOSED". Are those still valid in addition to the new "DONT USE REUSABLE" patch set?

Well, they were referred to in the old comments, and so I thought it might be necessary to keep them.

Maybe I should obsolete them to avoid misunderstanding. I will do so after this comment.

BTW, as the latest patch names suggest, the patch set does away with the reusable flag, based on my observation reported here (comment 234) that caching of file streams is not worthwhile and that my patch, which reduces the # of write system calls, is a big win even for copying messages between folders.

(In reply to Wayne Mery (:wsmwk) from comment #245)

Thanks for continuing to push this through. I defer to Jorg in these code matters. I look forward to seeing this in a release!

Yup. As Jorg is one of the key maintainers of C-C TB, I need his input to keep the code in good shape.
So a review cycle will be necessary before landing.
I will do more local testing (using a router's USB stick as remote server storage and unplugging the network cable, simulated by disabling networking in VirtualBox) while the review goes on.

TIA

I am obsoleting these patches.
(They may be referenced in comments, but I think we can access obsoleted attachments if necessary.)

USE-REUSABLE 1242030-new-part-1.patch
USE REUSABLE 1242030-new-part-2-imap.patch
USE REUSABLE 1242030-new-part-3-import.patch
USE REUSABLE 1242030-new-part-4-local-less-pop3.patch
USE REUSABLE 1242030-new-part-5-pop3.patch
USE REUSABLE 1242030-part0-pass-out-reusable.patch

Hmm... I should have added updated -rev03 suffix to my latest patches. I would do so locally now.
1242030-USE-CLOSED-new-part-1.patch
1242030-USE-CLOSED-new-part-2-imap-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-3-import-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-4-local-less-pop3.patch
1242030-USE-CLOSED-new-part-5-pop3.patch

Attachment #8879199 - Attachment is obsolete: true
Attachment #8879200 - Attachment is obsolete: true
Attachment #8879201 - Attachment is obsolete: true
Attachment #8879203 - Attachment is obsolete: true
Attachment #8879205 - Attachment is obsolete: true
Attachment #8879207 - Attachment is obsolete: true
Attachment #8879208 - Attachment is obsolete: true
Attachment #8879209 - Attachment is obsolete: true
Attachment #8879211 - Attachment is obsolete: true
Attachment #8879212 - Attachment is obsolete: true
Attachment #8879214 - Attachment is obsolete: true

Done.

(In reply to ISHIKAWA, Chiaki from comment #247)

I am obsoleting these patches.
(They may be referenced in comments, but I think we can access obsoleted attachments if necessary.)

USE-REUSABLE 1242030-new-part-1.patch
USE REUSABLE 1242030-new-part-2-imap.patch
USE REUSABLE 1242030-new-part-3-import.patch
USE REUSABLE 1242030-new-part-4-local-less-pop3.patch
USE REUSABLE 1242030-new-part-5-pop3.patch
USE REUSABLE 1242030-part0-pass-out-reusable.patch

Hmm... I should have added updated -rev03 suffix to my latest patches. I would do so locally now.

I meant to say that I should update the existing -rev02 to -rev03 and then add -rev03 to patch names that don't have them.
A couple of them already have such a -rev03 string, but not all of them yet in the attachments.

1242030-USE-CLOSED-new-part-1.patch
1242030-USE-CLOSED-new-part-2-imap-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-3-import-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-4-local-less-pop3.patch
1242030-USE-CLOSED-new-part-5-pop3.patch

Can somebody give me an idea how I should go about landing these patches?

They need to be reviewed again. It would also be good to see the performance comparison again. Maybe we should get our new backend man, Ben C, involved, although he's generally quite busy.

Flags: needinfo?(jorgk)

(In reply to Jorg K (GMT+2) from comment #249)

Can somebody give me an idea how I should go about landing these patches?

They need to be reviewed again.

Sure.
Actually, I would appreciate it.
Due to some compiler issues locally and on tryserver, I had to re-do some parts of earlier code which you kindly reviewed before.
So we have to make sure the code is in good shape in terms of today's debugging macros, etc.

It would also be good to see the performance comparison again. Maybe we should get our new backend man, Ben C, involved, although he's generally quite busy.

I can do the comparison again.

So can you review the patches at least partially, Jorg?

TIA

Comment on attachment 9106025 [details] [diff] [review]
1242030-DONT-USE-REUSABLE-new-part-1.patch (clang-formatted)

Review of attachment 9106025 [details] [diff] [review]:
-----------------------------------------------------------------

I looked through the first patch. Let's try to get all the formal stuff and nits out of the way first.

::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +16,5 @@
>  interface nsIUrlListener;
>  interface nsIMsgDatabase;
>  interface nsITransaction;
>
> +[scriptable, uuid(046AB3DC-D4C7-11E5-95DB-0800272954B9)]

We don't change UUIDs any more.

@@ +120,5 @@
> +// * @param aReusable set to true on output if the caller can reuse the
> +// *                  stream for multiple messages, e.g., mbox format.
> +// *                  This means the caller will likely get the same stream
> +// *                  back on multiple calls to this method, and shouldn't
> +// *                  close the stream in between calls if they want reuse.

Remove old comment.

@@ +135,5 @@
>   * to an other folder as a filter action, or is deleted because it's
>   * a duplicate. This gives the berkeley mailbox store a chance to simply
>   * truncate the Inbox w/o leaving a deleted message in the store.
>   *
> + * discardNewMessage closes aOutputStream always unless the passed stream

Trailing space.

::: mailnews/base/util/nsMsgDBFolder.cpp
@@ +729,5 @@
> +  if (NS_FAILED(rv)) {
> +#ifdef DEBUG
> +    fprintf(stderr,
> +            "(debug) nsMsgDBFolder::GetOfflineFileStream: "
> +            "GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%" PRIx32

that should be rv=%" PRIx32.

@@ +835,5 @@
> +#ifdef DEBUG
> +  if (NS_FAILED(rv)) {
> +    fprintf(stderr,
> +            "(debug) nsMsgDBFolder::GetMsgInputStream: msgStore->"
> +            "GetMsgInputStream(this, ...) returned error rv=%" PRIx32 "\n",

Correct.

@@ +936,5 @@
>    nsresult rv = OpenBackupMsgDatabase();
> +#ifdef DEBUG
> +  if (NS_FAILED(rv)) {
> +    fprintf(stderr,
> +            "(debug) OpenBackupMsgDatabase(); returns error=0x%" PRIx32 "\n",

Wrong.

@@ +1668,5 @@
> +        NS_WARNING("m_tempMessageStream->close returned error");
> +#ifdef DEBUG
> +      fprintf(stderr,
> +              "(debug) Offline message too small: messageSize=%d "
> +              "curStorePos=%" PRId64 " numOfflineMsgLines=%d bytesAdded=%d\n",

Are the %d right for all platforms?

(In reply to Jorg K (GMT+2) from comment #251)

Comment on attachment 9106025 [details] [diff] [review]
1242030-DONT-USE-REUSABLE-new-part-1.patch (clang-formatted)

Review of attachment 9106025 [details] [diff] [review]:

I looked through the first patch. Let's try to get all the formal stuff and
nits out of the way first.

::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +16,5 @@

interface nsIUrlListener;
interface nsIMsgDatabase;
interface nsITransaction;

+[scriptable, uuid(046AB3DC-D4C7-11E5-95DB-0800272954B9)]

We don't change UUIDs any more.

So it can remain the same even if the argument list of a public function changed (i.e. the removal of |aReusable|)?

@@ +120,5 @@

+// * @param aReusable set to true on output if the caller can reuse the
+// * stream for multiple messages, e.g., mbox format.
+// * This means the caller will likely get the same stream
+// * back on multiple calls to this method, and shouldn't
+// * close the stream in between calls if they want reuse.

Remove old comment.

Will do.

@@ +135,5 @@

* to an other folder as a filter action, or is deleted because it's
* a duplicate. This gives the berkeley mailbox store a chance to simply
* truncate the Inbox w/o leaving a deleted message in the store.
*
+ * discardNewMessage closes aOutputStream always unless the passed stream

Trailing space.

My. I thought clang-format would take care of extra trailing whitespace automagically.
This means I may still have other trailing whitespace stuff in the patches.

::: mailnews/base/util/nsMsgDBFolder.cpp
@@ +729,5 @@

+  if (NS_FAILED(rv)) {
+#ifdef DEBUG
+    fprintf(stderr,
+            "(debug) nsMsgDBFolder::GetOfflineFileStream: "
+            "GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%" PRIx32

that should be rv=%" PRIx32.

OK, I thought I needed to add 0x in front of %. As you already noticed below, I got confused above.
Will fix these throughout the patch set.

@@ +835,5 @@

+#ifdef DEBUG
+  if (NS_FAILED(rv)) {
+    fprintf(stderr,
+            "(debug) nsMsgDBFolder::GetMsgInputStream: msgStore->"
+            "GetMsgInputStream(this, ...) returned error rv=%" PRIx32 "\n",

Correct.

@@ +936,5 @@

nsresult rv = OpenBackupMsgDatabase();
+#ifdef DEBUG

+  if (NS_FAILED(rv)) {
+    fprintf(stderr,
+            "(debug) OpenBackupMsgDatabase(); returns error=0x%" PRIx32 "\n",

Wrong.

Will fix this and friends.

@@ +1668,5 @@

+        NS_WARNING("m_tempMessageStream->close returned error");
+#ifdef DEBUG
+      fprintf(stderr,
+              "(debug) Offline message too small: messageSize=%d "
+              "curStorePos=%" PRId64 " numOfflineMsgLines=%d bytesAdded=%d\n",

Are the %d right for all platforms?

As far as I know, I didn't get compiler errors and I thought %d was all right.
(I thought the # of lines and bytes added for a single message are 32-bit entities.)
But let me check. In the worst case scenario, I can extend the size on all platforms and use PRId64, but that seems a bit far-fetched.

TIA

By the way, the successful build run on tryserver with the patch set and a few additional ones [LDAP library reformat, etc.] is, for example,
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ffebf8451c3ba83e3e6f53b198b861e7251ace26

[I have run them with a minor modification and additional local patches about a dozen times by now.]

TIA

Jorg, I thought I would take care of the following in one sweep.

that should be rv=%" PRIx32.

But then I realized that I added "0x" in front of % to obtain a 0x prefix for the printed error value.

I see output of error value as hexadecimal value as in the following example from error handling macros:

 0:01.52 pid:27621 [27621, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 661

The value 80004002 is prefixed by "0x" to signal it is a hexadecimal number.

For example, the macro NS_ENSURE_SUCCESS_BODY is defined thus:
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsDebug.h#250

#define NS_ENSURE_SUCCESS_BODY(res, ret) \

mozilla::SmprintfPointer msg = mozilla::Smprintf( \
    "NS_ENSURE_SUCCESS(%s, %s) failed with "      \
    "result 0x%" PRIX32,                          \
    #res, #ret, static_cast<uint32_t>(__rv));     \
NS_WARNING(msg.get());

I think I added "0x" before "%" for the same reason.
There are a few missing cases that need fixing.

Am I mistaken about the use of PRIx32 here?

TIA

PS: I will check if %d is good for all architectures later today.

As for the trailing space issue, I am not entirely sure whether I accidentally added it after clang-format took care of the reformatting, or whether clang-format honors trailing spaces inside a comment block. It could be the latter.

Flags: needinfo?(jorgk)

Ouch, I now recall Jorg reads the messages sent automatically for C-C TB bugzilla filings.
How do I turn off the needinfo I turned on.
Oh, I can provide the information and then turn off the needinfo flag. Hmm, somewhat unintuitive, but hope it works.

Flags: needinfo?(jorgk)

(In reply to Jorg K (GMT+2) from comment #251)

Comment on attachment 9106025 [details] [diff] [review]
1242030-DONT-USE-REUSABLE-new-part-1.patch (clang-formatted)

[omission]

> 
> @@ +1668,5 @@
> > +        NS_WARNING("m_tempMessageStream->close returned error");
> > +#ifdef DEBUG
> > +      fprintf(stderr,
> > +              "(debug) Offline message too small: messageSize=%d "
> > +              "curStorePos=%" PRId64 " numOfflineMsgLines=%d bytesAdded=%d\n",
> 
> Are the %d right for all platforms?

I checked. It should be OK.

Here the data types are all 32-bit except for curStorePos which is int64_t.
So use of %d is OK. See below.

 (defined in the same function)
  uint32_t messageSize;
  int64_t curStorePos;

comm/mailnews/base/util/nsMsgDBFolder.h
201 int32_t m_numOfflineMsgLines;

comm/mailnews/base/util/nsMsgDBFolder.h
202 int32_t m_bytesAddedToLocalMsg;

So the patch should be OK.

TIA

As for redoing the performance analysis of copying messages between folders (as reported in comment 234), I think I need about 10 days to do the comprehensive analysis.
(A large technology exhibition for which I work on both the organizer and exhibitor sides is fast approaching [it is slated for Dec 11-13], so in the worst case I may not be able to perform the comprehensive test until December 20th, but I am trying to modify the patch as reviewed more or less in real time.)

If the variables are int32_t then use PRId32 in the format string to be safe. PRIu32 for the uint32_t variable.
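For example (a self-contained illustrative snippet, not code from the patches — PRId32/PRIu32/PRIx32 expand to the correct conversion specifier for 32-bit integers on every platform):

  #include <cinttypes>
  #include <cstdio>

  void DumpCounters(int32_t numLines, uint32_t messageSize, uint32_t rv) {
    fprintf(stderr, "lines=%" PRId32 " size=%" PRIu32 " rv=0x%" PRIx32 "\n",
            numLines, messageSize, rv);
  }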

(In reply to :aceman from comment #259)

If the variables are int32_t then use PRId32 in the format string to be safe. PRIu32 for the uint32_t variable.

Oh, I did not realize that. I will modify the patch.

I am curious, though. Which architecture has an issue with %d to print int32_t and/or uint32_t?

Updated to accommodate Jorg's comment.
Uses PRIu32 and PRId32 to print out unsigned and signed 32-bit int.

Generic description for the five (5) patches:
There was mixed usage of "0x%" PRIx32 and "%" PRIx32 (the latter not preceded by 0x),
but my intention was to precede the hex printout of the nsresult value with 0x.
So I changed them all to "0x%" PRIx32.
The remaining "%d" usages are for printing the value of the __LINE__ macro, which is an integer (and my cursory inspection indicates these are 16-bit entities),
so I think it should not be a big problem.

I replaced |-rev03| with |-rev04| in the patch file names.

Attachment #9106025 - Attachment is obsolete: true

See previous upload.

Attachment #9106026 - Attachment is obsolete: true

See part-1 of the series.

Attachment #9106027 - Attachment is obsolete: true

See part-1 of the patch series.

Attachment #9106028 - Attachment is obsolete: true

See part-1.

Attachment #9106029 - Attachment is obsolete: true

I forgot to mention. UUID was not reverted yet because I was not sure if I can retain the same UUID even when the published API function has a different set of parameters.

TIA

UUIDs never need to change nowadays. That was only needed for binary add-ons, which are long gone.

I reverted the UUID back to the original value.
Otherwise, the same with the previous version.

Attachment #9107846 - Attachment is obsolete: true
Attachment #8904354 - Flags: feedback?(acelists)

I will update the patches mentioned in comment 237 with the latest local patches,
and do the performance measurement against the current distributed binary (probably on Linux).

Also, I may want to add a few cleanups JorgK mentioned that become possible when we don't need to use the reusable flag.

Flags: needinfo?(ishikawa)
Blocks: 1385580

I will refresh the patches to fix bitrot.
(I was sidetracked by other minor bugs, and by valgrind not being usable against TB under Linux, for the better part of the past 16 months.)

Flags: needinfo?(ishikawa)
Flags: needinfo?(ishikawa)

These patches are updated with recent C-C tree changes, and they

  • remove the use of the reusable flag as far as message handling is concerned, and
  • implement the old legacy POP3 error recovery more or less correctly (more on this later).

I will upload the updated performance figure(s) similar to those were posted before the patch.

From my local patch queue:

 4 A new-1242030-precond-000-nsMsgUtils.h-typo-fix.patch: Fixed a typo and added file stream tracing macros
 5 A new-1242030-DONT-USE-REUSABLE-new-part-1-rev04-with-the-original-uuid.patch: bug 1242030: DONT-USE-REUSABLE new part-1 new splitti...
 6 A new-1242030-DONT-USE-REUSABLE-new-part-1.5-mostly-remove-reuseflag.patch: bug 1242030: DONT-USE-REUSABLE new part-1.5 mostly remov...
 7 A new-1242030-very-tell-removal-was-OK-1719996.patch: check whether removing tell was OK.
 8 A new-1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/str...
 9 A new-1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/s...
10 A new-1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-4 new splitting of (...
11 A new-1242030-DONT-USE-REUSABLE-new-part-5-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-5 new splitting of (consolidati...
12 A new-1242030-enable-buffering-1st-step-rev03.patch: Bug 1242042: Enabling buffering for file stream to write message for C-C TB (En...
13 A new-1242030-removing-a-Seek-rev03.patch: Bug 1242046 - Removing unnecessary |Seek| that caused the C-C TB to operate slowly in ter...
14 A new-1242030-disable-safe-stream-base-search-local.patch: 
15 U add-stream-tracing-macros.patch: adding stream value tracing macros
16 U add-trace-local-pop3.patch: adding stream trace macro to pop3 files under local directory
17 U add-trace-local-misc.patch: adding stream trace macro calls to files under loca (not pop3)
18 U add-stream-trace-to-base-dir-rev01.patch: add stream trace macro invocations to files under 'base' dir
19 U add-stream-trace-to-imap-dir-rev01.patch: add stream trace macro invocations in files under 'imap' directory
20 U add-and-fix-dump-messages-to-local-misc.patch: 

In the numbering above, 4-14 are essential.
But if someone tries to analyze what is going on regarding the file stream operations,
the debug dumps in 14-20 are essential.
They are part of the latest job submission.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=622660c032ef4334223ac3a5baa750973128e6b4

The errors I have noticed until yesterday are all in the intermittent category. I have weeded out permanent
ones, and I verified the error recovery operation using a simulated I/O error (mostly write errors) by
disconnecting the network cable to the remote CIFS server where POP3 local messages are stored.
I found out that the error handling by the safe I/O stream HAD TO BE DISABLED for the crude error recovery to work reasonably well. (I will write more in the next comment.)
That disabling is done in "14 A new-1242030-disable-safe-stream-base-search-local.patch".

I am uploading these to show that the patch effort is alive and kicking, and also to report
that error recovery does not work very well with the unmodified C-C.

BTW, I have done the initial performance check and the results show what the patch is expected to do:
the number of write system calls was reduced, and write performance (copying/moving messages to a different folder) is faster.
As a matter of fact, I noticed that the xpcshell tests executed sequentially (--sequential) are now much faster: they used to take about 1200 seconds on my PC, but now take 900+ seconds. I think the 4GB folder copying operation and others suffer most from the unbuffered message write operation. More detailed figures later.

Attachment #9107847 - Attachment is obsolete: true
Attachment #9107848 - Attachment is obsolete: true
Attachment #9107851 - Attachment is obsolete: true
Attachment #9107852 - Attachment is obsolete: true
Attachment #9108908 - Attachment is obsolete: true
Flags: needinfo?(ishikawa)
Attachment #9279703 - Attachment description: Fixing a typo in a macro used by the patches in the folowing. → new-1242030-precond-000-nsMsgUtils.h-typo-fix.patch: Fixing a typo in a macro used by the patches in the folowing.

I had to make sure that the removal of seek/tell method from a data type (bug 1719996) is OK with the buffered write approach.

Even with Ben's latest cleanup patches, there are seeks/flushes that need to be removed and this is one of the crucial removals.
(I checked the latest C-C TB and it did NOT use the buffered stream although Ben's patch DID start with buffered stream. Unnecessary seeks and flushes nullified the effect of buffering.)

After a lot of head scratching, I had to disable the use of the safe stream to handle error conditions.
The safe stream was introduced by Ben's patch to hide the error(s) under the rug, so to speak.

Unfortunately, it did not work well when I tested TB's behavior.
The current C-C without my patches here

  • cannot recover from a simulated write error during POP3 download.
    I unplugged the network cable to the CIFS server where local messages are stored.
    TB shows an error dialog, but as soon as I dismiss the dialog, it shows another dialog, ad infinitum. This is bad. A user ought to be able to recover from such an error, or at least be able to quit TB quickly to see what is causing the I/O error.

With my patch, TB
  • recovers from the simulated write error during POP3 download.
    I unplugged the network cable, and then TB shows the error dialog.
    But as soon as I hit OK in the dialog and restore the network cable, it recovers and
    waits for the next user operation (most likely the user wants to quit TB to investigate what is causing the I/O error).

I had expected the use of the safe filestream to make TB look as if it were hung due to repeated
read operations even after write error(s), but not this endlessly repeated error dialog.

I think we may be able to insert proper checks and recovery so that TB recovers nicely even when we use the safe file stream. BUT that negates the original intention of introducing the safe stream, and,
as a matter of fact, inserting proper checks and recovery is exactly what my patches try to do.
I found the programming mistake regarding POP3 error recovery when I started to work on buffered write. (Basically it is the lack of proper error checks, and the mis-coding regarding the setting of a crucial file pointer to nullptr to signal the I/O error to the upper-level routine. Once these were corrected, the crude POP3 error recovery started to work, and it is working with the current patch.)

Given that disabling the use of the safe stream can be done cleanly with several #if/#else/#endif blocks and the insertion of only a few dozen lines of code, I think we should seriously consider reverting to the legacy POP3 error recovery framework.
I don't think a repeated error dialog that prevents a user from quitting TB in the face of a write error is
acceptable to the general user community. An error dialog is fine, but one must be able to quit TB cleanly.
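To give an idea of the shape of that switch, here is a minimal sketch (the macro name and the exact calls below are illustrative assumptions, not the actual code in the disable-safe-stream patch):

  nsCOMPtr<nsIOutputStream> outStream;
  #ifdef MOZ_DISABLE_SAFE_MSG_OUTPUT_STREAM  // assumed build-time switch
    // Plain buffered stream: write errors surface to the caller right away,
    // so the legacy POP3 error recovery can react to them.
    rv = MsgNewBufferedFileOutputStream(getter_AddRefs(outStream), mboxFile,
                                        PR_WRONLY | PR_CREATE_FILE, 00600);
  #else
    // "Safe" stream (writes go to a temporary file committed on Finish());
    // per the observations above, this path surfaces write errors much later.
    rv = NS_NewSafeLocalFileOutputStream(getter_AddRefs(outStream), mboxFile,
                                         PR_WRONLY | PR_CREATE_FILE, 00600);
  #endif
  NS_ENSURE_SUCCESS(rv, rv);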

There are a few more issues concerning error recovery.
I tested the TB's operation using the following setup.

I created a very small (50MB) partition on a remote CIFS file system and
created a symlink from the ordinary message store location to this CIFS.
I could have created a different profile, but using a symlink is much easier.
I tested TB under Linux.
The size was small so that I could test the file-system-full condition during testing.
I sent 240+ e-mails to a mail test account and checked the operation.
What I found so far:

  • Filesystem full condition was properly caught.

  • A simulated write error during POP3 message downloading, caused by removing the remote CIFS server's network cable, made
    the unpatched C-C TB enter an endless error-dialog loop.
    (I have to send enough e-mails so that downloading from the local POP3 server takes long enough to allow the cable-unplugging operation before all the messages are downloaded.)
    Even after I restored the network cable, TB could not recover to a state
    where the modal error dialog is no longer shown, so that the user could choose the menu entry to quit TB (or click the X button on the window frame).
    I had to kill TB from another terminal using kill PID-of-TB.

Patched C-C TB showed the error dialog and, once I hit the OK button in the dialog and restored the cable (<- I am not sure I needed to restore the cable when I tested the original patch a few years ago), TB was ready to receive user input, so I could quit TB using the mouse.

There are a few more issues with the current C-C TB.

  • I noticed a discrepancy in the error messages shown during
    -- the copying of messages from Inbox to Trash, and
    -- the "deletion" of messages from Inbox.
    You see, they are actually the same operation: deletion is the "moving" of messages to the Trash folder.
    I get different messages suggesting there may be two paths, and more to the point, it seems that the checks done are not quite the same.
    I need to investigate.
    My current plan is to investigate the two paths mentioned and if there are indeed two paths, at least we should
    implement the same level of I/O error checks in these paths.
    (I would investigate both the "file system full condition" and "write error during moving operation" errors for the above operations).

I really wish there were a good harness to simulate I/O errors so that TB could be tested to make sure it shows robust behavior in the face of such errors.
Using a remote file system and a network proxy to access that remote file system may be a way to go.
I wonder if something like that can be implemented on tryserver.

Actually, under linux, it may be rather easy.
The proxy can be turned off to simulate a network error and thus cause a file system read error.
(However, we cannot really correlate the timing of the I/O error with TB's internal operation in a fine-grained manner.
So only the brute force approach of causing I/O error randomly during
the repetition of sending e-mails to TB, copying mails from one folder to the other, deletion of e-mails, compacting folders, etc.
is the way to trigger and test error handling. This is what I did several years ago over a few weeks. Back then I did not bother to create the proxy. I manually plugged and unplugged the cable. Well, actually back then I used VM Player and used its network on/off menu.)

Words of wisdom to those who may want to try such a setup:
set the timeout of the network file system very short. Usually it is set to something like 90 seconds,
so you have to wait a long time before the error is returned to TB. I set the CIFS timeout to 15 seconds or so.
Otherwise I would have spent days simply obtaining the preliminary error recovery test results I am reporting.
For CIFS, echo_interval is the ping interval. If the CIFS daemon decides two consecutive pings did not return in time,
it will begin error processing. But with a 5-second interval I get a 15-second timeout; I thought it would be a 10-second timeout.
Such is life. The following is the mount command I use for mounting a CIFS server (actually a router with a function to make a USB
memory stick available as a CIFS file system):

mount.cifs "//192.168.0.201/Drive(A1)" /tmp/L-temp -o  username=admin,noperm,vers=1.0,sec=ntlmv2,echo_interval=5

In any case, when the abnormal error recovery behavior is observed, the only way to see what is going on at the I/O routine level is to trace
what is happening to the key file stream pointers.
This is what the following patch attachments do.

Eventually, I may want to put them behind MOZ_LOG, but during this development phase, I really have to
correlate the operation with other debug messages coming from NS_WARNING and NS_ERROR that are already in the C-C code.
So I need to dump them using these macros or a simple |fprintf(stderr, ...)|.

Attached patch add-stream-tracing-macros.patch (obsolete) — Splinter Review

Add macros for tracing file streams.

It should be easy to convert them to use MOZ_LOG eventually.
For now, they use an existing macro and stdout/stderr output.

At least, rewriting the code to use MOZ_LOG would then be contained in this one file.
That is the intention of introducing these macros.
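
To make the idea concrete, here is a minimal sketch of what such a macro could look like; the macro name, arguments, and output format are my assumptions, not the actual attachment:

```cpp
// Hypothetical sketch only; the real add-stream-tracing-macros.patch may
// differ. The idea is a single macro that dumps the state of a key file
// stream pointer to stderr, so that switching to MOZ_LOG later only needs
// to happen in one place.
#include <cstdio>

#define TRACE_STREAM(stream)                                               \
  do {                                                                     \
    fprintf(stderr, "STREAM TRACE %s:%d %s: %s = %p\n", __FILE__,          \
            __LINE__, __func__, #stream, static_cast<void*>(stream));      \
  } while (0)

// Illustrative call site:
//   TRACE_STREAM(m_outFileStream.get());
```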

Attached patch add-trace-local-pop3.patch (obsolete) — Splinter Review

Adding calls to the stream trace macro in POP3-related files in local/src.

Attached patch add-trace-local-misc.patch (obsolete) — Splinter Review

Adding stream trace macros to non-POP3 files in local/src.

I wanted to clean up this potpourri of dump messages, but figured it would be more important
to raise the issue of the non-working error recovery of the current C-C TB.

Fixing the dump messages can wait until we figure out what to do with the
failing error recovery.

Comment on attachment 9279704 [details] [diff] [review]
new-1242030-DONT-USE-REUSABLE-new-part-1-rev04-with-the-original-uuid.patch

Review of attachment 9279704 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should start trying to land some of these. This one is a good one to start with, to remove `aReusable` from nsIMsgPluggableStore.getNewOutputStream().

My suggestions:

- break out the debug printfs into a separate patch (for developer use rather than to land).
- This patch doesn't update the mbox and maildir GetNewOutputStream() implementations? I assume there's another patch which does that? In any case, those changes would need to be rolled into this patch.
- I think that once getNewOutputStream() is called, it should be a concrete requirement that either `finishNewMessage()` or `discardNewMessage()` should be called with the returned stream. This is the only way I can see the messageStore being able to clean up properly (either after a success or an error). This implies that passing a null stream to `finishNewMessage()` or `discardNewMessage()` should be considered an error.

Also, I'd recommend using phabricator to handle patches, if you're up for it. It's now the main tool we use and it seems pretty good at handling WIP patches which are still changing, and stacks of dependent patches. It's not critical - the old attach-the-patch-to-the-bug method still works, but you might find phab makes it easier.

::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +129,5 @@
> * a duplicate. This gives the berkeley mailbox store a chance to simply
> * truncate the Inbox w/o leaving a deleted message in the store.
> *
> + * discardNewMessage closes aOutputStream always unless the passed stream
> + * is nullptr due to error processing..

Should we really allow a null aOutputStream to be passed in? If something goes wrong, the mailStore might need to do some cleanup (eg truncate the mbox file back to how it was, or delete temporary files or whatever). And at the moment this is the only way for the mailstore to be notified of a failure.

@@ +130,5 @@
> * truncate the Inbox w/o leaving a deleted message in the store.
> *
> + * discardNewMessage closes aOutputStream always unless the passed stream
> + * is nullptr due to error processing..
> + * (Clarification/Rationale in Bug 1121842, 1122698, 1242030)

You don't need the bug numbers here (If you really want to leave a trail maybe put them in the commit message instead?).

::: mailnews/base/src/nsMsgDBFolder.cpp
@@ +760,5 @@
> +          static_cast<uint32_t>(rv));
> +#endif
> +  // Return early: If we could not find an offline stream, clear the offline
> +  // flag which will fall back to reading the message from the server.
> +  if (mDatabase) mDatabase->MarkOffline(msgKey, false, nullptr);

I totally want to kill this MarkOffline() this far into the stack. It should be handled back out by IMAP/news code which can deal with the error properly. (related to Bug 1681487 et al)
But yeah. For now this is the same as the other failure paths, so needs to be kept in :-)

@@ +1721,5 @@
>   mDatabase->MarkOffline(messageKey, false, nullptr);
>   // we should truncate the offline store at messageOffset
>   ReleaseSemaphore(static_cast<nsIMsgFolder*>(this));
> +  if (msgStore) {
> +    // DiscardNewMessage now closes the stream all the time.

Document the current behaviour rather than history, eg:
```
// DiscardNewMessage() will close the stream.
```

@@ +1747,5 @@
> +  } // if (seekable)
> +
> +  rv1 = rv2 = NS_OK;
> +  if (msgStore) {
> +    // FinishNewMessage now closes the stream ALL THE TIME.

Again, don't need to document the history, eg:
```
// FinishNewMessage() will close the stream.
```
Attachment #9279704 - Flags: feedback+
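
For concreteness, a minimal sketch of the calling contract suggested above; the method names and signatures are only approximations of nsIMsgPluggableStore, not the actual interface, and error handling is simplified:

```cpp
// Sketch only; signatures approximate nsIMsgPluggableStore and may not match
// the real interface. The point is the pairing: every stream obtained from
// the store ends in exactly one of FinishNewMessage() (success) or
// DiscardNewMessage() (failure), and both close the stream.
nsresult StoreOneMessage(nsIMsgPluggableStore* msgStore, nsIMsgFolder* folder,
                         const nsACString& messageText) {
  nsCOMPtr<nsIMsgDBHdr> newHdr;
  nsCOMPtr<nsIOutputStream> stream;
  nsresult rv = msgStore->GetNewMsgOutputStream(folder, getter_AddRefs(newHdr),
                                                getter_AddRefs(stream));
  NS_ENSURE_SUCCESS(rv, rv);

  uint32_t written = 0;
  rv = stream->Write(messageText.BeginReading(), messageText.Length(), &written);
  if (NS_FAILED(rv) || written != messageText.Length()) {
    msgStore->DiscardNewMessage(stream, newHdr);  // closes the stream
    return NS_FAILED(rv) ? rv : NS_ERROR_FAILURE;
  }
  return msgStore->FinishNewMessage(stream, newHdr);  // closes the stream
}
```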

(In reply to Ben Campbell from comment #288)

Comment on attachment 9279704 [details] [diff] [review]
new-1242030-DONT-USE-REUSABLE-new-part-1-rev04-with-the-original-uuid.patch

Thank you for the comment.

Review of attachment 9279704 [details] [diff] [review]:

I think we should start trying to land some of these.
This one is a good one to start with, to remove aReusable from
nsIMsgPluggableStore.getNewOutputStream().

My suggestions:

  • break out the debug printfs into a separate patch (for developer use
    rather than to land).

That is what I intend to do.
Either move them into a separate patch, as I did for the later stream tracing macro calls, or
maybe hide them behind MOZ_LOG and remove the #if/#endif altogether.

  • This patch doesn't update the mbox and maildir GetNewOutputStream()
    implementations? I assume there's another patch which does that? In any
    case, those changes would need to be rolled into this patch.

I might have missed that. Let me investigate.

  • I think that once getNewOutputStream() is called, it should be a concrete
    requirement that either finishNewMessage() or discardNewMessage() should
    be called with the returned stream. This is the only way I can see the
    messageStore being able to clean up properly (either after a success or an
    error). This implies that passing a null stream to finishNewMessage() or
    discardNewMessage() should be considered an error.

That is a good point.
The problem is this.

  • The old error handling strategy adopted during POP3 download was to set the file stream to nullptr to signify that an error was detected.
    This is how the overall POP3 download handler kept track of the error situation in the C++ code.
    My patch followed this practice.

    I realized there is code to truncate the half-written message portion that was appended to the end of a folder (presumably Inbox?).
    That won't work, to be sure.

    However, I had to disable the newer error handling using safe-stream because it did not work well with the real-world error situation.
    C-C without my patch repeatedly showed the error dialog and I could not quit TB cleanly (see comment 281).
    I think there may be a statement to break out of the normal processing when an error occurs.

    And as a matter of fact, disabling safe-stream was the only way to make the old error processing scheme work.
    That is the patch
    new-1242030-disable-safe-stream-base-search-local.patch
    https://bugzilla.mozilla.org/attachment.cgi?id=9279713

We may have to investigate why the unpatched C-C TB loops with an endless error dialog even after the transient error, caused by
unplugging the network cable to the remote CIFS server where the messages are stored, clears up (the cable is plugged in again).

But I do recognize that

  • the removal of the reuse flag, and
  • the desirable error handling
    are separate issues.
    Let me think about how to extract the removal of |reuseflag| from the patch set and, if possible, create a patch set purely for the removal of |reuseflag|.
    Yeah, in that case, I would probably move the error checks to later patch set(s).

Also, I'd recommend using phabricator to handle patches, if you're up for
it. It's now the main tool we use and it seems pretty good at handling WIP
patches which are still changing, and stacks of dependent patches. It's not
critical - the old attach-the-patch-to-the-bug method still works, but you
might find phab makes it easier.

This is what was suggested during a bugday discussion by Thomas D and Wayne.
Maybe I should ask for a phabricator initiation session or something like that, with all the policy issues clearly explained.
I tried to use phabricator when it was introduced several years ago
because it was announced to be the way to go, but it was a premature announcement.
Not all the details of the setup that needed to be followed, based on the policy decisions by the administrators, were in place; I did something out of line with the would-be practice, and so I decided to stay away from it, thinking "for a while". I found out later that internal mozilla introductory seminars were held, presumably to fill in the missing information regarding the policy issues.
Since TB, I mean C-C development, did not adopt phabricator immediately, that "for a while" continued until now. :-(

I am a holdover from the old era when the HG MQ extension was recommended, so do I need to switch to the newer HG command practice before attempting to use Phabricator? (Or does HG MQ style patch submission still work? But I digress. This probably needs a separate discussion.)

::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +129,5 @@

* a duplicate. This gives the berkeley mailbox store a chance to simply
* truncate the Inbox w/o leaving a deleted message in the store.
*
+ * discardNewMessage closes aOutputStream always unless the passed stream
+ * is nullptr due to error processing..

Should we really allow a null aOutputStream to be passed in?
If something goes wrong, the mailStore might need to do some cleanup (eg
truncate the mbox file back to how it was, or delete temporary files or
whatever). And at the moment this is the only way for the mailstore to be
notified of a failure.

So obviously, in your new scheme, unlike in the legacy POP3 code, setting m_outputstream or some such to nullptr is not the way to signal that an error occurred somewhere in the low-level I/O system calls.
(Maybe we really should have an explicit m_errorflag or something that signifies an error has occurred.)

My temporary patchset here had to disable the safe-stream-based error recovery, but as long as the file stream is non-null the
truncation happens; however, I think there WERE cases when the stream was set to nullptr.
(Well, I am afraid I have to simulate the error cases a few dozen times to make the simulated error occur at an appropriate time to
produce the particular error situation where m_outputstream is set to nullptr and TB calls DiscardNewMessage. I thought I could do this over a three-week period, but as it turned out I had to do it over a three-month period. Error simulation to trigger the various error modes in order to check the code took time. :-(
Unless I build a better error simulation mechanism that is tied to the timing/phase of TB's processing, the timing of the simulated error is pretty much random and cannot deterministically produce a desired situation that triggers a particular piece of error processing.)
We need to discuss this error handling situation further, because the user is perplexed if C-C TB never stops showing the error dialog.
There is something wrong with the error recovery path, but I have yet to figure out where and how to avoid it.
(Thus, I simply adopted the old error recovery method. Well, to be exact, the code did not seem to work without disabling safe-stream with my patch set.)

@@ +130,5 @@

* truncate the Inbox w/o leaving a deleted message in the store.
*
+ * discardNewMessage closes aOutputStream always unless the passed stream
+ * is nullptr due to error processing..
+ * (Clarification/Rationale in Bug 1121842, 1122698, 1242030)

You don't need the bug numbers here (If you really want to leave a trail
maybe put them in the commit message instead?).

I will take them out.

::: mailnews/base/src/nsMsgDBFolder.cpp
@@ +760,5 @@

+          static_cast<uint32_t>(rv));
+#endif
+  // Return early: If we could not find an offline stream, clear the offline
+  // flag which will fall back to reading the message from the server.
+  if (mDatabase) mDatabase->MarkOffline(msgKey, false, nullptr);

I totally want to kill this MarkOffline() this far into the stack. It should
be handled back out by IMAP/news code which can deal with the error
properly. (related to Bug 1681487 et al)
But yeah. For now this is the same as the other failure paths, so needs to
be kept in :-)

Oh, beauty with legacy code. I will study 1681487 et al.

@@ +1721,5 @@

   mDatabase->MarkOffline(messageKey, false, nullptr);
   // we should truncate the offline store at messageOffset
   ReleaseSemaphore(static_cast<nsIMsgFolder*>(this));
+  if (msgStore) {
+    // DiscardNewMessage now closes the stream all the time.

Document the current behaviour rather than history, eg:

// DiscardNewMessage() will close the stream.

@@ +1747,5 @@

+  } // if (seekable)
+
+  rv1 = rv2 = NS_OK;
+  if (msgStore) {
+    // FinishNewMessage now closes the stream ALL THE TIME.

Again, don't need to document the history, eg:

// FinishNewMessage() will close the stream.

Once there are enough eyeballs familiar with the innards and the history, I will take them out.
But this close-or-not-close behavior was really a source of confusion, coupled with the reuse flag, and I had to leave the comments in to
keep myself aware of the issues.

Thank you again for your comments, Ben.
I think I will try to

  • separate the "removal of reuseflag" and the better I/O error checks necessitated by the introduction of buffered write (with buffering, errors tend to be detected later, so Flush() and Close() need to be checked for errors; and actually there are so many places where the error is not handled at all for Write(). It is pathetic. See the sketch after this list.)
  • separate the debug dumps, or hide them behind MOZ_LOG and remove the #if/#endif.
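
Here is the sketch referred to above, with illustrative names (this is not the actual patch):

```cpp
// Sketch only: with buffered writes an earlier Write() may appear to
// succeed and the real error only surfaces when the buffer is flushed,
// so the results of Flush() and Close() must be checked as well.
nsresult FinishStream(nsIOutputStream* aStream) {
  nsresult rv = aStream->Flush();   // deferred write errors can show up here
  nsresult rv2 = aStream->Close();  // Close() can also fail, e.g. on CIFS/NFS
  return NS_FAILED(rv) ? rv : rv2;
}
```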

So my intention is to produce a series of patch sets as follows.

  • removal of the reuse flag,
  • buffered write introduction with added error checks (but how to signal the presence of an error is a big question. The legacy POP3 code sets m_outputstream or some such to nullptr for this purpose. We may have to introduce an explicit m_errorflag or something. This requires some discussion. The solution to the looping error dialog may be incorporated here, or will probably come as a separate patch.)
  • debug dumps for developers,
  • enabling of buffered write
    (- disabling of the use of safe stream: this could be removed depending on how well the error handler is improved, but come to think of it, there ARE legitimate error cases where we can no longer write to the file system at all, such as a complete network outage to the remote CIFS server. So we must bear in mind the limits of what we can do. Our error recovery won't be perfect. My point is to let users quit TB cleanly in a manageable manner. That is what one needs most upon realizing there is an I/O error; runaway execution of TB in such a case may tarnish the data even more. Seeing TB refuse to stop executing is a consternation in such a situation. :-(
    ...

Thank you again for your comment.

PS: As for phabricator, maybe I will consult Wayne, since he mentioned that an introduction-to-phabricator session may be made available. :-)

Flags: needinfo?(vseerror)
Flags: needinfo?(benc)

(In reply to ISHIKAWA, Chiaki from comment #289)

  • Old error handling strategy adopted during pop3 download was to set the file stream to nullptr to signify an error was detected.
    This is how the overall pop3 download handler kept track of the error situation in C++ code.
    My patch followed this practice.

Ahh... I see.
I think that's the first thing to fix then!
I propose we open another bug to change pop3 error handling to use a separate variable to signify errors rather than just nulling out the filestream. And also to make sure it calls one of FinishNewMessage()/DiscardNewMessage() depending on outcome.
I'll leave the NI open on this one to remind me to look at the pop3 code on Monday and write up the bug, but feel free to open a bug yourself if you'd prefer :-)

After that, I think it'll be easier to reduce your patches down into nice little atomic chunks we can start landing.

I am a carried-over from the old era when HG MQ extension was recommended, and so I need to switch to newer HG command practice to begin with before attempting to use Phabricator?

(not to get bogged down in too many details) The moz-phab commandline tool just submits changesets in Hg, wherever they came from. I'd guess it'd happily pick up any patch you applied from MQ (I don't use MQ - I just keep all my local patches in anonymous branches off my tree. I use hg rebase to keep them up to date after pulls, and hg histedit to chop and change and combine them ready for landing).

(In reply to Ben Campbell from comment #290)

(In reply to ISHIKAWA, Chiaki from comment #289)

  • Old error handling strategy adopted during pop3 download was to set the file stream to nullptr to signify an error was detected.
    This is how the overall pop3 download handler kept track of the error situation in C++ code.
    My patch followed this practice.

Ahh... I see.
I think that's the first thing to fix then!
I propose we open another bug to change pop3 error handling to use a separate variable to signify errors rather than just nulling out the filestream. And also to make sure it calls one of FinishNewMessage()/DiscardNewMessage() depending on outcome.
I'll leave the NI open on this one to remind me to look at the pop3 code on Monday and write up the bug, but feel free to open a bug yourself if you'd prefer :-)

After that, I think it'll be easier to reduce your patches down into nice little atomic chunks we can start landing.

Thank you for your comment.

OK, let me see how this goes.
Locally, I have already separated out the removal of the reusable flag.
It worked fine locally, so I submitted a try-comm-central job.
See
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=1f00a8ed748893fe0f4aecaa229dfd28ae04e78e

So the next step is to introduce something like |bool m_errorflag|. Let me see how this improves the situation.

BTW, my latest uploaded patches were based on a C-C tree from a couple of weeks ago, and
they did not handle the |SyncCopyStream| change and others.
They have been updated against the latest C-C as of yesterday.
Locally, they seem to work.

I am a carried-over from the old era when HG MQ extension was recommended, and so I need to switch to newer HG command practice to begin with before attempting to use Phabricator?

(not to get bogged down in too many details) The moz-phab commandline tool just submits changesets in Hg, wherever they came from. I'd guess it'd happily pick up any patch you applied from MQ (I don't use MQ - I just keep all my local patches in anonymous branches off my tree. I use hg rebase to keep them up to date after pulls, and hg histedit to chop and change and combine them ready for landing).

I think I will update my workflow. It seems we can name patches via bookmarks. Again, I will see what I can do.
There are many blogs and tutorials which are definitely OUT OF DATE. This is not good for a newbie.

TIA

(I made a serious typo of writing |m_outputstream| instead of the correct |m_outFileStream|, so I corrected this comment.)

I filed bug 1773787,
which takes care of the "remove reusable flag" issue first.

In the comment there, I wrote:

The current plan I discussed with Ben is

1. Land this "remove reusable flag" patch set.
2. Change the POP3 error handling code to use a separate EXPLICIT |m_error| flag instead of setting |m_outFileStream| to nullptr to signal to the upper-layer functions that an error occurred.
3. Then modify the code to properly enable buffered write.

Part 1 is here.
Parts 2 and 3 will be handled in bug 1242030, or maybe part 2 will be handled in yet another bugzilla entry.

One point: changing the POP3 error handling code to use an |m_error| flag is fine.
However, please note that setting |m_outFileStream| to nullptr gives the upper-layer functions the chance to stop reading/writing data once there has been a fatal write error, say, and this makes it possible for TB to quit downloading (or, for that matter, to stop a message copying operation in the relevant places). This DOES make a difference for the user experience when an error occurs.
I found the patch here makes TB return immediately after an error dialog popup appears, so that I can decide what to do. I think many users would quit TB immediately if they realized the file back end is showing error behavior.
The stock C-C TB without the change somehow got into a repeated error dialog display and did not give me a chance to terminate TB cleanly.

So, I would try to disable read/write once the |m_error| flag is raised, as much as possible, to mimic the early quit path that was taken when |m_outFileStream| was set to nullptr.

Aha, it is |m_outFileStream| and not |m_outputStream|. A serious typo.
So I fixed the previous comment.

It turns out that
"2. Change the POP3 error handling code to use a separate EXPLICIT |m_error| flag instead of setting |m_outFileStream| to nullptr to signal to the upper-layer functions that an error occurred."
is relatively easy.
I can't recall the code of several years ago, but for the last few years, I only set |m_outFileStream| to nullptr when a major operation occurs, basically when the final close is near.
So the change is about a few dozen lines of code.
Well, I decided to return from WriteLineToMailbox in nsPop3Sink.cpp immediately if m_error contains an error code. (I forgot to mention that I decided to store the last-seen error code in m_error.)
This is because it is not meaningful to continue writing once we have had a write error.
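
A minimal sketch of that change, assuming the names discussed above (|m_error|, |m_outFileStream|, WriteLineToMailbox); this is not the actual patch, and the real signature and surrounding code in nsPop3Sink.cpp may differ:

```cpp
// Sketch only: return early once a previous write error has been recorded
// in m_error, and remember the first error seen.
nsresult nsPop3Sink::WriteLineToMailbox(const nsACString& buffer) {
  if (NS_FAILED(m_error)) {
    // A write already failed; writing further lines is pointless and may
    // make the partially written message worse. Propagate the error.
    return m_error;
  }
  uint32_t written = 0;
  nsresult rv = m_outFileStream->Write(buffer.BeginReading(), buffer.Length(),
                                       &written);
  if (NS_FAILED(rv) || written != buffer.Length()) {
    m_error = NS_FAILED(rv) ? rv : NS_ERROR_FAILURE;  // store last-seen error
    return m_error;
  }
  return NS_OK;
}
```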

I think more work is necessary to clean up the dump statements used for debugging, by using MOZ_LOG and other macros, so that they are less obtrusive.

These should come after the patches in bug 1773787.

All sounds reasonable to me.

Flags: needinfo?(vseerror)
Attachment #9387526 - Attachment is obsolete: true
See Also: → 1882971
Depends on: 1882971
See Also: 1882971

I am going to push the patches for this long-cooking patchset, using the new phabricator system to handle patches, very soon.

Does anyone have an idea how to split them up?
Per directory?
Or simply as one big chunk?
(Simply one big chunk may be easier to understand, since I hold the removal of the offending Seek and the final tweaking of the buffering until the last patch, which changes a few lines in the preceding patches.)

Currently, the main bulk of the patchsets is per directory, since many years ago it was suggested that a big-chunk patch was hard to review; but now that the phabricator system is in place, that may no longer be true.

Here is a list of HG MQ patches relevant to this bugzilla.
I mark the patches that are relevant to this bugzilla with a leading "*" character.

* 4 A DONT-USE-RESUABLE-part-2-DEBUG-PRINT-FOR-DEVELOPERS-REv04.patch: Bug 1773787 - Removing reusable flag...
  5 A malloc-opt.patch: call malloc_opt()
  6 A 1677202-rev02.patch: Bug 1677202 - avoid accessing released heap, coping with timing issue.
  7 A CopyToNative-return-value-check.patch: CopyToNative return value check (one indirect)
  8 A add-check-msgData-Windows-failure-Rev01.patch: add-check-msgData-Windows-failure
* 9 A new-1242030-precond-000-nsMsgUtils.h-typo-fix.patch: Fixed a typo
*10 A new-1242030-DONT-USE-REUSABLE-new-part-1.8-base.patch: an added error handling in SyncCopyStream
*11 A new-1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Impro...
*12 A new-1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Imp...
*13 A new-1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE ...
*14 A new-1242030-DONT-USE-REUSABLE-new-part-5-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-5 ...
*15 A new-1242030-enable-buffering-1st-step-rev03.patch: Bug 1242042: Enabling buffering for file stream to...
*16 A new-1242030-removing-a-Seek-and-tweak-buffering-rev03.patch: Bug 1242046 - Removing unnecessary |Seek...

No. 4 prints the entry to a few functions that are essential to understanding message insertion into a folder.
It is indispensable for checking the errors that may happen on Windows and Mac OS, which I cannot test locally since I develop patches and test them under linux.
When an error occurs on treeherder for Windows and OSX, I need to understand exactly where it happened and the history of function calls immediately before the error occurred.

I may want to merge 15 and 16 above, although they are for bug 1242042 and 1242046 originally. They have been mutated and worked on so much that merging the two would be easier to understand, since the code inserted in 15 is reverted and changed in 16.
Getting the buffering to work correctly in C++ and now in JS land has been very tricky.

I need to test the error failure modes of my patch in the next couple of weeks, but before doing that I may have to fix Bug 1881984 first, or my testing of error conditions may not proceed very well.

Ben, do you have any preference about the splitting/unsplitting of the patches?

EDIT: I forgot to request info from Ben.

Although there are still some error handling issues uncovered by my brutal unplug/plug-the-network-cable method (messages stored on a remote CIFS server), the write buffering does work well.
For the errors, see
bug 1887532 (more likely a CIFS kernel bug),
bug 1886360,
bug 1886349,
bug 1881984

The speed of download from the local mail server (on the same PC) is now as fast as I remember from my testing a few years ago, once I figured out where to enable the buffering after Ben's patch landed.
I also found that copying messages (actually moving, which uses the same code as copying) uses buffered write.

Here is the data.
With the C-C TB patched to use buffered write,
I downloaded 304 messages of various sizes; some of them are much larger than 64KB (the internal buffer size I used).
One of these days, even 128KB will not sound too large, though. I wonder what everyone thinks about this.
Some of the messages are moved to a folder using a message filter.

max value of write length is:  65536
count of write calls is :  16050  <--- 13248 calls are one-byte writes... Something is fishy.
Total bytes (WRITE):  5002659
Length of write: # of such calls
                1: 13248    <--- This is a one-octet write that seems to be used within the UI or
                                  among TB's various threads for communication?
                                  I feel it is a bit excessive. Remember this is only for a duration of 10-20 seconds.
                                  This certainly causes context switches and should be reduced as much as possible.
                                  It may be worth looking at after the buffered write patch lands.
                                  Now I have a suspicion that this large number of write calls definitely causes
                                  performance issues under heavy workload, but more pointedly, for
                                  bug 1880148,
                                  the timing of C-C TB under strace or gdb tracing may be skewed much by this many calls, and
                                  thus proper GUI interaction may not take place. Especially under strace, which has to dump data for each
                                  write call.
                                  Ugh... But I need someone in the know to tackle
                                  this issue. I think a reduction by a factor of 10 would be the first goal.

                2:   305  <--- # of messages plus one.
                3:     1
                4:     0
                5:     1
                6:     0
                7:     0
                8:    16
                9:     1
               10:     1
               11:     0
               12:     1
               13:     4
               14:     2
               15:     1
               16:     2
               17:     0
               18:     0
               19:     1
               20:     0
               21:    79
               22:     0
               23:     0
               24:     1
               25:     2
               26:   306  <-- the # of messages plus one.
               27:     1
               28:     2
               29:   305    <--- This and the next line, and the earlier size-2 write with 305 counts, are related to the # of
               30:   304          messages downloaded (304), I think.
               31:     0
               32:     0
               33:     0
               34:     0
               35:     0
               36:     1
               37:     1
               38:     1
               39:     0
               40:     2
               41:     0
               42:     1
               43:     1
               44:     1
               45:     0
               46:     3
               47:    77
               48:     0
               49:     1
               50:     0
               51:     1
               52:     1
               53:     0
               54:     1
               55:     0
               56:     0
               57:     2
               58:     5
               59:     1
               60:     0
               61:     0
               62:     0
               63:     0
               64:     0
               65:     0
               66:     0
               67:     2
               68:     2
               69:     0
               70:     0
               71:     1
               72:     1
               73:     2
               74:     2
               75:     1
               76:     0
               77:     1
               78:     2
               79:     2
               80:    17
               81:     1
               82:     4
               83:     1
               84:     1
               85:     0
               86:     0
               87:     0
               88:     0
               89:   381   <--- This is very likely the write of the header line. Not sure if it can be buffered, but I suspect
               90:     0          we can let it be for the time being.
                                  This is peanuts compared with the number of size-one write calls.
               91:     0
               92:     0
               93:     1
               94:    68
               95:     1
               96:     0
               97:     0
               98:     0
               99:     1
              100:     0
              101:     1
              102:     1
              103:     0
              104:     0
              105:     0
              106:     2
              107:     0
              108:     0
              109:     1
              110:     0
              111:     0
              112:     2
              113:     0
              114:     0
              115:     3
              116:     1
              117:     3
              118:     1
              119:     0
              120:     1
              121:     0
              122:     1
              123:     0
              124:     1
              125:     0
              126:     1
              127:     0
2^ 7 - (2^ 8 - 1):     0
2^ 8 - (2^ 9 - 1):    76      <--- So most of the writes are now buffered and thus there are no more sub-80-octet line writes.
2^ 9 - (2^10 - 1):   152           Buffered.
2^10 - (2^11 - 1):   198                //
2^11 - (2^12 - 1):   180                //
2^12 - (2^13 - 1):   160                //  
2^13 - (2^14 - 1):     0
2^14 - (2^15 - 1):     9                  //
2^15 - (2^16 - 1):    20                 //
2^16 - (2^17 - 1):    60    <--- 64KB write is counted here.
2^17 - (2^18 - 1):     0
2^18 - (2^19 - 1):     0
2^19 - (2^20 - 1):     0
2^20 - (2^21 - 1):     0
2^21 - (2^22 - 1):     0
2^22 - (2^23 - 1):     0
2^23 - (2^24 - 1):     0
2^24 - (2^25 - 1):     0
2^25 - (2^26 - 1):     0
2^26 - (2^27 - 1):     0

Some of the buffered writes are from mork and other internal DB operations.

The performance win for data that contains MIME-encoded PDF or photo files of 1MB size would be even more compelling.
If the 64KB writes were not buffered, I think, from the way the test data is created, there would be something like 60000 write calls instead of the current 60 (for the 64KB writes alone; count the smaller buffered writes, and the # of writes would be very large).

Basically, the context switch overhead of the write calls kills TB performance during download, message copying, etc.

This is an interim report. I will report similar data for the current TB without my buffering patch.

The following data is with Thunderbird 115.7.0 64-bit under linux.

Oh, I forgot to mention that my patched version dumps a copious amount of debug messages to the invoking tty window.
YET, the download was much faster.
The download with the current 115.7.0 is visibly slow.
Of course, the slowness is acute when you download from a POP3 server on the same PC. But the speedup for
moving/copying messages improves things a lot, since that is a purely local issue.

For data collection, the same e-mails were received as in the previous comment and the data was recorded.
Mork and sqlite writes are hard to control, so we should focus on the # of unbuffered calls, which
increases the count of write calls to approximately 66K from 16K.
This causes the slowness due to context switches.

max value of write length is:  32768 <--- I think mork and sqlite use relatively conservative buffer sizes.
count of write calls is :  66065
Total bytes (WRITE):  8558101
Length of write: # of such calls
                1: 14541
                2:     0
                3:     0
                4:   102
                5:     0
                6:     0
                7:     0
                8:    26
                9:     0
               10:     0
               11:     0
               12:     3
               13:     0
               14:   128
               15:     0
               16:     0
               17:     0
               18:     0
               19:     0
               20:     0
               21:     4
               22:     0
               23:   304
               24:   608
               25:   304
               26:     0
               27:     0
               28:   304
               29:     0
               30:   608
               31:     0
               32:   304
               33:     0
               34:     0
               35:     0
               36:     0
               37:     0
               38:   608
               39:   320
               40:     0
               41:     0
               42:    16
               43:    16
               44:     6
               45:     3
               46:    41
               47:    49
               48:   161
               49:   412
               50:     0
               51:     0
               52:   304
               53:     0
               54:     0
               55:     0
               56:     0
               57:     0
               58:     0
               59:     0
               60:     0
               61:     0
               62:     0
               63:     0
               64:   304
               65:     0
               66:     0
               67:   304
               68:     0
               69:     0
               70:     0
               71:     0
               72:     0
               73:     0
               74:    64
               75: 45744   <--- This is the result of unbuffered writes. We lose, performance-wise.
               76:     0
               77:     0
               78:     0
               79:     0
               80:     0
               81:     0
               82:     0
               83:     0
               84:     0
               85:     0
               86:     0
               87:     0
               88:     0
               89:     0
               90:     0
               91:     0
               92:     0
               93:     0
               94:     0
               95:     0
               96:     0
               97:   304
               98:     0
               99:     0
              100:     0
              101:     0
              102:     0
              103:     0
              104:     0
              105:     0
              106:     0
              107:     0
              108:     0
              109:     0
              110:     0
              111:     0
              112:     0
              113:     0
              114:     0
              115:     0
              116:     0
              117:     0
              118:     0
              119:     0
              120:     0
              121:     0
              122:     0
              123:     0
              124:     0
              125:     0
              126:     0
              127:     0
2^ 7 - (2^ 8 - 1):     0
2^ 8 - (2^ 9 - 1):     0
2^ 9 - (2^10 - 1):     5  <--- This and the other buffered writes are from mork and sqlite3, I think.
2^10 - (2^11 - 1):     2
2^11 - (2^12 - 1):     2
2^12 - (2^13 - 1):     2
2^13 - (2^14 - 1):     0
2^14 - (2^15 - 1):    20
2^15 - (2^16 - 1):   142
2^16 - (2^17 - 1):     0
2^17 - (2^18 - 1):     0
2^18 - (2^19 - 1):     0
2^19 - (2^20 - 1):     0
2^20 - (2^21 - 1):     0
2^21 - (2^22 - 1):     0
2^22 - (2^23 - 1):     0
2^23 - (2^24 - 1):     0
2^24 - (2^25 - 1):     0
2^25 - (2^26 - 1):     0
2^26 - (2^27 - 1):     0

Again, this is an interim report.
I hope I can resolve the error issues I uncovered during my preliminary testing and
get this patch landed very soon.

I NI'ed Ben so that he knows that things are moving in the right direction as far as buffering goes.

Flags: needinfo?(benc)
Keywords: perf

How is phab going?

Depends on: 1773787
Flags: needinfo?(benc) → needinfo?(ishikawa)

(In reply to Wayne Mery (:wsmwk) from comment #301)

How is phab going?

Short summary:
Finally, after about three weeks,
I have locally verified that the simulated (transient NFS) I/O error no longer traps TB.
So my added error checks and modifications, such as the change to mork to avoid a crash, are not causing any major issues as far as error processing is concerned after all.
So I have updated my source tree today [the test I described needed a fixed binary, sort of] and I am going to run the simulated I/O error tests once more, and
if all is well, I will create a phab patch. That will be a rather huge patch set, but that is what phab is meant to handle.
Also, having the patch available for other people to read is a great way to obtain feedback.

Long summary:
6-7 years ago, I manually plugged/unplugged the network cable (or virtually toggled the network interface of the virtualbox where my Linux instance runs) to simulate network issues and cause I/O errors.

However, I could never be sure of the coverage of such manual testing.
Manual or virtual unplugging/plugging of the network cable cannot be finely controlled to cause a simulated error at a particular place in the source code. No matter how hard I try manually, it won't cover much of the code base.

So this time around, I got fed up with this manual error simulation, especially after I saw TB crash so easily after a simple simulated I/O error (mork was the problem).
I needed a test method that makes the error happen at a finer resolution, reliably, and in a repeatable manner, so that I could fix any easy-to-catch serious errors that may have been introduced by my buffered write patch.

How do I automate the test process?
I studied a bit and selected a GUI interaction simulation tool called sikulix,
available under linux and windows ( http://sikulix.com/ ), to simulate what a user does to download messages and handle the error/warning dialogs that may be displayed after an I/O error occurs.
This automation was necessary after I realized that I probably needed to run hundreds of test cases reliably.

I had to do the following.
I needed to instrument the C-C TB by incorporating a macro at the entry of the functions where
I would like the I/O error to occur, at the resolution of a function entry (at the N-th invocation of the group of functions monitored by the inserted macro).
I studied a bit and selected a few dozen functions to instrument with a macro that invokes an external shell script with the
invoked function name, the source code file, and an internal counter value which is incremented by one every time the macro is called.
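
A rough sketch of such an instrumentation macro; the macro name, the script path, and the counter mechanism here are my own illustration, not the actual test harness:

```cpp
// Hypothetical sketch of the instrumentation described above. Placed at the
// entry of each monitored function, it calls an external shell script with
// the function name, the source file, and an internally incremented counter.
// The script compares its own external counter with the chosen N and, on a
// match, disables the network adaptor to inject the I/O error.
#include <atomic>
#include <cstdio>
#include <cstdlib>

static std::atomic<unsigned long> gIoErrorProbeCount{0};

#define IO_ERROR_PROBE()                                              \
  do {                                                                \
    unsigned long n = ++gIoErrorProbeCount;                           \
    char cmd[512];                                                    \
    snprintf(cmd, sizeof(cmd),                                        \
             "/usr/local/bin/io-error-probe.sh '%s' '%s' %lu",        \
             __func__, __FILE__, n);                                  \
    (void)system(cmd);                                                \
  } while (0)
```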

The test scenario is as follows.
I send four e-mails to a test mail account. The account stores e-mails on a remote NFS server.
I download the four e-mail messages with C-C TB.
One message is filtered into the Trash folder by a filter rule.
The other three messages are filtered to different folders.
I cause a simulated network error at the specified N-th invocation of a specified function during download.
Before the mork patch, C-C TB would crash one way or another if the network was disabled during download.

Basically, I do the following.

For N in my specified list of numbers (to cause the I/O error at the N-th invocation of the monitored functions):
Define N as the timing for the simulated I/O error.
The externally called shell script manages an EXTERNAL counter which is zeroed when BeginMailDelivery is called and
is incremented after each call to the script.
The script compares this external counter to the value of N.
When the comparison succeeds, linux's network host adaptor is disabled and a network error occurs for NFS reads/writes.
Thus, N is the delta of the internal counter value between the call to BeginMailDelivery and the call to the monitored function at which a network error is introduced.

So, download e-mails.
During this, an I/O error is simulated at the N-th function invocation as described above.
Because the network error is introduced, the download does not complete.
There will be error dialogs/warning dialogs.
My automated sikulix script clears these dialogs automatically.
(*** To my surprise, I found that far fewer error dialogs appear than I anticipated. This may need investigation.
It may be that some try/except pairs hide the errors unexpectedly. ***)

To repeat the loop in a known state, I reset various factors.
I download the e-mails again without a network error (so that I can start the next cycle with the mail server in a known state).
I empty the folders into which the e-mails are filtered.
Emptying avoids failures due to broken MSF files for a folder.
Sometimes an e-mail message cannot be filtered into a folder by a filter rule.
Yes, MSF files sometimes get corrupted after an I/O error.
My mork patch is not perfect. The important thing is NOT CRASHING, so that the user can continue.
I empty the Trash folder: since I have about 400 test cases (i.e., the list of numbers for N), I do not want Trash to
become very large.

The clearing of folders is done to put the folders in a known state for the next round of tests.
END FOR

The test was meant to make sure that C-C TB does not crash during a simple transient NFS error during downloading (and message filter processing).
Finally, after about 2.5 weeks, the initial round of tests completed. TB no longer crashes in the above test scenario
with the various N values.
I had to make the sikulix script very robust so that it can run hundreds of tests without stopping at an error.
(The sikulix script can fail if it encounters an unknown dialog hiding a known area of the window, etc.)

There are pros and cons of this test approach.

Pros:

  • detailed simulation of I/O errors at the resolution of function entry.
  • I can cover the whole sequence of a test scenario by inserting the simulated I/O error at the timing of the function instrumented by the incorporation of a macro in the source code. I can specify the I/O error to happen at the N-th invocation of a chosen function.
  • I can let it run automatically while I work on other things. <- Very important. I think one whole cycle can finish in a day.

Cons:

  • the timing of the occurrence of the simulated (transient NFS) error is only at the entry of a chosen function (and its N-th invocation in the whole test scenario).
    This means that if a function has a series of write calls, only the first call's error is handled.
    The I/O error handling of subsequent I/O functions may not be tested if an earlier error causes the processing to be interrupted and the function to return early.
    (<--- To remedy this, the macro I inserted at the entry of the function might need to be inserted before each I/O call.
    But this may be excessive.
    We may want to catch the lower-level I/O function invocations à la DTRACE, available on Solaris or FreeBSD, instead of tweaking the source code of C-C TB. But I digress.)
  • The available sikulix has a python engine that is based on major version 2.x.
    Thus the later Python 3.y module for invoking external subprocesses is not available, and thus I am forced to
    interact with the external shell script by way of creating a flag file to request the enabling of the network host adaptor,
    the sending of four e-mails, etc.

The framework works for now.
I can make the test scenario more complex, like downloading 16 messages, etc.
And again, the test proceeds automatically, except that the linux desktop is basically taken over by sikulix because it needs to access the screen (to recognize the menus/dialogs, etc.) and to manage the mouse pointer and X input events.
But I run linux in virtualbox hosted under Windows 10, and I can run applications like word/excel/firefox/thunderbird under windows 10 while the test runs in the guest linux in virtualbox. This is nice. I can occasionally monitor the progress.

I may produce a writeup on this test setup.

Flags: needinfo?(ishikawa)

(In reply to ISHIKAWA, Chiaki from comment #302)

(In reply to Wayne Mery (:wsmwk) from comment #301)

How is phab going?

Short summary:
Finally, after about three weeks,
I have locally verified that the simulated (transient NFS) I/O error no longer traps TB.
So my added error checks and modifications, such as the change to mork to avoid a crash, are not causing any major issues as far as error processing is concerned after all.
So I have updated my source tree today [the test I described needed a fixed binary, sort of] and I am going to run the simulated I/O error tests once more, and
if all is well, I will create a phab patch. That will be a rather huge patch set, but that is what phab is meant to handle.
Also, having the patch available for other people to read is a great way to obtain feedback.

Long summary:

.... omission ...

For N in my specified list of numbers ( to cause the I/O error at the N-th invocation of monitored functions):

I made an optimization of the list of N values.
There are repetitions of the function sequences involving header processing for each message downloaded.
I only selected the values of N such that the first occurrences of such functions experience I/O errors for each message download, and dropped the other invocations that handle the subsequent header lines, etc.
This cut down the list of N values to 266 for now.
The dropped values of N might have to be added back for testing eventually, but for the initial attempt, running the test to completion while covering as many different functions as possible was the first objective.

As for the linux desktop being taken over by sikulix, I think using a virtual desktop and running both sikulix and C-C TB inside the virtual desktop avoids this desktop interaction issue, so that I can do other things under linux while the test proceeds.
As a matter of fact, I use a virtual desktop to run the C-C TB mochitest, and I can do whatever I want under linux while mochitest runs locally inside the virtual desktop.

I verified that, in my simple test scenario with 278 points where a single I/O error is inserted during the test scenario execution, the locally built C-C TB did not crash, and the message received/downloaded again after the error condition is reset is not corrupted.

Before the mork patches (so that it won't crash), I saw strange missing newline characters and such (and, worse, randomly lost messages), but these problems are now completely gone, at least for the simple scenario. I will test a slightly more complex scenario with more messages downloaded.

But, anyway, I am going to start posting the buffered write patch, though the integration target is the merge after the immediately approaching ESR release, of course.

Flags: needinfo?(vseerror)

That's a good plan.

Flags: needinfo?(vseerror)
Attachment #9407129 - Attachment description: Bug 1242030, buffered write with required error check → Bug 1242030, buffered write with required error check, r=BenC,mkmellin

(In reply to Wayne Mery (:wsmwk) from comment #305)

That's a good plan.

Wayne,
I have uploaded the buffered write patch set.

Let the fun begin.

Flags: needinfo?(vseerror)

I obsoleted old WIP patches.

My patch improves performance by removing MANY write system calls through the use of a much larger write buffer.

The attachment shows how many write system calls are invoked for sending an attachment (/usr/lib/gcc/x86_64-linux-gnu/14/libgomp.a, 612712 octets long).

Before the patch, we needed to invoke the write system call more than 11000 times.
(An attachment is turned into MIME-encoded data, and thus there are
11346 writes of 73 octets each.

After the patch, these disappear.
They are replaced with far fewer write system calls of 64KB each, plus a smaller write for the remaining data.)
There are also writes due to mork and sqlite3,
so the important thing is to notice the decrease in the # of write system calls.
The sheer number of system calls adds overhead and longer elapsed time.
This is also true for copying messages between folders.
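
A minimal sketch of the kind of change involved, assuming the standard NS_NewBufferedOutputStream helper; the actual call sites and buffer size in the patch may differ:

```cpp
// Sketch only, not the actual patch: wrap the raw message-store output
// stream in a 64 KB buffered stream so each logical line write no longer
// becomes a separate write(2) system call.
#include "nsNetUtil.h"  // NS_NewBufferedOutputStream (assumed helper)

nsresult WrapInBufferedStream(nsCOMPtr<nsIOutputStream>& aStream) {
  nsCOMPtr<nsIOutputStream> buffered;
  nsresult rv = NS_NewBufferedOutputStream(getter_AddRefs(buffered),
                                           aStream.forget(), 64 * 1024);
  NS_ENSURE_SUCCESS(rv, rv);
  aStream = buffered;  // callers must still check Flush()/Close() results
  return NS_OK;
}
```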

Under linux, the speedup due to my patch may not be noticed by casual users.
But under Windows the speedup may be noticed by many, especially when copying a large number of messages between folders. (Download time can be dominated by the round-trip time between the local machine and the remote server. If the mail server is local, the speedup WILL BE noticeable.)

Under windows, there are so many tasks running and vying for CPU time slots that the overhead of the context switches caused by write system calls is very high.
So the speedup should be noticeable.

This is anecdotal data, but there was a case when I copied the same set of messages
under linux and under windows, and the windows speedup was unbelievably high.
To start with, I had to increase the number of messages to the order of a few hundred, since measuring the time to copy or move the messages by wristwatch requires the time to be in the 10-second range.
Under linux, with my patch, the copy time shrank to less than five seconds. Basically, looking at the wristwatch, hitting the button, and watching the operation finish on the screen, I could tell it was less than five seconds.
Copying the same messages under Windows took more than 1 min 30 seconds, and
with my patch it shrank to below 10 seconds.

We should invoke as few system calls as reasonably possible in any application program.

Attachment #8847340 - Attachment is obsolete: true
Attachment #8847342 - Attachment is obsolete: true
Attachment #8847355 - Attachment is obsolete: true
Attachment #8852092 - Attachment is obsolete: true
Attachment #8881089 - Attachment is obsolete: true
Attachment #8904354 - Attachment is obsolete: true
Attachment #9279703 - Attachment is obsolete: true
Attachment #9279704 - Attachment is obsolete: true
Attachment #9279705 - Attachment is obsolete: true
Attachment #9279706 - Attachment is obsolete: true
Attachment #9279707 - Attachment is obsolete: true
Attachment #9279708 - Attachment is obsolete: true
Attachment #9279709 - Attachment is obsolete: true
Attachment #9279710 - Attachment is obsolete: true
Attachment #9279711 - Attachment is obsolete: true
Attachment #9279713 - Attachment is obsolete: true
Attachment #9279714 - Attachment is obsolete: true
Attachment #9279715 - Attachment is obsolete: true
Attachment #9279716 - Attachment is obsolete: true
Attachment #9279717 - Attachment is obsolete: true
Attachment #9279718 - Attachment is obsolete: true
Attachment #9279719 - Attachment is obsolete: true

tryserver job.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=5c08ba4c1163e89ab0db4291c6c68a2e07b23154

I don't believe the patch introduces any new errors not seen by other people's jobs.

Also, locally, I have simulated a lightweight I/O error by disabling the network adapter while my message folders are stored on a remote NFS server.
Once the NFS error occurs due to the timeout, I re-enable the network adapter to continue testing.

TB did not crash, and the crude transaction handling of the POP3 download works.
(I have been testing this for the last few years, so it should be really error free.)
BUT, the error simulation is not perfect. I cannot make the error happen at an arbitrarily designated position.
Still, the test coverage is much better now.
A crash due to mork was found and fixed earlier, and since then TB does not crash anymore after the transient one-time I/O error window is simulated.

The error simulation test is done under linux, but the hardening against errors should also be effective under Windows and OSX.
In any case, this is an improvement, and if any deficiency is found in the future, it should be handled accordingly.

So the speed gain is definitely worth it, and the error handling is not worse than it is today.

From the viewpoint of long-term maintenance, a framework for transactional handling (rollback of the internal DB in the face of an error) needs to be hammered out.
There are places where I do check the low-level I/O code and print an error message, but do not return the error code from the function where the error is detected. As of now, during my testing, I have not found any ill effect so far. But this is an area where future work may be needed once the patch lands.

But I have a set of patches for Bug 170606 which I would like to land after this patch.
Landing these patches will shrink my local MQ patch queue very much, and my plan is that I will then be able to move to the HG EVOLVE extension easily.

Attachment #9407129 - Attachment description: Bug 1242030, buffered write with required error check, r=BenC,mkmellin → Bug 1242030 - buffered write with required error check. r=BenC,mkmellin,babolivier

But I have a set of patches for Bug 170606 which I would like to land after this patch.
Landing these patches will shrink my local MQ patch queue very much, and my plan is that I will then be able to move to the HG EVOLVE extension easily.

I meant to say Bug 1170606.

The comments still refer to reusableflag which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?

Attachment #9407129 - Attachment description: Bug 1242030 - buffered write with required error check. r=BenC,mkmellin,babolivier → Bug 1242030 - buffered write with required error check. r=BenC,mkmelin,babolivier
Flags: needinfo?(vseerror)

(In reply to Yury from comment #311)

The comments still refer to reusableflag which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?

Sorry, I missed your comment.

I should modify the comment to drop the reference to reusableflag, since it was indeed removed many months ago.
You are also right that the bulk of buffering has been enabled by bug 1882971, so the remaining issues are related to the error handling (or the lack of it) in the code.

I wanted to show statistics of write system calls using the latest M-C/C-C,
comparing the number of write system calls between the unpatched and patched versions.

Duh, I just tried to re-create the write system call statistics based on an unpatched M-C/C-C tree locally.
Then I found a grave issue of Inbox corruption. (I have not seen this before; maybe I have just been lucky.)
The way I created the statistics in comment 308 was to record the number of syscalls and do some postprocessing.

I use "strace" command under linux to record the number of syscalls.

strace -f -o /tmp/tb.old-2 /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -p

strace invoked this way captures every system call during startup and during the download of pending messages.
(I could have restricted the capture to a few selected system calls instead of all of them, for example as sketched below.)
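
Something along these lines would have captured only write(2) and produced a rough per-size histogram; the grep/sort postprocessing is only a sketch and ignores interrupted calls that strace splits across lines:

  strace -f -e trace=write -o /tmp/tb.writes /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -p
  # count write() calls grouped by the number of bytes they returned
  grep ' write(' /tmp/tb.writes | grep -oE '[0-9]+$' | sort -n | uniq -c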

The problem is that in the last several months or maybe longer, TB has started to issue so many system calls that monitoring it this way slows down execution to the point where the context (pull-down) menu no longer works reliably. This did not happen before; see Bug 1880148.
I think there is something wrong in the GUI library; I suspect it is issuing too many system calls to check for mouse interaction under the X Window System.
Anyway, I recall I had to run thunderbird first and then attach strace while TB was already running, before sending a large attachment and receiving the message from the local server (roughly as sketched below). Otherwise, the strange bug may interfere with the operation. Tough.
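
Attaching to the already-running instance looks something like this (<thunderbird-pid> is a placeholder for the PID of the main process):

  strace -f -e trace=write -o /tmp/tb.attach -p <thunderbird-pid>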

So I simply quit the above "strace" command by hitting Control-C.
This killed thunderbird, too.
Unfortunately, it was receiving earlier messages at the time, and as far as I can tell the Inbox is corrupt toward the end: I saw empty subjects, etc. in the message window. I know I should not have killed thunderbird with Control-C.

But believe me, I have tested TB regarding these I/O and other error issues and have had to stop TB by hitting Control-C MANY times, yet it never corrupted the Inbox in this manner. (Like I said, I may simply have been lucky.) I think I need to investigate this before showing the comparison of write system calls with and without my patches.
The reason I wanted to create a new statistic was that, with the patched version, there were write system calls with data sizes of
8, 46, 56, and 92 bytes which were each called more than 20 times during the sending and receiving of MIME data (libgomp.a); the dreaded large number of per-MIME-line write system calls are now gone, and I wanted to
see how the unpatched M-C/C-C trunk behaves.
Then I invoked the rebuilt thunderbird under strace and wanted to retry running thunderbird first and then attaching strace to it.
So I killed the strace command, and the above Inbox corruption happened.

My first guess is that we are missing a flush in the appropriate place in the unpatched version.

I am looking at the Inbox using an editor.
It seems that the attachments I last sent for statistics collection ARE at the
end of the Inbox. So I am not sure what is wrong, or why my last sent messages with attachments show up as ghost messages with
empty subjects and empty correspondents. (This is after repair and compaction.)

But I do find something very disturbing.
The "From - ..." line that separates each message now carries date information (resurrected by a recent patch).
BUT it shows an inappropriate date string:
EVERY "From -" line shows the date of the last time I ran repair folder or compaction, I think.
This is true even for a message dated Tue, 23 Apr 2024 07:36:50 +09:00 (JST).
And as TB goes through repair/compaction, the date string on the "From -" line progresses a bit; a second is added occasionally.
I am filing a bug, although I am not sure what ill effect it has aside from making debugging of the Inbox content a bit difficult.
It used to be that I could simply look at "From -" to figure out the date, but now I have to wade through various headers to find the timestamp. :-(

Back to the corruption issue.
I THINK we are missing a flush at the appropriate places in the unpatched TB.
I will check whether my patch here does the proper flushing at the appropriate places. My patch was meant to avoid exactly such issues.

The corruption may be due to how strace intercepts write system calls, but I doubt it.

(In reply to ISHIKAWA, Chiaki from comment #312)

(In reply to Yury from comment #311)

The comments still refer to reusableflag which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?

Sorry, I missed your comment.

I should modify the comment to drop the reference to reusableflag, since it was indeed removed many months ago.

I had already removed the references to reusableflag, and I have now removed the last reference to m_tempMessageStreamReusable in a comment.

You are also right that the bulk of buffering has been enabled by bug 1882971, so the remaining issues are related to the error handling (or the lack of it) in the code.

I should clarify that the bulk of the buffering is done by Ben's code, but I needed to tweak the code in a few places to make sure that buffering of the writes during POP3 download takes place properly.
I can't recall exactly what it was now, but the code here was necessary to obtain the statistics properly, IIRC.
I won't be able to do much until late December because my day job is getting busy.
