Consolidated patch set from bug 1122698, bug 1134527, bug 1134529, bug 1174500
Categories
(MailNews Core :: Networking, defect)
Tracking
(Not tracked)
People
(Reporter: ishikawa, Assigned: ishikawa)
References
(Blocks 4 open bugs)
Details
(Keywords: perf)
Attachments
(13 files, 113 obsolete files)
187.21 KB,
application/pdf
|
Details | |
124.55 KB,
text/plain
|
Details | |
18.96 KB,
text/plain
|
Details | |
34.82 KB,
image/png
|
Details | |
36.70 KB,
image/png
|
Details | |
39.73 KB,
image/png
|
Details | |
36.79 KB,
image/png
|
Details | |
3.02 MB,
application/zip
|
Details | |
2.43 MB,
application/zip
|
Details | |
8.27 KB,
patch
|
Details | Diff | Splinter Review | |
163.16 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
Decrease of write system calls during downloading / sending an attachment of 612712 octet long file.
17.27 KB,
text/plain
|
Details |
Assignee | ||
Comment 1•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 2•9 years ago
|
||
Assignee | ||
Comment 3•9 years ago
|
||
Assignee | ||
Comment 4•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 5•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 6•9 years ago
|
||
Assignee | ||
Comment 7•9 years ago
|
||
Assignee | ||
Comment 8•9 years ago
|
||
Assignee | ||
Comment 9•9 years ago
|
||
Assignee | ||
Comment 10•9 years ago
|
||
Assignee | ||
Comment 11•9 years ago
|
||
Assignee | ||
Comment 12•8 years ago
|
||
Comment 13•8 years ago
|
||
Assignee | ||
Comment 14•8 years ago
|
||
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
Assignee | ||
Comment 17•8 years ago
|
||
Assignee | ||
Comment 18•8 years ago
|
||
Assignee | ||
Comment 19•8 years ago
|
||
Assignee | ||
Comment 20•8 years ago
|
||
Assignee | ||
Comment 21•8 years ago
|
||
Assignee | ||
Comment 25•8 years ago
|
||
Assignee | ||
Comment 26•8 years ago
|
||
Assignee | ||
Comment 27•8 years ago
|
||
Assignee | ||
Comment 28•8 years ago
|
||
Comment 31•8 years ago
|
||
Comment 32•8 years ago
|
||
Assignee | ||
Comment 33•8 years ago
|
||
Assignee | ||
Comment 34•8 years ago
|
||
Assignee | ||
Comment 35•8 years ago
|
||
Assignee | ||
Comment 36•8 years ago
|
||
Assignee | ||
Comment 37•8 years ago
|
||
Assignee | ||
Comment 38•8 years ago
|
||
Assignee | ||
Comment 39•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Comment 40•8 years ago
|
||
Assignee | ||
Comment 41•8 years ago
|
||
Assignee | ||
Comment 42•8 years ago
|
||
Assignee | ||
Comment 43•8 years ago
|
||
Assignee | ||
Comment 44•8 years ago
|
||
Assignee | ||
Comment 45•8 years ago
|
||
Assignee | ||
Comment 46•8 years ago
|
||
Assignee | ||
Comment 47•8 years ago
|
||
Assignee | ||
Comment 48•8 years ago
|
||
Comment 49•8 years ago
|
||
Assignee | ||
Comment 50•8 years ago
|
||
Comment 51•8 years ago
|
||
Assignee | ||
Comment 52•8 years ago
|
||
Comment 53•8 years ago
|
||
Comment 54•8 years ago
|
||
Comment 55•8 years ago
|
||
Assignee | ||
Comment 56•8 years ago
|
||
Comment 57•8 years ago
|
||
Assignee | ||
Comment 58•8 years ago
|
||
Assignee | ||
Comment 59•8 years ago
|
||
Assignee | ||
Comment 60•8 years ago
|
||
Comment 61•8 years ago
|
||
Comment 62•8 years ago
|
||
Comment 63•8 years ago
|
||
Assignee | ||
Comment 64•8 years ago
|
||
Assignee | ||
Comment 65•8 years ago
|
||
Assignee | ||
Comment 66•8 years ago
|
||
Comment 67•8 years ago
|
||
Comment 68•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 69•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 70•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 71•8 years ago
|
||
Comment 72•8 years ago
|
||
Assignee | ||
Comment 73•8 years ago
|
||
Comment 74•8 years ago
|
||
Comment 75•8 years ago
|
||
Comment 76•8 years ago
|
||
Comment 77•8 years ago
|
||
Assignee | ||
Comment 78•8 years ago
|
||
Assignee | ||
Comment 79•8 years ago
|
||
Comment 80•8 years ago
|
||
Updated•8 years ago
|
Assignee | ||
Comment 81•8 years ago
|
||
Comment 82•8 years ago
|
||
Assignee | ||
Comment 83•8 years ago
|
||
Assignee | ||
Comment 84•8 years ago
|
||
Assignee | ||
Comment 85•8 years ago
|
||
Assignee | ||
Comment 86•8 years ago
|
||
Assignee | ||
Comment 87•8 years ago
|
||
Assignee | ||
Comment 88•8 years ago
|
||
Comment 89•8 years ago
|
||
Assignee | ||
Comment 90•8 years ago
|
||
Assignee | ||
Comment 91•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 92•8 years ago
|
||
Assignee | ||
Comment 93•8 years ago
|
||
Assignee | ||
Comment 94•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 95•8 years ago
|
||
Assignee | ||
Comment 96•8 years ago
|
||
Comment 97•8 years ago
|
||
Comment 98•8 years ago
|
||
Assignee | ||
Comment 99•8 years ago
|
||
Assignee | ||
Comment 100•8 years ago
|
||
Assignee | ||
Comment 101•8 years ago
|
||
Assignee | ||
Comment 102•8 years ago
|
||
Assignee | ||
Comment 103•8 years ago
|
||
Assignee | ||
Comment 104•8 years ago
|
||
Comment 105•8 years ago
|
||
Assignee | ||
Comment 106•8 years ago
|
||
Comment 107•8 years ago
|
||
Assignee | ||
Comment 108•8 years ago
|
||
Assignee | ||
Comment 109•8 years ago
|
||
Assignee | ||
Comment 110•8 years ago
|
||
Assignee | ||
Comment 111•8 years ago
|
||
Assignee | ||
Comment 112•8 years ago
|
||
Assignee | ||
Comment 113•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Comment 114•8 years ago
|
||
Comment 115•8 years ago
|
||
Assignee | ||
Comment 116•8 years ago
|
||
Assignee | ||
Comment 117•8 years ago
|
||
Assignee | ||
Comment 118•8 years ago
|
||
Updated•8 years ago
|
Updated•8 years ago
|
Comment 119•8 years ago
|
||
Comment 120•8 years ago
|
||
Comment 121•8 years ago
|
||
Comment 122•8 years ago
|
||
Comment 123•8 years ago
|
||
Assignee | ||
Comment 124•8 years ago
|
||
Comment 125•8 years ago
|
||
Comment 126•8 years ago
|
||
Comment 127•8 years ago
|
||
Comment 128•8 years ago
|
||
Assignee | ||
Comment 129•8 years ago
|
||
Comment 130•8 years ago
|
||
Assignee | ||
Comment 131•8 years ago
|
||
Comment 132•8 years ago
|
||
Assignee | ||
Comment 133•8 years ago
|
||
Comment 134•8 years ago
|
||
Comment 135•8 years ago
|
||
Comment 136•8 years ago
|
||
Comment 137•8 years ago
|
||
Comment 138•8 years ago
|
||
Assignee | ||
Comment 139•8 years ago
|
||
Comment 140•8 years ago
|
||
Assignee | ||
Comment 141•8 years ago
|
||
Assignee | ||
Comment 142•8 years ago
|
||
Assignee | ||
Comment 143•8 years ago
|
||
Assignee | ||
Comment 144•8 years ago
|
||
Assignee | ||
Comment 145•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 146•8 years ago
|
||
Comment 147•8 years ago
|
||
Assignee | ||
Comment 148•8 years ago
|
||
Assignee | ||
Comment 149•7 years ago
|
||
Assignee | ||
Comment 150•7 years ago
|
||
Assignee | ||
Comment 151•7 years ago
|
||
Assignee | ||
Comment 152•7 years ago
|
||
Assignee | ||
Comment 153•7 years ago
|
||
Assignee | ||
Comment 154•7 years ago
|
||
Assignee | ||
Comment 155•7 years ago
|
||
Assignee | ||
Comment 156•7 years ago
|
||
Assignee | ||
Comment 157•7 years ago
|
||
Assignee | ||
Comment 158•7 years ago
|
||
Assignee | ||
Comment 159•7 years ago
|
||
Assignee | ||
Comment 160•7 years ago
|
||
Assignee | ||
Comment 161•7 years ago
|
||
Assignee | ||
Comment 162•7 years ago
|
||
Assignee | ||
Comment 163•7 years ago
|
||
Assignee | ||
Comment 164•7 years ago
|
||
Assignee | ||
Comment 165•7 years ago
|
||
Assignee | ||
Comment 166•7 years ago
|
||
Assignee | ||
Comment 167•7 years ago
|
||
Assignee | ||
Comment 168•7 years ago
|
||
Assignee | ||
Comment 169•7 years ago
|
||
Assignee | ||
Comment 170•7 years ago
|
||
Assignee | ||
Comment 171•7 years ago
|
||
Assignee | ||
Comment 172•7 years ago
|
||
Comment 173•7 years ago
|
||
Assignee | ||
Comment 174•7 years ago
|
||
Comment 175•7 years ago
|
||
Assignee | ||
Comment 176•7 years ago
|
||
Comment 177•7 years ago
|
||
Assignee | ||
Comment 178•7 years ago
|
||
Assignee | ||
Comment 179•7 years ago
|
||
Assignee | ||
Comment 180•7 years ago
|
||
Assignee | ||
Comment 181•7 years ago
|
||
Assignee | ||
Comment 182•7 years ago
|
||
Assignee | ||
Comment 183•7 years ago
|
||
Comment 184•7 years ago
|
||
Assignee | ||
Comment 185•7 years ago
|
||
Assignee | ||
Comment 186•7 years ago
|
||
Assignee | ||
Comment 187•7 years ago
|
||
Assignee | ||
Comment 188•7 years ago
|
||
Assignee | ||
Comment 189•7 years ago
|
||
Assignee | ||
Comment 190•7 years ago
|
||
Comment 191•7 years ago
|
||
Assignee | ||
Comment 192•7 years ago
|
||
Assignee | ||
Comment 193•7 years ago
|
||
Assignee | ||
Comment 194•7 years ago
|
||
Assignee | ||
Comment 195•7 years ago
|
||
Assignee | ||
Comment 196•7 years ago
|
||
Assignee | ||
Comment 197•7 years ago
|
||
Assignee | ||
Comment 198•7 years ago
|
||
Assignee | ||
Comment 199•7 years ago
|
||
Comment 200•7 years ago
|
||
Assignee | ||
Comment 201•7 years ago
|
||
Assignee | ||
Comment 202•7 years ago
|
||
Assignee | ||
Comment 203•7 years ago
|
||
Comment 204•7 years ago
|
||
Assignee | ||
Comment 205•7 years ago
|
||
Assignee | ||
Comment 206•7 years ago
|
||
Assignee | ||
Comment 207•7 years ago
|
||
Comment 208•7 years ago
|
||
Comment 209•7 years ago
|
||
Assignee | ||
Comment 210•7 years ago
|
||
Comment 211•7 years ago
|
||
Comment 212•7 years ago
|
||
Comment 213•7 years ago
|
||
Assignee | ||
Comment 214•7 years ago
|
||
Comment 215•7 years ago
|
||
Comment 216•7 years ago
|
||
Comment 217•7 years ago
|
||
Comment 218•7 years ago
|
||
Comment 219•7 years ago
|
||
Assignee | ||
Comment 220•7 years ago
|
||
Comment 221•7 years ago
|
||
Comment 222•7 years ago
|
||
Comment 223•7 years ago
|
||
Comment 224•7 years ago
|
||
Assignee | ||
Comment 225•7 years ago
|
||
Comment 226•7 years ago
|
||
Comment 227•7 years ago
|
||
Comment 228•7 years ago
|
||
Assignee | ||
Comment 230•7 years ago
|
||
Comment 231•7 years ago
|
||
Assignee | ||
Comment 232•7 years ago
|
||
Assignee | ||
Comment 233•7 years ago
|
||
Assignee | ||
Comment 234•6 years ago
|
||
Assignee | ||
Comment 235•5 years ago
|
||
I am going to upload a new set of patches that are now compatible with reformatted source tree, and are known to work.
(Finally, I could get rid of a couple of errors that showed up: one in linux and OSX build, and additional one in Windows build on tryserver tests.
After carefuly checking I found that there was a subtle copy&paste error during the update to match clang-formatted source tree.
All is well : passes mozmil and xpcshell-test.
Assignee | ||
Comment 236•5 years ago
|
||
It ripped out the unnecessary caching as far as the buffered writing is concerned. (That is, as far as the code I touched is concerned. Maybe not everywhere. It may have to be touched in another buzilla comprehensively IMHO.)
Assignee | ||
Comment 237•5 years ago
|
||
The following patches will be uploaded:
1242030-DONT-USE-REUSABLE-new-part-1.patch: bug 1242030: DONT-USE-REUSABLE new part-1 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)
1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev02.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/stream I/O. Part 2. r=jorgk.
1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev03.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/stream I/O. Part 3 (was 2). r=jorgk. revision 2
1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3.patch: bug 1242030: DONT-USE-REUSABLE new part-4 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)
1242030-DONT-USE-REUSABLE-new-part-5-pop3.patch: bug 1242030: DONT-USE-REUSABLE new part-5 new splitting of (consolidation of 1122698, 1134527, 1134529, 1174500)
Additionally, following two patches posted in respective bugzilla completes the buffered output finally (!).
enable-buffering-1st-step.patch: Bug 1242042: Enabling buffering for file stream to write message for C-C TB (Enabling buffering first step.)
removing-a-Seek-rev02.patch: Bug 1242046 - Removing unnecessary |Seek| that caused the C-C TB to operate slowly in terms of I/O
Assignee | ||
Comment 238•5 years ago
|
||
clang formatted. I obsoleted attachment 8879215 [details] [diff] [review].
Assignee | ||
Comment 239•5 years ago
|
||
clang-formatted. I obsoleted attachment 8879217 [details] [diff] [review] (1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev02.patch)
Assignee | ||
Comment 240•5 years ago
|
||
clang-formatted. I obsoleted 8879218: 1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev02.patch.
Assignee | ||
Comment 241•5 years ago
|
||
clang-formatted. I obsoleted 8879219: 1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3.patch.
Assignee | ||
Comment 242•5 years ago
|
||
clang-formatted. I obsoleted 8879221: 1242030-DONT-USE-REUSABLE-new-part-5-pop3.patch.
Assignee | ||
Comment 243•5 years ago
|
||
Folks,
Can somebody give me an idea how I should go about landing these patches?
It has taken so many years and I am a bit at a loss myself.
I am cc: ing to Jorg, aceman and wayne for now.
Thank you in advance.
Comment 244•5 years ago
|
||
Hi, thank you. Yes, it is a pity this took so long, but if the patch set works now, we should be able to accept it.
You did not obsolete some of the older patch sets named "USE REUSABLE" and "USE CLOSED". Are those still valid in addition to the new "DONT USE REUSABLE" patch set?
Comment 245•5 years ago
|
||
Thanks for continuing to push this through. I defer to Jorg in these code matters. I look forward to seeing this in a release!
Assignee | ||
Comment 246•5 years ago
|
||
(In reply to :aceman from comment #244)
Hi, thank you. Yes, it is a pity this took so long, but if the patch set works now, we should be able to accept it.
Thank you for you comment.
That will be great, but we need to review them first.
(There were a few modifications that were necessary to get the local compiler and tryserver MS STUDIO compiler build the code.
The latter happened in the summer of 2017 for about 10 days if I recall correctly.)
You did not obsolete some of the older patch sets named "USE REUSABLE" and "USE CLOSED". Are those still valid in addition to the new "DONT USE REUSABLE" patch set?
Well, they were referred in the old comments and so I thought it might be necessary to keep them.
Maybe I should obsolete them to avoid misunderstanding. I will do so after this comment.
BTW, as the latest patch name suggests, the patch set does away with reusable flag based on my observation reported here (comment 234) that caching of file stream is not worthwhile and my patch that reduces the # of write system calls is a big win for copying messages between folders even.
(In reply to Wayne Mery (:wsmwk) from comment #245)
Thanks for continuing to push this through. I defer to Jorg in these code matters. I look forward to seeing this in a release!
Yup. As Jorg is one of the key maintainers of C-C TB , I need his input to keep the code in good shape.
So a review cycle will be necessary before landing.
I will do more local testing (using a router's USB stick as remote server storage and unplugging the network cable (simulated by disabling networking of VirtualBox) while the review will go on.
TIA
Assignee | ||
Comment 247•5 years ago
|
||
I am obsoleting these patches.
(They may be referenced in comments, but I think we can access obsoleted attachments if necessary.)
USE-REUSABLE 1242030-new-part-1.patch
USE REUSABLE 1242030-new-part-2-imap.patch
USE REUSABLE 1242030-new-part-3-import.patch
USE REUSABLE 1242030-new-part-4-local-less-pop3.patch
USE REUSABLE 1242030-new-part-5-pop3.patch
USE REUSABLE 1242030-part0-pass-out-reusable.patch
Hmm... I should have added updated -rev03 suffix to my latest patches. I would do so locally now.
1242030-USE-CLOSED-new-part-1.patch
1242030-USE-CLOSED-new-part-2-imap-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-3-import-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-4-local-less-pop3.patch
1242030-USE-CLOSED-new-part-5-pop3.patch
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 248•5 years ago
|
||
Done.
(In reply to ISHIKAWA, Chiaki from comment #247)
I am obsoleting these patches.
(They may be referenced in comments, but I think we can access obsoleted attachments if necessary.)USE-REUSABLE 1242030-new-part-1.patch
USE REUSABLE 1242030-new-part-2-imap.patch
USE REUSABLE 1242030-new-part-3-import.patch
USE REUSABLE 1242030-new-part-4-local-less-pop3.patch
USE REUSABLE 1242030-new-part-5-pop3.patch
USE REUSABLE 1242030-part0-pass-out-reusable.patchHmm... I should have added updated -rev03 suffix to my latest patches. I would do so locally now.
I meant to say that I should update the existing -rev02 to -rev03 and then add -rev03 to patch names that don't have them.
Already a couple of them do have such -rev03 string, but no all of them yet in the attachments.
1242030-USE-CLOSED-new-part-1.patch
1242030-USE-CLOSED-new-part-2-imap-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-3-import-JK-v1-rev02.patch
1242030-USE-CLOSED-new-part-4-local-less-pop3.patch
1242030-USE-CLOSED-new-part-5-pop3.patch
Comment 249•5 years ago
|
||
Can somebody give me an idea how I should go about landing these patches?
They need to be reviewed again. It would also be good to see the performance comparison again. Maybe we should get our new backend man, Ben C, involved, although he's generally quite busy.
Assignee | ||
Comment 250•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #249)
Can somebody give me an idea how I should go about landing these patches?
They need to be reviewed again.
Sure.
Actually, I would appreciate it.
Due to some compiler issues locally and on tryserver, I had to re-do some parts of earlier code which you kindly reviewed before.
So we have to make sure the code is in good shape in terms of today's debugging macros, etc.
It would also be good to see the performance comparison again. Maybe we should get our new backend man, Ben C, involved, although he's generally quite busy.
I can do the comparison again.
So can you review the patches at least partially, Jorg?
TIA
Comment 251•5 years ago
|
||
Assignee | ||
Comment 252•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #251)
Comment on attachment 9106025 [details] [diff] [review]
1242030-DONT-USE-REUSABLE-new-part-1.patch (clang-formatted)Review of attachment 9106025 [details] [diff] [review]:
I looked through the first patch. Let's try to get all the formal stuff and
nits out of the way first.::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +16,5 @@interface nsIUrlListener;
interface nsIMsgDatabase;
interface nsITransaction;+[scriptable, uuid(046AB3DC-D4C7-11E5-95DB-0800272954B9)]
We don't change UUIDs any more.
So it can remain the same even if the argument list of a public function (the removal of |aReusable| happened (?)
@@ +120,5 @@
+// * @param aReusable set to true on output if the caller can reuse the
+// * stream for multiple messages, e.g., mbox format.
+// * This means the caller will likely get the same stream
+// * back on multiple calls to this method, and shouldn't
+// * close the stream in between calls if they want reuse.Remove old comment.
Will do.
@@ +135,5 @@
* to an other folder as a filter action, or is deleted because it's * a duplicate. This gives the berkeley mailbox store a chance to simply * truncate the Inbox w/o leaving a deleted message in the store. *
- discardNewMessage closes aOutputStream always unless the passed stream
Trailing space.
My. I thought clang-format will take care of extra trailing whitespace automagically.
This means I may still have other trailing whatspace stuff in the patches.
::: mailnews/base/util/nsMsgDBFolder.cpp
@@ +729,5 @@
- if (NS_FAILED(rv)) {
+#ifdef DEBUG- fprintf(stderr,
"(debug) nsMsgDBFolder::GetOfflineFileStream: "
"GetMsgInputStream(hdr, &reusable, aFileStream); rv=0x%" PRIx32
that should be rv=%" PRIx32.
OK, I thought I need to add 0x in front of %. As you already noticed below, I got confused above.
Will fix these through out the patch set.
@@ +835,5 @@
+#ifdef DEBUG
- if (NS_FAILED(rv)) {
- fprintf(stderr,
"(debug) nsMsgDBFolder::GetMsgInputStream: msgStore->"
"GetMsgInputStream(this, ...) returned error rv=%" PRIx32 "\n",
Correct.
@@ +936,5 @@
nsresult rv = OpenBackupMsgDatabase();
+#ifdef DEBUG
- if (NS_FAILED(rv)) {
- fprintf(stderr,
"(debug) OpenBackupMsgDatabase(); returns error=0x%" PRIx32 "\n",
Wrong.
Will fix this and friends.
@@ +1668,5 @@
NS_WARNING("m_tempMessageStream->close returned error");
+#ifdef DEBUG
fprintf(stderr,
"(debug) Offline message too small: messageSize=%d "
"curStorePos=%" PRId64 " numOfflineMsgLines=%d bytesAdded=%d\n",
Are the %d right for all platforms?
As far as I know, I didn't get compiler errors and I thought %d were all right.
(I thought the # of lines and bytes added for a single message are 32-bit entities.)
But let me check. In the worst case scenario, I can extend the size on all platforms and use PRId64, but that seems to be a bit far-fetched.
TIA
Assignee | ||
Comment 253•5 years ago
|
||
By the way, the successful build run on tryserver with the patch set and a few additional ones [LDAP library reformat, etc.] is, for example,
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ffebf8451c3ba83e3e6f53b198b861e7251ace26
[I have run them with a minor modification and additional local patches about a dozen times by now.]
TIA
Assignee | ||
Comment 254•5 years ago
|
||
Jorg, I thought I would take care of the following in one sweep.
that should be rv=%" PRIx32.
But then I realized that I added "0x" in front of % to obtain a 0x prefix for the printed error value.
I see output of error value as hexadecimal value as in the following example from error handling macros:
0:01.52 pid:27621 [27621, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 661
The value 80004002 is prefixed by "0x" to signal it is a hexadecimal number.
For example, the macro NS_ENSURE_SUCCESS_BODY is define thusly.
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsDebug.h#250
define NS_ENSURE_SUCCESS_BODY(res, ret) \
mozilla::SmprintfPointer msg = mozilla::Smprintf( \
"NS_ENSURE_SUCCESS(%s, %s) failed with " \
"result 0x%" PRIX32, \
#res, #ret, static_cast<uint32_t>(__rv)); \
NS_WARNING(msg.get());
I think I added "0x" before "%" for the same reason.
That there are a few missing cases need fixing.
Am I mistaken for the use of PRIx32 here?
TIA
PS: will check if %d is good for all architecture later today.
Assignee | ||
Comment 255•5 years ago
•
|
||
As for the trailing space issue, I am not entirely sure if I accidentally add it after clang-format took care of the reformatting, or if clang-format honors the trailing space inside comment block. Could be the latter case.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 256•5 years ago
|
||
Ouch, I now recall Jorg reads the messages sent automatically for C-C TB bugzilla filings.
How do I turn off the needinfo I turned on.
Oh, I can provide the information and then turn off the needinfo flag. Hmm, somewhat unintuitive, but hope it works.
Assignee | ||
Comment 257•5 years ago
•
|
||
(In reply to Jorg K (GMT+2) from comment #251)
Comment on attachment 9106025 [details] [diff] [review]
1242030-DONT-USE-REUSABLE-new-part-1.patch (clang-formatted)
[omission]
>
> @@ +1668,5 @@
> > + NS_WARNING("m_tempMessageStream->close returned error");
> > +#ifdef DEBUG
> > + fprintf(stderr,
> > + "(debug) Offline message too small: messageSize=%d "
> > + "curStorePos=%" PRId64 " numOfflineMsgLines=%d bytesAdded=%d\n",
>
> Are the %d right for all platforms?
I checked. It should be OK.
Here the data types are all 32-bit except for curStorePos which is int64_t.
So use of %d is OK. See below.
(defined in the same function)
uint32_t messageSize;
int64_t curStorePos;
comm/mailnews/base/util/nsMsgDBFolder.h
201 int32_t m_numOfflineMsgLines;
omm/mailnews/base/util/nsMsgDBFolder.h
202 int32_t m_bytesAddedToLocalMsg;
So the patch should be OK.
TIA
Assignee | ||
Comment 258•5 years ago
|
||
As for redoing the performance analysis of copying messages between folders (as reported in comment 234), I think I need about 10 days to do the comprehensive analysis.
(Given that a large technological exhibition for which I work as part of an organizer and exhibitor side as well is fast approaching [it is slated in Dec 11-13], I may not be able to perform the comprehensive test until December 20th in the worst case, but I am trying to modify the patch as reviewed more or less in realtime.)
Comment 259•5 years ago
|
||
If the variables are int32_t then use PRId32 in the format string to be safe. PRIu32 for the uint32_t variable.
Assignee | ||
Comment 260•5 years ago
|
||
(In reply to :aceman from comment #259)
If the variables are int32_t then use PRId32 in the format string to be safe. PRIu32 for the uint32_t variable.
Oh, I did not realize that. I will modify the patch.
I am curious, though. Which architecture has an issue with %d to print int32_t and/or uint32_t?
Assignee | ||
Comment 261•5 years ago
|
||
Updated to accommodate Jorg's comment.
Uses PRIu32 and PRId32 to print out unsigned and signed 32-bit int.
Generic description for the the five (5) patches:
There were mixed usage of "0x%" PRIx32 and "..%" PRIx32 (the latter not preceded by 0x).
But my intention was to precede the hex printout of nsresult value.
So I changed them all to "0x%" PRIx32.
The remaining "%d" usage is for printing the value of LINE macro, which is an integer (and my cursory inspection indicates they are 16 bit entities .
So I think it should not be a big problem.
I replaced |-rev03| with |-rev04| the patch file name.
Assignee | ||
Comment 262•5 years ago
|
||
See previous upload.
Assignee | ||
Comment 263•5 years ago
|
||
See part-1 of the series.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 264•5 years ago
|
||
See part-1 of the patch series.
Assignee | ||
Comment 266•5 years ago
|
||
I forgot to mention. UUID was not reverted yet because I was not sure if I can retain the same UUID even when the published API function has a different set of parameters.
TIA
Comment 267•5 years ago
|
||
UUIDs never need to change nowadays. It was only needed for binary add-ons which are long gone.
Assignee | ||
Comment 268•5 years ago
|
||
I reverted the UUID back to the original value.
Otherwise, the same with the previous version.
Updated•5 years ago
|
Assignee | ||
Comment 269•4 years ago
•
|
||
I will update the patches mentioned in comment 237 with the latest local patches.
And do the performance measurement against the current distributed binary (on linux probably.)
Also, I may want to add a few cleanups JorgK mentioned when we don't need to use reuseflag.
Assignee | ||
Comment 270•3 years ago
|
||
I will refresh the patches to fix bitrot.
(I was side tracked with other minor bugs and valgrind not usable against TB under linux for the better part of past 16 months.)
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 271•2 years ago
•
|
||
These patches are updated with recent C-C tree changes, and they
- removed the use of reuse flag as far as message handlings are concernted, and
- implement the old legacy pop3 error recovery more or less correctly (more on this later).
I will upload the updated performance figure(s) similar to those were posted before the patch.
From my local patch queue:
4 A new-1242030-precond-000-nsMsgUtils.h-typo-fix.patch: Fixed a typo and added file stream tracing macros
5 A new-1242030-DONT-USE-REUSABLE-new-part-1-rev04-with-the-original-uuid.patch: bug 1242030: DONT-USE-REUSABLE new part-1 new splitti...
6 A new-1242030-DONT-USE-REUSABLE-new-part-1.5-mostly-remove-reuseflag.patch: bug 1242030: DONT-USE-REUSABLE new part-1.5 mostly remov...
7 A new-1242030-very-tell-removal-was-OK-1719996.patch: check whether removing tell was OK.
8 A new-1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/str...
9 A new-1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Improve error handling in file/s...
10 A new-1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-4 new splitting of (...
11 A new-1242030-DONT-USE-REUSABLE-new-part-5-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-5 new splitting of (consolidati...
12 A new-1242030-enable-buffering-1st-step-rev03.patch: Bug 1242042: Enabling buffering for file stream to write message for C-C TB (En...
13 A new-1242030-removing-a-Seek-rev03.patch: Bug 1242046 - Removing unnecessary |Seek| that caused the C-C TB to operate slowly in ter...
14 A new-1242030-disable-safe-stream-base-search-local.patch:
15 U add-stream-tracing-macros.patch: adding stream value tracing macros
16 U add-trace-local-pop3.patch: adding stream trace macro to pop3 files under local directory
17 U add-trace-local-misc.patch: adding stream trace macro calls to files under loca (not pop3)
18 U add-stream-trace-to-base-dir-rev01.patch: add stream trace macro invocations to files under 'base' dir
19 U add-stream-trace-to-imap-dir-rev01.patch: add stream trace macro invocations in files under 'imap' directory
20 U add-and-fix-dump-messages-to-local-misc.patch:
In the numbering above, 4-14 are essential.
But if someone tries to analyze what is going on regarding the file stream operations,
the debug dumps in 14-20 are essential.
They are part of the latest job submission.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=622660c032ef4334223ac3a5baa750973128e6b4
The errors I have noticed until yesterday are all intermittent category. I have weeded out permanent
ones and I verified the error recovery operation using a simulated I/O error (mostly write error) by
disconnecting net cable to the remote CIFS server where POP3 local messages are stored.
I found out that the error handling by safe I/O stream HAD TO BE DISABLED for the crude error recovery to work reasonably well. (I will write more on the next comment.)
That disabling is done in "14 A new-1242030-disable-safe-stream-base-search-local.patch".
I am uploading these to show the patch efforts are kicking and alive also to report
the error recovery does not work with the unmodified C-C very well.
BTW, I have done the initial performance check and they show the patch as expected.:
the number of write system calls was reduced, and write performance (copying/moving messages to a different folder) is faster.
As a matter of fact, I noticed xpcshell test executed sequentially (--sequential) now is much faster: it used to take about 1200 seconds on my PC, but now it is 900+ seconds. I think the 4GB folder copying operation and others suffer most from the unbufferred message write operation. But more detailed figures later.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 272•2 years ago
|
||
Assignee | ||
Comment 273•2 years ago
|
||
Assignee | ||
Comment 274•2 years ago
|
||
I had to make sure that the removal of seek/tell method from a data type (bug 1719996) is OK with the buffered write approach.
Assignee | ||
Comment 275•2 years ago
|
||
Assignee | ||
Comment 276•2 years ago
|
||
Assignee | ||
Comment 277•2 years ago
|
||
Assignee | ||
Comment 278•2 years ago
|
||
Assignee | ||
Comment 279•2 years ago
|
||
Even with Ben's latest cleanup patches, there are seeks/flushes that need to be removed and this is one of the crucial removals.
(I checked the latest C-C TB and it did NOT use the buffered stream although Ben's patch DID start with buffered stream. Unnecessary seeks and flushes nullified the effect of buffering.)
Assignee | ||
Comment 280•2 years ago
|
||
After a lot of head scratching, I had to disable the use of safe stream to handle error condition.
Safe stream has been introduced by Ben's patch to hide the error(s) under the rug, so to speak.
Unfortunately, it did not work well when I tested TB's behavior.
The current C-C without my patches here
- cannot recover from simulated write error during POP3 download.
I unplugged the network cable to a CIFS server where local messages are stored.
TB shows an error dialog, but as soon as I hit the dialog, it shows another dialog, ad infinitum. This is bad. A user ought to be able to recover from such an error, at least be able to quit TB quickly to see what is causing the I/O error..
With my patch, - TB recovers from the simulated write error during POP3 download.
I unplugged the network cabe, and then TB shows the error dialog.
But as soon I hit the OK in the dialog and restores the network cable, it recovers and
waits for the next user operation (most likely the user wants to quit TB to investigate what is causing the I/O error).
I had expected the use of Safe Filestream to gets TB looking as if it were hung due to repeated
reading operation even after write error(s), but not this endlessly repeated error-dialog.
I think we may be able to insert proper check and recovery even when we use safe file stream to recover nicely. BUT, that negates the original intention of introducing safe stream and
as a matter of fact, inserting proper check and recovery are exactly what my patches try to do.
I found out the programming mistake regarding pop3 error recovery when I started to work on buffered write. (Basically it is the failure of proper error checks, and the mis-coding regarding the setting of crucial file pointer to nullptr to signal the I/O error to upper-level routine. Once these were corrected, the crude POP3 error recovery started to work and it is working with the current patch.)
Given the disabling of the use of safe stream can be done with several #if/#else/#endif and insertion of a few dozen lines of code only in a clean manner, I think we should seriously consider reverting to the legacy pop3 error recovery framework.
I don't think repeated error dialog that prohibits a user from quitting TB in the face of write error is
acceptable to general user community. Error dialog is fine, but one cannot simply quit TB cleanly.
Assignee | ||
Comment 281•2 years ago
|
||
There are a few more issues concerning error recovery.
I tested the TB's operation using the following setup.
I created a very small partition 50MB in a remote CIFS file system and
created a symlink from the ordinary message store position to this CIFS.
I could have created a different profile, but using symlink is much easier.
I tested TB under linux.
The size was small so that I can test the file system full condition during testing.
I sent 240+ e-mails to a mail test account and checked the operation.
What I found so far.:
-
Filesystem full condition was properly caught.
-
Simulated write error during pop3 message downloading by removing the remote CIFS server's network cable caused
unpatched C-C TB to enter endlessly shown error dialog loop.
(I have to send enough e-mails so that the downloading from the local POP3 server takes time to allow me the cable-unplugging operation before all the messages are downloaded.)
Even if I restored the network cable, TB could not recover to the state
where the modal error dialog is not shown so that the user can choose the menu to finish TB (or click the X button on the window frame.)
I had to kill TB from another terminal using kill PID-of-TB.Patched C-C TB showed the error dialog and once I hit OK button in the dialog and restored the cable (<- I am not sure I needed to restore the cable back when I tested the original patch a few years ago), TB was ready to receive user input and so I could quit TB using mouse.
There are a few more issues with the current C-C TB.
- I noticed the discrepancy of error messages shown during
-- the copying of messages from Inbox to Trash, and
-- "Deletion" of messages form Inbox.
You see they are actually the same operation. Deletion is the "Moving" of messages to trash holder.
I get different messages suggesting there may be two paths, and more to the point, it seems that the checks done are not quite the same.
I need to investigate.
My current plan is to investigate the two paths mentioned and if there are indeed two paths, at least we should
implement the same level of I/O error checks in these paths.
(I would investigate both the "file system full condition" and "write error during moving operation" errors for the above operations).
I really wish there is a good harness to simulate I/O errors so that TB can be tested to make sure it shows robust behavior in the face of such errors.
Using a remote file system and a network proxy to access that remote file system may be a way to go.
I wonder if something like that can be implemented on tryserver.
Actually, under linux, it may be rather easy.
The proxy can be turned off to simulute a network error and thus causing file system read error.
(However, we cannot really co-relate the timing of I/O error with TB's internal operation in fine grained manner.
So only the brute force approach of causing I/O error randomly during
the repetition of sending e-mails to TB, copying mails from one folder to the other, deletion of e-mails, compacting folders, etc.
is the way to trigger and test error handling. This is what I did several years ago over a few weeks. Back then I did not bother to create the proxy. I manually plugged and unplugged the cable. Well, actually back then I used VM Player and used its network on/off menu.)
Words of wisdom to those who may want to try such a setup.
Set the timeout of the network file system very short. Usually they are set like 90 seconds, etc.
You have to wait for a long time before the error is returned to TB. I set CIFS time out to 15 seconds or so.
Otherwise I have to spend days simply to obtain the preliminary error recovery test result I am reporting.
For CIFS, echo_interval is the ping interval. If CIFS daemon decides two consecutive ping did not return in time,
it will begin error processing. But with 5 seconds time interval, I get 15 seconds timeout. It thought it would be 10 second timeout.
Such is life. The following is the mount command I use for mounting a CIFS server (actually a router with a function to make U
SB memory stick available as CIFS file system.)
mount.cifs "//192.168.0.201/Drive(A1)" /tmp/L-temp -o username=admin,noperm,vers=1.0,sec=ntlmv2,echo_interval=5
In any case, when the abnormal error recovery behavior is observed, the only way to observe what is going at the I/O routine level is to trace
what is happening to key file stream pointers.
This is what the following patch attachments do.
Eventually, I may want to put them behind MOZ_LOG, but during this development phase, I really have to
correlate the operation with other debug messages coming from NS_WARNING, NS_ERROR that are in the C-C code already.
So I need to dump them using these macros or simple |fprintf(stderr, ...)|.
Assignee | ||
Comment 282•2 years ago
|
||
Add macros for tracing file streams.
It may be easy to turn them to use MOZ_LOG eventually.
For now, it uses an existing macro and stdout/stderr output.
At least, rewriting the code to use MOZ_LOG would be contained in this file.
That is the intention of introducing these macros.
Assignee | ||
Comment 283•2 years ago
|
||
adding calls to stream trace macro in Pop3-related files in local/src.
Assignee | ||
Comment 284•2 years ago
|
||
adding stream trace macros in files under non-pop3 files in local/src.
Assignee | ||
Comment 285•2 years ago
|
||
Assignee | ||
Comment 286•2 years ago
|
||
Assignee | ||
Comment 287•2 years ago
|
||
I wanted to clean up this pot pourri of dump messages, but figured it would be more important
to raise the issue of non-working error recovery of the current C-C TB.
Fixing the dump message can wait until we figure out what to do with the
failing error recovery.
Comment 288•2 years ago
|
||
Assignee | ||
Comment 289•2 years ago
|
||
(In reply to Ben Campbell from comment #288)
Comment on attachment 9279704 [details] [diff] [review]
new-1242030-DONT-USE-REUSABLE-new-part-1-rev04-with-the-original-uuid.patch
Thank you for the comment.
Review of attachment 9279704 [details] [diff] [review]:
I think we should start trying to land some of these.
This one is a good one to start with, to removeaReusable
from
nsIMsgPluggableStore.getNewOutputStream().My suggestions:
- break out the debug printfs into a separate patch (for developer use
rather than to land).
That is what I intend to do.
Either remove them into separate patch as I did for the later stream tracing macro calls, or
maybe hide them behind MOZ_LOG and remove #if/#endif altogether.
- This patch doesn't update the mbox and maildir GetNewOutputStream()
implementations? I assume there's another patch which does that? In any
case, those changes would need to be rolled into this patch.
I might have missed that. Let me investigate.
- I think that once getNewOutputStream() is called, it should be a concrete
requirement that eitherfinishNewMessage()
ordiscardNewMessage()
should
be called with the returned stream. This is the only way I can see the
messageStore being able to clean up properly (either after a success or an
error). This implies that passing a null stream tofinishNewMessage()
or
discardNewMessage()
should be considered an error.
That is a good point.
Problem is this.
-
Old error handling strategy adopted during pop3 download was to set the file stream to nullptr to signify an error was detected.
This is how the overall pop3 download handler kept track of the error situation in C++ code.
My patch followed this practice.I realized there is a code to truncate the half-cooked message portion that was attached to the end of a folder (presumably Inbox?).
That won't work to be sure.However, I had to disable the newer error handling using safe-stream because it did not work well with the real-world error situation.
C-C withtout my patch repeated showed error dialog and I could not quit TB cleanly. (See comment 281).
I think there may be a statement to break out of the normal processing when an error occurs.And as a matter of fact that disabling safe-stream was the only way to make the old error processing scheme to work.
That is the patch
new-1242030-disable-safe-stream-base-search-local.patch
https://bugzilla.mozilla.org/attachment.cgi?id=9279713
We may have to investigate why unpatched C-C TB loops with endless error dialog display even when the transient error caused by the
unplugging of network cable to the remote CIFS server where the message is stored clears up (the cable is plugged again.).
But I do recognize that
- removal of reuseflag, and
- desirable error handling is a separate issue.
Let me think about how to extract the removal of |reuseflag| from the patch set, and if possible, create a patch set for purely removal of reuseflag.
Yeah, in that case, probably I would move the error checks to later patch set(s).
Also, I'd recommend using phabricator to handle patches, if you're up for
it. It's now the main tool we use and it seems pretty good at handling WIP
patches which are still changing, and stacks of dependent patches. It's not
critical - the old attach-the-patch-to-the-bug method still works, but you
might find phab makes it easier.
This is what was suggested during a bugday discussion by Thomas D, and Wayne.
Maybe I should ask for a phabricator initiation session or something like that with all the policy issues clearly explained.
I tried to use phabricator when it was introduced several years ago
because it was announced to be the way to go but it was a premature announcement.
Not all the details regarding some setup that needed to be followed based on the policy decisions by administrators were in place and I did something out of line with the would be practice and so I decided to stay away from it thinking "for a while". I found out later that internal mozilla introdcutory seminars were held later, presumably to fill in those missing information regarding policy issues.
Since TB, I mean C-C developement, did not adopt phabricator immediately, that "for a while" continued until now. :-(
I am a carried-over from the old era when HG MQ extension was recommended, and so I need to switch to newer HG command practice to begin with before attempting to use Phabricator? (Or HG MQ style patch submission still work. But I digress. This probably needs separate talk.)
::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +129,5 @@* a duplicate. This gives the berkeley mailbox store a chance to simply * truncate the Inbox w/o leaving a deleted message in the store. *
- discardNewMessage closes aOutputStream always unless the passed stream
- is nullptr due to error processing..
Should we really allow a null aOutputStream to be passed in?
If something goes wrong, the mailStore might need to do some cleanup (eg
truncate the mbox file back to how it was, or delete temporary files or
whatever). And at the moment this is the only way for the mailstore to be
notified of a failure.
So obviously, in your new scheme, unlike in the legacy pop3 code, setting m_outputstream or some such to nullptr is not the way to signal that the error occurred somewhere in the low-level I/O system calls.
(Maybe we really should have an explicit m_errorflag or something that signifies an error has occurred.)
My temporal patchset here had to disable safestream-based error recovery, but as long as the filestream is non-null the
truncation happens, but I think there WERE cases when the stream was set to nullptr.
(Well, I am afraid have to simulate the error cases a few dozen times to cause the simulated error to occur an appropriate time to
cause a particular error situation where m_outputstream to nullptr and TB calls DiscardMessage. I thought I did this over a three weeks period, but as it turned out I had to do this over three months period. Error simulation to trigger various error modes to check the code took time. :-(
Unless I build a better error simulation mechanism that is tied to the timing/phase of TB processing, the timing of simulated error is pretty much random and cannot cause a desired situation to trigger a particular error processing deterministically.
We need to discuss this error handling situation further because the user is a bit perplexed unless C-C TB stops showing error dialog forever.
There is something wrong with the error recovery path, but I have yet to figure out where and how to avoid it.
(Thus, I simply adopted the old error recovery method. Well, to be exact, the code did not seem to work without disabling it with my patch set.)
@@ +130,5 @@
* truncate the Inbox w/o leaving a deleted message in the store. *
- discardNewMessage closes aOutputStream always unless the passed stream
- is nullptr due to error processing..
- (Clarification/Rationale in Bug 1121842, 1122698, 1242030)
You don't need the bug numbers here (If you really want to leave a trail
maybe put them in the commit message instead?).
I will take them out.
::: mailnews/base/src/nsMsgDBFolder.cpp
@@ +760,5 @@
static_cast<uint32_t>(rv));
+#endif
- // Return early: If we could not find an offline stream, clear the offline
- // flag which will fall back to reading the message from the server.
- if (mDatabase) mDatabase->MarkOffline(msgKey, false, nullptr);
I totally want to kill this MarkOffline() this far into the stack. It should
be handled back out by IMAP/news code which can deal with the error
properly. (related to Bug 1681487 et al)
But yeah. For now this is the same as the other failure paths, so needs to
be kept in :-)
Oh, beauty with legacy code. I will study 1681487 et al.
@@ +1721,5 @@
mDatabase->MarkOffline(messageKey, false, nullptr); // we should truncate the offline store at messageOffset ReleaseSemaphore(static_cast<nsIMsgFolder*>(this));
if (msgStore) {
// DiscardNewMessage now closes the stream all the time.
Document the current behaviour rather than history, eg:
// DiscardNewMessage() will close the stream.
@@ +1747,5 @@
- } // if (seekable)
- rv1 = rv2 = NS_OK;
- if (msgStore) {
- // FinishNewMessage now closes the stream ALL THE TIME.
Again, don't need to document the history, eg:
// FinishNewMessage() will close the stream.
Once there are enough eyeballs with the innards and the history, i will take them out.
But this close or non-close was really a source of confusion coupled with reuseflag and I had to leave it in to
keep myself aware of the issues.
Thank you again for your comments, Ben.
I think I will try to
- separate the "removal of reuseflag", and better I/O error checks necessitated by the introduction of buffered write (the error tends to be detected later due to the buffered write, and so Flush() and Close() need to be checked for errors, and actually there are so many places where error is not handled at all for Write(). It is pathetic.)
- separate the debug dump or hide them behind MOZ_LOG and remove #if/#endif,
So my intention would be producing a series of patch sets as follows.
- removal of reuse flag,
- buffered write introduction with added error checks (but how to signal the presence of an error is a big question. The legacy pop3 code sets m_outputstream or some such to nullptr for this purpose. We may have to introduce an explicit m_errorflag or something. This requires some discussion. The solution to the looping of error dialog may be incorporated here or probably come as a separate patch.)
- debug dump for developers.
- enabling of buffered write
(- disabling of the use of safe stream: could be removed depending how well the error handler is improved, but come to think of it, there ARE legitimate error cases that we can no longer write to the file system like the complete network error to remote CIFS server. So we must bear in mind the limit of what we can do. Our error recovery won't be perfect. My point is to let users quit TB cleanly in a manageable manner. That is what one needs most when we realize there is an I/O error and runaway execution of TB in such a case may tarnish the data worse. Seeing TB refusing to stop execution is a consternation in such a situation. :-(
...
Thank you again for your comment.
PS: s for phabricator, maybe I consult Wayne since he mentioned an introduction to phabricator session may be made available. :-)
Comment 290•2 years ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #289)
- Old error handling strategy adopted during pop3 download was to set the file stream to nullptr to signify an error was detected.
This is how the overall pop3 download handler kept track of the error situation in C++ code.
My patch followed this practice.
Ahh... I see.
I think that's the first thing to fix then!
I propose we open another bug to change pop3 error handling to use a separate variable to signify errors rather than just nulling out the filestream. And also to make sure it calls one of FinishNewMessage()
/DiscardNewMessage()
depending on outcome.
I'll leave the NI open on this one to remind me to look at the pop3 code on Monday and write up the bug, but feel free to open a bug yourself if you'd prefer :-)
After that, I think it'll be easier to reduce your patches down into nice little atomic chunks we can start landing.
I am a carried-over from the old era when HG MQ extension was recommended, and so I need to switch to newer HG command practice to begin with before attempting to use Phabricator?
(not to get bogged down in too many details) The moz-phab commandline tool just submits changesets in Hg, wherever they came from. I'd guess it'd happily pick up any patch you applied from MQ (I don't use MQ - I just keep all my local patches in anonymous branches off my tree. I use hg rebase
to keep them up to date after pulls, and hg histedit
to chop and change and combine them ready for landing).
Assignee | ||
Comment 291•2 years ago
|
||
(In reply to Ben Campbell from comment #290)
(In reply to ISHIKAWA, Chiaki from comment #289)
- Old error handling strategy adopted during pop3 download was to set the file stream to nullptr to signify an error was detected.
This is how the overall pop3 download handler kept track of the error situation in C++ code.
My patch followed this practice.Ahh... I see.
I think that's the first thing to fix then!
I propose we open another bug to change pop3 error handling to use a separate variable to signify errors rather than just nulling out the filestream. And also to make sure it calls one ofFinishNewMessage()
/DiscardNewMessage()
depending on outcome.
I'll leave the NI open on this one to remind me to look at the pop3 code on Monday and write up the bug, but feel free to open a bug yourself if you'd prefer :-)After that, I think it'll be easier to reduce your patches down into nice little atomic chunks we can start landing.
Thank you for you comment.
OK, let me see how this goes.
Locally, I have already separated the removal of reusable flag.
Worked fine locally. So I submitted a try-comm-central job.
See
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=1f00a8ed748893fe0f4aecaa229dfd28ae04e78e
So the next step is to introduce something like |bool m_errorflag|. Let me see how this will improve the situation.
BTW, my latest uploaded patches was based on C-C tree about a couple of weeks ago, and
it did not handle |SyncCopyStream| change and others.
It has been updated against the latest C-C as of yesterday.
Locally, it seems to work.
I am a carried-over from the old era when HG MQ extension was recommended, and so I need to switch to newer HG command practice to begin with before attempting to use Phabricator?
(not to get bogged down in too many details) The moz-phab commandline tool just submits changesets in Hg, wherever they came from. I'd guess it'd happily pick up any patch you applied from MQ (I don't use MQ - I just keep all my local patches in anonymous branches off my tree. I use
hg rebase
to keep them up to date after pulls, andhg histedit
to chop and change and combine them ready for landing).
I think I will pick up my work flow. It seems we can name patches via bookmark. Again, I will see what I can.
There are many blogs or tutorials which are definitely OUT OF DATE. This is not good for a newbie.
TIA
Assignee | ||
Comment 292•2 years ago
•
|
||
(I made a serious typo of writing |m_outputstream| instead of the correct |m_outFileStream|. So I corrected this comment.)|
I filed bug 1773787
that takes care of the "remove reusable flag" issue first.
In the comment there, I wrote
The current plan I discussed with Ben is
1 land this "remove reusable flag" patch set.
2 Change the POP3 error handling code to use a separate EXPLICIT |m_error| flag instead of setting |m_outFileStream| to nullptr to signal an error occurred to the upper layer functions.
3 Then modify the code to properly enable buffered write.
Part-1 is here.
Part-2, -3 will be handled in bug 1242030 or maybe part-2 is handled in still another bugzilla entry.
One point. Changing the pop3 error handling code to use |m_error| flag is fine.
However, please note that setting |m_outFileStream| to nullptr gives the chance to upper layer functions to stop reading/writing data any more once there has been a fatal write error, say, and this make it possible for TB to quit downloading (or for that matter message copying operation in relevant places). This DOES make a difference for user experience when an error occurs.
I found the patch here makes TB return immediately after an error dialog popup appears so that I can decide what to do. I think many users would quit TB immediately if they realize file back end shows error behavior.
The stock C-C TB without the change somehow got into a repeated error dialog display and did not give me a chance to terminate TB cleanly.
So, I would try to disable read/write once |m_error| flag is raised as much as possible to mimic the early quit path that was taken when the |m_outFileStream| was set to nullptr.
Assignee | ||
Comment 293•2 years ago
|
||
Aha, it is |m_outFilesStream| and not |m_outputStream|. A serious typo.
So I fixed the previous comment.
Assignee | ||
Comment 294•2 years ago
|
||
Turns out
"2 Change the POP3 error handling code to use a separate EXPLICIT |m_error| flag instead of setting |m_outFileStream| to nullptr to signal an error occurred to the upper layer functions."
is relatively easy.
I can't recall the code of several years ago, but for the last few years, I only set "m_outFileStream" to nullptr when major operation occurs, basically when the final close is near.
So the change is about a few dozen lines of code.
Well, I decided to return from WriteLineToMailbox in nsPop3Sink.cpp immediately if m_error contains an error code. (I forgot to mention that I decided to store the last seen error code in m_error.)
Because it is not meaningful to continue writing if we have a write error once.
I think more work is necessary to modify code to clean up the dump statements for debugging purposes by using MOZ_LOG and other macro so that they are less obtrusive.
These should come after the patches in bug 1773787.
Updated•9 months ago
|
Assignee | ||
Comment 297•9 months ago
•
|
||
I am going to push patches for this long cooking patchsets using new phabricator system to handle patches very soon.
Does anyone have an idea how to split them up?
Per directories?
Or simply a big chunk?
(Simply a big chunk may be easier to understand since I hold the removal of offending seek and final tweaking of buffering to the last patch which will change a few lines in the preceding patchsets.)
Currently, mail bulk of the patchsets are per directory since many years ago it was suggested a big chunk patch was hard to review, but now that phabricator system is in place, that may no longer to be true.
Here is a list of HG MQ patches relevant to this bugzilla.
I mark the patches that are relevant to this bugzilla with a leading "*" character.
* 4 A DONT-USE-RESUABLE-part-2-DEBUG-PRINT-FOR-DEVELOPERS-REv04.patch: Bug 1773787 - Removing reusable flag...
5 A malloc-opt.patch: call malloc_opt()
6 A 1677202-rev02.patch: Bug 1677202 - avoid accessing released heap, coping with timing issue.
7 A CopyToNative-return-value-check.patch: CopyToNative return value check (one indirect)
8 A add-check-msgData-Windows-failure-Rev01.patch: add-check-msgData-Windows-failure
* 9 A new-1242030-precond-000-nsMsgUtils.h-typo-fix.patch: Fixed a typo
*10 A new-1242030-DONT-USE-REUSABLE-new-part-1.8-base.patch: an added error handling in SyncCopyStream
*11 A new-1242030-DONT-USE-REUSABLE-new-part-2-imap-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Impro...
*12 A new-1242030-DONT-USE-REUSABLE-new-part-3-import-JK-v1-rev04.patch: bug 1242030: DONT-USE-REUSABLE Imp...
*13 A new-1242030-DONT-USE-REUSABLE-new-part-4-local-less-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE ...
*14 A new-1242030-DONT-USE-REUSABLE-new-part-5-pop3-rev04.patch: bug 1242030: DONT-USE-REUSABLE new part-5 ...
*15 A new-1242030-enable-buffering-1st-step-rev03.patch: Bug 1242042: Enabling buffering for file stream to...
*16 A new-1242030-removing-a-Seek-and-tweak-buffering-rev03.patch: Bug 1242046 - Removing unnecessary |Seek...
No 4 is to print the entry to a few functions that are essential to understanding the message insertion to a folder.
It is indispensable to check for the errors that may happen with Windows and Mac OS which I cannot test locally since develop patches and test them under linux.
When an error occurs on treeherder for Windows and OSX, I need to understand exactly where and the history of function calls that have happened immediately before the error occurs.
I may want to merge 15 and 16 above although they are for bug 1240242 and 1242046 originally. The have been mutated and worked on so much so that the merging of the two would be easier to understand since the code inserted in 15 is reverted and changed in 16.
Getting the buffering work correctly in C++ and now in JS land has been very tricky.
I need to test the error failure mode of my patch in the next couple of weeks, but before doing that I may have to fix Bug 1881984 first. Or my testing error conditions may not proceed very well.
Ben, do you have any preference about the splitting/unsplitting of the patches?
EDIT: I forgot to request info from Ben.
Assignee | ||
Comment 298•8 months ago
|
||
Although there are still some error handling issues uncovered by my brutal unplug/plug network cable method (messages stored on remote CIFS server), the write buffering does work well.
For errors, see
bug 1887532 (more likely CIFS kernel bug),
bug 1886360,
bug 1886349,
bug 1881984
The speed of download from the local mailserver (on the same PC) is now as fast as I remember from my testing a few years ago once I figured out where to enable the buffering after Ben's patch landed.
I also found that the copy of messages (actually moving that uses the same code for copying) uses buffered write.
Here is the data.
With the patched C-C TB to use buffered write,
I downloaded 304 messages of various sizes, some of them are much larger than 64KB (the internal buffer I used).
One of these days, even 128KB does not sound too large, though. I wonder what everyone thinks about this.
Some of the are moved to a folder using message filter.
max value of write length is: 65536
count of write calls is : 16050 <--- 13248 calls are used by one byte write... Something is fishy.
Total bytes (WRITE): 5002659
Length of write: # of such calls
1: 13248 <--- This is a one octet write that seems to be used within UI or
among TB's various threads for communication?
I feel it is a bit too excessive. Remember this is only for duration of 10-20 seconds.
This certainly causes context switches and should be reduced as much as possible.
It may be worth looking at after the buffered write patch lands.
Now I have a suspiciion that this large amount of write calls interferes definitely cause
performance issues under heavy workload, but more pointedly
bug 1880148
the timing of C-C TB under strace or gdb tracing may be skewed much with this many calls and
thus proper GUI interaction may not take place. Especially strace. It has to dump data for each
write call.
Ugh... But I need someone in the know to tackle
this issue. I think reduction by a factor of 10 would be the first goal.
2: 305 <--- # of messages plus one.
3: 1
4: 0
5: 1
6: 0
7: 0
8: 16
9: 1
10: 1
11: 0
12: 1
13: 4
14: 2
15: 1
16: 2
17: 0
18: 0
19: 1
20: 0
21: 79
22: 0
23: 0
24: 1
25: 2
26: 306 <-- the # of messages plus one.
27: 1
28: 2
29: 305 <--- This and the next line and the previous size 2 write of 305 counts are related to the # of
30: 304 message downloaded (304), I think.
31: 0
32: 0
33: 0
34: 0
35: 0
36: 1
37: 1
38: 1
39: 0
40: 2
41: 0
42: 1
43: 1
44: 1
45: 0
46: 3
47: 77
48: 0
49: 1
50: 0
51: 1
52: 1
53: 0
54: 1
55: 0
56: 0
57: 2
58: 5
59: 1
60: 0
61: 0
62: 0
63: 0
64: 0
65: 0
66: 0
67: 2
68: 2
69: 0
70: 0
71: 1
72: 1
73: 2
74: 2
75: 1
76: 0
77: 1
78: 2
79: 2
80: 17
81: 1
82: 4
83: 1
84: 1
85: 0
86: 0
87: 0
88: 0
89: 381 <--- This is very likely the write of the header line. Not sure if it can be buffered. but I suspect
90: 0 we can let it be for the time being.
This is peanut compared with the number of size one write calls.
91: 0
92: 0
93: 1
94: 68
95: 1
96: 0
97: 0
98: 0
99: 1
100: 0
101: 1
102: 1
103: 0
104: 0
105: 0
106: 2
107: 0
108: 0
109: 1
110: 0
111: 0
112: 2
113: 0
114: 0
115: 3
116: 1
117: 3
118: 1
119: 0
120: 1
121: 0
122: 1
123: 0
124: 1
125: 0
126: 1
127: 0
2^ 7 - (2^ 8 - 1): 0
2^ 8 - (2^ 9 - 1): 76 <--- So most of the write are now buffered and thus there is no more less 80 octet lines.
2^ 9 - (2^10 - 1): 152 Buffered.
2^10 - (2^11 - 1): 198 //
2^11 - (2^12 - 1): 180 //
2^12 - (2^13 - 1): 160 //
2^13 - (2^14 - 1): 0
2^14 - (2^15 - 1): 9 //
2^15 - (2^16 - 1): 20 //
2^16 - (2^17 - 1): 60 <--- 64KB write is counted here.
2^17 - (2^18 - 1): 0
2^18 - (2^19 - 1): 0
2^19 - (2^20 - 1): 0
2^20 - (2^21 - 1): 0
2^21 - (2^22 - 1): 0
2^22 - (2^23 - 1): 0
2^23 - (2^24 - 1): 0
2^24 - (2^25 - 1): 0
2^25 - (2^26 - 1): 0
2^26 - (2^27 - 1): 0
Some of the buffered writes are from mork and other internal db operation.
The performance win for a data that contains mime-encoded PDF or photo files of 1MB size would be more compelling.
If 64KB write is not buffered, I think from the way the test data is created, it would be like 60000 write calls instead of the current 60 calls. (for 64KB write alone. Count the smaller buffered write, and the # of writes would be very large.)
Basically, the context switch overhead by the write calls kills TB performance during download and message copying, etc.
This is an interim reporting. I will report the similar data with the current TB without my buffered patch.
Assignee | ||
Comment 299•8 months ago
|
||
The following data is with the Thunderbird 115.7.0 64-bit under linux.
Oh, I forgot to mention that my patched version dumps copious amount of debug message to the invoking tty window.
YET, the download was much faster.
The download with the current 115.7.0 is visibly slow.
Of course, the slowness is acute when you download from the pop3 server on the same PC. But the speed up for
moving/copying messages improves much since it is a local issue.
For data collection, the same e-mails were received as in the previous comment and data is recorded.
Mork and sqlite writes are hard to control, so we should focus on the # of unbufferred calls which have
increased the count of write calls is to approximately 66K from 16K.
This causes the slowness due to context switches.
max value of write length is: 32768 <--- I think morq and sqlite use relatively conservative buffer size.
count of write calls is : 66065
Total bytes (WRITE): 8558101
Length of write: # of such calls
1: 14541
2: 0
3: 0
4: 102
5: 0
6: 0
7: 0
8: 26
9: 0
10: 0
11: 0
12: 3
13: 0
14: 128
15: 0
16: 0
17: 0
18: 0
19: 0
20: 0
21: 4
22: 0
23: 304
24: 608
25: 304
26: 0
27: 0
28: 304
29: 0
30: 608
31: 0
32: 304
33: 0
34: 0
35: 0
36: 0
37: 0
38: 608
39: 320
40: 0
41: 0
42: 16
43: 16
44: 6
45: 3
46: 41
47: 49
48: 161
49: 412
50: 0
51: 0
52: 304
53: 0
54: 0
55: 0
56: 0
57: 0
58: 0
59: 0
60: 0
61: 0
62: 0
63: 0
64: 304
65: 0
66: 0
67: 304
68: 0
69: 0
70: 0
71: 0
72: 0
73: 0
74: 64
75: 45744 <--- This is the result of unbuffered write. We lose performance wise.
76: 0
77: 0
78: 0
79: 0
80: 0
81: 0
82: 0
83: 0
84: 0
85: 0
86: 0
87: 0
88: 0
89: 0
90: 0
91: 0
92: 0
93: 0
94: 0
95: 0
96: 0
97: 304
98: 0
99: 0
100: 0
101: 0
102: 0
103: 0
104: 0
105: 0
106: 0
107: 0
108: 0
109: 0
110: 0
111: 0
112: 0
113: 0
114: 0
115: 0
116: 0
117: 0
118: 0
119: 0
120: 0
121: 0
122: 0
123: 0
124: 0
125: 0
126: 0
127: 0
2^ 7 - (2^ 8 - 1): 0
2^ 8 - (2^ 9 - 1): 0
2^ 9 - (2^10 - 1): 5 <--- This and other buffered writes are from morq and sqlite3, I think.
2^10 - (2^11 - 1): 2
2^11 - (2^12 - 1): 2
2^12 - (2^13 - 1): 2
2^13 - (2^14 - 1): 0
2^14 - (2^15 - 1): 20
2^15 - (2^16 - 1): 142
2^16 - (2^17 - 1): 0
2^17 - (2^18 - 1): 0
2^18 - (2^19 - 1): 0
2^19 - (2^20 - 1): 0
2^20 - (2^21 - 1): 0
2^21 - (2^22 - 1): 0
2^22 - (2^23 - 1): 0
2^23 - (2^24 - 1): 0
2^24 - (2^25 - 1): 0
2^25 - (2^26 - 1): 0
2^26 - (2^27 - 1): 0
Again, this is an interim report.
I hope I can resolve the error issues I uncovered during my preliminary testing and
get this patch landed very soon.
Assignee | ||
Comment 300•8 months ago
|
||
I NI'ed Ben so that he knows that things are moving in the right direction as far as bffuering goes.
Updated•8 months ago
|
Comment 301•6 months ago
|
||
How is phab going?
Assignee | ||
Comment 302•6 months ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #301)
How is phab going?
Short summary:
Finally, after about three weeks,
I have locally verified that the simulated (transient NFS) I/O error is not trapping TB any more.
So my added error checks and modification such as the modification of mork to avoid crash are not causing any major issues as far as error processing is concerned after all.
So I have updated my source tree today [the test I described needed a fixed binary sort of] and I am going to run the I/O simulated error tests once more, and
if all is well, I will create a phab patch. That would be a bit huge patch set, but that is what phab is meant to handle.
Also having the patch available for other people to read is a great way to obtain feedback.
Long summary:
6-7 years ago, I manually plugged/unplugged network cable (or virtually toggle the network interface of my virtualbox where my Linux instance runs) to simulate network issues to cause I/O errors.
However, I could never be so sure of the coverage of such manual testing.
Manual or virtual unplugging/plugging of network cable cannot be finely controlled to cause a simulated error at a particular place in the source code. No matter how I hard I try manually, it won't cover much of the code base.
So this time around, I got fed up with this manual error simulation especially after I saw TB crash so easily after a simple I/O simulated error happened. (Mork was the problem.)
I needed test method to make error happen at a finer resolution and reliably and in a repeatable manner so that I could fix any easy-to-catch such serious errors that may have been introduced by my buffered write patch.
How do I automate the test process?
I studied a bit and I selected and used a GUI interaction simulation tool called sikulix
available under linux and windows ( http://sikulix.com/ ) to simulate what a user will do to download messages and handle error/warning dialogs that may be displayed after an I/O error occurs.
This automation was necessary after I realize that I probably needed to test hundred of test cases reliably.
I had to do the following.
I need to instrument the C-C TB by incorporating a macro at the entry of functions where
I would like I/O error to occur at the resolution of a function entry (at the N-th invocation of the group of functions monitored by the inserted macro at function entry.).
I studied a bit and selected a few dozen functions to instrument by a macro that will invoke an external shell script
with the
invoked function name, source code file, and a internally incremented counter value which is incremented by one every time the macro is called.
The test scenario is as follows.
I send four e-mail to test a mail account. The account stores e-mails on a remote NFS server.
I download four e-mail messages by C-C TB.
One message is filtered into Trash folder by filter rule.
The other three messages are filtered to different folders.
I cause a simulated network error at the specified N-th invocation of a specified function during download.
Before the mork patch, C-C TB would crash one way or the other if network is disabled during download.
Basically, I do the following.
For N in my specified list of numbers ( to cause the I/O error at the N-th invocation of monitored functions):
Define N as the timing for the simulated I/O error.
The externally called shell script manages an EXTERNAL counter which is zeroed when BeginMailDelivery is called and
is incremented after each call to the script.
The script compares this external counter to the value of N.
When the comparison succeeds, linux's network host adaptor is disabled and network error occurs for NFS read/write.
Thus, N is the delta of the internal counter value between BeginMailDelivery is called and one of the functions monitored is called and a network error is introduced.
So, download e-mails.
During this, I/O error is simulated at the N-th function invocation as I describe above.
Because the network error is introduced, the download does not complete.
There will be error dialogs/warning dialogs.
My automated sikulix script clear these dialogs automatically.
(*** To my surprise, I found that much fewer error dialogs appear than I anticipated. This may need investigation.
It may be that some try/except pairs hide the errors unexpectedly. ***)
To repeat the loop in the known state, I clear various factors.
I download the e-mails again without network error (so that I can start the cycle with the mail server in known state).
I empty the folders to which e-mails are filtered into.
Emptying avoids the failure due to the broken MSF files for the folder.
Sometimes an e-mail message cannot be filtered into a folde by filter rule.
Yes, MSF files sometimes get corrupted after an I/O error.
My mork patch is not perfect. Important thing is NOT CRASHING so that user can continue.
I empty Trash folder (since I have about 400 test cases, i.e, list of numbers for N), I do not want to make Trash to
become very large.
The clearing of folders are done to make the folders in a known state for the next round of test.
END FOR
The test was meant to make sure that C-C TB does not crash during an simple transient NFS error during downloading (and message filter processing).
Finally after about 2.5 weeks, the initial round of test completed. TB no longer crashes in the above test scenario
with various N values.
I had to make the sikulix script very robust so that it can run hundred of tests without stopping at an error.
(The sikulix script can fail if it encounters a unknown dialog hiding known area of window, etc.)
There are pros and cons of this test approach.
Pros:
- detailed simulation of I/O errors at the resolution of function entry.
- I can cover the whole sequence of a test scenario by inserting the simulated I/O error at the timing of the function instrumented by the incorporation of a macro in the source code. I can specify the I/O error to happen at the N-th invocation of a chosen function.
- I can let it run automatically while I work on other things. <- Very important. I think one whole cycle can finish in a day.
Cons:
- the timing of the occurrence of the simulated (transient NSF) error is only at the entry of a chosen function (and N-th invocation in the whole test scenario.)
This means that if a function has an series of write functions, say, only the first function's error is handled.
The I/O error handling of subsequent I/O functions may not be tested if an earlier error causes the processing to be interrupted and the function returns early.
(<--- To remedy this, the macro I inserted at the entry of function may needed to be inserted before each I/O function.
But this may be excessive.
We may want to catch the lower I/O function invocation ala DTRACE available in Solaris or FreeBSD instead of tweaking the source code of C-C TB. But I digress.) - Available sikulix has python engine that is based on major version 2.x.
Thus later 3.y python module for invoking external subprocesses is not available and thus I am forced to
interact with external shell script by way of creating a flag file to request the enabling of network host adaptor, or
sending four e-mails, etc.
The framework works for now.
I can make the test scenario more complex like downloading 16 messages, etc.
And again, the test proceeds automatically except that the linux desktop is basically overaken by sikulix because it needs to access the screen (to recognize the menu/dialog, etc) and manages mouse pointer and X input events.
But I run linux in virtualbox hosted under Windwos01 and I can run applications like words/excel/firefox/thunderbird under windows10 while the test runs in the guest linux in virtualbox. This is nice. I can occasionally monitor the progress.
I may produce a writeup on this test setup.
Assignee | ||
Comment 303•6 months ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #302)
(In reply to Wayne Mery (:wsmwk) from comment #301)
How is phab going?
Short summary:
Finally, after about three weeks,
I have locally verified that the simulated (transient NFS) I/O error is not trapping TB any more.
So my added error checks and modification such as the modification of mork to avoid crash are not causing any major issues as far as error processing is concerned after all.
So I have updated my source tree today [the test I described needed a fixed binary sort of] and I am going to run the I/O simulated error tests once more, and
if all is well, I will create a phab patch. That would be a bit huge patch set, but that is what phab is meant to handle.
Also having the patch available for other people to read is a great way to obtain feedback.Long summary:
.... omission ...
For N in my specified list of numbers ( to cause the I/O error at the N-th invocation of monitored functions):
I made an optimization of the list of N.
There are repetition of function sequences involving the header processing at each message downloading.
I only selected the value of Ns such that the first occurrences of such functions experience I/O errors for each message download and dropped other invocations that handle the subsequent header lines, etc.
This cut down the list of Ns to 266 for now.
The dropped value of Ns might have to be added back for testing eventually, but for initial attempt, running the test to completion covering as many different function as possible was first objective.
As for linux desktop taken over by sikulix, I think using a virtual desktop and running both sikullix and C-C TB inside the virtual desktop avoid this desktop interaction issue so that I can do other things under linux while the test procees.
As a matter of fact, I use virtual desktop to run C-C TB mochitest and I can whatever I want under linux while mochitest runs locally inside the virtual desktop.
Assignee | ||
Comment 304•6 months ago
|
||
I verified that in my simple test scenario and 278 points where a single I/O error is inserted during the test scenario execution, locally created C-C TB did not crash, and the message received/downloaded again after the error condition is reset, is not corrupted.
Before the patches of Mork so that it won't crash, I saw strange missing new line character and such (and worse randomly lost messages), but these problems are now completely gone at least for the simple scenario. I will test a bit more complex test scenario with more messages downloaded.
But, anyway, I am going to start posting the buffered write patch, but the integration is the next merge after the immediately approaching ESR release, of course.
Assignee | ||
Comment 306•5 months ago
|
||
Updated•5 months ago
|
Assignee | ||
Comment 307•5 months ago
•
|
||
(In reply to Wayne Mery (:wsmwk) from comment #305)
That's a good plan.
Wayne,
I have uploaded the buffered write patch set.
Let the fun begin.
Assignee | ||
Comment 308•5 months ago
|
||
I obsoleted old WIP patches.
My patch improves the performance by removing MANY write system calls by using a much larger buffered write.
The attachment shows how many write system calls are invoked for sending an attachment (/usr/lib/gcc/x86_64-linux-gnu/14/libgomp.a, 612712 octets long).
Before the patch, we needed to invoke write system call more than 11000 times.
(An attachment is turned into a MIME-encoded dta, and thus there are
11346 writes of 73 octet long.
After the patch, these disappears.
They are replaced with much fewer write system calls of 64KB and a smaller write for remaining data.)
There are writes due to mork and sqlite3.
So important thing is to notice the decrease of the # of write system calls.
The sheer number of system calls add overhead and longer elapsed time.
This is also true for copying messages between folders.
Under linux, the speedup due to my patch may not be realized by casual users.
But under Windows the speedup may be noticed by many, especially copying large number of messages between folders. (Download time could be affected by the trip time between the local machine and remote server. If the mail server is local, the speed up WILL BE noticeable.)
Under windows, there are so many tasks running and vying for CPU time slots that the overhead of context switch caused by write system call is very high.
So the speed up should be noticeable.
This is an anecdotal data, but there was a case when I copied messages using same set of messages
under linux and windows, the windows speedup was unbelievably high.
To start with, I had to increase the number of messages to the order of few hundreds since measuring the time to copy or move the messages by wristwatch requires the time to be in the 10 second range.
Under linux, with my patch, the copy time shrunk to less than five seconds. Basically, looking at the wristwatch and hitting the download button and checking the download to finish on the screen, I could tell it was less than five seconds.
Copying the same messages under Windows took more than 1min 30 seconds and
with my patch it shrunk below 10 seconds.
We should invoke as few system calls as reasonably possible for any application program.
Assignee | ||
Comment 309•5 months ago
|
||
tryserver job.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=5c08ba4c1163e89ab0db4291c6c68a2e07b23154
I don't believe the patch introduces any new errors not seen by other people's jobs.
Also, locally, I have simulated light-weight I/O error by disabling network adapter while I store my message folders on a remote NFS server.
Once the NFS error occurs due to time out, I re-enable the network adapter to continue testing.
TB did not crash and crude transaction handling of pop3 download works.
(I have been testing this for the last few years, so it should be really error free.)
BUT, the error simulation is not perfect. I cannot let the error happen at arbitrarily designated position.
But the test coverage is much better now.
Crash due to mork was found and fixed earlier and since then, TB does not crash anymore after the transient one-time I/O error window is simulated.
The error simulation test is done under linux, but the hardening against the error should also be effective under Windows and OSX.
In any case, this is an improvement and if any deficiency is found in the future, that should be handled accordingly.
So the speedgain is definitely worth it and error handling is not worse than it is today.
From the viewpoint of long term maintenance, framework for transactional handling (rollback of internal DB in the face of error) needs to be hammered out.
There are places where I do check for low-level I/O code and print error message, but does not return the error code from the function where the error is detected. As of now, during my testing, I have not found ill-effect so far. But this is an area where future work may be needed once the patch lands.
But I have a set of patches for Bug 170606 which I would like to land after this patch.
Landing these patches will shrink my local MQ patch queue very much and I will be able to move to HG EVOLVE extension easily is my plan.
Updated•5 months ago
|
Assignee | ||
Comment 310•5 months ago
|
||
But I have a set of patches for Bug 170606 which I would like to land after this patch.
Landing these patches will shrink my local MQ patch queue very much and I will be able to move to HG EVOLVE extension easily is my plan.
I meant to say Bug 1170606.
Comment 311•5 months ago
|
||
The comments still refer to reusableflag
which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?
Updated•5 months ago
|
Updated•5 months ago
|
Updated•1 month ago
|
Assignee | ||
Comment 312•1 month ago
|
||
(In reply to Yury from comment #311)
The comments still refer to
reusableflag
which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?
Sorry, I missed your comment.
I should modify the comment to drop the reference to reusableflag since it is indeed removed many months ago.
You are also right the bulk of buffering has been enabled by bug 1882971, and so the remaining issues are related to the
error handling (or the lack of it) in the code.
I wanted to show a stat of write system calls using the latest M-C/C-C.
I wanted to show the # of write system calls for unpatched and patched version.
Duh, I just tried to re-create the write system call statistics based on unpatch M-C/C-C tree locally.
Then I found a grave issue of Inbox corruption. (I have not seen this before. Maybe I have been just lucky.)
The way I created the stat in comment 308 is to record the number of syscalls and do some postprocessing.
I use "strace" command under linux to record the number of syscalls.
strace -f -o /tmp/tb.old-2 /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -p
strace called this way captures every system call during the startup and execution of downloading of pending messages.
(I could have restricted the capturing to a few selected system calls instead of every system calls.)
Problem is that in the last several months or maybe longer, TB started to issue simply too many system calls that monitoring TB this way slows down the execution to the point that context menu (pull down) does not work reliably any more. This did not happen before. See Bug 1880148
I think there is something wrong in the GUI library. I suspect it is issuing too many system calls to check for mouse interaction under X Windows system.
Anyway, I recall I had to run thunderbird first and then attach strace while TB is running before I send a large attachment and receive the message from the local server. Otherwise, the strange bug may interfere with the operation. Tough.
So I simply quit the above "strace" command by hitting control-C.
This killed thunderbird, too.
Unfortunately, it was receiving prior messages and Inbox as far as I can tell is corrupt toward the end. I saw empty subject, etc. in the message window. I know I should not have killed thunderbird by Control-C.
But believe me, I have tested TB regarding these I/O and other error issues and had to stop TB by hitting control-C MANY times, but it never corrupt Inbox in this manner. (Like I said, I may have been simply lucky.) I think I need to investigate this before showing the comparison of write system call with or without my patches.
The reason I wanted to create a new stat was that there were write system calls of data size of
8, 46, 56, and 92 which were called more than 20 times during the sending and receiving of MIME data (libgomp.a) with the patched version (the dreaded large # of MIME line write system calls are now gone), and I wanted to
see how it goes with the unpatched M-C/C-C trunk.
Then I invoked rebuild thunderbird under strace and wanted to retry running thunderbird first ant then attach strace to it.
So I killed strace command and the above Inbox corruption happened.
My first guess is that we are missing a flush in appropriate place in the unpatched version.
I am looking at the Inbox using an editor.
It seems that my last sent attachment for stat collection ARE at the
end of Inbox. So I am not sure what is wrong and why my last sent messages with attachment show up as ghost messages of
empty subject, empty correspondents. (This is after repair and compaction.)
But I do find something very disturbing.
"The From - ..." line that separates each message now has a date information (resurrected by a recent patch).
BUT, it shows inappropriate date string.
EVERY "From -" line shows the date of the last time I tried repair folder or compaction, I think.
This is true for a message dated Tue, 23 Apr 2024 07:36:50 +09:00 (JST)
And as TB goes through repair/compaction the date string on "From -" line progress a bit. A second is added occasionally.
I am filing a bug although I am not sure what ill-effect it has aside from making the debugging of Inbox content a bit dificult.
It used to be I can simply look at "From-" to figure out the date, but now I have to wade through various headers to find out the timestamp. :-(
Back to the corruption issue.
I THINK we are missing a flush at appropriate places in the unpatched TB.
I will check if my patch here does the proper flushing at appropriate places. My patch was meant to avoid such issues.
The corruption may be due to how strace() intercepts write system calls, but I doubt it.
Assignee | ||
Comment 313•3 days ago
|
||
(In reply to ISHIKAWA, Chiaki from comment #312)
(In reply to Yury from comment #311)
The comments still refer to
reusableflag
which was removed in bug 1773787. Overall, buffered write was already implemented everywhere (for POP in bug 1882971), so the patch improves the error handling but doesn't change the overall behaviour. Or am I missing something?Sorry, I missed your comment.
I should modify the comment to drop the reference to reusableflag since it is indeed removed many months ago.
I removed the reference to reusableflag completely before and now I removed the last reference to m_tempMessageStreamReusable in a comment.
You are also right the bulk of buffering has been enabled by bug 1882971, and so the remaining issues are related to the
error handling (or the lack of it) in the code.
I should clarify that the bulk of buffering of done by Ben's code, but I needed to tweak the code in a few places to make sure that the buffering of the writing to POP3 download takes place properly.
Now I can't recall exactly what it was, but the code here was necessary to obtain the statistics properly IIRC.
I won't be able to do much until the late December due to my day job getting busy.
Description
•