Closed Bug 558528 Opened 15 years ago Closed 8 years ago

Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize.

Categories

(MailNews Core :: Backend, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 52.0

People

(Reporter: World, Assigned: aceman)

References

(Blocks 3 open bugs)

Details

(Keywords: perf, qawanted, Whiteboard: [needs system-specific profiling])

Attachments

(3 files, 2 obsolete files)

Tgis is sipin-off of bug 539389 comment #103, bug 539389 comment #105, bug 539389 comment #106. As seen in bug 539389, mail&news currently uses static 4KB for both read/write buffer of file. As written in bug 539389 comment #103 and bug 539389 comment #105, write buffer size by Necko for Disk Cache is 16KB. If larger buffer size won't produce problem when network file is used, consider larger buffer size than hard coded 4KB, please.
Depends on: 539389
Keywords: perf
Summary: Consider larger buffer size in file I/O by Mail&News. Use of Necko's buffer size, buffer size user can easily customize, ... → Consider larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network). Use of Necko's buffer size, buffer size user can easily customize, ...
yeah, I don't have a problem increasing the buffer size.
Severity: enhancement → normal
Summary: Consider larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network). Use of Necko's buffer size, buffer size user can easily customize, ... → Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize.
Irving, ace, know someone who might be interested to hack on this?
I can try this if irving has better things to do. My plan would be: 1. In mailnews/base/util/nsMsgUtils.cpp, remove FOUR_K define and make MsgNewBufferedFileOutputStream read the block size from a pref (read once after app startup). There are also some other defines FOUR_K across the codebase (e.g. I defined one for reading filter definition file). Those could be merged. 2. Going by patch https://bug539389.bugzilla.mozilla.org/attachment.cgi?id=443896, check if there are any NS_NewBufferedOutputStream() left and convert them to MsgNewBufferedFileOutputStream. From what I can read, this solves only access to files local to the profile. The IMAP block size is done in Necko, that I can't touch. But to test this I do not have any networked drive. Can anybody tell me how to simulate this? E.g. would downloading a big POP3 mailbox to a slow USB1 disk be enough? Is just reading from locally stored POP3 message (or Local folders) slow?
Blocks: 487375
This is a good candidate for the Thunderbird profiler: http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird-daily/ to identify where the real bottleneck is. I don't think we need to make buffer size a pref; if buffering is the problem, the biggest win will come from doing any buffering at all. The difference between buffer sizes is likely to be much less significant. That said, I have no objection to increasing the buffer size to 16k, if that's what Core uses. Different network drives (NFS, SMB, etc) have very different performance characteristics, particularly around file locking, so I think some system-specific profiling might be needed - the problem might not be in write buffering.
Depends on: 815012, 769346
Keywords: qawanted
Whiteboard: [needs system-specific profiling]
Attached patch WIP patch (obsolete) — Splinter Review
Aryx, could you please make us a try build with this patch? Debug builds and all tests. Thanks
Flags: needinfo?(archaeopteryx)
Wayne, it seems the builds are ready. While there are no new test failures visible in the run, this patch has potential of causing dataloss (changes also some raw char* buffers). So make the testers aware of that and make them test on backed up/copied profiles.
Flags: needinfo?(vseerror)
(In reply to :aceman from comment #10) > Wayne, it seems the builds are ready. While there are no new test failures > visible in the run, this patch has potential of causing dataloss (changes > also some raw char* buffers). So make the testers aware of that and make > them test on backed up/copied profiles. thanks aryx and aceman testers, builds to be tested are at http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/archaeopteryx@coole-files.de-3a4c82df47c6/ Take all necessary precautions to prevent potential loss of production data. While we do not expect problems, testing is not without risk. The primary potential risk would be loss of messages stored locally on the PC/network drive, i.e. pop accounts and Local Folders. Use a test account if possible. And create a new profile https://support.mozillamessaging.com/en-US/kb/using-multiple-profiles
better try build at http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/josiah@programmer.net-9c8539d25007/ (comment 11's is a debug build, which for windows requires MSVC to be installed )
Flags: needinfo?(vseerror)
time to request review and land?
Flags: needinfo?(acelists)
Do we have any positive feedback on the patch yet?
Flags: needinfo?(acelists)
(In reply to :aceman from comment #14) > Do we have any positive feedback on the patch yet? only the one I sent you from Phil dated 7/26. Other possibles from Erwin, Renato and Florian on the tb-enterprise mailing list but never heard back. Do you want to do another try build (the last one is gone) and we'll ask them for feedback?
Flags: needinfo?(acelists)
I think he was seeing some regressions there. Were they caused by the patch?
Flags: needinfo?(acelists)
(In reply to :aceman from comment #16) > I think he was seeing some regressions there. Were they caused by the patch? You have the whole of the conversation with that user**, and can probably draw the same conclusion as me - which is I don't think we can assume or conclude anything from it. In fact I find everything about his results to be unexpected. Plus a fatal flaw is the test compared v17 to v25 with patch, instead of v25 versus v25 with patch. In fact, the test probably should be done offline to eliminate mail server interaction. Unfortunately I did not specify either of these items to the tester. I think the next try build should be based on version 25 or 24, not daily. And I think we want comparison of v17, v N and vN+patch. What do we want tested? Do we want more than or other actions **? What do you think? **Action V17 V25 Startup + display Inbox with 468 messages 20-22sec 12-13sec Move 1391 messages from 1 folder to new empty folder 190sec 210sec Folder compact (after the big move, both versions compacted) 35sec 45sec
We want testers to be able to compare TB17, TB24 release and TB24+patch. So Josiahone will create a new try build. Do we want testers to do it working "offline"? Or do we want both online and offline?
Flags: needinfo?(josiah)
It will take a few hours though.
Flags: needinfo?(josiah)
(In reply to Wayne Mery (:wsmwk) from comment #18) > We want testers to be able to compare TB17, TB24 release and TB24+patch. So > Josiahone will create a new try build. > > Do we want testers to do it working "offline"? > Or do we want both online and offline? So my machine is not functioning properly at all so I can't run the try push... Sorry about that.
(In reply to Archaeopteryx [:aryx] from comment #21) > Pushed to Thunderbird-Try: > https://tbpl.mozilla.org/?tree=Thunderbird-Try&showall=1&rev=464bfdf7afdf :( looks like windows builds failed. Can you retry the try please? :)
Flags: needinfo?(archaeopteryx)
Patch has bitrotted. Aceman, can you please update it? Thank you.
Flags: needinfo?(archaeopteryx) → needinfo?(acelists)
Attached patch WIP patch 2 (obsolete) — Splinter Review
Sure.
Attachment #776505 - Attachment is obsolete: true
Flags: needinfo?(acelists) → needinfo?(archaeopteryx)
all platforms failed
<JoeS1> wsm: the error on Try is same as fixed by https://hg.mozilla.org/comm-central/rev/a5925d4f1670 on c-c so it seems the files are out of sync <JoeS1> And on current trunk: TEST-UNEXPECTED-FAIL | check-sync-dirs.py | build file copies are not in sync <JoeS1> I don't know what fixes that though
Flags: needinfo?(acelists)
should be fixed by bug 918819 try try again (please) :) Thanks
Flags: needinfo?(josiah)
Flags: needinfo?(archaeopteryx)
Flags: needinfo?(acelists)
Thanks Aryx! (Clearing needinfo flag)
Flags: needinfo?(josiah)
Comment on attachment 807904 [details] [diff] [review] WIP patch 2 Review of attachment 807904 [details] [diff] [review]: ----------------------------------------------------------------- Also, while you're in these files, code review to make sure we're never leaking the buffers we PR_Malloc(). ::: mailnews/base/util/nsMsgUtils.cpp @@ +1454,5 @@ > + return NS_ERROR_OUT_OF_MEMORY; > + } > + > + return NS_OK; > +} Please remove the loop trying to PR_Malloc() smaller data sizes. We don't run in any environments where the difference between malloc(1k) and malloc(16k) is enough memory to matter; if malloc(16k) fails we can't expect to keep running in any reliable way. In fact, the Firefox code base is moving towards terminating the process when C++ new fails, except in a few special data structures.
Did we think about the possibility that some of the slowness (e.g. downloading messages) may be caused by syncing the database file after each message? See also bug 487375. I wanted to disable syncing even on my desktop as it is unnecessarily bashing the disk (I can see and hear it) but I couldn't find a way to disable it. Do we have a pref. Would we be willing to implement it? Is sync useful over the network? Would enterprises like to disable it?
That seems to be bug 484119.
(In reply to :aceman from comment #33) > That seems to be bug 484119. There may also be sync() vs syncfs() [limited scope] under linux, and also fsync() vs fdatasync() [less I/O] issues under POSIXen OSes including linux. IDE/ATA disks and drivers may not honour the write cache operation very well, whereas SCSI disks and their drivers tend to honour the cache operation very well. So if an application issue sync-related system calls, SCSI disks do flush buffer content to the permanent storage disk media, and under linux PC with SCSI disk(s), I can literally hear the noise from the disk drive at EACH mail download (!) I have tried to figure out where sync-related system calls are called, but could not find it easily in C-C tree. TIA
(In reply to ISHIKAWA, Chiaki from comment #34) > (In reply to :aceman from comment #33) > > That seems to be bug 484119. yes. And we still need advice there? From someone Yet To Be Determined? > There may also be > sync() vs syncfs() [limited scope] under linux, and also > fsync() vs fdatasync() [less I/O] issues under POSIXen OSes including linux. > > IDE/ATA disks and drivers may not honour the write cache operation very > well, whereas > SCSI disks and their drivers tend to honour the cache operation very well. > So if an application issue sync-related system calls, SCSI disks do flush > buffer content to the permanent storage disk media, and > under linux PC with SCSI disk(s), I can literally hear the noise from the > disk drive > at EACH mail download (!) In case it makes a difference, bear in mind most of our users are windows. And likewise enterprise users. > I have tried to figure out where sync-related system calls are called, but > could not find it easily in > C-C tree. > > TIA Are either of you able to help here? (Parkhideh from bug 487375)
Flags: needinfo?(irving)
Flags: needinfo?(Parkhideh)
(In reply to :Irving Reid from comment #6) > This is a good candidate for the Thunderbird profiler: > http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird- > daily/ to identify where the real bottleneck is. Profiler is much better now. So that is more feasible I think. But do we want profiler as separate measurement? Or, do we want the profiler run when they are timing comparing patched and not patched? > I don't think we need to make buffer size a pref; if buffering is the > problem, the biggest win will come from doing any buffering at all. The > difference between buffer sizes is likely to be much less significant. I agree. A pref is also a possible footgun to users. > That said, I have no objection to increasing the buffer size to 16k, if > that's what Core uses. I got only about 3 users test results - less than half the people who volunteered to test. Attached is the most organized one. But it is also the only one that does not contain conflicting results - the other users results are wildly different within their test suite. If we do not expect increasing the buffer size to make any environments significantly worse, should we just pick a number as Irving suggests, and move on from there? (eg locking issues mentioned below, etc) If not, our next test effort must be more complete in terms of exactly what data we want, more structured, and with good instructions, without being overbearing on the user. And perhaps we should supply them with a test suite of mail folders, so that results from different environments can be compared. (It's bad enough that most of these users are benchmarking on live systems, with network and server contention). Do we have such a thing? > Different network drives (NFS, SMB, etc) have very different performance > characteristics, particularly around file locking, so I think some > system-specific profiling might be needed - the problem might not be in > write buffering. Different bug?
There is a very real risk of data loss if we don't sync the mbox file (or individual message file, in the case of pseudo-maildir) before we tell the server the download was successful, especially for POP. On the other hand, if we crash without syncing the .msf file we can largely recover, though we're having other problems with the .msf rebuilding code right now. Profiling would give us the clearest measurement of whether our slowness is caused by buffering (or lack of buffering) on the message file, vs. writes/sync() calls on the .msf database. If the problem is really in the .msf database, we should file a separate bug for that. The Gecko Profiler might give good information for this, but it might be easier to use system call tracing or platform level profiling, if you have tools for that you like to use. Profiles from before and after the patch would be excellent, but even just profiles of the existing distribution (before patch) would be pretty useful.
Flags: needinfo?(irving)
Re: profiler. Are we talking about profile for TB as in http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird-daily/ or something different? I could not make the add-on in above URL work as described there with the current C-C TB. The enable/disable button or whatever did not appear in status bar.
(In reply to Wayne Mery (:wsmwk) from comment #18) > Do we want testers to do it working "offline"? > Or do we want both online and offline? ----------------------------------------------------- (In reply to ISHIKAWA, Chiaki from comment #34) > (In reply to :aceman from comment #33) > > That seems to be bug 484119. > > IDE/ATA disks and drivers may not honour the write cache operation very > well, whereas > SCSI disks and their drivers tend to honour the cache operation very well. > So if an application issue sync-related system calls, SCSI disks do flush > buffer content to the permanent storage disk media, and > under linux PC with SCSI disk(s), I can literally hear the noise from the > disk drive > at EACH mail download (!) > ------------------------------------------------------------ (In reply to Wayne Mery (:wsmwk) from comment #17) > > In fact, the test probably should be > done offline to eliminate mail server interaction. Unfortunately I did not > specify either of these items to the tester. > > I think the next try build should be based on version 25 or 24, not daily. > And I think we want comparison of v17, v N and vN+patch. What do we want > tested? Do we want more than or other actions **? What do you think? > > **Action V17 V25 > Startup + display Inbox with 468 messages 20-22sec 12-13sec > Move 1391 messages from 1 folder to new empty folder 190sec 210sec > Folder compact (after the big move, both versions compacted) 35sec 45sec ------------------------------------------------------------ As a late comer who does not have an issue with this bug, I am confused as to what problem is being addressed. So to muddy the water these are my numbers in reply to above three comments: 1) Correct; Tests should be done offline if this bug is addressing disk <==> Thunderbird issues. 2) Using Thunderbird version 16.0.2, OS Windows 2000, hard disk 500 GB SATA Seagate, memory available for Thunderbird 23 MB, Memory available after Thunderbird has its window painted (without new activity) 12 MB, CPU single core Intel Celeron, 1.73 GHz clock, total RAM 512 MB. 3) Total Mail Folder size = 1.34 GB; in 27 folders and 342 files. POP accounts = 8; IMAP = 4 accounts. 4) Size of the largest POP INBOX file 150 MB; 2708 e-mails. 5) Size of the largest POP TRASH file 487 MB; 20416 e-mails, both in the same POP account. 6) Starting Thunderbird in POP INBOX: Window painted in under 6 seconds. 7) Starting Thunderbird in POP TRASH: Window painted in under 12 seconds. 8) COPY (not move) of above POP INBOX (item 4) to another POP TEMP account's folder: less than 83 seconds. Content of e-mails are there. 9) Deleting e-mails of above TEMP folder, less than 3 seconds (not move to Trash). e-mail file still exists on hard disk TEMP. 10) Deleting the above TEMP folder, less than 5 seconds. Disk is cleared of this folder. 11) Copying INBOX file (150 MB) from windows to another folder, less than 8 seconds (not a Thunderbird operation). 12) Sanity check: COPY (not move) of above POP INBOX to an IMAP account's INBOX: less than 5 seconds. But Thunderbird is showing the information without content (offline IMAP). When clicked on an e-mail this message appears, which is correct: "The body of this message has not been downloaded from the server for reading offline." 13) A question has been raised as to how hard disk handles its cache (comment 34); my lighthearted answer is none of our business; hardware engineers do it well. :) Once again, pardon my confusion of not grasping well if this bug is about power up and file movements of Thunderbird or something else. I see comments that I cannot related to the above measurements.
Flags: needinfo?(Parkhideh)
(In reply to ISHIKAWA, Chiaki from comment #38) > Re: profiler. > > Are we talking about profile[R] for TB as in > http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird- > daily/ We are, but the version currently on AMO doesn't seem to work for Thunderbird nightly - which gecko-profiler-now-works-in-thunderbird-daily points to. I've emailed you the 266k version that works for me. The two little panels described at http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird-daily/ are shown in the screen this comment's screen shot
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk) from comment #40) > Created attachment 8410940 [details] > TB profiler status bar.png > > (In reply to ISHIKAWA, Chiaki from comment #38) > > Re: profiler. > > > > Are we talking about profile[R] for TB as in > > http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird- > > daily/ > > We are, but the version currently on AMO doesn't seem to work for > Thunderbird nightly - which gecko-profiler-now-works-in-thunderbird-daily > points to. > > I've emailed you the 266k version that works for me. The two little panels > described at > http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird- > daily/ are shown in the screen this comment's screen shot I have no idea why, but - the addon doesn't work on C-C version of TB on which I am working currently. That is bad enough. But, I will try to use this on daily so that I can obtain the rough estimate of the performance issues. TIA
Flags: needinfo?(ishikawa)
Blocks: 905576
(In reply to ISHIKAWA, Chiaki from comment #41) > (In reply to Wayne Mery (:wsmwk) from comment #40) > >... > > I've emailed you the 266k version that works for me. The two little panels > > described at > > http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird- > > daily/ are shown in the screen this comment's screen shot > > I have no idea why, but > - the addon doesn't work on C-C version of TB on which I am working > currently. > > That is bad enough. > > But, I will try to use this on daily so that I can obtain the rough estimate > of the > performance issues. > > TIA I am now using 1.12.0 with current trunk, which I think is newer than what I emailed you. If this newer version is not working for you, check error console for startup errors.
Blocks: 1052107
Blocks: tbbigfolder
(In reply to ISHIKAWA, Chiaki from comment #41) > I have no idea why, but > - the addon doesn't work on C-C version of TB on which I am working > currently. > > That is bad enough. > > But, I will try to use this on daily so that I can obtain the rough estimate > of the performance issues. > > TIA Have you had good results?
Flags: needinfo?(ishikawa)
(In reply to Wayne Mery (:wsmwk) from comment #44) > (In reply to ISHIKAWA, Chiaki from comment #41) > > I have no idea why, but > > - the addon doesn't work on C-C version of TB on which I am working > > currently. > > > > That is bad enough. > > > > But, I will try to use this on daily so that I can obtain the rough estimate > > of the performance issues. > > > > TIA > > Have you had good results? Obviously, the elapsed time for file I/O using larger buffer especially the target file is on a remote file system is much shorter. I wanted to create a comparison summary, but Bug 1111304 - assertion failure loadinfo got in the way. Open community development is wonderful when it works. Unfortunately, for the last few months, TB development is not. We are basically catching up with M-C changes that break TB (and SeaMonkey) and surprisingly, every time I try to make sure my patch works, something gets in the way and break mozmill or xpcshell tests very badly :-( So concrete comparison data has to wait for a while :-(
Flags: needinfo?(ishikawa)
(I am going to post a redacted summary to a mozilla mailing list since I think it merits wide exposure.) C-C TB performance issue: unbuffered write where it can use buffering. I have spotted a very conspicuous offender of not using buffered output where it can in comm-central thunderbird. My current questions are as follows. Is there a way to set the buffer-mode m_fileStream->Write inside CopyDate [see below] (just in case there are other wayward callers)? Or can we simply replace this Write with a buffered output version, but what that would be? I think as long as a buffer of reasonable size, even if it is 4K or 16K, is used, I/O performance will much be better and reasonably good. It does not have to be 128K or much larger something, I think, although it would help for a local disk write and read. However, here I report that I have found a routine is copying message data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY LINE. A REAL PROBLEM: Imagine if the message has a few MB of binary attachment as mime data (PDF, JPEG photo, etc.). Typically, the attachment in the message is encoded as mime data, each line smaller than 80 octets. TB seems to send mime as 74 octets line including the ending. So, say, for a 2MB bytes of attachment (which grows x (1.3 - 2) by the encoding), assuming that base64 increases the size of characters to be sent by 1.33 approximately 1.33*2MB / 72 is = 38739 write system calls are issued by TB on the receiving side. 38739 system calls. NO WONDER TB feels slow. The copying feels slow when my mail is stored on local disk. I imagine someone who has to use a remote file system in a corporate setting for mail store would be really penalized with so excessive # of write system calls with small chunk of data over the wire. These write calls ought to be merged by proper buffering! Even a small 4K buffer would cut down the # of system calls about 55 times, not too bad. Enough preamble for giving the background information. The culprit routine I found is NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) in the file comm-central/mailnews/local/src/nsLocalMailFolder.cpp And the problem unbuffered write in question is http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111 2111 rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); This does not seem to use buffering at all. (Please check the excerpt of trace output from strace under linux attached in the following.). I agree that, since this function seems to be in need of scanning the data for "From" at the beginning, and escaping "From" at the beginning of line, etc., it needs to SCAN the INPUT line by line, but there is NO REASON AT ALL TO USE UNBUFFERED OUTPUT! Repeat-by: Monitor the system calls issued by TB. I used strace under linux. strace -o /tmp/t.out TB-BINARY Start TB, and then copy a message with reasonably large binary attachment in one folder to another folder. I found the repetition of small write(fd, bug, len) with length parameter 74 MANY times. E.g.: from a typical strace record. I only excerpted the interesting lines. There are many other system calls. The particular message I copied had an attachment of 1.2MB size. The repeated writes() have the first parameter (file descriptor) as 54. I searched backward in trace file and found that FD 54 is for the target folder. (folder-C) ... open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54 ... Extraction of the initial part of the writes to this file descriptor: (I think I created the message using HTML format mail. That is why you see a few lines of HTML main message body. But that is not of any concern. Concern is the long repetition of mime data that follows.) ... write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24 clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0 clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0 write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29 .... I omit clock_gettime() below ... write(54, "X-Mozilla-Keys: $label3 "..., 97) = 97 write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44 write(54, "X-Identity-Key: id1\r\n", 21) = 21 write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44 write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39 write(54, "From: ishikawa <ishikawa@example"..., 39) = 39 write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70 write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49 write(54, "MIME-Version: 1.0\r\n", 19) = 19 write(54, "To: nobody@example.com\r\n", 24) = 24 write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28 write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32 write(54, " boundary=\"------------000003080"..., 50) = 50 write(54, "\r\n", 2) = 2 write(54, "This is a multi-part message in "..., 46) = 46 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: text/html; charset"..., 45) = 45 write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33 write(54, "\r\n", 2) = 2 write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65 write(54, "<html>\r\n", 8) = 8 write(54, "<head>\r\n", 8) = 8 write(54, "</head>\r\n", 9) = 9 write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41 write(54, "<br>\r\n", 6) = 6 write(54, "<br>\r\n", 6) = 6 write(54, "</body>\r\n", 9) = 9 write(54, "</html>\r\n", 9) = 9 write(54, "\r\n", 2) = 2 write(54, "--------------000003080301040107"..., 40) = 40 write(54, "Content-Type: application/octet-"..., 41) = 41 write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21 write(54, "Content-Transfer-Encoding: base6"..., 35) = 35 write(54, "Content-Disposition: attachment;"..., 34) = 34 write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25 write(54, "\r\n", 2) = 2 write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74 write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74 write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74 write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74 ... many many lines of write with 74 octets.... write(54, "--------------000003080301040107"..., 42) = 42 write(54, "\r\n", 2) = 2 close(54) = 0 Observation: For this message with 1.2MB attachment, I counted 24016 write calls. (fgrep "write(54," /tmp/t.out | wc) Granted that the routine seems to be doing something about seeing "From", escaping "From" with ">From", etc., but it has *NO* reason to use UNBUFFERED WRITE here. It can buffer the output and then finally at the end of the loop it can flush or something. HOW TO FIX? I am not familiar with I/O routine inside mozilla code base. I am not sure where this file stream is opened and why it is not using buffered mode. rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten); Is there a way to set the buffer-mode to m_fileStream inside CopyData (just in case there are other wayward callers)? Or can we simply replace this Write with a buffered output version, but what that function would be? [The stack trace is shown below for those who want to chase what is calling this function.] The above is for copying existing message to another folder. (So it can happen offline.) When I look up CopyData in mxr database, I found another instance of CopyData (for imap) and that seems to have the same unbuffered output issue. (Or at least superficially the code look the same. Unless "->Write" semantics is different (i.e., unbuffered vs buffered) we have the crawling I/O performance when we copy / download large message (when we write to a local folder)! MXR: E.g. Other place(s) where CopyData was found: I searched the identifier CopyData using mxr: MXR is slightly buggy, I think: I only left mail/mailnews-related entries. Defined as a function in: mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations) line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This function, CopyData calls another function CopyDataToOutputStreamForAppend() and in it, we have 3303 { 3304 rv = outputStream->Write(start, 3305 end-start, 3306 &writeCount); 3307 rv = outputStream->Write(CRLF, 2, &writeCount); 3308 } We should make sure that these writes are buffered! (and flush at the end of the loop/return.) From what I read before about the slowness of I/O in a comment posted by IMAP user, probably these are NOT buffered, either. AND, we SHOULD CHECK the return value rv: there is NO ERROR checking here !? No wonder there are reports of IMAP users losing messages, etc. in the past? Defined as a function prototype in: mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg annotations) line 20 -- void copyData(in nsIInputStream aIStream, in long aLength); Defined as a variable in: mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations) line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize); line 2026 -- NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength) This is the function I found out about and report here. Again, the questions I am posing to this list is >Is there a way to set the buffer-mode to m_fileStream inside CopyDate > (just in case there are other wayward callers)? >Or can we simply replace this Write with a buffered output version? See Related bugs: Bug 558528 - Larger buffer size in file I/O by Mail&News to improve performance on slower drives (eg network drives/shares). Use of Necko's buffer size, buffer size user can easily customize. (Well, I have not used the aceman's patch in there yet since I wanted to find out where the offending code for calling write() repeatedly with smallish 74 octet length originated. I noticed this repeated write() of smallish size, earlier this year when someone reported a serious issue when his mail store is on a remote file system (CIFS) and he seems to have used imap. I tried to re-create his problem locally by mounting Windows 7 file system from my local linux installation and in so doing noticed a few issues with CIFS code itself (which seems to have been fixed nicely), and a missing error code or two on TB's side, and in so doing noticed a trace of MANY write system calls with short (70+ octets) on wire. Such many small write calls over wire kills performance easily. BTW, :aceman's patch there seems to handle many "read" side of the issue. The problem I am reporting is "write" side.) Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel The great profiler ( see https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G ) never seemed to work for me when I tried to use it against C-C TB. But finally, thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems to work! (But I have not yet used that profiler for this because it is only a day since I could get it working after a long fiasco with bug regarding loadinfo. Bug Bug 1111304 - assertion failure loadinfo, this month. ) I wanted to get out the words out fast so that something can be done during the holidays for this serious issue. But once, I am familiar and can offer performance figures using the profiler, I will. I am posting this Bug 558528: This is the stack trace of C-C TB when the write with this small chunk is invoked. (gdb) where #0 write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315 #2 0x00007ffff04beccc in (anonymous namespace)::interposedWrite ( aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68 #3 0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>, amount=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114 #4 0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>, buf=<optimized out>, count=<optimized out>, result=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154 #5 0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData ( this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111 #6 0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable ( this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>, sourceOffset=<optimized out>, aLength=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86 #7 0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse ( this=<optimized out>, inputStream=<optimized out>, sourceOffset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594 #8 0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState ( this=<optimized out>, url=<optimized out>, inputStream=<optimized out>, offset=<optimized out>, length=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679 #9 0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable ( this=<optimized out>, request=<optimized out>, ctxt=<optimized out>, inStr=<optimized out>, sourceOffset=<optimized out>, count=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353 #10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer ( this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609 #11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady ( this=<optimized out>, stream=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436 #12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88 #13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855 #14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265 #15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run ( this=<optimized out>, aDelegate=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99 #16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233 #17 0x00007ffff08853fc in RunHandler (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226 #18 MessageLoop::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200 #19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164 #20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281 #21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153 #22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229 #23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449 #24 0x000000000040378e in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195 #25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8) at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380 (gdb) info register Yeah, right. I had to set a breakpoint to "write" system call and checked the registers (this is under Debian GNU/Linux 64-bit) and when the passed length parameter is 74 in , I hit the breakpoint and voila. Took me a while to figure out gdb's estoeric command expression syntax, and I have to skip the major part of the execution without setting up the break point and hitting control-C to rever back to gdb whe TB is finally waiting for user interaction after showing the three windows and THEN set the conditional breakpoint break write if ($rdx == 74) to halt and obtain the above stacktrace. TIA
See Also: → 1116055
(In reply to ISHIKAWA, Chiaki from comment #46) > (I am going to post a redacted summary to a mozilla mailing list since > I think it merits wide exposure.) > > C-C TB performance issue: unbuffered write where it can use buffering. > Since the newly found issue is about "Not using buffering at all" whereas this bugzilla entry we are reading here is about the SIZE of the buffer when buffering is used, I filed a new bug Bug 1116055 - Performance issue: Failure to use buffered write TIA
Yes, thanks for the separation.
(In reply to ISHIKAWA, Chiaki from comment #47) > Since the newly found issue is about "Not using buffering at all" > whereasthis bugzilla entry we are reading here is about the SIZE of the buffer when buffering is used, > I filed a new bug > Bug 1116055 - Performance issue: Failure to use buffered write For "writing mail data to local msgStore file", Bug 769346 is already opened. What is difference of new Bug 1116055 from existing Bug 769346? FYI. If "read of msgFilterRules.dat", "read for each single byte" is also known.
(In reply to WADA from comment #49) > (In reply to ISHIKAWA, Chiaki from comment #47) > > Since the newly found issue is about "Not using buffering at all" > > whereasthis bugzilla entry we are reading here is about the SIZE of the buffer when buffering is used, > > I filed a new bug > > Bug 1116055 - Performance issue: Failure to use buffered write > > For "writing mail data to local msgStore file", Bug 769346 is already opened. > What is difference of new Bug 1116055 from existing Bug 769346? > > FYI. > If "read of msgFilterRules.dat", "read for each single byte" is also known. Agha! The bug 769346 has exactly the same patch, since 2012 (!), as I have proposed this week in Bug 1116055 - Performance issue: Failure to use buffered write . Why did the patch not make it to the source tree!? Thank you for pointing out.
Please note that a side effect of the slowness, is that, because code for moving mail from IMAP account to remote CIFS mounted directory does not correctly handled various network IO error (timeout, eagain, ...), as soon as I transfer a mail with a > 3 MB attachment on a samba share on a filler (besides the fact that it takes ages when it execptionnaly works) I usually end up with corrupted target directory (after having TB itself unresponsives for minutes). I need to quit and to try recovering the target directory. Fater recovery I usually get several identical mails with 0 size. frequenetly it even causes data lost which makes this bug side effect critical. I agree the write error handling in case of networked IO is a separate issue but this rize the criticallity of this bug.
Severity: normal → major
Chiaki, have you been able to try this patch? It seems to be furthering what you start in bug 1242050. Could I move forward here? I need to address comment 31.
Depends on: 1242050
Flags: needinfo?(ishikawa)
(In reply to :aceman from comment #52) > Chiaki, have you been able to try this patch? It seems to be furthering what > you start in bug 1242050. > Could I move forward here? I need to address comment 31. aceman, I am sorry that I didn't realize (or rather I forgot) this excellent work regarding the increase of buffer size. I was checking I/O trace after my local patches for enabling buffering (removing the offending |Seek|) and found the strange 4K limit and looked in the code and simply tried to increase it to 16K in Bug 1242050. (At least, it seems to work that way, too.) But after reading the patch here, I am surprised that it works at all since I did not bother to tweak the underlying buffer size, et al. I will try your patch instead of my patch in 1242050 and report back. Unfortunately, at this moment, C-C TB does not compile due to Bug Bug 1243760 - Replace nsPIDOMWindow with nsPIDOMWindowInner/Outer and I want to make sure that I can, at least, check Windows build other than my local build under linux (well testing OSX build would be nice, but OSX build has been busted for nearly a month on C-C tryserver.) Wanting to check the behavior of C-C build with my patches on OSX and Windows platforms necessitates the C-C tryserver build, and this forces me to fetch M-C changes locally to test compilation under linux on my PC (since C-C tryserver automatically fetches all the new M-C changes when a build is executed.) I suspect this is a price to pay because of the split repositories. So stay tuned until Bug 1243760 is fixed so that I can test locally and on C-C tryserver. Hopefully, it will be good, but until the real code is executed, I can't say for sure. TIA
Actually I am fine if you land your Bug 1242050 first (with the other needed bugs) as that is the set of patches that you have tested and have seen a difference. We can than add my patch here on top to see if it does anything.
(In reply to :aceman from comment #54) > Actually I am fine if you land your Bug 1242050 first (with the other needed > bugs) as that is the set of patches that you have tested and have seen a > difference. We can than add my patch here on top to see if it does anything. I finally put a formal review request in bug 1242050 now that the bug I noticed during testing of my patch has a partial solution. (See bug 1250723 It seems a very complex and subtle issue, and we can come to it after 45 is released, I think. ) TIA
I failed to clear the needinfo flag somehow. Please see comment 55 above. I have requested the big buffer patch to be incorporated soon.
Flags: needinfo?(ishikawa)
Attached patch patch v3Splinter Review
So this increases the buffering to 16KB at some places. There are some places that already use 16KB (or have their own constants). And I left some on 4KB that didn't look like they could use more. https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=45a75ef11264b4b6817bb91070192e0a6b5b8114
Attachment #807904 - Attachment is obsolete: true
Attachment #8797812 - Flags: review?(rkent)
Comment on attachment 8797812 [details] [diff] [review] patch v3 Review of attachment 8797812 [details] [diff] [review]: ----------------------------------------------------------------- This all looks OK to me, so let's land it and see if there are any issues.
Attachment #8797812 - Flags: review?(rkent) → review+
Assignee: nobody → acelists
Thanks. https://hg.mozilla.org/comm-central/rev/6e8535df065688482afc8a161c56bac42c529fb0 If there are any slow cases still seen in TB52 (after also Chiaki's patches are landed), please open new bugs for specific scenarios that we can focus on.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 52.0
You've switched to buffered write in two cases, most prominently in the attachment handler: https://hg.mozilla.org/comm-central/rev/6e8535df065688482afc8a161c56bac42c529fb0#l7.83 That's part of the deal here, right?
(In reply to Jorg K (GMT+2) from comment #61) > You've switched to buffered write in two cases, most prominently in the > attachment handler: > https://hg.mozilla.org/comm-central/rev/ > 6e8535df065688482afc8a161c56bac42c529fb0#l7.83 > That's part of the deal here, right? A comment from a third party: this is good. I first noticed the slowness when I was saving attachments to local disk. Everything else in my patch series has been added along the way to increase the output throughput during mail downloading and attachment saving [with better error check.].
Sure, the idea is good. Only it was snuck into a bug whose summary talks about buffer sizes. And remember that this simple change caused corruption in bug 769346. Anyway that was about mailboxes and we're in the process of looking into why that caused a problem and doing it better this time. Also, note that the change here was not made when saving attachments but when writing the message body to a temporary file upon send. In defence of this change we note that attachments are already "snarfed" using buffered write. So it's most likely OK to treat the body in the same way.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: