Closed Bug 539389 Opened 14 years ago Closed 14 years ago

Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder). Tb3 requests write for each line of mail data. Should use buffering.

Categories

(MailNews Core :: Backend, defect)

1.9.1 Branch
x86
Windows XP
defect
Not set
major

Tracking

(blocking-thunderbird3.1 beta2+, thunderbird3.1 beta2-fixed, blocking-thunderbird3.0 -, thunderbird3.0 .5-fixed)

RESOLVED FIXED

People

(Reporter: marc3, Assigned: Bienvenu)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, regression, Whiteboard: [gs][gssolved])

Attachments

(6 files, 5 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0

My profile is located on H:\mail, H: being a network drive which is the "My documents" location. When deleting a message or moving a message to another folder, the operation is very slow (the bigger the message, the slower the operation). Thunderbird 2 with the same type of config was working fine.

Reproducible: Always

Steps to Reproduce:
1. Move My Documents to a network drive and activate off-line usage
2. Move profile to this drive
3. Delete / move messages within TB
Actual Results:  
S l o w . . .

Expected Results:  
fast !
Are you using global indexing? Tools | options | advanced, general - enable global search and indexing. That might be slow using a networked drive.
Hello. Disabling the indexing option does not solve the problem: still very slow.
Keywords: perf
Marc, how much slower is version 3 compared to version 2, and for what size of message(s)?
(In reply to comment #3)
> much slower *in* version 3 compared to version 2? for what size of message(s)?
Well, the bigger the message, the slower the operation. There is a huge difference between the two versions. 3.0.1 does not bring much improvement.

It is so slow that I will not deploy it on the site where we use Thunderbird.
We're really looking for numbers to describe the difference, so that someone investigating this knows whether they are seeing what you see. Some benchmarks would be helpful.

Also do you see this with antivirus software turned off?  (this is important)
Keywords: regression
Summary: Very slow file manipulation (profile stored on synchronized network folder) → Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder)
Version: unspecified → 3.0
I have the same problem... moving a message to another folder is slow: one message of 10 MB takes 48 seconds!! Even after disabling the Global Indexer and antivirus, and working in Safe Mode... :-(
I think that reading messages in 4 KB chunks over the network is not a good idea... a bigger (or configurable) chunk size would reduce read requests and improve I/O performance!
To Marc Rechté(bug opener):

POP3(or Local Folders)? Or IMAP? Or both?
Slow on any folder of any type of account?

> Steps to Reproduce:
> 1. Move My Documents to a network drive and activate off-line usage

What do you mean by "off-line usage"?
If by network you mean the IMAP server, see bug 216374.
Deletion of a 460 KB message:
POP account (POP server located on Internet): 25 s
IMAP account (IMAP server located on LAN): immediate
Local Folders and POP3. Slow in any folder. I have already deleted the .msf files...
:-(
Thunderbird performance:
    Load message of ~9500 KB: 5 seconds
    Move message of ~9500 KB: 48 seconds

Network performance:
    a single file of 147 MB (OpenOffice installer) moved from the NAS to my desktop in 17 seconds
    3842 files totalling 19.2 MB (icons) moved from the NAS to my desktop in 68 seconds
We have some TB3 installations with the profile saved on our NAS, and message loading/moving is slow. File Monitor tells me that TB3 reads the file in 4 KB chunks... but over a network path this is a problem, because the network does not have the same performance as a hard disk. I'm a software developer, and my experience tells me that 2300 read requests of 4 KB each are much slower than a single request of 10 MB (this is only an example; I know that it is unrealistic in a production environment). This is only my opinion... but reducing the number of I/O requests can improve performance.
I have tried Thunderbird 2, and the result with the same PC/message/folder is:

Thunderbird 2 performance:
    Load message of ~9500 KB: 1 second
    Move message of ~9500 KB: 5 seconds

Thunderbird 3 performance:
    Load message of ~9500 KB: 5 seconds
    Move message of ~9500 KB: 48 seconds

I have found the problem... TB2 writes in 4096-byte chunks... TB3 writes data in 74-byte chunks... this behaviour increases the number of I/O requests!!!!!!!
Read chunks are always 4096 bytes in both TB2 and TB3.
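Stefano's observation is the whole bug in miniature. A toy Python sketch (illustration only, not Thunderbird code) shows how wrapping an unbuffered sink in a 4 KB buffered writer collapses many 74-byte line writes into a handful of larger ones:

```python
import io

class CountingSink(io.RawIOBase):
    """Raw sink that counts how many low-level write calls it receives."""
    def __init__(self):
        self.calls = 0
        self.nbytes = 0
    def writable(self):
        return True
    def write(self, data):
        self.calls += 1
        self.nbytes += len(data)
        return len(data)

line = b"A" * 72 + b"\r\n"   # a 74-byte line, like one base64 line of an attachment

# Line-by-line writes straight to the sink (the reported TB3 behaviour):
unbuffered = CountingSink()
for _ in range(1000):
    unbuffered.write(line)

# The same lines pushed through a 4 KB buffer (the TB2 behaviour):
buffered_sink = CountingSink()
buffered = io.BufferedWriter(buffered_sink, buffer_size=4096)
for _ in range(1000):
    buffered.write(line)
buffered.flush()

print(unbuffered.calls)      # one call per line
print(buffered_sink.calls)   # far fewer: lines are coalesced into ~4 KB chunks
```

The exact call counts depend on the buffering implementation; the point is the roughly 50x reduction in round trips, which on a network drive translates directly into time.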
What data is TB 3 writing in 74 byte chunks? Or, probably more useful, to which files?
(In reply to comment #15)
> Thunderbird 2 performance:
>     Load message of  ~9500 kb   :   1 seconds
>     Move message of  ~9500 kb :     5 seconds
> Thunderbird 3 performance:
>     Load message of  ~9500 kb   :   5 seconds
>     Move message of  ~9500 kb :    48 seconds
> I have found the problem... TB2 write by 4096 byte chunk... TB3 write data by
> 74 byte chunk...

How about "Copy message" case?

A "74 byte chunk" is possibly the data for the X-Mozilla-Status:/X-Mozilla-Status2: header update for each deleted mail in the move-source folder, if it is a local mail folder (POP3, Local Folders). And the difference may be a result of more frequent "flush" calls than in Tb2.
  X-Mozilla-Status: 0001[CRLF]        24 bytes (already-read mail)
  X-Mozilla-Status2: 00000000[CRLF]   29 bytes
  If this mail is deleted, it is replaced by X-Mozilla-Status: 0009[CRLF]
> http://www.eyrich-net.org/mozilla/X-Mozilla-Status.html?en
> MSG_FLAG_EXPUNGED
When I got a Process Monitor log on MS Windows in the past, 1 KB of data was written (re-written) for this update of the mail folder file on a local HDD (100 GB HDD, NTFS).
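The status rewrite described above is a simple flag OR. Using the flag values from the X-Mozilla-Status documentation linked above (MSG_FLAG_READ = 0x0001, MSG_FLAG_EXPUNGED = 0x0008), a quick worked example:

```python
# Flag values per the X-Mozilla-Status documentation cited in this comment
MSG_FLAG_READ     = 0x0001
MSG_FLAG_EXPUNGED = 0x0008

status = MSG_FLAG_READ             # header reads "X-Mozilla-Status: 0001"
status |= MSG_FLAG_EXPUNGED        # deleting the mail ORs in the expunged bit
header = f"X-Mozilla-Status: {status:04x}"
print(header)                      # X-Mozilla-Status: 0009
```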

An application usually doesn't care whether a drive is a local HDD or a network drive. The application simply uses open (C:\..., X:\..., or UNC=\\server\...)/seek/read/write/close. But Tb may be different. It may depend on the NAS.
What kind of NAS do you use? MS Windows' standard NAS protocol, NetBIOS over TCP (SMB)?
Can you get a Process Monitor log of the read/write/re-write requests issued by Tb to the mail folder file?
> http://technet.microsoft.com/en-us/sysinternals/bb545027.aspx
> Filter : Path contains \<move_source_folder_name>, Include
>          Path contains \<move_target_folder_name>, Include
For the mozilla status updates, we only write 4 bytes, unless we had to add a whole x-mozilla-status line because it was missing. But that was true in 2.0 as well.

x-mozilla-keywords header is roughly 74 bytes but that gets added when the message is downloaded so move/delete shouldn't care.

3.0 uses different file i/o from 2.0, definitely.
You can look at my File Monitor log at http://www.stefanofraccaro.org/move_message_tb3.zip (simple text file)

I have moved a single message of 10 MB from the folder [Local Folder > Cai] to the folder [Local Folder > Cai > Pubblicità]... and the write chunks are still 74 bytes!
The copy-message log is available at http://www.stefanofraccaro.org/copy_message_tb3.zip (simple text file)

I have copied a single message of 10 MB from the folder [Local Folder > Cai > Pubblicità] to the folder [Local Folder > Cai]... and the write chunks are still 74 bytes!

N:\ is a mapped network drive. Our NAS is an HP NAS running Windows Storage Server 2003 R2 with a 2x1 Gigabit link. My PC runs Windows XP SP3 with TB 3.0.1 on a 100 Mbit link.
(In reply to comment #19)
> the write chunks are even 74 byte!

A 4 KB buffer looks to be used for reads, but Tb3 looks to issue a write for each line of mail data. The same was true for a copy of a mail to a local mail folder on a local HDD.
The many 74-byte writes are probably lines in the base64-encoded part of a big attachment.

As buffering for physical writes is done by the OS, line-by-line writes won't produce a severe performance problem on a local HDD if the data size is not huge. And a relatively small 4 KB buffer usually doesn't improve performance much for a very big data copy, although the overhead of "many write calls" is sufficiently reduced.
However, if file writing to a network drive is executed by the OS without sufficient buffering, "write for each line" may produce a performance problem.

Stefano Fraccaro, can you get a NAS-level (SMB-level) trace of the data written to the network drive?
Summary: Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder) → Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder). Tb3 requests write for each line of mail data.
"Enable write caching on disk" is disabled in our SCSI configuration because we had some problems with write caching. I don't remember the exact problem at the moment.
Why write line by line? A local HDD is used only in home environments... but in many (I think all) companies, network paths are used for central data administration (and backup). We often have email with 5-15 MB attachments, and it is slow... if TB2 uses 4096-byte chunks for writing, why does TB3 use 74-byte chunks? I know that it is (maybe) a simpler implementation, but it is not useful! Writing/reading in 4096-byte chunks (or 8192-byte; a settable parameter?) always results in better performance... with or without a disk cache.
Correction: the write cache on each disk is disabled... but the system uses the HP Array Accelerator, a hardware cache on the SCSI controller.
(In reply to comment #22)
> if TB2 use 4096 byte chunk for writing... why TB3 use 74 byte chunk?

As seen in your log, and as I wrote, 74 is not a buffer size. It is the length of many data lines. See the first part of the write requests (for mail header lines).

> Why writing line by line?

Sorry, but I don't know why. Possibly a result of new-line conversion to standard [CRLF] instead of a verbatim copy of the mail data. As you say, I think buffering should always be used for bulk writes of mail data lines, as opposed to replacement of a small data part such as the X-Mozilla-Status: header.
> As seen in your log, and as i write, 74 not buffer size. Length of many data
> lines. See first part of write request(for mail header lines). 

TB3 could write multiple lines per write request... there is no need to write only one line per request.

> Sorry but I don't know why. Possibly a result of new-line conversion to
> standard [CRLF] instead of dead copy of mail data. As you say, I think
> buffering should be used always if bulk write of mail data lines instead of
> replace of small data part such as X-Mozilla-Status: header.

Our array controller on the NAS provides 512 MB of cache... the problem is the very high number of single-line writes. I think the problem is not the write cache but the time wasted in network round trips...
You can look at my Wireshark log at
http://www.stefanofraccaro.org/copy_message_pcap.zip

I have copied the message to another folder... Follow me:

9728000 bytes / 4096 = 2375 requests
9728000 bytes / 74 = 131459 requests
0.000300 seconds between network packet requests (equal in both cases)

0.000300 x 2375 = 0.712500 seconds wasted
0.000300 x 131459 = 39.437700 seconds wasted

Right? Or not?
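The arithmetic checks out. Redoing it in Python with the same observed figures from the Wireshark log:

```python
msg_size = 9_728_000   # bytes in the copied 10 MB message
gap = 0.000300         # observed spacing between network requests, in seconds

for chunk in (4096, 74):
    requests = msg_size // chunk   # whole chunks; the tail is negligible
    wasted = requests * gap
    print(f"{chunk:4d}-byte chunks: {requests:6d} requests, {wasted:.4f} s of gaps")
```

Per-request latency, not bandwidth, dominates: the same payload split into 74-byte writes spends roughly 39 seconds in request gaps alone, matching the observed 48-second move.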
I migrated to Windows 7 (my profile being accessed over the network on the same Samba server) and, believe it or not, it is much faster. A 460 KB message deletion gives:

XP (old PC): with Anti-virus 26s, without: 22s
7 (new PC): with Anti-virus 5s, without: 2s.

OK, the new PC is much more powerful than the former one; would that be the reason?

Regards
Windows 7 is "faster" because of its higher minimum hardware requirements (like 1 GB of RAM, 2 GB suggested, ...). Try XP on the same hardware configuration and see the difference ;-)
Is this bug still unconfirmed because you are investigating, or do you think that writing line by line is correct I/O behaviour?
(In reply to comment #29)
> or you think (snip)

Who is the "you"?
As for me, not a developer, this is a regression caused by some change, because the log says:
  (a) After two 4 KB reads, a write for a single line occurs.
  (b) (a) repeats many times; almost all "writes for a single line" are 74 bytes,
      as in the big attachment part.
  (c) After many 74-byte writes without a 4 KB read,
      operation becomes like (a) again.
  (d) (c) repeats.
I guess a wrong logic change happened.
  old: (4 KB read, 4 KB write)*N for mail data copy
  Tb3: (i)   4 KB read, write a line, next 4 KB read, until the 4 KB buffer is full
       (ii)  repeat writing a line until the buffer becomes free
       (iii) (i) and (ii) are repeated until all data is written
I don't know for a fact that the issue here is the same as what some people are seeing in http://gsfn.us/t/nw16 -- but Doug there just reported his performance markedly improved after putting his profile on a local drive.
Whiteboard: [gs]
http://forums.mozillazine.org/viewtopic.php?f=39&t=1715325&start=0 may be another example - waiting on reporter feedback.


bienvenu observes "we switched away from nsFileSpec-based (and nsIFileSpec) [1] streams, but I don't know how much buffering they did", and so we are using the NS_NewLocalFileOutputStream function to get an nsILocalFileOutputStream ... which is not buffered: "it just calls PR_Write"

[1] (Bug 459693 Eliminate nsFileSpec and nsIFileSpec (references) from MailNews)

OT but interesting
 Bug 241708 -  nsIFile support for unc paths is almost entirely broken
Severity: normal → major
The old FileImpl object automatically buffered writes:

http://mxr.mozilla.org/seamonkey/ident?i=AllocateBuffers

So we may need to buffer all of our output streams.
Confirming by attached Process Monitor log data.
Status: UNCONFIRMED → NEW
Component: Folder and Message Lists → Backend
Ever confirmed: true
Flags: blocking-thunderbird3.1?
Product: Thunderbird → MailNews Core
QA Contact: folders-message-lists → backend
Version: 3.0 → 1.9.1 Branch
So, what's the verdict? Is there a fix in the works for this bug?

We're getting ready to roll out a shiny new Domain Controller and implement Folder Redirection, but we are also half-way through upgrading everyone to TB3. It would be a catastrophe to flip the switch and have everyone's TB die a horrible death immediately thereafter...
For the moment I suggest you use TB 2.x... TB 3.1 is scheduled for April 2010 and I hope it will be a rock-solid release.
Not an option, the IMAP improvements in TB3 are too many to ignore for so long.

Besides, I see no reason this would have to wait for 3.1 - it sounds like the cause is now known, so it should be fixable in a minor point release (3.0.2? 3.0.3?).

If it isn't fixed by the time we roll it out, I'll have to decide between delaying the implementation of Redirected Folders, or figure out a simple way to keep just the TB profiles local until this is fixed and then 'move' them, without having to go to each workstation.

Well, we only have about 60, so it wouldn't be *that* much work, but I'd rather not do that if I don't have to...
(In reply to comment #35)
> So, whats the verdict? Is there a fix in the works for this bug?

We're still looking at it, but it's now on the radars that it needs to be on (which will help).

(In reply to comment #36)
> For the moment I suggest you to use TB2.x... TB 3.1 is scheduled for April 2010
> and I hope will be a rock solid release.

I would actually recommend waiting a while before making such a big decision. If Neil's comment 33 is pointing at the solution, then we may have a fix being proposed within a few days.
blocking-thunderbird3.0: --- → ?
discussed on Thunderbird weekly call that this is required for Lehigh to deploy  version 3, i.e. 3.1 (we must skip 3.0)
blocking-thunderbird3.1: --- → ?
Bummer...
Marcus... TB 3.1 is a short-cycle release coming in a few months (TB 3.1 beta 1 code freeze is scheduled for 23 February). I think it is better to test the buffering code hard than to lose some email because of bugs. The most important thing is that the bug has been identified and is being solved ;-)
I understand... but our new Domain Controller is being pushed out before 3.1 will be released, so this still means I must decide how to deal with the fact that TB will have to be treated as a special case until the bug is not only fixed, but properly tested... which just means another headache for me...

But no worries, I'm used to headaches. I was just hoping that since it was a regression, it would be an easier/reliable fix that wouldn't touch as much code and therefore be able to be rolled out earlier.

Thanks, at least the bug (and apparently the fix) are known and will happen fairly soon.
Blocks: 386306
I can confirm this bug. I tried saving the profile on a server running Ubuntu Linux with Samba for file sharing with my Windows Vista client, and also tried saving the profile on a Windows 2008 R2 Standard server, which makes no difference. The problem also occurs under W2K and XP.
So I would confirm comments like #33: something in TB3 must have changed.
Attached patch Proposed patch (obsolete) — Splinter Review
* Created MsgNewBufferedFileOutputStream method. This method always uses a buffer size of FOUR_K since that's what was used by the old nsIFileStream.
* Replaced all uses of NS_NewLocalFileOutputStream. Fortunately buffered streams are also seekable so no further changes are needed there.
* Fixed nsURLFetcher and nsMsgAttachmentHandler to use nsIOutputStream instead of nsIFileOutputStream
* Included nsMsgUtils as appropriate
Assignee: nobody → neil
Status: NEW → ASSIGNED
Attachment #427105 - Flags: superreview?(bienvenu)
Attachment #427105 - Flags: review?(bugzilla)
blocking-thunderbird3.1: ? → rc1+
Flags: blocking-thunderbird3.1?
Summary: Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder). Tb3 requests write for each line of mail data. → Very slow file manipulation deleting or moving messages (profile stored on synchronized network folder). Tb3 requests write for each line of mail data. Should use buffering.
Is it possible to adjust the buffer size by a multiple of 4 KB (a parameter with a default value of 4 KB)? We often work with large attachments and, in my tests, I see that larger buffers are faster with a network drive:

[1] 1000 requests of 4 KB = ~1922 ms
[2] 1000 requests of 8 KB = ~2375 ms

[3] 500 requests of 4 KB = ~953 ms
[4] 500 requests of 8 KB = ~1282 ms

I can transfer the same amount of data with case 1 and case 4... but case 4 is much faster than case 1!
Attached file xpcshell test log
running with this patch causes quite a few failures - I've attached output from a run of the tests. My suspicion is that we're not flushing in all the places we should be, and that's causing the test failures.
I know this might not be the right place to discuss when bugs get fixed, but where is the right place? Many of our users want to downgrade to TB 2.x because working with TB3 sometimes seems impossible. Why was version 3.0.2 released before fixing this bug? Who decides about new releases, and whether they should be released or held back until bugs like this one are fixed?
Attachment #427672 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 427105 [details] [diff] [review]
Proposed patch

>diff -r 10bf94ca3428 mailnews/compose/public/nsIURLFetcher.idl

>-  void fireURLRequest(in nsIURI aURL, in nsILocalFile localFile, in nsIFileOutputStream fileStream, in nsAttachSaveCompletionCallback cb, in voidPtr tagData);
>+  void fireURLRequest(in nsIURI aURL, in nsILocalFile localFile, in nsIOutputStream fileStream, in nsAttachSaveCompletionCallback cb, in voidPtr tagData);
> 
>-  void initialize(in nsILocalFile localFile, in nsIFileOutputStream fileStream, in nsAttachSaveCompletionCallback cb, in voidPtr tagData);
>+  void initialize(in nsILocalFile localFile, in nsIOutputStream fileStream, in nsAttachSaveCompletionCallback cb, in voidPtr tagData);

I know the restriction is being lifted, but should these have a UUID change anyway, to ensure backwards compatibility is supported?

>diff -r 10bf94ca3428 mailnews/import/applemail/src/nsAppleMailImport.cpp

This file needs to include nsMsgUtils.h to compile on Mac.

I also see bienvenu's failures. It definitely looks like the file write hasn't completed somehow.
Attachment #427105 - Flags: review?(bugzilla) → review-
I understand why rc1 is targeted, however I suggest this be targeted to 3.1b2 for these reasons:

1. if this doesn't land well before 3.1 rc1 (scheduled 4-20) then we likely won't get it fixed in the next 3.0.x, i.e. 3.0.4, perhaps not even 3.0.5 depending on how much baking is wanted. (unless we don't care about 3.0.x in this regard - which may be OK)

2. Because of the pain threshold this may prevent affected users from joining in v3.1 testing, and therefore some of the v3.1 "relief" patches targeted to the same user population (the institutional dudes) won't get enough baking time with testers.

3. It doesn't seem like a difficult patch, aside from getting the tests right :)
Flags: in-litmus?
I think a serious bug like this one MUST be fixed before releasing a version... if there are blocking bugs, all dates must be rescheduled!
Why release another "good for the trash" version? Release when it is time to release... you are not Microsoft!
@ Comment 52: Full ACK!
Am I wrong, or will this bug only be fixed in rc1? Why not beta2???
Only two words... no comment   :-(
> 2. Because of the pain threshold this may prevent affected users from joining
> in v3.1 testing, and therefore some of the v3.1 "relief" patches targeted to
> the same user population (the institutional dudes) won't get enough baking time
> with testers.

I support this comment. Right now I'm forced to use Evolution if I want to read email, so there'll be no feedback from me on anything until this is fixed.
I came to this location from another bug that said upgrade to TB3.0 because simple MAPI may work.

https://bugzilla.mozilla.org/show_bug.cgi?id=319714


TB3.0 is unusable in a small business network when user emails are located on a shared network drive. It is very, very slow on simple manipulation of emails (move, copy, download, etc.).


Once rolled back to 2.0.23 it was very fast again; the email database was not corrupted or changed by TB3.0 (that we have noticed).


So I can't upgrade, and the simple MAPI bug will not be fixed in 2.0.23... what do I do?


( Please don't say buy a hex core with SSD's, if we had the dough we would all be using MS Exchange server )
(In reply to comment #56)
> So I can't upgrade and the simple email MAPI bug will not be fixed in 
> 2.0.23...what do I do..?

The only thing you can do... wait until 3.1 ships. Of course you could go ahead and upgrade to a beta or RC if you want it earlier, but it is coming soon - hopefully in April...
I tried 3.1 today, following these instructions (loosely) 
http://www.webupd8.org/2010/03/thunderbird-31-beta-1-released-and-how.html

So I'm running:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2pre) Gecko/20100302 Lanikai/3.1b1

and find none of the slowness and hanging problems that 3.0 is giving me or that I associated with this bug report.
Some I/O works better, but when you send an email with a large attachment, for example, the message is appended to the Sent folder 74 bytes per request (4096 expected)... and this is very slow when working with network paths (#26).
(In reply to comment #58)
> I tried 3.1 today, following these instructions (loosely) 
> http://www.webupd8.org/2010/03/thunderbird-31-beta-1-released-and-how.html
> 
> So I'm running:
> 
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2pre) Gecko/20100302
> Lanikai/3.1b1
> 
> and find none of the slowness and hanging problems that 3.0 is giving me or
> that I associated with this bug report.

Hmmm... so, did a fix land early??
(In reply to comment #59)
> Some I/O work better but when you send an email with large attachment,
> for example, the message will be appended to the Sent folder by 74 byte
> for each request (expected 4096) ...
> and this is very very slow when working with network paths (#26).

It's not surprising, because it's this bug, and this bug is not fixed yet.
Sam Liddicott, what was your purpose in generating "this bug is not fixed yet" mail, which goes to everyone on the cc: list of this bug, even though this bug really is not fixed yet? Did you want to let us know that "some I/O works better" with a new build or new release even though this bug is not fixed yet?
Yes.

I found TB3.0 unusable and it made my whole PC unusable (NFS home directories).

With this TB3.1 I find it quite usable and I can stop using evolution.

This information may be helpful to other TB users feeling the pain.

I'm not a maintainer, so I didn't mean to report that the bug was fixed, only that my pain was gone with this build, and how other people may use this build. As this build downloads and runs from a single directory, it is easy to remove afterwards when a fix comes out.

I don't grumble that a fix takes a long time, but when it does, it is common for the bug report to become a conversation about work-arounds.

Comment #59 was a clarification of why it was better for me even though the bug is not fixed, #59 was not a complaint that some things were not fixed.
(In reply to comment #62)
> Yes.
> I found TB3.0 unusable and it made my whole PC unusable (NFS home directories).
> With this TB3.1 I find it quite usable and I can stop using evolution.
> This information may be helpful to other TB users feeling the pain.

I see. It may be a result of "reducing flush calls in unnecessary situations" in the new build or release, because you use Linux and "flush requests" directly affect I/O performance on Linux.

Sam Liddicott, do you use a local HDD as the profile (and mail directory) location, or a network file server?
Home directories, including profiles and mail directories are NFS based.
Sam, is that 3.1 beta 1?  If not, which nightly?
This works:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2pre) Gecko/20100302 Lanikai/3.1b1

Problematic builds for NFS  are shortly prior to:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.10pre) Gecko/20100401 Shredder/3.0.5pre

I can't swear that the 20100401 build has the NFS problems because I stopped using TB before then, but the 20100324 daily build certainly did.

I'm getting daily builds from the Ubuntu PPA:
https://launchpad.net/~ubuntu-mozilla-daily/+archive/ppa

but they aren't building 3.1, only 3.0 (as "thunderbird", not as "thunderbird-3.0")
I've been investigating the test failures - the first one, test_bug471682.js, fails because the timestamp on the target mail folder isn't updated by the time we check it. We're only flushing the stream, not closing it, at this point, which may account for that.
test_imap/unit/test_bug460636.js is failing because we're getting an OnStopRunningUrl notification before OnStopRequest is called on the nsMsgSaveAsListener, so the file hasn't been flushed yet. We could hack the test, but the bug would still be there. I'm not sure why OnStopRequest isn't called earlier; it looks a bit like it's leaking, since it's not getting called until we go offline at the end of the test.
I flipped around the order of calling OnStopRequest and OnStopRunningUrl in nsIMapProtocol::ProcessCurrentUrl, and that does fix that test (though I don't know if anyone else is relying on the opposite order).
I fixed another issue in offline IMAP, so I'm down to 3 non-gloda test failures. Prior to this patch, doing a seek of 0 from the current position would do a flush without a sync. That's no longer true, so code that relies on that needs to be fixed.
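The flush-versus-close behaviour behind these test failures is easy to reproduce with any buffered stream: until the buffer is pushed out, the on-disk size (and timestamp) lag behind the writes. A Python illustration (not the mailnews code):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "wb", buffering=4096)   # buffered writer, like the patched streams
f.write(b"x" * 100)                    # 100 bytes sit in the 4 KB buffer
size_before = os.path.getsize(path)    # the file on disk is still empty
f.flush()
size_after = os.path.getsize(path)     # now the bytes have reached the file
f.close()
os.remove(path)

print(size_before, size_after)         # 0 100
```

Any test that checks file size or mtime right after a write, before the stream is flushed or closed, will see stale values — which matches the failures described above.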
Attached patch patch that passes unit tests (obsolete) — Splinter Review
Neil/Standard8, this patch passes the xpcshell tests for me. Neil, could you have a look at the changes I've made?
Attachment #436801 - Flags: superreview?(bugzilla)
Attachment #436801 - Flags: review?(neil)
(In reply to comment #71)
> patch that passes unit tests

The 4 KB buffer size looks hard-coded. Are we not able to customize the buffer size for writing mail file data? (The 4 KB read buffer seems hard-coded too, because reads are always done in 4 KB chunks.)

> +nsresult MsgNewBufferedFileOutputStream(nsIOutputStream **aResult,
>(snip)
> +  if (NS_SUCCEEDED(rv))
> +    rv = NS_NewBufferedOutputStream(aResult, stream, FOUR_K);

> +// Automatically creates an output stream with a 4K buffer
> +NS_MSG_BASE nsresult MsgNewBufferedFileOutputStream(nsIOutputStream (snip)
Comment on attachment 436801 [details] [diff] [review]
patch that passes unit tests

>@@ -2604,16 +2604,13 @@ NS_IMETHODIMP nsMsgLocalMailFolder::EndC
>     // need to reset this in case we're move/copying multiple msgs.
>     mCopyState->m_fromLineSeen = PR_FALSE;
> 
>-    // flush the copied message. Seeking causes a flush, w/o syncing. We need
>-    // a flush+sync at the end to get the file size and time updated correctly.
>+    // flush the copied message. We need a close at the end to get the
>+    // file size and time updated correctly.
>     if (mCopyState->m_fileStream)
>     {
>       // we need this for the m_dummyEnvelopeNeeded code below.
>       seekableStream = do_QueryInterface(mCopyState->m_fileStream);
>-      if (multipleCopiesFinished)
>-        mCopyState->m_fileStream->Flush();
>-      else
>-        seekableStream->Seek(nsISeekableStream::NS_SEEK_CUR, 0);
>+      mCopyState->m_fileStream->Flush();
>     }
>   }
>   //Copy the header to the new database
>@@ -2736,6 +2733,11 @@ NS_IMETHODIMP nsMsgLocalMailFolder::EndC
>   }
>   else
>   {
>+    // Need to close the output stream before we send notifications so that
>+    // the dest folder file is closed and synced.
>+    if ((multipleCopiesFinished || !mCopyState->m_copyingMultipleMessages) &&
>+        mCopyState->m_fileStream)
>+      mCopyState->m_fileStream->Close();
>     // If we have some headers, then there is a source, so notify itemMoveCopyCompleted.
>     // If we don't have any headers already, (eg save as draft, send) then notify itemAdded.
>     // This notification is done after the messages are deleted, so that saving a new draft
The other two Seek/Flush changes look reasonable to me, but this one doesn't look quite right; as far as I understand the Flush in the first hunk should change to Close (and the Seek to Flush as before).
The problem is that if we're writing out the dummy envelope, we need the file stream left open after the code in the first hunk runs. I had originally done as you suggested (if I understand your suggestion), and ran into a hang because we tried to use the closed file stream later on...
(In reply to comment #74)
> The problem is that if we're writing out the dummy envelope, we need the file
> stream left open after the code in the first hunk runs. I had originally done
> as you suggested (if I understand your suggestion), and ran into a hang because
> we tried to use the closed file stream later on...
Ah yes, I see that the envelope code lives between the two changes in the patch. I don't suppose it could be moved earlier in the patch? It's not a problem if it can't, but then I'm confused as to why the test to close the stream
>+    if ((multipleCopiesFinished || !mCopyState->m_copyingMultipleMessages) &&
>+        mCopyState->m_fileStream)
is not the same as the test to "Flush" the stream earlier.
>     if (mCopyState->m_fileStream)
>     {
>       // we need this for the m_dummyEnvelopeNeeded code below.
>       seekableStream = do_QueryInterface(mCopyState->m_fileStream);
>-      if (multipleCopiesFinished)
Kent, this is the bug+patch I was referring to that brings out issues in the test having to do with flushing&syncing disk files.
I don't know if we'll be able to get this change into Gecko, but it could be helpful. With this change, we don't need to change as much of the mailnews code, because Flush goes back to doing what it used to do before buffered streams.
Attachment #437020 - Flags: feedback?(neil)
Attached patch address Neil's comments (obsolete) — Splinter Review
This also passes the unit tests. I moved the dummy envelope code to before the close/seek, and things still seem OK. I restored the Seek code because buffered streams do flush themselves (though not the underlying stream) when you seek 0 from cur offset, and that code is complicated enough that I wanted to reduce the changes. I did leave in a few of the other more straightforward changes from Seek to Flush, mainly because the Seek idiom seems a bit hacky, and it doesn't matter with buffered streams.

If we could fix buffered streams to flush the underlying stream on Flush, we could make this diff even smaller...
Attachment #436801 - Attachment is obsolete: true
Attachment #437067 - Flags: superreview?(bugzilla)
Attachment #437067 - Flags: review?(neil)
Attachment #436801 - Flags: superreview?(bugzilla)
Attachment #436801 - Flags: review?(neil)
Comment on attachment 437020 [details] [diff] [review]
make buffered output stream flush underlying stream

You'll need to think about what you want to do, if anything, about the return value from the Sink's Flush.
Attachment #437020 - Flags: feedback?(neil) → feedback+
(In reply to comment #78)
> I restored the Seek code because buffered streams do flush themselves
> (though not the underlying stream) when you seek 0 from cur offset
I don't think that's true; there's an early return at line 187.
Attachment #437067 - Attachment is private: false
Comment on attachment 437067 [details] [diff] [review]
address Neil's comments

r=me with all the Seek()s changed back to Flush() again as per a previous iteration of the patch.
Attachment #437067 - Flags: review?(neil) → review+
Neil, re-requesting review because there was only one Seek that needed to be changed back to a Flush, if I'm understanding correctly.
Attachment #437067 - Attachment is obsolete: true
Attachment #437330 - Flags: superreview?
Attachment #437330 - Flags: review?(neil)
Attachment #437067 - Flags: superreview?(bugzilla)
I need to request a try server build for folks to try. I'll do that this afternoon.
Attached patch oops, wrong patch (obsolete) — Splinter Review
Attachment #437330 - Attachment is obsolete: true
Attachment #437397 - Flags: superreview?(bugzilla)
Attachment #437397 - Flags: review?(neil)
Attachment #437330 - Flags: superreview?
Attachment #437330 - Flags: review?(neil)
Comment on attachment 437397 [details] [diff] [review]
oops, wrong patch

Bugzilla interdiff lies, but I think that's what I meant, yes.
Attachment #437397 - Flags: review?(neil) → review+
Attached patch fix mac bustageSplinter Review
try server build showed a mac build bustage, which I've tried to fix.
Attachment #437397 - Attachment is obsolete: true
Attachment #437466 - Flags: superreview?(bugzilla)
Attachment #437397 - Flags: superreview?(bugzilla)
try server builds for folks who want to see if it fixes network profile dir issues are here - http://tinderbox.mozilla.org/showbuilds.cgi?tree=ThunderbirdTry
Attachment #427105 - Attachment is obsolete: true
Attachment #427105 - Flags: superreview?(bienvenu)
Testing st8-ldap-debug-191, I had the feeling that the issue is not solved ("s l o w"), just as with 3.0. So I made some quick measurements.

Mail with a 1.5 Mo attachment file:
* Open with 3.0: ~3s
* Open with st8-ldap-debug-191: ~3s

Mail with a 10 Mo attachment file:
* Open with 3.0: ~3s
* Open with st8-ldap-debug-191: ~3s
* Move with 3.0: ~55s
* Move with st8-ldap-debug-191: ~55s

Mail with 500 ko plain text
* Open with 3.0: ~1s
* Move with 3.0: ~4s
* Open with st8-ldap-debug-191: ~1s
* Move with st8-ldap-debug-191: ~4s
Sorry for the wrong build.

With Bienvenu's build (after some global reindexing), it is far better for moving mails - but not for "experience" (mail loading on display).

Mail with a 1.5 Mo attachment file:
* Open: ~3s

Mail with a 10 Mo attachment file:
* Open: ~3s
* Move: ~7s (*instead of 55s*)

Mail with 500 ko plain text
* Open: ~1s
* Move: not tested
A 4KB buffer was used in writing to the local mail folder file.
However, many writes with a very small record size (4 bytes) were seen for writes to \global-messages-db.sqlite-journal.
"One write request per line" was observed for pluginreg.dat.
(In reply to comment #91)
> A 4KB buffer was used in writing to the local mail folder file.
> However, many writes with a very small record size (4 bytes) were seen for
> writes to \global-messages-db.sqlite-journal.

(In reply to comment #92)
> "One write request per line" was observed for pluginreg.dat.

WADA: Unless these are definitive regressions from 2.0 (and the first of these isn't, as it's a new file), I suggest we keep them as separate bugs/issues if they are really an issue.
(In reply to comment #90)
Sylvain, thx for testing this. Were you using a profile on a networked drive?

 
> With Bienvenu's build (after some global reindexing)

trunk builds require a new gloda index, and the patch was applied to a trunk build.

> Mail with a 1.5 Mo attachment file:
> * Open: ~3s
Is that an inline attachment or not?
> 
> Mail with a 10 Mo attachment file:
> * Open: ~3s
> * Move: ~7s (*instead of 55s*)
The same time for a 1.5 Mo attachment as a 10Mo attachment? If those were the same kind of attachments, that implies most of the time is not spent loading the message from disk...
(In reply to comment #93)
> WADA: Unless these are definitive regressions from 2.0 (and the first of these
> isn't as its a new file), I suggest we keep them to separate bugs/issues if
> they are really an issue.

I don't know whether the phenomenon of "many write requests with a small or very small record size for files other than the local mail folder file" is one of the causes of this bug's original problem ("slowness with a remote profile"). I don't know whether the phenomenon is a regression from Tb 2 or not.
I merely reported the following:
 - A 4KB buffer was used in writes to the local mail folder file by David's change.
 - Write requests with a very small record size (4 bytes) were seen in the log.
   They are for the journal of the SQLite DB: (write of 4 bytes + write of 1024 bytes)*N
   is seen in the log at contiguous positions.
   (I/O on the SQLite file cannot be a regression from Tb 2.)
 - Write requests of 1024 bytes at contiguous positions are seen for the SQLite DB.
   (I/O on the SQLite file cannot be a regression from Tb 2.)
 - "One write request per line" for some files is seen in the log.
 
Mark Banner, please note that the following part of this bug's summary is text I added, because I found "one write request per line for the local mail folder file with Tb 3.0" (apparently a regression from Tb 2) in a Process Monitor log.
> Tb3 requests write for each line of mail data. Should use buffering.
It was never part of the original report of this bug, which is about "slowness with a remote profile".
Are you sure that "one write request per line for the local mail folder file in Tb 3.0" is the only cause of this bug's original problem?
The lack of output buffering is a known regression (as per comment 32) that we need to fix before we can make further performance comparisons.
(In reply to comment #96)

Is "one write request per line for pluginreg.dat (and some other files)" the same regression in Tb 3.0 as "one write request per line for the local mail folder file"? If so, do the developers plan to resolve the issue of "lack of output buffering for all files" in this bug? Or would it be better to open separate bug(s), as Mark Banner suggested to me (keeping this bug only for the issue of "lack of output buffering for the local mail folder file")?

Should I open separate bug(s) for the hard-to-understand I/O behaviour on the SQLite DB journal, and for the inefficient-looking I/O behavior on the SQLite DB (it appears to use a 1KB buffer, which I think is too small)?
yes, separate bugs for separate issues is best. pluginreg.dat is core code, and never used filespec, so it's definitely a different issue. sqlite never used filespec either...
(In reply to comment #97)
> Is "one write request per line for pluginreg.dat (and some other files)" the
> same regression in Tb 3.0 as "one write request per line for the local mail
> folder file"?
No; if it is a regression then it's a core regression, and needs a core bug.

> If so, do the developers plan to resolve the issue of "lack of output
> buffering for all files" in this bug? Or would it be better to open separate
> bug(s), as Mark Banner suggested to me (keeping this bug only for the issue
> of "lack of output buffering for the local mail folder file")?
Even if we find other mail regressions it's better to open separate bug(s).

> Should I open separate bug(s) for the hard-to-understand I/O behaviour on the
> SQLite DB journal, and for the inefficient-looking I/O behavior on the SQLite
> DB (it appears to use a 1KB buffer, which I think is too small)?
SQLite is actually an imported library so any issues would have to go upstream.
(In reply to comment #99)
> > Should I open separate bug(s) for the hard-to-understand I/O behaviour on the
> > SQLite DB journal, and for the inefficient-looking I/O behavior on the SQLite
> > DB (it appears to use a 1KB buffer, which I think is too small)?
> SQLite is actually an imported library so any issues would have to go upstream.

The 1k writes are because gloda uses a 1k page size; that can be changed. A new bug should be logged in 'MailNews / Database' if the suggestion is to increase the size.
Mark Banner, David:Bienvenu, neil@parkwaycc.co.uk, Andrew Sutherland:

Thanks for the explanation and guidance. I can now distinguish the following:
  a. one write per line for the mail folder file
  b. one write per line for pluginreg.dat (and some others)
  c. I/O on the SQLite DB file
I'll check whether b. is a regression or not, and I'll open a bug for c. if I can find a performance issue like this bug's caused by the 1KB page size.
(In reply to comment #94)
> Sylvain, thx for testing this. Were you using a profile on a networked drive?

Yes. A network share on Samba 3 on a 100Mbit LAN, accessed from Windows Vista SP2.

> > Mail with a 1.5 Mo attachment file:
> > * Open: ~3s
> Is that an inline attachment or not?

After checking, it is not inline: "Content-Disposition: attachment;".

> > Mail with a 10 Mo attachment file:
> > * Open: ~3s
> > * Move: ~7s (*instead of 55s*)
> The same time for a 1.5 Mo attachment as a 10Mo attachment? If those were the
> same kind of attachments, that implies most of the time is not spent loading
> the message from disk...

This one is not inline either. I confirm the time spent loading is similar for both.
Use of a 4KB buffer in write operations by the try server build is confirmed for the following file writes:
1. Save As .eml (save.eml)
2. Save As text (save.txt)
3. mail attachment in draft save/send later/send (...\Temp\nsemail.eml)
4. image attachment in draft save/send later/send (...\Temp\nscopy.tmp)
5. draft save/send later (mail folder file of "Drafts" and "Unsent Messages")
6. work file of folder compaction (...\<mailfolder>.sbd\nstmp)
7. download to IMAP offline-store file

Note:
A 16KB buffer appears to be used when writing to the Disk Cache for mail data of IMAP folders with offline-use=off. Why is a different buffer size used for the local folder file/offline store (4KB) and for the Disk Cache for IMAP mail data (16KB)?
I have been reading this thread for some time, and I am still asking myself: why not write to disk in bigger chunks?
It is always faster to use a 16KB buffer instead of a 4KB one, and there is plenty of RAM to use.

Did sticking to small buffers give any benefits?
(In reply to comment #103)
> What is reason why different buffer size is used for local
> folderfile/offline-store(4KB) and for Disk Cache for IMAP mail data(16KB)?
The Disk Cache is part of Necko rather than MailNews Core. It may well be a good idea to copy Necko's buffer size, but I'll let bienvenu make that call.
Comment on attachment 437466 [details] [diff] [review]
fix mac bustage

sr=Standard8.

Given where we are with soon to be releasing b2, I think we should stick with 4k for now as that's what we know TB 2 used and hence performance will be restored to a similar level.

If we feel we want to investigate a larger buffer size and consider the implications of that, then I think we should look at that in a separate bug.
Attachment #437466 - Flags: superreview?(bugzilla) → superreview+
(In reply to comment #105 and comment #106)

I've opened Bug 558528 for larger buffer size.
Before I go open a new bug for a problem we have been having on our fairly small (60+) network of TB installs, I'd like to ask the devs if they think this bug may be responsible...

We only allow IMAP access to email, and these problems are occurring on our local LAN, but the profiles are all locally stored. I'm thinking this could be related to the much less efficient network communication discovered in this bug, but don't know if this bug affects the communication with the IMAP server too.

I have been getting random complaints about - and have seen this myself - TB not fully loading an email when it is clicked on, and just getting the never ending progress indicator on the status bar. I've also seen problems where this indicator just keeps going and going on certain messages, even though the message is already loaded. This seems to only happen on HTML messages, but it is very inconsistent, and fully restarting TB generally fixes it for a while.

I've also seen occasional serious problems with certain messages with attachments, where the message will load maybe once, and attachments open fine, then all of a sudden the message size drops to 1K, and the attachments say they are corrupted when the user tries to open them. Rebuilding the index will usually fix this temporarily, but I just now had to resort to deleting both the .msf file and the local folder file, to get the message to redownload completely, and then I could open the attachments and save them to the desktop - and then the message dropped to 1K again and the attachments wouldn't open. I did it again, and it stayed fixed for a few minutes, but it wouldn't surprise me if it whacks again...

Comments? I do have one message saved as an attachment that has this problem, but I just realized it has confidential info in it (one of the attachments is a signed credit app)... I'll try to get at least one or two sample messages that exhibit the problem relatively consistently that I can upload if need be...
No, this bug has nothing to do with those symptoms. Those bugs sound like corruption of the offline store, but I thought you turned off the offline store for your users...
(In reply to comment #109)
> No, this bug has nothing to do with those symptoms. Those bugs sound like
> corruption of the offline store, but I thought you turned off the offline
> store for your users...

Sort of... I do disable the 'Keep messages for this account on this computer' option, but not before making sure all folders are selected for offline use - which is *supposed* to result in 'on-demand offline storage' (the discussion of this behavior occurred in another bug that I think you were involved in). There is definitely one (or more) bug(s) causing the issues I just described. It isn't rampant or anything, but I get enough complaints to know that there is something going on.

And like I said, I see myself the never ending blue progress bar on the status bar sometimes.

One other symptom - whenever this happens, closing TB will *not* fully quit TB - task manager shows TB still running, and yes, I've waited minutes to see if it would close. Reopening TB without killing the process results in the same symptoms continuing to occur. The only way to make them stop is to kill the process - or use the File > Restart Thunderbird menu choice (all have the MR Tech Toolkit extension installed).

anyway, thanks for confirming that this bug (or bugs) aren't related to this one. Now I've just got to try to nail it down enough to be at least semi-reproducible...
blocking-thunderbird3.1: rc1+ → beta2+
Whiteboard: [gs] → [gs][bienvenu to land today]
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [gs][bienvenu to land today] → [gs]
I've notified the users in http://gsfn.us/t/nw16 that they can test the 3.1 builds. Hopefully we'll get good reports :)
In my case with nightly builds the slowness seems to be fixed  :-)
Assignee: neil → bienvenu
David - this is your fix with a lot of the small file or low frequency i/o changes removed, so we still get the win of the writing folders, but without all the risk. Thoughts on it for 3.0?
Attachment #443896 - Flags: review?(bienvenu)
I think the riskier parts are the parts in this patch. I'm inclined to have those users try 3.1 b2 instead.
Attachment #443896 - Flags: review?(bienvenu) → review+
(In reply to comment #114)
> Created an attachment (id=443896) [details]
> Possible reduced fix for 3.0

Checked in: http://hg.mozilla.org/releases/comm-1.9.1/rev/afd00a357640

This will be in tomorrow's nightly builds for 3.0.5pre.

(In reply to comment #115)
> I think the riskier parts are the parts in this patch. I'm inclined to have
> those users try 3.1 b2 instead.

I'm planning a slightly longer beta/preview cycle for the 3.0.5 release, so hopefully we'll catch any issues there (plus beta 2 is already out).
blocking-thunderbird3.0: ? → -
(In reply to comment #94)
> The same time for a 1.5 Mo attachment as a 10Mo attachment? If those were the
> same kind of attachments, that implies most of the time is not spent loading
> the message from disk...

FYI, this slowness on mail loading is gone using Bienvenu's 28/04 build for bug #545126 - so I was affected by both bugs. Thank you all.
As this is now fixed in v3.1 and v3.0.5 per numerous reports, I have closed http://gsfn.us/t/nw16 as solved
Whiteboard: [gs] → [gs][gssolved]
As bug 539389 is considered fixed, and I encounter a similar bug with an account located on a Windows shared drive with TB 3.1.2, I have opened bug 589798.