10.96 KB, image/gif
1.27 KB, patch
|Details | Diff | Splinter Review|
2.46 KB, patch
Scott MacGregor: superreview+
Judson Valeski: approval+
|Details | Diff | Splinter Review|
Solaris 2.8 with recommended PatchCluster 1 Problem observed since 0.9.9, continues to rc1: When a bunch of mail (say, >5) is drawn from pop-server, mail gets in very slowly (about 10 and more seconds for every mail, regardless the length). During the download, high network activity and collisions are observed. nfsstat shows many readdirs. The browser is completely blocked until download has finished. Mail download was fast in 0.9.8 (but ran without PatchCluster 1). This should be resolved for final Mozilla 1.0.
Sorry, platform is Sun Sparc, not PC.
cc'ing one of the sun guys.
Hi, grabow, I tested on SunOS pingpong 5.9 s81_51 sun4u sparc SUNW,Sun-Blade-100 and didn't see what you said. I also tested on Redhat Linux 7.1. Can you give out more clues about your enviroment, your operations to reproduce this bug? You may test some very simple mails, then test some very complex mails and then test the mix of them, see what happens. Thanks. Henry
Take a look at Bug 140893. Seems to have the same problem under MacOS 8.6 and OpenVMS.
*** Bug 140893 has been marked as a duplicate of this bug. ***
Downloading 508 pop3 msgs on Mac 9.2 takes about 90 seconds which is 1.5 times slower than linux/win2k. I am wondering if it has always been like this before also. cc stephend who does perf, but I don't think we measure pop3 msg download speed.
Created attachment 81979 [details] POP retrieving performance (MacOS 8.6) Comm4.7/Moz1.0rc2 The same mail msg (12MB) was used. Moz seems to stall every 5 secs and the download throughput never go over 47k/s when Comm4.7 gets constant throughput of ~275k/s. Under MacOS 8.6 with Mozilla/5.0 (Macintosh; U; PPC; en-US; rv:1.0rc2) Gecko/20020501. I use ipop3d as server under LinuxPPC.
Bug 140893 is a duplicate of this bug. To resolve that bug, it was suggested to me to increase the send buffer size of the POP server. I did so, and set it to 65536 byte (64kB). The next morning I had about 10 mail messages of just a little less then 1 MB a piece. Retrieving these messages went fast !! Can it be that there is a kind of handshaking problem over the buffersize to use between Mozilla and the POP server ? (or another handshaking problem ?)
I tried with other POP servers (as Sympatico.ca) on large emails and I got the same result: Stall and low throughput. Change the send buffer size can be a workaround... but we have clearly a problem with Moz here.
Hi Henry (comment #3), here's my environment: Sun Ultra 5 Sparc, POP server on an intranet over 10Mbit Ethernet, /home directory NFS mounted over that net. As a first workaround I moved the cache to local /tmp. This helped a bit against total stall, but does by far not reach the short download times we had prior to 0.9.9 and are not comparative to any other browser. The browser itself is irresponsive during the download of one large mail (say, a few 10th kb and beyond) and gets a short time slice to recover while turning to the next mail to download. Especially interesting is the huge amount of collisions, which indicate that there's a lot of interaction underway. A shortened receiving buffer could trigger this problem. Something essential on the communication must have been changed since 0.9.8. I also saw one guy reporting the same problem with an IMAP server on Mac OS last week (but cannot find the bug number...), so this may not be POP specific, but some problem introduced on an upper or lower communication layer.
As mentioned before I reported this same problem in bug 140893. My configuration on OpenVMS 7.3 is very simple. I'm running SMTP, POP and Mozilla all on my workstation. Mozilla's mail accounts point to Local Host (127.0.0.1), so no actual network traffic takes place. It is all done in memory. The MTU for this pseudo interface is 4096 byte. But of course the TCPIP stack is used by POP and Mozilla, so we are talking about real IP traffic ! Furthermore there is hardly any 'real' network traffic. The CPU is running at about 1%, and the buffered IO rate (includes network) is 3 - 5 IO's per second on a scale of 500. In short, this system is doing absolutely nothing most of the time. :-) Despite all that I had very poor mail retrieval speed before I changed the send buffer of POP to 64kB. I think we can conclude from this that there is a serious problem somewhere in Mozilla Mail (and News ??).
I'll look into it more. thanks for good comments that might be helpful to track the problem.
I am seeing slowdown in recent builds on mac. It is ok in 0.9.9 release. It would be nice to narrow down when this started happening. Any help would be greatly appreciated.
For now I can say that it had begun before 2002042505 Branch... build that reproduces the exact behavior shown in my previous attachment. A bug prevent me to test older builds in the nightly directory under MacOS 8.6.
This started happening right after I checked in the fix for bug 122361 - 04/05. I looked at the patch that was checked in and found out m_outFileStream->seek(PR_SEEK_END, 0) is taking a long time on mac, if I comment them (2 instances) out I think we get back to before the fix. If I do check for end-of-file before seeking it still improves performance but not as much. I wonder why it affects only mac and other platforms, doesn't affect linux and win32 (as I tested earlier) conrad, do you know what could be the problem here.
I'll take a look.
Since the problem also exists with OpenVMS, may I point you to the effect that increasing the pop send buffer size had in my setup ? I'm not suggesting that this is the solution to the problem (it clearly isn't), but if we can figure out why this change makes the retrieval so much faster, it may lead us to the cause of the problem.
Back to my original Sun problem, I noted by accidence that downloading is as fast as it has been before when starting Mozilla on the machine where the Mail directory lives. This again seems to hint to a buffer problem, here maybe something related to the NFS buffers; see also comment #8.
Looked at seek(). It's incredibly expensive. http://lxr.mozilla.org/mozilla/source/xpcom/io/nsIFileStream.cpp#330 1. Since it does a Flush() for each call, it hits the disk whenever there is anything in the buffer (always after the first time). 2. PR_Available() internally does 2 PR_Seeks so we have a total of 4 PR_Seeks for this operation. 3. No checks are done for seeking to the current position so, in this case, all this work is for naught. 4. On the lowest Mac-specific level, PR_Seek has a similar lack of optimization. Unless both FileImpl::Seek and PR_Seek can be significantly improved, avoid the call to seek() is possible.
conrad, Thanks for seek() info. how costly is checking eof()? I will have to do that atleast in one place to avoid seeking every time.
Created attachment 83122 [details] [diff] [review] possible fix 1 This fix helps us to gain back some performance loss but it still does not get back to 0.9.9.
Created attachment 83371 [details] [diff] [review] proposed fix This will get back us to the performance as it was before fixing bug 122361. We now seek only when we have to -> While downloading new pop3 mail, if the header flags are changed, then only it causes a seek because we are sharing folder stream between pop3 code and mailDatabase code, otherwise not.
David, can you review the last patch(proposed fix). This has to do with sharing folder stream between mailDatabase and pop3 code. This was mainly done to avoid checking for and/or seeking un-necessarily >Unless both FileImpl::Seek and PR_Seek can be significantly improved, avoid the >call to seek() is possible.
It looks like you're seeking to the end of the file whenever we update the folder flag if m_folderStream is set. That seems to be based on some sort of very odd and hitherto unknown contract - that if m_folderStream is set, we must be getting new pop3 mail. Is that the case? Otherwise, we'd be slowing down other operations by always seeking to the end of the file when updating flags. For example, when deleting a bunch of messages, it looks to me like now we'll seek to the end of the file after updating each message, which will slow that down greatly.
m_ownFolderStream is false only when we are sharing folder stream between pop3 and mailDatabase and this happens only when we are downloading msgs. So it will not slowdown deletion of messages or reading msgs (updating flags) unless you are doing this while you are downloading msgs.
Is that true? I thought we owned the folder stream if we created it, not if we're not doing get new mail. For example, if I move a bunch of messages from my inbox to another folder, and while that's going on, I press get new mail, won't it be the case that the db owns the folder stream and pop3 will share it?
Comment #25 is what we have right now. I agree it is like a contract but I didn't see any other way to restore the performance back to what it was originally. In your example we don't need to share any folder stream because we just do batching when we are actually deleting msgs, so we cannot download msgs if we are deleting msgs from Inbox. I mean there is no scope of sharing folder stream the other way. BTW, I don't see you on AIM.
I am comfortable with possible fix 1 also. we can have another bug to fix seek on mac to restore the performance fully.
I'd much prefer the first fix, iff checking for eof wasn't also slow (I don't know if it is or not - we'd need to ask) - also, in the first fix, are you sure you can remove the seek to the end after publish msg hdr? That does't seem right. And the real right fix is to make seeking to the end cheap when we're already at the end. I don't know if that's hard or not in the file stream or NSPR code. If it was a simple matter of checking eof, I would have thought they would have done it. fix 2 seems fragile and just asking for other problems. We're shuffling around the file ptr to make the current caller and code happy but could cause lots of other problems down the road. Another possibility is to use Tell to make sure the file pos hasn't changed unexpectedly.
possible fix 1 does the right thing, we don't need to seek to end after publishing msg header, because all the writing to mailbox takes place in WriteLineToMailbox and we just have to make sure we are at the EOF there.
Acutally checking for eof is also slow, that is what I found with my tests, but it is better than what we have now.
It seems like checking eof shouldn't be nearly as bad as what's happening now, and I think your tests showed that, right? And the removing of the seek after the publishmsghdr code is really a separate change in its own right, isn't it? It could be removed by itself; it doesn't depend on the other changes in either of these patches? That seems true to me, and should help a little. I'll put an r= on the first fix, and if we want to explore better fixes, we can keep going.
Comment on attachment 83122 [details] [diff] [review] possible fix 1 r=bienvenu
IMO I don't think that a half-temporary-fix is the way to go. I receive and send lot of big emails and I still use Comm4.7 for my mail because Moz isn't enought efficient on POP and SMTP transferts (bug 88489)...
Created attachment 83761 [details] [diff] [review] proposed fix, v2 So this patch gets rid of the contract I had in the last patch. We save the offset of m_folderStream before using in mailDB code and restore it back after using it (if we are not the owner of m_folderStream). This is good because we will not lose any performance and this is something general in the sense that if we are not the owner of the folderStream we should not tinker with state, and if we do we should restore it back. david, can you review ? thx.
Comment on attachment 83761 [details] [diff] [review] proposed fix, v2 I like this approach better. However, do we need to have methods and a member variable to save and restore the folder stream pos? Couldn't this just be handled with a local variable, since it's all in one synchronous method (UpdateFolderFlag()?) At least I think that's the case, w/o seeing all the context from the diff. I'm also not 100% sure that "owning the folder stream" is equivalent to "no one else is using the folder stream". Perhaps having a member variable that says "sharing folder stream" would be more accurate. Shared folder streams would require this seeking...
Created attachment 83777 [details] [diff] [review] patch with changes, making it local var I was just trying to make it generic, missed out that both the calls were in the same function. I have changed it to local var. please review.
Comment on attachment 83777 [details] [diff] [review] patch with changes, making it local var r=bienvenu - I'm not convinced that there aren't problems where the db is in batching mode, and thus has a stream, while a get new mail is done, but you're fairly confident that's not the case. I would suggest, however, putting an assertion in nsMailDatabase::SetFolderStream() that m_folderStream is null whenver this is called, and verifying that doing a get new mail while moving lots of messages from the inbox doesn't trigger the assertion.Does that make sense?
I added this assertion to SetFolderStream NS_ASSERTION(!m_folderStream || !aFileStream, "m_folderStream is not null and we are assigning a non null stream to it"); Both should be non-null to assert and verified we are not hitting this assertion while we are moving mail and do GetMsg()
Comment on attachment 83777 [details] [diff] [review] patch with changes, making it local var sr=mscott
This have added a new warning on brad TBox: +mailnews/db/msgdb/src/nsMailDatabase.cpp:335 + `PRInt32 folderStreamPos' might be used uninitialized in this function
ok, I will fix it.
Sheelar, could you verify this on the trunk? Navin, what areas of the product need to be tested around to make sure this doesn't break anything?
This fix is dependent on the fix in bug 135275. If we want to take this we have to take that.
Here are the results of download timings before and after patch on just Mac OS 9.1: Trunk build:2002-05-16-03 1.Create new profile with pop account 2.Change accounts and settings to leave mssgs on the server 3.Start download of 97 messages of 1MB size for the account Time taken: 27 seconds 4.Compose and send a message with an attachment of 9.3MB size Time taken to download this single message with a large attachment:8min 30sec Trunk build: 2002-05-21-03 after the patch on commmercial trunk builds. 1.Create new profile with pop account 2.Change accounts and settings to leave mssgs on the server 3.Start download of 97 messages of 1MB size for the account Time taken: 5 seconds 4.Compose and send a message with an attachment of 9.3MB size Time taken to download this single message with a large attachment:8min 30sec As per the results above it is very clear that there is imporved performance with the patch.
Branch landing scheduled? On truck it works great.
marking this bug as verified since it has been verified on the trunk as it is so far fixed on the commercial trunk.
adt1.0.1+ (on ADT's behalf) for checkin to the 1.0 branch, pending Driver's approval.
please checkin to the 1.0.1 branch ASAP. once there, remove the mozilla1.0.1+ keyword and add the fixed1.0.1 keyword.
this will take me a while since the last patch doesn't apply since it's not really one diff.
fix checked into 1.01 branch
verified the time taken to download 39.1MB pop inbox before and after fix. 05-16 trunk build on mac os9.1 =18 minutes 06-12-08 1.0 branch build on mac os 9.1 = 1 min 11 seconds. changing keyword to verified1.0.1
I just reported bug 231814, and the longer I think about it, the more I feel that this problem here has returned in 1.6. Can somebody check if the code that was fixed here has been changed again ?
yes, this fix was definitely removed in order to fix mailbox corruption problems. If we don't seek to the end, we can end up in situations where we write mail into the middle of the file instead of the end...I'll see if there's some other way to avoid this...