Closed Bug 102797 Opened 23 years ago Closed 23 years ago

Performance degradation downloading IMAP headers OS X

Categories

(MailNews Core :: Networking: IMAP, defect, P2)

PowerPC
macOS

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: hong_bugmail, Assigned: sfraser_bugs)

References

Details

(Keywords: perf, Whiteboard: [adt2 RTM] [driver:scc] [ETA 08/08])

Attachments

(2 files)

Downloading mail headers is much slower post the NSPR fix for dual-CPU machines (http://bugzilla.mozilla.org/show_bug.cgi?id=71718). Some results:

- ~30 seconds to download 2200 msg headers on a debug build from approximately 2 weeks ago (circa 9/15), running on G4/733
- 5 minutes to download 2200 msg headers on 10/01 trunk build, running on G4/733
- 5 minutes to download 3400 msg headers on 10/02 commercial branch build, running on dual G4/500

Steps to reproduce:
1) launch browser
2) create new mail account (with IMAP inbox > 2000 msgs)
3) get msgs

Expected Results:
Should take about 30 seconds to discover folders and download all headers.

Actual Results:
Takes about 5 minutes to discover folders and download all headers.
Keywords: perf
QA Contact: huang → stephend
stephen, what are your findings? Can you attach your mail perf results from the other bug to this bug report? cc: scott as a fyi.
Odd. It took me about 40 seconds to pull down a 6000 msg IMAP folder for the first time, on my 500MHz TiBook. 5 minutes seems extreme.
Here are my results. Note that Mail Module Load Time uses a 1000-message IMAP folder, whereas Folder Loading Speed, where we want to measure only the APIs and not the network/IMAP code, uses a local folder. http://bugzilla.mozilla.org/attachment.cgi?id=51005&action=view
That same 6000 msg folder took 23 seconds on a dual-533.
Odd... I just tested another inbox with 8200 msgs and it took about a minute. So something else is up here. I'll do more investigation.
Ok, after checking with lrg and sdagley, our theory is that there is interaction with 4.x at work here. It seems that at the time both lrg and sdagley saw the slow downloading of headers (10x slower), they also had an instance of Communicator that was open to their Inbox and presumably had open IMAP connections. Closing Communicator caused their msg downloads to speed up back to normal levels. Simon, do you run with Communicator open at the same time? This isn't immediately reproducible just by opening Communicator, unfortunately -- it appears that you have to be using Communicator for mail for a long time before this shows up. We're trying to figure out whether this change occurred AFTER Simon's checkin or before. In any case, reducing the severity of this bug to normal -- it shouldn't affect too many people.
Severity: major → normal
It would not surprise me if IMAP had some kind of folder locking that will impact performance if > 1 client is looking at a folder at the same time. I was not testing this scenario.
Status: NEW → ASSIGNED
Our clients have no idea if another client is looking at the same folder. The server knows, of course, but I don't think it really cares very much. We do do occasional expunges if you delete lots of messages, and that can cause the server to send unsolicited responses to the client, but that's a pretty borderline case. I don't believe 4.x and Mozilla could have any interaction, especially since their profile directories are different.
Assigning to random milestone.
Target Milestone: --- → mozilla1.2alpha
We have been seeing this problem on our MacOS X systems and have done some testing. Both cases are using UW IMAP on a Tru64 UNIX cluster and are downloading 1,147 headers:

Netscape 4.78    32 seconds
Mozilla 1.1a+    4 minutes 51 seconds

I have checked several times and the results are consistent. In looking through the IMAP logs I see a distinct difference between the Netscape interaction and the Mozilla interaction. In both cases, there is a UID FETCH of a list of messages followed by an OK from the server. In the Netscape case, one second later the client issues another UID FETCH of the next list of messages. In the Mozilla case, the next UID FETCH doesn't occur for 50 seconds! In each case, I made sure that only one copy of the imapd process was running.

I suspect that this is strongly related to bug 139802, and may be the same problem cropping up in a different circumstance -- in this case giving up the CPU too frequently during the DBM writes after loading the batch of message headers.
does the fix for bug 139802 have any effect on this problem?
No.
I looked at this a bit today. The problem is similar to that described in bug 139802, but is not improved (much) by that fix. We're basically spending time in WaitNextEvent, starving the IMAP threads of CPU time. For some reason, the WakeUpProcess in _pl_NativeNotify(), and the WakeUpProcess() in the Timer task, don't fire often enough to avoid WaitNextEvent idling. I do note that _pl_NativeNotify() is only called for the first event that's put on the event queue after previous events have been cleared out; perhaps IMAP is tardy about handling those events?
We have rerun our tests against today's build, which includes the fix for bug 139802, and there has been some progress! Two runs (to make sure) to download 1153 headers:

Run #1: 2:34
Run #2: 2:44

This is to be compared to 4:51 before the fix to bug 139802 landed :-) and to 32 seconds for Netscape 4.78 :-(

Definite progress! Room for lots more!
This performance issue seems to center around the 50ms timeout in nsImapProtocol::CreateNewLineFromSocket():

    PR_Wait(m_dataAvailableMonitor,
            /* PR_INTERVAL_NO_TIMEOUT */ PR_MillisecondsToInterval(50));

There is something I don't understand here. That monitor only ever gets notified from nsImapProtocol::OnDataAvailable(). However, CreateNewLineFromSocket() and OnDataAvailable() are called on the same thread, so that PR_Wait() always just times out! OnDataAvailable() is being called in response to one of the m_eventQueue->ProcessPendingEvents() calls in that code. Something seems really broken here.
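For anyone following along, here is a minimal sketch of the pattern in question, using NSPR's monitor API. This is not the actual nsImapProtocol code; the variable and function names are hypothetical. It just illustrates why a notify that can only come from the waiting thread turns the PR_Wait() into a fixed 50ms sleep:

#include "prmon.h"
#include "prinrval.h"

static PRMonitor* gDataMonitor;   // assume created elsewhere with PR_NewMonitor()
static bool       gDataArrived;   // set by the "OnDataAvailable" path

// Runs only when the *same* thread later calls ProcessPendingEvents().
void Hypothetical_OnDataAvailable()
{
    PR_EnterMonitor(gDataMonitor);
    gDataArrived = true;
    PR_Notify(gDataMonitor);        // nobody can be waiting: the only waiter is this thread
    PR_ExitMonitor(gDataMonitor);
}

void Hypothetical_CreateNewLineFromSocket()
{
    PR_EnterMonitor(gDataMonitor);
    if (!gDataArrived) {
        // The notify above can never arrive while we sit here, because
        // OnDataAvailable() is dispatched on this very thread.  So this
        // wait always runs out the full 50ms -- effectively a sleep.
        PR_Wait(gDataMonitor, PR_MillisecondsToInterval(50));
    }
    PR_ExitMonitor(gDataMonitor);
    // ... ProcessPendingEvents() would then eventually deliver the data ...
}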
Simon, yes, you're right. When this code was written, the idea was that we would get notified of data from a different thread, but that's not the way it works anymore. Mscott might remember the history better than I do. I don't know what would happen if we would take out the wait - right now, it's basically a sleep for 50 milliseconds if there's no data available. I don't know if continually processing pending events would peg the cpu or not.
Removing that "sleep" doesn't peg the CPU, nor does it make header downloading much faster. I said above the we call WakeUpProcess() from _pl_NativeNotify (when a PLEvent is posted to a native PLevent queue. However, some of the active event queues that IMAP uses are monitored event queues, for which _pl_NativeNotify is not called. Again, however, making both event queue types call WakeUpProcess() when an event is posted doesn't make things go faster either.
ccing dougt.
Well, this was fun. After discussions with dbienvenu and dougt, I think I finally understand the issues.

IMAP runs one thread per server connection, using the socket transport thread for network I/O. It sucks data from the input stream in CreateNewLineFromSocket(), parses it, then, for each header that is downloaded, makes a number (3) of proxied calls back to the main, UI thread. These proxied calls are made synchronously, and it seems to be this proxying of calls back to the main thread that causes the performance problems.

Looking more closely at how the proxied calling works, we see that synchronous proxied calls (nsProxyObject::PostAndWait) happen thus: an event is posted to the event queue of the target thread (the main thread, in this case), and then we wait in a loop, pumping the event queue of the sending thread, until that event has completed. Event completion is indicated by an event posted synchronously back from the target thread to the sending thread.

So the problem on Mac seems to be that synchronous proxied calls are slow in this case. Because we call ProcessPendingEvents only once per pass through the event loop, every proxied call involves one trip through the event loop (and the potentially variable time in WaitNextEvent that involves). I have verified that WakeUpProcess() is being called for each proxied event, but this doesn't seem to have enough of an effect on the WNE calls; those still take anywhere between 0.3 and 85ms on my machine.
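To make the shape of a synchronous proxied call concrete, here is a hedged sketch of the post-and-wait pattern described above, using the PLEvent API. The queue variables and the completion flag are hypothetical, and the real nsProxyObject::PostAndWait() is considerably more involved:

#include "plevent.h"

extern PLEventQueue* gUIThreadQueue;     // destination (main thread) queue -- assumed
extern PLEventQueue* gImapThreadQueue;   // calling (IMAP) thread's queue -- assumed
static bool gCallCompleted;              // set by the reply event the UI thread posts back

void SketchOfPostAndWait(PLEvent* proxiedCall)
{
    // 1. Hand the call to the UI thread.
    PL_PostEvent(gUIThreadQueue, proxiedCall);

    // 2. Pump our own queue until the UI thread posts the completion event
    //    back to us.  The reply only arrives after the main thread's event
    //    loop gets around to the proxied event, i.e. after a trip through
    //    WaitNextEvent -- which, per the measurements above, can take
    //    anywhere from 0.3 to 85ms per call.
    while (!gCallCompleted) {
        PLEvent* ev = PL_WaitForEvent(gImapThreadQueue);
        if (ev)
            PL_HandleEvent(ev);   // the completion handler sets gCallCompleted
    }
}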
thanks, Simon. As I've said before, we could make these events asynchronous for hdr downloading, *but* we'd have to be able to guarantee that the order of events was preserved - Doug, is there any such guarantee with async events? We send a start msg download, a "here are the lines" event, and an end msg download event - the imap thread doesn't have to block for these events, but we have to guarantee that the ui thread gets the events in the right order.
Async events A and B will remain ordered if both A and B are fired from the same source thread to the same destination thread.
A note about hacking nsProxyObject::PostAndWait() to use PostSynchronousEvent() rather than the current post-and-wait-loop code: this change appears to have performance benefits because it somehow causes the WakeUpProcess() call to work, reducing the average WNE time. I'm not sure why.

I did some timing of downloading a 1700 message folder for the first time. Results:

1700 msg IMAP folder    WaitNextEvent sleep 5    WaitNextEvent sleep 0
--------------------------------------------------------------------------------
synchronous events      120+ seconds*            6 seconds
async events            30 seconds               5 seconds

Hacking nsProxyObject::PostAndWait() to call PostSynchronousEvent:

                        WaitNextEvent sleep 5    WaitNextEvent sleep 0
--------------------------------------------------------------------------------
synchronous events      29 seconds               5 seconds

* indicates current conditions.

The WaitNextEvent sleep times were changed in the main event loop. A sleep time of 0 causes the app to hog the CPU.

So this shows that it can be really fast, even with async proxied events as they are now.
er, with _sync_ proxied events as they are now.
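For reference, the hack mentioned above amounts to something like the following sketch (same hypothetical queue name as the earlier sketch; not the actual patch). PL_PostSynchronousEvent() does the post, the wait, and the hand-back of the result in one call instead of the hand-rolled loop:

#include "plevent.h"

extern PLEventQueue* gUIThreadQueue;   // destination (main thread) queue -- assumed

void* SketchOfPostAndWaitViaSyncEvent(PLEvent* proxiedCall)
{
    // Blocks until the UI thread has handled the event, then returns the
    // event's result -- no explicit pumping of the caller's own queue.
    return PL_PostSynchronousEvent(gUIThreadQueue, proxiedCall);
}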
So this has something to do with the ordering of PLEvent handling, the WaitNextEvent call, and switching NSPR threads in our main event loop. With some reordering of our main event loop code, I can get the time down to 6 seconds. It seems that we need to call ProcessPendingEvents() *just* before the WaitNextEvent call to guarantee that any PLEvent posting that happens inside of processing PLEvents for the main event queue (and probably running other threads inside there too) affects the upcoming WNE call.

So here's a patch that gives us 5-6 second loading in the testcase described above. It's rather scary though.

Index: mozilla/widget/src/mac/nsMacMessagePump.cpp
===================================================================
RCS file: /cvsroot/mozilla/widget/src/mac/nsMacMessagePump.cpp,v
retrieving revision 1.136
diff -b -u -4 -r1.136 nsMacMessagePump.cpp
--- mozilla/widget/src/mac/nsMacMessagePump.cpp	27 Mar 2002 07:16:07 -0000	1.136
+++ mozilla/widget/src/mac/nsMacMessagePump.cpp	13 Jul 2002 02:23:16 -0000
@@ -375,8 +375,11 @@
   PRBool havePendingEvent = ::EventAvail(kEventAvailMask, &tempEvent) ||
                             (!(tempEvent.modifiers & btnState) && nsToolkit::IsAppInForeground());
 
+  if (mRunning)
+    Repeater::DoRepeaters(tempEvent);    // event record unused
+
   // when in the background, we don't want to chew up time processing mouse move
   // events, so set the mouse region to null. If we're in the fg, however,
   // we want every mouseMove event we can get our grubby little hands on, so set
   // it to a 1x1 rect.
@@ -475,11 +478,8 @@
       // yield to other threads
       ::PR_Sleep(PR_INTERVAL_NO_WAIT);
     }
-
-    if (mRunning)
-      Repeater::DoRepeaters(*anEvent);
 
     NS_ASSERTION(ValidateDrawingState(), "Bad drawing state");
   }
Scary, but pretty awesome too, Simon.
So, can this get r and sr and into the trunk after 1.1b? We are *very* interested in getting this fixed, although I certainly understand that it is too late for 1.1b.
Blocks: 130359
I don't think this is the correct patch to fix the bug; it just demonstrates that event ordering issues are central to the bug. We need to find a safer solution.
If I comment out the EventAvail call from our GetNextEvent() function, IMAP header downloading is fast. And since EventAvail() can yield time to other apps: <http://developer.apple.com/techpubs/mac/Processes/Processes-16.html> my guess is that this nullifies the effect of an earlier WakeUpProcess() call.
I have another, somewhat better patch. I've had verification that EventAvail() can indeed affect the behaviour of a previous WakeUpProcess call. And, I think we can get away with not calling EventAvail at all here. We were calling it primarily to see if there is an event pending that we care about, and thence reduce the sleep time of the WNE call. However, WNE should return promptly anyway if such an event is the next event in the queue.
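A rough before/after sketch of what that change amounts to, for readers not steeped in Classic Mac event handling. This is illustrative only -- the sleep values and structure are simplified from the real nsMacMessagePump code:

#include <Events.h>   // classic Mac OS Event Manager

void OldStyle_GetNextEvent(EventRecord* outEvent)
{
    EventRecord peekEvent;
    // EventAvail() peeks at the queue, but it can also yield time to other
    // processes, which appears to cancel the effect of a prior WakeUpProcess().
    Boolean havePending = ::EventAvail(everyEvent, &peekEvent);
    ::WaitNextEvent(everyEvent, outEvent, havePending ? 0 : 5, nil);
}

void NewStyle_GetNextEvent(EventRecord* outEvent)
{
    // No EventAvail() peek: WaitNextEvent already returns promptly when an
    // event we care about is next in the queue, so the peek gains nothing
    // and costs us the WakeUpProcess() wake-up.
    ::WaitNextEvent(everyEvent, outEvent, 5 /* illustrative sleep */, nil);
}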
Summary: Performance degradation downloading mail headers OS X → Performance degradation downloading IMAP headers OS X
This is great, Simon. What kind of a mac are you getting these times on? I'm curious how Windows and Linux compare for a similar sized folder and cpu speed. I can't really do any meaningful tests over cable modem.
7.3 seconds to d/l 1738 headers on a dual G4/800 machine. Mean pageload time is 1027 ms (which is slightly up from my normal times). Cert auth runs fast. This patch kicks ass.
Did you notice improvement in other parts of the application?
I'd think that loading large imap messages might speed up.
Comment on attachment 93636 [details] [diff] [review] nsMacMessagePump patch to remove EventAvail call r=sdagley (Yeehah!)
Attachment #93636 - Flags: review+
We should test the IMAP/SSL issue with this patch. I'm not in a position to test it myself.
Okay, on Windows 2000, with a 1.4ghz / 512MB RAM machine, we take 4 seconds 58 milliseconds to download 2000 IMAP headers from judge.netscape.com over a LAN connection. (Switched 10/100).
Whiteboard: [driver:scc]
David, how do we test ssl imap?
Is this a trick question? Assuming not, edit the server settings for your imap server in mail&news account settings and click the ssl box.
But when the ssl option is on, can the server refuse to do ssl, or are we sure we really have a secure IMAP connection?
I am using: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1b) Gecko/20020805 and I am still having the problem with SSL/IMAP, bug #130359.
Dannie, did you apply the patch by hand? Nothing has been checked in, so if you haven't applied the patch, that's not really relevant to this patch.
My apologies. I have not yet applied the patch. I am still a bugzilla newbie, sorry. If I get some time I will try to apply the patch. Can someone shoot me a private message with a brief overview of, or a link to docs on, Mozilla OS X development? I have compiled my share of Linux/Unix XF86/Gnome/KDE apps, but I have recently switched to OS X and am not sure how to begin.
Blocks: 143047
Severity: normal → major
Keywords: nsbeta1+
Priority: -- → P2
Whiteboard: [driver:scc] → [adt2 RTM] [driver:scc] [ETA 08/08]
Comment on attachment 93636 [details] [diff] [review] nsMacMessagePump patch to remove EventAvail call sr=scc
Attachment #93636 - Flags: superreview+
Suggest we get this landed on the trunk asap, so that we can evaluate it for the 1.0 branch.
Patch checked in on the TRUNK. Do I need to do some keyword fumbling?
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
nope, the keyword you need is already there. stephend: can you pls verify this as fixed in tomorrow's trunk builds? thanks!
stephend is away on holiday. Maybe hong can confirm as the original reporter, or someone familiar with the mailnews IMAP tests.
I've just tested IMAP/SSL with today's build (2002080703). I believe the patch is included (indeed Sampler.app no longer shows any calls to EventAvail), but alas, it looks like it wasn't enough to fix bug 130359. Moz still stalls on fetching flags for my inbox, and on fetching bodystructure for the test message of bug 130359 (and it still crashes, with the same stack trace, if I stop the latter operation). Should I post the new call graphs?
I didn't expect this fix to fix the IMAP/SSL issues (which I also see on Mac OS 9, btw). Let's just look at IMAP performance here, and keep the IMAP/SSL stuff in bug 130359.
This may not have fixed the SSL bug 130359, but loading the headers is MUCH faster than previous builds! Previously I was seeing about 10/sec, now I am seeing 50-100/sec. (I still often get a hang at the "end" of loading the headers - at "Receiving Message Headers 150 of 150" for example - but I don't know if that is related to bug 130359 or not. But this is easily recoverable with a "Stop" and "Get" again.)
Note to verifiers: the patch changes core Mac event handling code, for both OS 9 and OS X, so good verification should cover a range of tests on both platforms:
1. IMAP perf
2. Pageload perf
3. Certificate authentication performance
4. FTP, http, https
5. Scrolling performance, click-handling, plugin event handling
[Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1b) Gecko/20020807]
* 513MB RAM, 800 MHz
* 3 pane view
* IMAP/SSL

I am running the latest nightly and noticed an abnormally long pause when attempting to load the contents of mail messages. Specifically, when I click on a message that has supposedly already been downloaded (i.e. I have viewed it before), it does not change the display pane for an amount of time proportional to the size of the message. So in my tests, 1KB messages loaded nearly instantaneously whereas a 76KB message took 12.5 seconds to load.

Another interesting note, though perhaps premature: I notice that when the SSL problem seems to occur (bug 130359), it is now able to recover, whereas before I was forced to kill Mozilla completely before I could use mail again. Combined with the other new oddities, it is difficult to prove this given the nature of the bug and the difficulty of reproducing it.

Also, on newly downloaded messages there is occasionally an abnormal wait period of 6-8 seconds, at which point, when the message is finished downloading (presumably), the selection highlight jumps to a previous message. Then I can reselect the new message to display it.

Sorry for the verbosity of this message; I hope my info is helpful.
Dan, are you saying that you didn't see this abnormal delay displaying a message you'd read previously, before the latest nightly build? You do mean in the same session, I assume. If you haven't configured the folder for offline use, then the message should be coming from the memory cache and not the imap server (or the offline store). If Simon's patch somehow affects performance reading from the memory cache, we'd see that in other places, I'd think...
Dan: please also test before/after builds with SSL turned off.
> Dan, are you saying that you didn't see this abnormal delay displaying a
> message you'd read previously, before the latest nightly build?

This is true. I have not seen this problem with 1.0, 1.1a, 1.1b. And previous to today I was running yesterday's nightly build without these problems.

> You do mean in the same session, I assume.

Yep.

> If you haven't configured the folder for offline use, then the message should
> be coming from the memory cache and not the imap server (or the offline
> store).

In my server settings I have "Make the messages in my Inbox available when I am working offline" checked. I can uncheck this if I should for testing.

> If Simon's patch somehow affects performance reading from the memory cache,
> we'd see that in other places, I'd think...

So far I haven't noticed any other oddities. Browsing seems to be normal.
yes, you should try unchecking that check box, and try with a message received after you uncheck that box. That checkbox makes it so we store and read the message to/from the local offline cache (which is just a berkeley mailbox, that we read like we would a local/pop3 message). I'm not sure why Simon's patch would make this slower.
I tested both MacOS 9 and OS X with several older builds, and the latest trunk build (commercial). I ran the following tests:
1) pageload
2) pageload w/ SSL
3) IMAP download of 5887 headers from my local netscape mail acct

I also informally tested ftp and cert auth (downloaded builds, got a cert and used it to log into a few internal sites).

OS 9                7/29       8/7
1. pageload         1090       1097
2. ssl pageload     1125       1128
3. IMAP 5887 hdr    86 sec     32 sec

OS X                7/29       8/5        7/28/01
1. pageload         1219       1216
2. ssl pageload     1290       1291
3. IMAP 5887 hdr    899 sec    470 sec    18 sec

Just as a sanity check, I compared it to a build predating the NSPR fix. The only build I have circa that date is 6.1 Preview, which is dated 7/28/01 (predating the NSPR checkin). On MacOS X, the 6.1 Preview release downloads the same 5887 header inbox in 18 seconds. I would put that in blinking bold italics if I could. I would be curious to try this on your machine, Simon.

To sum up my test results:
1) IMAP header downloading seems to be about twice as fast as it used to be on both MacOS 9 and OS X.
2) MacOS X is still an order of magnitude slower than OS 9.
3) At some point in the past, MacOS X header downloading was *faster* than OS 9.
4) All other functionality seems to be unaffected by this patch.

I'm hesitant to mark this VERIFIED FIXED in case my results are account-specific/machine-specific.
Used 08-05-08 & 08-07-08 Mac OS X trunk builds, IMAP without SSL:

Before the fix (08-05-08 trunk): it takes 2 minutes 30 seconds to download 1000 IMAP Inbox msg headers on the 08-05 trunk build, run on a G4/450MHz.

After the fix (08-07-08 trunk): it takes 5~10 seconds to download 1000 IMAP Inbox msg headers on today's 08-07-08 trunk build, run on a G4/450MHz.
Hong: you used an 8/5 build on OS X, which does not have the fix. The fix is in 8/7.
Updating results from another G4/500MHz, downloading 7995 headers without SSL:

08/05 trunk build -> 19 minutes
08/07 trunk build -> 34 seconds
(surreptitiously tucking the crack back into his pocket) I tried the 8/7 build and my nearly 6000 or so headers downloaded in about 25 seconds. Beautiful. I also ran pageload and the results are slightly slower (1248 vs 1211) than the older build. This appears consistent with Simon's results, and I think an acceptable price to pay for such a tremendous speed improvement in IMAP. I'd say we can mark this verified on the trunk, if there are no objections.
I am marking this as verified based on Hong's, Sheela's, and my verification. If you are curious about the IMAP SSL side, see bug 130359 and bug 149864; updated info is there.
Status: RESOLVED → VERIFIED
adt1.0.1+ (on ADT's behalf) approval for the checkin to the 1.0 branch, pending Drivers' approval. Pls check this in asap, then replace "fixed1.0.1" with "verified1.0.1". Thanks!
Keywords: adt1.0.1 → adt1.0.1+
Target Milestone: mozilla1.2alpha → mozilla1.0.1
Attachment #93636 - Flags: approval+
please checkin to the MOZILLA_1_0_BRANCH branch. once there, remove the "mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword.
Checked into the MOZILLA_1_0_BRANCH.
Hong, can you verify on the branch? Thanks.
Tested on today's branch build:

                      old 1.0    8/9 1.0    8/7 trunk
OS X IMAP 6010 hdr    480 sec    76 sec     25 sec
OS 9 IMAP 6010 hdr    30 sec     25 sec     32 sec

It's curious that the trunk build is roughly 3 times faster than the branch build on MacOS X. I expect that there are a lot of differences between the 1.0 and the trunk build that would contribute to this difference. Still, I think this is worth verifying on the branch as well. Does any keyword magic need to be performed?
Hong, keywords should be changed from "fixed1.0.1" to "verified1.0.1" for adt branch build verification.
That trunk/branch difference is curious. I'm not sure why that is. Maybe other IMAP changes?
Heh, I'd love to take credit for an improvement like that on the trunk, but given that it seems to be OS/X-specific, I suspect something else. Oh, wait, maybe I can take credit for it - I fixed the trunk to only download the imap hdrs we need for the db, instead of all the msg rfc822 hdrs, so there should be less than half as much data coming over the wire (I would have loved to have gotten this fix on the branch, but it took a very long time for the fix to get verified, and by that time, the ADT was not interested.) Not sure why it should be so dramatic on OS/X, though.
I can explain the OSX/OS9 difference. On OS X, we put the WakeUpProcess() calls into the NSPR threading code, and the PLEvent code, to keep the process active. We didn't do this on 9 because 9 doesn't appear to need it. The reason is that WaitNextEvent on 9 can be called with a long sleep time, but the OS will usually send the application a null event before that sleep time is up (depending on what other apps are doing), because applications are scheduled in a cooperative, round-robin fashion. On OS X, however, the OS will sleep your process for the full length of the sleep time. Thus the WakeUpProcess calls were added just for OS X to keep things moving. We could probably have them called on 9 too.
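In code terms, the OS X-only wake-up amounts to something like this sketch (the helper name is hypothetical; the real calls were added inside NSPR's _pl_NativeNotify() and the PLEvent code, not in a standalone helper):

#include <Processes.h>

// Called after posting work for the main thread.  On Mac OS X a process
// parked in WaitNextEvent(sleepTime) sleeps for the full interval unless
// explicitly woken; on OS 9 the cooperative scheduler hands it null events
// anyway, so the wake-up was never needed there.
static void WakeMainEventLoop()
{
    ProcessSerialNumber psn;
    if (::GetCurrentProcess(&psn) == noErr)
        ::WakeUpProcess(&psn);
}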
*** Bug 140536 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Product: Core → MailNews Core