Closed Bug 17065 Opened 25 years ago Closed 25 years ago

[DOGFOOD] Messenger stalls opening IMAP Inbox

Categories

(MailNews Core :: Networking, defect, P3)

x86
Linux

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: trudelle, Assigned: mscott)

References

Details

(Whiteboard: [PDT+] Verified)

Today's opt bits (and last few days too)
Using existing profile/account, or creating all new ones.
Open Messenger
Select Inbox
Throbber animates, status > Receiving message headers ## of ###
## increases to a few dozen, then stalls. I left it overnight, and it never got any further, nor timed out.
Assignee: phil → mscott
Summary: Messenger stalls opening IMAP Inbox → [DOGFOOD] Messenger stalls opening IMAP Inbox
Reassign to mscott, cc alecf, nominate for dogfood
I saw this too earlier this week and I thought I fixed it with some changes that went in on Wednesday. But Peter's reporting this problem on today's build so the deadlock situation must still be there. I'll take another look.
Unless I'm not installing correctly. Build ID = 1999102208. That's today, right?
Using a build from Saturday afternoon, I was finally able to trigger this hang again.
I think I am seeing the same thing. Here is my sequence:
1. Launch today's build (much faster startup) to browser window
2. Select Messenger
3. Double click account name to get tree hierarchy
4. Select inbox
5. Everything slows way down, CPU usage at 100% on NT, other apps don't respond
6. Then password dialog comes up (I don't save password)
7. Eventually I can enter text in the password dialog, but it is very, very slow to respond to keyboard input. Hit return
8. Stuff happens, still crawling; about 1-2 minutes later inbox comes up and CPU usage is back to normal
9. Windows NT, 128MB RAM
Hey Dave, I think your problem isn't necessarily a hang of the system but a problem with imap performance which has regressed horribly over the weekend. I'm also seeing huge delays when trying to bring up modal dialogs and doing other things. We're tracking that in Bug # 17062. My first guess is to say that problem is related to all the event queue changes danm has been making recently but I need to debug it further before reaching that conclusion.
What I'm seeing is Linux-only, doesn't slow anything down, and happens after the password dialog. It doesn't hang or freeze, just stalls that window while retrieving headers.
we talked to daver, his comments only relate to the rather slow IMAP performance as of late. This bug is still the "Hanging IMAP on linux" bug...
Let me just add, me too, to this bug. It looks like you've seen this, but if you need a machine to reproduce this on, my Linux machine is doing this all of the time.
Whiteboard: [PDT+]
Still happening every day for me, using the opt bits. Don't know if this is related, but on the Mac, selecting the Inbox today also stalls, doesn't even get to the authentication.
I've been trying to fix this today but every time I select an imap folder, I'm crashing off in some gtk code that I'm not familiar with. I caught a bad checkin in mid pull.
Today's Linux opt bits segfault on opening folder.
Yeah it turns out the crash I was seeing last night that was preventing me from debugging this was the same one keeping the tree closed this morning. I just thought there was something in my tree that was out of date for some reason. Apparently not. The crash is in some gtk, glib code that I know nothing about. I'm trying to find an expert on that stuff now. For more info, see Bug #17352
Pavlov is the module owner for GTK.
Component: Back End → Networking-Mail
Adding dougt to the cc list for advice. When I am able to reproduce this hang (or maybe it is another hang altogether), here's what I'm seeing:
1) The imap thread has made a call to the UI thread through a proxied interface. So the thread is now in the proxy code, blocked waiting for the response from the UI thread. It's spinning in the code that pumps events through the nested event queues (i.e. it processes pending events, then sleeps, then processes more events).
2) The UI thread doesn't appear to ever process the event generated by the imap code. On my debug build, the UI thread has a stack trace that looks like: nsAppShell::Run, g_main_run(), g_main_iterate(), g_hook_next_valid()
My guess is that our event somehow got lost and so the proxy call never returns with an answer. Hence the imap thread stalls and we never finish parsing the folder.
Note: I had to jump through a few hoops to reproduce this problem. I had to delete all my .msf files. Then I opened my inbox, displayed a message, and opened another folder. Then I opened my trash folder, which had 800 msgs. I would hang just about every time trying to parse my trash folder. So in my example, we would have the UI thread, 2 imap threads (inbox, and another folder) which were just waiting on a url to run, and a 3rd thread which was the imap thread trying to open the trash folder. Hmmm.
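
The stall described in the comment above can be illustrated with a minimal sketch. This is Python, not the Mozilla C++ proxy code, and every name in it (`UiEventQueue`, `proxied_call`, `run_demo`, the `drop_events` flag) is hypothetical. It only shows the general shape of a synchronous cross-thread call: a worker posts an event to the UI thread's queue and blocks until the UI thread answers. If the event is lost, the worker never gets a reply. The timeout exists only so the sketch terminates; the report above suggests the real call had none.

```python
import queue
import threading


class UiEventQueue:
    """Hypothetical stand-in for the UI thread's event queue."""

    def __init__(self, drop_events=False):
        self._events = queue.Queue()
        # Assumption for illustration: simulate an event getting lost in transit.
        self.drop_events = drop_events

    def post(self, event):
        if not self.drop_events:
            self._events.put(event)

    def run_once(self, timeout):
        """Process at most one event; return True if one was handled."""
        try:
            func, reply = self._events.get(timeout=timeout)
        except queue.Empty:
            return False
        reply["value"] = func()
        reply["done"].set()  # wake the blocked caller
        return True


def proxied_call(ui_queue, func, timeout):
    """Run on a worker thread: post func to the UI queue and block for the result."""
    reply = {"done": threading.Event(), "value": None}
    ui_queue.post((func, reply))
    if not reply["done"].wait(timeout):
        return "stalled"  # the UI thread never processed the event
    return reply["value"]


def run_demo(drop_events):
    ui_queue = UiEventQueue(drop_events)
    result = {}

    def worker():
        result["value"] = proxied_call(ui_queue, lambda: "headers updated", timeout=0.5)

    t = threading.Thread(target=worker)
    t.start()
    # The main thread plays the UI thread and pumps events until the worker returns.
    while t.is_alive():
        ui_queue.run_once(timeout=0.05)
    t.join()
    return result["value"]


if __name__ == "__main__":
    print(run_demo(drop_events=False))
    print(run_demo(drop_events=True))
```

With `drop_events=True` the worker only returns because of the timeout, which matches the symptom reported here: the calling side logs that it made the call, and nothing ever shows up on the receiving side.
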
Target Milestone: M11
I've been playing around with this again this morning and I'm still seeing the same behavior. I'm now in a position where I don't have to jump through hoops. Just opening my trash folder is causing the imap thread to hang for the reasons stated above. We're blocked waiting for a proxied call to return. But the event is never reaching the UI thread. It's getting lost along the way. I think I'm going to need dougt or danm's help on this puppy.
Status: NEW → ASSIGNED
I've set printfs in two places:
1) where the imap thread calls the proxied object
2) in the ui thread, a breakpoint inside the real object
Sure enough, in the case where the imap thread hangs, I see the printf saying we are now calling into the proxied object. And there is no matching printf in the real object. So the UI thread is never processing our event. I'd re-assign to dougt but I think he's going to need me to help debug this down further...
Seems bad enough to hold M11
Doug, this problem may be related to some strange stuff I'm seeing with how linux is processing events. I just posted an email thread asking for help to the xpcom and unix groups. Basically, I'm seeing the UI thread taking events out of the imap thread and processing them. This is leading to bad things for imap because we are executing code in the wrong thread!! I can't tell for sure if that problem is also causing us to block later on when we make a call through a proxy object.
this is no longer an M11 stopper, is it? We (mscott and I) checked in some code last night to get around the problem.
I haven't seen this specific problem in 3 days, but now it is taking about 20 seconds to load even the smallest plain text messages. Is that the 'call through proxy object' block mentioned above?
possibly, but it's also possibly just the general slowness painting on Linux - the meteors run when loading a message because the web shell starts it, and running the meteors really slows things down.
I think this is definitely still an M11 stopper. What we did last night was pretty horrible for a workaround (at least the part about clearing the ODA flag) and could easily be contributing to the message display performance degradation that Peter is seeing in today's build. I've tried asking for help in the newsgroups but no one's answered. I may have to read up on glib and gtk to figure out why this code is misbehaving.
No, it isn't just general degradation. My.netscape.com starts to appear in less than 5 seconds, and finishes in 15 seconds on the same machine. There is something very wrong with message display time.
do you mean imap msg display, or msg display in general? How long does a small news message or local message take to display?
Blocks: 17907
POP message takes about as long. I can't create a news account, setup forces it into POP.
yes, that sucks. apparently, drop down combo boxes in dialogs are broken on mac and linux. Did you try using the arrow keys? I hear they work.
Cursor keys appear to work, but apparently now give the wrong result to the caller, at least in the last 2 days.
(The arrow keys used to be a workaround. Now, even after you use the arrow key to make your selection, it appears that the selection never holds. Bug 15476)
Assignee: mscott → brendan
Status: ASSIGNED → NEW
Brendan, this is the bug we talked about in the pork jockey meeting. Basically, I've verified that necko is placing events such as OnDataAvailable into our imap thread's event queue. Then when that event is processed, I've seen that the thread we're running in is the UI thread and not the imap thread. From there, strange and erratic behavior happens. In my posting to the newsgroup, you can see the stack trace where the UI thread is listening to the UI event queue in main(). And you can then see where the UI thread calls into the imap thread's event queue, asking it for an event.
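
The wrong-thread processing described in the comment above, and the kind of check that can catch it, can be sketched as follows. This is Python, not the actual event queue or assertion code discussed in this bug, and all names (`OwnedEventQueue`, `process_pending`, `demo`) are hypothetical. The sketch assumes a queue that records the thread that created it and refuses to run events on any other thread.

```python
import queue
import threading


class OwnedEventQueue:
    """Hypothetical event queue that remembers which thread created it."""

    def __init__(self):
        self.owner = threading.get_ident()
        self._events = queue.Queue()

    def post(self, func):
        # Any thread may post an event to the queue.
        self._events.put(func)

    def process_pending(self):
        """Run all pending events; refuse if called from a non-owner thread."""
        if threading.get_ident() != self.owner:
            raise RuntimeError("event queue processed on the wrong thread")
        ran = 0
        while True:
            try:
                func = self._events.get_nowait()
            except queue.Empty:
                return ran
            func()
            ran += 1


def demo():
    ready = threading.Event()
    go = threading.Event()
    box = {}

    def imap_thread():
        box["queue"] = OwnedEventQueue()  # queue is owned by this thread
        ready.set()
        go.wait()
        box["ran"] = box["queue"].process_pending()

    t = threading.Thread(target=imap_thread)
    t.start()
    ready.wait()

    # Another thread posts an event, standing in for something like OnDataAvailable.
    box["queue"].post(lambda: None)

    # The main ("UI") thread wrongly tries to drain the imap thread's queue.
    try:
        box["queue"].process_pending()
        wrong_thread_detected = False
    except RuntimeError:
        wrong_thread_detected = True

    go.set()  # now let the owning thread process its own event
    t.join()
    return wrong_thread_detected, box["ran"]


if __name__ == "__main__":
    print(demo())
```

Raising on a mismatch leaves the event in the queue for its owner, so the owning thread still runs it afterwards. Whether the real assertion code behaved this way is not stated in this report.
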
I've tracked this down a bit further, if you look at my reply in the newsgroup to your posting. Also see 18005
Target Milestone: M11 → M12
sounds too hairy for this late in m11. let me know if that changes.
Blocks: 18471
Blocks: 18951
Blocks: 20203
QA Contact: lchiang → huang
Dogfood bug for M12. Change QA Contact to me. Cc: Lisa.
Status: NEW → ASSIGNED
Brendan, do you have a projected fix date on this bug?
Assignee: brendan → pavlov
Status: ASSIGNED → NEW
Reassigning to pavlov, since it's a linux bug where events are being processed on wrong thread. Pavlov -- do you have the cycles to look at this?
Assignee: pavlov → dougt
assigning to me
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] Fix ready, patch sent for review.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
fix checked in.
Status: RESOLVED → VERIFIED
Whiteboard: [PDT+] Fix ready, patch sent for review. → [PDT+] Verified the fixed already
Verified fixed on the 12-16-12-M12 commercial build. Mark as Verified!!
Status: VERIFIED → REOPENED
Doug, I didn't think you checked in the gtk changes (removing the implementation of listen to event queues) to fix this problem, just the assertion code. I thought it didn't get in for M12. Karen, the reason why this worked for you is because bienvenu and I have code changes in place that hid the problem before. Those are still in place. After we're sure that dougt checked in the gtk related changes, we need to pull out these band aids. Then you can verify this bug. I'm going to clear the verified resolution for now...doug, can you confirm if all your patches are in or just the bandaid assertion stuff? We should leave as opened if that's the case
Resolution: FIXED → ---
re-opening so bienvenu and I can take out our imap hacks to hide the threading problem. dougt, if your gtk changes are checked in, just re-assign to me and I'll remove our hacks, then karen can verify this.
Whiteboard: [PDT+] Verified the fixed already → [PDT+]
OK. Removed "verified" from the Status Whiteboard.
Whiteboard: [PDT+] → [PDT+] Verified the fixed already
Doug, is there any chance we can comment out those assertions? I get lots of crashes on my linux box when working in imap because we are generating so many events, and these asserts about events getting processed on the wrong thread get dumped to the console. Eventually, I'm just seeing a solid wall of these assertions getting dumped to the console and it can't keep up, causing me to crash in the printf in the assertion.
Whiteboard: [PDT+] Verified the fixed already → [PDT+]
I stomped on Karen's status white board change.
yeah, the new problem is that it takes up 100% CPU time while using IMAP
Assignee: dougt → mscott
Status: REOPENED → NEW
Take out your hacks. The fix that I put in for m12 will make sure that your events will not get processed on the wrong thread. However, you will still see asserts! The fix to the gtk widget was not checked in since it seemed to break your password dialog from coming up. I am not sure why this is, and have not really had a chance to debug this.
final m12 candidates are spinning now. moving to m13. if we fall off track and need to respin m12 for some yet unknown reason we can consider this if you get a fix in hand.
This isn't something we need to fix for M12, Chris. So no need to worry about taking a fix for it later. Doug, I'll get permission from Chris H. to take out our imap hacks and then I'll send this back to you as I know you were helping us track down why we still get the asserts on linux =).
Whiteboard: [PDT+] → [PDT+] (mscott's part is NOT PDT+)
The changes I need to make here for removing our imap hacks are definitely not PDT+. Doug, you posted a sweet patch for gtk yesterday that fixes the asserts we were seeing. Can I assign this bug to you so you can check that patch in with this bug report? Then you can bounce it back to me, I'll remove the PDT label for my stuff and check in the imap changes. How does this sound?
*** Bug 22668 has been marked as a duplicate of this bug. ***
Status: NEW → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Okay, doug's stuff is in. I finally removed my imap hack in CreateNewLineFromSocket which was there because events were getting processed on the wrong thread. I parsed a 12,000 message folder (with no .msf) and didn't stall. Marking this as fixed as my part of the bug is now done too. When you go to verify it might help to make sure the cpu on your linux box isn't spiked after you connect to an imap server (but aren't doing anything).
OK. I will verify this bug, then.
Status: RESOLVED → VERIFIED
Whiteboard: [PDT+] (mscott's part is NOT PDT+) → [PDT+] Verified
Verified on Linux 2000-01-07-08-M13 commercial build. On a Pentium/200MHz system, it can read 5000 imap msgs in 55 seconds. There is no stall anymore when reading large imap messages. CPU is no longer stuck at 100% when idling for a short time. Updating Status Whiteboard and marking as verified!!
No longer blocks: 17907
No longer blocks: 18471
No longer blocks: 18951
No longer blocks: 20203
Product: MailNews → Core
Product: Core → MailNews Core