Closed Bug 17065 Opened 25 years ago Closed 25 years ago

[DOGFOOD] Messenger stalls opening IMAP Inbox

Categories

(MailNews Core :: Networking, defect, P3)

x86
Linux

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: trudelle, Assigned: mscott)

References

Details

(Whiteboard: [PDT+] Verified)

Today's opt bits (and last few days too)
Using existing profile/account, or creating all new ones.

Open Messenger
Select Inbox
throbber animates, status
> Receiving message headers ## of ###
## increases to a few dozen, then stalls.
I left it overnight, and it never got any further, nor timed out.
Assignee: phil → mscott
Summary: Messenger stalls opening IMAP Inbox → [DOGFOOD] Messenger stalls opening IMAP Inbox
Reassign to mscott, cc alecf, nominate for dogfood
I saw this too earlier this week and I thought I fixed it with some changes that
went in on Wednesday. But Peter's reporting this problem on today's build
so the deadlock situation must still be there.

I'll take another look.
Unless I'm not installing correctly.  Build ID = 1999102208.  That's today,
right?
Using a build from Saturday afternoon, I was finally able to trigger this hang
again.
I think I am seeing the same thing.  Here is my sequence:
1. launch today's build (much faster startup) to browser window
2. select Messenger
3. Double click account name to get tree hierarchy
4. select inbox
5. everything slows way down, CPU usage at 100% on NT, other apps don't respond
6. then password dialog comes up (I don't save password)
7. eventually I can enter text in the password dialog but it is very, very slow
to respond to keyboard input. Hit return
8. Stuff happens, still crawling; about 1-2 minutes later inbox comes up and
CPU usage is back to normal.
9. Windows NT, 128MB RAM
Hey Dave, I think your problem isn't necessarily a hang of the system but
a problem with imap performance which has regressed horribly over the weekend.
I'm also seeing huge delays when trying to bring up modal dialogs and doing
other things. We're tracking that in Bug # 17062. My first guess is that the
problem is related to all the event queue changes danm has been making recently
but I need to debug it further before reaching that conclusion.
What I'm seeing is Linux-only, doesn't slow anything down, and happens after the
password dialog.  It doesn't hang or freeze, just stalls that window while
retrieving headers.
we talked to daver, his comments only relate to the rather slow IMAP performance
as of late. This bug is still the "Hanging IMAP on linux" bug...
Let me just add, me too, to this bug.  It looks like you've seen this, but if
you need a machine to reproduce this on, my Linux machine is doing this all of
the time.
Whiteboard: [PDT+]
Still happening every day for me, using the opt bits.  Don't know if this is
related, but on the Mac, selecting the Inbox today also stalls, doesn't even get
to the authentication.
I've been trying to fix this today but every time I select an imap folder, I'm
crashing off in some gtk code that I'm not familiar with. I caught a bad checkin
in mid pull.
Today's Linux opt bits segfault on opening folder.
Yeah, it turns out the crash I was seeing last night that was preventing me from
debugging this was the same one keeping the tree closed this morning. I just
thought there was something in my tree that was out of date for some reason.
Apparently not. The crash is in some gtk, glib code that I know nothing about.

I'm trying to find an expert on that stuff now. For more info, see Bug #17352
Pavlov is the module owner for GTK.
Component: Back End → Networking-Mail
Adding dougt to the cc list for advice. When I am able to reproduce this hang
(or maybe it is another hang altogether), here's what I'm seeing:
1) The imap thread has made a call to the UI thread through a proxied interface.
So the thread is now in the proxy code, blocked waiting for the response from
the UI thread. It's spinning in the code that pumps events through the nested
event queues (i.e. it processes pending events, then sleeps, then processes
more events)

2) the UI thread doesn't appear to ever process the event generated by the imap
code. On my debug build, the UI thread has a stack trace that looks like:
calling nsAppShell::Run, g_main_run(),
g_main_iterate(), g_hook_next_valid()

My guess is that our event somehow got lost and so the proxy call never returns
with an answer. Hence the imap thread stalls and we never finish parsing the
folder.
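
The stall described above can be sketched in a few lines. This is only an illustration in Python, not the XPCOM proxy code: `ProxyEvent`, `proxied_call`, and `ui_loop` are hypothetical names, and a plain `queue.Queue` stands in for the UI thread's event queue. The caller posts an event and blocks until the owning thread runs it; if nothing ever drains the queue, the caller waits forever unless it has a timeout.

```python
import queue
import threading


class ProxyEvent:
    """A call packaged for execution on another thread (hypothetical type)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.done = threading.Event()
        self.result = None

    def run(self):
        self.result = self.func(*self.args)
        self.done.set()


def proxied_call(target_queue, func, *args, timeout=None):
    """Post func to target_queue and block until the owning thread runs it.

    Returns (completed, result). With timeout=None this blocks forever if
    the event is never processed, which is the stall described above.
    """
    event = ProxyEvent(func, *args)
    target_queue.put(event)
    completed = event.done.wait(timeout)
    return completed, event.result


def ui_loop(ui_queue, stop):
    """Stand-in for the UI thread's event loop: run events until stopped."""
    while not stop.is_set():
        try:
            event = ui_queue.get(timeout=0.05)
        except queue.Empty:
            continue
        event.run()


def demo():
    # Healthy case: the UI loop services its queue, so the call returns.
    ui_queue = queue.Queue()
    stop = threading.Event()
    ui = threading.Thread(target=ui_loop, args=(ui_queue, stop))
    ui.start()
    ok = proxied_call(ui_queue, lambda n: n + 1, 41, timeout=2.0)
    stop.set()
    ui.join()

    # Lost-event case: nothing drains this queue, so without the timeout
    # the caller would never wake up.
    orphan_queue = queue.Queue()
    lost = proxied_call(orphan_queue, lambda n: n + 1, 41, timeout=0.2)
    return ok, lost
```

In the lost-event case `proxied_call` reports that the call never completed, which corresponds to the imap thread waiting on a reply that the UI thread never produces.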

Note: I had to jump through a few hoops to reproduce this problem. I had to
delete all my .msf files. Then I opened my inbox, displayed a message, opened
another folder, and then opened my trash folder, which held 800 msgs. I would
hang just about every time trying to parse my trash folder.

So in my example, we would have the UI thread, 2 imap threads (inbox, and
another folder) which were just waiting on a url to run and a 3rd thread which
was the imap thread trying to open the trash folder.

Hmmm.
Target Milestone: M11
I've been playing around with this again this morning and I'm still seeing
the same behavior.

I'm now in a position where I don't have to jump through hoops. Just opening
my trash folder is causing the imap thread to hang for the reasons stated
above. We're blocked waiting for a proxied call to return. But the event
is never reaching the UI thread.

It's getting lost along the way. I think I'm going to need dougt or danm's help
on this puppy.
Status: NEW → ASSIGNED
I've set printfs in two places:
1) where the imap thread calls the proxied object
2) in the ui thread, a breakpoint inside the real object

Sure enough, in the case where the imap thread hangs, I see the printf
saying we are now calling into the proxied object. And there is no
matching printf in the real object. So the UI thread is never processing
our event.


I'd re-assign to dougt but I think he's going to need me to help debug this
down further...
Seems bad enough to hold M11
Doug, this problem may be related to some strange stuff I'm seeing with how
linux is processing events. I just posted an email thread asking for help to
the xpcom and unix groups.

Basically, I'm seeing the UI thread taking events out of the imap thread and
processing them. This is leading to bad things for imap because we are
executing code in the wrong thread!!


I can't tell for sure if that problem is also causing us to block later on when
we make a call through a proxy object.
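
One way to catch this class of bug is to have each event queue remember its owning thread and refuse to run events on any other thread. The sketch below is an illustration in Python only; `OwnedEventQueue` is a hypothetical name and this is not the actual xpcom or gtk event queue code.

```python
import queue
import threading


class OwnedEventQueue:
    """Event queue that remembers which thread is allowed to drain it."""

    def __init__(self):
        # The creating thread is treated as the owner (an assumption of
        # this sketch).
        self.owner = threading.get_ident()
        self._events = queue.Queue()

    def post(self, func):
        # Any thread may post an event.
        self._events.put(func)

    def process_pending(self):
        # Only the owner may run events; check before dequeuing so a
        # rejected caller does not consume anything.
        if threading.get_ident() != self.owner:
            raise RuntimeError("event queue drained on the wrong thread")
        ran = 0
        while True:
            try:
                func = self._events.get_nowait()
            except queue.Empty:
                return ran
            func()
            ran += 1


def demo():
    # The calling thread plays the imap thread and owns the queue.
    imap_queue = OwnedEventQueue()
    ran_on = []
    imap_queue.post(lambda: ran_on.append(threading.get_ident()))

    # A second thread (playing the UI thread) tries to drain the queue.
    errors = []

    def foreign_drain():
        try:
            imap_queue.process_pending()
        except RuntimeError as exc:
            errors.append(str(exc))

    other = threading.Thread(target=foreign_drain)
    other.start()
    other.join()

    # The owner then drains the queue normally.
    count = imap_queue.process_pending()
    return errors, count, ran_on == [threading.get_ident()]
```

Because the check happens before anything is dequeued, the foreign thread is rejected and the event is still there for the owning thread to run.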
this is no longer an M11 stopper, is it? We (mscott and I) checked in some code
last night to get around the problem.
I haven't seen this specific problem in 3 days, but now it is taking about 20
seconds to load even the smallest plain text messages.  Is that the 'call
through proxy object' block mentioned above?
possibly, but it's also possibly just the general slowness painting on Linux -
the meteors run when loading a message because the web shell starts it, and
running the meteors really slows things down.
I think this is definitely still an M11 stopper. What we did last night was
pretty horrible for a workaround (at least the part about clearing the ODA
flag) and could easily be contributing to the message display performance
degradation that Peter is seeing in today's build.


I've tried asking for help in the newsgroups but no one's answered. I may have
to read up on glib and gtk to figure out why this code is misbehaving.
No, it isn't just general degradation.  My.netscape.com starts to appear in less
than 5 seconds, and finishes in 15 seconds on the same machine.  There is
something very wrong with message display time.
do you mean imap msg display, or msg display in general? How long does a
small news message or local message take to display?
Blocks: 17907
POP message takes about as long.  I can't create a news account, setup forces it
into POP.
yes, that sucks. apparently, drop down combo boxes in dialogs are broken on
mac and linux. Did you try using the arrow keys? I hear they work.
Cursor keys appear to work, but apparently now give the wrong result to the
caller, at least in the last 2 days.
(The arrow keys used to be a workaround.  Now, even after you use the arrow key
to make your selection, it appears that the selection never holds.  Bug 15476)
Assignee: mscott → brendan
Status: ASSIGNED → NEW
Brendan, this is the bug we talked about in the pork jockey meeting. Basically,
I've verified that necko is placing events such as OnDataAvailable into our imap
thread's event queue.

Then when that event is processed, I've seen that the thread we're running in is
the UI thread and not the imap thread.

From there, strange and erratic behavior happens.

In my posting to the newsgroup, you can see the stack trace where the UI thread
is listening to the UI event queue in main(). And you can then see where the UI
thread calls into the imap thread's event queue, asking it for an event.
I've tracked this down a bit further, if you look at my reply in the newsgroup
to your posting. Also see 18005
Target Milestone: M11 → M12
sounds too hairy for this late in m11.
let me know if that changes.
Blocks: 18471
Blocks: 18951
Blocks: 20203
QA Contact: lchiang → huang
Dogfood bug for M12. Change QA Contact to me. Cc: Lisa.
Status: NEW → ASSIGNED
Brendan, do you have a projected fix date on this bug?
Assignee: brendan → pavlov
Status: ASSIGNED → NEW
Reassigning to pavlov, since it's a linux bug where events are being processed
on wrong thread. Pavlov -- do you have the cycles to look at this?
Assignee: pavlov → dougt
assigning to me
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] Fix ready, patch sent for review.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
fix checked in.
Status: RESOLVED → VERIFIED
Whiteboard: [PDT+] Fix ready, patch sent for review. → [PDT+] Verified the fixed already
Verified fixed on the 12-16-12-M12 commercial build.
Mark as Verified!!
Status: VERIFIED → REOPENED
Doug, I didn't think you checked in the gtk changes (removing the implementation
of listen to event queues) to fix this problem, just the assertion code. I
thought it didn't get in for M12.

Karen, the reason why this worked for you is because bienvenu and I have code
changes in place that hid the problem before. Those are still in place. After
we're sure that dougt checked in the gtk related changes, we need to pull out
these band aids.

Then you can verify this bug. I'm going to clear the verified resolution for
now...doug can you confirm if all your patches are in or just the bandaid
assertion stuff? We should leave as opened if that's the case
Resolution: FIXED → ---
re-opening so bienvenu and I can take out our imap hacks to hide the threading
problem. dougt, if your gtk changes are checked in, just re-assign to me and
I'll remove our hacks then karen can verify this.
Whiteboard: [PDT+] Verified the fixed already → [PDT+]
OK. Removed "verified" from the Status Whiteboard.
Whiteboard: [PDT+] → [PDT+] Verified the fixed already
Doug, is there any chance we can comment out those assertions? I get lots of
crashes on my linux box when working in imap: because we are generating so many
events, these asserts about events getting processed on the wrong thread get
dumped to the console. Eventually, I'm just seeing a solid wall of these
assertions getting dumped to the console and it can't keep up, causing me to
crash in the printf in the assertion.
Whiteboard: [PDT+] Verified the fixed already → [PDT+]
I stomped on Karen's status white board change.
yeah, the new problem is that it takes up 100% CPU time while using IMAP
Assignee: dougt → mscott
Status: REOPENED → NEW
Take out your hacks.  The fix that I put in for m12 will make sure that your
events will not get processed on the wrong thread.  However, you will still see
asserts!  The fix to the gtk widget was not checked in since it seemed to break
your password dialog from coming up.  I am not sure why this is, and have not
really had a chance to debug this.
final m12 candidates are spinning now. moving to m13.
if we fall off track and need to respin m12 for some
yet unknown reason we can consider this if you get
a fix in hand.
This isn't something we need to fix for M12, Chris. So no need to worry about
taking a fix for it later.

Doug, I'll get permission from Chris H. to take out our imap hacks and then I'll
send this back to you as I know you were helping us track down why we still get
the asserts on linux =).
Whiteboard: [PDT+] → [PDT+] (mscott's part is NOT PDT+)
The changes I need to make here for removing our imap hacks are definitely not
PDT+. Doug, you posted a sweet patch for gtk yesterday that fixes the asserts we
were seeing. Can I assign this bug to you so you can check that patch in with
this bug report?

Then you can bounce it back to me, I'll remove the PDT label for my stuff and
check in the imap changes.

How does this sound?
*** Bug 22668 has been marked as a duplicate of this bug. ***
Status: NEW → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Okay, doug's stuff is in. I finally removed my imap hack in
CreateNewLineFromSocket which was there because events were getting processed on
the wrong thread. I parsed a 12,000 message folder (with no .msf) and didn't stall.

Marking this as fixed as my part of the bug is now done too.

When you go to verify it might help to make sure the cpu on your linux box isn't
spiked after you connect to an imap server (but aren't doing anything).
OK. I will verify this bug, then.
Status: RESOLVED → VERIFIED
Whiteboard: [PDT+] (mscott's part is NOT PDT+) → [PDT+] Verified
Verified on Linux 2000-01-07-08-M13 commercial build.
System used: Pentium/200MHz processor; it can read 5000 imap msgs in 55 seconds.
There is no stall anymore when reading large imap folders.
CPU no longer sits at 100% when idling for a short time.
Updating Status Whiteboard and marking as verified!!
No longer blocks: 17907
No longer blocks: 18471
No longer blocks: 18951
No longer blocks: 20203
Product: MailNews → Core
Product: Core → MailNews Core