Closed
Bug 258370
Opened 20 years ago
Closed 19 years ago
Slack Timer with delay zero locks up UI
Categories
(Thunderbird :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: avi, Assigned: mscott)
References
Details
Attachments
(2 files)
I'm migrating from Eudora, which is known for being incredibly quick in
searching for old emails. Thunderbird, unfortunately, isn't nearly as fast.
(Should I file a bug?)
What's problematic is that while Thunderbird is grinding away at the "Search
Messages" results, nothing else can happen. If Thunderbird is going to take
forever to return search results, at least it should allow me to switch to the
other windows and do things.
Using Thunderbird version 0.7.3 (20040803).
Comment 1•20 years ago
|
||
are these local message searches? Are you searching on message bodies or just
headers? on win2k, search doesn't lock up the UI at all.
Reporter | ||
Comment 2•20 years ago
|
||
Local messages. The particular search that drove me to write up this bug
happened to be a body search.
I'm using Mac OS X. You're saying that on Win2k, you can have the gnarliest
search grinding away, updating the results window, and still use the rest of
Thunderbird?
Reporter | ||
Comment 3•20 years ago
|
||
Let me be more precise about what I see.
I start a "body contains xxx" search on about 400 megs of archived mailboxes.
Thunderbird locks up and stops responding. It finds a few matches, updates the
result list, and processes all the clicks and keystrokes that piled up while it
was unresponsive. It then goes catatonic for ten more seconds while it searches,
and then repeats the dump-results-and-process-events deal. Repeat until all the
matching messages are found.
I don't know how it balances its time on Windows, but here on the Mac it's
really unusable.
Using a 0.8 nightly: version 0.8 (20040908)
Comment 4•20 years ago
|
||
it's a lot more useable than that. I have around 350MB of messages in my local
folders account and I can do a body search over that and still open other
folders,read messages, etc.. It's a little sluggish but not too bad. This is on
a 1.6GHz PC.
The pre Unix mac was pretty notorious for slow file I/O and inefficient thread
context switching - I would hope that OS/X would be a lot better. We might be
doing something that performs poorly on the Mac. Our basic algorithm is to set a
recurrinig timer, and on each timer callback, do some processing for no more
than 200 milliseconds, and then return to the event loop.
Reporter | ||
Comment 5•20 years ago
|
||
OS X is a completely different beast than 9, so forget all you know about
that... :) Nevertheless, something's going on here, as I'm getting the spinning
beachball, which is automatically put up by the system whenever an app stops
responding to events for five seconds.
While Thunderbird was hung searching, I sampled the app. It really does appear
to be spending all its time searching...
Reporter | ||
Comment 6•20 years ago
|
||
Comment 7•20 years ago
|
||
cc'ing sfraser who's been very helpful with this kind of thing in the past.
Herre's where create the timer:
http://lxr.mozilla.org/mozilla/source/mailnews/base/search/src/nsMsgSearchSession.cpp#535
Here's where we loop doing 200 milliseconds of work, from the timer callback:
http://lxr.mozilla.org/mozilla/source/mailnews/base/search/src/nsMsgLocalSearch.cpp#663
Are we mis-using the timer? I don't see how doing 200 milliseconds of work could
lock up the ui for 10 seconds at a time. Otherwise, I'd consider backing off to
100 ms, but I suspect something else is going on.
Re the sample data, that's interesting data. I would expect that if you weren't
doing anything else in the UI that we would spend most of our time in the search
code, so that's not neccesarily wrong. If I'm reading the log, most of the time
we're either in nsMsgBodyHandler::GetNextLine, nsMsgSearchTerm::MatchBody, or
nsMsgSearchTerm::MatchString. There looks to be lots of room for improvement in
that code - for example, we shouldn't be getting the charsets each time through
the loop. The dns service call during the reading from the file input stream
seems odd as well (nsDNSService::GetAFForLookup). And we're probably copying
strings around more than needed.
Reporter | ||
Comment 8•20 years ago
|
||
Thanks for the code references. I'm an experienced developer but know little
about Mozilla, though, so I'm not quite sure what I'm looking at.
> I would expect that if you weren't
> doing anything else in the UI that we would spend most of our time in the search
> code, so that's not neccesarily wrong.
Sounds good, but I'm _trying_ to do other things in the UI. Just failing...
> If I'm reading the log, most of the time
> we're either in nsMsgBodyHandler::GetNextLine, nsMsgSearchTerm::MatchBody, or
> nsMsgSearchTerm::MatchString.
MatchBody is indented out, so it calls GetNextLine and MatchString. The numbers
are the number of times the sample hit that function, and the indenting is the
call chain.
Comment 9•20 years ago
|
||
sorry, what I meant was, were you trying to do other things when you created the
sample?
I wonder if on the mac, we should specify a small interval before the next slack
timer fires, instead of specifying 0. 0 seems fine on windows, and I hate having
platform #ifdefs but... Avi, I don't suppose you have a source tree built that
you could play with?
Reporter | ||
Comment 10•20 years ago
|
||
> sorry, what I meant was, were you trying to do other things when you created the
> sample?
Yes. I was trying to pull down a menu or move a window, and was failing (the UI
locked up).
> I wonder if on the mac, we should specify a small interval before the next slack
> timer fires, instead of specifying 0.
The timers must be build on some platform-specific code. Where is that going to
on the Mac? Wait--I'm looking at the stack trace. Mozilla still uses WNE? :(
> 0 seems fine on windows, and I hate having
> platform #ifdefs but... Avi, I don't suppose you have a source tree built that
> you could play with?
Um, I could try to do that. Do the resources of
http://www.mozilla.org/source.html and http://www.mozilla.org/build/ apply to Tb?
Comment 11•20 years ago
|
||
http://www.mozilla.org/projects/thunderbird/build.html has some thunderbird
specific instructions.
Re WNE, I don't know about mac-specific stuff. Maybe ducarroz can say more...
Reporter | ||
Comment 12•20 years ago
|
||
I got Tb building. Not too hard. :)
First, I tried adjusting the timer frequency. Given that each tweak means a
five-minute rebuild, my level of experimentation was limited. It locked up with
a frequency of 1ms the same as 0. Adjusting it to 2000ms gave a very responsive
UI, but the search never got anywhere.
I tried a different approach--make sure that things weren't getting locked up in
the callback. I wrote up a log to check out how long the callback was taking.
(Will attach.) It occasionally went up over 200ms, but most of the time it was
well-behaved.
I'm not sure what's wrong here. Perhaps we're not getting enough idle time using
WNE, so that every time through we've blown our timer, have to immediately do a
callback, and don't get any UI events? I've had similar problems with my own apps.
I think I'm going to start looking at the nsTimerImpl source, to see how it
works on the Mac. Any recommendations as to where to look?
Reporter | ||
Comment 13•20 years ago
|
||
Comment 14•20 years ago
|
||
no idea about the mac timer code.
thx for building the source and experimenting with it. I'm confused about your
tests, however. I would expect that making the time interval 2000 milliseconds
instead of 200 would have opposite effect - the search would go faster, but the
UI would lock up even more. I would think setting the search interval to 100
milliseconds would slow the search down but make the ui more responsive.
My guess is that when we're going well over the limit is when we have to open a
db, which happens synchronously. I see one time it takes us a whole second, and
there are a few cases where it takes 3/4's of a second or so. (Is this a debug
build, btw?)
Reporter | ||
Comment 15•20 years ago
|
||
> I would expect that making the time interval 2000 milliseconds
> instead of 200 would have opposite effect - the search would go faster, but the
> UI would lock up even more. I would think setting the search interval to 100
> milliseconds would slow the search down but make the ui more responsive.
I was changing the rate of the callback firing, not the length of time that
nsMessageLocalSearch was taking during each call back. Thus, the search was
taking the same 200ms of time, but was only getting called every 2000ms and thus
wasn't progressing.
A slack timer is being used, which is good, but that makes me suspect some kind
of deeper timer bug. On the Mac, perhaps, specifying too small of a timer fire
time prevents event handling?
In other places that timers are used, what value do they use for the frequency?
Reporter | ||
Comment 16•20 years ago
|
||
I've been scouring the code.
The search uses a slack, repeating timer with a delay of 0 to perform a search,
doing only 200 milliseconds of work at a time. This search code is the only code
that specifies a delay of 0. (Well, there's some inspector thing at
http://lxr.mozilla.org/mozilla/source/extensions/inspector/base/src/inSearchLoop.cpp#61
but that doesn't count.)
On the Win32 platform, this means that the search proceeds for 200ms, then
events are handled, then as soon as time can be found the search is called back
again, and so on.
On Mac OS X, having the timer be set to a delay 0 causes UI events to not be
handled, and the UI to blocked.
This seems to be a problem with the timer implementation. Changing the summary.
Summary: "Search Messages" locks up while searching → Slack Timer with delay zero locks up UI
Reporter | ||
Comment 17•20 years ago
|
||
Please follow along in the stack trace (attachment 158391 [details]).
When I saw WaitNextEvent, I was mislead into thinking that no Carbon Events were
used. I was mistaken.
Let's take the stack crawl from the top down:
260 xre_main(int, char**, nsXREAppData const*)
260 nsAppShell::Run()
260 nsMacMessagePump::DoMessagePump()
255 nsMacMessagePump::GetEvent(EventRecord&)
5 nsMacMessagePump::DispatchEvent(int, EventRecord*)
DoMessagePump is just a loop. While we're running, GetEvent and then
DispatchEvent. If you look at the children of DispatchEvent, we can see that
we're processing the UI. So it looks like all our time is in GetEvent.
What's GetEvent? Just a call to WaitNextEvent. Why is WNE eating so much time?
Carbon Events.
Someone is dispatching kEventClassPL/kEventProcessPLEvents events to one of our
event handlers. We're getting a callback
(http://lxr.mozilla.org/mozilla/source/xpcom/threads/plevent.c#1596), calling
PL_ProcessPendingEvents, and then timer code hits.
Who's sending these events? We are, of course, since that's a custom event type.
Searching gets us
(http://lxr.mozilla.org/mozilla/search?string=kEventProcessPLEvents):
_pl_NativeNotify
(http://lxr.mozilla.org/mozilla/source/xpcom/threads/plevent.c#1263).
Now the overall rule of event handling on the Mac is: event loop pumps events to
the event handlers. There are two (main) ways to run the event loop. One is the
new-style RunApplicationEventLoop. The other is the classic WaitNextEvent. If
you use WNE (like Mozilla does) then each time you call in, it pulls the top
event out of the queue, tries to dispatch it, and only if it fails does it
return from WNE with the event converted to a classic EventRecord.
So now we have one question. Why is all the time used pumping our custom events
for the timer and none actually used for processing the UI? After all, the
custom events are added at a low priority.
This is still under investigation.
BTW, do any of the Mozilla Mac developers use Firebird? Can I get some help
here? (Or at least a second opinion to my rambling?)
Comment 18•20 years ago
|
||
I think there are some mac developers who use firefox, or vice versa...what
happens if you change the slack timer from 0 to say, 10 milliseconds? It would
not surprise me if the slack timer with a 0 second deley does not perform well
on the mac... the timer and event processing code can be hard to get right.
Reporter | ||
Comment 19•20 years ago
|
||
> I think there are some mac developers who use firefox
It just seems that the Mac port of Thunderbird is raw. Something as simple as
searching shows fundamental problems. And I'm here, a reasonably-experienced Mac
developer, rambling on about a bug, trying to approach a massive cross-platform
codebase, trying to figure out why it doesn't work like it does.
Any Mac developers out there to help? Dagley? Fraser? Pink?
> what
> happens if you change the slack timer from 0 to say, 10 milliseconds?
It's still unusable at 10. At 50, it's tolerable.
> It would
> not surprise me if the slack timer with a 0 second deley does not perform well
> on the mac... the timer and event processing code can be hard to get right.
Indeed. But I'd rather get it fixed at the lower level, in the timer code, so
that everyone can use a delay of 0, than #ifdef the delay values higher up.
Comment 20•20 years ago
|
||
Pink is definitely active - cc'ing him.
Yes, I'd definitely want this to be fixed at a lower level. My suggestions were
just experiments to try to get some data points on what was happening.
As you say, there's really only one piece of code that uses a slack timer with a
0 interval. But it's not really a thunderbird port issue - mozilla mail has been
doing this for years now. I don't know if it has always locked up the UI like
this, however.
Reporter | ||
Comment 21•20 years ago
|
||
I really have no idea. I've looked over the code, but don't really even know
where to begin. And I can't even experiment since each compile takes 5 minutes.
I'm giving up. If anyone familiar with the Mac platform code wants to take a
look at this, please do.
Comment 22•20 years ago
|
||
why does each compile take 5 minutes? surely there's a way just to rebuild just
the library that changed and copy that manually into the app package.
Reporter | ||
Comment 23•20 years ago
|
||
I don't know. That's how long the make script takes, and I never took the time
to figure out exactly which library the code ended up with and custom-build it.
Pink, what do you think of this? Sometimes I feel like I'm the only person
actually using TB on the Mac. Do you see this problem too?
Comment 24•20 years ago
|
||
shouldn't be hard to determine the library that's rebuilt from where you are in
the code. what file are you touching?
i don't use TBird, fwiw, either at home or the office.
Reporter | ||
Comment 25•20 years ago
|
||
Well, I was touching files in the mailnews component when experimenting with the
timer delay. But once it seemed to me to be a problem with the timer itself I
stopped.
For now, I'm not touching anything because I don't know what to touch. There
seemed to be a problem with slack timers, but when I started digging I started
hitting core Mac event code, and I was at a loss to explain the behavior that I
was seeing. I'm at a total loss as to how to proceed in the diagnosis of this
slack timer problem.
Comment 26•20 years ago
|
||
A shot in the dark: hack plevent.c to turn on MAC_USE_CFRUNLOOPSOURCE for the
carbon build; it might work better than carbon events.
Reporter | ||
Comment 27•20 years ago
|
||
(In reply to comment #26)
> A shot in the dark: hack plevent.c to turn on MAC_USE_CFRUNLOOPSOURCE for the
> carbon build; it might work better than carbon events.
That's from the checkin from bug 271050, right?
My tree is a bit old, so it's erroring out when building. Let me do a clean
rebuild. [Goes to bed; it's late.]
[Wakes up.]
OK. Holy cow, it's significantly better. It's very sluggish, but is actually
usable. You can open up messages as they show up in the list, and cancel the search.
Look at bug 271050's comment #47. Joel Craig was saying that FF and Mozilla were
having event problems like 271050's checkin fixed for Camino. I think this
deserves to be turned on for all the Mac ports, not just for Camino.
(BTW, why does the trunk look so different than the branch? Did someone forget
to check in the artwork?)
Comment 28•20 years ago
|
||
I filed bug 258370 to switch Moz/FF to the Camino-style plevent handling.
Depends on: 282940
Comment 29•20 years ago
|
||
(In reply to comment #28)
> I filed bug 258370 to switch Moz/FF to the Camino-style plevent handling.
Eehm, that's this bug :) I think you mean bug 282940
Reporter | ||
Comment 30•19 years ago
|
||
Well, the switch to CFRunLoopSource plevent handling is in, and testing in Thunderbird 1.5 beta something (version 1.5 (20051025)) shows that my original complaint is gone.
Therefore, I'm declaring this fixed via bug 282940 and closing it.
Thanks, all y'all.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•