Closed Bug 465009 Opened 16 years ago Closed 15 years ago

Long pauses (windows hourglass 100% cpu) switching folders while indexing gloda

Categories

(MailNews Core :: Database, defect, P4)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: JoeS1, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

While gloda is indexing for the first time, attempt to switch folders.
In this particular case, I was moving from a newsgroup which seemed to be working noramally, to my pop inbox.

Each time this was attempted, cpu went to 100%, the windows hourglass appeared.
During this time period, (up to 4 minutes) the TB window froze completely.
I also noted that disc activity ceased during the interval.

Tested with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081114 Lightning/1.0pre Shredder/3.0b1pre ID:20081114033300
Flags: blocking-thunderbird3?
Severity: normal → major
Component: General → Database
Keywords: perf
Product: Thunderbird → MailNews Core
QA Contact: general → database
Summary: Long pauses (windows hourglass 100% cpu) switching folders while indexing → Long pauses (windows hourglass 100% cpu) switching folders while indexing gloda
Some additional thoughts:
This might be just an amplification of a problem that I'm seeing generally.
That is, a delay of a few seconds when switching folders as described above.
I've never seen more that 3-4 seconds delay normally, and I can't seem to reproduce or figure out the exact circumstances of the delays.
I attempted to replicate this on a T40 laptop but was not able to do so, as noted in Bug 465353 

Attempting same o a Gateway P4 3.2GHz which I remote desktop into, but no dice.  BUT, that might be because indexing there is on a steady, flat line of performance - CPU is not going above 20-25%. So indexing is not driving IO up high and not forcing appreciable CPU contention.  I'm not seeing the high CPU on this machine that I saw in bug 465353.
Joe, what gloda preferences do you have enabled in regards to logging?  Additionally, are you running thunderbird with a console window open?

The fact that disk activity ceased to happen and the window turned grey suggest there is something at work more than just gloda.  Gloda's asynchronous mode of operation requires that it constantly surrender time-slices, so if the window goes grey, gloda is likely not running.  Likewise, if the database is doing something (which should not be able to block the main thread), we should be seeing all kinds of disk activity.  If the problem were bug 465122 (failure to close msf files), I would expect disk swapping.
I was not logging at the time, but:
https://bugzilla.mozilla.org/show_bug.cgi?id=465122#c1
May have some relevance. I had an unusually high number of windows going and only 512m memory on this machine.
I'll take a closer look at memory and processes if I can reproduce.
Well it seems I can reproduce this. At least I did this time around.
Thunderbird.exe hits 99% and stays there for 10 minutes or more.
I have been seeing a delay in switching to/from the newsgroup mozilla.support.thunderbird in normal usage (3-4) seconds delay.
What I think is relevant is the number of headers in this group 72,713
Thats seventy two thousand.
Attempting to open that group while indexing seems to key the incident.
Not sure if the other system usage has anything to do with it.
Memory seems to churn ranging from 157k--194k with nothing much else happening.
Occasional disc activity, but very little.
Attaching screenshot.
Anything more I can do, please ask, but it gets me a little nervous hanging too long at 100% CPU
Attached image TB at 99%
After giving this bug some further thought (dangerous sometimes) I think it might boil down to fact that the TB UI freezes when attempting to open a folder while gloda is currently using that msf.
In the case I am seeing, that newsgroup I'm trying to open has a large number of headers, and could explain the amount of time the UI freezes.
If that is the case , then copying the msf to a temp file, and using that for gloda indexing, might be an answer.
Gloda has no problem sharing the msf.  The problem that explains a pause of a few seconds is likely that the msf is huge and anyone opening the msf is going to block the UI.

The multi-minute thing is very concerning.  In theory, gloda should not be doing anything with your newsgroup folder at all.  In the past at least, I think it might get tricked into performing event-driven indexing in a newsgroup, and that may still be true.

Is your newsgroup marked for offline storage?  If so, are all the messages already downloaded?  When the multi-minute thing happens, do you know if any heavy network traffic is going on?
(In reply to comment #8)
> Gloda has no problem sharing the msf.  The problem that explains a pause of a
> few seconds is likely that the msf is huge and anyone opening the msf is going
> to block the UI.
> 
> The multi-minute thing is very concerning.  In theory, gloda should not be
> doing anything with your newsgroup folder at all.  In the past at least, I
> think it might get tricked into performing event-driven indexing in a
> newsgroup, and that may still be true.
> 
> Is your newsgroup marked for offline storage?  If so, are all the messages
> already downloaded?  When the multi-minute thing happens, do you know if any
> heavy network traffic is going on?

No offline storage is selected.

I don't see any network activity at all during the incident, as a matter of fact, my anti-virus (automatic update) failed to execute during the incident.
Just popped up as a coincidence during the incident.

What can I do to elaborate. Run with console..whatever I can do to help.
Priority: -- → P1
Whiteboard: [likely resolved by patch on bug 466438]
The fix for bug 466438 is now in the trunk; it stops gloda from indexing newsgroup messages.  The next set of nightlies should include the fix.  Please try and duplicate with such a build.

This is the only problem I'm really able to conceive of that could be directly gloda's fault and might explain the observed problem.  Long load times for folders with tons of messages are not a gloda-specific problem, though they are something we want to improve.

If you are able to reproduce, please do run with console after having enabled the gloda "dump" preference and try and reproduce there.  (Search on "global" in the advanced pref config and enable the "dump" one.)  I'm advising trying to reproduce without the logging because the logging can be apt to generate a lot of CPU load itself and might confuse the issue.
Severity: major → normal
Priority: P1 → P4
Whiteboard: [likely resolved by patch on bug 466438] → [believed fixed by bug 466438]
The situation seems to be much improved.
With one test run (lasted 1 hr 53 mins to complete the indexing)
I detected 1 period of about 3 minutes with a frozen UI
The console was not too helpful, since it was flooded with hundreds of messages:
2008-11-25 18:33:55	gloda.indexer	WARN	Have aMimeMsg but not bodyPlain?
These actually wrapped the console storage, so much info was over-written.

BTW, probably 90% of my message storage is HTML hence not bodyPlain.

I'll try a few more runs when I get time, (1 run=almost 2 hours)
improved for me also using compose, but I didn't do extensive testing.

but I suspect there are other usecases that get impacted, IO intensive like  search for example (iirc I tried search - I should have documented it at the time)
Joe, are you using a debug build by any chance?  Also, any chance you have a profiler available?

Wayne, do you know if there are easy instructions to run a profiler against a windows build of thunderbird?
I'm going to deny blocking status on this one until it can be verified/duplicated.  This isn't to deny the problem, just that it's hard to have it impact drivers until we can have more actionable STRs.
Flags: blocking-thunderbird3? → blocking-thunderbird3-
(In reply to comment #13)
> Wayne, do you know if there are easy instructions to run a profiler against a
> windows build of thunderbird?

http://www.hacksrus.com/~ginda/venkman/profiles/

Perhaps this issue is worth putting out to the testing community.
(In reply to comment #15)
> (In reply to comment #13)
> > Wayne, do you know if there are easy instructions to run a profiler against a
> > windows build of thunderbird?
> 
> http://www.hacksrus.com/~ginda/venkman/profiles/

I don't think Venkman (JavaScript) profiling would be sufficient.  We need to know what the C++ code is up to.

> Perhaps this issue is worth putting out to the testing community.

Sounds reasonable.
Joe, does thunderbird window go "(not responding)" when this happens?
(In reply to comment #17)
> Joe, does thunderbird window go "(not responding)" when this happens?

I'm about ready to try this again.
I'll test 2 ways With my existing (large-everyday test profile)
And with a more limited profile that I built for for beta testing)

WRT "not responding" it certainly wasn't, nor was it telling me anything.

Oh yes in regard to the troubling slow responding folder, part of that was due to a mix of read and unread messages in threaded view (newsgroups)

Marking all-read speeded it up quite a bit.
Further testing with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090223 Lightning/1.0pre Shredder/3.0b3pre ID:20090223031225

I deleted the database and started the indexing process.
During Gloda indexing I could not reproduce the symptoms originally noted in this bug report. 

Switching folder, as well as normal TB usage seemed to work fine, but I have the following observations to note:

Indexing seemed to take longer, but with less impact on UX
The one issue that I did note was that scrolling large messages was jerky.
Also scrolling down the folder pane.
To try to quantify that I used the URL ref in bug 372039 to test.

Here are the results (average of 4 runs via the start page)
Gloda running 7240
No Gloda 5850

I could close this bug as worksforme, but I think I'll leave it open for periodic evaluation in case a profile specific condition was to blame.
I'm marking this as worksforme; I'm trying to clean out gloda bugs and there's nothing actionable remaining to do on this.  If scrolling is still jerky for you even with the fix for bug 470329 (adaptive indexing to keep cpu usage within certain bounds based on idle/non-idle) then please file a new bug specifically on that problem.
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
Whiteboard: [believed fixed by bug 466438]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: