Closed Bug 469448 Opened 16 years ago Closed 15 years ago

Scrolling in folder pane is unbearably slow when no .msf files exist for a couple of subfolders

Categories

(Thunderbird :: Mail Window Front End, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b2

People

(Reporter: justdave, Assigned: Bienvenu)

References

Details

(Keywords: perf, regression, Whiteboard: [no l10n impact] )

Attachments

(4 files, 1 obsolete file)

The last couple days I've noticed the folder pane is really slow and jumpy when scrolling while any folders from pop3 accounts are visible.  If only IMAP accounts are visible in the folder pane it's nice and speedy like it always was.  For me this means the bottom half of my folder pane scrolls normally and if I'm viewing the top half it's slow and jumpy.  This is something new in the last couple days, I'm not sure exactly when though, but pretty sure it's within the last 2 or 3 days.
Flags: blocking-thunderbird3?
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20081212 Shredder/3.0b2pre
maybe it was more than 2 or 3 days, I'm back through 12/5 so far and still reproducing it.
OK, so this is strange, I've gone back to 11/15 and can still reproduce this (and I rebooted just to make sure it wasn't something screwy in memory or something).  I could swear it wasn't doing this to me before the last couple days.  Something weird in my profile again that got put there in the last couple days?
OK, the cause of the slow scrolling turned out to be mailboxes that had lost their indexes.  I have no idea how they lost their indexes, but stepping through the folders selecting them one at a time and allowing them to regerate their indexes as they opened resolved the scrolling problem.

Also found several new messages Shredder had never notified me had arrived, because they had sorted into folders that had said missing indexes, including voicemail notifications that were a week and a half old wanting information for events happening a week ago that the people probably think I blew them off now. :/
index files gone? or invalidated and rebuilt?  (or not possible to tell)


>Also found several new messages Shredder had never notified me had arrived,
>because they had sorted into folders that had said missing indexes

similarish things have been reported
normally we would get this information from panacea.dat - but if you blew away your panacea.dat files, and your .msf files, we would try to open the .msf files to get some information when painting the folder pane. I don't know that just an invalid .msf file would cause us to stop getting things from panacea.dat, but I suppose it's possible.
(In reply to comment #5)
> index files gone? or invalidated and rebuilt?  (or not possible to tell)

Yeah, not possible to tell.  I didn't know what I was looking for, and I let it rebuild them all since then.
I can see the same behavior. Today I switched from Tb2 to Tb3b2pre. While the Tb2 build works fine, the 3b2pre build isn't usable. Scrolling the folder pane is one issue but the whole UI is reacting like a snail.

Lets see if we can find a regression range here.
I was unable to recreate with my vista pop account. And didn't attempt with indexes blown away.  

Have we confirmed that indexes are required to see the problem? i.e. you see significant cpu usage while machine is otherwise idle, or the problem is seen when there indexes are just fine or imap accounts are not even open/logged in?
Cpu usage is about 50% when scrolling and having only one POP3 account open. Following exception is thrown in regular intervals while scrolling:

Error: uncaught exception: [Exception... "Component returned failure code: 0x80550006 [nsIMsgFolder.getMsgDatabase]"  nsresult: "0x80550006 (<unknown>)"  location: "JS frame :: chrome://messenger/content/mailWidgets.xml :: parseFolder :: line 1929"  data: no]

Still trying to build a minimized testcase...
One of such a folder which causes this perf issues has a lot of subfolders where single files are used to hold the messages. Curiously none of these files have .msf files around. David or Mark, could that be probably the reason?
I do have several folders with lots of subfolders.  All of the subfolders currently have .msf files, and I am not currently experiencing this problem.  I'm assuming those files got created when I stepped through all of the folders letting it regenerate indexes when I got the problem to go away.  I don't know whether they were missing or just corrupted when I was having this problem.
Attached file Testcase for slowness
With this testcase you can test the slowness of Thunderbird. Just extract the zip archive and replace it with your Local Folders account. Afterward you open the perf folder you will notice that Thunderbird is getting really slow.
This can also be seen on Windows and isn't related to POP3. The cause are the missing .msf files in a couple of subfolders.
OS: Mac OS X → All
Hardware: x86 → All
Summary: Scrolling in folder pane is unbearably slow and jumpy if pop3 accounts are visible → Scrolling in folder pane is unbearably slow when no .msf files exist for a couple of subfolders
I stepped through all my folders manually and now that index files exist everywhere the scrolling works just fine!
yes, the new js driven folder pane fetches folder flags on paint, unlike the old rdf-driven one.  Usually, this just means we get the flags from the folder cache, but if the .msf file is missing, we try to open the missing .msf file on every paint, and that is slow enough that it can make scrolling slow. 

I'm not quite sure why having a missing .msf file means that we can't use the cached copy of the flags, from the nsMsgDBFolder. If I squint, I can see why we can't use the folder cache value, because the folder cache is a cache of the information in the individual .msf files, and if the .msf file is missing...
Assignee: nobody → bienvenu
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Target Milestone: --- → Thunderbird 3.0b2
I've been seeing a similar - but not the same - slowdown - probably with the same cause.  

This time, the slowest thing seems to be when you move to a new message, the index pane seems to be fast enough, but the message pane takes a long time (significant fraction of a second) to draw. Its slow enough that scrolling through messages with arrow keys is non-viable, and I find myself replying to the wrong message for example. 

The cause seems related - I experienced the recent crash where .msf files were corrupted and I had to create a new profile, delete the MSF files and copy them over (by the way - it should be possible to import the old files, a topic for the SEVEN year old bug 63389 ). 

Since then TB has been slow enough to be almost unusable.

- Mitra
Joey, any chance you could take this on?
Based on comment #16: I can fix the front-end to cache that data, but the comment seemed to suggest there's a bigger problem on the backend that I wouldn't be able to take on, as I'm really unfamiliar with that architecture.
No longer blocks: 472068
jminta was referring to my question about why we couldn't use the cached flag in the folder object - we do that, but we try to get it from the db every time because we try to open the db to refresh the cache.
Whiteboard: jminta, bienvenu to investigate
This caches the folder-flag data so we only look it up once. The trick here is that now we need to watch for stale properties. I need to do more research to figure out how to tell when any relevant property changes.
I tried removing a .msf file for a local file, and restarting, and I didn't see any scrolling problems with a recent trunk build. I'll try some more tomorrow, but I wonder if this has been fixed by some of the recent db fixes.
David, its still persistent. It's not enough to only remove one .msf file. As more files you delete as slower Shredder will be. Take my testcase from attachment 355339 [details] and you will clearly be able to see it.
(In reply to comment #10)
> Following exception is thrown in regular intervals while scrolling:
> Error: uncaught exception: [Exception... "Component returned failure code:
> 0x80550006 [nsIMsgFolder.getMsgDatabase]"  nsresult: "0x80550006 (<unknown>)" 
> location: "JS frame :: chrome://messenger/content/mailWidgets.xml ::
> parseFolder :: line 1929"  data: no]

It's simply an indicator of "out-of-date" .msf file. See Bug 471682 Comment #47.
Once ".msf" becomes "out-of-date", as seen in Bug 471682, "folder open via other than UI" currently erases ".msf" and mail folder data update(addition of mail data) is done without correct ".msf" recreation.
Watch Bug 471682 for issue of "somehow .msf is deleted", please.
Henrik, thx, yes, if I delete enough .msf files, scrolling does get slow.

I've been pondering how to fix this - my current thinking is for the code that tries to open the db to refresh the cached info set a flag if the db fails to open and don't try to open the db again until we actually try to regenerate the db (i.e., making reparse clear the "don't open again" flag). This is a little risky because we don't want to get stuck in a situation where that flag is set and never cleared - I suppose we could use a counter or timestamp to ameliorate that issue...

I'm worried about the folder pane caching this information because that gives us an excellent chance to display stale info.
This problem is a little different than I first thought (or it has morphed with some of the recent db fixes). The performance hit is in calculating the unread count for collapsed parent folders whose children don't have valid db's, or entries in panacea.dat. Just expanding the parent folder is enough to create empty db's for the children and fix this problem.  So this problem will affect less people than I thought, though it is pretty painful when it happens.
Status: NEW → ASSIGNED
Attached patch proposed fix (obsolete) — Splinter Review
the test case was causing us to get into a vicious cycle of creating an empty db and deleting it over and over again, on every paint call.
Attachment #359169 - Flags: superreview?(bugzilla)
Attachment #359169 - Flags: review?(bugzilla)
Whiteboard: jminta, bienvenu to investigate → has patch for review
Attachment #359169 - Flags: superreview?(bugzilla)
Attachment #359169 - Flags: superreview+
Attachment #359169 - Flags: review?(bugzilla)
Attachment #359169 - Flags: review+
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: has patch for review
I backed this out due to test failures:

TEST-UNEXPECTED-FAIL | ../../../mozilla/_tests/xpcshell-simple/test_mailnewsbase/unit/test_bug404489.js | test failed, see log
../../../mozilla/_tests/xpcshell-simple/test_mailnewsbase/unit/test_bug404489.js.log:
>>>>>>>
*** test pending
Directory request for: MailD that we (mailDirService.js) are not handling, leaving it to another handler.
Directory request for: MFCaF that we (mailDirService.js) are not handling, leaving it to another handler.
Directory request for: DefRt that we (mailDirService.js) are not handling, leaving it to another handler.
*** test pending
*** test finished
*** running event loop
Finished search does 1 equal 0?
*** exiting
*** TEST-UNEXPECTED-FAIL | ../../../mozilla/_tests/xpcshell-simple/test_mailnewsbase/unit/test_bug404489.js | 1 == 0
JS frame :: /buildbot/linux-comm-1.9.1-check/build/mozilla/testing/xpcshell/head.js :: do_throw :: line 101
JS frame :: /buildbot/linux-comm-1.9.1-check/build/mozilla/testing/xpcshell/head.js :: do_check_eq :: line 120
JS frame :: ../../../mozilla/_tests/xpcshell-simple/test_mailnewsbase/unit/test_bug404489.js :: anonymous :: line 171
*** FAIL ***

<<<<<<<

command timed out: 1800 seconds without output, killing pid 22645
process killed by signal 9
program finished with exit code -1

I'm guessing from the output that there could also be issues with not exiting the test correctly.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [needs new patch (broke tests)]
Whiteboard: [needs new patch (broke tests)] → [needs new patch (broke tests)][no l10n impact]
A lot of our tests were relying on the db for empty local folders getting created as a side effect of various calls. This makes the backend do that more consistently, by using a common helper routine, OpenDatabase(). I also made sure that we do call UpdateSummaryTotals(PR_TRUE) in that situation, which will force us to refresh the summary totals and call ReadDBFolderInfo(PR_TRUE), to match the promise I made in the original patch. 

All the tests pass with this patch. I tested a few things in the client (creating new local folders, handling out of date or missing .msf files for local folders), and those seem ok as well.
Attachment #359169 - Attachment is obsolete: true
Attachment #360415 - Flags: superreview?(bugzilla)
Attachment #360415 - Flags: review?(bugzilla)
Attachment #360415 - Flags: superreview?(bugzilla)
Attachment #360415 - Flags: superreview+
Attachment #360415 - Flags: review?(bugzilla)
Attachment #360415 - Flags: review+
fix checked in...
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Whiteboard: [needs new patch (broke tests)][no l10n impact] → [no l10n impact]
while testing an import patch, I discovered that the previous patch for this bug broke import of local folders, in the sense that you have to restart and perhaps even rebuild index to see the contents of imported local folders - the issue was that we were caching an invalid db. I've tweaked OpenDatabase() not to do that. Since it's not called from the GetDatabaseWithReparse(), that's OK (that's the one place where we want to allow out of date db's to escape). All tests still pass with this change and this bug is still fixed.
Attachment #360804 - Flags: superreview?(bugzilla)
Attachment #360804 - Flags: review?(bugzilla)
Attachment #360804 - Flags: superreview?(bugzilla)
Attachment #360804 - Flags: superreview+
Attachment #360804 - Flags: review?(bugzilla)
Attachment #360804 - Flags: review+
Attachment #360804 - Attachment description: tweak to previous patch → tweak to previous patch - checked in.
Depends on: 478595
Depends on: 481065
Depends on: 491130
You need to log in before you can comment on or make changes to this bug.