Closed Bug 172786 Opened 22 years ago Closed 21 years ago

Building mail summary file takes forever

Categories

(MailNews Core :: Database, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rob, Assigned: Bienvenu)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826

Sometimes, when I have to shut Mozilla down and restart, it can take 5-10
minutes to build the summary file for my Inbox. 

There are more examples of users complaining about this here:
http://groups.google.com/groups?q=mozilla+building+summary+file&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=353D2505.C8304A20%40netscape.com&rnum=3
http://www.geocrawler.com/archives/3/116/1999/2/0/592662/

Reproducible: Sometimes

Steps to Reproduce:
1. Open mail window
2. Try to get mail or click on inbox (other folders don't seem to have this
problem - I have only 784 messages in my inbox)


Actual Results:  
3.Get "Building mail summary file..." for 5-10 minutes and hard drive crunches
away... it's naptime

Expected Results:  
Summary file should already be there and be built. Even if it's not or is
corrupted, it should not take that long to build. There have been times where I see
each message come into the inbox and it takes about 1 second per message. That
is awful!

 I have a 1.4GHz Athlon and reasonably fast hard drive.
These messages are VERY old (1998 and 1999).
Mozilla doesn't rebuild the .msf files unless they are corrupted or deleted.
I don't have this problem with my bugzilla mail folder with 11250 messages.
(Athlon 1.3Ghz and a fast HDD).

A complete rebuild of course takes time, but you should not get one unless
something special happens (I never have).

Have you ever compacted the folders? (A deleted message is only marked as deleted
and is only physically deleted when you compact the mail folder.)
How big is the mail file for that folder in your profile?
Just checked main "mail" folder sizes and it's 443MB, which is pretty big. Did
an "empty trash" and it's down to 154MB, which is more reasonable and expected.

My in-box folder is 128MB, where 95MB of that is a set of e-mails with 1MB
attachments that I need to hang onto. I don't think there's anything too wild
about that.

I'm not sure if emptying the trash will fix the odd "building file..." or not.
One point I was hoping to make with the old '99 and '98 messages is that this is
not a new issue, and it can still happen, and still causes problems. I will be
watching for the situation to occur again and add more detail as I encounter it.
I don't restart Mozilla that often - once every many days, usually after it
crashes. It could be that it's corrupting the in-box file when it crashes. I
don't know.
The problem could be that you have all these big files in one folder.
Mozilla will only close the folder file if you don't use it.
If you have the inbox open the whole time and Mozilla crashes, it's possible that
Mozilla must rebuild the .msf file.
If the .msf file is open when Mozilla crashes, it must rebuild the .msf file or
you could get corrupted mail folders.
Mozilla must read the complete mail file to generate the .msf, and with 450MB that
is of course not very fast.

I don't have more details, but bienvenu should know more.
QA Contact: gayatri → esther
yes, we have to read through the whole file to regenerate the summary file, so
if it's huge, we're going to spend a lot of time in just diskio.

we shouldn't have to regenerate summary files very often. Do you have your inbox
on a file server, or on your local machine? If the former, that would explain
both  why your .msf files are getting out of date, and why it's so slow to
regenerate them.

there's also a bug where we update the message display while regenerating the
.msf files, which we should fix, but I doubt that's what's taking so long.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Local machine.

This would fix the problem:
Store attachments outside of the "Inbox" file (or folder file) so that
rebuilding the .msf file doesn't require scanning through MBs and MBs of
attachment data.

It is entirely useless to scan 50 or 100MB worth of attachments when you are
just trying to get the header information for 700 messages. That's the only
reason I can imagine for such slow performance during rebuilding. Instead, make
1 file (or a separate directory) for attachments and one for actual messages, so
my 100MB of attachments don't slow down the rebuilding of this inbox.msf file
that does get corrupted from time to time.
we use the berkeley mailbox format for compatibility with other mail readers -
it's a standard mailbox format, but it requires us to store the attachments
inline. 5-10 minutes is still an incredibly long time to read through a 128 MB
file - it takes a fraction of that time on my machine (1.6 GHz with a reasonably
fast hard drive). So I suspect something else is contributing to the slowness,
besides the size of the file. Things like a virus checker, or horrible disk
fragmentation.
I don't have a virus checker. Could be some fragmentation. 

OK, clear on the format standard. I give up for now.
you might defrag. I'll do some tests here on a file that size when I get a
chance, for comparison's sake.
I just upgraded to Mozilla 1.2 from Netscape 4.8. Building the email summary takes
AGES now, compared to old Netscape.

Since the mail folders are the very same, and both programs (NS and Mozilla) read
the entire folders in order to rebuild the summary, I'd expect this operation to take
about the same time. That doesn't seem to be the case, however.

I'd say about 1:2 performance hit for Mozilla.
This has happened to me several more times since I posted this bug. I've
defragged the hard drive and it doesn't help much. What appears to happen is:

1. Mozilla / Mozilla Mail crashes
2. Getting back into Mozilla after the crash, the mail file index is rebuilt -
which takes several minutes for probably a few hundred e-mails. I'm talking full
hard drive running crazy while I sit and watch the messages enter the index 1 by
painful 1. Each only takes a second or less, but I say again that this is way
too slow. It needs to be tuned up.
Rob and Jesus, have you tried 1.3a? It should be quite a bit faster than 1.2 for
rebuilding the summary files.
I have not yet. I will get that installed soon and report findings.
MS Windows 98.

Mailbox: 12.5 MB

Rebuild mailbox database:

- Netscape 4.8: 15 seconds.

- Mozilla Build 2002122208: 45 seconds

:-((
*** Bug 186902 has been marked as a duplicate of this bug. ***
100% reproducible on Linux, build 2003012322

Environment:
OS: Mandrake Linux 9.0
Hardware:
  Pentium III 666 MHz,
  512 MB RAM,
  hard disk storing the Mozilla profile: SAMSUNG SV2042H, ATA DISK drive (20411
MB in size), UDMA 100

A folder in Local Folders is 38M in size. Its name is 
"NMAIL Message Notifications". For your testing you can download a gzipped
archive (3.8M) here:
<http://olo.office.altkom.com.pl/domowa/qa/mozilla/2003_01_24_Large_Mail_Folder_loading/>

Mozilla's behaviour:
When I enter "Local Folders/Admin/NMAIL Message Notifications", Mozilla starts
to build the summary file. Mozilla consumes all available CPU time. After a long
time (~20 minutes) the progress bar eventually stops moving. After some more time
Mozilla hangs completely and doesn't even update its window when I switch from
other apps.

The summary file has a size of 0 bytes all the time.
BTW, I have a message filter rule that moves messages to this folder, and new
messages keep coming very frequently - about 5 msgs per 1 minute during peak
hours - those are admin notifications from a 500 user mail server.
*** Bug 182907 has been marked as a duplicate of this bug. ***
BTW I've placed more testcases at the location from comment #15, all are
compressed archives containing versions of the big folder I have trouble with.

The latest ones will be compressed with bzip2 instead of gzip - now they only
take 1.9MB.
ok i get this problem too, on both win98 se and linux, but it's a tad slower to
build on linux (both have mozilla 1.3 release). i should mention that i
symlinked the 'Mail' directory from windows to linux so i use the same folders
for both... i don't know if that could be a problem, because it seems to build it
every time i reboot the computer.. maybe every time i switch OS it thinks the summary
file is not up to date ?!?!

btw, i have about 300 messages, and it takes about a minute to build summary
file for inbox, size of Inbox file: 26309K

how do i prevent it from rebuilding every time?

Nehal
yes, the symlink is what's causing the problem. The timestamps stored in the
.msf files aren't agreeing with the timestamp of the file on the disk,
presumably because the two OSes aren't in sync as far as the times are concerned.
so what would be the fix? (other than not symlinking, i would like to use common
folders)
I would like to lend my support to this bug report.  I get "Rebuilding summary
file" at least a few times a week (probably because Mozilla or Windows hangs
just as often).  It takes eons to complete.  When it's done, there is no visible
difference, but OK, maybe it's repairing something.  Surely, there must be a way
to accelerate this.  Maybe the answer is not to let the mail file get so big so
quickly in the first place (mine is well under 100MB, and it still takes several
minutes on a 1999 PC).  Perhaps the answer is to index the mail file and only
rebuild those parts of it which require rebuilding.  Please, someone optimize
this.  Thank you.



I am sitting here right now waiting. I have a "backups" folder off of my inbox
that has several hundred e-mails with attached files. The attached files are
1-2MB apiece. For some reason, it is taking 1 second PER e-mail to rebuild the
index. This is ridiculous. 

For some reason, Mozilla e-mail is so incredibly unstable that I can't run my
e-mail for more than a few days without needing to rebuild all of my indexes.
This is a complete mess.
In case it wasn't clear, it seems like the time it takes to index somehow
depends on the size of the attachment, which seems wrong to me.
parsing a mail folder/mail message is dependent on the size of the message - we
have to read through the message to find the beginning of the next message. 1MB
a second is still slow, however.
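To illustrate why parse time tracks message size, here is a minimal Python sketch of finding message boundaries in a Berkeley-style mailbox. It is not Mozilla's parser: the function name `message_offsets` is hypothetical, and it simplifies the format by treating every line that starts with "From " as a separator (real mailbox files also rely on a preceding blank line and on escaping body lines that begin with "From "). The point is only that every line, attachment data included, has to be read to find the next separator.

```python
import io
from typing import BinaryIO, List


def message_offsets(mailbox: BinaryIO) -> List[int]:
    """Return the byte offset of each line starting with b"From ".

    Simplified assumption: any such line starts a new message.
    """
    offsets: List[int] = []
    pos = 0
    for line in mailbox:  # every line is read, including attachment bodies
        if line.startswith(b"From "):
            offsets.append(pos)
        pos += len(line)
    return offsets


if __name__ == "__main__":
    sample = (
        b"From alice Mon Sep  2 10:00:00 2002\n"
        b"Subject: small\n\nhello\n"
        b"From bob Mon Sep  2 10:05:00 2002\n"
        b"Subject: big\n\n" + b"QUJD\n" * 1000  # stand-in for an encoded attachment
    )
    print(message_offsets(io.BytesIO(sample)))
```

The loop does the same amount of work per byte whether the bytes are headers or attachment data, which is why a folder with a few hundred large messages can take as long as one with thousands of small ones.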

My summary files are never invalidated. There seems to be something about some
people's configurations or usage models that invalidate summary files. If I had
reproducible steps on my machine that caused this bug, I'd have a much better
chance to fix it.
can someone tell me how it decides whether it should rebuild or not, 
so i can fix for my problem (comment #19), does it compare dates 
of files or what?

thankyou, Nehal
Fixing bug 58308 is likely to provide a satisfactory fix to this bug.

Adding bug 58308 to deps.
Depends on: 58308
I think the problem is not frequent rebuilds but the time spent doing them.
Rebuilding a mailbox under the old Mozilla 4.8 is THREE times faster. The very same
mailbox! :-(. See comment #13

This issue and the general instability in the email/news component keep me using NS
4.8 for email. Mozilla is superb as a browser, nevertheless.

And no, Maildir is not an option for me. I'd like to see Mozilla reindexing a
mailbox, at least, as fast as NS 4.8 :-). It's clearly doable, since NS 4.8 is
already doing it :-).

The limiting factor should be HD bandwidth, not CPU or memory. Currently HD
transfer rates of 20 MB/s are "normal".
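As a rough worked example of the bandwidth argument, the Python sketch below compares the 20 MB/s figure quoted above with the timings reported earlier in this thread for a 12.5 MB mailbox (15 seconds in Netscape 4.8, 45 seconds in Mozilla). The helper names are hypothetical, and the calculation assumes the whole file is read once at a constant rate.

```python
def seconds_at(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to read size_mb once at a constant transfer rate."""
    return size_mb / rate_mb_per_s


def effective_rate(size_mb: float, seconds: float) -> float:
    """Throughput implied by a measured rebuild time."""
    return size_mb / seconds


if __name__ == "__main__":
    mailbox_mb = 12.5  # mailbox size reported earlier in the thread
    print(seconds_at(mailbox_mb, 20.0))                 # 0.625 seconds if disk-bound
    print(round(effective_rate(mailbox_mb, 15.0), 2))   # 0.83 MB/s (Netscape 4.8)
    print(round(effective_rate(mailbox_mb, 45.0), 2))   # 0.28 MB/s (Mozilla build)
```

Under these assumptions both programs are far from disk-bound, which supports the view that something other than raw disk I/O dominates the rebuild time.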
mozilla decides to rebuild the summary file based on two criteria - the last
modified time, and the size of the mailbox file. When we make a change to the
mailbox file, we sync the file and then get the last changed time and the file
size, and write it to the .msf file, and save the .msf file. The next time we
open the .msf file, we compare the size and last modified time of the mailbox
file with what we have in the .msf file. If either doesn't match, we assume an
external agent has changed the mailbox file, and thus our summary file is
invalid. Outside of Mozilla crashing while writing out the .msf file, this
should be relatively robust. 
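The check described above can be sketched in Python as follows. This is only an illustration of the two criteria (size and last modified time), not Mozilla's actual code: `SummaryStamp`, `take_stamp`, and `summary_is_valid` are hypothetical names, and the real .msf file stores these values inside its own database format.

```python
import os
from dataclasses import dataclass


@dataclass
class SummaryStamp:
    """Hypothetical record of what the summary remembers about its mailbox."""
    size: int      # mailbox size in bytes when the summary was saved
    mtime_ns: int  # mailbox last modified time when the summary was saved


def take_stamp(mailbox_path: str) -> SummaryStamp:
    """Capture the mailbox's current size and last modified time."""
    st = os.stat(mailbox_path)
    return SummaryStamp(size=st.st_size, mtime_ns=st.st_mtime_ns)


def summary_is_valid(mailbox_path: str, saved: SummaryStamp) -> bool:
    """Trust the summary only if both size and mtime still match."""
    return take_stamp(mailbox_path) == saved
```

Note that a mismatch in either value is enough to trigger a full rebuild, so anything that shifts the reported modification time (a second OS reading the same file, a clock change) invalidates the summary even when the mailbox contents are untouched.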

There are a few possibilities for the .msf file getting out of sync:

1. Some other program (e.g., another e-mail program, a virus checker, etc.) is
changing the last modified date or file size of the mailbox.
2. Mozilla is changing the file but not writing out the new last modified date
to the .msf file for some operation.
3. The OS is reporting one last modified date when we ask for it after making a
change, and then reporting a different last modified date later when we open the
file again.

Daylight savings time seems to cause this on some OS's. Having your mailboxes on
a networked drive can also cause this problem.
I have a patch that helps somewhat, which I'll attach. But I believe the real
reason we're slower than 4.x is that nsInputStreamPump::OnStateTransfer() limits
itself to 16K when sending OnDataAvailable - if I comment out this check, I get
speeds similar to 4.x. I'll investigate more, and talk to someone (Darin?) about
this:

            // XXX need to make max ODA size configurable
            if (avail > 16384)
                avail = 16384;
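To get a feel for what the cap costs, this Python sketch counts how many data-available callbacks are needed to deliver a file when each callback carries at most a fixed number of bytes. It is a simplified model, not the stream pump itself: `oda_calls` is a hypothetical helper, and it assumes every callback delivers a full chunk. The 64K value is the ceiling mentioned later in this thread for when the check is removed.

```python
def oda_calls(total_bytes: int, max_chunk: int) -> int:
    """Number of callbacks needed if each delivers at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return -(-total_bytes // max_chunk)  # integer ceiling division


if __name__ == "__main__":
    inbox = 128 * 1024 * 1024  # the 128 MB inbox from the original report
    print(oda_calls(inbox, 16 * 1024))  # 8192 callbacks with the 16K cap
    print(oda_calls(inbox, 64 * 1024))  # 2048 callbacks at 64K
```

Each callback carries fixed overhead (dispatch, thread interaction), so cutting the number of callbacks by a factor of four can matter even when the total number of bytes read is unchanged.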
Status: NEW → ASSIGNED
Attached patch: partial fix
this patch makes it so we delay displaying the thread pane when reparsing a
local folder (so we don't waste time updating the thread pane), and I removed a
status update that wasn't correct because contentlength was always 0 and slowed
us down as well.
this patch removes the 16K limit on ODA data - this speeds up reparsing quite a
bit. I'm going to run with it a bit to make sure it doesn't break anything.
the reason Aleksander's mail folder takes so long to parse is that it's
basically one giant thread, if we thread by subject, and our code that adds
messages to threads breaks down performance-wise when the threads are huge.

If you turn off threading by subject without Re:, it should be a lot faster. I'll
try to figure out a way to improve this code. 
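The thread does not show the actual threading code, so the following Python sketch is only a toy model of why one giant thread can be so much slower than many small ones. It assumes, hypothetically, that adding a message to a thread costs time proportional to the number of messages already in that thread; `thread_by_subject` and its cost counter are illustrative names, not Mozilla functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def thread_by_subject(subjects: Iterable[str]) -> int:
    """Group messages by subject (ignoring "Re:") and return the toy cost.

    Toy cost model (an assumption): each insertion walks the existing
    members of the thread it joins.
    """
    threads: Dict[str, List[int]] = defaultdict(list)
    visits = 0
    for msg_id, subject in enumerate(subjects):
        key = subject.strip().lower()
        while key.startswith("re:"):
            key = key[3:].lstrip()  # "Re: X" threads together with "X"
        members = threads[key]
        visits += len(members)      # walk the thread before appending
        members.append(msg_id)
    return visits


if __name__ == "__main__":
    same = ["Ask A Question", "Re: Ask A Question"] * 500
    distinct = [f"subject {i}" for i in range(1000)]
    print(thread_by_subject(same))      # 499500
    print(thread_by_subject(distinct))  # 0
```

In this model the cost for n messages sharing one subject grows as n(n-1)/2, while n messages with distinct subjects cost nothing extra, which matches the observation that turning off threading by subject speeds things up.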
*** Bug 196607 has been marked as a duplicate of this bug. ***
Darin, I haven't had any problems with the patch that removes the 16K throttle -
does it look OK to you? Are there any tests you want me to try? I'd like to
consider checking this in for 1.4, or does that seem crazy?
1 folder 68 MB (16916 messages, 500 have small text attachments) all same two
subjects:
Ask A Question and Re: Ask A Question
Just imported from Outlook Express
Summary took 100% CPU for 4 hours
XP pro Pentium III 600MHz 7200rpm new harddisk never used
1.4a Build 2003040105
I have a sent folder of 10000 messages I want to do the same to. Should I wait
so it can be done using a possible patch? Or will deleting the msf file be
enough for testing?
Michael, see comment 33 - http://bugzilla.mozilla.org/show_bug.cgi?id=172786#c33
- I have a patch that will allow you to turn off threading by subject without re:,
with a hidden pref, that I will check in when the tree opens again for general
development.
Comment on attachment 121077
don't limit ODA to 16K

r=darin (without this check, max ODA is 64k... and i think that should be ok
and probably good for most stream listener implementations)
Attachment #121077 - Flags: review+
Comment on attachment 121077
don't limit ODA to 16K

sr/a=sspitzer
Attachment #121077 - Flags: superreview+
Attachment #121077 - Flags: approval1.4b+
fix checked in for 1.4 final - should speed up folder parsing substantially.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Tp on linux and mac improved quite a bit!  :-)

btek (linux)
----------------------
Tp before: 1078
Tp after:  1017   ( -61ms, or 5.67% improvement )

silverstone(mac osx)
----------------------
Tp before:  593
Tp after:   539   ( -54ms, or 9.1% improvement )

monkey (mac osx)
----------------------
Tp before:  664
Tp after:   584   ( -80ms, or 12.05% improvement )

fuego (linux)
----------------------
Tp before: 2655
Tp after:  2596   ( -59ms, or 2.22% improvement )


creature (win2k)
----------------------
Tp before:  270
Tp after:   269 and 268  ( no difference )
great!
So, what happens if we make ODA bigger than 64K? Would Tp go down even more?
Quite possibly - Darin talked about increasing the number of 16K segments. The
issue is that we'd need to make sure all ODA handlers can deal with getting more
data than 64K at once. I think we should try it. I suspect it would help with
local file operations the most, since those operations are most likely to have
more than 64K of data at a time.
roc, bienvenu: the issue is that a lot of ODA impls will take the count
parameter and pass that straight to malloc (either directly or indirectly). 
gzip stream converter is a good example.  of course, such impls could be revised :-/

anyhow, the parameters of interest are stored in
netwerk/base/src/nsNetSegmentUtils.h

the default values are:

#define NET_DEFAULT_SEGMENT_SIZE  4096
#define NET_DEFAULT_SEGMENT_COUNT 16

also, nsIOService owns a cache of these segments.  the default number of cached
segments is:

#define NS_NECKO_BUFFER_CACHE_COUNT 24

with a 15 minute expiration.

if someone has spare cycles, it'd be worthwhile to play around with different
configurations of these settings.

fwiw: in the past i tried upping the max buffer size, but didn't see much
improvement beyond 32k.  i also have a bug somewhere about making these buffer
sizes more configurable (per channel, transport, stream pump, etc.).
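For anyone trying different configurations, this small Python sketch shows the arithmetic behind the defaults quoted above. The constant names mirror the C macros from the comment; the helper `buffer_bytes` is hypothetical, and treating the maximum buffer as segment size times segment count (and cached segments as the same size) is an assumption that is consistent with the 64K figure mentioned earlier.

```python
NET_DEFAULT_SEGMENT_SIZE = 4096    # bytes per segment (quoted default)
NET_DEFAULT_SEGMENT_COUNT = 16     # segments per buffer (quoted default)
NS_NECKO_BUFFER_CACHE_COUNT = 24   # cached segments (quoted default)


def buffer_bytes(segment_size: int, segment_count: int) -> int:
    """Total bytes held by segment_count segments of segment_size bytes."""
    return segment_size * segment_count


if __name__ == "__main__":
    # Default maximum buffer: 65536 bytes, i.e. 64K
    print(buffer_bytes(NET_DEFAULT_SEGMENT_SIZE, NET_DEFAULT_SEGMENT_COUNT))
    # Default segment cache: 98304 bytes, i.e. 96K
    print(buffer_bytes(NET_DEFAULT_SEGMENT_SIZE, NS_NECKO_BUFFER_CACHE_COUNT))
    # Doubling the segment count would give 131072 bytes, i.e. 128K
    print(buffer_bytes(NET_DEFAULT_SEGMENT_SIZE, 2 * NET_DEFAULT_SEGMENT_COUNT))
```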
Tp on Windows (creature) has not improved but it has on Linux (btek). 
Why ? Do btek and creature have the same internet connection speed ? 
No, I think Windows is just better behaved in this respect. Any time you have a
lot of thread switches/interactions like this, Mac and Linux seem more adversely
affected. At least, that's one possibility. Another possibility is that Windows
Mozilla didn't run into the 16K limit as often for some reason; perhaps it
processed the data quickly enough so that the data didn't back up - I don't know.
beast is pretty damn fast, so yeah... unless we have some windows numbers for a
slower box, i wouldn't discount the possibility that this also helped windows
(in general).
Bienvenu, concerning your comment #33: this issue hasn't been fixed yet, right?
Is there a follow-up bug? I have a testcase at hand (the one from comment #15)
and am willing to test a fix when available.

The problem is still visible - when I visit the mentioned folder in threaded
mode, Mozilla consumes all CPU for several seconds. It's on a Pentium III
666MHz, with 512 MB RAM and the folder only contains 318 messages, but all of
them have the same subject.
slow threading when all messages have the same subject is a different problem
that is not fixed. I think there's a separate bug filed for it, but if not, I'll
file a new one and note it here.
David, bug 159660 seems to cover this.
Adam, no, that's a different issue, I think. That has to do with listing the
contents of a thread and displaying them. The slowness threading messages with
the same subject has to do with the algorithm used to add new messages to a
thread when we're threading by subject.
David, I've opened bug 226730 for the threading problem.
I don't know if this info will be useful (I have a stripped binary), but here
are the backtraces from next-stepping through the running mozilla-bin process
(the one that consumes all the CPU):


0x4285f900 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0  0x4285f900 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1  0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) next
Single stepping until exit from function NSGetModule, 
which has no line number information.
0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0  0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so
#1  0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableRawRemove, 
which has no line number information.
0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0  0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so
#1  0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableOperate, 
which has no line number information.
0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0  0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1  0x09cebd28 in ?? ()
(gdb) next
Single stepping until exit from function NSGetModule, 
which has no line number information.
0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) bt
#0  0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
#1  0x09cbb498 in ?? ()
(gdb) next
Single stepping until exit from function PL_DHashTableEnumerate, 
which has no line number information.
0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
(gdb) bt
#0  0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
#1  0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so
(gdb) next
Single stepping until exit from function NSGetModule, 
which has no line number information.
0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so
(gdb) bt
#0  0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so
#1  0x4286e64a in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so
thanks, I know exactly what's going on, but it's hard to fix - turning off
threading by subject will speed it up for you.

user_pref("mail.thread_without_re", false);
Product: MailNews → Core
Product: Core → MailNews Core