Make "compact folders" more efficient. takes too long, and puts too much load on file server's disk drive

RESOLVED DUPLICATE of bug 845952

Status

MailNews Core
Backend
6 years ago
2 years ago

People

(Reporter: Paul Szabo, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

6 years ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:12.0) Gecko/20100101 Firefox/12.0
Build ID: 20120423122624

Steps to reproduce:

I find that compacting folders takes a long time, and is very
inefficient: the whole folder is duplicated (without the deleted
messages), then the old file is replaced with the new one.
Whereas typically, old messages are kept in the folder for
archiving and some new messages are deleted: often, a simple
truncate() of the file might almost suffice.



Actual results:

I observe serious issues with compaction efficiency, most
directly on a Linux login server, which becomes unusably slow
for a full minute, for all users, when one user is compacting
his 1.5GB Inbox. The same issue occurs for Windows users who
have their thunderbird mail folders on a Samba server.
My workaround for now is to advise users to keep their Inbox
small, to keep long-term messages in another "Keep" folder.



Expected results:

Could compaction be made more efficient? Some ideas:
 - Do in-situ. Write directly in the old folder file: starting
   at the first "hole", write any subsequent messages, then
   truncate at the end. Might not be as robust as the current
   duplicating method, might corrupt the file if thunderbird is
   interrupted while compacting; robustness may be improved by
   writing dummy start-of-message markers at the end of each
   message moved/written.
 - Instead of mbox files, use MIX format as UW IMAP:
   http://www.washington.edu/imap/documentation/mixfmt.txt.html
   This would also help with mailbox file size limits.
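
The in-situ idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Thunderbird implements compaction: the index format (a list of `(offset, length, deleted)` tuples in file order) and the name `compact_in_place` are hypothetical, and the sketch omits the dummy start-of-message markers, so it is not crash-safe.

```python
from typing import List, Tuple

# Hypothetical index entry: (offset, length, deleted), listed in file order.
Entry = Tuple[int, int, bool]


def compact_in_place(path: str, index: List[Entry]) -> List[Entry]:
    """Slide surviving messages down over deleted ones, then truncate.

    Returns the new index. Not crash-safe: an interruption part-way
    through leaves the file in a mixed state.
    """
    new_index: List[Entry] = []
    write_pos = None  # stays None until the first hole is found
    with open(path, "r+b") as f:
        for offset, length, deleted in index:
            if deleted:
                if write_pos is None:
                    write_pos = offset  # first hole: writing starts here
                continue
            if write_pos is None:
                # No hole before this message, so it is not touched at all.
                new_index.append((offset, length, False))
                continue
            # Read the whole message first, then write it at the lower offset.
            f.seek(offset)
            data = f.read(length)
            f.seek(write_pos)
            f.write(data)
            new_index.append((write_pos, length, False))
            write_pos += length
        if write_pos is not None:
            f.truncate(write_pos)
    return new_index
```

With this approach, everything before the first hole is neither read nor written, which is the case I care about (old messages kept, recent ones deleted).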

Thanks, Paul

Paul Szabo   psz@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Comment 1

6 years ago
is your issue primarily on the server?
Component: General → Backend
Keywords: perf
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Summary: Make "compact folders" more efficient → Make "compact folders" more efficient. takes too long
(Reporter)

Comment 2

6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #1)
> is your issue primarily on the server?

The main issue is that a single thunderbird user, compacting his Inbox,
causes the Linux login server, or the Samba file server, to become slow
and unresponsive for all users. That the thunderbird user himself
observes his compacting taking a long time is less of an issue: at least
he thinks something useful is happening.

Thanks, Paul

Comment 3

6 years ago
ah, so the thunderbird profile for the user is on samba ?
what size of MB are you using for the compact?

bug 558528 would help individual users. Not sure how much it would reduce network load.
Summary: Make "compact folders" more efficient. takes too long → Make "compact folders" more efficient. takes too long, and puts too much load on samba networked drive
(Reporter)

Comment 4

6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #3)
> ah, so the thunderbird profile for the user is on samba ?
> what size of MB are you using for the compact?

I have two kinds of users:
 - Linux users, who log in to a Linux server through Linux terminals
 - Windows users, who have many of their important files, like their
   email profiles, on a Samba server
I observe slowness on both kinds of setups, most directly for the Linux
users with an obvious correlation of cause and effect.

The "problem" users have Inboxes of several hundred MBs; the largest
Inbox we currently have is 1.5GB.

> bug 558528 would help individual users. Not sure how much it would reduce
> network load.

The load I observe is not network but disk I/O congestion. On the Linux
login server, the filesystem is a local RAID array. (Please change
summary.)

Thanks, Paul

Comment 5

6 years ago
(In reply to Paul Szabo from comment #4)
> (In reply to Wayne Mery (:wsmwk) from comment #3)
> > ah, so the thunderbird profile for the user is on samba ?
> > what size of MB are you using for the compact?
>
> I have two kinds of users:
>  - Linux users, who log in to a Linux server through Linux terminals
>  - Windows users, who have many of their important files, like their
>    email profiles, on a Samba server
> I observe slowness on both kinds of setups, most directly for the Linux
> users with an obvious correlation of cause and effect.
>
> The "problem" users have Inboxes of several hundred MBs; the largest
> Inbox we currently have is 1.5GB.

What I meant is: have you changed the default value in (Windows) Tools | options | advanced | network and disk | disk space | Compact ... from 20 MB?

Users with big inbox or profile and/or very big mails should probably raise the size to 100MB or more.

are any of these users using gmail?


> > bug 558528 would help individual users. Not sure how much it would reduce
> > network load.

> The load I observe is not network but disk I/O congestion. 
understood

> On the Linux login server, the filesystem is a local RAID array. (Please change
> summary.)
done
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on samba networked drive → Make "compact folders" more efficient. takes too long, and puts too much load on networked drive
(Reporter)

Comment 6

6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #5)
> What I meant is: have you changed the default value in (Windows) Tools |
> options | advanced | network and disk | disk space | Compact ... from 20 MB ?
> Users with big inbox or profile and/or very big mails should probably raise
> the size to 100MB or more.

I suppose some may have changed it. Regardless: I observe slowness each
time compaction takes place; the setting you mention may control how
often that triggers. Some of my users may receive 100MB per day, so
compaction happens daily, anyway.
Noting also that many of my users use Linux, not Windows.

> are any of these users using gmail?

No: gmail users use the web interface, not thunderbird.

>> On the Linux login server, the filesystem is a local RAID array.
>> (Please change summary.)
> done

Thanks: though it still mentions "networked drive".
(Reporter)

Updated

6 years ago
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on networked drive → Make "compact folders" more efficient. takes too long, and puts too much load on disk drive
(Reporter)

Comment 7

6 years ago
(In reply to Paul Szabo from comment #0)
> Could compaction be made more efficient? Some ideas:
>  - Do in-situ. Write directly in the old folder file: starting
>    at the first "hole", write any subsequent messages, then
>    truncate at the end. Might not be as robust as the current
>    duplicating method, might corrupt the file if thunderbird is
>    interrupted while compacting; robustness may be improved by
>    writing dummy start-of-message markers at the end of each
>    message moved/written.
>  - Instead of mbox files, use MIX format as UW IMAP:
>    http://www.washington.edu/imap/documentation/mixfmt.txt.html
>    This would also help with mailbox file size limits.

Further to those "ideas" for a fix, a partial improvement which
may be easy to implement:
 - When deleting a message, if that is the last message, then
   truncate the folder file (and the index file) at the end of
   the last-remaining message. This would be quick and easy,
   and may help to trigger compaction of the folder less often.
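
A minimal sketch of that check, assuming the caller already knows the byte offset and length of the message being deleted (the function name `delete_if_last` is made up for illustration, and the matching truncation of the index file is left out):

```python
import os


def delete_if_last(path: str, offset: int, length: int) -> bool:
    """Truncate the folder file if the message is physically the last one.

    Returns True when the message was removed by truncation. Returns False
    when it sits elsewhere in the file, in which case the caller would have
    to mark it deleted and leave the hole for a later compaction.
    """
    if offset + length != os.path.getsize(path):
        return False
    with open(path, "r+b") as f:
        f.truncate(offset)  # drop the trailing message, copy nothing
    return True
```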

Comment 8

6 years ago
(In reply to Paul Szabo from comment #7)
> (In reply to Paul Szabo from comment #0)
> > Could compaction be made more efficient? Some ideas:
> >  - Do in-situ. Write directly in the old folder file: starting
> >    at the first "hole", write any subsequent messages, then
> >    truncate at the end. Might not be as robust as the current
> >    duplicating method, might corrupt the file if thunderbird is
> >    interrupted while compacting; robustness may be improved by
> >    writing dummy start-of-message markers at the end of each
> >    message moved/written.
> >  - Instead of mbox files, use MIX format as UW IMAP:
> >    http://www.washington.edu/imap/documentation/mixfmt.txt.html
> >    This would also help with mailbox file size limits.
> 
> Further to those "ideas" for a fix, a partial improvement which
> may be easy to implement:
>  - When deleting a message, if that is the last message, then
>    truncate the folder file (and the index file) at the end of
>    the last-remaining message. This would be quick and easy,
>    and may help to trigger compaction of the folder less often.

We do that for move/delete message filters, but not when a message is deleted through the UI.

Comment 9

6 years ago
Bienvenu, 
Are there one or more good articles, code comments or bug comments that talk about the efficiency/deficiency/limitations/tradeoffs of implementing compact?

For (one) example I seem to remember some discussion in the last couple years. Maybe it was in the bugs related to making compact automatic.

Comment 10

6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #9)
> Bienvenu, 
> Are there one or more good articles, code comments or bug comments that talk
> about the efficiency/deficiency/limitations/tradeoffs of implementing
> compact?
Probably, but I can't think of any that advance the discussion. Compacting a Berkeley mailbox (mbox) is expensive by nature, and moving to a different storage format is really the way out. Truncating folders when the physically last message is deleted would help for some use cases, but it gets complicated when users read and delete messages from older to newer, if that makes sense.

Comment 11

6 years ago
One idea I haven't seen in the bugs is not compacting a folder when the benefit is nominal compared to the cost. For example, don't compact a 1GB folder if it will save only 1MB or 10MB. (Bug 711765 is a related idea, but from a different angle.)
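
As a rough illustration of that idea (the 5% cut-off and the name `worth_compacting` are assumptions chosen for the example, not existing Thunderbird settings):

```python
def worth_compacting(folder_size: int, reclaimable: int,
                     min_fraction: float = 0.05) -> bool:
    """Return True only if compaction would reclaim a meaningful share.

    `folder_size` and `reclaimable` are in bytes. `min_fraction` is an
    assumed cut-off: below it, rewriting the folder costs more disk I/O
    than the saved space is worth.
    """
    if folder_size <= 0:
        return False
    return reclaimable / folder_size >= min_fraction
```

With these assumed numbers, a 1GB folder with 10MB of deleted mail would be skipped, while the same folder with 200MB of deleted mail would be compacted.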

Comment 12

6 years ago
yes, I've outlined a strategy where we compact the folders where we get the biggest bang for our buck first, i.e. those where the ratio of space to be reclaimed to space still used is highest.
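
To make the ordering concrete, a small sketch (the `FolderStats` record and `compaction_order` are hypothetical names for illustration; this is not the actual implementation):

```python
from typing import List, NamedTuple


class FolderStats(NamedTuple):
    name: str
    used_bytes: int         # space taken by messages that are kept
    reclaimable_bytes: int  # space taken by deleted messages


def compaction_order(folders: List[FolderStats]) -> List[FolderStats]:
    """Sort folders so the highest reclaimable-to-used ratio comes first."""
    def ratio(f: FolderStats) -> float:
        if f.used_bytes == 0:
            # Nothing kept: the best possible ratio if anything is reclaimable.
            return float("inf") if f.reclaimable_bytes else 0.0
        return f.reclaimable_bytes / f.used_bytes
    return sorted(folders, key=ratio, reverse=True)
```

A folder that is mostly deleted mail is cheap to rewrite and frees a lot, so it sorts ahead of a large archive with only a few holes.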

Comment 13

6 years ago
(In reply to David :Bienvenu from comment #12)
> yes, I've outlined a strategy where we compact the folders where we get the
> biggest bang for our buck first, i.e. those where the ratio of space to be
> reclaimed to space still used is highest.

Is Bug 711765 - Percentage based automatic Compact - suitable?
(Reporter)

Comment 14

6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #13)
> Is Bug 711765 - Percentage based automatic Compact - suitable?

I do not think it is.

We might argue for better strategies on deciding when or which
folders to compact. But the issue here is that, when we do a
compaction, it is slow and "expensive".

I agree with comment#10, the use of mbox files makes compaction
expensive (maybe even with trickery like comment#8, which should
be done anyway because it is "right" and "neat").

We should aim to make compaction faster and more efficient,
maybe by making message deletion fast enough to do each time,
sidestepping the issue of compaction completely.

Comment 15

2 years ago
To summarize...

* Short term, your option today is to increase the compact threshold to 100MB, 200MB, etc, so that compacts are less frequent.
* Your stated ultimate goal is bug 845952, which eliminates compact.
* Other options are bug 558528, bug 1242042 and friends. It's anyone's guess whether these or bug 845952 will happen first.

But your stated goal is to eliminate compact, so let's dup this to bug 845952. You're welcome of course to help with or monitor the progress of the other bug reports which might aid your cause.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Summary: Make "compact folders" more efficient. takes too long, and puts too much load on disk drive → Make "compact folders" more efficient. takes too long, and puts too much load on file server's disk drive
Duplicate of bug: 845952