Closed Bug 541917 Opened 14 years ago Closed 12 years ago

TB 3.01 still redownloads entire IMAP folders several times a day

Categories

(Thunderbird :: Folder and Message Lists, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: david, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [may_be_fixed_bug_531033] [probably_partially_fixed_by_Bug 546384])

Linux, x86 and x86_64, using Dovecot and Exchange as the IMAP servers.  It's not the Exchange issue as that is supposedly closed/fixed for 3.01.  It's not the > 2G/4G issue.  TB redownloads entire folders (numerous folders are marked for offline sync) on all four IMAP accounts.  Some are large $folder, some small.  It occurs several times a day and saturates my link to the point I've had to set up packet rate limiting for the TB process.  Having it happen on both my desktop and laptop at the same time can be a hefty load not just locally, but on the server, when we're talking ~30,000 messages.
(In reply to comment #0)
> It's not the > 2G/4G issue.
> TB redownloads entire folders (numerous folders are marked for offline sync) on all four IMAP accounts.  
> Some are large $folder, some small.

Really less than 2GB? Offline-store file size limit on Linux is 2GB. Have you checked real offline-store file size?
Note : bug 532323 is fixed by Tb 3.1pre only.
       Even if folder size is less than the limit, bug 537498 can occur.

> It occurs several times a day and saturates (snip)

If similar issue to bug 540214 occur on IMAP folder of offline-use=on, re-download of all mail can happen, because "xxx.msf" is deleted(rebuild-index is forced).
Does issue like bug 540214 occur in your environment?
Really, less than 2G.  If you combine every single folder )which is maildir on
the linux dovecot servers), then I have ~12G of mail.  However, none of my
~1300 folders individually exceed even 1G.

Bug 540214 appears to deal with offline-use primarily.  I haven't set TB to
offline in a few months so I'll rule that one out.

When it downloads, it starts at #1 and downloads every single message up to
$total.  I can't pin down any particular email that might trigger it.  It just happened again on a periodic cronjob email I get, which for the last few are exactly the same save for slight timestamp changes in the headers.
Largest server side maildir is ~370M.  101880 would be in Kbytes, or ~99M

101880  ./.Lists.Security/cur
198800  ./.archive.2009.08.13/cur
199112  ./.archive.2009.08.18/cur
282696  ./cur
311536  ./.Sent/cur
378612  ./.Storage/cur

client side resembles the server albeit slightly larger and a bit different at times.  146044 would be in Kbytes, or ~142M

146044  ./mail.blue-labs.org/INBOX.sbd/Lists.sbd/Security
177936  ./mail.blue-labs.org/INBOX.sbd/Drafts
198620  ./mail.blue-labs.org/INBOX.sbd/archive.sbd/2009.sbd/08.sbd/13
199228  ./mail.blue-labs.org/INBOX.sbd/archive.sbd/2009.sbd/08.sbd/18
240584  ./mail.blue-labs.org/nstmp-1
254316  ./mail.blue-labs.org/nstmp-2
254360  ./mail.blue-labs.org/nstmp-4
374932  ./mail.blue-labs.org/INBOX.sbd/Storage
492884  ./mail.blue-labs.org/INBOX.sbd/Sent
509916  ./mail.blue-labs.org/INBOX
(In reply to comment #2)
> Bug 540214 appears to deal with offline-use primarily.
> I haven't set TB to offline in a few months so I'll rule that one out.

No. Bug 540214 was initially reported for Gloda=on, offline-use=on case, but currently is being cheked with Gloda=on, offline-use=off case.

> When it downloads, it starts at #1 and downloads every single message up to $total.  
> I can't pin down any particular email that might trigger it.
> It just happened again on a periodic cronjob email I get,
> which for the last few are exactly the same save
> for slight timestamp changes in the headers.

Show "Order Received" column(UID of mail if IMAP).
Different mail data from previous download of whole mail header and mail data, even though all mails has same UID as previous whole download?
It should be. Sorry for mistake.
> currently is being cheked with Gloda=OFF, offline-use=off case.
The more pertinent question is how big are your mail folder offline stores, in the imap server directory of your user profile directory. They'll have the same name as the folder, w/o the .msf extension.

It's not a question of setting TB to offline mode, but rather, we download the imap messages for offline use to speed up online use.
David, out of the ~1300 folders, I sorted, and posted the largest of them above in comment #3.  Most of them resemble the same sizes on server as local.  $INBOX is about 2x as large local as it is on the server (300M vs 600M).

WADA, I'm watching the OR column now.
WADA, just watched another download.  The OR ID#s remained the same, the only difference is three new perodic emails arrived during the time it took to download $inbox.  I did note this time that the mailbox has 1027 (now, 1024 prior) emails in it and it redownloaded 861 messages.

Mind you, the message only says ~ "Downloading INBOX, x of x", it doesn't say which of my four inboxes it's redownloading.  I assume this particular inbox because it becomes mostly inaccessible during the process.  I can read new emails out of the others.  Additionally, two of the 4 have far less than 800 emails in them and the third has 5000+
I just compacted one of my inboxes and it triggered a redownload, 4967 of 4982.
Blu3 - do you know if Thunderbird is successfully downloading the messages for offline use? I.e., once the download is finished, if you put Thunderbird in offline mode, can you read the messages that we just downloaded?
There is an extension that adds a column that tells whether the message is downloaded for offline use - maybe we can get rkent to send you a copy, though you can basically tell if a message is offline by the above steps...
(In reply to comment #2)
> Bug 540214 appears to deal with offline-use primarily.
> I haven't set TB to offline in a few months so I'll rule that one out.

Please note next.
> "offline use" (Folder Property/Syncronization) != "offline mode"(File/Work Offline)

(In reply to comment #9)
> I just compacted one of my inboxes and it triggered a redownload, 4967 of 4982.

folder-click and compact executes explicit folder open. So, if problem of Bug 540214 happens, re-download of all headers by internal rebuild-index occurs upon explicit folder open because .msf is deleted. And, if "offline use=on", re-download of all mail data occurs.

Show Tools/Activity Monitor window, and watch messages too, please.
Yes.  I have carefully unchecked nearly every folder except a few main folders.  By default I had prefs set to auto enable the offline sync, oops on my part - but I didn't expect this bug :>

This may be related to bug 540214 but I have not seen any messages indicating problems with the msf files.

My activity mgr is simple and infrequent.  Far less activity is recorded there than occurs the redownload.

I've had five redownloads in the last ~5 hours.

Activity mgr indicates 308 msgs deleted from Inbox/bluelabs at 4.24pm, 10 from SMVFD/bluelabs at 3.21am, 542 from Inbox/verio at 3.12am, 1 from spam/bluelabs at yesterday, and 132 from Inbox/verio yesterday.

My estimation is ~3x as many redownloads as does something get recorded in the Activity manager.  I'm restarting TB with debugging options.
(In reply to comment #13)
> Activity mgr indicates 308 msgs deleted from Inbox/bluelabs at 4.24pm, 10 from
> SMVFD/bluelabs at 3.21am, 542 from Inbox/verio at 3.12am, 1 from spam/bluelabs
> at yesterday, and 132 from Inbox/verio yesterday.

Even if folder size is 200MB always, if total size of "deleted 308 msg or 542 msg" is 100MB, and if 100MB mails is added after that, offline-store file size becomes 300MB.
If ("delete of 100MB" + "new 100MB mails") occurs many times before compact, offline-store file size can exceed file size limit of offline-store.
This is bug 537498. If bug 537498 happens, compact may fail. Next may be garbage due to compact falure. 
> 240584  ./mail.blue-labs.org/nstmp-1
> 254316  ./mail.blue-labs.org/nstmp-2
> 254360  ./mail.blue-labs.org/nstmp-4

Get Process Monitor and check file I/O of .msf & offline-store for a mail folder.
> http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
> Filter : If Path contains \nstmp, Include.
>          If Path contains \xxx, Include.
>          where xxx == a folder name of "offline-use=on"
Check file I/O of smallest folder or a new small folder first, please.
If no problem on small folder, check file I/O of bluelabs or verio, please.
For point 1, I guarantee the new size of emails, plus the scheduled deletion of emails, plus the emails i'm keeping, is nowhere near the mentioned limits.  I doubt this formula ever exceeds even 500M except once or twice a year.

For point 2, i am pretty sure Process Monitor will not work at all.  This is Linux.

NS_* debug logging isn't showing any errors or any reason for doing redownloads.

The largest nstmp garbage file of all my mailboxes is 249M.

So it looks like either scheduled or triggered mailbox compacting is failing for an unknown reason (not logged), and triggering a reindex/redownload of the offline-use=yes folders.  Thereby leaving a lot of garbage files behind.

There aren't any filesystem errors noted and I have 19G free space on this partition.  The partition /tmp is on has 1.2G free.
(In reply to comment #13)
> Activity mgr indicates 308 msgs deleted from Inbox/bluelabs at 4.24pm, 10 from
> SMVFD/bluelabs at 3.21am, 542 from Inbox/verio at 3.12am, 1 from spam/bluelabs
> at yesterday, and 132 from Inbox/verio yesterday.

What is trigger of the delete?
 - Deleted at server or by other client. Tb knew it by IDLE or new mail check.
 - You manually run filter, then mails are deleted from the folder.
 - You set retension policy, then Tb automatically deleted mails from folder. 

(In reply to comment #15)
> For point 2, i am pretty sure Process Monitor will not work at all. This is Linux.

Oh, sorry. Can you check file I/O for a xxx.msf & xxx by command such as strace?
It looks like either scheduled or triggered mailbox compacting is failing
for an unknown reason (not logged), and triggering a reindex/redownload of the
offline-use=yes folders.  Thereby leaving a lot of garbage files behind.

Scheduled meaning TB detects an opportunity to recoup disk space
Triggered meaning I manually request a compact

Not the best choice of words, perhaps Triggered and Requested, respectively
Can get NSPR log with timestamp,MSGDB:5 using Tb 3.1 nightly build?
> https://wiki.mozilla.org/Thunderbird:Testing:Memory_Usage_Problems
Following is bug summary of a dup of bug 524902.
> Bug 541180 old mail marked as new / unread and sometimes flagged "marked for deletion"
bug 524902 is fixed by Tb 3.1 and Tb 3.0.1, but produced bug 540554.
patch for bug 540554 is next only.
> -  memset(fUids.Elements(), fNumberOfMessageSlotsAllocated * sizeof(nsMsgKey), 0);
> +  memset(fUids.Elements(), 0, fNumberOfMessageSlotsAllocated * sizeof(nsMsgKey));
***
bug 540554 Comment #37 on 2010-01-26;
> fixed for 3.02 - nightly builds from tomorrow on will have the fix.

As both flag status & uid status is affected by above bugs, and as "delete of wrong mail" looks to happen, it may produce re-download of this bug.
Blu3(bug opener), does you problem occur with Tb 3.0.2 nightly?
Keywords: perf
Whiteboard: fixed by 3.0.2?
Whiteboard: fixed by 3.0.2? → fixed by 3.0.2?[needs protocol log]
sorry, i've been overwhelmed elsewhere.  i'll download the 3.1 nightly and test it.  for note; my x86 laptop no longer does this but my x86_64 desktop still does.
(In reply to comment #20)
> but my x86_64 desktop still does.

64bits version of Tb? (i.e. non official build)
If so, problem like bug 529691 can happen on any OS, and phenomenon like bug 532403 may occur.
Blu3 writes 
i may have a machine that I haven't "fixed" yet.  i simply didn't have
time to continue on my laptop & home desktop.  i erased my local cache
copy of the imap store which "fixed" it.  so it's definitely a problem
with some form of local corruption.
i haven't been able to use the other non-fixed machine in a while and i haven't seen this issue pop up otherwise.  i figure it's an entirely different bug that message counts are wildly inaccurate.  57,833 new emails in a folder of only ~7000?  the 57,833 isn't exact nor exaggerated, but it's representative.  i just haven't had time to wipe my local store and refetch everything.

i'm comfortable marking this fixed.
thanks for the update. WFM per comment 23
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Whiteboard: fixed by 3.0.2?[needs protocol log]
(In addition to comment #14)
> Next may be garbage due to compact falure. 
> > 240584  ./mail.blue-labs.org/nstmp-1
> > 254316  ./mail.blue-labs.org/nstmp-2
> > 254360  ./mail.blue-labs.org/nstmp-4

They were probably generated by Bug 546384.
Resolution: WORKSFORME → FIXED
Whiteboard: [fixed_by_Bug 546384]
Resolution: FIXED → WORKSFORME
Whiteboard: [fixed_by_Bug 546384] → [probably_fixed_by_Bug 546384]
blue3 do you still have the issue ?
yep.

i thought i'd discovered a bit more about how to trigger this from bug 546384 a couple days ago.  still does it btw, fix was checked in back in april.

TB refetches the entire folder if the folder is set to synchronize for offline, and you use any compact option. from bug 546384, "essentially a race between the online expunge and the offline compact" however this bug was marked fixed.

i've disabled offline sync on $folder before compacting and i no longer had to wait a long time for my $INBOX folder to redownload fully and usable again. for the record, it also occurs when TB reaches the point where more than X kbytes can be saved, so automatically expunge.

i can live with synching only on my laptop, but regardless of what is attempted for the fix, my 64bit desktop is always doing re-indexing for one reason or another as long as any offline sync is enabled.  i've done fresh builds, fresh profiles, fresh ImapMail directories.
i didn't have time then, to report that it still breaks on the other bug.

on a side note, the activity manager and bottom left status bar often don't reflect this operation and if it does, it usually is too vague to know which $folder the operation is working on until it is done.  i.e. once in a while a messages line flickers in for a split second like: "Indexing NNNN messages" during operation, then, "Indexed NNNN messages in $folder" comes on permanently when done.  just click thru all the folders to see where the cursor changes to a spinny.

on a similar side note, when watching the activity window, TB often gets a notion and is constantly re-indexing messages in "Inbox". i have no idea which server this is Inbox is on, but i suspect it's the 1st of my four as the busy spinner always spins on this mailbox regardless of anything in the activity window.  (only on my 64bit machine). it varies from 1-n, where n is a few, or a couple hundred.  nothing is actually changing in any of my 4 inboxes and doing a compact does nothing.  i can temporarily "fix" it by going offline and back online.  the invisible activity stops, the spinner stops, and compact quickly expunges.  it's as if TB gets stalled somewhere and prevents further mailbox actions.

so, what triggers it: anything that causes an expunge on $folder when $folder is set for offline sync.

yet another side note.  i haven't seen -any- nstmp files during any point in time recently.

bug 555171 appears to be very strongly related.

so in summary:
1) none of the currently marked as fixed bugs, has solved it, however other related issues have been resolved
2) it's unrelated to a 2G or 4G barrier as both local and remote mailboxes are all under 600M.
3) any compact/expunge operation causes it.  the two are currently inseparable in the GUI
4) it happens predominantly on my 64 bit desktop. my 32bit laptop is not currently suffering a refetch - same mailbox setups - but it does always have a spinner for $inbox
5) the only way to "fix" it is to disable offline sync for all mailboxes and sync manually which still downloads the whole mailbox, but is done at a time of your choosing
6) the status bar is usually blank during indexing, but the slider bar next to it is sometimes moving either as a moving box, or percentage bar
7) the activity window is usually not indicating activity, and if it is, information is too vague
8) it doesn't really matter what the imap server is, this does it with both m$ exchange and dovecot
9) it happens on windows (bug 555171) and linux (me)
10) -i- have not noticed actual message corruption for -this- machine before or during when it is doing the massive fetching
(In reply to comment #27)
> however this bug was marked fixed.

Sorry, it was my mistake.
After it, I was warned about my wrong closing code by Wayne, and I changed back to WORSFORME which was set based on your comment #23.
By your comment #23 and by existence of many nstmp-X, I thought that "re-download of all mails" of your comment #0 was mainly produced by issue like bug 546384. Was it wrong? Did you experience problem many times after your comment #23?

> bug 555171 appears to be very strongly related.

Your observation sounds so for me too.

Some questions.
(1) What is IMAP Delete Model?
    I guess EXPUNGE by "Compact" affects on "Mark it as deleted" only,
    because mail data is removed immediately from .msf if other models,
    unless many mails are flagged ^Deleted before Tb knows about the \Deleted
    flag of the many mails and issues EXPUNGE.
(2) Do you see "redownload of all mails" on any folder of offline-use=on,
    upon any "Compact" operation of an IMAP folder?
    Do you see "redownload of all mails" on many IMAP folders upon any "Compact
    Folders" operation?
(3) Do you share an IMAP account's mail box by multiple IMAP clients?

IIRC, Tb issued EXPUNGE then started compaction of offline-store file.  
When "Mark it as deleted" or when Tb doesn't know about \Deleted flag before EXPUNGE, next processes may run concurrently.
(a) Removal of mail data from .msf for UIDs returned in EXPUNGE response.
(b) Compaction of offline-store file.
It may produce similar issues like bugs listed in dependency tree for bug 498274. If "outdated msf condition" is generated by it, internal "Repair Folder" is invoked and re-fetch of all mail data is executed.

Another concern on "Compaction of offline-store file".
Tb 3.1.5 generates "From - <timestamp>" line which is same as one for local mail folder file, upon download of a mail to offline-store by auto-sync.
After "Compact", the unix mbox separator line is changed to "From " only.
It changes length of unix mbox separator line, so offset of mail data from "From " of the separator is also changed. It may produce wrong decision on "downloaded or not" in special situation. If decided as "not downloaded yet or incomplete data", redownload is initiated.
(In reply to comment #27)
> from bug 546384, "essentially a race between the online expunge and the offline compact"
> however this bug was marked fixed.

"This bug" was not your this bug 541917?
bug 546384 is not for "race between the online expunge and the offline compact". That bug resolved problem of "multiple Compact invokation on an IMAP folder" and "simulteneous update of an IMAP folder by auto-sync and Compact". See bug 546384 comment #13, please. Tb 3.1.5 showed "another process is in progress" when Compact was requested while auti-sync was downloading mails. I think it's a result by fix of bug 546384.
As David says in bug 546384 comment #16, and as you know, bug for "race between online expunge and offline compact" is bug 555171.
Following is a part of bug 531033 comment #28 by David :Bienvenu.
> One user sent me his corrupted offline store (separate issue) and I'm pretty
> sure that it was corrupted because an offline store compact happened while we
> were storing messages in the offline store. There are a couple code paths that
> don't grab the folder semaphore before writing to the offline store, and I
> believe that's the problem there.
This kind of corrupted mail data may force redownload of many mails in a folder.

Reopening for ease of analysis and tracking, because "no problem for 6 months" seems a result of "offline-use=off" and because auto-compact seems to occur frequently in your environment.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [probably_fixed_by_Bug 546384] → [probably_partially_fixed_by_Bug 546384]
David :Bienvenu proposed patch for auto-sync & compact related issue to bug 531033. Setting dependency to bug 531033.
Depends on: 531033
Whiteboard: [probably_partially_fixed_by_Bug 546384] → [may_be_fixed_bug_531033] [probably_partially_fixed_by_Bug 546384]
Oh, yeah, the patch in bug 531033 might fix this...
Blu3, is your problem gone in version 3.1.7?

(In reply to comment #32)
> Oh, yeah, the patch in bug 531033 might fix this...
OS: Linux → All
i would have to say no, it's not fixed.  i enabled offline store today.  everytime i compact right now, it downloads 19 messages.  why 19?  not sure, i have 44 unread emails in $inbox.  i started with 88, read and mark for deletion a few, then compact.  19 downloaded.  read a few more, mark, compact.  19 downloaded.

the bonus is that i removed all emails < 2001-jan-1, so the impact isn't huge.

if i go to folder properties, sync, and click download now, nothing changes.

activity manager doesn't show the 19 being downloaded as a log entry, but it does flicker several times - but too fast for me to see what is happening.  the only word i managed to catch was "indexing"
my amd64 desktop at home is doing this for all my offline mailboxes.
Im also seeing this at a customers site on several computers running XP and TB 3.1.9 with IMAP accounts. Every time TB starts it redownloads the same set of emails to the inbox for offline use. It never says that it redownloads all of the emails in the inbox but its always the same set. IE "downloading 1 of 154" and the total could be 600 or so. I have deleted and recreated inbox file and also deleted .msf files. Inbox grows at every startup on these computers. So problem is still there.
from bug 475257...

"There are a few possibilities (and it's possible different people are experiencing different things):

1. The server is rolling UID validity - this means we have to throw away all our cached information per the IMAP RFC. So we throw away the .msf file, and redownload all the headers over again, and then autosync will eventually redownload the messages bodies. If that happens, you would see us redownloading all the message headers in the log, followed by the bodies. There's some hope that Kent James' work to use backup .msf files might be extended to help in this situation.

2. Folder discovery gets interrupted in such a way that we think all imap folders have been removed from the server, and we remove the .msf files and offline stores for everything but the inbox. We had this problem before if the first url we ran was interrupted (the first url does folder discovery), but it was fixed. In fact, I think the problem was limited to the url we run when the user collapses and expands the server to rediscover folders.

3. We somehow decide that the threading information in the folder is horked and we throw away the .msf file. We try hard not to do this and it doesn't sound like what's going on. Do you often work with your All Mail folder, or do you mainly stay in the INBOX?

Given the intermittent nature of Gary's connection, it sounds like something like #2 is happening, but I've gone over the code and I don't see anything obviously wrong. As a user, you would actually see the folders disappear and then re-appear - have you seen that?"
(In reply to Wayne Mery (:wsmwk) from comment #37)
> from bug 475257...
> 
> "There are a few possibilities (and it's possible different people are
> experiencing different things):
> ...
> Given the intermittent nature of Gary's connection, it sounds like something
> like #2 is happening. But I've gone over the code and I don't see anything
> obviously wrong. As a user, you would actually see the folders disappear and
> then re-appear - have you seen that?"

alexander, blu?
Flags: needinfo?(alexander)
Flags: needinfo?(david)
can be closed WFM, i can't reproduce it at this time
Flags: needinfo?(david)
Flags: needinfo?(alexander)
The customer has been running Thunderbird 10.x ESR release for many months now without problems and now the latest ESR release so I think this has been fixed somewhere along the way between 3.1.9 and 10.x.

This can be closed WFM.
alexander, blu, thanks for the updates
Status: REOPENED → RESOLVED
Closed: 14 years ago12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.