Last Comment Bug 517466 - Thunderbird keeps downloading 100+ messages on startup it has *already* downloaded (Gmail IMAP) - gmail line endings messing up our line length size calculation
: Thunderbird keeps downloading 100+ messages on startup it has *already* downl...
[no l10n impact]
Product: MailNews Core
Classification: Components
Component: Backend (show other bugs)
: unspecified
: x86 Linux
: P1 normal (vote)
: Thunderbird 3.0rc1
Assigned To: Nobody; OK to take it and work on it
: 514386 518789 520403 524656 (view as bug list)
Depends on:
Blocks: tb-gmailWIP
  Show dependency treegraph
Reported: 2009-09-18 06:50 PDT by Jonathan Protzenko [:protz]
Modified: 2015-10-08 12:00 PDT (History)
20 users (show)
mozilla: blocking‑thunderbird3+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---

output from nspr_imap_log_parser (6.40 KB, text/plain)
2009-09-19 00:00 PDT, Andrew Sutherland [:asuth]
no flags Details
proposed fix (1.04 KB, patch)
2009-09-22 15:15 PDT, David :Bienvenu
standard8: review+
standard8: superreview+
Details | Diff | Splinter Review

Description Jonathan Protzenko [:protz] 2009-09-18 06:50:51 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20090824 Firefox/3.5.3
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20090915 Thunderbird/3.0b4

Messages after a certain date are downloaded every time I launch thunderbird. That is to say, there are right now ~107 messages that will be downloaded if I close TB an re-open it, although I'm synced right now. This is on GMail IMAP. More information:
- might be related to bug 514386 (I also experience the "download N of M" with N > M bug
- did *not* happen with b3
- If a new mail arrives, at next TB startup, it will be 108 messages being downloaded
- I keep getting gloda errors :
2009-09-18 15:41:43	gloda.fundattr	ERROR	Message with subject 'Votre Espace Client : Confirmation de création' somehow lacks a valid author.  Bailing.

2009-09-18 15:41:43	gloda.indexer	WARN	Problem during [job:folder delta:0 id:15 items:0 offset:1 goal:2], trying to recover.  Problem was at undefined:undefined: [object StopIteration]

2009-09-18 15:41:44	gloda.fundattr	ERROR	Message with subject 'Inscription à ( )' somehow lacks a valid author.  Bailing.

2009-09-18 15:41:44	gloda.indexer	WARN	Problem during [job:folder delta:0 id:15 items:0 offset:2 goal:2], trying to recover.  Problem was at undefined:undefined: [object StopIteration]

These (exact) errors happen at every startup. The activity manager says it is bringing the "inbox" folder up-to-date and then pretends the inbox IS up-to-date.

Reproducible: Always

Steps to Reproduce:
Hard to say.

1. Create a Gmail account using the wizard.
2. Play with it for a while.
3. Notice some messages keep being downloaded.
Comment 1 Jonathan Protzenko [:protz] 2009-09-18 06:53:43 PDT
Actually, this does not happen only on startup. Right after receiving the Bugzilla mail due to creating this bug, I can see now in the status bar "Downloading 1 of 108 in Inbox"... By the time I sent this bug, the activity manager said the synchronisation was complete...
Comment 2 Ludovic Hirlimann [:Usul] 2009-09-18 07:07:08 PDT
(In reply to comment #0)
> 2009-09-18 15:41:43    gloda.fundattr    ERROR    Message with subject 'Votre
> Espace Client : Confirmation de création' somehow lacks a valid author. 
> Bailing.

That's an email gloda can't parse properly see bug 517170
> Reproducible: Always
> Steps to Reproduce:
> Hard to say.
> 1. Create a Gmail account using the wizard.
> 2. Play with it for a while.
> 3. Notice some messages keep being downloaded.

Can you provide us with one of those messages ?

can you log imap as explained at to make sure TB is really downloading the messages.
Comment 3 Jonathan Protzenko [:protz] 2009-09-18 07:44:05 PDT
The log file is a bit heavy so I stored it outside bugzilla:

Here is what it looks like. There are thousands of lines like this at the beginning:

-1421870192[aef8f8a0]: * 5520 FETCH (UID 6229 FLAGS (\Seen))^M

They go from 1 to the number of emails in my All Mail folder. Then TB starts actually downloading the messages. The first messages that appear in the log are those I just received from Bugzilla because of these bug reports. 

-1421870192[aef8f8a0]: To:^M
-1421870192[aef8f8a0]: ReadNextLine [stream=aee557a8 nb=78 needmore=0]
-1421870192[aef8f8a0]: Subject: [Bug 517466] Thunderbird keeps
-1421870192[aef8f8a0]: ReadNextLine [stream=aee557a8 nb=43 needmore=0]
-1421870192[aef8f8a0]:  it has *already* downloaded (Gmail IMAP

This mail, for instance, was already downloaded before. Then more emails from bugzilla arrive... and many mails I received previously. Then a lot of Base64 encoded data (it seems) arrives on and on for hundreds of thousands of lines (I guess that's the attachments)...

It is a bit hard to determine the order in which the messages are downloaded. But I am positive that the first messages that appear in the log are those that arrived last.
Comment 4 Andrew Sutherland [:asuth] 2009-09-19 00:00:05 PDT
Created attachment 401608 [details]
output from nspr_imap_log_parser

Thank you very much for providing the log file.

Because reading the log file is hard, I wrote a python script that tries to do the legwork for us.  I am attaching its output, and the script can be found here:

At first glance, it would appear that the gloda streaming is somehow causing the multiple downloads to occur.  The one exception in the file is UID 1682, but I think my log file analysis just got confused by the URL starting to be processed but then getting aborted due to the connection dying.

I did read a fair bit of the autosync code before turning to python.  Given that it checks if the message is already offline immediately before acquiring the folder semaphore and kicking off the URL, the only way it could double-request a message is if it were possible for autosync to create an offline URL, then run its logic again while the messages are still not offline.  (It has no way to know about what URLs are in flight, but I presume the use of the semaphore helps guarantee that.)

Reading the processed output of the script:
- The first number on a line with a colon is always the line number for whatever it is we are talking about.
- The hex number is the thread we think was active on that line.
Comment 5 Andrew Sutherland [:asuth] 2009-09-19 00:05:29 PDT
Myk, as you seem to have a very reproducible problem along these lines, could you run with IMAP logging enabled (as per the link way above) and try out the python script referenced above?

Also, confirming based on the fact that the IMAP log is showing duplicate downloads AFAICT.
Comment 6 WADA 2009-09-19 00:51:16 PDT
Repeated FETCH for same mail itself can easily be observed by test of Bug 516211(test mail of Bug 501253 is used. not crafted mail data. real mail data).
If the mail is viewed before whole mail download to offline-store by auto-sync, multiple FETCH for same mail is seen by repeated change of "Display Attachments Inline" from enabled to disabled or vice versa. And, after mail data is written in offline-store by auto-sync, mail data is removed from offline-store by "Compact".

Is this bug's problem depends on mail structure?
Comment 7 David :Bienvenu 2009-09-19 06:42:24 PDT
bug 516211 has to do with the downloading of a message for display, not autosync.

I suspect this bug has to do with line endings messing up our size calculation, and deciding we hadn't got the right number of bytes in the offline store, and thus marking the message as not offline. gmail doesn't always get the line endings right (lines must be CRLF terminated), so I suspect I need to add some sort of fuzz factor, or keep track of the number of lines incorrectly terminated...
Comment 8 David :Bienvenu 2009-09-21 13:20:41 PDT
Comment 9 David :Bienvenu 2009-09-21 15:25:48 PDT
Myk, if you have a reproducible case of this, running a debug build with imap protocol logging on would be very helpful - if you see a bunch of assertions from the imap parser about incorrectly terminated lines in the log, that would validate my hypothesis about line endings being the issue.
Comment 10 Myk Melez [:myk] [@mykmelez] 2009-09-21 16:05:52 PDT
I have a reproducible case of the local copies of IMAP folders growing in size by large amounts daily until I run out of disk space or Gmail disables my IMAP access for a day due to overuse.  It may well be this bug.

So today I ran Thunderbird with IMAP logging, saving the results to a log file, until Gmail disabled my access again.  The log file is 5.2GB in size, however, making it difficult to simply browse it for interesting messages.  Are there particular strings you'd like me to grep for in the file?
Comment 11 David :Bienvenu 2009-09-21 16:10:48 PDT
Was it a debug build? If so, you'd see "need exacly one CRLF, which must be at end of line" in the log if gmail was returning non CRLF terminated lines. Sadly, "exacly" is exactly how it's spelled in the assertion.
Comment 12 Myk Melez [:myk] [@mykmelez] 2009-09-21 17:10:45 PDT
myk@myk:~$ nice grep 'need exacly one CRLF, which must be at end of line' imap-2009-09-21.log | wc
      0       0       0

But it's not a debug build, so I guess that's why I don't see that text.

I just realized there's Andrew's script to run.  I'll run that next, then tomorrow I'll try generating a log using a debug build.
Comment 13 Myk Melez [:myk] [@mykmelez] 2009-09-21 23:21:08 PDT
Running the script throws a Python exception:

myk@myk:~$ python ./ imap-2009-09-21.log
Traceback (most recent call last):
  File "./", line 266, in <module>
  File "./", line 180, in parse
    meth(imap_loop, line)
  File "./", line 251, in _imapLoop_func_CreateNewLineFromSocket
    imap_loop.active_fetch_url.ref(self.iLine, self.thread_id, line)
AttributeError: 'ImapLoop' object has no attribute 'active_fetch_url'
Comment 14 Myk Melez [:myk] [@mykmelez] 2009-09-22 11:15:50 PDT
I built a debug build last night and started running it this morning. After Gmail reenabled my IMAP access a few minutes ago, my IMAP log quickly grew to ~500MB, whereupon I shut down Thunderbird.

I tried running the script on this log, but it threw another exception:

  myk@myk:~$ python ./ imap-2009-09-22.log
  Traceback (most recent call last):
    File "./", line 266, in <module>
    File "./", line 161, in parse
      imap_loop_id = int(word, 16)
  ValueError: invalid literal for int() with base 16: 'WARNING'

I also tried grepping for that assertion, but it doesn't appear:

  myk@myk:~$ nice grep 'need exacly one CRLF, which must be at end of line' imap-2009-09-22.log | wc
        0       0       0
Comment 15 David :Bienvenu 2009-09-22 13:08:27 PDT
how about "offline message too small" ?
Comment 16 Myk Melez [:myk] [@mykmelez] 2009-09-22 13:23:03 PDT
myk@myk:~$ nice grep 'offline message too small' imap-2009-09-22.log | wc
    723    8315  116413
Comment 17 David :Bienvenu 2009-09-22 13:27:47 PDT
Myk, can you run under the debugger, set a breakpoint on the line in nsMsgDBFolder::EndNewOfflineMessage that prints that message, and tell me what curStorePos, messageSize, and m_numOfflineMsgLines are?
Comment 18 Myk Melez [:myk] [@mykmelez] 2009-09-22 14:00:21 PDT
(gdb) print curStorePos
$1 = {mValue = 128302}
(gdb) print messageSize
$2 = 128303
(gdb) print m_numOfflineMsgLines
$3 = 1780
Comment 19 David :Bienvenu 2009-09-22 15:03:25 PDT
k, thx much Myk. That's frustratingly close. I was thinking of allowing a fudge factor based on the number of lines received, though for very large messages, we might still end up with incomplete messages because of that. Given that you're off by just one, I'm really tempted to try to figure out what's going on with gMail, but I'd like to relieve your pain first.
Comment 20 David :Bienvenu 2009-09-22 15:15:54 PDT
Created attachment 402193 [details] [diff] [review]
proposed fix
Comment 21 David :Bienvenu 2009-09-22 15:16:46 PDT
Myk, if you run with the patch, it should fix the problem going forward (i.e., you should only need to run with this patch once, I suspect).
Comment 22 Myk Melez [:myk] [@mykmelez] 2009-09-22 17:18:09 PDT
I built with the patch and am running with it now.  Based on Activity Manager output and my IMAP log, it seems like Thunderbird was able to download messages it previously hadn't, but once that process was done, it stopped downloading messages, and it is no longer perpetually doing so, filling up my hard drive.

So it looks to me like the patch has fixed the problem.
Comment 23 WADA 2009-09-23 02:05:51 PDT
(In reply to comment #19)
> I'm really tempted to try to figure out what's going on with gMail, (snip)

Gmail IMAP returns [CR], then Tb also sends [CR]. So NSPR log line becomes ...{CR}{CRLF}. I don't know whether {CRLF} in log is data sent to server or not. See small IMAP log attached to Bug 517837, please.
Comment 24 David :Bienvenu 2009-09-24 07:23:25 PDT
fix checked in, thx for testing, and thx, WADA, for digging into this.
Comment 25 David :Bienvenu 2009-09-25 07:13:53 PDT
*** Bug 518789 has been marked as a duplicate of this bug. ***
Comment 26 David :Bienvenu 2009-09-25 07:52:00 PDT
*** Bug 518702 has been marked as a duplicate of this bug. ***
Comment 27 Phil Johnson 2009-09-25 11:54:39 PDT
In which build should we see the fix?  I'm still seeing repeated downloading of IMAP folders in today's download:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20090925 Shredder/3.0pre ID:20090925031809
Comment 28 Wayne Mery (:wsmwk, NI for questions) 2009-09-25 12:07:19 PDT
Phil, because you are running a nightly (built this morning) you should be seeing the results of yesterday's checkins, i.e. comment 24.

Perhaps you are seeing something different - but it's hard to say without a more detailed description of your issue. A key item being, are you talking about a gmail account?
Comment 29 Phil Johnson 2009-09-25 12:15:37 PDT
Apologies, I guess I didn't read the bug carefully enough.  I'm not using gmail.  Our IMAP server is MS Exchange.  But the behavior is similar...  Every time I revisit an IMAP folder, Shredder re-downloads the same number of messages it got last time (e.g., "Deleted Items - Downloading message 1 of 645").

Should I file a separate bug for this?
Comment 30 Mahesh Asolkar 2009-09-25 12:36:03 PDT
I filed Bug 518702, which is for MS Exchange. It was marked duplicate of this bug. So the underlying issue should be similar.

I just updated Shredder (Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20090925 Lightning/1.0pre Shredder/3.0pre), and I am still seeing the same behavior I reported.
Comment 31 Phil Johnson 2009-09-25 17:03:03 PDT
(In reply to comment #29)

Another data point for my problem:  I had all of my IMAP folders selected for offline use.  I went to Account Settings -> (IMAP acct.) -> Synchronization & Storage -> Message Synchronizing -> Advanced and unchecked some of the folders so that they're no longer selected for offline use.

The folders that are not checked for offline use no longer exhibit the problem behavior (downloading hundreds or thousands of messages each time the folder is visited).  The ones that are still checked for offline use still exhibit the problem.
Comment 32 David :Bienvenu 2009-09-25 17:07:32 PDT
so I'm sure this is autosync downloading the messages over and over again, but what I don't know is why the size comparison should be so far off for Exchange - a protocol log might help.
Comment 33 David :Bienvenu 2009-09-26 11:33:57 PDT
I forgot that the IMAP part of Exchange doesn't know how big its messages are, so it just guesses when we ask for RFC822.size. The first attempt to download will fail because the rfc822 size we get doesn't match the number of bytes the server gives us. From the log I saw, the second fetch does actually tell us the correct size, but I'm not sure we update the database with that second correct size before trying to compare it with the number of bytes the server gave us.
Comment 34 David :Bienvenu 2009-10-27 05:51:02 PDT
*** Bug 524656 has been marked as a duplicate of this bug. ***
Comment 35 :Gijs Kruitbosch 2009-10-28 17:46:46 PDT
*** Bug 514386 has been marked as a duplicate of this bug. ***
Comment 36 Ludovic Hirlimann [:Usul] 2009-11-23 04:29:30 PST
*** Bug 520403 has been marked as a duplicate of this bug. ***
Comment 37 adampacker 2010-01-04 12:23:02 PST
I am using Gmail via IMAP with Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv: Gecko/20091204 Thunderbird/3.0

I am still experiencing this bug as my 55000+ messages in All Mail keep getting downloaded every time I start up Thunderbird.

I have the All Mail folder as well as a few but not all others set to synchronize as I would like to use Thunderbird to back up my mail.

It seems to me I should not be seeing this bug given the status is "RESOLVED FIXED" ?

Apologies if my comment does not conform to Bugzilla guidelines but I would like to help if I can.
Comment 38 Wayne Mery (:wsmwk, NI for questions) 2010-01-08 17:05:15 PST
adam, your issue may be one of these

bug 534297 IMAP Sync downloads the same messages over and over
bug 537498 If IMAP offline-store file size exceeds 4GB, mails downloaded at over 4GB can not be read, and downloaded again & again, even if mail folder size is within 4GB

Note You need to log in before you can comment on or make changes to this bug.