Closed Bug 517466 Opened 15 years ago Closed 15 years ago

Thunderbird keeps downloading 100+ messages on startup it has *already* downloaded (Gmail IMAP) - gmail line endings messing up our line length size calculation

Categories

(MailNews Core :: Backend, defect, P1)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: protz, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [no l10n impact])

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4pre) Gecko/20090915 Thunderbird/3.0b4

Messages after a certain date are downloaded every time I launch thunderbird. That is to say, there are right now ~107 messages that will be downloaded if I close TB an re-open it, although I'm synced right now. This is on GMail IMAP. More information:
- might be related to bug 514386 (I also experience the "download N of M" with N > M bug
- did *not* happen with b3
- If a new mail arrives, at next TB startup, it will be 108 messages being downloaded
- I keep getting gloda errors :
2009-09-18 15:41:43	gloda.fundattr	ERROR	Message with subject 'Votre Espace Client : Confirmation de création' somehow lacks a valid author.  Bailing.

2009-09-18 15:41:43	gloda.indexer	WARN	Problem during [job:folder delta:0 id:15 items:0 offset:1 goal:2], trying to recover.  Problem was at undefined:undefined: [object StopIteration]


2009-09-18 15:41:44	gloda.fundattr	ERROR	Message with subject 'Inscription à Canardplus.com ( Canardplus.com )' somehow lacks a valid author.  Bailing.

2009-09-18 15:41:44	gloda.indexer	WARN	Problem during [job:folder delta:0 id:15 items:0 offset:2 goal:2], trying to recover.  Problem was at undefined:undefined: [object StopIteration]

These (exact) errors happen at every startup. The activity manager says it is bringing the "inbox" folder up-to-date and then pretends the inbox IS up-to-date.


Reproducible: Always

Steps to Reproduce:
Hard to say.

1. Create a Gmail account using the wizard.
2. Play with it for a while.
3. Notice some messages keep being downloaded.
Actually, this does not happen only on startup. Right after receiving the Bugzilla mail due to creating this bug, I can see now in the status bar "Downloading 1 of 108 in Inbox"... By the time I sent this bug, the activity manager said the synchronisation was complete...
(In reply to comment #0)
> 2009-09-18 15:41:43    gloda.fundattr    ERROR    Message with subject 'Votre
> Espace Client : Confirmation de création' somehow lacks a valid author. 
> Bailing.

That's an email gloda can't parse properly see bug 517170
 
> Reproducible: Always
> 
> Steps to Reproduce:
> Hard to say.
> 
> 1. Create a Gmail account using the wizard.
> 2. Play with it for a while.
> 3. Notice some messages keep being downloaded.

Can you provide us with one of those messages ?

can you log imap as explained at https://wiki.mozilla.org/MailNews:Logging to make sure TB is really downloading the messages.
The log file is a bit heavy so I stored it outside bugzilla: http://jonathan.protzenko.free.fr/log.tar.bz2

Here is what it looks like. There are thousands of lines like this at the beginning:

-1421870192[aef8f8a0]: aee58400:imap.googlemail.com:S-INBOX:CreateNewLineFromSocket: * 5520 FETCH (UID 6229 FLAGS (\Seen))^M

They go from 1 to the number of emails in my All Mail folder. Then TB starts actually downloading the messages. The first messages that appear in the log are those I just received from Bugzilla because of these bug reports. 

-1421870192[aef8f8a0]: aee58400:imap.googlemail.com:S-INBOX:CreateNewLineFromSocket: To: jonathan.protzenko@gmail.com^M
-1421870192[aef8f8a0]: ReadNextLine [stream=aee557a8 nb=78 needmore=0]
-1421870192[aef8f8a0]: aee58400:imap.googlemail.com:S-INBOX:CreateNewLineFromSocket: Subject: [Bug 517466] Thunderbird keeps
-1421870192[aef8f8a0]: ReadNextLine [stream=aee557a8 nb=43 needmore=0]
-1421870192[aef8f8a0]: aee58400:imap.googlemail.com:S-INBOX:CreateNewLineFromSocket:  it has *already* downloaded (Gmail IMAP


This mail, for instance, was already downloaded before. Then more emails from bugzilla arrive... and many mails I received previously. Then a lot of Base64 encoded data (it seems) arrives on and on for hundreds of thousands of lines (I guess that's the attachments)...

It is a bit hard to determine the order in which the messages are downloaded. But I am positive that the first messages that appear in the log are those that arrived last.
Thank you very much for providing the log file.

Because reading the log file is hard, I wrote a python script that tries to do the legwork for us.  I am attaching its output, and the script can be found here:
http://hg.mozilla.org/users/bugmail_asutherland.org/tb-test-help/file/678da3990640/nspr_imap_log_parser.py

At first glance, it would appear that the gloda streaming is somehow causing the multiple downloads to occur.  The one exception in the file is UID 1682, but I think my log file analysis just got confused by the URL starting to be processed but then getting aborted due to the connection dying.

I did read a fair bit of the autosync code before turning to python.  Given that it checks if the message is already offline immediately before acquiring the folder semaphore and kicking off the URL, the only way it could double-request a message is if it were possible for autosync to create an offline URL, then run its logic again while the messages are still not offline.  (It has no way to know about what URLs are in flight, but I presume the use of the semaphore helps guarantee that.)

Reading the processed output of the script:
- The first number on a line with a colon is always the line number for whatever it is we are talking about.
- The hex number is the thread we think was active on that line.
Myk, as you seem to have a very reproducible problem along these lines, could you run with IMAP logging enabled (as per the link way above) and try out the python script referenced above?

Also, confirming based on the fact that the IMAP log is showing duplicate downloads AFAICT.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Repeated FETCH for same mail itself can easily be observed by test of Bug 516211(test mail of Bug 501253 is used. not crafted mail data. real mail data).
If the mail is viewed before whole mail download to offline-store by auto-sync, multiple FETCH for same mail is seen by repeated change of "Display Attachments Inline" from enabled to disabled or vice versa. And, after mail data is written in offline-store by auto-sync, mail data is removed from offline-store by "Compact".

Is this bug's problem depends on mail structure?
bug 516211 has to do with the downloading of a message for display, not autosync.

I suspect this bug has to do with line endings messing up our size calculation, and deciding we hadn't got the right number of bytes in the offline store, and thus marking the message as not offline. gmail doesn't always get the line endings right (lines must be CRLF terminated), so I suspect I need to add some sort of fuzz factor, or keep track of the number of lines incorrectly terminated...
Flags: blocking-thunderbird3+
Target Milestone: --- → Thunderbird 3.0rc1
Whiteboard: [no l10n impact]
taking.
Assignee: nobody → bienvenu
Myk, if you have a reproducible case of this, running a debug build with imap protocol logging on would be very helpful - if you see a bunch of assertions from the imap parser about incorrectly terminated lines in the log, that would validate my hypothesis about line endings being the issue.
I have a reproducible case of the local copies of IMAP folders growing in size by large amounts daily until I run out of disk space or Gmail disables my IMAP access for a day due to overuse.  It may well be this bug.

So today I ran Thunderbird with IMAP logging, saving the results to a log file, until Gmail disabled my access again.  The log file is 5.2GB in size, however, making it difficult to simply browse it for interesting messages.  Are there particular strings you'd like me to grep for in the file?
Was it a debug build? If so, you'd see "need exacly one CRLF, which must be at end of line" in the log if gmail was returning non CRLF terminated lines. Sadly, "exacly" is exactly how it's spelled in the assertion.
myk@myk:~$ nice grep 'need exacly one CRLF, which must be at end of line' imap-2009-09-21.log | wc
      0       0       0

But it's not a debug build, so I guess that's why I don't see that text.

I just realized there's Andrew's script to run.  I'll run that next, then tomorrow I'll try generating a log using a debug build.
Running the script throws a Python exception:

myk@myk:~$ python ./nspr_imap_log_parser.py imap-2009-09-21.log
Traceback (most recent call last):
  File "./nspr_imap_log_parser.py", line 266, in <module>
    parser.parse(f)
  File "./nspr_imap_log_parser.py", line 180, in parse
    meth(imap_loop, line)
  File "./nspr_imap_log_parser.py", line 251, in _imapLoop_func_CreateNewLineFromSocket
    imap_loop.active_fetch_url.ref(self.iLine, self.thread_id, line)
AttributeError: 'ImapLoop' object has no attribute 'active_fetch_url'
Priority: -- → P1
I built a debug build last night and started running it this morning. After Gmail reenabled my IMAP access a few minutes ago, my IMAP log quickly grew to ~500MB, whereupon I shut down Thunderbird.

I tried running the script on this log, but it threw another exception:

  myk@myk:~$ python ./nspr_imap_log_parser.py imap-2009-09-22.log
  Traceback (most recent call last):
    File "./nspr_imap_log_parser.py", line 266, in <module>
      parser.parse(f)
    File "./nspr_imap_log_parser.py", line 161, in parse
      imap_loop_id = int(word, 16)
  ValueError: invalid literal for int() with base 16: 'WARNING'


I also tried grepping for that assertion, but it doesn't appear:

  myk@myk:~$ nice grep 'need exacly one CRLF, which must be at end of line' imap-2009-09-22.log | wc
        0       0       0
how about "offline message too small" ?
myk@myk:~$ nice grep 'offline message too small' imap-2009-09-22.log | wc
    723    8315  116413
Myk, can you run under the debugger, set a breakpoint on the line in nsMsgDBFolder::EndNewOfflineMessage that prints that message, and tell me what curStorePos, messageSize, and m_numOfflineMsgLines are?
(gdb) print curStorePos
$1 = {mValue = 128302}
(gdb) print messageSize
$2 = 128303
(gdb) print m_numOfflineMsgLines
$3 = 1780
k, thx much Myk. That's frustratingly close. I was thinking of allowing a fudge factor based on the number of lines received, though for very large messages, we might still end up with incomplete messages because of that. Given that you're off by just one, I'm really tempted to try to figure out what's going on with gMail, but I'd like to relieve your pain first.
Attached patch proposed fixSplinter Review
Attachment #402193 - Flags: superreview?(bugzilla)
Attachment #402193 - Flags: review?(bugzilla)
Myk, if you run with the patch, it should fix the problem going forward (i.e., you should only need to run with this patch once, I suspect).
I built with the patch and am running with it now.  Based on Activity Manager output and my IMAP log, it seems like Thunderbird was able to download messages it previously hadn't, but once that process was done, it stopped downloading messages, and it is no longer perpetually doing so, filling up my hard drive.

So it looks to me like the patch has fixed the problem.
(In reply to comment #19)
> I'm really tempted to try to figure out what's going on with gMail, (snip)

Gmail IMAP returns [CR], then Tb also sends [CR]. So NSPR log line becomes ...{CR}{CRLF}. I don't know whether {CRLF} in log is data sent to server or not. See small IMAP log attached to Bug 517837, please.
Whiteboard: [no l10n impact] → [no l10n impact][has patch for r/sr standard8]
Attachment #402193 - Flags: superreview?(bugzilla)
Attachment #402193 - Flags: superreview+
Attachment #402193 - Flags: review?(bugzilla)
Attachment #402193 - Flags: review+
fix checked in, thx for testing, and thx, WADA, for digging into this.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [no l10n impact][has patch for r/sr standard8] → [no l10n impact]
In which build should we see the fix?  I'm still seeing repeated downloading of IMAP folders in today's download:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090925 Shredder/3.0pre ID:20090925031809
Phil, because you are running a nightly (built this morning) you should be seeing the results of yesterday's checkins, i.e. comment 24.

Perhaps you are seeing something different - but it's hard to say without a more detailed description of your issue. A key item being, are you talking about a gmail account?
Apologies, I guess I didn't read the bug carefully enough.  I'm not using gmail.  Our IMAP server is MS Exchange.  But the behavior is similar...  Every time I revisit an IMAP folder, Shredder re-downloads the same number of messages it got last time (e.g., "Deleted Items - Downloading message 1 of 645").

Should I file a separate bug for this?
I filed Bug 518702, which is for MS Exchange. It was marked duplicate of this bug. So the underlying issue should be similar.

I just updated Shredder (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4pre) Gecko/20090925 Lightning/1.0pre Shredder/3.0pre), and I am still seeing the same behavior I reported.
(In reply to comment #29)

Another data point for my problem:  I had all of my IMAP folders selected for offline use.  I went to Account Settings -> (IMAP acct.) -> Synchronization & Storage -> Message Synchronizing -> Advanced and unchecked some of the folders so that they're no longer selected for offline use.

The folders that are not checked for offline use no longer exhibit the problem behavior (downloading hundreds or thousands of messages each time the folder is visited).  The ones that are still checked for offline use still exhibit the problem.
so I'm sure this is autosync downloading the messages over and over again, but what I don't know is why the size comparison should be so far off for Exchange - a protocol log might help.
I forgot that the IMAP part of Exchange doesn't know how big its messages are, so it just guesses when we ask for RFC822.size. The first attempt to download will fail because the rfc822 size we get doesn't match the number of bytes the server gives us. From the log I saw, the second fetch does actually tell us the correct size, but I'm not sure we update the database with that second correct size before trying to compare it with the number of bytes the server gave us.
I am using Gmail via IMAP with Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0

I am still experiencing this bug as my 55000+ messages in All Mail keep getting downloaded every time I start up Thunderbird.

I have the All Mail folder as well as a few but not all others set to synchronize as I would like to use Thunderbird to back up my mail.

It seems to me I should not be seeing this bug given the status is "RESOLVED FIXED" ?

Apologies if my comment does not conform to Bugzilla guidelines but I would like to help if I can.
adam, your issue may be one of these

bug 534297 IMAP Sync downloads the same messages over and over
bug 537498 If IMAP offline-store file size exceeds 4GB, mails downloaded at over 4GB can not be read, and downloaded again & again, even if mail folder size is within 4GB
Assignee: bienvenu → nobody
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Summary: Thunderbird keeps downloading 100+ messages on startup it has *already* downloaded (Gmail IMAP) → Thunderbird keeps downloading 100+ messages on startup it has *already* downloaded (Gmail IMAP) - gmail line endings messing up our line length size calculation
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: