Last Comment Bug 714090 - TB 8.0 deletes POP mails if not download truncated parts and server deletes mail
: TB 8.0 deletes POP mails if not download truncated parts and server deletes mail
Status: RESOLVED FIXED
: dataloss
Product: MailNews Core
Classification: Components
Component: Networking: POP (show other bugs)
: 8
: All All
: -- critical with 1 vote (vote)
: Thunderbird 15.0
Assigned To: Hiroyuki Ikezoe (:hiro)
:
Mentors:
Depends on: 645976
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-29 09:58 PST by Rotis
Modified: 2012-05-21 15:38 PDT (History)
8 users (show)
ryanvm: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
POP3 log of connection when messages are to be expired on server (14.98 KB, text/plain)
2012-01-03 15:08 PST, :aceman
no flags Details
A workaround (1.95 KB, patch)
2012-05-10 19:14 PDT, Hiroyuki Ikezoe (:hiro)
mozilla: review+
Details | Diff | Review

Description Rotis 2011-12-29 09:58:45 PST
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Build ID: 20111120135848

Steps to reproduce:

I'm using TB 8.0 and POP. Since I use a mobile modem I don't download complete messages over 1 MB of size. I get the mail on my disk, but it says "truncated, if you want the rest of the mail click here". My TB setting is to leave mails for 30 days on the server. If I until then do not download the missing parts of the truncated mail, Thunderbird deletes the already partly downloaded mail from TB without notice. I consider this a major bug. The mail should remain in my TB inbox, even I cannot download the truncated parts (photos, files) anymore.
Comment 1 Hashem Masoud 2011-12-29 23:06:16 PST
(In reply to rotis from comment #0)
See bug 531088.
Comment 2 :aceman 2012-01-02 08:47:32 PST
So when those 30 days come, TB deletes the message from the server but ALSO your local downlaoded (incomplete) copy?
That would be bad.
Comment 3 Rotis 2012-01-02 12:52:18 PST
Yes, that is what TB does. It did it with me in two cases, when I delayed the download of two important mails with attached images. After 30 days the local mails vanish as if they never existed. I would classify this bug as critical, because it causes data loss.
Comment 4 :aceman 2012-01-02 14:19:40 PST
You can do that classification (I've done now for you).
I can test if this happens for me.
Comment 5 :aceman 2012-01-02 14:33:57 PST
The incompletely downloaded messages are marked as 'b' in popstate.dat.
Normal complete messages are marked 'k'.
Messages that were locally deleted and are to be deleted on server are marked 'd'.
Maybe when the time comes to make the expiration of messages depending on their timestamps, the algorithm does something bad with messages not in the 'k' state. I'll see tomorrow when the expiration kicks in on messages downloaded today (I've set to keep msgs on server for 1 day in my test setup).
Comment 6 :aceman 2012-01-03 15:08:13 PST
Created attachment 585569 [details]
POP3 log of connection when messages are to be expired on server

Well, there is something fishy going on in this situation.

I waited 24 hours and then used Get new messages on the account that had set to leave messages on server for 1 day. TB12 hanged with the progressbar on the status bar frozen. Attached is the log of this session. There is an assertion after deletion of the message 18 that was only partially downloaded yesterday (in 'b' state in popstate.dat). After that one TB freezes hard, I had to kill it. After each new start and connection the server, the log is always the same. The server probably didn't process all the DELEs because TB didn't issue a proper QUIT.
Comment 7 :aceman 2012-01-03 15:10:10 PST
The hang is reproducible, happens at each run and the log is always the same.
If I change the expiration to 2 days (tomorrow) then the hang does not occur as no messages are attempted to be deleted on the server.
Comment 8 WADA 2012-01-03 19:27:17 PST
Log for mail seq no=18(18 272708 for LIST, 18 201201022327341470 for UIDL).
  http://kb.mozillazine.org/Session_logging_for_mail/news
> -1218275072[b745a1c0]: POP3: Entering state: 20 <= 20 SEND_DELE
> -1218275072[b745a1c0]: SEND: DELE 18
> -1218275072[b745a1c0]: Entering NET_ProcessPop3 21
> -1218275072[b745a1c0]: POP3: Entering state: 3 <= 3 WAIT_FOR_RESPONSE
> -1218275072[b745a1c0]: RECV: +OK Message deleted
> -1218275072[b745a1c0]: POP3: Entering state: 21 <= 21 DELE_RESPONSE
> -1218275072[b745a1c0]: POP3: Entering state: 15 <= 15 GET_MSG
After it, next three "15 GET_MSG" is seen, without 20 SEND_DELE, DELE command&response, 21 DELE_RESPONSE, and assertion occurs.
> -1218275072[b745a1c0]: POP3: Entering state: 15 <= 15 GET_MSG
> -1218275072[b745a1c0]: POP3: Entering state: 15 <= 15 GET_MSG
> -1218275072[b745a1c0]: POP3: Entering state: 15 <= 15 GET_MSG
> -1218275072[b745a1c0]: ###!!! ASSERTION: You can't dereference a NULL nsRefPtr with operator->().

These looks to correspond to next remaining three mails.
    LIST      UIDL
    19 2355   19 201201031649282934
    20 1907   20 201201031907490684
    21 3149   21 201201032017114491
If fetch header only(or download size limitation), mail size != data length obtaind by TOP command(TOP 0 if fetch header only, TOP NNN lines if size limitation). All mails other than "mail seq no=18" is less than 4KB, but size of 18 = 272708 bytes. 
Big difference of these two size in mail no.=18 produces buffer corruption, invalid pointer like issue?
Or server retuened other response lines than "+OK Message deleted" to "SEND: DELE 18" then Tb was confused?
Comment 9 WADA 2012-01-03 19:44:32 PST
FYI.
(In reply to :aceman from comment #5)
> I'll see tomorrow when the expiration kicks in on messages downloaded today
> (I've set to keep msgs on server for 1 day in my test setup).

You don't need to wait one day. IIRC, Tb's MsgPurge bases on timestamp of mail download, so, PC's clock change may be used for testing.
Change PC's clock to yesterday, delete popstate.dat, start Tb, fetch header only, terminate Tb.
Change back PC's clock to today, enable logging with MsgPurge:5,POP3:5,timestamp, and start Tb.
Comment 10 WADA 2012-01-03 19:59:58 PST
FYI.
http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgPurgeService.cpp#75
> 78   mMinDelayBetweenPurges = 480;  // never purge a folder more than once every 8 hours (60 min/hour * 8 hours)
> 79   mPurgeTimerInterval = 5;  // fire the purge timer every 5 minutes, starting 5 minutes after the service is created (when we load accounts)

http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgPurgeService.cpp#467
Double check of Junk status is done at here in case that \Junk flag is not supportd by IMAP server. Checking of popstate.dat content may be added to here.
Comment 11 :aceman 2012-01-04 00:03:37 PST
(In reply to WADA from comment #9)
> You don't need to wait one day. IIRC, Tb's MsgPurge bases on timestamp of
> mail download, so, PC's clock change may be used for testing.

This would probably work but could mess up the rest of the system. I am on linux and there are running some daemons that could be date sensitive.
Comment 12 :aceman 2012-01-04 00:14:53 PST
(In reply to WADA from comment #8)
It is true that all messages except 18 are below 5KB (that was the threshold set). That is why only 18 is partially downloaded. But the server responded "+OK Message deleted" also to "DELE 18". The question is why TB didn't enter state 20 after state 15 as it did after all the previous messages.
Comment 13 :aceman 2012-01-04 00:17:19 PST
To add, the log is from a self compiled debug build if TB12.
I don't know if a release build would hang in the same way and produce that assertion message.
Comment 14 WADA 2012-01-04 01:32:49 PST
(In reply to :aceman from comment #6)
> POP3 log of connection when messages are to be expired on server
> -1218275072[b745a1c0]: ###!!! ASSERTION: You can't dereference a NULL nsRefPtr with operator->().

With which build did you execute test?
Did you executed Repair Folder after downlod of mails for your teting?

Because bug 713611 started to occur on X'mas in last year and is not resolved yet, mail's offset is shifted by 2 bytes(i.e. .msf content is broken).
It looks to occur when multiple mails are downloaded(by RETR or TOP) by single new mail check, and I could observe phenomenon of bug 713611(screen shot attached to bug 714182) on three mails in four mails which was downloaded at same time with "fetch header only"(fetched by TOP 0).
To recover from it, Repair Folder(rebuild-index) is needed, and if Compact is invoked before Repair Folder, mails are coruppted.

By the way, I failed to execute duplication test of this bug with "PC's clock change" on Win, because I enabled automatic clock synchronization with time server and clock was automatically corrected before I started testing :-)
So, I used message filter's "delete from POP3 server" to force change from "b" to "d" for the partially fetched messages.
Comment 15 :aceman 2012-01-04 02:47:03 PST
The build was from hg checkout from 2012-01-03, around 20:00 CET (GMT+1).

I know about bug 713611, it hit me too. I did Repair folder, but I am not sure now if it was before or after these tests. No compaction done.

Still, msg 18 was not affected by this problem regardless of Repair, it was previously downloaded alone in one session. The other messages were affected.
Comment 16 :aceman 2012-01-04 11:06:30 PST
I tried it today on the folder that was repaired and I get the same hang.
Comment 17 WADA 2012-01-04 17:26:13 PST
How about crafted popstate.dat?
  Disable MsgPurge, change "d" of all mails to "b", and change "b" of entry for
  UIDL=201201022327341470 (mail seq no=18 in your test) to "d", then restart Tb.
Tb requests DELE for the mail only upon next new mail check.
Comment 18 :aceman 2012-01-05 00:15:05 PST
Yes, I can try that.
Comment 19 :aceman 2012-01-05 14:46:21 PST
Without doing that test so far, starting at the state from comment 6:
I run TB12 again now and it correctly expired all messages, even the number 18. Unfortunately I didn't have POP3 log enabled this time. But all messages were removed from popstate.dat and the truncated message 18 was also deleted from the folder. It can be seen in the datafile with a X-Mozilla-status: 0401. It was not moved to Trash. So I confirm the original report too, on Linux.
I don't know why it didn't hang this time. Maybe the assertion was fixed in other bug.
Comment 20 Wayne Mery (:wsmwk, NI for questions) 2012-01-18 04:57:12 PST
(In reply to Hashem Masoud from comment #1)
> (In reply to rotis from comment #0)
> See bug 531088.

how is this now different from  bug 531088?
Comment 21 :aceman 2012-01-18 06:12:38 PST
I'd say they are completely unrelated.

Bug 531088: people wondering why messages vanish from their SERVER (because they didn't notice the 'delete from server' setting enabled).
This bug: people specifically enabling the 'delete from server' setting and wondering why (partial) messages vanish from their LOCAL mailbox.
Comment 22 :aceman 2012-04-13 14:04:59 PDT
There is an explanation that this happens intentionally in bug 645976.
Comment 23 Hiroyuki Ikezoe (:hiro) 2012-05-10 19:14:11 PDT
Created attachment 623008 [details] [diff] [review]
A workaround

The original reporter must set "mail.server.default.dup_action" to 2.
If so, this patch fixes the issue.

I know this patch has a side-effect that real duplicate message which has been partially downloaded is not moved to trash (or deleted). But I think it will be preferable rather than lost of the messages. (I have actually lost some emails...)
Comment 24 David :Bienvenu 2012-05-14 16:37:13 PDT
Comment on attachment 623008 [details] [diff] [review]
A workaround

thx, yeah, that seems like a reasonable thing to do.
Comment 25 Ryan VanderMeulen [:RyanVM] 2012-05-21 15:38:23 PDT
https://hg.mozilla.org/comm-central/rev/6e88a2301b76

Note You need to log in before you can comment on or make changes to this bug.