Closed Bug 854798 Opened 7 years ago Closed 5 years ago

Compacting Berkeley Mbox file changes messageKey (to new MsgOffset after compact), causing dataloss/privacy problems (bug 817245 / bug 799450, bug 766495) due to current design problem of MsgKey=MsgOffset (for Berkeley Mbox files)

Categories

(MailNews Core :: Backend, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 38.0

People

(Reporter: World, Assigned: aceman)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, privacy, Whiteboard: [this will fix most, but not all scenarios of dependent bugs; "Repair folder" can still change messageKey and break linked things])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #794303 +++

Compact folder of Berkley Mbox file shouldn't alter messageKey by Compact operation after design change from "messageKey=messageOffset" to "messageKey=Highest key + 1", to avoid problems like bug 817245 due to messageKey value change according to messageOffset change by Compact.

MessageKey alterlation should be executed only by re-build index(Repair Folder). 

After design change from "messageKey=messageOffset" to "messageKey=highet messageKey + 1" by fix of Bug 789679 add his family, if messageKey is not altered by this bug, another "4Gig limit" problem arises - 4Gig limit of messageKey.
So, action like next is needed.
(i)  warning when near 4Gig, which urge user to do Repair Folder.
(ii) automatic re-build index invocation, with warning about "re-build index due to messageKey wrap around".
I do not understand this bug.

Also, bug 789679 is not yet checked in or actually commented on by a reviewer, so we should not base any new bugs/decisions on it.
By the way, folks, what happens to the notion of "MsgKey" if/when the paradigm of  one e-mail article per file, i.e., one file stores one e-mail article will be adopted?

I would assume that Message-ID attached to each e-mail article will be
used for the unique identifier then. 

Today, we can copy the same message from say, Inbox, to a different
folder again and again. Copies of the articles with the same contents are stored
in a mail folder.

I am not sure if such copying of the same message will be allowed in one file per one e-mail article (under a same or hashed directory for fast access) paradigm arrives and we use message-ID as the key unless
we assign still another different ID to articles (different from the Message-ID inside each e-mail article). I don't think we want to create such extra burden.

Of course, this is a futuristic problem, but if whatever clever algorithm needs to
be chosen today for avoiding some issues about 4+GB folde size, we might want to
consider also whether such clever algorithm can be carried over to a new paradigm of
one to one mapping between article and file. I bet that there is no way we can make the
today's MsgKey and messageId to co-exist for, say, POP3 e-mail download, etc.
(It is instructive to think what IMAP already does since obviously it uses messageID, or
rather the IMAP server uses the messageID to manage access to articles, I think.)

I am not familiar with the DB used for, say, POP3 download.
But we might as well adopt IMAP like messageDB as a medium-term project to use
messageID instead of locally created MsgKey even before the day of
one article per file arrives (or rather anticipating it).

My suggestion is not to create a too 
clever algorithm for numerical MsgKey issue, a simple straightforward algorithm would do.
We are likely to have to throw it away anyway if/when one article per file paradigm arrives.

JUST my ephemeral thoughts about this issue. 

TIA
(well, not sure where to post this, so here it went.)
(In reply to aceman from comment #1)
> I do not understand this bug.

I assume the issue here is that compacting a local folder can change the key of a message. (This doesn't happen for IMAP folders.)

Presumably we can only do this once the previous ESR has all the key != offset fixes, so that there's no downgrade data loss.
(In reply to :aceman from comment #1)
> I do not understand this bug.
> 
> Also, bug 789679 is not yet checked in or actually commented on by a
> reviewer, so we should not base any new bugs/decisions on it.

I take this back, the change of message keys at compact happens also in original code, without my patches.
(In reply to ISHIKAWA, chiaki from comment #2)
> By the way, folks, what happens to the notion of "MsgKey"
> if/when the paradigm of one e-mail article per file,
> i.e., one file stores one e-mail article will be adopted?

To identify file==mail in a directory==mbox", unique identifier assosiated to file is needed. So, "file# in directory which is incremented by one when Tb detects new mail==file." like one is needed. In this case, messageKey which starts from ZERO, incremented by 1 upon new mail addition to mbox==directry, is a candididate. This is absolutely same as one currently proposed for Berkley Mbox file.
Because messageOffset==0 always, messageOffset can be used for other purpose when maildir.
Above is based on assumption of "Mbox==Directory".

If all files for all mails is held in a directory or in set of multiple directories, messageKey in this context is also "an unique identifier of file". messageKey in this context should also be changed by "Rebuild-Index" like operation only, instead of by "Compact Mbox" like operation. There is no need to change messageKey in this context upon mail copy/move between Mboxes in Tb. 

FYI.
Following is MMsgDBHdr info and dump of StringProperty of message of same X-GM-MSGID in Tb 17.0.3 on MS Win.
(1) X-GM-MSGID = 1397005237944465774 in [Gmail]/All Mail (Offline-Use=Off)
> Mbox = imap://yatter.one%40gmail.com@imap.gmail.com/[Gmail]/All Mail,                 messageKey = 33190, messageOffset = 33190, messageSize = 457, offlineMessageSize =   0, date = 1332288015000000, dateInSeconds = 1332288015, StringProperty_pendingRemoval = , flags =   1, flag_Detail = { FeedMsg = false, IMAPDeleted = false, MDNReportSent = false, Read = true, Replied = false, Marked = false, Expunged = false, HasRe = false, Elided = false, Offline = false, Watched = false, SenderAuthed = false, Partial = false, Queued = false, Forwarded = false, Priorities = false, New = false, Ignored = false, MDNReportNeeded = false, Template = false, Attachment = false, Labels = false, RuntimeOnly = false }, messageId = 002-4F691A0F.8030609@x.x.x, mime2DecodedSubject = abc-def-ghi-xyz-02
>        [All_StringProperty] = { flags =  1, statusOfset = 0, sender = abcDEF pqrSTU <x1@x.x.x>, recipients = xYZ xYZ <z1@x.x.x>, subject = abc-def-ghi-xyz-02, message-id = 002-4F691A0F.8030609@x.x.x, date = 4f691a0f, dateReceived = 4f691a0f, priority = 1, msgCharSet = ISO-2022-JP, size = 1c9, threadParent = ffffffff, msgThreadId = 81a6, ProtoThreadFlags = 0, X-GM-MSGID = 1397005237944465774, X-GM-THRID = 1397005237944465774, X-GM-LABELS = INBOX/Inbox0 Offline-Use-Test/Offline-Use-002 Offline-Use-Test/Offline-Use-001, sender_name = 450|abcDEF pqrSTU, keywords = , label = 0 }
(2) X-GM-MSGID = 1397005237944465774 in Inbox (Offline-Use=On)
> Mbox = imap://yatter.one%40gmail.com@imap.gmail.com/INBOX,                            messageKey = 12184, messageOffset =  1546, messageSize = 457, offlineMessageSize = 543, date = 1332288015000000, dateInSeconds = 1332288015, StringProperty_pendingRemoval = , flags = 129, flag_Detail = { FeedMsg = false, IMAPDeleted = false, MDNReportSent = false, Read = true, Replied = false, Marked = false, Expunged = false, HasRe = false, Elided = false, Offline = true,  Watched = false, SenderAuthed = false, Partial = false, Queued = false, Forwarded = false, Priorities = false, New = false, Ignored = false, MDNReportNeeded = false, Template = false, Attachment = false, Labels = false, RuntimeOnly = false }, messageId = 002-4F691A0F.8030609@x.x.x, mime2DecodedSubject = abc-def-ghi-xyz-02
>        [All_StringProperty] = { flags = 81, statusOfset = 0, sender = abcDEF pqrSTU <x1@x.x.x>, recipients = xYZ xYZ <z1@x.x.x>, subject = abc-def-ghi-xyz-02, message-id = 002-4F691A0F.8030609@x.x.x, date = 4f691a0f, dateReceived = 4f691a0f,               msgCharSet = ISO-2022-JP,                                                                                X-GM-MSGID = 1397005237944465774, X-GM-THRID = 1397005237944465774, X-GM-LABELS = INBOX/Inbox0 Offline-Use-Test/Offline-Use-002 Offline-Use-Test/Offline-Use-001, sender_name = 450|abcDEF pqrSTU, offlineMsgSize = 21f,   msgOffset = 60a, storeToken = 1546, priority = 1, size = 1c9, threadParent = ffffffff, msgThreadId = 2f98, ProtoThreadFlags = 0, keywords = , label = 0, numLines = f }
(3) X-GM-MSGID = 1397005237944465774 in Offline-Use-Test/Offline-Use-001 (Offline-Use=On)
> Mbox = imap://yatter.one%40gmail.com@imap.gmail.com/Offline-Use-Test/Offline-Use-001, messageKey =     6, messageOffset =     0, messageSize = 457, offlineMessageSize = 543, date = 1332288015000000, dateInSeconds = 1332288015, StringProperty_pendingRemoval = , flags =   1, flag_Detail = { FeedMsg = false, IMAPDeleted = false, MDNReportSent = false, Read = true, Replied = false, Marked = false, Expunged = false, HasRe = false, Elided = false, Offline = false, Watched = false, SenderAuthed = false, Partial = false, Queued = false, Forwarded = false, Priorities = false, New = false, Ignored = false, MDNReportNeeded = false, Template = false, Attachment = false, Labels = false, RuntimeOnly = false }, messageId = 002-4F691A0F.8030609@x.x.x, mime2DecodedSubject = abc-def-ghi-xyz-02
>        [All_StringProperty] = { flags =  1, statusOfset = 0, sender = abcDEF pqrSTU <x1@x.x.x>, recipients = xYZ xYZ <z1@x.x.x>, subject = abc-def-ghi-xyz-02, message-id = 002-4F691A0F.8030609@x.x.x, date = 4f691a0f, dateReceived = 4f691a0f, priority = 1, msgCharSet = ISO-2022-JP, size = 1c9, threadParent = ffffffff, msgThreadId =    6, ProtoThreadFlags = 0, X-GM-MSGID = 1397005237944465774, X-GM-THRID = 1397005237944465774, X-GM-LABELS = "\\Inbox" INBOX/Inbox0 Offline-Use-Test/Offline-Use-002,                        sender_name = 450|abcDEF pqrSTU, keywords = , label = 0, msgOffset =   0, storeToken = 0, offlineMsgSize = 21f, numLines = f }
(4) X-GM-MSGID = 1397005237944465774 in Offline-Use-Test/Offline-Use-002 (Offline-Use=On)
> Mbox = imap://yatter.one%40gmail.com@imap.gmail.com/Offline-Use-Test/Offline-Use-002, messageKey =     2, messageOffset =     0, messageSize = 457, offlineMessageSize = 543, date = 1332288015000000, dateInSeconds = 1332288015, StringProperty_pendingRemoval = , flags =   1, flag_Detail = { FeedMsg = false, IMAPDeleted = false, MDNReportSent = false, Read = true, Replied = false, Marked = false, Expunged = false, HasRe = false, Elided = false, Offline = false, Watched = false, SenderAuthed = false, Partial = false, Queued = false, Forwarded = false, Priorities = false, New = false, Ignored = false, MDNReportNeeded = false, Template = false, Attachment = false, Labels = false, RuntimeOnly = false }, messageId = 002-4F691A0F.8030609@x.x.x, mime2DecodedSubject = abc-def-ghi-xyz-02
>        [All_StringProperty] = { flags =  1, statusOfset = 0, sender = abcDEF pqrSTU <x1@x.x.x>, recipients = xYZ xYZ <z1@x.x.x>, subject = abc-def-ghi-xyz-02, message-id = 002-4F691A0F.8030609@x.x.x, date = 4f691a0f, dateReceived = 4f691a0f, priority = 1, msgCharSet = ISO-2022-JP, size = 1c9, threadParent = ffffffff, msgThreadId =    6, ProtoThreadFlags = 0, X-GM-MSGID = 1397005237944465774, X-GM-THRID = 1397005237944465774, X-GM-LABELS = "\\Inbox" INBOX/Inbox0 Offline-Use-Test/Offline-Use-002,                        sender_name = 450|abcDEF pqrSTU, keywords = , label = 0, msgOffset =   0, storeToken = 0, offlineMsgSize = 21f, numLines = f }

Gmail assigns unieque X-GM-MSGID to unieque mail(and X-GM-THRID for unieque conversation). Gmail apparently uses Message-ID: to determine unieque mail, but never relies only on Message-ID: for which "absolute uniequeness on the Earth" is never guaranteed.
- Gmail assigns different X-GM-MSGID to mail of same message-ID;
  but different Subject: or some other headers.
  Note: BCC: header held in sent mail copy is not counted by Gmail.
- If multiple partial mail of one same mail is uploaded to Gmail IMAP,
  it depends on "How partial".
  For example;
  mail-A: from top, till Cotent-Type: multipart/mixed
  mail-B: from top, till mid of subparts of multipart/mixed
  mail-C: from top, till end of entire mail.
  Gmail consiers B & C as same mail, but A is considered different.
  Gmail parhaps checkes boundary, header of subpart etc.

I think Gmail like mechanism for "uniqueness of mail data in entire Tb" instead of "unique messageKey in Mbox == same as UID in IMAP == similar to UIDL of POP3" is ideal, but I believe it's difficult to implement.

This bug is not for "uniquness of mail data in entire Tb". This bug's request is; "reference to an unique identifier of a mail in an Mbox" should be kept consistent between "before Compact" and "after Compact", because there is no reason to change messageKey by Compact who essentially does do only "shift mail data in a file", which will change messageOffset only.

As for Compact of local mail folder, what Compact needs to changes is;
  Berkley Mbox fle : messageOffset in a file.
  maildir          : ID wich is associated to mail file in directory.
  MessaeSize, if X-Mozilla-Status:/X-Mozilla-Status2: is added.
  MessaeSize, if X-Mozilla-Keys: is added or expanded.
  There is no nee to alter messageKey.
If IMAP folder, Compact is shift of mail data in offline-store file.
  messageOffset in file is changed, as done on Berkley Mbox file.
  messaeKey can't be changed, because messageKey=UID in an Mbox of IMAP.
Message-ID is not unique as you can copy emails between folders and back.
I don't know what would be a unique identifier of a message. Maybe that is why message key was invented. To be a handle when messages need to be referenced inside TB. This key is guaranteed to be unique inside a folder.

I'd think for maildir the key is unique but offset is 0 for all messages (as each one starts at the beginning of its file). How message key is translated to the filename is something I don't know, as I have not looked at maildir yet (maybe the key (number) directly means the file name?). The changes I did are in mbox files, I hope they do not break maildir or imap. We will see.
Summary: Compact of Berkley Mbox file shouldn't alter messageKey by Compact operation after design change from "messageKey=messageOffset" to "messageKey=Highest key + 1", to avoid problems like bug 817245 due to messageKey value change according to offset change → Compact of Berkeley Mbox file shouldn't alter messageKey by Compact operation, to avoid problems like bug 817245 due to messageKey value change according to offset change
compact has always changed message keys - when the message key was the offset into the mailbox, every message that came after a deleted message had its message key changed by compact.

message-id's are not unique. message keys are.

To be clear, messageOffset shouldn't be used to access a message - the pluggable store should be (and they use the storeToken property).
(In reply to David :Bienvenu from comment #7)
> compact has always changed message keys - when the message key was the
> offset into the mailbox, every message that came after a deleted message had
> its message key changed by compact.
> 
But is it a necessity? The compact code generates the keys. If it can access the original keys of the messages, it could preserve them.
But only if we decide keys never equate offsets (otherwise keeping original high keys (=offsets) may prevent a new message being appended with a key = offset (file position) that is now lower than the already existing keys).

> To be clear, messageOffset shouldn't be used to access a message - the
> pluggable store should be (and they use the storeToken property).

We try to make this even more clear in bug 793865 that offset is not the key and should never be used as such (However, we can generate the key from the offset for a new message). However how does storeToken relate to message key? Could you please elaborate a bit on this? I've seen sparse use of storeToken in the code when looking around for usage of keys.
Client code needs to deal with the message keys getting regenerated, in general (e.g., imap server changes uid validity).

nsMsgKey is a 32 bit uint used to get an nsIMsgDBHdr from an nsIMsgDatabase. They are assigned in the order messages are added to the mailbox, but beyond that, code shouldn't make any assumptions about messageKeys. storeToken is a property of nsIMsgDBHdr that each pluggable store uses to get access to the message. Code outside the pluggable stores shouldn't make *any* assumptions about the storeToken.
Thanks David. Could you then look at the patch in bug 793865 ?
(In reply to David :Bienvenu from comment #7)
> To be clear, messageOffset shouldn't be used to access a message - the
> pluggable store should be (and they use the storeToken property).

"storeToken = 1546" is seen in dump data of StringProperty values of mail in Inbox(Offline-Use=On) of comment #5. This is same as "messageOffset =  1546". Code who sets it is;
> PR_snprintf(storeToken, sizeof(storeToken), "%lld", m_startOfNewMsg);
storeToken of offline-store file data looks currently set from offset in offline-store file if IMAP offline store.

Is fetch of message data by auto-sync invoked when panacea.dat is broken or deleted in order to set storeToken again?

> MsgToken
> http://mxr.mozilla.org/comm-central/source/mailnews/base/public/nsIMsgPluggableStore.idl#181
> 181    * @param aMsgToken token that identifies message. This is store-dependent,
> 182    *                  and must be set as a string property "storeToken" on the
> 183    *                  message hdr by the store when the message is added
> 184    *                  to the store.
storeToken is stored at;
> http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#989
> 966 nsOfflineStoreCompactState::OnStopRequest(nsIRequest *request, nsISupports *ctxt,
> 989       msgHdr->SetMessageOffset(m_startOfNewMsg);
> 990       char storeToken[100];
> 991       PR_snprintf(storeToken, sizeof(storeToken), "%lld", m_startOfNewMsg);
> 992       msgHdr->SetStringProperty("storeToken", storeToken);
> 993       msgHdr->SetOfflineMessageSize(m_offlineMsgSize);
storeToken looks saved in panacea.dat, because SetStringProperty was FolderCacheElement.
> http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCacheElement.cpp#84
> 84 NS_IMETHODIMP nsMsgFolderCacheElement::SetStringProperty(const char *propertyName, const nsACString& propertyValue)
What's true for one pluggable store about storeToken won't be true in a different pluggable store, so the code outside the pluggable store should not treat storeToken as anything other than a black box.

No one should be writing storeTokens into panacea.dat - they're properties of msg hdrs, not folders.
Dump of msgDBHdr and StringProperty of a mail in local folder(Berkley). 
storeToken looks set from offset(==messageOffset) in Berkley too. 
> messageKey = 1915, messageOffset = 1915, messageSize = 937, offlineMessageSize = 0, date = 1360709992000000, dateInSeconds = 1360709992, StringProperty_pendingRemoval = , flags = 8388625, flag_Detail = { FeedMsg = false, IMAPDeleted = false, MDNReportSent = true, Read = true, Replied = false, Marked = false, Expunged = false, HasRe = true, Elided = false, Offline = false, Watched = false, SenderAuthed = false, Partial = false, Queued = false, Forwarded = false, Priorities = false, New = false, Ignored = false, MDNReportNeeded = false, Template = false, Attachment = false, Labels = false, RuntimeOnly = false }, messageId = 511AC968.1030000@gmail.com, mime2DecodedSubject = hello
>   [All_StringProperty] = { msgOffset = 77b, storeToken = 1915, flags = 800011, label = 0, statusOfset = 21, sender = yatter king <yatter.king@gmail.com>, recipients = yatter king <yatter.king@gmail.com>, subject = hello, message-id = 511AC968.1030000@gmail.com, references = <511A0FB8.7030704@gmail.com> <511A1079.9050204@gmail.com> <511AC55A.7050803@gmail.com>, date = 511ac968, dateReceived = 511ac968, priority = 1, msgCharSet = ISO-8859-1, size = 3a9, numLines = 12, threadParent = ffffffff, msgThreadId = 54d, ProtoThreadFlags = 0, sender_name = 85|yatter king, recipient_names = 6|yatter king, gloda-id = 6f }

stringToken in maildir looks unique file name for mail.
> http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgMaildirStore.cpp#569
569 nsMsgMaildirStore::GetNewMsgOutputStream(nsIMsgFolder *aFolder,
> http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgMaildirStore.cpp#618
> 618   // save the file name in the message header - otherwise no way to retrieve it
> 619   (*aNewMsgHdr)->SetStringProperty("storeToken", newName.get());
If maildir, Reusable=false looks set. 
> http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsMsgMaildirStore.cpp#579
> 579   *aReusable = false; // message per file

Correct access to mail data for msgDBHdr of messageKey=xxx looks:
1. get msgDBHdr for messageKey=xxx
2. get storeToken=sss from StringProperty of msgDBHdr
3. request mail data for storeToken=sss to pluggable store
4. each message store processor
4-A. Berkley    
   messageKey = unique ID assined by Mork
   access mail data via messageOffset, messageSize, statusOfset
4-B. IMAP Offline-use=Off
   messageKey = UID in IMAP assigned by server
   fetch mail from server using UID==messageKey, 
   or get data from Memory/Disk Cache for key==messageKey
4-C. IMAP Offline-use=On 
   messageKey = UID in IMAP assigned by server
   if nsMsgMessageFlag::Offline==Off,
      if non Gmail IMAP, fetch mail body from server
      if Gmail IMAP, if mail of same X-GM-MSGID is held in offline-store
         of special folder, use it, else fetch mail body from server
   mail data in offline-store ifile s accessed using
      messageOffset, offlineMessageSize
4-D. maildir
   messageKey = unique ID assined by Mork or someone else
   get file name for storeToken or something else
   messageOffset==0 always when maildir

In any case, key(messageKey, storeToken, ...) is merely "unique identifier" to access mail data pointed by the key.
If so, only messageKey==UID can be assumed only when IMAP server access in IMAP Mbox case.
If so, Compact should change only offset value(and size if required). Compact shouldn't change "key" value. "key" value change or "key" assignment change(uid validity change in IMAP, msgKey change in Berkley, file name change in maildir, ...) should be done only by "Rebuild Index" operation.

David, is it right?
Or messageKey value is always changed by Mork upon mail data copy to nstmp for Compact if Berkley, and it can not be avoided?
Following is interesting test result in Tb 17.0.3 on MS Win.
- Initial file size < 4GB - 4MB - 1MB, so POP3 download is initiated
- File size before Compact(after POP3 download) / after Compact < 4GB
- a 1175 bytes mail of messageKey=4293918873 is deleted before Compact.
  previous mail is messageKey=4293918872, messageOffset=4293918873,
  then messageKey is incremented by 1 when downloaded.

(1) After POP3 download, Before Compact (file size < 4GB)

> 4293918872                        = 3 GB + 1023 MB +    0 KB + 152 Bytes
> Delta from 4GB = 4GB - 4293918872 =                  1023 KB + 872 Bytes < 1024 KB == 1MB

(dump data of msgHdr property and StringProperty of storeToken)
>            mime2DecodedSubject = 1048116B-Mail-000001 : messageKey = 4293918872,  messageOffset = 4293918872,                    messageSize = 1048268, storeToken = same as messageOffset
> Deleted => mime2DecodedSubject = 1KB-Mail-000001      : messageKey = 4293918873,  messageOffset = 4293918872 + 1048268,          messageSize =    1176, storeToken = same as messageOffset
>            mime2DecodedSubject = 1KB-Mail-000002      : messageKey = 4293918874,  messageOffset = 4293918872 + 1048268 + 1176,   messageSize =    1176, storeToken = same as messageOffset
>            mime2DecodedSubject = 1KB-Mail-000003      : messageKey = 4293918875,  messageOffset = 4293918872 + 1048268 + 1176*2, messageSize =    1176, storeToken = same as messageOffset

(2) After Compact (file size < 4GB)

(dump data of msgHdr property and StringProperty of storeToken)
>            mime2DecodedSubject = 1048116B-Mail-000001 : messageKey = 4293918872,  messageOffset = 4293918872, messageSize = 1048268, storeToken = 4293918872
>            mime2DecodedSubject = 1KB-Mail-000002      : messageKey = 4293918874,  messageOffset = 4294968316, messageSize =    1176, storeToken = 4294968316
>            mime2DecodedSubject = 1KB-Mail-000003      : messageKey = 4293918875,  messageOffset = 4294969492, messageSize =    1176, storeToken = 4294969492

(i) Mode of messageKey=highest_messageKey+1 looks to start at 4GB-1MB.
(ii) When messageKey is highest_messageKey+1 and is greater than threshold(4GB-1MB?), messaeKey is already not altered by Compact.

It seems "stop generate MsgKey from offset in Compact" is sufficient.
FYI.
Behavior in Maildr.
- MessageKey starts from 1, and is incremented by 1 upon download or
  mail copy.
- Even after delete of all mails, highest messageKey is remebered,
  and "new messageKey=highest messageKey" continues.
- messageKey is re-organized(starts from 1 aain) by Repar Folder.
- storeToken(file name if Maildir) seems timestamp of download
  which is writen in "From - " line when mail is downloaded.
- Same storeToken(file name if Maildir) looks used upon mail copy,
  and if file of same name already exists, suffix of "-NN" is added.
  This is same in "copy from Maildir to Maildir" and "copy from Berkley
  to Maildir", although Bug 793524 currently occurs when "copy from
  Berkley to Maildir". When Berkley, storeToken is currently same as
  messageOffset, so, if many "mail at offset=0" in different Berkley
  Mbox is copied to a Maildir Mbox, file of 0-1, 0-2, ..., 0-99, ... is
  created.

I think that "always messageKey=highest messageKey + 1" and "copy messageKey when Compact instead of assign new msgKey8rom offset)" and "re-assignment of messageKey only by re-build index" is already possible even for Berkley Mbox file, although code which assumes messageKey==messageOffset should be removed before actually use it.
Summary: Compact of Berkeley Mbox file shouldn't alter messageKey by Compact operation, to avoid problems like bug 817245 due to messageKey value change according to offset change → Compacting Berkeley Mbox file changes messageKey (to new MsgOffset after compact), causing dataloss/privacy problems like bug 817245 due to current design problem of MsgKey=MsgOffset (for Berkeley Mbox files)
This bug causes dataloss - original content not sent.
This bug causes major breach of privacy - random unintended content from unrelated messages sent (worse, it can even happen covertly without any notice).
This bug should be fixed as a matter of urgency.

Fixing this bug along WADA's proposals and other professional input looks possible without reinventing the wheel if we just prevent MsgKey=MsgOffset.
Chances are that with this relatively small change, we can eradicate a large number of bugs caused by this.

Per this bug, TB uses MsgKey=MsgOffset (for Berkeley Mbox files) for temporary references to embed text/images from other messages which then gets derailed if compacting changes MsgOffsets while compositions/saved drafts with such references are around (offset changes -> MsgKey changes -> embeded data missing or reference now pointing to text/images from other messages which happened to end up having the referenced MsgOffset).

Iow, this major design bug is the main cause of:

Bug 817245 - If Offset of replied-mail/forwarded-Inline-mail/previous-draft-mail is altered by Compact while composing mail, Send/Save can't find image data in the original mail then Tb spins with "Attaching...", when the original mail is HTML mail with embed image.

Bug 766495 - Draft composition shows wrong in-line images from other draft, if other draft mail is placed at original offset of editing draft mail by Compact. So, if mail is sent without draft save after Compact, wrong image is silently sent by Tb.

Bug 799450 - Thunderbird adds the text of an email in the Drafts folder to an email I send (Confidential data in other/irrelevant draft mail is silently exposed to unexpected recipients by Tb)

All of these are part of a bigger wrong design in TB composition where TB only keeps references to things outside its control instead of taking and guarding immediate snapshots when the user inserts/adds such things to his composition (see attach-paradigm-fail tracker Bug 877159).

Apart from compaction messing up messageKey, the other reason for such failures is when such messages/images inside other messages etc. have simply been moved or deleted and thus no longer available when TB wants to get them from their original locations. E.g., there's:

Bug 453196 - When replying to a message with an embedded attachment(embed image of HTML mail), if the original is deleted before sending the reply, it hangs with "Attaching..."

It's very hard for users to understand and analyse such failures, hence we also entertain the following cesspool for bugs with either of these causes (compact or deletion/moving of things referenced from other messages):

Bug 532395 - Sending reply to (or forward inline of) existing HTML message with embed images fails with never-ending error message: "attaching..."

Some of these bugs have been frequently reported for a long time, and duplicates keep coming in, e.g.
Bug 532395: 72 votes, 10 dupes
Blocks: 817245, 766495, 799450
Severity: major → critical
Keywords: dataloss, privacy
(In reply to Thomas D. from comment #16)
> This bug causes dataloss - original content not sent.
> This bug causes major breach of privacy - random unintended content from
> unrelated messages sent (worse, it can even happen covertly without any
> notice).
> This bug should be fixed as a matter of urgency.
> 

Thanks for the analysis and summary. I never thought
the compaction is related to this missing image while composition.
It is indeed very difficult to connect the issues.

It has been a while since I last visited this bugzilla entry.
If someone can explain the relevant patches to apply
and how to test, I can probably check it. (Caveat: may main test environment is 64-bit linux.)

I have been bitten a data-loss during a compaction many years ago, and
ever since every time I am asked if it OK to compact by auto-compact feature, 
I feel uneasy. :-(
I do compact manually after a backup.
If I can help remove an issue from compact function, I will be glad to chime in.

TIA
> It is indeed very difficult to connect the issues.

Bug 799450 is privacy/security issue and is most critical case in friends of bug 817245. Missing image is far smaller problem than Bug 799450 :-) "Fixing this bug" can resolve such problems, and "fixing this bug" itself is not so complicated work even if it's tough work.
However, root cause of such problems is never this bug. Root cause of such problems is in design/implementation of mail composition. "messageKey==messageOffset" was design/implementation of Netscape Mail/Mozilla MailNews/Thunderbird for pretty long time. Culprit is never this bug. And I put bug 817245 in bug summary, and other bug is listed in dependency tree for bug 817245. So, I didn't pay attention on depedecy setting of this bug.
After it, I was recently aware of that such bugs was not set in dependency tree of this bug, by bugzilla mail which notified me "dependecy change of this bug by Tohmas D."...
Sorry for forgetting to set dependency.
Note: Why we can use "fix of this bug" as solution of such bugs is following. Thanks to developer(s).
  Developer(David) had clearly separated messageKey and messageOffset
  by "pluggable msgStore" support, with extending messageOffset to 64 bits integer.

For testing.
I created addon to dump properties of msgFolder object, msgDatabase object, msgDBHdr object, window object, document object, window.gFolderTreeView object, window.gFolderDisplay object, wndow.gDBView object, etc. to Error Console. "messageKey/messageOffset data etc. what I pasted to this bug" is data obtained by the my addon.
If you need, I can open it to you.
(In reply to WADA from comment #18)
> > It is indeed very difficult to connect the issues.
> 
> Bug 799450 is privacy/security issue and is most critical case in friends of
> bug 817245. Missing image is far smaller problem than Bug 799450 :-) "Fixing
> this bug" can resolve such problems, and "fixing this bug" itself is not so
> complicated work even if it's tough work.
> However, root cause of such problems is never this bug. Root cause of such
> problems is in design/implementation of mail composition.
> "messageKey==messageOffset" was design/implementation of Netscape
> Mail/Mozilla MailNews/Thunderbird for pretty long time. Culprit is never
> this bug. And I put bug 817245 in bug summary, and other bug is listed in
> dependency tree for bug 817245. So, I didn't pay attention on depedecy
> setting of this bug.
> After it, I was recently aware of that such bugs was not set in dependency
> tree of this bug, by bugzilla mail which notified me "dependecy change of
> this bug by Tohmas D."...
> Sorry for forgetting to set dependency.
> Note: Why we can use "fix of this bug" as solution of such bugs is
> following. Thanks to developer(s).
>   Developer(David) had clearly separated messageKey and messageOffset
>   by "pluggable msgStore" support, with extending messageOffset to 64 bits
> integer.
> 
> For testing.
> I created addon to dump properties of msgFolder object, msgDatabase object,
> msgDBHdr object, window object, document object, window.gFolderTreeView
> object, window.gFolderDisplay object, wndow.gDBView object, etc. to Error
> Console. "messageKey/messageOffset data etc. what I pasted to this bug" is
> data obtained by the my addon.
> If you need, I can open it to you.

Thank you for the pointers.
I never thought such serious bug as Bug 799450  exited (!)
I used saving drafts before and who knows what happened when I re-edited
the drafts and sent them out. Yikes.

The addon you mention sounds useful, I wonder if that tool is made available through mozilla server?

Anyway, maybe aceman can comment which latest patches are relevant for testing the issues?
Flags: needinfo?(acelists)
I don't think I have any patches that can help here.
If you can find places in drafts code that assumes messageKey==messageOffset (which already isn't true as you approach 4GB folder size) and uses messageKey as the offset, then feel free to fix them. Until then, can we disable compacting if a message is being composed (compose window open) or there are messages in Drafts?

I already have patches that take the messageKey!=messageOffset down to 2GB, and maybe we could lower it to 0KB so that it never equates. Maybe we would suddenly find all the bugs ;) But I haven't got response from David Bienvenu if we can really do that. So we would need to try it out ourselves in some coordinated effort (as the patch wouldn't be checked in until all the bugs are fixed).
Flags: needinfo?(acelists)
But the drafts fixes should be done in bug 799450 or bug 817245, not here.
This bug as filed is probably WONTFIX as the change of messageKey is a design decision and change of messageKey should not be a problem if all code would behave properly.
Summary: Compacting Berkeley Mbox file changes messageKey (to new MsgOffset after compact), causing dataloss/privacy problems like bug 817245 due to current design problem of MsgKey=MsgOffset (for Berkeley Mbox files) → Compacting Berkeley Mbox file changes messageKey (to new MsgOffset after compact), causing dataloss/privacy problems (bug 817245 / bug 799450, bug 766495) due to current design problem of MsgKey=MsgOffset (for Berkeley Mbox files)
[Tracking Requested - why for this release]:
tracking-b2g: --- → ?
I can actually try to do this after bug 793865 (or any other place we find that assumes offset=key) is fixed. I think we can preserve the existing keys at compaction. We should not run out of keys even if there is a big folder (near 4GB). Once we allow over 4GB folders and there is a folder created with old schema (key=offset), even 1MB of free space in the folder yields 1 million of free keys (assuming new schema of incrementing keys by one). The option of regenerating keys with new schema upon Repair folder would still be open to us. The feature here should work also in that case.

I don't think we should attempt to do this before TB38. Bug 793865 is risky enough by itself. Remember, if we turn this on and there is a forgotten place that assumes key=offset, we get dataloss quickly as immediately after compaction you get offset < key for maybe all messages.
Assignee: nobody → acelists
See Also: → 1108441
Blocks: 1108609
Actually I have found out that we already do not guarantee key==offset in all operations. E.g. messages moved by filters get key=(max.current key)+1, while messages downloaded or moved via user action do get key=offset. I noticed this on TB31 so this is already in users' hands. And I have not noticed any mass reports of huge dataloss or corrupted folders (for locally stored folders like POP3/news/rss).

So I could make the patch for this soon, if there is enough interest and nighly testers brave to run with this code;)

The question here is whether there really is code (e.g. in compose) that uses the old key to access a message it is holding onto, while the key has been changed by the compact. Or whether the code actually uses the key as the offset (as bienvenu says in older posts). Or uses the old offset, which will change after compact even after my patch.
Attached patch patchSplinter Review
So this works for me. Compact preserves keys of messages in the folder. Repair of folder should be unchanged (it still regenerates all keys), as that uses a different code path and will be affected by changes in bug 793865.
Attachment #8534024 - Flags: review?(neil)
Attachment #8534024 - Flags: review?(kent)
Comment on attachment 8534024 [details] [diff] [review]
patch

Review of attachment 8534024 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, r=me  Amazing how simple the actual change ended up being.

As a general rule, I don't like patches where most of the changes are cleanup. They tend to obscure he crux of the change. No need to change this one, but in the future I would prefer if cleanup is less than 50% of patches. If necessary, split our a cleanup only patch.

As for backwards compatibility, we've looked at this on other bugs, and convinced ourselves that key != offset has worked since at least Thunderbird 17. This will break someone who compacts on TB 38, and then tries to run TB prior to 17, but it is not the only patch that will do so.
Attachment #8534024 - Flags: review?(kent) → review+
(In reply to Kent James (:rkent) from comment #26)
> As a general rule, I don't like patches where most of the changes are
> cleanup. They tend to obscure he crux of the change. No need to change this
> one, but in the future I would prefer if cleanup is less than 50% of
> patches. If necessary, split our a cleanup only patch.
Yeah, sorry, looks like I got carried away by the formatting in that file ;)

> As for backwards compatibility, we've looked at this on other bugs, and
> convinced ourselves that key != offset has worked since at least Thunderbird
> 17. This will break someone who compacts on TB 38, and then tries to run TB
> prior to 17, but it is not the only patch that will do so.
Yes so from my observations, we already have a mix of key=offset or key<offset in a folder since TB17. But this patch will produce a new state, a large batch of messages having key>offset. Either that does not make things any worse, or it does.

So this patch just preserves existing keys, it does not affect how the keys are generated for new messages when they arrive.
Status: NEW → ASSIGNED
It's been so long that I have mostly forgotten the issues in this bug, but we need to get some of this stuff landed if we have hopes of finishing key cleanup for TB 38.

Why are you asking for double review of this? Review time is severely limited, and that greatly slows down progress. Can we just get this landed with my review, or is there some reason why Neil also has to look at it?
I think we can try to land this one.
Keywords: checkin-needed
Why does this bug depend on bug 793865?
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 38.0
If problem occurs even when messageOffset of replied mail is not changed, it may be problem when multiple "draft edit/save" is executed.
IIRC, it was imap draft only problem but it might be problem which is irrelevant to server type. And, IIRC, it was resolved...
   Reply -> Save as draft -> Edit draft -> Send => OK
   Reply -> Send Later -> Send => OK
   Reply -> Save as draft -> Edit draft , Send Later - Send => fails
   Reply -> Save as draft -> Edit draft, Save -> Edit draft, Send => fails
Checked with nightly. messegeKey was kept/messageOffset was updted by Compact, messegeKey/messageOffset was updated by Rebuild-Index.
> Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.0a1

Outstanding issue.
Rebuild-Index still sets messageKey=messageOffset. "Start from 1, increment by 1" is sufficient and is better as action of Rebuild-Index.
Status: RESOLVED → VERIFIED
Thanks for verification.

(In reply to WADA from comment #33)
> Outstanding issue.
> Rebuild-Index still sets messageKey=messageOffset. "Start from 1, increment
> by 1" is sufficient and is better as action of Rebuild-Index.
This will be a side-effect of bug 793865.
Comment on attachment 8534024 [details] [diff] [review]
patch

This has already landed, a further review is not needed.
Attachment #8534024 - Flags: review?(neil)
Flags: in-testsuite+
Blocks: 1144020
No longer blocks: 1144020
Apparently, this is still not enough to prevent bugs from "Blocks" field entirely; but at least they should now only occur after manual user intervention of "Repair folder", which rebuilds the index and again changes messageKey in the process, then the links from saved/current compositions break. The necessity of such intervention (manual or not) is another inherent design problem. I guess this lends further support to my claim that we'd be better of with immediate snapshot copies of things we now directly link to, because such links can easily break when they change, as long as we're not fully controlling them. Even embedding of content could be considered as an option.

Current: We link to original stuff whose location can change, causing link breakage
Improvement 1: Take immediate snapshot and create our own copy of stuff, and link to that (so that link is fully controlled, and may not change)
Improvemnt 2: Take immediate snapshot and embed a copy of stuff right into the current message (so that stuff is fully controlled as it's already there).

Details might be more tricky (e.g. wrt IMAP), but that's my current understanding in broad terms. Pls correct me if I'm wrong.

(In reply to WADA from comment bug 766495 comment #41)
> This bug still can occur by "messageKey change due to Rebuild-Index". So,
> essential problem of this bug is not resolved yet.
> However, it may be better to open separate bug for Rebuild-Index case in
> order to to avoid confusion.
Whiteboard: [this will fix most, but not all scenarios of dependent bugs; "Repair folder" can still change messageKey and break linked things]
Depends on: 1183490
Blocks: 1201782
No longer blocks: 1201782
You need to log in before you can comment on or make changes to this bug.