Open Bug 1554529 Opened 6 years ago Updated 1 month ago

Redundant copies of multi-labeled messages stored for GMail (maildir profile, much better for mbox)

Categories

(MailNews Core :: Backend, defect)

Tracking

(Not tracked)

People

(Reporter: JamesKessel, Unassigned)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0

Steps to reproduce:

Set up a Thunderbird maildir profile for a GMail account with subscription not set for most labels. (I believe I had had almost all labels set for subscription, but the subscription was somehow changed to a small, and apparently arbitrary, subset. I suspect that is evidence of some other bug, but I did not observe the behavior with sufficient care to enable me to report it as one. I hope the occurrence is not a necessary precondition for the instant bug.)
After completion of the message download, added subscriptions, and set the Synchronization Download option, for other labels, both user-defined and the built-in Important & Starred ones.

Actual results:

Thunderbird downloaded and stored on the local disc copies of the additional labels' messages, notwithstanding that the messages were already stored locally for Inbox or Sent Mail.

Expected results:

As I understand it, the Thunderbird folders for the additional labels should have shared the copies that were already present, thus allowing me to read every message from a label's folder without (materially) consuming internet data allowance or local disc space.

Please see https://support.mozilla.org/en-US/kb/thunderbird-and-gmail#w_subscribing-to-folders-and-synchronizing-messages -- Quotation: "Note that a message can have multiple labels ... . In this case, a single copy of this message will be downloaded, but it will be displayed in all the corresponding Thunderbird folders".

I am using maildir because the account is 20+ GiB. I have not tried with MBox.

Hmm, I think the support article is lying. TB integration with Gmail isn't perfect and if it looks like a different folder, it will be downloaded/stored again.

Jorg is right and the support article is wrong if read literally. If you have the same message having multiple gmail labels, it will appear as a separate message in the imap folders downloaded by tb from gmail (where folder==label). The only way to have it not be downloaded for each label/folder is to set the folder in tb to not have offline storage so only the message header is stored locally in the folder's *.msf file.

Also, I think the main advantage of maildir is that it makes local backups faster since each message is stored in its own file rather than one big file per folder. Therefore, the fact that your account is 20+ GiB may not be sufficient reason to use the non-default maildir vs. the default mbox storage format. Just something to consider.

Mbox also won't make a difference with the issue you have raised in this bug.

Comment re implemented functionality:

Thank you for your replies, Jorg K & gene smith.

I’ve known one fundamental rule of software for a quarter century: if you want to know what it does, read the code, not the doc., nor even the comments in the code!

I’m not sure either of you has read the code, but, if not, I believe you’ve spoken from actual experience; i.e. it’s a reliable, empirical, result.

I am grateful for the advice that the functionality gap is equally present in the default, mbox, configuration. That’s saved me a lengthy, and futile, reattempt.

I’ve checked the Internet Archive, and the doc. seems to have changed from “Note that a message can have multiple labels (for instance, ‘Personal’, ‘Travel’, ‘All Mail’ and ‘Starred’). In this case, a copy of this message will be downloaded and displayed in all the corresponding Thunderbird's folders” on 9 September 2015, to “… In this case, a single copy of this message will be downloaded, but it will be displayed in all the corresponding Thunderbird folders”, on or before 4 December 2015. (Refs: https://web.archive.org/web/20150909214838/https://support.mozilla.org/en-US/kb/thunderbird-and-gmail https://web.archive.org/web/20151204125247/https://support.mozilla.org/en-US/kb/thunderbird-and-gmail.)

It’s extremely common for people to change the code and forget to change the doc. The reverse is quite rare, though. Perhaps there has been a regression.

Unless I’m missing something, the suggestion “to set the folder in tb to not have offline storage so only the message header is stored locally” won’t achieve my full objective (which I did not state in the bug report). I am trying to ensure I’ve a back up of every message, and the body of any message that had only “header-only” labels / folders wouldn’t get backed up at all (unless the message had been opened in the tb profile prior to the back up being run).

I had also hoped my back-up program could de-duplicate the message files by checksum (/ content). However, this seems to be thwarted by tb writing the time it downloaded the message (different for each label / folder) at the top of the message file.
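
A backup tool could possibly work around this by hashing each message file only after skipping the client-added lines at the top. The sketch below is a minimal illustration, not a tested recipe: it assumes the per-download data is confined to a leading "From " envelope line and to X-Mozilla-* header lines at the very top of the file, which has not been verified against Thunderbird's code here. The helper name `normalized_digest` is hypothetical.

```python
import hashlib

# Assumption: the per-download data sits in a leading "From " envelope line
# and in X-Mozilla-* headers at the very top of the stored message file.
VOLATILE_PREFIXES = (b"from ", b"x-mozilla-")


def normalized_digest(raw: bytes) -> str:
    """SHA-256 of a message with the assumed volatile leading lines removed."""
    lines = raw.splitlines(keepends=True)
    start = 0
    # Skip only the leading run of volatile lines; stop at the first other line.
    while start < len(lines) and lines[start].lower().startswith(VOLATILE_PREFIXES):
        start += 1
    return hashlib.sha256(b"".join(lines[start:])).hexdigest()
```

Two files holding the same message would then hash identically even if their leading lines differ, provided the assumption about where the differences live holds. A real "From:" header is not skipped, because the match requires a space after "From".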

“if read literally” - gene, what other way would one read it? “No, no...not a pun...What's that thing that spells the same backwards as forwards?”

Well, I am probably wrong but I haven't tested it to see. I think maybe it does require mbox storage but not sure. I found bugs that involve the gmail headers X-GM-* and work was done and part of it was to avoid redundant storage with gmail when the same message has multiple gmail labels: Bug 721316. That would make sense if someone went to the trouble of updating that documentation.
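
For context on the X-GM-* items: Gmail's IMAP extensions include an X-GM-MSGID fetch attribute that identifies a message independently of the folder (label) it is viewed through. The sketch below shows how a client could, in principle, use it to spot one message appearing under several labels. It is not Thunderbird's implementation. The function names are hypothetical, and the input is assumed to be FETCH response lines roughly as Python's imaplib would return them for a fetch of "(X-GM-MSGID)".

```python
import re
from collections import defaultdict

MSGID_RE = re.compile(rb"X-GM-MSGID (\d+)")


def gm_msgid(fetch_line: bytes):
    """Return the X-GM-MSGID in one FETCH response line, or None if absent."""
    m = MSGID_RE.search(fetch_line)
    return int(m.group(1)) if m else None


def folders_by_msgid(responses):
    """Map each X-GM-MSGID to the set of folders whose responses mention it.

    `responses` maps a folder name to its FETCH response lines (assumed
    shape: one bytes object per message, as collected by the caller).
    """
    seen = defaultdict(set)
    for folder, lines in responses.items():
        for line in lines:
            msgid = gm_msgid(line)
            if msgid is not None:
                seen[msgid].add(folder)
    return dict(seen)
```

Any identifier mapped to more than one folder is a candidate for sharing a single stored copy.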

I did a quick test and set two gmail folders ([Gmail]/f1 and [Gmail]/f3) to have offline sync'd storage (mbox format). I then copied a message from f1 to f3. Looking at the storage files for f1 and f3, I see the same full email in both files. So, having the "same" message in two folders causes redundant storage. I will try it now using the labels at https://gmail.com.

Test at gmail site, setting label on message in folder [Gmail]/f1 to have label [Gmail]/f3 also results in redundant storage in the tb mbox file (the full message appears in files f1 and f3).
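
The manual check described above (looking for the same full email in two mbox files) could be scripted roughly as follows. This is only a sketch: it compares Message-ID headers using Python's standard mailbox module, the function names are hypothetical, and the paths would be the profile's mbox files for the two folders. It does not distinguish messages that are deleted but not yet compacted out of the file.

```python
import mailbox


def message_ids(mbox_path):
    """Collect the Message-ID headers found in one mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return {msg["Message-ID"] for msg in box if msg["Message-ID"]}
    finally:
        box.close()


def stored_in_both(path_a, path_b):
    """Message-IDs whose full message is present in both mbox files."""
    return message_ids(path_a) & message_ids(path_b)
```

A non-empty result for the f1 and f3 files would correspond to the redundant storage observed in the test.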

So now I don't know if this is a bug or not. I wasn't around when gmail was added to tb so I don't know the history firsthand. I have seen maybe more than one "meta" bug regarding gmail support such as this: Bug 402793 (tb-gmailWIP). Looking at one of the bugs listed, Bug 901287, it seems to claim that the feature is present (Bug 901287 comment 2) but the bug is closed with resolution INCOMPLETE. So my guess is that this feature was attempted and something was added to tb but it never quite worked right.

Edit: I just realized that INCOMPLETE probably means the reporter never responded, not that the fix is incomplete.

I think maybe a problem for me is that I don't have "All Mail" sync'd for local storage. I can't because this laptop has a very small SSD, therefore I only store headers (except for folders f1 and f3 mentioned above). From reading the bug reports I think if I had all folders sync'd then mail would only appear in [Gmail]/All Mail's mbox file. I haven't tried that on my desktop with a hard drive but maybe that's the key. Also, not sure if mbox vs. maildir format affects this. I will check an mbox setup on desktop later.

On a desktop system having full gmail sync to mbox files, messages with multiple labels are still duplicated between Inbox, All Mail and a test folder, appearing in each folder's mbox file. So the reporter's observations seem to be valid.

Well, doing some more tests I have concluded that the feature is actually working for the most part. My problem was I still didn't have offline storage enabled for the "system" labels: Inbox, All Mail, Sent, Drafts, etc. Now I can go to gmail site and put a label on a message in Inbox, e.g., toplevel2, and I go to toplevel2 folder in tb and see the message there too. When I click on the new message in toplevel2, no download from gmail occurs since the message is fetched from Inbox's offline storage. The offline storage file for toplevel2 remains empty.

However, if I copy a message from Inbox to toplevel2 with tb, the offline storage for toplevel2 now contains the copied message; but another download (actually imap fetch of body) doesn't occur. So you don't get a redundant download but you do have redundant storage if you do a copy in tb.

Another thing: I have done the above tests with autosync disabled in preferences. I need to repeat the tests with autosync enabled, which is the default. Also, I only tested with mbox offline storage format and not maildir.

Edit: Test with autosync enabled again (default setting) and still works OK. Haven't tried maildir storage format yet.

Removed and re-setup gmail account with maildir format. Now toplevel2 label placed on message in Inbox causes a new email file in folder toplevel2's cur directory. So in comment 2 above where I said,

Mbox also won't make a difference with the issue you have raised in this bug

I was wrong. Correct behavior only occurs for mbox, not for maildir.
Edit: I should also mention that the email is imap fetched via network into toplevel2/cur and not just copied there with maildir format in use.
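
The maildir observation above (a new file appearing in toplevel2's cur directory once the label is applied) can be checked by taking a snapshot of the directory before and after. A minimal sketch, with hypothetical helper names and the folder path left to the caller:

```python
from pathlib import Path


def cur_files(folder_dir):
    """Names of the message files in a maildir folder's cur directory."""
    cur = Path(folder_dir) / "cur"
    if not cur.is_dir():
        return set()
    return {p.name for p in cur.iterdir() if p.is_file()}


def new_files(before, folder_dir):
    """Files that appeared in cur since the `before` snapshot was taken."""
    return cur_files(folder_dir) - before
```

Take `before = cur_files(path)` first, apply the label at the gmail site, let Thunderbird sync, then call `new_files(before, path)`. A non-empty result means a separate copy was written for that folder.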

Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Redundant copies of multi-labeled messages stored for GMail (maildir profile) → Redundant copies of multi-labeled messages stored for GMail (maildir profile, much better for mbox)

Great work, gene! Appreciated.

I’m standing by for the issue to be fixed via redocumentation as an mbox-only feature.

I’m kinda interested in the idea that the feature might work only for profiles with a full-sync set on “All Mail”. That raises the question of what happens if the “All Mail” sync is not set on until way into the lifetime of the profile. Would a copy of every message be pulled down (again) through IMAP, and written into the local storage associated with “All Mail”? Would the, now redundant, copies in the other folders’ local storage be purged, or would the avoidance of duplication apply only for new messages (new, as in not already stored locally for the other folders)?

Funny thing is that “All Mail” is the only folder I hadn’t set for full-sync (as I think I’m verifying, periodically, in GMail’s browser interface that all messages have at least one label).

I think requiring a full-sync of “All Mail” would be a bit of a pain. We may have only a few labels the messages for which we wish to store locally, maybe having a large archive of older messages, in the GMail account, that we certainly don’t wish to retrieve in tb. (That’s not my use-case, at the moment, though.)

I was somewhat surprised, gene, that you’d said 20 GiB of messages was not really a reason to prefer maildir over mbox. I still think of MS Doubtlook scrambling its O/PSTs fairly frequently once they’ve hit 4 or 5 GiB. Maybe this is one of the areas where the march of technology has slipped by me. I wouldn’t want you to write out an explanation, but if there’s a best-practice sort of guide, for which you can post the URL, I’d be grateful for the opportunity to update myself.

Sorry to be mixing the terms “folder” and “label”. It’s difficult to make the terminology work, since “All Mail” probably isn’t really a label.

I know marching to Google’s tune is controversial, but labels instead of folders does seem the better concept.

Component: Untriaged → Backend
OS: Unspecified → All
Product: Thunderbird → MailNews Core
Hardware: Unspecified → All

I've done some more tests on this with maildir and mbox. With maildir the problem seems to be that it fails here:

https://searchfox.org/comm-central/rev/94dde9f62467a62fdcab1ec8c5acafa14df5151d/mailnews/db/msgdb/src/nsMsgDatabase.cpp#4209

This function that returns the "row" succeeds (returns NS_OK) but the pointer to the row, *hdrRow is always null when maildir is used. It is non-null when mbox is used so the header is retrieved and seen as having offline store in the calling function (at least when message has Inbox label, see below).

However, even with mbox this only seems to work when the original label is on Inbox. There is no label returned by the imap fetch when the message is in All Mail, so if the message has been moved from Inbox, the access in the new label causes a re-fetch and it writes new offline storage. Sent label may also work but I didn't try it. The other system labels (Drafts, Junk, Trash) probably don't work since they don't have offline store by default, but, even if they did, there is probably no reason you would place other labels on them anyhow.

So, in general, I would say that this feature (non-redundant storage and fetching) is effectively broken and of little use for both "pluggable stores", mbox and maildir.

The database code that handles this and was added to support this feature is pretty much beyond my understanding. There are 3 comments that sort of describe the theory of operation but I think most of the discussion between the mentor Bienvenu and the "summer of code" developer was handled via IRC:
Bug 721316 comment 11
Bug 721316 comment 13
Bug 721316 comment 14

Bienvenu à la jungle!

I think the most robust design would just have a single “All Sync’ed Down Mail” folder, and every *.msf would point to messages in it (whether it’s a maildir filesystem directory, or an mbox file). Probably, have a master *.msf for the “All Sync’ed Down Mail” folder to record common attributes, such as whether the message has been read (and the current set of labels, in case we want to show those, with the message, as the GMail browser interface does).
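
To make the single-store idea concrete, here is a small model of it. It is purely illustrative: the class and method names are hypothetical, nothing here reflects Thunderbird's actual database code, and the rule of deleting a body once no folder refers to it is an added assumption rather than part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical single store: one body per message, many folder views."""
    bodies: dict = field(default_factory=dict)   # msg_id -> raw message bytes
    folders: dict = field(default_factory=dict)  # folder name -> set of msg_ids

    def add(self, folder, msg_id, fetch_body):
        """File msg_id under folder, calling fetch_body() only if not yet stored."""
        if msg_id not in self.bodies:
            self.bodies[msg_id] = fetch_body()
        self.folders.setdefault(folder, set()).add(msg_id)

    def remove(self, folder, msg_id):
        """Drop the label; delete the body once no folder references it."""
        self.folders.get(folder, set()).discard(msg_id)
        if not any(msg_id in ids for ids in self.folders.values()):
            self.bodies.pop(msg_id, None)
```

In this model, adding a second label to a message already held for Inbox neither downloads nor stores anything new, and removing the Inbox label does not force a re-download while another folder still refers to the message.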

Trouble with that idea is that we’d need some process for converting existing profiles for Gmail accounts, unless it’s decided to leave them in legacy (broken) mode.

It would be / is rather complicated if each message is stored just in the folder for which it is first downloaded. I suppose, if the message is later removed from that folder (label removed, or “archived” from Inbox, etc.) it will just be redownloaded, afresh, whenever the first folder that should also contain it is sync’ed (or, if only messages in Inbox are ever shared, whenever any of the other folders that should also contain it is sync’ed). That could be a nasty surprise to users who have renamed a large folder (or "archived" a mass of messages from the Inbox) using another client or the browser interface (were they believing the doc., anyway).

Back to plan B: change the doc. to match the code!

(In reply to James Kessel from comment #13)

Bienvenu à la jungle!

Bienvenue dans la jungle/Jangra ??
I have to always look up your pop culture references: Dead parrot; Welcome to the Jungle, etc :).

Back to plan B: change the doc. to match the code!

I really don't know who does the mozilla docs. I'm just a volunteer to work on tb network problems that arise, mostly imap related.

There was someone supposedly officially hired to work on fixing maildir pluggable store issues but I don't know who or how that's going (or if it's going).

Maybe Wayne or Jorg who are cc'd on this bug know about this.

Plan B: gene, sorry but you've got me worried my comment could be, and was, read as a snarky dismissal of how you've responded to the bug report. Well, that was not what it meant when I was typing it! I'm annoyed that I may clumsily have induced some of the irritation that makes volunteers start to ask "why bother?".

My reflection was that, actually, the issue seems to require a design change, thus a reasonable amount of development work, and either a greater amount of testing work or a significant risk to the stability of the product. Hence, there may be little appetite to incorporate it, not least because, in 3.5 years, there's been only this one report.

Not that I mean to down-vote this, though.

Yes, you've rightly picked me up on the French; it should have been "Bienvenue", with an "e". I'm going to stick to "à la", though, but I've no basis to claim any credibility. I'm totally lost with "Jangra". Google Translate indicated it's Javanese, but that's probably a coder's joke!

Anyway, thank you for all your work in verifying the discrepancy and thus ensuring that it's available for proper consideration and prioritization. I'll watch for movement, but it may take quite a lot of additional users to start registering interest before this is in code. At least people can find this thread if they desire the functionality.

No problem and no irritation!

Wasn't trying to correct your French since I didn't even really know Bienvenue meant welcome until I looked it up. The "Jangra" reference was to the "summer of code" developer who added the feature, Atul Jangra, who worked with Mr. Bienvenu. Maybe Atul is from Java, I have no idea. Anyhow, just trying to be funny too.

Anyhow, I will just leave bug in state NEW and let Wayne or Jorg decide how it should be resolved.

ref comment 2
Does the article need a change?

Flags: needinfo?(unicorn.consulting)
Flags: needinfo?(thee.chicago.wolf)

I have deleted the lines. Wayne, perhaps you would like to approve it. I am still of the opinion we should default to the [Gmail] namespace and remove this oddity of a [Gmail] folder and the user support issues and documentation it creates.

Flags: needinfo?(unicorn.consulting)
Flags: needinfo?(thee.chicago.wolf)
Severity: normal → S3