Open Bug 1214407 Opened 9 years ago Updated 2 years ago

OS search integration folder wdseml should not be necessary when using maildir accounts

Categories

(MailNews Core :: Database, defect)

Unspecified
Windows
defect

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: Thunderbird_Mail_DE, Assigned: benc)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

AFAIK the wdseml folder isn't necessary when using a maildir store account. But the wdseml folder and the additional separate files for every mail are still created in maildir store accounts. Would it be possible to only have and use the files in /cur/ for Windows search integration?
I was about to open an issue with the same content. Is there a reason why Spotlight doesn't work on the maildir files created by Thunderbird? One would expect no workarounds or duplicate copies to be needed once maildir is used.
Is there any progress on this for Thunderbird 60? Is this fixed in another bug?
This is the definitive bug for this issue, so no, there is no change.

I have a maildir test profile (under TB 38.3.0, so I recognise that things might have changed) and the maildir format used by TB is NOT suitable as it stands for indexing with Windows Search. Although the maildir message files appear to be valid EML/MHTML format files (good), they have no file extension. Without a file extension, the Windows Search indexer has no way to know which IFilter to invoke to extract data from the files.

The solution is to ensure that some unique file extension is used for maildir message files in cur directories. Since .wdseml is currently used for this purpose, it could be re-used for maildir message files without harm, although a shorter file extension would ideally be preferred because of path length issues on Windows[1].
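To make the path-length point concrete, here is a minimal JavaScript sketch. The helpers `maildirMessagePath` and `fitsMaxPath` are hypothetical, not Thunderbird code; the sketch assumes the classic Windows `MAX_PATH` limit of 260 characters (including the terminating NUL) with long-path support disabled, and assumes the extension is simply appended to the message file name inside the folder's `cur` directory.

```javascript
// Classic Windows MAX_PATH: 260 characters including the terminating NUL,
// so at most 259 visible characters (assumes long paths are not enabled).
const MAX_PATH = 260;

// Hypothetical helper: full path of a maildir message file in a folder's
// "cur" directory, with an extension appended so Windows Search can pick
// an IFilter based on the extension.
function maildirMessagePath(folderDir, messageName, ext) {
  return `${folderDir}\\cur\\${messageName}${ext}`;
}

// True if the path still leaves room for the terminating NUL.
function fitsMaxPath(fullPath) {
  return fullPath.length < MAX_PATH;
}

// Hypothetical folder dir and message name, only to compare extensions.
const withEml = maildirMessagePath("C:\\profile\\Mail\\Inbox", "1589000000123", ".eml");
const withWdseml = maildirMessagePath("C:\\profile\\Mail\\Inbox", "1589000000123", ".wdseml");
console.log(withWdseml.length - withEml.length); // 3
```

Three characters per file is small, but it comes straight out of the budget left for profile, account and folder names in deeply nested hierarchies.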

Question: Has the maildir file name format changed to have a file extension in current TB?

I should add that forcing Windows Search to directly index message files in this manner (whether they be .wdseml files generated in .mozmsgs folders or maildir message files directly) is better than nothing, but it is still a kludge. The proper and better way on Windows to index a data store such as Thunderbird's message base is to use a Protocol Handler https://docs.microsoft.com/en-us/windows/win32/search/-search-3x-wds-extidx-overview[2]. This would allow TB to build its own data and metadata index (which of course it does) and then expose it to Windows Search queries. This is how Outlook works, for example. However, I recognise that this approach, superior though it is, would require writing and maintaining a custom Protocol Handler, and that is far more work than using the built-in EML/MHTML IFilter with a custom file extension.

Footnotes:-
1: My earlier comments on path length limit issues in Windows:
Thread beginning at: https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001426.html
My contributions to the thread which outline the problem and attempt to indicate how much of an issue maildir in TB might be on Windows are here:
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001427.html
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001428.html
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001430.html
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001431.html
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001433.html
https://mail.mozilla.org/pipermail/tb-enterprise/2018-August/001434.html

2: The docs are current and up to date despite not referring to Windows 10.

> Question: Has the maildir file name format changed to have a file extension in current TB?

Yes, in bug 1259040. For migration see bug 1526289.

(In reply to Jorg K (CEST = GMT+2) from comment #6)
> > Question: Has the maildir file name format changed to have a file extension in current TB?
>
> Yes, in bug 1259040. For migration see bug 1526289.

Excellent, I'm glad that was fixed. Thanks for the links, Jorg.

I hope the migration from extension-less maildir files to .eml files can be implemented.

Assignee: nobody → benc

(NOTE: this patch is untested!)

Ok, so here's my stab at this.
The idea is that it avoids creating .mozmsgs dirs if the nsIMsgPluggableStore stores .eml files directly in the filesystem, and if they're in a location where the OS search can index them.
In practice, this means using maildir on Windows.
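A minimal sketch of that decision in plain JavaScript, using hypothetical stand-in objects rather than real XPCOM folders. It assumes each folder exposes its store through a `msgStore` property carrying a boolean `isEmlFiles` flag (modelled on the attribute the patch proposes for `nsIMsgPluggableStore`), and uses Node's `process.platform` naming (`"win32"`, `"darwin"`) for the OS.

```javascript
// Hypothetical model of the check: does this folder need a mirrored
// .mozmsgs dir for OS search integration?
function requiresSupportDir(folder, platform) {
  // Stores that don't keep one .eml file per message (e.g. mbox): the OS
  // indexer can't see individual messages, so they must be mirrored.
  if (!folder.msgStore.isEmlFiles) {
    return true;
  }
  // Windows Search can index maildir .eml files in the profile directly.
  // Elsewhere (e.g. Spotlight on macOS, which won't index under the
  // profile dir), assume a separate mirror is still needed.
  return platform !== "win32";
}

const mboxFolder = { msgStore: { isEmlFiles: false } };
const maildirFolder = { msgStore: { isEmlFiles: true } };
console.log(requiresSupportDir(mboxFolder, "win32"));     // true
console.log(requiresSupportDir(maildirFolder, "win32"));  // false
console.log(requiresSupportDir(maildirFolder, "darwin")); // true
```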

There's one detail I'm not 100% certain of:
in the folderMoveCopyCompleted() listener, there's the case where a folder is moved from a location which doesn't require .mozmsgs, to one which does require a .mozmsgs dir. I've treated this the same as the missing folderAdded() listener, which just relies on the idle processing to eventually deal with the new folder. I think this is the right thing, but I'm not totally sure.
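One way to spell out all four combinations for a moved or copied folder, as a hypothetical JavaScript sketch. Only the branch where the source has no mirror but the destination needs one reflects what's described above; the other three actions, and all of the action strings, are assumptions made up for illustration.

```javascript
// Hypothetical decision for a folder move/copy, given whether each end
// needs a .mozmsgs mirror (e.g. maildir on Windows doesn't, mbox does).
function planFolderMoveCopy(srcNeedsDir, destNeedsDir) {
  if (srcNeedsDir && destNeedsDir) {
    // Assumption: carry the existing mirror along with the folder.
    return "transfer-mirror";
  }
  if (srcNeedsDir && !destNeedsDir) {
    // Assumption: destination is indexed directly, so drop the mirror.
    return "drop-mirror";
  }
  if (!srcNeedsDir && destNeedsDir) {
    // The case described above: no mirror exists to transfer, so treat it
    // like a newly added folder and let idle processing build one.
    return "defer-to-idle-reindex";
  }
  // Neither end uses a mirror; nothing to do.
  return "nothing";
}
```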

Like I said, this isn't tested. I do need to get a Windows and/or Mac setup going, but for now it's probably quicker for someone else to give it a whirl, if there aren't any obvious issues with the approach I've taken.

Attachment #9150350 - Flags: feedback?(mkmelin+mozilla)

Some suggested tests:

Make up some test folders with a small number of messages, some on maildir-backed accounts, some on mbox, with search integration turned on.

  • On Windows, check that the mbox folders have .mozmsgs dirs next to them in the filesystem, and that the maildir ones don't.
  • On Mac, both mbox and maildir folders should have the .mozmsgs dir (but located out in ~/Library/Caches/Metadata/... instead of in your profile dir).
  • Check that .mozmsgs contents properly update when you:
    • copy messages between folders
    • move messages between folders
    • rename folders
    • move folders
    • copy folders
    • copy messages from a maildir folder to an mbox one
    • add folders
    • add messages
    • delete messages
    • delete folders

Probably we should make up a stub OS-agnostic SearchIntegration and implement these as unit tests.
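A minimal sketch of what such a stub might look like, assuming an in-memory map stands in for the real .mozmsgs dirs on disk. `StubSearchIntegration` and its method names are hypothetical and don't correspond to the real folder listener interfaces; folders are plain objects with a `name` and an `isEmlFiles` flag.

```javascript
// Hypothetical OS-agnostic stand-in for SearchIntegration, so checks like
// the list above could run as plain unit tests without a real OS indexer.
class StubSearchIntegration {
  constructor(platform) {
    this.platform = platform;
    // Simulated filesystem: folder name -> Set of mirrored message ids.
    this.mozmsgs = new Map();
  }

  // Only maildir (.eml per message) on Windows skips the mirror.
  requiresSupportDir(folder) {
    return !folder.isEmlFiles || this.platform !== "win32";
  }

  messageAdded(folder, msgId) {
    if (!this.requiresSupportDir(folder)) return;
    if (!this.mozmsgs.has(folder.name)) this.mozmsgs.set(folder.name, new Set());
    this.mozmsgs.get(folder.name).add(msgId);
  }

  messageDeleted(folder, msgId) {
    const dir = this.mozmsgs.get(folder.name);
    if (dir) dir.delete(msgId);
  }

  messageMoved(srcFolder, destFolder, msgId) {
    this.messageDeleted(srcFolder, msgId);
    this.messageAdded(destFolder, msgId);
  }

  folderDeleted(folder) {
    this.mozmsgs.delete(folder.name);
  }

  hasMirror(folder, msgId) {
    const dir = this.mozmsgs.get(folder.name);
    return dir !== undefined && dir.has(msgId);
  }
}

// Example: move a message from a maildir folder to an mbox one on Windows.
const si = new StubSearchIntegration("win32");
const maildirInbox = { name: "maildirInbox", isEmlFiles: true };
const mboxArchive = { name: "mboxArchive", isEmlFiles: false };
si.messageAdded(maildirInbox, "msg1");
si.messageMoved(maildirInbox, mboxArchive, "msg1");
console.log(si.hasMirror(mboxArchive, "msg1")); // true
console.log(si.mozmsgs.has("maildirInbox"));    // false
```

Keeping the platform as a constructor argument lets the same test cases run for both the Windows and Mac expectations listed above.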

Better still might be to ditch the whole .mozmsgs mirroring thing and directly call the OS-specific search APIs to do the indexing directly, rather than relying on the OS indexing to pick up the files in the filesystem. Seems a little more... "integrated" somehow. I suspect you might be able to craft prettier displays in the OS search results too...
(Comment 5 mentions the Windows API, I think there's something similar on the Mac side too, but I'd have to look closer to say for sure. Either way, would be a separate bug).

Comment on attachment 9150350 [details] [diff] [review]
1214407-avoid-mozmsgs-if-possible-1.patch

Review of attachment 9150350 [details] [diff] [review]:
-----------------------------------------------------------------

This should probably say empty, not null - https://searchfox.org/comm-central/rev/cfa832291ae06d6b814f8a9ebe1991a08f25da92/mailnews/base/public/nsIMsgFolderListener.idl#100

::: mail/components/search/SearchIntegration.jsm
@@ +552,5 @@
> +        let reindexTime = this._getLastReindexTime(msgHdr.folder);
> +        this._log.debug("Reindex time for this folder is " + reindexTime);
> +        if (msgHdr.getUint32Property(this._hdrIndexedProperty) < reindexTime) {
> +          // Check if the file exists. If it does, then assume indexing to be
> +          // complete for this file

nit: add . while you're touching this

@@ +558,5 @@
> +            this._log.debug(
> +              "Message time not set but file exists; setting " +
> +                " time to " +
> +                reindexTime
> +            );

if we want to keep this debug, maybe also include msgHdr.messageId for debugging purposes.

@@ +666,5 @@
> +      }
> +
> +      for (let i = 0; i < aSrcMsgs.length; i++) {
> +        let srcMsg = aSrcMsgs[i];
> +        if (!SearchIntegration._requiresSupportDir(srcMsg.folder)) {

Hmm, a bit confused. Shouldn't we check the destFolder for this?

@@ +823,5 @@
> +  /**
> +   * Checks if a folder requires a separate .mozmsgs dir to be maintained.
> +   * Doesn't check to see if search integration is enabled or not.
> +   *
> +   * @param {nsIMsgFolder} folder The folder to check.

please add a dash after the parameter name

@@ +826,5 @@
> +   *
> +   * @param {nsIMsgFolder} folder The folder to check.
> +   * @return {boolean} True if the folder requires .mozmsgs support.
> +   */
> +  _requiresSupportDir(folder) {

this shouldn't have _ (for private). Seems pretty public.

::: mailnews/base/public/nsIMsgPluggableStore.idl
@@ +328,5 @@
> +   * This is important for search integration: if this attribute is true, then
> +   * the search integration knows it can avoid doing anything fancy and just
> +   * let the OS index the raw files directly.
> +   */
> +  readonly attribute boolean isEMLFiles;

would call it isEmlFiles
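For illustration, a JavaScript sketch of two hypothetical store classes exposing a read-only `isEmlFiles` getter (the camel-case name suggested above; a getter with no setter is the closest plain-JS analogue of an IDL `readonly attribute`), plus a helper documented in the requested `@param name - description` JSDoc style. These are stand-ins, not the real `nsIMsgPluggableStore` implementations.

```javascript
/**
 * Hypothetical stand-in for a maildir store: one .eml file per message.
 */
class MaildirStoreSketch {
  /** @return {boolean} True, since each message is a standalone file. */
  get isEmlFiles() {
    return true;
  }
}

/**
 * Hypothetical stand-in for an mbox store: many messages per file.
 */
class MboxStoreSketch {
  /** @return {boolean} False, since messages share one mbox file. */
  get isEmlFiles() {
    return false;
  }
}

/**
 * Checks if a folder's store requires a separate .mozmsgs mirror.
 *
 * @param {{msgStore: {isEmlFiles: boolean}}} folder - The folder to check.
 * @return {boolean} True if the folder requires .mozmsgs support.
 */
function storeNeedsMirror(folder) {
  return !folder.msgStore.isEmlFiles;
}
```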
Attachment #9150350 - Flags: feedback?(mkmelin+mozilla) → feedback+
Status: NEW → ASSIGNED

(In reply to Ben Campbell from comment #9)

> Better still might be to ditch the whole .mozmsgs mirroring thing and directly call the OS-specific search APIs to do the indexing directly, rather than relying on the OS indexing to pick up the files in the filesystem.

Ideally so. However, writing a Protocol Handler does seem like non-trivial work. In comparison, leveraging the built-in Windows Search IFilter for .eml/.mhtml files (which is the current approach) has passed the 'good enough' test for a long time.

> (Comment 5 mentions the Windows API, I think there's something similar on the Mac side too, but I'd have to look closer to say for sure. Either way, would be a separate bug).

For the avoidance of doubt on Windows, it's not just an API that can be called from an application: one would need to write a specific component, a Protocol Handler, which is called by the Windows Search crawler/indexer to get at TB's private data store.

(In reply to Mark Rousell from comment #11)
> (In reply to Ben Campbell from comment #9)
> > Better still might be to ditch the whole .mozmsgs mirroring thing and directly call the OS-specific search APIs to do the indexing directly, rather than relying on the OS indexing to pick up the files in the filesystem.
>
> Ideally so. However, writing a Protocol Handler does seem like non-trivial work. In comparison, leveraging the built-in Windows Search IFilter for .eml/.mhtml files (which is the current approach) has passed the 'good enough' test for a long time.

True. "If it ain't broke..." etc etc...
I do think there needs to be a little investigation on the Mac side. Currently it won't index anything under the profile dir, even with maildir, so we maintain a separate dir out in ~/Library/Caches/Metadata. I'm pretty sure you can register extra directories for indexing with OS X, and add parsers for file formats the OS doesn't already handle. But, as on Windows, it takes an extra component. I have a feeling there might already be some OS X code in C-C along these lines.

> For the avoidance of doubt on Windows, it's not just an API that can be called from an application: one would need to write a specific component, a Protocol Handler, which is called by the Windows Search crawler/indexer to get at TB's private data store.

Thanks for clarifying!

Thanks for that - new patch with most of your points fixed up.

(In reply to Magnus Melin [:mkmelin] from comment #10)
> @@ +666,5 @@
> > +      }
> > +
> > +      for (let i = 0; i < aSrcMsgs.length; i++) {
> > +        let srcMsg = aSrcMsgs[i];
> > +        if (!SearchIntegration._requiresSupportDir(srcMsg.folder)) {
> 
> Hmm, a bit confused. Shouldn't we check the destFolder for this?

No, by the time we hit this code, we know destFolder has a .mozmsgs dir.
I've rearranged the prior code blocks a little and added some comments to try and make it more obvious.
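A hypothetical JavaScript sketch of why the loop checks the source rather than the destination: with the destination mirror already guaranteed, the only per-message question is whether the source folder has a mirror file to copy. The function name, the `{ id, folder }` message shape and the `maildir` flag are made up for illustration, and what happens to messages whose source has no mirror (here: leave them for reindexing) is an assumption, since the quoted hunk doesn't show that branch.

```javascript
// Hypothetical: destFolder is already known to have a .mozmsgs dir, so
// split the copied messages by whether their *source* has a mirror file.
function partitionCopiedMsgs(srcMsgs, requiresSupportDir) {
  const copyMirrorFile = [];
  const needsReindex = [];
  for (const srcMsg of srcMsgs) {
    if (requiresSupportDir(srcMsg.folder)) {
      // Source keeps a mirror: its file can be copied across.
      copyMirrorFile.push(srcMsg);
    } else {
      // Source (e.g. maildir on Windows) has no mirror to copy; assume the
      // message is left for the destination's reindexing to pick up.
      needsReindex.push(srcMsg);
    }
  }
  return { copyMirrorFile, needsReindex };
}

// Hypothetical folders: on Windows only the mbox one keeps a mirror.
const inboxMaildir = { name: "Inbox", maildir: true };
const archiveMbox = { name: "Archive", maildir: false };
const needsDir = (folder) => !folder.maildir;
const { copyMirrorFile, needsReindex } = partitionCopiedMsgs(
  [{ id: "a", folder: inboxMaildir }, { id: "b", folder: archiveMbox }],
  needsDir
);
console.log(copyMirrorFile.map((m) => m.id)); // [ 'b' ]
console.log(needsReindex.map((m) => m.id));   // [ 'a' ]
```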

> @@ +826,5 @@
> > +   *
> > +   * @param {nsIMsgFolder} folder The folder to check.
> > +   * @return {boolean} True if the folder requires .mozmsgs support.
> > +   */
> > +  _requiresSupportDir(folder) {
> 
> this shouldn't have _ (for private). Seems pretty public.

It's a detail used only by SearchIntegration, so I'd say it's not public at this point. There are a couple of messy allowances for .mozmsgs dirs out in other code, but I don't think they'd benefit from access right now.

Attachment #9150350 - Attachment is obsolete: true
See Also: → 1526289
Severity: normal → S3