Open Bug 541207 Opened 15 years ago Updated 2 years ago

Make gloda's files not be backed up by Time Machine.

Categories

(Thunderbird :: OS Integration, defect)

x86
macOS
defect

Tracking

(Not tracked)

People

(Reporter: bwinton, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: student-project, Whiteboard: [Highly questionable: See comment #15])

Apparently you can just set something, somewhere, so I'll do a little investigation, and hopefully come up with something nice.

(Unless someone else wants to grab it, in which case I'll happily give it over to them.)

Later,
Blake.
Assignee: bwinton → nobody
Component: Search → OS Integration
QA Contact: search → os-integration
Looks like there's no programmatic way to do this:
http://support.apple.com/kb/ht1427 (search for "exclude")

The only way is for the user to specifically request that the file / folder be excluded in the System Preferences -> Time Machine settings.
bug 476239 comment 2 "http://developer.apple.com/leopard/overview/apptech.html

And as best as I can tell, the API for integrating with it is completely undocumented.  Yay Apple :|"
Blocks: tb-mac
(In reply to Javi Rueda from comment #3)
> CSBackupSetItemExcluded
> https://developer.apple.com/library/mac/documentation/macosx/Reference/
> Backup/Reference/reference.html#//apple_ref/c/func/CSBackupSetItemExcluded

Javi, how do you feel about making the patch?
No. I am not an experienced C/C++ coder.
The file to be excluded seems to be global-messages-db.sqlite [1]. This file is created and accessed in function _init [2].

I suggest excluding the file in nsMessengerOSXIntegration, where we store all macOS-exclusive code, and maybe calling it from that _init function. Elevation of privileges will be needed when excluding the datastore. The new function that excludes the file will also need to be callable from JS.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Thunderbird/gloda#Broad_implementation_details
[2] https://dxr.mozilla.org/comm-beta/source/mailnews/db/gloda/modules/datastore.js
arai, nomis, might you be able to pull this off?
It would help us make progress in performance on Mac.
Flags: needinfo?(arai.unmht)
Flags: needinfo?(Nomis101)
https://developer.apple.com/documentation/coreservices/1445043-csbackupsetitemexcluded?language=objc
> Discussion
> ...
> To change the backup exclusion status of a path, your application must be running with administrator privileges. 

is it really what we want to do inside Thunderbird...?
or maybe create a dedicated application and spawn it?
Flags: needinfo?(arai.unmht)
(In reply to Wayne Mery (:wsmwk) from comment #7)
> arai, nomis, might you be able to pull this off?
> It would help us make progress in performance on Mac.

Not really, I am also not an experienced C/C++ coder. :-(
Flags: needinfo?(Nomis101)
I will try it, but I will need help with XPCOM. My proposal is to add an exported method in nsMessengerOSXIntegration which will call the operating system's Backup Core API. When running on macOS, that exported method will be called from the _init() function in datastore.js, right after making sure the file exists.

A way to elevate privileges will also be needed, as noted previously.
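
To make the proposed flow concrete, here is a minimal sketch in Python, purely for illustration (the real code would be JS in datastore.js plus a native macOS component). It is not the CSBackupSetItemExcluded call discussed above: it swaps in the `tmutil addexclusion` command-line tool instead, and the function names `build_exclusion_command` and `exclude_from_time_machine` are hypothetical. As far as I understand the tmutil man page, the `-p` (fixed-path) form is the one that needs root, which may bear on the privilege question.

```python
import subprocess
import sys
from pathlib import Path


def build_exclusion_command(db_path, fixed_path=False):
    """Return the tmutil argument list that would exclude db_path."""
    cmd = ["tmutil", "addexclusion"]
    if fixed_path:
        # Assumption: -p records the exclusion by path and needs root.
        cmd.append("-p")
    cmd.append(str(db_path))
    return cmd


def exclude_from_time_machine(db_path, platform=sys.platform, run=subprocess.run):
    """Exclude db_path on macOS once the file exists.

    Returns True if the exclusion command was issued, False if skipped.
    """
    if platform != "darwin" or not Path(db_path).exists():
        return False
    run(build_exclusion_command(db_path), check=True)
    return True
```

The platform check and the existence check mirror the two conditions in the proposal: only on macOS, and only after the database file has been created.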
Thanks.

arai suggests ... not sure if the code should live in thunderbird's core because of the privilege things, if the feature is implemented inside XPCOM things, most code needs to be executed with admin priv
Can anyone please explain the motivation for the bug? Why would you not back up global-messages-db.sqlite? You're restoring your message data and get it out of step with the existing Gloda database? Sounds like a really bad idea.
(In reply to Jorg K (GMT+1) from comment #12)
> Can anyone please explain the motivation for the bug. Why would you not back
> up global-messages-db.sqlite? You're restoring your message data and get it
> out of step with the existing Gloda database? Sounds like really bad idea.

The Gloda database can be regenerated whenever the user enables it in Preferences. There is no point in backing up something that is generated.

Also, the Gloda database is updated dynamically, so it is quite likely that by the time Time Machine restores that data, it is already obsolete.
(In reply to Wayne Mery (:wsmwk) from comment #11)
> Thanks.
> 
> arai suggests ... not sure if the code should live in thunderbird's core
> because of the privilege things, if the feature is implemented inside XPCOM
> things, most code needs to be executed with admin priv

Now that I am looking at it, there will be no way to call the method for excluding the file if it is hosted in nsMessengerOSXIntegration. It would be better for it to live in its own file instead.

And right now I am unsure whether admin privileges are needed for excluding a file or only when excluding a path. I will test it.
Hmm, I don't think I agree here.

You have a bunch of files, mailbox files M and derived Gloda data G. You make a backup of M1 and G1.

Later, when you're at M2 and G2 you restore from backup overwriting M2 and G2 with M1 and G1. So far, so good. Now imagine not backing up G1. So you restore M1 and have M1 and G2. You've just corrupted your database since the derived data doesn't match the original data. Sure, you can blow away the derived data G2, but until you do, you're in a corrupted state where Gloda won't work.

I think this bug is a dangerous footgun in the making (results in the manufacturer shooting themselves in the foot). WONTFIX if I had something to say.

BTW, you can also argue that MSF files are derived. So we also don't back them up?
Whiteboard: [Highly questionable: See comment #14]
Whiteboard: [Highly questionable: See comment #14] → [Highly questionable: See comment #15]
From https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/OSX_Technology_Overview/CoreServicesLayer/CoreServicesLayer.html#//apple_ref/doc/uid/TP40001067-CH270-BCICAIFJ

> Time Machine protects user data from accidental loss by automatically backing up data to a different hard drive. Included with this feature is a set of programmer-level functions that you can use to exclude unimportant files from the backup set. For example, you might use these functions to exclude your app’s cache files or any files that can be recreated easily. Excluding these types of files improves backup performance and reduces the amount of space required to back up the user’s system.

It is not an official guideline, but they are suggesting that macOS devices could perform better if files that can be regenerated are not backed up.

But, yes, it is true that it seems arbitrary to back up the Gloda database but not the message database.
(In reply to Jorg K (GMT+1) from comment #15)
> Hmm, I don't think I agree here.
> 
> You have a bunch of files, mailbox files M and derived Gloda data G. You
> make a backup of M1 and G1.
> 
> Later, when you're at M2 and G2 you restore from backup overwriting M2 and
> G2 with M1 and G1. So far, so good. Now imagine not backing up G1. So you
> restore M1 and have M1 and G2. You've just corrupted your database since the
> derived data doesn't match the original data. Sure, you can blow away the
> derived data G2, but until you do, you're in a corrupted state where Gloda
> won't work.
> 
> I think this bug is a dangerous footgun 

Indeed, iirc one of the reasons for keeping gloda database in the same (roaming) directory with the rest of Thunderbird data was the need for consistency. However ...

Asuth, would gloda not recognize a state where the gloda database is out of step with msf files?  And if not, could it/should it?
Flags: needinfo?(bugmail)
gloda will get confused by .msf files that are more recent than global-messages-db.sqlite.  It could be made to recognize this scenario by explicitly tagging folders with a generation id that is also tracked in the gloda database.  In the event gloda sees a generation higher than the one in its DB, it will know that this scenario has been encountered.
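
A rough sketch of that generation check, in Python for illustration only. Nothing like this exists in gloda today; the function name `folders_ahead_of_gloda` and the plain dicts standing in for the folder tags and the gloda table are hypothetical.

```python
def folders_ahead_of_gloda(folder_generations, gloda_generations):
    """Return the folders whose own generation id is higher than gloda's record.

    folder_generations: folder name -> generation id tagged on the folder.
    gloda_generations:  folder name -> generation id stored in the gloda DB.
    A folder gloda has never seen is treated as generation 0 (an assumption).
    """
    stale = []
    for folder, on_disk in folder_generations.items():
        recorded = gloda_generations.get(folder, 0)
        if on_disk > recorded:
            stale.append(folder)
    return sorted(stale)
```

A non-empty result would be the signal that the database is out of step with the folders.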

The safest thing to do in such a case is probably to delete the database and cause gloda to re-create the database and all that entails.  A more efficient thing that could be done would be to mark all messages as deleted, but not perform a deletion purge.  Then have gloda mark each folder super-dirty like it does at database re-creation and have it do an indexing sweep.  The indexer should rescue messages that still exist from their deletion state, saving a non-trivial amount of indexing overhead.
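
The more efficient variant could look roughly like the following, again as a Python sketch with made-up names (`sweep_after_restore`, and a dict standing in for the gloda message table), not actual indexer code.

```python
def sweep_after_restore(indexed, messages_on_disk):
    """Mark everything deleted, then rescue what the indexing sweep still finds.

    indexed: message id -> record dict with a "deleted" flag (stand-in for gloda).
    messages_on_disk: message ids found by the indexing sweep.
    Returns (rescued, newly_indexed) counts.
    """
    # Step 1: mark every known message as deleted, but do not purge.
    for record in indexed.values():
        record["deleted"] = True

    rescued = 0
    newly_indexed = 0
    # Step 2: the sweep un-deletes messages that still exist and only
    # does full indexing work for messages gloda has never seen.
    for msg_id in messages_on_disk:
        if msg_id in indexed:
            indexed[msg_id]["deleted"] = False
            rescued += 1
        else:
            indexed[msg_id] = {"deleted": False}
            newly_indexed += 1
    return rescued, newly_indexed
```

Records still flagged as deleted after the sweep are the ones a later purge would remove.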

Note that this is only viable for the case where an explicit restore-from-backup occurred.  This would turn out horribly in a scenario like the Windows roaming profile mechanism, since every time the user changed computers gloda would effectively have to reindex everything.


Also note that now that OS X has a fancy new filesystem, there may be fewer issues with backing up a SQLite database if the backup engine supports copy-on-write semantics at the block level and uses it for backups.  It's probably worth finding out if that's the case, although I do understand that only newer Macs are able to leverage the new file system.
Flags: needinfo?(bugmail)
Severity: normal → S3