User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:126.96.36.199) Gecko/20080404 Iceweasel/188.8.131.52 (Debian-184.108.40.206-2)
Build Identifier: Trunk
As discussed on Bug #421482, having a backup of places can allow us to stop using over-aggressive fsync() on Linux, with, at worst, the loss of the history accumulated during the browser session that crashed.
I have implemented the suggestion and will attach this patch in the first comment.
Created attachment 322478 [details] [diff] [review]
Implement automatic backup and restore of places database
This patch implements automatic backup of the places database. If the database is discovered to be corrupted, the backup is tested to see if it is also corrupted. If it is not, it replaces the corrupted database. If both are corrupted, the old behavior kicks in: new database, rerun history import.
I have tested this by manually corrupting my places.sqlite: appending data from /dev/urandom to the file using dd. The resulting corrupted database is restored correctly.
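For reference, the reporter's dd trick can be reproduced in a few lines. This is my own sketch, not code from the patch: the stand-in schema and filenames are mine, and it stomps the 16-byte header magic rather than appending garbage, because header damage makes SQLite's open-time rejection deterministic:

```python
import sqlite3

# Build a small stand-in database (the reporter used his real places.sqlite).
path = "test_places.sqlite"
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE visits(url TEXT)")
conn.execute("INSERT INTO visits VALUES ('http://example.com/')")
conn.commit()
conn.close()

# Corrupt it by overwriting the header magic ("SQLite format 3\x00").
with open(path, "r+b") as f:
    f.write(b"not a database!!")

# The cursory checks run on open/first query now reject the file,
# which is the same detection path the current corruption handling uses.
try:
    sqlite3.connect(path).execute("SELECT * FROM visits")
    corrupt_detected = False
except sqlite3.DatabaseError:
    corrupt_detected = True
```

Note that appending random bytes (as with dd) does not always trip these cursory checks, since SQLite only reads the page range declared in the header; that is exactly why the integrity_check discussion below matters.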
I have no doubt that this could be more efficient. Even with a 10MB places file, the backup and restore operations complete in less than a second.
Random thought: does it handle "clear private data" as expected?
It doesn't but, AFAIK, neither does the old behavior which creates a places.sqlite.corrupt file. I'm not opposed to it and it seems like a good idea. Clearing both .corrupt and .safety should probably be a new bug and a new patch to a different area of code.
Thanks a bunch for posting this - couple of quick thoughts:
1) OpenDatabase does not do an integrity_check (http://www.sqlite.org/pragma.html), so we'd want to do that to ensure we are detecting a corrupted database - ideally when we have an unclean shutdown (e.g. when session restore fires)
2) Many places.sqlite files are >50 MB, and adding even a second to start time is a big deal. We should probably do this in the background, as we do for bookmarks.
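As a sketch of point 1, the pragma in question can be run at startup roughly like this (Python's sqlite3 module used for illustration; the filename is an assumption). Note that integrity_check walks every page and index, which is why its cost on a large places.sqlite dominates the discussion below:

```python
import sqlite3

# Run SQLite's full consistency check; it returns the single row "ok"
# on a healthy database, or a list of problems on a damaged one.
conn = sqlite3.connect("places.sqlite")
row = conn.execute("PRAGMA integrity_check").fetchone()
database_ok = (row[0] == "ok")
conn.close()
```

PRAGMA quick_check is a cheaper variant that skips index validation, which might be a reasonable compromise if the full check proves too slow.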
What ensures that the copy you make is correctly written to disk in the case of a crash, or that it was consistent when you got to it?
(In reply to comment #6)
I think that integrity_check could be extremely expensive; the existing practice is to never run it. I'm all for the idea if it can be done asynchronously, since we don't want browser startup to take far too long. However, we could be looking at locking the database against writes for the entire time integrity_check is running: 10-15 seconds. Since I'm not a Mozilla hacker, I don't know what the implications of that would be.
I think the short-term answer is implement this synchronously as the patch does and then go back and make the changes that Mike Shaver outlined in his blog for 3.1/3.5: more asynchronous use of the databases project-wide. That would include an integrity_check at startup.
(In reply to comment #7)
> What ensures that the copy you make is correctly written to disk in the case of
> a crash, or that it was consistent when you got to it?
Until integrity_check is implemented asynchronously, consistency is confirmed the same way corruption is currently detected: by the cursory checks performed when OpenDatabase is called. It's not perfect, but it *does* work in my own tests.
As for the copy, BackupDatabaseFile currently uses nsIFile->copyTo. I don't know whether the destination file handle is closed after this call is made. If it is, the data would be scheduled for write-out during the next commit interval once the handle is closed. On ext3, that's at most 5 seconds.
Again, we can increase integrity when we can do the check and copy asynchronously without sacrificing startup time.
This will also lose all bookmarks created during the session, as well as any adaptive learning done by awesomebar, I think.
Relying on the OS writeout doesn't work, because you can get _partial_ writeout, and the window can easily be more than 5 seconds. (Witness the fact that an fsync can take 30 seconds to complete if there is a lot of I/O pending!) In laptop_mode, the interval is more like 30 seconds. I think you need an fsync after the copy is complete, and probably need one on shutdown as well if there isn't already.
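A hedged sketch of what "correctly written to disk" would require on POSIX: fsync the copied file, and also its containing directory so the new directory entry itself is durable. The helper name is mine, not anything in the tree:

```python
import os
import shutil

def durable_copy(src, dst):
    """Copy src to dst and flush both the file and its directory,
    so a crash right after the copy cannot leave a torn or missing backup."""
    shutil.copyfile(src, dst)
    fd = os.open(dst, os.O_RDONLY)
    try:
        os.fsync(fd)      # flush the file's data and metadata
    finally:
        os.close(fd)
    dirfd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dirfd)   # flush the directory entry as well
    finally:
        os.close(dirfd)
```

This is exactly the blocking fsync the thread is trying to avoid at startup, which is why the conclusion below pushes the work into the background.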
I don't think we want to take this in 3.0.1 without making it happen in the background, but then we need to make sure that we're copying the database safely in the background and get a consistent view! sqlite likely has an API to help with that. We should make sure that we understand what happens if the OS crashes shortly after Firefox quits (possibly due to an X shutdown crash on user logout), for one thing. My current mental model says that we'll have a corrupt database and no indication that we need to check it, which we'll then copy over our safety database, and go merrily about our business. If the intent of this patch is that with it we can go to synchronous=OFF, we need to make sure that we understand the edge cases, since that's where data loss happens. :)
Comment on attachment 322478 [details] [diff] [review]
Implement automatic backup and restore of places database
After discussing this patch at length with Mike Shaver, I concede that all his concerns are completely valid and there's just no way that this patch is going to provide sufficient integrity without blocking on an fsync() right after the copy. Doing so would cause Firefox to potentially take ages to start up.
For those who want to use libeatmydata, this patch would be an improvement on the no-fsync() behavior, but there's not really any guarantee that the backup is committed soon enough to prevent you from losing everything.
The right way to do this is to use some backup API in sqlite which would provide the needed thread safety to do this in a background thread, and the sane POSIX fsync behavior to ensure that the background backup actually gets flushed to disk. At the moment, mozStorageService doesn't provide a wrapper for the needed behavior.
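For what it's worth, SQLite's Online Backup API (sqlite3_backup_init and friends) does roughly what this comment asks for: it copies a live database page by page under SQLite's own locking, giving a consistent snapshot without blocking writers for the whole copy. A sketch using the Python binding of that API (filenames and the stand-in table are my assumptions):

```python
import sqlite3

# Stand-in source database with one row of data.
src = sqlite3.connect("places_demo.sqlite")
src.execute("CREATE TABLE IF NOT EXISTS visits(url TEXT)")
src.execute("INSERT INTO visits VALUES ('http://example.com/')")
src.commit()

# Connection.backup() wraps sqlite3_backup_*: it produces a consistent
# copy of the live database, even with concurrent readers on src.
dst = sqlite3.connect("places_demo.safety")
src.backup(dst)

copied = dst.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
dst.close()
src.close()
```

A mozStorage wrapper around this, driven from a background thread and followed by an fsync, would seem to be the shape of the "right way" described above.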
Live automatic database backup is, in essence, master-slave replication. Possible implementation plan A:
1. We keep a backup database in addition to the main in the profile.
2. The main is written in OFF mode.
3. Periodically, and only if the browser (or maybe even the whole system) is idle, the main database is queried for changes since last replication. The resulting changeset is written to the backup in a single transaction. The backup is written in FULL mode.
4. If the main database is ever determined to be corrupt, it is replaced with the backup using nsIFile.copyTo().
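Plan A's replication step (step 3) can be sketched against a hypothetical single-table schema; the table, columns, and filenames below are mine, standing in for the real places tables:

```python
import sqlite3

# Main database: fast but crash-unsafe (synchronous=OFF).
main = sqlite3.connect("plan_a_main.sqlite")
main.execute("PRAGMA synchronous=OFF")
main.execute("CREATE TABLE IF NOT EXISTS events(id INTEGER PRIMARY KEY, data)")
main.executemany("INSERT OR IGNORE INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
main.commit()

# Backup database: durable (synchronous=FULL).
backup = sqlite3.connect("plan_a_backup.sqlite")
backup.execute("PRAGMA synchronous=FULL")
backup.execute("CREATE TABLE IF NOT EXISTS events(id INTEGER PRIMARY KEY, data)")

# Pull only the rows added since the last replication pass, and write
# them to the backup in a single transaction (step 3 of Plan A).
last = backup.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()[0]
new_rows = main.execute("SELECT id, data FROM events WHERE id > ?",
                        (last,)).fetchall()
with backup:
    backup.executemany("INSERT INTO events VALUES (?, ?)", new_rows)

replicated = backup.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The "changes since last replication" query is trivial here only because the hypothetical table has a monotonically increasing id; the real schema would need explicit change tracking for updates and deletes.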
We may also choose to use an insert buffer. Possible implementation plan B:
1. We keep a temporary buffer database somewhere.
2. The temporary database is written in OFF mode.
3. Periodically, and only if the browser (or maybe even the whole system) is idle, all data is read from the buffer. The resulting changeset is written to the main database in a single transaction. The main database is written in FULL mode. The buffer is cleared on a successful main write.
Plan B takes considerably less disk space, and it can use a database that is guaranteed to reside on a local file system, not NFS (in /tmp or equivalent). But it is also considerably more work than Plan A.
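Plan B's flush step, sketched with the same kind of hypothetical schema and filenames (the real plan would put the buffer in /tmp or equivalent):

```python
import sqlite3

# Buffer database: fast, crash-unsafe, holds writes since the last flush.
buf = sqlite3.connect("plan_b_buffer.sqlite")
buf.execute("PRAGMA synchronous=OFF")
buf.execute("CREATE TABLE IF NOT EXISTS pending(data)")
buf.executemany("INSERT INTO pending VALUES (?)", [("x",), ("y",)])
buf.commit()

# Main database: durable (synchronous=FULL).
main = sqlite3.connect("plan_b_main.sqlite")
main.execute("PRAGMA synchronous=FULL")
main.execute("CREATE TABLE IF NOT EXISTS visits(data)")

# Drain the buffer into the main database in a single transaction,
# then clear the buffer only after that transaction has committed.
rows = buf.execute("SELECT data FROM pending").fetchall()
with main:
    main.executemany("INSERT INTO visits VALUES (?)", rows)
with buf:
    buf.execute("DELETE FROM pending")

flushed = main.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Ordering is the point here: if the process dies between the main commit and the buffer delete, the worst case is re-applying the same rows, not losing them.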
Which path should we choose? Should this be posted to a newsgroup?
Splitting the database use is not the same as an automatic backup, and I believe is underway in another bug; I don't know the bug off-hand.
A newsgroup would be a decent place to propose this, or d-a-f, or even another bug. This bug isn't the right place. I meant to mark this WONTFIX based on the tragic outcome of my conversation with Jason; doing so now.
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".
In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected, then click "New".
Conditions: Body contains places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.