Last Comment Bug 435712 - Implement automatic places backup
: Implement automatic places backup
Status: RESOLVED WONTFIX
: perf
Product: Firefox
Classification: Client Software
Component: Bookmarks & History (show other bugs)
: Trunk
: x86 Linux
: -- enhancement with 3 votes (vote)
: ---
Assigned To: Jason Clinton
:
Mentors:
Depends on:
Blocks: 421482
  Show dependency treegraph
 
Reported: 2008-05-25 22:37 PDT by Jason Clinton
Modified: 2009-11-26 06:27 PST (History)
16 users (show)
shaver: blocking1.9.0.1-
shaver: wanted1.9.0.x-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Implement automatic backup and restore of places database (2.97 KB, patch)
2008-05-25 22:43 PDT, Jason Clinton
no flags Details | Diff | Splinter Review

Description Jason Clinton 2008-05-25 22:37:16 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080404 Iceweasel/2.0.0.14 (Debian-2.0.0.14-2)
Build Identifier: Trunk

As discussed on Bug #421482, having a backup of places can allow us to stop using over-aggressive fsync() on Linux with--at worst--loss of the history accumulated for the browser session that crashed.

I have implemented the suggestion and will attach this patch in the first comment.


Reproducible: Always
Comment 1 Jason Clinton 2008-05-25 22:38:49 PDT
Fixing blocking.
Comment 2 Jason Clinton 2008-05-25 22:43:12 PDT
Created attachment 322478 [details] [diff] [review]
Implement automatic backup and restore of places database

This patch implements automatic backup of the places database. If the database is discovered to be corrupted, the backup is tested to see if it is also corrupted. If it is not, it replaces the corrupted database. If both are corrupted, the old behavior kicks in: new database, rerun history import.
Comment 3 Jason Clinton 2008-05-25 22:47:50 PDT
I have tested this by manually corrupting my places.sqlite by appending /dev/urandom to the file using DD. The result is a corrupted database it is restored correctly.

I have no doubt that this could be more efficient. Even with a 10MB places file, the backup and restore operations complete in less than a second.
Comment 4 Radek 'sysKin' Czyz 2008-05-25 23:21:36 PDT
Random thought: does it handle "clear private data" as epected?
Comment 5 Jason Clinton 2008-05-25 23:26:09 PDT
It doesn't but, AFAIK, neither does the old behavior which creates a places.sqlite.corrupt file. I'm not opposed to it and it seems like a good idea. Clearing both .corrupt and .safety should probably be a new bug and a new patch to a different area of code.
Comment 6 Mike Schroepfer 2008-05-25 23:58:14 PDT
Thanks a bunch for posting this - couple of quick thoughts:

1) OpenDatabase does not do an integrity_check (http://www.sqlite.org/pragma.html) so we'd want to do that to ensure we are detecting a corrupted database - ideally when we have a unclean shutdown (e.g. session restore fires)

2) Many places.sqlite files are >50mb and even a second to the start time is a big deal.  We should probably do this in the background as we do for bookmarks.
Comment 7 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-05-26 04:47:02 PDT
What ensures that the copy you make is correctly written to disk in the case of a crash, or that it was consistent when you got to it?
Comment 8 Jason Clinton 2008-05-26 09:59:37 PDT
(In reply to comment #6)

I think that integrity_check could be extremely expensive; the existing practice is to never run integrity_check. I'm all for that idea if it can be done asynchronously since we don't want to cause it to take way too long to start the browser. However, we could be looking at locking the database for writes for the entire time that integrity_check is running: 10-15 seconds. Since I'm not a Mozilla hacker, I don't know what the implications of that would be.

I think the short-term answer is implement this synchronously as the patch does and then go back and make the changes that Mike Shaver outlined in his blog for 3.1/3.5: more asynchronous use of the databases project-wide. That would include an integrity_check at startup.
Comment 9 Jason Clinton 2008-05-26 10:14:57 PDT
(In reply to comment #7)
> What ensures that the copy you make is correctly written to disk in the case of
> a crash, or that it was consistent when you got to it?

Until integrity_check is implemented asynchronously, the consistency is confirmed the same way that corruption is currently detected: the cursory checks performed when OpenDatabase is called. It's not perfect but it *does* work in my own tests.

As for the copy, the BackupDatabaseFile currently uses nsIFile->copyTo. I don't know if the destination file handle is closed after this call is made. If so, when the dest. file handle is closed, it would be scheduled for page-out during the next commit interval. On ext3, that's at most 5 seconds.

Again, we can increase integrity when we can do the check and copy asynchronously without sacrificing startup time.
Comment 10 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-05-27 18:09:10 PDT
This will also lose all bookmarks created during the session, as well as any adaptive learning done by awesomebar, I think.

Relying on the OS writeout doesn't work, because you can get _partial_ writeout, and the window can easily be more than 5 seconds.  (Witness the fact that an fsync can take 30 seconds to complete if there is a lot of I/O pending!)  In laptop_mode, the interval is more like 30 seconds.  I think you need an fsync after the copy is complete, and probably need one on shutdown as well if there isn't already.

I don't think we want to take this in 3.0.1 without making it happen in the background, but then we need to make sure that we're copying the database safely in the background and get a consistent view!  sqlite likely has API to help with that.  We should make sure that we understand what happens if the O/S crashes shortly after Firefox quits (possibly due to an X shutdown crash on user logout), for one thing.  My current mental model of this says that we'll have a corrupt database and no indication that we need to check it, which we'll then copy over our safety database, and go merrily about our business.  If the intent of this patch is that with it we can go to synchronous=OFF, we need to make sure that we understand the edge cases, since that's where data loss happens. :)
Comment 11 Jason Clinton 2008-05-27 20:40:38 PDT
Comment on attachment 322478 [details] [diff] [review]
Implement automatic backup and restore of places database

After discussing this patch at length with Mike Shaver, I concede that all his concerns are completely valid and there's just no way that this patch is going provide sufficient integrity without blocking on an fsync() right after the copy. Doing so would cause Firefox to potentially take ages to start up.

For those who want to use libeatmydata, this patch would be an improvement on the no fsync() behavior but there's not really any guarantee that the backup is committed soon enough to prevent you from losing everything.

The right way to do this is to use some backup API in sqlite which would provide the needed thread safety to do this in a background the and the sane POSIX fsync behavior to ensure that the background backup actually gets flushed to disk. At the moment, mozStorageService doesn't provide a wrapper for the needed behavior.
Comment 12 Sergey Yanovich 2008-06-16 02:10:08 PDT
Live automatic database backup is aka master-slave replication. Possible implementation plan A:
1. We keep a backup database in addition to the main in the profile.
2. The main is written in OFF mode.
3. Periodically, and only if the browser (or maybe even the whole system) is idle, the main database is queried for changes since last replication. The resulting changeset is written to the backup in a single transaction. The backup is written in FULL mode.
4. If the main database is ever determined to be corrupt, it is replaced with the backup using nsIFile.copyTo().

We may also choose to use an insert buffer. Possible implementation plan B:
1. We keep a temporary buffer database somewhere.
2. The temporary database is written in OFF mode.
3. Periodically, and only if the browser (or maybe even the whole system) is idle, all data from the buffer is read from the buffer. The resulting changeset is written to the main database in a single transaction. The main is written in FULL mode. The buffer is cleared on successful main write.

Plan B takes considerably less disk space, and it can use a database that is guaranteed to reside on local file system, not NFS (in /tmp or equivalent). But it is also considerably more work than Plan A.

Which path should we choose? Should this be posted to a newsgroup?
Comment 13 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-16 09:24:19 PDT
Splitting the database use is not the same as an automatic backup, and I believe is underway in another bug; I don't know the bug off-hand.

A newsgroup would be a decent place to propose this, or d-a-f, or even another bug.  This bug isn't the right place.  I meant to mark this WONTFIX based on the tragic outcome of my conversation with Jason; doing so now.
Comment 14 Gervase Markham [:gerv] 2009-11-26 06:27:46 PST
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv

Note You need to log in before you can comment on or make changes to this bug.