Closed Bug 422127 Opened 16 years ago Closed 16 years ago

Batch places transactions for increased performance

Categories

(Firefox :: Bookmarks & History, defect, P1)


RESOLVED WONTFIX

People

(Reporter: brettw, Assigned: ondrej)

Details

This came up in the arguments in bug 408914. Places should keep a transaction open and commit it on a timer every 5 or 10 seconds or so, rather than doing a small transaction for every change.

I actually thought it either already did this or there was already a bug filed, but I don't see either of those. I could be wrong.

This should make a big difference. Currently (I think) it will commit for every change. This can happen multiple times per page load: to update the URL information and add the visit, and then to set the title, which happens asynchronously. The same goes for updating the typed count when you type in the URL bar.

These extra commits each cause fsync calls which are blocking and potentially very expensive now that async I/O was backed out. If async I/O is put back, you will still want this change, because the accumulated fsync calls are why the I/O thread would get behind and delay shutdown (bug 338884).

I would highly recommend this be fixed for Firefox 3. It should have a noticeable performance impact. Even if PLT doesn't measure it, it will make the product less janky in high I/O circumstances.
Flags: blocking-firefox3?
Priority: -- → P1
Target Milestone: --- → Firefox 3 beta5
Assignee: nobody → ondrej
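The pattern suggested in comment 0 can be sketched in a few lines. This is an illustrative sketch using Python's sqlite3 module, not the actual Places code; the class and method names (`BatchedHistory`, `add_visit`, `FLUSH_INTERVAL`) are invented. One transaction stays open, writes accumulate in it, and a timer commits the whole batch, so only one fsync is paid every few seconds instead of one per change:

```python
import sqlite3
import threading

class BatchedHistory:
    """Sketch of batching writes in one long-lived transaction (hypothetical)."""

    FLUSH_INTERVAL = 5.0  # seconds between commits, per the suggestion above

    def __init__(self, path=":memory:"):
        # isolation_level=None puts us in charge of BEGIN/COMMIT explicitly.
        self.conn = sqlite3.connect(path, isolation_level=None,
                                    check_same_thread=False)
        self.lock = threading.Lock()
        self.conn.execute("CREATE TABLE IF NOT EXISTS visits (url TEXT, ts REAL)")
        self.conn.execute("BEGIN")  # open the batch transaction
        self._schedule_flush()

    def add_visit(self, url, ts):
        # Cheap: just an INSERT inside the open transaction, no fsync here.
        with self.lock:
            self.conn.execute("INSERT INTO visits VALUES (?, ?)", (url, ts))

    def flush(self):
        # One fsync for the whole batch, then start the next batch.
        with self.lock:
            self.conn.execute("COMMIT")
            self.conn.execute("BEGIN")
        self._schedule_flush()

    def _schedule_flush(self):
        self.timer = threading.Timer(self.FLUSH_INTERVAL, self.flush)
        self.timer.daemon = True
        self.timer.start()
```

As comment 1 notes, a real implementation would also want to force a commit (call `flush()` immediately) around operations like bookmark modification and history deletion.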
Note that you may still want to force commits for several classes of operations such as bookmark modification and history deletion.
And how do we keep the database consistent? Will we detect crashes and write code that performs some cleanup operation the next time Places starts?
Crashes will not make the database inconsistent with this approach. You still have journaled atomic transactions, they just encompass more operations.

What you lose is rollback for individual transactions, since SQLite doesn't support nested transactions. First, this doesn't happen very often: the only time we roll back is on a write error, which almost never happens in practice.

Second, even if it does, a failure to roll back generally just means part of a two-phase modification went through. The reading code is supposed to handle all kinds of weird situations, and if it breaks, it should be fixed even if we keep rollback (since we're never guaranteed that the user hasn't done stupid things to their database or that some bug hasn't made things inconsistent).
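A minimal illustration of the point about journaled atomic transactions, using a hypothetical two-step history modification (the schema is invented): if the batch is rolled back, or a crash discards the journal before COMMIT, both steps disappear together, so the database stays consistent, just slightly stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")

conn.execute("BEGIN")  # the long-lived batch transaction
# A typical two-phase history modification: add the visit, then set the title.
conn.execute("INSERT INTO pages VALUES ('http://example.com/', NULL)")
conn.execute("UPDATE pages SET title = 'Example' WHERE url = 'http://example.com/'")
conn.execute("ROLLBACK")  # simulate a write error (or crash) mid-batch

# Both steps are gone together; nothing is half-applied.
n = conn.execute("SELECT count(*) FROM pages").fetchone()[0]
```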
This would be really good to get, but doesn't seem like this happens often enough to block at this stage.  Really want to get this for next release.
Flags: blocking-firefox3? → blocking-firefox3-
(In reply to comment #4)
> This would be really good to get, but doesn't seem like this happens often
> enough to block at this stage.  Really want to get this for next release.

Doesn't seem like what happens often enough? Committing happens all the friggin time if I read the code correctly.
(In reply to comment #5)
> (In reply to comment #4)
> > This would be really good to get, but doesn't seem like this happens often
> > enough to block at this stage.  Really want to get this for next release.
> 
> Doesn't seem like what happens often enough? Committing happens all the friggin
> time if I read the code correctly.
> 

The complaints of performance problems due to the commits are not at all widespread. Certainly not enough to justify this change at this time, as it would entail rewriting a lot of the core Places code. We're freezing for Beta 5 in a couple of days, and after that it's into the RCs. We definitely need to look at this for the 3.1 release.
(In reply to comment #3)
> Crashes will not make the database inconsistent with this approach. You still
> have journaled atomic transactions, they just encompass more operations.

But one business transaction consisting of multiple inserts could get split into multiple independent, although atomic, transactions. If one of them fails, your business data could be inconsistent. So you would have to create a mechanism that disables the timer, letting you commit yourself later.

Moreover, you will have to switch the isolation level to READ UNCOMMITTED (using the read_uncommitted pragma), otherwise the open transaction holding the write lock would block read transactions. As we process history and bookmarks rather than money, we can probably do this, but it could be a little risky.
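The locking concern described here can be reproduced with SQLite's shared-cache mode, which is where `PRAGMA read_uncommitted` applies. A sketch with two connections (the table name is an invented stand-in, and as the next comment points out, Places itself only uses a single connection):

```python
import sqlite3

# Two connections sharing one in-memory database via shared-cache mode.
uri = "file:places_demo?mode=memory&cache=shared"
writer = sqlite3.connect(uri, uri=True, isolation_level=None)
reader = sqlite3.connect(uri, uri=True, isolation_level=None)

writer.execute("CREATE TABLE demo_places (url TEXT)")
writer.execute("BEGIN")  # the long-lived batch transaction
writer.execute("INSERT INTO demo_places VALUES ('http://example.com/')")

# Without read_uncommitted, the writer's table lock blocks the reader.
blocked = False
try:
    reader.execute("SELECT count(*) FROM demo_places").fetchone()
except sqlite3.OperationalError:
    blocked = True

# With the pragma, the reader sees the still-uncommitted insert.
reader.execute("PRAGMA read_uncommitted = 1")
n = reader.execute("SELECT count(*) FROM demo_places").fetchone()[0]

writer.execute("COMMIT")
```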
Your business example assumes that an external viewer sees the individual transactions as "committed" and does things based on that. This doesn't generally happen for history. History operations are usually "add page 1", "set page 1 title". There is not an external entity keeping things in sync with this data.

(Sure, you could invent an extension that observes history and does something based on it, assuming each operation happened, and commits something of its own to disk on a more frequent schedule. The history database and the extension's database could be out of sync after a crash. But this is a very contrived example, and the extension would want some error handling for this type of thing anyway. If things want to associate data with pages, they can always use the annotation service, which would be committed at the same time.)

READ UNCOMMITTED is not necessary. Reading would take place in the same meta-transaction (this would act like READ UNCOMMITTED, but you don't actually need the flag). Remember, we only have one connection to the database. READ UNCOMMITTED only affects you if you have more.
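The single-connection point is easy to demonstrate: a SELECT issued on the same connection, inside the still-open transaction, already sees the uncommitted writes, with no pragma involved (hypothetical table name):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE visits (url TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO visits VALUES ('http://example.com/')")

# Same connection, same (still open) transaction: the write is visible
# before COMMIT, which is all the batched-transaction scheme needs.
n = conn.execute("SELECT count(*) FROM visits").fetchone()[0]

conn.execute("COMMIT")
```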
The underlying problem seems to be that COMMIT to the main database file is slow due to the necessity of ordering writes using fsync().  (We could make use of an I/O barrier operation to order the writes, if such an operation existed, but alas it does not...)

One way around the frequent COMMIT problem is to keep two sets of tables: the main tables in the persistent database, and a set of shadow TEMP tables that hold recent changes.  As additions or changes are made to the data, write and COMMIT those changes to the TEMP tables.  A COMMIT on a TEMP table does not do fsync() (because we don't worry about preserving TEMP tables across a power failure).  Then flush the changes in the TEMP tables over to the main persistent database every minute or so as a single transaction.

This will, of course, complicate queries against the database since now such queries must consult two separate tables.  And depending on what the queries look like, this approach might not be practical in all cases.  But where possible, it will make a big performance improvement, I suspect.
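The shadow-table scheme might be sketched as follows. This is an in-memory toy (so the fsync() saving is not observable here, and the table and function names are invented); the point is the structure: cheap writes to the TEMP table, one real transaction per flush, and reads that must consult both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE visits (url TEXT)")            # persistent table
conn.execute("CREATE TEMP TABLE visits_temp (url TEXT)")  # shadow table

def add_visit(url):
    # Cheap: commits that only touch TEMP tables skip fsync().
    conn.execute("INSERT INTO visits_temp VALUES (?)", (url,))

def flush():
    # Every minute or so: move the whole batch over in a single transaction.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO visits SELECT url FROM visits_temp")
    conn.execute("DELETE FROM visits_temp")
    conn.execute("COMMIT")

def all_visits():
    # This is the complication: every query must consult both tables.
    return conn.execute(
        "SELECT url FROM visits UNION ALL SELECT url FROM visits_temp"
    ).fetchall()
```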

(In reply to comment #9)
> This will, of course, complicate queries against the database since now such
> queries must consult two separate tables.  And depending on what the queries
> look like, this approach might not be practical in all cases.  But where
> possible, it will make a big performance improvement, I suspect.

In Places we have about 24 SELECTs that join with the moz_places table, and 11 SELECTs that join with the moz_history* tables. We can certainly have shadow tables, but in that case we should probably create an extra layer on top of mozStorage that encapsulates data operations if we want to keep the data consistent. And I'm afraid some operations would become very slow: we join a lot, and the queries are carefully tuned.

I would rather do what Brett suggests: keep the transaction open all the time and commit on a timer.
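One hypothetical shape for such an encapsulation layer is a view that unions the persistent table with its shadow, so the existing join-heavy SELECTs keep their form and only swap a table name; the cost is that the optimizer now sees a compound subquery, which is where the feared slowdown would come from. All table names here are invented stand-ins, not the real Places schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE moz_places_main (id INTEGER, url TEXT)")
conn.execute("CREATE TEMP TABLE moz_places_shadow (id INTEGER, url TEXT)")
conn.execute("CREATE TABLE moz_historyvisits_demo (place_id INTEGER)")

# The compatibility layer: one view covering both halves of the data.
conn.execute("""CREATE TEMP VIEW moz_places_all AS
                SELECT id, url FROM moz_places_main
                UNION ALL
                SELECT id, url FROM moz_places_shadow""")

conn.execute("INSERT INTO moz_places_main VALUES (1, 'http://a/')")
conn.execute("INSERT INTO moz_places_shadow VALUES (2, 'http://b/')")
conn.execute("INSERT INTO moz_historyvisits_demo VALUES (2)")

# An existing join only needs its table name swapped for the view,
# and it still sees rows that are only in the shadow table.
rows = conn.execute("""SELECT p.url FROM moz_places_all p
                       JOIN moz_historyvisits_demo v ON v.place_id = p.id"""
                    ).fetchall()
```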

Comment #9 is being implemented in bug 449086 to resolve the fsync issue. More discussion on partitioning vs long-open-transaction is here: http://wiki.mozilla.org/Places:FsyncApproach.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Target Milestone: Firefox 3 beta5 → ---
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks