Closed Bug 395020 (vacuum-bikeshed) Opened 17 years ago Closed 14 years ago

VACUUM SQLite databases after major upgrade

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Future
Tracking Status
blocking2.0 --- -

People

(Reporter: moco, Unassigned)

References

()

Details

VACUUM after idle (even if we're doing incremental vacuuming) to defragment the database

from http://www.sqlite.org/lang_vacuum.html:

As of SQLite version 3.1, an alternative to using the VACUUM command is
auto-vacuum mode, enabled using the auto_vacuum pragma. When auto-vacuum is
enabled for a database, large deletes cause the size of the database file to
shrink. However, auto-vacuum also causes excess fragmentation of the database
file. And auto-vacuum does not compact partially filled pages of the database
as VACUUM does.

for profiles created before the fix for bug #385834 (pre-m8), we already do a full vacuum on idle (see bug #390244).

but for new profiles (or new migrations from fx 2) created after the fix for bug #385834, we do an incremental vacuum on idle.
I think the fix for this would be to nsNavHistory::PerformVacuumIfIdle()

we currently do:

     if (vacuum == 2) {
       rv = mDBConn->ExecuteSimpleSQL(
         NS_LITERAL_CSTRING("PRAGMA incremental_vacuum;"));
       NS_ENSURE_SUCCESS(rv, rv);
     }
     else {
       // if our database was created before incremental vacuuming
       // do a full vacuum on idle
       rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("VACUUM;"));
       NS_ENSURE_SUCCESS(rv, rv);
     }

I think we'd just want to fix that code so that upon 3x (or something?) of or idle, we'd do a full vacuum, no matter current "vacuum" was.
Flags: blocking-firefox3?
just a heads up:  as bug #390244 points out, with a large places.sqlite file, a full vacuum may take a while.

a 122 MB places.sqlite from robcee (From the unit test tinderboxen) is taking me minutes to VACUUM (from within the SQLite Database Browser).
another heads up, steve, jim and jo reported problems when we'd vacuum.  see bug #390244 for more details.
Flags: blocking-firefox3? → blocking-firefox3+
Target Milestone: --- → Firefox 3 M10
This is going to be a tricky one, given that we know that VACUUMing a large database can take a long time.
We can always run it on a different thread, and use progress listeners to abort it if we want to shut down (yes, this will require some threaded code, which sucks, but what can you do :/)
Summary: VACUUM after idle (even if we're doing incremental vacuuming) to defragment the database → VACUUM after idle [or shutdown or upgrade?] (even if we're doing incremental vacuuming) to defragment the database
since sqlite uses freelists to speed up inserts, vacuum is not so useful if the db does not grow up insanely, and with the new expires (embedded after 10 days), it should not do it. 
So probably vacuum should not be done more than once a month, or it will cause a slowdown on inserts.

doing on idle will freeze the browser since it has to create a new db, attach it to the db connection and move everything, so maybe once a month, after an idle of 10 minutes a countdown (10seconds) could go out "Firefox is about to compact the history db. This process can take several minutes, click cancel to stop.", after countdown vacuum starts and a progress bar show where it is, so the user is warned before starting up.
See also bug 394379, "vacuum all sqlite files after an upgrade ?".  Should this bug be marked as a dup of that bug?
Priority: -- → P1
I don't think this should block beta 2 or be a p1, based on comments from Marco and Dr. Hipp (see bug #390244 comment #32)

I'm going to proactively move it to be a p3 / m11 (like bug #394379)

dietrich, can you elaborate on why you may have made it a P1?

Priority: P1 → P3
Target Milestone: Firefox 3 M10 → Firefox 3 M11
clearing blocking status, I don't think this is a blocker at all.

see bug #403702 (the initial comment) for some reasoning.
Flags: blocking-firefox3+
this should not be on the m11 radar, as we have no plans to implement it.

There might be a reason to do a vacuum on software update, but would probably have to do the vacuum the first time we run firefox after update, as otherwise how would we vacuum places.sqlite for other profiles / users?

See also #402076

I'm not going to wontfix this bug, so that we can keep it open for discussion, but moving to future as I don't think we should do anything for firefox 3.
Priority: P3 → --
Target Milestone: Firefox 3 M11 → Future
Failure to do this causes really poor user experience for people who browse a lot.  The Awesomebar has been regularly hanging Firefox for me for 3 to 5 seconds after I start typing in it, and continues to hang periodically as I continue to type, with a beach ball cursor for the last few months, which has been really annoying.  The length of time it hangs has been getting progressively worse over the last few months.  I saw someone post a facebook status today suggesting to open places.sqlite in sqlite and vacuum it, and after running that, it's nice and speedy again and finds things as fast as I can type them without hanging the browser at all.  For that kind of user interface performance impact, Firefox really should be doing that automatically once in a while.

Macro's countdown timer during an idle period described in comment 7 sounds like a good way to do it to me.
Flags: wanted-firefox3.6?
See also bug 403702, which suggests to add a manual command to trigger this to the menu.  You could couple that with a notification bar to show up a month or two after the last time it was run asking if the user wants to compact the database in order to improve performance, and warning that it will take a few minutes to run.
I think it'd be appropriate to do this during a major update cycle, since after 6 months to a year of accumulating data you're bound to have some fragmentation in your db.  Could we sneak this into 3.5.1? :)
Definitely the databases need to be re-org/vacuumed periodically.

Failing to do so slows Firefox performance compared to its competitor browsers -- it reduces the adoption of Firefox and helps our competitors.

Vacuuming should be done automatically. so that all Firefox users get the benefits.

Why not do re-org/vacuum the databases each month, on the first start-up or shut-down of the month for each profile?  An informational "database compacting" message box should be displayed.

Or the re-org/vacuum could be scheduled to run on the first operating system start-up of the month.

There could be an option in about:config for "experts" who want to turn the re-orgs off.

Regular users are more concerned with the average delays in going from page-to-page, and the average start-up time, than they are with the peak start-up time.  

Provided there is not some mysterious 30+ second delay occurring with no messages, a delay once a month is going to be preferable to a delay on every start-up and every enter-key depression.

An ideal database re-org utility allows the DBA to specify how much space and how much contiguous space to reserve for database growth, and to have that allocated during re-org.  This prevents needless fragmentation.

Maybe the ability to reserve contiguous space for database growth is something Firefox developers should ask makers of SQLITE to provide.
Component: Places → General
QA Contact: places → general
Summary: VACUUM after idle [or shutdown or upgrade?] (even if we're doing incremental vacuuming) to defragment the database → VACUUM SQLite databases after major upgrade
Just to be clear & for the record :
doing that maintenance operation at the Firefox level implies that all other gecko based application (Liferea...) have to implement their own workaround
(In reply to comment #15)
> I think it'd be appropriate to do this during a major update cycle, since after
> 6 months to a year of accumulating data you're bound to have some fragmentation
> in your db.  Could we sneak this into 3.5.1? :)

Not everyone is upgrading every 6 weeks :

- some users never upgrade
- some always uninstall, then install the new version
- some always download a new version, then install on top of the older version

So I think we still need a vacuum that is triggered by a timer.
(In reply to comment #19)
> So I think we still need a vacuum that is triggered by a timer.
And risk locking up their browser completely?  No way (there isn't a way around this, and canceling the operation can take a long time too).
Not in the middle of a browsing session ofcourse. No, a proposal to run the vacuum when you start up on the first day of the month or similar. Or when you shut down. And you should be able to skip it ofcourse.
actually, i can recall vacuuming places.sqlite in Firefox was taking minutes, while in other apps was taking seconds... i'm actually thinking could be that temporary moving the journal to memory while we vacuum could really speed it up for large dbs. We did use that trick in places to generate some big index. Experimenting with some temporary pragma could make vacuuming really fast, especially on modern hardware, we could set an upper limit to db size based on available memory on the device, and use memory where possible).

also we need some way to tell if a db needs to be vacuumed, skipping some useless work, unfortunatly sqlite does not provide this kind of info, but we could maybe get a guess using number of pages and number of empty (reusable) pages. There is an sqlite analyzer around that is able to calculate a % of defragmentation in the db, not sure how it does, though.

Comment 18 is correct, ideally vacuuming should be something done by storage on a regular basis, maybe on a separate thread, that would serve any storage user... but on the other side would be even harder to do, it would require to hot-backup the db, start caching all statements/params while they are executed on the original, vacuum the backed-up db, execute all cached statements on it, and then replace the original db with the vacuumed one.
Sounds like a crazy task that would be much better served by implementing real incremental vacuuming in sqlite itself, would really be just a complex storage workaround for something sqlite is missing but plan to implement in future.
> > So I think we still need a vacuum that is triggered by a timer.
> And risk locking up their browser completely?  No way (there isn't a way around
> this, and canceling the operation can take a long time too).

I find your implication that it is okay to lock up Firefox during an update disturbing.  Security and UE want updates to be instant (bug 307181), not arbitrarily slower.  This bug, as summarized, should be WONTFIXed.
Well, shutdown is already awfully slow.  Obviously it should be faster, but this sort of thing could be put on shutdown (much like is done with, e.g. Outlook Express and similar.)

Another way would be to do it when clearing private data/"recent history".  Sure, it may not be frequent, but at least it would happen sometimes.  That said, maybe it's already done then or maybe it's also desirable for that to be as fast as possible.

It would be sorta silly to have a Tools -> Options -> Advanced -> Optimize Data, too...

-[Unknown]
(In reply to comment #23)
> I find your implication that it is okay to lock up Firefox during an update
> disturbing.  Security and UE want updates to be instant (bug 307181), not
> arbitrarily slower.

If i read this bug correctly it is about MAJOR updates, that usually are not pushed for security reasons, when a major version is released old versions are still receiving instant security updates.
there is still the possibility to do this on idle.
Monthly (or more), on idle (2 or 3 minutes so we won't hit standby) we could get the list of sqlite files in the profile dir, perform an analysis of the db to get if it needs to be vacuumed, saying the user "Firefox is going to compact profile's databases." with a 10s countdown and a cancel button, and then perform vacuum. This won't be transparent to the user as doing the same on a major update.
The basic point is that there is no need to do this each month, every 6 months can be more then enough, and maybe once we are able to detect status of a db they won't even need to be vacuumed at that point.
The process should in no way block the interaction of the user with the browser, so probably even doing this in background is bad if it's going to slowdown navigation. I can't see other solutions out of on-update or on-idle
To mark this bug as wontbefixed would be to waste the coding optimization efforts of Firefox programmers.  It would be doing a big favor to Microsoft and Opera.


What ordinary users care most about the the delay every time they press the enter key, and every time they start Firefox, therefore a suitable time has to be found to clean up the databases every few months or so.  

Doing it every time private data is cleared may be far too often, because some people clear private data on a daily basis.

Until sqlite (or a replacement database) can provide information on when the DBs need re-orging, perhaps a timer to ask the user (via pop-up) every 2 months whether they want to optimize the databases would be the answer.
(In reply to comment #27)
> people clear private data on a daily basis.

or some user do never clear private data, like me.
Alias: vacuum-bikeshed
Every 2 months might not be frequent enough.  It was July 1st when I posted in here (comment 13) about my discovery of this workaround, and I'm already noticing my awesomebar getting slow again and it's only been a few weeks.
(In reply to comment #29)
> Every 2 months might not be frequent enough.  It was July 1st when I posted in
> here (comment 13) about my discovery of this workaround, and I'm already
> noticing my awesomebar getting slow again and it's only been a few weeks.

do you have a lot of live bookmarks? or do you clear history often?
The defragmentation can be so bad only if there are a lot of daily changes to the database, in such a case also fixing the underlying problem could help. A db like places should not change at high rates. If it does would be interesting to find and mitigate causes.
I do not have any live bookmarks at all, actually.  I also hardly ever clear my history.  I do a lot of web surfing though, my history has a lot in it.  I'm sure there's a sizable amount of stuff getting dropped every day due to the 90 day expiry.
Dave, could you please execute these pragma on your db and give me the data they return?

PRAGMA freelist_count
PRAGMA page_count
also please give me the filesize
i filed Bug 512854 to vacuum on idle, this could be wontfixed if that will work fine.
(In reply to comment #32)
> Dave, could you please execute these pragma on your db and give me the data
> they return?
> 
> PRAGMA freelist_count
> PRAGMA page_count
> also please give me the filesize

where do I go to do the pragma thing?
those are sql commands like VACUUM, so you can use the same way you did to run VACUUM on your db. either open places.sqlite with sqlite manager in a different profile, or use an external sqlite browser.
FWIW, there are a few extensions now that can help w/ this that were made after oremj made his blog post:
https://addons.mozilla.org/en-US/firefox/search?q=vacuum+places

Blog:
http://blog.oremj.com/2009/08/20/speedup-firefox-with-vacuum/

Go add-ons!
This bug has been kicking around for a long period of time. If you don't have any great ideas, then I recommend doing what the Linux file system does: on every 25th boot, run this algorithm. I would implement this immediately, and then you can consider fancier stuff (in addition) later.

But don't let more years pass! I did a vacuum of my Firefox 3.0 profile and both startup and typing things into the address bar got dramatically faster. I went from being **** off with Firefox to being in love again.
Keith, see comment 33 : bug 512854 is already checked in, and is now being tested. A vacuum will now happen every month (or forced every 2 months), even if you never upgrade.
Depends on: 541373
imho, this bug has to be wontfixed in favor of a monthly vacuum in bug 541373.
There is no advantage in vacuuming on upgrade against once a month (the only advantage would be not having to wait to see a faster experience, that means the advantage is limited to less than 30 days, 15 days average), and there is a clear disadvantage in making the upgrade painfully slower than it needs to be.
I agree that we should minimize the pain of upgrades as much as possible.

Maybe a compromise could be to have the timer look for whether an upgrade has occured recently, and do a vacuum earlier in that case?
(In reply to comment #40)
> Maybe a compromise could be to have the timer look for whether an upgrade has
> occured recently, and do a vacuum earlier in that case?

that would be possible and probably an interesting compromise.
Isn't this a privacy issue? If I clear my history, doesn't that leave some of the deleted data in the .sqlite file? And vacuuming reduces the amount of deleted data left over?
(In reply to comment #42)
> Isn't this a privacy issue? If I clear my history, doesn't that leave some of
> the deleted data in the .sqlite file? And vacuuming reduces the amount of
> deleted data left over?

No, deleted data is always erased with zeroes.
Vacuum only shrinks space that is already empty.
Instead of making it periodic, you could make it need-based. For instance, you could monitor the time that using the awesome bar locks up the GUI, and if it becomes excessive perform the vacuum upon the next shutdown, or prompt the user in some way - perhaps the information bar that is used for things like remembering passwords and blocking pop-ups?
And how do you determine what is excessive. Whatever value you choose, it will always be too large or too small for someone .
Re: Jo Hermans. Measure it. Average and standard deviation the first 100 or so uses of the awesome bar. If at any point after this the time exceeds, say, 3 standard deviations - either vacuum or prompt for vacuum. Then, re-baseline.

It won't capture all cases, but I think that would improve usability overall.

For the severely picky, something could be put into about:config. A setting to turn on or off the behavior should do it. The nag box could also have a checkbox that says "Do not bother me again" or similar.
(In reply to comment #47)
> Re: Jo Hermans. Measure it. Average and standard deviation the first 100 or so
> uses of the awesome bar. If at any point after this the time exceeds, say, 3
> standard deviations - either vacuum or prompt for vacuum. Then, re-baseline.
There are some problems with this method:
- if there was some data in places, then it mighty be actually too slow and you are using that for benchmark
- if there was no data (fresh profile) the results are probably too good, as the amount of data is too small...

I guess it would make sense to run cleanup after some number of transactions (if possible deletes)
(In reply to comment #48)
>- if there was some data in places, then it mighty be actually too slow and you
are using that for benchmark

This is a good point, but I would start with a fresh install (or at least a fresh vacuum), which would - as you point out - likely err on the side of "too fast".

> - if there was no data (fresh profile) the results are probably too good, as
> the amount of data is too small...

This would self-correct if the average and standard deviation were zeroed after each vacuum operation.
I think we should prioritize vacuuming in one very specific case. A lot of our older profiles are still using 4K pages. We have since bumped to 8K and then to 32K. Yet many users are still stuck on 4K pages. So i think we should force a vacuum for any db with an obsolete page setting. Obsolete page size imples a) that it's an old profile and b) there is a perf win to be had.
As per the current summary, this would only happen on major upgrades, so if it lands in time for Fx 4 and doesn't land on the 3.6 branch, this would only take effect at a 4.x to 5.x upgrade right? Therefore if this misses 4.0 (and also doesn't subsequently ship in a 4.x minor release) it won't run until 5.x -> 6.x? I think that's a good enough reason to at least look at the blocking status.
blocking2.0: --- → ?
Hmm, I definitely should have put more thought into that as it doesn't make sense (the new version will be running the code post upgrade, not the old version).

My point was supposed to be that a user only does a major upgrade once per cycle, so if this isn't fixed in the build they're upgrading to then it will be a long time until they next get a major upgrade. If the performance hit is severe for users on 4K pages (as opposed to 32K per Taras's comment 50) then this is not good. Of course major upgrades aren't usually offered in the .0 of a branch so that leaves more time, but only if this is something that would be accepted in a minor release.

Sorry for the double post.
It's not even clear how we want to do this yet, and consumers can opt into a regularly scheduled vacuum with code added in bug 541373.  In fact, I'm going to say we don't want to do this, and bug 541373 is the way to go moving forward.
Status: NEW → RESOLVED
blocking2.0: ? → -
Closed: 14 years ago
Resolution: --- → WONTFIX
Flags: wanted-firefox3.6?
You need to log in before you can comment on or make changes to this bug.