Closed Bug 634374 Opened 11 years ago Closed 4 years ago

Fix page_size of WAL databases

Categories

(Toolkit :: Storage, defect)

defect
Not set
normal

Tracking

()

RESOLVED INACTIVE

People

(Reporter: mak, Unassigned)

References

Details

(Keywords: perf, Whiteboard: p=0)

In Firefox 4 we had to disable this due to lock contention (some databases like places.sqlite have multiple connections attached, see bug 628921), we have to find a way to properly support this, since most of the users are not on the default page_size we would like.

A possible approach could be to have the Vacuum Manager store somewhere (prefs?) a list of the databases that need page_size change, we could then proceed fixing them at upgrade time.  While usually we avoid doing stuff during upgrade, this would be a event that happens just once (unless we Globally change the Storage suggested page_size).
Keywords: perf
idea: create a update-tasks category that is invoked during upgrade process, add vacuum manager to it.
I wonder if we may use the new Maintenance service to fix this, though the service should be able to access a SQLite implementation.
If we do need to be elevated I think it would be better to do this at upgrade time via a one time operation in /PostProcess in browser/installer/windows/nsis/shared.nsh because not all machines will have maintenance service installed.  

Can this be done before we open the database inside core code though? 
I think it's best to avoid upgrade changes completely when possible since all changes to application update brings significant risk.

Maintenance service should be used only when:
- We need elevated access (or system access)
- We need to do things more than once in between upgrades
- It is not critically important that we do the task at all, but it improves the user experience in some way.
(In reply to Brian R. Bondy [:bbondy] from comment #3)
> Can this be done before we open the database inside core code though? 

Yes, though it may be a slow operation causing the UI to delay by many seconds.

This must be done when the database is closed and we can interrupt the user for seconds, that makes it practically impossible from the app itself. Doesn't need elevated privileges.
Currently we don't have a good hook point for this kind of maintenance.

> - It is not critically important that we do the task at all, but it improves
> the user experience in some way.

this is not critically important, though it's a thing that once done may reduce memory usage and increase perf.
Btw, I was just guessing the kind of acceptable use-cases for the service.
> Btw, I was just guessing the kind of acceptable use-cases for the service.

Ya I don't think we fully nailed down the criteria yet, that was just my best effort guess above. 

Is this something that we may need to do periodically? 
Perhaps we could do it on app shutdown after the sqlite db is closed?  Causing a slightly longer than normal application shutdown?
It's sort of an upgrade task that we should do just once for each affected database, but should not appear as a long startup or even worse an hang.
Long shutdowns are not seen as a viable solution, especially now that perf team is considering to use abrupt exit(0).

We have a possible alternative solution that may take a while to implement (involves having all statements coming from a common cache, so we can finalize and restart all of them at once), that would allow to do it even when the browser is open (on idle), may work in most cases though it couldn't work if any add-on opens its own connection to one of these databases.
A couple of problems with doing this via updater:
- We relaunch the browser after updates, which means we'd have to do this as a sync operation.  The postupdate process (which would be the most obvious place to do it) is currently an async operation with Firefox. 
- The post update process only works for the current user running the update and I don't believe there is code to do things per profile in updater at all. 

Another option would be at application shutdown to call helper.exe (same as we use for uninstaller and for setting the default browser) and to do the work inside helper.exe with a special command line.  I'm a bit concerned about what happens if the user tries to open the app while this is happening though, so you may have to for example lock the exe or add code to wait to open the db if the user relaunches.
You can deal with the concurrency issue by opening the db in a non-exclusive mode and copying out the data into a temporary file first(that's faster than vacuum anyway), then moving it back if the file is still available.
I meant to say I agree with Brian, that a helper process should do this on shutdown as he described.
(Copying the sqlite db file and then performing the operation and then moving back)++;
Would using helper.exe show some kind of feedback to the user, like a progress bar?
no, we could do like a wait cursor for example, I guess it could if it had to.
If by "wait cursor" you mean the classic clock cursor, I think it is hated enough by users, and mentally associated to hangs, so wouldn't be that kind of feedback we should give for more than a couple seconds.
I'd prefer no UI in that case, if it's one time only and it will happen on shutdown I'm not sure it's worth that much effort to make some kind of UI.
Blocks: 746486
So, looks like we still have 30% of users on the old pagesize in the release channel, Taras thinks we can compare this data with slow SQL data to figure if these users are having measurable problems.
We still don't have a very good solution apart from blocking everything on startup or shutdown to fix the thing, maybe with a simple progress dialog, so it's worth to gather any data we can to understand if the cost is worth the benefit.
(In reply to Marco Bonardo [:mak] from comment #15)
> We still don't have a very good solution apart from blocking everything on
> startup or shutdown to fix the thing

How about we do it as part of firefox update? I think there's a way to run js after the new version has been patched but before it is relaunched.
(In reply to ben turner [:bent] from comment #16)
> How about we do it as part of firefox update? I think there's a way to run
> js after the new version has been patched but before it is relaunched.

This was discussed in the past, afair the points against that were:
- scope creeping update code
- making the update process take longer and be more annoying
Though, it's likely both can be addressed, by making a separate dedicated component for "profile maintenance" and by the fact we should ensure it won't happen at every update but we instead coalesce needed cleanups at specific times in the product life.
small update: based on telemetry data I got from Taras this bug is still worth it.
We have about 30% of users on the old pagesize, the wins are also interesting, looks like we can reduce the reads times up to 4 times with the right page size. so definitely not a wontfix.

What's left to do is to design the user interaction with this maintenance task.
Creeping up the scope of updates may not be totally nice, both cause updates are a process already "suffered" by the users, and cause having more specialized code doing few things is generally always a win.
We could just make this a special maintenance task that we suggest to the user, maybe like we do when we notify slow startups.
So we could just tell the user we detected a problem in his profile and we can fix it, but that will require restarting the browser and session data will be lost until the restart happens. What we'll do is a background copy of the database, and simply switch the old one on either shutdown or startup. This should not be too much complicated.
Gavin, any idea if this may be satisfying or do you have any better ideas on how to approach this at the browser UI level?
Flags: needinfo?(gavin.sharp)
Summary: Make Vacuum Manager able to fix page_size of WAL databases → Fix page_size of WAL databases
(In reply to Marco Bonardo [:mak] from comment #18)
> So we could just tell the user we detected a problem in his profile and we
> can fix it, but that will require restarting the browser and session data
> will be lost until the restart happens. What we'll do is a background copy
> of the database, and simply switch the old one on either shutdown or
> startup. This should not be too much complicated.

This seems reasonable, but I'm inclined to have this be automatic (without prompting/UI). A one-time slow startup or shutdown seems acceptable if we can guarantee that it only happens once.
Flags: needinfo?(gavin.sharp)
Sure, though it may take many seconds (we can try getting a measure on a slow SD or pendrive with the average db size to guess an estimate) and the user may be tempted to kill the process if we don't show any progress ui.  That has 2 downsides:
1. we may keep trying again and again and he may keep killing the process, at that point all of the startup/shutdown will be slow for that user. Or that may happen when the OS shuts down and again we may try do to that over and over.
2. while it's rare, there's some possibility of corruption if the process is killed at the wrong time.
(In reply to Marco Bonardo [:mak] from comment #20)
> 1. we may keep trying again and again and he may keep killing the process,
> at that point all of the startup/shutdown will be slow for that user. Or
> that may happen when the OS shuts down and again we may try do to that over
> and over.

We should generally guarantee that we only try once, I think. Or at least, don't automatically retry (i.e. require that we manually decide to "try again" for all users).

> 2. while it's rare, there's some possibility of corruption if the process is
> killed at the wrong time.

Yeah, this is a strong argument for a non-silent process. Maybe we should talk to someone from the UX team to figure out a good way to handle this.
Whiteboard: p=0
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.