Bug 1869060 Comment 4 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I may need more time to think about the technicalities. But let me throw out some thoughts.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #2)
> 1. It uses a Promise as a return value from mozIStorageAsyncConnection, whereas the rest of that interface tends to use callbacks

The main reason is that callbacks used to be easier to handle on the C++ side, and Storage has quite a few C++ consumers.
I'm not sure how ergonomic C++ Promises are today; I think they need a JS context and some boilerplate. Sqlite.sys.mjs is pretty much our promise-happy wrapper around Storage, so the API stays easy to use from both languages. What's the reason for this API to return a Promise, and how does that help us?

> 2. The created backup file _does_ appear to contain things like bookmarks and history from the originating database - however, it's also significantly smaller in filesize from the original.

This may or may not be a problem. If you do a data comparison, are the contents identical?
The size difference may just be due to chunked growth (we artificially grow the database file to avoid disk fragmentation from future insertions).
Try setting `places.database.growthIncrementKiB` to 0 and see if you get a db of the same size.
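One way to check that only the physical size differs, not the contents, is to compare logical dumps, which ignore free pages and pre-allocated growth. A quick sketch using Python's sqlite3 module (the helper name and paths are just for illustration):

```python
import sqlite3

def dumps_match(db_a: str, db_b: str) -> bool:
    """Compare the logical contents (schema + rows) of two SQLite
    databases, ignoring physical file size, free pages, etc."""
    a, b = sqlite3.connect(db_a), sqlite3.connect(db_b)
    try:
        # iterdump() emits the same SQL text for identical contents.
        return list(a.iterdump()) == list(b.iterdump())
    finally:
        a.close()
        b.close()
```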

> 3. [...] I figured if I did this on the async execution thread, I'd actually be blocking any other queries from being made, since the thread would either be doing the backing up or sleeping. Is it a bad idea to work with the SQLite db handle on something like a low-priority background thread like this?

I don't know off-hand and must think more about it, but I believe it would be better to open a new handle. Reusing the same handle on a third thread makes me a bit nervous, since some connections, like Places, already use two.

> 4. Reading through the documentation at https://www.sqlite.org/backup.html and https://www.sqlite.org/c3ref/backup_finish.html, it _seems_ like we're configured in such a way that the backup would eventually finish and never be stuck in a situation where it'd need to continually restart 

I think it's sufficient for the backup handle to be in the same process as the writing handle.
Places, due to history, is written often, so it's a good candidate to test.
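On points 3 and 4 together, the shape I have in mind is a dedicated connection opened just for the backup, stepped in small batches on its own thread so the writer isn't starved. A sketch using Python's sqlite3 module, whose Connection.backup() wraps sqlite3_backup_step(); the batch size and sleep are illustrative:

```python
import sqlite3
import threading

def backup_in_background(src_path: str, dst_path: str) -> threading.Thread:
    """Run an incremental SQLite backup on its own thread, through a
    dedicated connection rather than the application's main handle."""
    def worker():
        # Open fresh handles inside the thread; SQLite connections
        # shouldn't be casually shared across threads.
        src = sqlite3.connect(src_path)
        dst = sqlite3.connect(dst_path)
        try:
            # Copy 64 pages per sqlite3_backup_step() call, sleeping
            # 50 ms between batches so concurrent writers aren't
            # starved. Note: writes through *other* handles restart
            # the copy from scratch, while writes through this same
            # handle are folded into the backup as it goes.
            src.backup(dst, pages=64, sleep=0.050)
        finally:
            dst.close()
            src.close()

    t = threading.Thread(target=worker, name="db-backup", daemon=True)
    t.start()
    return t
```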

> 5. Any other gotchas that I'm not considering here?

6. How do we plan to handle corruption if it's detected during a backup operation? (I assume the backup API would return SQLITE_CORRUPT, so detection should be possible.) Would we just annotate the db and give up trying to back it up? (There is a recover API now, but let's not add too much scope here.)
7. How do we ensure a backup has enough time to complete in a meaningful timeframe? I'd like to see some example measurements of the time necessary to back up a large database (let's say 100MiB, to be realistic) on a slow machine with a mechanical disk. Will users with mostly short sessions have sufficient time for it? Do we have telemetry about users with mostly short sessions (how many users, average session time)?
8. I couldn't find (yet) how SQLite handles a partial backup, i.e. one that is interrupted midway: does it stage the copy in a tmp folder or under a tmp file name?
9. We surely want telemetry for time to backup and for success/failure/interruption, keyed on each database's telemetryName.
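On point 6, corruption could be detected both up front and mid-copy. A rough sketch using Python's sqlite3 module (the same C backup API underneath; the return strings and helper name are made up for illustration):

```python
import sqlite3

def try_backup(src_path: str, dst_path: str) -> str:
    """Attempt a backup; report 'ok', or a failure category the
    caller could annotate on the database."""
    src = sqlite3.connect(src_path)
    try:
        # quick_check is a cheaper variant of integrity_check; either
        # lets us notice corruption up front instead of mid-backup.
        (result,) = src.execute("PRAGMA quick_check").fetchone()
        if result != "ok":
            return "corrupt-before-backup"
        dst = sqlite3.connect(dst_path)
        try:
            src.backup(dst)
        except sqlite3.DatabaseError:
            # Corruption in pages read during the copy surfaces here
            # (SQLITE_CORRUPT at the C level).
            return "corrupt-during-backup"
        finally:
            dst.close()
        return "ok"
    finally:
        src.close()
```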
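For point 7, a harness along these lines could produce the timing numbers (Python sketch; the filler-table trick is just one way to fake a large profile database):

```python
import sqlite3
import time

def make_filler_db(path: str, mib: int) -> None:
    """Create a throwaway database roughly `mib` MiB in size."""
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE filler(blob)")
        one_mib = b"x" * (1024 * 1024)
        con.executemany("INSERT INTO filler VALUES (?)",
                        [(one_mib,) for _ in range(mib)])
        con.commit()
    finally:
        con.close()

def measure_backup(src_path: str, dst_path: str) -> float:
    """Return the wall-clock seconds a full backup of src takes."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        start = time.monotonic()
        src.backup(dst)
        return time.monotonic() - start
    finally:
        dst.close()
        src.close()
```

On an SSD dev box this will be fast even at 100MiB; the interesting numbers are from a mechanical disk while the source is being written concurrently.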
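On point 8: as far as I can tell, sqlite3_backup_init() writes straight into whatever file the destination handle is open on, with no staging area of its own, so an interrupted backup leaves a partial destination file behind. That would leave the tmp-name-plus-atomic-rename dance to us; a hedged Python sketch of that pattern:

```python
import os
import sqlite3

def safe_backup(src_path: str, dst_path: str) -> None:
    """Back up into a sibling temp file, then atomically rename it
    over the final name, so readers never see a partial backup."""
    tmp_path = dst_path + ".tmp"
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(tmp_path)
    try:
        src.backup(dst)
    except BaseException:
        dst.close()
        os.unlink(tmp_path)  # discard the partial copy
        raise
    else:
        dst.close()
        # A same-directory rename is atomic on POSIX filesystems.
        os.replace(tmp_path, dst_path)
    finally:
        src.close()
```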
