1588068 - IDB: Database integrity check can't be aborted

Reporter

Description

•

6 years ago

I reported this issue in bug 1541776 comment 22. We use progress handlers (sqlite3_progress_handler) with all database connections to check if database operations need to be aborted. This is especially useful during shutdown. However, it seems that "pragma integrity_check" doesn't use the progress handler. Integrity checks may last long time which sometimes results in shutdown hangs.

I found out that this was already requested by some users, for example here:
https://www.sqlite.org/src/info/33fdaab1a0

Jan Varga [:janv]

Reporter

Comment 1

•

6 years ago

:drh, do you think this could be fixed in sqlite ?

Flags: needinfo?(drh)

D. Richard Hipp

Comment 2

•

6 years ago

PRAGMA integrity_check does invoke sqlite3_progress_handler, just like every other statement. The progress-handler callback is invoked on certain byte-code operations, typically jumps. If those byte-code jump instructions are not performed, then the progress-handler is not invoked.

PRAGMA integrity_check does perform jumps. But there is one opcode (the IntegrityCk opcode, which you can see near the top of an EXPLAIN listing for PRAGMA integrity_check) that scans the entire database for low-level formatting errors, and that one opcode can run for several seconds or more, depending on the size of the database and/or I/O performance. All the while that the IntegrityCk opcode is running, no progress-handler callbacks are invoked.

The IntegrityCk opcode is not the only opcode that might run for a while. The Count opcode does a high-speed count of the number of rows in a table and is used to optimize "SELECT count(*) ..." queries. It also can take a while on a database with a large number of rows (though it is much much faster than loading and counting rows separately, which is why the opcode exists in the first place.)

We could, in theory, split up the IntegrityCk opcode into multiple opcodes that run in a loop, and thus invoke the progress-handler more frequently. But there is a trade-off. Splitting up the opcode means that the "PRAGMA integrity_check" command uses more memory and runs slower - even for the overwhelmingly common case where the application is not using progress-handler.

I'm thinking that it might not be a good choice to slow down billions of mobile phones in order to help FF shutdown a little faster in an obscure corner case where FF happens to be running an integrity check at the same moment that the user selects "Close". Why can't you just kill the thread in which the integrity-check is running? Or, for that matter, why can't FF simply shutdown by invoking exit(0)?

Flags: needinfo?(drh)

Jan Varga [:janv]

Reporter

Comment 3

•

6 years ago

Ok, thanks for the explanation. Firefox shutdown is quite complicated thing. See https://github.com/asutherland/asuth-gecko-notes/blob/master/ipc/SHUTDOWN.md
We do use exit(0) for content processes in optimized builds, but the main process must handle open transactions, save telemetry, etc. So we can't just call exit(0) in the main process. We could try to kill the thread, because IndexedDB has currently its own dedicated thread pool for database maintenance. However that might change in future since we're trying to reduce number of threads in general and reuse thread pools if possible.

What about sqlite3_interrupt ? With some additional modifications to support "integrity_check", would that slow down SQLite too ?

D. Richard Hipp

Comment 4

•

6 years ago

it would be possible to add additional interrupt-checks to the IntegrityCk opcode. There would still be performance impact, but the impact would not be nearly as great as splitting IntegrityCk into multiple opcodes and continuing to use progress-handler. Using sqlite3_interrupt() is definitely a better option. It is probably also better for FF in general, as it does not involve frequent calls to the progress handler callback.

Jan Varga [:janv]

Reporter

•

6 years ago

Blocks: 1541370

Jens Stutte [:jstutte]

Updated

•

5 years ago

Severity: normal → S3

Haiyang Xu

Updated

•

4 years ago

Assignee: nobody → hxu

Haiyang Xu

Comment 8

•

4 years ago

•

Edited

In https://bugzilla.mozilla.org/show_bug.cgi?id=1320301#c0, Macro mentioned that

A partial support to sqlite3_interrupt may be useful for this use-case.

Only for read-only connections (it's too scary to interrupt random writes)

But actually in the official documentation of sqlite3, it is stated that sqlite3_interrupt can be also to interrupt write operations such as INSERT, UPDATE and DELETE. And when those operations are interrupted, the entire transaction will be rolled back. So I suppose, we can safely use sqlite3_interrupt to replace the current usage of sqlite3_progress_handler, especially in shutdown scenario.

Andrew Sutherland [:asuth] (he/him)

Comment 9

•

4 years ago

Yeah, to be clear, I think Marco was just indicating that for the "places" subsystem's purposes that interrupting writes would be scary. Largely, I think because for "places", the expensive/long-running queries are all read-only queries made on the read-only connection, so there was little benefit to places to interrupt writes, especially as interrupting a write would potentially lose user data.

In contrast, for IndexedDB:

We have no problem interrupting maintenance since the data isn't changing, we're just trying to optimize.
IDB transactions would be aborted/rolled back if we're ever planning to interrupt them, so there's no harm in interrupting a write.

Haiyang Xu

Comment 10

•

4 years ago

:asuth, Thanks for the clarification!

Jan Varga [:janv]

Reporter

Comment 11

•

4 years ago

Yeah, but we found out yesterday that Interrupt currently only works with read-only databases:
https://searchfox.org/mozilla-central/rev/2c4b830b924f42283632b70f39a60fd36433dd4d/storage/mozStorageConnection.cpp#1746

And it seems that the method is intended for async connections overall, so we will need to introduce some changes in mozStorage before we can replace the progress handler in IndexedDB.

Andrew Sutherland [:asuth] (he/him)

Comment 12

•

4 years ago

Yes, there are definitely mozStorage enhancements that are necessary. It likely could make sense to do that as part of addressing bug 1121282 which would change our ownership constraints, as it would still make sense for there to be a specific nsISerialEventTarget that would be the expected source of an interrupt call being made on a synchronous connection. For all Quota Manager clients, that presumably would be the QuotaManager I/O thread or specific nsISerialEventTarget otherwise associated with the given DirectoryLock.

Haiyang Xu

Comment 13

•

4 years ago

:drh, do you think if it is safe to use sqlite3_interrupt() for a read-write connection?

There are some assumption in our code indicating that it's not safe to use it for a read-write connection. See mozStorageConnection.cpp, this patch and Marco's comment in Bug 1320301

We are not sure if these assumption are still true? The official doc says it's

An SQL operation that is interrupted will return SQLITE_INTERRUPT. If the interrupted SQL operation is an INSERT, UPDATE, or DELETE that is inside an explicit transaction, then the entire transaction will be rolled back automatically.

So I assume it's safe to use it even for read-write connections?

D. Richard Hipp

Comment 14

•

4 years ago

sqlite3_interrupt() is safe to use even for a read-write connection.

Andrew Sutherland [:asuth] (he/him)

Comment 15

•

4 years ago

(In reply to Haiyang Xu from comment #13)

There are some assumption in our code indicating that it's not safe to use it for a read-write connection. See mozStorageConnection.cpp, this patch and Marco's comment in Bug 1320301

Just to reiterate, these were policy decisions that we made for the consumer that wanted interrupt functionality (places) and which we documented into the code to make sure people didn't change the policy decisions without consideration and potentially updating existing callers that might depend on those safeguards. I don't think there's anything unsound about using interrupt for IDB here as long as we properly address the "But it is not safe to call this routine with a database connection that is closed or might close before sqlite3_interrupt() returns." point documented in https://www.sqlite.org/c3ref/interrupt.html and the related ownership invariants mozStorage has traditionally maintained.

:mak, can you confirm my understanding of the situation in comment 9 onwards?

Flags: needinfo?(mak)

Marco Bonardo [:mak]

Comment 16

•

4 years ago

•

Edited

When we introduced the use of sqlite3_interrupt, we wanted to do it gradually.
When I said it's unsafe, I didn't really meant that from a SQLite integrity point of view, I meant from the "it's easy to misuse it" point of view.
In a use-case like Places, where we have multiple threads writing, calling sqlite3_interrupt to interrupt long work on thread1, could easily cause unexpected dataloss by interrupting writes from thread2. Those cases would need some kind of synchronization and the API should try to protect from that kind of misuse. There are still probably other components using multiple write threads even if we tried to change most of them to the async-only connection, or their own thread and sync connection (Fwiw, I'd love to see a flag that makes impossible to create new hybrid sync/async connections, so we don't add more consumers, and then those marked hybrid connection could NOT use interrupt).
The decision to allow it only on read-only connections was actually to speed up its adoption, to interrupt long reads (improve responsiveness of certain UI pieces).

A connection that keeps all the operations on the same thread, is surely easier to get right, thus we could change the policy to be that. You should still ensure to handle the "closed connection" case properly, and ensure synchronization within your thread, so that if thread2 invokes interrupt, thread1 doesn't enqueue further operations until interrupt is done (otherwise I guess you may end up in a situation where thread1 unexpectedly finishes, queues up another write, and you end up interrupting that).

Thus yes, we can absolutely do it, but we must limit the possibility of misuse as far as possible.

Flags: needinfo?(mak)

Haiyang Xu

•

4 years ago

(In reply to Marco Bonardo [:mak] from comment #18)

I'm confused, the patch is not what we discussed, is this a temporary workaround because we have a short deadline?

No. it's just the first part. Will have follow-up patches to replace current usage of sqlite3_progress_handler with sqlite3_interrupt.

Jan Varga [:janv]

Reporter

Comment 20

•

4 years ago

Haiyang,
I think Marco provided enough info in comment 16, there are multiple things which need to be addressed.
We need to look at it from generic mozStorage point of view (there are for example async connections which we don't use in IndexedDB).

Flags: needinfo?(jvarga)

Jan Varga [:janv]

Reporter

Comment 21

•

4 years ago

Also, it's probably a good idea to file a new mozStorage bug for interrupt changes.

Haiyang Xu

Comment 22

•

4 years ago

(In reply to Jan Varga [:janv] from comment #21)

Also, it's probably a good idea to file a new mozStorage bug for interrupt changes.

Yeah. I agree since it's a bit out of the topic of this bug.
See Bug1736844

Haiyang Xu

Comment 23

•

4 years ago

Attached file Bug 1588068 - Use sqlite3_interrupt to abort database maintenance operation. r=#dom-storage — Details

Depends on D129023

Phabricator Automation

Comment 24

•

4 years ago

Comment on attachment 9246851 [details]
Bug 1588068 - Extend Connection::Interrupt() in storage API with force option; r=#dom-storage-reviewers

Revision D129023 was moved to bug 1736844. Setting attachment 9246851 [details] to obsolete.

Attachment #9246851 - Attachment is obsolete: true

Jari Jalkanen [:jari]

Assignee

Updated

•

3 years ago

Comment 25

•

3 years ago

Jari took over the patch.

Assignee: hxu → jjalkanen

BugBot [:suhaib / :marco/ :calixte]

Comment 26

•

3 years ago

The leave-open keyword is there and there is no activity for 6 months.
:jjalkanen, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jjalkanen)

Jari Jalkanen [:jari]

Assignee

Comment 27

•

3 years ago

•

Edited

The patch needs to be rebased but due to other priorities the final review is delayed. Two rounds of comments have already been done.

Flags: needinfo?(jjalkanen)

Phabricator Automation

Updated

•

2 years ago

Attachment #9247090 - Attachment description: Bug 1588068 - Use sqlite3_interrupt() to abort database maitenance operation; r=#dom-storage-reviewers → Bug 1588068 - Use sqlite3_interrupt to abort database maintenance operation. r=#dom-storage

Pulsebot

Comment 28

•

2 years ago

Pushed by jjalkanen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/da64a3f29eb3 Use sqlite3_interrupt to abort database maintenance operation. r=dom-storage-reviewers,asuth

Treeherder Bug Filer

Updated

•

2 years ago

Regressions: 1834084

Serban Stanca [:SerbanS]

Comment 29

•

2 years ago

Backed out for causing xpcshell failures in test_idle_maintenance.js.

Backout link
Push with failures
Failure Log
Failure line: TEST-UNEXPECTED-FAIL | xpcshell.ini:dom/indexedDB/test/unit/test_idle_maintenance.js | testSteps/< - [testSteps/< : 196] Maintenance decreased file sizes or left them the same - false == true

Flags: needinfo?(jjalkanen)

Jari Jalkanen [:jari]

Assignee

Comment 30

•

2 years ago

The test needs to be investigated.

Flags: needinfo?(jjalkanen)

Pulsebot

Comment 31

•

2 years ago

Pushed by jjalkanen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/04b1cb7b7266 Use sqlite3_interrupt to abort database maintenance operation. r=dom-storage-reviewers,asuth

Cristina Horotan [:chorotan]

Comment 32

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/04b1cb7b7266

Jari Jalkanen [:jari]

Assignee

Comment 33

•

2 years ago

The test was originally changed to trigger maintenance more often. However, this also increased the rate at which the trigger is sent too early because the orchestration of the operations is based on timeouts, due to run-to-completion semantics of the API. The test test_idle_maintenance.js was then reverted back to the original version which stores less data and is therefore more reliable with a small timeout.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

BugBot (nomail) [:suhaib / :marco/ :calixte]

Updated

•

2 years ago

Keywords: leave-open

Bug 1588068 - Extend Connection::Interrupt() in storage API with force option; r=#dom-storage-reviewers 4 years ago Haiyang Xu 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1588068 - Use sqlite3_interrupt to abort database maintenance operation. r=#dom-storage 4 years ago Haiyang Xu 48 bytes, text/x-phabricator-request		Details \| Review