Closed Bug 1588068 Opened 5 years ago Closed 11 months ago

IDB: Database integrity check can't be aborted

Categories

(Core :: Storage: IndexedDB, defect, P2)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: janv, Assigned: jjalkanen)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

I reported this issue in bug 1541776 comment 22. We use progress handlers (sqlite3_progress_handler) with all database connections to check if database operations need to be aborted. This is especially useful during shutdown. However, it seems that "pragma integrity_check" doesn't use the progress handler. Integrity checks may last long time which sometimes results in shutdown hangs.

I found out that this was already requested by some users, for example here:
https://www.sqlite.org/src/info/33fdaab1a0

:drh, do you think this could be fixed in sqlite ?

Flags: needinfo?(drh)

PRAGMA integrity_check does invoke sqlite3_progress_handler, just like every other statement. The progress-handler callback is invoked on certain byte-code operations, typically jumps. If those byte-code jump instructions are not performed, then the progress-handler is not invoked.

PRAGMA integrity_check does perform jumps. But there is one opcode (the IntegrityCk opcode, which you can see near the top of an EXPLAIN listing for PRAGMA integrity_check) that scans the entire database for low-level formatting errors, and that one opcode can run for several seconds or more, depending on the size of the database and/or I/O performance. All the while that the IntegrityCk opcode is running, no progress-handler callbacks are invoked.

The IntegrityCk opcode is not the only opcode that might run for a while. The Count opcode does a high-speed count of the number of rows in a table and is used to optimize "SELECT count(*) ..." queries. It also can take a while on a database with a large number of rows (though it is much much faster than loading and counting rows separately, which is why the opcode exists in the first place.)

We could, in theory, split up the IntegrityCk opcode into multiple opcodes that run in a loop, and thus invoke the progress-handler more frequently. But there is a trade-off. Splitting up the opcode means that the "PRAGMA integrity_check" command uses more memory and runs slower - even for the overwhelmingly common case where the application is not using progress-handler.

I'm thinking that it might not be a good choice to slow down billions of mobile phones in order to help FF shutdown a little faster in an obscure corner case where FF happens to be running an integrity check at the same moment that the user selects "Close". Why can't you just kill the thread in which the integrity-check is running? Or, for that matter, why can't FF simply shutdown by invoking exit(0)?

Flags: needinfo?(drh)

Ok, thanks for the explanation. Firefox shutdown is quite complicated thing. See https://github.com/asutherland/asuth-gecko-notes/blob/master/ipc/SHUTDOWN.md
We do use exit(0) for content processes in optimized builds, but the main process must handle open transactions, save telemetry, etc. So we can't just call exit(0) in the main process. We could try to kill the thread, because IndexedDB has currently its own dedicated thread pool for database maintenance. However that might change in future since we're trying to reduce number of threads in general and reuse thread pools if possible.

What about sqlite3_interrupt ? With some additional modifications to support "integrity_check", would that slow down SQLite too ?

it would be possible to add additional interrupt-checks to the IntegrityCk opcode. There would still be performance impact, but the impact would not be nearly as great as splitting IntegrityCk into multiple opcodes and continuing to use progress-handler. Using sqlite3_interrupt() is definitely a better option. It is probably also better for FF in general, as it does not involve frequent calls to the progress handler callback.

Ok, sounds good, can we expect that in some future release of SQLite ?
And thanks for letting us know about the performance impact, I'll file a new bug to replace the progress handler with interrupt calls where possible.

SQLite version 3.31.0 will probably contain faster response to sqlite3_interrupt() in the OP_IntegrityCk and OP_Count opcodes.

Awesome, thanks!

No longer blocks: 1541370
Blocks: 1541370
Severity: normal → S3
Assignee: nobody → hxu

In https://bugzilla.mozilla.org/show_bug.cgi?id=1320301#c0, Macro mentioned that

A partial support to sqlite3_interrupt may be useful for this use-case.

  1. Only for read-only connections (it's too scary to interrupt random writes)

But actually in the official documentation of sqlite3, it is stated that sqlite3_interrupt can be also to interrupt write operations such as INSERT, UPDATE and DELETE. And when those operations are interrupted, the entire transaction will be rolled back. So I suppose, we can safely use sqlite3_interrupt to replace the current usage of sqlite3_progress_handler, especially in shutdown scenario.

Yeah, to be clear, I think Marco was just indicating that for the "places" subsystem's purposes that interrupting writes would be scary. Largely, I think because for "places", the expensive/long-running queries are all read-only queries made on the read-only connection, so there was little benefit to places to interrupt writes, especially as interrupting a write would potentially lose user data.

In contrast, for IndexedDB:

  • We have no problem interrupting maintenance since the data isn't changing, we're just trying to optimize.
  • IDB transactions would be aborted/rolled back if we're ever planning to interrupt them, so there's no harm in interrupting a write.

:asuth, Thanks for the clarification!

Yeah, but we found out yesterday that Interrupt currently only works with read-only databases:
https://searchfox.org/mozilla-central/rev/2c4b830b924f42283632b70f39a60fd36433dd4d/storage/mozStorageConnection.cpp#1746

And it seems that the method is intended for async connections overall, so we will need to introduce some changes in mozStorage before we can replace the progress handler in IndexedDB.

Yes, there are definitely mozStorage enhancements that are necessary. It likely could make sense to do that as part of addressing bug 1121282 which would change our ownership constraints, as it would still make sense for there to be a specific nsISerialEventTarget that would be the expected source of an interrupt call being made on a synchronous connection. For all Quota Manager clients, that presumably would be the QuotaManager I/O thread or specific nsISerialEventTarget otherwise associated with the given DirectoryLock.

:drh, do you think if it is safe to use sqlite3_interrupt() for a read-write connection?

There are some assumption in our code indicating that it's not safe to use it for a read-write connection. See mozStorageConnection.cpp, this patch and Marco's comment in Bug 1320301

We are not sure if these assumption are still true? The official doc says it's

An SQL operation that is interrupted will return SQLITE_INTERRUPT. If the interrupted SQL operation is an INSERT, UPDATE, or DELETE that is inside an explicit transaction, then the entire transaction will be rolled back automatically.

So I assume it's safe to use it even for read-write connections?

sqlite3_interrupt() is safe to use even for a read-write connection.

(In reply to Haiyang Xu from comment #13)

There are some assumption in our code indicating that it's not safe to use it for a read-write connection. See mozStorageConnection.cpp, this patch and Marco's comment in Bug 1320301

Just to reiterate, these were policy decisions that we made for the consumer that wanted interrupt functionality (places) and which we documented into the code to make sure people didn't change the policy decisions without consideration and potentially updating existing callers that might depend on those safeguards. I don't think there's anything unsound about using interrupt for IDB here as long as we properly address the "But it is not safe to call this routine with a database connection that is closed or might close before sqlite3_interrupt() returns." point documented in https://www.sqlite.org/c3ref/interrupt.html and the related ownership invariants mozStorage has traditionally maintained.

:mak, can you confirm my understanding of the situation in comment 9 onwards?

Flags: needinfo?(mak)

When we introduced the use of sqlite3_interrupt, we wanted to do it gradually.
When I said it's unsafe, I didn't really meant that from a SQLite integrity point of view, I meant from the "it's easy to misuse it" point of view.
In a use-case like Places, where we have multiple threads writing, calling sqlite3_interrupt to interrupt long work on thread1, could easily cause unexpected dataloss by interrupting writes from thread2. Those cases would need some kind of synchronization and the API should try to protect from that kind of misuse. There are still probably other components using multiple write threads even if we tried to change most of them to the async-only connection, or their own thread and sync connection (Fwiw, I'd love to see a flag that makes impossible to create new hybrid sync/async connections, so we don't add more consumers, and then those marked hybrid connection could NOT use interrupt).
The decision to allow it only on read-only connections was actually to speed up its adoption, to interrupt long reads (improve responsiveness of certain UI pieces).

A connection that keeps all the operations on the same thread, is surely easier to get right, thus we could change the policy to be that. You should still ensure to handle the "closed connection" case properly, and ensure synchronization within your thread, so that if thread2 invokes interrupt, thread1 doesn't enqueue further operations until interrupt is done (otherwise I guess you may end up in a situation where thread1 unexpectedly finishes, queues up another write, and you end up interrupting that).

Thus yes, we can absolutely do it, but we must limit the possibility of misuse as far as possible.

Flags: needinfo?(mak)
Keywords: leave-open

This patch extends the Connection::Interrupt() method with an optional parameter: aForce. When passed in aForce = true, the underlying sqlite3 database connection will be interrupted immediately regardless of connection type. Note: NS_ERROR_NOT_INITIALIZED will be throwed if the database connection is not ready or already closed.

I'm confused, the patch is not what we discussed, is this a temporary workaround because we have a short deadline?

Flags: needinfo?(jvarga)

(In reply to Marco Bonardo [:mak] from comment #18)

I'm confused, the patch is not what we discussed, is this a temporary workaround because we have a short deadline?

No. it's just the first part. Will have follow-up patches to replace current usage of sqlite3_progress_handler with sqlite3_interrupt.

Haiyang,
I think Marco provided enough info in comment 16, there are multiple things which need to be addressed.
We need to look at it from generic mozStorage point of view (there are for example async connections which we don't use in IndexedDB).

Flags: needinfo?(jvarga)

Also, it's probably a good idea to file a new mozStorage bug for interrupt changes.

(In reply to Jan Varga [:janv] from comment #21)

Also, it's probably a good idea to file a new mozStorage bug for interrupt changes.

Yeah. I agree since it's a bit out of the topic of this bug.
See Bug1736844

Comment on attachment 9246851 [details]
Bug 1588068 - Extend Connection::Interrupt() in storage API with force option; r=#dom-storage-reviewers

Revision D129023 was moved to bug 1736844. Setting attachment 9246851 [details] to obsolete.

Attachment #9246851 - Attachment is obsolete: true
See Also: → 1755630

Jari took over the patch.

Assignee: hxu → jjalkanen

The leave-open keyword is there and there is no activity for 6 months.
:jjalkanen, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jjalkanen)

The patch needs to be rebased but due to other priorities the final review is delayed. Two rounds of comments have already been done.

Flags: needinfo?(jjalkanen)
Attachment #9247090 - Attachment description: Bug 1588068 - Use sqlite3_interrupt() to abort database maitenance operation; r=#dom-storage-reviewers → Bug 1588068 - Use sqlite3_interrupt to abort database maintenance operation. r=#dom-storage
Pushed by jjalkanen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da64a3f29eb3
Use sqlite3_interrupt to abort database maintenance operation. r=dom-storage-reviewers,asuth
Regressions: 1834084

Backed out for causing xpcshell failures in test_idle_maintenance.js.

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: TEST-UNEXPECTED-FAIL | xpcshell.ini:dom/indexedDB/test/unit/test_idle_maintenance.js | testSteps/< - [testSteps/< : 196] Maintenance decreased file sizes or left them the same - false == true
Flags: needinfo?(jjalkanen)

The test needs to be investigated.

Flags: needinfo?(jjalkanen)
Pushed by jjalkanen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/04b1cb7b7266
Use sqlite3_interrupt to abort database maintenance operation. r=dom-storage-reviewers,asuth

The test was originally changed to trigger maintenance more often. However, this also increased the rate at which the trigger is sent too early because the orchestration of the operations is based on timeouts, due to run-to-completion semantics of the API. The test test_idle_maintenance.js was then reverted back to the original version which stores less data and is therefore more reliable with a small timeout.

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: