Open Bug 1125157 Opened 9 years ago Updated 8 months ago

Add an interface allowing to observe a connection and/or gather telemetry, especially database corruption

Categories

(Toolkit :: Storage, enhancement, P2)

enhancement
Points:
5

Tracking

()

People

(Reporter: mak, Unassigned)

References

(Depends on 1 open bug, Blocks 4 open bugs)

Details

We need an interface that can observe given events on a Storage connection.

For example when close is invoked (We need this to cleanup Sqlite.jsm wrappers) or when some sqlite API calls returns SQLITE_CORRUPT (very often single calls detect a corruption, but just opening the database doesn't)
Consumers are likely interested into these events.

I'd guess the notifications should happen on the same thread where the connection was opened, so we can't just use the observer service.

There are 2 possibilities:

1. specific interface per event like { onClose(optionalData), onCorrupt(optionalData) }
2. generic interface like { observe(eventName, optionalData) }

I'm not sure what the data might be, for the 2 given examples, so initially there won't be any data. In future might consider an nsISupports.

Considered the consumer would register for a specific connection, we should not even need to pass it. But I wonder if some consumer might want to register a single observer for multiple connections, in such a case it could be worth to make it

observe(eventName, connection)

Thoughts?
Flags: firefox-backlog+
Points: --- → 5
Flags: needinfo?(bugmail)
I've commented on bug 1091851 directly to better understand the "close" use-case there.

(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #0)
> or when some sqlite API calls returns SQLITE_CORRUPT (very often
> single calls detect a corruption, but just opening the database doesn't)
> Consumers are likely interested into these events.

SQLITE_CORRUPT does seem like the perfect use for a notification for API friendliness.  (Since SQLite won't detect corruption unless we're telling it to do something, there's always going to be a sync or async call whose return value would indicate it.  But it is an ugly thing to have that handler code everywhere.  It basically does demand writing a wrapper around mozStorage!)

> I'd guess the notifications should happen on the same thread where the
> connection was opened, so we can't just use the observer service.

Sounds reasonable to me, especially if we do what I proposed on bug 1121282 so we don't tie consumers down to specific threads for no reason.

I agree that asynchronous dispatches are the right way to go.  The observer service is main-thread only, synchronous, and has its own GC footguns, whereas the dispatch model works better.

> There are 2 possibilities:
> 
> 1. specific interface per event like { onClose(optionalData),
> onCorrupt(optionalData) }
> 2. generic interface like { observe(eventName, optionalData) }

I think a specific interface is desirable for typing control so we can have the IDL and XPConnect protect us from getting JS-implemented methods into a situation where we try and run them on the wrong thread.

(And the consumer would need to explicitly set the callback/listener.)
Flags: needinfo?(bugmail)
No longer blocks: 1091851
How complex would this bug be, is this something on which we could mentor a contributor with some previous experience?
Flags: needinfo?(mak77)
I think we didn't reach consensus about what to do here, we agreed might want some sort of interface to observe corruption, but there's no clear design on how to do that yet.
Flags: needinfo?(mak77)
Thanks, so if someone experienced enough took the bug now, they wouldn't need a lot of mentoring, so definitely not a good mentoring candidate.
Blocks: 1240238
So, I guess we could have something like addConnectionObserver accepting an mozIConnectionObserver that at the beginning will just have an onCorruption method. it will be invoked on the same thread where the connection was open, then it's up to the consumer to do the work on the right thread.
The only thing left to figure is reference ownership, we likely want the observers to implement nsIWeakReference, the connection should not be a strong owner. We should throw if the observer does not implement that.

Alternative ideas?
Blocks: 1220723
Priority: -- → P3
Depends on: 1315897
bonus point: along with the corruption notification, in case of attached dbs, it would be really useful to tell which of the dbs is corrupt... It may be hard, but a consumer using attached dbs couldn't distinguish and everyone would end up having to create their own solution.
== telling which DB is corrupt

Briefly grepping the source, it looks like although there is pervasive use of `return SQLITE_CORRUPT_BKPT` (which expands to a global sqlite3_log call without sqlite3* or extra metadata), nothing really checks the error code.  The code just (early) returns to propagate the error out.  The exception is shell.c whose "dump" command retries with "ORDER BY rowid DESC" appended, presumably with the intent to recover as much data as possible.

Although SQLite could be more specific about which thing is corrupted, arguably if we see any signs of corruption, then we should suspect everything might be corrupted and do a more thorough "PRAGMA integrity_check"  and report any/all corrupt databases.  It looks like the integrity check explicitly checks all database files and the single-column error strings can be parsed in some cases, but it'd probably make sense to ask team SQLite for a variant that takes a per-database error limit instead of a total-across-all-databases limit and returns a "PRAGMA database_list"-like result table of {"seq","name","file","error_count"}.  Or something similar that doesn't complicate SQLite too much but also doesn't require us coordinating opening N separate connections and running a separate integrity_check on each.

== interface

Agreed on the strongly typed IDL interfaces with specific callback methods.  I'd suggest we also define the "when" of the callbacks strictly too, with both Async and Sync observers.  For onCorruption:
- SyncObserver: We synchronously invoke onCorruption before returning.  The onCorruption handler has the ability to alter our return value from NS_ERROR_FILE_CORRUPTED to a new NS_ERROR_STORAGE_HANDLED_CORRUPTION so that it doesn't need to use a side-channel if it wants to indicate to the caller that some type of meaningful handling occurred.
- AsyncObserver: We dispatch the special onCorruption runnable to the owning thread prior to dispatching the mozIStorageStatementCallback's handleError and handleCompletion.

The explicit intent in both cases is that the specialized corruption-handling code gets a chance to run before any cascading errors happen.  In particular, async code can use mozIStoragePendingStatement::cancel, and sync code can issue cancel calls against its own job abstractions too.

Do you still want close notifications?  I know bug 1091851 was fixed without this and I made the argument there that closing shouldn't be a surprise, while corruption is.  While I still think that's a good argument about how state machines should work, if we're adding observers, I can see an argument for close notifications being made for unit testing purposes, for fancy debugging, and for loosely coupled functionality like database-backup or vacuuming.  (Some of these might imply a mozStorageService global observer or something.)

== strong/weak references

Agreed that it would be bad if the observer created a cycle that kept the connection alive forever.  A weak reference seems like a good choice.
Blocks: 969285
Blocks: 1410877
Priority: P3 → P2
Depends on: 1644780
Type: defect → enhancement
Summary: Add an interface allowing to observe a connection → Add an interface allowing to observe a connection and/or gather telemetry, especially database corruption
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.