Closed
Bug 855413
Opened 12 years ago
Closed 9 years ago
Recover from corrupted databases
Categories
(Firefox Health Report Graveyard :: Client: Desktop, defect, P2)
Firefox Health Report Graveyard
Client: Desktop
Tracking
(firefox20 unaffected, firefox21 wontfix, firefox22- affected, firefox23 affected)
RESOLVED
WONTFIX
Tracking | Status | |
---|---|---|
firefox20 | --- | unaffected |
firefox21 | --- | wontfix |
firefox22 | - | affected |
firefox23 | --- | affected |
People
(Reporter: gps, Unassigned)
References
(Depends on 1 open bug)
Details
(Whiteboard: [no-nag] p=8)
Attachments
(1 obsolete file)
One of the many features of FHR that slipped through the cracks was the ability to recover from a corrupted database. Error reporting indicates this currently impacts less than 0.05% of users. Although, you figure this number will grow over time. So, it's important we address this before the total number gets out of hand.
Open question: In case of data loss, do we report in the payload that we had to create a new database? If so, how do we wish to record this?
Reporter | ||
Comment 1•12 years ago
|
||
Bumping up priority because this makes FHR unusable and could result in about:healthreport behaving poorly.
status-firefox20:
--- → unaffected
status-firefox21:
--- → affected
status-firefox22:
--- → affected
status-firefox23:
--- → affected
tracking-firefox21:
--- → ?
Priority: P2 → P1
Updated•12 years ago
|
Comment 2•12 years ago
|
||
As a first pass, I propose two payload additions to track corruption:
1. A longitudinal provider that is added on the date when corruption occurred.
2. A top-level field containing the number of times corruption has been detected. Obviously this would have to be stored outside of sqlite.
Reporter | ||
Comment 3•12 years ago
|
||
Do you need all dates when corruption occurred or just the most recent? If it's the latter, that's significantly easier to implement since we can just toss it in the database :)
Reporter | ||
Updated•12 years ago
|
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Updated•12 years ago
|
Whiteboard: [no-nag]
Comment 4•12 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #1)
> Bumping up priority because this makes FHR unusable and could result in
> about:healthreport behaving poorly.
Is this something we are targeting for Fx21 and who would be the right assignee on this ?
Comment 6•12 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #5)
> This would be on me.
Hi :gps, do we have a patch upcoming here to address within Fx 21 timeframe ?
Reporter | ||
Comment 7•12 years ago
|
||
Sorry, this one slipped through the cracks. While we'd like it for Beta 21, we realize it's probably too late in the cycle. We'll aim for 22 in some form.
FWIW, incident rate on the beta channel is less than 0.01%. While this does represent a complete failure, it's not affecting a large number of users.
Updated•12 years ago
|
Updated•12 years ago
|
Priority: P1 → P2
Updated•12 years ago
|
Comment 8•12 years ago
|
||
I added an init callback "onExecuteError" as well as 4 methods to Sqlite.jsm to facilitate with database corruption.
It does not alter any current behaviour.
The only issue I had is with properly testing the .reindex() method as it is not that easy to corrupt a database in such a way that .reindex() will actually perform valid changes and leave the database in a clean state.
My question now is - given that .reindex() is just a dignified wrapper for connection.execute('REINDEX') - to what degree is extensive testing required for this method?
The basic workflow in a client should now be:
> let conn = Sqlite.openConnection({
> onExecuteError: function (error) {
> try {
> conn.attemptCorruptionFixIfError(Array.prototype.slice.call(error.errors)).then(
> null, () => {
> // Database is irreversably corrupted and has been backed up and closed.
> // Perform any other shutdown opperations for the client
> }
> );
> }
> }
> // other options
> });
Attachment #789908 -
Flags: review?(mak77)
Updated•12 years ago
|
Assignee: gps → smirea
Comment 9•12 years ago
|
||
Comment on attachment 789908 [details] [diff] [review]
Recover from corrupt databases;
sent patch to the wrong bug. cancelling
Attachment #789908 -
Attachment is obsolete: true
Attachment #789908 -
Flags: review?(mak77)
Reporter | ||
Comment 10•11 years ago
|
||
Benjamin: The existence of this open bug may surprise you. I suspect the underlying problem has gotten severe enough for us to fix and this should be bumped to a P1.
Fortunately, we had the foresight to have clients upload errors and stack traces as part of payloads so we can identify client-side errors and triage them. (I was doing this pretty actively when FHR first launched but I haven't in the past ~9 months and I don't think anyone has since.)
Anyway, we should be able to look for payloads with { "notInitialized": 1 }. If that's present, the FHR client failed to fully initialize (likely due to DB or similar catastrophic failure) *and that client will likely no longer send meaningful data until the problem is fixed*. Furthermore, payloads may have an "errors" top-level property that is an array of strings containing error messages. I'd *really* like to get a periodic report of aggregated error counts so we can actively triage commonly-occurring issues.
If we're not accounting for these clients in an error state in metrics, it's quite possible our daily users as reported by FHR is slowly declining due to unrecoverable client-side issues (such as SQLite database corruption). I estimate we could be under-reporting by millions.
Flags: needinfo?(benjamin)
Updated•11 years ago
|
Assignee: steven.mirea → nobody
Flags: needinfo?(benjamin) → firefox-backlog+
Whiteboard: [no-nag] → [no-nag][p=8]
Updated•11 years ago
|
Whiteboard: [no-nag][p=8] → [no-nag] p=8
Comment 11•11 years ago
|
||
Current FHR data:
notInitialized: 0.2%
So this is an issue, but not an immediate priority.
Reporter | ||
Comment 12•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #11)
> Current FHR data:
> notInitialized: 0.2%
>
> So this is an issue, but not an immediate priority.
I believe that number.
But I was also auditing the code path for upload and believe I found an oversight that will cause us to under-report this value. I will file a new bug to fix that.
Status: ASSIGNED → NEW
Comment 13•9 years ago
|
||
FHR is going away per bug 1209088, wontfix.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
|
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•