Closed Bug 855413 Opened 11 years ago Closed 8 years ago

Recover from corrupted databases

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect, P2)

defect

Tracking

(firefox20 unaffected, firefox21 wontfix, firefox22- affected, firefox23 affected)

RESOLVED WONTFIX
Tracking Status
firefox20 --- unaffected
firefox21 --- wontfix
firefox22 - affected
firefox23 --- affected

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [no-nag] p=8)

Attachments

(1 obsolete file)

One of the many features of FHR that slipped through the cracks was the ability to recover from a corrupted database. Error reporting indicates this currently impacts less than 0.05% of users. Although, you figure this number will grow over time. So, it's important we address this before the total number gets out of hand.

Open question: In case of data loss, do we report in the payload that we had to create a new database? If so, how do we wish to record this?
Bumping up priority because this makes FHR unusable and could result in about:healthreport behaving poorly.
QA Contact: anthony.s.hughes
As a first pass, I propose two payload additions to track corruption:
1. A longitudinal provider that is added on the date when corruption occurred.
2. A top-level field containing the number of times corruption has been detected. Obviously this would have to be stored outside of sqlite.
Do you need all dates when corruption occurred or just the most recent? If it's the latter, that's significantly easier to implement since we can just toss it in the database :)
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Whiteboard: [no-nag]
(In reply to Gregory Szorc [:gps] from comment #1)
> Bumping up priority because this makes FHR unusable and could result in
> about:healthreport behaving poorly.

Is this something we are targeting for Fx21 and who would be the right assignee on this ?
This would be on me.
Assignee: nobody → gps
Status: NEW → ASSIGNED
(In reply to Gregory Szorc [:gps] from comment #5)
> This would be on me.

Hi :gps, do we have a patch upcoming here to address within Fx 21 timeframe ?
Sorry, this one slipped through the cracks. While we'd like it for Beta 21, we realize it's probably too late in the cycle. We'll aim for 22 in some form.

FWIW, incident rate on the beta channel is less than 0.01%. While this does represent a complete failure, it's not affecting a large number of users.
Priority: P1 → P2
Blocks: 853234
Depends on: 901755
Attached patch Recover from corrupt databases; (obsolete) — Splinter Review
I added an init callback "onExecuteError" as well as 4 methods to Sqlite.jsm to facilitate with database corruption.
It does not alter any current behaviour.

The only issue I had is with properly testing the .reindex() method as it is not that easy to corrupt a database in such a way that .reindex() will actually perform valid changes and leave the database in a clean state.
My question now is - given that .reindex() is just a dignified wrapper for connection.execute('REINDEX') - to what degree is extensive testing required for this method?

The basic workflow in a client should now be:

> let conn = Sqlite.openConnection({
>   onExecuteError: function (error) {
>     try {
>       conn.attemptCorruptionFixIfError(Array.prototype.slice.call(error.errors)).then(
>         null, () => {
>           // Database is irreversably corrupted and has been backed up and closed. 
>           // Perform any other shutdown opperations for the client
>         }
>       );
>     }
>   }
>   // other options
> });
Attachment #789908 - Flags: review?(mak77)
Assignee: gps → smirea
Comment on attachment 789908 [details] [diff] [review]
Recover from corrupt databases;

sent patch to the wrong bug. cancelling
Attachment #789908 - Attachment is obsolete: true
Attachment #789908 - Flags: review?(mak77)
Benjamin: The existence of this open bug may surprise you. I suspect the underlying problem has gotten severe enough for us to fix and this should be bumped to a P1.

Fortunately, we had the foresight to have clients upload errors and stack traces as part of payloads so we can identify client-side errors and triage them. (I was doing this pretty actively when FHR first launched but I haven't in the past ~9 months and I don't think anyone has since.)

Anyway, we should be able to look for payloads with { "notInitialized": 1 }. If that's present, the FHR client failed to fully initialize (likely due to DB or similar catastrophic failure) *and that client will likely no longer send meaningful data until the problem is fixed*. Furthermore, payloads may have an "errors" top-level property that is an array of strings containing error messages. I'd *really* like to get a periodic report of aggregated error counts so we can actively triage commonly-occurring issues.

If we're not accounting for these clients in an error state in metrics, it's quite possible our daily users as reported by FHR is slowly declining due to unrecoverable client-side issues (such as SQLite database corruption). I estimate we could be under-reporting by millions.
Flags: needinfo?(benjamin)
Assignee: steven.mirea → nobody
Flags: needinfo?(benjamin) → firefox-backlog+
Whiteboard: [no-nag] → [no-nag][p=8]
Whiteboard: [no-nag][p=8] → [no-nag] p=8
Current FHR data:
notInitialized: 0.2%

So this is an issue, but not an immediate priority.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> Current FHR data:
> notInitialized: 0.2%
> 
> So this is an issue, but not an immediate priority.

I believe that number.

But I was also auditing the code path for upload and believe I found an oversight that will cause us to under-report this value. I will file a new bug to fix that.
Status: ASSIGNED → NEW
Depends on: 1053315
FHR is going away per bug 1209088, wontfix.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: