Closed Bug 855413 Opened 11 years ago Closed 8 years ago

Recover from corrupted databases

Tracking

(firefox20 unaffected, firefox21 wontfix, firefox22- affected, firefox23 affected)

Status:

RESOLVED WONTFIX

Tracking Flags:

Tracking

Status

firefox20

---

unaffected

firefox21

---

wontfix

firefox22

affected

firefox23

---

affected

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [no-nag] p=8)

Attachments

(1 obsolete file)

Recover from corrupt databases; 11 years ago Stefan Mirea [:smirea] 21.08 KB, patch		Details \| Diff \| Splinter Review

Gregory Szorc [:gps]

Reporter

Description

•

11 years ago

One of the many features of FHR that slipped through the cracks was the ability to recover from a corrupted database. Error reporting indicates this currently impacts less than 0.05% of users. Although, you figure this number will grow over time. So, it's important we address this before the total number gets out of hand.

Open question: In case of data loss, do we report in the payload that we had to create a new database? If so, how do we wish to record this?

Gregory Szorc [:gps]

Reporter

Comment 1

•

11 years ago

Bumping up priority because this makes FHR unusable and could result in about:healthreport behaving poorly.

status-firefox20: --- → unaffected

status-firefox21: --- → affected

status-firefox22: --- → affected

status-firefox23: --- → affected

tracking-firefox21: --- → ?

Priority: P2 → P1

u279076

Updated

•

11 years ago

QA Contact: anthony.s.hughes

bhavana bajaj [:bajaj]

Updated

•

11 years ago

tracking-firefox21: ? → +

Mark Reid [:mreid]

Comment 2

•

11 years ago

As a first pass, I propose two payload additions to track corruption:
1. A longitudinal provider that is added on the date when corruption occurred.
2. A top-level field containing the number of times corruption has been detected. Obviously this would have to be stored outside of sqlite.

Gregory Szorc [:gps]

Reporter

Comment 3

•

11 years ago

Do you need all dates when corruption occurred or just the most recent? If it's the latter, that's significantly easier to implement since we can just toss it in the database :)

Gregory Szorc [:gps]

Reporter

Updated

•

11 years ago

Component: Metrics and Firefox Health Report → Client: Desktop

Product: Mozilla Services → Firefox Health Report

Alex Keybl [:akeybl]

Updated

•

11 years ago

Whiteboard: [no-nag]

bhavana bajaj [:bajaj]

Comment 4

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #1)
> Bumping up priority because this makes FHR unusable and could result in
> about:healthreport behaving poorly.

Is this something we are targeting for Fx21 and who would be the right assignee on this ?

Gregory Szorc [:gps]

Reporter

Comment 5

•

11 years ago

This would be on me.

Assignee: nobody → gps

Status: NEW → ASSIGNED

bhavana bajaj [:bajaj]

Comment 6

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #5)
> This would be on me.

Hi :gps, do we have a patch upcoming here to address within Fx 21 timeframe ?

Gregory Szorc [:gps]

Reporter

Comment 7

•

11 years ago

Sorry, this one slipped through the cracks. While we'd like it for Beta 21, we realize it's probably too late in the cycle. We'll aim for 22 in some form.

FWIW, incident rate on the beta channel is less than 0.01%. While this does represent a complete failure, it's not affecting a large number of users.

status-firefox21: affected → wontfix

tracking-firefox21: + → ---

tracking-firefox22: --- → ?

Alex Keybl [:akeybl]

Updated

•

11 years ago

tracking-firefox22: ? → +

Mike Connor [:mconnor]

Updated

•

11 years ago

Priority: P1 → P2

Mike Connor [:mconnor]

Updated

•

11 years ago

Blocks: 853234

Alex Keybl [:akeybl]

Updated

•

11 years ago

tracking-firefox22: + → -

Gregory Szorc [:gps]

Reporter

Updated

•

11 years ago

Depends on: 901755

Stefan Mirea [:smirea]

Comment 8

•

11 years ago

Attached patch Recover from corrupt databases; (obsolete) — Details — Splinter Review

I added an init callback "onExecuteError" as well as 4 methods to Sqlite.jsm to facilitate with database corruption.
It does not alter any current behaviour.

The only issue I had is with properly testing the .reindex() method as it is not that easy to corrupt a database in such a way that .reindex() will actually perform valid changes and leave the database in a clean state.
My question now is - given that .reindex() is just a dignified wrapper for connection.execute('REINDEX') - to what degree is extensive testing required for this method?

The basic workflow in a client should now be:

> let conn = Sqlite.openConnection({
>   onExecuteError: function (error) {
>     try {
>       conn.attemptCorruptionFixIfError(Array.prototype.slice.call(error.errors)).then(
>         null, () => {
>           // Database is irreversably corrupted and has been backed up and closed. 
>           // Perform any other shutdown opperations for the client
>         }
>       );
>     }
>   }
>   // other options
> });

Attachment #789908 - Flags: review?(mak77)

Stefan Mirea [:smirea]

Updated

•

11 years ago

Assignee: gps → smirea

Stefan Mirea [:smirea]

Comment 9

•

11 years ago

Comment on attachment 789908 [details] [diff] [review]
Recover from corrupt databases;

sent patch to the wrong bug. cancelling

Attachment #789908 - Attachment is obsolete: true

Attachment #789908 - Flags: review?(mak77)

Gregory Szorc [:gps]

Reporter

Comment 10

•

10 years ago

Benjamin: The existence of this open bug may surprise you. I suspect the underlying problem has gotten severe enough for us to fix and this should be bumped to a P1.

Fortunately, we had the foresight to have clients upload errors and stack traces as part of payloads so we can identify client-side errors and triage them. (I was doing this pretty actively when FHR first launched but I haven't in the past ~9 months and I don't think anyone has since.)

Anyway, we should be able to look for payloads with { "notInitialized": 1 }. If that's present, the FHR client failed to fully initialize (likely due to DB or similar catastrophic failure) *and that client will likely no longer send meaningful data until the problem is fixed*. Furthermore, payloads may have an "errors" top-level property that is an array of strings containing error messages. I'd *really* like to get a periodic report of aggregated error counts so we can actively triage commonly-occurring issues.

If we're not accounting for these clients in an error state in metrics, it's quite possible our daily users as reported by FHR is slowly declining due to unrecoverable client-side issues (such as SQLite database corruption). I estimate we could be under-reporting by millions.

Flags: needinfo?(benjamin)

Benjamin Smedberg

Updated

•

10 years ago

Assignee: steven.mirea → nobody

Flags: needinfo?(benjamin) → firefox-backlog+

Whiteboard: [no-nag] → [no-nag][p=8]

Marco Mucci [:MarcoM]

Updated

•

10 years ago

Whiteboard: [no-nag][p=8] → [no-nag] p=8

Benjamin Smedberg

Comment 11

•

10 years ago

Current FHR data:
notInitialized: 0.2%

So this is an issue, but not an immediate priority.

Gregory Szorc [:gps]

Reporter

Comment 12

•

10 years ago

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> Current FHR data:
> notInitialized: 0.2%
> 
> So this is an issue, but not an immediate priority.

I believe that number.

But I was also auditing the code path for upload and believe I found an oversight that will cause us to under-report this value. I will file a new bug to fix that.

Status: ASSIGNED → NEW

Gregory Szorc [:gps]

Reporter

Updated

•

10 years ago

Depends on: 1053315

Alessio Placitelli [:Dexter]

Comment 13

•

8 years ago

FHR is going away per bug 1209088, wontfix.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

BMO Automation

Updated

•

6 years ago

Product: Firefox Health Report → Firefox Health Report Graveyard

You need to log in before you can comment on or make changes to this bug.