Closed Bug 1053315 Opened 10 years ago Closed 10 years ago

FHR stops reporting if it fails to initialize

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 34

People

(Reporter: gps, Assigned: gps)

References

Details

(Keywords: regression)

Attachments

(1 file)

In bug 855413 comment 11, bsmedberg says that 0.3% of FHR submissions are setting the "notInitialized" flag. This flag is set when the client fails to initialize. We send a skeleton payload in this scenario so the server has a rough idea of how many clients are in an error state. I believe I've found a bug introduced by bug 854018 that will cause us to not submit payloads in the case of initialization failure. Here's the relevant code for doing the upload: return Task.spawn(function doUpload() { try { // The test for upload locking monkeypatches getJSONPayload. // If the next two lines change, be sure to verify the test is // accurate! this._uploadInProgress = true; let payload = yield this.getJSONPayload(); let histogram = Services.telemetry.getHistogramById(TELEMETRY_PAYLOAD_SIZE_UNCOMPRESSED); histogram.add(payload.length); let lastID = this.lastSubmitID; yield this._state.addRemoteID(id); <----- Bug starts here let hrProvider = this.getProvider("org.mozilla.healthreport"); if (hrProvider) { let event = lastID ? "continuationUploadAttempt" : "firstDocumentUploadAttempt"; hrProvider.recordEvent(event, now); } <----- Bug ends here TelemetryStopwatch.start(TELEMETRY_UPLOAD, this); let result; try { let options = { deleteIDs: this._state.remoteIDs.filter((x) => { return x != id; }), telemetryCompressed: TELEMETRY_PAYLOAD_SIZE_COMPRESSED, }; result = yield client.uploadJSON(this.serverNamespace, id, payload, options); TelemetryStopwatch.finish(TELEMETRY_UPLOAD, this); } catch (ex) { TelemetryStopwatch.cancel(TELEMETRY_UPLOAD, this); if (hrProvider) { hrProvider.recordEvent("uploadClientFailure", now); } throw ex; } yield this._onBagheeraResult(request, false, now, result); } finally { this._uploadInProgress = false; } }.bind(this)); The bug is the code for recording the upload attempt metric assumes that FHR is initialized and working properly. If FHR failed to initialize (such as the SQLite database being corrupt), that code will raise and we'll never upload. So, if FHR fails to initialize, we'll stop submitting payloads for these clients. It will look to metrics like people stopped using Firefox. This has been a bug since Firefox 23. Assuming this bug is legit, the only way for us to know the impact is if a) we fix it b) we compare FHR reporting against other pings, such as the blocklist.
Flags: firefox-backlog+
When I initially filed this, I thought it could be wide-impacting. I'm now having second thoughts. More accurately, I'm not sure what the impact is. The question resides in whether the FHR provider gets registered. Registration requires a SQLite connection. So, if we fail to establish a connection to SQLite at all, we're good: we don't have a bug. However, if we do manage to bind to SQLite and get the provider registered and the INSERT into SQLite throws, we get the bug. So, our surface area isn't as large as I initially thought. However, a lot of SQLite corruption issues aren't detected until we attempt to write to the database or read certain pages (opening the database can work just fine). Early in FHR days (~Firefox 21) when I was looking at stacks submitted from clients, we were seeing these SQLite errors while writing individual FHR events. So, I think we still have sufficient exposure to this bug to warrant fixing and uplift.
If recording FHR data during uploading raised an exception, it could potentially abort the upload. This would appear to Mozilla as clients that suddenly stopped using Firefox. This patch adds explicit exception trapping around event record to ensure this doesn't happen.
Attachment #8472504 - Flags: review?(benjamin)
Assignee: nobody → gps
Status: NEW → ASSIGNED
Hi Gregory, can you provide a point value and if the bug should be marked as [qa+] or [qa-] for verification.
Iteration: --- → 34.2
QA Whiteboard: [qa?]
Flags: needinfo?(gps)
Automated test coverage of this should be fine.
Points: --- → 1
QA Whiteboard: [qa?] → [qa-]
Flags: needinfo?(gps)
Attachment #8472504 - Flags: review?(benjamin) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 34
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: