Closed Bug 846133 Opened 12 years ago Closed 12 years ago

Store FHR state in files, not prefs

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect, P2)

Product:

Component:

Type:

defect

Priority:

P2

Severity:

normal

Tracking

(firefox22+ verified)

Status:

RESOLVED FIXED

Milestone:

Firefox 23

Tracking Flags:

Tracking

Status

firefox22

+

verified

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(1 file, 2 obsolete files)

Store state in file, v1 12 years ago Gregory Szorc [:gps] 19.91 KB, patch	rnewman : feedback+	Details \| Diff \| Splinter Review
Refactor state, v2 12 years ago Gregory Szorc [:gps] 36.33 KB, patch	rnewman : review+	Details \| Diff \| Splinter Review
Refactor state, v3 12 years ago Gregory Szorc [:gps] 37.46 KB, patch	rnewman : review+ akeybl : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Gregory Szorc [:gps]

Assignee

Description

•

12 years ago

Currently the bulk of FHR's state is stored in preferences. Since we don't force a pref flush, we are susceptible to data loss (as explained in bug 842824 comment #7). I believe the preference with the most to lose here is the last submitted document ID. Consider this scenario: 1) Client uploads document with ID A. 2) Client uploads document with ID B and requests A be deleted. B is recorded as last document in prefs. 3) Application crash or similar causes pref change to not be persisted. 4) Client restarted. Last document ID is A (not B). 5) Client uploads document C, asks to delete A. Document B is orphaned on the server forever. To some degree, the documents on the server can be de-duplicated. However, this isn't an exact science. I'm pretty sure the Metrics folk would prefer it if the scope of the problem were reduced as much as possible. I think a simple solution here is to store the document ID (and possibly other state) in the database. The database is already there: why not? Alternatively, we could write out a JSON file in the profile directory. We could also maintain a history of document IDs and periodically issue server deletes. But this feels suboptimal. I'm trying to think if any other prefs really need durable storage. Sure, we could store last submit time and next scheduled time in the DB. But if these updates are lost and the client submits a new payload earlier than expected, so what? Chalk this bug up as another tick on the "we need better preferences" tally.

Richard Newman [:rnewman]

Comment 1

•

12 years ago

Store each uploaded document on disk, with its ID as its name. When a deletion succeeds, delete the document on disk. If you wish, you can truncate the non-latest file to avoid taking up space, and to use as a sentinel. If the app dies or deletion fails, no problem: the file (and thus ID) is still on disk. In most cases only one document would persist, but this scheme allows for this flavor of recoverability without any additional writes.

Gregory Szorc [:gps]

Assignee

Comment 2

•

12 years ago

(In reply to Richard Newman [:rnewman] from comment #1) > Store each uploaded document on disk, with its ID as its name. > > When a deletion succeeds, delete the document on disk. > > If you wish, you can truncate the non-latest file to avoid taking up space, > and to use as a sentinel. > > If the app dies or deletion fails, no problem: the file (and thus ID) is > still on disk. > > In most cases only one document would persist, but this scheme allows for > this flavor of recoverability without any additional writes. Interesting idea. A variation would be to record the payload in a constant-named file and create empty files with the non-deleted document IDs. In both scenarios, how would you record the last document ID? File time? Size > 0?

Gregory Szorc [:gps]

Assignee

Comment 3

•

12 years ago

We also need to consider that the scenario described in the initial comment is a subset of the general one where the server receives the document but the client doesn't get the response. This is because the client doesn't update the last document ID until a server response is received. If we want robustness, we need to record 2 writes (possibly and definitely uploaded).

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 4

•

12 years ago

This sounds like a worthwhile enhancement. One further thing to consider that we could look at on the server-side is to make sure we support sending more than one Obsolete-Document header in a submission. That way you could re-try deletes without adding more calls to the server.

Richard Newman [:rnewman]

Updated

•

12 years ago

Priority: -- → P4

Gregory Szorc [:gps]

Assignee

Comment 5

•

12 years ago

This is item #2 from the FHR perf review: 2. Flushing prefs FHR currently uses browser prefs to store data such as lastPingTime, lastSubmitID, last submission success, etc. It will need to explicitly flush the prefs store to disk for this information to survive a crash. Currently, prefs can only be flushed on the main thread, and this is likely to be a source of jank as typical pref files are sizeable.

Blocks: 828152

(dormant account)

Comment 6

•

12 years ago

Switching to writing to a separate file should increase robustness and reduce overall IO. I think this should be higher priority

Richard Newman [:rnewman]

Comment 7

•

12 years ago

(In reply to Taras Glek (:taras) from comment #6) > Switching to writing to a separate file should increase robustness and > reduce overall IO. I think this should be higher priority P4 in our current triage scheme means "less important than an initial Android release of FHR", which is P3. P1 means beta blocker, P2 more important than Android. We're not parking this bug to rust under a tree -- indeed, it'll be partly solved by the Android work -- but addressing it would have less impact than work elsewhere, so it's not at the top of the list. Is there a compelling business reason that I've missed to tackle this over the other bugs to which we've accorded a higher priority?

Gregory Szorc [:gps]

Assignee

Updated

•

12 years ago

Component: Metrics and Firefox Health Report → Client: Desktop

Product: Mozilla Services → Firefox Health Report

Gregory Szorc [:gps]

Assignee

Comment 8

•

12 years ago

Attached patch Store state in file, v1 (obsolete) — Details — Splinter Review

Enough with preferences. This is my first stab at saving state in a file. It needs more tests before r?.

Assignee: nobody → gps

Status: NEW → ASSIGNED

Attachment #742090 - Flags: feedback?(rnewman)

Richard Newman [:rnewman]

Comment 9

•

12 years ago

Comment on attachment 742090 [details] [diff] [review] Store state in file, v1 Review of attachment 742090 [details] [diff] [review]: ----------------------------------------------------------------- Generally thumbs up, but seems like you might want to put all of the state/file handling stuff in its own proto object that knows how to pickle and unpickle. ::: services/common/bagheeraclient.js @@ +189,5 @@ > > let result = new BagheeraClientRequestResult(); > result.namespace = namespace; > result.id = id; > + result.deleteID = deleteID; Why not save the earlier work and just do if (options.deleteID) { result.deleteID = options.deleteID; } here? ::: services/healthreport/healthreporter.jsm @@ +917,5 @@ > + if (!ids || !ids.length) { > + return null; > + } > + > + return ids[ids.length - 1]; Use ids[0] and .unshift instead of .push later. @@ +1053,5 @@ > + // If the state file doesn't exist, we should attempt to migrate from > + // preferences. > + let lastID = this._prefs.get("lastSubmitID", null) || null; > + let lastPingDate = CommonUtils.getDatePref(this._prefs, "lastPingTime", > + 0, this._log, OLDEST_ALLOWED_YEAR); Split all the prefs stuff into a separate migration method. Easier to test, no? @@ +1218,5 @@ > > let histogram = Services.telemetry.getHistogramById(TELEMETRY_PAYLOAD_SIZE_UNCOMPRESSED); > histogram.add(payload.length); > > + let lastID = this._remoteState.remoteIDs[this._remoteState.remoteIDs.length - 1]; Similarly, 0.

Attachment #742090 - Flags: feedback?(rnewman) → feedback+

Gregory Szorc [:gps]

Assignee

Comment 10

•

12 years ago

Attached patch Refactor state, v2 (obsolete) — Details — Splinter Review

I'm pretty happy with the test coverage. Handling of multiple IDs still isn't as robust as I would like. But, at least we now handle multiple IDs! I think I'm content with keeping multiple ID handling a non-robust until server can deliver support for multiple document deletion.

Attachment #742090 - Attachment is obsolete: true

Attachment #747715 - Flags: review?(rnewman)

Richard Newman [:rnewman]

Comment 11

•

12 years ago

Comment on attachment 747715 [details] [diff] [review] Refactor state, v2 Review of attachment 747715 [details] [diff] [review]: ----------------------------------------------------------------- ::: services/healthreport/healthreporter.jsm @@ +127,5 @@ > + } > + > + // If the state file does not exist, we attempt to migrate from > + // preferences (which is where we stored state until we introduced the > + // file. Closing paren. @@ +132,5 @@ > + this._log.warn("Saved state file does not exist."); > + > + let prefs = this._reporter._prefs; > + > + let lastID = prefs.get("lastSubmitID", null) || null; Presumably your || null is to catch zero or empty values? If so, please add a comment to that effect. @@ +152,5 @@ > + this._s.lastPingTime = lastPingDate.getTime(); > + } > + > + yield this.save(); > + prefs.reset(["lastSubmitID", "lastPingTime"]); My concern here is how we handle users who upgrade and downgrade the same profile. It happens. How might we detect this? What if we have a state file but we still have the prefs? Can we do anything? @@ +190,5 @@ > + }.bind(this)); > + }, > + > + save: function () { > + this._log.info("Writing state file."); + this._filename ::: services/healthreport/tests/xpcshell/test_healthreporter.js @@ +623,5 @@ > + let id = reporter.lastSubmitID; > + > + reporter._shutdown(); > + > + //Ensure reloading state works. Space after //.

Attachment #747715 - Flags: review?(rnewman) → review+

Gregory Szorc [:gps]

Assignee

Comment 12

•

12 years ago

Attached patch Refactor state, v3 — Details — Splinter Review

It changed enough that I'm requesting another review.

Attachment #747715 - Attachment is obsolete: true

Attachment #748067 - Flags: review?(rnewman)

Gregory Szorc [:gps]

Assignee

Comment 13

•

12 years ago

This is more important than a P4.

Priority: P4 → P2

Richard Newman [:rnewman]

Updated

•

12 years ago

Attachment #748067 - Flags: review?(rnewman) → review+

Gregory Szorc [:gps]

Assignee

Comment 14

•

12 years ago

https://hg.mozilla.org/services/services-central/rev/9d7444c9f21b

Whiteboard: [fixed in services]

Gregory Szorc [:gps]

Assignee

Comment 15

•

12 years ago

There is a strong possibility we'd like this uplifted to 22. We won't have enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm thinking maybe 22b2 or 22b3 should be the target. I'll request formal approval later.

tracking-firefox22: --- → ?

Gregory Szorc [:gps]

Assignee

Updated

•

12 years ago

Summary: Consider more robust storage of FHR document ID and other state → Store FHR state in files, not prefs

Alex Keybl [:akeybl]

Comment 16

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #15) > There is a strong possibility we'd like this uplifted to 22. We won't have > enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm > thinking maybe 22b2 or 22b3 should be the target. I'll request formal > approval later. Note that if this format change lands while we're on Beta and causes a late regression, we would not necessarily slip our release date to fix FHR. Is this bug still important to you?

Gregory Szorc [:gps]

Assignee

Comment 17

•

12 years ago

(In reply to Alex Keybl [:akeybl] from comment #16) > (In reply to Gregory Szorc [:gps] from comment #15) > > There is a strong possibility we'd like this uplifted to 22. We won't have > > enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm > > thinking maybe 22b2 or 22b3 should be the target. I'll request formal > > approval later. > > Note that if this format change lands while we're on Beta and causes a late > regression, we would not necessarily slip our release date to fix FHR. Is > this bug still important to you? That's why I'd like to get it in b2 or b3. That should give us a week or two to verify the fix and back out if needed. That's doable, right?

Gregory Szorc [:gps]

Assignee

Comment 18

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/9d7444c9f21b

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Whiteboard: [fixed in services]

Target Milestone: --- → Firefox 23

Alex Keybl [:akeybl]

Comment 19

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #17) > > Note that if this format change lands while we're on Beta and causes a late > > regression, we would not necessarily slip our release date to fix FHR. Is > > this bug still important to you? > > That's why I'd like to get it in b2 or b3. That should give us a week or two > to verify the fix and back out if needed. That's doable, right? It's theoretically doable, but we need to be explicitly saying that this is worth the risk of a critical metrics gathering regression in FHR that we would not respin the release for. And we'd also be committing to the fact that 3 weeks is enough to know whether or not this change caused a client-side regression. This patch doesn't meet our normal bar, so it really needs to be Priority 0. And this would have to land for b2 (going to build Tuesday).

Alex Keybl [:akeybl]

Updated

•

12 years ago

Flags: needinfo?(gps)

Gregory Szorc [:gps]

Assignee

Comment 20

•

12 years ago

Brendan: Do you have any data?

Flags: needinfo?(gps) → needinfo?(bcolloran)

Alex Keybl [:akeybl]

Comment 21

•

12 years ago

This will miss Firefox 22 beta as of this afternoon.

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 22

•

12 years ago

Alex, If we request that it be pushed based on our analysis of the data in Nightly/Aurora, but there does turn out to be a detectable problem with it in the beta channel within the next three weeks, would it still be possible to back the change out and go with the original behavior for 22 release? I'm unclear whether we can still revert the change during the last three weeks of beta or if it has to stay in good or bad all the way through to release.

Alex Keybl [:akeybl]

Comment 23

•

12 years ago

Comment on attachment 748067 [details] [diff] [review] Refactor state, v3 Spoke with Daniel. We agreed that we'd get this in for Beta 2 (this needs to land today) given the current risk profile and benefit, and back out before Beta 6 if we run into regressions. If we find non-user impacting issues post-release, we would not spin a 22.0.1 specifically for this change.

Attachment #748067 - Flags: approval-mozilla-beta+

Alex Keybl [:akeybl]

Updated

•

12 years ago

tracking-firefox22: ? → +

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 24

•

12 years ago

Based on rough preliminary analysis of Nightly data, and gps's input on the patch, I'd like to request approval to land on beta. 2 weeks is a good amount of time for us to study the beta data carefully to look for any regression and request a backout. We won't request any modification of the patch in that case, just a backout. If it turns out there is an undetected regression that makes it to release, we'll deal with it without expecting a respin.

Gregory Szorc [:gps]

Assignee

Comment 25

•

12 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/9cebaf242de3

status-firefox22: --- → fixed

Simona Badau, Desktop QA

Comment 26

•

12 years ago

Is anything Qa can verify here? I see that this is covered by automation. Please let me know if manual testing is also needed around this (please also offer some guidance).

Gregory Szorc [:gps]

Assignee

Comment 27

•

12 years ago

I'd appreciate a basic spot check that FHR continues to submit data both with a fresh profile and after upgrading a previously-submitted Fx 21 profile to 22b2+. Although, we should be able to verify both easily enough once 22b2 is released to the wild via Hadoop jobs.

Simona Badau, Desktop QA

Comment 28

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #27) > I'd appreciate a basic spot check that FHR continues to submit data both > with a fresh profile and after upgrading a previously-submitted Fx 21 > profile to 22b2+. Is there a way to see the the files where the FHR state are stored? Where can I find them?

Richard Newman [:rnewman]

Comment 29

•

12 years ago

(In reply to Simona B [QA] from comment #28) > Is there a way to see the the files where the FHR state are stored? Where > can I find them? $PROFILE/healthreport/state.json. It should look something like {"v":1,"remoteIDs":["f7c63a13-ae66-9f47-aaaa-ddddd580b5bf"],"lastPingTime":1369515206887}

Simona Badau, Desktop QA

Comment 30

•

12 years ago

Considering the time the patch was submitted, I'll verify the fix on Firefox 22 beta 3 (I can't see the state.json file in Firefox beta 2).

Simona Badau, Desktop QA

Comment 31

•

12 years ago

Mozilla/5.0 (Windows NT 6.0; rv:22.0) Gecko/20100101 Firefox/22.0 Build ID: 20130528181031 Verified as fixed on Firefox 22 beta 3: - FHR continues to submit data (verified as suggested in Comment 27 - FHR states are stored in files (used the indications from Comment 29).

status-firefox22: fixed → verified

Updated

•

12 years ago

Flags: needinfo?(bcolloran)

Gregory Szorc [:gps]

Assignee

Updated

•

12 years ago

Blocks: 883277

Matthew N. [:MattN]

Updated

•

10 years ago

Depends on: 1123137

Updated

•

7 years ago

Product: Firefox Health Report → Firefox Health Report Graveyard

You need to log in before you can comment on or make changes to this bug.