If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Store FHR state in files, not prefs

RESOLVED FIXED in Firefox 22

Status

Firefox Health Report
Client: Desktop
P2
normal
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: gps, Assigned: gps)

Tracking

unspecified
Firefox 23
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox22+ verified)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

5 years ago
Currently the bulk of FHR's state is stored in preferences. Since we don't force a pref flush, we are susceptible to data loss (as explained in bug 842824 comment #7).

I believe the preference with the most to lose here is the last submitted document ID. Consider this scenario:

1) Client uploads document with ID A.
2) Client uploads document with ID B and requests A be deleted. B is recorded as last document in prefs.
3) Application crash or similar causes pref change to not be persisted.
4) Client restarted. Last document ID is A (not B).
5) Client uploads document C, asks to delete A. Document B is orphaned on the server forever.

To some degree, the documents on the server can be de-duplicated. However, this isn't an exact science. I'm pretty sure the Metrics folk would prefer it if the scope of the problem were reduced as much as possible.

I think a simple solution here is to store the document ID (and possibly other state) in the database. The database is already there: why not? Alternatively, we could write out a JSON file in the profile directory.

We could also maintain a history of document IDs and periodically issue server deletes. But this feels suboptimal.

I'm trying to think if any other prefs really need durable storage. Sure, we could store last submit time and next scheduled time in the DB. But if these updates are lost and the client submits a new payload earlier than expected, so what?

Chalk this bug up as another tick on the "we need better preferences" tally.
Store each uploaded document on disk, with its ID as its name.

When a deletion succeeds, delete the document on disk.

If you wish, you can truncate the non-latest file to avoid taking up space, and to use as a sentinel.

If the app dies or deletion fails, no problem: the file (and thus ID) is still on disk.

In most cases only one document would persist, but this scheme allows for this flavor of recoverability without any additional writes.
(Assignee)

Comment 2

5 years ago
(In reply to Richard Newman [:rnewman] from comment #1)
> Store each uploaded document on disk, with its ID as its name.
> 
> When a deletion succeeds, delete the document on disk.
> 
> If you wish, you can truncate the non-latest file to avoid taking up space,
> and to use as a sentinel.
> 
> If the app dies or deletion fails, no problem: the file (and thus ID) is
> still on disk.
> 
> In most cases only one document would persist, but this scheme allows for
> this flavor of recoverability without any additional writes.

Interesting idea. A variation would be to record the payload in a constant-named file and create empty files with the non-deleted document IDs.

In both scenarios, how would you record the last document ID? File time? Size > 0?
(Assignee)

Comment 3

5 years ago
We also need to consider that the scenario described in the initial comment is a subset of the general one where the server receives the document but the client doesn't get the response. This is because the client doesn't update the last document ID until a server response is received. If we want robustness, we need to record 2 writes (possibly and definitely uploaded).
This sounds like a worthwhile enhancement.  One further thing to consider that we could look at on the server-side is to make sure we support sending more than one Obsolete-Document header in a submission.  That way you could re-try deletes without adding more calls to the server.
Priority: -- → P4
(Assignee)

Comment 5

5 years ago
This is item #2 from the FHR perf review:

2. Flushing prefs FHR currently uses browser prefs to store data such as lastPingTime, lastSubmitID, last submission success, etc. It will need to explicitly flush the prefs store to disk for this information to survive a crash. Currently, prefs can only be flushed on the main thread, and this is likely to be a source of jank as typical pref files are sizeable.
Blocks: 828152

Comment 6

5 years ago
Switching to writing to a separate file should increase robustness and reduce overall IO. I think this should be higher priority
(In reply to Taras Glek (:taras) from comment #6)
> Switching to writing to a separate file should increase robustness and
> reduce overall IO. I think this should be higher priority

P4 in our current triage scheme means "less important than an initial Android release of FHR", which is P3. P1 means beta blocker, P2 more important than Android.

We're not parking this bug to rust under a tree -- indeed, it'll be partly solved by the Android work -- but addressing it would have less impact than work elsewhere, so it's not at the top of the list.

Is there a compelling business reason that I've missed to tackle this over the other bugs to which we've accorded a higher priority?
(Assignee)

Updated

5 years ago
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
(Assignee)

Comment 8

5 years ago
Created attachment 742090 [details] [diff] [review]
Store state in file, v1

Enough with preferences. This is my first stab at saving state in a file. It needs more tests before r?.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #742090 - Flags: feedback?(rnewman)
Comment on attachment 742090 [details] [diff] [review]
Store state in file, v1

Review of attachment 742090 [details] [diff] [review]:
-----------------------------------------------------------------

Generally thumbs up, but seems like you might want to put all of the state/file handling stuff in its own proto object that knows how to pickle and unpickle.

::: services/common/bagheeraclient.js
@@ +189,5 @@
>  
>      let result = new BagheeraClientRequestResult();
>      result.namespace = namespace;
>      result.id = id;
> +    result.deleteID = deleteID;

Why not save the earlier work and just do

  if (options.deleteID) {
    result.deleteID = options.deleteID;
  }

here?

::: services/healthreport/healthreporter.jsm
@@ +917,5 @@
> +    if (!ids || !ids.length) {
> +      return null;
> +    }
> +
> +    return ids[ids.length - 1];

Use

  ids[0]

and .unshift instead of .push later.

@@ +1053,5 @@
> +        // If the state file doesn't exist, we should attempt to migrate from
> +        // preferences.
> +        let lastID = this._prefs.get("lastSubmitID", null) || null;
> +        let lastPingDate = CommonUtils.getDatePref(this._prefs, "lastPingTime",
> +                                                     0, this._log, OLDEST_ALLOWED_YEAR);

Split all the prefs stuff into a separate migration method. Easier to test, no?

@@ +1218,5 @@
>  
>        let histogram = Services.telemetry.getHistogramById(TELEMETRY_PAYLOAD_SIZE_UNCOMPRESSED);
>        histogram.add(payload.length);
>  
> +      let lastID = this._remoteState.remoteIDs[this._remoteState.remoteIDs.length - 1];

Similarly, 0.
Attachment #742090 - Flags: feedback?(rnewman) → feedback+
(Assignee)

Comment 10

4 years ago
Created attachment 747715 [details] [diff] [review]
Refactor state, v2

I'm pretty happy with the test coverage.

Handling of multiple IDs still isn't as robust as I would like. But, at least we now handle multiple IDs! I think I'm content with keeping multiple ID handling a non-robust until server can deliver support for multiple document deletion.
Attachment #742090 - Attachment is obsolete: true
Attachment #747715 - Flags: review?(rnewman)
Comment on attachment 747715 [details] [diff] [review]
Refactor state, v2

Review of attachment 747715 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/healthreport/healthreporter.jsm
@@ +127,5 @@
> +        }
> +
> +        // If the state file does not exist, we attempt to migrate from
> +        // preferences (which is where we stored state until we introduced the
> +        // file.

Closing paren.

@@ +132,5 @@
> +        this._log.warn("Saved state file does not exist.");
> +
> +        let prefs = this._reporter._prefs;
> +
> +        let lastID = prefs.get("lastSubmitID", null) || null;

Presumably your || null is to catch zero or empty values?

If so, please add a comment to that effect.

@@ +152,5 @@
> +            this._s.lastPingTime = lastPingDate.getTime();
> +          }
> +
> +          yield this.save();
> +          prefs.reset(["lastSubmitID", "lastPingTime"]);

My concern here is how we handle users who upgrade and downgrade the same profile. It happens. How might we detect this? What if we have a state file but we still have the prefs? Can we do anything?

@@ +190,5 @@
> +    }.bind(this));
> +  },
> +
> +  save: function () {
> +    this._log.info("Writing state file.");

+ this._filename

::: services/healthreport/tests/xpcshell/test_healthreporter.js
@@ +623,5 @@
> +    let id = reporter.lastSubmitID;
> +
> +    reporter._shutdown();
> +
> +    //Ensure reloading state works.

Space after //.
Attachment #747715 - Flags: review?(rnewman) → review+
(Assignee)

Comment 12

4 years ago
Created attachment 748067 [details] [diff] [review]
Refactor state, v3

It changed enough that I'm requesting another review.
Attachment #747715 - Attachment is obsolete: true
Attachment #748067 - Flags: review?(rnewman)
(Assignee)

Comment 13

4 years ago
This is more important than a P4.
Priority: P4 → P2
Attachment #748067 - Flags: review?(rnewman) → review+
(Assignee)

Comment 14

4 years ago
https://hg.mozilla.org/services/services-central/rev/9d7444c9f21b
Whiteboard: [fixed in services]
(Assignee)

Comment 15

4 years ago
There is a strong possibility we'd like this uplifted to 22. We won't have enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm thinking maybe 22b2 or 22b3 should be the target. I'll request formal approval later.
tracking-firefox22: --- → ?
(Assignee)

Updated

4 years ago
Summary: Consider more robust storage of FHR document ID and other state → Store FHR state in files, not prefs
(In reply to Gregory Szorc [:gps] from comment #15)
> There is a strong possibility we'd like this uplifted to 22. We won't have
> enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm
> thinking maybe 22b2 or 22b3 should be the target. I'll request formal
> approval later.

Note that if this format change lands while we're on Beta and causes a late regression, we would not necessarily slip our release date to fix FHR. Is this bug still important to you?
(Assignee)

Comment 17

4 years ago
(In reply to Alex Keybl [:akeybl] from comment #16)
> (In reply to Gregory Szorc [:gps] from comment #15)
> > There is a strong possibility we'd like this uplifted to 22. We won't have
> > enough time for this to bake on Nightly/23 before 22 becomes Beta. So, I'm
> > thinking maybe 22b2 or 22b3 should be the target. I'll request formal
> > approval later.
> 
> Note that if this format change lands while we're on Beta and causes a late
> regression, we would not necessarily slip our release date to fix FHR. Is
> this bug still important to you?

That's why I'd like to get it in b2 or b3. That should give us a week or two to verify the fix and back out if needed. That's doable, right?
(Assignee)

Comment 18

4 years ago
https://hg.mozilla.org/mozilla-central/rev/9d7444c9f21b
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in services]
Target Milestone: --- → Firefox 23
(In reply to Gregory Szorc [:gps] from comment #17)
> > Note that if this format change lands while we're on Beta and causes a late
> > regression, we would not necessarily slip our release date to fix FHR. Is
> > this bug still important to you?
> 
> That's why I'd like to get it in b2 or b3. That should give us a week or two
> to verify the fix and back out if needed. That's doable, right?

It's theoretically doable, but we need to be explicitly saying that this is worth the risk of a critical metrics gathering regression in FHR that we would not respin the release for. And we'd also be committing to the fact that 3 weeks is enough to know whether or not this change caused a client-side regression.

This patch doesn't meet our normal bar, so it really needs to be Priority 0. And this would have to land for b2 (going to build Tuesday).

Updated

4 years ago
Flags: needinfo?(gps)
(Assignee)

Comment 20

4 years ago
Brendan: Do you have any data?
Flags: needinfo?(gps) → needinfo?(bcolloran)
This will miss Firefox 22 beta as of this afternoon.
Alex,

If we request that it be pushed based on our analysis of the data in Nightly/Aurora, but there does turn out to be a detectable problem with it in the beta channel within the next three weeks, would it still be possible to back the change out and go with the original behavior for 22 release?

I'm unclear whether we can still revert the change during the last three weeks of beta or if it has to stay in good or bad all the way through to release.
Comment on attachment 748067 [details] [diff] [review]
Refactor state, v3

Spoke with Daniel. We agreed that we'd get this in for Beta 2 (this needs to land today) given the current risk profile and benefit, and back out before Beta 6 if we run into regressions.

If we find non-user impacting issues post-release, we would not spin a 22.0.1 specifically for this change.
Attachment #748067 - Flags: approval-mozilla-beta+

Updated

4 years ago
tracking-firefox22: ? → +
Based on rough preliminary analysis of Nightly data, and gps's input on the patch, I'd like to request approval to land on beta.  2 weeks is a good amount of time for us to study the beta data carefully to look for any regression and request a backout.  We won't request any modification of the patch in that case, just a backout.

If it turns out there is an undetected regression that makes it to release, we'll deal with it without expecting a respin.
(Assignee)

Comment 25

4 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/9cebaf242de3
status-firefox22: --- → fixed
Is anything Qa can verify here? I see that this is covered by automation.
Please let me know if manual testing is also needed around this (please also offer some guidance).
(Assignee)

Comment 27

4 years ago
I'd appreciate a basic spot check that FHR continues to submit data both with a fresh profile and after upgrading a previously-submitted Fx 21 profile to 22b2+.

Although, we should be able to verify both easily enough once 22b2 is released to the wild via Hadoop jobs.
(In reply to Gregory Szorc [:gps] from comment #27)
> I'd appreciate a basic spot check that FHR continues to submit data both
> with a fresh profile and after upgrading a previously-submitted Fx 21
> profile to 22b2+.

Is there a way to see the the files where the FHR state are stored? Where can I find them?
(In reply to Simona B [QA] from comment #28)

> Is there a way to see the the files where the FHR state are stored? Where
> can I find them?

$PROFILE/healthreport/state.json.

It should look something like

  {"v":1,"remoteIDs":["f7c63a13-ae66-9f47-aaaa-ddddd580b5bf"],"lastPingTime":1369515206887}
Considering the time the patch was submitted, I'll verify the fix on Firefox 22 beta 3 (I can't see the state.json file in Firefox beta 2).
Mozilla/5.0 (Windows NT 6.0; rv:22.0) Gecko/20100101 Firefox/22.0
Build ID: 20130528181031

Verified as fixed on Firefox 22 beta 3:
- FHR continues to submit data (verified as suggested in Comment 27
- FHR states are stored in files (used the indications from Comment 29).
status-firefox22: fixed → verified

Updated

4 years ago
Flags: needinfo?(bcolloran)
(Assignee)

Updated

4 years ago
Blocks: 883277
Depends on: 1123137
You need to log in before you can comment on or make changes to this bug.