Longitudinal recording of FHR submission activity

RESOLVED FIXED in Firefox 22

Status

defect
RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

Tracking

unspecified
Firefox 23

Firefox Tracking Flags

(firefox22 fixed)

Details

Attachments

(1 attachment)

Saptarshi and I have identified an alarming number of FHR documents that have no "lastPingDate" but have several days of data. This is extremely weird. If FHR is collecting data, it should be uploading documents. Some possibilities:

* FHR is losing or not recording the last document ID / last ping date successfully.
* Client is experiencing several failures when attempting to upload documents before finally getting through many days later.
* Lots of clients disable FHR then re-enable it.

I suspect the first is the culprit here. However, we really don't know.

I propose we add longitudinal recording of FHR's document upload state to the payload, using a per-day structure like:

{
  "uploadAttempts": 3,
  "uploadSuccess": 1,
  "uploadFailures": 2
}
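A minimal sketch of how such per-day counters could be accumulated and serialized into the proposed shape. The class and method names are hypothetical; only the three field names come from the proposal. The usage reproduces the example above: three attempts, two failures, one success.

```python
import json
from collections import defaultdict
from datetime import date


class DailyUploadCounters:
    """Hypothetical in-memory store: one counter dict per calendar day."""

    FIELDS = ("uploadAttempts", "uploadSuccess", "uploadFailures")

    def __init__(self):
        # Each new day starts with every proposed counter at zero.
        self._days = defaultdict(lambda: dict.fromkeys(self.FIELDS, 0))

    def increment(self, day: date, field: str) -> None:
        if field not in self.FIELDS:
            raise ValueError(f"unknown counter: {field}")
        self._days[day.isoformat()][field] += 1

    def to_payload(self) -> str:
        # Serialize as {"YYYY-MM-DD": {...counters...}} with stable key order.
        return json.dumps(dict(sorted(self._days.items())), sort_keys=True)


counters = DailyUploadCounters()
day = date(2013, 5, 30)
for field in ("uploadAttempts", "uploadFailures",
              "uploadAttempts", "uploadFailures",
              "uploadAttempts", "uploadSuccess"):
    counters.increment(day, field)

print(counters.to_payload())
# {"2013-05-30": {"uploadAttempts": 3, "uploadFailures": 2, "uploadSuccess": 1}}
```

Keying by ISO date string keeps the serialized form identical to the per-day layout the rest of the payload already uses.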

This will allow us to know:

* Whether clients are losing lastPingDate and last document ID during/after upload.
* What percentage of upload requests fail.
* Whether FHR's upload scheduling is firing during active sessions (we can correlate session occurrences with lack of upload attempts, for example).
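The session-correlation idea in the last bullet could look roughly like this. It assumes we already have the set of days with recorded sessions and the proposed per-day upload counters; the function name and inputs are illustrative, not part of FHR. A single such day is not necessarily a bug, but a long run of them would point at the scheduler.

```python
from typing import Dict, Iterable, List


def days_with_sessions_but_no_attempts(
    session_days: Iterable[str],
    upload_counts: Dict[str, Dict[str, int]],
) -> List[str]:
    """Return ISO dates where the client was active but made no upload attempt.

    session_days: dates on which at least one session was recorded (assumed input).
    upload_counts: per-day counters in the proposed shape; a missing day means
    no attempts were recorded.
    """
    return sorted(
        day for day in set(session_days)
        if upload_counts.get(day, {}).get("uploadAttempts", 0) == 0
    )


sessions = ["2013-05-28", "2013-05-29", "2013-05-30"]
uploads = {"2013-05-30": {"uploadAttempts": 1, "uploadSuccess": 1, "uploadFailures": 0}}
print(days_with_sessions_but_no_attempts(sessions, uploads))
# ['2013-05-28', '2013-05-29']
```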

We should also consider adding:

* An enumerated code for why failures occurred (network/transport failure, server error, ???).
* Whether uploading was enabled on a given day. This will help us see if users are opting in and out of FHR, although we could probably infer this from a lack of upload attempts in the history.

We need policy review before we can implement anything.
What's the rate on Nightly?

What's the rate on Nightly builds after we added prefs flushing?

I'll also be interested to see how many uploads time out, i.e., cases where the numbers don't add up because the user quit the browser before we hit our timeout. There must be a non-zero number of those.

I'd be inclined to separate "failure" (500 etc.) from "error" (exception thrown).
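The "numbers don't add up" check amounts to subtracting recorded outcomes from recorded attempts. A minimal sketch, assuming the proposed per-day fields; `uploadErrors` is a hypothetical extra counter standing in for the suggested failure/error split.

```python
from typing import Dict


def unaccounted_attempts(day: Dict[str, int]) -> int:
    """Attempts with no recorded outcome, e.g. the browser quit before the timeout.

    Uses the proposed uploadAttempts/uploadSuccess/uploadFailures fields;
    "uploadErrors" is a hypothetical counter for thrown exceptions.
    Missing counters are treated as zero.
    """
    outcomes = (
        day.get("uploadSuccess", 0)
        + day.get("uploadFailures", 0)
        + day.get("uploadErrors", 0)
    )
    return day.get("uploadAttempts", 0) - outcomes


print(unaccounted_attempts({"uploadAttempts": 3, "uploadSuccess": 1, "uploadFailures": 1}))
# 1
```

A persistently positive value across many clients would estimate how often uploads are abandoned mid-flight.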
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Flags: needinfo?(bcolloran)
In addition to helping us debug FHR, this information would also be very useful in developing a full picture of Firefox retention rates. So far, I've been able to come up with four reasons why an FHR record in our database might not include any pings after a given date X:
(1) the record is orphaned
(2) the FF instance has been permanently abandoned
(3) the FF instance has not been used after date X (but will be used again)
(4) the FF instance has been used after date X, but has not successfully submitted a packet after date X

(Aside: if you can think of other reasons, email me.)

Having the longitudinal submission info would let us estimate the probability of submitting a packet after date X, given that the instance has been active after date X. The more of these factors we can pin down, the better our ultimate estimates of retention and engagement can be.
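The arithmetic behind that conditional probability can be sketched as below. This is an illustration only: it assumes each client's history is a mapping of ISO date to per-day counters, with a hypothetical flattened `sessions` count marking activity. It also only sees clients that eventually submitted again, so on real data it would be biased.

```python
from typing import Dict, List

Day = Dict[str, int]


def p_submit_given_active(clients: List[Dict[str, Day]], x: str) -> float:
    """Fraction of clients active after date x that also recorded an upload success after x.

    A day counts as active if its (hypothetical) "sessions" counter is > 0.
    ISO date strings compare correctly as plain strings.
    Returns NaN when no client was active after x.
    """
    active = 0
    submitted = 0
    for days in clients:
        later = [counts for d, counts in days.items() if d > x]
        if any(c.get("sessions", 0) > 0 for c in later):
            active += 1
            if any(c.get("uploadSuccess", 0) > 0 for c in later):
                submitted += 1
    return submitted / active if active else float("nan")


clients = [
    {"2013-05-30": {"sessions": 2, "uploadSuccess": 1}},  # active and submitted
    {"2013-05-30": {"sessions": 1}},                      # active, never submitted
    {"2013-05-20": {"sessions": 1, "uploadSuccess": 1}},  # no activity after x
]
print(p_submit_given_active(clients, "2013-05-25"))
# 0.5
```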
Flags: needinfo?(bcolloran)
As far as I'm concerned, this bug is blocked on sign-off for collecting *any* new data element.

Over to mconnor for that.
Flags: needinfo?(mconnor)
I'm good with adding this data; it supports delivering the service effectively and efficiently.
Flags: needinfo?(mconnor)
This patch adds the following daily counters:

firstDocumentUploadAttempt
continuationUploadAttempt
uploadSuccess
uploadTransportFailure
uploadServerFailure
uploadClientFailure

The sum of the first two should be the total number of upload attempts in a day, which (currently) should be no more than 3. The two attempt counters distinguish uploads by whether the client had a previous "lastPingDate" set. This should hopefully allow us to see which clients are losing track of when they uploaded (by giving us a history of "first upload" attempts over time).
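A rough sketch of how the new counters might be scanned for the two conditions described above. The function names are hypothetical; the limit of 3 comes from the comment. A "first" attempt on a day after an earlier successful upload suggests lastPingDate was lost, although a client that disabled and re-enabled FHR could produce the same pattern, and ordering within a single day cannot be recovered from daily counts.

```python
from typing import Dict, List

History = Dict[str, Dict[str, int]]  # ISO date -> counters for that day


def suspicious_first_uploads(days: History) -> List[str]:
    """Days with a firstDocumentUploadAttempt after an earlier day with uploadSuccess."""
    seen_success = False
    suspicious = []
    for day in sorted(days):
        counts = days[day]
        if seen_success and counts.get("firstDocumentUploadAttempt", 0) > 0:
            suspicious.append(day)
        if counts.get("uploadSuccess", 0) > 0:
            seen_success = True
    return suspicious


def days_over_attempt_limit(days: History, limit: int = 3) -> List[str]:
    """Days whose total attempts (first + continuation) exceed the expected limit."""
    return sorted(
        d for d, c in days.items()
        if c.get("firstDocumentUploadAttempt", 0) + c.get("continuationUploadAttempt", 0) > limit
    )


history = {
    "2013-05-30": {"firstDocumentUploadAttempt": 1, "uploadSuccess": 1},
    "2013-05-31": {"continuationUploadAttempt": 1, "uploadSuccess": 1},
    "2013-06-02": {"firstDocumentUploadAttempt": 1, "uploadSuccess": 1},
}
print(suspicious_first_uploads(history))
# ['2013-06-02']
```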

Each upload attempt should be paired with a result condition. Hopefully most are "uploadSuccess", but we could also see network errors (transport failures), server failures, or (gasp) a client failure. We should never see a client failure.
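One way to picture the pairing of each attempt with exactly one result condition is a small classifier. The counter names are from the patch; the classification rules (no response means transport failure, any non-2xx status means server failure, a client-side exception takes precedence) are assumptions for illustration, not the patch's actual logic.

```python
from enum import Enum
from typing import Optional


class UploadOutcome(Enum):
    # Values are the counter names added by the patch.
    SUCCESS = "uploadSuccess"
    TRANSPORT_FAILURE = "uploadTransportFailure"
    SERVER_FAILURE = "uploadServerFailure"
    CLIENT_FAILURE = "uploadClientFailure"


def classify_upload(status: Optional[int],
                    client_error: Optional[BaseException] = None) -> UploadOutcome:
    """Map one upload attempt's result to the counter it should increment.

    status: HTTP status code, or None if no response arrived (assumed transport failure).
    client_error: an exception raised by client-side code, if any.
    """
    if client_error is not None:
        return UploadOutcome.CLIENT_FAILURE
    if status is None:
        return UploadOutcome.TRANSPORT_FAILURE
    if 200 <= status < 300:
        return UploadOutcome.SUCCESS
    return UploadOutcome.SERVER_FAILURE


print(classify_upload(503).value)
# uploadServerFailure
```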

Richard gets code review. Brendan and mconnor get sign-off that we are collecting what we need to and can collect.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #743304 - Flags: review?(rnewman)
Attachment #743304 - Flags: feedback?(mconnor)
Attachment #743304 - Flags: feedback?(bcolloran)
That looks excellent, Greg. I'm looking forward to seeing it roll in...
Attachment #743304 - Flags: review?(rnewman) → review+
Please look at and/or clear your f? flags before this makes it to central!

https://hg.mozilla.org/services/services-central/rev/eb460fae0ece
Whiteboard: [fixed in services]
I have updated the docs: https://github.com/mozilla-services/docs/commit/67e81495334d4edeec3f7bf26c1678ec55fd4a7a

This should be live on docs.services.mozilla.com momentarily.
Keywords: dev-doc-needed
Comment on attachment 743304
Record upload counts, v1

looks good to me.
Attachment #743304 - Flags: feedback?(bcolloran) → feedback+
https://hg.mozilla.org/mozilla-central/rev/eb460fae0ece
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in services]
Target Milestone: --- → Firefox 23
Comment on attachment 743304
Record upload counts, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): FHR
User impact if declined: Metrics won't be able to deal with orphaned records for another release.
Testing completed (on m-c, etc.): Been on m-c for a while. Aurora for a week or so. Automated testing in place. No regressions seen.
Risk to taking this patch (and alternatives if risky): I believe this to be low risk. Any risk would be immediately apparent via changes in client upload behavior.
String or IDL/UUID changes made by this patch: None

This will also make the uplift of bug 860094 much easier since the patches conflict.
Attachment #743304 - Flags: feedback?(mconnor) → approval-mozilla-beta?
Comment on attachment 743304
Record upload counts, v1

In support of bug 860094's success in FF22.
Attachment #743304 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I don't see any of the counters from Comment 5 anywhere in the Raw Data section of the about:healthreport page. Should they be available in the UI? Should I see them anywhere?
(In reply to Simona B [QA] from comment #14)
> I don't see any of the counters from Comment 5 anywhere in the Raw Data
> section of the about:healtreport page. Should they be available in the UI,
> should I see them anywhere?

They should show up in the raw data *after* the first data submission has finished. You'll only see data for the current day (it isn't retroactive).
Considering when the patch landed, I'll verify the fix on Firefox 22 beta 3 (I can't see the counters in Firefox 22 beta 2).
(In reply to Gregory Szorc [:gps] (on holiday until June 10) from comment #15)
> (In reply to Simona B [QA] from comment #14)
> > I don't see any of the counters from Comment 5 anywhere in the Raw Data
> > section of the about:healtreport page. Should they be available in the UI,
> > should I see them anywhere?
> They should show up in the raw data *after* the first data submission has
> finished. 

Mozilla/5.0 (Windows NT 6.0; rv:22.0) Gecko/20100101 Firefox/22.0
Build ID: 20130528181031

On Firefox 22 beta 3 I can see 2 of the counters mentioned in Comment 5, and each of them is accompanied by "uploadSuccess":
firstDocumentUploadAttempt
continuationUploadAttempt

Is there a way to see the rest of the counters mentioned in Comment 5?
uploadTransportFailure
uploadServerFailure
uploadClientFailure

> You'll only see data for the current day (it isn't retroactive).

Also, in the raw data section I'm seeing the data for 3 different days:
- "2013-05-29": no data was submitted


- "2013-05-30": {
        "org.mozilla.addons.counts": {
          "_v": 2,
          "extension": 12,
          "plugin": 13,
          "service": 1,
          "theme": 4
        },
        "org.mozilla.healthreport.submissions": {
          "_v": 1,
          "firstDocumentUploadAttempt": 1,
          "uploadSuccess": 1
        },


- "2013-05-31": {
        "org.mozilla.addons.counts": {
          "_v": 2,
          "extension": 19,
          "plugin": 13,
          "service": 1,
          "theme": 4
        },
        "org.mozilla.healthreport.submissions": {
          "_v": 1,
          "continuationUploadAttempt": 1,
          "uploadSuccess": 1
        },

Is this expected?
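For checking such dumps by hand, a small sketch that pulls the submissions counters out of a per-day mapping shaped like the excerpt above. It assumes the input is the mapping of dates to measurements (not the whole payload) and drops the "_v" version field. In the excerpt, only counters with non-zero values appear on a given day, so a missing counter is best read as zero.

```python
from typing import Any, Dict

SUBMISSIONS = "org.mozilla.healthreport.submissions"


def submission_counters(days: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Return {date: counters} for days that contain the submissions measurement."""
    result = {}
    for day, measurements in sorted(days.items()):
        counters = measurements.get(SUBMISSIONS)
        if counters:
            # Drop the measurement version; keep only the counters themselves.
            result[day] = {k: v for k, v in counters.items() if k != "_v"}
    return result


days = {
    "2013-05-30": {SUBMISSIONS: {"_v": 1, "firstDocumentUploadAttempt": 1, "uploadSuccess": 1}},
    "2013-05-31": {SUBMISSIONS: {"_v": 1, "continuationUploadAttempt": 1, "uploadSuccess": 1}},
}
print(submission_counters(days))
```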
Blocks: 1053315
Product: Firefox Health Report → Firefox Health Report Graveyard