Closed Bug 1092375 Opened 10 years ago Closed 9 years ago

Heartbeat Telemetry Experiment

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: tdowner, Assigned: glind)

References

Details

Attachments

(5 files, 3 obsolete files)

We will need a telemetry experiment for heartbeat!
QA Contact: kamiljoz
I claim the experiment addon is ready for Blake's review at  https://github.com/gregglind/firefox-pulse/tree/master/heartbeat-telemetry-experiment-1

Unclear:
- how best to handle the 'daily' recurring deploy.  At Telemetry, or here?

Incomplete / blocked
- deployment / monitoring of engagement pages (waiting on contractor)



relevant:  https://bug1092376.bugzilla.mozilla.org/attachment.cgi?id=8526074
Flags: needinfo?(benjamin)
What info did you need from me?
Flags: needinfo?(benjamin) → needinfo?(glind)
As a note, I'll be doing the review at https://github.com/gregglind/firefox-pulse/pull/87/files
Status: NEW → ASSIGNED
Attachment #8529130 - Attachment mime type: application/json → text/plain
Comment on attachment 8529130 [details]
Sample collected packet for Heartbeat Upload

>{
>  "person_id": "6d5cc889-6fae-3649-8c33-e115b8661640",

What is this and why do we need it?

>  "flow_id": "e3dcba35-3069-f44d-a9c7-341219c4e921",

What is this?

>  "flow_began_ts": 1417017597805,
>  "flow_offered_ts": 1417017597928,
>  "flow_voted_ts": 1417018113949,
>  "flow_engaged_ts": 1417018120585,

What timezone/baseline are these timestamps?

>  "profile_age": 0.6726974652774516,

Age from what point? I would prefer not to collect this in general.

>  "profile_usage": {
>    "total": 0
>  },

What is this?

>  "extra": {
>    "crashes": {
>      "total": 0,
>      "submitted": 0,
>      "pending": 0
>    },

What's this?

>    "prefs": {
>      "browser.search.defaultenginename": "chrome://browser-region/locale/region.properties",
>      "privacy.donottrackheader.enabled": false,
>      "privacy.donottrackheader.value": 1,
>      "places.history.enabled": true,
>      "browser.tabs.remote.autostart": false,
>      "gecko.buildID": "20141125004001"
>    },

You should not send any pref values with this payload unless I've specifically reviewed them for privacy. For now, I think you should exclude them altogether.

>    "engage": [
>      [
>        1417018120586,
>        "support-firefox-slow",
>        "neutral"
>      ]
>    ]

What's this?
Flags: needinfo?(glind)
Personid:  for this experiment.  Needed to track how many 'attempts' it takes to actually get to 'popping up' a question.  Needed to guide the 'wait until showing' interval.
Flowid:  primary key of the interaction.  For the experiment, the id of the actual 'question experience'.  One at a time, but a user could have multiple.  At most they will see one 'popup'.
Flow times:  timestamp for when each state of the experience was reached.  Timezones are whatever Date.now() reports at the client.
Profile_age:  (in days) a useful (predictive) covariate to explain satisfaction.  (I didn't realize there were concerns over this.)  From FHR.
Profile_usage:  fhr days and ticks.
extra:
- crashes:  number of reported, total crashes (from FHR)
- engage: clicks on the post-voting engagement page.  Instrumented here for building predictive models.  Also instrumented through GA (untested.  Once known to work, GA will be sufficient for tabulation, but not modeling)
- prefs:  buildID, distributionID from FHR.  We are trying to find markers that predict and explain scoring, response, and differences from General Release audience.
Flags: needinfo?(glind)
(In reply to Gregg Lind (User Advocacy - Test Pilot) from comment #7)
> Personid:  for this experiment.  Needed to track how many 'attempts' it
> takes to actually get to 'popping up' a question.  Needed to guide the 'wait
> until showing' interval.

I need more detail. What generates the personid? Is it stored on the client? If so does it expire? Why is it necessary to have a personid in the payload to answer those questions? 

> Flowid:  primary key of the interaction.  For the experiment, the id of the
> actual 'question experience'.  One at time but a user could have multiple. 
> At most they will see one 'popup'.

Is this stored on the client? If so does it expire?

> Profile_usage:  fhr days and ticks.

The data schema doesn't list these. Days/ticks over what time frame?

> extra:
> - crashes:  number of reported, total crashes (from FHR)

Over what time frame?

> - prefs:  buildID, distributionID from FHR.  We are trying to find markers
> that predict and explain scoring, response, and differences from General
> Release audience.

please remove all pref recording for this round.
Flags: needinfo?(glind)
Prefs have been removed.  This removes the ability to understand the impact of search settings on score, response, and engagement.

Personid has been removed.

Personid and flowid are UUIDs.  They are stored on the client in prefs for the duration of the study, and removed after uninstall (or can be made to be).

Personid is what allows proper estimation of the response rate.  They can be dropped this round, but will have to be dealt with eventually. Without this, we will underestimate the response rate, and be unable to measure the number of attempts it takes to trigger the UI element.  In previous studies, fewer attempts was criteria for deciding which UI to use.

Ticks (removed):  sum(FHR clean active ticks) over the last 30 days (or the age of the profile).  Removing this removes the ability to use profile age to predict or model score, uptake, and engagement.  This variable was significant in previous studies.
Flags: needinfo?(glind)
If you have some suggestions for good possible additional covariates that
- are likely to correspond with differences in user perceived satisfaction
- are consistent with our data collection policy

please suggest them.  

Even this previously collected data performed poorly (<10% of variability explained).
> Personid and flowid are uuids.  they are stored on the client in prefs for
> the duration of the study, and removed after uninstall (or can be made to).

Yes, please make sure the flow IDs are removed on uninstall.

> Personid is what allows proper estimation of the response rate.  They can be
> dropped this round, but will have to be dealt with eventually. Without this,
> we will underestimate the response rate, and be unable to measure the number
> of attempts it takes to trigger the UI element.  In previous studies, fewer
> attempts was criteria for deciding which UI to use.

I don't understand this. Why don't you just measure the actual thing you care about, which is the number of times the UI is shown, and the # of times people respond to it? This shouldn't require a user identifier.
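The suggestion above could be sketched like this (a minimal illustration; the counter and function names are hypothetical, not from the actual add-on):

```javascript
// Hypothetical per-client counters: record only aggregate totals,
// so no user identifier is needed to estimate the response rate.
const counters = { offered: 0, shown: 0, responded: 0 };

function record(event) {
  if (event in counters) {
    counters[event] += 1;
  }
}

// Each counter is incremented when the corresponding event fires;
// only the totals would be sent with the ping.
record("offered");
record("offered");
record("shown");
record("responded");
```

Server-side, the offered vs. responded totals then give a response-rate estimate without joining on any identifier.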

> 
> Ticks (removed):  sum(FHR clean active ticks) over last 30 days.  (or age of
> profile).  Removing this removes the ability of using profile age to predict
> or model score, uptake and engagement.  This variable was significant in
> previous studies.

I don't mind keeping ticks and active days as long as they are well defined. 30 days sounds reasonable.

Profile age is also ok as long as it's relative to the ping date and isn't an absolute value. I recommend truncating to an integer rather than dealing with floats.
The reason I suggested recording the heartbeat data directly within telemetry is that we can correlate it against everything else in the FHR/telemetry payload without data duplication. Obviously that's not ready right now, so we'll use the separate collection system, but that means we need to avoid user identifiers.
All implemented.

I don't quite grok "Profile age is also ok as long as it's relative to the ping date and isn't an absolute value".  Can you explain that (algorithm) a bit further?  What about profile age / usage would be acceptable?

A.  {ndays_used_in_last_30: x,  total_tick_in_last_30: x, possible: min(profile_age_days,30)}
B.  profileAge: days, capped at 365  (or some other figure?)

The idea here is to use Newness and/or Usage for prediction / explanation.  We haven't settled on the right transform of that info, hence my looseness in capturing it.
An absolute profile age can be used to identify a particular user, especially if it has high float precision. A relative age, especially if it is rounded to an integer, provides no way to go back from a ping to identify a user.
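The distinction might be sketched as follows (an illustrative helper only; the function and argument names are not from the actual payload):

```javascript
// An age with high float precision (e.g. 0.6726974652774516) is a
// near-unique fingerprint. Computing the age relative to the ping time
// and truncating to whole days yields a value shared by many profiles,
// which cannot be traced back to one user.
function relativeProfileAgeDays(profileCreationTs, pingTs) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.floor((pingTs - profileCreationTs) / MS_PER_DAY);
}
```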
Can you update the JSON attachment so I can mark the feedback+ on it?
Flags: needinfo?(glind)
(sorry, still don't really understand your relative vs absolute profile stuff.  Literally, how do those two measures differ, if one has the ping time?)
Flags: needinfo?(glind)
It's the rounding that improves things. The other question was about the data format.
Sample collected packet, accounting for data collection concerns.
Attachment #8529130 - Attachment is obsolete: true
Attachment #8529130 - Flags: review?(benjamin)
Attachment #8529230 - Flags: review?(benjamin)
Benjamin:  new question.

So, e10s.  Can we exclude e10s users?  If so, in the manifest, or in the addon?  (I will do both.)  Would be nice to have a count there, but willing to take it as a wontfix.
Comment on attachment 8529230 [details]
heartbeat.packet.r2.txt

FWIW, I didn't ask you to omit buildid/partnerid: just the prefs tree. r=me in either case
Attachment #8529230 - Flags: review?(benjamin) → review+
I don't understand the e10s question. We're shipping this experiment to beta where e10s isn't even an option.

If it were an issue, we can't exclude them by manifest, so you'd have to do it within the addon.
Newest code diff: https://github.com/gregglind/firefox-pulse/pull/90

I claim the experiment is ready, but I can't figure out how to test it (in experiment form)

- I built it as a type 128.
- But installing it as a normal xpi (by 'file') shows it in the experiment list, marked as completed.

```
cfx xpi \
	  --static-args='{"phonehome": true, "testing": false}' \
	  --templatedir ext-template/ \
	  --output-file addons/heartbeat-telemetry-experiment-1-type-128.xpi
```

Manifest being prepared.
Flags: needinfo?(bwinton)
Flags: needinfo?(benjamin)
Attached file manifest.json (obsolete) —
(manifest file)
Attachment #8529857 - Flags: review?(benjamin)
Attached file experiment.xpi (obsolete) —
(built experiment)
(hg push on the telex server wasn't working for me.  Attached instead)
Attachment #8529857 - Attachment mime type: application/json → text/plain
Kamil can you show Gregg how to test an experiment?
Flags: needinfo?(benjamin) → needinfo?(kamiljoz)
Comment on attachment 8529857 [details]
manifest.json

This must have a sample of 10% or less, so that there are people available to run other experiments.

Maxversion should be 38.0 not *

I'll check the start/end dates later.
Attachment #8529857 - Flags: review?(benjamin) → review-
Comment on attachment 8529857 [details]
manifest.json

The other dates (startTime/endTime/maxActiveSeconds) are correct.
Apologies for the delayed response! I was finishing up some testing/verification/automation updates for the 31.3.0esr and 34.0.5 releases.

I managed to successfully install the heartbeat experiment using the following steps:

Prerequisites: (note: I used Ubuntu 14.04 but it's basically the same steps on OSX)

* Install Mozilla build tools (I'm not 100% sure you need this, but I always install the build tools just in case)
** Ubuntu: wget -O bootstrap.py https://hg.mozilla.org/mozilla-central/raw-file/default/python/mozboot/bin/bootstrap.py && python bootstrap.py
** OSX: curl https://hg.mozilla.org/mozilla-central/raw-file/default/python/mozboot/bin/bootstrap.py > bootstrap.py && python bootstrap.py
* Install Genshi: http://genshi.edgewall.org/wiki/Download (you'll need pip installed)

Step #1: Pull the telemetry experiment code

* hg clone http://hg.mozilla.org/webtools/telemetry-experiment-server/

Step #2: adding manifest.json and .xpi 

* cd into the "telemetry-experiment-server/" directory
* cd into the "experiments/" directory
* mkdir heartbeat (place both the .xpi and manifest.json into the "heartbeat" directory)

Step #3: Building the experiment using python

* make sure you're in the "telemetry-experiment-server/" directory
* python build.py <output directory> <base URL>  Example --> python build.py _out "http://localhost:8080"
* launch a local server (I used my personal website): python -m SimpleHTTPServer 8080

Step #4: Installing experiment via browser

* install the appropriate browser that will match the criteria
* about:config
* devtools.chrome.enabled;true
* experiments.logging.level;0 (so you can see errors etc.. in the Browser Console)
* experiments.manifest.uri (the url to your firefox-manifest.json, example: http://kamiljozwiak.com/firefox-manifest.json)
* disable/enable "experiments.enabled" (the experiment should be installed)

Gregg, let me know if you run into any issues. I should be available the entire weekend so just shoot over an email or ping me on IRC (email is probably the better route). You can also take a look at https://wiki.mozilla.org/QA/Telemetry.
Flags: needinfo?(kamiljoz)
Attached file experiment.xpi
V2.0.1 with bugfixes
Attachment #8529859 - Attachment is obsolete: true
Attached file manifest.json
Fixes to sample size (which I had misunderstood)
Attachment #8529857 - Attachment is obsolete: true
(I think the info has been provided.  If not, please re-needinfo me!)
Flags: needinfo?(bwinton)
Not sure whether this actually got deployed, but it's no longer needed now.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
(concur.  This is old news.  Not needed.)
Product: Firefox Health Report → Firefox Health Report Graveyard