Closed Bug 1193535 Opened 5 years ago Closed 4 years ago

Store Heartbeat Scores in Unified Telemetry

Categories

(Toolkit :: Telemetry, defect, minor)

42 Branch
defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: glind, Assigned: vladan)

References

Details

(Whiteboard: [measurement:client] [measurement:client:tracking])

Attachments

(2 files, 9 obsolete files)

12.19 KB, application/x-zip-compressed
Details
58 bytes, text/x-review-board-request
MattN
: review+
vladan
: review+
Details
It is useful to have Heartbeat (self-reported satisfaction) score associated with profile data. 

Claim:  Storing Heartbeat score in Unified Telemetry is the best path for getting this.  

I nominate Sam Penrose as my helper :)
We can just submit is as a histogram, then it is showing up at telemetry.mozilla.org and can use it in analysis.

See:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe
Note the privacy review requirement and the releaseChannelCollection field.

Is the rating collected in the remote content or do we see it client-side already and only pass it to the remote content?
I recommend that we use a separate ping type for this. Smaller targeted pings is a better long-term solution than stuffing more things into our existing pings, and will make the basic analysis and reporting much easier.
This seems like data that we'll want to correlate to specific sessions though?
If yes, then we can either submit it with the session (e.g. as a histogram) without any additional effort or we'd need to link this by submitting the current session id too and link it together later.
An alternative would be to make TelemetryController.submitExternalPing() take an |addSessionId| option, although that still leaves linking it to sessions in analysis later.
No, I don't think that this needs to be analyzed/correlated with particular sessions. We'll want to correlate it with environment data on a regular (realtime?) basis, but in-depth analysis can use the by-clientID API to reconstruct a greater history.
I can take this. Gregg: how does the heartbeat score get recorded?
Assignee: nobody → vdjeric
Flags: needinfo?(glind)
Right now, heartbeat sends the data to Input where it gets recorded in mysql tables. The API used is documented here:

http://fjord.readthedocs.org/en/latest/hb_api.html

Table definitions are in Django ORM here:

https://github.com/mozilla/fjord/blob/master/fjord/heartbeat/models.py#L38

Vladan: Does that answer your questions?
(I claim the tuple here is something like:

source:  heartbeat-first-impression  (this is WIDE)
score: 1-5

I think this is sufficient.
)
Flags: needinfo?(glind)
tl;dr - modify UITour.jsm `maybeNotifyHeartbeat` function.

Vladan to clarify some things.

Moving pieces:

- [1] UITour-lib.js:  bundled into self-repair.mozilla.org
- [2] UITour.jsm:  privileged code bundled with fx.


When the user clicks the 'star', it does a lot of things:
- UITour.js sends messages back across the wire to client-side
- which then phones home, and does other things in `maybeNotifyHeartbeat` function.

I think the place to intercept is in `UITour.jsm`  at https://hg.mozilla.org/mozilla-central/file/dd2a1d737a64/browser/components/uitour/UITour.jsm#l1180 .    or maybe:

https://hg.mozilla.org/mozilla-central/file/dd2a1d737a64/browser/components/uitour/UITour.jsm#l1243


[1] https://hg.mozilla.org/mozilla-central/file/dd2a1d737a64/browser/components/uitour/UITour-lib.js
[2] https://hg.mozilla.org/mozilla-central/file/dd2a1d737a64/browser/components/uitour/UITour.jsm
As a side note, bug #1194238 covers sending heartbeat data to telemetry and only telemetry--i.e. ditching Input as the heartbeat backend.

Can we tackle that as well with the work being done here? Is there anything useful out of storing heartbeat data in two places?
:glind :willkg, are we going to store those cases where a client was presented with an HB prompt but did not endorse a star?
I think I have enough info to get started, I'll post a patch for feedback tomorrow or Friday
Depends on: 1211222
Attached patch hearbeatTelemetry.patch (obsolete) — Splinter Review
This patch generates a Telemetry ping for each of the 5 Heartbeat events: NotificationOffered, NotificationClosed, LearnMore, Voted, Engaged.

The ping will be sent soon to Telemetry servers shortly after the event. 

All pings will have a Telemetry clientID, none will have Telemetry's Firefox session ID, and only the Voted ping will contain the user's environment.
Attachment #8669388 - Flags: feedback?(glind)
Attached file sample.heartbeat.pings.zip (obsolete) —
This is what the submitted data looks like with this patch
Unresolved:

1. Are you ok with these 5 pings? Check out the sample pings in the attached zip files 
2. Does Heartbeat code still need to send notifications to the self-support page? If not, can you guys rip it out in a follow-up bug?
3. If we don't need the old Heartbeat system, then we can remove the timestamp field from the Telemetry ping because all Telemetry pings have a similar creationDate field
4. Privacy: with this patch, a user's interactions with the Heartbeat system are reported on with precise timestamps, on an opt-out basis on all channels. This isn't a new behaviour, but Telemetry opt-out probes require evidence of a user benefit from the monitoring. 
4a) What's the description of user benefit for Heartbeat? 
4b) Who is going to monitor the collected data?
4c) Some Heartbeat prompts only appear in private-browsing mode, so that means we'll be effectively reporting on users' private-browsing mode usage. I'd like to hear another data collection peer's opinion on this.
Flags: needinfo?(glind)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #13)
> Created attachment 8669388 [details] [diff] [review]
> hearbeatTelemetry.patch

I'll add tests & update docs after I get confirmation this is what we want to do
> 4c) Some Heartbeat prompts only appear in private-browsing mode, so that means we'll be effectively reporting on users' private-browsing mode usage. I'd like to hear another data collection peer's opinion on this.

We could omit clientIDs for the private-browsing specific surveys, would that be ok? You wouldn't be able to associate the Heartbeat scores with users or user histories (you would still have the Environment block)
Attached patch hearbeatTelemetry.patch (obsolete) — Splinter Review
Attachment #8669388 - Attachment is obsolete: true
Attachment #8669388 - Flags: feedback?(glind)
Attachment #8669390 - Flags: feedback?(glind)
If i understand this right we would get >=2 pings per heartbeat client with this implementation?
Can't we rework this a bit and always send one ping with the results here?

I'm worried that we are sending more and more trivial pings and get resulting send/storage/processing overhead without clear benefits.
Status: NEW → ASSIGNED
Flags: needinfo?(vladan.bugzilla)
(Georg, the will be for the users who are offered the survey... which is 1/million or so in release per day.  I agree making the packet meaningful here is important.)
Flags: needinfo?(glind)
1.  WE are thinking through the lists of keys to white list in the packet.

- surveyId
- surveyVersion
- score (0 during offer phase)

(is technically sufficient, I think)

- `flowId` (requested, see (2) below)


2.  Which phases?  

- rweiss wants to know if there is systematic bias in the response.  For this, we need the 'offer' and 'vote' phases.  Having the flow_id allows uniting those.  

Exposing the flow_id allows connecting these to any surveys that come from these interactions, allowing us to assess Bias in response to surveys (for any UT traits).

I recommend that we record both phases, and collect 'flow_id'


Notes:  
a. We are currently sample at 10/million/day on release, with proposed "bursts" around new release.
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(rweiss)
Flags: needinfo?(benjamin)
(re #5, I agree with Bsmedberg the environment is enough to get started here.  Correlating with 'had a crash recently' is also important.)
I do not think we should separate pings for multiple phases. We should be to record a single ping in the client which represents both the offer and the vote. That significantly reduces the processing complexity because on the server there's no need to cross-reference pings, which makes both realtime processing and batch reporting much simpler.
Flags: needinfo?(benjamin)
(In reply to Georg Fritzsche [:gfritzsche] [away Oct 6 - Oct 8] from comment #19)
> If i understand this right we would get >=2 pings per heartbeat client with
> this implementation?
> Can't we rework this a bit and always send one ping with the results here?

Not easily. The first ping reports that the user has been *presented* with the Heartbeat rating toolbar. If the user ignores the toolbar, that is the only ping that will be sent so it can't always be coalesced with another event. 
I guess it might be possible to track these toolbar appearances server-side but that requires assuming everything between the server and client is working properly.

> I'm worried that we are sending more and more trivial pings and get
> resulting send/storage/processing overhead without clear benefits.

As Gregg said, Heartbeat will only prompt a few thousand users to take the Heartbeat survey on the entire release channel. Even if Firefox sent 5 pings for each of those users for each survey, it would be a manageable amount of data to store and process.
(In reply to Gregg Lind (User Advocacy - Heartbeat - Test Pilot) from comment #21)
> 1.  WE are thinking through the lists of keys to white list in the packet.
> - surveyId
> - surveyVersion
> - score (0 during offer phase)
> 
> (is technically sufficient, I think)
> 
> - `flowId` (requested, see (2) below)

I think these are ok to include from a privacy point of view

> I recommend that we record both phases, and collect 'flow_id'

I'm ok with this as well
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #24)
> I do not think we should separate pings for multiple phases. We should be to
> record a single ping in the client which represents both the offer and the
> vote. That significantly reduces the processing complexity because on the
> server there's no need to cross-reference pings, which makes both realtime
> processing and batch reporting much simpler.

The main issue (as I see it) is that there is no clear "survey end" so it's not clear when Heartbeat should stop waiting for the user's vote or other input (closing the survey, clicking learn more, etc).

We *could* define a notion of a "survey end". For example: the survey "ends" when the user votes, or the user exits their Firefox session, or within 48 hours of the survey being offered -- whichever happens first. Then we can send a single ping at the end of the survey, and that ping will contain timestamps for when the survey was offered, the user's score, whether the user clicked "Learn More" etc -- all in one ping.

This approach would delay survey reporting, e.g. if the user exits their session (thus ending the survey) but doesn't start Firefox again for a month, then their Heartbeat report will be delayed by a month.

Gregg: would you be ok with this approach?
Flags: needinfo?(glind)
I'll assert that should be good enough. We might should also set a timeout on the notification bar so that there's an upper bound: an hour, for example.
Gregg and I have discussed offline about whether this format suits my needs; I believe it does.
Flags: needinfo?(rweiss)
Flags: needinfo?(glind)
I am okay with all of this.  I owe a report on how long to leave the window open.  I want to propose TWO HOURS, but should have a reasonable reason for doing so :)

GL
Vladan, I claim 2 hours is sufficient:

I would rather take the risk of missing a vote than missing an attempt.  


================
This is time between offer and VOTE (for those who scored)

# 99%: => 29410996   8hr
# 95%  => 4766704    < 2hr
# 90%  => 1651422    < 30 min
# 50%  => 25090      < 1 sec
Vladan,

1.  I think this is all settled.  

2.  I am worried about 'not recording attempts' unless we attempt to send all packets during shutdown.  (that is, I am worried that by waiting too long, we will undercount attempts, and overestimate response.)

3.  Awesome job, let's land this!
Flags: needinfo?(vladan.bugzilla)
Benjamin, last go round.

Concerns:

1.  If we do the 'flow / interaction' (combined) packet, it will undercount the 'offers' and overstate the acceptance rate.  But packets will be minimized.

2.  If we do separated packets, they have to be correlated during processing, but we will have truer (real-time) estimates for it all.  (shorter lag, more estimate).

Opinions?  Patch is written as though plan 1.  I find this acceptable, but generally prefer 2, because I am a completist.  

Eager to get this.
Flags: needinfo?(vladan.bugzilla) → needinfo?(benjamin)
We will count all the offers. The only question is the latency of receiving the data for the offers that weren't accepted. I don't think we particularly need to worry about this and should go with option 1 (combined packet) for all its advantages in processing and reporting.
Flags: needinfo?(benjamin)
It sounds like we are going with option 1? Let us know if anything else is needed before landing the patch.
Comment on attachment 8669390 [details] [diff] [review]
hearbeatTelemetry.patch

Review of attachment 8669390 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/uitour/UITour.jsm
@@ +1144,5 @@
> +          addClientId: true,
> +          addEnvironment: aAddEnvironment,
> +        });
> +
> +      this.notify(aEventName, aParams);

this is going to need things from `aOptions`.

@@ +1249,5 @@
>          maybeNotifyHeartbeat("Heartbeat:Voted", {
>            flowId: aOptions.flowId,
>            score: rating,
>            timestamp: Date.now(),
> +        }, true);

Pass some piece of aOptions as well to 'params'

These 3 fields + aOptions is *sufficient* but might be overkill.

`flowId` is a uuid per heartbeart 'offer'.  This is stored locally on the client (in localstore).  I am not if there are any risks to having this be in the packet.  It does mean we can tie telemetry to surveyGizmo answers.

From the tour, the args are:

		var args = {
			message: message,
			thankyouMessage: thankyouMessage,
			flowId: flowId,
			engagementURL: engagementURL,
			learnMoreLabel: learnMoreLabel,
			learnMoreURL: learnMoreURL,
		};

(as well as an overridable set of options.)

I think I cram thing into that, but it seems unruly.
Attachment #8669390 - Flags: feedback?(glind) → feedback+
Attached patch heartbeatTelemetry2.patch (obsolete) — Splinter Review
Attachment #8669390 - Attachment is obsolete: true
Attachment #8682227 - Flags: feedback?(glind)
Attached file sample.heartbeat.pings2.zip (obsolete) —
I'll add tests & docs after the f+ on the approach from Gregg
Attachment #8669389 - Attachment is obsolete: true
Attachment #8682227 - Flags: feedback?(glind) → feedback+
Looks good to me.  Has:

- whitelist of a few useful keys (surveyId, surveyVersion, testing)
- 2 hour (7200 second) 'wait' (which should capture 95% of all responses)

Takk!
Blocks: 1223895
Attachment #8682229 - Attachment is obsolete: true
Attached patch heartbeatTelemetry3.patch (obsolete) — Splinter Review
Added tests & docs. Also handled survey's window getting closed by the user.

Gregg, are you the right person to review this?

Ally, can you comment on privacy aspects? e.g. precise timestamps. Gregg can provide you with background on any past privacy reviews done for the non-Telemetry version of this functionality.
Attachment #8682227 - Attachment is obsolete: true
Attachment #8686394 - Flags: review?(glind)
Attachment #8686394 - Flags: feedback?(ally)
Comment on attachment 8686394 [details] [diff] [review]
heartbeatTelemetry3.patch

Minor:  the `rst` mentions those extra fields without saying how they appear in the packet.
Attachment #8686394 - Flags: review?(glind) → review+
Ok, I'll make that change. Just waiting on the privacy review now
I have been waiting to hear from Gregg about prior privacy/data collection/de-identification reviews for the non-telemetry since Vlad mentioned it in comment 41.
Flags: needinfo?(glind)
Comment on attachment 8686394 [details] [diff] [review]
heartbeatTelemetry3.patch

Review of attachment 8686394 [details] [diff] [review]:
-----------------------------------------------------------------

Ordinarly, I would hold this review until the prior data reviews surface. However, I understand there is pressure to get this in and have data by mozlando, so I am willing to give a r+ with the understanding that if previous data reviews turn up issues, followups will be handled promptly. Sound fair?
 
My biggest concern is that we're usually pretty strict about opt-out collections having an expiration version and I don't see a discussion of one in the bug or in the patch. That said, the strings appear to be tightly controlled and ratings+flowId probably fall into a small enough set of buckets that fingerprinting is probably not a meaningful issue longterm. Data may prove me wrong.


I'm glad to see we are not reporting for the private browsing case.

::: browser/components/uitour/UITour.jsm
@@ +818,4 @@
>        // The browser message manager is disconnected when the <browser> is
>        // destroyed and we want to teardown at that point.
>        case "message-manager-close": {
> +

unneeded newline? :)

@@ +1116,5 @@
> +    if (!this.surveyOptions) {
> +      // Event occurred after the ping was sent
> +      return;
> +    } else if (this.surveyOptions.privateWindowsOnly) {
> +      // No Telemetry from private-window-only Heartbeats

Good, good.

@@ +1124,5 @@
> +    let ts = Date.now();
> +    let sendPing = false;
> +    switch (aEventName) {
> +      case "Heartbeat:NotificationOffered":
> +        this.surveyResults.flowId = this.surveyOptions.flowId;

flowId can only have the form of "ui-ratefirefox-"#, "invalidEngagementButtonLabel-"#, "notMostRecent-"#, "privateWindowsOnly_noneOpen-"#, "ui-engagewithfirefox-"#,  "ui-privateWindowsOnly-"#, correct?

@@ +1187,5 @@
> +    TelemetryController.submitExternalPing("heartbeat",
> +      payload,
> +      {
> +        addClientId: true,
> +        addEnvironment: true,

So we're submitting the clientid and the environment on every heartbeat ping? I was under the impression from the bug comments that we didn't want to do that for every ping.

::: browser/components/uitour/test/browser_UITour_heartbeat.js
@@ +73,5 @@
> + *        The Telemetry payload to verify
> + * @param aFlowId
> + *        Expected value of the flowId field.
> + * @param aExpectedFields
> + *        Array of expected fields. No other fields are allowed.

Good.
Attachment #8686394 - Flags: feedback?(ally) → feedback+
(I have searched and searched for the privacy / discussion.  Cannot find it.  Pinging BDS)
Flags: needinfo?(glind) → needinfo?(benjamin)
:ally :bsmedberg :glind, some relevant conversation here: https://bugzilla.mozilla.org/show_bug.cgi?id=1092375
Attached patch heartbeatTelemetry.ready.patch (obsolete) — Splinter Review
This patch is ready to check in, assuming the privacy discussion doesn't result in change requests.

> So we're submitting the clientid and the environment on every heartbeat ping?
> I was under the impression from the bug comments that we didn't want to do
> that for every ping.

We do need clientIDs to be able to associate survey responses with their browser histories. The environment is needed to bucket responses, based on OS, etc

I'll let Gregg answer the flowId question in comment 45
Attachment #8686394 - Attachment is obsolete: true
Attachment #8688752 - Flags: review+
Attachment #8688752 - Flags: feedback+
(In reply to Vladan Djeric (:vladan) -- please needinfo! PTO Nov 16/19/20 from comment #43)
> Ok, I'll make that change. Just waiting on the privacy review now

I think this patch needs a Firefox peers review.
Comment on attachment 8688752 [details] [diff] [review]
heartbeatTelemetry.ready.patch

Review of attachment 8688752 [details] [diff] [review]:
-----------------------------------------------------------------

Matt: can you review this patch as a Firefox peer? Or nominate someone else?
Attachment #8688752 - Flags: review?(MattN+bmo)
:ally, re #45

1.  What is a `flowid`

`flowid`:  
- a 'per-interaction' uuid.  
- doesn't connect to other ids (other than through this bug)
- connects phases of the interaction (offer, seen, vote)
- connects score to any 'post-heartbeat' interaction (surveys) 
- flowids are stored at the client in localstore.  We could be more aggressive about clearing them out.

To see this in action:

`https://self-repair.mozilla.org/en-US/repair/?{%22runner%22:{%22alwaysRun%22:true}}`

Then look in your `localStorage` instance.  


2.  Tying score to environment (via telemetry) is the main goal of this bug.  This allows analyses connecting USER SENTIMENT to ADDONS, PROFILE AGE and other vars.  Sending ENV makes these sorts of analysis simple.  

(Here, User Sentiment (score) is the outcome variable, given a profile state.)
Comment on attachment 8688752 [details] [diff] [review]
heartbeatTelemetry.ready.patch

Review of attachment 8688752 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't do a full review but don't like that we're now storing state instead of using a closure.

::: browser/app/profile/firefox.js
@@ +212,5 @@
>  pref("browser.uitour.url", "https://www.mozilla.org/%LOCALE%/firefox/%VERSION%/tour/");
>  // This is used as a regexp match against the page's URL.
>  pref("browser.uitour.readerViewTrigger", "^https:\\/\\/www\\.mozilla\\.org\\/[^\\/]+\\/firefox\\/reading\\/start");
> +// How long to show a Hearbeat survey (two hours, in seconds)
> +pref("browser.uitour.surveyDuration", 7200);

Does this need to be a pref instead of just a const in UITour.jsm?

::: browser/components/uitour/UITour.jsm
@@ +97,5 @@
>  
> +  /* Heartbeat survey configuration */
> +  surveyOptions: null,
> +  surveyWindow: null,
> +  surveyEndTimer: null,

Do we really need to hold a reference to the window and timer? Have you considered Timer.jsm instead?

@@ +1114,5 @@
> +  maybeNotifyHeartbeat: function(aEventName, aParams = {}) {
> +    if (!this.surveyOptions) {
> +      // Event occurred after the ping was sent
> +      return;
> +    } else if (this.surveyOptions.privateWindowsOnly) {

No else after return necessary

@@ +1208,5 @@
>     * Show the Heartbeat UI to request user feedback. This function reports back to the
>     * caller using |notify|. The notification event name reflects the current status the UI
>     * is in (either "Heartbeat:NotificationOffered", "Heartbeat:NotificationClosed",
>     * "Heartbeat:LearnMore", "Heartbeat:Engaged" or "Heartbeat:Voted").
>     * When a "Heartbeat:Voted" event is notified

Please update this comment

@@ +1241,5 @@
>    showHeartbeat(aChromeWindow, aOptions) {
> +    // Initialize survey state
> +    this.surveyOptions = aOptions;
> +    this.surveyResults = {};
> +    this.surveyEndTimer = null;

Uh… why would we use module-level state for this? This isn't a core part of UITour so it doesn't fit IMO. Also this wouldn't work if there were multiple ongoing heartbeat (which perhaps we don't do but the old code handled it fine).
Attachment #8688752 - Flags: review?(MattN+bmo) → review-
I think the original analysis happened in person, but it's pretty straightforward. Asking people about their experience is one of the best ways we have to improve it. It will not surprise users to know that if we ask their opinion we are going to record the response. Correlating that response against their environment is how we know where to improve.

I do have some questions about the specific docs being proposed here:

+This ping is submitted after a Firefox Heartbeat survey. Even If the user exits
+the browser, closes the survey window, or ignores the survey, Heartbeat will
+submit a ping to Telemetry for sending during the same session.

This is slightly incorrect: if the user exits the browser, the ping will be sent in the following session. We don't do synchronous network waits on shutdown.

Please document "flowID". I don't know what kind of data it is or what it's for, or how it's different from surveyId.

If there is no vote (shutdown or timeout), will votedTS and score be present (null?) or completely absent?
what are "... other timestamps ..."

How much control does the self-support page have to add other pieces of data to this payload? Which pieces of data come from there and which are controlled by the browser?
Flags: needinfo?(benjamin)
(Benjamin, answering as able.  Vladan might have better insights.

1. Thanks for the catches on the doc stuff.  

2. "How much control does the self-support page have to add other pieces of data to this payload? Which pieces of data come from there and which are controlled by the browser?"

As far as I know, currently, WE CAN see:
- via `UITOUR.getConfiguration` [1]
  - "appinfo",
  - "availableTargets",
  - "loop",
  - "selectedSearchEngine",
  - "sync"
- via usual `browser`, `navigator` and other DOM things
  - locale
  - plugins (but not addons)
- via Heartbeat
  - Geolocation at COUNTRY, via `geo.mozilla.org`


[1] UITOUR:  https://hg.mozilla.org/mozilla-central/file/tip/browser/components/uitour/UITour.jsm#l1781
Flags: needinfo?(vladan.bugzilla)
Comment on attachment 8688752 [details] [diff] [review]
heartbeatTelemetry.ready.patch

Review of attachment 8688752 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/docs/heartbeat-ping.rst
@@ +1,3 @@
> +
> +"heartbeat" ping
> +=================

You also need to update telemetry/docs/index.rst
Whiteboard: [measurement:client] [measurement:client:tracking]
Hey folks. What's the status on this patch? We have several Q1 goals that rely on this data. Do we need a quick sync to get back on track after the holidays?
(In reply to Matt Grimes [:Matt_G] from comment #57)
> Hey folks. What's the status on this patch? We have several Q1 goals that
> rely on this data. Do we need a quick sync to get back on track after the
> holidays?

I'm working on this patch again, I'll have it up for review on Monday, and try to have it landed before end of week (when I leave Mozilla)
Flags: needinfo?(vladan.bugzilla)
Attached patch heartbeatTelemetry2.patch (obsolete) — Splinter Review
Applied reviews
Attachment #8688752 - Attachment is obsolete: true
Attachment #8712464 - Flags: review?(MattN+bmo)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #53)
> This is slightly incorrect: if the user exits the browser, the ping will be
> sent in the following session. We don't do synchronous network waits on
> shutdown.

It races against shutdown, and in my testing, it does get sent before Firefox completely exits.
Attached patch heartbeatTelemetry 2.1.patch (obsolete) — Splinter Review
Attachment #8712464 - Attachment is obsolete: true
Attachment #8712464 - Flags: review?(MattN+bmo)
Attachment #8712470 - Flags: review?(MattN+bmo)
Blocks: e10s-beta
Comment on attachment 8712470 [details] [diff] [review]
heartbeatTelemetry 2.1.patch

Review of attachment 8712470 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like toolkit/components/telemetry/docs/pings.rst also needs an update.

Thanks Vladan. I have a decent number of comments but no fundamental issues. Note that since I don't know have all of the context of how the data will be used so I'm mostly assuming the pings are sending the data the team agreed upon.

::: browser/components/uitour/UITour.jsm
@@ +1147,4 @@
>     */
>    showHeartbeat(aChromeWindow, aOptions) {
> +    // Initialize survey state
> +    let surveyOptions = aOptions;

It would seem like this is only used to know if the ping was sent so I would prefer it was a boolean with a name that just said that more plainly. I think you can s/surveyOptions/aOptions/ below other than the null assignment.

@@ +1165,1 @@
>          return;

Should we log some something in this case for troubleshooting? Not sure which log level? debug? warn? error?

@@ +1174,5 @@
> +      let sendPing = false;
> +      switch (aEventName) {
> +        case "Heartbeat:NotificationOffered":
> +          surveyResults.flowId = surveyOptions.flowId;
> +          surveyResults.offeredTS = ts;

I thought there was a mention of sending a ping upon an offer. Just noting that sendPing = false here so that won't happen.

@@ +1216,5 @@
> +      }
> +
> +      // Send the ping to Telemetry
> +      let payload = surveyResults;
> +      payload.version = 1;

Note that the assignment isn't doing a copy so you're actually modifying `surveyResult` too.

@@ +1217,5 @@
> +
> +      // Send the ping to Telemetry
> +      let payload = surveyResults;
> +      payload.version = 1;
> +      for (let meta of ["surveyId", "surveyVersion", "testing"]) {

It's unfortunate that we use "Id" instead of "ID" but use "TS" instead of "Ts". I guess surveyId is fine to match flowId.

@@ +1225,5 @@
> +      }
> +
> +      log.debug("Sending payload to Telemetry: " +
> +                "aEventName=" + aEventName + ", " +
> +                "payload=" + JSON.stringify(payload));

Nit: Console.jsm logging supports logging rich objects which you can then inspect in the Browser Console so you don't need to stringify the payload if you give it as its own argument (i.e. `… + "payload=", payload)`)

@@ +1229,5 @@
> +                "payload=" + JSON.stringify(payload));
> +
> +      TelemetryController.submitExternalPing("heartbeat",
> +        payload,
> +        {

I think these three lines can be combined into one:

TelemetryController.submitExternalPing("heartbeat", payload, {

@@ +1271,5 @@
>      }
>      // Create the notification. Prefix its ID to decrease the chances of collisions.
>      let notice = nb.appendNotification(aOptions.message, "heartbeat-" + aOptions.flowId,
>        "chrome://browser/skin/heartbeat-icon.svg", nb.PRIORITY_INFO_HIGH, buttons, function() {
>          // Let the consumer know the notification bar was closed. This also happens

This function is actually a generic event callback, not just for removal so we should be returning early if the argument (aEventType/aEventName?) isn't "removed". https://mxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/notification.xml#204 Would you mind fixing this while you're here?

@@ +1274,5 @@
>        "chrome://browser/skin/heartbeat-icon.svg", nb.PRIORITY_INFO_HIGH, buttons, function() {
>          // Let the consumer know the notification bar was closed. This also happens
>          // after voting.
> +        maybeNotifyHeartbeat("Heartbeat:NotificationClosed");
> +        nb.removeNotification(notice);

I thought this was a callback for when a notification was closed so I don't think you should have to remove again. See https://mxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/notification.xml#204 . I could actually imagine this causing an infinite loop depending on the implementation.

@@ +1399,5 @@
>      // Let the consumer know the notification was shown.
> +    maybeNotifyHeartbeat("Heartbeat:NotificationOffered");
> +
> +    // End the survey if the user quits, closes the window, or
> +    // hasn't responded in X hours

s/in X hours/before expiration./ since the duration is measured in seconds.

::: browser/components/uitour/test/browser_UITour_heartbeat.js
@@ +88,5 @@
> +
> +  // Check for expected fields
> +  for (let field of aExpectedFields) {
> +    ok(field in aPayload, "The payload should have the field '" + field + "'");
> +    if (field.slice(-2) == "TS") {

`field.endsWith("TS")` is clearer IMO. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/endsWith

@@ +531,5 @@
> +
> +  /**
> +   * Test that the survey closes itself after a while and submits Telemetry
> +   */
> +  taskify(function test_telemetry_surveyExpired() {

This will fail `mach eslint` as you are using `yield` with making this a generator function with `*`: `function*`. Please run `mach eslint` with your patch applied.

@@ +565,5 @@
> +    });
> +
> +    gContentAPI.showHeartbeat("How would you rate Firefox?", "Thank you!", flowId, engagementURL);
> +    yield telemetryPromise;
> +    Services.prefs.clearUserPref("browser.uitour.surveyDuration");

Nit: this should probably also appear in a registerCleanupFunction in case the test hangs mid-way before the cleanup.

::: toolkit/components/telemetry/docs/heartbeat-ping.rst
@@ +3,5 @@
> +=================
> +
> +This ping is submitted after a Firefox Heartbeat survey. Even if the user exits
> +the browser, closes the survey window, or ignores the survey, Heartbeat will
> +submit a ping to Telemetry for sending during the same session.

"submit a ping to Telemetry for sending"

Isn't "submit" and "send"[ing] here talking about the same thing? If not, I think this could be clearer.

@@ +54,5 @@
> +   * closedTS: when the Heartbeat notification bar was closed
> +   * expiredTS: indicates that the survey expired after 2 hours of no interaction (threshold regulated by "browser.uitour.surveyDuration" pref)
> +   * windowClosedTS: the user closed the entire Firefox window containing the survey, thus ending the survey. This timestamp will also be reported when the survey is ended by the browser being shut down.
> +* The surveyId/surveyVersion fields identify a specific survey (like a "1040EZ" tax paper form). The flowID is a UUID that uniquely identifies a single user's interaction with the survey. Think of it as a session token.
> +* The self-support page cannot include additional data in this payload. Only the the 3 surveyId/surveyVersion/testing fields are under the self-support page's control.

I don't think this is correct since `flowID` also comes from content IIRC
Attachment #8712470 - Flags: review?(MattN+bmo) → review+
Flags: needinfo?(MattN+bmo)
I addressed all review comments as best as I could but wasn't sure about this comment:

(Quoting Matthew N. [:MattN] from comment #62)
> @@ +1174,5 @@
> > +      let sendPing = false;
> > +      switch (aEventName) {
> > +        case "Heartbeat:NotificationOffered":
> > +          surveyResults.flowId = surveyOptions.flowId;
> > +          surveyResults.offeredTS = ts;
> 
> I thought there was a mention of sending a ping upon an offer. Just noting
> that sendPing = false here so that won't happen.

Gregg/Matt, is it correct that we decided not to send upon an offer. i.e. we must wait until the notification bar or window closes?
Flags: needinfo?(mgrimes)
Flags: needinfo?(glind)
Flags: needinfo?(MattN+bmo)
Comment on attachment 8714962 [details]
MozReview Request: Bug 1193535 - Store Heartbeat Scores in Unified Telemetry. r=MattN

https://reviewboard.mozilla.org/r/33285/#review30015
Attachment #8714962 - Flags: review?(MattN+bmo) → review+
(I would prefer to send on offer, for measuring uptake and response, and deviation from population norms.)
Flags: needinfo?(glind)
We are sending one ping, either when the user submits a response or when it times out. That should allow us to measure response rates and deviations without having to correlate two separate pings.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #67)
> We are sending one ping, either when the user submits a response or when it
> times out. That should allow us to measure response rates and deviations
> without having to correlate two separate pings.

In that case it's ready to land. I'll do a try push now.
Flags: needinfo?(mgrimes)
> >    showHeartbeat(aChromeWindow, aOptions) {
> > +    // Initialize survey state
> > +    let surveyOptions = aOptions;
> 
> It would seem like this is only used to know if the ping was sent so I would
> prefer it was a boolean with a name that just said that more plainly. I
> think you can s/surveyOptions/aOptions/ below other than the null assignment.

Yes, that would work

> @@ +1165,1 @@
> >          return;
> 
> Should we log some something in this case for troubleshooting? Not sure
> which log level? debug? warn? error?

I think Debug. It's possible to get harmless "races", for example if the survey timeout triggers right after the user dismisses the bar.

> I thought there was a mention of sending a ping upon an offer. Just noting
> that sendPing = false here so that won't happen.

As Benjamin pointed out, we'll only send a single ping on survey "end", there will not be a separate ping for offers

> > +      // Send the ping to Telemetry
> > +      let payload = surveyResults;
> > +      payload.version = 1;
> 
> Note that the assignment isn't doing a copy so you're actually modifying
> `surveyResult` too.

That's true

> @@ +565,5 @@
> > +    });
> > +
> > +    gContentAPI.showHeartbeat("How would you rate Firefox?", "Thank you!", flowId, engagementURL);
> > +    yield telemetryPromise;
> > +    Services.prefs.clearUserPref("browser.uitour.surveyDuration");
> 
> Nit: this should probably also appear in a registerCleanupFunction in case
> the test hangs mid-way before the cleanup.

Good point

> ::: toolkit/components/telemetry/docs/heartbeat-ping.rst
> @@ +3,5 @@
> > +=================
> > +
> > +This ping is submitted after a Firefox Heartbeat survey. Even if the user exits
> > +the browser, closes the survey window, or ignores the survey, Heartbeat will
> > +submit a ping to Telemetry for sending during the same session.
> 
> "submit a ping to Telemetry for sending"
> 
> Isn't "submit" and "send"[ing] here talking about the same thing? If not, I
> think this could be clearer.

Not quite, Heartbeat would hand off the ping to Telemetry, but Telemetry would decide when to send it -- immediately in the normal case, but there's also send throttling

> > +* The self-support page cannot include additional data in this payload. Only the the 3 surveyId/surveyVersion/testing fields are under the self-support page's control.
> 
> I don't think this is correct since `flowID` also comes from content IIRC

Correct, Gregg confirmed that Heartbeat 'content' generates the flowId UUID in the browser. Unlike the other 3 meta-fields though, it's not supplied by the server
Comment on attachment 8714962 [details]
MozReview Request: Bug 1193535 - Store Heartbeat Scores in Unified Telemetry. r=MattN

https://reviewboard.mozilla.org/r/33285/#review30191

::: toolkit/components/telemetry/docs/heartbeat-ping.rst:7
(Diff revision 1)
> +submit a ping to Telemetry during the same session.

Nit: technically, Heartbeat will submit the ping to the Telemetry service, Telemetry might not send it to Telemetry servers right away
Attachment #8714962 - Flags: review+
> I think Debug. It's possible to get harmless "races", for example if 
> the survey timeout triggers right after the user dismisses the bar.

On second thought, this ought to be rare, so Warn is fine too
Gregg, when this is in Nightly, can you please verify with a test survey on a local build that appropriate payloads are being sent and that the page callbacks still work fine. I don't want to find out after Nightly that it's not meeting your needs as the cost of fixing this is much smaller now.

Thanks!
Flags: needinfo?(glind)
https://hg.mozilla.org/mozilla-central/rev/7b81b08f1899
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
CONFIRMED OK:

Packet looks like this:

    "payload": {
      "offeredTS": 1455301991539,
      "version": 1,
      "flowId": "13ac1a7a-10a0-4b33-9cf9-702e12ebf46c",
      "score": 3,
      "closedTS": 1455301999947,
      "votedTS": 1455301996621
    },

This is adequate for these tasks:

1.  Calculating score
2.  Calculating response (slight overcount)
3.  Having covariates.
4.  tying surveys to response.


(test plan was:
- Settings:  toolkit.telemetry.server;http://localhost:5000
- https://github.com/vdjeric/gzipServer/blob/master/gzipServer.py at 5000
- https://self-repair.mozilla.org/en-US/repair/?{%22phonehome%22:{%22testing%22:true},%22runner%22:{%22alwaysRun%22:true}}  


This needs https://github.com/mozilla/self-repair-server/pull/104/files to land to be fully useful.
Flags: needinfo?(glind)
(landed server changes, observed AS EXPECTED:

    "payload": {
      "offeredTS": 1455303715686,
      "version": 1,
      "surveyId": "heartbeat-by-user-first-impression",
      "testing": 1,
      "flowId": "907f3a17-47a7-458d-a44d-2f3e4b20d5f2",
      "score": 1,
      "surveyVersion": "40",
      "closedTS": 1455303730867,
      "votedTS": 1455303727532
    },
Everything checks out on our end. Ilana has been pulling Nightly data this morning and is quite happy with the output. I say we are ready to uplift. Thoughts or concerns? This is MOST impactful once it hits release.
(In reply to Matt Grimes [:Matt_G] from comment #78)
> Everything checks out on our end. Ilana has been pulling Nightly data this
> morning and is quite happy with the output. I say we are ready to uplift.
> Thoughts or concerns?

The risk seems fairly contained to showHeartbeat so if you're fine with potential breakage there then I think it's okay. The benefit of uplift doesn't seem very clear to me other than below. Does this provide you data that the existing upload doesn't provide? i.e. what's the rush?

> This is MOST impactful once it hits release.

That's true of every code change but we have the trains for various reasons :P
Very true, but in this case we're trying to establish some baseline data prior to the e10s rollout. Being able help assess if there is impact to satisfaction with each stage of shipping e10s will be extremely helpful.
(In reply to Matt Grimes [:Matt_G] from comment #80)
> Very true, but in this case we're trying to establish some baseline data
> prior to the e10s rollout. Being able help assess if there is impact to
> satisfaction with each stage of shipping e10s will be extremely helpful.

Is this not possible with the current, existing Heartbeat data collection?
Did we have any QA going through the different states and verifying the correct data is sent?
The existing data upload?  We have no existing data upload of environment covariates. Using the very small amount of data of the tour explains < 2% of variation in satisfaction.  

The point of this bug was to get that data to explain variation, by including addons, etc.  

As for rush, request for covariate data has been open for a year.  This particular bug has been open for 6 months.  Now, it is landed, and we want it in the population that actually matters... release.
[User impact if declined]: Less insight into e10s rollout and possible negative impact.
[Test coverage]: I've pulled the nightly data out using spark and there have been no problems. Everything's looking good on our end. However, the data in and of itself is mostly useful here as a testing mechanism, not for actual insights.
[Risks and why]: Low risk.  Affects only UI tour.
Comment on attachment 8669388 [details] [diff] [review]
hearbeatTelemetry.patch

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]:
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: 
[String/UUID change made/needed]:
Attachment #8669388 - Flags: approval-mozilla-beta?
Comment on attachment 8714962 [details]
MozReview Request: Bug 1193535 - Store Heartbeat Scores in Unified Telemetry. r=MattN

Approval Request Comment
See commen 83 +
[String/UUID change made/needed]: None
Attachment #8714962 - Flags: approval-mozilla-beta?
Attachment #8714962 - Flags: approval-mozilla-aurora?
Comment on attachment 8714962 [details]
MozReview Request: Bug 1193535 - Store Heartbeat Scores in Unified Telemetry. r=MattN

We should not uplift this on beta (which is at RC and has already merged to release).

I support uplift to aurora for 46 and make sure that we have confidence in the numbers on that audience.
Attachment #8714962 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 8714962 [details]
MozReview Request: Bug 1193535 - Store Heartbeat Scores in Unified Telemetry. r=MattN

Approved for uplift to aurora, though it's unclear here from comment 83 what useful data it will give us. Sounds like this may help us get a new system up and running though.
Attachment #8714962 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Duplicate of this bug: 1194238
You need to log in before you can comment on or make changes to this bug.