Closed Bug 1308030 Opened 8 years ago Closed 8 years ago

Add telemetry probes for Narrate

Categories

Product/Component: Toolkit :: Reader Mode
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla52
Tracking Status: firefox52 --- fixed
People

(Reporter: eeejay, Assigned: eeejay)

References

Details

Attachments

(1 file)

We should figure out how much this feature is used and how much coverage we have in terms of synthesis language.
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

The docs say to give a problem statement, so here goes: Speech synthesis has limited coverage when it comes to all the languages our users consume. For this reason, Narrate will not function well (or at all) with certain languages. The first probe will give us insight into which languages Narrate most often encounters (and whether the OS has a voice for each of them), so we know what priorities to set when it comes to expanding our synthesis coverage.

The second probe measures Narrate's speaking time, which will give us a better understanding of how this feature is used. Instead of a simple flag toggled on play, I chose to measure the duration. This will tell us whether users actually use the feature to listen to content, or press "stop" right after they first play.
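For readers unfamiliar with the Telemetry internals involved, the probe design described above can be sketched in plain JavaScript. The histogram and stopwatch objects below are simplified stand-ins for Telemetry's real keyed histograms and TelemetryStopwatch, and all function names are illustrative, not taken from the patch:

```javascript
// Stand-in for a Telemetry keyed enumerated histogram: counts values per key.
function makeKeyedHistogram() {
  const data = new Map();
  return {
    add(key, value) {
      if (!data.has(key)) data.set(key, new Map());
      const counts = data.get(key);
      counts.set(value, (counts.get(value) || 0) + 1);
    },
    count(key, value) {
      return (data.get(key) && data.get(key).get(value)) || 0;
    },
  };
}

const NARRATE_CONTENT_BY_LANGUAGE = makeKeyedHistogram();

// Probe 1: on every Narrate initialization (i.e. every reader mode entry),
// record 0 under the detected content language; record 1 only when the OS
// has a matching synthesis voice for that language.
function recordLanguageProbe(language, hasMatchingVoice) {
  NARRATE_CONTENT_BY_LANGUAGE.add(language, 0);
  if (hasMatchingVoice) {
    NARRATE_CONTENT_BY_LANGUAGE.add(language, 1);
  }
}

// Probe 2: measure speaking duration rather than a simple "played" flag, so
// "pressed stop immediately" is distinguishable from sustained listening.
const speakStart = new Map();
const NARRATE_CONTENT_SPEAKTIME_MS = [];
function onSpeakingChanged(sessionId, speaking, nowMs) {
  if (speaking) {
    speakStart.set(sessionId, nowMs);
  } else if (speakStart.has(sessionId)) {
    NARRATE_CONTENT_SPEAKTIME_MS.push(nowMs - speakStart.get(sessionId));
    speakStart.delete(sessionId);
  }
}
```

The asymmetry in probe 1 (0 recorded unconditionally, 1 only on a match) is what lets the missing-voice case be inferred from the difference between the two counts.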
Attachment #8801407 - Flags: feedback?(benjamin)
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

https://reviewboard.mozilla.org/r/86150/#review85164

r=me but I have a number of comments.

::: toolkit/components/narrate/NarrateControls.jsm:158
(Diff revision 1)
> },
>
> /**
>  * Returns true if synth voices are available.
>  */
> -  _setupVoices: function() {
> +  _setupVoices: function(initial = false) {

Instead of the parameter, I would probably just add something like:

```
let initial = !this._voicesInitialized;
this._voicesInitialized = true;
```

and then check `initial` inside the promise's `then` callback.

::: toolkit/components/narrate/NarrateControls.jsm:180
(Diff revision 1)
> +    if (initial) {
> +      histogram.add(language, 0);
> +    }
> +
> +    if (options.length && !this.voiceSelect.options.length) {
> +      histogram.add(language, 1);
> +    }

Shouldn't the second be an `else if`?

::: toolkit/components/narrate/NarrateControls.jsm:267
(Diff revision 1)
> +    if (speaking) {
> +      TelemetryStopwatch.start("NARRATE_CONTENT_SPEAKTIME_MS", this);
> +    } else {
> +      TelemetryStopwatch.finish("NARRATE_CONTENT_SPEAKTIME_MS", this);
> +    }

Do we need to explicitly call `finish()` if/when the page is unloaded?

::: toolkit/components/telemetry/Histograms.json:10338
(Diff revision 1)
> +    "releaseChannelCollection": "opt-out",
> +    "expires_in_version": "never",

I'm assuming you're sorting out with Benjamin why this is never expiring as well as opt-out.

::: toolkit/components/telemetry/Histograms.json:10342
(Diff revision 1)
> +    "kind": "enumerated",
> +    "keyed": true,
> +    "n_values": 8,

This seems to just be either 0 or 1. Is it really worth creating 8 buckets?
Attachment #8801407 - Flags: review?(gijskruitbosch+bugs) → review+
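The latch Gijs suggests can be sketched standalone. The class and field names below are illustrative, not the actual patch; the key detail is that `initial` must be computed before the flag is set, so it is true only on the very first call:

```javascript
// Minimal sketch of the suggested first-call latch: `initial` is true only
// the very first time _setupVoices() runs, without threading a parameter.
class NarrateControlsSketch {
  constructor() {
    this._voicesInitialized = false;
    this.calls = []; // records the `initial` value seen on each call
  }
  _setupVoices() {
    const initial = !this._voicesInitialized;
    this._voicesInitialized = true;
    // In the real patch this check would live inside the promise's then()
    // callback that populates the voice list.
    this.calls.push(initial);
  }
}
```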
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

https://reviewboard.mozilla.org/r/86150/#review85202

::: toolkit/components/telemetry/Histograms.json:10339
(Diff revision 1)
> +  },
> +  "NARRATE_CONTENT_BY_LANGUAGE": {
> +    "alert_emails": ["eisaacson@mozilla.com"],
> +    "bug_numbers": [1308030],
> +    "releaseChannelCollection": "opt-out",
> +    "expires_in_version": "never",

opt-out/never data collection has a pretty high bar. You need to describe how this provides long-term value, who's responsible for providing that value, and how (what dashboards/alerting system are you going to develop). Since this is product metrics, which product manager is working with you on this?

::: toolkit/components/telemetry/Histograms.json:10341
(Diff revision 1)
> +    "kind": "enumerated",
> +    "keyed": true,

For keyed histograms, the description needs to describe the key format very precisely. I assume that it is some sort of language marker: but is this a predetermined/known set of values, or is it values provided by web pages? If it's website-provided values, we typically do not want to record those directly in our telemetry payload because it's potentially quite identifying. We may need to reduce risk by using a whitelist of known language codes or other techniques.

::: toolkit/components/telemetry/Histograms.json:10343
(Diff revision 1)
> +    "n_values": 8,
> +    "description": "Number of Narrate initializations broken up by content language (0 = initialized, 1 = language matched)"

I'm having trouble parsing this description, but that's partly because I don't know how language matching works for the narrate feature. Does the code record a "0" every time the user asks to narrate some content? Does it then also record a "1" if we found a matching language and actually started narrating? Or does it record only one *or* the other value? Please attempt to write down more details about what triggers probe collection.

::: toolkit/components/telemetry/Histograms.json:10353
(Diff revision 1)
> +    "kind": "exponential",
> +    "high": 300000,
> +    "n_buckets": 50,
> +    "description": "Time in MS that content is narrated"

This isn't a scalar/counter, so you're recording each narration "session" separately. Is that necessary/useful for your product analysis? Or could you record this in a single scalar that counts up over time? Exponential seems like an odd choice for this: do you need so much granularity at the low end of this metric?
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

It's hard to say "r- for now" with ReviewBoard: I'd like to see a new data request with some questions answered and some of the collection clarified.
Attachment #8801407 - Flags: feedback?(benjamin) → feedback-
Thanks for the feedback!

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> opt-out/never data collection has a pretty high bar. You need to describe
> how this provides long-term value, who's responsible for providing that
> value, and how (what dashboards/alerting system are you going to develop).

I'm concerned that having it be opt-in will bias the data towards our more heavily used locales with early adopters, and not give us a good picture of the true language distribution of our users.

The expiry version is definitely negotiable. It should probably be on for one or two versions, and then we can revisit. Is that how it works? Can we change that field later if we find the data useful to have more permanently?

> Since this is product metrics, which product manager is working with you on
> this?

Peter Dolanjski. ni?ing him so he sees this and offers more product-focused input.

As for dashboard, I see us reviewing the following queries.
- How often is reader mode invoked? (broken by language)
- of those reader mode instances, how often is narrate offered? (broken up by language)
- of those times reader mode is offered, how often is it actually used?
- what is the average narrate time? (if we see that the average is just a few seconds it may inform us that users just click once to play with it but don't actually listen extensively).

Peter, what do you think of those queries?

> For keyed histograms, the description needs to describe the key format very
> precisely. I assume that it is some sort of language marker: but is this a
> predetermined/known set of values, or is it values provided by web pages?

It is an ISO 639-2 language code that is the output of our own LanguageDetector module (which in turn uses an emscriptened library, cld2, that does Bayesian math to figure out what language the given content is in).
https://dxr.mozilla.org/mozilla-central/source/browser/components/translation/LanguageDetector.jsm

> If it's website-provided values, we typically do not want to record those
> directly in our telemetry payload because it's potentially quite
> identifying.

That shouldn't be a problem. This string is not taken from content directly.

> We may need to reduce risk by using a whitelist of known
> language codes or other techniques.

cld2 has a limited corpus of language codes (~200) that it will output.

> I'm having trouble parsing this description, but that's partly because I
> don't know how language matching works for the narrate feature. Does the
> code record a "0" every time the user asks to narrate some content?

It records a 0 when the narrate *button* is initialized (essentially, whenever reader mode is invoked). So even if the button remains hidden because of a lack of language support, we still record a 0.

> Does it then also record a "1" if we found a matching language and actually
> started narrating?

It records a 1 when the platform finds an appropriate voice for the content's language and shows the narrate button. This is regardless of whether the user presses the button.

> Or does it record only one *or* the other value?

It records both: the number of narrate button initializations, and the number of narrate button unhides. This is confusing, I know. The reason I chose to do it this way is a limitation in the speech synthesis API. Voices can be loaded asynchronously. When they are loaded, they emit a "voiceschanged" DOM event so that we know to update the list of available voices. Voices can be loaded or unloaded at any given time with no query or warning. Basically, this means I can know that we found a language match with a voice, but I can't know if we failed to find a match. So I can't do "if (success) record 1; else record 0". I got around that by recording an initialization unconditionally, and recording a language match success (a.k.a. a button unhide). The failure is implicit: if there was no match, the absence of a "1" indicates that, meaning the narrate button was never shown to the user.

For example, given the keyed histogram "it":
Count of 0s: 1000
Count of 1s: 750
We know that only 75% of the users who are accessing Italian content in reader mode have an Italian synthesis voice installed on their system, for whom narrate should work well. The other 25% have the narrate feature hidden because of lack of support. (We may want to uplift this to release branches where Narrate isn't hidden yet, but we will at least get more insight into the support coverage among our users.)

> Please attempt
> to write down more details about what triggers probe collection.

I assume you mean in the "description" field, not here. But I'll say here that every user who enters reader mode will trigger the probe, regardless of whether they use narrate or not.

> This isn't a scalar/counter, so you're recording each narration "session"
> separately. Is that necessary/useful for your product analysis? Or could you
> record this in a single scalar that counts up over time?

Not sure I understand the proper use here. My intention was to record two things:
- Number of times someone presses "play".
- Rough duration of played content.
Won't we know the former by the number of samples and the latter by their values?

> Exponential seems like an odd choice for this: do you need so much
> granularity at the low end of this metric?

I chose exponential because I don't need granularity at the high end of the metric. On second thought, maybe linear would suffice.
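The Italian example and the bucketing trade-off discussed in this comment reduce to a little arithmetic. A sketch, with made-up helper names (neither function exists in the patch):

```javascript
// Coverage: fraction of reader-mode sessions in a given language for which a
// matching synthesis voice was found (count of 1s over count of 0s).
function voiceCoverage(initCount, matchCount) {
  return matchCount / initCount;
}

// Linear bucketing: 10 equal-width buckets up to 300000 ms gives roughly
// 30-second-wide buckets; exponential bucketing would instead concentrate
// precision at the low (sub-minute) end of the range.
function linearBucketWidth(highMs, nBuckets) {
  return highMs / nBuckets;
}
```

For the "it" example above, `voiceCoverage(1000, 750)` yields 0.75, i.e. 75% of Italian reader-mode sessions had a usable voice.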
Flags: needinfo?(pdolanjski)
(In reply to Eitan Isaacson [:eeejay] from comment #6)
> - of those times reader mode is offered, how often is it actually used?

Oops, I meant "of those times narrate is offered".
(In reply to Eitan Isaacson [:eeejay] from comment #6)
> As for dashboard, I see us reviewing the following queries.
> - How often is reader mode invoked? (broken by language)
> - of those reader mode instances, how often is narrate offered? (broken up
> by language)
> - of those times reader mode is offered, how often is it actually used?
> - what is the average narrate time? (if we see that the average is just a
> few seconds it may inform us that users just click once to play with it but
> don't actually listen extensively).
>
> Peter, what do you think of those queries?

This is exactly what I'd be looking for. It helps make further investment decisions, and decisions on whether or not we need to raise the visibility of the feature. I think it's also implied by the use of the histogram, but just to be explicit: we likely need the median and other quantiles as well, since I expect the average to be heavily skewed. Ideally we could correlate retention/usage figures with users who use narration regularly.
Flags: needinfo?(pdolanjski)
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

https://reviewboard.mozilla.org/r/86150/#review85202

> opt-out/never data collection has a pretty high bar. You need to describe
> how this provides long-term value, who's responsible for providing that
> value, and how (what dashboards/alerting system are you going to develop).
>
> Since this is product metrics, which product manager is working with you on
> this?

Added an expiry version of 53.

> I'm having trouble parsing this description, but that's partly because I
> don't know how language matching works for the narrate feature. Does the
> code record a "0" every time the user asks to narrate some content? Does it
> then also record a "1" if we found a matching language and actually started
> narrating? Or does it record only one *or* the other value? Please attempt
> to write down more details about what triggers probe collection.

Tried to clarify the language in the description: we record *all* Narrate initialization attempts (0), and we record successful initialization attempts (1).

> This isn't a scalar/counter, so you're recording each narration "session"
> separately. Is that necessary/useful for your product analysis? Or could you
> record this in a single scalar that counts up over time?
>
> Exponential seems like an odd choice for this: do you need so much
> granularity at the low end of this metric?

Changed to linear with 10 buckets, so we get 30-second increments up to 5 minutes.
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

https://reviewboard.mozilla.org/r/86150/#review85164

> Shouldn't the second be an `else if`?

No. We record 0 on initialization, AND 1 if we have an initial voice match.

> Do we need to explicitly call `finish()` if/when the page is unloaded?

Good point!

> I'm assuming you're sorting out with Benjamin why this is never expiring as
> well as opt-out.

I'll add an expiration version.

> This seems to just be either 0 or 1. Is it really worth creating 8 buckets?

I wanted to have more options for the future; I'll dial it down to 4.
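The unload concern Gijs raised can be illustrated with a stand-in for TelemetryStopwatch: a timer started on "speaking" is lost unless something explicitly finishes it when the page goes away. The stopwatch below is a simplified simulation, and `wireNarration` is a hypothetical helper, not code from the patch:

```javascript
// Simplified stand-in for TelemetryStopwatch: start() marks a timer,
// finish() records the elapsed time as a sample; an unfinished timer
// contributes nothing.
function makeStopwatch(now = () => Date.now()) {
  const running = new Map();
  const samples = [];
  return {
    start(id) { running.set(id, now()); },
    finish(id) {
      if (!running.has(id)) return false; // nothing running under this id
      samples.push(now() - running.get(id));
      running.delete(id);
      return true;
    },
    samples,
  };
}

// Hypothetical wiring: close out the speak-time timer on page unload so a
// narration cut short by navigation still records its duration.
function wireNarration(stopwatch, target) {
  target.addEventListener("unload", () =>
    stopwatch.finish("NARRATE_CONTENT_SPEAKTIME_MS"));
}
```

The second `finish()` call returning false also shows why finishing on unload is safe even if narration already stopped normally: a double finish is simply a no-op.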
Attachment #8801407 - Flags: review?(gijskruitbosch+bugs)
Attachment #8801407 - Flags: review+
Attachment #8801407 - Flags: feedback?(benjamin)
Re-asking for review from Gijs, since `_setupVoices` changed a bit in this patch. Just tried to make it cleaner.
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

Changes LGTM. Thanks!
Attachment #8801407 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8801407 [details]
Bug 1308030 - Add telemetry probes to narrate. , data-review=bsmedberg

https://reviewboard.mozilla.org/r/86150/#review87370

data-review=me

::: toolkit/components/telemetry/Histograms.json:10339
(Diff revision 2)
> +  },
> +  "NARRATE_CONTENT_BY_LANGUAGE": {
> +    "alert_emails": ["eisaacson@mozilla.com"],
> +    "bug_numbers": [1308030],
> +    "releaseChannelCollection": "opt-out",
> +    "expires_in_version": "53",

Feel free to make this 56: exploratory measurements for 6 months of release are fine.

::: toolkit/components/telemetry/Histograms.json:10352
(Diff revision 2)
> +    "bug_numbers": [1308030],
> +    "releaseChannelCollection": "opt-out",
> +    "expires_in_version": "53",
> +    "kind": "linear",
> +    "high": 300000,
> +    "n_buckets": 10,

Feel free to use more buckets (up to 100) if that helps your analysis. We do record "sum" separately, so you can get an accurate total time no matter the bucket size.
Attachment #8801407 - Flags: review+
Attachment #8801407 - Flags: feedback?(benjamin)
Pushed by eisaacson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8804e6bf128c
Add telemetry probes to narrate. r=bsmedberg,Gijs, data-review=bsmedberg
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
I'm not sure NARRATE_CONTENT_SPEAKTIME_MS worked properly. I can't find results.
(In reply to David Bolter [:davidb] from comment #18)
> I'm not sure NARRATE_CONTENT_SPEAKTIME_MS worked properly. I can't find
> results.

There aren't a lot of results, so you need to turn off sanitization: https://mzl.la/2gOd0Zf

Not sure if the quantity of results is worryingly low - the other probe reports more results, so I guess it's surprising the counts don't really match up. Eitan doesn't seem to be accepting needinfo requests, and I can't spot anything wrong from a quick look at the code.
NARRATE_CONTENT_BY_LANGUAGE is recorded whenever a user enters reader mode, so its count is expected to be disproportionately higher than NARRATE_CONTENT_SPEAKTIME_MS, which is only recorded when narrate is actually used. I don't know whether the number of SPEAKTIME_MS records is objectively low or not; it's not in the release channel yet. The good news is that a good proportion of narrate users (~30%) listen for more than 5 minutes, with a reasonable spread in the 1-5 minute range and a 20% dropout under 10 seconds.

Something does look broken with NARRATE_CONTENT_BY_LANGUAGE: the 1 counts should be lower than the 0 counts. I wrote about how that should work in comment #6. I think I misunderstood how histograms work. Hopefully that data is not entirely junk and I can get something out of it. Otherwise, a fix is in order.
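The invariant described in this comment is easy to check mechanically over exported histogram data. This is a sketch assuming the data is shaped as a per-language map of value counts; the shape and the helper name are assumptions, not an existing Telemetry API:

```javascript
// Sanity check for NARRATE_CONTENT_BY_LANGUAGE: for every language key, the
// count of 1s (voice matched, button shown) can never legitimately exceed
// the count of 0s (button initialized), since a 0 is recorded on every
// reader mode entry and a 1 only on a subset of them.
function findBrokenKeys(histogram) {
  // histogram: { lang: { 0: countOfZeros, 1: countOfOnes } }
  return Object.keys(histogram).filter(
    lang => (histogram[lang][1] || 0) > (histogram[lang][0] || 0)
  );
}
```

Any key this returns indicates the probe (or the analysis) is recording 1s without the corresponding unconditional 0, which is the symptom described above.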
Blocks: 1324868