Closed Bug 1608461 Opened 3 months ago Closed 1 month ago

Determine what kind of telemetry in-product Search Tips should collect

Categories

(Firefox :: Address Bar, task, P1)

task
Points:
3

Tracking

RESOLVED FIXED
Firefox 75
Iteration:
75.1 - Feb 10 - Feb 23
Tracking Status
firefox75 --- fixed

People

(Reporter: harry, Assigned: adw)

Attachments

(2 files)

Ben or Teon, we're in the process of porting Search Tips from the experiment to the product. What kind of telemetry should we be collecting in this version of Search Tips?

Flags: needinfo?(teon)
Flags: needinfo?(bmiroglio)

Can you clarify which experiments involve Search Tips? I saw it as a renaming of Nudges but have gotten a bit lost in what falls under the Search Tips umbrella. Ideally, we'd have complete experiment data to help inform the in-product telemetry.

Flags: needinfo?(teon)
Flags: needinfo?(htwyford)
Flags: needinfo?(bmiroglio)

Search Tips is a renaming of Nudges: showing prompts to interact with the Urlbar on the newtab page and on the default search engine's homepage.

Flags: needinfo?(htwyford)

So, that experiment is on track to ship next week on 1/14. Would you mind waiting a few days after 1/14 to allow for some analysis on the experiment data? This way we can spot any clear gaps or nice-to-haves assuming the in-product telemetry will be structured similarly.

Flags: needinfo?(htwyford)

Sure, we can wait! We're porting Tips to be in-product now but it won't be preffed on for a wide audience until we see positive experiment results. We'll settle this bug after we get results but before preffing on.

Flags: needinfo?(htwyford)

I was wondering how experimentation fits in with feature landing. To clarify: if we are running an experiment to find out whether this feature is desirable for the end user, I thought the point of the add-on approach and the urlbar redesign was to bypass the need to land the code in tree until there was a product recommendation to include it.

I think that's the general approach. The problem here is that we'd like to get these features out in 74, but the experiment won't be done until right before 74 hits Release. AFAIK the current plan is to develop these features behind a pref in time to make 74, and then pref them off at the last minute if we get negative experiment results.

Points: --- → 3
Priority: -- → P2
Duplicate of this bug: 1606911

Drew, what is the current experiment measuring in addition to standard telemetry?

Flags: needinfo?(adw)

In addition to event telemetry, the experiment has only a single keyed scalar that records the number of times each tip type is shown, the tip types being redirect and onboarding. I would think we'd want to continue recording that in the port.
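As a minimal sketch of what a keyed scalar counting impressions per tip type records, here is an in-memory Python model. It is not Firefox's Telemetry service: front-end code would typically go through `Services.telemetry.keyedScalarAdd(name, key, value)`, and the scalar name `urlbar.tips.shown_count` below is hypothetical. The keys are the two tip types mentioned above.

```python
from collections import Counter, defaultdict


class KeyedScalarStore:
    """In-memory stand-in for a telemetry keyed-scalar store (hypothetical)."""

    def __init__(self) -> None:
        # scalar name -> Counter mapping each key to its accumulated count
        self._scalars = defaultdict(Counter)

    def keyed_scalar_add(self, name: str, key: str, value: int = 1) -> None:
        # Count semantics: only non-negative increments make sense.
        if value < 0:
            raise ValueError("count scalars only accept non-negative increments")
        self._scalars[name][key] += value

    def snapshot(self, name: str) -> dict:
        return dict(self._scalars[name])


telemetry = KeyedScalarStore()
SHOWN_COUNT = "urlbar.tips.shown_count"  # hypothetical scalar name

# Each time a tip is displayed, bump the counter keyed by its type.
for tip_type in ["onboarding", "redirect", "onboarding"]:
    telemetry.keyed_scalar_add(SHOWN_COUNT, tip_type)

print(telemetry.snapshot(SHOWN_COUNT))  # {'onboarding': 2, 'redirect': 1}
```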

Flags: needinfo?(adw)

Do we care about the tip being interacted with?

Flags: needinfo?(teon)

Is the only interaction with the search tip just dismissing it? Would this be separate from the intervention?

Flags: needinfo?(teon)

Clicking the "Okay, Got It" button on a Search Tip focuses the Urlbar. That isn't very interesting for the Search Tip that appears on about:newtab, but for the one that appears on search engine homepages, clicking the button moves focus from the in-content search box to the Urlbar. Interventions are a separate thing altogether: they appear in response to a query in the Urlbar and have a more actionable button. I broke down the various tip-related projects in bug 1606923 comment 4 if more clarification is needed.

I'll try to come up with a proposal.

Assignee: nobody → mak
Status: NEW → ASSIGNED

Thanks for the link to the comment, :harry; that really helped clarify the different tips in the project. :mak, sounds good.

(In reply to Marco Bonardo [:mak] from comment #10)

Do we care about the tip being interacted with?

We currently record tip picks as part of the FX_URLBAR_SELECTED_RESULT_TYPE enumerated histogram (tips are recorded as value 12), plus the FX_URLBAR_SELECTED_RESULT_INDEX and FX_URLBAR_SELECTED_RESULT_INDEX_BY_TYPE histograms. That's in addition to the event telemetry, which isn't enabled by default of course.

So it seems like the only thing we might need here is a histogram for shown count, if we want that.

Also note that now that bug 1611873 is landed, we treat focusing/selecting the urlbar while a search tip is showing as picking the tip. There shouldn't be any extra work required in this bug to account for that since we go through the input.pickElement path, but I mention it just in case.

One other thing: FX_URLBAR_SELECTED_RESULT_TYPE doesn't distinguish between picks of the tip's main button and picks of its help button. That's not relevant for search tips, since they don't show a help button, but it is relevant for interventions.

That said, FX_URLBAR_SELECTED_RESULT_TYPE doesn't currently capture the kind of tip the user picked. So now that we use tips for both search tips and interventions, we can't currently use that histogram to distinguish between the two. For that matter, we also can't use it to distinguish between onboarding vs. redirect search tips. Do we want to? We could still use the histogram, but we would need to add values for each type we want to capture.

We could use either a categorical histogram or a keyed scalar. The former is good for up to 50 entries and is a bit less flexible (labels must be added to the histogram definition, and going over 50 requires creating a new histogram). The latter can go up to 100 entries and can track any label, which is simply added in code.
There's no strong reason to prefer one over the other in our case. However, if we use the same probe for both tips and interventions, the flexibility of a keyed scalar may help us improve measurements along the way, and 100 entries should be enough for most needs. Experiments could also add more tips/interventions, which would make the set of keys unknown in advance; that suggests using a keyed scalar.
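The trade-off between the two probe types can be sketched as follows. This hypothetical Python model only encodes the constraints stated in the comment above (declared labels capped at 50 vs. free-form keys capped at 100); the exceptions it raises on overflow or unknown labels are this sketch's choice, not a description of how Firefox Telemetry handles those cases. `some_new_tip` stands in for a key an experiment might add.

```python
class CategoricalHistogram:
    """Labels must be declared up front; this sketch caps them at 50."""

    MAX_LABELS = 50

    def __init__(self, labels: list) -> None:
        if len(labels) > self.MAX_LABELS:
            raise ValueError("too many labels: a new histogram would be needed")
        self.counts = {label: 0 for label in labels}

    def accumulate(self, label: str) -> None:
        if label not in self.counts:
            # New labels require editing the histogram definition.
            raise KeyError(f"undeclared label: {label}")
        self.counts[label] += 1


class KeyedCountScalar:
    """Any key can be added from code; this sketch caps the key count at 100."""

    MAX_KEYS = 100

    def __init__(self) -> None:
        self.counts = {}

    def add(self, key: str, value: int = 1) -> None:
        if key not in self.counts and len(self.counts) >= self.MAX_KEYS:
            raise OverflowError("key limit reached")
        self.counts[key] = self.counts.get(key, 0) + value


hist = CategoricalHistogram(["onboarding", "redirect"])
hist.accumulate("redirect")

scalar = KeyedCountScalar()
scalar.add("redirect")
scalar.add("some_new_tip")  # hypothetical key: no definition change needed
```

The keyed scalar accepts `some_new_tip` without touching any definition, which is the flexibility argued for above; the histogram would reject it until its label list is updated.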

For tips in particular, we want to know if the tip was effective, that means:

  1. Was it shown?
    We can certainly count the number of times the tip was shown to the user, but what if one of the tips was not shown because the other ones filled shownCount (see 2)?
  2. Did the user ignore it or pick it? Did they learn from it?
    The shown count may not help us here, because it is shared by all the tips. So we could have one tip with shownCount impressions and the other with 0, or both at shownCount/2. Drew, was counting all the tips in the same shownCount on purpose? Maybe we should have a separate shownCount for each tip; then we could easily have a tip_name_count, and if it's < MAX it means the tip was picked, otherwise it was ignored.
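A minimal sketch of the per-tip inference proposed above, assuming each tip has its own counter, a hypothetical cap `MAX_SHOWN`, and that a picked tip stops being shown. Two caveats follow directly from that assumption: a count below the cap may also just mean the user hasn't had enough sessions yet, and a pick on the final allowed impression looks the same as ignoring the tip.

```python
MAX_SHOWN = 4  # hypothetical per-tip impression cap


def classify_tip(shown_count: int, max_shown: int = MAX_SHOWN) -> str:
    """Apply the heuristic from the comment above to one tip's counter.

    Assumes a picked tip is no longer shown, so a count that stopped below
    the cap is read as "picked" and a count at the cap as "ignored".
    """
    if shown_count < 0:
        raise ValueError("shown_count must be non-negative")
    if shown_count == 0:
        return "never shown"
    return "picked" if shown_count < max_shown else "ignored (hit the cap)"


# Hypothetical per-tip counters for a single profile.
per_tip_counts = {"onboarding": 2, "redirect": 4}
print({tip: classify_tip(n) for tip, n in per_tip_counts.items()})
# {'onboarding': 'picked', 'redirect': 'ignored (hit the cap)'}
```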

Note, however, that picking or ignoring the tip doesn't necessarily indicate a positive or negative effect. A user may ignore the tip but still learn from it to search from the urlbar, just as a user may pick the tip because they find it annoying and want it to go away.

The counting approach is much more useful for interventions: there we can count the impressions, the times the intervention was picked, and the times the help URL was visited, which would be 3 scalars per intervention. Comparing impressions and picks gives an idea of efficiency.
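With the three proposed counters per intervention, comparing impressions and picks reduces to simple ratios. This is a hypothetical sketch; the class and field names are illustrative, and the counts in the usage line are made up.

```python
from dataclasses import dataclass


@dataclass
class InterventionCounts:
    """The three per-intervention counters proposed above (names hypothetical)."""

    shown: int = 0
    button_picked: int = 0
    help_picked: int = 0

    def pick_rate(self) -> float:
        # Share of impressions where the main button was picked.
        return self.button_picked / self.shown if self.shown else 0.0

    def help_rate(self) -> float:
        # Share of impressions where the help link was picked instead.
        return self.help_picked / self.shown if self.shown else 0.0


counts = InterventionCounts(shown=200, button_picked=30, help_picked=10)
print(f"pick rate {counts.pick_rate():.1%}, help rate {counts.help_rate():.1%}")
# pick rate 15.0%, help rate 5.0%
```

Guarding against `shown == 0` matters because an intervention that was never triggered for a profile would otherwise raise a division error during analysis.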

Setting a ni? on Drew for the question in 2, but this is open for everyone's thoughts.

Flags: needinfo?(adw)

To sum up, I'm suggesting that we separate shownCount per tip. Then, for tips we report a tip_name_count scalar; for interventions we report intervention_name_count, intervention_name_button_picked, and intervention_name_help_picked.

Temporarily unassigning while we discuss. I'm working on a regression at the moment, so I'm putting this back on the backlog until it's actionable.

Assignee: mak → nobody
Status: ASSIGNED → NEW

Some telemetry doc says that "Keyed scalars should only be used if the set of keys are not known beforehand." I'm not sure how much stock we should put in that. https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/collection/scalars.html

Re: your questions, I think we really need to hear from Ben and Teon what data they'd like to record. If they want shown counts per tip type, then we should add shown counts per tip type, but if they don't, we shouldn't. Same for picks. My default position is that we should record the same telemetry for tip results that we do for other results.

Flags: needinfo?(adw)

(In reply to Drew Willcoxon :adw from comment #20)

Some telemetry doc says that "Keyed scalars should only be used if the set of keys are not known beforehand." I'm not sure how much stock we should put in that. https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/collection/scalars.html

Yes, that's why I'm saying it makes sense if experiments can add new tips/interventions (I don't know atm, but it's a possibility).

Re: your questions, I think we really need to hear from Ben and Teon what data they'd like to record. If they want shown counts per tip type, then we should add shown counts per tip type, but if they don't, we shouldn't. Same for picks. My default position is that we should record the same telemetry for tip results that we do for other results.

The problem is that I'm not sure we can gather anything useful from Search Tips in the current situation with a unified count, and I'm not even sure why we have a unified count. The two search tips are different, and we already risk showing only one of them, while it may be useful to show both.
Suppose we add a third one after all the counts have been consumed: the new one will never be shown, and if we reset the counter, the user will see the old ones again, even though they may have picked them already. That sounds like buggy behavior to me.
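To make the starvation problem concrete, here is a small hypothetical simulation contrasting one unified shownCount with per-tip counters. It assumes each session triggers exactly one tip (the one whose context the user hit) and that `new_tip` is a third tip added later; neither the function nor the tip name comes from the actual code.

```python
def simulate(triggers: list, cap: int, shared: bool) -> dict:
    """Count impressions per tip for a sequence of sessions.

    Each entry in `triggers` is the tip whose context the user hit in that
    session (for example, opening about:newtab triggers the onboarding tip).
    shared=True models one unified shownCount for all tips; shared=False
    gives every tip its own counter.
    """
    shown = {}
    total = 0
    for tip in triggers:
        used = total if shared else shown.get(tip, 0)
        if used < cap:
            shown[tip] = shown.get(tip, 0) + 1
            total += 1
    return shown


sessions = ["onboarding"] * 4 + ["redirect"] * 4 + ["new_tip"] * 4
print(simulate(sessions, cap=4, shared=True))   # {'onboarding': 4}
print(simulate(sessions, cap=4, shared=False))  # {'onboarding': 4, 'redirect': 4, 'new_tip': 4}
```

With the shared counter, the first tip the user happens to trigger consumes the whole budget, so `redirect` and `new_tip` never appear.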

Flags: needinfo?(adw)

Teon, do we care about knowing the shown count for each Search Tip type (currently "onboarding" or "redirect")?
Do we care about knowing how many times an Intervention button was picked and how many times the help link was picked instead?

Flags: needinfo?(teon)

I would go with keyed scalars if there were a choice between those and the histograms. Counts in this case are sufficient. The labeling of them above would work well for us and keyed scalars would allow for easier extraction over histograms.

I would say I care about the number of times an intervention button was clicked. I think the help count could be useful too, as it might signal that the intervention is useful but the user needs some clarification. Taken jointly, I would say we could see it as providing some user value, since the intervention either does the thing or gives the user more information or context about it.

Flags: needinfo?(teon)
Blocks: 1606913

Picking this up again: we are going to split the counters per search tip, so we can use counting scalars more easily.
The spec is 1 tip per session, 4 impressions per tip.
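One way to enforce the spec of one tip per session and four impressions per tip is a small gate object. This is a sketch under stated assumptions: `SearchTipGate` is hypothetical, an impression is counted at most once per session, and a picked tip is retired (the spec above doesn't say what happens on pick).

```python
MAX_IMPRESSIONS_PER_TIP = 4  # from the spec above
MAX_TIPS_PER_SESSION = 1     # from the spec above


class SearchTipGate:
    """Hypothetical gate enforcing '1 tip per session, 4 impressions per tip'."""

    def __init__(self) -> None:
        self.impressions = {}   # tip -> impressions, persisted across sessions
        self.picked = set()     # assumption: picked tips are never shown again
        self._shown_this_session = 0

    def start_session(self) -> None:
        self._shown_this_session = 0

    def maybe_show(self, tip: str) -> bool:
        """Return True (and record an impression) if `tip` may be shown now."""
        if self._shown_this_session >= MAX_TIPS_PER_SESSION:
            return False
        if tip in self.picked:
            return False
        if self.impressions.get(tip, 0) >= MAX_IMPRESSIONS_PER_TIP:
            return False
        self.impressions[tip] = self.impressions.get(tip, 0) + 1
        self._shown_this_session += 1
        return True

    def record_pick(self, tip: str) -> None:
        self.picked.add(tip)


gate = SearchTipGate()
results = []
for _ in range(6):
    gate.start_session()
    # Both tips' contexts come up in every session; only one may show.
    results.append((gate.maybe_show("onboarding"), gate.maybe_show("redirect")))
print(gate.impressions)  # {'onboarding': 4, 'redirect': 2}
```

Because the impression counters are per tip, `redirect` still gets its turn once `onboarding` reaches its cap, which is the behavior the split counters are meant to guarantee.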

Assignee: nobody → mak
Status: NEW → ASSIGNED
Flags: needinfo?(adw)

(In reply to Teon Brooks [:teon] from comment #23)

I would go with keyed scalars if there were a choice between those and the histograms. Counts in this case are sufficient. The labeling of them above would work well for us and keyed scalars would allow for easier extraction over histograms.

Teon, I just want to make sure -- Marco asked about shown counts specifically. Do we care about shown counts?

Flags: needinfo?(teon)
Depends on: 1613855

Yes, I would include the counts as well.

Flags: needinfo?(teon)

I'm out sick; if anyone wants to pick this up, feel free. My proposal is in comment 18: a keyed scalar specific to tips, counting impressions and clicks.

Assignee: mak → htwyford
Iteration: --- → 75.1 - Feb 10 - Feb 23

Comment on attachment 9125476 [details]
Bug 1608461 - Add telemetry for Search Tips and Interventions impressions

  1. What questions will you answer with this data?
  • Are our new Intervention and Search Tips features being seen by users? Are they being interacted with?
  2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
  • Determine if these new features are serving their purpose and are worth continued investment.
  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?
  • Collecting the total number of Tips seen. This was not sufficient since we had no way of distinguishing Search Tips and Interventions in this data, nor could we measure interaction with these features.
  4. Can current instrumentation answer these questions?
  • No. See question 3.
  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Measurements:
browser.urlbar.tip_name_count - Category 1 - Measures how many and what types of Search Tips are seen by a user.

browser.urlbar.intervention_name_count - Category 1 - Measures how many and what types of Interventions are seen by a user.

browser.urlbar.intervention_name_button_picked - Category 1 - Measures how many and what types of Interventions are interacted with.

browser.urlbar.intervention_name_help_picked - Category 1 - Measures on which types of Interventions the Help button is clicked.

  6. How long will this data be collected? Choose one of the following:
  • I’d like for the Search Data Science team (Teon Brooks, Ben Miroglio) to permanently monitor this data.
  7. What populations will you measure?
  • All countries, release channels, and locales that Interventions and Search Tips are enabled in. For Interventions, this is currently early Beta, all countries, but only EN locales. For Search Tips this is currently early Beta, all countries, and all locales. It is expected that these features will eventually be available in all locales in all release channels.
  8. If this data collection is default on, what is the opt-out mechanism for users?
  • Standard telemetry opt-out.

These are probably best answered by the Search Data Science team:

  9. Please provide a general description of how you will analyze this data.

  10. Where do you intend to share the results of your analysis?

  11. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?

Attachment #9125476 - Flags: data-review?(teon)

Comment on attachment 9125476 [details]
Bug 1608461 - Add telemetry for Search Tips and Interventions impressions

reassigning data-review to :tdsmith since I've been involved with the probe creation.

Attachment #9125476 - Flags: data-review?(teon) → data-review?(tdsmith)

Comment on attachment 9125476 [details]
Bug 1608461 - Add telemetry for Search Tips and Interventions impressions

What values are possible for the keys of the scalars? Is there a list somewhere? If the scalar keys are a known, discrete set of values, adding any new key should trigger a follow-on data review, and it might be helpful to leave a comment to that effect. If they aren't known in advance, more detail about where they come from would be helpful.

Also, we need answers to the last few questions of the data-review form (Teon?).

Clearing the flag for now; please data-review? me again.

Attachment #9125476 - Flags: data-review?(tdsmith) → data-review-

The scalar urlbar.tips.tip_name_count will take on one of these values. The scalars urlbar.tips.intervention_name_count, urlbar.tips.intervention_name_button_picked, and urlbar.tips.intervention_name_help_picked will take on one of these values. They are the names of the Search Tips and Interventions shown in the Urlbar.

ni?teon, could you please answer the last three questions on the data collection form before I re-request data review?

Flags: needinfo?(teon)

These are the responses to the rest of the data collection request form above in https://bugzilla.mozilla.org/show_bug.cgi?id=1608461#c29

  9. Please provide a general description of how you will analyze this data.
  • We would like to better understand how users will interact with different prompts within the address bar. We will look at the number of interventions presented, the number of intervention dismissals, and the number of clicks on help. We want to understand whether we are providing user value with the new information we are presenting.
  10. Where do you intend to share the results of your analysis?
  • We will share the results internally within Experimenter and the data science reports repository.
  11. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?
  • No third-party tools will be used in this data collection.
Flags: needinfo?(teon)

Comment on attachment 9125476 [details]
Bug 1608461 - Add telemetry for Search Tips and Interventions impressions

Please see comment 32 for possible values and comment 33 for the answers to the last few questions on the data-review form.

Note that the scalar names have changed from comment 32, given review feedback on the patch. In the same order as comment 32, they are now named: urlbar.tips.search_tip_shown_count, urlbar.tips.intervention_shown_count, urlbar.tips.intervention_button_picked, and urlbar.tips.intervention_help_picked.

Attachment #9125476 - Flags: data-review- → data-review?(tdsmith)

What format does Data prefer for storing and analyzing the information (or is it the same for you)?

  1. one scalar with keys like ${uniquename}-shown_count, ${uniquename}-help_picked, ${uniquename}-button_picked
  2. one scalar for tips, one scalar for interventions with keys like ${name}-shown_count, ${name}-help_picked, ${name}-button_picked
  3. one scalar for tips shown count, one scalar for interventions shown count, one scalar for interventions button picked, one scalar for interventions help picked (current approach, we won't count button picked on tips though)
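Option 1 packs everything into one keyed scalar, so the key has to carry both the name and the action. Below is a hypothetical sketch of building and parsing such keys (the helper names are illustrative; `redirect` is one of the Search Tip types discussed earlier), alongside the four scalar names the patch currently uses for option 3.

```python
ACTIONS = ("shown_count", "button_picked", "help_picked")


def make_key(unique_name: str, action: str) -> str:
    """Option 1: one keyed scalar whose key encodes both the name and the action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return f"{unique_name}-{action}"


def parse_key(key: str) -> tuple:
    """Split an option 1 key back into (unique_name, action) for analysis.

    rsplit on the last '-' keeps names that themselves contain '-' intact,
    since none of the action suffixes contain one.
    """
    unique_name, action = key.rsplit("-", 1)
    if action not in ACTIONS:
        raise ValueError(f"unknown action in key: {key}")
    return unique_name, action


# Option 3 instead spreads the data over four scalars (names from the patch),
# each keyed by the plain tip or intervention name.
OPTION3_SCALARS = {
    ("search_tip", "shown_count"): "urlbar.tips.search_tip_shown_count",
    ("intervention", "shown_count"): "urlbar.tips.intervention_shown_count",
    ("intervention", "button_picked"): "urlbar.tips.intervention_button_picked",
    ("intervention", "help_picked"): "urlbar.tips.intervention_help_picked",
}

print(make_key("redirect", "shown_count"))   # redirect-shown_count
print(parse_key("redirect-shown_count"))     # ('redirect', 'shown_count')
```

One consequence worth checking against the 100-key limit mentioned earlier in this bug: if every name used all three option 1 keys, a single scalar would cover at most 33 tips and interventions combined.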
Flags: needinfo?(teon)

And this is a general question too, do we have specific advantages in having more or less scalars, in cases like this?

Option 1 works for me. In this case, the goal would be to understand and attribute the given prompts to changes in search volumes, and to see whether they're noticed and/or cause a nuisance (coupled with a survey).

Flags: needinfo?(teon)

Comment on attachment 9125476 [details]
Bug 1608461 - Add telemetry for Search Tips and Interventions impressions

Thanks Teon. Since this is in flux, I'm clearing the data-review? until we have a final patch.

Attachment #9125476 - Flags: data-review?(tdsmith)
Attachment #9125476 - Attachment description: Bug 1608461 - Add telemetry for Search Tips and Interventions impressions. r?adw → Bug 1608461 - Add telemetry for Search Tips and Interventions impressions
Blocks: 1616284
Attached file data-request.md

This is mostly copied from comment 29, which you previously looked at. It includes the up-to-date keyed scalar name and key names.

Attachment #9127765 - Flags: data-review?(tdsmith)
Blocks: 1616631
Assignee: htwyford → adw
Priority: P2 → P1
Comment on attachment 9127765 [details]
data-request.md

1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, in Scalars.yaml. The keys of the scalar are derived from objects in the code; those objects should have comments documenting that the keys are reported in telemetry and that adding new keys to the objects is an expanded data collection that should trigger data-review.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, the Firefox telemetry opt-out.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Teon.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, interaction data.

5) Is the data collection request for default-on or default-off?

Default-on.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, permanent collection.

9) Does the data collection use a third-party collection tool?

No.

--

data-review+, pending documentation that adding new keys to certain objects is a new collection
Attachment #9127765 - Flags: data-review?(tdsmith) → data-review+

Thanks Tim. I'll add some comments to the code as you mention. Note too that we have a telemetry.rst doc that we are updating for this new scalar plus all its keys. That's being done in bug 1616284.
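Following the review condition above, one hypothetical way to keep the reported keys tied to code objects is an enum whose comment flags it as a data collection, plus a guard test that fails when a member is added without review. This is a Python sketch of the pattern, not the Firefox implementation; `REVIEWED_TIP_KEYS` and `SearchTip` are illustrative names, and the values reuse the tip types mentioned earlier in this bug.

```python
from enum import Enum

# Keys approved in data review (hypothetical: mirror whatever the review covered).
REVIEWED_TIP_KEYS = frozenset({"onboarding", "redirect"})


class SearchTip(Enum):
    # NOTE: each value is reported as a telemetry key. Adding a member is an
    # expanded data collection and must go through data review first.
    ONBOARDING = "onboarding"
    REDIRECT = "redirect"


def unreviewed_keys() -> set:
    """Return enum values that were never approved; this should always be empty."""
    return {tip.value for tip in SearchTip} - REVIEWED_TIP_KEYS


# A unit test like this fails as soon as someone adds a tip without review.
assert not unreviewed_keys(), f"needs data review: {unreviewed_keys()}"
```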

Blocks: 1617318
Pushed by dwillcoxon@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6e13b672e0f
Add telemetry for Search Tips and Interventions impressions r=mak
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 75