Closed Bug 1359306 Opened 7 years ago Closed 7 years ago

Implant telemetry probe for search

Categories

(Firefox :: Settings UI, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 56
Tracking Status
firefox56 --- fixed

People

(Reporter: chsiang, Assigned: rickychien)

References

Details

(Whiteboard: [photon-preference])

User Story

Updated on 7/20/2017:
https://docs.google.com/document/d/1KaSX2SOzE0Ewi5I9vqn7C9Aq2b4-DlkYOr_XRizbDIQ/edit


--Below criteria are outdated---
Acceptance Criteria
- We know how many searches were performed by users in a specific time-frame
- We know the top queries

Attachments

(2 files)

      No description provided.
Priority: P1 → P3
Target Milestone: Firefox 56 → ---
Blocks: 1330552
No longer blocks: 1357285
Attached video search preferences.mov
Hi Elvin,
Kindly help us review this bug.

Cindy
Flags: needinfo?(ellee)
Hi Cindy, that's for the NI. Has this already undergone normal data review with the appropriate data steward? If not, you should use this bug for that purpose.

I can open a separate legal review bug.
Flags: needinfo?(ellee) → needinfo?(chsiang)
Hi Elvin,
No it hasn't. Could you kindly open a legal review bug and flag the right person for the review and I'll follow up?

Thanks so much.
Cindy
Flags: needinfo?(chsiang) → needinfo?(ellee)
Flags: needinfo?(ellee)
Cindy - the Firefox Data Stewards are Ben Smedberg, Chenxia Liu, Francois Marier, and Rebecca Weiss. I'm not sure how they go about determining which data steward will review a particular probe, but you can find out more here. https://wiki.mozilla.org/Firefox/Data_Collection
Hi Cindy,

The link Elvin posted ( https://wiki.mozilla.org/Firefox/Data_Collection ) explains what data review is. Basically, we need to make sure that the information being collected will be used for some specific purpose, and will also not be sensitive data.

> - We know how many searches were performed by users in a specific time-frame
> - We know the top queries

For each of these, please write out what data will be collected (for example, add a count to a histogram every time somebody clicks search), what it will be used for, who will be checking this data that is collected.

When you actually add the code, request review from one of the data peers on the code that is being added. If you're adding them as histograms to Histograms.json (here are some docs for histogram usage http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/histograms.html?highlight=histograms ), when you make a patch to add the histogram, please flag one of the data stewards for review.

For the second probe ("top queries") I'd recommend being thoughtful about how you're collecting that data. I probably wouldn't suggest just capturing the search query, because people could be typing anything in there (like personally identifiable information, or any kind of data there), and you'd also have a bunch of user-entered strings that wouldn't be useful. Perhaps you could look at what items are the top hits (e.g. end up highlighted)? This wouldn't help with the question of what the top things people search for that aren't in the settings, though.
Assignee: nobody → rchien
Status: NEW → ASSIGNED
Flags: qe-verify-
Priority: P3 → P1
Target Milestone: --- → Firefox 56
Hi Chenxia,
Thank you for your recommendation. That was really helpful. I created a document to address your questions. Kindly take a look and let me know your feedback.

Thanks!
Cindy
User Story: (updated)
Flags: needinfo?(liuche)
the link to the document was updated in the user story section.
Summary: [User Story] [Preferences] Implant telemetry probe for search → Implant telemetry probe for search
Patch is updated based on latest `User Story` and it will be landed after data review approved.
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review165394

I would push back on collecting text a user types every second into the preferences box. Can we do something else like "send the text that is highlighted on search"? I think there are privacy concerns in that users would probably expect the things they type to be local-only. I'd prefer for this probe to be implemented differently - is there any other way you can get the data you need other than collecting user-input strings every second into a local text box?

As an aside, this is not the proper way to add a scalar probe. It needs to be documented in the scalars.yaml file, as described in the Scalars documentation ( http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html )

"Scalar probes are required to be registered, both for validation and transparency reasons, in the Scalars.yaml definition file."

Thanks for the patch, Ricky - I will discuss with Cindy in the bug regarding what is being collected, and make some suggestions for alternatives.
Attachment #8888244 - Flags: review-
Hi Cindy, thanks for the documentation. I was looking at the "Purpose of Collection" section of the document, and I had a few concerns:

1. "One way to see the utilization is to compare the histogram with daily active user or monthly active user. That is why we should record the time."

I don't actually see time being recorded (just the query, every second) - my question is, are you looking for how often people use the preferences search box? I think that you could do so by simply recording the number of times users search, which would be a much better and easier to see histogram.

2. "The team can do experiments to see if minor tweak will increase the usage."

I think "usage" could be measured the same way, either by searches completed, or the number of times people interact with the search box.

3. "The team can have a base for a better recommendation list based on the most seeked term."

I'm not sure how collecting what people type into the box into a histogram will help with that - that will certainly be a lot of data, and would create a histogram with a long tail of not-very-useful keys. Perhaps you could collect instead how many search result in matches, versus searches total?

Also, in the "Who Can Use the Data" section, I am actually looking for a specific person who will be in charge of looking at the data and making decisions based on that, and also will take care of filing a bug to remove a probe if it stops being something we want to keep collecting. "Anyone" is too broad and doesn't give a single responsible party (and the problem would be that this ends up being "nobody is looking at the data or responsible for it").

Please let me know if you have more questions! It's good to collect data so that we can improve our products, so I want to help you collect useful data in a useful way.
Flags: needinfo?(liuche) → needinfo?(chsiang)
Attachment #8888244 - Flags: review?(benjamin)
> Can we do something else like "send the text that is highlighted on search"?

Chenxia, this patch is under condition of "send the text that is highlighted on search" as you mentioned, and even only send the text when query string length is longer than 2. 

You can also simply apply the patch and then replace `Services.telemetry.keyedScalarAdd` with a console.log to see the output log.

I've updated my patch with Scalars.yaml registered.

Thanks!
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review165782

Clearing review until the data stewards have expressed that it's okay that we gather these things this way.
Attachment #8888244 - Flags: review?(mconley)
Hi Ricky,

I don't see the updated patch, but I'll review the documentation when it lands.

As a data peer, I don't technically review the code because I can be flagged for a lot of components that I'm not actively involved in. If there's no documentation, it's hard for me to give data review.

Also, in the bigger picture, the point of data review is to have clear documentation around our telemetry collection so that it's easy to understand what data we're collecting, without needing to read the code. It's the data peer's job to make sure the description of data collection is clear and adheres to Mozilla's data collection policies, and I rely on the code reviewer to make sure that what's actually being collected in code is consistent with the documentation.

I understand that it's not always clear what data review entails, but we are trying to make that clearer!
Flags: needinfo?(rchien)
Hi Chenxia,
Thanks for your feedback. Please see my response inline. 

(In reply to Chenxia Liu [:liuche] from comment #16)
> Hi Cindy, thanks for the documentation. I was looking at the "Purpose of
> Collection" section of the document, and I had a few concerns:
> 
> 1. "One way to see the utilization is to compare the histogram with daily
> active user or monthly active user. That is why we should record the time."
> 
> I don't actually see time being recorded (just the query, every second) - my
> question is, are you looking for how often people use the preferences search
> box? I think that you could do so by simply recording the number of times
> users search, which would be a much better and easier to see histogram.

cindy: I have got rid off the subtext of having time being recorded.

> 
> 2. "The team can do experiments to see if minor tweak will increase the
> usage."
> 
> I think "usage" could be measured the same way, either by searches
> completed, or the number of times people interact with the search box.

cindy: Yes this is how it's implemented. It's measured by searches completed and it's counted when a match is highlighted for more than 1 second.

> 
> 3. "The team can have a base for a better recommendation list based on the
> most seeked term."
> 
> I'm not sure how collecting what people type into the box into a histogram
> will help with that - that will certainly be a lot of data, and would create
> a histogram with a long tail of not-very-useful keys. Perhaps you could
> collect instead how many search result in matches, versus searches total?

cindy: It is not collecting what people type into the box but the terms highlighted as you recommended before.
You can refer to the "data collected" section in the document. It stated: When a user stays on a highlighted result for more than 1 second, we will add a count to a histogram together with the highlighted term. 

> 
> Also, in the "Who Can Use the Data" section, I am actually looking for a
> specific person who will be in charge of looking at the data and making
> decisions based on that, and also will take care of filing a bug to remove a
> probe if it stops being something we want to keep collecting. "Anyone" is
> too broad and doesn't give a single responsible party (and the problem would
> be that this ends up being "nobody is looking at the data or responsible for
> it").

cindy: I put my name down.

> 
> Please let me know if you have more questions! It's good to collect data so
> that we can improve our products, so I want to help you collect useful data
> in a useful way.

cindy:
Let me know if there is anything that's still not clear. Thank you for your help!
Flags: needinfo?(chsiang)
Flags: needinfo?(liuche)
Chenxia, thanks for the clarification!

I'll wait for your documentation review. Thanks!
Flags: needinfo?(rchien)
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review166790

Code seems good, but (goes without saying) please wait until a data steward gives a data-review+ before landing. Thanks!

::: browser/components/preferences/in-content-new/findInPage.js:287
(Diff revision 5)
>        } else {
>          // Creating tooltips for all the instances found
>          this.listSearchTooltips.forEach((anchorNode) => this.createSearchTooltip(anchorNode, this.query));
> +
> +        // Implant search telemetry probe after user stops typing for a while
> +        if (this.query.length >= 2) {

I wonder if it's worth trimming the query string too (and when setting as key), so that we don't add a result with just whitespace.
Attachment #8888244 - Flags: review?(mconley) → review+
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review166790

> I wonder if it's worth trimming the query string too (and when setting as key), so that we don't add a result with just whitespace.

`this.query` has been trimed at https://searchfox.org/mozilla-central/source/browser/components/preferences/in-content-new/findInPage.js#223
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review166990

::: toolkit/components/telemetry/Scalars.yaml:405
(Diff revision 5)
>        - main
> +  search_query:
> +    bug_numbers:
> +        - 1359306
> +      description: >-
> +        The count of query strings when user performs a search query in about:preferences.

Please also include a description of what the keys are, see other prefs that have "keyed: true"

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Scalars.yaml#163

::: toolkit/components/telemetry/Scalars.yaml:406
(Diff revision 5)
> +  search_query:
> +    bug_numbers:
> +        - 1359306
> +      description: >-
> +        The count of query strings when user performs a search query in about:preferences.
> +        The telemetry data will be sent if there is a successful search result highlighted.

nit: will be recorded if
Flags: needinfo?(liuche) → needinfo?(rchien)
Thanks for the updates to the reasons this probe is needed, Cindy, I think it looks good now and is very detailed!

I'm the contents of the reasoning here as a comment, because google docs are not public.

Background
In Firefox 56, Preferences team developed “Find in Preferences” feature, giving users ability to quickly find settings under Preferences by typing a search term in “Find in Preferences” box. We would like to understand how often this new feature is utilized and to set a foundation for a potential smart search feature in the next phase. 

Data Collected
When a user stays on a highlighted result for more than 1 second, we will add a count to a histogram together with the highlighted term.

Purpose of Collection
The team can base on the utilization of this feature to decide if we should invest resources for future enhancement 
The team can do experiments to see if they increase the usage.
For instance, if we add animation or onboarding message to make users aware of the feature, does it increase the usage?
The team can have a base for a better recommendation list based on the most seeked term.
The current implementation is to highlight everything that matches the search term, like a control + F does in a document. A better UX would be having a drop down of recommendations, a practice that modern web users have attuned to.

Who Will Use This Data
Cindy Hsiang
chsiang@mozilla.com
Comment on attachment 8888244 [details]
Bug 1359306 - Implant telemetry probe for search

https://reviewboard.mozilla.org/r/159196/#review166994

This is an opt-in probe that expires in 62, and collects successfully matched in the preferences search, which is Type 2 data. We're not collecting non-matching arbitrary user input. Data-review r+ from me once the probe description in Scalars.yaml is updated to explicitly include what the keys of the histogram are.
Attachment #8888244 - Flags: review+
Flags: needinfo?(rchien)
Thanks for the feedback. Let's land it!
Pushed by rchien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f65f933ecdce
Implant telemetry probe for search r=liuche,mconley
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c1ed71da5707
Backed out changeset f65f933ecdce for bustage
Flags: needinfo?(rchien)
Pushed by rchien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d02f661a04be
Implant telemetry probe for search r=liuche,mconley
https://hg.mozilla.org/mozilla-central/rev/d02f661a04be
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.