Closed Bug 1686330 Opened 3 years ago Closed 3 years ago

Add telemetry for tab-to-search result impressions, per-engine

Categories

(Firefox :: Address Bar, task, P2)

task
Points:
2

Tracking

RESOLVED FIXED
87 Branch
Iteration:
87.1 - Jan 25 - Feb 7
Tracking Status
firefox87 --- fixed

People

(Reporter: bugzilla, Assigned: bugzilla)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1685734 +++

Product has requested that we add telemetry to count the number of times we show tab-to-search results, per-engine. This may pose some privacy difficulties.

Tab-to-search results are shown when a search engine domain is autofilled. For example, if a user has the Amazon search engine installed and has Amazon in their history or bookmarks, then beginning to type "amazon.com" will eventually autofill to Amazon and show the Amazon tab-to-search result. In most cases, this happens when the user types "a", "am", or "ama".

We already keep an aggregate count of how often we show tab-to-search results, and we record how often tab-to-search results are selected on a per-engine basis. However, by recording how often tab-to-search results are shown on a per-engine basis, we may be indirectly recording user searches. For example, if a user has a value of 4 for a (say) urlbar.tabtosearch.impressions.Amazon probe, we'd know that:

  • The user has the Amazon search engine installed,
  • Amazon is in their history or bookmarks, although we wouldn't know how often, and
  • The user typed some part of "amazon.com" four separate times.
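
To make the concern concrete, here is a minimal sketch of a per-engine counter in plain JavaScript. It does not use Firefox's Telemetry API: `KeyedCounter` and `recordImpression` are hypothetical names, and the probe name is the illustrative one used above, not a confirmed final name.

```javascript
// Hypothetical stand-in for a keyed scalar: one integer counter per key.
class KeyedCounter {
  constructor(name) {
    this.name = name;
    this.counts = new Map();
  }

  // Add `amount` to the counter stored under `key`.
  add(key, amount = 1) {
    this.counts.set(key, (this.counts.get(key) ?? 0) + amount);
  }

  // Read the counter for `key`; keys never recorded read as 0.
  get(key) {
    return this.counts.get(key) ?? 0;
  }
}

const impressions = new KeyedCounter("urlbar.tabtosearch.impressions");

// Hypothetical hook: called once each time a tab-to-search result is shown.
function recordImpression(engineName) {
  impressions.add(engineName);
}

// Four separate occasions on which the user typed a prefix of "amazon.com".
const typedPrefixes = ["a", "am", "ama", "am"];
typedPrefixes.forEach(() => recordImpression("Amazon"));

console.log(impressions.get("Amazon")); // 4
```

Note that the counter stores only the engine name and a count. The typed prefixes themselves are never recorded, but the count still implies that the user typed some part of the engine's domain that many times.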

I'm requesting data-review so someone with more expertise here can determine if this type of telemetry would be permissible and if it would be considered Type II or Type III telemetry.

Chris, could you offer some advice here? Please let me know if you need clarification.

Flags: needinfo?(chutten)

Hm. This is an interesting one. In the broadest sense this is a coarse measurement of what users type in the awesomebar, which would be playing with Cat4 data because the awesomebar can have any text at all pasted into it.

However, in addition to the coarseness, this is much more limited than all that. All we learn about what's typed are substrings (or strings "close to" for some distance measurement) of a list of known-at-compile-time strings. (Right? Correct me if I'm wrong, but tab-to-search only works on certain installed engines? Or does it work on arbitrary engines? (I suppose if it worked on arbitrary installed engines it'd be no worse, but I'm interested to know)) This is far closer in territory to Cat2 Interaction because this is more or less "Did the user interact with a piece of the UI using the keyboard". These are (weird, ornate) keyboard shortcuts, if you think about it.

But then we have the extra wrinkle of needing elements in the places db (history, bookmarks) to enable these "keyboard shortcut"-like things. Knowing stuff about a client's browsing history is Cat3.

I'm unsure what Category this would eventually end up as. As a Steward, my instructions about what to do in those cases are clear: don't give data-review+ and ask for help.

ni?Emily - What's the categorization of the counts of how often tab-to-search is shown given that it is shown based on what users type matching (or close-to-matching) a known piece of text and the client's stored history and bookmarks?

Flags: needinfo?(chutten) → needinfo?(emily)

Thanks Chris!

(In reply to Chris H-C :chutten from comment #2)

(Right? Correct me if I'm wrong, but tab-to-search only works on certain installed engines? Or does it work on arbitrary engines? (I suppose if it worked on arbitrary installed engines it'd be no worse, but I'm interested to know))

Tab-to-search works for any installed engine, including engines the user added themselves. So if I add an IMDb engine, I'd get an IMDb tab-to-search result when "imdb.com" is autofilled. However, in our other tab-to-search telemetry, we've been grouping all user-installed engines together in an "Other" bucket, so we don't know via telemetry what engines a user has installed beyond the built-in engines.

So yes, all the strings for which we would store the semi-identifiable telemetry described in comment 0, like typing "am" to fill "amazon.com", are known at compile time. There is one of these strings per built-in engine we offer, which makes for about 80 of them once you consider all the different engines we ship in different locales.
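
The bucketing described above can be sketched as follows. This is an illustration only: `BUILT_IN_ENGINES` is a tiny hypothetical sample rather than the real list of shipped engines, and `telemetryKeyFor` is an assumed helper name, not a function from the patch.

```javascript
// Hypothetical sample of built-in engine names (the real list is much larger).
const BUILT_IN_ENGINES = new Set(["Amazon", "Bing", "DuckDuckGo", "Google"]);

// Return the telemetry key for an engine. Built-in engines report under their
// own name; any user-installed engine is folded into a single "Other" bucket,
// so telemetry never reveals which extra engines a user has added.
function telemetryKeyFor(engineName) {
  return BUILT_IN_ENGINES.has(engineName) ? engineName : "Other";
}

console.log(telemetryKeyFor("Amazon")); // Amazon
console.log(telemetryKeyFor("IMDb"));   // Other
```

Because the set of possible keys is fixed ahead of time, the probe can only ever report one of a known list of names plus "Other".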

Approved for nightly/beta channels. I'll circle back on release. Thanks!

Flags: needinfo?(emily)
Assignee: nobody → htwyford
Status: NEW → ASSIGNED
Iteration: --- → 87.1 - Jan 25 - Feb 7
Blocks: 1685734
No longer depends on: 1685734

Comment on attachment 9199323 [details]
Bug 1686330 - Add telemetry for tab-to-search result impressions, per-engine. r?mak!

  1. What questions will you answer with this data?
    For which engines do we serve the most tab-to-search results?

  2. Why does Mozilla need to answer these questions?
    These data will allow us to invest in the feature better, since we will have a better understanding of the relative popularity of tab-to-search engines and how that differs from the relative popularity of general search engine usage. For example: are shopping engines more commonly seen as tab-to-search results?

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?
    Aggregate data about the total number of tab-to-search results. This was insufficient since it would lack the granularity of per-engine usage.

  4. Can current instrumentation answer these questions?
    No. We only have aggregate data about the total number of tab-to-search results without per-engine granularity.

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
    Measurement Description: The number of times we show tab-to-search results in the address bar, keyed by the engines shown in the tab-to-search result. See comment 0 for more information.
    Data Collection Category: See comment 2.
    Tracking Bug #: 1686330

  6. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
    The patch adds information on the probes to the Urlbar telemetry documentation.

  7. How long will this data be collected? Choose one of the following:
    The search team wants to permanently monitor this data. Teon will own the probe.

  8. What populations will you measure?
    Beta/Nightly Desktop only. All locales.

  9. If this data collection is default on, what is the opt-out mechanism for users?
    It is opt-in. Also it is just a normal keyed scalar, so users can also disable it by disabling Telemetry in Firefox.

  10. Please provide a general description of how you will analyze this data.
    Looking at the relative popularity of the engines shown in tab-to-search results.

  11. Where do you intend to share the results of your analysis?
    Internally, with the search team and interested leadership.

  12. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:
    No.

Attachment #9199323 - Flags: data-review?(chutten)
Attached file data collection review

In future please attach data collection reviews as attachments to better integrate with Data Steward workflows.

Attachment #9200102 - Flags: data-review?(chutten)

Comment on attachment 9200102 [details]
data collection review

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :teon is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 3, Stored Content and Communications. This can reveal coarse browser history data as described above in the bug.

Is the data collection request for default-on or default-off?

Default on for pre-release channels only.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes. See explicit approval in Comment#4

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.


Result: datareview+

Attachment #9200102 - Flags: data-review?(chutten) → data-review+
Pushed by htwyford@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b129d712e3bc
Add telemetry for tab-to-search result impressions, per-engine. r=mak

And another new revision with a green try: https://treeherder.mozilla.org/jobs?repo=try&revision=0fa4e1cfe3c2ad5efd556fb689ecc0373d790024. From Phabricator:

There were two issues causing test failures:
(1) I didn't set Services.telemetry.canRecordExtended,
(2) Previous tests were showing tab to search results, but weren't simulating engagements with fireInputEvent/blur. This meant that enginesShown was filling up and was never cleared. I fixed the tests causing these issues, and added a try/catch/finally plus error reporting to UrlbarProviderTabToSearch.onEngagement.
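
The failure mode in (2) and the shape of the fix can be sketched in isolation. This is a simplified model, not the code in UrlbarProviderTabToSearch: `ImpressionTracker`, `onResultShown`, and the `record` callback are hypothetical, and only the `enginesShown` name and the try/catch/finally structure come from the description above.

```javascript
class ImpressionTracker {
  constructor(record) {
    this.record = record;          // callback that records one impression
    this.enginesShown = new Set(); // engines shown since the last engagement
  }

  // Called whenever a tab-to-search result for `engineName` is displayed.
  onResultShown(engineName) {
    this.enginesShown.add(engineName);
  }

  // Called when the user engages with or abandons the address bar.
  onEngagement() {
    try {
      for (const engine of this.enginesShown) {
        this.record(engine);
      }
    } catch (ex) {
      // Report the error instead of letting it propagate to the caller.
      console.error("Recording tab-to-search impressions failed:", ex);
    } finally {
      // Always clear, so leftover state cannot leak into the next session.
      this.enginesShown.clear();
    }
  }
}

const recorded = [];
const tracker = new ImpressionTracker(engine => recorded.push(engine));
tracker.onResultShown("Amazon");
tracker.onResultShown("Bing");
tracker.onEngagement();

console.log(recorded);                  // [ 'Amazon', 'Bing' ]
console.log(tracker.enginesShown.size); // 0
```

In this model, a test that shows results but never triggers `onEngagement` leaves `enginesShown` populated for whatever runs next, which matches the leak described in (2). Clearing in `finally` means the set is emptied even when recording throws.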

Pushed by htwyford@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5563ecd92e62
Add telemetry for tab-to-search result impressions, per-engine. r=mak
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch