Closed Bug 1752953 Opened 2 years ago Closed 2 years ago

Add/modify telemetry for Firefox Suggest best match MVP

Categories

(Firefox :: Address Bar, task, P1)

task
Points:
5

VERIFIED FIXED
100 Branch
Tracking Status
firefox99 --- verified
firefox100 --- verified

People

(Reporter: adw, Assigned: adw)

Attachments

(4 files)

We need to add telemetry for the best match MVP. Rebecca, do you have any thoughts on that? And in addition to the below, there's been discussion about moving to Glean; if we need to add any new probes for this, do you have an opinion on whether they should use Glean?

Below I've copied the telemetry section from the two best match docs here and here and I'll comment inline.

For reference, the Suggest telemetry doc is here.

Clicks

We have the following already:

We can add two other scalars for best match and its help button if that would be helpful. We can add an is_best_match boolean to the contextual services ping.

Impressions

The story is similar to clicks:

We can add a new scalar and modify the ping.

For specific keyword, how many times shown and how many times clicks

We need to define exactly what this means, specifically the keyword part. Each suggestion has a list called keywords that contains strings that can match the suggestion. Is that what it refers to? Or the user's search string? (These will probably end up being the same thing.) Do we need this for the MVP? Would an event work, or scalars?

Tag in telemetry to signify this is a best match

This would be accomplished as I mentioned above by using new probes and adding an is_best_match field to the pings.

Sponsored vs. non-sponsored

What would this probe look like?

Flags: needinfo?(rburwei)

Hi Drew - thanks for reaching out.

1. Awesomebar Glean telemetry vs contextual_services.* pings

I have been working with Mark and Chris to get Awesomebar telemetry on Glean, and my preference is to use those probes to meet the vast majority of our Firefox Suggest and Best Match data needs. The Awesomebar Glean probes will collect data on all urlbar results, not just Firefox Suggestions -- they will provide a more holistic, unified view of the Awesomebar.

Going forward, I would like to shift away from the contextual_services.* pings for data analysis and use the Awesomebar Glean telemetry instead. From my point of view, any probes we implement in contextual_services.* pings are a temporary fix to fill some gap in the transition towards Awesomebar Glean telemetry. Please let me know if this thinking is misguided!

Chris, Mark or Drew -- Do you see any reasons why we should use the contextual_services.* pings to collect data for Best Match instead of relying solely on the Awesomebar telemetry? For example: deadlines not lining up, there's some incompatibility with Glean and Best Match, the Awesomebar telemetry implementation team cannot provide ongoing support for the Awesomebar telemetry like Drew has been for Firefox Suggest, etc...

2. What won't be covered by the Awesomebar Glean telemetry

There are a couple of data asks from Product for Best Match that are out-of-scope for the Awesomebar Glean telemetry.

a. Configurations

From the Best Match doc:

If users trigger a best match suggestion, only show it 10% of the time in the best match position. 90% of the time it will be in the same last position.
The percentage match (80% match against query prefix) and user probability (10% of the time in best match position) should be configurable.

I would prefer if these configurations could be set by branches in a Nimbus experiment, and we collect the configuration settings for each client via all of our usual Nimbus infrastructure. Drew, does that work for you?
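To make the two configurable values concrete, here is a minimal sketch of how they might gate the best match position. Everything in it is hypothetical: the function and parameter names are invented, and reading "80% match against query prefix" as "the typed string is a prefix covering at least 80% of the keyword's length" is only an assumption, not something the doc states.

```javascript
// Hypothetical sketch; names and the prefix-ratio interpretation are assumptions.
const DEFAULT_CONFIG = {
  minPrefixRatio: 0.8, // "80% match against query prefix"
  bestMatchProbability: 0.1, // "10% of the time in best match position"
};

// Returns "best-match" or "last" for a suggestion that matched `keyword`.
// `random` is injectable so the decision can be tested deterministically.
function choosePosition(query, keyword, config = DEFAULT_CONFIG, random = Math.random) {
  const isPrefix = keyword.startsWith(query);
  const ratio = keyword.length ? query.length / keyword.length : 0;
  if (!isPrefix || ratio < config.minPrefixRatio) {
    return "last";
  }
  return random() < config.bestMatchProbability ? "best-match" : "last";
}
```

In a Nimbus setup, each branch would presumably supply its own values for the two config fields, so the per-client configuration falls out of the branch assignment.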

b. Clicks on the Help Link

From the Best Match doc:

Includes a “?” icon that leads the user to a SUMO page to allow users to learn more about this feature (identical to the Fx Suggest results functionality)

I prefer to also collect these clicks in the Awesomebar telemetry instead of contextual_services.* pings, but I need to understand more about how the Help Link works first. I am reaching out to UX to get some answers. Just flagging this as something to come back to, just in case.

3. Keywords

From the Best Match doc:

For specific keyword, how many times shown and how many times clicks

At the time of search, the user's search string and the keyword are the same because we're not doing any sort of fuzzy matching between user search strings and keywords. Collecting search strings is very sensitive, and that data collection goes through Merino, not the Firefox Desktop Telemetry infrastructure and not Glean.

To answer the question above, we would have to send info about which suggestions were Best Matches in the contextual_services.* pings, and then join that to the Merino logs. In my opinion, the incremental gain of answering this question is not worth the extra privacy-related work we would have to do. I will talk with Product about work-arounds on the analysis side. There is a lot of insight we can gain that gets at the heart of this question without having to collect this exact data.

Flags: needinfo?(standard8)
Flags: needinfo?(rburwei)
Flags: needinfo?(chris)

@Nan -- Flagging you on this bugzilla ticket in case you want to weigh in about Awesomebar Glean telemetry vs contextual_services.* pings.

Flags: needinfo?(najiang)

(In reply to Rebecca BurWei from comment #2)

@Nan -- Flagging you on this bugzilla ticket in case you want to weigh in about Awesomebar Glean telemetry vs contextual_services.* pings.

I have no objections to using Awesomebar Glean for the Best Match telemetry. Given that the MVP for Best Match is targeting Fx 99, is Awesomebar Glean ready for that? Also, both impressions and clicks of Firefox Suggest are shared with our partner, and I believe we'd do the same for Best Match, so we have to take that into account as well.

As for migrating contextual_services.* pings over to Glean, I just wanted to note that this has already been brought up on several occasions, and I think it's the right path to take. Since the contextual_services.* pings are also used for external data sharing, I'd recommend making sure the migration plan is well laid out before we take any action there.

Flags: needinfo?(najiang)

Thanks Rebecca, replies below and one question for you about whether it's OK from an analysis POV to have a mix of Glean and old-style telemetry.

(In reply to Rebecca BurWei from comment #1)

Going forward, I would like to shift away from the contextual_services.* pings for data analysis and use the Awesomebar Glean telemetry instead. From my point of view, any probes we implement in contextual_services.* pings are a temporary fix to fill some gap in the transition towards Awesomebar Glean telemetry. Please let me know if this thinking is misguided!

Sounds good to me, and I think that's your call as to which data is helpful for data science. Judging by Nan's comment here it sounds like we may be able to remove those pings entirely once we have Glean equivalents.

Chris, Mark or Drew -- Do you see any reasons why we should use the contextual_services.* pings to collect data for Best Match instead of relying solely on the Awesomebar telemetry? For example: deadlines not lining up, there's some incompatibility with Glean and Best Match, the Awesomebar telemetry implementation team cannot provide ongoing support for the Awesomebar telemetry like Drew has been for Firefox Suggest, etc...

No reason from my POV, and actually I want to make sure that you all in data science have the right data in the right formats. If telemetry instead of pings (or in addition to pings) would be more helpful for you, then that's what we ought to do, regardless of whether we keep the pings around for other reasons (e.g., because that's how we currently share data with partners).

However, my concern with Glean in particular vs. old-style telemetry is the timeline, since the best match MVP is for 99. I would prefer to defer the Glean conversation until after best match is done, when we're not trying to ship something soonish. But, it seems like we're always trying to ship something soonish, and it may be feasible to at least implement new probes using Glean.

There are three sets of probes I'd like to ask about:

  1. New probes for best match
  2. Existing Suggest probes (using old-style telemetry)
  3. Existing search and address bar probes that aren't related to Suggest

Rebecca, is it a problem for you if some of these are Glean and others are old style? For example, would it be OK to use Glean for 1 and keep 2 as old style? I think it's likely that 3 will not be converted over to Glean in time for 99. In that case, should we keep everything consistent and use old style for 1?

I would prefer if these configurations could be set by branches in a Nimbus experiment, and we collect the configuration settings for each client via all of our usual Nimbus infrastructure. Drew, does that work for you?

Yes, but Nive needs input here too, as we discussed in the meeting today. If we need to be able to change configurations while the experiment is running, then using Nimbus for those configurations won't work because Nimbus variables can't be changed once the experiment is live. If we can't use Nimbus, then we'll need some out-of-band telemetry, likely reported by a pref in the telemetry environment data (assuming we keep that around w/r/t the Glean-conversion discussion).

I prefer to also collect these clicks in the Awesomebar telemetry instead of contextual_services.* pings, but I need to understand more about how the Help Link works first.

We do collect these clicks in telemetry already via the contextual.services.quicksuggest.help scalar, and my question is whether a new scalar just for best matches would be helpful.

At the time of search, the user's search string and the keyword are the same because we're not doing any sort of fuzzy matching between user search strings and keywords. Collecting search strings is very sensitive, and that data collection goes through Merino, not the Firefox Desktop Telemetry infrastructure and not Glean.

Exactly right, yes.

In my opinion, the incremental gain of answering this question is not worth the extra privacy-related work we would have to do. I will talk with Product about work-arounds on the analysis side. There is a lot of insight we can gain that gets at the heart of this question without having to collect this exact data.

Great, thanks. Please let me know when something is actionable.

Hi Drew --

In response to:

Rebecca, is it a problem for you if some of these are Glean and others are old style? For example, would it be OK to use Glean for 1 and keep 2 as old style? I think it's likely that 3 will not be converted over to Glean in time for 99. In that case, should we keep everything consistent and use old style for 1?

I don't know much about Glean vs old-style. Could you help me understand:

  • If there are two pieces of data (e.g., 2 scalars) sent in the same ping, is it possible for one scalar to be implemented in Glean and the other scalar to be implemented in old-style?
  • What is the relationship between a ping and a probe? I can think in pings. I don't really understand what a probe is.

My concern is this: we end up with datasets that use different client-level identifiers, and I can't do the joins needed to run the analyses Product wants. contextual_services.* pings use context_id. Most of old-style telemetry uses client_id. Glean has its own id.

For the new Awesomebar Glean telemetry, Mark and Chutten advised against adding the old-style telemetry client_id, so the new Awesomebar Glean telemetry will be unjoinable to old-style telemetry by design. However, if the new Awesomebar Glean telemetry will not be ready in time for Best Match, then I need all of the Best Match telemetry to be joinable to the existing contextual_services.* ping data.

Yes, but Nive needs input here too, as we discussed in the meeting today. If we need to be able to change configurations while the experiment is running, then using Nimbus for those configurations won't work because Nimbus variables can't be changed once the experiment is live. If we can't use Nimbus, then we'll need some out-of-band telemetry, likely reported by a pref in the telemetry environment data (assuming we keep that around w/r/t the Glean-conversion discussion).

Changing configs during an experiment is not a good idea from the point of view of running a clean, controlled experiment. I will talk to Nive to better understand her use case.

We do collect these clicks in telemetry already via the contextual.services.quicksuggest.help scalar, and my question is whether a new scalar just for best matches would be helpful.

Yes, we would like to count the number of help clicks on best matches.

What is the relationship between a ping and a probe? I can think in pings. I don't really understand what a probe is.

Sorry, by "probe" I mean a scalar, histogram, event, environment data, or counter -- a single instrumentation in Firefox that aggregates data in a particular format. Anything discussed in the data collection doc (aside from "custom pings"). In the Suggest telemetry doc, that's all the data described in the following sections: Histograms, Scalars, Events, and Environment.

"Ping" kind of has two meanings:

  • "Custom pings", like the contextual services ones we use for Suggest for impressions and clicks, described here
  • "Telemetry pings" that bundle up the data from probes, described here

So probes aggregate data inside Firefox which is then periodically bundled up inside a telemetry ping and sent to our servers.

Custom pings are a way of circumventing the restrictions of the standard probes and they let us include arbitrary data. You could consider custom pings as being a special type of probe, but it might not be helpful to mix terms like that.
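As a rough illustration of the distinction, here is a toy model in JavaScript. The class and function names are hypothetical and are not Firefox APIs; clearing the store on snapshot is a simplification. The only real name used is the contextual.services.quicksuggest.help scalar mentioned earlier in this bug.

```javascript
// Toy model only: these names are hypothetical and not Firefox APIs.
class ScalarStore {
  constructor() {
    this.scalars = new Map();
  }

  // A "probe": increments an aggregated counter inside the browser.
  add(name, amount = 1) {
    this.scalars.set(name, (this.scalars.get(name) ?? 0) + amount);
  }

  // A "telemetry ping": bundles the accumulated probe data, then resets it
  // (a simplification of how periodic submission works).
  snapshotIntoPing() {
    const payload = Object.fromEntries(this.scalars);
    this.scalars.clear();
    return { type: "telemetry", payload };
  }
}

// A "custom ping": carries arbitrary data and is sent on its own,
// without being aggregated first.
function buildCustomPing(type, data) {
  return { type, payload: { ...data } };
}
```

Two help clicks would call `add("contextual.services.quicksuggest.help")` twice and show up as a single value of 2 in the next telemetry ping, whereas each click reported through a custom ping would be its own submission.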

If there are two pieces of data (e.g., 2 scalars) sent in the same ping, is it possible for one scalar to be implemented in Glean and the other scalar to be implemented in old-style?

It's possible to implement one scalar in Glean and one in old style, but I don't know enough about Glean to say whether the two will be sent in the same ping. I suspect not. Ultimately, Glean and old style will be unjoinable, to your point, and I think that's the main point.

At this point it sounds like we should continue using old-style telemetry for best match, given that we won't be able to convert existing telemetry to Glean and you won't be able to join Glean to old style. We'll have to postpone Glean conversion until after the best match MVP ships in 99 and try to do it all at once.

Yes, we would like to count the number of help clicks on best matches.

OK, so it sounds like a new scalar alongside the existing one will be required.

The two scalars should be disjoint, right? i.e., clicking a best match help button should increment the scalar for best matches but not the scalar for usual Suggest suggestions. This is important because best matches are also Suggest suggestions. Same question for clicks and impressions on best matches themselves.

Thanks Drew. Sounds good to me - continuing to use old-style for Best Match.

The two scalars should be disjoint, right? i.e., clicking a best match help button should increment the scalar for best matches but not the scalar for usual Suggest suggestions. This is important because best matches are also Suggest suggestions. Same question for clicks and impressions on best matches themselves.

  • Clicks and impressions on Best Matches should also count as clicks and impressions for Suggest suggestions.
  • Clicks on a help button attached to a urlbar result should be disjoint from clicks on the actual urlbar result itself.
  • Clicks on a help button attached to a Best Match should also count as a click on a help button attached to a Suggest suggestion.

I'm hopefully going to clarify a couple of things: the address bar telemetry being added will be a custom Glean ping or two - it does not make sense for us to implement that on 'old' telemetry. This will also replace the current event that is sent to telemetry for experiments.

The Glean work is scheduled for 99. I would expect that we should have at least the equivalent data to the current event, even if we don't have the full data request implemented.

The rest of the address bar telemetry (scalars, etc) will remain on the telemetry system for the time being.

From what I understand, I would assume that best match is its own result type in the address bar code, and therefore will be picked up by the new custom glean pings without much extra effort (if any?).

Clicks and impressions on Best Matches should also count as clicks and impressions for Suggest suggestions.

We might want to add a flag field to the existing custom pings (impressions & clicks) for labelling the type of the suggest. E.g.

"type": ["best-match", "firefox-suggest"]

Thanks Mark - yes, that helps.

The rest of the address bar telemetry (scalars, etc) will remain on the telemetry system for the time being.

Just to clarify - When you say "rest of the address bar telemetry", are you referring to all of the telemetry that is described here, but not included in the new Awesomebar Glean telemetry?

From what I understand, I would assume that best match is its own result type in the address bar code, and therefore will be picked up by the new custom glean pings without much extra effort (if any?).

A Best Match is a special type of Firefox Suggestion, so info about it should be included in typeSelected_extra (extra information that is recorded / non-missing only when the urlbar result is one of the 3 Firefox Suggest types).

(In reply to Rebecca BurWei from comment #10)

Thanks Mark - yes, that helps.

The rest of the address bar telemetry (scalars, etc) will remain on the telemetry system for the time being.

Just to clarify - When you say "rest of the address bar telemetry", are you referring to all of the telemetry that is described here, but not included in the new Awesomebar Glean telemetry?

Everything on the firefox source docs page apart from the "Event Telemetry" section (which is what the awesomebar glean telemetry is effectively going to replace).

From what I understand, I would assume that best match is its own result type in the address bar code, and therefore will be picked up by the new custom glean pings without much extra effort (if any?).

A Best Match is a special type of Firefox Suggestion, so info about it should be included in typeSelected_extra (extra information that is recorded / non-missing only when the urlbar result is one of the 3 Firefox Suggest types).

Ok, that makes sense.

Flags: needinfo?(standard8)

Thanks Mark.

(In reply to Mark Banner (:standard8) from comment #8)

This will also replace the current event that is sent to telemetry for experiments.

What do you mean by the current event? Based on the rest of your comment, it sounds like maybe the engagement event? But that doesn't quite make sense because urlbar event telemetry isn't automatically enabled for experiments and we definitely haven't been enabling it for Suggest experiments. (Maybe we should have been...)

The rest of the address bar telemetry (scalars, etc) will remain on the telemetry system for the time being.

I think I was mistaken about the scope of Glean for urlbar. You're saying that Glean is entirely parallel to telemetry, at least right now. It doesn't sound like there's any conflict with best match at all, and all the telemetry we need for it should continue to be "old style", i.e., telemetry and not Glean.

Flags: needinfo?(standard8)

From what I understand, I would assume that best match is its own result type in the address bar code, and therefore will be picked up by the new custom glean pings without much extra effort (if any?).

Also this isn't true right now IIRC -- more of a note-to-self to add a type for Suggest suggestions as part of this bug I guess, or maybe a separate bug would be better.

It might be challenging to begin capturing telemetry to support Best Match's MVP in Glean in advance of additional efforts to migrate Address Bar telemetry to Glean.

Rebecca, any concerns if we capture telemetry in the existing system for the Best Match MVP? After the MVP is released, we can look at migrating the probes to Glean. Thoughts?

Flags: needinfo?(chris) → needinfo?(rburwei)

Rebecca, any concerns if we capture telemetry in the existing system for the Best Match MVP? After the MVP is released, we can look at migrating the probes to Glean. Thoughts?

That works for me.

Flags: needinfo?(rburwei)
Points: --- → 5

(In reply to Drew Willcoxon :adw from comment #12)

Thanks Mark.

(In reply to Mark Banner (:standard8) from comment #8)

This will also replace the current event that is sent to telemetry for experiments.

What do you mean by the current event? Based on the rest of your comment, it sounds like maybe the engagement event? But that doesn't quite make sense because urlbar event telemetry isn't automatically enabled for experiments and we definitely haven't been enabling it for Suggest experiments. (Maybe we should have been...)

I meant the event here: https://searchfox.org/mozilla-central/rev/38652b98c6dd3bf42403eeb8c5305902b9a6e938/browser/components/urlbar/UrlbarController.jsm#861-867

(urlbar.engagement/urlbar.abandonment)

Although we are replacing that event, the intention is to turn it on permanently.

The rest of the address bar telemetry (scalars, etc) will remain on the telemetry system for the time being.

I think I was mistaken about the scope of Glean for urlbar. You're saying that Glean is entirely parallel to telemetry, at least right now. It doesn't sound like there's any conflict with best match at all, and all the telemetry we need for it should continue to be "old style", i.e., telemetry and not Glean.

My impression was that if we have the new Glean ping/event for this address bar telemetry and we add the best match information to it, that would be sufficient for best match. However, we're obviously concerned about timing and setting up Glean, so maybe overlapping the systems is the way to go.

(In reply to Drew Willcoxon :adw from comment #13)

From what I understand, I would assume that best match is its own result type in the address bar code, and therefore will be picked up by the new custom glean pings without much extra effort (if any?).

Also this isn't true right now IIRC -- more of a note-to-self to add a type for Suggest suggestions as part of this bug I guess, or maybe a separate bug would be better.

Yeah, I agree, I didn't realise this is a new field/option, that Rebecca has now added to the telemetry doc.

Flags: needinfo?(standard8)
See Also: → 1757658

I'll try to summarize where we're at based on the discussion here and elsewhere. Please anyone let me know if your understanding is different.

  • No Glean right now
  • Except for the remaining bullet points below, all of the required analysis as specified in the spec doc can be done using the existing contextual services pings for impressions and clicks with the addition of the new match_type field indicating whether they correspond to best matches or not (bug 1754622, awaiting data steward approval)
  • Suggestion dismissal has been removed from the requirements, so no telemetry required for that
  • Telemetry related to the preferences UI is in bug 1756917
  • I'm getting clarity from Nive about this from the doc: "Total # of searches in URL bar" -- we may need something new here, or maybe turn on urlbar engagement telemetry
  • The Nimbus exposure event needs to be recorded the first time the user sees a best match

I've filed bug 1757658 for the Nimbus exposure event. The only remaining piece is the "Total # of searches in URL bar" part, so this bug can handle that if necessary, and if no work is required for that, we can close this.
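For illustration, here is a minimal sketch of how a match_type field on the impression and click pings could support the analysis. Only the match_type field name comes from this discussion. The other field names are invented, and the label strings are borrowed from the labels proposed in an earlier comment, so treat both as assumptions.

```javascript
// Hypothetical sketch: only `match_type` comes from the discussion; the
// other field names and the label strings are illustrative assumptions.
function buildSuggestPing(pingType, suggestion) {
  return {
    pingType, // "impression" or "click"
    position: suggestion.position,
    match_type: suggestion.isBestMatch ? "best-match" : "firefox-suggest",
  };
}

// Counts pings per ping type, overall and restricted to best matches.
function summarize(pings) {
  const summary = {};
  for (const ping of pings) {
    if (!summary[ping.pingType]) {
      summary[ping.pingType] = { total: 0, bestMatch: 0 };
    }
    summary[ping.pingType].total++;
    if (ping.match_type === "best-match") {
      summary[ping.pingType].bestMatch++;
    }
  }
  return summary;
}
```

Because every best match ping is still an ordinary Suggest ping, the totals keep counting best matches as Suggest impressions and clicks, and the field only adds the ability to split them out.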

Except for the remaining bullet points below, all of the required analysis as specified in the spec doc can be done using the existing contextual services pings for impressions and clicks with the addition of the new match_type field indicating whether they correspond to best matches or not (bug 1754622, awaiting data steward approval)

Nan, can we get the new match_type added to the table contextual_services.event_aggregates, so that the data can be retained for analysis longer than 2 weeks?

I'm getting clarity from Nive about this from the doc: "Total # of searches in URL bar" -- we may need something new here, or maybe turn on urlbar engagement telemetry

I recommend using our existing telemetry for this. I don't know the names of the probes, but there is a BigQuery table called search_clients_engines_sources_daily, and filtering on source = urlbar there will give you a count of searches in the urlbar. I make this recommendation for analysis and comparison reasons, but if there's some reason that probe is unreliable or measuring the wrong thing, I'm open to other ideas.

The Nimbus exposure event needs to be recorded the first time the user sees a best match

Is this correct: The Nimbus exposure event will be recorded

  • the first time a user sees a best match, if they are in the treatment group
  • the first time a user would have seen a best match, if they are in the control group
Blocks: 1757768

Nan, can we get the new match_type added to the table contextual_services.event_aggregates, so that the data can be retained for analysis longer than 2 weeks?

Sure, I think we can do that. Filed bug 1757768 to track that.

:klukas, this match_type is specifically for quicksuggest.* tables, whereas event_aggregates stores aggregates for both quicksuggest and topsites. Can we add match_type to it and leave it "NULL" for topsites aggregates?

No longer blocks: 1757768
Flags: needinfo?(jklukas)

(In reply to Nan Jiang [:nanj] from comment #19)

Nan, can we get the new match_type added to the table contextual_services.event_aggregates, so that the data can be retained for analysis longer than 2 weeks?

Sure, I think we can do that. Filed bug 1757768 to track that.

:klukas, this match_type is specifically for quicksuggest.* tables, whereas event_aggregates stores aggregates for both quicksuggest and topsites. Can we add match_type to it and leave it "NULL" for topsites aggregates?

Yes, that's totally reasonable.

Flags: needinfo?(jklukas)

(In reply to Rebecca BurWei from comment #18)

I recommend using our existing telemetry for this. I don't know the names of the probes, but there is a BigQuery table called search_clients_engines_sources_daily, and filtering on source = urlbar there will give you a count of searches in the urlbar. I make this recommendation for analysis and comparison reasons, but if there's some reason that probe is unreliable or measuring the wrong thing, I'm open to other ideas.

I think I know what that refers to, and it's searches performed using a search engine from the address bar, e.g., typing "porcupine" in the address bar and pressing the enter key to do a search on google.com, or typing "moz" and picking the "mozilla" Google search suggestion to do a search on google.com. IOW it records uses of the address bar as a search access point.

It is not the number of sessions/engagements in the address bar, which is what the spec doc seems to want when it talks about "Total # of searches with Fx Suggest impression", etc., but I'm not sure.

If we do want the total number of sessions/engagements, we do have code in place to record telemetry events each time a session ends, but it's preffed off by default. This is what I've referred to in previous comments as "engagement events" and "engagement telemetry". Is that what we want? Are events OK? Would a scalar(s)/counter(s) be better?

Is this correct: The Nimbus exposure event will be recorded

  • the first time a user sees a best match, if they are in the treatment group
  • the first time a user would have seen a best match, if they are in the control group

We talked about this over Slack, but just to record the outcome here in the bug, the logic is:

If the user is in a treatment branch and they did not disable best match, the event is recorded the first time they trigger a best match; if the user is in a treatment branch and they did disable best match, the event is not recorded at all. If the user is in the control branch, the event is recorded the first time they would have triggered a best match. (Users in the control branch cannot "disable" best match since the feature is totally hidden from them.)
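The rule above can be written as a small predicate. This is only a sketch of the logic as described, not the actual implementation; the function name, the argument names, and the "control" branch label are hypothetical.

```javascript
// Hypothetical sketch of the exposure rule; names are assumptions.
// Intended to be called whenever a best match is triggered (treatment)
// or would have been triggered (control).
function shouldRecordExposure({ branch, bestMatchDisabledByUser, alreadyRecorded }) {
  if (alreadyRecorded) {
    // The event is recorded only the first time.
    return false;
  }
  if (branch === "control") {
    // Control users cannot disable the feature since it is hidden from them.
    return true;
  }
  // Treatment branches: record only if the user has not disabled best match.
  return !bestMatchDisabledByUser;
}
```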

Flags: needinfo?(rburwei)

If we do want the total number of sessions/engagements, we do have code in place to record telemetry events each time a session ends, but it's preffed off by default. This is what I've referred to in previous comments as "engagement events" and "engagement telemetry". Is that what we want? Are events OK? Would a scalar(s)/counter(s) be better?

Ah ok. Yes, we want to count the total number of search sessions/engagements. I believe product would also like to add abandoned search sessions to this count as well. I would prefer a scalar/counter over an event.

Flags: needinfo?(rburwei)

OK. In that case, this is what I'm thinking is left to do here:

  • Augment the existing engagement event telemetry with two new scalars, one for completed engagements and one for abandonments
  • Add a Nimbus variable corresponding to the engagement telemetry pref so we can flip it on in our Nimbus experiments
  • Add four new scalars for best matches. This doesn't seem strictly necessary because as we've discussed the contextual services impression and click pings cover everything in detail, but having scalars may make it easier to do comparisons with the engagement scalars from the first bullet point above.
    • Sponsored best match impressions
    • Non-sponsored best match impressions
    • Sponsored best match clicks
    • Non-sponsored best match clicks

This adds new keyed scalars for best match that are analogous to the current
non-best-match scalars, but they are broken out by sponsored vs. non-sponsored:

contextual.services.quicksuggest.impression_sponsored_bestmatch
contextual.services.quicksuggest.impression_nonsponsored_bestmatch
contextual.services.quicksuggest.click_sponsored_bestmatch
contextual.services.quicksuggest.click_nonsponsored_bestmatch
contextual.services.quicksuggest.help_sponsored_bestmatch
contextual.services.quicksuggest.help_nonsponsored_bestmatch

For best matches, these new scalars are incremented in addition to the current
non-best-match scalars.
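As a sketch of the "incremented in addition to" behavior, the helper below returns the scalar names that a single interaction would bump. The helper itself is hypothetical. The best match names and the help scalar come from this bug, while the plain impression and click scalar names are assumed by analogy.

```javascript
// Hypothetical helper; the base "impression" and "click" scalar names are
// assumed by analogy with the "help" scalar named in this bug.
const PREFIX = "contextual.services.quicksuggest.";

// kind: "impression" | "click" | "help"
function scalarsToIncrement(kind, { isBestMatch, isSponsored }) {
  // Every Suggest interaction bumps the existing scalar.
  const names = [PREFIX + kind];
  if (isBestMatch) {
    // Best matches additionally bump the sponsored/non-sponsored variant.
    const sponsorship = isSponsored ? "sponsored" : "nonsponsored";
    names.push(`${PREFIX}${kind}_${sponsorship}_bestmatch`);
  }
  return names;
}
```

A help click on a non-sponsored best match would therefore yield both the existing help scalar and help_nonsponsored_bestmatch, while the same click on an ordinary suggestion yields only the former.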

This adds two new scalars for engagements and abandonments in the urlbar:

urlbar.engagement
urlbar.abandonment

We already have engagement event telemetry but it's preffed off by default, and
for the upcoming best match experiment, data science would prefer scalars so we
can easily measure total engagement volume. (See bug 1752953 comment 22.)
Recording simple scalars for engagements and abandonments in addition to the
optional event telemetry seems totally reasonable.

The existing urlbar.picked.* scalars are sort of proxies for engagement, but a
single scalar would make analysis easier, and there is no similar existing
scalar for abandonments.

This revision hooks into the TelemetryEvent class, but it records the scalars
regardless of browser.urlbar.eventTelemetry.enabled because there's no reason
to not always enable it.
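A minimal sketch of that behavior, assuming plain objects as stand-ins for the real telemetry sinks: the scalar is always incremented, and the event is recorded only when the pref is on. The function and parameter names are hypothetical, and the event shape is illustrative only.

```javascript
// Hypothetical sketch; `scalars` and `events` stand in for the real
// telemetry sinks, and the event shape is illustrative only.
// method: "engagement" | "abandonment"
function recordSessionEnd(method, { eventTelemetryEnabled, scalars, events }) {
  const name = `urlbar.${method}`;
  // The scalar is recorded regardless of the event telemetry pref.
  scalars[name] = (scalars[name] ?? 0) + 1;
  // The event is still gated on browser.urlbar.eventTelemetry.enabled.
  if (eventTelemetryEnabled) {
    events.push({ category: "urlbar", method });
  }
}
```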

This modifies the Nimbus update observer in UrlbarPrefs so that it does not
update the Firefox Suggest scenario unnecessarily. The scenario and the prefs
related to the scenario only need to be updated when either of these happens:

  • A relevant Nimbus variable changes
  • The current default-branch value of a relevant pref is incorrect for the
    intended scenario

Currently, any time a pref changes that is declared as a fallback pref for a
Nimbus urlbar variable, we end up updating the scenario all over again even if
it was some totally unrelated variable like bestMatchEnabled.
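
The two conditions above could be expressed roughly as follows. This is a Python sketch under stated assumptions, not the UrlbarPrefs implementation: `should_update_scenario` and its parameters are hypothetical, and the caller is assumed to know which variables changed and which default-branch pref values the intended scenario requires.

```python
def should_update_scenario(changed_variables, scenario_variables,
                           default_branch_prefs, intended_prefs):
    """Decide whether the Firefox Suggest scenario needs updating.

    changed_variables: names of Nimbus variables that just changed.
    scenario_variables: names of variables relevant to the scenario.
    default_branch_prefs: current default-branch pref values, by name.
    intended_prefs: pref values the intended scenario requires, by name.
    """
    # Condition 1: a relevant Nimbus variable changed.
    if set(changed_variables) & set(scenario_variables):
        return True
    # Condition 2: a default-branch pref is incorrect for the scenario.
    return any(
        default_branch_prefs.get(name) != value
        for name, value in intended_prefs.items()
    )
```

Under this model, a change to an unrelated variable such as bestMatchEnabled returns False as long as the default-branch prefs already match the intended scenario.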

We have very good test coverage for scenario updates, so I'm confident that this
revision works properly and that we can catch regressions when we modify related
code in the future.

While working on this, I found that the quickSuggestNonSponsoredEnabled variable
was accidentally removed by D130159 (https://hg.mozilla.org/mozilla-central/rev/ccda4432cdc4d7180a9304e05b52f046616bbf2b)
so I added it back. We haven't used this variable in any experiments or rollouts
so it was never a problem.

Attached file data-request.md

Data review request for 8 new telemetry scalars related to Firefox Suggest, best match, and urlbar

Attachment #9267053 - Flags: data-review?(chutten)

Comment on attachment 9267053 [details]
data-request.md

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Drew Willcoxon is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9267053 - Flags: data-review?(chutten) → data-review+
Pushed by dwillcoxon@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f284a36b647a
Part 1: Add best match scalar telemetry. r=daisuke
https://hg.mozilla.org/integration/autoland/rev/5825b5733007
Part 2: Add scalar telemetry for urlbar engagements and abandonments. r=mak
https://hg.mozilla.org/integration/autoland/rev/92706c4e7192
Part 3: Don't update the Firefox Suggest scenario unnecessarily. r=daisuke
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch

STR for QA

Verification involves testing many telemetry scalars that are incremented when certain interactions occur. In addition to making sure certain scalars are incremented, we should also verify that other scalars are not incremented.

One way to do these STR would be to restart Firefox after each set of STR. In that case, scalars that were previously recorded will be cleared (of course).

Another way would be to keep Firefox open the entire time. In that case, scalars that were previously recorded will stick around, and the important point is that they should not be incremented unless the STR say they should be.
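
If you keep Firefox open, one way to check "only these scalars were incremented" is to compare keyed-scalar values noted before and after each set of STR. The Python helper below is only a sketch: it assumes you have copied the values by hand from about:telemetry into dicts of the form {scalar name: {key: count}}, and the function name `increments` is made up for this example.

```python
def increments(before, after):
    """Return {scalar_name: {key: delta}} for every value that changed.

    Both arguments map scalar names to {key: count} dicts. Scalars or
    keys missing from `before` are treated as having a count of 0.
    """
    changed = {}
    for name, keys in after.items():
        for key, value in keys.items():
            delta = value - before.get(name, {}).get(key, 0)
            if delta:
                changed.setdefault(name, {})[key] = delta
    return changed
```

Comparing the result against the scalars a given STR lists makes it easy to spot anything that was incremented unexpectedly.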

Engagement and abandonment scalars

This bug adds 2 telemetry scalars (about:telemetry#scalars-tab_search=urlbar) that are incremented for each engagement and abandonment in the urlbar. These 2 scalars aren't related to best match or Suggest and are always recorded.

STR: urlbar.engagement

  1. Type anything in the urlbar and press enter
  2. Verify urlbar.engagement is recorded with a value of 1
  3. Type something else in the urlbar and click a result in the panel
  4. Verify urlbar.engagement now has a value of 2

STR: urlbar.abandonment

  1. Click inside the urlbar
  2. Click the web page so the urlbar is no longer focused
  3. Verify urlbar.abandonment is recorded with a value of 1. Abandonment is easy to trigger so if the value is greater than 1, then you probably triggered some abandonments earlier, so try restarting the app. The important thing is that this scalar is incremented with each abandonment.
  4. Click inside the urlbar again
  5. Switch tabs
  6. Verify urlbar.abandonment now has a value of 2

Best match keyed scalars

This bug also adds 6 keyed scalars (about:telemetry#keyed-scalars-tab_search=quicksuggest) related to best match. For each scalar, the key name is the position at which the best match appeared. Usually the key name will be 2 since best match usually appears at position 2. The value corresponding to the key is the number of records for that particular position.

The 6 new keyed scalars are:

contextual.services.quicksuggest.impression_sponsored_bestmatch
contextual.services.quicksuggest.impression_nonsponsored_bestmatch
contextual.services.quicksuggest.click_sponsored_bestmatch
contextual.services.quicksuggest.click_nonsponsored_bestmatch
contextual.services.quicksuggest.help_sponsored_bestmatch
contextual.services.quicksuggest.help_nonsponsored_bestmatch

In addition to these 6 new keyed scalars, the following 3 existing keyed scalars for Suggest will also be recorded since best matches are part of Suggest:

contextual.services.quicksuggest.impression
contextual.services.quicksuggest.click
contextual.services.quicksuggest.help

These STR require Firefox Suggest and best match to be enabled (browser.urlbar.bestMatch.enabled).

STR: contextual.services.quicksuggest.impression_sponsored_bestmatch

  1. Type espresso coffee to trigger the Amazon coffee maker best match
  2. Click a different result, not the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.impression_sponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented

STR: contextual.services.quicksuggest.click_sponsored_bestmatch

  1. Type espresso coffee to trigger the Amazon coffee maker best match
  2. Click the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.click_sponsored_bestmatch
    contextual.services.quicksuggest.click
    contextual.services.quicksuggest.impression_sponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented

STR: contextual.services.quicksuggest.help_sponsored_bestmatch

  1. Type espresso coffee to trigger the Amazon coffee maker best match
  2. Click the ? help button in the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.help_sponsored_bestmatch
    contextual.services.quicksuggest.help
    contextual.services.quicksuggest.impression_sponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented

STR: contextual.services.quicksuggest.impression_nonsponsored_bestmatch

  1. Type betty to trigger the Betty White Wikipedia best match
  2. Click a different result, not the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.impression_nonsponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented

STR: contextual.services.quicksuggest.click_nonsponsored_bestmatch

  1. Type betty to trigger the Betty White Wikipedia best match
  2. Click the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.click_nonsponsored_bestmatch
    contextual.services.quicksuggest.click
    contextual.services.quicksuggest.impression_nonsponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented

STR: contextual.services.quicksuggest.help_nonsponsored_bestmatch

  1. Type betty to trigger the Betty White Wikipedia best match
  2. Click the ? help button in the best match
  3. Verify the values for the following keyed scalars have been incremented under the 2 key name:
    contextual.services.quicksuggest.help_nonsponsored_bestmatch
    contextual.services.quicksuggest.help
    contextual.services.quicksuggest.impression_nonsponsored_bestmatch
    contextual.services.quicksuggest.impression
    
  4. Verify none of the other keyed scalars have been incremented
Flags: qe-verify+
Flags: in-testsuite+

Comment on attachment 9266414 [details]
Bug 1752953 - Part 1: Add best match scalar telemetry.

Beta/Release Uplift Approval Request

  • User impact if declined: This is required for the Firefox Suggest best match experiment in 99.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Please see comment 31
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): These patches mainly add new Firefox Suggest telemetry. One of the patches makes a deeper change to Suggest initialization, but it only affects Suggest and it's well covered by existing tests.
  • String changes made/needed:
Attachment #9266414 - Flags: approval-mozilla-beta?
Attachment #9266416 - Flags: approval-mozilla-beta?
Attachment #9266629 - Flags: approval-mozilla-beta?

Comment on attachment 9266414 [details]
Bug 1752953 - Part 1: Add best match scalar telemetry.

Approved for 99.0b3. Thanks.

Attachment #9266414 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9266416 [details]
Bug 1752953 - Part 2: Add scalar telemetry for urlbar engagements and abandonments.

Approved for 99.0b3. Thanks.

Attachment #9266416 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9266629 [details]
Bug 1752953 - Part 3: Don't update the Firefox Suggest scenario unnecessarily.

Approved for 99.0b3. Thanks.

Attachment #9266629 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

I have verified this issue on the latest Nightly 100.0a1 (Build ID: 20220313212642) and Beta 99.0b3 (Build ID: 20220313185831) on Windows 10 x64, macOS 10.15.7 and Linux Ubuntu 20.04 x64.

  1. In order to verify this issue I have used the STR from comment 31.
  2. I verified that the urlbar.engagement and urlbar.abandonment scalars are correctly triggered when interacting with the URL bar as mentioned in comment 31.
  3. I verified that all the 6 new keyed scalars are correctly triggered when following the STR from comment 31.
  4. I verified that depending on the scenario, only the values of the correct keyed scalars are incremented; the values of the remaining keyed scalars are not incremented.
  5. I verified that the existing 3 keyed scalars are recorded correctly.
  6. I also verified that the key name is 2 given the position of the best match result.
Status: RESOLVED → VERIFIED
Flags: qe-verify+