Closed Bug 1737651 Opened 3 years ago Closed 3 years ago

Add latency probe for quick suggest remote settings results

Categories

(Firefox :: Address Bar, task, P1)

task
Points:
3

Tracking

()

VERIFIED FIXED
96 Branch
Iteration:
96.1 - Nov 1 - Nov 14
Tracking Status
firefox94 + verified
firefox95 --- verified
firefox96 --- verified

People

(Reporter: adw, Assigned: adw)

References

Details

Attachments

(2 files)

Per meeting with Corey today, it would help our Merino experiments to measure the latency of quick suggest results. Ideally we would be able to compare the latency of remote settings suggestions vs. Merino suggestions to see if/how much slower Merino suggestions are and whether the latency difference between the two affects the user's behavior/experience.

Latency would be defined as the time from when the urlbar starts a new search (i.e., the user typed a character) to the time when the suggestion is shown. The urlbar already has PLACES_AUTOCOMPLETE_1ST_RESULT_TIME_MS and PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS histograms that measure similar latency for the first and sixth results. The new histogram(s) would be similar.

In the meeting I brought up the possibility of adding one histogram for remote settings and another for Merino, but on second thought I'm not sure that's a good idea. The way quick suggest currently works is that no suggestion is shown to the user until both types are fetched. In other words, the slower suggestion dictates the latency as defined above. I'd imagine that once Merino is enabled, it would negatively impact the remote settings probe, since I'd expect Merino to be slower. So it's probably a better idea to have a single quick suggest histogram, and we could compare its values on the control population, where only RS is enabled, to its values on the treatment population, where both RS and Merino are enabled.

I recommend adding separate probes for each service, if possible. If there is any situation where RemoteSettings is slower than Merino, using only one probe will obscure interpretation of the difference observed between the Merino+RemoteSettings case and the Merino-only case. In fact, the histogram will make Merino appear worse than its actual performance, because it shows the worst-case value of both methods. Having a Merino-only probe addresses this issue.
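To make the concern concrete, here is a small Python sketch comparing a single combined probe with separate per-source probes. All latency numbers and variable names are made up for illustration; they are not measured data, and the combined probe is modeled simply as "the slower of the two fetches".

```python
# Hypothetical per-search fetch latencies in milliseconds (illustrative only).
rs_latencies_ms = [12, 15, 250, 9]          # remote settings
merino_latencies_ms = [180, 210, 190, 200]  # Merino

# A single combined probe would record when the slower fetch finishes,
# because no suggestion is shown until both fetches are done.
combined_ms = [max(rs, m) for rs, m in zip(rs_latencies_ms, merino_latencies_ms)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# With separate probes, Merino's own latency is visible directly.
print("combined mean:", mean(combined_ms))
print("merino-only mean:", mean(merino_latencies_ms))
```

In this toy data set the third search has a slow remote settings fetch, so the combined probe attributes 250 ms to a search where Merino itself took 190 ms.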

OK. We can have separate probes that measure the time from the start of the search to the time that suggestions are fetched from each. That's slightly different from choosing as the end time the time at which the user is actually shown one of the suggestions. In other words, there's some period between the time both suggestions have been fetched and the time that one of them appears to the user. But the length of that period is negligible and constant for each type of suggestion, so it's probably OK that it's not captured in the probe(s).

Corey, I started working on this today but I ended up with more questions. I'm going to need some help understanding what we want to record exactly and what it will tell us.

This is the basic logic/algorithm for fetching Firefox Suggest suggestions:

  1. The user types a character and we start two new independent fetches, one from remote settings and one from Merino
  2. Each fetch finishes independently of the other:
    1. Remote settings finishes
    2. Merino finishes
  3. Did either fetch return any suggestions? If no, stop. If yes, continue.
  4. Is it OK to show any of the suggestions to the user? If no, stop. If yes, pick one and show it.
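The four steps above can be sketched roughly as follows. This is a simplified Python model, not the Firefox implementation: the fetch functions, their delays, the suggestion fields, and the `sponsored_enabled` flag are all hypothetical stand-ins.

```python
import asyncio


async def fetch_remote_settings(query):
    # Hypothetical stand-in for the remote settings lookup.
    await asyncio.sleep(0.001)
    return [{"source": "remote-settings", "sponsored": True, "title": query}]


async def fetch_merino(query):
    # Hypothetical stand-in for the Merino network request.
    await asyncio.sleep(0.005)
    return []


def is_allowed(suggestion, sponsored_enabled):
    # Step 4 check: sponsored suggestions need the sponsored option enabled.
    return sponsored_enabled or not suggestion["sponsored"]


async def quick_suggest(query, sponsored_enabled=True):
    # Steps 1 and 2: start both fetches; each finishes independently.
    rs, merino = await asyncio.gather(
        fetch_remote_settings(query), fetch_merino(query)
    )
    # Step 3: stop if neither fetch returned anything.
    suggestions = rs + merino
    if not suggestions:
        return None
    # Step 4: show one suggestion only if it is OK to show it.
    allowed = [s for s in suggestions if is_allowed(s, sponsored_enabled)]
    return allowed[0] if allowed else None


print(asyncio.run(quick_suggest("a")))
```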

I think it's clear we should start the stopwatches after step 1, but after what step do we stop them, and which fetches do we record latencies for?

If we stop them after step 2, then in many cases we'll record latencies (a) when no suggestions are returned at all and (b) when suggestions are returned but it's not OK to show them to the user (e.g., because they're sponsored but the user has turned off sponsored suggestions). Since we do these fetches for each character typed and most search strings will not return any suggestions at all, the vast majority of latencies recorded in this case will be for searches where the user sees no suggestion.

If we stop them after step 3, then in some cases we'll record latencies when suggestions are returned but it's not OK to show them to the user. And if one of the fetches didn't return any suggestions, should its latency be recorded?

If we stop them after step 4, then we'll record latencies when one of the suggestions is shown to the user. If one of the fetches didn't return any suggestions, or if it did but the suggestion we show to the user did not come from it, should its latency be recorded?

Also, regardless of when we stop the stopwatches, what do the latencies tell us? Say we find out remote settings generally takes ~50ms and Merino takes ~200ms. What does that mean? Is that good or bad, and compared to what? Compared to how long the result at the 10th position usually takes to show up? Or only non-Firefox-Suggest results at the 10th position?

Flags: needinfo?(cdowhygelund)

I believe we want to go with 3. My rationale is that the probe is intended to measure the time from when a user enters a character to when a response arrives from the server.

  • Assumption: The time it takes for this process to complete is independent of whether a suggestion is found, i.e., the latency distribution of responses containing suggestions is similar (enough) to that of responses without suggestions. Is this true, or is my assessment naive?

The comparison is how much latency using Merino adds compared to RemoteSettings, since Merino is being used to replace RemoteSettings. If they are both processing the same search queries, and the assumption above holds, then differences in the distributions inform how much latency Merino adds to suggestions being served relative to RemoteSettings. This difference can then be compared to user studies to infer its impact on user-perceived performance.

I agree that inferring user-perceived performance gets complicated, given how the results are shown, etc.

Flags: needinfo?(cdowhygelund)

Regarding which fetches to record, I believe all timings should be included, including those that don't return results. This is based upon the assumption I noted in the previous comment. This will reduce sparsity in this measure.

Regarding if Merino uses the fallback, the desired behavior of the probe is to still record the time to response, rather than the timeout value.
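As a rough illustration of recording every fetch, including fetches that return nothing, here is a minimal Python sketch. The `recorded_ms` store and the `timed_fetch` helper are hypothetical stand-ins for the real telemetry plumbing, and the fetch callables are placeholders.

```python
import time
from collections import defaultdict

# Hypothetical in-memory stand-in for per-source latency histograms.
recorded_ms = defaultdict(list)


def timed_fetch(source, fetch, query):
    """Run one fetch and record its latency whether or not it returns results."""
    start = time.perf_counter()
    suggestions = fetch(query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Recorded unconditionally: empty responses count too.
    recorded_ms[source].append(elapsed_ms)
    return suggestions


# An empty response and a non-empty response each add one sample.
timed_fetch("remote-settings", lambda q: [], "a")
timed_fetch("remote-settings", lambda q: ["suggestion"], "ab")
print(len(recorded_ms["remote-settings"]))
```

Recording unconditionally is what keeps the measure from being sparse, at the cost of relying on the assumption that empty and non-empty responses have similar latency distributions.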

Iteration: 95.2 - Oct 18 - Oct 31 → 96.1 - Nov 1 - Nov 14
Depends on: 1737928

This adds a new histogram called
FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS. That's a verbose name, but
urlbar-related histograms are prefixed with FX_URLBAR, and if the
QUICK_SUGGEST part wasn't there it wouldn't be clear which urlbar use of
remote settings the probe was referring to.

The histogram has a range of 30 seconds, which is too big for something that
happens entirely on the client, but Corey says it's more important that it has
the same buckets and period as the similar Merino latency histogram.
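To show why a 30-second range still leaves usable resolution for small client-side values, here is a sketch of exponential bucketing in Python. The minimum of 1 and the bucket count of 50 are assumptions chosen for illustration and are not taken from the patch; the function is a simplified model of exponential bucket layout, not code copied from Firefox.

```python
import math


def exponential_buckets(low, high, n_buckets):
    """Return the lower bounds of exponentially spaced histogram buckets."""
    buckets = [0, low]
    current = low
    for index in range(2, n_buckets):
        # Spread the remaining log-range evenly over the remaining buckets.
        log_ratio = (math.log(high) - math.log(current)) / (n_buckets - index)
        candidate = int(math.floor(math.exp(math.log(current) + log_ratio) + 0.5))
        # Bounds must strictly increase, so step by at least 1.
        current = candidate if candidate > current else current + 1
        buckets.append(current)
    return buckets


# Assumed parameters: 1 ms minimum, 30000 ms (30 s) maximum, 50 buckets.
buckets = exponential_buckets(1, 30000, 50)
print(buckets)
print("bucket bounds at or below 50 ms:", sum(1 for b in buckets if b <= 50))
```

Because the bounds grow geometrically, the low end of the range is covered by narrow buckets, so using the same layout for both histograms keeps them directly comparable without flattening small remote settings latencies into one bucket.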

Depends on D130530

Attached file request.md

Data review for the FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS histogram

Attachment #9249807 - Flags: data-review?(cdowhygelund)

request.md

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, it will be available with other telemetry on DTMO, file Histograms.json and https://firefox-source-docs.mozilla.org/browser/urlbar/telemetry.html.

Is there a control mechanism that allows the user to turn the data collection on and off?

Clients may use the Firefox telemetry opt-out mechanism.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Drew Willcoxon and the Contextual Services team.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction data

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes

Does the data collection use a third-party collection tool?

No


Result: datareview+

Attachment #9249807 - Flags: data-review?(cdowhygelund) → data-review+
Pushed by dwillcoxon@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/151c788b1ca6 Add a latency histogram for the remote settings quick suggest source. r=nanj
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch

@Drew, in order to verify these new histograms, are the following scenarios valid?

FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS

  • Verify that this histogram increases every time a Sponsored/Non-Sponsored result is triggered from Remote Settings.
  • Verify that this histogram doesn't increase when a Sponsored/Non-Sponsored result is triggered from Merino.

FX_URLBAR_MERINO_LATENCY_MS
FX_URLBAR_MERINO_RESPONSE

  • Verify that both histograms increase every time a Sponsored/Non-Sponsored result is triggered from Merino.
  • Verify that these histograms don't increase when a Sponsored/Non-Sponsored result is triggered from Remote Settings.

Are there any new histograms added with this patch? Also, I am not sure exactly what the values mean and if we should pay more attention to them.

Flags: needinfo?(adw)

Comment on attachment 9249526 [details]
Bug 1737651 - Add a latency histogram for the remote settings quick suggest source.

Beta/Release Uplift Approval Request

  • User impact if declined: We need this for the Firefox Suggest preferences redesign targeting 95/94.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Please see bug comments
  • List of other uplifts needed: Please see uplift spreadsheet: https://docs.google.com/spreadsheets/d/1LavihS-VOPFYEyum7mrx6FKXmuQeHi9xQHfGNSxjnoY/edit?usp=sharing
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only adds some new telemetry for measuring Firefox Suggest suggestion latency. Has tests.
  • String changes made/needed:
Flags: needinfo?(adw)
Attachment #9249526 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9249526 [details]
Bug 1737651 - Add a latency histogram for the remote settings quick suggest source.

Approved for 95.0b5.

Attachment #9249526 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Cosmin Muntean [:cmuntean], Ecosystem QA from comment #13)

FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS

  • Verify that this histogram increases every time a Sponsored/Non-Sponsored result is triggered from Remote Settings.
  • Verify that this histogram doesn't increase when a Sponsored/Non-Sponsored result is triggered from Merino.

Your first point is correct but not the second. In other words, a new value for this histogram should be recorded every time a suggestion is triggered from remote settings. Even when the suggestion that is shown to the user comes from Merino, as long as browser.urlbar.quicksuggest.remoteSettings.enabled is true (which it is by default), Firefox internally also fetches a suggestion from remote settings.

So to verify this histogram, just make sure that pref is true and then trigger a suggestion. I would also recommend making sure browser.urlbar.merino.enabled is false (it is by default) so you're sure that the suggestion you see comes from remote settings and not from Merino.

One last thing: this histogram records latency values in milliseconds. It's not simply a counter that's increased every time. I expect most recorded values will be small, like less than 50.

FX_URLBAR_MERINO_LATENCY_MS
FX_URLBAR_MERINO_RESPONSE

  • Verify that both histograms increase every time a Sponsored/Non-Sponsored result is triggered from Merino.
  • Verify that these histograms don't increase when a Sponsored/Non-Sponsored result is triggered from Remote Settings.

To be clear, this bug only added the FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS histogram. It did not add these two other ones. FX_URLBAR_MERINO_RESPONSE is in bug 1737923. FX_URLBAR_MERINO_LATENCY_MS was added in bug 1727799, but since QA did not verify that bug, I'll leave it up to you if you want to verify FX_URLBAR_MERINO_LATENCY_MS here or in bug 1727799. Either way works for me.

FX_URLBAR_MERINO_LATENCY_MS is very similar to FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS; it just measures Merino's latency instead of remote settings' latency. So to verify it, set browser.urlbar.merino.enabled=true and then trigger a suggestion. I would also recommend setting browser.urlbar.quicksuggest.remoteSettings.enabled=false so you're sure that the suggestion you see comes from Merino and not remote settings.
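The relationship between the two prefs and the two latency histograms described above can be summarized with a small Python sketch. The function name and parameters are hypothetical; only the pref defaults and histogram names come from this bug.

```python
def histograms_recording(remote_settings_enabled=True, merino_enabled=False):
    """Return the latency histograms expected to get a new value per fetch.

    Defaults mirror the pref defaults mentioned in this bug:
    browser.urlbar.quicksuggest.remoteSettings.enabled = true
    browser.urlbar.merino.enabled = false
    """
    names = []
    if remote_settings_enabled:
        names.append("FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS")
    if merino_enabled:
        names.append("FX_URLBAR_MERINO_LATENCY_MS")
    return names


# Default prefs: only the remote settings histogram records.
print(histograms_recording())
# Both enabled: both record, regardless of which suggestion is shown.
print(histograms_recording(remote_settings_enabled=True, merino_enabled=True))
```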

Are there any new histograms added with this patch?

Only FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS

Also, I am not sure exactly what the values mean and if we should pay more attention to them.

You don't need to pay attention to the particular values, just that one new value is recorded each time a suggestion is fetched.

I expect the values for FX_URLBAR_MERINO_LATENCY_MS to be bigger than the values for FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS, so that's OK FWIW.
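The "one new value per fetch" check can be sketched as a comparison of sample counts before and after triggering a suggestion. This Python example assumes a snapshot shaped as a mapping from bucket to count under a `values` key; the snapshot contents here are invented for illustration.

```python
def sample_count(snapshot):
    """Total number of recorded values across all buckets of a snapshot."""
    return sum(snapshot["values"].values())


# Hypothetical snapshots taken before and after triggering one suggestion.
before = {"values": {"3": 2, "5": 1}}
after = {"values": {"3": 2, "5": 1, "8": 1}}

# Exactly one new value should appear; which bucket it lands in does not matter.
print(sample_count(after) - sample_count(before))
```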

Flags: in-testsuite+
Summary: Add latency probe for quick suggest results → Add latency probe for quick suggest remote settings results
QA Whiteboard: [qa-triaged]

We have verified this bug on the latest Nightly 96.0a1 build (Build ID: 20211109190508) and the latest Beta 95.0b5 (Build ID: 20211109194756) on Windows 10 x64, macOS 10.15.7 and Ubuntu 20.04 x64.

  • In order to verify this issue we have used the scenarios from comment 16. The "FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS" histogram is correctly registered.
Status: RESOLVED → VERIFIED

[Tracking Requested - why for this release]: We need this for the Firefox Suggest preferences redesign targeting 95/94.

Comment on attachment 9249526 [details]
Bug 1737651 - Add a latency histogram for the remote settings quick suggest source.

Beta/Release Uplift Approval Request

  • User impact if declined: We need this for the Firefox Suggest preferences redesign targeting 95/94.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Please see bug comments
  • List of other uplifts needed: Please see uplift spreadsheet: https://docs.google.com/spreadsheets/d/1LavihS-VOPFYEyum7mrx6FKXmuQeHi9xQHfGNSxjnoY/edit?usp=sharing
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only adds some new telemetry for measuring Firefox Suggest suggestion latency. Has tests.
  • String changes made/needed:
Attachment #9249526 - Flags: approval-mozilla-release?

We have verified this bug on Firefox 94.0.2 try build on Windows 10 x64, macOS 10.15.7 and Ubuntu 20.04 x64.

  • In order to verify this issue we have used the scenarios from comment 16. The "FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS" histogram is correctly registered.

Comment on attachment 9249526 [details]
Bug 1737651 - Add a latency histogram for the remote settings quick suggest source.

Approved for 94.0.2.

Attachment #9249526 - Flags: approval-mozilla-release? → approval-mozilla-release+

We have verified this bug on Firefox 94.0.2 candidate build (Build ID: 20211117154346) on Windows 10 x64, macOS 10.15.7 and Ubuntu 20.04 x64.

  • In order to verify this issue we have used the scenarios from comment 16. The "FX_URLBAR_QUICK_SUGGEST_REMOTE_SETTINGS_LATENCY_MS" histogram is correctly registered.
Flags: qe-verify+