Closed Bug 1655034 Opened 2 months ago Closed 2 months ago

The "Places Autocomplete 6 First Results ms" measurement seems to be broken

Categories

(Firefox :: Address Bar, defect, P1)

defect
Points:
2

Tracking

()

VERIFIED FIXED
81 Branch
Iteration:
81.1 - July 27 - Aug 09
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 + verified
firefox81 --- verified

People

(Reporter: itiel_yn8, Assigned: harry)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Thank you!
I suspect recent changes to the heuristic results may have made this measurement bogus. Maybe bug 1645521 or bug 1648468, it's around that date.

Severity: -- → S3
Keywords: regression
Priority: -- → P1
Regressed by: 1648468

[Tracking Requested - why for this release]:

Important Urlbar telemetry probe is broken in 80.

Assignee: nobody → htwyford
Status: NEW → ASSIGNED
Iteration: --- → 81.1 - July 27 - Aug 09
Points: --- → 2

Finding what was wrong here proved tricky. I think it's because there's nothing necessarily broken and that we really are seeing are seeing a significant number of queries taking less than 50ms to fetch six results. Here's a big write-up since I thought it was an interesting journey.

tl;dr We're no longer waiting for browser.urlbar.delay after fetching heuristic results.

First, I tried to determine if bug 1645521 or bug 1648468 was the culprit here. It wasn't clear which caused the problem since I would expect modest speed improvements from both. I did a(n admittedly un-scientific) test of ~100 queries on the baseline patch before bug 1645521 was applied and got these telemetry results:

PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS
71 samples, average = 98.5, sum = 6,990

  0 |  0  0%
 50 |######  4  6%
 56 |#########################  18  25%
 62 |##################  13  18%
 69 |#  1  1%
 86 |#  1  1%
119 |###################  14  20%
132 |#####################  15  21%
147 |######  4  6%
225 |#  1  1%
250 |  0  0%

The first cluster of results at 50, 56, and 62ms were queries for which six results could be fetched from Places. The second cluster at 119 and 132ms were queries for which at least some of the first six results needed to be fetched from the network.

Applying bug 1645521, I did a similar set of ~100 queries. Naturally they weren't exactly the same queries but I tried to do similar queries in the same order. My results:

PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS
115 samples, average = 79.8, sum = 9,175

  0 |###############  20  17%
 50 |####  5  4%
 56 |#########################  33  29%
 62 |#########  12  10%
 69 |#  1  1%
 86 |#  1  1%
119 |##############  18  16%
132 |###############  20  17%
147 |##  2  2%
164 |#  1  1%
182 |##  2  2%
202 |  0  0%

You can see the same two clusters as in the first query. There's also a shift towards 0ms queries. Since this probe has a low value of 50, the 0ms category includes all queries that took less than 50ms.

Finally, I applied bug 1648468 and did the same test:

PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS
101 samples, average = 82.7, sum = 8,352

  0 |#########################  32  32%
 50 |##  3  3%
 56 |################  20  20%
119 |############  15  15%
132 |#################  22  22%
147 |#####  6  6%
164 |#  1  1%
250 |#  1  1%
474 |#  1  1%
527 |  0  0%

There's a larger shift towards 0ms queries: they comprise 32% of all queries. The two local/network queries clusters exist.

I should note that I ran each set of 100 queries a few times on each patch. The samples above are representative of the full set of results. These incremental improvements are about in line with what we see in the telemetry (screenshot attached; "heuristic patch" is bug 1645521 and "autofill" is bug 1648468).

I added a number of print statements along the whole lifetime of the query and it appears there's nothing wrong: everything is happening in the right order, and this telemetry probe is firing at the right time, when we really do have >=6 results ready to display.

I started to think that this wasn't broken and we were seeing a significant improvement in query speed because:

  1. Network-based queries were still taking the same amount of time despite needing to go through the heuristic timer introduced in bug 1645521.
  2. There were incremental improvements in bug 1645521 and bug 1648468, suggesting it wasn't just one major source of breakage.
  3. I added a line to print the value of the timer. Common values for DB-only queries were 6-15ms. This is uniformly 50ms less than the baseline.

The -50ms uniformity had me looking around for 50ms timers in urlbar code and I remembered that we wait 50ms in UnifiedComplete after fetching heuristic results. The two patches each factor heuristic results out of UnifiedComplete, explaining why we increasingly skip that timer.

Do we wait on this timer to not hit the DB in cases where the user quickly types then selects something?

If we wanted to reintroduce a timer like this, we could take Drew's advice in bug 1652483 comment 3 and add a step to the query process to fetch heuristic results. We could wait for the delay timer after that step. If we do that, I'm inclined to close this bug and simply raise the priority on bug 1652483.

(In reply to Harry Twyford [:harry] from comment #3)

tl;dr We're no longer waiting for browser.urlbar.delay after fetching heuristic results.

That looks wrong, or at least unwanted.
The delay exists to avoid starving I/O or network on every input, we accept to skip delay for the heuristic to make it more responsive.

The -50ms uniformity had me looking around for 50ms timers in urlbar code and I remembered that we wait 50ms in UnifiedComplete after fetching heuristic results. The two patches each factor heuristic results out of UnifiedComplete, explaining why we increasingly skip that timer.

The heuristics always skipped the delay, even before.

So it looks like now we skip the delay in some case where we should not.

The problem is here:
https://searchfox.org/mozilla-central/rev/c6676771df58c6e0098574bc6b11517acbf264cf/toolkit/components/places/UnifiedComplete.jsm#807-808

The if (hasHeuristic) check now is wrong. This was supposed to always have a delay, unless there was no heuristic result (then it didn't make much sense to delay, it was pretty much only the empty search so the user was not typing). Now that other providers can return heuristic results, this if will be false more often.
The long term solution may be to move all the heuristic results to separate providers, and change UrlbarProviderUnifiedComplete to execute the delay itself.
The short term solution may be to change this condition from if (hasHeuristic) to if (this._trimmedOriginalSearchString), so basically skip the delay only for the empty search.

(In reply to Harry Twyford [:harry] from comment #3)

If we wanted to reintroduce a timer like this, we could take Drew's advice in bug 1652483 comment 3 and add a step to the query process to fetch heuristic results. We could wait for the delay timer after that step. If we do that, I'm inclined to close this bug and simply raise the priority on bug 1652483.

It looks a bit scary for an uplift, and because UnifiedComplete is an heuristic provider, I suspect it wouldn't solve the problem unless we completely port it into a modern provider, that again is not something we can uplift.

(In reply to Marco Bonardo [:mak] from comment #4)

So it looks like now we skip the delay in some case where we should not.

Yes, we skip the delay any time the heuristic result comes from a provider other than UnifiedComplete. So most notably Autofill and HeuristicFallback.

(In reply to Marco Bonardo [:mak] from comment #5)

The short term solution may be to change this condition from if (hasHeuristic) to if (this._trimmedOriginalSearchString), so basically skip the delay only for the empty search.

Sounds good! I prepared a patch and this fixes the issue.

Pushed by htwyford@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/188fed33437c
Ensure UnifiedComplete waits for browser.urlbar.delay in most cases. r=mak
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch

Comment on attachment 9166570 [details]
Bug 1655034 - Ensure UnifiedComplete waits for browser.urlbar.delay in most cases. r?mak

Beta/Release Uplift Approval Request

  • User impact if declined: Important Urlbar telemetry probe will break. There may also be issues with inputting text in the Urlbar, although no one's reported any issues related to that yet.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: 1. In a fresh profile, type and erase many things in the Urlbar. Make sure some of them are URLs already in history/bookmarks. Fresh profiles come with Mozilla bookmarks so try typing:
    "m"; "mo", "moz", etc.
  1. In about:telemetry, search for PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS. There should be no values in the "50" column.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): One-line change to restore old behaviour that was accidentally removed in a different patch.
  • String changes made/needed:
Attachment #9166570 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

I reproduced this issue using Fx 80.0a1(2020-07-24) on Windows 10 x64.
I verified it using the steps from comment 11 using Fx 81.0a1(2020-07-31), on Windows 10 x64, macOS 10.13 and Ubuntu 18.04.

Flags: qe-verify+

Comment on attachment 9166570 [details]
Bug 1655034 - Ensure UnifiedComplete waits for browser.urlbar.delay in most cases. r?mak

approved for 80.0b3

Attachment #9166570 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+

(In reply to Harry Twyford [:harry] from comment #11)

  1. In about:telemetry, search for PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS. There should be no values in the "50" column.

Harry, I would imagine that the above should read that there shouldn't be values in the "0" column for the comment 11 STR with the fix? Can you confirm it?

Flags: needinfo?(htwyford)

That's correct. There should either be no values in the "0" column or no "0" column appearing at all.

Flags: needinfo?(htwyford)

Reproduced the issue with the STR from comment 11 and comment 16 addition on affected Nightly 80.0a1 2020-07-24 and proceeded to verify the fix with 80.0b6 2020-08-07 on Windows 10 and Ubuntu 20.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.