For autofill, consider collapsing http/https prefixes per domain, or all prefixes per domain

RESOLVED FIXED in Firefox 62

Status

()

enhancement
P1
normal
RESOLVED FIXED
Last year
9 months ago

People

(Reporter: adw, Assigned: adw)

Tracking

Trunk
mozilla63
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox62+ fixed, firefox63+ fixed)

Details

(Whiteboard: [fxsearch])

Attachments

(2 attachments)

Bug 1464154 comment 45 (and others) describes an interesting case where Julien has two bugzilla origins, one for http and another for https.  Neither origin alone crosses the autofill frecency threshold, so neither is autofilled, which is annoying.  But added together they do.  We should consider fixing this case somehow so that bugzilla is autofilled.

We could make http and https a special case, so that they're effectively treated as a single prefix and https "wins."  It's unlikely that users have other groups of prefixes we'd want to similarly collapse.  Ideally though I think we should come up with a general solution that effectively searches primarily by domain and then secondarily by the full origin.
Blocks: 1464154
Whiteboard: [fxsearch]
Tracking this for 62 since it affects search autofill quality.  I would like to see whatever solution you decide on tested in 62 beta as early as possible in the cycle (ideally before July 12, the "EARLY_BETA_OR_EARLIER" milestone).  That will give us time for more testing before 62 release.
Working on a patch for this.  It's actually pretty small
Assignee: nobody → adw
Status: NEW → ASSIGNED
Comment on attachment 8989560 [details]
Bug 1469651 - For autofill, collapse all prefixes per host.

https://reviewboard.mozilla.org/r/254590/#review261622

::: toolkit/components/places/UnifiedComplete.js:319
(Diff revision 1)
> +            GROUP BY host
> +            HAVING host_frecency >= ${SQL_AUTOFILL_FRECENCY_THRESHOLD}
> +          ) AS grouped_hosts
> +          JOIN moz_origins ON moz_origins.host = grouped_hosts.host
>            ORDER BY frecency DESC, id DESC
>            LIMIT 1 `;

off-hand it looks like this query could be simplified

SELECT :query_type,
       fixup_url(host) || '/',
       prefix || host || '/',
       TOTAL(frecency),
       ${bookmarkedFragment} AS bookmarked,
	   id
FROM moz_origins
WHERE host BETWEEN :searchString AND :searchString || X'FFFF'
   OR host BETWEEN 'www.' || :searchString AND 'www.' || :searchString || X'FFFF'
GROUP BY host HAVING frecency > 0 AND frecency = MAX(frecency)
ORDER BY 4 desc
LIMIT 1
Attachment #8989560 - Flags: review?(mak77) → review+
I'm not sure about that...  Combining fixed-up and non-fixed-up hosts in one WHERE clause seems wrong, and experimenting with it a little bit, the multidotted test task in test_autofill_origins.js fails, which seems to corroborate that.

But, getting rid of the JOIN is interesting and IMO the more important part.  I ended up with this:

> ${SQL_AUTOFILL_WITH}
> SELECT :query_type,
>        host || '/',
>        prefix || host || '/',
>        frecency,
>        ${bookmarkedFragment} AS bookmarked,
>        id
> FROM moz_origins
> WHERE host BETWEEN :searchString AND :searchString || X'FFFF'
>       ${conditions}
> GROUP BY host
> HAVING TOTAL(frecency) >= ${SQL_AUTOFILL_FRECENCY_THRESHOLD}
>    AND frecency = MAX(frecency)
> UNION ALL
> SELECT :query_type,
>        fixup_url(host) || '/',
>        prefix || host || '/',
>        frecency,
>        ${bookmarkedFragment} AS bookmarked,
>        id
> FROM moz_origins
> WHERE host BETWEEN 'www.' || :searchString AND 'www.' || :searchString || X'FFFF'
>       ${conditions}
> GROUP BY host
> HAVING TOTAL(frecency) >= ${SQL_AUTOFILL_FRECENCY_THRESHOLD}
>    AND frecency = MAX(frecency)
> ORDER BY frecency DESC, id DESC
> LIMIT 1

test_autofill_origins.js passes with that, but I can't convince myself that it's right.  The sqlite doc says (https://sqlite.org/lang_select.html#resultset third bullet point):

"If a HAVING clause is a non-aggregate expression, it is evaluated with respect to an arbitrarily selected row from the group."

So doesn't frecency = MAX(frecency) just pick out a random row in the group to compare to the max frecency of the group?  If it happened to pick out a row with < max, the HAVING would fail, right?

The next paragraph goes on to say:

"Each expression in the result-set is then evaluated once for each group of rows. If the expression is an aggregate expression [...]  Otherwise, it is evaluated against a single arbitrarily chosen row from within the group."

Doesn't that mean that `id` will be the ID of a random row in the host group?  Not necessarily the one with the max frecency in the group?
Flags: needinfo?(mak77)
(In reply to Drew Willcoxon :adw from comment #6)
> Doesn't that mean that `id` will be the ID of a random row in the host
> group?  Not necessarily the one with the max frecency in the group?

yes, normally a group by will pick a "random" row. This is a bit of a hack, because internally it will sort rows to find the max, and then just use values for the max rows since it has it at hand.
But on a second thought I agree with you, it's a hack and maybe it won't be stable in the future. Likely it doesn't even work in other SQL engines.

Let's keep what you have for now, we can always come back to it later, there isn't much to win in readability if we can't coalesce the UNION, and the join is unavoidable if we want to be sure.
Flags: needinfo?(mak77)
Pushed by dwillcoxon@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cb9a1ff4a249
For autofill, collapse all prefixes per host. r=mak
https://hg.mozilla.org/mozilla-central/rev/cb9a1ff4a249
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Posted patch Beta/62 patchSplinter Review
Here's a Beta/62 patch, no changes from the m-c patch.  I'd like to uplift this but only after we uplift bug 1467627, in order to keep the landing of these bugs the same on 62 as on 63.  I'll request uplift once bug 1467627 is requested/uplifted.
(In reply to Drew Willcoxon :adw from comment #10)
> this but only after we uplift bug 1467627, in order to keep the landing of
> these bugs the same on 62 as on 63

*in order to keep the order of the landing of these bugs the same on 62 as on 63
Comment on attachment 8990425 [details] [diff] [review]
Beta/62 patch

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1239708 -- improving autofill in the urlbar

[User impact if declined]:
Bug 1239708 is already on beta.  If we decline this patch, autofill will actually be unintentionally worse in some cases for users on 62 compared to what they would have had if bug 1239708 hadn't been fixed.

[Is this code covered by automated tests?]:
Yes

[Has the fix been verified in Nightly?]:
No

[Needs manual test from QE? If yes, steps to reproduce]:
No

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
Low risk

[Why is the change risky/not risky?]:
This is a fairly small change in only one file, and the functionality is well covered by automated tests.

[String changes made/needed]:
None
Attachment #8990425 - Flags: approval-mozilla-beta?
This looks less risky (no migration), but from discussion with :adw, it should still land after we land the patch from bug 1467627 in beta. (Probably Monday for both patches).
Comment on attachment 8990425 [details] [diff] [review]
Beta/62 patch

OK for beta uplift. Please land the patch in bug 1467627 first.
Attachment #8990425 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Drew Willcoxon :adw from comment #12)
> [Is this code covered by automated tests?]:
> Yes

> [Needs manual test from QE? If yes, steps to reproduce]:
> No

Setting qe-verify flag to - based on the automation coverage and Drew's assessment on manual testing needs.
Flags: qe-verify-
Depends on: 1489060
You need to log in before you can comment on or make changes to this bug.