Closed Bug 1184960 Opened 6 years ago Closed 6 years ago

Limit what we try to get search suggestions for

Categories

(Firefox :: Address Bar, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 42
Iteration:
42.3 - Aug 10
Tracking Status
firefox42 --- fixed

People

(Reporter: mak, Assigned: mak)

References

(Depends on 1 open bug)

Details

(Whiteboard: [suggestions][fxsearch])

Attachments

(1 file, 1 obsolete file)

Currently we are just using the user typed string, but I think this has some performance and privacy disadvantages:

1. We should likely limit what we query about to a maximum number of characters, it's unlikely to get suggestions for very long phrases and we just pay a performance price for low benefit. This limit should be discussed.

2. We should probably not ask suggestions about things that look like urls, to avoid disclosing potentially unsafe informations about networks or passwords.

Sure, we'll get different results from the search bar, but this is the location bar and it expects different kind of input.
Flags: firefox-backlog+
Having 1 settable through an about:config pref might also be good for users willing to keep search suggestions enabled but not willing to send sensitive info, for example they could set it to 2 or 3 and get suggestions for jut a few letters.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
(In reply to Marco Bonardo [::mak] from comment #0)
> Currently we are just using the user typed string, but I think this has some
> performance and privacy disadvantages:
> 
> 1. We should likely limit what we query about to a maximum number of
> characters, it's unlikely to get suggestions for very long phrases and we
> just pay a performance price for low benefit. This limit should be discussed.

Seems pretty straightforward, as you say we'll have to consider the useful limit here. Perhaps we could be smart though. Once the search engine has only returned a single suggestion for a string just stop asking for more suggestions as the user types more.

> 2. We should probably not ask suggestions about things that look like urls,
> to avoid disclosing potentially unsafe informations about networks or
> passwords.

This warrants discussion around how smart we want to be here too. http[s]?:// starting strings obviously should skip suggestions but does that imply we shouldn't request suggestions until the user has entered at least eight characters? Do we skip for "www." starting strings? What about "foo.com". Probably best to start simple and break off other cases we might want to do into other bugs.
(In reply to Dave Townsend [:mossop] from comment #2)
> > 1. We should likely limit what we query about to a maximum number of
> > characters, it's unlikely to get suggestions for very long phrases and we
> > just pay a performance price for low benefit. This limit should be discussed.
> 
> Seems pretty straightforward, as you say we'll have to consider the useful
> limit here. Perhaps we could be smart though. Once the search engine has
> only returned a single suggestion for a string just stop asking for more
> suggestions as the user types more.

Yes we could do that for performance.
But here I also wanted to provide a privacy benefit. Some users might want to use the feature, but might be scared of being tracked. They could set the chars limit to 2 or 3 letters to still have benefit without disclosing info.
My current patch just adds a pref (default 20) that limits number of chars used.
We could do both things?

> > 2. We should probably not ask suggestions about things that look like urls,
> > to avoid disclosing potentially unsafe informations about networks or
> > passwords.
> 
> This warrants discussion around how smart we want to be here too.
> http[s]?:// starting strings obviously should skip suggestions but does that
> imply we shouldn't request suggestions until the user has entered at least
> eight characters? Do we skip for "www." starting strings? What about
> "foo.com". Probably best to start simple and break off other cases we might
> want to do into other bugs.

My current patch is very simple:
1. split the search string in tokens (based on whitespace)
2.don't ask for suggestions if any token contains any of "/", "@", ":", "." (it's hard to detect if it's a typoed url, since it might contain passwords or private ips I preferred to be picky)
3. don't ask for suggestions if any token passes new URL(token)
4. don't ask for suggestions if the first token is a fixupURI whitelisted domain

Do you think this might be a good starting point?
Flags: needinfo?(dtownsend)
Sounds like a good start. Once we have something in it would probably be worth blogging and posting to the newsgroups, the community probably have good ideas for how to improve this.
Flags: needinfo?(dtownsend)
Attached patch patch v1 (obsolete) — Splinter Review
balancing reviews load.

This applies on top of my patches that are pending review.
Attachment #8637931 - Flags: review?(dtownsend)
Comment on attachment 8637931 [details] [diff] [review]
patch v1

Review of attachment 8637931 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good. A few changes here though.

::: toolkit/components/places/UnifiedComplete.js
@@ +927,5 @@
>  
>    *_matchSearchSuggestions() {
> +    // Limit the string sent for search suggestions to a maximum length.
> +    let searchString = this._searchTokens.join(" ")
> +                           .substr(0, Prefs.maxCharsForSearchSuggestions);

You combine the tokens here then split them again in _prohibitSearchSuggestionsFor which seems wasteful. I see you're doing it to limit the string length but if it is already over maxCharsForSearchSuggestions won't we already bail early because this._prohibitSearchSuggestions is set?

@@ +973,5 @@
> +      return true;
> +
> +    // Getting suggestions for a single letter is not going to bring very
> +    // useful results.
> +    if (searchString.length < 2)

I think we should put this number in pref a preference, privacy or bandwidth sensitive folks might want to set this to 6 so they at least type "https:" before this passes and we bail below.

@@ +987,5 @@
> +      // Can it be parsed as a valid URL?
> +      try {
> +        new URL(token);
> +        return true;
> +      } catch(ex) {}

From what I can see token is never a valid URL unless it contains a ":" so I don't think this test gives us anything.

@@ +993,5 @@
> +      // The first token may be a whitelisted host.
> +      if (i == 0 &&
> +          REGEXP_SINGLEWORD_HOST.test(token) &&
> +          Services.uriFixup.isDomainWhitelisted(token, -1))
> +        return true;

Feels weird to have this inside the loop. I think I'd prefer to do this test after the split and before the some check starts.
Attachment #8637931 - Flags: review?(dtownsend)
(In reply to Dave Townsend [:mossop] from comment #6)
> ::: toolkit/components/places/UnifiedComplete.js
> @@ +927,5 @@
> >  
> >    *_matchSearchSuggestions() {
> > +    // Limit the string sent for search suggestions to a maximum length.
> > +    let searchString = this._searchTokens.join(" ")
> > +                           .substr(0, Prefs.maxCharsForSearchSuggestions);
> 
> You combine the tokens here then split them again in
> _prohibitSearchSuggestionsFor which seems wasteful. I see you're doing it to
> limit the string length but if it is already over
> maxCharsForSearchSuggestions won't we already bail early because
> this._prohibitSearchSuggestions is set?

If the string is over maxCharsForSearchSuggestions we don't bail out, we just ask suggestions for up to maxCharsForSearchSuggestions chars of the string.
This is sort of imperfect but the alternative would not support setting the max chars pref to a low value to avoid being eventually tracked (like, if you set it to 3 chars, as soon as you type a 4th char you lose all suggestion results).

_prohibitSearchSuggestions is set if the last search found only 0 or 1 result and the new search is longer.

So, I'm not sure how to avoid the join/split, I could pass the tokens to _prohibitSearchSuggestionsFor, but then it might do more work than needed on a long string (like a data uri). Any idea?

> > +    // Getting suggestions for a single letter is not going to bring very
> > +    // useful results.
> > +    if (searchString.length < 2)
> 
> I think we should put this number in pref a preference, privacy or bandwidth
> sensitive folks might want to set this to 6 so they at least type "https:"
> before this passes and we bail below.

I'm not sure how that would be useful, that means you have to type long text before suggestions kick in suddenly. It sorts of defeats their value. At that point it would make more sense for those users to disable suggestions in Options, imo. Plus we are already bailing out on urls.
(In reply to Marco Bonardo [::mak] from comment #7)
> (In reply to Dave Townsend [:mossop] from comment #6)
> > ::: toolkit/components/places/UnifiedComplete.js
> > @@ +927,5 @@
> > >  
> > >    *_matchSearchSuggestions() {
> > > +    // Limit the string sent for search suggestions to a maximum length.
> > > +    let searchString = this._searchTokens.join(" ")
> > > +                           .substr(0, Prefs.maxCharsForSearchSuggestions);
> > 
> > You combine the tokens here then split them again in
> > _prohibitSearchSuggestionsFor which seems wasteful. I see you're doing it to
> > limit the string length but if it is already over
> > maxCharsForSearchSuggestions won't we already bail early because
> > this._prohibitSearchSuggestions is set?
> 
> If the string is over maxCharsForSearchSuggestions we don't bail out, we
> just ask suggestions for up to maxCharsForSearchSuggestions chars of the
> string.
> This is sort of imperfect but the alternative would not support setting the
> max chars pref to a low value to avoid being eventually tracked (like, if
> you set it to 3 chars, as soon as you type a 4th char you lose all
> suggestion results).

Oh right, I was conflating the limit and the cut-off when the last search returns too little results.

> _prohibitSearchSuggestions is set if the last search found only 0 or 1
> result and the new search is longer.
> 
> So, I'm not sure how to avoid the join/split, I could pass the tokens to
> _prohibitSearchSuggestionsFor, but then it might do more work than needed on
> a long string (like a data uri). Any idea?
> 
> > > +    // Getting suggestions for a single letter is not going to bring very
> > > +    // useful results.
> > > +    if (searchString.length < 2)
> > 
> > I think we should put this number in pref a preference, privacy or bandwidth
> > sensitive folks might want to set this to 6 so they at least type "https:"
> > before this passes and we bail below.
> 
> I'm not sure how that would be useful, that means you have to type long text
> before suggestions kick in suddenly. It sorts of defeats their value. At
> that point it would make more sense for those users to disable suggestions
> in Options, imo. Plus we are already bailing out on urls.

Good point, I guess the only leaked data in that case is that we'd request suggestions for "https" before cutting out. Congratulations search engine, you know your user visits websites!
Attached patch patch v2Splinter Review
thanks to the new patch in bug 1172937 I could decouple this from those changes, so my patches queue is far easier to manage and this can land sooner or later.
Attachment #8637931 - Attachment is obsolete: true
Attachment #8639228 - Flags: review?(dtownsend)
note to self: fix the reviewer in the checkin comment
Comment on attachment 8639228 [details] [diff] [review]
patch v2

Review of attachment 8639228 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good
Attachment #8639228 - Flags: review?(dtownsend) → review+
Flags: in-testsuite?
https://hg.mozilla.org/mozilla-central/rev/4be4031b55e3
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 42
Flags: in-testsuite? → in-testsuite+
Iteration: --- → 42.3 - Aug 10
I've found an issue that may have regressed from this patch:
Start typing in the url bar several search terms (3 should be enough) that provide suggestions - then type gibberish (eg mozilla organizational structure kmkmkmkmkmkm). The result is that the suggestions list is not updated (removed), all the suggestions previously displayed for the search term are still shown.  

Was this the intention of this patch? Thank you!
Flags: needinfo?(mak77)
(In reply to Petruta Rasa [QA] [:petruta] from comment #14)
> I've found an issue that may have regressed from this patch:
> Start typing in the url bar several search terms (3 should be enough) that
> provide suggestions - then type gibberish (eg mozilla organizational
> structure kmkmkmkmkmkm). The result is that the suggestions list is not
> updated (removed), all the suggestions previously displayed for the search
> term are still shown.  

could you please provide better steps to reproduce? I'm unable to reproduce but I think I did it wrong... when you write the several terms, how do you proceed? mouse, keyboard? do you remove the previous term or select it or what?

This is the intended behavior here:
1. If the final word you type is longer than 20 chars, we will only keep searching for the first 20 chars.
2. If a search returns no suggestions and you keep typing more, we won't even try to search
3. If you type something that looks like a URL, we should not search it
4. if you type a whitelist host (like localhost) we should not search it

anything outside of this is likely worth filing a bug with detailed steps to reproduce.
Flags: needinfo?(mak77) → needinfo?(petruta.rasa)
I believe it's no. 1 above that I'm seeing:
I type using the keyboard several words (total exceeds 20 characters) and notice the suggestions.
The suggestions list is no longer updated after 20 characters. So if I type "mozilla organizational structure" or "mozilla organizational chart" the suggestions that I see are the same for both. 

The search bar panel is updated for this type of search terms.

I'm adding a screencast: https://drive.google.com/file/d/0B7jBNZBM3mz-VnlxNWxrRm4zSjA/view?usp=sharing

I don't expect to see suggestions that are not matching the search term, but the awesome bar acts differently that's why I'm not sure about this.
Flags: needinfo?(petruta.rasa)
(In reply to Petruta Rasa [QA] [:petruta] from comment #16)
> I believe it's no. 1 above that I'm seeing:

yes, it's the "expected" behavior.
The problem is that if we'd just stop searching after 20 chars, all the results would suddenly disappear, but they still have value against the first part of the search string. We could probably tweak the char limit a little bit, generally it should be rare that a user has to write that many chars before finding a good result. Or we could still throw away the results M chars after the limit. These are all possible enhancements. But it's not a bug, since working as intended.
Depends on: 1551048
Depends on: 1551049
You need to log in before you can comment on or make changes to this bug.