Closed Bug 1080682 Opened 10 years ago Closed 4 years ago

Use PSL to do a search for foo.barrr URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches

Product/Component: Firefox :: Address Bar
Type: defect
Priority: P2
Points: 5
Status: VERIFIED FIXED (Firefox 77)
Iteration: 77.2 - Apr 20 - May 3
Tracking Status: firefox77 verified
Reporter: tim
Assigned: mak
Whiteboard: [fxsearch]
Attachments: 1 file

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140923175406

Steps to reproduce:

In the address bar search for anything that contains a period but no spaces. This is fairly common when searching for API or programming terms:

console.log
response.write
async.each
etc...


Actual results:

Firefox responds with a "Server not found" error.


Expected results:

Since the search contains no valid TLD, the expected result is to handle the query as a search.
Blocks: 693808
Status: UNCONFIRMED → NEW
Component: Untriaged → Location Bar
Ever confirmed: true
OS: Mac OS X → All
Hardware: x86 → All
Version: 32 Branch → Trunk
Gerv, do you know if we have some kind of builtin list of valid TLDs?
Flags: needinfo?(gerv)
(In reply to :Gijs Kruitbosch from comment #1)
> Gerv, do you know if we have some kind of builtin list of valid TLDs?

No, this is the job of the DNS, and an invalid TLD in one network may be valid elsewhere.
(In reply to Dão Gottwald [:dao] from comment #2)
> (In reply to :Gijs Kruitbosch from comment #1)
> > Gerv, do you know if we have some kind of builtin list of valid TLDs?
> 
> No, this is the job of the DNS, and an invalid TLD in one network may be
> valid elsewhere.

Sure, the question is whether you want identical behaviour for someone typing:

www.somenonexistingdomain.com

and

array.each

both of which currently would return the same server not found error (in some cases after considerable time).

I would argue that the behaviour should not be the same, in that the former should try to resolve first and not bother doing a search (but maybe provide UI on the error page to do a search) whereas the latter should search first.

DNS doesn't allow you to distinguish these; a list of IANA-registered TLDs would.
Chrome uses the Public Suffix List (http://www.publicsuffix.org/) to distinguish search terms from domain names. However, the problem with this is that it requires your copy of the PSL to be very up to date; if it's not, some domains are unnavigable because the browser insists on doing a search anyway.

I'd say we should do a DNS request, but then do a search after a short timeout (shorter than normal), with an infobar which says "Were you actually trying to reach the website foo.bar?".

Gerv
Flags: needinfo?(gerv)
(In reply to Gervase Markham [:gerv] from comment #4)
> Chrome uses the Public Suffix List (http://www.publicsuffix.org/) to
> distinguish search terms from domain names. However, the problem with this
> is that it requires your copy of the PSL to be very up to date; if it's not,
> some domains are unnavigable because the browser insists on doing a search
> anyway.
> 
> I'd say we should do a DNS request, but then do a search after a short
> timeout (shorter than normal), with an infobar which says "Were you actually
> trying to reach the website foo.bar?".

What would be the issue with us trying to keep Firefox as up-to-date with the PSL as Chrome does?
AIUI, we have more of a trailing version problem than they do. Perhaps going forward, that won't be so. But I'd want to be confident first.

This use is not what the PSL was designed for; using it that way has a number of disadvantages. If we are seriously considering it, I'll write them up so we can make a wise decision.

Gerv
(In reply to Gervase Markham [:gerv] from comment #6)
> AIUI, we have more of a trailing version problem than they do. Perhaps going
> forward, that won't be so. But I'd want to be confident first.

Trailing Firefox version or trailing PSL version? Because even if Firefox gets out of date, it should still be able to download updated PSL lists, right? (Barring any schema changes to the PSL)
 
> This use is not what the PSL was designed for; using it that way has a
> number of disadvantages. If we are seriously considering it, I'll write them
> up so we can make a wise decision.

I think we should seriously consider using the PSL for this. It gets us the best possible solution for the time being (things like example.mov will still fail to do a search, but such is the bane of gTLDs).
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #7)
> Trailing Firefox version or trailing PSL version?

I meant trailing Firefox version.

> Because even if Firefox
> gets out of date, it should still be able to download updated PSL lists,
> right? (Barring any schema changes to the PSL)

If we built an automated-update system for the PSL, yes. Which we currently do not have.

> I think we should seriously consider using the PSL for this. It gets us the
> best possible solution for the time being (things like example.mov will
> still fail to do a search, but such is the bane of gTLDs).

I think the best possible solution is to identify ambiguous text (any "foo.bar"), and do both search and DNS at the same time. We can then prioritise one or the other (might be a different decision for different "TLDs") but offer the other in an infobar. No need to use the PSL for that.

Gerv
(In reply to Gervase Markham [:gerv] from comment #8)
> I think the best possible solution is to identify ambiguous text (any
> "foo.bar"), and do both search and DNS at the same time. We can then
> prioritise one or the other (might be a different decision for different
> "TLDs") but offer the other in an infobar. No need to use the PSL for that.

If Firefox starts sending all "foo.com" entries in the location bar to their search provider, IMO that would probably be considered a serious privacy issue. It'd also not be a great solution for mobile.

Both of these are reasons why it'd be useful to have an offline list (PSL or otherwise).

I've not looked at the format for the PSL, but naively I would have thought that building a decent update system shouldn't be that hard...
True. Doh.

I've been arguing for an update system for the PSL for some time; it would be great if someone could build one. Note, though, that it gets "compiled" for Firefox from the original text format.

Gerv
So we can probably break this into two bugs. One for getting the URI fixup code to see if a public suffix is in use, and another bug to implement updating the local copy of the PSL list.
Depends on: 1083971
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #11)
> So we can probably break this into two bugs. One for getting the URI fixup
> code to see if a public suffix is in use, and another bug to implement
> updating the local copy of the PSL list.

Split off the updating to bug 1083971, morphing this to implement the actual search behaviour and twiddling some of the requisite flags.
Points: --- → 8
Flags: qe-verify+
Flags: in-testsuite?
Flags: firefox-backlog+
Summary: Search terms that contain a period, but no spaces results in Server Not Found in Address Bar → Use PSL to do a search for foo.bar URL bar entries which aren't known domains, with the same infobar as for single-word searches
Gavin, can we add this into the scratchpad for the upcoming iteration?
Flags: needinfo?(gavin.sharp)
Given the dependency on bug 1083971, I'm not sure that makes sense. If we're going to tackle this and bug 1083971 at the same time we'll need more space overall, and I don't think we have that much space in the next few iterations.
Flags: needinfo?(gavin.sharp)
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #14)
> Given the dependency on bug 1083971, I'm not sure that makes sense. If we're
> going to tackle this and bug 1083971 at the same time we'll need more space
> overall, and I don't think we have that much space in the next few
> iterations.

Do you think fixing bug 1083971 is a hard requirement here? I am not convinced that is the case. We already use the list to make e.g. cookie setting decisions and coloring in the location bar (highlighting the domain), which IMO is "worse" than the failure case here (cookies don't work and/or security issues vs. you get a search with infobar instead of directly getting the website). Chrome does a 6-week cycle for these (cf.  https://code.google.com/p/chromium/issues/detail?id=323402 ) so as long as we always uplift this stuff to beta, we should be fine here.

I think I'd prefer to fix this first and then take work on the auto-update stuff (which probably needs a breakdown as it'll need server infra work).
Flags: needinfo?(gavin.sharp)
After comment 11, I was going to comment that the problem with breaking this issue into two bugs is that someone will suggest we fix one and not the other, leading to all the problems I've outlined above. (Because I suspect that once this is done, the impetus to write the auto-update mechanism will dissipate.) But I didn't even have time to predict that before Gijs already suggested it in comment 15 :-)

The comparison with cookies is not apt, because the fallback behaviour is different. For cookies, if the suffix is not in the PSL, it falls back to treating it like .com - i.e. flat registrations below the TLD. For most TLDs, and most new gTLDs, that's fine. So being a bit out of date is not a disaster. Having said that, we do experience some problems with out-of-dateness and an auto-update mechanism would be a great thing even if we didn't use it for this use.
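
To make the cookie fallback concrete, here is a minimal Python sketch of a suffix lookup that behaves as if the rule "*" were present when nothing in the list matches, so an unknown TLD is treated as flat, like .com. The rule set and function names are made up for illustration, wildcard and exception rules are ignored, and this is not how Firefox implements it.

```python
from typing import Optional

# Toy stand-in for the PSL (hypothetical subset). The real list is far
# larger and also has wildcard and exception rules, ignored here.
RULES = {"com", "org", "uk", "co.uk"}


def public_suffix(host: str) -> str:
    """Return the longest listed suffix, or the last label if none match."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in RULES:
            return candidate
    # Fallback: act as if "*" were listed, i.e. flat registrations
    # directly below the unknown TLD.
    return labels[-1]


def registrable_domain(host: str) -> Optional[str]:
    """Public suffix plus one label, or None if host is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(host).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])
```

With an out-of-date list, a name under a new gTLD still gets a sensible cookie scope from the fallback, which is why staleness hurts less there than it would for a search-or-navigate decision.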

However, in this case, the fallback behaviour would make the site much harder to reach (because you'd get search results instead), which is a much more serious failure. We would get a lot of irritation from the owners of domains under new gTLDs who expect their names to work. Chrome already gets a reasonable amount of this flak.

We've also not considered the issue of internal names. "array.each" could very well be a server on the intranet of Each Corporation. Are they expected to click through the infobar every time they want to visit an internal site?

The DNS should be the arbiter of what is and isn't a valid name. Different networks have different answers; a static list will get this wrong. Doing search results after a couple of seconds, or as a clickthrough, would be a great usability enhancement. Doing them by default would be a usability disenhancement.

Gerv
(In reply to Gervase Markham [:gerv] from comment #16)

Snipping the top part. I see your point. I'll suggest we pick up the other bug first, then. Clearing gavin's needinfo here.

> We've also not considered the issue of internal names. "array.each" could
> very well be a server on the intranet of Each Corporation. Are they expected
> to click through the infobar every time they want to visit an internal site?

We should probably have a pref for additional domain names and default it to include .local. Corporations use distributed installs and can use that to add their own suffixes if required.

> The DNS should be the arbiter of what is and isn't a valid name. Different
> networks have different answers; a static list will get this wrong. Doing
> search results after a couple of seconds, or as a clickthrough, would be a
> great usability enhancement. Doing them by default would be a usability
> disenhancement.

Hm. So I only just realized we don't yet do what we used to do for single-word lookups (no dot), which is to redirect to search if DNS fails. That can be another separate bug, I guess.

I (wrongly) assumed that this is essentially the same case as what we fixed for single words, which is to always do a search unless we've whitelisted the host (in this case, the TLD).

The problem we fixed for single-word lookups is that they can be super slow on some networks (order of many seconds instead of the milliseconds as we (people emailed for this bug right now) are probably used to). Mobile goes in this bucket as well.

The reason certain other browsers (...) /feel/ faster is in part stuff like this. If you wait even 200ms before hitting the search engine instead, it'll feel slower to the user.

Your suggestion that we should wait for "a couple of seconds" doesn't fix the DNS issue unless we wait for DNS, which as said can be very slow (more than 2s :-) ).

I would suggest that the main issue here wouldn't be enterprise users (who are a minority anyway and could get their enterprise to set this up in their configurations) or users who locally use particular domains (who can use a similar whitelist as we have for non-dot-things-that-resolve-in-DNS, and who I don't think will have so many intranet/local domains that this will require more than 1-2 failure cases).

Instead, I think the flip side here is privacy (see also the discussion in bug 1083634). I'm not sure if that is serious enough to make this a clickthrough instead of automatic.
Flags: needinfo?(gavin.sharp)
(In reply to :Gijs Kruitbosch from comment #17)
> We should probably have a pref for additional domain names and default it to
> include .local. Corporations use distributed installs and can use that to
> add their own suffixes if required.

I'm sure there are plenty of small companies which don't use that kind of system for rolling out software.

My point is: if you think you can decide what is a resolvable name without consulting the DNS, you are going to have a bad time. Chrome does this, and it's caused problems with new gTLDs, and other problems too. As maintainer of the PSL people complain to me, and I have to shrug and point them at specific bits of software.

An auto-update system helps with the gTLD problem, but it doesn't help with the local names problem.

> Hm. So I only just realized we don't yet do what we used to do for
> single-word lookups (no dot), which is to redirect to search if DNS fails.
> That can be another separate bug, I guess.

Why not just do that, and use the same system for foo.bar?

> I (wrongly) assumed that this is essentially the same case as what we fixed
> for single words, which is to always do a search unless we've whitelisted
> the host (in this case, the TLD).

When did that change go in?

> The problem we fixed for single-word lookups is that they can be super slow
> on some networks (order of many seconds instead of the milliseconds as we
> (people emailed for this bug right now) are probably used to). Mobile goes
> in this bucket as well.

Well, fine; adopt a quicker timeout then pop up an infobar if the DNS finally comes back with a non-NXDOMAIN answer.
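
A rough Python sketch of that idea, assuming a hypothetical async `resolve(host)` that returns True for any non-NXDOMAIN answer; the function names, the 200ms default and the infobar wording are illustrative only, not existing Firefox code:

```python
import asyncio


async def navigate_or_search(host, resolve, timeout=0.2):
    """Wait briefly for DNS; fall back to search if it is too slow.

    Returns (action, pending_lookup). pending_lookup is the still-running
    DNS task when the timeout fired, otherwise None.
    """
    lookup = asyncio.ensure_future(resolve(host))
    done, _pending = await asyncio.wait({lookup}, timeout=timeout)
    if done:
        return ("navigate" if lookup.result() else "search", None)
    # DNS has not answered yet: show search results now and keep the
    # lookup running in the background.
    return ("search", lookup)


async def late_infobar(host, pending_lookup):
    """Once the slow lookup finishes, offer the site if the name exists."""
    if await pending_lookup:
        return f"Did you mean to go to {host}?"
    return None
```

The caller would show the search page immediately on timeout and only raise the infobar if `late_infobar` returns a message.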

> The reason certain other browsers (...) /feel/ faster is in part stuff like
> this. If you wait even 200ms before hitting the search engine instead, it'll
> feel slower to the user.

How do other browsers deal with the issues I've been outlining? Does Chrome make people whitelist all their internal network names?

Chrome has more of a problem because they have an omnibox, whereas we don't.

Gerv
(In reply to Gervase Markham [:gerv] from comment #18)
> (In reply to :Gijs Kruitbosch from comment #17)
> > We should probably have a pref for additional domain names and default it to
> > include .local. Corporations use distributed installs and can use that to
> > add their own suffixes if required.
> 
> I'm sure there are plenty of small companies which don't use that kind of
> system for rolling out software.
> 
> My point is: if you think you can decide what is a resolvable name without
> consulting the DNS, you are going to have a bad time. Chrome does this, and
> it's caused problems with new gTLDs, and other problems too. As maintainer
> of the PSL people complain to me, and I have to shrug and point them at
> specific bits of software.
> 
> An auto-update system helps with the gTLD problem, but it doesn't help with
> the local names problem.

What's "the local names problem"?
 
> > Hm. So I only just realized we don't yet do what we used to do for
> > single-word lookups (no dot), which is to redirect to search if DNS fails.
> > That can be another separate bug, I guess.
> 
> Why not just do that, and use the same system for foo.bar?

Firstly, because as I said later on, that's not enough in terms of providing a good UX.

Secondly, because I'm far more likely to mistype the domain than the gTLD (mostly by function of them being longer), and the same privacy-conscious people who complained about the change for single-word input would be even more upset if typing e.g. "www.mozilllla.org" took them to the search engine. [0]

[0] I was too lazy to look for a working example with a Levenshtein distance of 1 for more than 5 minutes; most things I tried were domain-squatted

> > I (wrongly) assumed that this is essentially the same case as what we fixed
> > for single words, which is to always do a search unless we've whitelisted
> > the host (in this case, the TLD).
> 
> When did that change go in?

bug 693808, so Firefox 33

> > The problem we fixed for single-word lookups is that they can be super slow
> > on some networks (order of many seconds instead of the milliseconds as we
> > (people emailed for this bug right now) are probably used to). Mobile goes
> > in this bucket as well.
> 
> Well, fine; adopt a quicker timeout then pop up an infobar if the DNS
> finally comes back with a non-NXDOMAIN answer.

That's essentially what we do now, except our timeout is 0. Adding a non-zero timeout here would take significant engineering effort. I would say that the ROI of making the timeout 200ms is essentially 0 (because it'll still be noticeably slow(er) than an immediate search even when DNS is slower than 200ms), and the ROI of making it 1s or higher is just negative.

> > The reason certain other browsers (...) /feel/ faster is in part stuff like
> > this. If you wait even 200ms before hitting the search engine instead, it'll
> > feel slower to the user.
> 
> How do other browsers deal with the issues I've been outlining? Does Chrome
> make people whitelist all their internal network names?

Yes. But "whitelist" sounds onerous. Both Chrome and us pop up an infobar in this case (we still do the DNS lookup!), and if you click the "yes, take me to myfancylocalserver" button, that automatically whitelists that host.

> Chrome has more of a problem because they have an omnibox, whereas we don't.

Maybe, but so do IE and Safari. We're the only popular browser I know that still has a discrete search box, and people come with the habit and expectation that searching from the location bar will also work - and it mostly does, just badly in some cases. We should fix that.
(In reply to :Gijs Kruitbosch from comment #19)
> What's "the local names problem"?

The fact that only my local DNS server knows whether "for.each" is a resolvable name on my local network. No centralized list can tell the browser this.

> > > Hm. So I only just realized we don't yet do what we used to do for
> > > single-word lookups (no dot), which is to redirect to search if DNS fails.
> > > That can be another separate bug, I guess.
> > 
> > Why not just do that, and use the same system for foo.bar?
> 
> Firstly, because as I said later on, that's not enough in terms of providing
> a good UX.

Well, it's worked OK for years :-)

> > When did that change go in?
> 
> bug 693808, so Firefox 33

Having read that entire bug, I don't see the part where the pros and cons of making the change were weighed up, and a clear-eyed decision was made that the breakage of doing this was worth the improvement. Where did that happen?

If I can't talk you out of this, I think you should do the following:

* Have a whitelist of TLDs as well as one of hosts
* Add .local, .localhost to it by default, along with .corp, .home and .mail
  <http://blog.icann.org/2014/02/mitigating-dns-namespace-collisions/>
* If someone types foo.bar and says "I want to visit this site", whitelist the entirety of ".bar", not 
  just "foo.bar".
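
The list above could be sketched roughly like this in Python. The default TLDs come from the list; the class, its method names and the `known_suffixes` stand-in for the PSL are hypothetical:

```python
# Defaults taken from the proposal above.
DEFAULT_TLD_WHITELIST = {"local", "localhost", "corp", "home", "mail"}


def _tld(host: str) -> str:
    """Last label of a host name, lower-cased, ignoring a trailing dot."""
    return host.lower().rstrip(".").rsplit(".", 1)[-1]


class FixupWhitelist:
    def __init__(self, known_suffixes):
        self.known_suffixes = set(known_suffixes)  # toy stand-in for the PSL
        self.tlds = set(DEFAULT_TLD_WHITELIST)     # whitelist of TLDs
        self.hosts = set()                         # whitelist of hosts

    def should_navigate(self, text: str) -> bool:
        """True if the input should be treated as a host, not a search."""
        host = text.lower().rstrip(".")
        tld = _tld(host)
        return (host in self.hosts
                or tld in self.tlds
                or tld in self.known_suffixes)

    def confirm_visit(self, text: str) -> None:
        """User said 'I want to visit this site': whitelist the whole TLD."""
        self.tlds.add(_tld(text))
```

After one confirmation for "foo.bar", any other name under ".bar" navigates directly instead of searching.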

Gerv
An idea from bug 1198832:
> It was said in bug 1196906 that Shift key is used to bypass urlbar actions, so it may be used
> here as well, to allow searching for domains even if they are known.
Whiteboard: [fxsearch][triage]
Tracking PSL updates being so bad an idea (cf. comment 16) must be the reason why this issue hasn't seen an update for more than a year.

With "Omnibar" behavior since Firefox 46 (or so), what is the reason, again, for not doing simply:

1. a DNS query;
2. if the domain is unresolvable, if the user has "Visited this site prior to today", show the neterror;
3. else, do a search query with default search provider.

The additional, history-scrutinizing condition in step 2 is to alleviate any privacy concerns. If the user visited that site before, it may really be down now. But if the site is new, it may just as well be a typo. In any case, it being a typo or the user being a paranoid loony seems a much less frequent use case compared to the whole of the IT sector looking up their reference!
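
In code, the three steps amount to something like the following Python sketch. `resolves` and `visited_before` are hypothetical callables standing in for the DNS lookup and the history check:

```python
def decide(host, resolves, visited_before):
    """Return 'navigate', 'neterror' or 'search' for a domain-like entry."""
    # Step 1: DNS query.
    if resolves(host):
        return "navigate"
    # Step 2: unresolvable, but the user has been there before, so the
    # site may really be down; show the network error page.
    if visited_before(host):
        return "neterror"
    # Step 3: unresolvable and never visited; treat it as a search query
    # for the default search provider.
    return "search"
```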

Privacy-conscious individuals should and do use a custom, "safe" search engine, such as a local instance of Searx, or similar.
(In reply to liquider from comment #26)
> what is the reason, again,
> for not doing simply:
> 
> 1. a DNS query;

The fact it can take seconds to get a reply, and in the meanwhile your browser just looks asleep.
Here there are 2 major points:
1. performance: we don't want the browser to look slow
2. privacy: we don't want to send domain-looking things to a search engine by default

The only viable solution proposed so far looks like the whitelist.
I think we might evaluate improving the notfound page contents to finally provide some useful options; for example, in this case it could say something like "Did you mean to search for 'for.each' instead?", and that could populate a whitelist.
The comment 20 whitelist solution sounds good enough to me.
> 1. performance: we don't want the browser to look slow

If the query looks like a domain name (has dots, no space), the current behavior (since always) is to make a DNS request, so I don't see how the new behavior will negatively affect the perceived performance. Only those users in environments with slow nameservers, entering domain-like, non-resolvable queries AND with an actual search intent will be affected. A minority, <1% use case.

> 2. privacy: we don't want to send domain-looking things to a search engine by default

They are not sent by default. Only if they don't resolve. Don't privacy-conscious people use trustworthy search engines anyway?

> "Did you mean to search for 'for.each' instead?"

Searching from the notfound page is exactly the opposite to comment 20. The most important drawback being the need to confirm searching after every unique query and then keeping multiple opposing lists. I think comment 20 proposes adding the hostname (and its up-level domain(s)) onto the whitelist only if the user chooses the "Visit http://a.b/" in the dropdown list.

Can we agree on "Visit http://a.b/" being the second entry in the dropdown list (and thus a non-default choice, i.e. positioned after "a.b - Search with _") until its DNS result proves positive??
(In reply to liquider from comment #28)
> If the query looks like a domain name (has dots, no space), the current
> behavior (since always) is to make a DNS request, so I don't see how the new
> behavior will negatively affect the perceived performance.

Currently you see it's slow, and then you get a notfound page and you connect the slowness to the fact it's invalid.
With the change you suggest, the only thing you'd see is that the browser took a lot of seconds just to load a search page. It is very different from a perception point of view.
Then you could argue typing something with a dot is a less common case than typing a single word, so it may not matter much, and here we could agree.

> > 2. privacy: we don't want to send domain-looking things to a search engine by default
> 
> They are not sent by default. Only if they don't resolve. Don't
> privacy-conscious people use trustworthy search engines anyway?

Privacy-conscious people want to know where their data goes before it does. Even provided a good, trustworthy engine exists, it's still something to evaluate with care, given our mission.

> The most important drawback being the need to confirm searching after every
> unique query and then keeping multiple opposing lists. I think comment 20
> proposes adding the hostname (and its up-level domain(s)) onto the whitelist
> only if the user chooses the "Visit http://a.b/" in the dropdown list.

You can't choose the visit in the dropdown list, that's just what happens when the input field text is confirmed. Selecting that entry or pressing Enter in the input field is the same exact thing.
So no, that wouldn't work. Afaict the final part of comment 20 is about a general whitelist approach with something that allows adding stuff to it.
I just suggested the notfound page itself could be that thing.
> You can't choose the visit in the dropdown list

What do you mean, of course I can choose an entry from the drop-down? The reason the browser upon Enter navigates to that page is that the "Visit ..." is the first and only entry in it. Were there several entries, the user would be able to choose the action to take (as they are now with a regular search or history-matching query) by clicking or with a Down arrow key.

In this light, "Visit ..." could be a non-default entry (i.e. after "... - Search") when the site is not in user's history and until the DNS resolves?
No, that Visit entry is special, it just shows in the UI what happens when you confirm the location bar contents with Enter. So if Enter is going to do a search it will read "Search".
So there could be TWO entries deciding what Enter is going to do, one that says "Search" and one that says "Visit"??
Sure, but the problem then would be "what is the default?".
Even if "Visit" remains the default, with two entries, "Search" is most easily available.
Any chance of getting this fixed? It's very annoying, and it doesn't happen on Chrome or Opera.

Maybe an option to enable the behavior described in https://bugzilla.mozilla.org/show_bug.cgi?id=1080682#c26 would be a great idea. A big part of the users are not worried about privacy when doing a domain search, and it can be disabled by default.
Hi all,

Or could we try to fix the issue in the Test Pilot add-on universal-search[1]? I'm asking in a GitHub issue[2] whether universal-search will become a system add-on in Firefox. If so, we might add the Public Suffix List (or some other way) to detect whether a string is a URL or just a keyword, and then react accordingly.

Let's make things happen, just like Matias said in Comment 36.

Any thoughts?

[1]: https://testpilot.firefox.com/experiments/universal-search
[2]: https://github.com/mozilla/universal-search/issues/302
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #38)
> Hi all,
> 
> Or could we try to fix the issue in the Test Pilot add-on
> universal-search[1]? I'm asking in a GitHub issue[2] whether
> universal-search will become a system add-on in Firefox. If so, we might
> add the Public Suffix List (or some other way) to detect whether a string
> is a URL or just a keyword, and then react accordingly.
> 
> Let's make things happen, just like Matias said in Comment 36.
> 
> Any thoughts?
> 
> [1]: https://testpilot.firefox.com/experiments/universal-search
> [2]: https://github.com/mozilla/universal-search/issues/302

I very much doubt that any of the test pilot add-ons will be included as a system add-on anytime soon. In any case, it's not clear why exactly the same concerns that have been raised in this bug would not apply just because we'd implement the solution in a different bit of software. It just moves the problem.

We should focus on a fix in the existing implementation. The URL bar behaviour part of it is fairly straightforward - we'd just need to add some more code in nsDefaultURIFixup.cpp. It's getting data out of the PSL and making sure it is up-to-date that is more work, though by now we have kinto, and maybe that can help. In any case, that division of work wouldn't really change just by implementing it in the universal-search add-on.
Assuming there are no objections from a UX perspective, we should fix bug 1256074 first, which does not have the PSL in scope.

This bug represents a feature involving the PSL, and we already have the dependency set (bug 1083971).
Depends on: 1256074
Priority: -- → P3
Whiteboard: [fxsearch][triage] → [fxsearch]
Just pointing this out because I don't think I saw it in the discussion so far: if the TLD list is incomplete, it only makes it hard to access the site if you're typing the site into the URL bar by hand without a protocol. Entry into the URL bar with the protocol (e.g., copy-pasting a URL) will still work.

Furthermore, if you've visited the site before, autocomplete will still make the first result what you want. E.g., if your company's internal site is "company.internal", the first time you type it, it might give you a search page by accident. But if it's in your history, especially if you type it often, as soon as you start to type "comp" it should autocomplete to "http://company.internal" and work fine.

So I don't think this is a big deal if it does misidentify new or private TLDs, or am I missing something?
There's a note here about sending domain names to a search engine, but I don't see any reference here to the opposite - the fact that by doing the wrong thing we're accidentally sending a bunch of our search terms to our DNS provider. Certainly in the case where we've selected, say, DuckDuckGo as a search provider to avoid that sort of leakage, this is bad.

If we had a way to set a pref that added something to PSL for internal networks, would that get us where we need to go?
(In reply to Mike Hoye [:mhoye] from comment #45)
> There's a note here about sending domain names to a search engine, but I
> don't see any reference here to the opposite - the fact that by doing the
> wrong thing we're accidentally sending a bunch of our search terms to our
> DNS provider. Certainly in the case where we've selected, say, DuckDuckGo as
> a search provider to avoid that sort of leakage, this is bad.

Sure, but this bug would only really avoid this for search terms that included a pseudo-domain, so it'd continue to exist for single-word queries. Could you file an orthogonal bug? Perhaps we should provide an opt-out pref, or do something cleverer (unsure what that would be, off-hand).

> If we had a way to set a pref that added something to PSL for internal
> networks, would that get us where we need to go?

We already have this for the non-domain case, it wouldn't be hard to do it for the internal network case, at least in terms of not showing the notification bar / doing DNS lookups.

Note that making the actual PSL pref-extensible is scary to me because of issues like bug 1365892. Randomly changing the PSL at runtime from about:config would make a lot of stuff "fun" (like, hi, I can now break setting cookies / using indexeddb / whatever on "foo.com" by adding it to the PSL, which will maybe help my privacy/security but also maybe break random other bits of the internet in ways I don't fully understand). We know people mess with about:config in ways that basically destroy their experience or make their browser insecure, and users (somewhat) understandably blame us for making that possible. I'm... not keen to open the PSL up to that type of thing.
(In reply to :Gijs (no reviews; PTO recovery mode) from comment #46)
> (In reply to Mike Hoye [:mhoye] from comment #45)
> > There's a note here about sending domain names to a search engine, but I
> > don't see any reference here to the opposite - the fact that by doing the
> > wrong thing we're accidentally sending a bunch of our search terms to our
> > DNS provider. Certainly in the case where we've selected, say, DuckDuckGo as
> > a search provider to avoid that sort of leakage, this is bad.
> 
> Sure, but this bug

Err, *fixing* this bug, of course.

> would only really avoid this for search terms that
> included a pseudo-domain, so it'd continue to exist for single-word queries.
> Could you file an orthogonal bug? Perhaps we should provide an opt-out pref,
> or do something cleverer (unsure what that would be, off-hand).

+ni for this
Flags: needinfo?(mhoye)
I'm sorry, but I don't think I understand the single-word queries problem.

As I understand the situation right now:
- Multi-word entries containing what could be a domain name automatically go to search. ("file.jpg test", "test file.jpg")
- Single-word entries that don't look like domains go to search, 
- Single-word entries that are correct domain names go to DNS and,
- Single-word queries that look like domains but aren't go to DNS. ("file.jpg")

I _think_ that the only thing we're getting wrong there is the last one, and that doing a PSL lookup in that case only, and sending null hits to search, would resolve it. I did file https://bugzilla.mozilla.org/show_bug.cgi?id=1411051 but it was duped back to this one.
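
To make the four cases above concrete, here is a rough sketch of the decision with the proposed PSL check applied to the last case. It is a toy model and not Firefox's fixup code: the function name `classify` and the tiny `KNOWN_SUFFIXES` set are made up for illustration, and the real Public Suffix List is far larger and has wildcard and exception rules that this ignores.

```python
# Toy stand-in for the Public Suffix List (an assumption for illustration only).
KNOWN_SUFFIXES = {"com", "org", "net", "bar"}


def classify(entry, known_suffixes=KNOWN_SUFFIXES):
    """Return "search" or "dns" for a piece of address bar input."""
    text = entry.strip()
    if not text:
        return "search"
    # Multi-word entries go to search, even if one word looks like a domain.
    if any(ch.isspace() for ch in text):
        return "search"
    # Single words without a dot don't look like domains.
    if "." not in text:
        return "search"
    # Single word with a dot: only treat it as a host if the text after the
    # last dot is a known suffix; otherwise send it to search.
    suffix = text.rsplit(".", 1)[1].lower()
    return "dns" if suffix in known_suffixes else "search"
```

With this, "file.jpg test", "foobar" and "file.jpg" all go to search, while "example.com" still goes to DNS.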

In the case of users requiring intranet namespaces to work, I honestly don't think setting a pref in about:config is that big a burden. Certainly that's almost invariably going to happen on corporate networks with preconfigured images. For our typical users a refresh gets you out from under that if you break something, and if you're going to cause breakage in Firefox by doing random crap in about:config that you don't understand it's not as though you're short of options.
Flags: needinfo?(mhoye)
(In reply to Mike Hoye [:mhoye] from comment #48)
> I'm sorry, but I don't think I understand the single-word queries problem.

> - Single-word entries that don't look like domains go to search, 

They do. But we *also* look up the domain name. Try adding "foobar" as an alias for localhost in your /etc/hosts file (or Windows equivalent), then type 'foobar' in the URL bar and hit enter. Now you get a search page for "foobar" but also a notification bar that says "Did you mean to go to foobar?", so that it's easy to access such domains if that's the behaviour you wanted.

Of course, in order to provide this functionality, we need to do a DNS lookup. :-)



(In reply to Mike Hoye [:mhoye] from comment #48)
> In the case of users requiring intranet namespaces to work, I honestly don't
> think setting a pref in about:config is that big a burden.

It's not, I just wanted to clarify that I don't want that pref to actually affect the 'real' public suffix list that we use to make internal decisions like cookie and other data's scope, and same origin policy... We can have a pref for what things are intranet suffixes, we already have one for single-word local hosts (which is what that notification bar saves "foobar" to if you say "ah yes, I did mean to go to http://foobar/ instead of searching for 'foobar' ").
Blocks: 1180329
(In reply to Marco Bonardo [::mak] from comment #33)
> sure, but the problem then would be "what is the default?".

Current behaviour (FF Dev Ed 63): as soon as the thing in the address bar looks like it might be a domain name (whether or not it's in DNS), an "http://array.map/ - Visit" option is added below.  The problem is that it's added as the first, default, behaviour, used if you hit Enter, with search as the second option.

Please can search be the default behaviour.

If the string can be quickly determined to be an actual domain (from DNS, history or a TLD whitelist), the current behaviour might be acceptable, but just blindly assuming that anything with a dot in it is a domain name is unhelpful.

I just learned that the Public Suffix List is actually a Mozilla project!

Tangential remark: We know that Chrome isn't the most privacy-conscious browser, but it's also a fact that the Chrome team takes security very seriously and has put their money where their mouth is in that regard. It's not sufficient to blindly follow their lead on all things, but it is definitely a useful signal to consider and weighs in favor of adopting the PSL approach.

An observation: currently, someone intending to search for browserconfig.xml (with the separate search bar hidden) types browserconfig.xml in the address bar and quickly hits enter (yay muscle memory!) without considering that it is going to NXDOMAIN. They then have to go back to the search bar, type in browserconfig.xml again, and go through the chore of identifying which drop-down autocomplete entry invokes a search rather than a navigation. That whole process takes considerably longer than performing either a search or a DNS query in the background. The point being: it's not a good idea to perform a DNS lookup and initiate a search simultaneously (for privacy reasons), and performing them sequentially, trying the second only if the first fails, does not result in a user-unfriendly UX despite the delay from the lack of parallelization.

Looks like Chrome bakes the list in, see e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=610495 , without out-of-band updates.

We now (as of bug 1459891) do this, too, but fortunately our updates are mostly automatic (they are automatically created by a bot based on the authoritative list, and reviewed by a human).

I think that's probably sufficient to move forward with this bug, even if out-of-band updates would be a nice bonus. There should probably be an about:config pref to turn search-for-non-psl-domains off.

Marco, could this be part of the quantumbar improvements?

Depends on: 1459891
Flags: needinfo?(mak77)

The current scope is to provide an experience identical to the old bar, to limit the confusion for users and QA, but it could surely be one of the things we want to do sooner rather than later.
Should we use it through nsIEffectiveTLDService? I just tried to use getPublicSuffixFromHost, but it doesn't seem to do any kind of verification, it doesn't seem to have a strict option, so I assume we'd have to add a new method to check a given host.

Blocks: 1425024
Depends on: quantumbar
Flags: needinfo?(mak77)
Priority: P3 → P2
Depends on: 1540242
No longer depends on: 1540242

I would prefer it if we had a meta bug tracking the PSL work; for now I guess bug 1563246 will do.

Depends on: 1563246
Summary: Use PSL to do a search for foo.bar URL bar entries which aren't known domains, with the same infobar as for single-word searches → Use PSL to do a search for foo.bar URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches

Mathieu, what's the status of the underlying networking feature? It looks like the product side is done and we're just waiting for bug 1563225? Unfortunately there is still no meta bug tracking this feature we can depend upon.

Flags: needinfo?(mathieu)

Mathieu, what's the status of the underlying networking feature?

As you could figure, the feature on the product side is done. I created this meta bug https://bugzilla.mozilla.org/show_bug.cgi?id=1579856 , I hope that helps!
The remaining bits should be done in the following days.

Flags: needinfo?(mathieu)

Thank you for the update!

Depends on: 1579856

When a developer is able to schedule some time for it. Firefox is a huge project and resources are limited. It's on our radar anyway, now that the platform fix is complete.

Flags: needinfo?(mathieu)

I'm taking this as a side project to the main backlog, I'm also working on other improvements in this area so it seems to make sense. May take a while anyway, because there's a lot of other work ongoing.

Assignee: nobody → mak77
Status: NEW → ASSIGNED

Does this issue include checking highlighted text against a cached Public Suffix List? Meaning that a right click on highlighted text "foobar.this" will recognize the highlighted text as not being a hyperlink and therefore the context menu will not offer link-actions?

I wonder because technically it might fall under the same scope of URL detection. If not I will create another bug.

(In reply to neubland from comment #73)

Does this issue include checking highlighted text against a cached Public Suffix List? Meaning that a right click on highlighted text "foobar.this" will recognize the highlighted text as not being a hyperlink and therefore the context menu will not offer link-actions?

That would require a separate bug, this is only about the urlbar.

(In reply to neubland from comment #73)

a right click on highlighted text "foobar.this" will recognize the highlighted text as not being a hyperlink

That's bug 529085.

Hey, I filed Bug 1610130 which has been resolved as a duplicate.
Figuring out valid alphanumeric TLDs seems to be non-trivial, but shouldn't it be easier for things like subprocess.run() or this.rn!'{}+()n which contain non-alphanumeric characters in the TLD part and are therefore invalid?

Especially the first example is really annoying if you are searching for code-related stuff. I see that detecting if subprocess.run is a valid domain can be quite difficult but that shouldn't apply for subprocess.run() or am I mistaken there?
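
A minimal sketch of that purely syntactic check, for illustration: it only asks whether the text after the last dot could be a hostname label at all (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen). The function name is hypothetical, and the sketch assumes bare host-like input. It does not handle paths, ports, or non-ASCII TLDs that have not been converted to punycode, and it says nothing about whether the suffix actually exists.

```python
import re

# One hostname label under the usual letters-digits-hyphen rule.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def last_label_is_plausible(entry):
    """Return True if the text after the final dot could be a hostname label."""
    if "." not in entry:
        return False
    last_label = entry.rsplit(".", 1)[1]
    return LABEL_RE.fullmatch(last_label) is not None
```

Under this check subprocess.run still passes (so it would need the suffix lookup), but subprocess.run() and this.rn!'{}+()n are rejected without consulting any list.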

No longer depends on: quantumbar
Points: 8 → 5
Depends on: 1621168
Depends on: 1496578
Iteration: --- → 77.1 - Apr 6 - Apr 19
Attachment #9136756 - Attachment description: Bug 1080682 - Use the Public Suffix List to distinguish foo.bar searches in URIFixup (and consequently the Address Bar). r=gijs → Bug 1080682 - Use the Public Suffix List to distinguish foo.bar searches in URIFixup (and consequently the Address Bar). r=gijs!,dao!
Iteration: 77.1 - Apr 6 - Apr 19 → 77.2 - Apr 20 - May 3
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/18205ec9467e
Use the Public Suffix List to distinguish foo.bar searches in URIFixup (and consequently the Address Bar). r=Gijs,dao
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 77
Regressions: 1632582

I am so happy about this. Thank you Marco for your heroic efforts! (And also everyone else involved!)

I managed to reproduce the issue using an older version of Nightly on Windows 10 x64.
I verified the fix using latest Nightly 77.0a1 on Windows 10 x64, Ubuntu 18.04 x64 and Mac OS 10.11. The issue is not reproducing anymore.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Summary: Use PSL to do a search for foo.bar URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches → Use PSL to do a search for foo.barrr URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches
Summary: Use PSL to do a search for foo.barrr URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches → Use PSL to do a search for foo.bar URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches

I specifically changed the summary from foo.bar to foo.barrr because 'bar' appears in the PSL.

Summary: Use PSL to do a search for foo.bar URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches → Use PSL to do a search for foo.barrr URL bar entries which aren't known domains/TLDs, with the same infobar as for single-word searches

I want to provide some feedback about this change and see if this can be fixed:

For general usage, this change is good and better than redirecting users to nonexistent websites when they want to search.

However, there are some TLDs that aren't present in the Public Suffix List, which means that users who want to visit such domains will instead be redirected to search. Examples of such domains are local/reserved TLDs (like .local or .test) which are commonly used for local development/testing, as well as non-IANA-managed TLDs, like OpenNIC (.oss, .libre...) and others, mostly by some decentralized projects (.eth, .bit...).

All of them now just trigger a search. To access them, you now need to manually type http:// or https:// before the domain. Are there any better solutions? Could Firefox try to resolve such domains to see if they host an existing website (for local/reserved TLDs and OpenNIC, which are commonly resolved with a custom /etc/hosts file or custom DNS server), and provide some API to extensions so that they can tell it which TLDs are valid (for .eth, .bit..., which are commonly resolved through extensions)?

(In reply to Filip Š from comment #84)

However, there are some TLDs that aren't present in the Public Suffix List, which means that users who want to visit such domains will instead be redirected to search. Examples of such domains are local/reserved TLDs (like .local or .test) which are commonly used for local development/testing, as well as non-IANA-managed TLDs, like OpenNIC (.oss, .libre...) and others, mostly by some decentralized projects (.eth, .bit...).

just add domains to the whitelist, create a bool pref like browser.fixup.domainwhitelist.yourdomain.yoursuffix and set it to true.

you can also use browser.fixup.dns_first_for_single_words to force a dns lookup before searching.
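
If several hosts need whitelisting, the pref lines can be generated rather than typed by hand. The snippet below is only a convenience sketch: the helper name and the example hosts (router.ip, myapp.test) are made up, and it assumes the prefs are being placed in a user.js-style file, one bool pref per host as described above.

```python
def whitelist_pref_lines(hosts):
    """Build user.js lines that whitelist each host for URL fixup."""
    lines = []
    for host in hosts:
        # One bool pref per host: browser.fixup.domainwhitelist.<host>
        name = "browser.fixup.domainwhitelist." + host.strip().lower()
        lines.append('user_pref("%s", true);' % name)
    return lines


if __name__ == "__main__":
    # "router.ip" and "myapp.test" are hypothetical example hosts.
    for line in whitelist_pref_lines(["router.ip", "myapp.test"]):
        print(line)
```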

(In reply to Marco Bonardo [:mak] from comment #86)

you can also use browser.fixup.dns_first_for_single_words to force a dns lookup before searching.

Would it be possible to have something like this (additional pref), but only for searches that contain (at least one) dot? This would be useful because most commonly, searches without dots are not used as domains (except localhost which is handled separately), while searches containing dots could be commonly used as domains.

Also, I don't know if it's just for me, but Firefox takes longer to "resolve" (try resolving but then fail and search instead) domains without dots (for example "test", which takes a few seconds before redirecting to search) than domains with dots (for example "test.aaaaa", which is redirected to search almost immediately).

(In reply to Marco Bonardo [:mak] from comment #85)

just add domains to the whitelist, create a bool pref like browser.fixup.domainwhitelist.yourdomain.yoursuffix and set it to true.

What about whitelisting whole TLDs/suffixes? And along with this, an API that allows extensions to modify the list of whitelisted TLDs would be quite useful, for example for extensions that provide support for domains like .eth or .bit.

(In reply to Filip Š from comment #87)

Would it be possible to have something like this (additional pref), but only for searches that contain (at least one) dot? This would be useful because most commonly, searches without dots are not used as domains (except localhost which is handled separately), while searches containing dots could be commonly used as domains.

I don't see why, searches without dots are also used a lot in enterprise. I gave you two solutions that cover your use case pretty well depending on the needs, there are also ways to always force a visit (type the protocol) or always force a search (prepend ?). I don't think we should make the code even more complex for every edge case.

Also, I don't know if it's just for me, but Firefox takes longer to "resolve" (try resolving but then fail and search instead) domains without dots (for example "test", which takes a few seconds before redirecting to search) than domains with dots (for example "test.aaaaa", which is redirected to search almost immediately).

I'm still working on bug 1398567 that may help there.

What about whitelisting whole TLDs/suffixes? And along with this, an API that allows extensions to modify the list of whitelisted TLDs would be quite useful, for example for extensions that provide support for domains like .eth or .bit.

I don't think extensions should be necessary, you can file an enhancement bug for whitelisting suffixes and we can eval that.

Blocks: 1634592

(In reply to Marco Bonardo [:mak] from comment #88)

I don't see why, searches without dots are also used a lot in enterprise. I gave you two solutions that cover your use case pretty well depending on the needs, there are also ways to always force a visit (type the protocol) or always force a search (prepend ?). I don't think we should make the code even more complex for every edge case.

The reason I requested this was the long resolving time for searches without dots, but if that is going to be fixed, a separate pref is not needed.

I don't think extensions should be necessary, you can file an enhancement bug for whitelisting suffixes and we can eval that.

I created bug 1634650.

(In reply to Marco Bonardo [:mak] from comment #85)

(In reply to Filip Š from comment #84)

However, there are some TLDs that aren't present in the Public Suffix List, which means that users who want to visit such domains will instead be redirected to search. Examples of such domains are local/reserved TLDs (like .local or .test) which are commonly used for local development/testing, as well as non-IANA-managed TLDs, like OpenNIC (.oss, .libre...) and others, mostly by some decentralized projects (.eth, .bit...).

just add domains to the whitelist, create a bool pref like browser.fixup.domainwhitelist.yourdomain.yoursuffix and set it to true.

My use case is helping family members troubleshoot their Internet problems, and that often involves a phone call to them to debug it live over voice (especially these days during self-isolation). Currently, I can ask them to connect to the LAN-only portal hosted on their ISP box to see if the issue is between their computer and that box, or between the box and the rest of the world. Those boxes often make their settings available over unfortunate non-existing TLDs such as ".ip", and now I also have to tell them to go edit scary settings through about:config and whatnot. That seems really unfortunate and user-hostile.

(In reply to Anthony Ramine [:nox] from comment #90)

My use case is helping family members troubleshoot their Internet problems

You can use an autoconfig file with specific prefs set to whitelist domains. We'll provide a suffixes whitelist in bug 1634650 to simplify managing that. I'll check also if there's something we can/should do about group policies.

This fix has been done because of the many (so many!) user reports we got complaining about the original behavior, so I hope you understand the intent is the opposite of user-hostile (the whole papercuts bugs are supposed to solve common user complaints). Of course, it's hard to satisfy every single case, but we'll do whatever possible to support your use-case.

Regressions: 1635033

(In reply to Marco Bonardo [:mak] from comment #91)

(In reply to Anthony Ramine [:nox] from comment #90)

My use case is helping family members troubleshoot their Internet problems

You can use an autoconfig file with specific prefs set to whitelist domains. We'll provide a suffixes whitelist in bug 1634650 to simplify managing that. I'll check also if there's something we can/should do about group policies.

I cannot use an autoconfig file, I would need to deploy one somewhere on the Internet and explain over the phone to my family members how to install it.

With bug 1636583 fixed, you can also tell them to type the address with a trailing slash, which will force a visit.
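
For reference, the explicit overrides mentioned in this thread (prepend ? to force a search, type the protocol or add a trailing slash to force a visit) can be summed up in a small sketch. This is an illustrative model only: the function name is hypothetical, only http:// and https:// are checked, and the order of the checks is an assumption rather than a description of the actual implementation.

```python
def explicit_override(entry):
    """Return "search", "visit", or None if the user gave no explicit hint."""
    text = entry.strip()
    # A leading "?" always forces a search.
    if text.startswith("?"):
        return "search"
    # Typing the protocol always forces a visit.
    if text.startswith(("http://", "https://")):
        return "visit"
    # A trailing slash forces a visit.
    if text.endswith("/"):
        return "visit"
    # No hint: fall through to the normal heuristics.
    return None
```

So for an ISP box reachable at a made-up name like router.ip, either http://router.ip or router.ip/ would go straight to the box, while plain router.ip is left to the PSL-based heuristics.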

Depends on: 1648940
No longer depends on: 1648940
Regressions: 1693833