Closed Bug 1313957 Opened 8 years ago Closed 4 years ago

Strings like ";.p" are considered valid urls

Categories

(Core :: Networking, defect, P3)

52 Branch
defect

RESOLVED WORKSFORME

People

(Reporter: zeusex81, Assigned: valentin)

Details

(Whiteboard: [necko-next])

When we use Paste & Go in the location bar, it is supposed to detect whether the text is a URI and then either go to the site or send the text to the default search engine. For URLs, though, it seems to only look for the pattern "a.b" and doesn't care whether the result is valid: ";.p" is fine as far as Firefox is concerned.

What I expect it to do :
1) check if random text or uri with more accuracy (valid format, no forbidden char)
hint : https://www.google.com/search?q=uri+regex

2) in case of url check for valid domain extension
hint : https://www.icann.org/resources/pages/tlds-2012-02-25-en

3) check for valid protocol too
But here I'm not sure if it's doable since anyone can create a custom protocol, Chrome tried to do it but now we can't open links like steam://14521051

Or at least, if that's too bothersome, please add a third entry, "Paste & Search", so the user can choose (I'd like to have this in any case, because sometimes I want to search for URLs or things that look like them).

:)
I don't think this is a problem with paste and go but simply with how Firefox interprets strings in the urlbar.  If you type in ;.p and hit return, Firefox tries to navigate to http://;.p/ or http://www.;.p/, same as what happens if you paste-and-go it.

Do you have example text where paste and go behaves differently from pasting and then hitting the return key?
Flags: needinfo?(zeusex81)
Oh yes, indeed, it's not specific to Paste & Go. I always hit the bug that way, so I was completely focused on it, but the same thing happens when typing with the keyboard.
This is likely a missed detection in URIFixup; we just trust its verdict. I don't think we'll ever cover 100% of the cases there, since it's mostly heuristics, and heuristics can always be tricked by well-forged strings.
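
To make the heuristic problem concrete, here is a deliberately naive sketch of the kind of check that lets ";.p" through and produces the http://;.p/ and http://www.;.p/ candidates mentioned earlier. This is not URIFixup's actual logic; the function name `naive_fixup` and the rule "no whitespace plus at least one dot means URL" are assumptions made purely for illustration.

```python
from typing import List, Optional


def naive_fixup(text: str) -> Optional[List[str]]:
    """Return candidate URLs to try, or None to fall back to a search.

    Hypothetical heuristic for illustration, NOT Firefox's real URIFixup code.
    """
    text = text.strip()
    # Empty input, whitespace, or a missing dot is treated as a search query.
    if not text or any(ch.isspace() for ch in text) or "." not in text:
        return None
    # Try the input as a host first, then again with a "www." prefix.
    return [f"http://{text}/", f"http://www.{text}/"]
```

With a rule this loose, `naive_fixup(";.p")` yields two URL candidates while `naive_fixup("hello world")` falls back to search, which is the behaviour the reporter describes.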
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(zeusex81)
Priority: -- → P5
Summary: Paste & Go is stupid, it doesn't accurately detects urls → Strings like ";.p" are considered valid urls
Whiteboard: [fxsearch]
https://url.spec.whatwg.org/#concept-host-parser seems to indicate ";" is valid in a hostname (specifically, it's not in the list in step 5 in that algorithm). Anne, am I missing something?

For the TLDs, the relevant bug is bug 1080682.
Flags: needinfo?(annevk)
Basically, I think the main issue for this bug is that our URI code thinks "http://www.;.com" is a perfectly valid URL. If that is correct, this bug is INVALID (or a dupe of the TLD bug if we want to argue over .com vs. .p), and if it isn't, it should be moved to Core :: Networking and we should change our URI parser.
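
As a rough sketch of the spec check being discussed, the snippet below tests a host against a set of forbidden code points. The set is an assumed subset of the URL Standard's "forbidden host code point" list (the authoritative list has changed over time and should be taken from the spec itself), and `forbidden_in_host` is a hypothetical helper, not a Gecko API. The point it illustrates is that ";" is not in the set, so "www.;.com" passes.

```python
# Assumed subset of the WHATWG "forbidden host code point" list.
# Consult the URL Standard for the authoritative, current set.
FORBIDDEN_HOST_CODE_POINTS = frozenset("\x00\t\n\r #/:?@[\\]")


def forbidden_in_host(host: str) -> list:
    """Return the forbidden code points found in host, in order of appearance."""
    return [ch for ch in host if ch in FORBIDDEN_HOST_CODE_POINTS]
```

An empty result means the host clears this particular step; it says nothing about whether the name would resolve.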
I guess the question is how many restrictions we can add there without breaking existing code. E.g., technically you cannot have "_" in a domain, but enough subdomains use "_" for us to absolutely have to support that.

There might also be non-DNS-based systems that do something with ";", although that's a somewhat rarer case and I'm not sure how much we need to care about that at this point.
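
The trade-off can be sketched with two label patterns: a strict letter-digit-hyphen rule and a relaxed one that also tolerates "_". Both patterns and the helper `labels_ok` are illustrative assumptions, not what any browser's parser actually implements.

```python
import re

# Strict "letter-digit-hyphen" label: 1-63 chars, no leading/trailing hyphen.
STRICT_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# Relaxed variant (an assumption for illustration) that also accepts "_".
RELAXED_LABEL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?")


def labels_ok(host: str, relaxed: bool = False) -> bool:
    """Check every dot-separated label of host against the chosen pattern."""
    pattern = RELAXED_LABEL if relaxed else STRICT_LABEL
    return all(pattern.fullmatch(label) for label in host.split("."))
```

Under the strict rule a name like "my_service.example.com" is rejected, which is exactly the kind of breakage to avoid; the relaxed rule accepts it while still rejecting ";.p".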
Flags: needinfo?(annevk)
Per discussion on IRC and after observing the behaviour of Chrome and Safari, I'll suggest a spec change to include ";" in that list of forbidden characters, and I'll move this to Networking so the URL parser can be updated.
Component: Location Bar → Networking
Priority: P5 → --
Product: Firefox → Core
Assignee: nobody → valentin.gosu
Whiteboard: [fxsearch] → [necko-active][fxsearch]
";" was just an example.
I don't know how much chrome is respectful in regards of the specs but here's the characters that only firefox accepts : http://www.$%*+=!;,<>|&~^`'"(){}.com/
It also accepts empty strings : https://.mozilla.org/ https://bugzilla..org/ https://bugzilla.mozilla./ https://../
Whiteboard: [necko-active][fxsearch] → [necko-active]
(In reply to zeusex81 from comment #8)
> ";" was just an example.
> I don't know how much chrome is respectful in regards of the specs but
> here's the characters that only firefox accepts :
> http://www.$%*+=!;,<>|&~^`'"(){}.com/

We want to disallow % in bug 1311107.
It's a good question whether we want to allow the other characters.

> It also accepts empty strings : https://.mozilla.org/ https://bugzilla..org/
> https://bugzilla.mozilla./ https://../

The empty hostname (http:///) is handled in bug 1275746. We fixed it for a while, but had to back it out recently.

Chrome seems to accept:
https://../
https://bugzilla.mozilla./
https://bugzilla..org/
and throws for
https://.mozilla.org/
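
For reference, the four "empty string" cases differ only in where the empty label sits. The sketch below locates empty labels using Python's `urllib.parse.urlsplit`; the helper name `empty_label_positions` is hypothetical, and Python's parser is used only as a convenient stand-in, not as a model of Gecko or Chrome.

```python
from urllib.parse import urlsplit


def empty_label_positions(url: str) -> list:
    """Return the indexes of empty dot-separated labels in the URL's host."""
    host = urlsplit(url).hostname or ""
    return [i for i, label in enumerate(host.split(".")) if label == ""]
```

A leading empty label (index 0) corresponds to https://.mozilla.org/, the one case Chrome rejects in the list above. A single trailing empty label, as in https://bugzilla.mozilla./, is also how a fully qualified DNS name with a trailing dot is written, so a blanket "no empty labels" rule would be too strict.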

(In reply to zeusex81 from comment #0)
> 2) in case of url check for valid domain extension
> hint : https://www.icann.org/resources/pages/tlds-2012-02-25-en

This does not scale. New TLDs show up every day, and users are able to define their own extensions via the hosts file, or via the local (company) DNS resolver.

> 3) check for valid protocol too
> But here I'm not sure if it's doable since anyone can create a custom
> protocol, Chrome tried to do it but now we can't open links like
> steam://14521051

Right. We probably don't want to restrict that.
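
One way to keep custom protocols working is to check only the syntax of the scheme rather than compare it against a list of known protocols. The sketch below uses the generic URI scheme grammar (a letter followed by letters, digits, "+", "-" or "."); `split_scheme` is a hypothetical helper for illustration, not an existing Firefox function.

```python
import re
from typing import Optional

# Generic URI scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_scheme(text: str) -> Optional[str]:
    """Return the lower-cased scheme if text starts with a well-formed one."""
    scheme, sep, _rest = text.partition(":")
    if sep and SCHEME.fullmatch(scheme):
        return scheme.lower()
    return None
```

This accepts "steam://14521051" without knowing anything about Steam and returns None for ";.p". It is still only a heuristic: an input such as "localhost:8080" also looks like a well-formed scheme followed by a colon, so a syntax check alone cannot tell a scheme from a host and port.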
Whiteboard: [necko-active] → [necko-next]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P2
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3

I think all of the cases mentioned in this bug are fixed now.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME