Closed Bug 1460097 Opened 6 years ago Closed 6 years ago

Hidden character in pasted url gets wrongly interpreted

Categories

(Firefox :: Address Bar, defect, P2)

defect

Tracking

()

VERIFIED FIXED
Firefox 62
Tracking Status
firefox62 --- fixed

People

(Reporter: sarah, Assigned: bugzilla)

References

Details

(Whiteboard: [fxsearch])

Attachments

(1 file)

In a message I received I was given the URL 

http://watermarkpress.worksmartsuite.com
 

(it was in a bulleted list on The Hub)

When pasted into Firefox Nightly URLBar, it changes to http://watermarkpress.worksmartsuite.xn--com-7t7s/ and does not resolve.

When pasted into chrome it correctly resolves.

If, after pasting, I press delete twice I can tell there's some kind of hidden character at the end of what I was sent that's causing the problem.

In case it's helpful, I copy-pasted the same thing into https://r12a.github.io/app-conversion/ and it showed the character at the end variously as:

http://watermarkpress.worksmartsuite.com
 
http://watermarkpress.worksmartsuite.com\u2028 
http%3A%2F%2Fwatermarkpress.worksmartsuite.com%E2%80%A8%20
(In reply to Sarah Bird from comment #0)
> When pasted into Firefox Nightly URLBar, it changes to
> http://watermarkpress.worksmartsuite.xn--com-7t7s/ and does not resolve.

It looks correct to me, we should not ignore hidden chars, the result of doing that could be very wrong.

> When pasted into Chrome it correctly resolves.

To the ".com" page I assume? That's imo not the right thing to do, that's not what the user requested.
If I paste the encoded url it does a search.

I'm tempted to wontfix... Gijs, thoughts?
Flags: needinfo?(gijskruitbosch+bugs)
There is one thing though, that is if hidden chars are at the end, and we decode them, it's not a great UX, so maybe we should just not decode invisible chars at the end.
(In reply to Marco Bonardo [::mak] from comment #1)
> (In reply to Sarah Bird from comment #0)
> > When pasted into Firefox Nightly URLBar, it changes to
> > http://watermarkpress.worksmartsuite.xn--com-7t7s/ and does not resolve.
> 
> It looks correct to me, we should not ignore hidden chars, the result of
> doing that could be very wrong.

...

> I'm tempted to wontfix... Gijs, thoughts?

Well, as far as I can tell, "http://watermarkpress.worksmartsuite.com " (with a regular space at the end and no magic line break char) gets trimmed to a URL without whitespace, and both regular expression "".replace(/\s/g, "") and "".trim() match/remove \u2028 . Why do we not just throw away the final whitespace here? That would generally seem correct to me...
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mak77)
Actually new URL("http://watermarkpress.worksmartsuite.com "); returns "http://www.google.com/"
while new URL("http://watermarkpress.worksmartsuite.com\u2028"); returns "http://www.google.xn--com-7t7s/"
as well as getFixupURIInfo().fixedURI

If we'd trim, we'd actually transform the url to something else. We can still decide for the user it's better to do that, but it'd be a deliberate choice, rather than a bug fix.
Flags: needinfo?(mak77)
Whiteboard: [fxsearch]
ehr sorry I mixed up my example through copy paste from the console (it should have been the same host both sides), I hope it's clear yet.
(In reply to Marco Bonardo [::mak] from comment #4)
> Actually new URL("http://watermarkpress.worksmartsuite.com "); returns
> "http://www.google.com/"
> while new URL("http://watermarkpress.worksmartsuite.com\u2028"); returns
> "http://www.google.xn--com-7t7s/"
> as well as getFixupURIInfo().fixedURI
> 
> If we'd trim, we'd actually transform the url to something else. We can
> still decide for the user it's better to do that, but it'd be a deliberate
> choice, rather than a bug fix.

Hm. I wonder what Valentin and Anne think. Feels like if the URL parsing algorithm removes " " but not "\u2028" but we treat those 2 things as somewhat interchangeable whitespace elsewhere, maybe the URL parsing algorithm is just wrong.

Of course, even if we decide not to change the spec I think it'd be more user-friendly to trim the URL. And yes, that'd be a deliberate choice based on the idea that I don't think it's sane to pass around URLs where leading/trailing unescaped (thus invisible) whitespace-class characters are supposed to be meaningful (and certainly IANA won't start having gTLDs ending in such chars, but then the URL could conceivably include those chars in the path/search/ref/hash component as well...). :-)
Flags: needinfo?(valentin.gosu)
(In reply to :Gijs (he/him) from comment #6)
> (In reply to Marco Bonardo [::mak] from comment #4)
> > Actually new URL("http://watermarkpress.worksmartsuite.com "); returns
> > "http://www.google.com/"
> > while new URL("http://watermarkpress.worksmartsuite.com\u2028"); returns
> > "http://www.google.xn--com-7t7s/"
> > as well as getFixupURIInfo().fixedURI
> > 
> > If we'd trim, we'd actually transform the url to something else. We can
> > still decide for the user it's better to do that, but it'd be a deliberate
> > choice, rather than a bug fix.
> 
> Hm. I wonder what Valentin and Anne think. Feels like if the URL parsing
> algorithm removes " " but not "\u2028" but we treat those 2 things as
> somewhat interchangeable whitespace elsewhere, maybe the URL parsing
> algorithm is just wrong.

I don't think this needs or should be in the URL spec.
There are a bunch of unicode whitespace characters, and I feel it's important to encode those characters when encountered.

But the URL bar is a totally different story, as people paste text in there. I don't think it would hurt if we stripped out all leading/trailing WS characters there. I believe that's what Chrome is doing.
Flags: needinfo?(valentin.gosu) → needinfo?(annevk)
I agree with Valentin. Being lenient in the UI makes sense (similar to how we don't require a leading scheme), given the URL might come from all sorts of places, but we shouldn't make that affect the URL parser.
Flags: needinfo?(annevk)
Looks like we have an agreement here.
Priority: -- → P2
(In reply to Anne (:annevk) from comment #8)
> I agree with Valentin. Being lenient in the UI makes sense (similar to how
> we don't require a leading scheme), given the URL might come from all sorts
> of places, but we shouldn't make that affect the URL parser.

The URL parser already strips out null bytes, control characters, and "regular" whitespace like a "normal" space, so I don't really buy the argument that it can't be "lenient" - it's already really lenient! Anyway, if we don't want to change the URL spec then we'll have to fix this in the UI.
(In reply to :Gijs (he/him) from comment #10)
> (In reply to Anne (:annevk) from comment #8)
> > I agree with Valentin. Being lenient in the UI makes sense (similar to how
> > we don't require a leading scheme), given the URL might come from all sorts
> > of places, but we shouldn't make that affect the URL parser.
> 
> The URL parser already strips out null bytes, control characters, and
> "regular" whitespace like a "normal" space, so I don't really buy the
> argument that it can't be "lenient" - it's already really lenient! Anyway,
> if we don't want to change the URL spec then we'll have to fix this in the
> UI.

Agreed. But the URL parser is already fairly complex. There's an overhead to stripping non-ascii whitespace. Not to mention that we'd have to update it every time new unicode WS characters are added.
https://bugzilla.mozilla.org/show_bug.cgi?id=1319533#c5 contains a few hints to the code we may want to change. :harry was looking into that bug and will likely have a look at this.
Comment on attachment 8980671 [details]
Bug 1460097 - Autocompleted URLs are trimmed.

https://reviewboard.mozilla.org/r/246838/#review255784

Sorry for the delay. The patch looks mostly good, but it needs automated tests.
most of the tests for unifiedcomplete are under toolkit/components/places/tests/unifiedcomplete/...
You can build tests using ./mach build faster and run them using ./mach test path_to_tests_or_single_test

You could probably add some simple test case to test_visit_url.js or to test_encoded_urls.js
For the change to urlbarbindings you may add a test to browser_urlbarEnter.js (IIRC it's touching that code path)

::: toolkit/components/places/UnifiedComplete.js:1690
(Diff revision 1)
>      }
>    },
>  
>    // TODO (bug 1054814): Use visited URLs to inform which scheme to use, if the
>    // scheme isn't specificed.
> +  // Bug 1460097: trimmed search string is used

This is not necessary, the blame will already tell us, and anyway if we'd annotate every bug fix with the bug# the code would become unreadable

::: toolkit/components/places/UnifiedComplete.js:1702
(Diff revision 1)
>                                                      flags);
>      } catch (e) {
>        if (e.result == Cr.NS_ERROR_MALFORMED_URI && !Prefs.get("keyword.enabled")) {
>          let value = PlacesUtils.mozActionURI("visiturl", {
> -          url: this._originalSearchString,
> -          input: this._originalSearchString,
> +          url: this._trimmedOriginalSearchString,
> +          input: this._trimmedOriginalSearchString,

input property should not be changed

::: toolkit/components/places/UnifiedComplete.js:1743
(Diff revision 1)
>      let escapedURL = uri.displaySpec;
>      let displayURL = Services.textToSubURI.unEscapeURIForUI("UTF-8", escapedURL);
>  
>      let value = PlacesUtils.mozActionURI("visiturl", {
>        url: escapedURL,
> -      input: this._originalSearchString,
> +      input: this._trimmedOriginalSearchString,

ditto for input
Attachment #8980671 - Flags: review?(mak77) → review-
Assignee: nobody → htwyford
Status: NEW → ASSIGNED
Comment on attachment 8980671 [details]
Bug 1460097 - Autocompleted URLs are trimmed.

https://reviewboard.mozilla.org/r/246838/#review256552

The patch looks good apart a few coding style nits. I triggered a Try run to check tests, and we should wait for it to confirm this doesn't break other tests.

::: browser/base/content/test/urlbar/browser_pasteAndGo.js:42
(Diff revision 2)
>    }
>  });
> +
> +add_task(async function() {
> +  const url = "http://example.com/4\u2028";
> +    await BrowserTestUtils.withNewTab("about:blank", async function(browser) {

indentation is wrong here, await should be on the same column as const

::: toolkit/components/places/tests/unifiedcomplete/test_visit_url.js:62
(Diff revision 2)
> +  info("visit url, with non-standard whitespace");
> +  await check_autocomplete({
> +    search: "https://www.mozilla.org\u2028",
> +    searchParam: "enable-actions",
> +    matches: [ { uri: makeActionURI("visiturl", {url: "https://www.mozilla.org/", input: "https://www.mozilla.org"}), title: "https://www.mozilla.org/", style: [ "action", "visiturl", "heuristic" ] } ]
> +  });

I see the previous code doesn't do that, but please align this in a more readable way:
matches: [{
  uri: makeActionURI("visiturl", {url: "https://www.mozilla.org/",
                                  input: ...},
  title: ...,
  style: ...,
  ...
}]
Attachment #8980671 - Flags: review?(mak77) → review+
Comment on attachment 8980671 [details]
Bug 1460097 - Autocompleted URLs are trimmed.

https://reviewboard.mozilla.org/r/246838/#review256582

Just a nit.
Try looks good, so you can autoland when ready.

::: toolkit/components/places/tests/unifiedcomplete/test_visit_url.js:65
(Diff revision 3)
> +    searchParam: "enable-actions",
> +    matches: [{
> +      uri: makeActionURI("visiturl", {url: "https://www.mozilla.org/",
> +                                      input: "https://www.mozilla.org"}),
> +                                      title: "https://www.mozilla.org/",
> +                                      style: [ "action", "visiturl", "heuristic" ]}]

Actually, title and style should be aligned with uri, not with url (url and input are part of the object passed to makeActionURI)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9d105e4b00be
Autocompleted URLs are trimmed. r=mak
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9d105e4b00be
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
Just got new nightly - it's fixed - awesome - thanks all :D
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: