Closed Bug 1501617 Opened 3 years ago Closed 3 years ago

Linkify ignores spaces in hyperlinks if they are quoted (as per RFC 2396, Appendix-E)

Categories

(Core :: Networking, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: pch, Assigned: jorgk-bmo)

Details

(Whiteboard: [necko-triaged][necko-would-take])

Attachments

(2 files)

Attached image test-with-spaces.png
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0

Steps to reproduce:

Sent plain-text mail with quoted hyperlink, eg
"https://dish.myzel.net/test with spaces.txt"


Actual results:

Recipient (TB user as well) clicked and got 404 - Turns out:

TB mangles the URL, yanks all spaces before sending off to browser


Expected results:

The link should be forwarded to the browser as displayed, see picture of the difference in a draft.
As you can see in comment one, which linkifies your text, spaces in links are illegal. They need to be replaced with %20.

I don't quite understand how you got the spaces into the link, if I type https://dish.myzel.net/test with spaces.txt into a plain text mail, the link stops at test. Oh, I see, adding quotes around it trips up the linkify to include the spaces visually, but not in the link.

I can either close the bug as invalid or ship it off to M-C since the linkify code is there. It's not likely that it will be actioned.
Component: Untriaged → Networking
Product: Thunderbird → Core
Summary: Spaces yanked from hyperlinks handed over to webbrowser → Linkify includes spaces in hyperlinks if they are quoted
Version: 59 → 59 Branch
Hello Jörg, thank you for the prompt reply.

Regarding your question: I still think it is a bug and it should be forwarded. Look at the screenshot, the text in the mail is different from the link in the status line.

Regarding your thoughts: I think the fix is relatively easy - Make linkify disregard quotes, so it behaves like this bugzilla here and stops the underline at the first blank. That way poor lads like me see immediately, what went wrong.

Yours
Peter
What is the linkify code that you are referring to?
I assume it's in mozITXTToHTMLConv, but it's not clear which method TB is calling.

A test added to netwerk/test/unit/test_mozTXTToHTMLConv.js would make this much easier to fix.
Flags: needinfo?(jorgk)
Priority: -- → P5
Whiteboard: [necko-triaged][necko-would-take]
Yes, the code in netwerk/streamconv/converters/mozTXTToHTMLConv.cpp. I think Thunderbird veteran Ben Bucksch (:BenB) wrote it and we call it here in MIME in mimetpfl.cpp and mimetpla.cpp.
https://searchfox.org/comm-central/search?q=mozITXTToHTMLConv&case=false&regexp=false&path=mime

Those suffixes tpfl and tpla mean: "Text Plain Flowed" and "Text Plain", this is the code that reads flowed and non-flowed plain text messages and turns it into DOM, linkifying anything that gets in its way.

I think you might find that adding the following to the tests will pass:

      input: "http://mozilla.org/test with spaces.txt",
      url: "http://mozilla.org/test"

      input: "\"http://mozilla.org/test with spaces.txt\"",
      url: "http://mozilla.org/testwithspaces.txt"

The latter one is the error we talk about here.

Many Netwerk people have already voiced concern when I submitted patches for that "mini HTML parser" and wanted the whole thing ripped out. If you wish, I could find the references.

Once upon a time I was going to suggest to move that crufty code to Mailnews/, but M-C has one call site here:
mozilla/extensions/spellcheck/src/mozEnglishWordUtils.h
12 #include "mozITXTToHTMLConv.h"
Flags: needinfo?(jorgk)
Sorry, the result for the second test may be: "\"http://mozilla.org/testwithspaces.txt\""
The code mentioned references https://tools.ietf.org/html/rfc2396#appendix-E
This clearly makes it impossible to have unencoded spaces within a "delimited" URI.
Thunderbird follows that to the letter.
According to comment 6 I assume it's safe to close this bug.
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Hmm, this is interesting reading indeed. Basically it recommends that URI be delimited by spaces, double quotes or angle brackets.

Then it goes on to say:
  The whitespace should be ignored when extracting the URI.
Clearly that can't apply to whitespace delimited URLs but only to quote/bracket delimited ones. As an example this is given:
  <ftp://ds.internic.  <-- newline here.
  net/rfc/>
which should result in ftp://ds.internic.net/rfc/.

So the Mozilla code does follow the RFC and ignores whitespace in URIs delimited by quotes. If not delimited by quotes, it terminates at the first space found.

This makes the bug INVALID.
Resolution: WONTFIX → INVALID
Summary: Linkify includes spaces in hyperlinks if they are quoted → Linkify ignores spaces in hyperlinks if they are quoted (as per RFC 2396, Appendix-E)
Why don't we use this as an opportunity to document the correct behaviour?
Assignee: nobody → jorgk
Status: RESOLVED → REOPENED
Ever confirmed: true
Attachment #9020559 - Flags: review?(valentin.gosu)
Resolution: INVALID → ---
Comment on attachment 9020559 [details] [diff] [review]
1501617-more-test-cases.patch

Review of attachment 9020559 [details] [diff] [review]:
-----------------------------------------------------------------

This is great, thanks!
Attachment #9020559 - Flags: review?(valentin.gosu) → review+
Thanks for the inspiration to reporter and reviewer :-)
Keywords: checkin-needed
Version: 59 Branch → Trunk
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a268a8fa9e2
Enhance test_mozTXTToHTMLConv.js by adding more cases for RFC 2396 Appendix-E. r=valentin
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/3a268a8fa9e2
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.