Closed Bug 1274242 Opened 8 years ago Closed 8 years ago

Linkify stops at |

Categories

(Core :: Networking, defect)

45 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed
thunderbird_esr38 --- wontfix
thunderbird_esr45 47+ fixed

People

(Reporter: josephelfelt, Assigned: jorgk-bmo)

Details

(Whiteboard: [necko-would-take])

Attachments

(1 file, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160407164938

Steps to reproduce:

Tried to send this link in an email.  This used to work fine until I updated my Thunderbird.

https://mappingsupport.com/p/gmap4.php?label=on&t=s&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||label=on||line=on||19.441526,-154.912231^1||19.441529,-154.910222^2||19.441601,-154.910042^3||19.441665,-154.909876^4||19.441819,-154.909271^5||19.441856,-154.909132^6||19.439465,-154.909884^7||19.440006,-154.911529^8||19.441066,-154.91143^9||19.441242,-154.912498^10||19.441526,-154.912231^1


Actual results:

Thunderbird only recognized the first part of the link as shown below.  Thunderbird chopped off the link when it got to the first || characters.

https://mappingsupport.com/p/gmap4.php?label=on&t=s&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate


Expected results:

A zoomed in Google aerial should appear along with a red polygon and corner numbers 1 through 10.  Copy the entire link and paste it into a browser and you will see this.

I am the developer of the Gmap4 software that is displaying the map.
If you copy your link into a message as plain text (<ctrl><shift>V) then the "linkify" will only detect a link until the first ||.

This was so in TB 38 and is still so in TB 45. So the behaviour has *not* changed.

However, if you insert/paste a HTML link into a HTML composition, then the link will be transmitted in full.

Reporter, since you're saying that the behaviour changed, have you changed your behaviour?

For reference: The linkify code is in M-C core here:
https://dxr.mozilla.org/comm-central/source/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#311

Looks like we should just remove the '|' from the test.
Summary: Link in email body is broken if the link contains || → Linkify stops at |
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Untriaged → Networking
Product: Thunderbird → Core
Ben, why are you stopping at a "|"?
Flags: needinfo?(ben.bucksch)
Blame shows that the "|" test has been there for a long time but that Magnus worked in this area in bug 223681. Magnus, do you have an opinion here?
Flags: needinfo?(mkmelin+mozilla)
Whiteboard: [necko-would-take]
Jorg,

Thank you for looking at this.
Yes, I changed my behavior in that I upgraded to the most recent version of Thunderbird.
Not sure what version I was using before.

I just tried pasting the following link into an html composition window.

http://www.mappingsupport.com/p/gmap4.php?t=Nova_Scotia_aerial&label=on&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||line=on||44.810123,-62.871949^1||44.807591,-62.874422^2||44.806832,-62.874392^3||44.807001,-62.874934^4||44.807466,-62.874531^5||44.807781,-62.875128^6||44.80779,-62.875145^7||44.807846,-62.875091^8||44.810321,-62.872686^9&rest=https://nsgiwa.novascotia.ca/arcgis/rest/services/BASE/NSODB_10k_WebMercator_WGS84/MapServer?name=Nova_Scotia_aerial&layers=0&transparent=false

TB made a clickable link that stopped at the first ||

In the previous version of TB I was using links like this worked fine.
(In reply to Joseph from comment #4)
> Yes, I changed my behavior in that I upgraded to the most recent version of
> Thunderbird.
That's not what I was talking about.

In TB you have two types of messages: Plain text and HTML.

In plain text you can only insert text. So if you paste a link, www.mickeymoouse.com, that text will be transmitted. However, we have some smarts in the box that will show a clickable link when displaying a text-only link in a plain text message. These smarts are called "linkify", they haven't changed in many versions of TB, and yes, link recognition stops at the first "|" found. That has always been that way.

The second option is HTML. If you insert/paste a link into HTML, it will be transmitted as is. If, however, you just paste the textual representation of a link into a HTML message, the same smarts will recognise the link in the text and send a real link <a href="http://www.mickeymoouse.com">www.mickeymoouse.com</a> instead. That uses the same code for the linkify and again, none of this has changed for a while.

When you say "I just tried pasting the following link into an html composition window". Did the link show in blue? Then a link was inserted. If it showed in black, then text was inserted that would later be linkified but only to the first "|".

I suggest to use the TB function "Insert Link" to make sure you insert a link to start with and not rely on "linkify".

While we already identified the code that stops at the "|", I can't classify the bug as "regression", since I don't see that this ever worked differently.
Hi Patrick, can you please advise me on how to run the test manually. Sorry, I usually work in an area where we use mochitests for everything.
Attachment #8754827 - Flags: feedback?(mcmanus)
what test are you asking about?
I am not trying to depend on linkify.

I email reports to clients.
For the last couple of years my standard practice has been to paste plain text into the body of an email. 
Then I highlight the long link and click Insert ==> Link.
The long link turns all blue.
Then I send the email.

I am testing by sending from one of my accounts to another account.
When I open the email there is a blue link that stops at the first |.
If I look in my ‘sent’ folder I also see a blue link that stops at the first |.

Below is the start of a sample report to a client
-------------------------------
Hi John,

This email completes your order from PropertyLineMaps.  Please keep in mind that these coordinates and property lines are approximate and are not survey-grade.

Below is the link that starts the Gmap4 software and displays your map.  This link uses our exclusive map-in-a-link technology.  If you click/touch a corner, then you will see a popup that has the corner number and the approximate coordinates.

If your email program does not display the map when you click this link, then simply copy the entire link and paste it into your browser address bar.

http://www.mappingsupport.com/p/gmap4.php?t=Nova_Scotia_aerial&label=on&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||line=on||44.810123,-62.871949^1||44.807591,-62.874422^2||44.806832,-62.874392^3||44.807001,-62.874934^4||44.807466,-62.874531^5||44.807781,-62.875128^6||44.80779,-62.875145^7||44.807846,-62.875091^8||44.810321,-62.872686^9

The above link will work in most browsers and on most devices, from smartphones to computers.
.....
(In reply to Patrick McManus [:mcmanus] from comment #7)
> what test are you asking about?
The one modified in the patch with f? - test_mozTXTToHTMLConv.js ;-)
./mach xpcshell-test --verbose netwerk/test/unit/TESTNAME.js
Comment on attachment 8754827 [details] [diff] [review]
Remove | from characters that will terminate URL.

Review of attachment 8754827 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have any expertise to provide feedback on the change
Attachment #8754827 - Flags: feedback?(mcmanus)
(In reply to Joseph from comment #8)
Finally I was able to reproduce the problem.

STR (steps to reproduce):
- paste plain text link into the body of an email. 
- highlight the link and click Insert ==> Link.
- The long link turns all blue.
- Send the email.

In TB 31 the long link is sent. In TB 38 and TB 45 the long link is broken at the first |.
So this is a regression, but it already didn't work in TB 38.
Can you please check which version you used before.

Take any message sent before the upgrade and look at the message source:
View > Message Source.
Then look at the User-Agent header. That will say Thunderbird/nn.n
(In reply to Patrick McManus [:mcmanus] from comment #11)
> I don't have any expertise to provide feedback on the change
So who does? Bug 223681 also in the area was reviewed by Ben Bucksch and "cbiesinger" (Christian Biesinger, peer emeritus).
You are the module owner, so can you please recommend a peer:
Jason Duell, Honza Bambas, Michal Novotny, Nick Hurley, Dragana Damjanovic, Valentin Gosu, Daniel Stenberg
Flags: needinfo?(mcmanus)
Looks like I was using TB 31

User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
if you have someone familiar enough with the code to review your patch I'm happy to delegate the review.
Flags: needinfo?(mcmanus)
Well, that explains the "change in behaviour". As I said in comment #12: TB 31 worked, TB 38 and TB 45 already did not. I'll see whether the the change I proposed here fixes anything. Otherwise I need to go digging when that broke.
Fixed the test. Now it works:
./mach xpcshell-test --verbose netwerk/test/unit/test_mozTXTToHTMLConv.js
Attachment #8754827 - Attachment is obsolete: true
Attachment #8754859 - Flags: review?(cbiesinger)
Attachment #8754859 - Flags: feedback?(mkmelin+mozilla)
Attachment #8754859 - Flags: feedback?(ben.bucksch)
Joseph, I moved your original issue here, bug 1274602.
Got it.
Thanks for all the help
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Hi Honza, you reviewed bug 964024, so perhaps you can review this one, too.
Attachment #8754859 - Flags: review?(honzab.moz)
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

Yes I think this should be fine
Attachment #8754859 - Flags: feedback?(mkmelin+mozilla) → feedback+
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

Wish there were resources to rewrite this class from scratch with mozilla::Tokenizer!

Forwarding to Valentin, which is now our URL expert.  For me looks good.
Attachment #8754859 - Flags: review?(honzab.moz) → review?(valentin.gosu)
Attachment #8754859 - Flags: review?(cbiesinger)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(ben.bucksch)
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

The change looks good. Thanks for adding a test.
r=valentin
Attachment #8754859 - Flags: review?(valentin.gosu) → review+
Keywords: checkin-needed
Dear sheriff please fix the reviewer in the checkin comment
"|" is not a valid character in the URL per RFC 3986. Why is this change OK?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(mozilla)
It is allowed per http://url.spec.whatwg.org/
Flags: needinfo?(valentin.gosu)
I'm just the guy who implemented what the others wanted ;-(
However, I looked at the link given above and "|" seems to be part of the "userinfo encode set".
Flags: needinfo?(mozilla)
Fixed reviewer in checkin comment. Carrying forward Valentin Gosu's review.
Attachment #8754859 - Attachment is obsolete: true
Attachment #8754859 - Flags: feedback?(ben.bucksch)
Attachment #8755412 - Flags: review+
I meant to say: Carrying forward Valentin Gosu's r+.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/01cd683a67a6
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/7238c61172c8
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/f960fa659d35
Terminating was intentional and per spec. Per RFC 2396, | is an invalid character in a URL.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm, Ben, where were you when I asked in May, see comment #2?
According to comment #27, the | can be included. Firefox accepts is, try the URL in comment #26.

What do you want to do? Back this out?
See comment 25:
Masatoshi Kimura [:emk] wrote:
> "|" is not a valid character in the URL per RFC 3986. Why is this change OK?

Valentin Gosu wrote:
> It is allowed per http://url.spec.whatwg.org/

WhatWG is not the standards body that manages URL; the IETF is.

Nonetheless, WhatWG agrees that | is not allowed. It's allowed only as part of the username/password (e.g. <ftp://myuser|domain:password@ftp.example.com/path>). The fact that it's part of the "userinfo encode set" and not the normal "default encode set" means that it's specifically *not* allowed outside username/password.

Thus, | is NOT allowed in the path or query parts, where they are sometimes uses. Such URLs are invalid.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Jorg K, sorry for not having responded. I don't real bugmail anymore, I get too much, sorry. Thanks for having asked me, and sorry that I didn't react. You may email me directly personally for questions like these where my input is needed.

As for what to do: There are arguments for both sides. There are some web services that use | in URLs, albeit very few. OTOH, URL specs forbid it.

Jorg K, I leave the decision up to you, whether to keep or remove the change. You are unbiased.
(In reply to Ben Bucksch (:BenB) from comment #39)
> See comment 25:
> Masatoshi Kimura [:emk] wrote:
> > "|" is not a valid character in the URL per RFC 3986. Why is this change OK?
> 
> Valentin Gosu wrote:
> > It is allowed per http://url.spec.whatwg.org/
> 
> WhatWG is not the standards body that manages URL; the IETF is.

Browsers implement the WHATWG standard. Even though there are sometimes differences in implementations, this is usually true.
I just checked and Chrome does correctly parse the URLs in comment 26. As does Firefox. And bugzilla "linkifies" it properly.

> Nonetheless, WhatWG agrees that | is not allowed. It's allowed only as part
> of the username/password (e.g.
> <ftp://myuser|domain:password@ftp.example.com/path>). The fact that it's
> part of the "userinfo encode set" and not the normal "default encode set"
> means that it's specifically *not* allowed outside username/password.

Actually, the fact that | is in the userinfo encode set means that a | must be percent encoded if included in the userinfo.
The fact that it isn't in the default encode set means that it does not need to be percent encoded if it appears in the path, query, hash.
This bug here is done. To remove the change, we'd need another bug.

Personally, I don't care. However, as you said, there may be some advantages in allowing the |. I've done a little research and it can also be used as a : replacement, for example:
file:///D|/Desktop/Bugzilla-down.png

So unless someone complains, I'd leave it as it is for now. Strangely also Bugzilla doesn't stop at the |, see: http://www.google.com|www.google.com.
> Bugzilla doesn't stop at the |, see: http://www.google.com|www.google.com.

Please don't use Bugzilla's linkifier as reference implementation. It's a simple regexp. The TB is a lot more complex and correct. I wrote the TB implementation based strictly on the specifications. Bugzilla's is not based on specs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: