Linkify stops at |

RESOLVED FIXED in Firefox 49

Status

()

Core
Networking
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: Joseph, Assigned: Jorg K (GMT+2))

Tracking

45 Branch
mozilla49
Points:
---

Firefox Tracking Flags

(firefox49 fixed, thunderbird_esr38 wontfix, thunderbird_esr4547+ fixed)

Details

(Whiteboard: [necko-would-take])

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

a year ago
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160407164938

Steps to reproduce:

Tried to send this link in an email.  This used to work fine until I updated my Thunderbird.

https://mappingsupport.com/p/gmap4.php?label=on&t=s&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||label=on||line=on||19.441526,-154.912231^1||19.441529,-154.910222^2||19.441601,-154.910042^3||19.441665,-154.909876^4||19.441819,-154.909271^5||19.441856,-154.909132^6||19.439465,-154.909884^7||19.440006,-154.911529^8||19.441066,-154.91143^9||19.441242,-154.912498^10||19.441526,-154.912231^1


Actual results:

Thunderbird only recognized the first part of the link as shown below.  Thunderbird chopped off the link when it got to the first || characters.

https://mappingsupport.com/p/gmap4.php?label=on&t=s&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate


Expected results:

A zoomed in Google aerial should appear along with a red polygon and corner numbers 1 through 10.  Copy the entire link and paste it into a browser and you will see this.

I am the developer of the Gmap4 software that is displaying the map.
(Assignee)

Comment 1

a year ago
If you copy your link into a message as plain text (<ctrl><shift>V) then the "linkify" will only detect a link until the first ||.

This was so in TB 38 and is still so in TB 45. So the behaviour has *not* changed.

However, if you insert/paste a HTML link into a HTML composition, then the link will be transmitted in full.

Reporter, since you're saying that the behaviour changed, have you changed your behaviour?

For reference: The linkify code is in M-C core here:
https://dxr.mozilla.org/comm-central/source/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#311

Looks like we should just remove the '|' from the test.
Summary: Link in email body is broken if the link contains || → Linkify stops at |
(Assignee)

Updated

a year ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Updated

a year ago
Component: Untriaged → Networking
Product: Thunderbird → Core
(Assignee)

Comment 2

a year ago
Ben, why are you stopping at a "|"?
Flags: needinfo?(ben.bucksch)
(Assignee)

Comment 3

a year ago
Blame shows that the "|" test has been there for a long time but that Magnus worked in this area in bug 223681. Magnus, do you have an opinion here?
Flags: needinfo?(mkmelin+mozilla)
Whiteboard: [necko-would-take]
(Reporter)

Comment 4

a year ago
Jorg,

Thank you for looking at this.
Yes, I changed my behavior in that I upgraded to the most recent version of Thunderbird.
Not sure what version I was using before.

I just tried pasting the following link into an html composition window.

http://www.mappingsupport.com/p/gmap4.php?t=Nova_Scotia_aerial&label=on&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||line=on||44.810123,-62.871949^1||44.807591,-62.874422^2||44.806832,-62.874392^3||44.807001,-62.874934^4||44.807466,-62.874531^5||44.807781,-62.875128^6||44.80779,-62.875145^7||44.807846,-62.875091^8||44.810321,-62.872686^9&rest=https://nsgiwa.novascotia.ca/arcgis/rest/services/BASE/NSODB_10k_WebMercator_WGS84/MapServer?name=Nova_Scotia_aerial&layers=0&transparent=false

TB made a clickable link that stopped at the first ||

In the previous version of TB I was using links like this worked fine.
(Assignee)

Comment 5

a year ago
(In reply to Joseph from comment #4)
> Yes, I changed my behavior in that I upgraded to the most recent version of
> Thunderbird.
That's not what I was talking about.

In TB you have two types of messages: Plain text and HTML.

In plain text you can only insert text. So if you paste a link, www.mickeymoouse.com, that text will be transmitted. However, we have some smarts in the box that will show a clickable link when displaying a text-only link in a plain text message. These smarts are called "linkify", they haven't changed in many versions of TB, and yes, link recognition stops at the first "|" found. That has always been that way.

The second option is HTML. If you insert/paste a link into HTML, it will be transmitted as is. If, however, you just paste the textual representation of a link into a HTML message, the same smarts will recognise the link in the text and send a real link <a href="http://www.mickeymoouse.com">www.mickeymoouse.com</a> instead. That uses the same code for the linkify and again, none of this has changed for a while.

When you say "I just tried pasting the following link into an html composition window". Did the link show in blue? Then a link was inserted. If it showed in black, then text was inserted that would later be linkified but only to the first "|".

I suggest to use the TB function "Insert Link" to make sure you insert a link to start with and not rely on "linkify".

While we already identified the code that stops at the "|", I can't classify the bug as "regression", since I don't see that this ever worked differently.
(Assignee)

Comment 6

a year ago
Created attachment 8754827 [details] [diff] [review]
Remove | from characters that will terminate URL.

Hi Patrick, can you please advise me on how to run the test manually. Sorry, I usually work in an area where we use mochitests for everything.
Attachment #8754827 - Flags: feedback?(mcmanus)
what test are you asking about?
(Reporter)

Comment 8

a year ago
I am not trying to depend on linkify.

I email reports to clients.
For the last couple of years my standard practice has been to paste plain text into the body of an email. 
Then I highlight the long link and click Insert ==> Link.
The long link turns all blue.
Then I send the email.

I am testing by sending from one of my accounts to another account.
When I open the email there is a blue link that stops at the first |.
If I look in my ‘sent’ folder I also see a blue link that stops at the first |.

Below is the start of a sample report to a client
-------------------------------
Hi John,

This email completes your order from PropertyLineMaps.  Please keep in mind that these coordinates and property lines are approximate and are not survey-grade.

Below is the link that starts the Gmap4 software and displays your map.  This link uses our exclusive map-in-a-link technology.  If you click/touch a corner, then you will see a popup that has the corner number and the approximate coordinates.

If your email program does not display the map when you click this link, then simply copy the entire link and paste it into your browser address bar.

http://www.mappingsupport.com/p/gmap4.php?t=Nova_Scotia_aerial&label=on&tilt=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||line=on||44.810123,-62.871949^1||44.807591,-62.874422^2||44.806832,-62.874392^3||44.807001,-62.874934^4||44.807466,-62.874531^5||44.807781,-62.875128^6||44.80779,-62.875145^7||44.807846,-62.875091^8||44.810321,-62.872686^9

The above link will work in most browsers and on most devices, from smartphones to computers.
.....
(Assignee)

Comment 9

a year ago
(In reply to Patrick McManus [:mcmanus] from comment #7)
> what test are you asking about?
The one modified in the patch with f? - test_mozTXTToHTMLConv.js ;-)
./mach xpcshell-test --verbose netwerk/test/unit/TESTNAME.js
Comment on attachment 8754827 [details] [diff] [review]
Remove | from characters that will terminate URL.

Review of attachment 8754827 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have any expertise to provide feedback on the change
Attachment #8754827 - Flags: feedback?(mcmanus)
(Assignee)

Comment 12

a year ago
(In reply to Joseph from comment #8)
Finally I was able to reproduce the problem.

STR (steps to reproduce):
- paste plain text link into the body of an email. 
- highlight the link and click Insert ==> Link.
- The long link turns all blue.
- Send the email.

In TB 31 the long link is sent. In TB 38 and TB 45 the long link is broken at the first |.
So this is a regression, but it already didn't work in TB 38.
Can you please check which version you used before.

Take any message sent before the upgrade and look at the message source:
View > Message Source.
Then look at the User-Agent header. That will say Thunderbird/nn.n
(Assignee)

Comment 13

a year ago
(In reply to Patrick McManus [:mcmanus] from comment #11)
> I don't have any expertise to provide feedback on the change
So who does? Bug 223681 also in the area was reviewed by Ben Bucksch and "cbiesinger" (Christian Biesinger, peer emeritus).
You are the module owner, so can you please recommend a peer:
Jason Duell, Honza Bambas, Michal Novotny, Nick Hurley, Dragana Damjanovic, Valentin Gosu, Daniel Stenberg
Flags: needinfo?(mcmanus)
(Reporter)

Comment 14

a year ago
Looks like I was using TB 31

User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
if you have someone familiar enough with the code to review your patch I'm happy to delegate the review.
Flags: needinfo?(mcmanus)
(Assignee)

Comment 16

a year ago
Well, that explains the "change in behaviour". As I said in comment #12: TB 31 worked, TB 38 and TB 45 already did not. I'll see whether the the change I proposed here fixes anything. Otherwise I need to go digging when that broke.
(Assignee)

Comment 17

a year ago
Created attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Fixed the test. Now it works:
./mach xpcshell-test --verbose netwerk/test/unit/test_mozTXTToHTMLConv.js
Attachment #8754827 - Attachment is obsolete: true
Attachment #8754859 - Flags: review?(cbiesinger)
Attachment #8754859 - Flags: feedback?(mkmelin+mozilla)
Attachment #8754859 - Flags: feedback?(ben.bucksch)
(Assignee)

Comment 18

a year ago
Joseph, I moved your original issue here, bug 1274602.
(Reporter)

Comment 19

a year ago
Got it.
Thanks for all the help
(Assignee)

Updated

a year ago
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
(Assignee)

Comment 20

a year ago
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Hi Honza, you reviewed bug 964024, so perhaps you can review this one, too.
Attachment #8754859 - Flags: review?(honzab.moz)
(Assignee)

Updated

a year ago
status-thunderbird_esr38: --- → affected
status-thunderbird_esr45: --- → affected
tracking-thunderbird_esr45: --- → ?

Comment 21

a year ago
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

Yes I think this should be fine
Attachment #8754859 - Flags: feedback?(mkmelin+mozilla) → feedback+
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

Wish there were resources to rewrite this class from scratch with mozilla::Tokenizer!

Forwarding to Valentin, which is now our URL expert.  For me looks good.
Attachment #8754859 - Flags: review?(honzab.moz) → review?(valentin.gosu)
(Assignee)

Updated

a year ago
Attachment #8754859 - Flags: review?(cbiesinger)
(Assignee)

Updated

a year ago
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(ben.bucksch)
Comment on attachment 8754859 [details] [diff] [review]
Remove | from characters that will terminate URL.  Test (v2).

Review of attachment 8754859 [details] [diff] [review]:
-----------------------------------------------------------------

The change looks good. Thanks for adding a test.
r=valentin
Attachment #8754859 - Flags: review?(valentin.gosu) → review+
(Assignee)

Updated

a year ago
Keywords: checkin-needed
(Assignee)

Comment 24

a year ago
Dear sheriff please fix the reviewer in the checkin comment
"|" is not a valid character in the URL per RFC 3986. Why is this change OK?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(mozilla)
(Reporter)

Comment 26

a year ago
@ Masatoshi

I am the reporter.  I email map links to clients.  Below is an example.  The | characters you see in the link are not part of the URL itself but instead are part of the parameter data that is passed to the php script.

http://www.mappingsupport.com/p/gmap4.php?t=s&label=on&tilt=off&crosshair=off&markers=//Not_a_survey___Coordinates_are_approximate||description=plm2||label=on||line=on||42.325327,-71.865003^1||42.325233,-71.864631^2||42.325191,-71.864073^3||42.325393,-71.862163^4||42.325067,-71.862029^5||42.324468,-71.861804^6||42.323729,-71.861491^7||42.323084,-71.861241^8||42.323047,-71.862781^9||42.323048,-71.862936^10||42.323057,-71.863428^11||42.323082,-71.864069^12||42.323196,-71.864156^13||42.3234,-71.864283^14||42.323812,-71.86448^15||42.323841,-71.864488^16||42.324075,-71.863215^17||42.324471,-71.863247^18||42.324545,-71.863254^19||42.324757,-71.863258^20||42.325044,-71.863941^21||42.32507,-71.864278^22||42.324937,-71.864872^23||42.32524,-71.864943^24||42.325285,-71.864979^25||42.325327,-71.865003^1||line=on||42.324468,-71.861804^6||42.324075,-71.863215^17
It is allowed per http://url.spec.whatwg.org/
Flags: needinfo?(valentin.gosu)
(Assignee)

Comment 28

a year ago
I'm just the guy who implemented what the others wanted ;-(
However, I looked at the link given above and "|" seems to be part of the "userinfo encode set".
Flags: needinfo?(mozilla)
(Assignee)

Comment 29

a year ago
Created attachment 8755412 [details] [diff] [review]
Remove | from characters that will terminate URL. Test (v2).

Fixed reviewer in checkin comment. Carrying forward Valentin Gosu's review.
Attachment #8754859 - Attachment is obsolete: true
Attachment #8754859 - Flags: feedback?(ben.bucksch)
Attachment #8755412 - Flags: review+
(Assignee)

Comment 30

a year ago
I meant to say: Carrying forward Valentin Gosu's r+.

Comment 31

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/01cd683a67a6298e5adc628c9890662514ffad35
Bug 1274242 - Linkify should not stop at |. r=valentin

Updated

a year ago
Keywords: checkin-needed

Comment 32

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/01cd683a67a6
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Pushed to a release branch on mozilla-beta for Thunderbird 47.0b1
Pushed http://hg.mozilla.org/releases/mozilla-esr45/rev/62b9dc5ed08d to THUNDERBIRD_45_VERBRANCH

Updated

a year ago
status-thunderbird_esr38: affected → wontfix
status-thunderbird_esr45: affected → fixed
tracking-thunderbird_esr45: ? → 47+
(Assignee)

Comment 35

a year ago
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1
https://hg.mozilla.org/releases/mozilla-beta/rev/7238c61172c8
(Assignee)

Comment 36

a year ago
Pushed to a release branch on mozilla-beta for Thunderbird 48.0b1 - take 2:
https://hg.mozilla.org/releases/mozilla-beta/rev/f960fa659d35
Terminating was intentional and per spec. Per RFC 2396, | is an invalid character in a URL.

Updated

a year ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 38

a year ago
Hmm, Ben, where were you when I asked in May, see comment #2?
According to comment #27, the | can be included. Firefox accepts is, try the URL in comment #26.

What do you want to do? Back this out?
See comment 25:
Masatoshi Kimura [:emk] wrote:
> "|" is not a valid character in the URL per RFC 3986. Why is this change OK?

Valentin Gosu wrote:
> It is allowed per http://url.spec.whatwg.org/

WhatWG is not the standards body that manages URL; the IETF is.

Nonetheless, WhatWG agrees that | is not allowed. It's allowed only as part of the username/password (e.g. <ftp://myuser|domain:password@ftp.example.com/path>). The fact that it's part of the "userinfo encode set" and not the normal "default encode set" means that it's specifically *not* allowed outside username/password.

Thus, | is NOT allowed in the path or query parts, where they are sometimes uses. Such URLs are invalid.
Status: REOPENED → RESOLVED
Last Resolved: a year agoa year ago
Resolution: --- → FIXED
Jorg K, sorry for not having responded. I don't real bugmail anymore, I get too much, sorry. Thanks for having asked me, and sorry that I didn't react. You may email me directly personally for questions like these where my input is needed.

As for what to do: There are arguments for both sides. There are some web services that use | in URLs, albeit very few. OTOH, URL specs forbid it.

Jorg K, I leave the decision up to you, whether to keep or remove the change. You are unbiased.
(In reply to Ben Bucksch (:BenB) from comment #39)
> See comment 25:
> Masatoshi Kimura [:emk] wrote:
> > "|" is not a valid character in the URL per RFC 3986. Why is this change OK?
> 
> Valentin Gosu wrote:
> > It is allowed per http://url.spec.whatwg.org/
> 
> WhatWG is not the standards body that manages URL; the IETF is.

Browsers implement the WHATWG standard. Even though there are sometimes differences in implementations, this is usually true.
I just checked and Chrome does correctly parse the URLs in comment 26. As does Firefox. And bugzilla "linkifies" it properly.

> Nonetheless, WhatWG agrees that | is not allowed. It's allowed only as part
> of the username/password (e.g.
> <ftp://myuser|domain:password@ftp.example.com/path>). The fact that it's
> part of the "userinfo encode set" and not the normal "default encode set"
> means that it's specifically *not* allowed outside username/password.

Actually, the fact that | is in the userinfo encode set means that a | must be percent encoded if included in the userinfo.
The fact that it isn't in the default encode set means that it does not need to be percent encoded if it appears in the path, query, hash.
(Assignee)

Comment 42

a year ago
This bug here is done. To remove the change, we'd need another bug.

Personally, I don't care. However, as you said, there may be some advantages in allowing the |. I've done a little research and it can also be used as a : replacement, for example:
file:///D|/Desktop/Bugzilla-down.png

So unless someone complains, I'd leave it as it is for now. Strangely also Bugzilla doesn't stop at the |, see: http://www.google.com|www.google.com.
> Bugzilla doesn't stop at the |, see: http://www.google.com|www.google.com.

Please don't use Bugzilla's linkifier as reference implementation. It's a simple regexp. The TB is a lot more complex and correct. I wrote the TB implementation based strictly on the specifications. Bugzilla's is not based on specs.
You need to log in before you can comment on or make changes to this bug.