Open Bug 437495 Opened 17 years ago Updated 1 year ago

mozTXTToHTMLConv: Spaces not retained in links enclosed between <...> (RFC2396E/RFC1738 url notations)

Categories

(Core :: Networking, defect, P5)

defect

People

(Reporter: tguyot, Unassigned)

Details

(Whiteboard: [necko-would-take])

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: version 2.0.0.14 (20080421)

When a plain-text message contains a link with spaces enclosed in less-than/greater-than characters, Thunderbird displays the link properly but strips the spaces when you click on it.

Reproducible: Always

Steps to Reproduce:
1. Send a plain-text (non-HTML) email to an address using Thunderbird. Include the following URL: <http://example.com/Url With Spaces/>
2. Open the email in Thunderbird. Notice that the full link is blue/underlined and shows the spaces properly.
3. Click on the link.

Actual Results:
Opens "http://example.com/UrlWithSpaces/" in the web browser.

Expected Results:
Should open "http://example.com/Url With Spaces/" in the web browser.

The link with stripped spaces also appears in the status bar when you hover over the link with the mouse.

I know this isn't the "proper" way to do this, and I should encode the link properly instead. I'm using it as a cheap workaround for a bug in an application, and it works in other clients. I noticed Thunderbird doesn't handle it properly, though. At the very least, the link should look the same in the email as in the status bar.
Confirmed on trunk. I think this is a text->HTML conversion problem, but I don't see which component to move the bug to, so I'm keeping it in the same component for now...
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Version: unspecified → Trunk
Moving to the same component as the mozTXTToHTMLConv tracking bug.
Blocks: 116842
Component: General → Networking
Product: Thunderbird → Core
QA Contact: general → networking
Summary: Spaces not retained in links enclosed between lt/gt (<...>) characters → mozTXTToHTMLConv: Spaces not retained in links enclosed between lt/gt (<...>) characters
Summary: mozTXTToHTMLConv: Spaces not retained in links enclosed between lt/gt (<...>) characters → mozTXTToHTMLConv: Spaces not retained in links enclosed between <...> (RFC2396E/RFC1738 url notations)
Attached patch: proposed fix
I can't think of a situation where the space should be stripped. In the cases without < >, a URL with spaces would never be recognized as a whole, and when one is, the space is obviously intended.
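A minimal sketch of what "keep the space" means for recognition, written in Python for illustration (mozTXTToHTMLConv itself is C++). The regular expression and the function name find_angle_urls are hypothetical, not the converter's actual code. It recognizes the two notations named in the bug summary, <http://...> and the RFC 1738 style <URL:http://...>, and returns the enclosed URL with its interior spaces intact. It deliberately handles single-line URLs only.

```python
import re

# Hypothetical pattern, not mozTXTToHTMLConv's recognizer. Matches
# "<scheme://...>" and the RFC 1738 style "<URL:scheme://...>".
# [^<>\n]* keeps the match on one line, so wrapped URLs are not handled here.
ANGLE_URL = re.compile(r"<(?:URL:)?([A-Za-z][A-Za-z0-9+.-]*://[^<>\n]*)>")


def find_angle_urls(text: str) -> list[str]:
    """Return each URL enclosed in <...> or <URL:...>, interior spaces kept."""
    # strip() only trims whitespace at the ends, e.g. in "<http://x/ >".
    return [m.group(1).strip() for m in ANGLE_URL.finditer(text)]
```

For the reporter's message, find_angle_urls("See <http://example.com/Url With Spaces/> now") returns ["http://example.com/Url With Spaces/"], so the href and the displayed text would agree.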
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #324094 - Flags: review?(ben.bucksch)
http://www.apps.ietf.org/rfc/rfc2396.html, Appendix E: "In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be added to break long URI across lines. The whitespace should be ignored when extracting the URI."

WONTFIX.
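For contrast, a sketch of the Appendix E rule quoted above, assuming that "ignored" means every whitespace character inside the brackets is dropped. The function name extract_uri_rfc2396e is hypothetical. Applied to the reporter's input, it produces the same URL as the "Actual Results" in the report.

```python
def extract_uri_rfc2396e(delimited: str) -> str:
    """Hypothetical helper: apply RFC 2396 Appendix E to a <...> URI."""
    inner = delimited.strip()
    # Remove the angle-bracket delimiters and an optional "URL:" prefix.
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    if inner.startswith("URL:"):
        inner = inner[len("URL:"):]
    # str.split() with no argument splits on runs of any whitespace
    # (spaces, tabs, newlines); joining the pieces discards all of it.
    return "".join(inner.split())
```

This rule is correct for a URI that was wrapped across lines, but it cannot tell wrapping whitespace apart from spaces the author meant to keep.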
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Attachment #324094 - Flags: review?(ben.bucksch) → review-
We don't recognize URLs with manual line breaks in them anyway, AFAICT.

Also from http://www.apps.ietf.org/rfc/rfc2396.html, Appendix E (a little further down): "Using <> angle brackets around each URI is especially recommended as a delimiting style for URI that contain whitespace." That applies exactly to this case: <http://example.com/Url With Spaces/>

It's just silly to show one URL as the link text and link to another. (The stripping I propose to remove is applied only to the href, not to the link text itself.)
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
> We don't recognize URLs with manual line breaks in them anyway AFAIKT.
The converter does. I took great care to support that. Only libmime can't, so far; that's bug 5351. (Fixing that would be really useful, but requires limited caching.) At the minimum, you need to strip linebreaks and whitespace before/after it.

> It's just silly having one text showing as the url, and linking to another URL.
No, not silly. What's shown to the user does not need to match what's used technically (e.g. escaping), nor what's in the message on the wire (e.g. smilies).

To reporter: Personally, I think it is really unwise to generate URLs with spaces in the first place (the spec explicitly disallows them in Section 2.4.3), and even more so to put them in plaintext mail. I can guarantee you that other software will break the URL as well. I know you said this is a workaround for a bug in some other software. How about filing a bug against *that* instead of us?
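A simplified sketch of the Section 2.4.3 point: list the characters RFC 2396 excludes from URIs. The function name and the exact character sets are a reading of that section, not code from any Mozilla component. "#" and "%" are deliberately not flagged, because they serve as the fragment delimiter and the escape prefix.

```python
# Simplified reading of RFC 2396 Section 2.4.3 ("Excluded US-ASCII Characters").
# "#" and "%" are also listed there but are used as the fragment delimiter and
# the escape prefix, so this sketch does not flag them.
DELIMS = set('<>"')
UNWISE = set('{}|\\^[]`')


def excluded_chars(uri: str) -> list[str]:
    """Return the characters of `uri` that may not appear unescaped."""
    bad = []
    for ch in uri:
        code = ord(ch)
        # 0x00-0x1F are controls, 0x20 is space, 0x7F and above are DEL/non-ASCII.
        if code <= 0x20 or code >= 0x7F or ch in DELIMS or ch in UNWISE:
            bad.append(ch)
    return bad
```

Run on the reporter's URL, it flags the two spaces; the percent-encoded form passes cleanly.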
I don't think that's a fair comparison. Of course it's OK to encode the URL; it would still go to the same place, which is the essential part. And yes, I agree that the sending app should avoid spaces in URLs.

> At the minimum, you need to strip linebreaks and whitespace before/after it.
We could easily strip any line breaks (and tabs) inside it. But when would the whitespace before/after have to be stripped? When would the recognizer ever include leading or trailing spaces?
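Encoding, rather than stripping, keeps the destination intact. A minimal sketch using Python's urllib.parse.quote and html.escape. The helper names href_for and anchor are hypothetical, and treating the RFC 3986 reserved characters plus "%" as safe is an assumption, chosen so that existing escapes are not encoded twice. The visible link text stays as written, while the href carries %20 for each space.

```python
import html
from urllib.parse import quote

# RFC 3986 reserved characters plus "%" (so existing escapes such as "%20"
# are not encoded twice). Everything else outside the unreserved set,
# including the space, is percent-encoded as UTF-8.
SAFE = ":/?#[]@!$&'()*+,;=%"


def href_for(url: str) -> str:
    """Hypothetical helper: percent-encode a URL for use in an href."""
    return quote(url, safe=SAFE)


def anchor(url: str) -> str:
    """Hypothetical helper: link text as written, href percent-encoded."""
    return '<a href="{}">{}</a>'.format(html.escape(href_for(url)), html.escape(url))
```

anchor("http://example.com/Url With Spaces/") yields a link that displays the spaces but points at http://example.com/Url%20With%20Spaces/, which is the same resource.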
> when would the whitespace before/after [linebreaks] have to be stripped?
http://www.apps.ietf.org/rfc/rfc2396.html#sec-E: "In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be added to break long URI across lines. The whitespace should be ignored when extracting the URI."
It is very common to have text indented, and the sending software will then add the indentation to the broken URL, too. (RFC822 even *requires* adding spaces at the start of the new, continuing line for headers. Headers are mostly irrelevant here, unless an email source is pasted, but the idea is common.)
@Ben I reported the bug (with a fix) to the other software, and it was fixed in a timely manner. Note that in this case I control line wrapping, so multi-line URLs aren't really an issue (except in quoted replies...).

Even if you don't want to fix it, if the URL is detected, it should look the same in the message body as in the status bar, and the same link should be sent to the browser. We're not talking about HTML links, which can have different text and target; the email is plaintext, and the same URL is shown differently in different places.

The fix above looks OK to me, though if you expect to ever handle multi-line strings in that function, then you should:
1. split on newlines
2. strip the start and end of each link (like Python's str.strip())
3. join the lines back together, with or without a single space*

(*) With a space, automatic line wrapping at a space is handled correctly, which IMHO will be the majority of use cases. On the other hand, considering that unencoded spaces are silly, joining without a space makes any decent URL that was accidentally word-wrapped work again. I can't even tell which is the saner way to handle line breaks (isn't that silly anyway?), but either way the logic above wouldn't break my use case: a single-line URL.
Correction (sorry for the spam):
2. strip the start and end of each *line* (like Python's str.strip())
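The three steps, with the corrected step 2, as a minimal Python sketch. The function name join_wrapped_url and the keep_space flag are hypothetical; the flag selects between the two joining options discussed in the footnote. Spaces inside a single line are never touched, so a single-line URL passes through unchanged.

```python
def join_wrapped_url(raw: str, keep_space: bool = False) -> str:
    """Hypothetical helper: reassemble a possibly line-wrapped URL."""
    # 1. split on newlines (splitlines() also handles "\r\n")
    lines = raw.splitlines()
    # 2. strip the start and end of each *line*; inner spaces are kept
    lines = [line.strip() for line in lines]
    # 3. join the lines back together, with or without a single space
    return (" " if keep_space else "").join(lines)
```

With keep_space=True, "http://example.com/Url With\n    Spaces/" becomes "http://example.com/Url With Spaces/"; with the default it becomes "http://example.com/Url WithSpaces/", which shows the trade-off the footnote describes.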
Thomas: Yes, if we fix this bug, we need to continue to strip whitespace around linebreaks.
Whiteboard: [necko-would-take]
Priority: -- → P5
Severity: normal → S3
Severity: S3 → S4
Status: REOPENED → NEW
Assignee: mkmelin+mozilla → nobody