Closed Bug 1707360 Opened 4 years ago Closed 2 years ago

Links with long URL containing space break in sent message due to added line break and indent spaces inside href value in source

Categories

(MailNews Core :: MIME, defect, P2)

Tracking

(thunderbird_esr78+ affected, thunderbird_esr91 affected, thunderbird_esr102 affected, thunderbird91 affected)

RESOLVED FIXED
109 Branch
Tracking Status
thunderbird_esr78 + affected
thunderbird_esr91 --- affected
thunderbird_esr102 --- affected
thunderbird91 --- affected

People

(Reporter: w.schuetz, Unassigned)

References

Details

(Keywords: dataloss, ux-implementation-level, Whiteboard: [fixed by bug 1563452])

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0

Steps to reproduce:

I tried to include a link to a .PDF by means of Insert 'Link...' . If the URL of the target .PDF-file contains blanks Thunderbird will insert spurious blanks
https://www.124spiderclub.ch/veranstaltungen/2021/124 Spiderclub Einladung Jubilaeum Wallis.pdf

The same problem occurs if the blanks are coded with a '%20':
https://www.124spiderclub.ch/veranstaltungen/2021/124%20Spiderclub%20Einladung%20Jubilaeum%20Wallis.pdf

Actual results:

Thunderbird will replace the first blank with several blanks while transmitting the mail:
https://www.124spiderclub.ch/veranstaltungen/2021/124%20%20%20%20%20%20%20%20Spiderclub%20Einladung%20Jubilaeum%20Wallis.pdf

This is very cumbersome, as you will not realize this until the recipients complain about the broken link!

Expected results:

There should be only 1 (one ) blank:
https://www.124spiderclub.ch/veranstaltungen/2021/124%20Spiderclub%20Einladung%20Jubilaeum%20Wallis.pdf

Vielen Dank, WaltiS!

Confirming exactly as reported, tested on 78.10.0 (32-bit).
Indeed, links with long URLs having spaces are seriously broken - cunning!
From an end user perspective, also wearing my enterprise hat, I consider this a serious UX fail for an everyday scenario (P2/S2).

Reduced STR:

  1. Compose HTML message
  2. Insert > Link (Ctrl+K), Link Text: long test link, Link Location (long URL with space):
    https://www.124spiderclub.ch/veranstaltungen/2021/124 Spiderclub Einladung Jubilaeum Wallis.pdf
  3. For comparison, short URL:
    Insert > Link (Ctrl+K), link text: test link, Link Location (short URL with space):
    http://www.example.com/foo bar.pdf
  4. Send Later (Ctrl+Shift+Enter)
  5. Check sent msg source in Outbox

Actual result:

  • long URL with space fails (due to source formatting: href split up with line break and indent)
    <a moz-do-not-send="true"
      href="https://www.124spiderclub.ch/veranstaltungen/2021/124
      Spiderclub Einladung Jubilaeum Wallis.pdf">long test link</a><br>
    <br>
  • short URL with space works (href on a single line)
    <a moz-do-not-send="true" href="http://www.example.com/foo bar.pdf">test
      link</a><br>
    <br>

Expected result

  • link with long URL with space must be preserved correctly so that it renders correctly for message recipient

Analysis

Apparently the reason is that Thunderbird will format the source with indents and tries to keep line length at average, then the href="foo bar" attribute is broken across two lines and indented without escaping or otherwise indicating that there's a space which still belongs to the URL.
The erroneous extra spaces are exactly the 6 indent spaces after the linebreak inside the href attribute.

Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Summary: Blanks in URL wrongly coded in 'Link...' → Links with long URL containing space break in sent message due to added line break and indent spaces inside href value in source

Testcase attached - message from outbox after send later.

Component: Untriaged → MIME
Product: Thunderbird → MailNews Core

Lasana, please have a look.
Probably in https://searchfox.org/comm-central/rev/6dbfd83437bab124d39332c768a5f55a36e7a196/mail/components/compose/content/dialogs/EdLinkProps.js#267 - we should make sure it's encoded. Probably decode+encode again to make sure already encoded can also be inserted.

Assignee: nobody → lasana

Duplicate of bug 1563452. M-C problem in the serialiser. You could work around it by encoding the space.

This does indeed appear to be a duplicate of bug 1563452. The extra encoded spaces do not occur in the editor, they happen when the message is serialized for sending due to that bug.

I don't see much we can do here maybe try not using the nsIDocumentEncoder::OutputFormatted flag?
https://searchfox.org/comm-central/source/mailnews/compose/src/nsMsgCompose.cpp#1099

I don't think decoding and encoding will have an effect because setting an anchor's href seems to ensure the value is properly encoded regardless.

Thomas note: I don't think this has to do with the url length. Try adding a link to "http://www.example.com/foo ba r.pdf" and you will get the same result. See bug 1563452 comment 14.

Flags: needinfo?(mkmelin+mozilla)

Sending HTML unformatted is not a good idea, that will cause very long lines in the HTML part and potentially base64 encoding. What's wrong with changing href="http://www.example.com/foo bar.pdf" to href="http://www.example.com/foo%20bar.pdf". That can be achieved in the editor.

Per previous comments, we should just make sure to encode the URL when we insert it, so it doesn't have spaces to begin with.
This is basically a UI layer change. Before using the URL the user entered, make sure it's encoded: url = encodeURI(decodeURI(url));

Flags: needinfo?(mkmelin+mozilla)

Seems like I'm missing something here. In the console I did:

let a = document.createElement("a");
a.setAttribute("href", "https://www.124spiderclub.ch/veranstaltungen/2021/124 Spiderclub Einladung Jubilaeum Wallis.pdf");

a.href === encodeURI(decodeURI("https://www.124spiderclub.ch/veranstaltungen/2021/124 Spiderclub Einladung Jubilaeum Wallis.pdf"))
// true

What's going on here?

Status: NEW → ASSIGNED

encodeURI/decodeURI seems to be a way to avoid the problem. But only if you are aware of it! In my case I received an Email with the link, which was perfectly clickable in Thunderbird.

Then I forwarded the message and the recipient complained about the broken link... So I think this really needs to be fixed!

(In reply to Lasana Murray from comment #9)

What's going on here?

That's because you're looking at the href property, not the attribute. href property is what the browser will actually use.

(In reply to WaltiS from comment #10)

Then I forwarded the message and the recipient complained about the broken link... So I think this really needs to be fixed!

All links in the document could be processed after the editor has loaded in the compose window, maybe at the end of this function:
https://searchfox.org/comm-central/rev/7ba8d9953373770b53c0606c49b26d2dfa5c60d9/mail/components/compose/content/MsgComposeCommands.js#9199
You can get them with editor.document.querySelectorAll("a").

All links in the document could be processed after the editor has loaded in the compose window, maybe at the end of this function: ...
I assume this has been erroneously linked to comment #10. Doesn't make any sense...

That would fix forward, but I think someone mentioned dnd of links into the compose window as well.
Seems we might want to check bug 1563452 to get it fixed properly.

(In reply to Magnus Melin [:mkmelin] from comment #14)

That would fix forward, but I think someone mentioned dnd of links into the compose window as well.
Seems we might want to check bug 1563452 to get it fixed properly.

Technically it might fix forwarding. But in reality you will not realize, that you forward something that will arrive broken. So this is not really helpful.

Looks like you have three choices:

  1. Fix bug 1563452.
  2. Percent-encode all URLs in the composer. That means those which are inserted from the a) original message during reply/forward/"edit as new" (see comment #12), b) the ones added by the user (attached patch) and c) the ones added via drag&drop or copy&paste.
  3. Percent-encode all URLs when sending before serialising. There's already code that processes the links: https://searchfox.org/comm-central/rev/7ba8d9953373770b53c0606c49b26d2dfa5c60d9/mailnews/compose/src/MessageSend.jsm#1127
Target Milestone: --- → 90 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/4320b8c7a11a
Properly encode url before setting link href in editor. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Could be good to uplift 78. Please ask approval

Comment on attachment 9222620 [details]
Bug 1707360 - Properly encode url before setting link href in editor. r?mkmelin

[Approval Request Comment]
Regression caused by (bug #): ?
User impact if declined: User-inserted links having unescaped spaces in the URL can break when sending
Testing completed (on c-c, etc.): yes
Risk to taking this patch (and alternatives if risky): near zero (straightforward one-line change)

Attachment #9222620 - Flags: approval-comm-esr78?
Blocks: 1716063

Comment on attachment 9222620 [details]
Bug 1707360 - Properly encode url before setting link href in editor. r?mkmelin

[Triage Comment]
Approved for esr78

Attachment #9222620 - Flags: approval-comm-esr78? → approval-comm-esr78+
Blocks: 1722374
No longer blocks: 1722374
Regressions: 1722374
Regressions: 1722079

Can that please be backed out from all branches, it's piling up regressions.

Flags: needinfo?(rob)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(vseerror)

Can that please be backed out [?]

Thanks for the question.

The questions that come to mind for me, before considering backout at this time, are

  1. For the regression, what is the volume of reporting in support venues? (because I'm not sure two bug reports, two weeks after release is significant) And from that can we estimate how widely users might be affected
  2. For the regression, are there examples of where the reported impact to the user is severe?
  3. For the new improved behavior, how large is the value to the user?
  4. For the new improved behavior, is better for more users to not have it (for now), so that some users don't get the regression behavior? (And we can't assume the real fix in bug 1563452 will happen any time soon - perhaps not even in version 91)

Surely, the first two items can be at least approximated - if users are seeing it then they should be reporting it. Answering the last two may be speculation.

Flags: needinfo?(vseerror)
Flags: needinfo?(unicorn.consulting)
Flags: needinfo?(emoore)
Flags: needinfo?(anjeyelf)

(In reply to Wayne Mery (:wsmwk) from comment #23)

  1. For the regression, are there examples of where the reported impact to the user is severe?

Both bug 1722374 and bug 1722079 are severe. Users report that after processing links in TB, they can't be opened any more. The links in question are links to commercial sites, like https://bookingpremium.secureholiday.net, which contain some internal data, like a session ID, encoded with % which is now destroyed. TB 78.12 shipped with this fault on 13th July and you had two reports within two weeks of release. That's a lot more than the people that complained about the original issue about spaces in links. For the spaces in links there is a workaround, you can simply not insert them and use %20 instead. For the current fault, there is no workaround.

Besides, based on software correctness, the band-aid implemented here is just wrong, and as we see it's even destructive. The real fix is in bug 1563452 which even has a fix and an NI to bring this to conclusion.

(In reply to Tom from comment #24)

(In reply to Wayne Mery (:wsmwk) from comment #23)

  1. For the regression, are there examples of where the reported impact to the user is severe?

Both bug 1722374 and bug 1722079 are severe.

There's only a single regression here, these two bugs are duplicates of each other.
That said, yes, the regression is severe, harder to notice and impossible to work around, which looks worse and much more frequent than the original problem of adding spaces into a link if it happens to be broken across lines.

Users report that after processing links in TB, they can't be opened any more. The links in question are links to commercial sites, like https://bookingpremium.secureholiday.net, which contain some internal data, like a session ID, encoded with % which is now destroyed. TB 78.12 shipped with this fault on 13th July and you had two reports within two weeks of release. That's a lot more than the people that complained about the original issue about spaces in links. For the spaces in links there is a workaround, you can simply not insert them and use %20 instead. For the current fault, there is no workaround.

+1

Besides, based on software correctness, the band-aid implemented here is just wrong

Although it looks right at first sight...
I don't fully understand why this fails for cases like href="...%3D":
globalElement.setAttribute("href", encodeURI(decodeURI(href)));
If decodeURI would decode "%3D" to "=" (which it does not), then encodeURI could re-encode that to "%3D" (which it does not, either) - no bug!

Why does decodeURI("%3D") return "%3D" instead of "="?
For comparison: decodeURI("%20") will return " " (space character)

And why does encodeURI("=") return "=" instead of "%3D"?
For comparison: decodeURI(" ") will return "%20".

The real fix is in bug 1563452 which even has a fix and an NI to bring this to conclusion.

Sure, the patch currently attached to bug 1563452 looks straightforward and would be preferable as a fix, especially because it's already available, and because not touching the href of our links will always be safer than touching it...

No longer regressions: 1722079
See Also: → 1722079

This is all rather confusing. Apart from de/encodeURI() there are also de/encodeURIComponent() and decodeURIComponent("%3D") gives the equal sign. More reading:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURI
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent

A very simple fix would be s/ /%20/g to address the space issue. I agree that leaving the links the way the user entered them and just not breaking that when serializing them out, is possibly best. If the "fix" here causes problems for 78 ESR users, it should be backed out until a new fix is available. The same goes for beta since that will be the next ESR in about a week.

Let's control the damage first (regression: bug 1722374) by backing this out.
Bug 1722374 is much worse (more obscure, impossible to work around) and certainly more frequent than this bug.

Also needs backout on beta.

Attachment #9234147 - Flags: review?(geoff)
Attachment #9234147 - Attachment is patch: true
Status: RESOLVED → REOPENED
Flags: needinfo?(rob)
Resolution: FIXED → ---

Comment on attachment 9234147 [details] [diff] [review]
1707360_linkUrlBackout.patch (Daily 92.0a1 (2021-07-31))

You don't need a review to back something out, just somebody to do the backing out.

Attachment #9234147 - Flags: review?(geoff)
Attachment #9222620 - Flags: approval-comm-esr78+

Can we close this as won't fix in favor of the upstream patch?

(In reply to Lasana Murray from comment #32)

Can we close this as won't fix in favor of the upstream patch?

I would not recommend that because it's still a worthwhile bug which affects Thunderbird, so we should probably continue to track it here until it will be fixed either here or there. We don't have much control over the upstream bug, and we don't know when it is going to reach Thunderbird.

Depends on: 1563452
Flags: needinfo?(unicorn.consulting)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(emoore)
Flags: needinfo?(anjeyelf)
Assignee: lasana → nobody
Whiteboard: [enterprise-relevance]
Whiteboard: [enterprise-relevance] → [enterprise-relevance][BenC assigned to core bug 1563452 which will fix this]
Status: REOPENED → RESOLVED
Closed: 4 years ago2 years ago
Resolution: --- → FIXED
Whiteboard: [enterprise-relevance][BenC assigned to core bug 1563452 which will fix this] → [fixed by bug 1563452]
Target Milestone: 90 Branch → 109 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: