Open Bug 1781264 Opened 2 years ago Updated 24 days ago

Encoded Umlauts %C3%BC are passed as decoded umlauts ü when passing different protocol URLs to external handlers on windows (e.g. mailto, custom schemes)

Categories

(Firefox :: File Handling, defect, P3)

Firefox 91
x86_64
Windows
defect

People

(Reporter: andrej.wolkow, Unassigned)

References

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(2 files)

Steps to reproduce:

Sending a mailto request with percent-encoded umlauts decodes the umlauts.

<html>
<body>
<a
href="mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aush%C3%A4ndigung%20&body=
%C3%BC
%C3%A4
%C3%B6">MailTo</a><br/>
</body>
</html>

Also tested with the current release of Firefox.

Actual results:

Umlauts are decoded in the mailto call made by Firefox:
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aushändigung%20&body=üäö"

Expected results:

With MS Edge and Chrome, the umlauts remain encoded:
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aush%C3%A4ndigung%20&body=%C3%BC%C3%A4%C3%B6"
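The difference between the two command lines is only whether the UTF-8 percent-escapes were decoded. A minimal sketch of that round trip, using the standard `decodeURIComponent` and `encodeURIComponent` functions purely for illustration (this is not the code Firefox runs):

```javascript
// Percent-encoded UTF-8 bytes as they appear in the body of the test case's href.
const encoded = "%C3%BC%C3%A4%C3%B6";

// Decoding yields the raw umlauts, matching the "actual results" command line.
const decoded = decodeURIComponent(encoded);

// Re-encoding the raw umlauts restores the form seen in the "expected results".
const reencoded = encodeURIComponent(decoded);

console.log(decoded);   // üäö
console.log(reencoded); // %C3%BC%C3%A4%C3%B6
```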

OS: Unspecified → Windows
Hardware: Unspecified → x86_64

List of codes: https://www.w3schools.com/tags/ref_urlencode.ASP
Adding different umlauts looks like this (each was passed in with its corresponding URL encoding):
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test%40test.net?subject=Ää%7B"
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test%40test.net?subject=ÜüÄäß"
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test%40test.net?subject=üÜÄä%24%7B"

Adding a sequence that does not work, such as TM (%E2%84), leads to the expected behaviour:
"C:\PROGRA2\MICROS3\Office16\OUTLOOK.EXE" -c IPM.Note /mailto "mailto:test%40test.net?subject=%C3%BC%C3%9C%C3%84%C3%A4%24%E2%84"
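One possible reading of this result: `%E2%84` is only the first two bytes of the three-byte UTF-8 sequence for the trademark sign (`%E2%84%A2`), so on its own it is not valid UTF-8. Whether that is why the whole URL stays escaped in Firefox's Windows code path is an assumption. The sketch below only shows that a generic decoder rejects the truncated sequence; `tryDecode` is a hypothetical helper.

```javascript
// Hypothetical helper: returns the decoded string, or null when the
// percent-escapes do not form valid UTF-8.
function tryDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    if (e instanceof URIError) {
      return null;
    }
    throw e;
  }
}

const complete = tryDecode("%E2%84%A2"); // full three-byte sequence for U+2122
const truncated = tryDecode("%E2%84");   // last byte missing

console.log(complete);  // ™
console.log(truncated); // null
```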

Component: Untriaged → General

In Firefox 78.8.0 ESR it is encoded correctly.

Product: Firefox → Core
Component: General → DOM: Editor
Attached file test case

Here's the test case from comment 0, with a slightly modified domain.

mconley, why did you move this to Core instead of Firefox :: File Handling? I'd expect the code that decides whether to convert this lives in uriloader/exthandler/ExtHandlerService.jsm. I'll poke around a bit more and see if I can figure out where the actual conversion might happen...

Flags: needinfo?(mconley)

This could be some Core encoding kind of issue, but I think this is not editor.

Component: DOM: Editor → General

I... honestly can't remember. I remember we discussed this bug during our last Firefox :: General triage meeting, and came to the conclusion to put this in Core, but it looks like I failed to write that rationale down in a bug comment. :/

Flags: needinfo?(mconley)

I'm going to move this over to Firefox: File Handling for now. At a minimum, if there is some kind of Core encoding issue here we'll need help from somebody who is familiar with file handling to point us to where the conversion might be happening. I spent 10 or so minutes and couldn't really figure out anything useful.

Component: General → File Handling
Product: Core → Firefox
QA Whiteboard: [qa-regression-triage]

Hello! I tried to reproduce the issue with Firefox 106.0a1 (2022-08-25) on Ubuntu 22.04 and Windows 10. Unfortunately I wasn't able to reproduce it, or at least I didn't find anything wrong.
I used the test case from comment 3, and the link only opens my default email client: Outlook on Windows 10 and Thunderbird on Ubuntu 22.04.
I have attached a screenshot with the contents of the message.

Can someone point me in the right direction to reproduce this issue?

What language and version of Windows are you testing with, and how are you checking the commandline output? (And is the result actually broken for you in the outlook/thunderbird UI, or are you only concerned about not escaping these characters in the commandline output?)

If possible, could you try using https://mozilla.github.io/mozregression/ to establish when this changed?

Flags: needinfo?(andrej.wolkow)

I am testing with Win 10 21H2, Windows Server 2012R2 and Windows Server 2016 R2, all in German. I check the output with procmon from Sysinternals.
Outlook handles the umlauts correctly, but I assume it has its own handling that passes unescaped umlauts through as they are. Thunderbird probably handles them the same way.
We have a custom application which also uses these passed arguments, but it expects them to be escaped, in line with IETF specifications according to the developer of the application.
I have contacted the developer to verify this and they have now made a workaround for this case. At the moment I am testing their new version, which handles "not escaped" characters.
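For a receiving application that has to cope with the unescaped form, a workaround along the following lines is conceivable. This is a minimal sketch, not the developer's actual fix: `reescapeNonAscii` is a hypothetical helper, and it assumes the argument arrives as a correctly decoded Unicode string rather than one mangled by a codepage conversion.

```javascript
// Hypothetical helper: percent-encode every run of non-ASCII characters as
// UTF-8 and leave everything else, including existing %XX escapes, untouched.
function reescapeNonAscii(url) {
  return url.replace(/[^\x00-\x7F]+/g, (run) => encodeURIComponent(run));
}

// The argument as Firefox currently passes it on Windows (from the report).
const received =
  "mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aushändigung%20&body=üäö";

const fixed = reescapeNonAscii(received);
console.log(fixed);
```

Replacing only the non-ASCII runs avoids double-encoding the `%20` escapes that are already present, which a plain `encodeURI` call would turn into `%2520`.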

Flags: needinfo?(andrej.wolkow)

I tested this but locally for me on Firefox 78 and even Firefox 52 nightlies (using mozregression), when launching Thunderbird via the system default launcher (ie by just handing the URL to the operating system, and having thunderbird set as default email app), I still see the unencoded version. AFAICT bug 1696685 was supposed to have caused everything to be escaped here. I don't know why that isn't the case (maybe because of newURI re-unescaping things or something?), and/or if ProcExp is a good way of verifying what happened. Paul/Valentin, can you help?

(The other mystery is how this apparently worked in 78.8.0 ESR for the reporter, and yet in 78 and 91 nightly it does not appear to be working for me, nor can I think of a reasonable explanation for it ever working in between those releases given the discussion in bug 1696685 around fix/regression ranges.)

Depends on: CVE-2021-43541
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(pbz)

I'm not totally sure how this happens.
Even something as simple as url = new URL("mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aushändigung%20&body=üäö"); url.href seems to make sure everything is properly escaped.
Since this is windows only, I expect some of the windows specific bits are causing the problem. Most likely it's coming from here
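The `new URL(...)` check mentioned above can be reproduced in any environment that implements the WHATWG URL API (a browser console or Node.js, for example). This sketch only shows the generic URL serializer's behaviour and says nothing about the Windows-specific handler code:

```javascript
// The unescaped URL, as it shows up on the handler's command line.
const raw =
  "mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aushändigung%20&body=üäö";

// Parsing and serializing percent-encodes the non-ASCII characters as UTF-8,
// while the escapes that were already present (%20) are kept as they are.
const href = new URL(raw).href;

console.log(href);
// True when no raw non-ASCII character survives in the serialized URL.
console.log(!/[^\x00-\x7F]/.test(href));
```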

(In reply to Valentin Gosu [:valentin] (he/him) from comment #12)

> I'm not totally sure how this happens.
> Even something as simple as url = new URL("mailto:test@test.com?subject=Test%2003:%20MailTo%20-%20Aushändigung%20&body=üäö"); url.href seems to make sure everything is properly escaped.
> Since this is windows only, I expect some of the windows specific bits are causing the problem. Most likely it's coming from here

rofl, blame for that code points to bug 227268, where Outlook Express and various other email programs didn't like things being escaped into UTF-8 compliant escape sequences, so 10 years ago we started unescaping things to make the URLs work. This bug is the inverse problem (the unescaped content apparently causes problems, so the request is to escape content).

Computers were a mistake. :-\

That sounds right. Bug 1696685 didn't make things more permissive but rather stricter by escaping more characters. Here is the list: https://searchfox.org/mozilla-central/rev/380fc5571b039fd453b45bbb64ed13146fe9b066/xpcom/io/nsEscape.cpp#282

#'./:;=?@[]
Flags: needinfo?(pbz)
Flags: needinfo?(valentin.gosu)

The severity field is not set for this bug.
:Gijs, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gijskruitbosch+bugs)

It's really unclear we can even fix this without breaking other consumers (yes, win7 is unsupported by MS but we still have something like 15-20% of Firefox users on it). Maybe a fix would need to make the unescaping conditional on being on an older OS or something. Maybe there needs to be a way for us to preserve some/all of the original escaping rather than the normalization passes that currently happen, which would be a larger architectural change. Or maybe someone needs to dive into the specifics of what Chrome/Edge do and make sure we are aligned. Either way, given Outlook and Thunderbird cope, prioritizing that work seems unlikely to happen in the short term.

Severity: -- → S3
Flags: needinfo?(gijskruitbosch+bugs)
Priority: -- → P3
See Also: → 141706
Duplicate of this bug: 1842168
Summary: Encoded Umlauts %C3%BC are passed as decoded umlauts ü in mailto → Encoded Umlauts %C3%BC are passed as decoded umlauts ü when passing different protocol URLs to external handlers on windows (e.g. mailto, custom schemes)

Fixing this would involve adjusting https://searchfox.org/mozilla-central/rev/8d43262674d6c6d469b821cca579b1240ebb42a5/uriloader/exthandler/win/nsMIMEInfoWin.cpp#288-305 to stop doing this unescaping... very very carefully, considering some of the security stuff mentioned in the comments around ensuring URLMon is happy with the URL.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Issue https://bugzilla.mozilla.org/show_bug.cgi?id=1842168 has been marked as a duplicate of this one, and that looks spot on. One curious thing there: when Firefox calls the protocol handler registered in the Windows registry, umlauts (and other escaped non-ASCII characters) are passed unescaped and normalized to the codepage. Yet when using the Firefox preferences to make it call an arbitrary Windows executable, nothing in the URL gets unescaped. I created a custom protocol helper that just passes the URL on unmodified, and it can start the real helper without a problem.

Duplicate of this bug: 141706
See Also: 141706
See Also: → 1885492

The topic of this issue is that too much decoding is happening: passing arguments to external protocol handlers on Windows (not Linux) decodes escape sequences, and only for system-set handlers. It does not happen for custom handlers set via the preferences mechanism. More in the issue I mentioned 2 posts above.
