URL query parser encodes "|" when returned by an encoder (e.g., Shift_JIS)
Categories
(Core :: Networking, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox117 | --- | fixed |
People
(Reporter: annevk, Assigned: dotoole)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file)
See https://github.com/web-platform-tests/wpt/pull/26317. Firefox encodes −
in Shift_JIS as %81%7C
rather than %81|
.
Comment 1•4 years ago
|
||
This is not only a Shift_JIS issue. From my quick tests (which I forgot to post anywhere), the URL UTF-8 encoding for "|", "A|" and even "áa|" is correct, but "á|" becomes "%C3%A1%7C".
This in intentional and comes from bug 288154. Redirecting the needinfo to Masayuki to see if the special case is still needed now that we support UTF-8 file URLs on Windows.
Comment 3•4 years ago
|
||
Well, as far as I tested the STR of bug 288154 on Windows, it seems that the file URL is handled with utf-8. But I'm not sure whether all paths are now handled in unicode, e.g., arguments at launching firefox.exe, redirecting them from profile manager, and at opening the specified file path in a tab. emk-san may be better person to ask around here than me.
Comment 4•4 years ago
|
||
We still accept URLs that are encoded in a legacy charset as a fallback.
But I think nowadays this hack may cause more problems than it solves. For example, we fail to split <URL ending with UTF-8 non-ASCII character>|<another URL> due to this hack. We still use .split("|")
all over the place.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•2 years ago
|
Hello, how did you reproduce your results? Thanks
Comment 6•2 years ago
|
||
(In reply to Dylan from comment #5)
Hello, how did you reproduce your results? Thanks
Hi. In any page with UTF-8 encoding, new URL("https://example.com?|").search
returns "?|"
and new URL("https://example.com?á|").search
returns "?%C3%A1%7C"
.
Per the URL spec, when parsing URLs, the query gets encoded with "percent-encode after encoding" with either the query percent-encode set or the special query percent-encode set. Neither of those percent-encode sets include the pipe character, and so they shouldn't result in it being percent-encoded, by itself or after a byte in the percent-encode set.
(In reply to Andreu Botella from comment #6)
(In reply to Dylan from comment #5)
Hello, how did you reproduce your results? Thanks
Hi. In any page with UTF-8 encoding,
new URL("https://example.com?|").search
returns"?|"
andnew URL("https://example.com?á|").search
returns"?%C3%A1%7C"
.Per the URL spec, when parsing URLs, the query gets encoded with "percent-encode after encoding" with either the query percent-encode set or the special query percent-encode set. Neither of those percent-encode sets include the pipe character, and so they shouldn't result in it being percent-encoded, by itself or after a byte in the percent-encode set.
Thanks for your response! I am a little confused about the solution here. I see from this that the encoding of "|" is intended. Is this behavior still required or should it be changed to not encode "|"? Thanks.
Comment 8•2 years ago
|
||
Thanks for your response! I am a little confused about the solution here. I see from this that the encoding of "|" is intended. Is this behavior still required or should it be changed to not encode "|"? Thanks.
According to the URL specification, the pipe character should not be encoded, regardless of what character precedes it. But Gecko seems to purposefully encode it in those cases to work around a bug; see comments 2-4 in this issue, as well as bug 288154. When that bug was closed, the WHATWG URL specification did not exist yet, and I don't think Gecko was non-compliant with the then-relevant RFCs, but it now does go against the URL specification.
I don't know enough about that bug, so I can't say whether it's a good idea to fix this one.
Comment 9•2 years ago
|
||
Chrome percent-encodes |
in path components. For example, new URL("https://example.com/|")
returns /%7C
. It also violates the URL spec because path percent-encode set does not include |
.
Comment 10•2 years ago
|
||
So, I'm assuming the reason Safari didn't have to deal with bug 288154 is that it doesn't run on windows?
I wonder if we could only enforce the | thing for file URLs? Or we could just allow | and move the fix to nsURLHelperWin.cpp::net_GetFileFromURLSpec instead. That's assuming it's still a problem on windows - we should just fix this, and if it regresses bug 288154 we can deal with that in windows specific code.
Comment 11•2 years ago
|
||
Bug 288154 itself will not regress because we no longer split URLs from command lines (bug 441120). Although we still accept file: URLs that are encoded in ANSI codepage, we will encode file URLs in UTF-8 (bug 278161). So the problem would be much rarer than before.
We should stop treating |
as a URL separator (bug 244192), but this is an existing problem.
In conclusion, I guess we could just fix this now.
Assignee | ||
Comment 12•2 years ago
|
||
Comment 13•2 years ago
|
||
Comment on attachment 9341097 [details]
Bug 1674369 - Backed out changeset (Bug 288154) for encoding pipe character in URL query parser
Revision D182064 was moved to bug 288154. Setting attachment 9341097 [details] to obsolete.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
Comment 15•2 years ago
|
||
bugherder |
Updated•1 years ago
|
Description
•