Closed Bug 1674369 Opened 4 years ago Closed 11 months ago

URL query parser encodes "|" when returned by an encoder (e.g., Shift_JIS)

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

RESOLVED FIXED
117 Branch
Tracking Status
firefox117 --- fixed

People

(Reporter: annevk, Assigned: dotoole)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

See https://github.com/web-platform-tests/wpt/pull/26317. Firefox encodes in Shift_JIS as %81%7C rather than %81|.

Flags: needinfo?(hsivonen)

This is not only a Shift_JIS issue. From my quick tests (which I forgot to post anywhere), the URL UTF-8 encoding for "|", "A|" and even "áa|" is correct, but "á|" becomes "%C3%A1%7C".

This in intentional and comes from bug 288154. Redirecting the needinfo to Masayuki to see if the special case is still needed now that we support UTF-8 file URLs on Windows.

Flags: needinfo?(hsivonen) → needinfo?(masayuki)

Well, as far as I tested the STR of bug 288154 on Windows, it seems that the file URL is handled with utf-8. But I'm not sure whether all paths are now handled in unicode, e.g., arguments at launching firefox.exe, redirecting them from profile manager, and at opening the specified file path in a tab. emk-san may be better person to ask around here than me.

Flags: needinfo?(masayuki) → needinfo?(VYV03354)

We still accept URLs that are encoded in a legacy charset as a fallback.

But I think nowadays this hack may cause more problems than it solves. For example, we fail to split <URL ending with UTF-8 non-ASCII character>|<another URL> due to this hack. We still use .split("|") all over the place.

Flags: needinfo?(VYV03354)
Blocks: url
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]
Summary: URL query parser encodes | when returned by an encoder (e.g., Shift_JIS) → URL query parser encodes "|" when returned by an encoder (e.g., Shift_JIS)
Priority: P2 → P3
Assignee: nobody → dotoole

Hello, how did you reproduce your results? Thanks

Flags: needinfo?(andreu)

(In reply to Dylan from comment #5)

Hello, how did you reproduce your results? Thanks

Hi. In any page with UTF-8 encoding, new URL("https://example.com?|").search returns "?|" and new URL("https://example.com?á|").search returns "?%C3%A1%7C".

Per the URL spec, when parsing URLs, the query gets encoded with "percent-encode after encoding" with either the query percent-encode set or the special query percent-encode set. Neither of those percent-encode sets include the pipe character, and so they shouldn't result in it being percent-encoded, by itself or after a byte in the percent-encode set.

Flags: needinfo?(andreu)

(In reply to Andreu Botella from comment #6)

(In reply to Dylan from comment #5)

Hello, how did you reproduce your results? Thanks

Hi. In any page with UTF-8 encoding, new URL("https://example.com?|").search returns "?|" and new URL("https://example.com?á|").search returns "?%C3%A1%7C".

Per the URL spec, when parsing URLs, the query gets encoded with "percent-encode after encoding" with either the query percent-encode set or the special query percent-encode set. Neither of those percent-encode sets include the pipe character, and so they shouldn't result in it being percent-encoded, by itself or after a byte in the percent-encode set.

Thanks for your response! I am a little confused about the solution here. I see from this that the encoding of "|" is intended. Is this behavior still required or should it be changed to not encode "|"? Thanks.

Thanks for your response! I am a little confused about the solution here. I see from this that the encoding of "|" is intended. Is this behavior still required or should it be changed to not encode "|"? Thanks.

According to the URL specification, the pipe character should not be encoded, regardless of what character precedes it. But Gecko seems to purposefully encode it in those cases to work around a bug; see comments 2-4 in this issue, as well as bug 288154. When that bug was closed, the WHATWG URL specification did not exist yet, and I don't think Gecko was non-compliant with the then-relevant RFCs, but it now does go against the URL specification.

I don't know enough about that bug, so I can't say whether it's a good idea to fix this one.

Chrome percent-encodes | in path components. For example, new URL("https://example.com/|") returns /%7C. It also violates the URL spec because path percent-encode set does not include |.

So, I'm assuming the reason Safari didn't have to deal with bug 288154 is that it doesn't run on windows?
I wonder if we could only enforce the | thing for file URLs? Or we could just allow | and move the fix to nsURLHelperWin.cpp::net_GetFileFromURLSpec instead. That's assuming it's still a problem on windows - we should just fix this, and if it regresses bug 288154 we can deal with that in windows specific code.

Flags: needinfo?(VYV03354)

Bug 288154 itself will not regress because we no longer split URLs from command lines (bug 441120). Although we still accept file: URLs that are encoded in ANSI codepage, we will encode file URLs in UTF-8 (bug 278161). So the problem would be much rarer than before.

We should stop treating | as a URL separator (bug 244192), but this is an existing problem.

In conclusion, I guess we could just fix this now.

Flags: needinfo?(VYV03354)
Assignee: dotoole → nobody
Assignee: nobody → dotoole

Comment on attachment 9341097 [details]
Bug 1674369 - Backed out changeset (Bug 288154) for encoding pipe character in URL query parser

Revision D182064 was moved to bug 288154. Setting attachment 9341097 [details] to obsolete.

Attachment #9341097 - Attachment is obsolete: true
Attachment #9341097 - Attachment is obsolete: false
Pushed by sstanca@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/b94ebbc25e03
Backed out changeset (Bug 288154) for encoding pipe character in URL query parser r=valentin,emk
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: