Open Bug 1329960 Opened 4 years ago Updated 3 years ago

wrong URI encoding for hash with some special characters

Categories

(Core :: Networking, defect, P3)

53 Branch
defect


People

(Reporter: tristan.fraipont, Unassigned)

Details

(Keywords: dev-doc-needed, site-compat, Whiteboard: [necko-backlog])

Attachments

(2 files)

Attached file hashBug.html
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170103031119

Steps to reproduce:

Set an anchor element's href to "#" followed by any special character, followed by an "out-of-range" encoded character, defined as follows: a percent symbol (%) followed by any value from 81 to 99, e.g. #'%89
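The same behaviour can be reproduced without a DOM through the WHATWG `URL` API, which in spec terms is backed by the same parser as `<a href>` (a minimal sketch; that a given engine follows the spec here is an assumption):

```javascript
// Minimal reproduction using the WHATWG URL API instead of an <a> element.
const url = new URL('http://example.com/');
url.hash = "#'%89"; // special character ' followed by the "out-of-range" %89
const result = url.hash;
// Per the spec, % is not in the fragment percent-encode set, so the
// %89 sequence should survive as-is and must not be double-encoded.
console.log(result);
```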


Actual results:

Only the first special character is encoded; the % of the out-of-range %89 is not.
This results in #%27%91.


Expected results:

Expected #'%91, or #%27%2591, or even #'%2591; the latter two would at least be valid URIs.
I suspect the encoder ignores %91 because it looks like a valid percent-encoded character, and that the decoder throws an error because it doesn't recognize this character.

It should also be noted that both encodeURI and encodeURIComponent encode this string correctly.
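For comparison, here is what the two global functions produce for the string from the report; both escape the bare % of the invalid sequence while leaving the apostrophe alone, since ' is in their unescaped set:

```javascript
// Both encodeURI and encodeURIComponent escape the bare '%' of the
// invalid sequence %89, and neither escapes the apostrophe.
const input = "'%89";
const viaEncodeURI = encodeURI(input);                   // "'%2589"
const viaEncodeURIComponent = encodeURIComponent(input); // "'%2589"
console.log(viaEncodeURI, viaEncodeURIComponent);
```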
Component: JavaScript: Standard Library → DOM
May be related to bug 1093611.
It follows the spec and all browsers agree. (Actually Firefox will encode ' to %27, which does not conform to the spec.)
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
(In reply to Masatoshi Kimura [:emk] from comment #2)
> It follows the spec and all browsers agree. (Actually Firefox will encode '
> to %27 that does not conform to the spec.)

OK, so the bug is not what I "suspected", but it's still there.
FYI, it also happens with other "special characters" like < or >, and probably others.

If you open a new issue without my wrong and probably misleading assumptions, could you link to it here?
OK, reopening and changing the component. (The URL parser implementation lives under Networking.)
Status: RESOLVED → REOPENED
Component: DOM → Networking
Ever confirmed: true
Resolution: INVALID → ---
Valentin, anything that falls into your jurisdiction?
Assignee: nobody → valentin.gosu
Status: REOPENED → NEW
Whiteboard: [necko-active]
The relevant specs are

 - https://html.spec.whatwg.org/multipage/semantics.html#dom-hyperlink-hash
 - https://url.spec.whatwg.org/#fragment-state


If my understanding of the specs is correct, this code shouldn't log any error:

    function utf8encode(codePoint) {
      if (codePoint >>> 0 !== codePoint) throw codePoint + ' is not a code point';
      /* https://encoding.spec.whatwg.org/#utf-8-encoder */
      if (codePoint >= 0x0000 && codePoint <= 0x007F) return [codePoint];
      let count, offset;
      if (codePoint >= 0x0080 && codePoint <= 0x07FF) [count, offset] = [1, 0xC0];
      else if (codePoint >= 0x0800 && codePoint <= 0xFFFF) [count, offset] = [2, 0xE0];
      else if (codePoint >= 0x10000 && codePoint <= 0x10FFFF) [count, offset] = [3, 0xF0];
      else throw codePoint + ' is not a Unicode code point'; // > 0x10FFFF
      let bytes = [(codePoint >> (6*count)) + offset];
      while (count > 0) {
        let temp = codePoint >> (6 * (count-1));
        bytes.push(0x80 | (temp & 0x3F));
        --count;
      }
      return bytes;
    }
    function percentEncode(byte) {
      if ((byte & 255) !== byte) throw byte + ' is not a byte';
      return '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
    function test(from, to) {
      let url = new URL('http://example.com');
      for (let i=from; i<to; ++i) {
        let c = String.fromCodePoint(i);
        url.hash = '#' + c;
        let result = url.hash;
        let expected;
        if (i === 0x00) {
          expected = '';
        } else if (i >= 0x20 && i <= 0x7E) {
          expected = c;
        } else {
          expected = utf8encode(i).map(byte => percentEncode(byte)).join('');
        }
        if (expected) {
          expected = '#' + expected;
        }
        if (result !== expected) {
          console.error('Assert fail: i='+i+' produced '+result+', expected '+expected);
        }
      }
    }
    test(0x00, 0x10FFFF);

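As a quick sanity check of the encoder logic above, the two helpers can be compared against a known UTF-8 encoding (they are restated here, slightly condensed, so the snippet runs on its own; U+00E0 'à' encodes to the bytes C3 A0):

```javascript
// Self-contained copies of the two helpers from the comment above,
// checked against the known UTF-8 encoding of U+00E0 ('à') -> C3 A0.
function utf8encode(codePoint) {
  if (codePoint >>> 0 !== codePoint) throw codePoint + ' is not a code point';
  if (codePoint <= 0x7F) return [codePoint];
  let count, offset;
  if (codePoint <= 0x07FF) [count, offset] = [1, 0xC0];
  else if (codePoint <= 0xFFFF) [count, offset] = [2, 0xE0];
  else [count, offset] = [3, 0xF0];
  let bytes = [(codePoint >> (6 * count)) + offset];
  while (count > 0) {
    bytes.push(0x80 | ((codePoint >> (6 * --count)) & 0x3F));
  }
  return bytes;
}
function percentEncode(byte) {
  return '%' + byte.toString(16).toUpperCase().padStart(2, '0');
}
const bytes = utf8encode(0x00E0);
const encoded = bytes.map(percentEncode).join('');
console.log(bytes, encoded); // bytes = [ 195, 160 ], encoded = '%C3%A0'
```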
Currently, there are these problems:

 1) U+0000 is not removed
 2) U+0020 ( ), U+0022 ("), U+0027 ('), U+003C (<), U+003E (>), U+0060 (`) are percent encoded
 3) Bad encodings starting from U+D800.

This patch fixes problem 2, which seems to be the one reported here.
That is, now this works: test(0x01, 0xD800);
This makes Firefox match Chrome in the U+0001...U+007E range (inclusive).
Attachment #8858154 - Flags: review?(valentin.gosu)
Attachment #8858154 - Attachment description: hash-percent-encoding.path → hash-percent-encoding.patch
Attachment #8858154 - Attachment filename: hash-percent-encoding.path → hash-percent-encoding.patch
Attachment #8858154 - Attachment is patch: true
Ah, there is also a 4th problem, which should be fixed in another bug:

    let url = new URL('http://example.com');
    url.hash = '#à|';

Then I think url.hash should be '#%C3%A0|', but it's '#%C3%A0%7C'.
This behavior (encoding '|' after a non-ASCII character) is hard-coded for some reason.
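The expected-per-spec result for this 4th problem can be checked against a spec-following `URL` implementation (that the host engine, e.g. recent Node.js, matches the spec on this point is an assumption of this sketch):

```javascript
// Checking the 4th problem against a WHATWG URL implementation.
const url = new URL('http://example.com/');
url.hash = '#à|';
// Per https://url.spec.whatwg.org/#fragment-state, à (U+00E0) is outside
// ASCII and gets percent-encoded to %C3%A0, while | (U+007C) is not in the
// fragment percent-encode set and should pass through unchanged.
console.log(url.hash);
```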
Presumably the 3rd problem is because Unicode does not assign characters to those code points (the surrogate range U+D800 to U+DFFF).
Assuming that range can be ignored (I don't know), then the tests are

    test(0x00, 0xD800);
    test(0xE000, 0x110000);

Currently that fails for U+0000 (NULL), U+0020 ( ), U+0022 ("), U+0027 ('), U+003C (<), U+003E (>), U+0060 (`).

After the patch, only for U+0000 (NULL).
Comment on attachment 8858154 [details] [diff] [review]
hash-percent-encoding.patch

Review of attachment 8858154 [details] [diff] [review]:
-----------------------------------------------------------------

So, I think this is ok, except for spaces and \0.
Spaces we always encode (see below), and \0 is a validation error [1], so indeed we might want to strip it.
This patch would break a bunch of tests, so I need to check some use cases before we take it.

[1] https://url.spec.whatwg.org/#fragment-state

::: xpcom/io/nsEscape.cpp
@@ +345,5 @@
>  //   0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
>  {
>       0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  // 0x
>       0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  // 1x
> +   512,1023,   0, 512,1023,   0,1023, 112,1023,1023,1023,1023,1023,1023, 953, 784,  // 2x   !"#$%&'()*+,-./

Previous necko bugs have established that we will always encode spaces in all parts of the URL. I think the relevant reasons are in bug 1184589.
Attachment #8858154 - Flags: review?(valentin.gosu) → review-
(In reply to Valentin Gosu [:valentin] from comment #9)
> Previous necko bugs have established that we will always encode spaces in
> all parts of the URL. I think the relevant reasons are in bug 1184589.

But this is a violation of the spec, isn't it? The C0 control percent-encode set does not include spaces, unlike the path percent-encode set or the userinfo percent-encode set.

If encoding spaces is so important, can't Anne change the spec?
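(Note for later readers: the spec did change afterwards; the current fragment percent-encode set includes U+0020 along with ", <, > and `, so a modern spec-following implementation does encode spaces in fragments. The snippet below assumes a recent engine such as current Node.js:)

```javascript
// Under the current URL Standard, a space in a fragment is percent-encoded.
const url = new URL('http://example.com/');
url.hash = '#a b';
console.log(url.hash); // a modern engine serializes this as '#a%20b'
```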
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Assignee: valentin.gosu → nobody
Priority: P1 → P3
Whiteboard: [necko-active] → [necko-backlog]