Closed Bug 172699 Opened 22 years ago Closed 22 years ago

JS UTF-8 decoder accepts overlong sequences

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: jgmyers, Assigned: rogerl)

References

Details

(Keywords: js1.5, Whiteboard: [Have filed bug 173180 against Rhino for the same issue])

Attachments

(1 file)

Another UTF-8 decoder, another instance of the same botch as bug 50702 and bug 74198. This one is in Utf8ToOneUCS4Char() in js/src/jsstr.c
Attached patch Proposed fixSplinter Review
cc'ing reviewers for the provided patch -
Assignee: rogerl → khanson
rogerl, this looks like your code originally (jsstr.c rev 3.20) -- can you r= and get the patch in? Thanks, and thanks to jgmyers for the find and fix. /be
Assignee: khanson → rogerl
Keywords: js1.5, mozilla1.2
Comment on attachment 101774 [details] [diff] [review] Proposed fix r=rogerl. Phil - a test case is decodeURI("%C0%AF").charCodeAt(0) which should result in 65533
Attachment #101774 - Flags: review+
Fix checked in (inferring sr from Brendan). Waldemar - the algorithm in 15.1.3.1 doesn't address this issue, should this be raised at ECMA or leave this as a Netscape security 'extension'?
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Unicode 3.1 prohibits UTF-8 decoders from accepting overlong sequences. Also see http://www.unicode.org/versions/corrigendum1.html I suggest bringing this up with ECMA, as this would appear to be an inconsistency with the Unicode standard.
Adding Waldemar for ECMA comments.
Testcase added to JS testsuite: mozilla/js/tests/js1_5/Regress/regress-172699.js
Marking Verified FIXED. The above testcase now passes. It used to fail as follows: *-* Testcase js1_5/Regress/regress-172699.js failed: Bug Number 172699 STATUS: UTF-8 decoder should not accept overlong sequences Failure messages were: FAILED!: Section 1 of test - FAILED!: Expected value '65533', Actual value '37' The test is currently failing in Rhino in exactly this way. I have filed bug 173180 against Rhino for this issue -
Status: RESOLVED → VERIFIED
Summary: js UTF-8 decoder accepts overlong sequences → JS UTF-8 decoder accepts overlong sequences
Whiteboard: [Have filed bug 173180 against Rhino for the same issue]
But why the fix treats overlongs by replacing them by 0xFFFD and not throwing an exception like any other invalid UTF-8 would do? My understanding of http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html is that overlongs are as broken as any other invalid UTF-8 sequences and should not be treated in a different way.
Throwing an exception would be acceptable, though it doesn't look like UTF8ToOneUCS4Char() can do that directly.
Flags: testcase+
Depends on: 511859
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: