Closed Bug 182751 Opened 22 years ago Closed 22 years ago

Mozilla accepsts malformed UTF-8 byte sequences

Categories

(Core :: Security, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: alexeyc2003, Assigned: smontagu)

References

()

Details

(Keywords: intl, testcase, topembed+)

Attachments

(2 files, 2 obsolete files)

OS: Win2K Build: 2002111904 (Trunk) http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt This URL contains UTF-8 decoder capability and stress test. Mozilla seem to fail a maximum overlong sequences test 4.2.3 and all tests in section 5 - illegal code positions. The document also briefly explains security implications of accepting overlong and malformed sequences. You can use this tool for testing wellformedness and round trip of UTF-8: http://www.macchiato.com/unicode/convert.html Also the following bugs seem to be related: Bug 86411 Bug 172701 Bug 174351
Depends on: 86411, 172701, 174351
'Me Too' - Status update. Testing against the UTF-8 stress test with mozilla-1.3a (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021213), 4.2.3 looks to be 'fixed' (at least, working), but 2.1.3 looks broken, as do 2.3.x and all of the 5.x tests. This is a pretty important security issue and needs to be fixed before the code gets into an enormous installed base (we hope) who never upgrade...
Over to Simon - we should run these tests and act on the results.
Assignee: mstoltz → smontagu
Group: security
Accepting.
Status: NEW → ASSIGNED
Depends on: 191483
There are two problems: a typo at http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp#185 // from Unicode 3.1, non-shortest form is illegal if(((2==mBytes) && (mUcs4 < 0x0080)) || ((3==mBytes) && (mUcs4 < 0x0800)) || ((4==mBytes) && (mUcs4 < 0x1000)) || // ^^^^^^ -- should be 0x10000 (5==mBytes) || (6==mBytes)) and because this whole block is in the |else| of |if(mUcs4 >= 0x00010000)| at http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp#173 some cases are never getting tested.
Attached patch Patch (obsolete) — Splinter Review
Fix the problems described in the last comment, add tests for UTF-16 encoded as UTF-8 and other illegal characters, and general cleanup.
Attached patch The same, diff -uw (obsolete) — Splinter Review
Attachment #115576 - Flags: review?(ftang)
Keywords: topembed
plussing per edt triage
Keywords: topembedtopembed+
Attached patch New patchSplinter Review
Removed the test for U+FFFE and U+FFFF per bug 172701 comment 6, cleaned up a bit more and expanded comments.
Attachment #115575 - Attachment is obsolete: true
Attachment #115576 - Attachment is obsolete: true
Comment on attachment 115576 [details] [diff] [review] The same, diff -uw transferring review request
Attachment #115576 - Flags: review?(ftang) → review-
Attachment #116491 - Flags: review?(ftang)
Comment on attachment 116491 [details] [diff] [review] The last patch, diff -uw looks accroding to the new definitation unicode 3.2
Attachment #116491 - Flags: review?(ftang) → review+
Attachment #116491 - Flags: superreview?(heikki)
Comment on attachment 116491 [details] [diff] [review] The last patch, diff -uw I don't claim to understand this Unicode checking code, so my sr is purely a syntax/style-based sr. Please test with every testcase you can get your hands on... The only thing I could notice that needs fixing is the indentation, inconsistent space around operators & tab usage. Please replace all tabs in the file with 2 spaces, and make sure indentation still looks ok.
Attachment #116491 - Flags: superreview?(heikki) → superreview+
The indentation appears wonky in the diff -uw because it's wonky in the original file, but it looks all OK to me in my locak file and attachment 116490 [details] [diff] [review].
I have been testing with http://jazz.mcom.com/users/smontagu/publish/dynamicutf8test.html, which is a version of the URL given here. I'm now looking for some more testcases to run before I check in.
I can see even from the diff that there are tabs!
Also tested by correlating http://www.iur.ut.ee/juridica/utf8-3.htm with the table of legal UTF-8 sequences in http://www.unicode.org/reports/tr28/, and looking at lots of utf-8 pages in different languages to see that they are still displayed OK. I think we are good to go with this.
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
No longer depends on: 174351
Blocks: 86411
No longer depends on: 86411
Clearing security flag: fix was checked in months ago and there was never a demonstrated exploit of the potential to abuse over-long UTF8 encodings.
Group: security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: