UTF-8 decoder accepts overlong sequences

VERIFIED INVALID

Status

Core
Internationalization
--
major
VERIFIED INVALID
17 years ago
17 years ago

People

(Reporter: Christopher Hoess (gone), Assigned: Frank Tang)

Tracking

({intl})

Trunk
x86
Windows NT
Details

(URL)

(Reporter)

Description

17 years ago
Section 4 of the document includes a number of overlong UTF-8 sequences, which
the UTF-8 decoder should reject.  Failure to do so opens the possibility of an
attack based on concealing line feeds, etc. in overlong sequences.  The document
contains a number of UTF-8 "edge cases", not all of which necessarily have
security implications.
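
For readers unfamiliar with the term, here is a minimal sketch of what an "overlong" sequence is: a code point encoded in more bytes than it needs. The helper names below (encode_with_length, is_rejected_by_strict_decoder) are hypothetical and are not taken from the Mozilla tree or from the test document; Python's built-in strict UTF-8 decoder is used only as an example of a decoder that rejects such input.

```python
def encode_with_length(code_point: int, length: int) -> bytes:
    """Encode code_point as a UTF-8-style sequence of exactly `length` bytes.

    Hypothetical helper for illustration only: it skips the minimum-value
    check, so it can emit overlong forms that a conforming encoder never
    produces.
    """
    if length not in (2, 3, 4):
        raise ValueError("length must be 2, 3, or 4")
    payload_bits = {2: 11, 3: 16, 4: 21}[length]
    if code_point < 0 or code_point >= (1 << payload_bits):
        raise ValueError("code point does not fit in the requested length")
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    continuation_count = length - 1
    # Lead byte carries the top bits; each continuation byte carries 6 bits.
    out = [lead_marker | (code_point >> (6 * continuation_count))]
    for i in range(continuation_count - 1, -1, -1):
        out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
    return bytes(out)


def is_rejected_by_strict_decoder(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_slash = encode_with_length(0x2F, 2)       # bytes C0 AF
overlong_line_feed = encode_with_length(0x0A, 2)   # bytes C0 8A
three_byte_slash = encode_with_length(0x2F, 3)     # bytes E0 80 AF
```

The shortest form of "/" is the single byte 2F; the sequences above carry the same payload bits padded with leading zeros, which is why a decoder that does not check for minimal length would map them to the same character.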

Comment 1

17 years ago
Interesting... sounds like this could be a security problem. Reporter, could you
provide a demonstration of how this could be exploited?

Reassigning to I18n, I believe that's the right component.
Assignee: mstoltz → nhotta
Component: Security: General → Internationalization
QA Contact: ckritzer → andreasb

Comment 2

17 years ago
Reassign to ftang.
I think this has already been done by jgmyers.
Assignee: nhotta → ftang
(Reporter)

Comment 3

17 years ago
Unfortunately, I can't provide a demonstration, as I don't have a hex editor on
hand, but I can describe approximately how to construct one.  Using one of the
overlong character representations in the document, construct a line feed,
rather than a slash character.  Feed this string into some input where line feeds
are normally stripped (some/all URLs?), and see if the character is, in fact,
stripped.

Even if this doesn't present a security risk, the commentary in the document
suggests that such representations are to be discouraged (as they can be used to
exploit unwary UTF-8 decoders); perhaps we should refuse to decode them, to
discourage their use (which, AFAIK, is minimal, but someone could always write a
broken creation tool...)
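
The construction described in this comment can be sketched as follows. Everything here is an assumption made for illustration: strip_line_feeds stands in for some byte-level filter, and lenient_decode is a deliberately flawed decoder of the "unwary" kind the test document warns about. Neither represents actual Mozilla code, and the sketch does not show that Mozilla is affected.

```python
def strip_line_feeds(raw: bytes) -> bytes:
    """Hypothetical byte-level filter: removes literal LF (0x0A) bytes."""
    return raw.replace(b"\n", b"")


def lenient_decode(raw: bytes) -> str:
    """Decode 1- and 2-byte sequences WITHOUT checking for overlong forms.

    Deliberately flawed; it models a decoder that accepts C0 8A as U+000A.
    Anything it does not understand becomes U+FFFD.
    """
    chars = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte < 0x80:
            chars.append(chr(byte))
            i += 1
        elif (0xC0 <= byte <= 0xDF and i + 1 < len(raw)
              and (raw[i + 1] & 0xC0) == 0x80):
            # Missing check: the decoded value should be at least 0x80.
            chars.append(chr(((byte & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            chars.append("\ufffd")
            i += 1
    return "".join(chars)


# Overlong two-byte encoding of LINE FEED hidden between two words.
payload = b"first\xc0\x8asecond"
filtered = strip_line_feeds(payload)   # the filter sees no 0x0A byte
decoded = lenient_decode(filtered)     # the line feed reappears here
```

The filter leaves the payload untouched because the byte 0A never occurs in it, yet the lenient decoder still produces a line feed afterwards. A decoder that rejects overlong forms closes this gap regardless of where the filtering happens.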
(Assignee)

Comment 4

17 years ago
I don't think there is a REAL security hole here. I know unicode.org recently
changed the definition of UTF-8, as specified in
http://www.unicode.org/unicode/reports/tr27/

Updated

17 years ago
QA Contact: andreasb → ylong
(Assignee)

Comment 5

17 years ago
We should first list here all the code in Mozilla that does UTF-8 conversion.
Status: NEW → ASSIGNED

Updated

17 years ago
Keywords: intl

Comment 6

17 years ago
I've long since fixed this in the two UTF-8 decoders I've found: one in the
intl UTF-8 decoder, one in the string code.  The fixed code decodes overlong
sequences to REPLACEMENT CHARACTER.

Please be specific as to which test cases in section 4 the current code 
interprets incorrectly.
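
As a rough illustration of the behavior described in this comment (overlong sequences decoding to REPLACEMENT CHARACTER), here is a minimal sketch. It is not the intl decoder or the string code mentioned above; the function name, the table of minimum values, and the choice to emit one U+FFFD per overlong sequence are all assumptions made for the example.

```python
REPLACEMENT = "\ufffd"
# Smallest code point that legitimately needs a sequence of each length.
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_rejecting_overlong(raw: bytes) -> str:
    """Decode UTF-8, mapping overlong or malformed input to U+FFFD."""
    chars = []
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead < 0x80:
            chars.append(chr(lead))
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            length, value = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, value = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, value = 4, lead & 0x07
        else:
            # Stray continuation byte or invalid lead byte.
            chars.append(REPLACEMENT)
            i += 1
            continue
        tail = raw[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            # Truncated sequence or a non-continuation byte in the tail.
            chars.append(REPLACEMENT)
            i += 1
            continue
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        overlong = value < MIN_CODE_POINT[length]
        surrogate = 0xD800 <= value <= 0xDFFF
        if overlong or surrogate or value > 0x10FFFF:
            chars.append(REPLACEMENT)
        else:
            chars.append(chr(value))
        i += length
    return "".join(chars)


overlong_result = decode_rejecting_overlong(b"a\xc0\x8ab")
```

The key step is the comparison against MIN_CODE_POINT: a two-byte sequence whose value is below 0x80, for instance, could have been written in one byte and is therefore treated as invalid. How many replacement characters to emit per bad sequence is a design choice that differs between decoders.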


(Assignee)

Comment 7

17 years ago
Christopher Hoess:
OK, we believe that we have already fixed this issue. If you still think Mozilla
has this issue, please provide step-by-step test cases and reopen it. Thanks.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → INVALID

Comment 8

17 years ago
Verified.  Reopen it if you see the problem again.
Status: RESOLVED → VERIFIED