Closed Bug 91296 Opened 23 years ago Closed 23 years ago

UTF-8 decoder accepts overlong sequences

Categories

(Core :: Internationalization, defect)

x86
Windows NT
defect
Not set
major

VERIFIED INVALID

People

(Reporter: choess, Assigned: ftang)

Details

(Keywords: intl)

Section 4 of the document includes a number of overlong UTF-8 sequences, which
the UTF-8 decoder should reject.  Failure to do so opens the possibility of an
attack based on concealing line feeds, etc. in overlong sequences.  The document
contains a number of UTF-8 "edge cases", not all of which may have security
implications.
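
To illustrate the concern, here is a minimal Python sketch of a decoder that skips the minimum-length check. It is not Mozilla's code; the function name naive_decode_2byte is hypothetical. It shows that the two bytes 0xC0 0x8A, which contain no raw 0x0A byte, still decode to a line feed if overlong forms are accepted.

```python
def naive_decode_2byte(b0: int, b1: int) -> str:
    """Decode a two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Hypothetical helper for illustration only; a conforming decoder must
    reject any two-byte sequence whose value is below 0x80.
    """
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


raw = b"abc\xc0\x8adef"

# A byte-level filter looking for a literal line feed finds nothing...
filtered = raw.replace(b"\n", b"")

# ...but the naive decoder turns the overlong pair into U+000A afterwards.
hidden = naive_decode_2byte(filtered[3], filtered[4])
```

The same arithmetic maps 0xC0 0xAF to a slash, which is why a decoder that accepts these forms can undo checks performed on the raw bytes.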
Interesting...sounds like this could be a security problem. Reporter, could you
provide a demonstration of how this could be exploited?

Reassigning to I18n, I believe that's the right component.
Assignee: mstoltz → nhotta
Component: Security: General → Internationalization
QA Contact: ckritzer → andreasb
Reassign to ftang.
I think this has already been done by jgmyers.
Assignee: nhotta → ftang
Unfortunately, I can't provide a demonstration, as I don't have a hex editor on
hand, but I can describe approximately how to construct one.  Using one of the
overlong character representations in the document, construct a line feed,
rather than a slash character.  Feed this string into some input where linefeeds
are normally stripped (some/all URLs?), and see if the character is, in fact,
stripped.
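
The construction described above can be done without a hex editor. The following is a rough Python sketch, assuming a hypothetical helper overlong_encode that forces a code point into a chosen sequence length. It only builds the bytes; whether a given input path in the product strips them is still something to test by hand.

```python
def overlong_encode(code_point: int, length: int) -> bytes:
    """Encode code_point in exactly `length` bytes (2, 3 or 4).

    Hypothetical test helper: the result is overlong (and therefore
    invalid UTF-8) whenever a shorter encoding of code_point exists.
    """
    payload_bits = {2: 11, 3: 16, 4: 21}
    lead_prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}
    if length not in payload_bits:
        raise ValueError("length must be 2, 3 or 4")
    if not 0 <= code_point < (1 << payload_bits[length]):
        raise ValueError("code point does not fit in that many bytes")

    out = []
    # Emit continuation bytes from the low 6 bits upward.
    for _ in range(length - 1):
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(lead_prefix[length] | code_point)
    return bytes(reversed(out))


line_feed_2 = overlong_encode(0x0A, 2)   # b"\xc0\x8a"
line_feed_3 = overlong_encode(0x0A, 3)   # b"\xe0\x80\x8a"

# Python's own strict decoder rejects the overlong form.
try:
    line_feed_2.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Writing these bytes into a test file or URL and checking whether a line feed appears after decoding would give the step-by-step test case asked for earlier in this bug.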

Even if this doesn't present a security risk, the commentary in the document
suggests that such representations are to be discouraged (as they can be used to
exploit unwary UTF-8 decoders); perhaps we should refuse to decode them, to
discourage their use (which, AFAIK, is minimal, but someone could always write a
broken creation tool...)
I don't think there is a REAL security hole here. I know unicode.org recently
changed the definition of UTF-8, as specified in
http://www.unicode.org/unicode/reports/tr27/
QA Contact: andreasb → ylong
We should first list here all the code in Mozilla which does UTF-8 conversion.
Status: NEW → ASSIGNED
Keywords: intl
I've long since fixed this in the two UTF-8 decoders I've found.  Once in the 
intl UTF-8 decoder, once in the string code.  The fixed code decodes overlong 
sequences to REPLACEMENT CHARACTER.
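
For readers who want to see what that behavior looks like, below is a simplified Python sketch of a decoder that maps overlong sequences to U+FFFD. It is not the actual intl or string code; the names decode_utf8 and MIN_FOR_LENGTH are made up for this example, and emitting one replacement character per malformed sequence is an assumption, not a statement about what Mozilla does.

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs each sequence length.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, replacing malformed and overlong sequences with U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            length, cp = 1, b0
        elif b0 & 0xE0 == 0xC0:
            length, cp = 2, b0 & 0x1F
        elif b0 & 0xF0 == 0xE0:
            length, cp = 3, b0 & 0x0F
        elif b0 & 0xF8 == 0xF0:
            length, cp = 4, b0 & 0x07
        else:
            # Stray continuation byte or invalid lead byte.
            out.append(REPLACEMENT)
            i += 1
            continue

        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(b & 0xC0 != 0x80 for b in tail):
            # Truncated sequence or a non-continuation byte in the tail.
            out.append(REPLACEMENT)
            i += 1
            continue

        for b in tail:
            cp = (cp << 6) | (b & 0x3F)

        overlong = cp < MIN_FOR_LENGTH[length]
        surrogate = 0xD800 <= cp <= 0xDFFF
        if overlong or surrogate or cp > 0x10FFFF:
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)
```

With this check in place, decode_utf8(b"a\xc0\x8ab") yields "a", U+FFFD, "b" rather than a string containing a line feed, while well-formed input passes through unchanged.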

Please be specific as to which test cases in section 4 the current code 
interprets incorrectly.


Christopher Hoess:
OK, we believe that we have already fixed this issue. If you still think Mozilla
has this issue, then please provide step-by-step test cases and reopen it. Thanks
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → INVALID
Verified.  Re-open it in case you see the problem again.
Status: RESOLVED → VERIFIED