Closed Bug 91296 Opened 23 years ago Closed 23 years ago

UTF-8 decoder accepts overlong sequences

Status:

VERIFIED INVALID

People

(Reporter: choess, Assigned: ftang)

References

(
URL
)

Details

(Keywords: intl)

Christopher Hoess (gone)

Reporter

Description

•

23 years ago

Section 4 of the referenced document includes a number of overlong UTF-8
sequences, which the UTF-8 decoder should reject.  Failing to reject them opens
the possibility of an attack that conceals line feeds and similar characters in
overlong sequences.  The document also contains a number of other UTF-8 "edge
cases", some of which may have no security implications.

Mitchell Stoltz (not reading bugmail)

Comment 1

•

23 years ago

Interesting...sounds like this could be a security problem. Reporter, could you
provide a demonstration of how this could be exploited?

Reassigning to I18n, I believe that's the right component.

Assignee: mstoltz → nhotta

Component: Security: General → Internationalization

QA Contact: ckritzer → andreasb

nhottanscp

Comment 2

•

23 years ago

Reassign to ftang.
I think this has already been done by jgmyers.

Assignee: nhotta → ftang

Christopher Hoess (gone)

Reporter

Comment 3

•

23 years ago

Unfortunately, I can't provide a demonstration, as I don't have a hex editor on
hand, but I can describe approximately how to construct one.  Using one of the
overlong character representations in the document, construct a line feed,
rather than a slash character.  Feed this string into some input where linefeeds
are normally stripped (some/all URLs?), and see if the character is, in fact,
stripped.

Even if this doesn't present a security risk, the commentary in the document
suggests that such representations are to be discouraged (as they can be used to
exploit unwary UTF-8 decoders); perhaps we should refuse to decode to discourage
their use (which, AFAIK, is minimal, but someone could always write a broken
creation tool...)
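
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: overlong_encode, strip_line_feeds and strict_decoder_rejects are hypothetical helper names, not Mozilla code, and the byte-level filter stands in for whatever input normally strips line feeds. The sketch builds an overlong line feed, shows that a filter looking for the plain 0x0A byte does not see it, and checks that a strict decoder (here Python's built-in one) refuses the result.

```python
def overlong_encode(code_point, length):
    """Encode code_point as a UTF-8-style sequence of exactly `length` bytes
    (2 to 4), even when a shorter sequence would be the correct form."""
    if length not in (2, 3, 4):
        raise ValueError("length must be 2, 3 or 4")
    payload_bits = {2: 11, 3: 16, 4: 21}[length]
    if code_point < 0 or code_point >= (1 << payload_bits):
        raise ValueError("code point does not fit in that many bytes")
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    tail = []
    for _ in range(length - 1):
        # Each continuation byte carries six payload bits: 10xxxxxx.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    return bytes([lead_marker | code_point] + tail[::-1])


def strip_line_feeds(data):
    """Hypothetical byte-level filter that removes plain line feeds."""
    return data.replace(b"\n", b"")


def strict_decoder_rejects(data):
    """Return True if Python's strict UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_lf = overlong_encode(0x0A, 2)     # two-byte form of LINE FEED
overlong_slash = overlong_encode(0x2F, 2)  # two-byte form of '/'
payload = b"GET /a" + overlong_lf + b"b"

# The filter only looks for the byte 0x0A, so the overlong form survives it.
survived_filter = strip_line_feeds(payload) == payload
```

A decoder that accepted the overlong form would turn the surviving bytes back into a real line feed after the filter had already run, which is the risk described in this bug.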

Frank Tang

Assignee

Comment 4

•

23 years ago

I don't think there is a REAL security hole here. I know unicode.org recently
changed the definition of UTF-8, as specified in
http://www.unicode.org/unicode/reports/tr27/

Andreas Becker

Updated

•

23 years ago

QA Contact: andreasb → ylong

Frank Tang

Assignee

Comment 5

•

23 years ago

We should first list here all the code in Mozilla that does UTF-8 conversion.

Status: NEW → ASSIGNED

Andreas Becker

Updated

•

23 years ago

Keywords: intl

John G. Myers

Comment 6

•

23 years ago

I've long since fixed this in the two UTF-8 decoders I've found.  Once in the 
intl UTF-8 decoder, once in the string code.  The fixed code decodes overlong 
sequences to REPLACEMENT CHARACTER.

Please be specific as to which test cases in section 4 the current code 
interprets incorrectly.
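
The behaviour described above, decoding overlong sequences to REPLACEMENT CHARACTER, can be illustrated with a small stand-alone decoder. This is a sketch only and is not the Mozilla intl or string code: decode_utf8, sequence_length and MIN_FOR_LENGTH are hypothetical names, and emitting a single U+FFFD for each structurally complete but overlong sequence is an assumption of this sketch (other decoders may emit one per byte).

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs a sequence of each length.
# A decoded value below this minimum means the sequence is overlong.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def sequence_length(lead):
    """Sequence length announced by a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0


def decode_utf8(data):
    """Decode bytes to str, replacing malformed or overlong input with U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        tail = data[i + 1:i + n]
        # Bad lead byte, truncated sequence, or a non-continuation byte:
        # replace this one byte and resynchronise on the next.
        if n == 0 or len(tail) != n - 1 or any(b & 0xC0 != 0x80 for b in tail):
            out.append(REPLACEMENT)
            i += 1
            continue
        if n == 1:
            cp = data[i]
        else:
            cp = data[i] & (0xFF >> (n + 1))  # payload bits of the lead byte
            for b in tail:
                cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if cp < MIN_FOR_LENGTH[n] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i += n
    return "".join(out)


hidden_lf = decode_utf8(b"a\xc0\x8ab")   # overlong LINE FEED between a and b
hidden_slash = decode_utf8(b"\xe0\x80\xaf")  # three-byte overlong '/'
```

With this check in place the overlong line feed never reaches the caller as U+000A, so a filter applied before or after decoding sees either the raw bytes or a harmless U+FFFD.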

Frank Tang

Assignee

Comment 7

•

23 years ago

Christopher Hoess:
OK, we believe that we have already fixed this issue. If you still think Mozilla
has this issue, please provide step-by-step test cases and reopen the bug. Thanks.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → INVALID

Yuying Long

Comment 8

•

23 years ago

Verified.  Reopen it if you see the problem again.

Status: RESOLVED → VERIFIED

Categories

(Core :: Internationalization, defect)
