
Update UTF-8 for Unicode 3.01 conformance

VERIFIED FIXED in mozilla0.9.5



16 years ago


(Reporter: bobj, Assigned: Roy Yokoyama)





(Whiteboard: PDT+)


(2 attachments)



16 years ago
Unicode 3.1 updated the definition of UTF-8 to help prevent security issues.
We need to modify the UTF-8 converter to not interpret "non-shortest forms".

See http://www.unicode.org/unicode/reports/tr27/index.html:

The current conformance clause C12 in The Unicode Standard, Version 3.0 forbids
the generation of "non-shortest form" UTF-8, and forbids the interpretation of
illegal sequences, but not the interpretation of "non-shortest form". Where
software does interpret the non-shortest forms, security issues can arise. For

    * Process A performs security checks, but does not check for non-shortest forms.
    * Process B accepts the byte sequence from process A, and transforms it into
UTF-16 while interpreting non-shortest forms.
    * The UTF-16 text may then contain characters that should have been filtered
out by process A.

To address this issue, the Unicode Technical Committee has modified the
definition of UTF-8 to forbid conformant implementations from interpreting
non-shortest forms for BMP characters, and clarified some of the conformance
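To make the TR27 scenario concrete, here is a minimal Python sketch of a naive two-byte decoder that only masks out payload bits. It is an illustration, not the Mozilla converter (which is C++). The helper name `naive_decode_2byte` and the one-line byte "filter" are hypothetical stand-ins for process B and process A. The illegal sequence <C0 AF> slips past a filter that looks for the byte 0x2F, yet still decodes to U+002F ("/").

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Hypothetical naive decoder for a 2-byte UTF-8 sequence 110xxxxx 10yyyyyy.

    It extracts the payload bits but never checks that the code point actually
    needed two bytes, so it accepts non-shortest ("overlong") forms.
    """
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


# <C0 AF> is an illegal, non-shortest encoding of U+002F ("/").
overlong = bytes([0xC0, 0xAF])

# Process A: a byte-level filter that only looks for the shortest form 0x2F.
passes_filter = 0x2F not in overlong

# Process B: the naive decoder turns the sequence into "/" anyway.
decoded = chr(naive_decode_2byte(overlong[0], overlong[1]))

print(passes_filter, repr(decoded))  # True '/'
```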

Comment 1

16 years ago
Nominating this for nsenterprise.
Keywords: nsenterprise

Comment 2

16 years ago
We also need to explore if there is a way to 
update the UTF-8 converter for earlier versions of 
Netscape 6.

Comment 3

16 years ago
Actually this may have been updated for Unicode 3.01:

Updated Summary from:
   Update UTF-8 for Unicode 3.1 conformance
to:
   Update UTF-8 for Unicode 3.01 conformance

Added nsbranch keyword.

momoi> We also need to explore if there is a way to 
momoi> update the UTF-8 converter for earlier versions of 
momoi> Netscape 6.

If the fix is just in the UTF-8 converter, we should be able to provide
an updated XPCOM converter module for Netscape 6.0 and 6.1.
Frank, is that correct?
Keywords: nsbranch
Summary: Update UTF-8 for Unicode 3.1 conformance → Update UTF-8 for Unicode 3.01 conformance

Comment 4

16 years ago
In a Netscape internal bug, shanjian mentioned that
achieving efficiency with the proposed change in the UTF-8 
converter may take some additional work.

Comment 5

16 years ago
I did not realize Unicode 3.1 addressed this problem when I wrote that. I
guess Unicode 3.1 must include a conversion program on the companion CD. If
that is the case, we can just borrow that code, and the implementation shouldn't
take a lot of time.

Comment 6

16 years ago
Isn't it just a matter of checking bit patterns as described in Table 3.1B in

            Table 3.1B. Legal UTF-8 Byte Sequences
 Code Points         1st-Byte  2nd-Byte  3rd-Byte  4th-Byte
 U+0000..U+007F      00..7F      
 U+0080..U+07FF      C2..DF    80..BF     
 U+0800..U+0FFF      E0        A0..BF    80..BF   
 U+1000..U+FFFF      E1..EF    80..BF    80..BF   
 U+10000..U+3FFFF    F0        90..BF    80..BF    80..BF
 U+40000..U+FFFFF    F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF  F4        80..8F    80..BF    80..BF

Table 3.1B. lists all of the byte sequences that are legal in UTF-8. A
range of byte values such as A0..BF indicates that any byte from A0 to
BF (inclusive) is legal in that position. Any byte value outside of the
ranges listed is illegal. For example, the byte sequence <C0 AF> is
illegal since C0 is not legal in the 1st Byte column. The byte sequence
<E0 9F 80> is illegal since in the row where E0 is legal as a first
byte, 9F is not legal as a second byte. The byte sequence <F4 80 83 92>
is legal, since every byte in that sequence matches a byte range in a
row of the table (the last row).

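    * Cases where a trailing byte range is not 80..BF are underlined in
      the table to draw attention to them (the underlining is lost in this
      plain-text copy: they are A0..BF after E0, 90..BF after F0, and
      80..8F after F4). These occur only in the second byte of a sequence.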
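The "simplest implementation" of Table 3.1B is a direct range check per row. Below is a minimal Python sketch of that idea (a hypothetical `is_legal_utf8`, not the code in the attached patch), with the row data copied from the table above. Note that later versions of the Unicode Standard also restrict the byte after ED to 80..9F to exclude surrogate code points; this sketch follows the table exactly as quoted, so it still accepts those sequences.

```python
# Rows of Table 3.1B as quoted above: (1st-Byte range, ranges for the
# following bytes). All ranges are inclusive (lo, hi) pairs.
TABLE_3_1B = [
    ((0x00, 0x7F), []),                                          # U+0000..U+007F
    ((0xC2, 0xDF), [(0x80, 0xBF)]),                              # U+0080..U+07FF
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),                # U+0800..U+0FFF
    ((0xE1, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),                # U+1000..U+FFFF
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),  # U+10000..U+3FFFF
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),  # U+40000..U+FFFFF
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),  # U+100000..U+10FFFF
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if `data` is a concatenation of legal Table 3.1B sequences."""
    i = 0
    while i < len(data):
        lead = data[i]
        # Pick the row whose 1st-Byte range contains the lead byte, if any.
        trail = next((t for (lo, hi), t in TABLE_3_1B if lo <= lead <= hi), None)
        if trail is None:
            return False  # e.g. C0, C1, F5..FF, or a stray 80..BF byte
        if i + 1 + len(trail) > len(data):
            return False  # sequence truncated at end of input
        for k, (lo, hi) in enumerate(trail):
            if not lo <= data[i + 1 + k] <= hi:
                return False  # e.g. 9F after E0
        i += 1 + len(trail)
    return True


print(is_legal_utf8(bytes([0xC0, 0xAF])))              # False
print(is_legal_utf8(bytes([0xE0, 0x9F, 0x80])))        # False
print(is_legal_utf8(bytes([0xF4, 0x80, 0x83, 0x92])))  # True
```

The three calls reproduce the examples given in the text: <C0 AF> fails on the first byte, <E0 9F 80> fails on the second, and <F4 80 83 92> matches the last row.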

Comment 7

16 years ago
The simplest implementation would look like that. But we probably want to optimize
the code and try to achieve the same result with little or no extra performance
cost. I think somebody in the Unicode community has already done this for us, so we
can just borrow the code or algorithm.
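One possible low-cost variant, offered here as an assumption rather than as what the attached patch does: keep an ordinary bit-masking decode loop and add a single comparison per multi-byte sequence, rejecting any value below the smallest code point that needs that many bytes (a non-shortest form) or above U+10FFFF. The names `decode_utf8_strict` and `MIN_CODE_POINT` are hypothetical. By our reading, this accepts the same byte sequences as Table 3.1B, including its treatment of surrogates.

```python
from typing import List

# Smallest code point that legitimately requires 1, 2, 3 or 4 bytes.
MIN_CODE_POINT = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_strict(data: bytes) -> List[int]:
    """Decode UTF-8 to code points, rejecting non-shortest forms.

    Raises ValueError on an illegal, truncated, or out-of-range sequence.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length and payload bits.
        if lead < 0x80:
            n, cp = 1, lead
        elif (lead & 0xE0) == 0xC0:
            n, cp = 2, lead & 0x1F
        elif (lead & 0xF0) == 0xE0:
            n, cp = 3, lead & 0x0F
        elif (lead & 0xF8) == 0xF0:
            n, cp = 4, lead & 0x07
        else:
            raise ValueError(f"illegal lead byte {lead:#04x} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte {b:#04x}")
            cp = (cp << 6) | (b & 0x3F)
        # The one extra check: a value that would fit in fewer bytes is a
        # non-shortest form; anything above U+10FFFF is out of range.
        if cp < MIN_CODE_POINT[n] or cp > 0x10FFFF:
            raise ValueError(f"non-shortest or out-of-range form at offset {i}")
        out.append(cp)
        i += n
    return out
```

Because the range test runs once per decoded sequence rather than once per trailing byte, it adds little to a loop that already assembles code points; whether this is the kind of optimization the thread had in mind is not stated.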

Comment 8

16 years ago
security issue. also, easy to fix. moz0.9.4
Keywords: nsbranch+
Target Milestone: --- → mozilla0.9.4


16 years ago
QA Contact: andreasb → ylong

Comment 9

16 years ago
The security issue here is:

  Do we do to help *poorly* written 3rd party apps avoid parsing errors?

Comment 10

16 years ago
s/Do/What do/

Comment 11

16 years ago
Created attachment 47746
Check for Unicode byte

Comment 12

16 years ago
Since the code was poorly indented, I also changed a lot of tabs and spaces, and the
diff is a -uw diff. Please ignore the ugly-looking tabs/spaces in the patch.
The check-in will look fine, with the whole file following Mozilla indentation.

jbetak/shanjian, can you review this code?

Comment 13

16 years ago
This got our attention because a real security hole exists somewhere.


16 years ago
Keywords: nsbranch

Comment 14

16 years ago
The security hole in webmail has already been fixed by the webmail team. The
importance of this fix has been lowered now.

Comment 15

16 years ago
Lowered, but still high.

If we fix this, then our client can foil any future similar exploits which
use non-shortest forms of UTF-8 for spoofing.

Comment 16

16 years ago
Fully tested. Needs code review.

Comment 17

16 years ago
Two decisions we need to make:

(1) Do we make new XPCOM converter modules available for 6.1?  (The solution for 6.0
    users should be to upgrade to 6.1 + the new XPCOM converter module.)
(2) If we do (1), should we create a 6.11 or do a silent upgrade?  If we do a
    silent upgrade, how do users know whether they have the fix?  Do we
    have them check the size and date of the converter module?

Comment 18

16 years ago
> If we fix this, then our client can foil any future similar 
> exploits which use non-shortest forms of UTF-8 for spoofing.

There are non-Netscape webmail services in which the exploit
is still problematical. (Hotmail & Yahoo, for example.) Mozilla/NS 6
users use these services and we should not be contributing to a security

Comment 19

16 years ago

Comment 20

16 years ago

Comment 21

16 years ago
I will be on vacation starting 9/6. If I don't get approval for this by
tomorrow noon, I will check it in after 9/17.

Comment 22

16 years ago
Comment on attachment 47746
Check for Unicode byte

waiting for /a
Attachment #47746 - Flags: superreview+
Attachment #47746 - Flags: review+

Comment 23

16 years ago
I'll assign this bug to myself while ftang is on vacation.
Assignee: ftang → yokoyama

Comment 24

16 years ago
This has an r= and sr=. Just awaiting an a=, and FTang says it addresses a
security issue (e.g. " . . .*poorly* written 3rd party apps avoid parsing errors").

Nominating for PDT+. Removing nsenterprise from the keywords, as I do not believe it
is an enterprise issue.

Keywords: nsenterprise
Whiteboard: PDT

Comment 25

16 years ago
Adding PDT+.
Whiteboard: PDT → PDT+

Comment 26

16 years ago
Comment on attachment 47746
Check for Unicode byte

Fully tested, per ftang's comment on 2001-09-04 10:53
Attachment #47746 - Attachment description: patch- need more teesting. → Check for Unicode byte


16 years ago


16 years ago
Whiteboard: PDT+ → PDT+, r and sr'd waiting for approval

Comment 27

16 years ago
0.9.4 is out the door.
Group: netscapeconfidential?
Target Milestone: mozilla0.9.4 → mozilla0.9.5

Comment 28

16 years ago
U got the PDT+. Pls check it in ASAP

Comment 29

16 years ago
Checked into 0_9_4_BRANCH. I'll check into the trunk once opened.
Whiteboard: PDT+, r and sr'd waiting for approval → PDT+

Comment 30

16 years ago
thanks. roy

Comment 31

16 years ago
Someone said N4.x has the same problem. But I looked at it, and N4.x does not have
the same problem that 6.1 and IE do.

Comment 32

16 years ago
Created attachment 50136
test cases

Comment 33

16 years ago
Frank's test case has passed on 09-20 Branch build / Windows2000.

Will verify it on a Trunk build once it gets checked in there.

Comment 34

16 years ago
> someone said N4.x have the same problem. But I look at it, n4.x 
> do not have the same problem as 6.1 and IE does. 

Both Takagi-san and I reproduced the problem with 4.78 at Netscape 
Webmail -- not once but several times. I also looked at Yahoo and 
Hotmail and they also had the problem. 

None of these sites exhibit the problem today with Comm 4.78 
and I gather that they fixed the problem. Is it possible that 
your test case only covers one possible way the exploit works?


Comment 35

16 years ago
Checked into the trunk.
Last Resolved: 16 years ago
Resolution: --- → FIXED

Comment 36

16 years ago
Frank's testcase has passed on 09-24 trunk build / Win2000-CN.

Marking it as verified. Please reopen if there are other cases that might cause
the problem.