Open Bug 282168 Opened 20 years ago Updated 2 years ago

Unicode canonical equivalents are not handled correctly

Categories

(Core :: Layout: Text and Fonts, defect)

People

(Reporter: andres, Unassigned)

Details

(Keywords: intl)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Firefox does not render Unicode canonical equivalents identically. In theory the
UTF-8 byte sequence 4F CC 88 (U+004F followed by U+0308 COMBINING DIAERESIS)
should be displayed just like the sequence C3 96 (U+00D6).
Please see: http://www.unicode.org/reports/tr15/#Sample
and http://www.unicode.org/notes/tn5.
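The equivalence claimed above can be checked outside the browser. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. It is not part of the original report, and the helper name `code_points` is made up for illustration. It decodes the two byte sequences and compares them after normalization:

```python
import unicodedata

# The two UTF-8 byte sequences quoted in the report.
decomposed = bytes.fromhex("4F CC 88").decode("utf-8")   # "O" + combining diaeresis
precomposed = bytes.fromhex("C3 96").decode("utf-8")     # precomposed O with diaeresis


def code_points(s):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(decomposed))
print(code_points(precomposed))

# Canonically equivalent strings become identical after normalization.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
print(nfc_equal, nfd_equal)
```

If both comparisons hold, the two sequences are canonically equivalent and a conforming renderer would be expected to draw them the same way.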

I have created a simple Perl script to illustrate the problem:

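#!/usr/bin/perl
use strict;
use warnings;
use CGI;
# Print raw UTF-8 bytes: decomposed O + U+0308 (4F CC 88), then precomposed U+00D6 (C3 96).
my $cgi = CGI->new;
print $cgi->header(-charset => "utf-8"),
    "<h1>'\x{4F}\x{CC}\x{88}' (4F CC 88) should be displayed just like '\x{C3}\x{96}' (C3 96)</h1>";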

If you want to execute this Perl script, you can go to
http://demo.exlibrisgroup.com:3210/demo/cgi/public/utf-8.cgi

FYI - Both Internet Explorer and Safari display the two sequences the same way.


Reproducible: Always

Steps to Reproduce:
1. Server sets charset to UTF-8 as part of the header
2. Server sends two sequences (in hex): 4F CC 88 and C3 96.

Actual Results:  
Firefox shows C3 96 as O with diaeresis and 4F CC 88 as an O followed by a
diaeresis (or some strange glyph like that).

Expected Results:  
It should have presented both as an O with diaeresis: Ö
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050502
Firefox/1.0+ still has the bug.

After looking at the Components for Core page it seems that Core: Layout: Fonts
and Text is the best fit.
Status: UNCONFIRMED → NEW
Component: General → Layout: Fonts and Text
Ever confirmed: true
Product: Firefox → Core
Version: unspecified → Trunk
Seen with Firefox 1.0.3 (on Linux and Windows XP) and Mozilla 1.7.7 (on Debian),
using the Tibetan Machine Uni font. In the "om mani padme hum" mantra, the Tibetan
"ཧཱུྃ" (with the discouraged U+0F75 TIBETAN VOWEL SIGN UU) should be
displayed the same as "ཧཱུྃ" (with the canonical sequence U+0F71 TIBETAN VOWEL SIGN AA +
U+0F74 TIBETAN VOWEL SIGN U), but the canonical form is displayed incorrectly:
U+0F71 and U+0F74 are drawn at the same vertical level, whereas U+0F74 should be
under U+0F71.

Both forms are displayed correctly in Konqueror.
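The relationship between the two Tibetan forms can be confirmed from the Unicode character database. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. It is not from the original comment, and the variable names are illustrative only:

```python
import unicodedata

uu = "\u0f75"          # TIBETAN VOWEL SIGN UU (the discouraged single code point)
aa_u = "\u0f71\u0f74"  # TIBETAN VOWEL SIGN AA + TIBETAN VOWEL SIGN U

# Character name and canonical decomposition recorded for U+0F75.
print(unicodedata.name(uu))
print(unicodedata.decomposition(uu))

# Both spellings should reduce to the same string under NFD.
same_after_nfd = (
    unicodedata.normalize("NFD", uu) == unicodedata.normalize("NFD", aa_u)
)
print(same_after_nfd)
```

Since the two spellings are canonically equivalent, any difference in how they are drawn comes from rendering rather than from the text itself.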
*** Bug 312771 has been marked as a duplicate of this bug. ***
Assignee: firefox → nobody
QA Contact: general → layout.fonts-and-text
Keywords: intl
Can someone (reporter) test with a current trunk build to see if things have improved?
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
WFM - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a7pre) Gecko/2007071705 Minefield/3.0a7pre ID:2007071705
Severity: normal → S3