Closed Bug 140013 Opened 20 years ago Closed 13 years ago

Incorrect Display of Character Encoding Unicode UTF-8 (Tamil)

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: gsathis, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(6 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0rc1)
Gecko/20020417
BuildID:    2002041711

I am pleased with Mozilla's support for the Unicode encoding. I have
visited some sites that have Tamil characters to test the display. Though most
of the characters display correctly, a few of them don't, namely the UyirMai
characters. The AA and II UyirMai characters display correctly, but the rest of
the characters are interchanged. I am on a Windows XP box and I haven't installed
any Unicode fonts beyond what ships with WinXP. I believe it already
includes the font "Latha" to support Tamil characters.

Reproducible: Always
Steps to Reproduce:
1. Visit the site
http://www.ss003b3751.pwp.blueyonder.co.uk/Tamil/Unicode/Tamil%20Unicode.html#Testforcorrectdisplay
2. Go through the characters displayed for testing carefully.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
QA Contact: ruixu → ylong
ftang: I have no help with Tamil characters. Please help.
Assignee: yokoyama → ftang
Indic display, future.
Status: NEW → ASSIGNED
Target Milestone: --- → Future
I would be interested in helping you with Tamil characters.
Here is a link to a test page for Tamil Unicode, along with how it is rendered
by Mozilla and by IE. The URL for the test page is http://www.pathcom.com/~u1037916/sirikka.htm
The IE rendering is http://groups.yahoo.com/group/e-Uthavi/files/unicode-ie.jpg
The Mozilla rendering is http://groups.yahoo.com/group/e-Uthavi/files/unicode-mozilla.jpg

There is also another bug for enabling TSCII encoding support in Mozilla here:
http://bugzilla.mozilla.org/show_bug.cgi?id=186463

Please let us know what other information you need to move forward on Tamil Support.
Please try to fix this bug as soon as possible. As of now there is no standard
way to make Tamil web pages that can be viewed by both Mozilla and IE users,
since Mozilla does not display Tamil Unicode pages correctly.

Please do not force the webmasters to continue to use non-standard fonts and
non-standard webdesign choices.

Thanks.
Attached file a simple test case
This is a very simple test case with font-family set to Code2000
(http://home.att.net/~jameskass). The Code2000 font is known to have an
OpenType layout table for the Tamil script, which I've just confirmed.
It seems that even MS IE 6 (with the latest test version of the
Uniscribe DLL installed) doesn't support Tamil rendering with
Tamil OpenType fonts. Actually, this test page is rendered a little
better by Mozilla (which supports simple overstriking of zero-width
glyphs) than by MS IE (which doesn't).

You may wonder why, then, MS IE rendered the Tamil text well in your
screenshot. That's because the page you took a screenshot of uses a web font
(EOT font), which is only supported by MS IE. Mozilla does not support web
fonts and perhaps never will.

That being said, let me tell you how I think we have to support Tamil. As I
wrote in bug 186463, we can support Tamil web pages in UTF-8 in two ways. One
is with TSCII-encoded fonts and the other is with Tamil Opentype fonts such as
Code2000
(see http://www.microsoft.com/typography/otfntdev/tamilot/default.htm).
The first one is a short-term solution while the latter is a long-term
solution.
On Windows 2k and Windows XP, Tamil pages encoded in UTF-8 are rendered
*correctly* when two conditions are met. First, fonts with OpenType layout
tables (GSUB, GPOS) for Tamil are installed on the system. Code2000 is one
such font and is available at http://home.att.net/~jameskass. For Tamil OT
font development, see
http://www.microsoft.com/typography/otfntdev/tamilot/default.htm.
Second, the Tamil (and other language) support option(s) are installed.
They come on the OS CD-ROM; all you have to do is go to
Control Panel | Language?? and click a few times. See
news://news.mozilla.org:119/b8fsm8$dli3@ripley.netscape.com
for details. In short, on Win2k/XP, complex script rendering
just works as far as Win2k/XP supports it. However, there
are other issues to consider; see the article for them.
The screenshot in comment #4 must have been taken without installing either
(a) Tamil OpenType font(s) or the Tamil support option in Windows XP.
Otherwise, the title bar would have shown Tamil correctly instead of empty boxes.
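
To make the "fonts with OpenType layout tables (GSUB, GPOS)" condition concrete, here is a minimal sketch in Python that lists the tables declared in a font file's sfnt table directory. The function names are hypothetical, the sketch does not handle .ttc collections, and finding GSUB or GPOS only shows that the font carries layout tables at all, not that those tables cover the Tamil script.

```python
import struct


def sfnt_table_tags(data):
    """Return the set of table tags declared in an sfnt table directory."""
    # Offset table: sfntVersion (4 bytes), then numTables, searchRange,
    # entrySelector and rangeShift (2 bytes each), 12 bytes in total.
    if len(data) < 12:
        raise ValueError("too short to be an sfnt font")
    num_tables = struct.unpack_from(">H", data, 4)[0]
    tags = set()
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * i
        if start + 16 > len(data):
            raise ValueError("truncated table directory")
        tags.add(data[start:start + 4].decode("latin-1"))
    return tags


def has_layout_tables(data):
    """True if the font declares a GSUB or a GPOS table (hypothetical helper)."""
    tags = sfnt_table_tags(data)
    return "GSUB" in tags or "GPOS" in tags
```

With a real font one would read the file in binary mode and pass its bytes, for example `has_layout_tables(open(path, "rb").read())`, where `path` is whatever font file is being inspected.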

In conclusion, as far as Windows 2k/XP is concerned, this bug is *invalid*.
Supporting Tamil in UTF-8/UTF-16/UTF-32 on other
platforms (including Win9x/ME, Unix-like/POSIX systems, MacOS, etc.)
is another issue. Changing the platform to all (more exactly,
all but Win2k/XP) or to one of those would make this bug valid.

BTW, I'm afraid the author of the document at the URL box of this
bug appears misinformed about the Unicode encoding model of Tamil.
(http://www.unicode.org/book/ch09.pdf : section 6 of chapter 9).

Another BTW: Tamil web page writers had better encourage site
visitors to install Tamil OpenType fonts instead of relying
on web/dynamic fonts. Using web/dynamic fonts (as
is done at http://www.pathcom.com/~u1037916/sirikka.htm) works
only if the browser supports web/dynamic fonts (and Mozilla does not).
This screenshot shows that both Mozilla and MS IE 6
render my test page (slightly modified from attachment 121786)
identically. Note that when Code2000 (with OT layout tables
for Tamil) is used, Tamil text is rendered correctly,
while with Arial Unicode MS (*without* OT layout tables for
Tamil) Tamil text is rendered with nominal glyphs without
any shaping/reordering applied.
I have to clarify that the screenshot in attachment 121872 was not taken with
attachment 121786. I replaced <pre></pre> with <br>'s at the end
of each line because I realized that the OpenType layout table present
in Code2000 is not used (by MS IE) when rendering text enclosed
by the <pre> tag. As James Kass kindly pointed out, a 'monospace' font
(which Code2000 is not) is used for text within <pre>. Nonetheless,
Code2000 is picked up by MS IE (because there's no alternative),
but the OT layout table doesn't get used for text in a <pre> block for some reason.

BTW, I'm sorry I was wrong to say that http://www.pathcom.com/~u1037916/sirikka.htm
uses web/dynamic fonts. I must have seen web/dynamic fonts used
somewhere else and mixed it up with this. (Aha, it was
http://www.murasu.com/unicode/sample.html that uses web/dynamic fonts.)

This is a sample of Linear Tamil and complex-rendered normal Tamil display,
along with other examples of typical erroneous displays.
Comment on attachment 122035
Linear Tamil and normal (Complex) Tamil

Normal Tamil Display - correct
Linear Tamil Display - correct
Typical erroneous Tamil displays are shown
Please give us the sample text used in your screenshot *in UTF-8*. Without it,
the screenshot is of little use. (I can reproduce the text based on your screenshot,
but it'll take me at least 10 minutes without knowing the language and the script.)
BTW, I believe Mozilla can render Tamil as well as MS IE6 under Win2k/XP. Therefore,
this bug is invalid as long as the platform is set to Windows XP, as I wrote in
comment #9.
 
I would like this problem to be fixed on all platforms, especially Linux. Shall I
create a new bug report?

Manmathan
It was taken with my patch to bug 176290 and my patch to add a Unicode->TSCII
converter (yet to be uploaded) applied, along with the following entries in
fontEncoding.properties. (This works in a similar way to how
bug 176315 and bug 203052 were, or are about to be, fixed.)

# Tamil fonts (TSCII encoding : see http://www.tscii.net)
# These fonts have pseudo-Unicode cmap with TSCII  interpreted as Windows-1252.

encoding.tsc_avarangal.ttf = x-tamilttf.wide
encoding.tsc_aparanarpdf.ttf = x-tamilttf.wide
encoding.tsc_avarangalfxd.ttf = x-tamilttf.wide
encoding.tsc_paranbold.ttf = x-tamilttf.wide
encoding.tsc_paranarho.ttf = x-tamilttf.wide
encoding.tsc_kannadaasan.ttf = x-tamilttf.wide
 
# These two fonts don't have Unicode cmap but have pseudo-Apple Roman cmap
# with TSCII assignment.
encoding.tsc_aandaal.ttf = x-tscii
encoding.tsc_aandaal.ftcmap = apple-roman
encoding.tsc_paranarpdf.ttf = x-tscii
encoding.tsc_paranarpdf.ftcmap = apple-roman
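
To show how entries of this shape are organized, here is a minimal sketch in Python that groups them per font. It is not how Mozilla itself reads fontEncoding.properties; the function name is hypothetical and the parsing rules (skip '#' comments, split on the first '=') are assumptions that happen to fit the lines above.

```python
def parse_font_encodings(text):
    """Group 'encoding.<font>.<kind> = <value>' lines by font name."""
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith("encoding."):
            continue  # not an encoding entry
        # "tsc_aandaal.ftcmap" -> font "tsc_aandaal", kind "ftcmap"
        font, _, kind = key[len("encoding."):].rpartition(".")
        table.setdefault(font, {})[kind] = value.strip()
    return table
```

Parsed this way, a font such as tsc_aandaal ends up with two settings, the converter name under "ttf" and the cmap to use under "ftcmap", while the x-tamilttf.wide fonts carry only the "ttf" entry.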


The encoder works fine, but for a mysterious reason,
'U' and 'UU' following HA and SSA are rendered with
incorrect glyphs. For U and UU, the TSCII encoder
returns the correct 'font custom codepoints', but XftCharIndex comes up
with something strange for the glyph IDs. Other than that,
vowel splitting, reordering, consonant conjuncts and so forth
work fine within the limitations of TSCII glyph encodings.
More complex (optional) ligatures specified in Unicode 3.0
cannot be achieved with the limited glyph repertoire of TSCII fonts.
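
For readers unfamiliar with "vowel splitting" and "reordering", here is a minimal sketch in Python of the idea, not the code in the patch. Unicode stores a Tamil vowel sign after its consonant, but the signs E, EE and AI are drawn to the left of it, and the two-part signs O, OO and AU split into a left part and a right part. The function name is hypothetical, and the sketch ignores conjuncts and ligatures.

```python
# Two-part vowel signs and their canonical decompositions.
SPLIT_VOWELS = {
    "\u0BCA": "\u0BC6\u0BBE",  # O  -> E  + AA
    "\u0BCB": "\u0BC7\u0BBE",  # OO -> EE + AA
    "\u0BCC": "\u0BC6\u0BD7",  # AU -> E  + AU length mark
}

# Vowel signs that are displayed before (to the left of) the consonant.
PRE_BASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # E, EE, AI


def is_tamil_consonant(ch):
    """True for code points in the Tamil consonant range KA..HA."""
    return "\u0B95" <= ch <= "\u0BB9"


def to_visual_order(text):
    """Split two-part vowel signs, then move pre-base signs before the consonant."""
    expanded = "".join(SPLIT_VOWELS.get(ch, ch) for ch in text)
    out = []
    for ch in expanded:
        if ch in PRE_BASE_SIGNS and out and is_tamil_consonant(out[-1]):
            base = out.pop()
            out.extend([ch, base])  # sign first, then its consonant
        else:
            out.append(ch)
    return "".join(out)
```

For KA followed by the sign O (U+0B95 U+0BCA), the result is the sign E, then KA, then the sign AA, which is the left-to-right order in which a font with one glyph per TSCII-style code would need them.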

As for filing a new bug, I guess we can either do that or change the
platform of this bug to Linux.
I filed bug 204039 for Mozilla-Xft (Linux). To keep
this bug open,  I'm changing the platform to Win98
(Win95/WinME) because on WinXP/2k, Tamil is rendered
perfectly well with Tamil opentype fonts. There's a problem
with the font selection, though. (see bug 204586).
OS: Windows XP → Windows 98
Depends on: 204039
In a new patch to bug 204039, Win 9x/ME support was added so that this bug will
be fixed when the patch for bug 204039 is landed.
In addition, I have problems correctly displaying UTF-8 for this Vietnamese site
: http://www.lyricafe.com/index.php

I'm not sure if the server is the problem or mozilla; the page is displayed
correctly in IE6  though.
Kevin: thank you for trying to avoid filing a dupe, but www.lyricafe.com is a
different case. The site is sending malformed UTF-8 and in my opinion IE is
totally wrong to try to "correct" it.
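
As an illustration of the difference between rejecting malformed UTF-8 and "correcting" it, here is a minimal sketch in Python. It makes no claim about what www.lyricafe.com actually sends or how either browser decodes it; the function names are hypothetical and the byte strings are made-up examples.

```python
from typing import Optional


def decode_strict(data: bytes) -> Optional[str]:
    """Decode UTF-8, returning None when the bytes are malformed."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return None


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for anything malformed."""
    return data.decode("utf-8", errors="replace")


well_formed = "\u0B95".encode("utf-8")  # TAMIL LETTER KA, three bytes
truncated = well_formed[:2]             # last byte missing: malformed

print(decode_strict(well_formed) is not None)  # True
print(decode_strict(truncated) is None)        # True
print("\ufffd" in decode_lenient(truncated))   # True
```

A strict decoder makes the malformed input visible to the page author, while a lenient one hides it behind replacement characters or guesses.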
*** Bug 221123 has been marked as a duplicate of this bug. ***
(In reply to comment #19)
The bug for Tamil is still there (Windows XP, Mozilla 1.7, e.g. on the BBC
Tamil site), and I have no such problems in IE 6, where everything is rendered
correctly.
I think this problem should be fixed as soon as possible, as it is impossible to
read Tamil in Mozilla at present.
(In reply to comment #22)
> (In reply to comment #19)
> The bug for Tamil is still there (Windows XP, Mozilla 1.7, e. g. on the BBC
> Tamil site) and I

No, it's not there on Windows XP/2k unless text-justify is specified. You have a
problem at BBC site because you didn't turn on the complex script support on
your Windows XP. Even with that disabled, MS IE works because it directly uses
Uniscribe APIs while Mozilla indirectly uses Uniscribe via the standard text
drawing APIs. See also bug 218887.
Depends on: uniscribe
What a hack. I have not touched Mozilla code for 2 years. I didn't read these bugs
for 2 years. And they are still there. Just close them as WONTFIX to clean up.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Mass reopen of Frank Tang's WONTFIX debacle. Spam is his responsibility, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass reassigning Frank Tang's old bugs that he closed as WONTFIX and that had to be
reopened. Spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
Assignee: nobody → jshin1987
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1? → blocking-aviary1.1-
This issue is fixed in Firefox 3.X.

Indic (Tamil, Hindi) fonts were not rendered correctly because of Windows XP. Go to the regional language options and enable support.

Anyway, you don't need to do anything funny if you're using Firefox 3.X.
WORKSFORME per last comment
Status: NEW → RESOLVED
Closed: 17 years ago → 13 years ago
Resolution: --- → WORKSFORME