Closed Bug 140013 Opened 23 years ago Closed 16 years ago

Incorrect Display of Character Encoding Unicode UTF-8 (Tamil)

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: gsathis, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(6 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0rc1) Gecko/20020417
BuildID: 2002041711

I am pleased with the Unicode encoding support in Mozilla. I have visited some sites that have Tamil characters to test the display. Though most of the characters display correctly, a few of them don't, namely the UyirMai characters. The AA and II UyirMai characters display correctly; in the rest, the characters are interchanged. I am on a Windows XP box and I haven't installed any Unicode fonts in addition to those that came with Windows XP. I believe it already has the font "Latha" to support Tamil characters.

Reproducible: Always

Steps to Reproduce:
1. Visit the site http://www.ss003b3751.pwp.blueyonder.co.uk/Tamil/Unicode/Tamil%20Unicode.html#Testforcorrectdisplay
2. Go through the characters displayed for testing carefully
3.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
QA Contact: ruixu → ylong
ftang: I have no help with Tamil characters. Please help.
Assignee: yokoyama → ftang
Indic display, future.
Status: NEW → ASSIGNED
Target Milestone: --- → Future
I would be interested in helping you with Tamil Characters.
I'm giving you a link to a test page for Tamil Unicode as rendered in Mozilla and IE.

Test page: http://www.pathcom.com/~u1037916/sirikka.htm
IE rendering: http://groups.yahoo.com/group/e-Uthavi/files/unicode-ie.jpg
Mozilla rendering: http://groups.yahoo.com/group/e-Uthavi/files/unicode-mozilla.jpg

There is also another bug for enabling TSCII encoding support in Mozilla: http://bugzilla.mozilla.org/show_bug.cgi?id=186463

Please let us know what other information you need to move forward on Tamil support.
Please try to fix this bug as soon as possible. As of now, there is no standard way to make Tamil web pages that can be viewed by both Mozilla and IE users, since Mozilla does not display Tamil Unicode pages correctly. Please do not force webmasters to continue to use non-standard fonts and non-standard web design choices. Thanks.
Attached file a simple test case
This is a very simple test case with font-family set to Code2000 (http://home.att.net/~jameskass). The Code2000 font is known to have an OpenType layout table for the Tamil script, which I've just confirmed. It seems that even MS IE 6 (with the latest test version of the Uniscribe DLL installed) doesn't support Tamil rendering with Tamil OpenType fonts. Actually, this test page is rendered a little better by Mozilla (which supports simple overstriking of zero-width glyphs) than by MS IE (which doesn't).

You may wonder why, then, MS IE rendered Tamil text well in your screenshot. That's because the page you took a screenshot of uses a web font (EOT font), which is only supported by MS IE. Mozilla does not support web fonts and perhaps never will.

That being said, let me tell you how I think we have to support Tamil. As I wrote in bug 186463, we can support Tamil web pages in UTF-8 in two ways. One is with TSCII-encoded fonts and the other is with Tamil OpenType fonts such as Code2000 (see http://www.microsoft.com/typography/otfntdev/tamilot/default.htm). The first is a short-term solution while the latter is a long-term solution.
On Windows 2k and Windows XP, Tamil pages encoded in UTF-8 get rendered *correctly* when two conditions are met: fonts with OpenType layout tables (GSUB, GPOS) for Tamil are installed on the system, and the Tamil (and other language) support option(s) are installed. Code2000 is one such font and is available at http://home.att.net/~jameskass; for Tamil OT font development, see http://www.microsoft.com/typography/otfntdev/tamilot/default.htm. The language support options come on the OS CD-ROM: all you have to do is go to Control Panel | Language?? and click a few times. See news://news.mozilla.org:119/b8fsm8$dli3@ripley.netscape.com for details. (In short, on Win2k/XP, complex script rendering just works as far as Win2k/XP support it. However, there are other issues to consider; see the article for them.)

The screenshot in comment #4 must have been taken without installing either Tamil OpenType font(s) or the Tamil support option in Windows XP. Otherwise, the title bar would have shown Tamil correctly instead of empty boxes.

In conclusion, as far as Windows 2k/XP is concerned, this bug is *invalid*. Supporting Tamil in UTF-8/UTF-16/UTF-32 on other platforms (including Win9x/ME, Unix-like/POSIX systems, MacOS, etc.) is another issue. Changing the platform to all (more exactly, all but Win2k/XP) or to one of those would make this bug valid.

BTW, I'm afraid the author of the document at the URL box of this bug appears misinformed about the Unicode encoding model of Tamil (http://www.unicode.org/book/ch09.pdf : section 6 of chapter 9).

Another BTW: Tamil web page writers had better encourage site visitors to install Tamil OpenType fonts instead of relying on web/dynamic fonts. Using web/dynamic fonts (as is done at http://www.pathcom.com/~u1037916/sirikka.htm) works only if the browser supports web/dynamic fonts (and Mozilla does not).
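To illustrate the encoding model referred to above: Unicode stores a Tamil syllable in logical (spoken) order, consonant first, and leaves the visual placement of vowel signs to the rendering engine and font. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` and the sample syllable are chosen here for illustration and do not come from the test pages discussed in this bug.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The syllable "ko" is stored as the consonant KA followed by VOWEL SIGN O,
# even though part of the vowel sign is drawn to the left of KA on screen.
ko = "\u0b95\u0bca"
for code_point, name in describe(ko):
    print(code_point, name)

# VOWEL SIGN O canonically decomposes into VOWEL SIGN E + VOWEL SIGN AA,
# which are the two visual pieces that end up on either side of KA.
decomposed = unicodedata.normalize("NFD", ko)
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

A renderer that draws these code points one after another with nominal glyphs, without reordering or splitting, would show the vowel sign on the wrong side of the consonant, which matches the kind of "interchanged" display described in the original report.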
This screenshot shows that both Mozilla and MS IE 6 render my test page (slightly modified from attachment 121786 [details]) identically. Note that when Code2000 (with OT layout tables for Tamil) is used, Tamil text is rendered correctly, while with Arial Unicode MS (*without* OT layout tables for Tamil), Tamil text is rendered with nominal glyphs without any shaping/reordering applied.
I have to clarify that attachment 121872 [details] was not taken with attachment 121786 [details]. I replaced <pre></pre> with <br>'s at the end of each line because I realized that the OpenType layout table present in Code2000 is not made use of (by MS IE) when rendering text enclosed by a <pre> tag. As James Kass kindly pointed out, a 'monospace' font (which Code2000 is not) is used for text within <pre>. Nonetheless, Code2000 is picked up by MS IE (because there's no alternative), but the OT layout table doesn't get utilized for text in a <pre> block for some reason.

BTW, I'm sorry I was wrong to say http://www.pathcom.com/~u1037916/sirikka.htm uses web/dynamic fonts. I must have seen web/dynamic fonts used somewhere else and mixed it up with this. (Aha.. it was http://www.murasu.com/unicode/sample.html that uses web/dynamic fonts.)
This is a sample of Linear Tamil and complex-rendered normal Tamil display, with other examples of typical erroneous displays.
Comment on attachment 122035 [details]
Linear Tamil and normal (Complex) Tamil

Normal Tamil Display - correct
Linear Tamil Display - correct
Typical erroneous Tamil displays are shown
Please give us the sample text used in your screenshot *in UTF-8*. Without it, the screenshot is of little use. (I can reproduce the text based on your screenshot, but it'll take me at least 10 minutes without knowing the language and the script.) BTW, I believe Mozilla can render Tamil as well as MS IE6 under Win2k/XP. Therefore, this bug is invalid as long as the platform is set to Windows XP, as I wrote in comment #9.
I would like this problem to be fixed on all platforms, especially Linux. Shall I create a new bug report? Manmathan
It was taken with my patch to bug 176290 and my patch to add a Unicode->TSCII converter (yet to be uploaded) applied, along with the following entries in fontEncoding.properties. (This works in a similar way to how bug 176315 and bug 203052 were/are about to be fixed.)

# Tamil fonts (TSCII encoding : see http://www.tscii.net)
# These fonts have pseudo-Unicode cmap with TSCII interpreted as Windows-1252.
encoding.tsc_avarangal.ttf = x-tamilttf.wide
encoding.tsc_aparanarpdf.ttf = x-tamilttf.wide
encoding.tsc_avarangalfxd.ttf = x-tamilttf.wide
encoding.tsc_paranbold.ttf = x-tamilttf.wide
encoding.tsc_paranarho.ttf = x-tamilttf.wide
encoding.tsc_kannadaasan.ttf = x-tamilttf.wide
# These two fonts don't have Unicode cmap but have pseudo-Apple Roman cmap
# with TSCII assignment.
encoding.tsc_aandaal.ttf = x-tscii
encoding.tsc_aandaal.ftcmap = apple-roman
encoding.tsc_paranarpdf.ttf = x-tscii
encoding.tsc_paranarpdf.ftcmap = apple-roman

The encoder works fine, but for a mysterious reason, 'U' and 'UU' following HA and SSA got rendered with incorrect glyphs. For U and UU, the TSCII encoder returns the correct 'font custom codepoints', but XftCharIndex comes up with something strange for glyph IDs. Other than that, vowel splitting, reordering, consonant conjuncts and so forth work fine within the limitations of TSCII glyph encodings. More complex (optional) ligatures specified in Unicode 3.0 cannot be achieved with the limited glyph repertoire of TSCII fonts.

As for filing a new bug, I guess either we can do that or we can change the platform of this bug to Linux.
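The "vowel splitting" and "reordering" steps mentioned above can be sketched roughly as follows. This is not the converter from the patch: it is a simplified illustration that stays in Unicode code points rather than emitting TSCII glyph codes, it ignores conjuncts such as KSSA and the U/UU ligatures, and the names `PREBASE`, `SPLIT`, `is_consonant` and `to_visual_order` are hypothetical.

```python
# Vowel signs that are drawn entirely to the left of their consonant.
PREBASE = {"\u0bc6", "\u0bc7", "\u0bc8"}  # E, EE, AI

# Two-part vowel signs, given as (left part, right part).
SPLIT = {
    "\u0bca": ("\u0bc6", "\u0bbe"),  # O  = E  + AA
    "\u0bcb": ("\u0bc7", "\u0bbe"),  # OO = EE + AA
    "\u0bcc": ("\u0bc6", "\u0bd7"),  # AU = E  + AU length mark
}


def is_consonant(ch):
    """True for code points in the Tamil consonant range KA..HA."""
    return "\u0b95" <= ch <= "\u0bb9"


def to_visual_order(text):
    """Convert logical-order Tamil text to a left-to-right visual order."""
    out = []
    for ch in text:
        if ch in SPLIT:
            left, right = SPLIT[ch]
        elif ch in PREBASE:
            left, right = ch, None
        else:
            out.append(ch)
            continue
        # Move the left part in front of the consonant it belongs to.
        if out and is_consonant(out[-1]):
            out.insert(len(out) - 1, left)
        else:
            out.append(left)
        # The right part, if any, stays after the consonant.
        if right is not None:
            out.append(right)
    return "".join(out)


print([f"U+{ord(c):04X}" for c in to_visual_order("\u0b95\u0bca")])
```

A font with one glyph per visual piece, as TSCII fonts have, can then be driven by mapping each code point of the reordered string to a glyph, which is why this approach works without OpenType layout tables.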
I filed bug 204039 for Mozilla-Xft (Linux). To keep this bug open, I'm changing the platform to Win98 (Win95/WinME) because on WinXP/2k, Tamil is rendered perfectly well with Tamil opentype fonts. There's a problem with the font selection, though. (see bug 204586).
OS: Windows XP → Windows 98
Depends on: 204039
In a new patch to bug 204039, Win 9x/ME support was added so that this bug will be fixed when the patch for bug 204039 is landed.
In addition, I have problems correctly displaying UTF-8 for this Vietnamese site: http://www.lyricafe.com/index.php I'm not sure whether the server or Mozilla is the problem; the page is displayed correctly in IE6, though.
Kevin: thank you for trying to avoid filing a dupe, but www.lyricafe.com is a different case. The site is sending malformed UTF-8 and in my opinion IE is totally wrong to try to "correct" it.
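As a rough illustration of the difference between rejecting malformed UTF-8 and trying to "correct" it, here is a minimal Python sketch. The byte strings are made-up examples, not data captured from the site, the helper names are hypothetical, and the sketch does not claim to reproduce what Mozilla or IE actually do internally.

```python
def decode_strict(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data):
    """Decode, substituting U+FFFD for each malformed sequence."""
    return data.decode("utf-8", errors="replace")


good = "Tiếng Việt".encode("utf-8")
# 0xC3 announces a two-byte sequence, but 0x28 is not a continuation byte.
bad = b"Ti\xc3\x28ng"

print(decode_strict(good))   # valid input decodes normally
print(decode_strict(bad))    # malformed input is rejected
print(decode_lenient(bad))   # malformed byte becomes U+FFFD
```

A strict decoder makes the server-side error visible, whereas a lenient one hides it at the cost of guessing what the author meant.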
*** Bug 221123 has been marked as a duplicate of this bug. ***
(In reply to comment #19)
The bug for Tamil is still there (Windows XP, Mozilla 1.7, e.g. on the BBC Tamil site), and I have no problems with that in IE 6, where everything is rendered correctly. I think this problem should be fixed as soon as possible, as it is impossible to read Tamil in Mozilla at present.
(In reply to comment #22)
> (In reply to comment #19)
> The bug for Tamil is still there (Windows XP, Mozilla 1.7, e. g. on the BBC
> Tamil site) and I

No, it's not there on Windows XP/2k unless text-justify is specified. You have a problem at the BBC site because you didn't turn on complex script support on your Windows XP. Even with that disabled, MS IE works because it uses Uniscribe APIs directly, while Mozilla uses Uniscribe indirectly via the standard text drawing APIs. See also bug 218887.
Depends on: uniscribe
What a hack. I have not touched Mozilla code for 2 years. I haven't read these bugs for 2 years. And they are still there. Just close them as WONTFIX to clean up.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Mass re-open of Frank Tang's WONTFIX debacle. Spam is his responsibility, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass re-assigning Frank Tang's old bugs that he closed as WONTFIX and that had to be re-opened. Spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
Assignee: nobody → jshin1987
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1? → blocking-aviary1.1-
This issue is fixed in Firefox 3.X. Indic (Tamil, Hindi) fonts were not rendered correctly because of Windows XP: go to the regional language options and enable support. Anyway, you don't need to do anything funny if you're using Firefox 3.X.
WORKSFORME per last comment
Status: NEW → RESOLVED
Closed: 20 years ago → 16 years ago
Resolution: --- → WORKSFORME