Closed Bug 86427 Opened 24 years ago Closed 21 years ago

Vietnamese support is deficient (UTF-8 and VISCII)

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED INVALID
Future

People

(Reporter: fare+mozilla, Assigned: smontagu)

Details

(Keywords: intl)

Attachments

(2 files)

I've long tried to get a Linux browser to display Vietnamese correctly. Painful attempts with Netscape 4 were unsuccessful. My latest attempt, with Mozilla 0.9.1, is both encouraging and unsatisfying.

I've converted some Vietnamese text into UTF-8. It works perfectly on Windows with a recent IE or Mozilla (Unicode fonts preinstalled; I didn't check font auto-installation). Mozilla 0.9.1 on Linux will more or less show it correctly, but it uses a really **** scaled raster font for any character not in Latin-1, even though I do my best to specify, as the "Unicode" font, a TrueType font that does have all the required characters. You can test Mozilla with the following URL: http://ciev.org/1984-vi-utf8-html/

By comparison, my previous attempt, using VISCII 1.1, doesn't work with the most recent IE under Windows (5.5) or Mozilla under Linux (0.9.1), but kind of works with Mozilla under Windows (if I re-specify the font every time). http://ciev.org/1984-vi-viscii-html/

Mozilla 0.9.1 will kind of work if forced into VISCII encoding, which is a global setting and uses the ugly unscaled raster font. What's the Right Thing(tm) to do to declare VISCII as the charset encoding?

Note: the œ character, essential to fully support French, is also displayed in this ugly font under Linux.
marking NEW.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I think the real problem is that we do not let users specify a Vietnamese font in the font prefs, and we do not recognize Vietnamese fonts. On Linux, what is the XLFD of a Vietnamese font? Where can we find one? Reassigning to bstell@netscape.com and marking as Future.
Assignee: nhotta → bstell
Target Milestone: --- → Future
I don't know zilch about XLFD. If you give me sensible URLs, I'm willing to have a look. Maybe selecting a Vietnamese font would help; but even then, a main difficulty with Vietnamese is that it mixes Latin characters in the 00-7F range, the 100-1FF range, and the 1E00-1EFF range, and you want words in a Vietnamese text to be displayed with the SAME font, for the sake of readability. So stubborn per-character range-checking won't do it. But then, mixes of ranges happen in other languages too (e.g. European ones), so if these should work (French has the same problem as Vietnamese "thanks" to œ), then Vietnamese should too, by the same solution.

Why can't I have a way to just specify Bitstream CyberBit or Arial Unicode MS, or some other TrueType font with large Unicode coverage, and get a clean uniform result from my browser in any mix of languages? I don't know how IE manages things, but at least my pages display nicely with it, so something should be possible. At the very least, the "current" font shouldn't be overridden with unifont or anything else if it contains the required characters. And there should be a way to specify something better than unifont as the overriding font.

Additional difficulty with Vietnamese fonts: so as to be compatible with legacy software, many of them "cheat" with iso-10646-1 encoding, by faking the default encoding of the system (iso-8859-1 or windows-1252). I don't know if the TrueType versions contain disambiguating information (how can I check?).
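To make the range mixing concrete, here is a minimal Python sketch that reports which of the ranges mentioned above a piece of text touches. The range labels, the function names and the sample phrase are illustrative assumptions of mine, not anything Mozilla uses internally; the point is only that a single Vietnamese word can straddle several ranges.

```python
import unicodedata

# Code point ranges discussed in the comment above, plus Latin-1.
# The short labels are hypothetical names chosen for this sketch.
RANGES = [
    (0x0000, 0x007F, "ascii"),
    (0x0080, 0x00FF, "latin1"),
    (0x0100, 0x01FF, "latin-extended"),
    (0x1E00, 0x1EFF, "latin-extended-additional"),
]

def range_of(ch):
    """Return the label of the range containing ch, or 'other'."""
    cp = ord(ch)
    for lo, hi, label in RANGES:
        if lo <= cp <= hi:
            return label
    return "other"

def ranges_used(text):
    """Return the set of range labels used by the precomposed (NFC) text."""
    composed = unicodedata.normalize("NFC", text)
    return {range_of(ch) for ch in composed if not ch.isspace()}

# "Việt Nam, nước, đường" written with escapes so the code points are explicit.
sample = "Vi\u1ec7t Nam, n\u01b0\u1edbc, \u0111\u01b0\u1eddng"
print(ranges_used(sample))
```

A browser that picks a font per range would therefore need up to three fonts for this one phrase, which is the readability problem described above.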
BTW, to confirm my latter remark, the VISCII version works with mozilla and IE, if I cheat and DO NOT specify VISCII as the encoding. If I DO specify VISCII as the encoding, then I MUST NOT specify a VISCII font, and let IE choose its nice font and Mozilla choose its ugly font. The problem being VISCII fonts cheating with encoding by being incorrectly declared as standard windows or latin encoding, for compatibility with legacy programs *and documents* (the latter part being the most tricky since there is no universal conversion utility, even less a fully automatic one). Indeed VISCII fonts existed before any office application really supported Unicode. BTW, MSIE seems to grok BASEFONT, and not Mozilla. Is it on purpose, or should I file a bug report?
QA Contact: andreasb → ylong
Status: NEW → ASSIGNED
OK, after a lot of attempts, I finally got Vietnamese to work properly. However, it was HELL getting it to work.

First problem and fix: I realized that although I had installed fonts capable of displaying Vietnamese (Verdana, etc.), they weren't recognized as such by X, and thus by Mozilla. Of course, Mozilla gave me no hint about it, and I had to discover it painfully, by reading lots of documentation all over the net. Declaring fonts as iso10646-1 capable in fonts.dir was a matter of hacking a simple shell/perl script, which is definitely not something a newbie can do. It might not be strictly Mozilla's fault, but I think the Mozilla documentation should at the very least mention this prominently in the release notes, or else the Unicode support in Mozilla will prove mostly useless to most Linux users.

Next problem: configuring Mozilla so as to display all the Vietnamese characters using the SAME font. I spent HOURS trying to find a correct setting, because the font selection code in Mozilla SUCKS big time. My! Go see how Konqueror does it -- they do it MUCH MUCH better. Actually, I only had the heart to keep trying with Mozilla thanks to Galeon's slow but much faster and much more usable selection code, so I first got it to work with Galeon, and then migrated my settings to Mozilla. At the worst moment, I had *4* classes of Vietnamese characters, each displayed with a different font:
* Vietnamese letters present in western alphabets
* Vietnamese letters o+ u+ (U+1B0) and some variants
* Vietnamese letters with the ?-shaped accent (da^'u hoi?)
* other Vietnamese accented letters

Ultimately, I "fixed" the problem by selecting Verdana in all 12 proportional font settings of Western, Unicode and User-Defined, and it worked.

Font selection is so slow and difficult (over 1 minute to make the slightest change -- and it was much worse when I had those hundreds of fonts installed in the server) that I stopped trying to identify a "minimal solution" once I got things working. A simple way of previewing fonts without selecting them (as in Konqueror), and of applying font settings directly from the preferences menu (as in Konqueror) without having to spend one minute closing and opening it again, would be great. Also, configuring fonts is all the more disheartening since there is no user-available description (except the source) of how font selection works, and thus of how users should be configuring them. Also, if it ever were necessary to have lots of font settings, a simple way to copy/paste from one setting to another would help a lot.

Note: the situation with Vietnamese on MacOS 9 seems desperate, so even though it was hell on Linux, it could have been even worse.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Keywords: intl
Marking as verified for Linux based on the previous comment. Please reopen if there are still problems.
Status: RESOLVED → VERIFIED
Yow! I am now using Mozilla 0.9.7, and Vietnamese is broken again. You can browse around http://ciev.org/1984-vi-utf8-html/ and see the results. It ought to be all in the same Unicode font, say Verdana. I get many different fonts, at least 3 different ones after I set everything to Verdana, and maybe 5 or 6 different ones if settings differ (but I don't have the courage to test anymore, considering the utter slowness of testing font settings -- maybe a debugging tool to identify which font from which setting was used at a given point would help).

The following characters should (hopefully) span the whole range of character classes used by Mozilla:
* ASCII and Latin1 characters (a a a' á)
* latin-N characters (dd đ DD Đ)
* extended latin characters (o+ ơ u+ ư)
* characters with ? accent (a? ả)
* other Vietnamese accent combinations (u+' ứ u+? ử e^? ể e~ ẽ o^~ ỗ)

I don't recognize which font is chosen for ASCII and Latin1 when I browse a Vietnamese page, but it looks like none I selected in the settings, and certainly not the Verdana that I managed to configure for all the other character classes.

I don't know how Mozilla handles its fonts, but it seems overly complicated. Instead of building huge kludges, you should promote the use of Unicode fonts. With Internet Explorer, you have one font choice for all Latin and derived scripts, and Verdana (or Times) and Courier work great. The Microsoft fonts are available on all platforms. Bitstream and B&H also have fonts available for everyone. Konqueror also has complicated settings like Mozilla, but at least it seems to work (plus it's anti-aliased!).

If you really want to kluge something that doesn't depend on non-freely available fonts, I think that rather than have settings so complex no one understands how they work, you should have simple settings, and provide one or two "virtual fonts" made from existing free fonts. Or maybe manage to distribute Lucida Console or something like that as part of the Mozilla package or an associated package. It might pay to do the right thing and achieve the distribution of a real Unicode font, rather than add kluges over kluges so as to handle Latin characters.

Yours freely, -- #f ?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
-> ftang
Assignee: bstell → ftang
Status: REOPENED → NEW
This image shows how Mozilla is rendering UTF-8 Vietnamese for me. I have installed the MS Arial TTF font, which does work for Vietnamese afaik (because it shows fine on Windows Mozilla). It looks like each character is being rendered as a composition of similar characters; for example, if I highlight a(? (U+1EB3), it is selected as a single glyph, not as separate characters (which is good, except that it is a terrible-looking rendering). I wonder what causes this type of output, rather than using the correct glyph in the Unicode range U+1E00-U+1EFF. Someone mentioned getting XFree to recognize iso10646-1; if they could elaborate on how they did that... thx,
As for installing Unicode fonts under XFree86:

First, you must be TrueType-ready. Install XFree 4.1.0 or later and make sure that FreeType is enabled in XF86Config-4: in Section "Module", Load "freetype". If you are using older servers (or servers from an evil proprietary vendor), install xfsft, or the latest xfs from XFree86 that already includes the freetype patches.

Then, you must install the fonts. Get the Microsoft web fonts from http://www.microsoft.com/typography/fontpack/default.htm and unpack them all in /usr/X11R6/lib/fonts/TrueType/ or /usr/local/... or some such -- using debian woody or later, you can just apt-get install msttcorefonts, which will do the job for you.

Then you must declare the fonts as Unicode-ready. The idea is to create a fonts.dir file with, for each font, an entry with encoding iso10646-1, as in:

Verdana.ttf -Microsoft-Verdana-medium-r-normal--0-0-0-0-p-0-iso10646-1

So you must cd into the directory where you put the .ttf files, run mkfontdir, edit the fonts.dir file it created, and add a line like that for every font. Emacs macros or perl can help you there. Then copy the fonts.dir file into a fonts.scale file, so as to ensure the fonts will work in all sizes. Note that under debian, you must instead edit /etc/X11/fonts/TrueType/msttcorefonts.scale and dpkg-reconfigure msttcorefonts (or manually run update-fonts-dir and update-fonts-scale).

If you write a HOWTO, or better, a script (perl or whatever) that does everything, and publish it on a web page, you'll do everyone a great service! I'd add it to http://ciev.org/, which already points to a few pages that can help you with Unicode or let you test your browser with Vietnamese. BTW, ciev.org is experiencing DNS problems, so you can try http://206.63.100.249:8108/ instead. Similarly, the first page I use to test browser support of Vietnamese (cited in a previous addendum to this bug report) is
http://ciev.org/1984-vi-utf8-html/1984-1-1.html
http://206.63.100.249:8108/1984-vi-utf8-html/1984-1-1.html
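As a rough illustration of the fonts.dir step described above, here is a small Python sketch that builds the file contents for a list of .ttf files. It rests on several assumptions that you should check against your own fonts: the family name is guessed from the file name, every font is declared as medium, upright, normal width and proportional (as in the Verdana example), and the default foundry "misc" is a placeholder. The functions `xlfd_for` and `build_fonts_dir` are hypothetical helpers written for this sketch, not part of any X11 tool.

```python
from pathlib import Path

def xlfd_for(ttf_name, foundry="misc", encoding="iso10646-1"):
    """Build an XLFD in the same shape as the Verdana example above.

    Assumption: the family is taken from the file name, and weight, slant,
    width and spacing are hard-coded (medium, r, normal, p).
    """
    family = Path(ttf_name).stem
    return f"-{foundry}-{family}-medium-r-normal--0-0-0-0-p-0-{encoding}"

def build_fonts_dir(ttf_names, foundry="misc"):
    """Return fonts.dir text: an entry count, then one 'file XLFD' line per font."""
    entries = [f"{name} {xlfd_for(name, foundry)}" for name in sorted(ttf_names)]
    return "\n".join([str(len(entries))] + entries) + "\n"

# Example with hard-coded file names; nothing is written to disk.
print(build_fonts_dir(["Verdana.ttf", "Arial.ttf"], foundry="Microsoft"), end="")
```

Bold and italic variants would need their own weight and slant fields, so treat the output as a starting point to edit by hand before copying it to fonts.scale.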
Here is a simple no-frills page containing Vietnamese data using Unicode, using characters in the U+1Exx range rather than explicit composition of Unicode characters (hum -- that would require another test case). You can find it on http://ciev.org/1984-vi-utf8-html/1984-1-1.html or (if the DNS is still down) on http://206.63.100.249:8108/1984-vi-utf8-html/1984-1-1.html The test is successful if and only if all characters show as correct Unicode glyphs of the SAME FONT (say, Verdana), just like IE5 or Konqueror 2.2 do it. Currently, Mozilla uses up to 5 different fonts for the text -- ugly.
I went ahead and added the iso10646-1 entry for my Cyberbit font, and restarted xfs. I am running Xfree 3.6, but it worked, and I'm rather bothered that ttmkfontdir doesn't know how to generate that line. Once I load the 1984 excerpt, xfs tries to rasterize the entire 13 megabyte font, which freezes up my desktop (and production server (now you know why I'm still running 3.6)) for about 30 seconds. At least the glyphs are readable now; even if the letters with two diacritics are in a different font than those without, I can at least see the right letters. (Now to learn the language.)

Now that I've gotten it this far, I'm willing to bet it's not just a Mozilla issue, though this has been the most productive forum I've found for getting this working. This issue probably needs to be taken up with freetype's mkttfntdir and some XFree86 FAQ. I can't wait for Xft and Render, and it looks like Mozilla will be ready. I can't get to the ciev IP address at all, but I think the contents of this bug should at least be harvested for a FAQ on the subject.
I think once you set the TTF font as ISO-8859-1, Mozilla should be able to see it. The problem you saw is the last-resort transliteration code. Which Linux are you using?
Status: NEW → ASSIGNED
I'm using the latest packages from the latest (unstable) debian sid (well, now two weeks old). And yes, the fonts are also declared as iso-8859-1 (although in a different directory - I will retry with the same font in same directory, later, in case it might matter). Anyway, once the font is declared as iso-10646-1, which subsumes iso-8859-1, why should mozilla care whether it is also iso-8859-1 at all??? Why does mozilla try to look into other fonts characters that are actually present in the current font, and declared as such? If you can't reproduce the bug (can you not?), I can setup a test system, running an Xvnc server, or whatever suits you.
Using Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:0.9.9+) Gecko/20020318, the browser display is somewhat strange. I have installed the ISO10646-1 fonts found at http://www.cl.cam.ac.uk/~mgk25/unicode.html (TrueType WGL4 fonts don't work correctly here) and tried to view attachment 66689. Unfortunately it seems that the presence of certain Unicode characters triggers a display mode where
- the Preferences/Appearance/Fonts setting for Unicode will be ignored (same as comment 5)
- ISO8859-1 characters will be displayed in some monospace font
- most non-ISO8859-1 characters will be displayed in another monospace font
- characters containing a "dau hoi" diacritic will be last-resort-transliterated
- in the Tab bar, non-ISO8859-1 characters show up in a third font
- if you try to select last-resort-transliterated text, the selection does not follow the mouse cursor as it should
- in the print preview, the title in the header is displayed correctly in the Adobe-Times ISO10646-1 font! (the rest remains messed up though)

In Windows, everything is displayed in the Times New Roman font as defined in the Preferences dialog.

As said, this is very strange, and if we could find out what is responsible for this mode and fix it, Mozilla's usability for displaying Vietnamese content would be greatly improved. The print preview header does The Right Thing(tm), so why do Navigator and Composer not?
This seems to be WFM, no?
Assignee: ftang → smontagu
Status: ASSIGNED → NEW
> - the Preferences/Appearance/Fonts setting for Unicode will be ignored (same as
> comment 5)

See bug 256383. Also note that Vietnamese is currently regarded as 'x-western', so you have to set the fonts for Western to the Vietnamese fonts you want to use (or to Latin fonts with sufficiently large coverage). Besides, if you have a page in one of the Unicode encodings (e.g. utf-8), make sure to specify 'lang' like this: <html lang="vi">. If only a part of the document is in Vietnamese, use 'lang="vi"' only on that part (e.g. <div lang="vi">, <span lang="vi">, <p lang="vi">, etc.). If your pages are in one of the Vietnamese encodings, Mozilla infers the language from the page encoding and uses that unless it's overridden explicitly with the 'lang' attribute.

This was not a bug per se but a documentation issue (it should have been made clear that the fonts for Western have to be used to set fonts for Vietnamese in one of the Vietnamese encodings). Perhaps we should open a new bug about help and other documentation issues on font selection.
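For anyone who wants to try the 'lang' advice above, here is a minimal Python sketch that produces a UTF-8 test page marked up with lang="vi", plus a helper for tagging only a fragment. The function names, the page title and the choice of a meta Content-Type declaration are my own assumptions for illustration; the sample characters are the ones listed earlier in this bug.

```python
from html import escape

def vietnamese_test_page(title, body_text):
    """Return a small HTML page whose root element declares lang="vi"."""
    return (
        '<html lang="vi">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )

def vietnamese_span(text):
    """Wrap a Vietnamese fragment of an otherwise non-Vietnamese page."""
    return f'<span lang="vi">{escape(text)}</span>'

# Characters from the earlier comment, one or more per character class.
page = vietnamese_test_page("Vietnamese font test", "a á đ Đ ơ ư ả ứ ử ể ẽ ỗ")
data = page.encode("utf-8")  # bytes to save or serve as UTF-8
print(page)
```

Saving `data` to a file and opening it in the browser should show whether the Western font setting is applied to every character.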
Status: NEW → RESOLVED
Closed: 24 years ago → 21 years ago
Resolution: --- → INVALID