I am using Mozilla 1.1b and localising it for Hindi. As per the Unicode specification, Devanagari characters are constructed from combinations of characters. There are matras (a Hindi word) for the vowels in Hindi, and these matras are used to construct the correct Devanagari letters. But this does not work as specified in Unicode. I have tested this with a web page from www.bbc.co.uk/hindi/. There is also a Devanagari font on the site. The characters are displayed correctly when the page is viewed in IE, but when the same page is viewed in Mozilla the characters do not render properly. I have also experienced this during the localisation of Mozilla. I use the Unicode editor SC UniPad for editing the Hindi characters, and there I see the correct combination of characters. But when I view the characters in Mozilla they are not rendered correctly.
see bug 164528...
Summary: Rendering of Devenagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b
prabhat: would your checkin for 163962 (8-30-2002) fix this problem? Windows 98. Raj: if you can use Win2K/WinXP, you will be able to enter Hindi char in mozilla now. (trunk build 09-24 or later) :)
Yes, my code itself is cross-platform, but I never tried it on Windows. Let me know if you find any problems.
I have now tested Mozilla on Linux. I built Mozilla from CVS with the --enable-ctl option and I see the same problem as on Win9x. The rendering is not correct even on Linux. Was this tested on Linux?
This bug report doesn't really contain enough information to be useful. Please read the guidelines at http://www.mozilla.org/quality/bug-writing-guidelines.html to understand the kind of information that a bug report should include. In any case, Devanagari support is currently work in progress. I am marking this as a duplicate of bug 174424. If you still see problems after that bug is marked as FIXED, please feel free to file new bugs.
What extra information is needed? I built Mozilla from CVS and tested it on Windows 98 and Red Hat Linux 7.2, and on both platforms it does not render the Devanagari fonts properly. I followed the guidelines at http://www.mozilla.org/quality/bug-writing-guidelines.html before submitting the bug. I may not understand the technical terminology; my bug report is from the point of view of a user. However, I am ready to provide any other information needed. A look at the following page in IE and Mozilla makes it clear what is wrong with the Mozilla rendering. http://www.unicode.org/unicode/standard/translations/hindi.html
I tried it on SuSE before I first integrated it into the trunk and it worked fine. Let me do so again sometime next week (with the new Red Hat 8.0) and let you know the results.
I am attaching a simple test case that demonstrates this problem. I have verified that the problem exists in Mozilla 1.3a both on Linux (Red Hat 8) and Windows XP, but it works properly in IE6/WinXP. If you compare the display of the attachment in IE6/WinXP vs. Mozilla, you'll see the difference in the second letter, which should be a half-"Na".
> verified that the problems exists in Mozilla 1.3a both in Linux
> (Red Hat 8) Windows XP, but it works properly in IE6/WinXP.
On Red Hat 8, what you have is an Xft build of Mozilla, which doesn't make use of the Hindi shaper even when CTL is enabled. Currently, the Hindi shaper is only used for Mozilla-X11core (plain gtk without Xft or Xlib). When you run Mozilla-Xft with the environment variable GDK_USE_XFT set to 0, you'll see that Devanagari gets rendered properly, assuming that you have X11 core fonts with SunIndic encoding.
As for Mozilla under Windows XP, are you sure it doesn't work? It should work if you activated the complex script support of the OS by installing any one of the language support options that require complex script rendering. (I've just tried your test case under Win2k and it was rendered exactly the same way by both Mozilla-Win and MS IE 6. The same is true of the Thai and Tamil I tested the other day.) That is, if you install support for Hindi, Tamil, or Thai, Mozilla-Win should work almost as well as MS IE. Cursor movement and selection don't work yet because nsTextFrame doesn't use Uniscribe, but the rendering should work.
Mozilla under Win9x/ME is a different issue. Mozilla is currently using the standard Win32 text APIs for rendering text (complex or not) while MS IE6 apparently uses Uniscribe APIs. There's little difference in *rendering* between the two approaches on platforms where complex scripts are natively supported (WinXP/2k). However, on platforms without native complex script support (i.e. on Win9x/ME), standard Win32 text APIs don't work while Uniscribe APIs still work. Therefore, to make Mozilla render complex scripts across Win32 platforms (Win9x/ME or Win2k/XP), nsFontMetricsWin has to be recast in terms of Uniscribe APIs (which would simplify nsFontMetricsWin significantly, as would recasting nsFontMetricsXft with Pango APIs). However, there's a dilemma here.
On Win9x/ME, usp10.dll (the Uniscribe DLL) is available only when either MS IE 5.5/6 or MS Office 2000/XP is installed (or other commercial programs that licensed Uniscribe). We can't assume that usp10.dll is available on all Win9x/ME systems, so we have to check its availability at run time.... The alternative to using Uniscribe APIs is what was discussed here (using the CTL module). A similar approach is being tried for Tamil and Thai rendering for Mozilla-Xft (and is enabled for Korean) (see bug 203052, bug 176315, bug 204039). The upside of that is that Mozilla is self-sufficient (not relying on Uniscribe). On the other hand, it's like reinventing the wheel, and it increases the code size to implement what's offered "by the OS" (well, only when Mozilla's competitor is installed on Win9x/ME....)
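The run-time availability check mentioned above could look roughly like the following. This is only a Python sketch of the idea, not Mozilla code: the helper names `library_available` and `choose_shaper` and the backend labels are made up for illustration, and a real implementation would live in the C++ font code.

```python
import ctypes
import sys


def library_available(name: str) -> bool:
    """Return True if the named shared library can be loaded at run time."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Loading failed: the library is missing or unusable on this system.
        return False
    return True


def choose_shaper() -> str:
    """Pick a shaping backend (the labels are illustrative only)."""
    # Prefer Uniscribe when usp10.dll is present, otherwise fall back to
    # the shaper shipped with the application itself.
    if sys.platform == "win32" and library_available("usp10.dll"):
        return "uniscribe"
    return "builtin-ctl"


if __name__ == "__main__":
    print(choose_shaper())
```

The point of the fallback is that the application keeps working on a Win9x/ME machine where neither IE 5.5/6 nor Office is installed.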
Jungshik was right about Windows. It seems IE on WinXP works fine in a default installation, but the user first needs to enable Indic script support for Mozilla to display Devanagari properly. I have looked around for a guide on how to get Devanagari enabled correctly on Red Hat 8, but can't find any. Is there a document that explains it? I have multiple Unicode TTX fonts that support Devanagari installed on the system, but it still looks weird. Does the normal nightly/milestone build work, or do I have to make my own build with CTL enabled?
For Linux-Xft, see bug 204286 as well as bug 176290. With my patch for bug 176290 landed, it's very easy to support Devanagari shaping in Mozilla-Xft because the converter is already there thanks to Prabhat.
> On Redhat 8, what you have is Xft-build of Mozilla that doesn't
> make use of hindi shaper even when CTL is enabled. Currently,
> hindi shaper is only used for Mozilla-X11core (plain gtk without
> Xft or Xlib). When you run Mozilla-Xft with the environment variable
> GDK_USE_XFT set to 0, you'll see that Devanagari gets rendered
> properly assuming that you have X11 core fonts with SunIndic encoding.
Ok, I'm using Mozilla 1.4RC1 on Red Hat 8 via GNOME. I installed the SunIndic encoded fonts (Saraswati) in /usr/local/share/fonts/bitmap:
[alok@localhost bitmap]$ more fonts.dir
2
SaraswatiBold24.pcf.gz -cdac-saraswati-bold-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
SaraswatiNormal24.pcf.gz -cdac-saraswati-medium-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
[alok@localhost bitmap]$ xset q
[snip]
Font Path: /home/alok/.gnome2/share/cursor-fonts,unix/:7100,/usr/local/share/fonts/bitmap,/usr/X11R6/lib/X11/fonts/misc,/usr/X11R6/lib/X11/fonts/100dpi,/usr/X11R6/lib/X11/fonts/75dpi,/usr/share/fonts/indic/OpenType,/home/alok/.gnome2/share/fonts
Bug Mode: compatibility mode is disabled
[snip]
So the directory /usr/local/share/fonts/bitmap is in my font path now. Now, in a terminal session, I do:
$ export GDK_USE_XFT=0
$ /usr/local/mozilla/mozilla #Mozilla 1.4 RC 1
However, even now the Devanagari pages are rendered incorrectly. Firstly, in Edit->Preferences->Fonts For->Devanagari, the options Serif through Monospace don't have the Saraswati font listed. Nor does Proportional. Fonts for Unicode doesn't have it either. And the rendering is done using the listed fonts, i.e. if I change the font from here, the font changes in the rendered page as well, and I know that it is not the Saraswati font. So what's wrong? Doesn't the environment variable setting work? If not, how can I test this feature? If yes, is there some tweaking required with respect to the font path?
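When debugging a font path like the one above, it can help to confirm which fonts.dir entries actually carry the expected charset. The following is a small Python sketch that reads the last two XLFD fields (registry and encoding) from each entry. The helper names are made up for illustration, and it assumes well-formed 14-field XLFD names like those in the listing.

```python
def xlfd_charset(xlfd: str) -> str:
    """Return 'registry-encoding' from a 14-field XLFD font name."""
    fields = xlfd.split("-")
    # A well-formed XLFD starts with '-' and has exactly 14 fields after it.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError(f"not a well-formed XLFD name: {xlfd!r}")
    return f"{fields[13]}-{fields[14]}"


def fonts_with_charset(fonts_dir_text: str, charset: str) -> list:
    """List the font files in fonts.dir content whose XLFD charset matches."""
    matches = []
    # The first line of fonts.dir is the entry count, so it is skipped.
    for line in fonts_dir_text.splitlines()[1:]:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            if xlfd_charset(parts[1].strip()).lower() == charset.lower():
                matches.append(parts[0])
        except ValueError:
            continue  # skip entries that are not plain XLFD names
    return matches


SAMPLE = """2
SaraswatiBold24.pcf.gz -cdac-saraswati-bold-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
SaraswatiNormal24.pcf.gz -cdac-saraswati-medium-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
"""

if __name__ == "__main__":
    print(fonts_with_charset(SAMPLE, "sun.unicode.india-0"))
```

This only checks that the fonts are registered with the right charset. It says nothing about whether the browser build was configured to use them.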
CTL is NOT enabled in the default build. That's why it doesn't work. Enabling CTL by default is another bug. The bug number is escaping me at the moment. I bookmarked it, but I'm far away from my computer right now. Anyway, you can easily find it and add your comment there if necessary. Note that this bug is specifically for Win 9x/ME. I believe there are nightly binaries with CTL enabled that you can grab.
Summary: Rendering of Devanagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b
Enabling CTL by default is dealt with in bug 201746
*** Bug 210575 has been marked as a duplicate of this bug. ***
Re: the comment by Jungshik Shin regarding Bug 210575 (duplicate) "Is this only on Windows 98 or do you have the same problem on Win 2k/XP as well? I don't think you do." I can confirm his statement is absolutely correct and that the problem with Complex Text Layout in Devanagari does NOT occur on Windows 2000 once the necessary fonts and DLLs are loaded. The setup of Devanagari can be quickly achieved via the Regional Settings section of the Control Panel. I haven't checked XP yet. There is a lot of resistance to XP in the UK and many local computer dealers offer to remove XP from a new system and install an earlier Windows operating system. The change in licensing terms for XP has had a negative effect on the purchase of new systems here. I don't know if this is true of other countries. This means that it is likely there will still be a large established base of Windows 95/98/ME users for some time to come.
Re: the Complex Text Layout issue in general. I have some misgivings about whether Mozilla should be attempting to provide this feature in isolation. I have just been in touch with Adobe on this same issue because it appears that complete support for Complex Text Layout is missing from the various software tools which claim to support Unicode fonts and international character sets (e.g. the current SVG Viewer). Complex Text Layout seems to be a much wider (if not global) issue. I don't have the latest version of Adobe Acrobat so I don't know if this will be supported fully in the new PDF format either. Perhaps there could be an option to use an inbuilt Mozilla Complex Text Processing utility, or to use the 3rd party standard CTL, or even one supported by the particular platform? It will, of course, lead to some level of dependency on an external technology (depending on the platform) to make this workable. This is how it works currently with Windows 2000 and XP, as I have been advised by Jungshik Shin. Many non-MS Windows applications (especially multimedia programs) already have a certain degree of dependency on MS libraries. One is often instructed to download an update of a particular DLL or a new version of DirectX to ensure that an application works correctly when installing a new package. Would it be possible for a future version of Mozilla (for pre-Windows 2000 platforms) to make use of the Uniscribe DLL in this same way? Jungshik Shin has already suggested this and I think that is a sensible option with a lot to recommend it. I don't know if there are patent or legal issues involved in doing this. It wouldn't surprise me if that was the case, knowing MS. Sun, in contrast, have released the source code for their Complex Text Layout support in Solaris to the Open Source community, which could be a useful resource. Then again, there is the Pango project. 
If Mozilla does create its own Complex Text Layout support libraries then I hope that these will be platform- and browser-neutral and that use of these libraries by other applications will not be restricted. I realise that I might be dealing with issues that have already been discussed (and/or decided) but I thought that some outside input might be of some use. J.M.N. London
Thanks, J.M.N., for your input. BTW, you may also give SILA (Graphite-enabled Mozilla) a try at http://sila.mozdev.org. I think it works on Win 9x/ME as well as on Win 2k/XP. You need Graphite fonts.
*** Bug 209243 has been marked as a duplicate of this bug. ***
*** Bug 179804 has been marked as a duplicate of this bug. ***
------- Additional Comment #19 From Jungshik Shin 2003-07-12 18:40 -------
Thanks, J.M.N. for your input. BTW, you may also give it a try to SILA (Graphite-enabled Mozilla) at http://sila.mozdev.org. I think it works on Win 9x/ME as well as on Win 2k/XP. You need Graphite fonts.
-------
I downloaded and installed beta2 of SILA just now. It still has the same problem with Devanagari on Windows 98. BTW, is anyone at Mozilla actually working on this issue or are we just talking amongst users here??? Shree
> BTW, is anyone at Mozila actually working on this issue or are we just talking > amongst users here??? More users, more solutions. Nobody seems to have really tried a --enable-ctl build on Win98 yet. Does anybody have a) location of a --enable-ctl binary for Win* or b) test results on an --enable-ctl build for Win* ? Even ctl as is on linux has not really been used a lot. OpenType fonts are just *one way* of rendering devanagari. Let's explore the *available solutions* thoroughly before dismissing the work of the mozilla developers, who imho are doing a great job.
I have tried the Linux build with the --enable-ctl option, but it did not work and has the same problem as on Windows 9x.
> I have tried the linux build with --enable-ctl option. But it did not work and > have the same problem as in Windows 9x. Did you use the correct font? Here are some screenshots of the rendering by the ctl enabled build here: http://bugzilla.mozilla.org/attachment.cgi?id=127816&action=view Attachment as part of bug 212770 and bug 212583 There are some bugs, but a patch has already been submitted as part of bug 211921 . If we get more users we could get more bugs detected. The rendering is quite good and the font is also of excellent quality, much better than the available opentype fonts, surprisingly. For instructions regarding the fonts and the build, please see http://www.arbornet.org/~marcow/howto.html I'm just waiting to be able to try out the same thing on a win9x machine, perhaps I'll try doing a build myself. Will post more screenshots with the ctl enabled build - I'm sure looking at the shots, more people would be prompted to use it.
I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready. I am transcribing a number of Sanskrit texts into Unicode and so will have sufficient material to test the formation of complex conjunct consonants and matras. JMN London
> I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready. What is the preferred method of distributing a binary?
>>>>------- Additional Comment #25 From Alok Kumar 2003-07-20 02:23 -------
> I have tried the linux build with --enable-ctl option. But it did not work and > have the same problem as in Windows 9x.
Did you use the correct font? .......
On Windows 98 I do have the Unicode Devanagari fonts which work with IE 6. Are you referring to some other fonts? The page you referred to has instructions for UNIX - would the same fonts work under Win98? I already have raghu8, Arial Unicode, Code2000 and Devanagari MT on the PC.
>>>> From J.M.N. 2003-07-20 07:13 -------
I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready. I am transcribing a number of Sanskrit texts into Unicode and so will have sufficient material to test the formation of complex conjunct consonants and matras.
--
A little bit off-topic perhaps, but a number of Sanskrit texts are available at http://sanskrit.gde.to in ITRANS, which can be very easily converted to Unicode using ITRANS 5.3. A sampling of converted texts can be seen at http://satsang.tripod.com/ - all the HINDI versions of the files are in UTF-8 Devanagari and the TAMIL versions are in UTF-8 Tamil script. With IE6 on Win98 I am able to see them fine. Not so with IE5 on Mac OS X, though the iCab browser renders them correctly. What technology is iCab using that is not available to Mozilla?
> 6. Are you referring to some other fonts? The page you referred to has
The Saraswati font, by Sun; please see http://geocities.com/alkuma/seehindi.html (refresh). It's in pcf format so you have to convert it to ttf first. I haven't tried it myself yet since I don't have an --enable-ctl build on win* to try it on.
Sun's Saraswati font is also available in TTF (at least is supposed to be available in TTF) at http://developer.sun.com/techtopics/global/index.html (follow the link for 'Free Indian Font'). Shree, see http://www.mozilla.org/releases/mozilla1.4/known-issues-int.html about CTL.
At the moment, enabling CTL on Win9x/ME doesn't make Mozilla render Devanagari with Sun's fonts. After bug 203406 is resolved, Mozilla on Win9x/ME will be able to render Devanagari with Sun's Saraswati. The following line has to be added to the fontEncoding.properties file:
encoding.saraswati5.ttf = x-sun-unicode-india-0.wide
Depends on: 203406
Summary: Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98
I built Mozilla fresh from CVS on Red Hat Linux 8.0. The "chotti E" matra and "halant" characters are not rendered properly. The "chotti E" matra should appear before the consonant, and the "halant" (half) consonants are also not aligned. The "oo" matra is misplaced on some characters where it should come after the consonant, e.g. an "oo" matra after "r" is displayed at the foot of the character. Please refer to the sample page in the attachment.
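For readers unfamiliar with the first symptom: in Unicode text the short "i" matra (U+093F) is stored after its consonant, but it is drawn to the left of it, so a shaper has to reorder it before display. The Python sketch below illustrates that reordering in a deliberately simplified form. It is not Mozilla's CTL code; it only handles the plain consonants U+0915 to U+0939 and conjuncts joined by halant (U+094D), and the function names are made up for illustration.

```python
VOWEL_SIGN_I = "\u093f"  # short "i" matra, stored after the consonant
VIRAMA = "\u094d"        # halant, joins consonants into a conjunct


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants KA..HA (nukta forms ignored)."""
    return "\u0915" <= ch <= "\u0939"


def visual_order(text: str) -> str:
    """Move each short-i matra in front of the consonant cluster it follows."""
    out = []
    for ch in text:
        if ch == VOWEL_SIGN_I and out and is_consonant(out[-1]):
            pos = len(out) - 1
            # Walk back over "consonant + halant" pairs so the matra lands
            # in front of the whole conjunct, not just its last consonant.
            while pos >= 2 and out[pos - 1] == VIRAMA and is_consonant(out[pos - 2]):
                pos -= 2
            out.insert(pos, ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Logical order KA + I becomes visual order I + KA.
    print([f"U+{ord(c):04X}" for c in visual_order("\u0915\u093f")])
```

A build without a working shaper skips this step, which would match the report that the matra shows up on the wrong side of the consonant.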
> I build Mozilla fresh from cvs on Redhat Linux 8.0. The "chotti E" matra and
o Was this a --enable-ctl build? (about:buildconfig will tell you)
o Do you have the Sun Saraswati font installed?
Rendering will not work without --enable-ctl. Also, even with --enable-ctl, rendering will not work unless you have a sun-unicode-india-0 encoded font like Saraswati. Try rebuilding with --enable-ctl if you haven't done so already. Also install the Sun font if you haven't done so. Also see <http://groups.google.com/groups?dq=&hl=hi&lr=&ie=UTF-8&oe=UTF-8&selm=bfe9kd%24e6b2e%241%40ID-196040.news.uni-berlin.de>
> Sun's Saraswati font is also available in TTF (at least is supposed to be
> available in TTF) at http://developer.sun.com/techtopics/global/index.html
This download shows only the pcf fonts. Are the ttf fonts available for public download?
As for Sun's download site, Prabhat wrote in May that he would ask the person in charge to put up two TTFs instead of BDFs. I haven't checked that since. Prabhat, could you ask her/him once more to take care of this problem?
> o Was this a --enable-ctl build? (about:buildconfig will tell you)
> o Do you have the Sun Saraswati font installed?
Yes, my build is an --enable-ctl build. I have installed DVBOT.ttf. I also tried to install the Saraswati font but could not succeed. The fonts on the Sun download page are .pcf. Do you know how to convert them to .ttf, or is there another install procedure?
Ok, so it looks like Unicode Indic support on Win98 for Mozilla is not ready yet. What about ISCII? I was told recently in another conversation:
" Since JAVA has come up with ISCII support so there is no need to upgrade the ISCII plugin. JAVA support both Unicode and ISCII. I consider this to be the right approach. Simple conversion program are also available for converting between ISCII and Unicode. I think the Indian masses will go for Unicode/UTF8 solution because of ignorance! (No wonder, most people go for "preya" sacrificing the "shreya".) I oppose Unicode/UTF8 mix because
i) it gives step motherly treatment to Indian scripts. An ASCII character is represented as a single byte whereas an Indian script character is represented in three bytes.
ii) it hides the underlying unity among Indian scripts.
iii) it is very easy to support ISCII as demonstrated by JAVA. All we need to do is to show Microsoft that most Indians want ISCII. Technologically the problem is trivial.
iv) strictly speaking, we need Unicode only when we need to display non-Indian and non-Roman script together with Indian script in the same page. "
Do any of you know about Java support for ISCII and Indian scripts? Is it possible to support Devanagari through the ISCII approach? It would be great to have a uniform method to store and search the Indic information - Unicode has made a start and it is great for me to be able to search Hindi content using Google - except it is very limited. Most of the Indian content is in proprietary fonts? Would it be better for the browsers to go the ISCII route than Unicode???? Shree
Actually, it's almost ready. The patch for bug 203406 is just waiting for r/sr. Once it's resolved, it's just as simple as enabling CTL (there are a couple of issues to resolve before enabling CTL by default, but anyone with her/his own build environment can enable it on her/his build). ISCII and Unicode are almost identical except for the codepoint assignment, so once Indic script in Unicode is supported, supporting ISCII is not very hard (if that's really necessary). I've filed a bug to write a converter for ISCII, but haven't begun working on it yet. I do have ISCII in English (PDF). I'm afraid your opinion on Unicode and ISCII is based on some misconception about Unicode. In an ideal world, everybody would use Unicode AND NOTHING ELSE. We don't live in an ideal world, so we have to live with legacy charsets. You and your fellow Indians are lucky in that you don't have as much baggage of data in __legacy__ charsets to carry as others (who you regard as lucky).
> Would it be better for thr browsers to go the ISCII route than Unicode????
Absolutely NOT. ISCII is made up of 8 different character sets and to mix them in a single document, you have to rely on clumsy ISO-2022-based escape sequences (ISO 2022 was a way to encode multilingual/multiscript documents before ISO 10646/Unicode came up). Why would you want to go back to those dreadful days of ISO-2022 when Unicode is now widely supported?
I will double-check the font download. Apologies for the delay. And guys, please don't bring up ISCII instead of Unicode! I don't see any problem with Unicode support for Indic scripts (except for some situations that are surmountable). In my opinion, users do not need to be concerned with encoding except for reasons of legacy, coverage or (practical, not theoretical) performance. Using Unicode for Indian scripts has none of these issues. Additionally, having ISCII support is trivial, as Jungshik pointed out.
What problem did you face while installing the pcf font? The instructions at http://www.arbornet.org/%7Emarcow/howto.html should be sufficient. Or try any of the links from google: <http://www.google.com/search?q=installing+pcf+fonts+on+linux&ie=UTF-8&oe=UTF-8&hl=hi&btnG=Google+%E0%A4%96%E0%A5%8B%E0%A4%9C> Or follow the instructions here: http://tldp.org/HOWTO/Indic-Fonts-HOWTO/iosetup.html#xwindows
> I oppose Unicode/UTF8 mix because
> i) it gives step motherly treatment to Indian scripts.
> An ASCII character is represented as a single byte
> whereas an Indian script character is represented
> in three bytes.
The same "step motherly treatment" is given to other non-Latin scripts as well; the Indian scripts are not singled out.
> ii) it hides the underlying unity among Indian scripts.
There is still a one-to-one mapping for the sake of transliteration among the Indian scripts. If you look up the code charts, the locations for characters that are present in, say, Malayalam but absent in Devanagari are left blank. You can still easily convert a document from the Malayalam script to Devanagari using simple arithmetic. The Unicode encoding is based on the ISCII encoding.
> iii) it is very easy to support ISCII as demonstrated by
> JAVA. All we need to do is to show Microsoft that
> most Indians want ISCII. Technologically the problem
> is trivial.
Nobody disputes that it's easy to support ISCII, but that doesn't mean that supporting UTF-8 is tougher. Taken together, we have so many more people working on UTF-8. "most Indians want ISCII" is a statement that I wouldn't bet on. Most Indians don't care, as long as they are able to view, store, copy, search, sort and print documents in their language.
> iv) strictly speaking, we need Unicode only when we need
> to display non-Indian and non-Roman script together
> with Indian script in the same page.
Right. And we need ISCII only when we need to display Roman and Indic scripts on the same page. Why not go for a 7-bit encoding that breaks ASCII as well?
> Do any of you know about Java support for ISCII and Indian scripts? Is it
> possible to support devanagari through the ISCII approach?
Yes, it is possible and has been done in yudit, IE and MS Office. It shouldn't be difficult to do in Mozilla either, but IMO it's not something that should be encouraged unless ISCII gets an iso-* certification.
> It would be great to have a uniform method to store and search the Indic
> information - Unicode has made a start and it is great for me to be able to
> search hindi content using google - except it is very limited. Most of the
> Indian content is in proprietary fonts? Would it be better for thr browsers to
> go the ISCII route than Unicode????
This goes to prove that having one standard is better than having none. Technologists can provide the tools, but they have to be used by creative people to *produce the content*. ISCII documents are not searchable either. ISCII doesn't have an ISO standard; the only known standard for Indian scripts is UTF8, which is a shame, as we have nothing to bargain with in the Unicode Consortium, but that's the way it is. IMO, there *are* some valid reasons to change the existing Unicode encoding:
1. Multiple sets of character combinations mapping to the same glyph/cluster
2. No way to indicate which shape I want for a *given* cluster.
3. Sorting is a nightmare
4. So is searching, considering 1, 2 and 3.
These problems are going to become more aggravated as more and more users start using Unicode-encoded documents, but as the number of users increases, the number of developers and therefore the number of solutions will also increase. Till then, better to have one bad standard than none at all. The bottom line, IMO: We should try to get *more USERS* for Indic scripts on computers, help them create electronic content, help them solve the problems they face. Users who are already able to use computers because they know both a Latin-based language and an Indic language have to take the lead here. The gaps that we need to fill for a person who doesn't know a Latin-based language to straightaway start using an electronic device are just too huge, so the responsibility is on our shoulders, people who know both types of scripts.
Technical problems will solve themselves; each one of us has to give our own little nudge to getting more electronic content up in Indic scripts and to getting more users of Indic scripts.
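The "simple arithmetic" for converting between Indic scripts relies on the Unicode blocks for these scripts being laid out in parallel: Devanagari starts at U+0900 and Malayalam at U+0D00, each 128 code points wide. The following is a minimal Python sketch of that idea; the function name is made up for illustration, and it makes no attempt to handle positions that are unassigned in the target block.

```python
DEVANAGARI_BASE = 0x0900
MALAYALAM_BASE = 0x0D00
BLOCK_SIZE = 0x80  # each of these Indic blocks spans 128 code points


def shift_script(text: str, src_base: int, dst_base: int) -> str:
    """Move every character of the source block to the same slot in the target block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + BLOCK_SIZE:
            cp = cp - src_base + dst_base
        out.append(chr(cp))  # characters outside the source block pass through
    return "".join(out)


if __name__ == "__main__":
    malayalam = "\u0d15\u0d3e"  # MALAYALAM LETTER KA + VOWEL SIGN AA
    devanagari = shift_script(malayalam, MALAYALAM_BASE, DEVANAGARI_BASE)
    print([f"U+{ord(c):04X}" for c in devanagari])
```

This is transliteration at the code point level only. Whether the result reads naturally in the target script is a separate question.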
> What problem did you face while installing the pcf font?
> The instructions at http://www.arbornet.org/%7Emarcow/howto.html should be
> sufficient.
Thanks a lot, Alok Kumar. I could install the font and I can see proper rendering. My problem was that I did not run the mkfontdir command, as it was not mentioned in the document referred to above. I have a question about why other TrueType fonts, e.g. Surekh, do not render properly. Do they not follow the Unicode standard? Any idea?
Great to hear that you got the rendering right!
> I have a question about why other true type fonts e.g Surekh do not render
> properly? Do they not follow the unicode standard?
There are two kinds of encodings: a. the character encoding, called the charset, and b. the font encoding. A charset defines what "number", or code, a particular character should map to. A font encoding defines what *glyphs* are located where. Character to glyph is a many-to-many mapping, i.e.:
o One character may map to one glyph (as is the case with Latin, for the sake of simplicity).
o One character may also map to multiple glyphs (e.g. a "pa" in Devanagari may be written as a "half pa glyph" + the "a matra glyph").
o Multiple characters may map to a single glyph (e.g. the Latin "fi" may map to a single glyph in some fonts, where the f and the i are actually joined together).
o Multiple characters may map to multiple glyphs, which is quite obvious; no examples needed.
o Multiple characters may also map to different sets of glyphs depending on where they are placed in a word.
Unicode defines the code points of the basic characters for Devanagari. Those code points are the same for both the Surekh font and the Saraswati font. The difference is in the font encoding. There is *no standard font encoding for Devanagari*, which is a pity. Therefore the Sun guys came up with a sun-unicode-india encoding, which Saraswati uses. Surekh most probably uses the ISFOC encoding, but you can't bet on it. Saraswati is a TrueType font, and since the encoding is known, it is possible to map the glyphs to the character sequences, which is what CTL has done. For Surekh, the ISFOC encoding is not really *documented*, meaning it is subject to change without notice. But it is an OpenType font. In an OpenType font, there are tables within the font that define which (Unicode) character sequences should map to which glyph sequences, also known as glyph substitution. 
That way, you are actually obviating the need for a font encoding, as long as the rendering program knows how to read those tables. What happens in Win2k, WinXP, Pango and yudit is that the OpenType tables are read, and the corresponding glyphs are read. Mozilla doesn't read the OpenType tables (and it shouldn't?). That is the job of lower-level layers, which would do this job for all applications. To sum up: Mozilla knows the font encoding for Saraswati, but it doesn't know the font encoding/tables of Surekh. Both have Unicode character encoding, so both are able to display the correct characters based directly on code points. As a corollary, it is possible to display a UTF-8 encoded page even by using a non-Unicode encoded font, as long as you know the "inside" of the font, i.e. the font encoding. That is what's being done/proposed to be done for Tamil, using TSCII fonts. Similarly, a UTF-8 encoded page could even be displayed using the Susha font, but you would need to write the converter. It would also be a good idea to do the same thing for Devanagari with ISFOC-encoded fonts; they are of very good quality. An even better idea would be to have *at least one standard font encoding* so we don't have to rely on OpenType alone, but that's something beyond the control of most people reading this. Hope this cleared up a few things, rather than the other way around! Jungshik and Prabhat, please correct me if you find any factual inaccuracies.
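The character-to-glyph mapping described above can be pictured as a lookup table from character sequences to glyph sequences, applied with longest match first. The following Python sketch is purely illustrative: the table entries and glyph names are invented, and neither an OpenType substitution table nor the sun-unicode-india encoding is this simple in practice.

```python
# Hypothetical substitution table: character sequence -> glyph names.
SUBSTITUTIONS = {
    ("f", "i"): ["fi_ligature"],            # many characters -> one glyph
    ("\u092a",): ["half_pa", "aa_stem"],    # one character -> many glyphs
}


def shape(text: str, table=SUBSTITUTIONS) -> list:
    """Map characters to glyph names, preferring the longest matching sequence."""
    max_len = max(len(key) for key in table)
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            key = tuple(text[i:i + n])
            if key in table:
                glyphs.extend(table[key])
                i += n
                break
        else:
            # No rule matched: fall back to one default glyph per character.
            glyphs.append(f"glyph_{ord(text[i]):04x}")
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape("fit"))
```

A renderer that knows the table (whether from the font's own tables or from a hard-coded font encoding) can produce the right glyphs. One that does not falls back to one glyph per code point, which is roughly the broken rendering reported in this bug.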
>>> Actually, it's almost ready. The patch for bug 203406 is just waiting for r/sr. Once it's resolved, it's just as simple as enabling CTL (there are a couple of issues to resolve before enabling CTL by default, but anyone with her/his own build environment can enable it on her/his build)
OK, I'll wait for it and hopefully someone here will make a Win98 binary with CTL enabled for me to try :-)
>>> We should try to get *more USERS* for indic scripts on computers, help them create electronic content, help them solve the problems they face.
Content-wise, I am planning to convert the documents on sanskrit.gde.to to Unicode (UTF-8) - just waiting to get a little more support on the browser side. Another area where any user could add Unicode Hindi content is the Hindi Wikipedia site. See http://hi.wikipedia.org/wiki/Pratigya and add to it. I would appreciate it if one of the Unix gurus here would take a look at http://hi.wikipedia.org/wiki/sahaayataa and correct the information both for viewing Devanagari Unicode on *nix and for any input method for the same. Please feel free to correct the info regarding Windows also; I am only using it on Win98. Thanks, Shree
>>> ISCII and Unicode are almost identical except for the codepoint assignment so that once Indic script in Unicode is supported, supporting ISCII is not very hard (if that's really necessary). I've filed a bug to write a converter for ISCII, but haven't begun working on it, yet. I do have ISCII in English (PDF).

You may want to look at the programs and tables on http://www.sharma-home.net/~adsharma/languages/scripts/. Please see a sample below:

================== devanagari ==================
र  [ U+930 ]        2786
क  [ U+915 ]        2048
ना [ U+928 U+93e ]  1893
त  [ U+924 ]        1204
प  [ U+92a ]        1058
न  [ U+928 ]        1047
ल  [ U+932 ]        1019

So the converter you need may already be available.

Shree
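For anyone reading the sample above, the bracketed values are simply the Unicode code points that make up each syllable. A minimal sketch in Python that reproduces that notation (the function name `code_points` is mine and is not taken from the scripts at the URL above):

```python
def code_points(text):
    """Return the code points of text in the 'U+xxx' style of the sample."""
    return " ".join(f"U+{ord(ch):x}" for ch in text)


# NA followed by the vowel sign AA, written with escapes to avoid
# depending on the editor's encoding.
print(code_points("\u0928\u093e"))
print(code_points("\u0930"))
```

Note that one visible syllable can be two code points, as in the third row of the sample.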
>>>> In my opinion, users do not need to be concerned with encoding except for reasons of: legacy, coverage or (practical and not theoretical) performance. Using Unicode for Indian scripts has none of these issues.

OK, here is a practical issue with using Unicode for Hindi. See http://hi.wikipedia.org/wiki/Eng-hin-a or http://hi.wikipedia.org/wiki/Eng-hin-b. The file sizes double when using UTF-8, so these pages are very slow to add to, edit or view. So even though storage may not be an issue, there is definitely a speed problem. Any suggestions ...
I notice that although Mozilla will attempt to reproduce the Devanagari text when saving a bookmark, it does not appear to display UTF-8 text in the title at the top of the browser window. Is this on the to-do list?

J.M. Nicholls
London
Re comment #49: there's no way to support that in Win 9x/ME because Win 9x/ME is not a Unicode-based OS, and rendering in the title bar of a window belongs to the realm of the OS (not of application programs like Mozilla and MS IE). See bug 9449 and the int'l release notes. Of course, it'd be possible if MS released a Hindi (Tamil, Bangla, etc.) version of Win9x/ME, but MS doesn't plan to. Even then, you couldn't see a Tamil title under Hindi Win9x/ME. (The same is true of any language X under the language-Y version of Win9x/ME where X and Y use different scripts/writing systems.)

Re comment #48: a three-fold increase in size (it's not 2 but 3 if you compare ISCII and UTF-8) pales when compared to the numerous advantages of Unicode. Moreover, 'compression encodings' like BOCU and SCSU make it possible to store Devanagari text, and text in other Indic scripts, in Unicode as efficiently as if it were in 8-bit ISCII. Again, why would you want to go back to the 'dark age' from which the informed (for whom legacy encoding has been the norm) are happy to escape? Well, some people wanted to go back, as Erich Fromm observed ;-)
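To put numbers on the size question, here is a small Python sketch. Every code point in the Devanagari block (U+0900 to U+097F) takes three bytes in UTF-8, so pure Devanagari text comes out at three bytes per character, while a page that mixes in ASCII markup comes out lower. The helper name `utf8_bytes_per_char` and the sample strings are mine, chosen only for illustration.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per Unicode code point in text."""
    return len(text.encode("utf-8")) / len(text)


# The word "Hindi" in Devanagari: 6 code points, written with escapes.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"
# The same word wrapped in ASCII markup, as on a typical web page.
mixed = "<b>" + hindi + "</b>"

print(utf8_bytes_per_char(hindi))       # pure Devanagari
print(utf8_bytes_per_char(mixed))       # Devanagari plus ASCII tags
print(len(hindi.encode("utf-16-le")))   # UTF-16 size in bytes, for comparison
```

This would be consistent with both figures in this thread: close to 3x for bare Devanagari against an 8-bit encoding, and nearer 2x for real pages that contain a lot of ASCII.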
>>>> Again, why would you want to go back to the 'dark age' from which the informed (for whom legacy encoding has been the norm) are happy to escape? Well, some people wanted to go back as Erich Fromm observed ;-)

I do not want to go back to the dark age. I am very happy with what Unicode is offering users. But as I try to add content in UTF-8, I find it slow and cumbersome at times. I just wanted to know if there are ways and means of getting around it or improving it.
I got an email saying : Bug 203406 Summary: Performace enhancement in CTL code http://bugzilla.mozilla.org/show_bug.cgi?id=203406 Does that mean the bug relating to devanagari has also been fixed or is this on a back burner somewhere??? Shree
This fix will not allow Hindi support in Win98.
Actually, the fix for bug 203406 can fix the problem on Win9x/ME if Mozilla is built with CTL enabled for Win 9x/ME and the TrueType versions of Sun's Indic fonts are installed. That's why this bug was made dependent on bug 203406. Tamil and Korean are supported that way on Win 9x/ME without CTL enabled because the Tamil and Korean custom font converters are not in intl/ctl but in intl/uconv. (See bug 177877 for the 'infrastructure' for that.)

Needless to say, a better solution would be to make use of Uniscribe (assuming that everybody has MS IE 5.5 or later installed so that Uniscribe is available). However, that's a huge endeavor.... Besides, I still don't know how to support MathML in that case (and Mozilla does a better job with Korean than Uniscribe).

The @netscape.com address doesn't work any more, so I'm assigning this to myself. It's bug 218887.
Assignee: yokoyama → jshin
The problem with the ChoTI 'e' mAtrA (Unicode 0x093f) still persists for me, on Mozilla 1.7 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 MultiZilla/126.96.36.199b), standard installer build on WinXP Home SP1. Is devanAgarI rendering supposed to work on the above build? IE 6 on the same machine, which uses the same font (Mangal) as Mozilla for devanAgarI, renders correctly. Mozilla also uses the halant for rendering compound letters (joDAkShara) instead of using half-letters where appropriate.
just a clarification, the vowel I referred to is transcribed as 'i' in ITRANS, and is the vowel in the word "hit".('e' is a different vowel - the reason I used 'e' is that bug 174424 uses the letter E to refer to the vowel.) sorry for the confusion.
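For readers unfamiliar with this vowel sign: U+093F is stored after its consonant in Unicode text but is drawn to the left of it, which is why it shows up misplaced whenever complex-script shaping is not applied. The Python sketch below only inspects the stored order using the standard `unicodedata` module; it says nothing about how Mozilla itself handles the text, and the helper name `describe` is mine.

```python
import unicodedata


def describe(text):
    """List (code point, Unicode name, general category) in stored order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# The syllable "hi" as in "hit": consonant HA followed by the short-i sign.
for entry in describe("\u0939\u093f"):
    print(entry)
```

The consonant comes first in memory and the vowel sign second, even though a correct renderer draws the vowel sign first.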
(In reply to comment #55)
> Mozilla 1.7 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7)
> Gecko/20040616 MultiZilla/188.8.131.52b), standard installer build on WinXP Home SP1.

Did you turn on complex script support on Windows XP? You probably did because you already have Mangal. Anyway, please make sure you do. Besides, is there any other problem or is that the only problem? BTW, what's the URL of the page you have trouble with? If text is 'justified', Mozilla doesn't work.
thanks for your help, Jungshik. no, I didn't have complex script support enabled - did that and lo and behold, everything's fine now. Well, almost - over at the Hindi version of Wikipedia (http://hi.wikipedia.org), the pages are all displayed correctly, but the cursor in the editing box behaves a bit funny. However, that may not have anything to do with Mozilla. thanks again for the help.
(In reply to comment #58)
> but the cursor
> in the editing box behaves a bit funny. However, that may not have

That's another bug to fix. See bug 229896.
*** This bug has been marked as a duplicate of 343405 ***
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE
Component: Layout: CTL → Layout: Text
QA Contact: amyy → layout.fonts-and-text