Closed Bug 166520 Opened 22 years ago Closed 18 years ago

Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 343405

People

(Reporter: raj.saini, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(2 files)

I am using Mozilla 1.1b and localising it for Hindi. According to the Unicode
specification, Devanagari characters are constructed from combinations of
code points. Hindi vowels have matras (a Hindi word for the dependent vowel
signs), which are combined with consonants to form the correct Devanagari
letters. Mozilla does not render these combinations as Unicode specifies.
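To make the report concrete, here is a minimal Python sketch (not part of the original report) of how one written syllable is stored as a consonant followed by a matra. The helper name `describe` is only illustrative; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# The syllable "ki" is stored as the consonant KA followed by the vowel sign I.
# A renderer has to combine the two into one cluster on screen.
for row in describe("\u0915\u093f"):
    print(row)
```

The second entry has category Mc (a spacing combining mark), which is the kind of character that cannot simply be drawn on its own after the consonant.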

I have tested this with a web page from www.bbc.co.uk/hindi/. There is also a
Devanagari font on the site. The characters are displayed correctly when the page
is seen in IE, but when the same page is viewed in Mozilla the characters do
not render properly.

I have also experienced this during the localisation of Mozilla. I use the Unicode
editor SC Unipad for editing the Hindi characters, and there I see the correct
combination of characters. But when I view the same characters in Mozilla they
are not rendered correctly.
Keywords: intl
QA Contact: ruixu → ylong
Status: UNCONFIRMED → NEW
Ever confirmed: true
Severity: critical → normal
Summary: Rendering of Devenagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b
prabhat: would your checkin for 163962 (8-30-2002) fix this problem? Windows 98.
Raj: if you can use Win2K/WinXP, you will be able to enter Hindi char in mozilla
     now. (trunk build 09-24 or later)  :)
Yes, my code itself is cross-platform, but I never tried it on Windows. Let me
know if you find any problem.
I have now tested Mozilla on Linux. I built Mozilla from CVS
with the --enable-ctl option and I see the same problem as on Win9x. The rendering
is not correct on Linux either.

Was this tested on Linux?
This bug report doesn't really contain enough information to be useful. Please
read the guidelines at
http://www.mozilla.org/quality/bug-writing-guidelines.html to understand the
kind of information that a bug report should include.

In any case, Devanagari support is currently work in progress. I am marking this
as a duplicate of bug 174424. If you still see problems after that bug is marked
as FIXED, please feel free to file new bugs.
What extra information is needed? I built Mozilla from CVS and tested it
on Windows 98 and Red Hat Linux 7.2, and on both platforms it does not render
Devanagari fonts properly.

I followed the guidelines at
http://www.mozilla.org/quality/bug-writing-guidelines.html before submitting the
bug.

I may not understand the technical terminology; my bug report is from the point
of view of a user. However, I am ready to provide any other information needed.

A look at the following page in IE and Mozilla makes it clear what is wrong
with the Mozilla rendering.

http://www.unicode.org/unicode/standard/translations/hindi.html
I tried it on SuSe before i first integrated into trunk and it worked fine. Let
me do so again sometime in next week (with the new RedHat8.0) and let you know
the results.
Component: Internationalization → Complex Text Layout
I am attaching a simple test case that demonstrates this problem. I have
verified that the problem exists in Mozilla 1.3a on both Linux (Red Hat 8) and
Windows XP, but it works properly in IE6/WinXP. If you compare the display of
the attachment in IE6/WinXP vs. Mozilla, you'll see the difference in the second
letter, which should be a half-"Na".
Attached file Test Case
> verified that the problems exists in Mozilla 1.3a both in Linux 
> (Red Hat 8) Windows XP, but it works properly in IE6/WinXP.

  On Redhat 8, what you have is Xft-build of Mozilla that doesn't
make use of hindi shaper even when CTL is enabled. Currently,
hindi shaper is only used for Mozilla-X11core (plain gtk without
Xft or Xlib). When you run Mozilla-Xft with the environment variable
GDK_USE_XFT set to 0, you'll see that Devanagari gets rendered
properly assuming that you have X11 core fonts with SunIndic encoding.

As for Mozilla under Windows XP, are you sure it doesn't work?
It should work (I've just tried your test case under Win2k
and it got rendered exactly the same way by both Mozilla-Win
and MS IE 6. The same is true of Thai and Tamil I tested
the other day.) if you activated the complex script support
of the OS by installing any one of language support options
that require complex script rendering. That is, if you
install support for Hindi, Tamil, Thai, Mozilla-Win should
work almost as well as MS IE. The cursor movement and selection
don't work yet because nsTextFrame doesn't use Uniscribe,
but the rendering should work. 

Mozilla under Win9x/ME is a different issue. Mozilla is
currently using the standard Win32 text APIs for rendering
text (complex or not) while MS IE6 apparently uses Uniscribe
APIs. There's little difference in *rendering* on platforms where complex
scripts are natively supported (WinXP/2k) between
two approaches. However, on platforms without native complex
script support (i.e. on Win9x/ME), standard Win32 text APIs
don't work while Uniscribe APIs still work.

Therefore, to make Mozilla render complex scripts across Win32 platforms
(Win9x/ME or Win2k/XP), nsFontMetricsWin has to be recast in terms
of Uniscribe APIs (which would simplify nsFontMetricsWin significantly,
as would recasting nsFontMetricsXft with Pango APIs).

 However, there's a dilemma here. On Win9x/ME, usp10.dll (the Uniscribe DLL) is
available only when either MS IE 5.5/6 or MS Office 2000/XP is installed (or
other commercial programs that licensed Uniscribe). We can't assume that
usp10.dll is available on all Win9x/ME systems, so we have to check its
availability at run-time....
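As a rough illustration of the run-time check mentioned above, here is a Python sketch using the standard `ctypes` module. It is not Mozilla code: Mozilla itself is C++ and would presumably do the equivalent with the Win32 loader. The function name `uniscribe_available` and the injectable `load` parameter are assumptions made so the logic can be tried on any platform.

```python
import ctypes
import sys


def uniscribe_available(load=None):
    """Return True if usp10.dll can be loaded, False otherwise.

    `load` is a hypothetical hook: any callable that takes a library name
    and raises OSError when the library is missing.
    """
    if load is None:
        if not sys.platform.startswith("win"):
            return False  # Uniscribe only exists on Windows
        load = ctypes.WinDLL
    try:
        load("usp10.dll")
    except OSError:
        return False
    return True


# Choose a rendering path once at start-up instead of assuming the DLL exists.
backend = "uniscribe" if uniscribe_available() else "standard Win32 text APIs"
print(backend)
```

The point of the sketch is only the shape of the decision: probe once, then fall back to the existing path when the DLL is absent.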
 
  Alternative to using Uniscribe APIs is what was discussed
here(using CTL module). A similar approach is being tried for Tamil 
and Thai rendering for Mozilla-Xft (and is enabled for Korean)
(see bug 203052, bug 176315, bug 204039). The upside of
that is that Mozilla is self-sustaining (not relying on Uniscribe).
On the other hand, it's like reinventing the wheel, and it increases
the code size to implement what's offered "by the OS" (well, only when
Mozilla's competitor is installed on Win9x/ME....)
Jungshik was right about Windows. It seems IE on WinXP works fine in a normal
installation, but the user first needs to enable Indic script support for Mozilla
to display Devanagari properly.

I have looked around for a guide on how to get Devanagari enabled correctly in
Red Hat 8, but can't find any. Is there a document that explains it? I have
multiple Unicode TTX fonts that support Devanagari installed in the system, but
it still looks weird. Does the normal nightly/milestone build work, or do I have
to make my own build with CTL enabled?
For Linux-Xft, see bug 204286 as well as bug 176290. 
With my patch for bug 176290 landed, it's very easy
to support Devanagari shaping in Mozilla-Xft because
the converter is already there thanks to Prabhat.
>  On Redhat 8, what you have is Xft-build of Mozilla that doesn't
> make use of hindi shaper even when CTL is enabled. Currently,
> hindi shaper is only used for Mozilla-X11core (plain gtk without
> Xft or Xlib). When you run Mozilla-Xft with the environment variable
> GDK_USE_XFT set to 0, you'll see that Devanagari gets rendered
> properly assuming that you have X11 core fonts with SunIndic encoding.

Ok, I'm using Mozilla 1.4RC1 on Redhat 8 via gnome.
I installed the SunIndic encoded fonts (saraswati) in
/usr/local/share/fonts/bitmap ,

[alok@localhost bitmap]$ more fonts.dir
2
SaraswatiBold24.pcf.gz -cdac-saraswati-bold-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
SaraswatiNormal24.pcf.gz -cdac-saraswati-medium-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0

[alok@localhost bitmap]$ xset q
[snip]
Font Path:
 
/home/alok/.gnome2/share/cursor-fonts,unix/:7100,/usr/local/share/fonts/bitmap,/usr/X11R6/lib/X11/fonts/misc,/usr/X11R6/lib/X11/fonts/100dpi,/usr/X11R6/lib/X11/fonts/75dpi,/usr/share/fonts/indic/OpenType,/home/alok/.gnome2/share/fonts
Bug Mode: compatibility mode is disabled
[snip]


So this directory /usr/local/share/fonts/bitmap is in my fontpath now.

Now using a terminal session i do 

$ export GDK_USE_XFT=0
$ /usr/local/mozilla/mozilla #Mozilla 1.4 RC 1

However, even now the Devanagari pages are rendered incorrectly.
Firstly, in Edit->Preferences->Fonts For->Devanagari, the options Serif through
Monospace don't have the Saraswati font listed. Nor does Proportional.

Fonts for Unicode doesn't have them either. The rendering is done using the
listed fonts, i.e. if I change the font from here, the font gets changed in
the rendered page as well, and I know that it is not the Saraswati font.

So what's wrong? Doesn't the env variable setting work? If not, how can I test
this feature? If yes, is there some tweaking required with respect to the
FontPath?
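One quick sanity check for a setup like the one above is to confirm that the fonts.dir file really advertises a font with the expected charset. The following Python sketch is an illustration only; it assumes each fonts.dir entry is a file name followed by an XLFD name on one line, and that the charset is the last two dash-separated XLFD fields. The function names are hypothetical.

```python
def charset_of(xlfd):
    """Return 'registry-encoding', the last two fields of an XLFD font name."""
    fields = xlfd.split("-")
    return fields[-2] + "-" + fields[-1]


def fonts_with_charset(fonts_dir_text, charset):
    """List font files in a fonts.dir listing that declare the given charset."""
    lines = [line.strip() for line in fonts_dir_text.splitlines() if line.strip()]
    matches = []
    for entry in lines[1:]:  # the first line is the entry count
        filename, xlfd = entry.split(None, 1)
        if charset_of(xlfd) == charset:
            matches.append(filename)
    return matches


SAMPLE = """2
SaraswatiBold24.pcf.gz -cdac-saraswati-bold-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
SaraswatiNormal24.pcf.gz -cdac-saraswati-medium-r-normal-devanagari-25-240-75-75-p-240-sun.unicode.india-0
"""

print(fonts_with_charset(SAMPLE, "sun.unicode.india-0"))
```

This only tells you the X server could offer the font; it says nothing about whether the Mozilla build in use was compiled with CTL.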
CTL is NOT enabled in the default build. That's why it doesn't work. Enabling
CTL by default is another bug. The bug number is escaping me at the moment. I
bookmarked it, but I'm far away from my computer right now. Anyway, you can
easily find it and add your comment there if necessary. Note that this bug is
specifically for Win 9x/ME. 

I believe there are nightly binaries with CTL enabled you can grab.   
Summary: Rendering of Devanagiri (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b
Enabling CTL by default is dealt with in bug 201746
*** Bug 210575 has been marked as a duplicate of this bug. ***
Re: the comment by Jungshik Shin regarding Bug 210575 (duplicate) 

"Is this only on Windows 98 or do you have the same problem on Win 2k/XP as
well? I don't think you do." 

I can confirm his statement is absolutely correct and that the problem with
Complex Text Layout in Devanagari does NOT occur on Windows 2000 once the
necessary fonts and .DLLS are loaded.  The setup of Devanagari can be quickly
achieved via the Regional Settings section of the Control Panel. 

I haven't checked XP yet.  There is a lot of resistance to XP in the UK and many
local computer dealers offer to remove XP from a new system and install an
earlier Windows operating system.  

The change in licensing terms for XP has had a negative effect on the purchase
of new systems here.  I don't know if this is true of other countries. This
means that it is likely there will still be a large established base of Windows
95/98/ME users for some time to come.  
Re: the Complex Text Layout issue in general.  I have some misgivings about
whether Mozilla should be attempting to provide this feature in isolation.  

I have just been in touch with Adobe on this same issue because it appears that
complete support for Complex Text Layout is missing from the various software
tools which claim to support Unicode fonts and international character sets
(e.g. the current SVG Viewer).  

Complex Text Layout seems to be a much wider (if not global) issue.  I don't
have the latest version of Adobe Acrobat so I don't know if this will be
supported fully in the new PDF format either.

Perhaps there could be an option to use an inbuilt Mozilla Complex Text
Processing utility or to use the 3rd party standard CTL or even one supported by
the particular platform?  

It will, of course, lead to some level of dependency of an external technology
(depending on the platform) to make this workable.  This is how it works
currently with Windows 2000 and XP as I have been advised by Jungshik Shin.  

Many non-MS Windows applications (especially multimedia programs) already have a
certain degree of dependency on MS libraries. When installing a new package, one
is often instructed to download an update of a particular DLL or a new version of
DirectX to ensure that the application works correctly.  

Would it be possible for a future version of Mozilla (for pre-Windows 2000
platforms) to make use of the UniScribe DLL in this same way?  Jungshik Shin has
already suggested this and I think that is a sensible option with a lot to
recommend it.  

I don't know if there are patent or legal issues involved in doing this.  It
wouldn't surprise me if that was the case knowing MS.  Sun, in contrast, have
released their source code for their Complex Text Layout support in Solaris into
the Open Source community which could be a useful resource, then again, there is
the Pango project.

If Mozilla does create its own Complex Text Layout support libraries then I hope
that these will be platform- and browser-neutral and that use of these libraries
will not be restricted by other applications.

I realise that I might be dealing with issues that have already been discussed
(and or decided) but I thought that some outside input might be of some use.

J.M.N.
London


 
Thanks, J.M.N. for your input. BTW, you may also give it a try to SILA
(Graphite-enabled Mozilla) at http://sila.mozdev.org. I think it works on Win
9x/ME as well as on Win 2k/XP. You need Graphite fonts.
Blocks: 209243
No longer blocks: 209243
*** Bug 209243 has been marked as a duplicate of this bug. ***
*** Bug 179804 has been marked as a duplicate of this bug. ***
------- Additional Comment #19 From Jungshik Shin 2003-07-12 18:40 ------- 
Thanks, J.M.N. for your input. BTW, you may also give it a try to SILA
(Graphite-enabled Mozilla) at http://sila.mozdev.org. I think it works on Win
9x/ME as well as on Win 2k/XP. You need Graphite fonts.


I downloaded and installed the beta2 of SILA just now. It still has the same 
problem with Devanagari on Windows 98.

BTW, is anyone at Mozilla actually working on this issue or are we just talking 
amongst users here???

Shree
> BTW, is anyone at Mozila actually working on this issue or are we just talking 
> amongst users here???

More users, more solutions.

Nobody seems to have really tried a --enable-ctl build on Win98 yet. Does
anybody have
a) location of a --enable-ctl binary for Win* 
or
b) test results on an --enable-ctl build for Win* 

?
Even ctl as is on linux has not really been used a lot. 
OpenType fonts are just *one way* of rendering devanagari. Let's explore the
*available solutions* thoroughly before dismissing the work of the mozilla
developers, who imho are doing a great job.
I have tried the linux build with --enable-ctl option. But it did not work and
have the same problem as in Windows 9x. 
> I have tried the linux build with --enable-ctl option. But it did not work and
> have the same problem as in Windows 9x. 

Did you use the correct font? 
Here are some screenshots of the rendering by the ctl enabled build here:
http://bugzilla.mozilla.org/attachment.cgi?id=127816&action=view
Attachment as part of bug 212770 and bug 212583
There are some bugs, but a patch has already been submitted as part of bug
211921 . If we get more users we could get more bugs detected.
The rendering is quite good and the font is also of excellent quality, much
better than the available opentype fonts, surprisingly.

For instructions regarding the fonts and the build, please see
http://www.arbornet.org/~marcow/howto.html

I'm just waiting to be able to try out the same thing on a win9x machine,
perhaps I'll try doing a build myself.

Will post more screenshots with the ctl enabled build - I'm sure looking at the
shots, more people would be prompted to use it.
I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready.
 I am transcribing a number of Sanskrit texts into Unicode and so will have
sufficient material to test the formation of complex conjunct consonants and matras.

JMN
London
> I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready.

What is the preferred method of distributing a binary?
 
>>>>------- Additional Comment #25 From Alok Kumar 2003-07-20 02:23 ------- 
> I have tried the linux build with --enable-ctl option. But it did not work and
> have the same problem as in Windows 9x. 

Did you use the correct font? 

....... On Windows 98 I do have the Unicode Devanagari fonts which work with IE 
6. Are you referring to some other fonts? The page you referred to has 
instructions for UNIX - would the same fonts work under Win98? I already have 
raghu8, Arial Unicode, Code2000, Devanagari MT on the PC.


>>>> From J.M.N. 2003-07-20 07:13 ------- 
I would be happy to test a Win98 ctl-enabled build of Mozilla once it is ready.
 I am transcribing a number of Sanskrit texts into Unicode and so will have
sufficient material to test the formation of complex conjunct consonants and 
matras.


-- A little bit offtopic perhaps, but a number of sanskrit texts are available 
at http://sanskrit.gde.to in ITRANS which can be very easily converted to 
unicode using ITRANS 5.3. 

A sampling of converted texts can be seen at http://satsang.tripod.com/ - all 
the HINDI versions of the files are in UTF-8 devanagari and TAMIL versions are 
in utf-8 tamil script. 

With IE6 on Win98 I am able to see them fine. Not so with IE5 on Mac OS X, 
though the iCab browser renders them correctly.

What technology is iCab using that is not available to Mozilla?



> 6. Are you referring to some other fonts? The page you referred to has 

The Saraswati font - by Sun, please see
http://geocities.com/alkuma/seehindi.html (refresh). It's in pcf format so you
have to convert it to ttf first. I haven't tried it myself yet since I don't
have a --enable-ctl build on win* to try it on.
Sun's Saraswati font is also available in TTF (at least is supposed to be
available in TTF) at http://developer.sun.com/techtopics/global/index.html
(follow the link for 'Free Indian Font'). 

Shree,
see http://www.mozilla.org/releases/mozilla1.4/known-issues-int.html
about CTL.
At the moment, enabling CTL on Win9x/ME doesn't make Mozilla render Devanagari
with Sun's fonts. After bug 203406 is resolved, Mozilla on Win9x/ME will be able
to render Devanagari with Sun's Saraswati.

The following line has to be added to fontEncoding.properties file :

encoding.saraswati5.ttf = x-sun-unicode-india-0.wide

 
Depends on: 203406
Summary: Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98 Operating system on Mozilla 1.1b → Rendering of Devanagari (Hindi) charachters in Mozilla on Windows 98
I built Mozilla fresh from CVS on Red Hat Linux 8.0. The "chotti E" matra and
"halant" characters are not rendered properly. The "chotti E" matra should appear
before the consonant, and the "halant" (half) consonants are not aligned. The
"oo" matra is also misplaced on some characters where it should come after the
consonant, e.g. an "oo" matra after "r" is displayed at the foot of the character.

Please refer to the sample page in the attachment.
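The "matra before the consonant" behaviour described above is a reordering step that a shaper has to perform: the text stores the consonant first, but the short-i sign is drawn to its left. Below is a deliberately simplified Python sketch of that one step. It is an illustration, not how Mozilla's CTL code works, and it ignores conjunct clusters, where the sign would move in front of the whole cluster rather than one character.

```python
VOWEL_SIGN_I = "\u093f"  # DEVANAGARI VOWEL SIGN I, the short "i" matra


def to_visual_order(logical_text):
    """Move each short-i matra in front of the single character before it.

    Simplifying assumption: every matra follows exactly one consonant.
    """
    visual = []
    for ch in logical_text:
        if ch == VOWEL_SIGN_I and visual:
            visual.insert(len(visual) - 1, ch)  # draw it before the consonant
        else:
            visual.append(ch)
    return "".join(visual)


# Stored (logical) order: KA, then the matra. Drawn (visual) order: matra, then KA.
logical = "\u0915\u093f"
print([f"U+{ord(c):04X}" for c in to_visual_order(logical)])
```

A renderer that skips this step draws the sign after the consonant, which matches the symptom reported here.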

> I build Mozilla fresh from cvs on Redhat Linux 8.0. The "chotti E" matra and

o Was this a --enable-ctl build? (about:buildconfig will tell you)
o Do you have the Sun Saraswati font installed?

Rendering will not work without --enable-ctl.
Also, even with --enable-ctl, rendering will not work unless you have a
sun-unicode-india-0 encoded font like Saraswati.

Try rebuilding with --enable-ctl if you hadn't done so earlier.
Also install the sun font if you haven't done so.

Also see
<http://groups.google.com/groups?dq=&hl=hi&lr=&ie=UTF-8&oe=UTF-8&selm=bfe9kd%24e6b2e%241%40ID-196040.news.uni-berlin.de>



> Sun's Saraswati font is also available in TTF (at least is supposed to be
> available in TTF) at http://developer.sun.com/techtopics/global/index.html

This download shows up only the pcf fonts. Are the ttf fonts available for
public download?

As for Sun's download site, Prabhat wrote in May that he would ask the person in
charge to put up two TTFs instead of BDFs. I haven't checked that since.
Prabhat, could you ask her/him once more to take care of this problem? 
> o Was this a --enable-ctl build? (about:buildconfig will tell you)
> o Do you have the Sun Saraswati font installed?

Yes, my build is --enable-ctl build. 

I have installed DVBOT.ttf. I also tried to install the Saraswati font but
could not succeed. The fonts on the Sun download page are .pcf. Do you know how to
convert them to .ttf? Or is there another install procedure?
Ok, so it looks like unicode indic support on win98 for mozilla is not ready 
yet. What about ISCII? I was told recently in another conversation:

" Since JAVA has come up with ISCII support so there is no need to upgrade
the ISCII plugin. JAVA support both Unicode and ISCII. I consider this 
to be the right approach. Simple conversion program are also available
for converting between ISCII and Unicode.

I think the Indian masses will go for Unicode/UTF8 solution
because of ignorance! (No wonder, most people go for "preya" sacrificing
the "shreya".)

I oppose Unicode/UTF8 mix because
i) it gives step motherly treatment to Indian scripts.
An ASCII character is represented as a single byte
whereas an Indian script character is represented
in three bytes.
ii) it hides the underlying unity among Indian scripts.
iii) it is very easy to support ISCII as demonstrated by
     JAVA. All we need to do is to show Microsoft that
     most Indians want ISCII. Technologically the problem
     is trivial.
iv) strictly speaking, we need Unicode only when we need
    to display non-Indian and non-Roman script together
    with Indian script in the same page.

"

Do any of you know about Java support for ISCII and Indian scripts? Is it 
possible to support devanagari through the ISCII approach? 

It would be great to have a uniform method to store and search the Indic 
information - Unicode has made a start and it is great for me to be able to 
search Hindi content using Google - except it is very limited. Most of the 
Indian content is in proprietary fonts. Would it be better for the browsers to 
go the ISCII route than Unicode????

Shree

Actually, it's almost ready. The patch for bug 203406 is just waiting for r/sr.
Once it's resolved, it's just as simple as enabling CTL (there are a couple of
issues to resolve before enabling CTL by default, but anyone with her/his own
build environment can enable it on her/his build)

ISCII and Unicode are almost identical except for the codepoint assignment so
that once Indic script in Unicode is supported, supporting ISCII is not very
hard (if that's really necessary). I've filed a bug to write a converter for
ISCII, but haven't begun working on it, yet. I do have ISCII in English (PDF). 

I'm afraid your opinion on Unicode and ISCII is based on some misconception
about Unicode. In the ideal world, everybody has to use Unicode AND NOTHING
ELSE. We don't live in the ideal world so that we have to live with legacy
charsets. You and your fellow Indians are lucky in that you don't have as much
baggage of data in __legacy__ charsets to carry on as others (who you regard as
lucky).  
> Would it be better for thr browsers to go the ISCII route than Unicode????

  Absolutely NOT. ISCII is made up of 8 different character sets, and to mix them
in a single document you have to rely on clumsy ISO-2022-based escape sequences
(ISO 2022 was a way to encode multilingual/multiscript documents before
ISO 10646/Unicode came up). Why would you want to go back to those dreadful days
of ISO-2022 when Unicode is now widely supported?
I will double-check the font download. Apologies for the delay. And guys, please
don't bring up ISCII instead of Unicode! I don't see any problem with Unicode
support for Indic scripts (except for some situations that are surmountable). In
my opinion, users do not need to be concerned with encoding except for reasons of
legacy, coverage or (practical, not theoretical) performance. Using Unicode for
Indian scripts has none of these issues. Additionally, having ISCII support is
trivial, as Jungshik pointed out.
What problem did you face while installing the pcf font? 

The instructions at http://www.arbornet.org/%7Emarcow/howto.html should be
sufficient. 
Or try any of the links from google:
<http://www.google.com/search?q=installing+pcf+fonts+on+linux&ie=UTF-8&oe=UTF-8&hl=hi&btnG=Google+%E0%A4%96%E0%A5%8B%E0%A4%9C>
Or follow the instructions here:
http://tldp.org/HOWTO/Indic-Fonts-HOWTO/iosetup.html#xwindows

> I oppose Unicode/UTF8 mix because
> i) it gives step motherly treatment to Indian scripts.
> An ASCII character is represented as a single byte
> whereas an Indian script character is represented
> in three bytes.

The same "step motherly treatment" is given to other non latin scripts as well,
the Indian scripts are not singled out.
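The byte counts in this exchange are easy to check. A small Python sketch (illustration only; the sample characters are my own choice) that encodes a few characters as UTF-8 and reports their lengths:

```python
def utf8_length(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "Latin A": "A",                 # U+0041
    "Greek alpha": "\u03b1",        # U+03B1
    "Devanagari KA": "\u0915",      # U+0915
    "Malayalam KA": "\u0d15",       # U+0D15
    "Hangul syllable": "\ud55c",    # U+D55C
}

for label, ch in samples.items():
    print(f"{label}: {utf8_length(ch)} byte(s)")
```

ASCII takes one byte and Greek takes two, while Devanagari, Malayalam and Hangul all take three, so the cost is shared by most non-Latin scripts.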

> ii) it hides the underlying unity among Indian scripts.

There is still a one to one mapping for the sake of transliteration among the
Indian scripts. If you look up the code charts, the locations for characters
that are present in, say, Malayalam but absent in Devanagari are left blank. You
can still easily convert a document from the Malayalam script to Devanagari
using simple arithmetic. The Unicode encoding is based on the ISCII encoding.
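A minimal Python sketch of the "simple arithmetic" mentioned above, assuming the two Unicode blocks start at U+0900 (Devanagari) and U+0D00 (Malayalam) and are 128 code points wide. The function name is illustrative, and the result is only meaningful for positions that are actually assigned in both blocks.

```python
DEVANAGARI_BASE = 0x0900
MALAYALAM_BASE = 0x0D00
BLOCK_SIZE = 0x80  # each Indic block spans 128 code points


def malayalam_to_devanagari(text):
    """Shift Malayalam code points to the same position in the Devanagari block.

    Characters outside the Malayalam block are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if MALAYALAM_BASE <= cp < MALAYALAM_BASE + BLOCK_SIZE:
            out.append(chr(cp - MALAYALAM_BASE + DEVANAGARI_BASE))
        else:
            out.append(ch)
    return "".join(out)


# Malayalam KA + vowel sign AA becomes Devanagari KA + vowel sign AA.
converted = malayalam_to_devanagari("\u0d15\u0d3e")
print([f"U+{ord(c):04X}" for c in converted])
```

A real transliterator would also need a policy for positions that are blank in the target script, which this sketch does not attempt.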


> iii) it is very easy to support ISCII as demonstrated by
>      JAVA. All we need to do is to show Microsoft that
>      most Indians want ISCII. Technologically the problem
>      is trivial.

Nobody disputes that it's easy to support ISCII, but that doesn't mean that
supporting UTF-8 is tougher. Taken together, we have so many
more people working on UTF-8. "most Indians want ISCII" is a statement that I
wouldn't bet on. Most Indians don't care, as long as they are able to view,
store, copy, search, sort and print documents in their language.

> iv) strictly speaking, we need Unicode only when we need
>     to display non-Indian and non-Roman script together
>     with Indian script in the same page.

Right. And we need iscii only when we need to display roman and indic scripts on
the same page. Why not go for 7 bit encoding that breaks ascii as well? 

"

> Do any of you know about Java support for ISCII and Indian scripts? Is it 
> possible to support devanagari through the ISCII approach? 

Yes it is possible and has been done in yudit, IE and MS Office. It shouldn't be
difficult to do in Mozilla either, but IMO it's not something that should be
encouraged unless iscii gets an iso-* certification.

> It would be great to have a uniform method to store and search the Indic 
> information - Unicode has made a start and it is great for me to be able to 
> search hindi content using google - except it is very limited. Most of the 
> Indian content is in proprietary fonts? Would it be better for thr browsers to 
> go the ISCII route than Unicode????

This goes to prove that having one standard is better than having none.
Technologists can provide the tools, but they have to be used by creative people
to *produce the content*. ISCII documents are not searchable either. ISCII
doesn't have an ISO standard, the only known standard for Indian scripts is
UTF8, which is a shame, we have nothing to bargain with in the Unicode
Consortium, but that's the way it is.

IMO, here *are* some valid reasons to change the existing Unicode encoding:
1. Multiple sets of character combinations mapping to the same glyph/cluster
2. No way to indicate which shape I want for a *given* cluster.
3. Sorting is a nightmare
4. So is searching, considering 1,2 and 3.
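Point 1 can be shown with one pair that Unicode itself treats as canonically equivalent: the single code point U+0929 and the sequence U+0928 U+093C. The Python sketch below (illustration only, with a hypothetical helper name) compares strings after normalization. Normalization covers only such canonical equivalents, so it would not resolve every case of the problem listed above.

```python
import unicodedata


def same_after_normalization(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u0929"        # DEVANAGARI LETTER NNNA
sequence = "\u0928\u093c"     # DEVANAGARI LETTER NA + SIGN NUKTA

print(precomposed == sequence)                          # raw comparison
print(same_after_normalization(precomposed, sequence))  # normalized comparison
```

A search or sort that compares raw code points treats these as different strings, which is one reason searching and sorting are harder than they look.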

These problems are going to become more aggravated as more and more users start
using unicoded documents, but as the number of users increases, the number of
developers and therefore the number of solutions will also increase. Till then,
better have one bad standard than none at all. 

The bottom line, IMO:
We should try to get *more USERS* for Indic scripts on computers, help them
create electronic content, help them solve the problems they face. Users who
are already able to use computers, because they know both a Latin-based
language and an Indic language, have to take the lead here. The gap that we need
to fill for a person who doesn't know a Latin-based language to straightaway
start using an electronic device is just too huge, so the responsibility is on
our shoulders, people who know both types of scripts. Technical problems will
solve themselves; each one of us has to give their own little nudge in getting
up more electronic content in Indic scripts and in getting more users of Indic
scripts.


> What problem did you face while installing the pcf font? 

> The instructions at http://www.arbornet.org/%7Emarcow/howto.html should be
> sufficient. 

Thanks a lot, Alok Kumar. I could install the font and I can see proper rendering. 

My problem was that I did not run the mkfontdir command, as it was not
mentioned in the document referred to above. 

I have a question about why other true type fonts e.g Surekh do not render
properly? Do they not follow the unicode standard?

Any idea?

Great to hear that you got the rendering right!

> I have a question about why other true type fonts e.g Surekh do not render
> properly? Do they not follow the unicode standard?

There are two kinds of encodings:
a. The character encoding, called charset, and
b. The font encoding.

A charset defines what "number", or code, a particular character should map to. 
A font encoding defines what *glyphs* are located where.

Character to glyph is a many-to-many mapping:
1. One character may map to one glyph (as is the case with Latin, for the sake
   of simplicity).
2. One character may map to multiple glyphs (e.g. a "pa" in Devanagari may be
   written as a "half pa glyph" + the "a matra glyph").
3. Multiple characters may map to a single glyph (e.g. the Latin "fi" may map to
   a single glyph in some fonts, where the f and the i are actually joined together).
4. Multiple characters may map to multiple glyphs, which is quite obvious, no
   examples needed. Multiple characters may also map to different sets of glyphs
   depending on where they are placed in a word.

Unicode defines the codes for the locations of the basic characters for
devanagari. Those code points are the same in case of both the Surekh font and
the Saraswati font.

The difference is in the font encoding.
There is *no standard font encoding for devanagari*, which is a pity.
Therefore the sun guys came up with a sun-unicode-india encoding, which
Saraswati uses.
Surekh most probably uses the ISFOC encoding, but you can't bet on it.

Saraswati is a TrueType font, and since the encoding is known, it is possible
to map the glyphs to the character sequences, which is what CTL has done.

For Surekh, the ISFOC coding is not really *documented*, meaning it is subject
to change without notice.
But, it is an open type font. In an open type font, there are tables within the
font that define which (unicode)character sequences should map to which glyph
sequences, also known as glyph substitution.
That way, you are actually obviating the need for a font encoding, as long as
the rendering program knows how to read those tables.
What happens in win2k, winxp, pango and yudit is that the open type tables are
read, and the corresponding glyphs are read.
Mozilla doesn't read the open type tables, (and it shouldn't?) That is the job
of lower level layers, which would do this job for all applications.

To sum up: Mozilla knows the font encoding for saraswati, but it doesn't know
the font encoding/tables of Surekh. Both have unicode character encoding, so
both are able to display the correct characters based directly on code points.

As a corollary, it is possible to display a utf-8 encoded page even by using a
non-unicode encoded font - as long as you know the "inside" of the font, ie the
font encoding. That is what's being done/proposed to be done for Tamil, using
TSCII fonts. Similarly, a utf-8 encoded page could even be displayed using the
Susha font - but you would need to write the converter.
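
A rough sketch of what such a converter has to do, in Python. The byte values
below are made up for illustration and do not correspond to TSCII, Susha or any
other real font; a real converter needs the actual glyph layout of the target
font and has to handle conjuncts as well. The one non-trivial step shown is
moving the short i vowel sign (U+093F), which is stored after the consonant in
Unicode but drawn before it.

```python
# Toy converter from Unicode Devanagari to a made-up 8-bit font encoding.
# All byte values are hypothetical.

VOWEL_SIGN_I = "\u093f"  # stored after the consonant, drawn before it

# Hypothetical font encoding: Unicode character -> byte in the font.
FONT_CODES = {
    "\u0915": 0x41,  # KA
    "\u0924": 0x42,  # TA
    "\u0939": 0x43,  # HA
    "\u093f": 0x50,  # VOWEL SIGN I
    "\u093e": 0x51,  # VOWEL SIGN AA
}


def to_font_bytes(text):
    """Reorder VOWEL SIGN I into visual order, then map to font bytes.

    Simplification: the sign is moved in front of the single preceding
    character only, which is not enough for consonant clusters.
    Characters missing from FONT_CODES raise KeyError.
    """
    chars = list(text)
    for i in range(1, len(chars)):
        if chars[i] == VOWEL_SIGN_I:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return bytes(FONT_CODES[ch] for ch in chars)


# HA + VOWEL SIGN I + TA comes out as VOWEL SIGN I, HA, TA in font order.
print(to_font_bytes("\u0939\u093f\u0924").hex())
```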

It would also be a good idea to do the same thing for devanagari with the
ISFOC-encoded fonts, which are of very good quality.

An even better idea would be to have *at least one standard font encoding* so we
don't have to rely on open type alone, but that's something beyond the control
of most people reading this.

Hope this cleared up a few things, rather than the other way around! Jungshik
and Prabhat, please correct me if you find any factual inaccuracies.
>>> Actually, it's almost ready. The patch for bug 203406 is just waiting for 
r/sr. Once it's resolved, it's just as simple as enabling CTL (there are a 
couple of issues to resolve before enabling CTL by default, but anyone with 
her/his own build environment can enable it on her/his build)

OK, I'll wait for it and hopefully someone here will make a win98 binary with 
CTL enabled for me to try :-)


>>> We should try to get *more USERS* for indic scripts on computers, help them
create electronic content, help them solve the problems they face.

Contentwise I am planning to convert the documents on sanskrit.gde.to to 
unicode (utf-8) - just waiting to get a little more support on the browser 
side. 

Another area where any user could add unicode hindi content is the hindi 
wikipedia site. See http://hi.wikipedia.org/wiki/Pratigya and add to it.

I would appreciate if one of the unix gurus here would take a look at 
http://hi.wikipedia.org/wiki/sahaayataa and correct the information for both 
viewing devanagari unicode on *nix as well as any input method information for 
the same. Please feel free to correct info regarding windows also, I am only 
using it on win98.

thanks,
Shree


>>> ISCII and Unicode are almost identical except for the codepoint assignment 
so that once Indic script in Unicode is supported, supporting ISCII is not very
hard (if that's really necessary). I've filed a bug to write a converter for
ISCII, but haven't begun working on it, yet. I do have ISCII in English (PDF). 

You may want to look at the programs and tables on
http://www.sharma-home.net/~adsharma/languages/scripts/. Please see a sample
below:

==================
devanagari
==================

र  [  U+930 ]  2786
क  [  U+915 ]  2048
ना  [  U+928 U+93e ]  1893
त  [  U+924 ]  1204
प  [  U+92a ]  1058
न  [  U+928 ]  1047
ल  [  U+932 ]  1019

So the converter you need may already be available.

Shree

>>>> In
my opinion, users do not need to be concerned with encoding except for reasons 
of : Legacy, Coverage or (practical & not theoretical) performance. Using 
Unicode for Indian Scripts has none of these issues.

OK, here is a practical issue using Unicode for Hindi.
See, 
http://hi.wikipedia.org/wiki/Eng-hin-a or 
http://hi.wikipedia.org/wiki/Eng-hin-b

The file sizes double when using utf-8 and so it is very slow to add, edit or 
view. So even though storage may not be an issue, speed problem is definitely 
there. 

Any suggestions ...
I notice that although Mozilla will attempt to reproduce the Devanagari text
when saving a bookmark, it does not appear to display UTF-8 text in the title
at the top of the browser window.  Is this on the to-do list?

J.M. Nicholls
London
re comment #49:  
there's no way to support that in Win 9x/ME because Win 9x/ME is not a  
Unicode-based OS and rendering in the title bar of a window belongs to the realm of  
the OS (not application programs like Mozilla and MS IE). See bug 9449 and int'l 
release notes. Of course, it'd be possible if MS released a Hindi (Tamil, Bangla, etc) 
version of Win9x/ME, but MS doesn't plan to. Even then, you couldn't see Tamil title 
under Hindi  Win9x/ME. (the same is true of any language X under Y language version 
of Win9x/ME  where X and Y use different scripts/writing systems)  
  
re comment #48:  The three-fold size increase (it's not 2 but 3 if you compare
ISCII and UTF-8) pales in comparison with the numerous advantages of
Unicode. Moreover, 'compression encodings' like BOCU and SCSU make it possible
to store Devanagari and text in other Indic scripts in Unicode as efficiently
as if they were in 8-bit ISCII.   Again, why would you want to go back to the
'dark age' from which the informed (for whom legacy encoding has been the norm)
are happy to escape?  Well, some people wanted to go back as Erich Fromm
observed ;-)
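
The factor of three is easy to check. A small Python sketch, assuming one byte
per character for ISCII as stated above (Python has no ISCII, BOCU or SCSU
codec built in, so only UTF-8 and UTF-16 are measured here):

```python
def encoded_sizes(text):
    """Return the size of text in code points and in bytes per encoding."""
    return {
        "code_points": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # without a byte order mark
    }


word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # "Hindi" written in Devanagari
sizes = encoded_sizes(word)
print(sizes)
```

Every code point in the Devanagari block takes three bytes in UTF-8 and two in
UTF-16, so the six code points here come to 18 and 12 bytes respectively.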
  
>>>> Again, why would you want to go back to the 
'dark age' from  which the informed (for whom legacy encoding has been the 
norm) are happy to escape?  Well, some people wanted to go back as Erich Fromm 
observed ;-)  


I am not wanting to go back to dark age. I am very happy with what unicode is 
offering for users. But as I am trying to add content in utf-8 I am finding it 
slow and cumbersome at times - I just wanted to know if there are ways and 
means of getting around it or improving it.

  
I got an email saying :

Bug 203406 Summary: Performace enhancement in CTL code
http://bugzilla.mozilla.org/show_bug.cgi?id=203406

Does that mean the bug relating to devanagari has also been fixed or is this on 
a back burner somewhere???

Shree
This fix will not allow Hindi support in Win98.
Actually, the fix for bug 203406 can fix the problem on Win9x/ME if Mozilla is
built with CTL enabled for Win 9x/ME and truetype version of Sun's Indic fonts
are installed.  That's why this bug was made dependent on bug 203406.

Tamil and Korean are supported that way on Win 9x/ME without CTL enabled because
Tamil and Korean custom font converters are not in  intl/ctl but in intl/uconv.
(see bug 177877 for the 'infrastructure' for that.)

 Needless to say, a better solution would be  to make use of Uniscribe (assuming
that everybody has MS IE 5.5  or later installed so that Uniscribe is
available). However, that's a huge endeavor.... Besides, I still don't know how
to support MathML in that case (and Mozilla does a better job with Korean than
Uniscribe)

 @netscape.com address doesn't work any more so that I'm assigning to myself.
It's bug 218887. 
Assignee: yokoyama → jshin
The problem with the ChoTI 'e' mAtrA (Unicode 0x093f) still persists for me, on
Mozilla 1.7 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7)
Gecko/20040616 MultiZilla/1.6.4.0b), standard installer build on WinXP Home SP1.
Is devanAgarI rendering supposed to work on the above build? IE 6 on the same
machine, which uses the same font (Mangal) as Mozilla for devanAgarI, renders
correctly.
Mozilla also uses the halant for rendering compound letters (joDAkShara)
instead of using half-letters where appropriate.
just a clarification, the vowel I referred to is transcribed as 'i' in ITRANS,
and is the vowel in the word "hit".('e' is a different vowel - the reason I used
'e' is that bug 174424 uses the letter E to refer to the vowel.) sorry for the
confusion.
(In reply to comment #55)

> Mozilla 1.7 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7)
> Gecko/20040616 MultiZilla/1.6.4.0b), standard installer build on WinXP Home SP1.

Did you turn on complex script support on Windows XP? You probably did because
you already have Mangal. Anyway, please make sure you do. Besides, is there any
other problem or is that the only problem? BTW, what's the url of the page you
have trouble with? If text is 'justified', Mozilla doesn't work.
thanks for your help, Jungshik.
no, I didn't have complex script support enabled - did that and lo and behold,
everything's fine now. Well, almost - over at the Hindi version of Wikipedia
(http://hi.wikipedia.org), the pages are all displayed correctly, but the cursor
in the editing box behaves a bit funny. However, that may not have anything to
do with Mozilla.
thanks again for the help.
(In reply to comment #58)
> but the cursor
> in the editing box behaves a bit funny. However, that may not have 

That's another bug to fix. See bug 229896


*** This bug has been marked as a duplicate of 343405 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE
Component: Layout: CTL → Layout: Text
QA Contact: amyy → layout.fonts-and-text