Closed Bug 204039 Opened 21 years ago Closed 21 years ago

write a converter for Tamil rendering with TSCII-encoded TTFs

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

RESOLVED FIXED
mozilla1.4final

People

(Reporter: jshin1987, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(4 files, 2 obsolete files)

This is a spin-off of bug 140013. Bug 140013 was filed for Windows XP and
it turned out that Win2k/XP can render Tamil well when complex script
support is turned on at the OS level. Just installing Thai/Hindi/Tamil/etc.
support in Win2k/XP dramatically improves what standard Win32 text APIs
such as ExtTextOutW can do without Mozilla doing anything. (See also
my comments in bug 203052 and the news article linked there.)

So, I'm filing a separate bug for Mozilla-Xft. This bug is 
similar to bug 176315 and bug 203052 and as such it
depends on bug 176290.
Attached patch: a tentative patch (obsolete)
This is a tentative patch that more or less works. We have to decide
where to put these converters (in intl/uconv or intl/ctl).
The Unicode->TSCII converter is based on Bruno Haible's converter
for glibc 2.3.x, but I modified it and fixed some problems
(probably due to problems in the mapping table Bruno referred to
when writing his converter).
As you have noticed, one implicit property of a converter in intl/ctl is that it
is based on a module (pango shaper) which also provides cluster boundary
information. This gets used in edit operations of TextFrame. My suggestion is
not to house the converter in intl/ctl unless you're able to produce a pangolite
shaper out of it.
Prabhat, thank you for the note. I'll put it in ucvlatin, then. I'm not sure
whether I have to make it build only when enable-ctl is turned on. Perhaps
embedding people may want me to do that.
After a lot of tinkering with eight 'TSCII' fonts I downloaded
(and pfaedit, ftdump and a simple TTF test program I wrote),
I identified 4 fonts with consistent cmaps. The shot
was taken with two of them. The last column is rendered
with the Code2000[1] font to show nominal glyphs. Comparing
the nominal glyph sequences with the rendering result
in the second and third columns will show you that
vowel reordering/splitting and ligaturing (for
consonant conjuncts and other sequences) work.

[1] Code2000 has OT layout tables for Tamil, but Mozilla-Xft
can't take advantage of them, so here it only serves to show
nominal glyphs.
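
For readers unfamiliar with what "vowel reordering/splitting" means here, the following is a minimal Python sketch, not the code in the patch. It works purely on Unicode code points and never produces TSCII bytes. The function names and the simple consonant range check are assumptions made for illustration, and ligaturing is not covered at all.

```python
# Illustrative sketch only: Unicode logical order -> visual (glyph) order for Tamil.
# It does NOT emit TSCII bytes and is not the converter from this bug.

# Vowel signs E, EE and AI are drawn to the left of their consonant.
PREFIX_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}

# Two-part vowel signs split into a left part and a right part
# (these are the canonical decompositions defined by Unicode).
SPLIT_SIGNS = {
    "\u0bca": ("\u0bc6", "\u0bbe"),  # O  = E  + AA
    "\u0bcb": ("\u0bc7", "\u0bbe"),  # OO = EE + AA
    "\u0bcc": ("\u0bc6", "\u0bd7"),  # AU = E  + AU length mark
}


def is_consonant(ch):
    """Rough check: the Tamil consonant range U+0B95 (KA) to U+0BB9 (HA)."""
    return "\u0b95" <= ch <= "\u0bb9"


def to_visual_order(text):
    """Move or split vowel signs so the result follows left-to-right glyph order."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_consonant(ch) and nxt in PREFIX_SIGNS:
            out += [nxt, ch]          # reordering: sign goes before the consonant
            i += 2
        elif is_consonant(ch) and nxt in SPLIT_SIGNS:
            left, right = SPLIT_SIGNS[nxt]
            out += [left, ch, right]  # splitting: sign wraps around the consonant
            i += 2
        else:
            out.append(ch)            # everything else stays in place
            i += 1
    return "".join(out)
```

A font without OpenType layout support can only draw glyphs in the order it is given, which is why a converter targeting such fonts has to do this kind of rearrangement itself.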
Well done Jungshik Shin!

The output is correct. How can I download the build you have?

Thank you.
I added a 'ta : x-unicode' mapping. This enables Tamil users to set fonts for
rendering Unicode web pages with the 'ta' lang tag by setting fonts for Unicode
instead of x-western. Of course, if my proposed patch for bug 204536 is landed
before this, this mapping is not necessary. Ultimately, we have to add a
font-pref. menu for Tamil and many other scripts, but in the case of Tamil, I can't
because of a fontconfig bug
(http://fontconfig.org/cgi-bin/bugzilla/show_bug.cgi?id=84).
Attachment #122195 - Attachment is obsolete: true
> if my proposed patch for bug 204536 is landed

  It's bug 204586. 
Comment on attachment 122854 [details] [diff] [review]
a new patch with langGroups.properties change

I'm not sure whom to ask for r. It'd be great if either Simon or Prabhat could
review it. Hope rbs won't mind taking this for sr :-)
Attachment #122854 - Flags: superreview?(rbs)
Attachment #122854 - Flags: review?(smontagu)
This patch can be used for Mozilla-Win (Win 95/98/ME, thanks to the patch for bug
177877) and Mozilla-X11core (and possibly other platforms) as well as for
Mozilla-Xft, just as the patch for bug 176315 is used by Mozilla-Win,
Mozilla-X11core and Mozilla-Xft. I'm changing the summary line and platform/OS
accordingly.

Mozilla-Win on Win2k/XP can also benefit from this patch if no Tamil OpenType
font is available. If Tamil OpenType fonts are available, it can rely on the
native support of Tamil on that platform.
Blocks: 140013
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: PC → All
Summary: write a converter for Tamil rendering with TSCII-encoded TTFs (for Xft build) → write a converter for Tamil rendering with TSCII-encoded TTFs
To take this shot, I followed a procedure similar to the one for pre-1933
orthography Korean (see bug 176315). Here's a recap:

1. Install some TSCII truetype fonts [1] in the directory of your choice.
2. In that directory, run 'mkfontscale' and then 'mkfontdir'.
3. In that directory, run the following command to generate fonts.alias:

grep 10646-1 fonts.dir | sed -e \
's/[^ ]* \(-[a-zA-Z]*\)\(-.*-\)iso10646-1/"-tscii\2tamilttf-0" "\1\2iso10646-1"/' \
> fonts.alias

4. run the following commands

   $ xset fp+ `pwd`
   $ xset fp rehash

5. If you want the change to be permanent, add the full path of the directory
   to the font search path of xfs (X11 font server) or your X11 server.
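
To make the rewrite in step 3 easier to follow, here is a rough Python equivalent of the sed substitution. It is only a sketch: the function name is made up, and the sample fonts.dir line in the usage note is hypothetical rather than copied from a real fonts.dir.

```python
import re

# Same pattern as the sed expression in step 3: a file name, a space, then an
# XLFD whose foundry consists of letters only and whose registry-encoding
# suffix is iso10646-1.
XLFD_RE = re.compile(r"^\S+ (-[a-zA-Z]*)(-.*-)iso10646-1$")


def alias_line(fonts_dir_line):
    """Return the fonts.alias entry for one fonts.dir line, or None if it does not match."""
    match = XLFD_RE.match(fonts_dir_line.strip())
    if match is None:
        return None
    foundry, middle = match.groups()
    # Alias name: foundry replaced by "tscii", encoding replaced by "tamilttf-0".
    alias = f"-tscii{middle}tamilttf-0"
    # Real name: the original XLFD, unchanged.
    real = f"{foundry}{middle}iso10646-1"
    return f'"{alias}" "{real}"'
```

For a hypothetical line such as `TSCu_Times.ttf -misc-TSCu_Times-medium-r-normal--0-0-0-0-p-0-iso10646-1`, this yields an alias beginning with `-tscii-TSCu_Times-` and ending in `-tamilttf-0`, paired with the original `iso10646-1` name. The leading font count line of fonts.dir does not match and is skipped.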


[1] The TSCII fonts I tried are listed below. They can be found at

ftp://sunsite.dk/mirrors/mandrake/9.1/i586/Mandrake/RPMS/fonts-ttf-tamil-1.1-1mdk.noarch.rpm

ftp://sunsite.dk/mirrors/mandrake/9.1/i586/Mandrake/RPMS/fonts-ttf-tscii-1.1-1mdk.noarch.rpm



TSCAKKAN.TTF
TSCPARAB.TTF
TSCPARAH.TTF
TSCu_Comic.ttf
TSCu_Paranar.ttf
TSCu_Times.ttf
TSCu_paranarb.ttf
TSCu_paranari.ttf
I added Mozilla-X11core (Gfx-Xlib/Xprint, Gfx-GTK) support as well as
Mozilla-Win support. As for Mozilla-Win, just adding Tamil font entries to
the fontEncoding.properties file is enough. This is to enable Mozilla-Win under
Win9x/ME to render Tamil pages.
Attachment #122854 - Attachment is obsolete: true
Attached file Tamil test page
This is the test page I used to make attachment 122218 [details] and attachment 122998 [details].
Attachment #122854 - Flags: superreview?(rbs)
Attachment #122854 - Flags: review?(smontagu)
Comment on attachment 122999 [details] [diff] [review]
a new patch with font-preference patch and Gfx-GTK/Gfx-Xlib patch added

This patch also adds a font-pref. menu for Tamil (as described in bug 204586). In
the case of Mozilla-Xft, fontconfig has to be patched to remove Tamil digits and
U+0B82 from the minimum set of characters for Tamil support (see
http://fontconfig.org/cgi-bin/bugzilla/show_bug.cgi?id=84).


Now that this patch is not only for Moz-Xft but also for Moz-Win and
Moz-X11core, I'm sure this will help a lot of Tamil speakers (especially those
with old PCs with Win9x/ME). Therefore, it'd be really great to get this in
before 1.4 especially considering that 1.4 will be the foundation for a new
Netscape. There's virtually no interaction with other parts, which means there
won't be any regression. Thank you.
Attachment #122999 - Flags: superreview?(rbs)
Attachment #122999 - Flags: review?(smontagu)
Target Milestone: --- → mozilla1.4final
The TSCII to Unicode mapping table is available at http://jshin.net/i18n/tscii.pdf
(I fixed some obvious errors in the table at http://www.tamil.net).
FYI, there are about 70 million Tamil-speaking people around the world in
countries like India, Sri Lanka, South Africa, Malaysia, Singapore, Mauritius,
Great Britain, the US, and Canada
(http://www.ethnologue.com/show_language.asp?code=TCV).

There are also several thousand living in each of a number of European
countries such as Denmark, Germany, Norway, Sweden, Italy, Switzerland, the
Netherlands, and more.

I think this improvement will mean a lot to Tamil-speaking people. Since the
next several versions of Netscape will also be based on this release, it will
really make a big difference.

This will also encourage more Tamil web developers to make use of Unicode
(instead of the many non-standard font encodings) and at the same time increase the
number of Mozilla and Netscape users under Windows 95/98/Me, Linux and other Unix
OSes. Personally, I am involved in projects installing Linux with Mozilla on old
computers, thereby giving many poor students and schools access to computers.

Please make this MAJOR improvement available in release 1.4.

Thank you very much.
Manmathan Kumarathurai
I'll take a look at the patch, but it needs approval from someone who knows
more about Tamil than I do (Prabhat?)
Simon, glad that you're gonna look at it.
As for the conversion itself, I'm pretty confident that it works
correctly. I spent some time with
Unicode 3.0 chapter 9, the TSCII table and most of all, the 'bewildering' :-)
variety of 'TSCII' fonts. I built mine upon glibc 2.3's tscii converter,
but simplified and improved it in a few aspects (and added a lot of
macros to make the source relatively easy to understand).

Nonetheless, a second look by Prabhat would be most welcome :-)
Comment on attachment 122999 [details] [diff] [review]
a new patch with font-preference patch and Gfx-GTK/Gfx-Xlib patch added

Looks good to me.
Attachment #122999 - Flags: review?(smontagu) → review+
Comment on attachment 122999 [details] [diff] [review]
a new patch with font-preference patch and Gfx-GTK/Gfx-Xlib patch added

Did somebody campaign to get all those votes?!?

+# Tamil fonts (TSCII encoding : see http://www.tscii.net)

Dead link.

re: comment 9:
fontEncoding takes precedence. Won't this change affect the case when the OS
(Win2K/XP) has native support?

I was wondering if ucvth wasn't a better place, but it is empty now, so sr=rbs.
It should have been .org:
Please try: http://tscii.org/  (or http://www.tamil.net/tscii/)

I must admit that I did some campaigning :-) and tried to explain how important
this bugfix is.

I hope Jungshik Shin or others can answer the other questions.

Cheers,
Manmathan
Thank you for the sr, rbs. I fixed the dead link and added a bit more detail about
those lines, including what to do on Win2k/XP. With 'TSCII'-only 'dumb' (as
opposed to OpenType) TrueType fonts, this mechanism can coexist with the native
OS support.

As for the location, ucvth is for Thai (still empty). Probably, when X-ISCII-yy or
ISO-8859-12-yy? (where yy is one of 'de', 'be', 'ta', 'ka'? and other Indic
scripts) en/decoders are added, we will have to create ucvin and put the TSCII
converter there along with the ISCII converters.
Attachment #122999 - Flags: superreview?(rbs) → superreview+
Comment on attachment 122999 [details] [diff] [review]
a new patch with font-preference patch and Gfx-GTK/Gfx-Xlib patch added

This adds support for a new script (Tamil). For those who will never view
pages in Tamil, the impact is near zero (it's not zero only because the binary
size increases by a few kB). There's virtually no possibility of regression
for them. On the other hand, it'd be great for Tamil speakers to have a
released version of Mozilla that supports Tamil in Unicode.
Attachment #122999 - Flags: approval1.4?
Comment on attachment 122999 [details] [diff] [review]
a new patch with font-preference patch and Gfx-GTK/Gfx-Xlib patch added

a=asa (on behalf of drivers) for checkin to 1.4
Attachment #122999 - Flags: approval1.4? → approval1.4+
Fix checked in. Thank you all.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
*** Bug 205992 has been marked as a duplicate of this bug. ***
This check-in added 4k to libuconv.so. Is there a way to have a build config
option to opt out of it?

  libuconv.so
  	Total:	      +4244 (+4248/-4)
  	Code:	      +3380 (+3384/-4)
  	Data:	       +864 (+864/+0)
  	      +3380 (+3384/-4)	T (CODE)
  		      +3380 (+3384/-4)	UNDEF:libuconv.so:T
  			      +1160	nsUnicodeToTSCII::Convert(unsigned short const *, int *, char *, int *)
  			       +328	nsUnicodeToTamilTTF::Convert(unsigned short const *, int *, char *, int *)
  			       +252	nsUnicodeToTSCII::QueryInterface(nsID const &, void **)
  			       +204	nsUnicodeToTamilTTFConstructor(nsISupports *, nsID const &, void **)
  			       +180	nsUnicodeToTSCIIConstructor(nsISupports *, nsID const &, void **)
  			       +176	nsUnicodeToTSCII::FillInfo(unsigned int *)
  			       +120	nsUnicodeToTSCII::Finish(char *, int *)
  			       +104	nsUnicodeToTamilTTF::QueryInterface(nsID const &, void **)
  			       +100	nsUnicodeToTamilTTF::SetOutputErrorBehavior(int, nsIUnicharEncoder *, unsigned short)
  			       +100	nsUnicodeToTamilTTF::nsUnicodeToTamilTTF(void)
  			        +84	nsUnicodeToTSCII::nsUnicodeToTSCII(void)
  			        +84	nsUnicodeToTamilTTF::~nsUnicodeToTamilTTF(void)
  			        +68	nsUnicodeToTSCII::~nsUnicodeToTSCII(void)
  			        +60	nsUnicodeToTSCII::Release(void)
  			        +36	nsUnicodeToTamilTTF::AddRef(void)
  			        +36	nsUnicodeToTamilTTF::Release(void)
  			        +28	nsUnicodeToTamilTTF::GetMaxLength(unsigned short const *, int, int *)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTSCII::AddRef(void)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTSCII::FillInfo(unsigned int *)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTSCII::QueryInterface(nsID const &, void **)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTSCII::Release(void)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTamilTTF::AddRef(void)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTamilTTF::QueryInterface(nsID const &, void **)
  			        +28	virtual function thunk (delta:-4) for nsUnicodeToTamilTTF::Release(void)
  			        +20	nsUnicodeToTSCII::GetMaxLength(unsigned short const *, int, int *)
  			        +20	nsUnicodeToTSCII::Reset(void)
  			        +16	nsUnicodeToTSCII::AddRef(void)
  			        +12	nsUnicodeToTSCII::SetOutputErrorBehavior(int, nsIUnicharEncoder *, unsigned short)
  			         -4	ToUpperCase(unsigned short)
  	       +512 (+512/+0)	R (DATA)
  		       +512 (+512/+0)	UNDEF:libuconv.so:R
  			       +224	g_ufJohabJamoMapping
  			       +128	UnicharToTSCII
  			        +64	gTSCIIToTTF
  			        +32	coverage.126
  			        +28	consonant_with_virama
  			        +18	consonant_with_u
  			        +18	consonant_with_uu
  	       +352 (+352/+0)	D (DATA)
  		       +352 (+352/+0)	UNDEF:libuconv.so:D
  			       +128	components
  			        +64	nsUnicodeToTSCII virtual table
  			        +64	nsUnicodeToTamilTTF virtual table
  			        +32	gConverterRegistryInfo
  			        +32	nsUnicodeToTSCII::nsICharRepresentable virtual table
  			        +32	nsUnicodeToTamilTTF::nsICharRepresentable virtual table
No, there isn't at the moment. I considered putting it inside SUNCTL, but it
doesn't really belong there because it doesn't yet offer clustering info
(comment #2). Do you want me to add a build config option to disable it?
Something like 'disable-indic-converters' (with the converters built by default)
would work (I'm planning to add x-iscii-yy converters).
I have tested an Xft build of 1.4rc1 with the patch for bug 176290 applied, and I
have found that the fonts you select in the font-preferences dialog for
"Tamil" only have an effect when you are in the Tamil locale (LANG=ta_IN).

If you are in a Danish locale, LANG=da_DK.UTF-8, you have to change the fonts
for Western. Changing the fonts for Tamil has no effect at all.

One other thing: when I change the fonts for Western while in a Danish
locale, it also changes the fonts for the Mozilla application itself, e.g. the
fonts in the Preferences dialog are changed.
That's a known problem, but it doesn't have much to do with this bug. Only
Mozilla-Windows selects fonts for Unicode documents based on the code range (see
bug 206123 and bug 91190) when 'lang' (html) or 'xml:lang' is not specified.
Mozilla-X11core and Mozilla-Xft rely on the current locale to choose fonts for
Unicode documents without lang/xml:lang. I thought I had filed a bug for this,
but apparently I didn't. I'm gonna file a bug on this issue.

BTW, it's always good practice to specify lang (for html) and xml:lang (for xml).
FYI, see bug 229394 and bug 208479 for issues mentioned in the last two comments.



Please take a look at bug 307257, which is a current topcrasher and may well
be related to the work done here.
Component: Layout: CTL → Layout: Text
QA Contact: arthit → layout.fonts-and-text