Closed Bug 215219 (CTL2) Opened 21 years ago Closed 17 years ago

add a bridge to Pango from Xft rendering code for CTL

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
mozilla1.7final

People

(Reporter: jshin1987, Assigned: jshin1987, NeedInfo)

References

Details

(Keywords: intl)

Attachments

(17 files, 15 obsolete files)

141 bytes, text/plain
Details
2.15 KB, text/html
Details
6.66 KB, text/html
Details
42.94 KB, image/jpeg
Details
1.22 KB, text/html
Details
52.08 KB, image/jpeg
Details
58.05 KB, image/jpeg
Details
215.38 KB, image/png
Details
3.07 KB, text/html
Details
3.78 KB, text/plain
Details
492 bytes, text/html
Details
24.12 KB, image/png
Details
44.99 KB, patch
Details | Diff | Splinter Review
44.02 KB, patch
Details | Diff | Splinter Review
14.18 KB, patch
Details | Diff | Splinter Review
166.68 KB, image/png
Details
199.43 KB, patch
Details | Diff | Splinter Review
There's a patch to take advantage of Pango (installed on the system) to render
complex scripts at 
http://www.eng.buffalo.edu/%7Edsjoshi/mozilla-xft-ctl.html


This is an alternative to bug  214715(going all the way to Pango) and Prabhat's
approach summarized in bug 199857
I forgot to mention that this info. was kindly passed along by Aloc. 

Below are some thoughts of mine:

 A similar approproach was   taken by SILA
(http://sila.mozdev.org).  We might be able go along a similar path
on Windows (Uniscribe) and MacOS X (ATSUI). I just took a brief look
at your patch and it looks fine (other than some factoring that need
to be done and CTL range check that you wrote about). In addiiton to
DrawString, text-dimension related functions have to be modified as well.

> > --with-default-toolkit=gtk2).

 I don't see any reason that we need gtk2 for this patch. Is there any?


> > xpcom module. It also bypasses the DrawingDataFillSpec funcition for
> > these ranges (actually it is just a copy of the original
> > DrawingDataFillSpec with minor modification to take care of changed
> > glyph string lengths (for indic scripts )

> > xpcom module. It also bypasses the DrawingDataFillSpec funcition for
> > these ranges (actually it is just a copy of the original
> > DrawingDataFillSpec with minor modification to take care of changed
> > glyph string lengths (for indic scripts )

   We may be able to  do some factoring here.


 In one aspect, I share Prabhat's concern. We have to extend this to support
not just rendering but also 'editing' (cursor movement, selection, etc).
Because Pango already offers APIs to get  clustering info and there's
Prabhat's implementation with PangoLite, it wouldn't be hard
to find a wrapper for that to be used in nsTextFrame.cpp.

  Now we haev a couple of options in incorporating Pango with Mozilla:

     - bug 214715 : recast everything in Pango: it'd not be very hard to
       do this because Pango is very 'smart' (sometimes too smart).
       MathML support is a headache in this case.
       On the other hand, overall code would be a lot simpler and leaner.

     - the approach taken by Dhananjay : built upon the current code.
       MathML (and scripts not well supported by Pango but supported
       by Mozilla) can be supportd without fixing Pango. code would
       a bit more complex than now.

  The downside of both approaches is that Mozilla is dependent on
Pango, but I guess it makes _a lot of_ sense to make use of it when
available (the same is true of Uniscribe and ATSUI). Where it's not
available, we can use PangoLite(as summarized in bug 199857) Perhaps,
what we have to do is to make a consensus on the way we interact
with/invoke Pango APIs (the system-installed or built-in PangoLite)
so that from the consumers' (API users : renderer and nsTextFrame,
etc) point of view, both are equivalent.
Summary: add a bridge to Pango from Xft rendering code for CTL : → add a bridge to Pango from Xft rendering code for CTL :
It seems like the patch was written in such a way that it can be an 'almost'
drop-in replacement for PangoLite, which means that we can make either run-time
or compile-time selection between two options.

 Reassinging to me for some polishing and impedance matching. I'm not able to
work on this right now, but hoping to get back soon(?). 

BTW, I'm not yet sure which is better, the approach of bug 214715 or this.
Perhaps in the long run, the former is better (assuming that we can solve MathML
issue) than this. On the other hand, there are a couple of things we want to
keep our grip on ('invisible' char. handling, transliteration, etc), for which
the latter is better. 
Assignee: prabhat.hegde → jshin
I more or less concur with what Jungshik says. However, before making a full
commitment, i'd like to suggest to investigate ripping Indic OT layout from ICU.
The reason being a number of OT layout bugs are fixed there (since we use it for
Ooo) and also consistent testing on MSFT compatible OT fonts. The secondary
reason is two-fold. One - we don't need additional work to support our non-Gnome
friends. 2> Possibly can be re-used for windows 98 & 2k (where OT support does
not exist for non Hindi scripts as far as i know.)

I also propose restructuring nsULE so that edit operation API can be re-used.
Thereby all that is needed will be shaping code. Secondly i would suggest a
cache based performance enhancement which will eliminate the need to go through
a clustering call for edit operations. 

prabhat.
I modify my comment earlier. Here is an option i would prefer.
A> New compile-time switch to choose system pango.
B> Use Dhanjay's approach (Re-use pango API for both shaping and edit operations).
C> Possibly add new compile time switch *in case* we want to have an ICU layout 
   to replace pango in Dhanjay's approach.
I worked on selection yesterday and this module works fine with prabhats cluster
recognition code very well indeed. I have cleaned up the code a lot and fixed
most bugs. please check the new screenshot at
http://www.eng.buffalo.edu/~dsjoshi/mozilla-xft-ctl.html
newer source code is also there.

This module still needs ULE/ILE for selection but not dvng-x

dsjoshi
Thanks for the update. 

I've given some thought to this bug on the road and I'm inclined toward a
slightly different approach (more similar to what's done in SILA with Graphite
fonts/rendering API). In SILA, Graphite fonts are identified by an 'invasive'
method and a new font class is added (or it can be done by adding a 'flag'
member variable in nsXftFontUnicode) for Graphite fonts. When a list of matching
fonts (candidate fonts) is generated, Graphite fonts are idnetified and
instances of the class for Graphite font are ctor'd. This font class  has usual
drawing and measuring methods implemented in terms of the Graphite API wrapper
service(it's also written in xpcom).

I had been pondering over this possibility with Pango, but hadn't come up with a
way to identify 'Pango fonts' (there's no such thing as Pango fonts. What I
meant is fonts that can render given char/strings with Pango APIs) until 
Dhananjay's patch 'inspired' me. Graphite fonts have a unique signature   so
that it's easy to identify them. 

What did I come up with? I don't have to identify them other than making sure
that they're opentype fonts and meet other creteria (lang, family name, etc. If
necessary, a static const precompiled CCMap can be added to nsFontMetricsXft.cpp
for the Unicode range checking). We can either add a new class (say,
nsFontXftOpenType) or flag it with a new boolean member variable in
nsFontXftUnicode. 

Now here's a little problem. Unless there's a fontconfig API that lets us tell
opentype fonts from other fonts, we have to give up 'lazy opening' of XftFont
(GetXftFont) and resort to FT2 API to identify opentype fonts. 
In effect, Gfx:Win always 'looked into' fonts  (not for identifying opentype
fonts but for getting truetype cmaps. Therefore, ftang didn't have to worry
about this when he implemented SILA.
Anyway, this is why I used the word 'invasive' when describing SILA's
idnetification of Graphite fonts. 

I'm adding Keith to CC to see what he thinks about adding an API to fontconfig
for 'font type' identification if there isn't one yet.

Incidentally, we can just go all the way to Pango (except for MathML and Korean
- in case of the latter, until I fix Pango. In other words, except for 'custom
fonts' listed in fontEncoding.properties file) and let Pango run the show. In
that case, we don't need to worry about 'OTF' checking at all. This is an
alternative to bug 214715 that enable us to keep a leverage (if/where necessary)
while taking advantage of Pango's intelligence. I guess we still need 'Pango
wrapper API' service(xpcom) because it's not only for rendering but also for
editing/selection. 

Needless to say, there are a couple of 'combinations' of the above
(run-time/compile-time Pango presence checking, ICU, etc...)

I'm writing all this from memory(w/o actually looking at the code) so that there
may be some loose ends.
in EnumarateGlyphs:
A suitable font is determined. This font is passed to the DrawStringCTLCallback
function along with the string.

In DrawStringCallback:
FcResult v;
if(FcGetPattern(nsFontXFT::XftFont,FC_FAMILY,0,&v ==FcResultMatch)
{
//we got the family string v.u.s send it to pango with the unichar string
//continue with the old function
}

in pango module use this family string to set the font for context
this way we do not need to have anycode to identify fonts separately.

Assumes: All the Fonts that are available to mozilla are also available to pango
(actually pango gets to use all fonts, but mozilla only uses fonts that are
particular rendering system xft/x. cause I cannot see any x fonts listed in the
fonts prefs. only xft fonts.
I might be wrong  about this assumption though.

dsjoshi
That's a safe assumption.  At least in the Xft code we use fontconfig as does
pango.  Also, if we ever want to do downloadable fonts (I hope we do) we should
just use the standard path code in fontconfig to add it to the path.
I dont' think my earlier comment was posted.
//earlier comment

I realised that in thai sometimes you get more glyphs than there are characters.
So I now allocate 200% of the length to store glyph indices in DrawStringCTL
callback function. So now it works for thai. Should work for other scripts too.

Thai Screenshot and latest patch/source is at 
http://www.eng.buffalo.edu/~dsjoshi/mozilla-xft-ctl.html
I'd be concerned that there were characters that required more than two glyphs
to render.
I Understood What blizzard is saying and consequently I have done a couple of things
1: at the expense of an additional function call ( first analyze text and find
length) and next allocate memory accordingly and get the glyph indices. 

2: wrote additional code to take care of the situation when pango_itemize
returns more than one item

earlier the patch was working for thai, only with the translation page on
unicode webpage and not with bbc thai. After the additions it works for both
satisfactorily (without crashing !! and without any assertions from pango) 
please check the screenshots and source at the same url as earlier. Jungshik
could you please check if the screenshots of thai are correctly reordered.

It is my belief that with minor modifications and optimizations it can be
incorporated in the main trunk.

dj
Sorry Thai was not rendered properly in your screenshot. The latest patch of
yours has 'raghindi' hardcoded and I guess that's why it didn't work for Thai. 

BTW, I couldn't make 'configure' work with gtk2 + xft + ctl on RedHat 9 
although I don't have any problem with gtk + xft + ctl (that can render Thai,
Devanagari, Tamil and Korean well). For some reason, configure thinks that I
don't have gmodule support. Anyway, I realized that to use Pango's opentype font
support, we need gtk2/glib 2.x. So, this is for gtk2 port only. 
Status: NEW → ASSIGNED
This is the 0th cut so that there are a lot of rough edges to smooth out.
Because of the difficulty I have with xft+ctl+gtk2, I couldn't test this patch,
which was written to implement what I mentioned earlier:

> Incidentally, we can just go all the way to Pango (except for MathML and
Korean
> - in case of the latter, until I fix Pango. In other words, except for
'custom
> fonts' listed in fontEncoding.properties file) and let Pango run the show. In

> that case, we don't need to worry about 'OTF' checking at all. 

The directory organization under intl/ctl was on purpose to be distinct from
the existing directories to avoid the 'collison' for the now.  Obviously, we
need to figure out the best way to organize things there. And, that's not the
only thing but there are a lot to work out. Partly because we have to move
around 'spacing vectors', APIs in nsICTLService are not so 'pretty'.  Neiter
are some changes made in nsFontMetricsXft.cpp. At the moment, 'spacing vectors'
don't seem to be used often, but we can't ignore them because layout will
certainly use them to fully implement CSS/XSL
I probably should have checked whether the new source was correctly uploaded.
Now it is.

Regarding: xft+ctl+gtk2
I had the same problem with the source version 1.4 of mozilla
A workaround for this, I found out is to edit the configure script
 search for "gmodule support"
remove the test program compilation in the script or set the glib flags manually

I do not have the problem with version 1.5a
I still have a trouble getting 'configure' script to work for gtk2+ctl+xft
(1.5b trunk). I worked around it for the now by making intl/ctl get compiled
even when SUNCTL is not defined. 

This patch can render Devanagari, Tamil, Thai with opentype fonts by invoking
Pango APIs (Pango on my Linux box seems to have a little problem with
ZWJ/ZWNJ).  For MathML, Tamil and Korean, custom-encoded fonts can be used as
well. Because Arabic and Hebrew are handled by Mozilla's own BIDI code, Pango
APIs are not invoked on them. A better way is  to disable Mozilla's own BIDI
code for Gtk2(+CTL).

As in Dhananjay's latest patch, I made multiple 'items' taken care of.

Todo:

 1. implement translating 'spacing' info in PangoGlyphString to 'spacing' to
use with |FillSpecBuffer|.  Currently, a blind copy is made. 

 2. ASCII/8bit code path has some problems. Sometimes the width/text dimension
come out wrong. In ISO-8859-1/Windows-1252 encoding, the text inside a button
doesn't get rendered. In UTF-8 or other encodings, buttons present no problem. 


 3. Split 'GetGlyphsForUCS4String' and 'GetGlyphsForUTF16String' in
nsICTLService into 3 or 4 methods : Analyze (similar to Dhananjay's),
RetrieveGlyphs, RetrieveSpacing (only invoked where necessary), FreeGlyphs (may
not need this).

 4. Think more about nsICTLService interface. 

  - Do we need it as an xpcom service? 
  - If so, we also have to implement PrevCluster, NextCluster or similar
methods for the caret movement and selection. 
  - Do we want them to be dependent on the rendering APIs (e.g. Pango)? We can
just implement the Unicode grapheme boundary 'algorithm' independently of the
rendering 'backend'	
  -  Do we want to wrap Uniscribe(Win32), PangoLite, ICU, ATSUI(MacOS X) APIs
with this interface as well?
Attachment #129850 - Attachment is obsolete: true
> 2. ASCII/8bit code path has some problems. 

  It turned out that for some reason pango_shape fills up glyphString with
invalid glyph indices when a PS type1 font is selected. With 'Nimbus Roman No.
1'(type 1) selected, apparently pango_shape returns '0x373'(well beyond the
'glyph index' upperlimit of a type1 font) for 'A'. When I changed Western fonts
to truetype fonts (Luxi Sans, etc), everything worked fine. 

 It's strange that it doesn't work for Type1 fonts because pango_shape() appears
to invoke  pango_xft_font_get_glyph (), which in turn calls XftCharIndex().
XftCharIndex() sure works for both type1 and ttf. Besides, gedit and other gtk2
applications run fine with type1 fonts... 

> Sometimes the width/text dimension come out wrong. 

  To measure the width, nsRenderingContextGFX invokes the UTF16 version of
nsFontMetricsXft::GetWidth[1] even for ASCII strings while it invokes the 8bit
version of DrawString for 8bit strings. My patch doesn't invoke Pango APIs in
the ASCII code path. Therefore, the aforementioned problem with Type1 fonts only
showed up in the form of misaligned text (char. drawing is fine). It also showed
up when 'spacing' is pre-calculated in nsTextFrame.cpp with
nsIRenderingContext::GetWidth() (e.g. when rendering <p
style="text-align:justified">......</p>.) 


As for the font type/feature ID issue with fontconfig (comment #6), I've been
discussing off-line with Keith. It happened that SIL (the home of Graphite among
many other things linguistic) people also wanted a similar feature to identify
Graphite fonts (in SILA : http://sila.mozdev.org, the method that got Keith to
'shudder' is used to identify Graphite fonts) and it was filed as fontconfig b#
105 ( http://fontconfig.org/cgi-bin/bugzilla/show_bug.cgi?id=105)


[1] Actually, there's no 8bit version of GetWidth in
nsFontMetricsGTK/nsFontMetricsXft
Attached patch v 0.2 (obsolete) — Splinter Review
In this patch, I implemented the 'realignment' of 'spacing' array with glyphs.
nsTextFrame.cpp assigns 'spacing' value to each character, but after the
conversion to glyphs from characters, these values can't be used because char
-> glyph mapping is not one to one in general.
 
It still has a problem because for some strange reason, on my Linux,
glyph.geometry.x_offset is not scaled (with the font size) while
glyph.geometry.width is scaled where 'glyph' is PangoGlyphInfo

Another problem is that family name and size are not sufficient to fully
specify a font. We also need weight, style, variant, etc before calling
pango_font_description*(), which means we have to move across Xpcom boundary a
whole bunch of font properties. Perhaps, we can wrap them up all in nsFont with
fields filled with 'matched' values.

Being v0.2, it's littered with debug statements, quick and dirty/ad-hocish code
and #if 0'd block. I'm not thrilled at this patch, but I'm uploading it to put
it somwwhere safe :-)
 
BTW, I didn't split nsICTLService::GetGlyphsForUCS4String into three methods in
this patch. Maybe I'll do it later. 

Another BTW, nsICTLService is not scriptable (no intend to make it scriptable)
so that I may just as well write C++ header file instead of nsICTLService.idl.
Attachment #129921 - Attachment is obsolete: true
I just realized that this approach (of getting glyph strings from Pango and feed
them to Xft APIs) if opentype GPOS table is used. In that case, we have to
retrieve not only glyphs but also x and y position info. as well from Pango.
Perhaps, it's doable, but nsICTLService APIs would get more complicated.  In
this light, the approach of bug 214715 looks  more attractive (only if we can
come up with an 'escape clause' for MathML and scripts that are not well-handled
by Pango that Mozilla supports well)

As for aligning the spacing array with glyphs, there's  write-up on
a related issue at  http://people.redhat.com/otaylor/grid-fitting
Attached patch v 0.3 (see also comment #22) (obsolete) — Splinter Review
I moved new system-pango stuffs out of intl/ctl and into intl/ctl2. intl/ctl2
was made to get compiled in when gtk2 and xft are enabled (without CTL
enabled). I did that way partly because I had a  trouble compiling with
gtk2+xft+ctl and partly because  I wanted to make this bug 'orthogonal to' bug
203406 and bug 204286. As a result, nsFontMetricsXft.cpp is littered with
#ifdef'd lines. 

Anyway, applying this patch to the trunk would get complex scripts rendered by
Pango (just enable xft and choose gtk2 toolkit). Users of South and SE Asian
scripts may want to use it while the overal strategy for CTL in Mozilla is
worked out. 

 I haven't yet solved the problem with aligning 'spacing' array with glyphs so
that 'justified' runs of text in complex scripts wouldn't come out right.
Attachment #130110 - Attachment is obsolete: true
I have added elementary support for editing operations in other languages.

Edited the following file in main mozilla source
editor/libeditor/html/nsHTMLEditor.cpp
In the HandleKeyPress Function
screenshot and source at earlier url.

Currently I am not indentifying the 'isogroup' of the keymap that is currently
active since I haven't yet found a way to identify it in mozilla.

cut-copy-paste-selection works fine
patch for file
widget/src/gtk2/nsGtkKeyUtils.cpp
Comment on attachment 130373 [details] [diff] [review]
v 0.3 (see also comment #22)


>Index: gfx/src/gtk/Makefile.in
....
>+ifdef MOZ_ENABLE_XFT
>+ifeq ($(MOZ_WIDGET_TOOLKIT),gtk2)
>+REQUIRES	+= ctl
>+endif
>+endif

  In the above lines, 'ctl' should be 'ctl2'.  Somehow I must have uploaded the
second latest patch(or a patch made before I corrected the above bug) instead
of the latest. 

Rajkumar S kindly pointed out that to me.
Attachment #130373 - Attachment description: v 0.3 → v 0.3 (see also comment #22)
Fixed 3 problems with Jungshik Shin's patches so that it can be applied on a
new tree and compiled. I have put a tarball of mozilla with this and Dhananjay
Joshi's  input patches at
http://www.linuxense.com/oss/render/mozilla/index.html

raj
This is a simple test case for testing Malayalam rendering. I have not included
all complex cases, but will add them eventually. Latest version of this page
can  be found at http://www.linuxense.com/oss/render/mozilla/index-ml.html 

A free (beer) OT font for testing Malayalam rendering can be found at
http://www.supersoftweb.com/Unicode.htm

raj
Attached patch v 0.3a (Makefile.in changes) (obsolete) — Splinter Review
Identical to attachment 130373 [details] [diff] [review] except for three changes in Makefile.in's that
fix compilation errors (I didn't notice them because I ran make separately in
each directory, but Raj came across them while compiling from the scratch).
Attachment 131127 [details] [diff] doesn't include new files so that I'm uploading this with all
the files included.
Attachment #130373 - Attachment is obsolete: true
Attachment #131127 - Attachment is obsolete: true
Re comment #5:
> Pango on my Linux box seems to have a little problem with ZWJ/ZWNJ.  

  This is a known pango bug (http://bugzilla.gnome.org/show_bug.cgi?id=118302)
that  causes the inhibition of conjuct formation by ZWNJ and the representation
of (independent) half-form with ZWJ not to work.  

Re comment #20 and comment #21 : 

dsjoshi, can you file (a) separate bug(s) for those issues and add me to CC? 
 
Attached file Malayalam test page
I combined my Devanagari test page (with code points shifted to Malayalam
range), Tamil test page(ditto) and several test cases in Raj's page. This is
rather to reveal Pango's shortcomings[1] so that this is not for this bug, in a
sense. Nonetheless, I thought this may be useful here as well. 


[1] gedit has identical problems on linux and Mozilla-Win renders most test
cases the same way as Yudit does on Linux, which is an indication that the font
used (ThoolikaUnicode) is fine. Curiously, MS IE 6(that directly invokes
Uniscribe) has some obvious defects that Mozilla (that relies on ExtTextOutW)
doesn't have.
It seems that Control characters like ZWNJ/ZWJ are being shown as blank boxes
(http://www.peacefulaction.org/sayamindu/Screenshot.png) Can those be supressed?
This does not happen with other apps that use Pango, I I think this has
something to do with mozilla.
Suppresing it is not to deal with the problem at its root(it's just hiding it).
As I wrote in  comment #26, it's a known Pango bug that is to blame(see Gnome
bugzilla bug I quoted for details). When that's fixed, Mozilla with this patch
will get automatically fixed.
 
I applied this patch to moz 1.5 on my redhat 9 system. I am interested in seeing
indic languages work. 
configure cmd: ./configure  --prefix=/home/jatin/mozilla-1.5
--enable-default-toolkit=gtk2 --enable-debug --disable-tests --enable-crypto
--enable-xft --with-system-zilb --enable-ctl
The string to be rendered by nsRenderingContextGTK::DrawString consists of
unicode (indic) and ascii characters. When the string is rendered, indic
characters are rendered correctly, but ascii is totally incorrect. If the string
is only ascii - it renders correctly - as it goes thru a different set of
functions. 
So my thinking is that the glyphs returned by
nsSystemPangoLE::GetGlyphsForUCS4String for ascii are incorrect - probably
because of incorrect font being selected. 
i have seen that the same patch works fine on gnome 2.4 integrated into the
morphix CD released by the IndLinux.org team. So could this be only a
configuration mistake for the font subsystem?
I would like pointers on how i should go further in this regard

Thanks
re: comment #30
I haven't had a chance to work on this recently. I'll take a look again. 
BTW, you don't need 'enable-ctl' for this patch. (see comment #19).

see also bug 189433. Hmm, you managed to build with gtk2 + ctl.
Attached patch a patch updated to the trunk (obsolete) — Splinter Review
Nothing has changed since attachment 131157 [details] [diff] [review], but nsFontMetricsXft.cpp has
changed since September so that I had to update it. With this applied to
1.7alpha,
I can't reproduce the problem reported in comment #30 with Malaylalam + English
text. Both Latin letters and Malaylalam letters are rendered fine.  Could you
attach a sample file and a screenshot and tell me what font you used? 

 I'll also test Devanagari and Tamil, soon.
Attachment #131157 - Attachment is obsolete: true
This is to show where I'm now. 

Five different 'justification' options are tested. Without any justification,
it works fine. With 'text-align="justify"', combining character sequences don't
get rendered as they should. At least, I manage to get them rendered 'together'
even with 'letter-spacing: 0.2em'. Note that there's no difference between the
4th line and the 3rd line as far as 'intra-syllable' ('intra-grapheme-cluster')
spacing is concerned, which indicates that each grapheme cluster is treated as
a unit.  This is done at the 'drawing' stage, but it has to move up eventually
to 'nsText*" where 'spacing'  is added (see bug 229896)

Fonts used are Raghindi for Devanagari and  ThoolikaUnicode for Malaylalam.
This patch now can render 'justified' text as well.I'm gonna upload a
screenshot soon. With this patch, Mozilla's rendering ability is only limited
by Pango. That is, what Pango can render can be rendered by Mozilla, too. The
exception is Arabic/Hebrew because at the moment, I don't invoke nsICTLService
for Arabic/Hebrew assuming that they're 'pre-shaped'.
Attachment #138292 - Attachment is obsolete: true
This test case uses Raghindi for Devanagair and ThoolikaTraditionalUnicode for
Malaylalam. Both are freely available opentype fonts with GSUB/GPOS tables.
compare this with attachment 138315 [details] (sorry for the quality. I degraded it too
much). 
In this screenshot, you can see that all graphemes in justified text (four
cases) are rendered exactly the same way as unjustified text (the last case)
This appears to be using the mini-pango in moz instead of the system pango
installation?
added to comment #37

Does it possible to removes
  libmozpango.so
  libmozpango-thaix.so
  libmozpango-dvngx.so
  ...
from Mozilla, and uses system Pango instead?

pros:
- lesser module maintenance
- smaller footprint
- get fresher functionalities/fixes of Pango, including new languages
  (libmozpango updates is more delayed)

cons:
- broken if inappropriate version/none of Pango found.
re comment #37 & #38. 
This code does not use mini-pango, which is built ONLY when 'enable-ctl' is
turned on. I can't even build with 'enable-ctl' and 'gtk2'.  The very purpose of
this bug is to take advantage of the pango installed on the system instead of
using mozilla's own mini-pango. I wonder how that fact (which was made clear
from the beginning, comment #0) could be missed. 

If it did use minpango, it would not render anything other than Thai and
Devanagari because minipango supports only Thai and Devanagari with
'custom-encoded' fonts. However, my screenshot was taken of with __opentype_
fonts  for Devanagari and Malaylalam (raghindi and  ThoolikaUnicode).
This is to show that with this patch Mozilla-Xft (+gtk2) is better than
Mozilla-Win on Win2k/XP in complex script rendering.  Mozilla-Win currently
uses TextOutExW (or something like that) instead of Uniscribe APIs (used by MS
IE, a rough equivalent of Pango). When rendering justified text (text-align:
justify and/or letter-spacing), Mozilla-Win  hands over one Unicode character
at a time to TextOutExW. Without the 'context', TextOutExW just draws it with
the default glyph. Therefore, only the last example (unjustified) in attachment
1383328 is rendered correctly by Mozilla-Win on Win2k/XP because Mozilla passes
a sequence of characters to TextOutExW() with 'context' preserved.
re comment #30 :
jatin, I guess your problem is related to what I described in comment #16.
However, I couldn't reproduce it any more with type1 fonts (such as Luxi Serif).
I have about a month old CVS version of Pango, fontconfig and Xft on my Linux
box. I'm not sure which and what, but you may try updating those libraries to
the CVS head and see if that makes a difference.
I've just realized that I had not included intl/ctl2  in attachment 138331 [details] [diff] [review],
which may have confused blizzard. This patch includes intl/ctl2 (the heart of
the patch). My tree was somehow messed up and firebird didn't even start (I
clobbered and am building it again) so that I'm not 100% sure of this patch.
Attachment #138331 - Attachment is obsolete: true
Comment on attachment 138471 [details] [diff] [review]
patch  0.5(now includes intl/ctl2 directory)

I just confirmed that this patch works as intended (that is, I recovered
successfully from a slightly messed-up tree) Indian Linux distribution builders
can apply this patch to the cvs. 1.6branch would have a  couple of rejects, but
can be easily fixed manually.
test text for various languages
http://www.w3.org/2001/06/utf-8-test/quickbrown.html
Alias: CTL2
Summary: add a bridge to Pango from Xft rendering code for CTL : → add a bridge to Pango from Xft rendering code for CTL
well, that page doesn't have 'many' scripts covered. :-) The following page has
links to several Unicode example pages with a lot more comprehensive script
coverage.

http://www.i18nguy.com/unicode/unicode-example-intro.html

Some may not work for complex scripts because non-opentype fonts are specified
for complex scripts.
Yeah, that's exactly why I was confused.  Let me know when you're ready for me
to look at this, jshin.
Attached patch patch updated to the trunk (obsolete) — Splinter Review
Updated the patch to the trunk. It's become a moving target :-). 

blizzard, thanks. I'll let you know when I'm ready. BTW, is there any way to
build gtk1 (as opposed to gtk2) and pango together? I guess not, but if that's
possible, I can avoid littering with '#ifdef ... GTK2'. 

I also removed APIs for cursor movement and selection because I now think they
should be implemented in an XP-way per UTR #29 (see bug 229896). 
 

There's a little problem in that approach, though because UTR #29 and actual
platform-dependent drawing APIs may not agree on what makes up 'grapheme
clusters'. Anyway, that's a bit off-topic here.

With cursor movement/selection related APIs removed, there's little reason to
use xpcom because what's left of nsICTLService can be folded into
nsFontMetricsXft.cpp (unless we want to offer an alternative to Pango, e.g. ICU
or built-in pango-lite).
Attachment #138471 - Attachment is obsolete: true
With reference to Hebrew and Arabic, if Pango's shaping is at least as good as
our own (which seems likely, but I don't know) we could use it, setting a hint
on the rendering context as we do in Windows so that our own routines don't get
called.
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsRenderingContextWin.cpp#629
pango with gtk1.2 will never work, so don't bother using ifdefs.
The latest trunk code definitely makes it better. But can you guys see this page
correctly? http://www.abhyuday.org/xprajna/html/jagat_prakash.php

i am attaching a file of what I see in justified text.
This is how I see this page
Your screenshot looks odd. However, as far as I can see, my build renders the
text exactly the same way as Pango does in gedit (needless to say, the
*inter*-cluster spacing is different because it's justified and gedit doesn't
have a notion of justification). 

BTW, why does the page have the following CSS?

FONT-FAMILY: "Arial Unicode MS", Mangal, GH-Amrita;

Arial Unicode MS has 'nominal' glyphs for Devanagari so that it'll deprive
Mangal (a opentype font for Devanagari) of a chance to render Devanagari unless
you set 'always use my fonts' in firebird (and __uncheck__ 'allows documents to
use other fonts' in mozilla). Why don't you turn on 'always use my fonts' in
firebird and set fonts for Devanagari to 'Raghindi' (or Mangal if you can use
it) to see if there's any difference? 

The above CSS snippet should be 

Font-Family: Mangal, Raghindi,  GH-Amrita, Arial MS Unicode, serif;

re comment #50, did you apply the latest patch (attachment 138670 [details] [diff] [review]) to the trunk
source and build with gtk2 + xft? 
Mozilla 1.6 with the patch in attachment 138471 [details] [diff] [review] applied is available at
http://i18nl10n.com/mozilla/mozilla-i686-pc-linux-gnu-gtk2-pango.tar.gz
It doesn't have an installer but just a gzipped tar ball (because I couldn't
build mozilla-installer in a tree separate from the source tree). I'm hoping to
move it to 'contrib' directory of ftp.mozilla.org. 

   
Couple of notes here:

The #ifdef GTK2 stuff is bordering on making a lot of this code unreadable.  I
want to find another way to do that or just say that it's time that we just say
that if you want to use Xft you need to use gtk2 as well.

The CTL interface to native pango seems like it doesn't need to be there.  If
we're going to use pango, and given my statement above, let's put it in the Xft
file instead of using the interface.  (Really, I want to have an
nsFontMetricsPango.cpp instead, but history is going to leave us with a
mis-named file.)
blizzard, I agree with you on both points. As for the second point, I already
mentioned it in comment #47 (the last paragraph). We can always pull it out of
nsFontMetricsXft.cpp if later we have another consumer (like printing code).

> I want to find another way to do that or just say that it's time that we 
> just say that if you want to use Xft you need to use gtk2 as well

As for the first point, do you have any idea as to how to avoid littering
nsFontMetricsXft.cpp with #ifdef's without going with the second option? One way
(not very attractive) would be to make a second copy of nsFontMetricsXft.* and
name them nsFontMetricsPango.* The first copy is for gtk1+xft without Pango and
the second is for gtk2+xft with Pango. Only one of them will be compiled
depending on the configuration option. This makes it hard to maintain the code
(we always have to do  twice when we fix things). Anyway, there's a precedent,
nsFontMetricsGtk.* and nsFontMetricsXlib.*.

I'm for the second option if others are not against it. There should be no
problem on Linux (but on other Unix, the situation may be different) That is, I
can't think of anyone installing gtk2 but not pango on Linux. How about
embedders and small devices? Then, I'm not sure. Their target markets may not
require pango.

re comment #41 on the problem with PS type 1 font. My suspicion was sorta
confirmed by Lin (linchear@rogers.com), the author of Pango module for Khmer
(not yet committed to the Pango cvs). After upgrading Xft to the cvs trunk, the
problem went away. BTW, he wrote to me that, as it should, my patch just works
for Khmer with Pango supporting Khmer. 
Just FYI, I made a 1.6 binary with my patch here applied and blizzard put it up
at ftp.mozilla.org. It's available at 

ftp://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.6/contrib/mozilla-i386-pc-linux-gnu-gtk2-pango.tar.gz

Please, refer to 
ftp://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.6/contrib/README-pango

for more information about the build.

While using it as my everyday browser, I found mozilla crashes when rendering
_some_ (not all) query results at bugzilla (the surest way is to query bugzilla
with layout-ctl as a component). With a debug build (identical to the optimized
build at ftp.mozilla.org), I got a stack trace, but the result is not very much
illuminating because it crashes where it seems to have nothing to do with Gfx
(and layout).  I'll try to figure out what's wrong.
I'd rather not fork the file again.  I suspect that there are people out there
using gtk1.2 and xft together (this is how the Mozilla in Red Hat 8 shipped, for
example) but I think that aside from that one instance it's time to assume that
if you've got a working xft setup you're going to have a working gtk2 setup as well.

That being said, let's make the plunge.

As for printing, let's talk about that for a bit.  I would love to make this
file the central location for turning a string into a glyph string that both
printing and on-screen rendering can use.  From what I've seen you've had a
little more experience understanding what needs to happen on the postscript side
(don't you need the characters as well as the fonts?)  Do you have an opinions
on how to do this all at once?
The above and below base marks (abvm and blwm) are not rendered correctly.
Mozilla seems to ignore abvm and blwm anchor points in pango+gtk2+xft+ctl enabled
build of mozilla.
I never finished submitting the bug, You'll need my font at
http://www.ece.uic.edu/~msripada/delme/Devanagari.ttf
I confirmed the problem reported in comment #61 while talking to Blade_X over
IRC. I mentioned the problem in comment #18, but forgot about it partly because
with Raghindi font, it's not so easy to tell the difference. Blade_X's font
(Devanagari : his font name is the same as the name of the script) makes it easy
to tell the difference. I guess we can move forward this with the understanding
that GPOS issue has to be fixed. 

re: comment #59.
1. Re: making gtk2 a requirement for Xft build : I'm willing to do it although I
have a little reservation about shutting down the door for embedders for small
devices who may want to stay with gtk1 + xft. BTW, SuSE has been including Gtk1
+ xft builds for as long as RedHat has been, but I guess they have no trouble
switching to Gtk2 + Xft. I noticed that Fedora core1 has gtk2 + xft (1.4)

2. Re: printing. Phew, that's tough. I wish I had a far better understanding as
to how PS printing works on Mozilla. A lot of things appear to get tangled and I
 haven't figured out a way to sort them out without the code duplication. Just
as much as you do,  I really like to make 'the file' the center of Unicode
strings to glyphs conversion for the screen rendering and printing.  
I don't know what's required specifically, either.  However, if we have a list
of glyphs, files for those glyphs and possibly the original string as well, that
might be enough.

Adding the local PS dude for some ideas on how we can use pango to choose our
fonts and the glyphs for postscript rendering.  What does postscript need?  And
in what form would that information be useful?
This probably doesn't answer all of your questions, but it's a start. I'd be
happy to try to answer any followup questions you have.

The basic PS printing module only knows about the 13 standard postscript fonts.
All of the font-metric (character width) information is built into the PS
module; there's some code to load FM data for additional fonts, but it's broken
(bug 203794). Each font is assumed to be available to the printer; there's no
provision to embed fonts in the postscript output. The freetype extension gives
access to additional fonts, and it will embed fonts in the postscript if needed.

For string width calculations, mozilla uses the builtin font-metric tables for
characters below 0x00ff. For other characters it just uses one of two hardcoded
width values (cryllic and everything else).

8-bit strings are printed by selecting the current western font (e.g.
Times-Roman) with a latin1 encoding vector, copying the raw text of the string
into the postscript, and issuing a PS "show" command to draw the text on the page.

To print 16-bit strings, mozilla uses a procedure called "unicodeshow".
unicodeshow receives the string as a series of 16-bit UCS-2 character codes; one
character at a time it tries to find an appropriate glyph and print it. Western
text is normally handled by looking up the character code in a hash table which
maps codes to glyph names; if the code is in the table, unicodeshow will try to
print the specified glyph from the current western font.

Unicodeshow can also look for the character code in a user-specified CID font,
and an additional user-specified font using a user-specified code-to-glyph map.
Both of these features require the user to set some hidden prefs and to add
information to the PS output before printing. I don't know how many people go to
the trouble of setting this up.

In short, mozilla only really knows about the current font for latin1
characters. It doesn't really look at the strings being printed (other than the
string width calculations); it just copies them into the postscript. Mozilla
doesn't know anything about the user-supplied fonts other than their names, and
it has no logic to switch to special fonts for math symbols, or to embed a font
containing the Euro character in the document when needed.
This should be discussed in bug 190031. I should have referred to it when I 
wrote the last paragraph of comment #62. I'm now adding my response there (bug 
190031).

I removed intl/ctl2 and included  'string to glyph' converter in
nsFontMetricsXft.cpp. It's not yet ready for review. I have to do some
clean-up. configure.in was changed so that xft + gtk1 is not supported any
more.
Attachment #138670 - Attachment is obsolete: true
Let me know when you're ready for me to look at this.  I'm chomping at the bit.
Attached patch update (obsolete) — Splinter Review
I'd not say it's easy to read this patch. Anyway, blizzard, can you take a
look? After a couple of iterations, it'll get clearer.
Attachment #141453 - Attachment is obsolete: true
Blocks: 233462
I'm being ambitious here. If blizzard can review soon and we can iterate a
couple of times, we might make it in 1.7final. If not, we'll do it in 1.8alpha.
Target Milestone: --- → mozilla1.7final
Blocks: 197649
Hello,

While the Pango-enabled Mozilla 1.6 binary in contrib works wonderfully for
complex text such as Devanagari, it tends to mess up right-to-left text such as
Arabic. Please see the accompanying attachment arabic-rendering.png for an
example. The upper screenshot is that of regular Mozilla 1.6 rendering the BBC
home page in Arabic, while the lower one shows the Pango-enabled Mozilla 1.6.
As can be seen, the Pango-enabled browser shows some of the Arabic text in a
left-to-right order, and some other lines in the correct right-to-left order.

Just a heads-up; once these patches hit mainstream, I suppose it wouldn't be a
good thing if they broke RTL support.
We'll have to fix that, to be sure.  (dbaron points out that there might be some
good information in bug 83958 about this.)

The BIDI code probably reorders the characters before passing it down to this
level.  Is pango re-reversing them again?  It's very possible.  dbaron claims
that there's a flag to indicate if the OS supports setting the directionality of
the text, but I haven't found it in the code yet.  This also gets tricky because
the way this patch is right now (if I'm reading it correctly,) we might end up
using pango for some of the text and not for other parts of the text.
Oh, while I'm thinking of it, it would be helpful if you could attach a very
small html test case that we could use that mis-renders with pango.
(In reply to comment #71)
> We'll have to fix that, to be sure.  ...good information in bug 83958 about this.)

That's for sure. I'll take a look at it. 


> The BIDI code probably reorders the characters before passing it down to this
> level.  Is pango re-reversing them again?  It's very possible.  

Yes, the BIDI code does that, which is why I tried to exclude Arabic and Hebrew
characters from Pango processing. However, my simplistic test (IsPreShaped() ?))
may miss some cases or mess up some non-trivial BIDI cases.

> dbaron claims that there's a flag to indicate if the OS supports setting 
> the directionality of the text, but I haven't found it in the code yet.  

  There is. See comment #48. However, I didn't touch that part because my
intention was to exclude Arabic/Hebrew from the Pango processing in this bug.
Later we can turn on that flag (as we do on Windows) when with Pango, we can do
the job as well as layout does.

> This also gets tricky because
> the way this patch is right now (if I'm reading it correctly,) we might end up
> using pango for some of the text and not for other parts of the text.

My intent is to avoid using Pango whenever it makes sense (e.g. ASCII code
path). Anyway, that could mess up BIDI cases. BTW, as you wrote, it'd be nice to
have a simple test to demonstrate the problem. I appreciated comment #70, but
without a simple test case clearly indicating which part gets rendered wrong, I
can't do much. 
Yeah, when I looked at the patch IsPreShaped() seemed _really_ hackish to me. 
Looking at the the code, it looks like we can use that hint for arabic since we
know that we're going to shape it.  That would solve this particular problem.

Do we need to add a couple more kinds of hints to indicate that we are going to
not use pango for the ASCII case?  Is that even a problem?  My immediate
response is that it isn't.

What about for hebrew for ltr and the various indian languages that require
shaping?  All of those are going to be outside of the ascii code path.  So the
more that I think about it the more that I think that adding those hints is just
the right way to go.
(In reply to comment #74)
> Yeah, when I looked at the patch IsPreShaped() seemed _really_ hackish to me. 
> Looking at the the code, it looks like we can use that hint for arabic since we
> know that we're going to shape it.  

 IsPreShaped() already checks for Arabic as well as for Hebrew. That's why I
need a simplified test case to see what's wrong with the test in IsPreShaped().


> Do we need to add a couple more kinds of hints to indicate that we are going to
> not use pango for the ASCII case?  Is that even a problem?  My immediate
> response is that it isn't.

 The current patch does use Pango if it's PRUnichar* we're dealing with no
matter what. It doesn't use Pango if we deal with 'char *' (which is supposed to
be a pure ASCII string.)  I'm not sure what your 'it' or 'that' is. Are you
worried that Pango is used when PRUnichar* is purely made up of ASCII characters ?  


(In reply to comment #73)
...
> 
>   There is. See comment #48. However, I didn't touch that part because my
> intention was to exclude Arabic/Hebrew from the Pango processing in this bug.
> Later we can turn on that flag (as we do on Windows) when with Pango, we can do
> the job as well as layout does.
> 
I don't understand the Linux-related technicalities here (I know the Windows
world much better), but perhaps you guys can clarify what I am looking for. (If
I should be looking at another bug, please tell me which one.) My requirement,
or at least hope, is that Mozilla should display the fully pointed and accented
text of the Hebrew Bible, on Linux (and all other platforms) as well as it now
can on Windows.

1) Is there currently any way to do this, with the standard Mozilla releases on
Linux? I have been told there is not, but I can't confirm this.

2) Could such support be provided with Pango? Would this in principle work with
an OpenType font like SBL Hebrew
(http://www.sbl-site.org/Resources/default.aspx) or Ezra SIL
(http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=EzraSIL_Home -
which by the way does not have Graphite tables)?

3) Is the best hope to see this support in Mozilla for this bug to be fixed and
then the restriction mentioned above (from comment #73) to be lifted?
I have enabled ctl2 on Mdk cooker Mozilla 1.6 package (using patch 0.5) and I'm
getting random crashing when loading some pages (gnomedesktop.org or when loggin
on bugzilla.gnome.org) with the following stacktrace (sorry, it is not a full
debug build) :

 Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 24445)]
0x41d1d45a in nsFontXftUnicode::GetFamilyName() () from
/usr/lib/mozilla-1.6/components/libgfx_gtk.so
(gdb) bt
#0  0x41d1d45a in nsFontXftUnicode::GetFamilyName() () from
/usr/lib/mozilla-1.6/components/libgfx_gtk.so
#1  0x08a327a8 in ?? ()
#2  0x41d2f5ab in nsICharRepresentable::GetIID()::iid () from
/usr/lib/mozilla-1.6/components/libgfx_gtk.so
jshin: do you have a version of this patch which works with the current
trunk?  I applied the latest version attached with some modifications,
but it crashes when bringing up a browser window:

#0  0x0775e4be in pango_font_description_free () from /usr/lib/libpango-1.0.so.0
#1  0x07763a03 in pango_context_set_font_description () from
/usr/lib/libpango-1.0.so.0
#2  0x012dec3d in ConvertUCS4ToGlyphs (aFamilyName=0x8b9b638 "Nimbus Roman No9
L", aFontSize=0, aSrc=0xbfe95114, 
    aSrcSpacing=0x0, aSrcLen=1, aP2T=0, aDest=@0xbfe920f0, aDestSpacing=0x0,
aDestLen=@0xbfe920ec)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:3121
#3  0x012dc0fb in nsFontXftUnicode::GetTextExtents32 (this=0x8c56338,
aString=0xbfe95114, aLen=1, aIsASCII=0, 
    aGlyphInfo=@0xbfe95010) at
/home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:2151
#4  0x012dbbfe in nsFontXft::GetWidth32 (this=0x8c56338, aString=0xbfe95114, aLen=1)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:2020
#5  0x012dacc9 in nsFontMetricsXft::GetWidthCallback (this=0x8b9abe0,
aString=0xbfe95114, aLen=1, aFont=0x8c56338, 
    aData=0xbfe98050) at
/home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:1709
#6  0x012da33b in nsFontMetricsXft::EnumerateXftGlyphs (this=0x8b9abe0,
aString=0xbfe95114, aLen=1, aCallback=
      {__pfn = 0x12dabba <nsFontMetricsXft::GetWidthCallback(unsigned int
const*, unsigned int, nsFontXft*, void*)>, __delta = 0},
aCallbackData=0xbfe98050) at
/home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:1529
#7  0x012da431 in nsFontMetricsXft::EnumerateGlyphs (this=0x8b9abe0,
aString=0xbfe9810e, aLen=1, aCallback=
      {__pfn = 0x12dabba <nsFontMetricsXft::GetWidthCallback(unsigned int
const*, unsigned int, nsFontXft*, void*)>, __delta = 0},
aCallbackData=0xbfe98050) at
/home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:1550
#8  0x012d9964 in nsFontMetricsXft::RawGetWidth (this=0x8b9abe0,
aString=0xbfe9810e, aLength=1)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:1311
#9  0x012d875b in nsFontMetricsXft::CacheFontMetrics (this=0x8b9abe0)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:949
#10 0x012d854b in nsFontMetricsXft::RealizeFont (this=0x8b9abe0)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:883
#11 0x012d7bf3 in nsFontMetricsXft::Init (this=0x8b9abe0, aFont=@0x8b99da4,
aLangGroup=0x8b96e70, aContext=0x8b86a18)
    at /home/tor/src/moz-pango/mozilla/gfx/src/gtk/nsFontMetricsXft.cpp:539
...
any updates on this bug?  it would be a good one to get for 1.8 if we can.
Flags: blocking1.8a+
Yeah, I'm making some progress offline.
Attached patch patch update (obsolete) — Splinter Review
While blizzard is working on his patch (if i understand correctly, his patch is
actually taking the approach suggested in bug 214715 and is quite a radical
rewrite), here's my patch updated to the trunk for those who want to experiment
with it.
Flags: blocking1.8a1+ → blocking1.8a1-
Blocks: 245157
Jungshik, I use Gentoo and depend on your patch to get mozilla compiled so that
Devanagari shows up properly. I've filed a bug with Gentoo that the maintainers
say that they won't fix. 

Could you update your patch so that it applies cleanly to moz 1.7 ? appending
the build output below and am attaching the patch output as well.
Attached file failed patch attempt
I'll upload my 1.7 build (patched) soon to 'contributed' directory of
ftp.mozilla.org.
Comment on attachment 151878 [details]
failed patch attempt

if you're uploading a text file, please say so.
Attachment #151878 - Attachment mime type: application/octet-stream → text/plain
Attached patch patch for 1.7 branch (obsolete) — Splinter Review
This is for those who want to build their own 1.7branch builds with Pango patch
applied.
+IsPreShaped(const FcChar32 *aString, PRUint32 aLen)
+{
+    while (--aLen)

I often fear this kind of operations with unsigned integers. It will cause an
infinite loop if started with zero.
(In reply to comment #87)
> +IsPreShaped(const FcChar32 *aString, PRUint32 aLen)
> +{
> +    while (--aLen)
> 
> I often fear this kind of operations with unsigned integers. It will cause an
> infinite loop if started with zero.

  I haven't come across that problem in my build (that I use every day), but to
be safe, I'll replace PRUint32 with PRInt32 and |--aLen| with |--aLen >= 0| wtih
casting at call-sites. 

re: comment #70:

> While the Pango-enabled Mozilla 1.6 binary in contrib works wonderfully for
> complex text such as Devanagari, it tends to mess up right-to-left text such as
> Arabic.

  I can't find any difference between Mozilla-Gtk2 (1.7) and
Mozilla-Gtk2+Pango(1.7) when I tried BBC Arabic pages a couple of days ago. As
blizzard wrote, can you write a small test case that reveals the problem? 


>   I can't find any difference between Mozilla-Gtk2 (1.7) and
> Mozilla-Gtk2+Pango(1.7) when I tried BBC Arabic pages a couple of days ago. As
> blizzard wrote, can you write a small test case that reveals the problem? 

Mozilla-Gtk2+Pango 1.6 and 1.7 (two builds, one with --enable-ctl and one with 
--disable-ctl) render it the same way for me -- Arabic text is rendered in the 
wrong direction. It's the same for Urdu (which uses a similar script).

I'll see if I can narrow things down a bit when I have a little time. (I don't 
know Arabic, but I can read the script.)
OK, I've attached a simple test case that doesn't render well with the Pango
patch in either the contrib 1.6 binary with gtk2+xft+pango or with my build of
the 1.7 source with the latest patch (using gtk2+xft).
^ In the attached screenshots, the upper window is that of a patched Mozilla
1.7 and the lower one is that of stock Firefox 0.9 rendering the same page.
(The lower one is correct; note how the glyphs are laid out left-to-right
rather than right-to-left in the upper window.)
Attachment #144979 - Attachment is obsolete: true
(In reply to comment #91)
> (The lower one is correct; note how the glyphs are laid out left-to-right
> rather than right-to-left in the upper window.)

I just verified with a build of stock 1.7; its rendering looks like the lower 
window.
Thanks a lot for the test case. I'll try to see what's wrong. All those
characters belong to the range I'm filtering out (with IsPreShaped()) so that
they're supposed to be treated the old way (instead of being handled by Pango). 
+PRBool
+IsPreShaped(const FcChar32 *aString, PRUint32 aLen)
+{
+    while (--aLen)
+    {
+        if (0x0590 <= aString[aLen] && aString[aLen] <= 0x6ff || 
+            0xfd1b <= aString[aLen] && aString[aLen] <= 0xfdff) 
+            return PR_TRUE;
+    }
+      
+  return PR_FALSE;
 }

I'm not sure where the values in the second line of the condition come from. I
suspect you meant to write 0xfb1d and 0xfbff. You also probably want to test for
the range 0xfe70-0xfefc.
(In reply to comment #94)
> +PRBool
> +IsPreShaped(const FcChar32 *aString, PRUint32 aLen)

> +        if (0x0590 <= aString[aLen] && aString[aLen] <= 0x6ff || 
> +            0xfd1b <= aString[aLen] && aString[aLen] <= 0xfdff) 

> 
> I'm not sure where the values in the second line of the condition come from. I
> suspect you meant to write 0xfb1d and 0xfbff. You also probably want to 
> test for the range 0xfe70-0xfefc.

 Oh my gosh. I have no idea how I could mess that up. (actually, I guess I know.
I meant '0xfb1d' but exchanged 'b' and 'd' when typing it.) I meant to cover all
Hebrew and Arabic presentation form blocks in U+Fxxx (I still don't understand
how come I missed Arabic presentation form B). I'll fix that and update the
patch. Thanks a lot for catching it. BTW, you meant 0xfb1d - 0xfdff (instead of
0xfbff), right? 
(In reply to comment #95)
>  Oh my gosh. I have no idea how I could mess that up. (actually, I guess I know.
> I meant '0xfb1d' but exchanged 'b' and 'd' when typing it.) I meant to cover all
> Hebrew and Arabic presentation form blocks in U+Fxxx (I still don't understand
> how come I missed Arabic presentation form B). I'll fix that and update the
> patch. Thanks a lot for catching it. BTW, you meant 0xfb1d - 0xfdff (instead of
> 0xfbff), right? 

Are Hebrew and Arabic the only RTL scripts that come to Pango pre-shaped?
(Checking for Unicode codepoint range sounds like a maintenance nightmare in the
long run -- what if the ranges that come preshaped change? And are we sure these
are the only ranges that correspond to RTL scripts?)

Is there no way that's more maintainable (say a flag that indicates that Gecko
already knows this is an RTL script and has reordered the glyphs prior to
sending down to Pango)?

Incidentally, the BBC Arabic page uses windows-1256 encoding. Would this range
checking work for non-Unicode encodings, or is everything converted to UCS-2 (or
UCS-4) internally?
(In reply to comment #96)

> Are Hebrew and Arabic the only RTL scripts that come to Pango pre-shaped?
> (Checking for Unicode codepoint range sounds like a maintenance nightmare in the
> long run -- what if the ranges that come preshaped change? And are we sure these
> are the only ranges that correspond to RTL scripts?)


As I understand it, 0590-08FF and 00010800-00010FFF, plus the preshaped
Presentation Forms  which you have already discovered, are allocated or
roadmapped for RTL scripts. See http://www.unicode.org/roadmaps/. Only some of
these scripts have complex joining behaviour, and in principle all characters
(even the Presentation Forms) can take combining marks.

Microsoft ran into problems with Uniscribe in that at first it was used only for
certain scripts. Later they realised that its shaping capabilities were needed
even for Latin script. So the latest version of Uniscribe (comes with Office
2003) processes all characters, not just official complex script ones. Let's not
make the same mistake here. So my advice would be not to filter out anything as
preshaped but to let Pango decide which characters it needs to work on. As the
Presentation Forms are in fact deprecated, there is little point in treating
them sepcially just for performance reasons.
> 
> Is there no way that's more maintainable (say a flag that indicates that Gecko
> already knows this is an RTL script and has reordered the glyphs prior to
> sending down to Pango)?

Reordering and complex script shaping should be treated as independent
processes. There are RTL scripts which are not complex, and there are LTR
scripts which are complex.
> (In reply to comment #94)
> BTW, you meant 0xfb1d - 0xfdff (instead of 0xfbff), right? 

I didn't, but now that I understand better what you were trying to do, I agree
that 0xfdff is what you want.

(In reply to comment #96)

> Are Hebrew and Arabic the only RTL scripts that come to Pango pre-shaped?
> (Checking for Unicode codepoint range sounds like a maintenance nightmare in the
> long run -- what if the ranges that come preshaped change? And are we sure these
> are the only ranges that correspond to RTL scripts?)

I had the same concern when I filed bug 244036  :)

> Incidentally, the BBC Arabic page uses windows-1256 encoding. Would this range
> checking work for non-Unicode encodings, or is everything converted to UCS-2 (or
> UCS-4) internally?

Yes, to UTF-16 to be precise.(In reply to comment #95)
(In reply to comment #96)
> (In reply to comment #95)

> Are Hebrew and Arabic the only RTL scripts that come to Pango pre-shaped?
> (Checking for Unicode codepoint range sounds like a maintenance nightmare in the
> long run -- what if the ranges that come preshaped change? And are we sure 

This is not a long-term solution as my 'XXX' comment in the patch indicates. 
Eventually, I want to delegate as much as possible to Pango. 
Needless to say, Hebrew and Arabic are not the only RTL scripts, but they're the
only scripts that reach Gfx pre-shaped and reordered in the _current_ mozilla
partly because only for them, presentation-forms are encoded in Unicode.  

re comment #97
>  Let's not make the same mistake here. So my advice would be not to filter 
> out anything as preshaped but to let Pango decide which characters 
> it needs to work on.

  Gee, you apparently haven't read my 'XXX' comment right above |IsPreShaped| in
attachment 151928 [details] [diff] [review]. I'm very well aware of the need.  Please, note that this is
an interim solution only. blizzard has been working on a better solution. I'm
updating this patch and am going to release my build for Indian people to use
because I want to give them something to use right now instead of a year(?) 
from now. 
With the change per smontagu's comment, now it works for Arabic as well.
Attachment #151928 - Attachment is obsolete: true
This is the latest patch for the trunk.
Attachment #141894 - Attachment is obsolete: true
Attachment #148391 - Attachment is obsolete: true
The 1.7 patch is causing crash in epiphany 1.2.6 at startup (probably when
loading a page), which complains about "free".

This code gets the code that I just checked in building.  Pretty simple
changes, actually.  Should be easy to get into the tree.
Attachment #155008 - Flags: review?(jshin)
Comment on attachment 155008 [details] [diff] [review]
glue patch for pango code

I don't think this patch belongs here especially considering that you're
planning to add nsFontMetricsPango. Why don't you attach it to bug 214715? 
Besides, it seems like some unrelated patches crept in.
Attachment #155008 - Flags: review?(jshin)
I've been having a problem with rendering Devanagri script on a build with this
patch (215219.patch.1.7branch2). While certain web pages render properly (such
as the BBC Hindi page at 
http://bbc.co.uk/hindi ) , some others like Google ( http://www.google.co.in/hi
) or my colleague's blog
( http://blog.sarai.net/users/ravikant ) use the font that unpatched builds do
to render Hindi text ( example at
http://www.geocities.com/kream77/badrendering.html - this is how the last
webpage renders on my machine). The really frustrating thing is that it worked
just fine in Mozilla 1.6 . 

If the patch can't be updated to fix this inconsistency in rendering, how do I
find out which is the font that is being used to render hindi text by default ? 
(In reply to comment #106)
> I've been having a problem with rendering Devanagri script on a build with this
> patch (215219.patch.1.7branch2). While certain web pages render properly (such
> as the BBC Hindi page at 
> http://bbc.co.uk/hindi ) , some others like Google ( http://www.google.co.in/hi

  You may write to google to specify 'lang=hi' in their Hindi page. I wrote
several times to add 'lang=xx' to all of their localized pages, but they'd not
listen. 

> ) or my colleague's blog
> ( http://blog.sarai.net/users/ravikant ) use the font that unpatched builds do
> to render Hindi text ( example at


> The really frustrating thing is that it worked just fine in Mozilla 1.6 .

  What exactly do you mean by Mozilla 1.6? Do you mean 1.6 without any patch. (
in that case, is it Xft or X11core build? ) or 1.6 with the patch for this bug? 
 
> If the patch can't be updated to fix this inconsistency in rendering, how do I
> find out which is the font that is being used to render hindi text by default ? 

  It depends on fonts you have on your system and fonts specified in the web
page if there's no 'lang=hi' in the web page. Without 'lang="hi"' and with fonts
like 'Arial, Helvetica' specified in the web page, Mozilla will use the first
font that supports Devanagari character which may not be suitable for rendering
text written in Devanagari because it doesn't have any opentype table for
Devanagari. 

  If you have pango 1.5.1(?) or later, you may want to build with
'--enable-pango' configuration option. See bug 214715.
Re: comment # 107
> What exactly do you mean by Mozilla 1.6? Do you mean 1.6 without any patch. (
in that case, is it Xft or X11core build? ) or 1.6 with the patch for this bug? 

1.6 with the patch for this bug. 

> Without 'lang="hi"' and with fonts like 'Arial, Helvetica' specified in the
web page, Mozilla will use the first font that supports Devanagari character 

Aha! how do I find out which font this would be ? I'm desperate enough to try
renaming it and seeing if anything bad happens or symlinking it to the proper
Devanagari font.

The webpage http://www.geocities.com/kream77/badrendering.html has been updated
with examples of the way that different browsers render it - Mozilla, including
Jungshik Shin's contributed build, Konqueror (perfectly) and Opera (imperfectly,
again) and in a few hours i'll try and have an example of a patched Mozilla 1.6
rendering it properly.
re: comment # 106

>  If you have pango 1.5.1(?) or later, you may want to build with
'--enable-pango' configuration option.

If I upgrade to >=pango-1.5.1, and configure mozilla with --enable-pango, do i
still need to apply this patch ?
No, you shouldn't need this patch.
I'm still trying to track down the bug... now even a patched version of Mozilla
1.6 does not render properly. I did a fresh install and installed mozilla
*without* adding any Hindi fonts but obviously there's some builtin font which
has some **** Devanagari characters that is installed along with xorg-x11.
Wish there was some way to find out which font it is. 

It's obviously now a problem with my dependencies - because indic characters in
1.6 used to work earlier. Here are it's dependencies, along with the package
versions they belong to. Can someone tell me what I should downgrade ? Also, is
this thing finally fixed in Moz 1.8 ?

ldd /usr/lib/mozilla/mozilla-bin 
        linux-gate.so.1 =>  (0xffffe000)
        libmozjs.so => /usr/lib/mozilla/libmozjs.so (0x40021000)
net-www/mozilla-1.7.3
        libplds4.so => /usr/lib/mozilla/libplds4.so (0x400a1000)
net-www/mozilla-1.7.3
        libplc4.so => /usr/lib/mozilla/libplc4.so (0x400a4000) net-www/mozilla-1.7.3
        libnspr4.so => /usr/lib/mozilla/libnspr4.so (0x400a9000)
net-www/mozilla-1.7.3
        libpthread.so.0 => /lib/libpthread.so.0 (0x400dd000)
sys-libs/glibc-2.3.3.20040420
        libdl.so.2 => /lib/libdl.so.2 (0x400ee000) sys-libs/glibc-2.3.3.20040420
        libgtk-x11-2.0.so.0 => /usr/lib/libgtk-x11-2.0.so.0 (0x400f2000)
x11-libs/gtk+-2.4.9
        libgdk-x11-2.0.so.0 => /usr/lib/libgdk-x11-2.0.so.0 (0x403bc000)
x11-libs/gtk+-2.4.9
        libatk-1.0.so.0 => /usr/lib/libatk-1.0.so.0 (0x4042c000) dev-libs/atk-1.6.1
        libgdk_pixbuf-2.0.so.0 => /usr/lib/libgdk_pixbuf-2.0.so.0 (0x40446000)
x11-libs/gtk+-2.4.9
        libpangoxft-1.0.so.0 => /usr/lib/libpangoxft-1.0.so.0 (0x4045c000)
x11-libs/pango-1.6.0
        libpangox-1.0.so.0 => /usr/lib/libpangox-1.0.so.0 (0x40461000)
x11-libs/pango-1.6.0
        libpango-1.0.so.0 => /usr/lib/libpango-1.0.so.0 (0x4046d000)
x11-libs/pango-1.6.0
        libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x404a4000)
dev-libs/glib-2.4.5
        libgmodule-2.0.so.0 => /usr/lib/libgmodule-2.0.so.0 (0x404dc000)
dev-libs/glib-2.4.5
        libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0x404e0000)
dev-libs/glib-2.4.5
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x4055d000)
x11-base/xorg-x11-6.7.0
        libm.so.6 => /lib/libm.so.6 (0x40624000) sys-libs/glibc-2.3.3.20040420
        libstdc++.so.5 =>
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.4/libstdc++.so.5 (0x40647000)
sys-devel/gcc-3.3.4
        libgcc_s.so.1 => /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.4/libgcc_s.so.1
(0x40710000) sys-devel/gcc-3.3.4
        libc.so.6 => /lib/libc.so.6 (0x40719000) sys-libs/glibc-2.3.3.20040420
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
sys-libs/glibc-2.3.3.20040420
        libXrandr.so.2 => /usr/X11R6/lib/libXrandr.so.2 (0x4082a000)
x11-base/xorg-x11-6.7.0
        libXi.so.6 => /usr/X11R6/lib/libXi.so.6 (0x4082e000) x11-base/xorg-x11-6.7.0
        libXinerama.so.1 => /usr/X11R6/lib/libXinerama.so.1 (0x40837000)
x11-base/xorg-x11-6.7.0
        libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x4083a000)
x11-base/xorg-x11-6.7.0
        libXft.so.2 => /usr/X11R6/lib/libXft.so.2 (0x40848000)
x11-base/xorg-x11-6.7.0
        libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x4085a000)
media-libs/freetype-2.1.5
        libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0x408c4000)
media-libs/fontconfig-2.2.2
        libXcursor.so.1 => /usr/X11R6/lib/libXcursor.so.1 (0x408eb000)
x11-base/xorg-x11-6.7.0
        libXrender.so.1 => /usr/X11R6/lib/libXrender.so.1 (0x408f5000)
x11-base/xorg-x11-6.7.0
        libpangoft2-1.0.so.0 => /usr/lib/libpangoft2-1.0.so.0 (0x408fd000)
x11-libs/pango-1.6.0
        libexpat.so.0 => /usr/lib/libexpat.so.0 (0x40924000) dev-libs/expat-1.95.7
        libz.so.1 => /lib/libz.so.1 (0x40943000) sys-libs/zlib-1.2.1
Here's Mozilla's latest nightly incorrectly rendering Hindi text on my system.
Blocks: 270012
Blocks: 60546
For complex script users, I updated my patch for suite 1.7/aviary 1.0 branches.
This patch also includes the patch for bug 234182 (ported to branches). 
Btw, with this patch applied, I can't reproduce the problem reported in
attachment 152059 [details]. The rendering of Arabic with this patch is identical to that
without this patch.
Is there a patched build for this? I can't get the patch to apply cleanly to
mozilla-1.7.11. Is anyone interested in the .out file of the patch reject? 
We don't need this in the current code base.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Component: Layout: CTL → Layout: Text
QA Contact: arthit → layout.fonts-and-text
comment 116 is a copy of comment 3, with spam added
comment 117 is a copy of part of comment 71, with spam added
comment 118 is a copy of comment 82, with spam added
Flags: needinfo?(askaboutwp)
(In reply to David Baron [:dbaron] ⌚️UTC+13 (busy, returning 8 February) from comment #119)
> comment 116 is a copy of comment 3, with spam added
> comment 117 is a copy of part of comment 71, with spam added
> comment 118 is a copy of comment 82, with spam added

This bug was WONTFIXED in 2008, and since then (with a sharp increase in the past year) practically nothing but spam. Leave it at that, or set restricted access?
Flags: needinfo?(dbaron)
The spam would probably just move elsewhere, but maybe dolske can set it.
Flags: needinfo?(dbaron) → needinfo?(dolske)
Restricting per comment 134.
Flags: needinfo?(dolske)
Restrict Comments: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: