Bug 157967 - Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF
Status: NEW
Whiteboard: [line-breaking]
Keywords: intl
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: P3 normal with 39 votes
Target Milestone: Future
Assigned To: Nobody; OK to take it and work on it
Depends on: 168884
Blocks: atsui 188294 uniscribe 288439 378271

Reported: 2002-07-17 13:03 PDT by Simon Fraser
Modified: 2009-08-23 19:32 PDT

Description Simon Fraser 2002-07-17 13:03:12 PDT
Gecko's text rendering code currently makes some very basic assumptions about
the way that text is flowed, and individual lines of text are rendered. For
example, it has some very basic linebreaking rules that seem to be hard-coded
for Western languages, and assumes that every Unicode character can be
represented by a single rendered glyph.

These assumptions make it impossible to make use of advanced typography features
available in the text rendering engines on some platforms (e.g. Mac OS X -- see
bug 105800). For instance, it's currently impossible to have text rendered with
ligatures, because this breaks selection. This is a serious issue, because
displaying text with ligatures is essential for correctly rendering text in some
language systems, like Arabic.
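
To illustrate why ligatures and selection interact, here is a minimal sketch (not Gecko code) of a character-to-glyph cluster mapping. The Glyph record, the glyph names, and the advances are all hypothetical, and the sketch assumes left-to-right text with glyphs in logical order. A caret that falls inside a ligature is placed by dividing the ligature's advance evenly among its characters, which is one common approximation.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    name: str        # hypothetical glyph name, for illustration only
    char_start: int  # index of the first source character in this cluster
    char_count: int  # number of source characters the glyph covers
    advance: float   # horizontal advance, arbitrary units

def caret_x(glyphs, char_index):
    """Return the x offset of a caret placed before char_index (LTR only)."""
    x = 0.0
    for g in glyphs:
        end = g.char_start + g.char_count
        if char_index >= end:
            # Caret is at or past the end of this cluster: add the full advance.
            x += g.advance
        elif char_index > g.char_start:
            # Caret is inside a ligature: split its advance evenly.
            x += g.advance * (char_index - g.char_start) / g.char_count
            break
        else:
            break
    return x

# "office" shaped with an "ffi" ligature: 6 characters, but only 4 glyphs.
glyphs = [
    Glyph("o", 0, 1, 10.0),
    Glyph("f_f_i", 1, 3, 18.0),
    Glyph("c", 4, 1, 9.0),
    Glyph("e", 5, 1, 9.0),
]

print(caret_x(glyphs, 2))  # caret between the two f's, inside the ligature
```

Code that assumes one glyph per character has no place to record char_count, so it cannot answer this question for the ligature at all.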

In addition, the way layout currently does its own linebreaking, and then calls
the gfx APIs to draw text in small chunks, makes for a very inefficient use of
the lower-level platform text drawing APIs. For example, ATSUI on Mac expects
you to use the ATSUI calls to lay out entire paragraphs at once, which can then
be drawn in one go. This allows ATSUI to do all the text layout within that
paragraph, enabling it to do language-sensitive word and linebreaking,
right-to-left layout (potentially), and even to handle selection. Right now,
when rendering with ATSUI, we're driving a paragraph-layout-API with word-sized
chunks of text, which is very inefficient.

These problems indicate that the core text layout code needs an overhaul. It would
be nice to see this happen for Gecko 2.0.
Comment 1 Christopher Hoess (gone) 2002-07-17 17:04:55 PDT
How would this affect bugs like bug 56652 and bug 7969? Also, IIRC, Unicode has
a line-breaking algorithm (which I'm sure we don't follow)--how would putting
this type of layout in the hands of ATSUI and the like affect compliance, if we
did pull our line-breaking up to spec?
Comment 2 Greg K. 2002-10-21 12:57:35 PDT
Is there a Gecko 2 bug this could block?
Comment 3 Greg K. 2002-11-01 16:06:11 PST
Maybe a better summary for this bug would be, "Make Gecko interoperate better
with advanced typography systems such as ATSUI".
Comment 4 Greg K. 2002-11-02 11:50:15 PST
Would any other platforms benefit from such refinement? Do any of them have
anything like ATSUI?
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2002-11-02 11:59:20 PST
Not yet, but they likely will at some point.  In any case, the code that would
need changing is XP.
Comment 6 Greg K. 2002-11-18 22:52:00 PST
Anyone at Netscape agree with my suggestion in comment 3?
Comment 7 Simon Fraser 2002-11-18 23:02:26 PST
greg: sounds good to me
Comment 8 Greg K. 2002-11-19 10:59:42 PST
Changing summary.
Comment 9 Greg K. 2002-12-04 20:41:35 PST
This probably blocks bug 121540. Marking as such.

It seems to me as though bug 121540 blocks two basic things for Mac Mozilla:
good Unicode text display, and extended typography functions such as automatic
ligatures. The first is obviously more important than the latter.

Simon, is this assigned to the right person? Perhaps, in the context of
improving Unicode support on Mac, this deserves a better target than Future.
(Sure, things like shipping Chimera and transitioning Fizzilla to Mach are
higher priority, but this deserves some consideration following those two things.)
Comment 10 Ali Hussnain Shah 2003-04-06 16:48:04 PDT
This is not merely a decorative problem, nor is it only related to the Mac
world. At this stage there are a lot of HTML sites using ligatures in Arabic
script, where ligatures are indispensable to correct pronunciation, so such
sites are rendered properly only by MS Explorer. Additionally, this also has an
impact on writing emails in Arabic script with ligatures. If this problem is not
solved soon, a lot of users of Arabic-based scripts will prefer to use MS
products.
Comment 11 Boris Zbarsky [:bz] (still a bit busy) 2003-04-29 10:42:17 PDT
.
Comment 12 Jungshik Shin 2003-07-17 03:09:20 PDT
I agree that we need to move in the direction suggested here.

> very basic linebreaking rules that seem to be hard-coded for Western languages,

  Mozilla's line-breaking code is based on JIS X 4051 and works more or less for
Western scripts, CJ, K (I'm separating CJ and K here on purpose because they
behave differently when it comes to line breaking) and Thai. However, JIS X 4051
is not as extensive as the Unicode line breaking algorithm (UTR #14). We have to
move on to Unicode line breaking. (see bug 56652 comment #18 and bug 206152)
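
For readers unfamiliar with rule-based line breaking, the following is a toy sketch of the general idea: break after spaces, allow breaks next to CJ characters, and forbid breaks before or after certain punctuation. It is neither Mozilla's breaker nor the Unicode algorithm. The function names, the character ranges, and the two small punctuation sets are assumptions chosen for illustration, and Hangul and Thai are deliberately not handled.

```python
# Small illustrative subsets, not the full rule tables of any standard.
NO_BREAK_BEFORE = set("、。，．）」』！？")  # must not start a line
NO_BREAK_AFTER = set("（「『")               # must not end a line

def is_cj(ch):
    """Rough test for kana and common Han ideographs (assumed ranges)."""
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"

def break_opportunities(text):
    """Return every index i such that a line may break before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in NO_BREAK_BEFORE or prev in NO_BREAK_AFTER:
            continue  # punctuation rule forbids a break here
        if prev == " " and cur != " ":
            breaks.append(i)  # Western-style break after a space
        elif is_cj(prev) or is_cj(cur):
            breaks.append(i)  # a break is allowed next to a CJ character
    return breaks

print(break_opportunities("ab cd"))      # [3]
print(break_opportunities("日本「語」"))  # [1, 2]
```

A table-driven algorithm such as the Unicode one generalizes this by assigning every character a line-breaking class and looking up each adjacent pair of classes.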

Anyway, this bug is kinda 'meta-meta' bug in a sense.

 On Win32, we have a similar issue with Uniscribe and OpenType fonts for complex
script support (rendering, text selection, and caret movement). On Linux, Pango
is similar to ATSUI. So is Sun's STSF. Currently, on Win32 and Linux, we use
our own font-specific glyph-based solution to render complex scripts (bug
176290, bug 177877, bug 203052, bug 204286, etc), but using the system APIs is
certainly desirable. 
 
As for Arabic and Hebrew, they're dealt with differently from other complex
scripts. They're handled in nsTextFrame by IBM_BIDI code that maps Unicode
Arabic strings to strings of Arabic presentation forms. (Aside from BIDI,
_modern_ Hebrew - as opposed to Biblical Hebrew- doesn't seem to require
ligatures). This is not the best possible solution, but it works (doesn't it on
Mac OS X?)
Using OpenType fonts on Win32/Unix and ??? - oops, the name of a very advanced
font format for MacOS X is escaping me at the moment - on Mac OS X would be better.
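
As a rough illustration of what "mapping to presentation forms" means (this is not the IBM_BIDI code itself), the sketch below uses Python's standard unicodedata module to recover the contextual forms of an Arabic letter from the compatibility decompositions in the Unicode Character Database. The helper name presentation_forms is made up for this example. A real shaper also has to decide which form applies at each position from the joining behaviour of the neighbouring letters, which is not shown here.

```python
import unicodedata

def presentation_forms(base):
    """Map each form tag (isolated, final, initial, medial) to the
    Arabic Presentation Forms-B character that decomposes to `base`."""
    target = f"{ord(base):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B block
        # Decomposition looks like "<initial> 0628"; it is empty if unassigned.
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms

beh = presentation_forms("\u0628")  # ARABIC LETTER BEH
for tag in ("isolated", "final", "initial", "medial"):
    print(tag, f"U+{ord(beh[tag]):04X}")

# The lam-alef ligature is also encoded as a presentation form:
print(unicodedata.decomposition("\uFEFB"))  # <isolated> 0644 0627
```

The limitation of this approach is that only ligatures and forms with their own code points can be expressed, whereas font-driven shaping (OpenType, AAT, Graphite) can use any glyph the font provides.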

SILA (http://sila.mozdev.org) uses a third advanced font format (SIL's
Graphite), but it's only for Win32 at the moment. 
Comment 13 Simon Montagu :smontagu 2003-07-18 01:09:45 PDT
Your comments on Bidi are not quite correct. On Win32 with Arabic and/or Hebrew
support enabled, we bypass most of our own routines for reordering and shaping
and hand the raw text to the Windows APIs (but not OpenType yet). I'm not sure
if we ended up doing the same thing on Mac or not.
Comment 14 Jungshik Shin 2003-07-23 20:17:53 PDT
I didn't mean to misrepresent what Mozilla does with 'BIDI scripts' on
'BIDI-enabled platform(s)'. I knew Mozilla bypasses most of its own BIDI-related
processing on platform(s) with 'native BIDI support'(Win2k/XP and Middle East
version of  Win9x/ME [1]), but my description was too coarse-grained..

[1] I remember seeing a block of code by which 'native BIDI capability' is
'detected', but can't find it at the moment. So, I'm not sure whether Mozilla
actually makes a distinction between ME version of Win9x/ME and non-ME version
of Win9x/ME. I'm even less sure of Mac OS.  
Comment 15 Greg K. 2003-12-06 04:09:26 PST
> the name of a very advanced font format for MacOS X is escaping me at the moment

OS X uses OpenType and the old TrueType GX advanced font formats*. ATSUI
probably abstracts both of them via its API.

* Skia, Apple Chancery, and Hoefler Text are TTGX from the System 7+QDGX days
and still ship with OS X; Zapfino is another advanced bundled font, possibly
OpenType.
Comment 16 Jungshik Shin 2003-12-06 07:11:23 PST
Well, it's not OpenType but AAT (Apple Advanced Typography). AAT uses the 'mort'
TrueType table and it's more advanced than the OpenType GSUB/GPOS tables in a sense
because the 'mort' table is kinda 'self-contained' while GSUB/GPOS requires more
intervention from the upper layer. ATSUI doesn't take advantage of OpenType
GSUB/GPOS tables.
Comment 17 Greg K. 2004-06-03 06:51:04 PDT
See also bug 218887 about Uniscribe.
Comment 18 Greg K. 2004-11-18 01:03:41 PST
Blizzard is apparently working on Pango support:
http://www.0xdeadbeef.com/html/2004/11/

I wonder if he's run into any of the problems discussed here or on bug 121540.
Comment 19 Simon Fraser 2005-01-23 11:34:25 PST
Is this bug on the Gecko 2.0 radar?
Comment 20 Jungshik Shin 2005-01-23 15:57:40 PST
It seems not. Pango work is being done in bug 214715. See also bug 260663. 
Comment 21 Greg K. 2005-01-27 16:00:16 PST
I'd be interested in contributing towards a bug bounty for this so bug 121540
can be fixed. Is there any point in my doing so?
Comment 22 Greg K. 2005-05-17 02:28:08 PDT
Is this dependent on, duplicate of, or superseded by the Cairo initiative?
Comment 23 Jungshik Shin 2005-05-17 05:29:18 PDT
(In reply to comment #22)
> Is this dependent on, duplicate of, or superseded by the Cairo initiative?

I'm not sure what those who work on Cairo initiative have in mind. roc's blog
has some interesting information about Cairo and ATSUI and Pango. I'm not sure
what Cairo has to offer on Windows. (e.g. whether it uses/relies on Uniscribe)

Here's an excerpt from roc's blog at
http://weblogs.mozillazine.org/roc/archives/2005/05/cairo_progress.html
---------------------------
What I read about your work of porting to Cairo really makes me think that the
i18n aspect of it is a really delicate part, that you must not get wrong. The
MoFo product are used in many part of the world, in many languages, and have a
great support for that. Cairo must not mean a regression.
Make sure you synchronize with the i18n team as much as needed, it must not be
an afterthought.
-----------------
Do you know about Sila? Even if the project stopped, it included interesting change
to mozilla GFX to enable powerful mechanism to support really complex scripts :
http://sila.mozdev.org/

Posted by: jmdesp at May 10, 2005 02:34 AM

Owen Taylor is working on cairo, and he wrote Pango which is a really strong
i18n library, so i18n needs are being taken care of. My current cairo code uses
Pango so i18n is probably already at least as good as what's on the trunk.
---------------
Comment 24 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-05-17 14:21:12 PDT
cairo has a really nice approach to fonts. Basically, it stays out of the way.
It gives you a way to draw a string of glyphs from a platform font; you have to
specify the glyphs and their positions. (There is a "toy" API that takes a UTF8
string and tries to draw it, but we won't be using that.) So the whole
UTF8->glyphstring and glyph positioning process remains under our control; we
should probably move to Pango for this on Linux/Unix, but on Mac and Windows we
should probably use ATSUI and Uniscribe. We can port over our existing code for
use when those libraries aren't available.
Comment 25 Simon Fraser 2005-05-17 19:24:45 PDT
> you have to specify the glyphs and their positions

This makes it sound like Cairo internally will draw the glyphs one by one. If
it's using ATSUI to do that, it's going to be very very slow.
Comment 26 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-05-17 19:32:02 PDT
Ask tor, he's doing it.
Comment 27 Jungshik Shin 2005-05-17 19:38:07 PDT
(In reply to comment #25)
> > you have to specify the glyphs and their positions
> 
> This makes it sound like Cairo internally will draw the glyphs one by one. If
> it's using ATSUI to do that, it's going to be very very slow.

Indeed, it does. Avoiding that is the whole point of this bug. The same is more
or less true of Uniscribe and Pango. 

Comment 28 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-05-17 21:07:54 PDT
(In reply to comment #25)
> > you have to specify the glyphs and their positions
> 
> This makes it sound like Cairo internally will draw the glyphs one by one. If
> it's using ATSUI to do that, it's going to be very very slow.

From a quick look at the Apple docs, it looks like we should be using ATSUI to
convert text to a glyph string and a list of glyph positions (via
ATSUDirectGetLayoutDataArrayPtrFromTextLayout). Then we can pass that into
cairo. The cairo Quartz backend can call CGContextShowGlyphsWithAdvances to show
the glyphs. (Well, this assumes CGGlyph and ATSGlyph are the same thing...)
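
A sketch of the glue step this would need, written in Python for brevity rather than against the real Mac APIs. It assumes the layout data yields an absolute x origin per glyph plus the total run width, while the drawing call wants one advance per glyph. The function name and the sample numbers are hypothetical.

```python
def positions_to_advances(xs, total_width):
    """Convert absolute glyph x origins into per-glyph advances.

    xs          -- x origin of each glyph, in visual order
    total_width -- x coordinate where the run ends
    """
    if not xs:
        return []
    # Each glyph's advance is the distance to the next origin;
    # the last glyph's advance runs to the end of the run.
    ends = xs[1:] + [total_width]
    return [end - start for start, end in zip(xs, ends)]

print(positions_to_advances([0.0, 10.0, 28.0], 37.0))  # [10.0, 18.0, 9.0]
```

Whether the glyph IDs themselves can be passed through unchanged is exactly the CGGlyph versus ATSGlyph question raised above, and the sketch does not address it.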
Comment 29 Simon Fraser 2005-05-17 21:44:12 PDT
(In reply to comment #28)

> From a quick look at the Apple docs, it looks like we should be using ATSUI to
> convert text to a glyph string and a list of glyph positions (via
> ATSUDirectGetLayoutDataArrayPtrFromTextLayout).

The description of ATSUDirectGetLayoutDataArrayPtrFromTextLayout warns of some
performance implications, but my guess is that we can get arrays of glyph
positions from ATSUI and have them rendered by Core Graphics. It does seem
somewhat counterintuitive, since the fastest code path would just be to have
ATSUI render those glyph arrays itself, rather than going back through cairo.

> Then we can pass that into
> cairo. The cairo Quartz backend can call CGContextShowGlyphsWithAdvances to 
> show the glyphs.

CGContextShowGlyphsWithAdvances is only available in 10.3 or later, so using
this would change our base OS requirements.

> Well, this assumes CGGlyph and ATSGlyph are the same thing...

Both are unsigned shorts, but I can't find any docs that specify whether they
are the same.

Glyph rendering is one of the two high-cost areas of text rendering. The other
major performance area with ATSUI is text layout (i.e. glyph positioning). It
sounds like Cairo will know even less about text layout than gfx does, placing
more of the burden on Gecko, and making this bug all the more applicable. To get
good performance from ATSUI, we should be delegating more of the text layout
logic to it, and we need to be able to cache ATSUTextLayout and/or ATSUStyles.
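
A minimal sketch of the kind of caching meant here, assuming layouts can be keyed by the text plus a hashable style description. LayoutCache, fake_build, and the capacity are hypothetical names and values. In real code the cached value would be a platform layout object and eviction would have to release it.

```python
from collections import OrderedDict

class LayoutCache:
    """Least-recently-used cache of expensive layout objects."""

    def __init__(self, build, capacity=256):
        self._build = build          # callable: (text, style) -> layout
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, text, style):
        key = (text, style)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        layout = self._build(text, style)
        self._entries[key] = layout
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return layout

build_calls = []

def fake_build(text, style):
    """Stand-in for an expensive platform layout call."""
    build_calls.append((text, style))
    return f"layout({text!r}, {style!r})"

cache = LayoutCache(fake_build, capacity=2)
cache.get("Hello", "serif-12")
cache.get("Hello", "serif-12")   # served from the cache, no second build
print(len(build_calls))          # 1
```

The cache only pays off if layout requests are made for whole runs or paragraphs that repeat across reflows and repaints. Word-sized chunks drawn once each would mostly miss.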
Comment 30 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-05-17 22:30:09 PDT
(In reply to comment #29)
> Glyph rendering is one of the two high-cost areas of text rendering. The other
> major performance area with ATSUI is text layout (i.e. glyph positioning). It
> sounds like Cairo will know even less about text layout than gfx does, placing
> more of the burden on Gecko, and making this bug all the more applicable. To
> get good performance from ATSUI, we should be delegating more of the text
> layout logic to it, and we need to be able to cache ATSUTextLayout and/or
> ATSUStyles.

I totally agree we need to do this, but we still need to figure out the shape of
the interfaces.

See https://bugzilla.mozilla.org/show_bug.cgi?id=288439#c17

Let's go to the wiki for more discussion.
http://wiki.mozilla.org/index.php?title=Gecko2:NewTextAPI
I think you're probably right that we don't want to separate measurement from
drawing at our cross-platform-API level. Then your question can be reduced to
"how are we going to do high-performance text rendering with ATSUI and cairo on
the Mac?" and some Mac-specific hacks can be considered.
Comment 31 Tony Mechelynck [:tonymec] 2005-11-14 02:59:33 PST
I don't follow the reasoning above well enough to understand whether or not my bug 316249 is a manifestation of this one, or if it is something else. Can someone please check? Thanks.
Comment 32 Arthit Suriyawongkul 2006-05-07 14:24:47 PDT
related: Bug 336959 - Line Breaking with Pango/Uniscribe
