Last Comment Bug 614476 - gfxPangoFonts sets up grapheme cluster boundaries twice
: gfxPangoFonts sets up grapheme cluster boundaries twice
Status: RESOLVED FIXED
[fixed in bug 703100]
:
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: x86_64 Linux
: -- normal (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
:
Mentors:
Depends on: 553981 614468 703100
Blocks: 474068 597147
  Show dependency treegraph
 
Reported: 2010-11-23 20:28 PST by Karl Tomlinson (back Dec 13 :karlt)
Modified: 2012-01-11 12:39 PST (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description Karl Tomlinson (back Dec 13 :karlt) 2010-11-23 20:28:55 PST
Since bug 597147, gfxPlatform::SetupClusterBoundaries first does a rough
cluster setup, which is not yet really used.  It could be useful in the future
for:

  * font selection for the entire cluster
    (instead of for characters within the cluster)

  * choosing a point to split up long text runs

  * gfxHarfBuzzShaper (bug 569770), which would not perform the more thorough
    setup below.

Second, if Pango is used for shaping, pango_break is called.
pango_break first calls pango_default_break and then sometimes uses a
script-specific PangoEngineLang to improve the breaking of grapheme clusters.

Currently the second SetupClusterBoundaries in gfxPangoFonts.cpp does not
clear the first cluster setup, but merely removes breaks.  That's probably OK
because gfxPlatform::SetupClusterBoundaries is conservative about forming
clusters.

pango_default_break has a more complete cluster breaking algorithm than
gfxPlatform::SetupClusterBoundaries.

This is most complex for Hangul Jamo, Thai, and Lao scripts.

It also correctly handles U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK and
U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK, which could easily be added
to gfxPlatform::SetupClusterBoundaries in bug 614468.

Pango accidentally treats ZWNJ and ZWJ like GB_ControlCRLF which means that they
always have grapheme boundaries on both sides (and
gfxPlatform::SetupClusterBoundaries has the same bug).
http://git.gnome.org/browse/pango/tree/pango/break.c?id=1.28.3#n659

Pango has indic and arabic PangoEngineLangs.  The arabic lang engine only
alters PangoLogAttr::backspace_deletes_character, which we do not use.  The
indic lang engine does alter PangoLogAttr::is_cursor_position, which we do
use.

For some scripts, we do not need to bother with the more complete pango_break.
(This was originally pointed out by jfkthame in bug 569770 comment 32.)

However, given we are moving to gfxHarfBuzzShaper for simple scripts (and hopefully more scripts in the future), I doubt adding code to decide when to pango_break will be worthwhile as we often won't be calling pango_break anyway.
Comment 1 Karl Tomlinson (back Dec 13 :karlt) 2010-11-23 20:29:50 PST
With current default preferences, the HarfBuzz shaper would not be used for
Hangul, Thai, Lao, or indic scripts, so we need only use pango_break when using
Pango for shaping.

This is helpful because setting up another PangoAnalysis for pango_break would
be inconvenient from gfxPlatform::SetupClusterBoundaries.
Comment 2 Karl Tomlinson (back Dec 13 :karlt) 2010-12-06 12:28:56 PST
Bug 474068 may be one good reason to skip pango_break when unnecessary.
Comment 3 Karl Tomlinson (back Dec 13 :karlt) 2012-01-11 12:39:30 PST
Fixed by not calling pango_break in any situations.

Note You need to log in before you can comment on or make changes to this bug.