Closed Bug 728866 (opened 12 years ago, closed 12 years ago)

support unicode composition and decomposition during harfbuzz shaping


(Core :: Layout: Text and Fonts, defect)






(Reporter: jfkthame, Assigned: jfkthame)


(Depends on 1 open bug)



(3 files)

Harfbuzz has added support for (limited/specialized) normalization during shaping, but currently we are only taking partial advantage of this - we get reordering of combining marks into a standard order for display (e.g. see bug 662055), but we don't get canonical (de)composition of precomposed accented characters, as that depends on a couple of new callbacks that we haven't yet implemented.

The main advantage of implementing this is that some fonts - especially older ones - have a number of precomposed forms, but do not have good (or any) dynamic mark positioning code. Hence, if a page includes a decomposed letter+diacritic sequence, the mark positioning may be poor, whereas if we support composition in the shaping process, we can use the better-looking precomposed glyph.
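Here is a rough C sketch of that idea. It composes a base+mark pair only when the font actually has a glyph for the precomposed character, and otherwise keeps the decomposed sequence. This is not the harfbuzz or Gecko code. The table, `compose_pair`, `compose_for_font` and the `old_font_has_glyph` coverage check are all hypothetical. The sketch also ignores canonical combining classes and blocking, which real normalization has to respect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical font-coverage query (a real shaper would consult the cmap). */
typedef bool (*has_glyph_fn)(uint32_t cp);

/* Tiny illustrative canonical-composition table: base + mark -> precomposed. */
typedef struct { uint32_t a, b, ab; } pair_t;
static const pair_t kPairs[] = {
    { 0x0041, 0x0300, 0x00C0 },  /* A + COMBINING GRAVE -> A WITH GRAVE */
    { 0x0065, 0x0301, 0x00E9 },  /* e + COMBINING ACUTE -> e WITH ACUTE */
    { 0x006E, 0x0303, 0x00F1 },  /* n + COMBINING TILDE -> n WITH TILDE */
};

static bool compose_pair(uint32_t a, uint32_t b, uint32_t *ab) {
    for (size_t i = 0; i < sizeof kPairs / sizeof kPairs[0]; i++) {
        if (kPairs[i].a == a && kPairs[i].b == b) {
            *ab = kPairs[i].ab;
            return true;
        }
    }
    return false;
}

/* Hypothetical "old" font: has ASCII, A-grave and e-acute, but no n-tilde. */
bool old_font_has_glyph(uint32_t cp) {
    return cp < 0x80 || cp == 0x00C0 || cp == 0x00E9;
}

/* Composes adjacent pairs in place, but only when the font has a glyph for
 * the precomposed character; otherwise the decomposed sequence is kept and
 * left to the font's own mark positioning. Returns the new length. */
size_t compose_for_font(uint32_t *text, size_t len, has_glyph_fn has_glyph) {
    assert(text != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t composed;
        if (out > 0 && compose_pair(text[out - 1], text[i], &composed) &&
            has_glyph(composed)) {
            text[out - 1] = composed;  /* replace the base with the precomposed form */
        } else {
            text[out++] = text[i];     /* keep the character as-is */
        }
    }
    return out;
}
```

With this font, "A + grave, n + tilde, e + acute" becomes four characters: U+00C0, the still-decomposed n + U+0303, and U+00E9.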

To support this, harfbuzz needs two callbacks: one to implement a single step of decomposition, and one to compose two characters into one. Our normalization component already has the relevant data tables (though they are somewhat obsolete; see bug 728180), but unfortunately it does not currently provide an API at this level. We can fairly easily expose one.
For harfbuzz shaping purposes, I don't want the extra indirection of going through an nsIUnicodeNormalizer; the idea here is to expose the low-level data needed in the most direct way possible.
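The following is a minimal sketch of what such direct, low-level functions might look like, assuming a simple table of one-step canonical decompositions. The function names and the table are hypothetical, not the nsUnicodeNormalizer patch. In current HarfBuzz releases, the matching callback types are `hb_unicode_decompose_func_t` and `hb_unicode_compose_func_t`, registered with `hb_unicode_funcs_set_decompose_func` and `hb_unicode_funcs_set_compose_func`. Those callbacks also take an `hb_unicode_funcs_t *` and a `void *user_data`, so a thin wrapper would forward to functions like these. The 2012-era API may have differed in detail. As I understand HarfBuzz's convention, a singleton decomposition reports 0 as the second character, and the sketch follows that convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t codepoint_t;  /* plays the role of hb_codepoint_t */

/* One-step canonical decompositions, ab -> a + b; b == 0 marks a singleton.
 * Illustrative subset; real data would come from the Unicode Character Database. */
typedef struct { codepoint_t ab, a, b; } decomp_t;
static const decomp_t kDecomp[] = {
    { 0x00C0, 0x0041, 0x0300 },  /* A WITH GRAVE -> A + COMBINING GRAVE */
    { 0x00C5, 0x0041, 0x030A },  /* A WITH RING -> A + COMBINING RING ABOVE */
    { 0x00E9, 0x0065, 0x0301 },  /* e WITH ACUTE -> e + COMBINING ACUTE */
    { 0x212B, 0x00C5, 0x0000 },  /* ANGSTROM SIGN -> A WITH RING (singleton) */
};
#define DECOMP_COUNT (sizeof kDecomp / sizeof kDecomp[0])

/* "Decompose" callback: a single step, splitting ab into at most two characters. */
bool decompose_step(codepoint_t ab, codepoint_t *a, codepoint_t *b) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < DECOMP_COUNT; i++) {
        if (kDecomp[i].ab == ab) {
            *a = kDecomp[i].a;
            *b = kDecomp[i].b;
            return true;
        }
    }
    return false;
}

/* "Compose" callback: the inverse lookup for a pair. Singletons are skipped
 * because they never recompose; a full table would also skip composition
 * exclusions. */
bool compose_step(codepoint_t a, codepoint_t b, codepoint_t *ab) {
    assert(ab != NULL);
    for (size_t i = 0; i < DECOMP_COUNT; i++) {
        if (kDecomp[i].b != 0 && kDecomp[i].a == a && kDecomp[i].b == b) {
            *ab = kDecomp[i].ab;
            return true;
        }
    }
    return false;
}
```

Note that composing A + ring yields U+00C5, never U+212B, because the singleton entry is not used in the compose direction.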
Attachment #598847 - Flags: review?(smontagu)
Unfortunately, using standard canonical composition does not give us good rendering of Hebrew with old fonts (such as those from WinXP), because the Hebrew presentation forms (U+FBxx) with dagesh, etc., are excluded from composition. However, it appears that Uniscribe et al. use these glyphs if they are available, and I think we should do the same.

With this patch, I finally get the desired rendering of your testcase from bug 727736 using the WinXP versions of Arial and Courier, for example.
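Here is a rough sketch of such a fallback, assuming it runs only after ordinary canonical composition has failed. It looks up a small table of composition-excluded Hebrew presentation forms and accepts a result only if the font has a glyph for it. The table is a subset. `compose_hebrew_form` and the two font-coverage predicates are hypothetical, and this is not the code in the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical font-coverage query. */
typedef bool (*glyph_query_fn)(uint32_t cp);

/* Hebrew presentation forms that canonical composition never produces
 * (composition exclusions), but that older fonts may contain.
 * Illustrative subset only. */
typedef struct { uint32_t a, b, ab; } hebrew_pair_t;
static const hebrew_pair_t kHebrewForms[] = {
    { 0x05D1, 0x05BC, 0xFB31 },  /* bet + dagesh */
    { 0x05D5, 0x05BC, 0xFB35 },  /* vav + dagesh */
    { 0x05D5, 0x05B9, 0xFB4B },  /* vav + holam */
    { 0x05E9, 0x05C1, 0xFB2A },  /* shin + shin dot */
    { 0x05E9, 0x05C2, 0xFB2B },  /* shin + sin dot */
    { 0x05E9, 0x05BC, 0xFB49 },  /* shin + dagesh */
    { 0xFB49, 0x05C1, 0xFB2C },  /* shin with dagesh + shin dot */
};

/* Fallback composer: succeeds only if the font has the presentation-form
 * glyph, so fonts without these glyphs keep the decomposed sequence. */
bool compose_hebrew_form(uint32_t a, uint32_t b, uint32_t *ab,
                         glyph_query_fn has_glyph) {
    assert(ab != NULL && has_glyph != NULL);
    for (size_t i = 0; i < sizeof kHebrewForms / sizeof kHebrewForms[0]; i++) {
        if (kHebrewForms[i].a == a && kHebrewForms[i].b == b &&
            has_glyph(kHebrewForms[i].ab)) {
            *ab = kHebrewForms[i].ab;
            return true;
        }
    }
    return false;
}

/* Hypothetical fonts: one covering the Hebrew presentation forms
 * U+FB1D..U+FB4F, one covering none of them. */
bool font_with_hebrew_forms(uint32_t cp) { return cp >= 0xFB1D && cp <= 0xFB4F; }
bool font_without_hebrew_forms(uint32_t cp) { return cp < 0xFB1D || cp > 0xFB4F; }
```

Dagesh (combining class 21) sorts before shin dot (combining class 24) in canonical order, so U+05E9 U+05BC U+05C1 composes in two steps: first to U+FB49, then with the shin dot to U+FB2C.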
Attachment #598849 - Flags: review?(smontagu)
Blocks: 722139
Attachment #598847 - Flags: review?(smontagu) → review+
Comment on attachment 598847 [details] [diff] [review]
part 1 - expose low-level character composition/decomposition functions in nsUnicodeNormalizer

Review of attachment 598847 [details] [diff] [review]:

Hmm, on second thoughts perhaps Decompose should have a different name to clarify that it isn't doing a full decomposition. r=smontagu stands either way.
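This sketch illustrates the naming point. The exposed function performs a single canonical decomposition step, whereas a full decomposition has to recurse on the first character of each step. The four-entry table and both function names are hypothetical, and the table holds only two-character decompositions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One-step canonical decompositions (two-character entries only). */
typedef struct { uint32_t ab, a, b; } step_t;
static const step_t kSteps[] = {
    { 0x00DC, 0x0055, 0x0308 },  /* U WITH DIAERESIS -> U + COMBINING DIAERESIS */
    { 0x01D5, 0x00DC, 0x0304 },  /* U WITH DIAERESIS AND MACRON -> U-diaeresis + MACRON */
    { 0x00E7, 0x0063, 0x0327 },  /* c WITH CEDILLA -> c + COMBINING CEDILLA */
    { 0x1E09, 0x00E7, 0x0301 },  /* c WITH CEDILLA AND ACUTE -> c-cedilla + ACUTE */
};

/* What the low-level function does: exactly one step. */
bool decompose_one_step(uint32_t ab, uint32_t *a, uint32_t *b) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < sizeof kSteps / sizeof kSteps[0]; i++) {
        if (kSteps[i].ab == ab) {
            *a = kSteps[i].a;
            *b = kSteps[i].b;
            return true;
        }
    }
    return false;
}

/* Full canonical decomposition: recurse on the first character, then append
 * the second. Writes at most cap code points into out; returns the count. */
size_t decompose_fully(uint32_t cp, uint32_t *out, size_t cap) {
    uint32_t a, b;
    assert(out != NULL && cap > 0);
    if (!decompose_one_step(cp, &a, &b)) {
        out[0] = cp;
        return 1;
    }
    size_t n = decompose_fully(a, out, cap - 1);  /* leave room for b */
    out[n++] = b;
    return n;
}
```

For example, U+1E09 splits in one step into U+00E7 U+0301, and only the recursive version reaches U+0063 U+0327 U+0301.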
Attachment #598848 - Flags: review?(smontagu) → review+
Comment on attachment 598849 [details] [diff] [review]
part 3 - add support for composition of Hebrew presentation forms

Review of attachment 598849 [details] [diff] [review]:

Looks good.
Attachment #598849 - Flags: review?(smontagu) → review+
Humm. Ideally I want to support both Arabic and Hebrew presentation-form shaping in HB. Let's start thinking about that.
Depends on: 733376
Depends on: 756850
I added the Hebrew decompositions to harfbuzz upstream now.