Closed Bug 1719544 Opened 4 years ago Closed 4 years ago

Unify Intl APIs in gfx/thebes/gfxHarfBuzzShaper.cpp

Categories

(Core :: Internationalization, task, P3)

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: gregtatum, Assigned: jfkthame)

Details

(Whiteboard: [i18n-unification], [i18n-unification-help-wanted] )

Attachments

(2 files)

Work: Medium
What it is: UNormalizer2 and UText

Blocks: 1719664
Whiteboard: [i18n-unification-help-wanted]
Whiteboard: [i18n-unification-help-wanted] → [i18n-unification], [i18n-unification-help-wanted]
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

gfxHarfBuzzShaper needs two low-level normalization data accessors that it currently gets from UNormalizer2:

  • composePair - given two Unicode characters, return their single-character composed equivalent, if any
  • decomposeRaw - given a Unicode character, return its one- or two-character single-level decomposition, if any (not the recursive decomposition that full NFD normalization would perform)
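
To make the shape of these two accessors concrete, here is a minimal sketch. The names ComposePair and DecomposeRaw, their signatures, and the three-entry table are assumptions for illustration only; a real implementation would consult the full Unicode normalization data rather than a hand-written table.

```cpp
// Illustrative data only: three sample canonical mappings.
struct RawMapping {
  char32_t composed;
  char32_t first;
  char32_t second;  // 0 marks a singleton decomposition
};

constexpr RawMapping kSampleMappings[] = {
    {0x00FC, 0x0075, 0x0308},  // u with diaeresis = u + combining diaeresis
    {0x01D8, 0x00FC, 0x0301},  // u with diaeresis and acute = U+00FC + acute
    {0x212A, 0x004B, 0x0000},  // KELVIN SIGN -> K (singleton)
};

// Hypothetical accessor: returns the composed character for <a, b>,
// or 0 if the pair does not compose. Singletons never compose back.
char32_t ComposePair(char32_t a, char32_t b) {
  for (const RawMapping& m : kSampleMappings) {
    if (m.second != 0 && m.first == a && m.second == b) {
      return m.composed;
    }
  }
  return 0;
}

// Hypothetical accessor: writes the single-level decomposition of ch into
// out and returns its length (1 or 2), or 0 if ch has no decomposition.
int DecomposeRaw(char32_t ch, char32_t out[2]) {
  for (const RawMapping& m : kSampleMappings) {
    if (m.composed == ch) {
      out[0] = m.first;
      out[1] = m.second;
      return m.second != 0 ? 2 : 1;
    }
  }
  return 0;
}
```

With this sample data, ComposePair(0x0075, 0x0308) yields U+00FC, and DecomposeRaw(0x212A, out) reports a length of 1 with out[0] set to U+004B.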

Because these are used only for canonical normalization, they do not have to deal with arbitrary-length decompositions. Every canonical decomposition mapping in Unicode is either a singleton (like U+212A KELVIN SIGN -> U+004B LATIN CAPITAL LETTER K) or a pair of characters. The members of a pair may themselves decompose further: U+01D8 LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE decomposes to <U+00FC LATIN SMALL LETTER U WITH DIAERESIS, U+0301 COMBINING ACUTE ACCENT>, and U+00FC in turn has its own pairwise decomposition. These low-level APIs, however, are only concerned with a single decomposition step.
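
The difference between a single decomposition step and the recursive decomposition can be sketched as follows. DecomposeOneStep and DecomposeFully are hypothetical helper names, the lookup covers only the characters mentioned above, and the recursive version ignores canonical reordering, which full NFD would also apply.

```cpp
#include <string>

// Single-step lookup over sample entries (illustrative data only).
// Returns the number of code points written to out: 0, 1, or 2.
int DecomposeOneStep(char32_t ch, char32_t out[2]) {
  switch (ch) {
    case 0x00FC:  // u with diaeresis
      out[0] = 0x0075;
      out[1] = 0x0308;
      return 2;
    case 0x01D8:  // u with diaeresis and acute
      out[0] = 0x00FC;
      out[1] = 0x0301;
      return 2;
    case 0x212A:  // KELVIN SIGN (singleton)
      out[0] = 0x004B;
      return 1;
    default:
      return 0;
  }
}

// Applies the single-step lookup recursively until nothing decomposes
// further, appending the resulting code points to result.
void DecomposeFully(char32_t ch, std::u32string& result) {
  char32_t parts[2];
  int n = DecomposeOneStep(ch, parts);
  if (n == 0) {
    result.push_back(ch);
    return;
  }
  for (int i = 0; i < n; ++i) {
    DecomposeFully(parts[i], result);
  }
}
```

For U+01D8, one step gives <U+00FC, U+0301>, while the recursive version gives <U+0075, U+0308, U+0301>. Only the one-step form is what the accessors proposed here need to provide.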

Although arguably these are not really "string" operations -- they're more like queries about individual characters -- I think it probably makes most sense to include them in mozilla::intl::String alongside the higher-level normalization APIs.

The low-level APIs I'm suggesting we add here are very specific, single-purpose methods that deliberately do not take a parameter to select Canonical or Compatibility decompositions. This simplifies the APIs, since multi-character (more than 2 components) decompositions need not be handled, and avoids a runtime test and branch for flexibility that we don't need. The use of these methods by harfbuzz is quite perf-sensitive, so I want to keep them as simple and lightweight as possible.

Once these are provided by mozilla::intl, gfxHarfBuzzShaper will no longer need either UNormalizer2 or UText.
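
As a rough sketch of what the shaper side might look like afterwards, the wrappers below adapt such accessors to callbacks that report success as a boolean and return results through out-parameters, which is the general shape HarfBuzz's compose and decompose hooks use (the real hooks also receive a funcs pointer and a user-data pointer). Everything here is a stand-in: Codepoint, ComposePairSketch, DecomposeRawSketch, and the callback names are hypothetical, and the data is limited to a few sample characters.

```cpp
#include <cstdint>

using Codepoint = uint32_t;  // stand-in for the shaper's code point type

// Stand-in for the intl-layer compose accessor (sample data only).
// Returns 0 when the pair does not compose.
char32_t ComposePairSketch(char32_t a, char32_t b) {
  if (a == 0x0075 && b == 0x0308) return 0x00FC;
  if (a == 0x00FC && b == 0x0301) return 0x01D8;
  return 0;
}

// Stand-in for the intl-layer decompose accessor (sample data only).
// Returns the decomposition length: 0, 1, or 2.
int DecomposeRawSketch(char32_t ch, char32_t out[2]) {
  if (ch == 0x00FC) { out[0] = 0x0075; out[1] = 0x0308; return 2; }
  if (ch == 0x01D8) { out[0] = 0x00FC; out[1] = 0x0301; return 2; }
  if (ch == 0x212A) { out[0] = 0x004B; return 1; }
  return 0;
}

// Callback-style wrapper: true and *ab set if <a, b> composes.
bool ComposeCallback(Codepoint a, Codepoint b, Codepoint* ab) {
  char32_t composed = ComposePairSketch(a, b);
  if (composed == 0) {
    return false;
  }
  *ab = composed;
  return true;
}

// Callback-style wrapper: true and *a, *b set if ab decomposes.
// For a singleton decomposition, *b is left as 0.
bool DecomposeCallback(Codepoint ab, Codepoint* a, Codepoint* b) {
  char32_t parts[2] = {0, 0};
  if (DecomposeRawSketch(ab, parts) == 0) {
    return false;
  }
  *a = parts[0];
  *b = parts[1];
  return true;
}
```

The point of the sketch is that the callbacks need nothing beyond the two accessors: no normalizer instance and no text-iteration object has to be held by the shaper.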

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b5d8d32d8326
part 1 - Add low-level normalization-data accessors ComposePairNFC and DecomposeRawNFD to mozilla::intl::String. r=platform-i18n-reviewers,dminor
https://hg.mozilla.org/integration/autoland/rev/4ab330369412
part 2 - Convert gfxHarfBuzzShaper normalization callbacks from direct ICU access to mozilla::intl::String APIs. r=platform-i18n-reviewers,dminor

Backout by ccozmuta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9786ceb8a9ab
Backed out 2 changesets for causing bustages on gtest.h:1445:11. CLOSED TREE

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/83baefac5b38
part 1 - Add low-level normalization-data accessors ComposePairNFC and DecomposeRawNFD to mozilla::intl::String. r=platform-i18n-reviewers,dminor
https://hg.mozilla.org/integration/autoland/rev/984ec26e4217
part 2 - Convert gfxHarfBuzzShaper normalization callbacks from direct ICU access to mozilla::intl::String APIs. r=platform-i18n-reviewers,dminor

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Flags: needinfo?(jfkthame)