Open Bug 759284 Opened 8 years ago Updated 6 years ago

Arabic OpenType ‘locl’ feature not applied to European digits

Categories

(Core :: Layout: Text and Fonts, defect)

People

(Reporter: dr.khaled.hosny, Unassigned)

Attachments

(2 files)

I have an italic font with left-slanted and right-slanted variants of common characters, to be used inside Arabic and Latin text runs, respectively, and selected through the OpenType ‘locl’ feature.

Since the digits 0-9 have the Common script property, they should take the script of the surrounding text. However, Firefox seems to always classify them as Latin and to apply the ‘locl’ feature registered for the ‘latn’ script unconditionally, unlike other Common characters.

In the attached test file, the European digits should be slanted to the left, but they are slanted to the right (the default, with no ‘locl’ applied, is upright digits). The parentheses, by contrast, are handled correctly by the same feature.
Attachment #627913 - Attachment mime type: text/plain → text/html
Yes, this is a limitation that I think is pretty hard to fix within the current Gecko text architecture. We do script itemization (identifying runs of text that should be treated as a single script, including Common or Inherited characters that take on a script based on their context) within gfxFontGroup::InitTextRun. This will normally cause digits to be treated as part of the same script run as the surrounding letters.

However, in the case of digits within RTL text, the text has already been split into direction runs before gfxFontGroup::InitTextRun is called (as an individual gfxTextRun has just a single directionality). This means that the digits are handled separately from the surrounding Arabic letters, and the script itemization process doesn't "see" the necessary context - and in this case we default to treating the Common-only run as Latin.
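
As a rough illustration of the behaviour described above, here is a toy sketch in Python. It is not Gecko's code: `script_of` and `itemize` are hypothetical names, and the script lookup is a crude stand-in for real Unicode Script data. It shows digits adopting the script of their neighbours when itemized together, and falling back to Latin when the same digits arrive as a separate run.

```python
def script_of(ch):
    """Toy script lookup (assumption): only a few Arabic and ASCII letters are known."""
    cp = ord(ch)
    if 0x0621 <= cp <= 0x064A:
        return "Arab"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"  # Common: digits, spaces, punctuation, everything else


def itemize(text, default="Latn"):
    """Split text into (script, substring) runs, resolving Common from context."""
    runs = []
    current = None  # script of the run being built; None until a real script is seen
    start = 0
    for i, ch in enumerate(text):
        sc = script_of(ch)
        if sc == "Zyyy":
            continue  # Common characters stay in whatever run they sit in
        if current is None:
            current = sc  # leading Common characters adopt the first real script
        elif sc != current:
            runs.append((current, text[start:i]))
            current, start = sc, i
    if text:
        # A run with no script-specific characters gets the default script.
        runs.append((current or default, text[start:]))
    return runs


# Itemized together with the Arabic letters, the digits resolve to Arab.
whole = "\u0627\u0628 123 \u062c"
print(itemize(whole))

# Itemized alone (as happens once the text is split by direction), they get Latn.
print(itemize("123"))
```

The second call stands in for the case where the digits have already been separated into their own direction run before itemization.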

So to fix this, we'd need to either pass some kind of context in to gfxFontGroup::InitTextRun, so that it can resolve Common/Inherited characters to an appropriate script even if there are no script-specific characters within the text run being constructed, or else invert the architecture such that we do script itemization _before_ breaking the text into direction runs.
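
The first option (passing context in) could look roughly like the following Python sketch. This is only an assumption about the shape of such a fix, not a proposal for the real API: `resolve_with_context` is a hypothetical helper that receives the per-run scripts in logical order, with `None` marking a run that contains only Common/Inherited characters.

```python
def resolve_with_context(run_scripts, fallback="Latn"):
    """Fill in the script of Common-only runs from neighbouring runs.

    run_scripts: script of each direction run in logical order; None means the
    run held no script-specific characters (hypothetical representation).
    """
    resolved = []
    for i, sc in enumerate(run_scripts):
        if sc is None:
            # Prefer the nearest preceding script, then the nearest following one.
            before = next((s for s in reversed(run_scripts[:i]) if s), None)
            after = next((s for s in run_scripts[i + 1:] if s), None)
            sc = before or after or fallback
        resolved.append(sc)
    return resolved


print(resolve_with_context(["Arab", None, "Arab"]))  # digits between Arabic letters
print(resolve_with_context([None, "Arab"]))          # number at the start of the text
print(resolve_with_context([None]))                  # nothing but Common characters
```

The last call still has to fall back to something, since no neighbouring run supplies a script.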

Note that there'd still be a problem in the case where the document *only* contains Common characters (even after considering the document as a whole, not just the direction run containing the digits). In this case, how would we decide which script to apply for shaping purposes?
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86 → All
Version: 12 Branch → Trunk
Don't we already have code that passes in just such a context to gfxFontGroup::InitTextRun? I mean the TEXT_INCOMING_ARABICCHAR flag at http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.cpp#3416
We have that (for the bidi.numeral stuff), but it's not really sufficient; e.g. it wouldn't help if the number falls at the beginning of the entire text (think of a numbered paragraph). And we need something more generalized, as similar issues could apply to any script, not just Arabic.
IMO it makes sense to do script itemization before bidi (and I vaguely recall this is what Pango is doing).

Also, I don’t think falling back to Latin for a Common-only run is a good idea (at least as far as OpenType features are concerned). IMO it makes more sense to use the ‘DFLT’ script if the font has it; else the script of the font if there is only one; else ‘latn’ if present; else the first script listed. OK, this might complicate things too much, and I’m not sure such complexity is justifiable.
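
To make the suggested fallback order concrete, here is a small Python sketch. `pick_script_tag` is a hypothetical name, and the input is assumed to be the font's OpenType script tags in the order the font lists them; nothing here reflects existing Gecko code.

```python
def pick_script_tag(font_scripts):
    """Choose an OpenType script tag for a Common-only run.

    font_scripts: script tags supported by the font, in font order (assumption).
    """
    if not font_scripts:
        return None                # the font has no script tables to choose from
    if "DFLT" in font_scripts:
        return "DFLT"              # 1. prefer the default script if present
    if len(font_scripts) == 1:
        return font_scripts[0]     # 2. a single-script font: use that script
    if "latn" in font_scripts:
        return "latn"              # 3. otherwise Latin, if the font has it
    return font_scripts[0]         # 4. last resort: the first script listed


for scripts in (["arab", "DFLT", "latn"], ["arab"], ["arab", "latn"], ["arab", "hebr"]):
    print(scripts, "->", pick_script_tag(scripts))
```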
BTW, any idea if the issue described in bug 208309 comment 27 is related to this, or should I open a new bug for it?
Does XITS Math want to use an 'rtlm' feature to produce the right-to-left root sign? If so, it sounds like that may be another manifestation of the problem that we don't really know which script to apply when handling a "common" character in isolation.

Actually, if you're using 'rtlm' to produce the "mirrored" glyphs, I think you should include this feature in all script tags, including 'latn' as well as 'arab'. Consider the case of Latin-script text with direction overrides: you'd want mirroring, but it'll only happen if the feature is present in the 'latn' script.
Right, ‘rtlm’ is used in XITS Math for all mirrored chars including roots, I had it with ‘DFLT’ script and I thought that should make it work for any script (shouldn’t it?), but you are right, after including it in ‘latn’ script it now works.
(In reply to Khaled Hosny from comment #8)
> Right, ‘rtlm’ is used in XITS Math for all mirrored chars including roots, I
> had it with ‘DFLT’ script and I thought that should make it work for any
> script (shouldn’t it?), but you are right, after including it in ‘latn’
> script it now works.

'DFLT' should have worked if we used that for Common-only runs, I think, but we currently prefer 'latn'; we probably should reconsider this.

However, you'd _also_ want to include the feature in the individual script systems, as the Common characters will be "adopted" by specific scripts when there are also letters etc in the context. 'DFLT' would only be used when a specific script cannot be resolved (or isn't supported in the font).
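
A minimal Python sketch of that lookup rule, assuming a font is modelled as a dict from script tag to a set of feature tags (a hypothetical simplification; `features_for_run` is not a real API):

```python
def features_for_run(font, resolved_script):
    """Return the feature tags available for a run.

    font: dict mapping script tag -> set of feature tags (toy model).
    resolved_script: script tag chosen for the run, or None if unresolved.
    """
    if resolved_script in font:
        return font[resolved_script]  # a specific script system wins when present
    return font.get("DFLT", set())    # 'DFLT' only when nothing specific applies


# 'rtlm' registered under 'DFLT' only: a run resolved to 'latn' never sees it.
font = {"DFLT": {"rtlm"}, "latn": {"liga"}}
print("rtlm" in features_for_run(font, "latn"))
print("rtlm" in features_for_run(font, "grek"))  # script not in the font

# Registering the feature under 'latn' as well makes it available to that run.
font["latn"].add("rtlm")
print("rtlm" in features_for_run(font, "latn"))
```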
I see, I think I misunderstood the role of the ‘DFLT’ script.