Closed Bug 953385 Opened 6 years ago Closed 6 years ago

Need a way to specify OpenType script tags

Categories

(Core :: Graphics: Text, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: jkitch, Assigned: jkitch)

References

(Blocks 1 open bug, )

Details

Attachments

(3 files, 8 obsolete files)

Currently there is a way to set the opentype language tag to use (css font-language-override) but there does not appear to be a way to override the script tag.  

This will be helpful for Cambria Math, which stores its 'ssty' font feature setting glyphs within the 'math' script, but not the scripts that Firefox selects/defaults to.  Font-language-override is not suitable as there is no corresponding mathematical language.
Actually this may be a duplicate of bug 407059.
(In reply to James Kitchener (:jkitch) from comment #1)
> Actually this may be a duplicate of bug 407059.

Yeah, I think for this case it would be better to handle the 'math' script code within MathML code, since I'm guessing that it's use should probably be automatic without the need for user intervention.

+1 to marking this a duplicate.
(In reply to James Kitchener (:jkitch) from comment #1)
> Actually this may be a duplicate of bug 407059.

This is not a duplicate. Bug 407059 is for reading information from the MATH table (especially operator constructions) while here we need to implement OpenType Tags used by math handling engine to get access to particular glyph variants. Microsoft's "MATH table and OpenType Features for
Math Processing" document defines the following tag:

math: Script tag to be used with the MATH table features. The only language system supported with this tag is the default.

ssty: Script Style

flac: Flattened Accents over Capitals. This feature provides flattened forms of accents to be used over high-rise bases such as capitals.

dtls: Dotless Forms This feature provides dotless forms for Math Alphanumeric characters.

What we need here is really the "math" script tag (I think all the others are accessible from the CSS font-feature-settings property), but it's probably not necessary to define a CSS property for that ; we only want the script to be set automatically by the MathML code. James, what did you get from your investigation (442637 comment 17)?
Summary: Need a way to specify a preference for opentype script selection → Need a way to specify OpenType script tags
The problem is simpler than I first thought.

In gfxHarfBuzzShaper::ShapeText(), one of the arguments is a default script tag, which is passed onto Harfbuzz by this line.

https://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxHarfBuzzShaper.cpp#949
> hb_buffer_set_script(buffer, scriptTag);

I performed a little experimental modification of the code, and setting this to an enum value of HB_TAG('M','a','t','h') (sic) is sufficient to get the ssty glyphs of Cambria Math to display.

The script tag is passed in as an argument to ShapeText(), and can be traced back to gfxFontGroup::InitTextRun().  

It will also require a one-line change to the Harfbuzz code to add the 'math' script to the enum in hb_common.h.
Depends on: 442637
Assignee: nobody → jkitch.bug
Blocks: 947654
Currently the plan for the bug is to set a textrun flag to indicate the circumstances when the OpenType 'math' script tag should be used (textFrames belonging to MathML tokens).

The code will then test for this flag within gfxFontGroup::InitTextRun() and replace whatever script code which would otherwise have been used with MOZ_SCRIPT_MATH.  If the font in question happens to support 'math' script tag, Harfbuzz will select it and all is well.

If it doesn't support the 'math' script tag, Harfbuzz will fall back to looking for 'latn', 'dflt' and 'DFLT' or one based on the language tag passed to it.

My concern is for fonts that don't support these fallback options, or situations where the language tag is incorrect.

Is this likely to be a problem?

I could try to detect whether the font actually supports the math tag before setting it, but there are some like Cambria Math which do support it, but only for a subset of their glyph substitutions.
Flags: needinfo?(khaledhosny)
(In reply to James Kitchener (:jkitch) from comment #5)
> Is this likely to be a problem?

If the font used for math does not have the relevant features in ‘math’ or ‘DFLT’ script, then IMO it is either broken or not a OpenType math font, in either case I don’t think there is much Gecko can do here.
Flags: needinfo?(khaledhosny)
Attached patch Part 0: Harfbuzz changes (obsolete) — Splinter Review
Script tags are stored within an enum in Harfbuzz, so to process the 'math' script, it needs to be added to the enum.  Once this is done, no other Harfbuzz change is needed for it to work.

I realise this is third party code and that I should send it to the Harfuzz project to get them to incorporate it, but some feedback confirming (or not) the necessity of this patch would be appreciated before I contact the Harfbuzz team.
Attachment #8362208 - Flags: feedback?(roc)
Attachment #8362208 - Flags: feedback?(mozilla)
Attached patch Part 1: intl changes (obsolete) — Splinter Review
To support the 'math' OpenType script (a Microsoft proposal), I would like to add it to the script lookup table.

The 'math' script has no characters associated with it (selection is handled by a textrun flag) and I do not believe anything else is needed for proper support. 

This patch also updates the data generated from xidmodifications.txt to version 6.3.0.  I do not pretend to understand what the changes are doing, but I followed the instructions at https://wiki.mozilla.org/I18n:Updating_Unicode_version.  

I've sent an earlier iteration of this series of patches to tryserver (https://tbpl.mozilla.org/?tree=Try&rev=3ccd33e9a28e) and there does not appear to be any fallout from these changes.
Attachment #8362212 - Flags: review?(smontagu)
Attached patch Part 2: gfx changes (obsolete) — Splinter Review
This adds a new flag and uses it to determine whether or not the 'math' script should be used (only for MathML token elements: <mtext>, <mi> etc).


Strictly speaking, overriding the Latin script in the 8-bit case is optional, but not doing so is inefficient.
Attachment #8362213 - Flags: review?(roc)
Attached patch Part 3: layout changes (obsolete) — Splinter Review
This code sets the flag that the gfx code relies upon.

I tidied up the flag propagation code and made it partially nested, as the single char MI case is always in a MathML token frame.
Attachment #8362215 - Flags: review?(roc)
Attached patch Part 4: tests (obsolete) — Splinter Review
Static and dynamic tests, and non-MathML content tests to ensure it doesn't have undesirable side effects.

This adds a new font which puts the ssty font feature under the 'math' script.  I believe the two fonts ssty setting fonts (this one and ssty.woff) should coexist, as the latter is testing that fallback script selection works.
Attachment #8362216 - Flags: review?(fred.wang)
Comment on attachment 8362216 [details] [diff] [review]
Part 4: tests

Review of attachment 8362216 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

You forgot to hg add the mathssty.woff font.

::: layout/reftests/mathml/mathscript-2-ref.html
@@ +14,5 @@
> +
> +  <!-- Demonstrate that it has no effect outside MathML -->
> +  <div style="font-family: 'mathssty';" >DD</div>
> +
> +  <!-- Demonstrate thata it works within MathML -->

that

::: layout/reftests/mathml/mathscript-2.html
@@ +27,5 @@
> +    {
> +      // Does nothing to non-MathML
> +      document.getElementById("div0").appendChild(document.createTextNode("A"));
> +      document.getElementById("mtext0").appendChild(document.createTextNode("A"));
> +      // Does something to MathML

the comment should be before the second appendChild
Attachment #8362216 - Flags: review?(fred.wang) → review+
(In reply to James Kitchener (:jkitch) from comment #9)
> Created attachment 8362213 [details] [diff] [review]
> Part 2: gfx changes
> 
> This adds a new flag and uses it to determine whether or not the 'math'
> script should be used (only for MathML token elements: <mtext>, <mi> etc).

I think <mtext> should not use the ‘math’ script as it is just regular text that happens to be inside math formula.
(In reply to James Kitchener (:jkitch) from comment #8)
> To support the 'math' OpenType script (a Microsoft proposal), I would like
> to add it to the script lookup table.

Note that the 'math' script is tag is already standard and is said to correspond to "Mathematical Alphanumeric Symbols". See section 6.4.1 "Scripts tags" of the ISO/IEC 14496-22:2009
Comment on attachment 8362208 [details] [diff] [review]
Part 0: Harfbuzz changes

Nope.  hb_script_t will remain scripts only.  However, it's hb_tag_t for a reason.  You should use hb_ot_tag_to_script(HB_TAG('m','a','t','h')).
(In reply to Behdad Esfahbod from comment #15)
> Comment on attachment 8362208 [details] [diff] [review]
> Part 0: Harfbuzz changes
> 
> Nope.  hb_script_t will remain scripts only.  However, it's hb_tag_t for a
> reason.  You should use hb_ot_tag_to_script(HB_TAG('m','a','t','h')).

We can add math and other non-Unicode scripts to hb-ot-tag.h though.
Comment on attachment 8362212 [details] [diff] [review]
Part 1: intl changes

Review of attachment 8362212 [details] [diff] [review]:
-----------------------------------------------------------------

Please regenerate this with the old xidmodifications.txt (http://www.unicode.org/Public/security/revision-05/xidmodifications.txt). There are some changes in the 6.3.0 version which I think are wrong, but I haven't had any response to my query on the Unicode list https://www.mail-archive.com/unicode@unicode.org/msg33862.html. In the meantime, let's stick to the previous version.
Attachment #8362212 - Flags: review?(smontagu) → review-
Attached patch Part 1: intl changes (obsolete) — Splinter Review
This one is generated with the old xidmodifications.txt file.

It also updates the generator script to work with the latest version of Harfbuzz.

Harfbuzz decided to rename HB_SCRIPT_CANADIAN_ABORIGINAL to HB_SCRIPT_CANADIAN_SYLLABICS, which broke the perl script. Renaming our script name to match theirs broke the perl script in other ways.

However they thoughtfully includes a #define to map the old name to the new one in hb-deprecated.h, so I have parsed that file for such #defines in order to correct the script names as appropriate.  

Note that this is my first time I have used perl, so greater attention may be warranted.
Attachment #8362212 - Attachment is obsolete: true
Attachment #8363557 - Flags: review?(smontagu)
Comment on attachment 8363557 [details] [diff] [review]
Part 1: intl changes

Cancelling review.  Need to account for comment 15.
Attachment #8363557 - Flags: review?(smontagu)
Attachment #8362213 - Flags: review?(roc)
Sorry about the CANADIAN_ABORIGINAL change.
Attachment #8362208 - Flags: feedback?(mozilla)
Blocks: 963079
Comment on attachment 8362208 [details] [diff] [review]
Part 0: Harfbuzz changes

Review of attachment 8362208 [details] [diff] [review]:
-----------------------------------------------------------------

jfkthame should review this when it's ready
Attachment #8362208 - Flags: feedback?(roc)
Attached patch Part 1: gfx changes (obsolete) — Splinter Review
The Changes to Harfbuzz or intl were unnecessary. 

The knowledge that 'math' script should be used can be propagated by setting a shapedText flag.
Attachment #8362208 - Attachment is obsolete: true
Attachment #8362213 - Attachment is obsolete: true
Attachment #8363557 - Attachment is obsolete: true
Attachment #8365737 - Flags: review?(jfkthame)
Reflagging for review as the changes are non-trivial relative to the previous patch.

This excludes mtext from the elements which use math script, per comment 13.
Attachment #8362215 - Attachment is obsolete: true
Attachment #8365740 - Flags: review?(roc)
Attached patch Part 3: testsSplinter Review
Review comments addressed as well as minor changes to test that mtext is properly excluded.
Attachment #8362216 - Attachment is obsolete: true
(In reply to Behdad Esfahbod from comment #15)
> Comment on attachment 8362208 [details] [diff] [review]
> Part 0: Harfbuzz changes
> 
> Nope.  hb_script_t will remain scripts only.  However, it's hb_tag_t for a
> reason.  You should use hb_ot_tag_to_script(HB_TAG('m','a','t','h')).

Should we change the hb_buffer_{set,get}_script APIs to explicitly take an hb_tag_t parameter rather than hb_script_t, then? If the API takes hb_script_t, which is an enum, then in principle it seems wrong to be passing any value (such as 'Math') that is not defined as a value of the enumeration.

And likewise the corresponding field in hb_segment_properties_t.
Flags: needinfo?(mozilla)
(In reply to Jonathan Kew (:jfkthame) from comment #25)
> (In reply to Behdad Esfahbod from comment #15)
> > Comment on attachment 8362208 [details] [diff] [review]
> > Part 0: Harfbuzz changes
> > 
> > Nope.  hb_script_t will remain scripts only.  However, it's hb_tag_t for a
> > reason.  You should use hb_ot_tag_to_script(HB_TAG('m','a','t','h')).
> 
> Should we change the hb_buffer_{set,get}_script APIs to explicitly take an
> hb_tag_t parameter rather than hb_script_t, then? If the API takes
> hb_script_t, which is an enum, then in principle it seems wrong to be
> passing any value (such as 'Math') that is not defined as a value of the
> enumeration.

A brief check of standards suggests that this would lead to undefined behavior if anyone tried to use a tag whose value happened to be outside the (min, max) range of the defined enumerators for the type.

So if you want to keep hb_script_t as the type here, perhaps we should add sentinel values for 0 and 0xFFFFFFFF to the enumeration, to ensure we have well-defined behavior for any tag.
Comment on attachment 8365737 [details] [diff] [review]
Part 1: gfx changes

Review of attachment 8365737 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxHarfBuzzShaper.cpp
@@ +941,5 @@
>      hb_buffer_set_direction(buffer, isRightToLeft ? HB_DIRECTION_RTL :
>                                                      HB_DIRECTION_LTR);
> +    hb_script_t scriptTag;
> +    if (aShapedText->Flags() & gfxTextRunFactory::TEXT_USE_MATH_SCRIPT) {
> +      scriptTag = (hb_script_t) HB_TAG('M','a','t','h');

I think Behdad's recommendation (comment 15) of using hb_ot_tag_to_script(HB_TAG('m','a','t','h')) would be better here, to avoid exposing a dependency on the internal representation used for hb_script_t. The additional cost of calling the function should be negligible compared to the rest of the work we're doing.

Or you could even do
  static const hb_script_t mathScript = hb_ot_tag_to_script(...);
  scriptTag = mathScript;
so that the function is only called once.
Attached patch Part 1: gfx changes (obsolete) — Splinter Review
Feedback from comment 27 applied.

I didn't realise at first the two functions lived in different libraries.
Attachment #8365737 - Attachment is obsolete: true
Attachment #8365737 - Flags: review?(jfkthame)
Attachment #8367166 - Flags: review?(jfkthame)
Comment on attachment 8367166 [details] [diff] [review]
Part 1: gfx changes

Review of attachment 8367166 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxHarfBuzzShaper.cpp
@@ +796,5 @@
>  
>  static hb_font_funcs_t * sHBFontFuncs = nullptr;
>  static hb_unicode_funcs_t * sHBUnicodeFuncs = nullptr;
> +static const hb_script_t sMathScript = 
> +  hb_ot_tag_to_script(HB_TAG('m','a','t','h'));

nit: lose the trailing space, and use 4-space indent (dominant local style in thebes) until the day of gecko's Grand Code Reformatting.

@@ +951,5 @@
> +      // a better default?)
> +      scriptTag = HB_SCRIPT_LATIN;
> +    } else {
> +      scriptTag = hb_script_t(GetScriptTagForCode(aScript));
> +    }

4-space indents.
Attachment #8367166 - Flags: review?(jfkthame) → review+
White space fixes applied

https://tbpl.mozilla.org/?tree=Try&rev=5275d303ee67

Do the concerns in comment 25 and comment 26 block landing?
Attachment #8367166 - Attachment is obsolete: true
(In reply to James Kitchener (:jkitch) from comment #30)
> Do the concerns in comment 25 and comment 26 block landing?

No need to block for that. The only "extra" tag (not explicitly listed as a value of hb_script_t) we're dealing with here is 'Math', which falls within the range of the enumeration, so AIUI it's safe to use.

I do think harfbuzz should fix this (in fact, I'll submit a patch there) to avoid any future risk, in the event someone tries to use other custom tags, but we can move forward here anyhow.
(In reply to Jonathan Kew (:jfkthame) from comment #25)
> (In reply to Behdad Esfahbod from comment #15)
> > Comment on attachment 8362208 [details] [diff] [review]
> > Part 0: Harfbuzz changes
> > 
> > Nope.  hb_script_t will remain scripts only.  However, it's hb_tag_t for a
> > reason.  You should use hb_ot_tag_to_script(HB_TAG('m','a','t','h')).
> 
> Should we change the hb_buffer_{set,get}_script APIs to explicitly take an
> hb_tag_t parameter rather than hb_script_t, then? If the API takes
> hb_script_t, which is an enum, then in principle it seems wrong to be
> passing any value (such as 'Math') that is not defined as a value of the
> enumeration.
> 
> And likewise the corresponding field in hb_segment_properties_t.

No, that defeats the purpose of having hb_script_t to begin with.  We already have hb_script_from_iso15924_tag().  We can add a macro, but that wouldn't be doing any cleanups the way this function does.  WDUT?
Flags: needinfo?(mozilla)
Keywords: checkin-needed
Blocks: 1069929
You need to log in before you can comment on or make changes to this bug.