Open Bug 961482 Opened 10 years ago Updated 2 months ago

Use OpenType MATH MathItalicsCorrectionInfo and MathKernInfoRecord to improve script placements

Categories

(Core :: MathML, defect, P5)

People

(Reporter: fredw, Unassigned)

References

(Blocks 1 open bug)

Details

Follow-up of bug 945254.

Microsoft's document does not seem to mention mmultiscripts, so let's proceed as if there were at most four scripts (pre-superscripts, post-superscripts, pre-subscripts, post-subscripts). That won't change anything for basic msub/msup/msubsup.

- apply the same italic correction to all post-superscripts (bug 945254)
- use the lowest bottom of the pre-superscripts to determine the correction height, and apply the same TopLeftKern shift to all the pre-superscripts. Similarly for the three other script positions.

Microsoft's document also only mentions italic correction / kerning when the base is a single glyph. Perhaps we can generalize that to bases with multiple letters (using the leftmost and rightmost glyphs). We should also handle more complex bases (e.g. <mrow><mrow><mi>x</mi></mrow></mrow> should be equivalent to <mi>x</mi>).
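
A minimal C++ sketch of the base handling suggested above. It assumes a hypothetical simplified tree rather than Gecko's frame classes (MathNode, UnwrapBase and ScriptFacingChar are illustrative names). Single-child <mrow>/<mstyle> wrappers are skipped. For a multi-letter token, the rightmost character (post-scripts) or leftmost character (pre-scripts) is the one whose italic correction or kerning would be used. LTR layout and one byte per character are assumed for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified MathML node (not Gecko's frame classes).
struct MathNode {
  std::string tag;                 // element name, e.g. "mrow", "mi"
  std::string text;                // text content of token elements
  std::vector<MathNode> children;  // child elements
};

// Skip <mrow>/<mstyle> wrappers with exactly one child, so that
// <mrow><mrow><mi>x</mi></mrow></mrow> is handled like <mi>x</mi>.
const MathNode* UnwrapBase(const MathNode& node) {
  const MathNode* cur = &node;
  while ((cur->tag == "mrow" || cur->tag == "mstyle") &&
         cur->children.size() == 1) {
    cur = &cur->children.front();
  }
  return cur;
}

// Character whose glyph faces the scripts: the rightmost one for
// post-scripts, the leftmost one for pre-scripts (LTR layout assumed,
// one byte per character for brevity). Returns '\0' when the base is
// not a token, in which case the kern would simply be treated as zero.
char ScriptFacingChar(const MathNode& base, bool postScripts) {
  const MathNode* tok = UnwrapBase(base);
  const bool isToken =
      tok->tag == "mi" || tok->tag == "mn" || tok->tag == "mo";
  if (!isToken || tok->text.empty()) return '\0';
  return postScripts ? tok->text.back() : tok->text.front();
}
```

For example, ScriptFacingChar on <mi>sin</mi> picks 'n' for post-scripts and 's' for pre-scripts. An <mfrac> base yields '\0', meaning no kerning is applied.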

There is also an ambiguity with left/right in RTL mode here. Should it be interpreted as leading/trailing as in the MathML spec?
Depends on: 945254
(In reply to Frédéric Wang (:fredw) from comment #0)
> Microsoft's document also only mentions italic correction / kerning when the
> base is a single glyph. Perhaps we can generalize that for base with
> multiple-letters (use the leftmost and rightmost glyph).

Sounds reasonable.

> There is also an ambiguity with left/right in RTL mode here. Should it be
> interpreted as leading/trailing as in the MathML spec?

I think not; OpenType positioning is always in the LTR direction. The italic correction should follow the text direction, though, since it is a single value with no inherent direction.
Priority: -- → P5
Blocks: 963136
We currently have two types of italic correction:

- (right) italic correction = max(0, right bearing - advance width)
- left italic correction = max(0, -left bearing)
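
The two definitions above can be written out as a small C++ sketch. GlyphMetrics is a hypothetical struct loosely modeled on the leftBearing/rightBearing/width fields of Gecko's nsBoundingMetrics. Values are in arbitrary font units.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical glyph metrics: bearings are the left/right edges of the
// ink measured from the glyph origin; advanceWidth is the pen advance.
struct GlyphMetrics {
  int leftBearing;
  int rightBearing;
  int advanceWidth;
};

// (Right) italic correction: how far the ink overhangs the advance width.
int ItalicCorrection(const GlyphMetrics& m) {
  return std::max(0, m.rightBearing - m.advanceWidth);
}

// Left italic correction: how far the ink overhangs left of the origin.
int LeftItalicCorrection(const GlyphMetrics& m) {
  return std::max(0, -m.leftBearing);
}
```

Take an italic glyph with left bearing -30, right bearing 520 and advance 450. It gets an italic correction of 70 and a left italic correction of 30. An upright glyph whose ink stays inside its advance gets 0 for both.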

These are used in nsMathMLContainerFrame.cpp to add space before and after a frame. Whether successive frames are slanted or upright does not seem to be taken into account, although it is for the additional inter-frame spacing we add. The MATH spec does not mention the left italic correction (so perhaps we can just keep the current value) and says that the italic correction is only applied when a sequence of slanted characters is followed by a straight character.
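
Here is a sketch of the MATH spec rule just described, assuming a hypothetical FrameInfo summary per frame (slanted flag, advance, italic correction). It is not what nsMathMLContainerFrame.cpp currently does. It only illustrates adding the correction at slanted-to-upright transitions. A slanted frame at the end of the sequence is assumed to get no correction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one frame in a horizontal sequence.
struct FrameInfo {
  bool slanted;          // e.g. an italic <mi>
  int advance;           // advance width
  int italicCorrection;  // italic correction of its last glyph (>= 0)
};

// Width of the sequence when a frame's italic correction is added only
// if that frame is slanted and the next one is upright. A slanted frame
// at the very end gets no correction (assumption: nothing straight
// follows it inside this sequence).
int SequenceWidth(const std::vector<FrameInfo>& frames) {
  int width = 0;
  for (std::size_t i = 0; i < frames.size(); ++i) {
    assert(frames[i].italicCorrection >= 0);
    width += frames[i].advance;
    const bool nextIsUpright =
        i + 1 < frames.size() && !frames[i + 1].slanted;
    if (frames[i].slanted && nextIsUpright) {
      width += frames[i].italicCorrection;
    }
  }
  return width;
}
```

Consider two slanted frames (corrections 50 and 40) followed by an upright one. Only the second slanted frame's correction is added: 500 + 500 + 40 + 300 = 1340.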

In nsMathMLmunderoverFrame.cpp, (half of) the italic correction is used to shift the under/over scripts horizontally. This is always done except when an align=left/right attribute is used or for accent over/under scripts. The MATH spec only mentions that shift for large operators, so I'm not sure it should be kept for arbitrary bases. See also bug 963131 for the new concept of "Stretch Stacks", which has different constants for the vertical gaps and shifts. The munderover code also lays out in two steps: first it attaches the overscript to the base, then it considers base+overscript as an "anonymous base", and finally it attaches the underscript to this "anonymous base". So for the overscript we use the italic correction of the base, but for the underscript we use the italic correction of the "anonymous base" http://hg.mozilla.org/mozilla-central/annotate/8f4ecbf938cd/layout/mathml/nsMathMLmunderoverFrame.cpp#l563, which may be different if the overscript is wider than the base. This should probably be changed if we want to follow the MATH spec.
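
Here is a sketch of a single-step under/over layout. It assumes all three boxes are centered on a common axis. The overscript is shifted by plus half the base's italic correction and the underscript by minus half, which is how TeX places limits on large operators. LayoutUnderOver and UnderOverOffsets are hypothetical names. Because both scripts use the base's own correction, the "anonymous base" discrepancy described above does not arise.

```cpp
#include <algorithm>
#include <cassert>

// Left x-coordinates of the three boxes; the leftmost box starts at 0.
struct UnderOverOffsets {
  int base;
  int over;
  int under;
};

// Center the overscript and underscript on the base, then move the
// overscript right and the underscript left by half the base's italic
// correction. Both shifts use the italic correction of the base itself,
// not that of an "anonymous base" that already contains the overscript.
UnderOverOffsets LayoutUnderOver(int baseWidth, int baseItalicCorrection,
                                 int overWidth, int underWidth) {
  assert(baseItalicCorrection >= 0);
  const int halfIc = baseItalicCorrection / 2;
  // Left edges relative to the base's center line (x = 0).
  const int baseX = -baseWidth / 2;
  const int overX = -overWidth / 2 + halfIc;
  const int underX = -underWidth / 2 - halfIc;
  // Translate everything so that the leftmost box starts at x = 0.
  const int minX = std::min({baseX, overX, underX});
  return {baseX - minX, overX - minX, underX - minX};
}
```

Take a base 100 units wide with an italic correction of 20 and two 40-unit scripts. The overscript's center ends up 10 units right of the base's center, and the underscript's 10 units left of it.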

Finally, the italic correction is used (or used to be used) in nsMathMLmmultiscriptsFrame.cpp (bug 945254). Here we position the subscript next to the base and shift the superscript horizontally by the italic correction. For large operators, the italic correction is subtracted, not added (bug 407059 comment 71). The temporary hack I did for bug 407059 is to reduce the advance width in nsMathMLChar so that we end up with the correct positioning; however, this should be revisited here. I wonder whether some workaround should be kept for STIX Word...
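
Here is a sketch of the horizontal placement described above. It assumes the advance passed in is the glyph's real advance, without the temporary nsMathMLChar reduction. For large operators it subtracts the italic correction from the subscript position instead of adding it to the superscript position, which is one reading of bug 407059 comment 71. PlacePostScripts is a hypothetical helper.

```cpp
#include <cassert>

// Left x-coordinates of the post-subscript and post-superscript,
// relative to the origin of the base glyph.
struct PostScriptOffsets {
  int sub;
  int sup;
};

// Ordinary base: the subscript sits right after the advance width and the
// superscript is moved further right by the italic correction.
// Large operator: the italic correction is subtracted instead, i.e. the
// subscript is pulled left under the slanted glyph while the superscript
// stays at the advance width.
PostScriptOffsets PlacePostScripts(int advance, int italicCorrection,
                                   bool isLargeOperator) {
  assert(italicCorrection >= 0);
  if (isLargeOperator) {
    return {advance - italicCorrection, advance};
  }
  return {advance, advance + italicCorrection};
}
```

In both cases the superscript ends up italicCorrection units to the right of the subscript. Only the reference point changes.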

Regarding the kern values, I'm trying to understand the MATH spec and the XeTeX code. For the selection of values, I don't understand at all how the XeTeX code can even work: http://sourceforge.net/p/xetex/code/ci/master/tree/source/texk/web2c/xetexdir/XeTeXOTMath.cpp#l452. Cambria Math seems to have at most two kern values, so the effect is probably not obvious anyway. The accesses to elements in the height/kern tables do not seem correct to me, and it seems that the loop always stops at i = 0 or i = 1. My understanding is that:
1) the kern pointer should be shifted by sizeof(MathValueRecord) * count before being used;
2) you only need one loop, for (int i = 0; i < count; i++) (or perhaps a binary search). Return kern[i] if height <= height[i], and kern[count] at the exit.
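
Points 1) and 2) can be sketched in C++, assuming the raw bytes of one MathKern subtable are already available. ParseMathKern and KernAt are hypothetical names, and device tables are ignored. The kern values are read only after all heightCount height records, and std::upper_bound replaces the loop. On the boundary, this sketch follows one reading of the spec's wording: a height equal to a correction height selects the next kern value. That differs from "height <= height[i]" only at exact equality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a big-endian 16-bit value at byte offset `off`.
static std::uint16_t ReadU16(const std::vector<std::uint8_t>& data,
                             std::size_t off) {
  return static_cast<std::uint16_t>((data[off] << 8) | data[off + 1]);
}

// MathKern subtable layout:
//   uint16          heightCount
//   MathValueRecord correctionHeight[heightCount]
//   MathValueRecord kernValues[heightCount + 1]
struct MathKern {
  std::vector<int> heights;  // ascending correction heights
  std::vector<int> kerns;    // heights.size() + 1 values
};

MathKern ParseMathKern(const std::vector<std::uint8_t>& data) {
  // MathValueRecord: int16 value + Offset16 deviceOffset (ignored here).
  const std::size_t kRecordSize = 4;
  assert(data.size() >= 2);
  const std::size_t count = ReadU16(data, 0);
  assert(data.size() >= 2 + kRecordSize * (2 * count + 1));
  const std::size_t heightsStart = 2;
  // Point 1): the kern values start after *all* the height records.
  const std::size_t kernsStart = heightsStart + kRecordSize * count;
  MathKern mk;
  for (std::size_t i = 0; i < count; ++i) {
    mk.heights.push_back(static_cast<std::int16_t>(
        ReadU16(data, heightsStart + kRecordSize * i)));
  }
  for (std::size_t i = 0; i <= count; ++i) {
    mk.kerns.push_back(static_cast<std::int16_t>(
        ReadU16(data, kernsStart + kRecordSize * i)));
  }
  return mk;
}

// Point 2): kerns[0] below heights[0], kerns[i] when
// heights[i - 1] <= height < heights[i], kerns[count] from the last
// height up. upper_bound counts the correction heights <= height.
int KernAt(const MathKern& mk, int height) {
  assert(mk.kerns.size() == mk.heights.size() + 1);
  const auto it =
      std::upper_bound(mk.heights.begin(), mk.heights.end(), height);
  return mk.kerns[static_cast<std::size_t>(it - mk.heights.begin())];
}
```

Suppose the correction heights are 100 and 300 and the kern values are -10, 20 and 5. Heights below 100 then get -10, heights in [100, 300) get 20, and heights from 300 up get 5.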

For the determination of the kern values, the spec's wording "Kern the default horizontal positions by the minimum of sums of those values at the correction heights for the base and for the sub/superscript." did not seem really clear to me, so I will go with what XeTeX does. However, for the "height of the bottom for the bounding box of superscript relative to the base glyph and the height of the top of the base relative to the super" I understand that it should be correct_height_bot = shift + depth(script) and correct_height_top = height(base) - shift http://sourceforge.net/p/xetex/code/ci/master/tree/source/texk/web2c/xetexdir/XeTeXOTMath.cpp#l518. In the MathML case, I believe "script" should be the first post-superscript and "shift" should be maxSupScriptShift, and we apply the kern to all the post-superscripts. The same goes for subscripts, and I wonder whether something similar should be done for prescripts too.
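
Here is a C++ sketch of one reading of the "minimum of sums" rule for a superscript. It is not the XeTeX code. It assumes heights grow upward and depths are counted positive downward, so correct_height_bot above becomes shift - depth(script). The base's TopRight kern is indexed from the base baseline and the superscript's BottomLeft kern from the superscript's own baseline. Each correction height is therefore converted between the two by subtracting shift. CornerKern, KernAt and SuperscriptKern are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One corner of a glyph's MathKernInfoRecord: ascending correction heights
// and heights.size() + 1 kern values, all in the same units.
struct CornerKern {
  std::vector<int> heights;
  std::vector<int> kerns;
};

// kerns[i] is used when heights[i - 1] <= h < heights[i].
int KernAt(const CornerKern& k, int h) {
  assert(k.kerns.size() == k.heights.size() + 1);
  const auto it = std::upper_bound(k.heights.begin(), k.heights.end(), h);
  return k.kerns[static_cast<std::size_t>(it - k.heights.begin())];
}

// Kern for a superscript raised by `shift` above the base baseline.
// Heights grow upward; scriptDepth is positive downward. baseTopRight is
// indexed from the base baseline, scriptBottomLeft from the superscript's
// own baseline, so (base coordinate - shift) is the script coordinate.
int SuperscriptKern(const CornerKern& baseTopRight, int baseHeight,
                    const CornerKern& scriptBottomLeft, int scriptDepth,
                    int shift) {
  // Correction height 1: bottom of the superscript (base coordinates).
  const int bottom = shift - scriptDepth;
  const int atBottom = KernAt(baseTopRight, bottom) +
                       KernAt(scriptBottomLeft, bottom - shift);
  // Correction height 2: top of the base (base coordinates).
  const int top = baseHeight;
  const int atTop = KernAt(baseTopRight, top) +
                    KernAt(scriptBottomLeft, top - shift);
  // Minimum of the sums at the two correction heights.
  return std::min(atBottom, atTop);
}
```

Take a base kern of -50 below height 400 and 0 above, and a script kern of -10 everywhere. A superscript raised by 450 with depth 100 over a base of height 700 then gets min(-50 - 10, 0 - 10) = -60.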

There is one additional issue to consider in our case. Since we rely on the CSS layout, we must determine the preferred width without necessarily knowing the vertical metrics. However, in the cases where we can determine the kerning, that is, a single glyph (perhaps also MathML tokens with multiple chars and additional dummy <mrow>'s or <mstyle>'s), it should be possible to get the vertical metrics; in other cases we treat the kern value as zero anyway. Otherwise, as said above, there are not many values to check and I guess the horizontal shifts are small, so at worst we could just compute the maximum kern.
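
Here is a sketch of the "maximum kern" fallback. It assumes the script is placed at the base advance plus the kern, so a larger kern can only widen the result. The kern is a minimum of sums of a base value and a script value. So max(base kerns) + max(script kerns) bounds it without knowing any correction height. MaxKern and KernUpperBound are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest value of one corner's kern table (heightCount + 1 >= 1 entries).
int MaxKern(const std::vector<int>& kernValues) {
  assert(!kernValues.empty());
  return *std::max_element(kernValues.begin(), kernValues.end());
}

// Upper bound on a "minimum of sums" kern (base corner + script corner)
// that needs no correction height: every sum is at most
// max(base kerns) + max(script kerns), so their minimum is too.
int KernUpperBound(const std::vector<int>& baseKerns,
                   const std::vector<int>& scriptKerns) {
  return MaxKern(baseKerns) + MaxKern(scriptKerns);
}
```

Using this bound for the preferred width can only overestimate it. Given how few kern values fonts such as Cambria Math seem to have, the overestimate should stay small.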
(In reply to Frédéric Wang (:fredw) from comment #2)
> The MATH spec does not mention the left italic correction (so perhaps we can just keep the current value)

I think it should just be dropped (for MATH fonts at least), since cut-ins can be used to control the placement of prescripts (and since TeX does not have a concept of left italic correction, I don’t think there are any other valid uses of it).

> In nsMathMLmunderoverFrame.cpp, (half of) the italic correction is used to
> shift the under/over scripts horizontally. This is always done except when
> an align=left/right attribute is used or for accentover/under scripts. The
> MATH spec only mentions that shift for large operators so I'm not sure this
> should be kept for arbitrary base.

If it is an over/underscript, then I think it should be shifted the same way (the underlying issue is the same: compensating for the slant of italic characters); if it is an accent, then the accent placement logic should be used instead.

> See also bug 963131 for the new concept
> of "Stretch Stacks" that has different constants for the vertical gaps and
> shifts. The munderover code also layouts in two steps: first it attaches the
> overscript on the base, then consider base+overscript as an "anonymous base"
> and finally it attaches the underscript on this "anonymous base". So for the
> overscript we use the italic correction of the base but for the underscript
> we use the italic correction of the "anonymous base"
> http://hg.mozilla.org/mozilla-central/annotate/8f4ecbf938cd/layout/mathml/
> nsMathMLmunderoverFrame.cpp#l563, which may be different if the overscript
> is wider than the base. This should be probably be changed if we want to
> follow the MATH spec.

I agree.

> I wonder if some
> workaround should be kept for STIX Word...

I think font bugs in STIX Math can be ignored for now as the font will be eventually fixed.

> Regarding the kern values, I'm trying to understand the MATH spec and the
> XeTeX code. For the selection of values, I don't understand at all how the
> XeTeX code can even work:

Neither do I, actually :) so try not to read too much into that code. It was hacked together by trial and error and I still can’t make sense of most of it.
See also https://sourceforge.net/p/stixfonts/tracking/35/ for STIX (I cannot use "See Also" because of bug 624522).
Microsoft has a patent for the algorithm to place scripts using MathKernInfoRecord:

http://www.google.com/patents/US7492366

So we should not implement this until this is clarified.
Severity: normal → S3