Open Bug 1931561 Opened 24 days ago Updated 9 days ago

Punctuation Spacing Broken Across Script-Run Boundaries

Categories

(Core :: Layout: Text and Fonts, defect, P3)

Firefox 132
defect

People

(Reporter: tjw123hh, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:132.0) Gecko/20100101 Firefox/132.0

Steps to reproduce:

Related HarfBuzz issue (includes the system details and the font used):
https://github.com/harfbuzz/harfbuzz/issues/4937

When the chws (Contextual Half-width Spacing) feature is enabled through fontconfig on Arch Linux, Firefox occasionally fails to apply the expected contextual spacing between certain punctuation marks.

For example:

  • In the string “silkworm”(意为“丝虫”), the spacing between the last two punctuation marks, ” and ), disappears.
  • However, in the string (“”), the spacing appears as expected.
  • Additionally, even for the same string, the punctuation spacing may behave differently across different renderings (e.g., after editing or rendering the string in different contexts).

According to this comment by @jfkthame (https://github.com/harfbuzz/harfbuzz/issues/4937#issuecomment-2478946748):

I think what you're seeing (in Firefox specifically; I haven't checked how other browsers behave) is an artifact of the script-run itemization process, so it's an issue with how Gecko handles the text rather than a harfbuzz bug.

In your example of “silkworm”(意为“丝虫”), the text begins with a Latin-script run that includes the English word (and its surrounding quote marks), and also includes the opening parenthesis (because that has Script=COMMON in Unicode, and so is considered to continue the current script run).

Then we encounter the ideographs, which create a Han-script run; this includes the quote marks around 丝虫, which are Script=COMMON and so resolve to the current script. But the closing parenthesis is assigned script=Latin because the script itemizer attempts to pair opening and closing characters and ensure they are assigned the same script.

This means that the last two characters, ”), end up assigned to different script runs, and so are shaped separately; and font features like chws do not work across script-run boundaries.
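The itemization behaviour described in the comment above can be sketched roughly as follows. This is a toy model, not Gecko's actual itemizer: `toy_script`, `resolve_scripts` and `script_runs` are hypothetical names, the script classification covers only ASCII letters and the basic CJK ideograph block instead of the real Unicode Script property, and only brackets (not quote marks) are paired.

```python
from itertools import groupby

def toy_script(ch):
    """Toy stand-in for the Unicode Script property (assumption, not real data)."""
    if ch.isascii() and ch.isalpha():
        return "Latin"
    if "\u4e00" <= ch <= "\u9fff":  # basic CJK Unified Ideographs block only
        return "Han"
    return "Common"

# Closing bracket -> matching opening bracket (simplified pairing table).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def resolve_scripts(text):
    """Assign each character a script, resolving Common characters by context."""
    # Leading Common characters take the first real script found in the text.
    current = next((toy_script(c) for c in text if toy_script(c) != "Common"),
                   "Common")
    stack = []      # (opening bracket, script it was resolved to)
    resolved = []
    for ch in text:
        script = toy_script(ch)
        if script == "Common":
            if ch in PAIRS and stack and stack[-1][0] == PAIRS[ch]:
                # A closing bracket inherits the script of its opener.
                script = stack.pop()[1]
            else:
                # Other Common characters continue the current run.
                script = current
                if ch in OPENERS:
                    stack.append((ch, script))
        current = script
        resolved.append(script)
    return resolved

def script_runs(text):
    """Split text into (script, substring) runs of equal resolved script."""
    runs, start = [], 0
    for script, group in groupby(resolve_scripts(text)):
        length = len(list(group))
        runs.append((script, text[start:start + length]))
        start += length
    return runs

for run in script_runs("“silkworm”(意为“丝虫”)"):
    print(run)
```

Under these assumptions the closing parenthesis ends up in its own Latin run, separate from the Han run that ends with ”, while a string such as (“”) stays in a single run.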

Actual results:

Contextual spacing between specific punctuation marks (such as ” and )) is missing in certain cases.

Expected results:

The identification of script-run boundaries should be improved so that the contextual spacing provided by the chws (Contextual Half-width Spacing) feature applies consistently to all relevant punctuation marks, without omissions or irregularities.

Jonathan, do you know if this is a known dupe? Triaging as S3 for now, but feel free to reclassify.

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jfkthame)
Priority: -- → P3

Not a dupe AFAIK; S3 seems fine.

(Though it's potentially a tricky issue, because in general OpenType doesn't play well with shaping across script boundaries, since substitution and positioning rules are organized by script in the font, and the shaper has to explicitly choose which script's rules it is applying to any given run of text. And it's not clear that there's a universally-"right" answer to how we should resolve ambiguous characters such as punctuation at run boundaries.)
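To illustrate why script-organized rules make run boundaries matter, here is a minimal sketch under stated assumptions. `TOY_FONT`, `shape_run` and `shape` are hypothetical names, the rule table only loosely mirrors how OpenType groups lookups under script tags, and the value -500 is an arbitrary placeholder rather than data from any real font.

```python
# Hypothetical toy "font": pair-spacing rules stored per script tag.
TOY_FONT = {
    "hani": {("”", ")"): -500},
    "latn": {("”", ")"): -500},
}

def shape_run(script_tag, text):
    """Return the pair adjustments that fire when `text` is shaped as one run."""
    rules = TOY_FONT.get(script_tag, {})
    return [(a + b, rules[(a, b)])
            for a, b in zip(text, text[1:])
            if (a, b) in rules]

def shape(runs):
    """Shape each (script_tag, text) run independently, as a shaper would."""
    fired = []
    for script_tag, text in runs:
        # No rule can see a character that belongs to a neighbouring run.
        fired.extend(shape_run(script_tag, text))
    return fired

# Both punctuation marks in one run: the pair rule applies.
print(shape([("hani", "丝虫”)")]))
# The same characters split across two runs: the rule never fires.
print(shape([("hani", "丝虫”"), ("latn", ")")]))
```

Even though both script tags carry the same rule in this toy font, the adjustment is lost once the two marks fall into different runs, because each run is shaped on its own.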

Flags: needinfo?(jfkthame)

Chromium can now even squeeze punctuation spacing across element boundaries. Could its approach be used as a reference?
