Open Bug 1711017 Opened 3 years ago Updated 3 years ago

Unicode Characters U+FFA0, U+3164, U+115F, U+1160 not showing in body text

Categories

(Core :: Layout: Text and Fonts, defect)

Firefox 88
defect

UNCONFIRMED

People

(Reporter: kazabana.tsukime, Unassigned)

Details

Attachments

(6 files)

Attached file test.html

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0

Steps to reproduce:

  1. Create an HTML file whose body tag contains the following string: "|ᅠ|<br />|ㅤ|<br />|ᅟ|<br />|ᅠ|"
  2. Open HTML file in Firefox.

(I have attached a minimal example "test.html" to reproduce this issue.)
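For anyone without access to the attachment, a page of the same shape can be generated with a short script. This is only a sketch, not the attached test.html: the file name `test_fillers.html`, the `build_test_page` helper, and the per-line labels are assumptions made for illustration. Writing the characters as escapes avoids relying on an editor that may silently drop or normalize them.

```python
from pathlib import Path

# The four filler code points named in this report.
FILLERS = {
    "U+FFA0": "\uFFA0",  # Halfwidth Hangul Filler
    "U+3164": "\u3164",  # Hangul Filler
    "U+115F": "\u115F",  # Hangul Choseong Filler
    "U+1160": "\u1160",  # Hangul Jungseong Filler
}


def build_test_page() -> str:
    """Return an HTML document with each filler placed between two '|' bars."""
    rows = "<br />\n".join(
        f"{label}: |{char}|" for label, char in FILLERS.items()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ko">\n'
        '<head><meta charset="utf-8"><title>Hangul filler test</title></head>\n'
        f"<body>\n{rows}\n</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical output name; open the resulting file in each browser.
    Path("test_fillers.html").write_text(build_test_page(), encoding="utf-8")
```

If a filler is rendered, the two bars on its line are visibly separated; if it is dropped, they appear adjacent.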

Actual results:

The following characters do not show up in Firefox 88 (on Windows 10).

  • U+FFA0 (Halfwidth Hangul Filler)
  • U+3164 (Hangul Filler)
  • U+115F (Hangul Choseong Filler)
  • U+1160 (Hangul Jungseong Filler)

  1. The issue arises with both Firefox 88 (on Windows 10) and Firefox Nightly 90.0a1 (Build #2015809803; on Android 9).
  2. The issue still arises even when using fonts that have non-empty glyphs for said characters (such as GNU Unifont).

Expected results:

The characters U+FFA0, U+3164, U+115F, U+1160 should show up.

(NOTE)
The characters showed up when using the following browsers:

  • Chrome 90.0.4430.212 (Official Build) (64-bit) on Windows 10
  • Microsoft Edge 90.0.4430.212 (Official Build) (64-bit) on Windows 10
  • Firefox Daylight 33.1 (4629) on iOS 14.4.2
  • Safari on iOS 14.4.2

So I expect showing these characters to be the default behavior.

A fix would be preferable, as there can be cases where a font with non-empty glyphs for these characters is used.

Attached image Screenshot (Firefox)
Attached image Screenshot (Chrome)

The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core

This happens because these characters have the Default_Ignorable_Code_Point property (see https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt). If the shaper or the font itself didn't eliminate them, the Gecko code filters them out of the results of shaping, to avoid potentially rendering .notdef tofu-boxes when using a font that doesn't fully handle them.
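As a rough illustration of the property check, here is a sketch in Python. It is not the Gecko code: Python's `unicodedata` module does not expose Default_Ignorable_Code_Point, so the table below is a small hand-copied subset of the ranges in DerivedCoreProperties.txt, and `is_default_ignorable` and `visible_after_filtering` are hypothetical helper names. The filtering here works on characters, whereas the real filtering described above is applied to shaped glyphs.

```python
import unicodedata

# Partial subset of Default_Ignorable_Code_Point ranges (assumption: copied by
# hand for illustration; the authoritative list is DerivedCoreProperties.txt).
DEFAULT_IGNORABLE_SUBSET = [
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x115F, 0x1160),  # HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
    (0x200B, 0x200F),  # ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
    (0x3164, 0x3164),  # HANGUL FILLER
    (0xFE00, 0xFE0F),  # VARIATION SELECTOR-1..VARIATION SELECTOR-16
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFA0, 0xFFA0),  # HALFWIDTH HANGUL FILLER
]


def is_default_ignorable(ch: str) -> bool:
    """Check one character against the partial table above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_SUBSET)


def visible_after_filtering(text: str) -> str:
    """Character-level analogy of dropping default-ignorables from a run."""
    return "".join(ch for ch in text if not is_default_ignorable(ch))


for ch in "\uFFA0\u3164\u115F\u1160":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"ignorable={is_default_ignorable(ch)}")
```

All four characters from this report fall inside the table, which matches the observation that "|" + filler + "|" ends up looking like "||".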

However, since we implemented that, the internal handling of default-ignorables in HarfBuzz has been enhanced, so I think we could try dropping that code from gfxHarfBuzzShaper::SetGlyphsFromRun and relying just on HB's own handling. I'll push a try job and see how it looks.

On a related note, not sure if you've already found the following behavior of Firefox while working with the Hangul Filler characters, but by trial-and-error I've found this, so I'm documenting it here just in case.

Currently in Firefox, the Jamo sequence <U+115F, U+1160> (<HCF, HJF>) does trigger .notdef tofu-boxes to be rendered anyway if a font not fully able to handle HCF/HJF happens to intercept them before a proper Korean font does. For example, with the following CSS property setting: font-family: 'Times New Roman', 'Noto Sans CJK KR';

This includes the case when they're part of a "partial" hangul syllable (such as <U+115F, U+1160, U+11AB>), in which .notdef shows up and also seems to inhibit proper hangul syllable formation.

This behavior is rare in practice, though it can be found in the wild. For example,

  • this page in Firefox 88.0.1 on Windows 10
  • this page in Firefox Nightly 90.0a1 on Android

Interestingly, viewing the first page on Android, or the second one on Windows 10, causes no problem, so there seem to be additional factors for this behavior to manifest itself.

There is a quick fix on the page owner's side: rearrange the fallback font list so that Korean fonts intercept HCF/HJF first (such as font-family: 'Noto Sans CJK KR', 'Times New Roman';). But this has its own consequences (such as having to use the CSS unicode-range descriptor if you don't want to use Latin characters from the Korean font), so a fix on Firefox's side would be preferable in my opinion.

As for other browsers, Chrome and Edge (on Windows 10), Firefox Daylight and Safari (on iOS 14) don't show this behavior.

I'll attach an example file and screenshots for this behavior below.

Strings rendered in different fonts. All strings include a Jamo sequence for a "partial" Hangul syllable, i.e. a syllable using the Hangul Choseong Filler (U+115F) and Hangul Jungseong Filler (U+1160).

This file shows how strings are rendered in the following cases:

  • with the base CJK font (Noto Sans CJK KR)
  • when there is some other font not supporting the Hangul script intercepting the characters U+115F and U+1160 before the base CJK font (Noto Sans CJK KR)

In particular, in cases other than using only the base CJK font,

  • Partial Hangul syllables are malformed.
  • The sequence <U+115F, U+1160> triggers rendering of the .notdef tofu-box glyph, including when appearing as a subsequence for another partial Hangul syllable (such as <U+115F, U+1160, U+11AB>).

Current rendering of Partial Hangul syllables in Firefox.

Partial Hangul syllables = Jamo sequences containing Hangul Choseong Filler (U+115F) and Hangul Jungseong Filler (U+1160)
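One way to see why these sequences depend on the font is to compare them with regular Jamo sequences under Unicode normalization. The sketch below uses Python's standard `unicodedata` module; the `describe` helper is a name made up for this example. A regular leading consonant, vowel, and trailing consonant compose into one precomposed syllable under NFC, while a sequence starting with the fillers has no precomposed form, so any syllable-block rendering for it has to come from the font and shaper.

```python
import unicodedata


def describe(seq: str) -> str:
    """Return the NFC form of seq as a space-separated list of code points."""
    composed = unicodedata.normalize("NFC", seq)
    return " ".join(f"U+{ord(c):04X}" for c in composed)


# Regular Jamo sequence: composes to a single precomposed syllable.
print(describe("\u1100\u1161\u11AB"))  # U+AC04

# Partial syllable with both fillers: NFC leaves it unchanged.
print(describe("\u115F\u1160\u11AB"))  # U+115F U+1160 U+11AB
```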

In each line from left to right

  • Description of characters in the string, (for example [HCF,U+1161])
  • (Correct rendering) Rendering using only the fallback CJK font (font-family: 'Noto Sans CJK KR';)
  • Rendering using one of the following non-Hangul-supporting fonts first, then the CJK font (font-family: [non-Hangul font], 'Noto Sans CJK KR';)
    • Unifont (has glyphs for Hangul but cannot form syllables; included for reference because of visible HCF/HJF glyphs)
    • Times New Roman
    • Arial
    • Ubuntu Mono
    • Iosevka
    • Inconsolata
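A comparison page with this layout could be generated along the following lines. This is a hypothetical sketch, not the attached file: the helper names, the table markup, and the choice of three sample sequences (taken from the line labels mentioned in this report) are assumptions. The fonts must be installed locally for the comparison to mean anything.

```python
from html import escape

BASE_FONT = "Noto Sans CJK KR"
LEADING_FONTS = [
    "Unifont", "Times New Roman", "Arial",
    "Ubuntu Mono", "Iosevka", "Inconsolata",
]

# Sample partial-syllable sequences (HCF = U+115F, HJF = U+1160).
SEQUENCES = {
    "[HCF,U+1161]": "\u115F\u1161",
    "[HCF,HJF]": "\u115F\u1160",
    "[HCF,HJF,U+11AB]": "\u115F\u1160\u11AB",
}


def font_stack(leading=None):
    """Build a font-family value, optionally with a font ahead of the CJK one."""
    names = [BASE_FONT] if leading is None else [leading, BASE_FONT]
    return ", ".join(f"'{name}'" for name in names)


def build_row(label, text):
    """One table row: label, CJK-only rendering, then one cell per leading font."""
    stacks = [font_stack()] + [font_stack(font) for font in LEADING_FONTS]
    cells = "".join(
        f'<td style="font-family: {stack};">{text}</td>' for stack in stacks
    )
    return f"<tr><th>{escape(label)}</th>{cells}</tr>"


def build_page():
    rows = "\n".join(build_row(label, text) for label, text in SEQUENCES.items())
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ko"><head><meta charset="utf-8"></head>\n'
        f"<body><table>\n{rows}\n</table></body></html>\n"
    )


if __name__ == "__main__":
    print(build_page())
```

The first cell of each row uses only the CJK font and serves as the reference rendering for the remaining cells.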

NOTES

  • Sometimes, when preceded by certain characters, the partial Hangul syllables are rendered correctly. See lines beginning with 가 (U+AC00), ㉠ (U+3260), and ㋿ (U+32FF).
    • Unfortunately, in other realistic cases such as the last two lines [U+0020... and [U+0027..., the Hangul syllables are malformed.
  • The lines [HCF, HJF] and [HCF,HJF,U+11ab] show that the sequence <U+115F, U+1160> triggers rendering of .notdef tofu-boxes. (Inconsolata's .notdef is a blank glyph.)

Rendering of partial Hangul syllables in Chrome. (Rendering behaves as expected.)

To follow up on comment 4 above, https://treeherder.mozilla.org/jobs?repo=try&revision=fefaf3ed5d4d1f1b6ffddf2221e0c6ed0712ff63 indicates that the "simple" change here does affect an existing testcase (from bug 1238243) where a lone Hangul filler character is expected to be ignored. So we'll need to be a bit cautious here to avoid regressing the current handling of default-ignorables.

Severity: -- → S3