Unicode Characters U+FFA0, U+3164, U+115F, U+1160 not showing in body text
Categories
(Core :: Layout: Text and Fonts, defect)
People
(Reporter: kazabana.tsukime, Unassigned)
Attachments
(6 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0
Steps to reproduce:
- Create an HTML file whose body tag contains the following string: "|ᅠ|<br />|ㅤ|<br />|ᅟ|<br />|ᅠ|"
- Open HTML file in Firefox.
(I have attached a minimal example "test.html" to reproduce this issue.)
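A minimal sketch for regenerating such a test page, in case the attachment is not at hand. It writes the four code points as escapes so they stay visible in the source; the function names and the output path are hypothetical, and the order of the characters is assumed to follow the list in this report.

```python
# Code points from this report, written as escapes so they remain visible.
FILLERS = {
    "U+FFA0": "\uffa0",  # Halfwidth Hangul Filler
    "U+3164": "\u3164",  # Hangul Filler
    "U+115F": "\u115f",  # Hangul Choseong Filler
    "U+1160": "\u1160",  # Hangul Jungseong Filler
}


def build_test_page() -> str:
    """Return an HTML page with each filler placed between two vertical bars."""
    body = "<br />".join(f"|{ch}|" for ch in FILLERS.values())
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Hangul filler test</title></head>\n'
        f"<body>{body}</body></html>\n"
    )


def write_test_page(path: str = "test.html") -> None:
    """Write the page as UTF-8; the default path is only an example."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_test_page())
```

Calling write_test_page() and opening the resulting file in the browser should show whether anything is drawn between each pair of bars.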
Actual results:
The following characters do not show up in Firefox 88 (on Windows 10).
- U+FFA0 (Halfwidth Hangul Filler)
- U+3164 (Hangul Filler)
- U+115F (Hangul Choseong Filler)
- U+1160 (Hangul Jungseong Filler)
- The issue arises with both Firefox 88 (on Windows 10) and Firefox Nightly 90.0a1 (Build #2015809803; on Android 9)
- The issue still arises even when using fonts that have non-empty glyphs for said characters (such as GNU Unifont).
Expected results:
The characters U+FFA0, U+3164, U+115F, U+1160 should show up.
(NOTE)
The characters showed up when using the following browsers:
- Chrome 90.0.4430.212 (Official Build) (64-bit) on Windows 10
- Microsoft Edge 90.0.4430.212 (Official Build) (64-bit) on Windows 10
- Firefox Daylight 33.1 (4629) on iOS 14.4.2
- Safari on iOS 14.4.2
So I expect showing these characters to be the default behavior.
A fix would be preferable, as there can be cases where a font with non-empty glyphs for these characters is used.
Reporter
Comment 1•3 years ago
Reporter
Comment 2•3 years ago
Comment 3•3 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 4•3 years ago
This happens because these characters have the Default_Ignorable_Code_Point property (see https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt), and the Gecko code filters them out of the results of shaping, if the shaper/font itself didn't eliminate them, to avoid potentially rendering .notdef tofu-boxes when using a font that doesn't fully handle them.
However, since we implemented that, the internal handling of default-ignorables in HarfBuzz has been enhanced, so I think we could try dropping that code from gfxHarfBuzzShaper::SetGlyphsFromRun and relying just on HB's own handling. I'll push a try job and see how it looks.
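To illustrate the filtering described above, here is a toy model in Python. It is not Gecko's code: the (code point, glyph id) pair layout, the function name, and the glyph ids are assumptions made for this sketch, and the set holds only the four fillers from this bug rather than the full Default_Ignorable_Code_Point list.

```python
# Toy model only: this is NOT the code in gfxHarfBuzzShaper::SetGlyphsFromRun.
# Hypothetical subset of Default_Ignorable_Code_Point: the four fillers in this bug.
DEFAULT_IGNORABLE_SUBSET = frozenset({0x115F, 0x1160, 0x3164, 0xFFA0})


def filter_default_ignorables(shaped_run):
    """Drop default-ignorable entries from a shaped run.

    shaped_run is a list of (code_point, glyph_id) pairs; this layout is an
    assumption for the sketch, not a real shaper data structure.
    """
    return [(cp, gid) for cp, gid in shaped_run if cp not in DEFAULT_IGNORABLE_SUBSET]


# "|", U+3164, "|" shaped with a font that does provide a glyph for U+3164.
run = [(0x7C, 95), (0x3164, 512), (0x7C, 95)]
print(filter_default_ignorables(run))  # prints [(124, 95), (124, 95)]
```

In this model the filler is dropped even though the font supplied a real glyph for it, which matches the symptom reported with fonts such as GNU Unifont.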
Reporter
Comment 5•3 years ago
On a related note, not sure if you've already found the following behavior of Firefox while working with the Hangul Filler characters, but by trial-and-error I've found this, so I'm documenting it here just in case.
Currently in Firefox, the Jamo sequence <U+115F, U+1160> (<HCF, HJF>) does trigger .notdef tofu-boxes to be rendered anyway if a font not fully able to handle HCF/HJF happens to intercept them before a proper Korean font does. For example, with the following CSS property setting: font-family: 'Times New Roman', 'Noto Sans CJK KR';
This includes the case when they're part of a "partial" hangul syllable (such as <U+115F, U+1160, U+11AB>), in which .notdef shows up and also seems to inhibit proper hangul syllable formation.
This behavior is rare in practice, though it can be found in the wild. For example,
Interestingly, viewing the first page on Android, or the second one on Windows 10, causes no problem, so there seem to be additional factors needed for this behavior to manifest itself.
There is a quick fix on the page owner's side: rearrange the fallback font list so that Korean fonts intercept HCF/HJF first (such as font-family: 'Noto Sans CJK KR', 'Times New Roman';). But this does have its own consequences (such as having to use the CSS descriptor unicode-range if you don't want to use Latin characters from the Korean font), so a fix on Firefox's side would be preferable in my opinion.
As for other browsers, Chrome and Edge (on Windows 10), Firefox Daylight and Safari (on iOS 14) don't show this behavior.
I'll attach an example file and screenshots for this behavior below.
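As a rough sketch of the page-side workaround, the snippet below builds a stylesheet that restricts an alias of the Korean font to Hangul ranges via unicode-range and places it first in the font list. The alias name 'Hangul Only', the helper names, and the choice of ranges are assumptions for illustration; whether this avoids the tofu-boxes in every case has not been verified here.

```python
# Unicode blocks that hold Hangul jamo and syllables (inclusive ranges).
HANGUL_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo (includes U+115F and U+1160)
    (0x3130, 0x318F),  # Hangul Compatibility Jamo (includes U+3164)
    (0xA960, 0xA97F),  # Hangul Jamo Extended-A
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
    (0xFFA0, 0xFFDC),  # Halfwidth Hangul variants (includes U+FFA0)
]


def in_hangul_ranges(code_point: int) -> bool:
    """Return True if the code point falls in one of the ranges above."""
    return any(lo <= code_point <= hi for lo, hi in HANGUL_RANGES)


def build_css(alias: str = "Hangul Only") -> str:
    """Build CSS that puts a Hangul-only alias of the Korean font first.

    The alias is a hypothetical name; 'Noto Sans CJK KR' and
    'Times New Roman' are the fonts mentioned in this comment.
    """
    ranges = ", ".join(f"U+{lo:04X}-{hi:04X}" for lo, hi in HANGUL_RANGES)
    return (
        "@font-face {\n"
        f"  font-family: '{alias}';\n"
        "  src: local('Noto Sans CJK KR');\n"
        f"  unicode-range: {ranges};\n"
        "}\n"
        "body {\n"
        f"  font-family: '{alias}', 'Times New Roman', 'Noto Sans CJK KR';\n"
        "}\n"
    )


print(build_css())
```

With such a stylesheet, Latin text would still come from Times New Roman, while the fillers and other Hangul code points would be matched by the restricted alias first.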
Reporter
Comment 6•3 years ago
Strings rendered in different fonts. All strings include a Jamo sequence for a "partial" Hangul syllable, i.e. a syllable using the Hangul Choseong Filler (U+115F) and Hangul Jungseong Filler (U+1160).
This file shows how strings are rendered in the following cases:
- with the base CJK font (Noto Sans CJK KR)
- when there is some other font not supporting the Hangul script intercepting the characters U+115F and U+1160 before the base CJK font (Noto Sans CJK KR)
In particular, in cases other than using only the base CJK font,
- Partial Hangul syllables are malformed.
- The sequence <U+115F, U+1160> triggers rendering of the .notdef tofu-box glyph, including when appearing as a subsequence of another partial Hangul syllable (such as <U+115F, U+1160, U+11AB>).
Reporter
Comment 7•3 years ago
Current rendering of Partial Hangul syllables in Firefox.
Partial Hangul syllables = Jamo sequences containing Hangul Choseong Filler (U+115F) and Hangul Jungseong Filler (U+1160)
In each line, from left to right:
- Description of characters in the string (for example [HCF,U+1161])
- (Correct rendering) Rendering using only the fallback CJK font (font-family: 'Noto Sans CJK KR';)
- Rendering using one of the following non-Hangul-supporting fonts first, then the CJK font (font-family: [non-Hangul font], 'Noto Sans CJK KR';)
  - Unifont (has glyphs for Hangul but cannot form syllables; included for reference because of visible HCF/HJF glyphs)
  - Times New Roman
  - Arial
  - Ubuntu Mono
  - Iosevka
  - Inconsolata
NOTES
- Sometimes, when preceded by certain characters, the partial Hangul syllables are rendered correctly. See lines beginning with 가 (U+AC00), ㉠ (U+3260), and ㋿ (U+32FF).
  - Unfortunately, in other realistic cases such as the last two lines [U+0020... and [U+0027..., the Hangul syllables are malformed.
- The lines [HCF, HJF] and [HCF,HJF,U+11ab] show that the sequence <U+115F, U+1160> triggers rendering of .notdef tofu-boxes. (Inconsolata's .notdef is a blank glyph.)
Reporter
Comment 8•3 years ago
Rendering of partial Hangul syllables in Chrome. (Rendering behaves as expected.)
Comment 9•3 years ago
To follow up on comment 4 above, https://treeherder.mozilla.org/jobs?repo=try&revision=fefaf3ed5d4d1f1b6ffddf2221e0c6ed0712ff63 indicates that the "simple" change here does affect an existing testcase (from bug 1238243) where a lone Hangul filler character is expected to be ignored. So we'll need to be a bit cautious here to avoid regressing the current handling of default-ignorables.