Open Bug 1751731 Opened 2 years ago Updated 2 years ago

Browsers don't hyphenate Mongolian text

Categories

(Core :: Layout: Text and Fonts, enhancement)

Firefox 96
enhancement

UNCONFIRMED

People

(Reporter: ishida, Unassigned)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:96.0) Gecko/20100101 Firefox/96.0

Steps to reproduce:

Hyphenation occurs in writing Mongolian and Todo. U+1806 MONGOLIAN TODO SOFT HYPHEN is used to indicate resumption of a broken word. It functions like U+2010 HYPHEN, except that it appears at the beginning of a line rather than at the end. (Note that lines of Mongolian text are vertical, and progress from left to right.)
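For reference, the characters involved can be inspected with Python's standard `unicodedata` module. This is only an illustrative sketch; the `describe` helper is a hypothetical name and is not part of any browser code. U+00AD SOFT HYPHEN (the SHY used in the manual-hyphenation test) is included for comparison.

```python
import unicodedata

# Characters relevant to this report.
CHARS = {
    "manual break opportunity (SHY)": "\u00ad",
    "usual visible hyphen": "\u2010",
    "Mongolian/Todo hyphen": "\u1806",
}


def describe(ch: str) -> str:
    """Return the code point and Unicode character name for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in CHARS.items():
    print(f"{label}: {describe(ch)}")
```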

Specs:
CSSWG issue (open): Better describe the likely outcomes of hyphenation.
https://github.com/w3c/csswg-drafts/issues/5973

css-text-3: Describes how to apply hyphenation. It makes no special mention of Mongolian, nor of which character to use and where.
https://drafts.csswg.org/css-text-3/#hyphenation

css-text-4: Has a hyphenate-character property, which will allow users to specify the character to use for hyphenation, but it doesn't allow control of the location of the character.
https://drafts.csswg.org/css-text-4/#hyphenation

Actual results:

Note: WebKit is unable to display traditional Mongolian script.

Interactive test, Mongolian text is hyphenated when hyphens:auto is set
https://github.com/w3c/line_paragraph_tests/issues/64
Gecko: ❌ Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:94.0) Gecko/20100101 Firefox/94.0
Blink: ❌ Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
WebKit: ❌ Safari doesn't display correctly. Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15

Interactive test, Mongolian adds a hyphen to the start of the second line when a word is manually hyphenated with SHY
https://github.com/w3c/line_paragraph_tests/issues/65
Gecko: ? Produces a vertical baseline extension at the bottom of the first line. It is not clear whether this is just part of the cursive glyph or a hyphen. Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:94.0) Gecko/20100101 Firefox/94.0
Blink: ❌ Produces a horizontal hyphen at the bottom of the first line. Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
WebKit: ❌ Unable to correctly display traditional Mongolian script. Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15

i18n test suite, CSS3 Text, hyphens
https://www.w3.org/International/i18n-tests/results/hyphens#auto
General tests for hyphens support. (Results may need updating.)

Expected results:

Please enable hyphenation for Traditional Mongolian text, and show the correct hyphenation character at the start of the second line.

More generally, Gecko has a bug whereby hyphens don't display properly in vertical writing mode; see bug 1751738.

For traditional Mongolian script, we don't currently have hyphenation rules, so hyphens:auto isn't expected to work. (It should work for Mongolian written in Cyrillic script. I don't know how directly the break positions would map between the two writing systems; maybe traditional-Mongolian patterns could be generated by a conversion process from the Cyrillic ones?)

Depends on: 1751738
Depends on: 1751840
No longer depends on: 1751840

Although the hyphens property doesn't work, my understanding is that the choice of hyphen and position should also be invoked by use of a SHY, so that may be a relevant area to look at.

Wrt generating break points, I'm not sure, although (despite the complexity of traditional Mongolian shaping) the mapping of sounds to characters appears to be relatively straightforward in both cases, so it may work. I put out a question on our Mongolian network; see https://github.com/w3c/mlreq/issues/43. They don't always reply, but it's worth asking.

Right -- the rendering of the soft-hyphen should be the same whether it comes from a manual soft hyphen (U+00AD) in the content or an auto-hyphenation rule. We don't currently have locale-dependent support for this, so if hyphenation worked at all, it would just give you the default rendering of a standard hyphen at the end of the line before the break.

(The newly-implemented hyphenate-character property would let you change what character(s) will be rendered for the hyphen, but doesn't support adding a glyph after the break; this may be a possible future enhancement.)

So there's definitely work needed in order to fully support Mongolian here.

To confirm, when (traditional) Mongolian is hyphenated, does it only use the U+1806 character at the beginning of the line after the break, and no hyphen character of any kind at the end of the line before the break? (Or are there visible indicators of hyphenation both before and after the break?)

Yes, I believe that the hyphen only occurs after the break, and not before (unlike, say, Chinese). I checked in various places, and all said the same, as does The Unicode Standard v14 on page 545.
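To make the two behaviours discussed here concrete, the following is a minimal sketch, not Gecko code. It assumes the break is taken at a manual U+00AD in the content and that a hypothetical `mongolian` flag stands in for whatever locale or script detection an engine would really use. The function name and parameters are made up for illustration.

```python
from typing import Tuple

SOFT_HYPHEN = "\u00ad"                 # invisible break opportunity (SHY)
HYPHEN = "\u2010"                      # default visible hyphen
MONGOLIAN_TODO_SOFT_HYPHEN = "\u1806"  # shown at the start of the next line


def break_at_soft_hyphen(word: str, opportunity: int, mongolian: bool) -> Tuple[str, str]:
    """Break `word` at its n-th soft hyphen (0-based) and return both lines.

    Default rendering: a hyphen is appended to the line before the break.
    Mongolian rendering (assumed here): nothing before the break, and
    U+1806 is prepended to the line after the break.
    """
    parts = word.split(SOFT_HYPHEN)
    if not 0 <= opportunity < len(parts) - 1:
        raise ValueError("no such break opportunity")
    # Unused soft hyphens stay invisible, so they are simply dropped.
    before = "".join(parts[: opportunity + 1])
    after = "".join(parts[opportunity + 1:])
    if mongolian:
        return before, MONGOLIAN_TODO_SOFT_HYPHEN + after
    return before + HYPHEN, after
```

With a Latin placeholder word, `break_at_soft_hyphen("hy\u00adphen\u00adate", 0, False)` puts U+2010 at the end of the first line, while passing `True` for `mongolian` leaves the first line bare and starts the second line with U+1806.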
