Uighur hyphenation should be supported
Categories: Core :: Layout: Text and Fonts, enhancement
People: Reporter: ishida, Unassigned
Attachments: 1 file (86.05 KB, image/png)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0
Steps to reproduce:
Unlike Arabic, which is never hyphenated, words in Uighur text in the Arabic script can be broken at line ends. A short horizontal stroke is added at the end of the line, separated from the previous text by a small space, and joining forms are retained for left-joining letters at line end and line start.
See the attached illustration.
This hyphenation method needs to be supported in browsers.
Specs:
CSS Text Level 3 provides controls for hyphenation and alludes to the requirement to create joining letter forms at line end and start for Arabic-script text where hyphenation is allowed, but it leaves it to the browser implementation to produce the specific type of hyphenation that is appropriate to a given language.
Actual results:
Tests & results:
The following tests use the second half of the text in the image shown above.
Interactive test: hyphens:auto makes the browser hyphenate Uighur text, using a low stroke at the line end and joining forms at line end and start.
Results:
Gecko: ❌ No hyphenation occurs *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:98.0) Gecko/20100101 Firefox/98.0*
Blink: ❌ No hyphenation occurs *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36*
WebKit: ❌ No hyphenation occurs *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15*
Interactive test: after setting hyphens:manual, the browser hyphenates Uighur text where soft hyphens occur. Hyphenation is shown by a low stroke at the line end, slightly separated from the preceding text, and joining forms at line end and start.
Results:
Gecko: ✅❌ The lines break and the line-end and line-start letters have joining forms, but the marker used is an ordinary hyphen and not on the baseline. *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:98.0) Gecko/20100101 Firefox/98.0*
Blink: ✅❌ The lines break and the line-end and line-start letters have joining forms, but the marker used is an ordinary hyphen and not on the baseline. *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36*
WebKit: ✅❌ The lines break, but the line-end and line-start letters don't have joining forms, and the marker used is an ordinary hyphen and not on the baseline. *Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15*
Expected results:
Priority:
Uighur hyphenation is common in printed material, so it should also work on the Web. At a minimum, manual hyphenation should use the appropriate character and placement.
Comment 1 (Reporter) • 2 years ago
Argh. Forgot to add the links to the tests. (I wish these forms would accept markdown or html.)
Interactive test: after setting hyphens:manual, the browser hyphenates Uighur text where soft hyphens occur. Hyphenation is shown by a low stroke at the line end, slightly separated from the preceding text, and joining forms at line end and start.
Comment 2 (Reporter) • 2 years ago
This bug report is being tracked at the W3C at https://www.w3.org/TR/arab-ug-gap/#issue250_hyphenation
Comment 3 • 2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 4 • 2 years ago
Arguably, a font designed for Uighur could/should provide a hyphen glyph that is shaped/positioned appropriately.
(My impression from some printed examples of Uighur that I have seen is that in at least some cases, what's happened is that the author or printer has "repurposed" a kashida glyph (U+0640) to use as a hyphen. It's unclear to me whether having the hyphen aligned with the flat baseline of the script is specifically desired, or if it is merely an artifact of this being the only small horizontal-line glyph that printers had on hand.)
If an author does specifically want to use the U+0640 glyph as a hyphen in Uighur, this could be done with something like
hyphenate-character: '\2005\0640\2005';
which adds a little space around the glyph, as it would otherwise butt up against or even slightly overlap adjacent glyphs. Adding this rule to your hyphens: manual example seems to work OK in Firefox.
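A minimal sketch of how that rule might sit alongside hyphens: manual in a stylesheet. The class name .ug-text is hypothetical and not taken from the test page; the hyphenate-character value is the one quoted above (U+2005 FOUR-PER-EM SPACE on either side of U+0640 ARABIC TATWEEL).

```css
/* Hypothetical class name; apply it to the element holding the Uighur text. */
.ug-text {
  /* Break only at soft hyphens (U+00AD) that the author has inserted. */
  hyphens: manual;
  /* Tatweel (U+0640) padded with four-per-em spaces (U+2005), as suggested above. */
  hyphenate-character: '\2005\0640\2005';
}
```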
For hyphens: auto, a set of hyphenation rules/patterns for Uighur would be needed; I am not aware of any that are currently available.
Comment 5 (Reporter) • 2 years ago
Ah, yes. I meant to mention that. It does seem that the appropriate character would be a tatweel. Mamoun Sakkal thought that was the case also. Fwiw, according to Anshu tatweel would also be appropriate for Hanifi Rohingya hyphenation.
I'll put another test together. But it would be nice if the author didn't have to set it up manually – or is that likely to become the recommended way forward as we move into the long tail?
Yes, I'm also not aware of any hyphenation rules/patterns. I'll ask around.
Comment 6 • 2 years ago
(In reply to Richard Ishida from comment #5)
> I'll put another test together. But it would be nice if the author didn't have to set it up manually – or is that likely to become the recommended way forward as we move into the long tail?
If it's clear that it should always be a tatweel/kashida character (what about spacing?), then in principle browsers could do this automatically (for the initial value auto of hyphenate-character) in the presence of lang=ug (which will also be required for any auto-hyphenation to happen, of course).
Anyone using manual &shy; breakpoints in content that may not be properly language-tagged could still set the char (or string) manually. Though an author who is in a position to set hyphenate-character may also be in a position to set a lang tag....
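As an author-side approximation of the automatic behaviour discussed here, the same string could be keyed on the language tag with the :lang() selector, so that it applies wherever content is marked lang=ug. This is only a sketch; it assumes the tatweel-plus-space string from comment 4 is the desired marker, which the thread has not settled.

```css
/* Matches any element whose content language is Uighur (e.g. lang="ug"). */
:lang(ug) {
  /* Assumed marker string from comment 4; the spacing question is still open. */
  hyphenate-character: '\2005\0640\2005';
}
```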