Open Bug 1913415 Opened 1 month ago Updated 25 days ago

Text wrapping fails in various SE Asian scripts

Categories

(Core :: Layout: Text and Fonts, defect)

Firefox 128
UNCONFIRMED

People

(Reporter: ishida, Unassigned)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0

Steps to reproduce:

Words are not separated by spaces in Javanese orthography. Javanese is also one of a small number of scripts in which the initial consonant of a word may be subjoined below the final consonant of the preceding word. Because these stacked consonants cannot be split, segmentation for line-breaking and similar purposes uses the orthographic syllable as its unit, where an orthographic syllable is a character or stack of characters together with all associated combining marks.

Unlike Thai, which uses dictionary lookup to wrap word-by-word, the basic break points in Javanese can be calculated using a grammar for syllables. (There are likely to be additional considerations to check related to punctuation, digits, etc.)
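To make the idea of a syllable grammar concrete, here is a rough sketch in Python. It is not the UAX #14 rule set and not what any browser or ICU actually does. It only assumes that a unit starts with a Javanese letter (U+A984..U+A9B2), absorbs any following combining signs, and keeps going when a pangkon (U+A9C0) is followed by another letter, since that letter is then subjoined. Punctuation, digits and other scripts are passed through one character at a time. The names SYLLABLE and split_syllables are made up for this illustration.

```python
import re

# Simplified grammar for one Javanese orthographic syllable (an assumption
# for illustration, not the full Unicode line breaking rules).
SYLLABLE = re.compile(
    "[\uA984-\uA9B2]"                    # base letter
    "(?:"
    "\uA9C0[\uA984-\uA9B2]"              # pangkon + letter: subjoined consonant
    "|[\uA980-\uA983\uA9B3-\uA9BF]"      # other combining signs and vowel signs
    ")*"
    "\uA9C0?"                            # a trailing pangkon stays with its base
    "|.",                                # anything else: one character per unit
    re.DOTALL,
)


def split_syllables(text):
    """Return the list of units; break opportunities fall between them."""
    return SYLLABLE.findall(text)


# ha, na, then ka + pangkon + ka + suku (a stack that must not be split)
sample = "\uA9B2\uA9A4\uA98F\uA9C0\uA98F\uA9B8"
units = split_syllables(sample)
print(len(units))  # 3
```

In this sketch the stacked consonant stays in the same unit as the consonant above it, which is the property that distinguishes this kind of segmentation from breaking between arbitrary characters.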

See this discussion for examples. https://github.com/w3c/sealreq/issues/2

It is possible to fudge things, using CSS properties, so that the text wraps, but the resulting line breaks are not always correct. It is also possible to make the breaking happen by inserting ZWSP at appropriate places, but we cannot expect Javanese users to do that accurately and consistently.
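For reference, the ZWSP workaround amounts to something like the following Python sketch. It assumes the text has already been divided into orthographic syllables by hand or by some tool, so the result is only as good as that segmentation. The function names insert_zwsp and strip_zwsp are hypothetical.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: invisible, but allows a line break


def insert_zwsp(syllables):
    """Join pre-segmented syllables with ZWSP so a break is allowed between them."""
    return ZWSP.join(syllables)


def strip_zwsp(text):
    """Remove the ZWSP characters again, e.g. before searching or comparing."""
    return text.replace(ZWSP, "")


# Three syllables; the last one is a consonant stack that must stay together.
syllables = ["\uA9B2", "\uA9A4", "\uA98F\uA9C0\uA98F\uA9B8"]
marked = insert_zwsp(syllables)
```

Note that the ZWSP characters become part of the stored text, so searching and string comparison are affected unless they are stripped again.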

More:

script resources https://www.w3.org/TR/java-lreq/#line_breaking

Besides Javanese, this issue applies to text written in the following scripts: Balinese, Batak, Brahmi, (Eastern) Cham, Dives Akuru, Grantha, Gurung Khema, Kawi, Makasar, and Tulu Tigalari.

Actual results:

Gecko doesn't wrap the text at all in rendered HTML. In a textarea element, however, lines ARE broken at orthographic syllable boundaries.

This problem has recently been fixed for Blink and WebKit browsers.

Interactive test (text should wrap to the next line at the line end):
https://github.com/w3c/line_paragraph_tests/issues/11

Expected results:

Line breaking at orthographic syllable boundaries for Javanese and several other scripts was specified in the Unicode Line Breaking Algorithm in Unicode 15.1 based on L2/22-080R2. ICU has been updated and has started to roll out to browsers and platforms.

Both Blink and WebKit browsers now wrap Javanese text at line ends. Gecko browsers fail to wrap the line in rendered HTML, although the wrapping occurs inside a textarea control.

This bug is being tracked by the W3C at https://github.com/w3c/sealreq/issues/40

The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core

The textarea is able to wrap because it has overflow-wrap: break-word. We should look into what the appropriate line-breaker options ought to be for the more general case.

Strongly related to bug 1897472.

Severity: -- → S3