Created attachment 685081 [details]
ethi-line-brk.html

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0
Build ID: 20121120042814

Steps to reproduce:

View a page with text in Ethiopic script whose words are separated by U+1361 ETHIOPIC WORDSPACE, e.g. the attachment, or http://unicode.org/udhr/d/udhr_amh.html

Actual results:

Lines whose words are separated by this character are not broken; they simply run off the edge of the window.

Expected results:

Lines should break at U+1361, and also at these Ethiopic characters:

U+1360 section mark
U+1362 full stop
U+1363 comma
U+1364 semicolon
U+1365 colon
U+1366 preface colon
U+1367 question mark
U+1368 paragraph separator

(It's not clear to me whether the paragraph separator should always start a new line. In the page above, a newline follows each instance of that character, so maybe it shouldn't.)

Note that other Unicode script ranges also contain line-breaking punctuation: consult UnicodeData.txt.

Also note: Gedit on the same system (Ubuntu) does break these lines properly.
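The code points in the report can be checked against the Unicode Character Database. Below is a minimal sketch using Python's standard `unicodedata` module; the helper names `describe` and `ethiopic_punctuation_table` are made up for this illustration. Note that the paragraph separator is U+1368, while U+1369 is ETHIOPIC DIGIT ONE.

```python
import unicodedata

# Ethiopic punctuation block discussed in the report: U+1360 through U+1368.
ETHIOPIC_PUNCTUATION = range(0x1360, 0x1369)


def describe(code_point):
    """Return 'U+XXXX NAME (General_Category)' for one code point."""
    ch = chr(code_point)
    return "U+{:04X} {} ({})".format(
        code_point, unicodedata.name(ch), unicodedata.category(ch)
    )


def ethiopic_punctuation_table():
    """Return one description line per Ethiopic punctuation character."""
    return [describe(cp) for cp in ETHIOPIC_PUNCTUATION]


if __name__ == "__main__":
    for row in ethiopic_punctuation_table():
        print(row)
```

The names come from whichever Unicode version the local Python build bundles. The module does not expose the UAX #14 line-breaking class, so this only confirms character identity, not break behaviour.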
Created attachment 8902492 [details] [diff] [review]
fix

Not sure if this is the right fix, but it seems to work. (I'm also moving the check for OGHAM SPACE MARK after the checks for EM SPACE etc., since I suspect it's less common.)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5149a65969edf31908541145877579f66f2f0032
Comment on attachment 8902492 [details] [diff] [review]
fix

Review of attachment 8902492 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

(FWIW, it's less clear to me whether we should do anything further to handle the other characters listed in comment 0. In the Amharic UDHR document, for example, any occurrences of these seem to be followed by either a newline or an Ethiopic wordspace, which provides the desired line-break. So let's do this, and wait for a better understanding before considering any followup that may be appropriate.)
FWIW, none of the other characters are a line-break opportunity in Chrome.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/073963897752 Make unicode ETHIOPIC WORDSPACE count as a space character. r=jfkthame
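For illustration only, and not the code in the attached patch: a minimal sketch of how counting U+1361 as a break opportunity changes wrapping. The names `break_opportunities` and `wrap` are hypothetical, the width is counted in characters rather than measured glyph advances, and placing the break after the wordspace (so it stays at the end of the line) is an assumption of this sketch.

```python
ETHIOPIC_WORDSPACE = "\u1361"

# Assumption for this sketch: a line may be broken after any of these characters.
BREAK_AFTER = {" ", ETHIOPIC_WORDSPACE}


def break_opportunities(text):
    """Return the indices at which a new line may start."""
    return [
        i + 1
        for i, ch in enumerate(text)
        if ch in BREAK_AFTER and i + 1 < len(text)
    ]


def wrap(text, width):
    """Greedily split text into lines of at most `width` characters where possible."""
    points = break_opportunities(text)
    lines = []
    start = 0
    while len(text) - start > width:
        # Prefer the last break opportunity that still fits on this line.
        fitting = [p for p in points if start < p <= start + width]
        if fitting:
            cut = fitting[-1]
        else:
            # Nothing fits: overflow up to the next opportunity, or to the end.
            later = [p for p in points if p > start + width]
            cut = later[0] if later else len(text)
        lines.append(text[start:cut])
        start = cut
    if start < len(text):
        lines.append(text[start:])
    return lines


if __name__ == "__main__":
    word = "\u1230\u120b\u121d"
    sample = ETHIOPIC_WORDSPACE.join([word, word, word])
    for line in wrap(sample, 5):
        print(line)
```

When the text contains no break opportunity at all, `wrap` returns it as a single overlong line, which mirrors the overflow described in comment 0.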
(In reply to Mats Palmgren (:mats) from comment #3)
> FWIW, none of the other characters are a line-break opportunity in Chrome.

Fair enough. Thanks for checking!
I concur about the other punctuation marks I listed -- I just got carried away. Only U+1361 is listed on http://www.unicode.org/reports/tr14/tr14-39.html as a "line break opportunity". In Ge'ez text I've found, the other punctuation is always followed by a space. Could we see an image or PDF showing how the fix handles the line in the example text?
It's what you'd expect :-) The fix should be available in Nightly in a few days so you can verify. http://nightly.mozilla.org/
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2ab09319c214 Make unicode ETHIOPIC WORDSPACE count as a space character: remove test failure expecation. r=wpt-expectation-update