Bug 752459 - word-wrap:break-word should not break up base+diacritic clusters or surrogate pairs
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Layout: Text
Version: unspecified
Platform: x86 Mac OS X
Importance: -- normal
Target Milestone: mozilla15
Assigned To: Jonathan Kew (:jfkthame)
Reported: 2012-05-07 05:05 PDT by Jonathan Kew (:jfkthame)
Modified: 2012-05-18 18:17 PDT
ryanvm: in‑testsuite+

Attachments
patch, respect clusters when doing word-wrap:break-word (1.31 KB, patch)
2012-05-07 05:39 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
add a reftest for word-wrap cluster support (2.17 KB, patch)
2012-05-07 05:40 PDT, Jonathan Kew (:jfkthame)
smontagu: review+

Description Jonathan Kew (:jfkthame) 2012-05-07 05:05:34 PDT
The word-wrap:break-word property (being renamed to overflow-wrap in CSS3 Text) inappropriately causes breaks between a base character and its applied diacritics, and (even worse) breaks between the two code units of a surrogate pair.

Testcase:
data:text/html;charset=utf-8,
  <div style="width:0px;word-wrap:break-word">abc d̥e̕f̣  
Comment 1 Jonathan Kew (:jfkthame) 2012-05-07 05:12:53 PDT
Argh, I forgot Bugzilla would truncate my text as soon as it encountered a surrogate codepoint.

Trying again....

data:text/html;charset=utf-8,
  <div style="width:0px;word-wrap:break-word">abc d̥e̕f̣ %F0%90%90%80%F0%90%90%81%F0%90%90%82

Note the "blank" lines between the Deseret letters, because internally we split the low surrogate onto a separate line. (The fact that they're split across lines can be demonstrated by drag-selecting one of the Deseret glyphs (or hexboxes), *or* one of the invisible blanks in between them, and copy-pasting it into the search box - you get just the single, unpaired surrogate codepoint.)
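
To make the surrogate split concrete, the following Python sketch decodes the percent-encoded tail of the testcase and lists its UTF-16 code units. It is only an illustration: the helper name utf16_units is made up here and nothing in it comes from Gecko. Each Deseret letter occupies two code units, so a line break placed between them leaves an unpaired surrogate on each side.

```python
from urllib.parse import unquote

# Percent-encoded tail of the testcase: UTF-8 bytes of three Deseret letters.
encoded = "%F0%90%90%80%F0%90%90%81%F0%90%90%82"
text = unquote(encoded)  # unquote() decodes the bytes as UTF-8 by default


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


code_points = [f"U+{ord(c):04X}" for c in text]
units = utf16_units(text)

print(code_points)                   # three supplementary-plane code points
print([f"{u:04X}" for u in units])   # six code units: three high/low pairs

# A break after an odd number of code units separates a high surrogate
# (D800-DBFF) from its low surrogate (DC00-DFFF).
first_line, second_line = units[:1], units[1:]
```

Selecting either half of such a split pair is what yields the single unpaired surrogate described above.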
Comment 2 Jonathan Kew (:jfkthame) 2012-05-07 05:39:29 PDT
Created attachment 621571
patch, respect clusters when doing word-wrap:break-word

It looks like this can be trivially fixed, actually - we just need to check whether the current character is a cluster start when considering a possible breakpoint for word-wrap:break-word.
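
The idea can be sketched in Python as follows. This is not the attached patch, whose contents are not reproduced on this page; is_cluster_start and break_word_offsets are hypothetical names, and the cluster test is a deliberate simplification (low surrogates and BMP combining marks only) rather than full grapheme-cluster segmentation.

```python
import unicodedata


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def is_cluster_start(units, i):
    """Simplified stand-in for a cluster-start check; not Gecko's API."""
    u = units[i]
    if 0xDC00 <= u <= 0xDFFF:
        return False  # low surrogate: second half of a pair
    if 0xD800 <= u <= 0xDBFF:
        # Assumption: treat every supplementary-plane character as a base
        # character (supplementary combining marks are ignored here).
        return True
    # BMP character: a nonzero combining class means it attaches to the
    # preceding base character.
    return unicodedata.combining(chr(u)) == 0


def break_word_offsets(units):
    """Offsets (in UTF-16 code units) where a break-word break is allowed."""
    return [i for i in range(1, len(units)) if is_cluster_start(units, i)]


# Example input chosen for this sketch: three base+diacritic clusters
# followed by two Deseret letters.
sample = "d\u0325e\u0315f\u0323\U00010400\U00010401"
offsets = break_word_offsets(utf16_units(sample))
print(offsets)
```

Without the cluster-start test, every offset from 1 to 9 would be a candidate break; with it, only the offsets that begin a new base character or surrogate pair remain.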
Comment 3 Jonathan Kew (:jfkthame) 2012-05-07 05:40:09 PDT
Created attachment 621572
add a reftest for word-wrap cluster support
