Closed Bug 752459 Opened 9 years ago Closed 9 years ago
word-wrap:break-word should not break up base+diacritic clusters or surrogate pairs
The word-wrap:break-word property (being renamed to overflow-wrap in CSS3 Text) incorrectly allows line breaks between a base character and its applied diacritics and, even worse, between the two code units of a surrogate pair.
Testcase: data:text/html;charset=utf-8, <div style="width:0px;word-wrap:break-word">abc d̥e̕f̣
Argh, I forgot Bugzilla would truncate my text as soon as it encountered a surrogate codepoint. Trying again:
data:text/html;charset=utf-8, <div style="width:0px;word-wrap:break-word">abc d̥e̕f̣ %F0%90%90%80%F0%90%90%81%F0%90%90%82
Note the "blank" lines between the Deseret letters: internally we split the low surrogate onto a separate line. (That they're split across lines can be demonstrated by drag-selecting one of the Deseret glyphs (or hexboxes), *or* one of the invisible blanks in between them, and copy-pasting it into the search box: you get just a single, unpaired surrogate codepoint.)
It looks like this can be trivially fixed, actually - we just need to check whether the current character is a cluster start when considering a possible breakpoint for word-wrap:break-word.
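To illustrate the idea (this is only a rough sketch, not the actual patch): treat the text as UTF-16 code units and allow a break-word break only at positions where a cluster starts. The helper names below (`utf16_units`, `is_cluster_start`, `break_word_pieces`) are hypothetical, and the cluster test is a simplification that only looks at low surrogates and combining marks (general category M); real cluster segmentation covers more cases.

```python
import struct
import unicodedata


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_cluster_start(units, i):
    """Simplified cluster-start test (an assumption, not a full segmenter)."""
    if i == 0:
        return True  # the start of the text always begins a cluster
    u = units[i]
    # A low surrogate that follows a high surrogate is the second half of a pair.
    if 0xDC00 <= u <= 0xDFFF and 0xD800 <= units[i - 1] <= 0xDBFF:
        return False
    # Combining marks (categories Mn, Mc, Me) attach to the preceding base.
    if unicodedata.category(chr(u)).startswith("M"):
        return False
    return True


def break_word_pieces(text):
    """Split text at every position where a break-word break would be allowed."""
    units = utf16_units(text)
    starts = [i for i in range(len(units)) if is_cluster_start(units, i)]
    pieces = []
    for begin, end in zip(starts, starts[1:] + [len(units)]):
        chunk = units[begin:end]
        packed = struct.pack("<%dH" % len(chunk), *chunk)
        pieces.append(packed.decode("utf-16-le"))
    return pieces
```

With this check, the base+diacritic sequences from the testcase stay together, and each Deseret letter stays a single piece instead of being split into two lone surrogates.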
Assignee: nobody → jfkthame
Attachment #621571 - Flags: review?(smontagu)
Attachment #621571 - Flags: review?(smontagu) → review+
Attachment #621572 - Flags: review?(smontagu) → review+
Target Milestone: --- → mozilla15
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED