Last Comment Bug 249159 - implement 'word-break' properties of CSS3
: implement 'word-break' properties of CSS3
Status: RESOLVED FIXED
[parity-IE][parity-webkit]
: css3, dev-doc-complete, intl
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal with 21 votes (vote)
: mozilla15
Assigned To: Makoto Kato [:m_kato]
:
Mentors:
http://dev.w3.org/csswg/css3-text/#wo...
: 399155 524683 (view as bug list)
Depends on: grapheme-breaker 780680
Blocks: css-text-3 line-breaking ringmark-ring0
  Show dependency treegraph
 
Reported: 2004-06-29 20:05 PDT by Jungshik Shin
Modified: 2012-08-06 12:14 PDT (History)
38 users (show)
roc: blocking1.9.2-
roc: wanted1.9.2-
smontagu: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
patch v0 (25.29 KB, patch)
2008-09-15 05:48 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v1 (25.60 KB, patch)
2008-10-27 06:27 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v2 (25.44 KB, patch)
2009-01-04 05:37 PST, Makoto Kato [:m_kato]
smontagu: review-
Details | Diff | Splinter Review
patch v2.1 (26.07 KB, patch)
2009-02-06 09:42 PST, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v2.2 (26.01 KB, patch)
2009-07-03 19:08 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v2.3 (26.74 KB, patch)
2009-11-17 03:49 PST, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v3 (26.75 KB, patch)
2009-11-30 03:25 PST, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
patch v4 (42.70 KB, patch)
2010-06-23 01:48 PDT, Makoto Kato [:m_kato]
dbaron: review+
Details | Diff | Splinter Review
Part 1 - Add word-break interface to nsILineBreaker (13.85 KB, patch)
2011-03-28 04:46 PDT, Makoto Kato [:m_kato]
smontagu: review+
Details | Diff | Splinter Review
Part 2 - Implement word-break property (20.35 KB, patch)
2011-03-28 04:47 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
Part 2. Implement word-break property v2 (19.45 KB, patch)
2011-06-17 00:44 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
Part 1 - Add word-break interface to nsILineBreaker v2 (13.06 KB, patch)
2011-08-26 04:25 PDT, Makoto Kato [:m_kato]
smontagu: review+
Details | Diff | Splinter Review
Part 2 - Implement word-break property v3 (19.47 KB, patch)
2011-08-26 04:26 PDT, Makoto Kato [:m_kato]
no flags Details | Diff | Splinter Review
Some more reftests (7.03 KB, patch)
2012-05-02 02:54 PDT, Simon Montagu :smontagu
no flags Details | Diff | Splinter Review
Even more reftests (10.49 KB, patch)
2012-05-02 07:40 PDT, Simon Montagu :smontagu
jfkthame: review+
Details | Diff | Splinter Review
Part 3 -- fix for mixed 8-bit and 16-bit text (1.09 KB, patch)
2012-05-03 07:04 PDT, Simon Montagu :smontagu
jfkthame: review+
Details | Diff | Splinter Review
part 4 - harfbuzz shaper should not modify character-clustering info due to diacritic within ligature (2.57 KB, patch)
2012-05-04 09:16 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review
modification to wordbreak-7 test (7.65 KB, patch)
2012-05-05 03:55 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review
roll-up patch for Aurora (49.37 KB, patch)
2012-05-14 03:16 PDT, Jonathan Kew (:jfkthame)
jfkthame: review+
jpr: approval‑mozilla‑aurora-
Details | Diff | Splinter Review

Description Jungshik Shin 2004-06-29 20:05:43 PDT
http://www.w3.org/TR/2003/CR-css3-text-20030514/#wordbreak-props
CSS3 text module defines word-break (word-break-cjk and word-break-inside)
properties, but we haven't implemented them yet. Implementing them should be a
part of overhauling/improving our line-breaking routines.
Comment 1 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2004-06-29 20:40:52 PDT
Is there a good reason for implementing these?

Why would anyone want word-break-cjk?  Isn't the default value "the right way"
and everything else "the wrong way"?

And I don't think we're about to implement dictionary-based hyphenation.

I vote WONTFIX (just like most of css3-text).
Comment 2 Jungshik Shin 2004-06-29 21:59:41 PDT
In CJK typography, 'the wrong' way ('break-all') is sometimes (actually pretty
often) used for a short chunk of Latin text. A Korean user asked for it (writing
that it's possible in MS IE) at mozilla.or.kr and I filed this bug. 

As for the dictionary-based hyphenating, it's needless to say it's a lot of work
and it won't happen anytime soon, but it doesn't have to be 'all or nothing',
does it? 
Comment 3 Xiaoyu Huang 2004-08-16 01:17:15 PDT
It's really a bug. Though it's a small problem but there're many small problems
like this in mozilla for CJK users. So they use IE not mozilla.

I vote fix it.
Comment 4 fantasai 2004-08-16 02:28:36 PDT
The word-breaking section of CSS3 Text is unstable and the definitions therein
may be changed. Fixing this bug right now is therefore discouraged.
Comment 5 Jungshik Shin 2004-08-16 03:03:31 PDT
Don't worry :-) We don't have any resource to work on this even if we think we
have to/want to. There are a lot more important bugs related to line breaking
(see bug 206152) than this that we need to fix. 
Comment 6 Thangaraj 2007-01-17 02:29:44 PST
Is there any Way or Workaround to do this in Firefox????
Comment 7 Kaare A. Larsen 2007-10-09 02:17:20 PDT
*** Bug 399155 has been marked as a duplicate of this bug. ***
Comment 8 Syophone 2008-03-07 00:37:31 PST
(In reply to comment #6)
> Is there any Way or Workaround to do this in Firefox????
> 

NO,as bug 99457.

Comment 9 Makoto Kato [:m_kato] 2008-09-01 18:54:03 PDT
I look this.  At latest draft (http://www.w3.org/TR/2007/WD-css3-text-20070306/), word-break-cjk and word-break-inside is removed.
Comment 10 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2008-09-01 18:57:48 PDT
Before implementing, you should ask fantasai what is stable.
Comment 11 fantasai 2008-09-01 19:20:40 PDT
The Editor's Draft is here:
  http://dev.w3.org/csswg/css3-text/#word-break
The functionality described there should be stable, but the syntax is not.

Microsoft has already implemented
  word-break: normal | break-all | keep-all
without a vendor extension, so those values should be safe. The other values should be implemented with a -moz- prefix for now.
Comment 12 Makoto Kato [:m_kato] 2008-09-15 05:48:37 PDT
Created attachment 338626 [details] [diff] [review]
patch v0

Now I'm still testing this patch.  After testing, I will send code-review.
Comment 13 Makoto Kato [:m_kato] 2008-10-27 06:27:00 PDT
Created attachment 344881 [details] [diff] [review]
patch v1

patch with test case
Comment 14 Makoto Kato [:m_kato] 2008-10-27 06:27:56 PDT
fantasai, Do you have a good test case for this?
Comment 15 fantasai 2008-10-30 14:21:49 PDT
No, I don't. If you make yours match the CSS Test Suite guidelines, though, I'd love to add them to the W3C's collection. :) http://wiki.csswg.org/test/css2.1/format
Comment 16 Makoto Kato [:m_kato] 2009-01-04 05:37:00 PST
Created attachment 355280 [details] [diff] [review]
patch v2
Comment 17 Masayuki Nakano [:masayuki] (Mozilla Japan) 2009-01-04 08:30:27 PST
Kato-san, I think that you need to add word-break property to property-database.js too.

http://mxr.mozilla.org/mozilla-central/source/layout/style/test/property_database.js

Thank you for your work!
Comment 18 Simon Montagu :smontagu 2009-01-09 02:40:58 PST
Comment on attachment 355280 [details] [diff] [review]
patch v2

I'm not qualified to review changes to style system, but I'm fairly sure that there are some parts missing here (apart from what Masayuki-san already mentioned). You should complete all the steps in https://developer.mozilla.org/En/Adding_a_new_style_property and make sure that mochitests pass in layout/style.
Comment 19 Makoto Kato [:m_kato] 2009-02-06 09:42:27 PST
Created attachment 360925 [details] [diff] [review]
patch v2.1
Comment 20 Makoto Kato [:m_kato] 2009-02-08 16:53:08 PST
Comment on attachment 360925 [details] [diff] [review]
patch v2.1

pass all test
Comment 21 Masayuki Nakano [:masayuki] (Mozilla Japan) 2009-03-04 16:16:33 PST
Simon, would you continue to review?
Comment 22 Makoto Kato [:m_kato] 2009-07-03 19:08:07 PDT
Created attachment 386789 [details] [diff] [review]
patch v2.2
Comment 23 philippe (part-time) 2009-10-28 01:07:13 PDT
*** Bug 524683 has been marked as a duplicate of this bug. ***
Comment 24 Makoto Kato [:m_kato] 2009-11-17 03:49:01 PST
Created attachment 412820 [details] [diff] [review]
patch v2.3

Update with latest.  If you have no time to review this, please let me know.
Comment 25 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2009-11-17 10:27:41 PST
Comment on attachment 412820 [details] [diff] [review]
patch v2.3

>diff --git a/layout/style/nsCSSPropList.h b/layout/style/nsCSSPropList.h
> CSS_PROP_TEXT(
>+    word-break,
>+    word_break,
>+    WordBreak,
>+    0,
>+    Text,
>+    mWordBreak,
>+    eCSSType_Value,
>+    kWordbreakKTable,
>+    CSS_PROP_NO_OFFSET,
>+    eStyleAnimType_None)

You should make the last two lines offsetof(nsStyleText, mWordBreak) and eStyleAnimType_EnumU8.  Then nsStyleAnimation will be able to animate the property (even if nothing takes advantage of it yet).
Comment 26 Makoto Kato [:m_kato] 2009-11-30 03:25:11 PST
Created attachment 415098 [details] [diff] [review]
patch v3
Comment 27 Makoto Kato [:m_kato] 2009-11-30 03:30:41 PST
change title because current spec no longer uses word-break-cjk and word-break-inside.

Also, WebKit has implemented "word-break: break-all", and IE has implemented "word-break: break-all" and "keep-all".
Comment 28 Makoto Kato [:m_kato] 2009-11-30 03:31:43 PST
Comment on attachment 415098 [details] [diff] [review]
patch v3

modified by dbaron's comment.
Comment 29 Kevin Brosnan 2010-01-30 07:39:25 PST
*** Bug 543211 has been marked as a duplicate of this bug. ***
Comment 30 d 2010-03-05 09:12:00 PST
How is progress on this?
Comment 31 Makoto Kato [:m_kato] 2010-03-14 07:15:14 PDT
(In reply to comment #30)
> How is progress on this?

Simon is no reply for review.
Comment 32 Simon Montagu :smontagu 2010-03-16 14:10:38 PDT
Comment on attachment 415098 [details] [diff] [review]
patch v3

>diff --git a/content/base/src/nsLineBreaker.cpp b/content/base/src>@@ -81,11 +83,13 @@ nsLineBreaker::FlushCurrentWord()
>   nsTArray<PRPackedBool> capitalizationState;
> 
>   if (!mCurrentWordContainsComplexChar) {
>-    // Just set everything internal to "no break"!
>-    memset(breakState.Elements(), PR_FALSE, length*sizeof(PRPackedBool));
>+    // Just set everything internal to "no break" except to break-strict!

"For break-strict set everything internal to "break", otherwise to "no break"

>@@ -417,3 +425,31 @@ nsLineBreaker::Reset(PRBool* aTrailingBr
>+nsresult
>+nsLineBreaker::SetBreakMode(PRUint8 aMode)
>+{
>+  switch (aMode)
>+  {
...
>+  default:
>+    return NS_ERROR_FAILURE;

don't we want to default to "normal"?

>diff --git a/intl/lwbrk/src/nsJISx4501LineBreaker.cpp b/intl/lwbrk

>+const PRUint8 gLoose30[] =

Can you document which characters this contains and why?

Is it correct that it doesn't include the "small katakana" letters from U+31F0-U+31FF? In the test later for half-width katakana, what about U+FF70	HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK, which seems to be equivalent to U+30FC?

>+static PRBool
>+IS_LOOSE_CHAR(PRUnichar aChar)
>+{
>+  PRUnichar hi = aChar & 0xff00;
>+  PRUnichar lo = aChar & 0xff;
>+
>+  if (hi == 0x3000)
>+  {
>+    return (gLoose30[lo / 8] >> (7 - (lo % 8))) & 0x01;
>+  }
>+  else if (hi == 0xff00)

Don't do else after return

>+// A helper function of word-break property
>+static void
>+ShouldBreakByBreakMode(PRUnichar aChar, PRUint8 aMode, PRBool *aAllow)
>+{
>+  // Check word-break style
>+  if (aMode == nsILineBreaker::kBreakMode_Normal)
>+    return;
>+
>+  if (IS_CJK_CHAR(aChar))
>+  { 
>+    if (aMode & nsILineBreaker::kBreakMode_CJK_Keep)
>+      *aAllow = PR_FALSE;
>+    else if (aMode & nsILineBreaker::kBreakMode_CJK_Loose && IS_LOOSE_CHAR(aChar))
>+      *aAllow = PR_TRUE;
>+  } else {
>+    if (aMode & nsILineBreaker::kBreakMode_BreakStrict)
>+      *aAllow = PR_TRUE;
>+  }
>+}

More documentation here would be good. I'm not sure what is happening in the cases not covered by the conditions.

For future iterations, can you separate the style changes and the linebreaker changes into separate patches, and request review from dbaron for the style and from me for the linebreaker.
Comment 33 Makoto Kato [:m_kato] 2010-06-23 01:48:09 PDT
Created attachment 453321 [details] [diff] [review]
patch v4
Comment 34 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2010-06-26 16:30:41 PDT
Comment on attachment 453321 [details] [diff] [review]
patch v4

nsStyleConsts.h:

>+// See nsStaleText

s/Stale/Style/

>diff --git a/layout/html/tests/css/word-break.css b/layout/html/tests/css/word-break.css

Don't add this file; it doesn't do anything.

Would it make more sense for nsILineBreaker::SetBreakMode (and
nsLineBreaker::mBreakMode) to be called SetWordBreak or
SetWordBreakMode, since the property is called word-break?

Did you check that all of the reftests fit within the 800x1000 tested
area?  (You can check by changing them to != tests and looking at the
image.)

In nsCSSProps, kWordbreakKTable should be kWordBreakKTable (with a
capital B), to match surrounding style.  If you don't mind, could you
also make the same change to kWordwrapKTable -> kWordWrapKTable.

In nsComputedDOMStyle.cpp, you don't need the 
>+  if (text->mWordBreak != NS_STYLE_WORDBREAK_NORMAL) {
...
>+  } else {
>+    val->SetIdent(eCSSKeyword_normal);
>+  }
(You only need the code where I put "...".)


Have you tested that the implementation of the three unprefixed values
is compatible with what IE implements?

With those changes, r=dbaron on the dom/ and layout/ changes.

smontagu should review the content/ and intl/ and layout/reftests/ changes.
Comment 35 mani 2010-11-22 23:20:38 PST
can i know the status of this fix ? its very much needed for proper display of lengthy words. Except FF, all the other major browsers like IE, safari, chrome are supporting this word-break feature.
Comment 36 Makoto Kato [:m_kato] 2010-11-23 18:15:25 PST
(In reply to comment #35)
> can i know the status of this fix ? its very much needed for proper display of
> lengthy words. Except FF, all the other major browsers like IE, safari, chrome
> are supporting this word-break feature.

This isn't blocker for 4.0  Also, since editor draft for work-break is modified, I will create new fix after post 4.0 tree is opened.
Comment 37 Makoto Kato [:m_kato] 2011-03-28 04:46:48 PDT
Created attachment 522332 [details] [diff] [review]
Part 1 - Add word-break interface to nsILineBreaker
Comment 38 Makoto Kato [:m_kato] 2011-03-28 04:47:19 PDT
Created attachment 522333 [details] [diff] [review]
Part 2 - Implement word-break property
Comment 39 Makoto Kato [:m_kato] 2011-06-17 00:44:27 PDT
Created attachment 540007 [details] [diff] [review]
Part 2. Implement word-break property v2
Comment 40 Makoto Kato [:m_kato] 2011-06-17 01:02:08 PDT
(In reply to comment #34)
> Have you tested that the implementation of the three unprefixed values
> is compatible with what IE implements?

IE9 supports both break-all and keep-all.  About break-all, this is same implementation. But IE9's keep-all implementation is strange.  Example, U+3001 is break character even if keep-all.  (U+3002 doesn't break even if keep-all.)

WebKit supports break-all.  It is same implementation.
Comment 41 Makoto Kato [:m_kato] 2011-08-26 04:25:36 PDT
Created attachment 555997 [details] [diff] [review]
Part 1 - Add word-break interface to nsILineBreaker v2

rebase with latest
Comment 42 Makoto Kato [:m_kato] 2011-08-26 04:26:41 PDT
Created attachment 555998 [details] [diff] [review]
Part 2 - Implement word-break property v3

rebase with the latest
Comment 43 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-09-06 14:47:55 PDT
Comment on attachment 555998 [details] [diff] [review]
Part 2 - Implement word-break property v3

Why are you requesting review again?  I granted review in comment 34 (conditional on changes described there).  If this is different in some way that it needs to be reviewed again, please explain how.
Comment 44 Jet Villegas (:jet) 2012-04-30 13:10:52 PDT
smontagu: please review the r? items and land this patch.
Comment 45 Kang-Hao (Kenny) Lu [:kennyluck] 2012-04-30 13:33:21 PDT
(In reply to Jet Villegas (:jet) from comment #44)
> smontagu: please review the r? items and land this patch.

To be clear, it seems what's left is the content/ and intl/ and layout/reftests/ parts in Part 2 that should be reviewed by smontagu. (Part 1 already got a r+ and Part 2 got partial r+ from Comment 34)
Comment 46 Jet Villegas (:jet) 2012-05-01 14:40:51 PDT
These patches have been updated and are winding their way through the try servers now...
Comment 47 Simon Montagu :smontagu 2012-05-02 02:54:04 PDT
Created attachment 620239 [details] [diff] [review]
Some more reftests
Comment 48 Simon Montagu :smontagu 2012-05-02 07:40:41 PDT
Created attachment 620311 [details] [diff] [review]
Even more reftests
Comment 49 Jonathan Kew (:jfkthame) 2012-05-02 08:36:35 PDT
Comment on attachment 620311 [details] [diff] [review]
Even more reftests

Review of attachment 620311 [details] [diff] [review]:
-----------------------------------------------------------------

The reftest files should all have <!DOCTYPE html> at the top, I think.

::: layout/reftests/text/wordbreak-7-ref.html
@@ +12,5 @@
> +  </style>
> +  <title>Test - word-break: break-all with rtl and diacritics</title>
> +  </head>
> +  <body>
> +    <div>&#x0648;&#x064E;<br/>&#x0644;&#x064E;&#x0627;<br/>&#x0627;<br/>&#xFEDF;<br/>&#xFEC0;&#x064E;&#x0651;<br/>&#xFE8E;<br/>&#xFEDF;&#x0650;&#x0651;<br/>&#xFEF4;<br/>&#xFEE6;&#x064E;</div><br/>

<br/>&#x0644;&#x064E;&#x0627;<br/>

This implies that we don't break between characters that form a ligature (lam-alef). That seems surprising; although it'll look bad, I'd expect this to be a legal break position. What about in Latin? Do we break between "f" and "i" in "fire" (in a typical serif font with standard ligatures)?
Comment 50 Simon Montagu :smontagu 2012-05-02 23:41:42 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #49)
> This implies that we don't break between characters that form a ligature
> (lam-alef). That seems surprising; although it'll look bad, I'd expect this
> to be a legal break position.

FWIW, IE doesn't break a lam-alef ligature, but Webkit does. Chrome on Windows (but not on Mac) also breaks between base letters and diacritics, which I'm sure is wrong; between Devanagari consonants and matras; and between every character in NFD Hangul. The last two I suppose are permissible by the letter of the spec, but (IMNSHO) that just means that the spec should be changed.

> What about in Latin? Do we break between "f"
> and "i" in "fire" (in a typical serif font with standard ligatures)?

Answered on IRC -- summarizing here for the record:
We don't (although we do exhibit behaviour similar to bug 479829)
Comment 51 Simon Montagu :smontagu 2012-05-02 23:43:42 PDT
Sorry, we *do* break between "f" and "i" in "fire"
Comment 52 Jonathan Kew (:jfkthame) 2012-05-03 01:21:30 PDT
(In reply to Simon Montagu from comment #50)
> (In reply to Jonathan Kew (:jfkthame) from comment #49)
> > This implies that we don't break between characters that form a ligature
> > (lam-alef). That seems surprising; although it'll look bad, I'd expect this
> > to be a legal break position.
> 
> FWIW, IE doesn't break a lam-alef ligature, but Webkit does. Chrome on
> Windows (but not on Mac) also breaks between base letters and diacritics,
> which I'm sure is wrong;

Indeed.

> between Devanagari consonants and matras; and
> between every character in NFD Hangul. The last two I suppose are
> permissible by the letter of the spec, but (IMNSHO) that just means that the
> spec should be changed.

I don't think the Devanagari case is permissible per spec, because the matras are not "letters" as defined in CSS3-Text (characters with GC=L* or N*); they're combining marks (GC=Mc or Mn).

The Hangul NFD breaks would be allowed, presumably, though they shouldn't be.
Comment 53 Jonathan Kew (:jfkthame) 2012-05-03 03:09:30 PDT
The patches here result in an unwanted dependency on diacritics.

In this example:

  data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">لَام السلَام

the lam-alef ligatures (which have a diacritic on the lam) do not break; however, in

  data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">لام السلام

where the diacritics are omitted, a break is allowed between the lam and alef.
Comment 54 Jonathan Kew (:jfkthame) 2012-05-03 03:19:57 PDT
Another oddity, mentioned on IRC but adding it here so we don't lose it...

  data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">fire f̣ire

The first "fire" (without the added combining mark under "f") doesn't break at all.
Comment 55 Jonathan Kew (:jfkthame) 2012-05-03 03:30:32 PDT
And one more example:

  data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">لاَم

(i.e. with a diacritic on the alef of the lam-alef ligature). In this case, the lam-alef stays together but then we break before the diacritic.
Comment 56 Jonathan Kew (:jfkthame) 2012-05-03 03:40:23 PDT
data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">fi̥re

(where the font has an "fi" ligature, but not an "i̥" composite).... renders the entire fi-ligature plus  ̥ diacritic on the second line, with a blank line before it.
Comment 57 Simon Montagu :smontagu 2012-05-03 07:04:15 PDT
Created attachment 620689 [details] [diff] [review]
Part 3 -- fix for mixed 8-bit and 16-bit text

This fixes the issue described in commment 54
Comment 58 Jonathan Kew (:jfkthame) 2012-05-03 09:43:54 PDT
Here's another unexpected interaction - compare:

  data:text/html;charset=utf-8,<div style="width:0px; word-break:break-all">ab̥c

(which works as expected) with:

  data:text/html;charset=utf-8,<div style="text-transform:lowercase; width:0px; word-break:break-all">ab̥c

In the latter case, the  ̥ gets split onto its own line.
Comment 59 Jet Villegas (:jet) 2012-05-03 12:14:56 PDT
Nominating for Fennec tracking...
Comment 60 Jonathan Kew (:jfkthame) 2012-05-04 04:51:58 PDT
Comment on attachment 620311 [details] [diff] [review]
Even more reftests

Given the odd behavior of the Arabic lam-alef ligature (whether we break depends on the presence/absence of a diacritic), I think we should drop that testcase for now. If we don't figure out and fix the behavior there before landing (which I don't think is critical), let's have a followup bug to deal with it.

So r=me for the tests with #7 removed. And please add <!DOCTYPE html> to them, just because it's the Right Thing to avoid quirks mode, although I don't think it affects how they behave.
Comment 61 Jonathan Kew (:jfkthame) 2012-05-04 07:39:47 PDT
I figured out the source of the issue described in comment #53; this is not an issue with the word-break patches, but a result of how our harfbuzz glue deals with character/glyph associations.

In the simple lam-alef case, we have two characters that result in a single glyph, which is associated with the first character position; the second has no glyph, but is marked as a ligature continuation. But each character is still considered a valid cluster-start and so a break can happen (and we then draw half the ligature on each line, which is ugly, but that's a separate issue...)

However, in the lam-fatha-alef case, we have three characters that map to two glyphs, but the first glyph is associated with the character offsets 0 and 2, and the second glyph is associated with the character offset 1. In gfxHarfBuzzShaper, we see the discontiguous character range for the first glyph, and extend our cluster to "fill in" the gap, so that we end up with a cluster that has three characters and two glyphs, both stored in a DetailedGlyphs list at the first character position, with the fatha and alef both marked as continuations of the cluster and ligature.

It's not clear to me that gfxHarfBuzzShaper actually needs to modify the cluster information in this case, however. I think this was originally done so that Indic examples involving reordering (e.g. Devanagari short-i) would behave properly (if हि were not treated as a single cluster, the user would appear to be able to select the individual characters, but the actual underlying selection would not match the apparent highlighting). However, I don't think clustering based on detection of reordering in gfxHarfBuzzShaper is in fact needed for that, because the Indic vowel matras are already category M*, and so will be cluster-extending characters regardless of reordering; the standard SetupClusterBoundaries should have handled this adequately.

Therefore, I think we may be able to remove this reordering-detection and resulting cluster-formation from gfxHarfBuzzShaper, and that will resolve the discrepancy in break-all behavior between lam-alef and lam-fatha-alef. (The result will be that in both cases, a break before the alef can occur, and they'll look pretty bad. But that's just the existing problem of line-breaking within ligatures, as in bug 479829; in certain cases, we need to re-shape at line breaks.)

I'm experimenting with a patch to fix the clustering issue here, and will post it if it doesn't seem to cause other problems.
Comment 62 Jonathan Kew (:jfkthame) 2012-05-04 09:16:05 PDT
Created attachment 621070 [details] [diff] [review]
part 4 - harfbuzz shaper should not modify character-clustering info due to diacritic within ligature

This fixes the lam-fatha-alef clustering issue, so that the line-breaking behavior is not affected by the presence of diacritics. It will make the wordbreak-7 testcase fail, but as already noted, I think that test is wrong and should be dropped for now. We can't just "fix" the reference there because of the need to re-shape when line breaks occur within ligatures (which we don't support).

(You could have an Arabic testcase that doesn't include lam-alef ligatures, I suppose; that might be worthwhile for now, so that we've at least got RTL test coverage here.)
Comment 63 JP Rosevear [:jpr] 2012-05-04 11:56:29 PDT
Jet, we need a better rationale for why this is a blocker.  Please renom with that.
Comment 64 Jonathan Kew (:jfkthame) 2012-05-05 03:55:32 PDT
Created attachment 621272 [details] [diff] [review]
modification to wordbreak-7 test

This modifies the wordbreak-7 test from "Even more reftests", splitting it into two testcases - 7a, without any lam-alef ligatures, which passes, and 7b that includes ligatures and is marked as known-fail due to bug 479829, but has the appropriate reference rendering in preparation for the day when we address that issue.
Comment 65 Simon Montagu :smontagu 2012-05-07 08:30:47 PDT
Comment on attachment 555997 [details] [diff] [review]
Part 1 - Add word-break interface to nsILineBreaker v2

Review of attachment 555997 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with a few changes. I'll check in the whole patchset from this bug including the changes.

::: content/base/public/nsLineBreaker.h
@@ +205,5 @@
> +   * Set word-break mode for linebreaker.  This is set by word-break property.
> +   * @param aMode is nsILineBreaker::kBreakMode_* value.
> +   */
> +  void SetBreakMode(PRUint8 aMode) { mBreakMode = aMode; }
> +

As dbaron already suggested, s/BreakMode/WordBreak/ throughout

::: content/base/src/nsLineBreaker.cpp
@@ +87,5 @@
>    nsTArray<PRPackedBool> capitalizationState;
>  
>    if (!mCurrentWordContainsComplexChar) {
> +    // For break-strict set everything internal to "break", otherwise to
> +    // just set everything internal to "no break"!

// For break-strict set everything internal to "break", otherwise
// to "no break"!

@@ +407,5 @@
>      PRBool isBreakableSpace = isSpace && !(aFlags & BREAK_SUPPRESS_INSIDE);
>  
>      if (aSink) {
> +      // Consider word-break style.  Since the break position of CJK scripts
> +      // will set by nsILineBreaker, we don't consider CJK at this point.

"will be set"

::: intl/lwbrk/public/nsILineBreaker.h
@@ +47,4 @@
>  #define NS_ILINEBREAKER_IID \
>  { 0x5ae68851, 0xd9a3, 0x49fd, \
>      { 0x93, 0x88, 0x58, 0x58, 0x6d, 0xad, 0x80, 0x44 } }
>  

rev the interface.

::: intl/lwbrk/src/nsJISx4501LineBreaker.cpp
@@ +553,5 @@
> +// as JIS X4051.
> +// If break rule is changed by mode, set it to aAllow
> +
> +static PRBool
> +ShouldBreakByBreakMode(PRUint8 aMode, PRBool aAllow)

The helper function isn't necessary -- see below.

@@ +877,1 @@
>      PRBool allowBreak;

This can be done more simply without the helper function, like this:

-    bool allowBreak;
+    bool allowBreak = false;
     if (cur > 0) {
       NS_ASSERTION(CLASS_COMPLEX != lastClass || CLASS_COMPLEX != cl,
                    "Loop should have prevented adjacent complex chars here");
-      if (state.UseConservativeBreaking())
-        allowBreak = GetPairConservative(lastClass, cl);
-      else
-        allowBreak = GetPair(lastClass, cl);
-    } else {
-      allowBreak = false;
+      if (aWordBreak == nsILineBreaker::kWordBreak_Normal) {
+        allowBreak = (state.UseConservativeBreaking()) ?
+          GetPairConservative(lastClass, cl) : GetPair(lastClass, cl);
+      } else if (aWordBreak == nsILineBreaker::kWordBreak_BreakAll) {
+        allowBreak = true;
+      }

@@ +905,5 @@
> +      // We have to consider word-break value again for complext characters
> +      if (aBreakMode != nsILineBreaker::kBreakMode_Normal) {
> +        // Respect word-break property 
> +        for (PRUint32 i = cur; i < end; i++)
> +          aBreakBefore[i] = ShouldBreakByBreakMode(aBreakMode, aBreakBefore[i]);

This too can be simplified to
   aBreakBefore[i] = (aWordBreak == nsILineBreaker::kWordBreak_BreakAll)
since we are already inside 
 if (aBreakMode != nsILineBreaker::kBreakMode_Normal) {}

@@ +944,3 @@
>      }
>  
>      PRBool allowBreak;

this can be simplified in the same way as the parallel method above
Comment 66 Simon Montagu :smontagu 2012-05-07 08:40:21 PDT
Comment on attachment 555998 [details] [diff] [review]
Part 2 - Implement word-break property v3

Review of attachment 555998 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsLineBreaker.cpp
@@ +430,1 @@
>                                breakState.Elements() + wordStart);

I moved this hunk into the first patch so that each patch builds independently

::: layout/generic/nsTextFrameThebes.cpp
@@ +1769,5 @@
> +  if (wordBreak != NS_STYLE_WORDBREAK_NORMAL) {
> +    PRUint8 breakRule;
> +    switch (wordBreak) {
> +    case NS_STYLE_WORDBREAK_BREAK_ALL:
> +      breakRule = nsILineBreaker::kBreakMode_BreakAll;

s/BreakMode/WordBreak/ in this patch too

::: layout/reftests/text/wordbreak-1-ref.html
@@ +1,1 @@
> +<html>

I added <!DOCTYPE html> to these tests per comment 60

::: layout/style/nsCSSProps.cpp
@@ +1343,1 @@
>  const PRInt32 nsCSSProps::kWordwrapKTable[] = {

I capitalized WordWrap as well per comment 34
Comment 69 Joe Drew (not getting mail) 2012-05-11 12:30:31 PDT
This is a fennec 1.0 blocker; can you either nom the relevant patches, or find some other way of fixing it?
Comment 70 Jet Villegas (:jet) 2012-05-14 01:22:46 PDT
I don't think we'll have any other way to fix this, but to take the entire patch queue. Fennec Triagers: this bug implements a missing feature and is plenty of code. I recommend we let it into Fennec to avoid shipping with bug 748249.
Comment 71 Jean-Yves Perrier [:teoli] 2012-05-14 01:39:15 PDT
Just curious, why is this a blocker? The comments on this bug asking for blocking and granting the block doesn't explain why.

This is of interest for documenting the feature: it may help us find practical examples of use.
Comment 72 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2012-05-14 01:52:17 PDT
Because it's our only failure in Ring 0 of the Ringmark test suite, which has gotten a bit of attention lately.
Comment 73 Jonathan Kew (:jfkthame) 2012-05-14 03:16:23 PDT
Created attachment 623622 [details] [diff] [review]
roll-up patch for Aurora

This is a roll-up of the 6 patches landed above, rebased to current Aurora tip. (Just needed minor context change in nsStyleStruct.) Carrying forward r+ from the individual patches.

I'm currently building aurora tip + this patch locally to check that it all transplanted ok, but don't expect any problems.
Comment 74 Jonathan Kew (:jfkthame) 2012-05-14 03:45:23 PDT
Comment on attachment 623622 [details] [diff] [review]
roll-up patch for Aurora

[Approval Request Comment]
Regression caused by (bug #): new feature wanted for fennec

User impact if declined: fail to complete Ringmark ring 0

Testing completed (on m-c, etc.): landed with unit tests

Risk to taking this patch (and alternatives if risky): Although the patch is quite big, the behavior change is pretty localized and should not affect pages that don't actually use the word-break feature, so I think the risk is reasonably low. If the patches regressed existing line-breaking behavior, unit tests on m-c would likely have detected this already.

String changes made by this patch: none
Comment 75 JP Rosevear [:jpr] 2012-05-14 12:18:29 PDT
This was accidently blocking +'ed by me last week.  This patch affects desktop to, there is no need to rush this through into aurora, unless this fixes other blockers.
Comment 76 Jean-Yves Perrier [:teoli] 2012-07-20 11:40:32 PDT
Kato-san created:
https://developer-new.mozilla.org/en-US/docs/CSS/word-break

and

https://developer-new.mozilla.org/en-US/docs/Firefox_15_for_developers is up-to-date.

We are done here. (Slightly bettered on Kuma, soon on the regular MDN site)

Note You need to log in before you can comment on or make changes to this bug.