Closed Bug 827779 Opened 11 years ago Closed 10 years ago

Break word on Soft hyphen not working in FF 17 after "»-"

Categories

(Core :: Layout: Text and Fonts, defect)

17 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: fabio, Assigned: jfkthame)

Details

Attachments

(5 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17

Steps to reproduce:

It should break the word "«The Trammps»-Sänger" at the soft-hyphen to a new line because this word has not enough place.


Actual results:

The word will not be broken correct at soft hyphen if a "»" is infront of. This problem persists since FF 17. In other browser it will break correct.


Expected results:

The word should brake on soft hypen to a new line.
Where on blick.ch is that made off?

I just found
http://www.blick.ch/people-tv/musik/saenger-von-us-soulgruppe-the-trammps-gestorben-id1800150.html
http://www.blick.ch/people-tv/hotshots/the-trammps-saenger-ist-tot-id1800200.html

On those there are no Areas like on your attached Images.
Flags: needinfo?(fabio)
1) Check, if problem exists in current nightly version from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-21.0a1.en-US.mac.dmg
2) If yes, attach to this bug simple testcase showing the issue
Component: Untriaged → Layout: Text
Product: Firefox → Core
the problem persists also in the nightly build.
this thing is on the front page also on blick.ch but they all day changing the teased article. i uploaded a image more in the context.
Flags: needinfo?(fabio)
Attachment #699605 - Attachment description: showing the hot shots teaser on the front → showing the hot shots teaser on the front (you see it on the bottom of the image
Testcase: 

1. go on blick.ch
2. search the hotshots teaser (the image called "showing the hot shots teaser on the front (you see it on the bottom of the image" helps to find where it is located on the front)
3. Change the current Teasertitle with help of Firefox-Plugin "firebug" (www.getfirebug.com) with 
following teaser title "«The Trammps»-Sänger ist tot" (I uploaded an image there you see blue marked that the title between the span should be changed)
4. you should see that the word "Sänger" is not on a new line. 

Hint: you can also make a more longer title like ""«The Trammps»-SängerTEST ist tot" than you see that the "SängerTest" is not on a new line.
Testcase: 

1. go on blick.ch
2. search the hotshots teaser (the image called "showing the hot shots teaser on the front (you see it on the bottom of the image" helps to find where it is located on the front)
3. Change the current Teasertitle with help of Firefox-Plugin "firebug" (www.getfirebug.com) with 
following teaser title "«The Trammps»-Sänger ist tot" (I uploaded an image there you see blue marked that the title between the span should be changed)
4. you should see that the word "Sänger" is not on a new line. 

Hint: you can also make a more longer title like ""«The Trammps»-SängerTEST ist tot" than you see that the "SängerTest" is not on a new line.
A simple testcase:

data:text/html;charset=utf-8,<div style="font-family:times;width:19em;border:1px solid red">«The Trammps»-Sänger «The Trammps»-Sänger «The Trammps»-Sänger

(width needed might vary depending on available fonts; this works to illustrate the issue on OS X.)

The problem is that <hyphen> is only treated as a valid place to break if it is both preceded and followed by letters or numbers (but not numbers on both sides):

http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp#680

So the presence of » (or other punctuation characters, like !, ?, /, etc) before the hyphen effectively prevents us using it as a break point.

If we relax the check here, e.g., to just require a letter on one side of the hyphen, then this example will break as desired. However, we'd need to check what other situations this affects, as there may have been compelling reasons why the code was written this way in the first place.
Status: UNCONFIRMED → NEW
Ever confirmed: true
We should be careful when we add a new breakable point since that means we give up the compatibility with older Gecko. It may cause some people complaining about that.
I agree we need to be careful with changes here, though at least if we're changing to match other browsers, the risk of web-compat problems may not be too great. As an experiment, I modified the code as suggested above, and pushed a tryserver job; it didn't show any unit test failures (https://tbpl.mozilla.org/?tree=Try&rev=08cac36db936). Please consider this and see if you think it will have undesirable effects anywhere (if so, testcases illustrating where it's a problem would be great).
Attachment #699687 - Flags: review?(masayuki)
Simple reftest based on the example reported here.
Attachment #700553 - Flags: review?(masayuki)
Attachment #699687 - Flags: review?(masayuki) → review+
Attachment #700553 - Flags: review?(masayuki) → review+
Comment on attachment 699687 [details] [diff] [review]
allow break after hyphen even if preceded or followed by non-alphanumeric char.

Hmm, please wait. I'd like to think about this change deeper.
Attachment #699687 - Flags: review+ → review?(masayuki)
Comment on attachment 699687 [details] [diff] [review]
allow break after hyphen even if preceded or followed by non-alphanumeric char.

>@@ -672,26 +672,25 @@ ContextualAnalysis(PRUnichar prev, PRUni
>     if (IS_HYPHEN(next))
>       return CLASS_CHARACTER;
>     // If prev and next characters are numeric, it may be in Math context.
>     // So, we should not break here.
>     bool prevIsNum = IS_ASCII_DIGIT(prev);
>     bool nextIsNum = IS_ASCII_DIGIT(next);
>     if (prevIsNum && nextIsNum)
>       return CLASS_NUMERIC;
>-    // If one side is numeric and the other is a character, or if both sides are
>-    // characters, the hyphen should be breakable.
>+    // If either side (or both) is a character, the hyphen should be breakable.
>     if (!aState.UseConservativeBreaking(1)) {
>       PRUnichar prevOfHyphen = aState.GetPreviousNonHyphenCharacter();
>       if (prevOfHyphen && next) {
>         bool prevIsChar = !NEED_CONTEXTUAL_ANALYSIS(prevOfHyphen) &&
>                             GetClass(prevOfHyphen) == CLASS_CHARACTER;
>         bool nextIsChar = !NEED_CONTEXTUAL_ANALYSIS(next) &&
>                             GetClass(next) == CLASS_CHARACTER;
>-        if ((prevIsNum || prevIsChar) && (nextIsNum || nextIsChar))
>+        if (prevIsChar || nextIsChar)
>           return CLASS_CLOSE;
>       }

I'm still concerned about the compatibility with old Gecko's behavior by this change.

The classes are defined as:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp#316

So, I think that prevIsChar and nextIsChar should be true when those classes are CLASS_CHARACTER *or* CLASS_OPEN_LIKE_CHARACTER or CLASS_CLOSE_LIKE_CHARACTER. Then, I guess that the new reftest is passed and the risk is lower than your approach. How do you think about this idea?

If you agree with my idea, then, don't call GetClass() multiple times for every comparing with the classes since here is very sensitive for performance.
hi together 

how is the state for this problem ?

Best regards!

Fabio
(sorry, been busy with other issues...)

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #14)
> So, I think that prevIsChar and nextIsChar should be true when those classes
> are CLASS_CHARACTER *or* CLASS_OPEN_LIKE_CHARACTER or
> CLASS_CLOSE_LIKE_CHARACTER. Then, I guess that the new reftest is passed and
> the risk is lower than your approach. How do you think about this idea?

This would fix the original example here (as in comment 8), but does not handle the similar case where the word *after* the hyphen is in quote marks (whether English-style “quotes” or European «guillemets»), because the “, ‘ and « characters are not classified as OPEN_LIKE (or OPEN), they are caught by the NEED_CONTEXTUAL_ANALYSIS macro.

So with that approach, we'll fail to find a line-break in:

  data:text/html;charset=utf-8,<div style="width:0pt">extra-“special”

even though we *would* find one in:

  data:text/html;charset=utf-8,<div style="width:0pt">“extra”-special

I think both of these should permit a break after the hyphen.
hi togehter

can you give a date already when this problem will be fixed

best regards
masayuki, I'm not sure your suggested change from comment 14 is good, because it will leave inconsistent behavior as described in comment 16. So I'd prefer to stay with the original patch here, unless there are specific problems it will cause. Please take another look and decide if you think this is OK.
I have some urgent work for now. I'll check this again as far as possible. Sorry.
hi guys

what's going on with this task?
Sorry for the delay, this bug is in the bottom of my queue. I work hard on D3E Keyboard.key for all platforms (especially, B2G people need this ASAP) and TSF related bugs for Metrofox for this a couple of months. However, the patch of this bug tries to change VERY sensitive and complicated part. So, I want to check the change deeply at least for a day. But unfortunately, I don't have much time for doing that. Perhaps, D3E KeyboardEvent.key will be fixed soon, then, probably I can be back here.

I apologize the delay, it's really my fault but still I need your perseverance. You don't need additional ping for me at least for two weeks.

Thank you.
Sorry for the delay, now, this is at the top of my queue. I'm trying to review this again.
I'm still afraid that the patch's approach because it could cause unexpected line break. So, I'd like to suggest more contextual approach.

What should be changed in this bug is, only between <hyphen> and open quote like character becomes a break opportunity but it should be only when the previous character of hyphen(s) is a character class, right? Then, we should just do so.

Before this line,
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4051LineBreaker.cpp#685
You should insert special handler for U_CLOSE_(SINGLE_QUOTE|DOUBLE_QUOTE|GUILLEMET).

Then, we can just change the behavior what is expected by this bug.

I'd like to suggest following style:

if (prevOfHyphen && next) {
  switch (next) {
    case U_CLOSE_SINGLE_QUOTE:
    case U_CLOSE_DOUBLE_QUOTE:
    case U_CLOSE_GUILLEMET:
      if (prevIsNum || GetClass(prevOfHyphen) == CLASS_CHARACTER) {
        return CLASS_CLOSE;
      }
      break;
    default:
      bool prevIsChar = !NEED_CONTEXTUAL_ANALYSIS(prevOfHyphen) &&
                        GetClass(prevOfHyphen) == CLASS_CHARACTER;
      bool nextIsChar = !NEED_CONTEXTUAL_ANALYSIS(next) &&
                        GetClass(next) == CLASS_CHARACTER;
      if ((prevIsNum || prevIsChar) && (nextIsNum || nextIsChar)) {
        return CLASS_CLOSE;
      }
      break;
  }
}
Oops, I meant:

  switch (prevOfHyphen) {
    case U_CLOSE_SINGLE_QUOTE:
    case U_CLOSE_DOUBLE_QUOTE:
    case U_CLOSE_GUILLEMET:
      if (nextIsNum || GetClass(next) == CLASS_CHARACTER) {
        return CLASS_CLOSE;
      }
      break;
> What should be changed in this bug is, only between <hyphen> and open quote like 
> character becomes a break opportunity but it should be only when the previous 
> character of hyphen(s) is a character class, right?

So, this should be:

What should be changed in this bug is, only between <hyphen> and character or numeric becomes a break opportunity even after a punctuation like close quote but it should not be so when the hyphen is surrounded by non-character class characters.
hi Guys

Do you have some updates regarding this issue?

Best regards

Fabio
Some great news ??
hi guys? whats going on with this issue?
Assignee: nobody → masayuki
Jonathan Kew already posted the patch and I'm a reviewer of it.
Assignee: masayuki → jfkthame
hi guys

is this fixed now?
No.

When it's fixed, the status will be changed to "resolved: fixed".

Meanwhile, please refrain from nagging here. That doesn't help towards fixing the bug. I'm sorry if this issue is particularly bothering you, but you should understand that it may not be the most major/important/urgent thing we have to work on.
Comment on attachment 699687 [details] [diff] [review]
allow break after hyphen even if preceded or followed by non-alphanumeric char.

-'ing this, already reviewed.
Attachment #699687 - Flags: review?(masayuki) → review-
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #25)

> What should be changed in this bug is, only between <hyphen> and character
> or numeric becomes a break opportunity even after a punctuation like close
> quote but it should not be so when the hyphen is surrounded by non-character
> class characters.

I don't think that (and the suggested code from comments 23-24) really covers the issue; it's too narrowly specific, and will miss analogous cases that should behave similarly. A break after hyphen should be possible in cases before opening punctuation, as well as after closing punctuation; and this applies to opening/closing parentheses, etc., as well as quote marks.

So IMO, we should allow a break after the hyphen, despite the presence of adjacent nonletters, in any of the following examples:

  extra-“special”
  “extra”-special
  “extra”-“special”
  extra-«special»
  «extra»-special
  «extra»-«special»
  extra-(special)
  (extra)-special
  (extra)-(special)

though obviously not in variations like:

  (extra-)special
  extra(-special)
Here is a new version that is stricter than the original patch here but still aims to cover the classes of example mentioned in comment 33.
Attachment #8394874 - Flags: review?(masayuki)
Attachment #699687 - Attachment is obsolete: true
And updated the reftest to also use the examples from comment 33.
Attachment #8394875 - Flags: review?(masayuki)
Attachment #700553 - Attachment is obsolete: true
Try run with this patch & reftest: https://tbpl.mozilla.org/?tree=Try&rev=4235ef6c1f56
Attachment #8394875 - Flags: review?(masayuki) → review+
Comment on attachment 8394874 [details] [diff] [review]
allow break after hyphen even if preceded by closing punctuation and/or followed by opening punctuation.

># HG changeset patch
># User Jonathan Kew <jkew@mozilla.com>
># Date 1395418663 0
>#      Fri Mar 21 16:17:43 2014 +0000
># Node ID d624f3236c0bd32d0f58d757b2f23e6d7a8a0d4f
># Parent  1f4d95d9b225543d429583675b7416a6c7f026a9
>bug 827779 - allow break after hyphen even if preceded by closing punctuation and/or followed by opening punctuation. r?masayuki
>
>diff --git a/intl/lwbrk/src/nsJISx4051LineBreaker.cpp b/intl/lwbrk/src/nsJISx4051LineBreaker.cpp
>--- a/intl/lwbrk/src/nsJISx4051LineBreaker.cpp
>+++ b/intl/lwbrk/src/nsJISx4051LineBreaker.cpp
>@@ -678,22 +678,35 @@ ContextualAnalysis(char16_t prev, char16
>     bool nextIsNum = IS_ASCII_DIGIT(next);
>     if (prevIsNum && nextIsNum)
>       return CLASS_NUMERIC;
>     // If one side is numeric and the other is a character, or if both sides are
>     // characters, the hyphen should be breakable.
>     if (!aState.UseConservativeBreaking(1)) {
>       char16_t prevOfHyphen = aState.GetPreviousNonHyphenCharacter();
>       if (prevOfHyphen && next) {
>-        bool prevIsChar = !NEED_CONTEXTUAL_ANALYSIS(prevOfHyphen) &&
>-                            GetClass(prevOfHyphen) == CLASS_CHARACTER;
>-        bool nextIsChar = !NEED_CONTEXTUAL_ANALYSIS(next) &&
>-                            GetClass(next) == CLASS_CHARACTER;
>-        if ((prevIsNum || prevIsChar) && (nextIsNum || nextIsChar))
>+        int8_t prevClass = GetClass(prevOfHyphen);
>+        int8_t nextClass = GetClass(next);
>+        bool prevIsNumOrCharOrOpen =
>+          prevIsNum ||
>+          (prevClass == CLASS_CHARACTER &&
>+            !NEED_CONTEXTUAL_ANALYSIS(prevOfHyphen)) ||
>+          prevClass == CLASS_CLOSE ||
>+          prevClass == CLASS_CLOSE_LIKE_CHARACTER;

prevIsNumOrCharOrClose?
                 ^^^^^

>+        bool nextIsNumOrCharOrClose =
>+          nextIsNum ||
>+          (nextClass == CLASS_CHARACTER && !NEED_CONTEXTUAL_ANALYSIS(next)) ||
>+          nextClass == CLASS_OPEN ||
>+          nextClass == CLASS_OPEN_LIKE_CHARACTER ||
>+          next == U_OPEN_SINGLE_QUOTE ||
>+          next == U_OPEN_DOUBLE_QUOTE ||
>+          next == U_OPEN_GUILLEMET;

nextIsNumOfCharOrOpen?
                 ^^^^
Attachment #8394874 - Flags: review?(masayuki) → review+
Oops, yes, I got those names reversed - sorry! Fixed locally.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: