Closed Bug 809020 Opened 7 years ago Closed 3 years ago

Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: u444543, Assigned: jfkthame)

References

(Blocks 1 open bug, )

Details

Attachments

(7 files, 1 obsolete file)

Attached file TestCase.htm
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121024073032

Steps to reproduce:

Browsed the TestCase.htm page in Firefox 16.0.2 on Windows XP SP3.


Actual results:

The hyphenated text does not wrap to a second line as expected but instead exceeds the bounds of the table cell, even though the cell has width and max-width inline CSS set.


Expected results:

The text should have wrapped at the hyphen like it does in Chrome, IE and Opera. Similarly if the max-width inline CSS attribute is stripped.
Attachment #678720 - Attachment mime type: text/plain → text/html
bug 95067?
Component: Untriaged → Layout: Text
OS: Windows XP → All
Product: Firefox → Core
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Component: Layout: Text → Untriaged
OS: All → Windows XP
Product: Core → Firefox
Resolution: --- → DUPLICATE
Duplicate of bug: uax14
Component: Untriaged → Layout: Text
OS: Windows XP → All
Product: Firefox → Core
Hardware: x86 → All
I don't think this is a dupe of 95067. We do break lines after hyphen when needed - usually. But there appears to be a bug whereby we sometimes fail to break when we ought to; I've been able to reproduce this with a small <div>, too, it's not unique to table cells.

The exact width needed to demonstrate the problem may be system-dependent (as fonts and hence text width may vary), but these examples illustrate the issue for me on OS X with default font settings:

(a) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-teste</div>

(b) data:text/html,<div style="border:1px solid red;width:80px">Flasch-en-tes-te</div>

Example (a) fails to wrap at the hyphen, although it clearly should. Example (b) does wrap, showing that we are capable of line-breaking after a hyphen (sometimes!).

Even more curious is example (c):

(c) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-testes</div>

This breaks correctly at the hyphen, which is the same breakpoint as we failed to find in example (a), although the only difference is the addition of an extra letter at the end of the word! It seems as though there have to be at least 6 characters following the hyphen, otherwise we don't break the line. (I'm not sure why 6 should be the magic number, or whether this is really consistent in all cases.)
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
Summary: Table cell hyphenated text fails to wrap and exceeds bounds → Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds
Version: 16 Branch → Trunk
Aha - I think the "magic number" 6 comes from here:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp#566

While avoiding breaks very near the start/end of a word is generally a good idea, I think the number here may be "too conservative", particularly when we're dealing with narrow lines (as in the original example here), or when no other line break opportunity exists. We need to relax this constraint, or find a way to make this more flexible.
Duplicate of this bug: 875078
Duplicate of this bug: 732479
The source link in comment 4 is broken (by now).

So what would be a solution to this issue? (I can't see the responsible code to make a good suggestion...)
Attached file TestCase2.htm
Proper link for comment 4:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/nsJISx4051LineBreaker.cpp#565
>#define CONSERVATIVE_BREAK_RANGE 6
added in bug 389056
http://hg.mozilla.org/mozilla-central/rev/579d8ab862ee

ni? the author of these changes regarding comment 3.
Flags: needinfo?(masayuki)
The limitation is necessary for preventing odd line-breaking in smiley (face mark).
Flags: needinfo?(masayuki)
Attached image firefox_linebreak.png
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #10)
> The limitation is necessary for preventing odd line-breaking in smiley (face
> mark).

Not a day goes by without me encountering another case of Firefox's reluctance for linebreaking breaking the layout of text.

Isn't there something which can be done?
Attached image hyphens matter.png
Are smileys more important than proper text-layout?
Flags: needinfo?(masayuki)
(In reply to Elbart from comment #12)
> Created attachment 8657177 [details]
> hyphens matter.png
> 
> Are smileys more important than proper text-layout?

Basically, I think so. However, if you can suggest better idea to distinguish "-" between 2 words, we should take it.
Flags: needinfo?(masayuki)
How about: There are only letters (and maybe digits, like in "100-jährig") and no punctuation around the hyphen. Smileys with a hyphen usually have a dot or colon or parentheses around it. Other smileys might not even be recognised by a human... It's simple and probably still a hack, but less so than right now.
Duplicate of this bug: 1302251
Duplicate of this bug: 1341198
I just filed 1341198 because of this, after I couldn't explain to my college why a word with two columns breaks as expected in Edge, Chrome, etc. but not in Firefox. It would be great if this could be changed so that four-letters or 100-jährig wraps if needed (and as expected).
Here's a reftest (currently marked as failing) with a bunch of examples of the problem here: hyphenated compounds that would reasonably be expected to break (if necessary), along with some examples where we really don't want to break at a hyphen (smiley sequences, and IMO also single-character prefixes or suffixes such as "T-shirt"). I don't think it's possible to come up with a heuristic that will be "right" 100% of the time, partly because the desirability (or otherwise) of allowing a break at an explicit hyphen depends on many details of context and semantics, and even then is often a subjective matter; but I do think we can make things substantially better than they are right now, such that the examples in this test all behave as per the reference file.
Attachment #8841248 - Flags: review?(VYV03354)
Proposed patch that adjusts the "conservative" breaking range where we avoid breaks at hyphens (among other things) near word edges. The basic idea here is to check whether we're dealing with letter or non-letter characters, and use different "conservative" ranges for the two cases. This makes the above reftest pass. It also leads to failures on a few existing tests, but IMO they are acceptable changes to cases that are not well-specified and where browsers differ radically in their behavior, so there's no existing interop to worry about. Patch to adjust the existing tests follows...
Attachment #8841249 - Flags: review?(VYV03354)
(To be folded into the preceding patch.) Here are the adjustments needed to existing reftests if we take the patch above. I think these behavior changes are reasonable, and the improvement in behavior for "normal" hyphenated compounds is well worth the impact here.
Attachment #8841251 - Flags: review?(VYV03354)
Comment on attachment 8841248 [details] [diff] [review]
Reftest for breaking near word beginning/end (the "conservative" breaking range)

I think Nakano-san should review this.
Attachment #8841248 - Flags: review?(VYV03354) → review?(masayuki)
Attachment #8841249 - Flags: review?(VYV03354) → review?(masayuki)
Attachment #8841251 - Flags: review?(VYV03354) → review?(masayuki)
Comment on attachment 8841251 [details] [diff] [review]
part 1.1 - Update existing reftests that are affected by the changed behavior

>-<p>2007-Jan-01</p>
>+<p>2007-<br>Jan-01</p>

Hmm, is this change really okay? Although, this is hacky idea, shouldn't we keep not breaking after "-" if it's previous character is a numeric?

>-<p>2007-Jan-01&nbsp;00:00:00</p>
>+<p>2007-<br>Jan-01&nbsp;00:00:00</p>

Same here.


But I'm not a native speaker of Western languages and I'm Japanese speaker, so, breaking at almost all points doesn't make me feel odd. So, I don't know if this change is acceptable for Western users. How do you think (Or, do you know somebody who is familiar with typography of Western languages)?
Flags: needinfo?(jfkthame)
(In reply to Masayuki Nakano [:masayuki] from comment #22)
> Comment on attachment 8841251 [details] [diff] [review]
> part 1.1 - Update existing reftests that are affected by the changed behavior
> 
> >-<p>2007-Jan-01</p>
> >+<p>2007-<br>Jan-01</p>
> 
> Hmm, is this change really okay? Although, this is hacky idea, shouldn't we
> keep not breaking after "-" if it's previous character is a numeric?

It's hard to say. In general, it might be preferable not to break such a date; but on the other hand, if it comes down to a choice between breaking the date or overflowing the line, it's better to break. Moreover, there are plenty of examples (e.g. "100-jährig", see comment 14) where allowing the break seems quite reasonable.

Also note that Chrome allows these breaks (more widely than even the patched version of Firefox); it breaks

    <p>2007-Jan-01</p>

as

    2007-
    Jan-
    01

in that reftest. So there is far from being any interop that authors can rely on; and our (patched) behavior is still more conservative than Chrome's.

> 
> >-<p>2007-Jan-01&nbsp;00:00:00</p>
> >+<p>2007-<br>Jan-01&nbsp;00:00:00</p>
> 
> Same here.

The &nbsp; here suggests an author who is deliberately trying to avoid line-breaks within the time/date string; but in that case, they should have also used non-breaking hyphens (&#x2011;) to make sure of that. Again, note that Chrome would break this as

    2007-
    Jan-
    01 00:00:00

so authors who want to avoid such breaks already need to be more careful about their content.

> 
> But I'm not a native speaker of Western languages and I'm Japanese speaker,
> so, breaking at almost all points doesn't make me feel odd. So, I don't know
> if this change is acceptable for Western users. How do you think (Or, do you
> know somebody who is familiar with typography of Western languages)?

My view here is that relaxing our "conservative" rules slightly for these examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation points with differing priorities: some that are always available to use, and others that are only used as a last resort to avoid overflow. But our current line-breaking architecture isn't that sophisticated.

The problem of -failing- to break in examples where the hyphen would be expected to allow a break is much more serious (see comment 11).
Flags: needinfo?(jfkthame)
Updated patch to fix a debug-build assertion failure in UseConservativeBreaking().
Attachment #8841634 - Flags: review?(masayuki)
Attachment #8841249 - Attachment is obsolete: true
Attachment #8841249 - Flags: review?(masayuki)
Assignee: nobody → jfkthame
Status: REOPENED → ASSIGNED
(In reply to Jonathan Kew (:jfkthame) from comment #23)
> (In reply to Masayuki Nakano [:masayuki] from comment #22)
> Also note that Chrome allows these breaks (more widely than even the patched
> version of Firefox); it breaks

Well. Makoto-san and I discussed the compatibility with Chrome and our traditional behavior, though, perhaps, both browsers should use ICU for Unicode characters but ASCII characters should keep traditional behavior if it's better than other browser's behavior.

As you said, I think that breaking after 000- may be better for generic case, but not so for date-time. The important thing is, which is major case for the users.

> > >-<p>2007-Jan-01&nbsp;00:00:00</p>
> > >+<p>2007-<br>Jan-01&nbsp;00:00:00</p>
> > 
> > Same here.
> 
> The &nbsp; here suggests an author who is deliberately trying to avoid
> line-breaks within the time/date string; but in that case, they should have
> also used non-breaking hyphens (&#x2011;) to make sure of that.

Indeed.

> > But I'm not a native speaker of Western languages and I'm Japanese speaker,
> > so, breaking at almost all points doesn't make me feel odd. So, I don't know
> > if this change is acceptable for Western users. How do you think (Or, do you
> > know somebody who is familiar with typography of Western languages)?
> 
> My view here is that relaxing our "conservative" rules slightly for these
> examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation
> points with differing priorities: some that are always available to use, and
> others that are only used as a last resort to avoid overflow. But our
> current line-breaking architecture isn't that sophisticated.

Yeah, if we have prioritized line breaker, this issue must be simpler (bug 389710). It's okay to break date-time format if there is no other line break opportunity.

I'll be back soon, let me think this deeper before review.
(In reply to Masayuki Nakano [:masayuki] from comment #25)
> As you said, I think that breaking after 000- may be better for generic
> case, but not so for date-time. The important thing is, which is major case
> for the users.

My gut feeling (not supported by any actual data) is that date-time formats such as 2017-Feb-28, where we should arguably try to avoid breaking at the hyphen after 2017-, are not very commonly used in running text (where line-breaking is more important); they're more likely to occur in contexts such as tabular data or lists, where they occur as standalone items that are not usually subject to line breaking anyway. If breaking -is- needed in such a context, it's because the column allocated for the date is very narrow, and in that case breaking at the hyphen may be needed anyway to avoid overflow.

So my view is currently that the proposed change here provides more improvement than regression, on balance. Specifically, it improves behavior for common natural-language cases that authors cannot be expected to handle in any other way, whereas the (debatable) regression is in a specific context where the author does have the option to control the behavior more precisely by use of non-breaking hyphens -- and indeed, the author already needs to do this for the sake of other browsers.
Ah, tables... I worry about that date-time is separated by 2 lines when a date-time is unfortunately positioned end of a line. However, Chromium and Edge breaks in it.
https://jsfiddle.net/d_toybox/22q5jcL7/

Okay, let's take it. I think that when we support prioritized line break, we should mark such point as lower priority.
Attachment #8841248 - Flags: review?(masayuki) → review+
Attachment #8841251 - Flags: review?(masayuki) → review+
Comment on attachment 8841634 [details] [diff] [review]
part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc

>-    // Note that index is always larger than CONSERVATIVE_BREAK_RANGE here.
>-    for (uint32_t i = index; index - CONSERVATIVE_BREAK_RANGE < i; --i) {
>+    // Note that index is always larger than conservativeRange here.

It might be better to use MOZ_ASSERT here.

>-    // Note that index is always less than mLength - CONSERVATIVE_BREAK_RANGE.
>+    // Note that index is always less than mLength - conservativeRange.

And here.

>+    // If the character at index is a letter (rather than various punctuation
>+    // characters, etc) then we want a shorter "conservative" range
>+    uint32_t conservativeRangeStart, conservativeRangeEnd;
>+    if (index < mLength &&
>+        GetGenCategory(GetCharAt(index)) == nsIUGenCategory::kLetter) {
>+      // Primarily for hyphenated word prefixes/suffixes; we add 1 to Start
>+      // to get more balanced behavior (if we break off a 2-letter prefix,
>+      // that means the break will actually be three letters from start of
>+      // word, to include the hyphen; whereas a 2-letter suffix will be
>+      // broken only two letters from end of word).
>+      conservativeRangeEnd = CONSERVATIVE_RANGE_LETTER;
>+      conservativeRangeStart = CONSERVATIVE_RANGE_LETTER + 1;

Hmm, I may not understand this well, though, looks like that this block assumes that the last line break is caused by a close punctuation. If it's caused by a open punctuation like "(", does this work as you expected? E.g., "/a/ab/abc/abcd", "/abcd/abc/ab/a", etc.

If it works as you expected, r=masayuki (and it's better to add above cases to the testcase.)
Attachment #8841634 - Flags: review?(masayuki) → review+
A sequence such as "/a/ab/abc/abcd" is unaffected by the change here, it continues to use the 6-char "conservative breaking" range. We have some URL examples somewhat like that already, but I'll add a test file with additional sequences of varying lengths, which passes unchanged both before and after the patch.
https://hg.mozilla.org/integration/mozilla-inbound/rev/283f43e8790b268c6f5316b007fddb6658b6dfba
Bug 809020 - Reftest for breaking near word beginning/end (the "conservative" breaking range). r=masayuki

https://hg.mozilla.org/integration/mozilla-inbound/rev/e980c683af8cb856dbb7409373d2cef7972fb4ca
Bug 809020 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc., and adjust existing tests accordingly. r=masayuki
https://hg.mozilla.org/mozilla-central/rev/283f43e8790b
https://hg.mozilla.org/mozilla-central/rev/e980c683af8c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Depends on: 1344459
Depends on: 1524727
No longer depends on: 1524727
You need to log in before you can comment on or make changes to this bug.