Closed Bug 809020 Opened 12 years ago Closed 8 years ago

Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds

Tracking


Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Tracking

Status

firefox54

---

fixed

People

(Reporter: u444543, Assigned: jfkthame)

References

(Blocks 1 open bug,
URL
)

Details

Attachments

(7 files, 1 obsolete file)

TestCase.htm 12 years ago u444543 837 bytes, text/html		Details
TestCase2.htm 11 years ago u444543 503 bytes, text/html		Details
firefox_linebreak.png 10 years ago Elbart 130.27 KB, image/png		Details
hyphens matter.png 9 years ago Elbart 177.62 KB, image/png		Details
Reftest for breaking near word beginning/end (the "conservative" breaking range) 8 years ago Jonathan Kew [:jfkthame] 3.38 KB, patch	masayuki : review+	Details \| Diff \| Splinter Review
part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc 8 years ago Jonathan Kew [:jfkthame] 4.85 KB, patch		Details \| Diff \| Splinter Review
part 1.1 - Update existing reftests that are affected by the changed behavior 8 years ago Jonathan Kew [:jfkthame] 5.94 KB, patch	masayuki : review+	Details \| Diff \| Splinter Review
part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc 8 years ago Jonathan Kew [:jfkthame] 4.83 KB, patch	masayuki : review+	Details \| Diff \| Splinter Review

u444543

Reporter

Description

•

12 years ago

Attached file TestCase.htm — Details

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121024073032

Steps to reproduce:

Browsed the TestCase.htm page in Firefox 16.0.2 on Windows XP SP3.

Actual results:

The hyphenated text does not wrap to a second line as expected but instead exceeds the bounds of the table cell, even though the cell has width and max-width inline CSS set.

Expected results:

The text should have wrapped at the hyphen like it does in Chrome, IE and Opera. Similarly if the max-width inline CSS attribute is stripped.

u444543

Reporter

Updated

•

12 years ago

URL: http://www.cs.tut.fi/~jkorpela/html/n...

[:Aleksej]

Updated

•

12 years ago

Attachment #678720 - Attachment mime type: text/plain → text/html

[:Aleksej]

Comment 1

•

12 years ago

bug 95067?

Component: Untriaged → Layout: Text

OS: Windows XP → All

Product: Firefox → Core

Virgil Dicu [:virgil] [QA]

Updated

•

12 years ago

Status: UNCONFIRMED → RESOLVED

Closed: 12 years ago

Component: Layout: Text → Untriaged

OS: All → Windows XP

Product: Core → Firefox

Resolution: --- → DUPLICATE

Virgil Dicu [:virgil] [QA]

Updated

•

12 years ago

Component: Untriaged → Layout: Text

OS: Windows XP → All

Product: Firefox → Core

Hardware: x86 → All

Jonathan Kew [:jfkthame]

Assignee

Comment 3

•

12 years ago

I don't think this is a dupe of 95067. We do break lines after hyphen when needed - usually. But there appears to be a bug whereby we sometimes fail to break when we ought to; I've been able to reproduce this with a small <div>, too, it's not unique to table cells.

The exact width needed to demonstrate the problem may be system-dependent (as fonts and hence text width may vary), but these examples illustrate the issue for me on OS X with default font settings:

(a) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-teste</div>

(b) data:text/html,<div style="border:1px solid red;width:80px">Flasch-en-tes-te</div>

Example (a) fails to wrap at the hyphen, although it clearly should. Example (b) does wrap, showing that we are capable of line-breaking after a hyphen (sometimes!).

Even more curious is example (c):

(c) data:text/html,<div style="border:1px solid red;width:80px">Flaschen-testes</div>

This breaks correctly at the hyphen, which is the same breakpoint as we failed to find in example (a), although the only difference is the addition of an extra letter at the end of the word!

It seems as though there have to be at least 6 characters following the hyphen, otherwise we don't break the line. (I'm not sure why 6 should be the magic number, or whether this is really consistent in all cases.)

Status: RESOLVED → REOPENED

Ever confirmed: true

Resolution: DUPLICATE → ---

Summary: Table cell hyphenated text fails to wrap and exceeds bounds → Hyphenated text in narrow box sometimes fails to wrap and exceeds bounds

Version: 16 Branch → Trunk

Jonathan Kew [:jfkthame]

Assignee

Comment 4

•

12 years ago

Aha - I think the "magic number" 6 comes from here: http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp#566 While avoiding breaks very near the start/end of a word is generally a good idea, I think the number here may be "too conservative", particularly when we're dealing with narrow lines (as in the original example here), or when no other line break opportunity exists. We need to relax this constraint, or find a way to make this more flexible.
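The observation from comments 3 and 4 can be modeled with a short sketch. This is a simplified illustration and not the Gecko code: the helper name `breakAllowedWithFixedRange`, the constant name `kConservativeBreakRange`, and the choice to treat the whole string as a single word are assumptions made here; only the value 6 comes from this thread.

```cpp
#include <cstddef>
#include <string>

// The "magic number" discussed in this bug. Everything else in this block is
// a simplified model, not the actual line-breaker implementation.
constexpr std::size_t kConservativeBreakRange = 6;

// Hypothetical helper. `breakIndex` is the index of the character that would
// start the next line (the character just after the hyphen). The whole
// string is treated as one word.
bool breakAllowedWithFixedRange(const std::string& word,
                                std::size_t breakIndex) {
  if (breakIndex > word.size()) {
    return false;
  }
  const std::size_t charsBefore = breakIndex;
  const std::size_t charsAfter = word.size() - breakIndex;
  // Refuse the break when it falls too close to either edge of the word.
  return charsBefore >= kConservativeBreakRange &&
         charsAfter >= kConservativeBreakRange;
}
```

Under this model, example (a) "Flaschen-teste" has only 5 characters after the hyphen, so the break is refused, while example (c) "Flaschen-testes" has 6 and the break is allowed. Example (b) is allowed at its first hyphen, which has 7 characters before it and 9 after.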

Yves Goergen [:ygoe]

Comment 7

•

11 years ago

The source link in comment 4 is broken (by now). So what would be a solution to this issue? (I can't see the responsible code to make a good suggestion...)

u444543

Reporter

Comment 8

•

11 years ago

Attached file TestCase2.htm — Details

Elbart

Comment 9

•

10 years ago

Proper link for comment 4:
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/nsJISx4051LineBreaker.cpp#565

>#define CONSERVATIVE_BREAK_RANGE 6

added in bug 389056
http://hg.mozilla.org/mozilla-central/rev/579d8ab862ee

ni? the author of these changes regarding comment 3.

Flags: needinfo?(masayuki)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 10

•

10 years ago

The limitation is necessary for preventing odd line-breaking in smiley (face mark).

Flags: needinfo?(masayuki)

Elbart

Comment 11

•

10 years ago

Attached image firefox_linebreak.png — Details

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #10)
> The limitation is necessary for preventing odd line-breaking in smiley (face
> mark).

Not a day goes by without me encountering another case of Firefox's reluctance for linebreaking breaking the layout of text. Isn't there something which can be done?

Elbart

Comment 12

•

9 years ago

Attached image hyphens matter.png — Details

Are smileys more important than proper text-layout?

Flags: needinfo?(masayuki)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 13

•

9 years ago

(In reply to Elbart from comment #12)
> Created attachment 8657177 [details]
> hyphens matter.png
>
> Are smileys more important than proper text-layout?

Basically, I think so. However, if you can suggest a better idea to distinguish a "-" between 2 words, we should take it.

Flags: needinfo?(masayuki)

Yves Goergen [:ygoe]

Comment 14

•

9 years ago

How about: There are only letters (and maybe digits, like in "100-jährig") and no punctuation around the hyphen. Smileys with a hyphen usually have a dot or colon or parentheses around it. Other smileys might not even be recognised by a human... It's simple and probably still a hack, but less so than right now.
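The suggestion above can be sketched as a small predicate. This is only an illustration of the idea in this comment, not code from any patch: the helper name `isWordJoiningHyphen` is hypothetical, and the check is limited to ASCII letters and digits via `std::isalnum`, which is an assumption made to keep the sketch short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper illustrating the suggestion: a hyphen at index `i`
// counts as a word-joining hyphen only when both of its neighbours are
// ASCII letters or digits. Punctuation on either side (as in most smileys)
// makes it return false.
bool isWordJoiningHyphen(const std::string& text, std::size_t i) {
  if (i == 0 || i + 1 >= text.size() || text[i] != '-') {
    return false;
  }
  const auto isAsciiAlnum = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
  };
  return isAsciiAlnum(text[i - 1]) && isAsciiAlnum(text[i + 1]);
}
```

With this check, the hyphens in "four-letters" and "100-jährig" qualify, while the one in ":-)" does not. A real implementation would need Unicode-aware character classification, since a non-ASCII letter directly next to the hyphen is rejected by this sketch.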

Albert Scheiner [:alberts]

Comment 17

•

8 years ago

I just filed 1341198 because of this, after I couldn't explain to my colleague why a word with two columns breaks as expected in Edge, Chrome, etc. but not in Firefox. It would be great if this could be changed so that "four-letters" or "100-jährig" wraps if needed (and as expected).

Jonathan Kew [:jfkthame]

Assignee

Comment 18

•

8 years ago

Attached patch Reftest for breaking near word beginning/end (the "conservative" breaking range) — Details — Splinter Review

Here's a reftest (currently marked as failing) with a bunch of examples of the problem here: hyphenated compounds that would reasonably be expected to break (if necessary), along with some examples where we really don't want to break at a hyphen (smiley sequences, and IMO also single-character prefixes or suffixes such as "T-shirt"). I don't think it's possible to come up with a heuristic that will be "right" 100% of the time, partly because the desirability (or otherwise) of allowing a break at an explicit hyphen depends on many details of context and semantics, and even then is often a subjective matter; but I do think we can make things substantially better than they are right now, such that the examples in this test all behave as per the reference file.

Attachment #8841248 - Flags: review?(VYV03354)

Jonathan Kew [:jfkthame]

Assignee

Comment 19

•

8 years ago

Attached patch part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc (obsolete) — Details — Splinter Review

Proposed patch that adjusts the "conservative" breaking range where we avoid breaks at hyphens (among other things) near word edges. The basic idea here is to check whether we're dealing with letter or non-letter characters, and use different "conservative" ranges for the two cases. This makes the above reftest pass. It also leads to failures on a few existing tests, but IMO they are acceptable changes to cases that are not well-specified and where browsers differ radically in their behavior, so there's no existing interop to worry about. Patch to adjust the existing tests follows...

Attachment #8841249 - Flags: review?(VYV03354)

Jonathan Kew [:jfkthame]

Assignee

Comment 20

•

8 years ago

Attached patch part 1.1 - Update existing reftests that are affected by the changed behavior — Details — Splinter Review

(To be folded into the preceding patch.) Here are the adjustments needed to existing reftests if we take the patch above. I think these behavior changes are reasonable, and the improvement in behavior for "normal" hyphenated compounds is well worth the impact here.

Attachment #8841251 - Flags: review?(VYV03354)

Masatoshi Kimura [:emk]

Comment 21

•

8 years ago

Comment on attachment 8841248 [details] [diff] [review]
Reftest for breaking near word beginning/end (the "conservative" breaking range)

I think Nakano-san should review this.

Attachment #8841248 - Flags: review?(VYV03354) → review?(masayuki)

Masatoshi Kimura [:emk]

Updated

•

8 years ago

Attachment #8841249 - Flags: review?(VYV03354) → review?(masayuki)

Masatoshi Kimura [:emk]

Updated

•

8 years ago

Attachment #8841251 - Flags: review?(VYV03354) → review?(masayuki)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 22

•

8 years ago

Comment on attachment 8841251 [details] [diff] [review]
part 1.1 - Update existing reftests that are affected by the changed behavior

>-<p>2007-Jan-01</p>
>+<p>2007-<br>Jan-01</p>

Hmm, is this change really okay? Although, this is hacky idea, shouldn't we keep not breaking after "-" if it's previous character is a numeric?

>-<p>2007-Jan-01&nbsp;00:00:00</p>
>+<p>2007-<br>Jan-01&nbsp;00:00:00</p>

Same here.

But I'm not a native speaker of Western languages and I'm Japanese speaker, so, breaking at almost all points doesn't make me feel odd. So, I don't know if this change is acceptable for Western users. How do you think (Or, do you know somebody who is familiar with typography of Western languages)?

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Assignee

Comment 23

•

8 years ago

(In reply to Masayuki Nakano [:masayuki] from comment #22)
> Comment on attachment 8841251 [details] [diff] [review]
> part 1.1 - Update existing reftests that are affected by the changed behavior
>
> >-<p>2007-Jan-01</p>
> >+<p>2007-<br>Jan-01</p>
>
> Hmm, is this change really okay? Although, this is hacky idea, shouldn't we
> keep not breaking after "-" if it's previous character is a numeric?

It's hard to say. In general, it might be preferable not to break such a date; but on the other hand, if it comes down to a choice between breaking the date or overflowing the line, it's better to break. Moreover, there are plenty of examples (e.g. "100-jährig", see comment 14) where allowing the break seems quite reasonable.

Also note that Chrome allows these breaks (more widely than even the patched version of Firefox); it breaks

<p>2007-Jan-01</p>

as

2007-
Jan-
01

in that reftest. So there is far from being any interop that authors can rely on; and our (patched) behavior is still more conservative than Chrome's.

> >-<p>2007-Jan-01&nbsp;00:00:00</p>
> >+<p>2007-<br>Jan-01&nbsp;00:00:00</p>
>
> Same here.

The &nbsp; here suggests an author who is deliberately trying to avoid line-breaks within the time/date string; but in that case, they should have also used non-breaking hyphens (‑) to make sure of that. Again, note that Chrome would break this as

2007-
Jan-
01&nbsp;00:00:00

so authors who want to avoid such breaks already need to be more careful about their content.

> But I'm not a native speaker of Western languages and I'm Japanese speaker,
> so, breaking at almost all points doesn't make me feel odd. So, I don't know
> if this change is acceptable for Western users. How do you think (Or, do you
> know somebody who is familiar with typography of Western languages)?

My view here is that relaxing our "conservative" rules slightly for these examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation points with differing priorities: some that are always available to use, and others that are only used as a last resort to avoid overflow. But our current line-breaking architecture isn't that sophisticated.

The problem of -failing- to break in examples where the hyphen would be expected to allow a break is much more serious (see comment 11).
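For authors who want to follow the advice about non-breaking hyphens, the substitution can be sketched as below. The helper name `withNonBreakingHyphens` is hypothetical, and the sketch assumes the text is UTF-8 and that every ASCII hyphen-minus in it should become U+2011 NON-BREAKING HYPHEN, which is only appropriate for strings such as dates that must stay on one line.

```cpp
#include <string>

// UTF-8 encoding of U+2011 NON-BREAKING HYPHEN.
const std::string kNonBreakingHyphen = "\xE2\x80\x91";

// Hypothetical helper: returns a copy of `text` in which every ASCII
// hyphen-minus is replaced by U+2011, so that a line breaker has no break
// opportunity at those positions.
std::string withNonBreakingHyphens(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    if (c == '-') {
      out += kNonBreakingHyphen;
    } else {
      out += c;
    }
  }
  return out;
}
```

Applied to "2007-Jan-01", this yields the same visible date with both hyphens made non-breaking. Combined with the non-breaking space already used in the test markup, it leaves no break opportunity inside the date-time string.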

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Assignee

Comment 24

•

8 years ago

Attached patch part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc — Details — Splinter Review

Updated patch to fix a debug-build assertion failure in UseConservativeBreaking().

Attachment #8841634 - Flags: review?(masayuki)

Jonathan Kew [:jfkthame]

Assignee

Updated

•

8 years ago

Attachment #8841249 - Attachment is obsolete: true

Attachment #8841249 - Flags: review?(masayuki)

Jonathan Kew [:jfkthame]

Assignee

Updated

•

8 years ago

Assignee: nobody → jfkthame

Status: REOPENED → ASSIGNED

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 25

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #23)
> (In reply to Masayuki Nakano [:masayuki] from comment #22)
> Also note that Chrome allows these breaks (more widely than even the patched
> version of Firefox); it breaks

Well. Makoto-san and I discussed the compatibility with Chrome and our traditional behavior, though, perhaps, both browsers should use ICU for Unicode characters but ASCII characters should keep traditional behavior if it's better than other browser's behavior.

As you said, I think that breaking after 000- may be better for generic case, but not so for date-time. The important thing is, which is major case for the users.

> > >-<p>2007-Jan-01&nbsp;00:00:00</p>
> > >+<p>2007-<br>Jan-01&nbsp;00:00:00</p>
> >
> > Same here.
>
> The &nbsp; here suggests an author who is deliberately trying to avoid
> line-breaks within the time/date string; but in that case, they should have
> also used non-breaking hyphens (‑) to make sure of that.

Indeed.

> > But I'm not a native speaker of Western languages and I'm Japanese speaker,
> > so, breaking at almost all points doesn't make me feel odd. So, I don't know
> > if this change is acceptable for Western users. How do you think (Or, do you
> > know somebody who is familiar with typography of Western languages)?
>
> My view here is that relaxing our "conservative" rules slightly for these
> examples is acceptable. Ideally, we'd have multiple "levels" of hyphenation
> points with differing priorities: some that are always available to use, and
> others that are only used as a last resort to avoid overflow. But our
> current line-breaking architecture isn't that sophisticated.

Yeah, if we have prioritized line breaker, this issue must be simpler (bug 389710).

It's okay to break date-time format if there is no other line break opportunity. I'll be back soon, let me think this deeper before review.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

8 years ago

Blocks: line-breaking

Jonathan Kew [:jfkthame]

Assignee

Comment 26

•

8 years ago

(In reply to Masayuki Nakano [:masayuki] from comment #25)
> As you said, I think that breaking after 000- may be better for generic
> case, but not so for date-time. The important thing is, which is major case
> for the users.

My gut feeling (not supported by any actual data) is that date-time formats such as 2017-Feb-28, where we should arguably try to avoid breaking at the hyphen after 2017-, are not very commonly used in running text (where line-breaking is more important); they're more likely to occur in contexts such as tabular data or lists, where they occur as standalone items that are not usually subject to line breaking anyway. If breaking -is- needed in such a context, it's because the column allocated for the date is very narrow, and in that case breaking at the hyphen may be needed anyway to avoid overflow.

So my view is currently that the proposed change here provides more improvement than regression, on balance. Specifically, it improves behavior for common natural-language cases that authors cannot be expected to handle in any other way, whereas the (debatable) regression is in a specific context where the author does have the option to control the behavior more precisely by use of non-breaking hyphens -- and indeed, the author already needs to do this for the sake of other browsers.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 27

•

8 years ago

Ah, tables... I worry that a date-time gets split across 2 lines when it is unfortunately positioned at the end of a line. However, Chromium and Edge break within it.
https://jsfiddle.net/d_toybox/22q5jcL7/

Okay, let's take it. I think that when we support prioritized line breaking, we should mark such a point as lower priority.
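The "prioritized line break" idea mentioned here and in comments 23 and 25 is not implemented by the patch in this bug. Purely as an illustration of what such a scheme could look like, the sketch below picks a break for one line from a list of candidates. All names (`BreakPriority`, `BreakOpportunity`, `chooseBreak`, `kNoBreak`) are hypothetical, and line width is measured in characters, which is a simplification.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical two-level priority for break opportunities.
enum class BreakPriority { Normal, LastResort };

struct BreakOpportunity {
  std::size_t offset;  // number of characters kept on the current line
  BreakPriority priority;
};

// Sentinel returned when no opportunity fits on the line.
constexpr std::size_t kNoBreak = static_cast<std::size_t>(-1);

// Picks the latest Normal opportunity that fits within `maxChars`. Only when
// no Normal opportunity fits does it fall back to the latest LastResort one.
std::size_t chooseBreak(const std::vector<BreakOpportunity>& opportunities,
                        std::size_t maxChars) {
  std::size_t bestNormal = kNoBreak;
  std::size_t bestLastResort = kNoBreak;
  for (const BreakOpportunity& op : opportunities) {
    if (op.offset > maxChars) {
      continue;  // would overflow the line
    }
    std::size_t& best = (op.priority == BreakPriority::Normal)
                            ? bestNormal
                            : bestLastResort;
    if (best == kNoBreak || op.offset > best) {
      best = op.offset;
    }
  }
  return bestNormal != kNoBreak ? bestNormal : bestLastResort;
}
```

For the text "see 2007-Jan-01" with a Normal opportunity after the space (offset 4) and a LastResort opportunity after "2007-" (offset 9), a 10-character line breaks at offset 4 and keeps the date intact. A date on its own in a 6-character column has only the LastResort opportunity, so it breaks after "2007-" instead of overflowing.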

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

8 years ago

Attachment #8841248 - Flags: review?(masayuki) → review+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

8 years ago

Attachment #8841251 - Flags: review?(masayuki) → review+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 28

•

8 years ago

Comment on attachment 8841634 [details] [diff] [review]
part 1 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc

>- // Note that index is always larger than CONSERVATIVE_BREAK_RANGE here.
>- for (uint32_t i = index; index - CONSERVATIVE_BREAK_RANGE < i; --i) {
>+ // Note that index is always larger than conservativeRange here.

It might be better to use MOZ_ASSERT here.

>- // Note that index is always less than mLength - CONSERVATIVE_BREAK_RANGE.
>+ // Note that index is always less than mLength - conservativeRange.

And here.

>+ // If the character at index is a letter (rather than various punctuation
>+ // characters, etc) then we want a shorter "conservative" range
>+ uint32_t conservativeRangeStart, conservativeRangeEnd;
>+ if (index < mLength &&
>+     GetGenCategory(GetCharAt(index)) == nsIUGenCategory::kLetter) {
>+   // Primarily for hyphenated word prefixes/suffixes; we add 1 to Start
>+   // to get more balanced behavior (if we break off a 2-letter prefix,
>+   // that means the break will actually be three letters from start of
>+   // word, to include the hyphen; whereas a 2-letter suffix will be
>+   // broken only two letters from end of word).
>+   conservativeRangeEnd = CONSERVATIVE_RANGE_LETTER;
>+   conservativeRangeStart = CONSERVATIVE_RANGE_LETTER + 1;

Hmm, I may not understand this well, though, looks like that this block assumes that the last line break is caused by a close punctuation. If it's caused by a open punctuation like "(", does this work as you expected? E.g., "/a/ab/abc/abcd", "/abcd/abc/ab/a", etc.

If it works as you expected, r=masayuki (and it's better to add above cases to the testcase.)
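The range selection in the quoted hunk can be modeled in isolation as follows. This is a standalone sketch, not the patched Gecko function: the value 2 for the letter range is an assumption inferred from the "2-letter prefix" wording in the quoted comment, ASCII-only `std::isalpha` stands in for `GetGenCategory`, the string is treated as a single word, and `index` is assumed to be the first character of the would-be next line. The real function also checks other conditions that are omitted here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

constexpr std::size_t kConservativeRangeLetter = 2;  // assumed value
constexpr std::size_t kConservativeRangeOther = 6;   // the original range

// Simplified model: returns true when a break at `index` should be
// suppressed because it is too close to the start or end of the word.
bool useConservativeBreaking(const std::string& word, std::size_t index) {
  if (index > word.size()) {
    return true;
  }
  std::size_t rangeStart = kConservativeRangeOther;
  std::size_t rangeEnd = kConservativeRangeOther;
  if (index < word.size() &&
      std::isalpha(static_cast<unsigned char>(word[index])) != 0) {
    // A letter follows the break: use the shorter range, plus 1 at the
    // start so that the hyphen itself is accounted for.
    rangeEnd = kConservativeRangeLetter;
    rangeStart = kConservativeRangeLetter + 1;
  }
  return index < rangeStart || word.size() - index < rangeEnd;
}
```

In this model "Flaschen-teste" may now break after the hyphen, "T-shirt" still may not, and ":-)" keeps the 6-character range. For "2007-Jan-01" the break after "2007-" is allowed because a letter follows, while the break after "Jan-" is still suppressed because a digit follows and only 2 characters remain, which matches the updated reftest expectation quoted in comment 22.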

Attachment #8841634 - Flags: review?(masayuki) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 29

•

8 years ago

A sequence such as "/a/ab/abc/abcd" is unaffected by the change here, it continues to use the 6-char "conservative breaking" range. We have some URL examples somewhat like that already, but I'll add a test file with additional sequences of varying lengths, which passes unchanged both before and after the patch.

Jonathan Kew [:jfkthame]

Assignee

Comment 30

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/283f43e8790b268c6f5316b007fddb6658b6dfba
Bug 809020 - Reftest for breaking near word beginning/end (the "conservative" breaking range). r=masayuki

https://hg.mozilla.org/integration/mozilla-inbound/rev/e980c683af8cb856dbb7409373d2cef7972fb4ca
Bug 809020 - Use a shorter "conservative breaking" range at word edges when dealing with letters rather than punctuation etc., and adjust existing tests accordingly. r=masayuki

Carsten Book [:Tomcat]

Comment 31

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/283f43e8790b
https://hg.mozilla.org/mozilla-central/rev/e980c683af8c

Status: ASSIGNED → RESOLVED

Closed: 12 years ago → 8 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Francesco Lodolo [:flod]

Updated

•

8 years ago

Depends on: 1344459

Alice0775 White

Updated

•

6 years ago

Depends on: 1524727

Jonathan Kew [:jfkthame]

Assignee

Updated

•

6 years ago

No longer depends on: 1524727
