Closed Bug 218580 Opened 21 years ago Closed 17 years ago

line break should be allowed after slash('/'), unless followed by a number

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: michal, Unassigned)

References

(Blocks 2 open bugs)

Details

User-Agent:       Nutscrape/1.0 (CP/M; 8-bit)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030903

Table is made too wide. Instead, the text should have been wrapped.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Mozilla doesn't try to split words, and it's not even required to do so ( see
http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3 ).

In this case, it's caused by the 'word'
Francja/Hiszpania/Włochy/Holandia/Argentyna. Both Mozilla 1.5b and Safari 1.0 
(on Mac OS X 10.2.6) never attempted to split this word, so the layout was
broken. Internet Exploder 5.0.2 was able to split after the '/', but the layout
was still broken up, just a bit less than Mozilla or Safari.

There are 2 bugs that could help :
- Bug 9101 wants to respect soft hyphens (&shy;), as mentioned in HTML 4.0.
- Bug 95067 is mostly concerned with splitting after a hyphen; a more general
word-splitting algorithm can become very difficult in an international environment.

This case (splitting on /) could be fixed by the patch in bug 95067.
Not a DOM HTML issue.
Assignee: dom_bugs → other
Component: DOM HTML → Layout
.
Assignee: other → font
Component: Layout → Layout: Fonts and Text
I think the rules UAX #14 describes for '/' are the same as those for '-',
although it manages to describe them in a different way.  Thus marking dependent
on bug 95067.  See http://www.unicode.org/reports/tr14/ .
Status: UNCONFIRMED → NEW
Depends on: uax14
Ever confirmed: true
Summary: Table content explodes the table. → should break after a '/', unless followed by a number
Summary: should break after a '/', unless followed by a number → line break should be allowed after slash('/'), unless followed by a number
(I found the difference between the rules for '-' and '/'.  UAX #14 suggests
that a break should not be allowed between a space and a '/'.)
*** Bug 217520 has been marked as a duplicate of this bug. ***
Blocks: 218791
David Baron, I believe your comment 5 is in error.  The UAX suggestion "that a 
break should not be allowed between a space and a '/'" actually applies to the 
backslash (\); see:
  http://www.unicode.org/reports/tr14/#PR

Slash (/) has a different rule, which acknowledges the character's use as a path 
concatenator, but which does not say anything about a leading space:
  http://www.unicode.org/reports/tr14/#SY

I also believe the duplicate is in error, as the test case in bug 217520 is 
exactly about breaking after backslash.  See my comments there.
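
For illustration only, the rule in this bug's summary (allow a line break after '/', unless a number follows) can be sketched as below. This is a minimal sketch, not Gecko's line breaker and not the full UAX #14 pair table; the function name slash_break_opportunities is hypothetical, and the sketch deliberately ignores other rules (for example, it would also offer a break between the two slashes of "http://").

```python
from typing import List


def slash_break_opportunities(text: str) -> List[int]:
    """Return offsets at which a break would be allowed after a '/'.

    Hypothetical helper: implements only the rule from the bug summary,
    i.e. break after '/' unless the next character is a digit.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch != "/":
            continue
        nxt = text[i + 1:i + 2]  # empty string when '/' is the last character
        if nxt and not nxt.isdigit():
            breaks.append(i + 1)  # the break falls between '/' and the next character
    return breaks


word = "Francja/Hiszpania/Włochy/Holandia/Argentyna"
pieces = []
start = 0
for offset in slash_break_opportunities(word):
    pieces.append(word[start:offset])
    start = offset
pieces.append(word[start:])
print(pieces)
```

Under this rule the long 'word' from the original report splits into one piece per country name, while strings such as "1/2" or "06/07/99" get no break opportunity at all.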
I was referring to rule LB 8 in http://www.unicode.org/reports/tr14/#Algorithm .

The descriptions of the character classes seem to disagree with the algorithm.
(Although the disagreement is only that the break the comment describes, after
a space after a backslash, is not prevented by the algorithm.)  The two
discussions are about the combinations space-slash and backslash-space.
OK, I've reread that, including the Algorithm section, and I see where I 
misinterpreted the comment about the spaces; sorry.  (Also, the disagreement 
cited is due to the fact that the algorithm is pair-based, but that's not 
germane to this bug.)  

I also note here that David did un-dupe bug 217520 from this one.

xref to bug 56652.
*** Bug 253543 has been marked as a duplicate of this bug. ***
*** Bug 158211 has been marked as a duplicate of this bug. ***
*** Bug 272039 has been marked as a duplicate of this bug. ***
*** Bug 332253 has been marked as a duplicate of this bug. ***
*** Bug 335744 has been marked as a duplicate of this bug. ***
*** Bug 350408 has been marked as a duplicate of this bug. ***
This is a suggested approach to squash this bug. But first, a recap just to make sure I understand: 
The example URL in bug 350408 contains the following snippet purported to be a trigger of the table sizing issue:

<table width="100%" border="0"><tr><td align="left"><b>
http://everything2.com/?node=C%2B%2B%3A+static+extern+%22C%22
</b>
</td><td align='right'><b>http://everything2.com/?node_id=1347412</b></td>
</tr></table>

In this case, the entire table is made wider than the browser window content width, because the two URI strings concatenate to a string which is wider than the browser window, and this is also a function of the user choice of display font.

Assuming the above description of the issue is correct, then we can logically derive the beginnings of a solution which is more general and effective than heuristic hyphenating (a good solution is not to be found by hyphenation). The following discussion regards RELATIVE size tables specifically; fixed size tables are not considered yet.

1. We want to retain the current behavior of table rendering wherein the table gets wider to fit the contents, yes?, but only up to the maximum window or page width when printing. 

2. There is probably no value in trying to shrink a "100%" table (having wide content) to fit in a window below a certain practical minimum width, maybe taking the width of a PDA screen or photo printer paper as the practical minimum. This is something to keep in mind.

3. While rendering a table, gecko is aware of the window width (or page width), and is in a position to detect immediately when the total width of the outermost table has increased past the window width.

4. The real difficulty then is not in deciding where to soft wrap very long words with lots of complicated rules, but rather in trying to decide what should be the maximum width of each table column in proportion to the other columns and the whole browser window. If a suitable column width can be determined on-the-fly, then any long word can simply be broken at the right boundary of the cell and made to fit (after first trying the existing gecko hyphenation rules, of course). MS Excel does something similar with long words, and the effect is fairly intuitive and workable for the user. Unlike Excel, gecko doesn't know in advance what the column width should be.

5. Restricting the width of table cells is a special case which only needs to happen when the condition in #3 above has been detected. 

6. There are effective rendering techniques to consider, and (to gain acceptance of these) there should be some user preference control over the rendering behavior.

7. Technique A: This is an internationally compatible solution, though I use English for example in the concise explanation. This technique is intelligent and so simple it would work very nicely with few changes to gecko. (The term character here refers to visual space occupied by a glyph, not a byte or unit of memory.) 
   The longest nontechnical word in English is 29 letters long.  Let us assume a nice round 30 characters as the max length of an English word we would rather not break (not counting a possible hyphen or dot punctuation). 
   The reason for talking about the length of a typical word is that it provides a sensible way to have a deterministic text-wrapping behavior which is also intuitive for the user and doesn't look bad. Good hyphenation rules are still as beneficial as ever, but are non-deterministic of length and may vary from one language to the next, leading to a complex implementation and lower performance.
   A default word-length limit value (30?) should be configurable by the user. Thus, non-English users can have a different default limit value and any user (e.g. a scientist) can change the limit value. In truth, the word length limit should probably be set much lower, say 20 characters, as some 95% of English words are probably below this limit (Somebody double-check this statistic).
 
   When gecko processes a row and the table is becoming too wide (the condition in #3 above has been detected), the offending wide row is re-measured and, in any cell containing a (nonhyphenated) long word of length > limitWordLength (e.g. 20), that word is measured as being of length limitWordLength for the purpose of determining the longest word in the cell. (For speed, instead of re-measuring after-the-fact, both the first and the second measurement can be made during the initial sweep.)
   The effect of this strategy would be to put a sensible cap only when necessary on the width of any one column, and still most words would not need to be broken in the middle. This capping action could be reduced or eliminated when: the user reduces the font size, widens the browser window, or changes the value of limitWordLength. 
   At display time, gecko will simply wrap the long word to multiple lines so it will fit in the cell, not necessarily at intervals of limitWordLength, but rather according to the actual width of the cell, which may in fact still be wider than limitWordLength. 

8. Technique B. This one is similar to Technique A. Again, when a row is found to become too wide for the window, then a change is made to the display of that row or perhaps the whole table. In technique B, the offending wide row is re-measured using a smaller font size, and then displayed with the smaller font size. To facilitate this, imagine the preference  Minimum Font Size ___  accompanied by a check box "Except In Wide Table Rows".
   So the "smaller" font size is just the original font size specified by the web page. If the page hasn't specified a font size, then gecko could either scale down to a suitable smaller font (down to a minimum of say 9 points) in a very sophisticated way, or just display the table with the default font and let it extend past the window size. The effect of this technique would be an intelligent use of a smaller font to display a table row which contains one or more extra-long words. 

9. Finally, it would be possible to combine techniques A & B by making three width prediction measurements in one sweep, and choosing to render with one or both techniques according to user preference and what fits best. The three width measurements are: Normal summing, summing with limitWordLength, and summing with limitWordLength at reduced font size. 

10. Using these techniques, it should be possible to consistently display relative-size tables of up to three columns without a horizontal scroll bar and without bad-looking text wrapping. With more than three columns, the techniques A&B are still very helpful at preventing the plague of super-wide columns which we have now. Perhaps a clever programmer can even extend techniques A & B to work even better with four or more columns. Of the two, technique A holds the most promise for enabling gecko to display tables on small display screens, and for aiding the visually impaired.
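
To make Technique A above concrete, here is a rough sketch under heavy assumptions: widths are counted in characters rather than measured glyph widths, and the names cell_min_width, row_min_widths and wrap_cell are hypothetical, not Gecko functions. Only limitWordLength (spelled limit_word_length here) comes from the proposal itself.

```python
import textwrap
from typing import List, Optional


def cell_min_width(text: str, limit_word_length: Optional[int] = None) -> int:
    """Length of the longest word in a cell, optionally capped at the limit."""
    longest = max((len(word) for word in text.split()), default=0)
    if limit_word_length is not None:
        longest = min(longest, limit_word_length)
    return longest


def row_min_widths(cells: List[str], window_width: int,
                   limit_word_length: int = 20) -> List[int]:
    """Measure a row normally; cap long words only if the row is too wide."""
    normal = [cell_min_width(cell) for cell in cells]
    if sum(normal) <= window_width:
        return normal  # the row fits, so no capping is applied
    return [cell_min_width(cell, limit_word_length) for cell in cells]


def wrap_cell(text: str, cell_width: int) -> List[str]:
    """Wrap to the actual cell width, breaking over-long words where needed."""
    return textwrap.wrap(text, width=cell_width, break_long_words=True)


cells = ["http://everything2.com/?node_id=1347412", "short text"]
widths = row_min_widths(cells, window_width=30)
lines = wrap_cell(cells[0], widths[0])
print(widths, lines)
```

The cap only takes effect when the uncapped row would exceed the window, which matches the proposal's point that most words never need to be broken. A real implementation would have to work from glyph measurements and the table layout algorithm, which this sketch does not attempt.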
I am experiencing this bug on Mac OS X 10.4.4, so the Hardware and OS tags for this bug should both be set to All.
This bug is about breaking following a "/".  General changes to table width calculation belong in another bug.
In fact, I originally created a separate bug, but an expert reviewer said it was a duplicate and merged it into 218580. At first I was disappointed, but then after careful reading I satisfied myself that the two issues were essentially one and the same.

218580 is not really a bug about breaking following a "/".  It originally began as the same complaint: table too wide.  The reporter suggested the "breaking following a /" idea as a solution. This possible solution has taken on a life of its own in this bug, but has failed to be compelling and nothing has been done.

It was pointed out that breaking following a "/" is not a requirement of the standards, so then failing to break after a "/" isn't a bug in and of itself. Even so, the discussion took the direction of trying to reach parity with IE behavior, that is, using the partial solution of breaking following a "/" to reduce the table sizing problem wherever it is due to URLs. But it isn't a fully reliable solution since some long words won't contain the "/".

Even though I agree the "/" delimiter is an excellent choice for a spot to break a word to the next line, I'm also suggesting the bug can only be solved reliably by a different approach which must compare visual dimensions instead of character codes.  This belief is based on the concept that the table size % value is defined as pertaining to the real window width or page width when printing (in pixels or equiv. units), and as such it is a true bug to ever draw a <= 100% relative-size table any wider than the containing window or page. As for cell content which will not fit within such a table, it logically follows that the content must be wrapped and/or cropped horizontally, or perhaps scaled to fit. The same logic would seem to apply to cell content in absolute-width tables and cells. 

Incidentally, in the bug example page, I did try substituting an absolute width value in place of the percentage value in the tag to see what would happen. There was no change in rendering. The table was still displayed wider than the requested size.
Please stop filling up *this* bug with huge comments about other things.  It just makes this bug much less likely to be fixed.
Mr. Baron:
The resolution INVALID is reserved for when "The problem described is not a bug." It is therefore inappropriate to label the bug 350408 as invalid, as it has been twice found (by different persons) to be a duplicate of bug 218580 which is widely agreed to be a bug. If the complaint in 350408 is fixed, then bug 218580 will automatically be closeable too. But if 350408 is decided to not be a bug, then 218580 is also not a bug, because:

Bug 218580 Description: Table is made too wide. Instead, the text should have been wrapped. Reproducible: Always
IS BY TESTING FOUND TO BE THE SAME SITUATION AS:
Bug 350408 Expected Results: Lines of text should wrap inside the browser window width. Reproducible: Always
If a choice is made to not resolve a bug, then it is to be eventually given the resolution WONTFIX.
That was referring to a particular table in a particular page, where our behavior is incompatible with other browsers, and the way to fix that incompatibility is to allow breaks after "/".
(In reply to comment #19)
> the Hardware and OS tags for
> this bug should both be set to All.

fixed

OS: Linux → All
Hardware: PC → All
*** Bug 363676 has been marked as a duplicate of this bug. ***
this is fixed by bug 255990.

-> FIXED
Status: NEW → RESOLVED
Closed: 17 years ago
Depends on: 255990
Resolution: --- → FIXED