Open Bug 67715 Opened 19 years ago Updated 2 years ago

layout of full-justified text is sub-optimal

Categories

(Core :: Layout: Text and Fonts, enhancement, P5)

Tracking

()

Future

People

(Reporter: jmcbray, Unassigned)

Details

(Keywords: helpwanted)

(full-) justified text in Mozilla is not as good as it could be -- there are
often wide spaces within a line, even for reasonably wide columns.  Is there
some fundamental reason that Mozilla can't use a multiline H&J (hyphenation and
justification) algorithm, like the one in TeX?
Apart from the fact that hyphenation is very language-specific while Mozilla
does not do (and cannot easily do) language detection on the page source? (Note
that this is not the same as charset detection since different languages often
use the same charset.)

This would be very nice to have, though difficult.  Setting status to NEW.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Dang, I thought I was going to be the one who was going to file this RFE 
eventually. :-)

We can do hyphenation where the author has included the &shy; (soft hyphen) 
character -- but smart justification can be done independently of that. So when a 
justified block element has completely loaded, something like the following 
should occur:

finishedJustifying = false;
while (not finishedJustifying)
{
  finishedJustifying = true;
  worstLine = number of line in block element which has the widest spaces;
  if
  (
    (last word, or soft-hyphenated part-word, of (line(worstLine - 1)) can fit on
    line(worstLine))
  and
    (moving this word would not make line(worstLine - 1) worse than worstLine is
    currently)
    /* though ideally you'd probably want to recurse right back to the first line
    of the block element */
  )
  {
    shift the last word of line(worstLine - 1) to the beginning of worstLine;
    finishedJustifying = false;
  }
}

That's probably not the perfect algorithm, but it would be a start. (The perfect 
algorithm probably takes time which increases exponentially with the number of 
lines in the block element, or something horrid like that.)
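
For illustration, the loop described above can be written out in runnable form. This is only a minimal Python sketch, not Mozilla code. It assumes a monospaced model in which every character and every minimum space is one column wide, it treats the last line as unjustified, and it ignores soft hyphens. The helper names (`natural_width`, `greedy_wrap`, `gap_width`, `rebalance`) are made up for this example.

```python
def natural_width(line):
    """Columns used by the words of a non-empty line with single spaces."""
    return sum(len(word) for word in line) + len(line) - 1


def greedy_wrap(words, width):
    """First-fit breaking: start a new line only when the next word will not fit."""
    lines = []
    for word in words:
        if lines and natural_width(lines[-1]) + 1 + len(word) <= width:
            lines[-1].append(word)
        else:
            lines.append([word])
    return lines


def gap_width(line, width):
    """Width of each inter-word space once the line is stretched to `width`."""
    gaps = max(1, len(line) - 1)  # a one-word line counts as one big gap
    return (width - sum(len(word) for word in line)) / gaps


def rebalance(lines, width):
    """Repeatedly pull the last word of the previous line onto the worst line."""
    lines = [list(line) for line in lines]  # work on a copy
    while True:
        justified = range(len(lines) - 1)  # the last line is not stretched
        if not justified:
            return lines
        worst = max(justified, key=lambda i: gap_width(lines[i], width))
        if worst == 0:
            return lines  # no previous line to borrow from
        previous = lines[worst - 1]
        if len(previous) < 2:
            return lines  # never empty a line completely
        word = previous[-1]
        fits = natural_width(lines[worst]) + 1 + len(word) <= width
        not_worse = gap_width(previous[:-1], width) <= gap_width(lines[worst], width)
        if not (fits and not_worse):
            return lines
        lines[worst].insert(0, previous.pop())


words = "a b c d e ff gggggg hhhhhhhh".split()
for line in rebalance(greedy_wrap(words, 12), 12):
    print(" ".join(line))
```

Words only ever move forward to a later line, so the loop terminates. Like the pseudocode, it gives up as soon as the single worst line cannot be improved, so it is a local repair and not an optimal layout.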

Now Ian is going to come along and tell us all this is impossible because it 
can't be done with CSS, and things which can't be done with CSS are sinful.
Reassigning to Buster.
Assignee: karnaze → buster
I'd love to see something like this, but we (Netscape) don't have the resources
for it.  Any takers?
Keywords: helpwanted
Priority: -- → P5
Target Milestone: --- → Future
I believe scc has an even better algorithm, and since he has experience with
word processors, he'd be the guy to talk about this.

mpt: It isn't against CSS, you can do whatever text justification algorithm you
want. However, I would have thought that it would be better to have a suboptimal
text justification algorithm than one that causes jumpiness while the page is
loading or is otherwise incrementally reflowed. (Note: Moving the mouse over the
page can cause an incremental reflow.) In other words, I would have thought 
you'd be the one against this, not me...
It may be that smart justification should only be used when printing -- it would 
help counter the `printed HTML sucks' attitude which causes so much useful 
information which might otherwise be on the Web to be locked up as PDF or PS 
files instead of HTML (on the grounds that PDF or PS looks better when printed).

(Note: If a mouseover can cause an incremental reflow, the W3C should be 
thoroughly ashamed of itself.)
TeX's layout and hyphenation algorithms probably produce the best-looking
output; however, they are a great deal of work to implement, and the entire
paragraph can re-wrap with the addition of a single word at the end (e.g., to
better balance white-space).  I'll happily point you to the appropriate doc,
if you so desire.
Please set the URL to it; even if we don't implement it, I'd like to read it.
Heh. Should have guessed Hermann Zapf would be behind this somehow.
I have text that simply doesn't justify if the browser window is set to some 
sizes -- it reverts to left-aligned... not nice at all. Using CSS on a plain 
paragraph not nested inside anything at all, although there are <br>'s within
the paragraph. This is on an FAQ page and most of the other paragraphs with the 
same style behave just fine. Bizarre.

Grant
Grant, that should be filed as a separate bug. This RFE is something else 
entirely.
Build reassigning Buster's bugs to Marc.
Assignee: buster → attinasi
I've looked a bit at descriptions of the TeX algorithm and read over the CSS
specs, and here's a brief review of the subject:

CSS appears to separate hyphenation from justification. CSS 2.1 describes
justification as solely a matter of adjusting spacing within the line; there's
really not much of anything in there about hyphenation, but the current CSS 3
Text CR provides a "word-break-inside" property which must be explicitly set to
"hyphenate" to invoke language-specific hyphenation from UAs. For the time
being, then, hyphenation is out of the picture, which is fine given the likely
performance and debugging costs of coming up with hyphenation dictionaries. Note
also that CSS limits stretching/compression of inter-word spaces to text
formatted with text-align: justify, so at least initially, the introduction of a
more sophisticated justification algorithm is likely to have minimal impact on
real Web pages.

The TeX algorithm operates on a per-paragraph basis. It assigns each line a
degree of "badness" depending on how much it needs to be stretched/compressed at
spaces to fit the desired width; this also accommodates control over how far
spaces can be stretched in a given line, e.g., not at all when CSS
"word-spacing" is set. A penalty is also attached to the different points in the
line where it can be broken (spaces, hyphens, soft hyphens, etc.). The TeX
algorithm lays out the paragraph so as to minimize the sum of squares of
line-breaking penalties and stretching badness.
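
The badness idea can be sketched as a small dynamic program. This is a deliberately simplified Python sketch, not TeX's actual algorithm. It assumes one column per character, uses the squared leftover space of each line as a stand-in for badness, charges nothing for the last line, and ignores break penalties and hyphenation entirely. The names `layout_cost`, `greedy_breaks` and `optimal_breaks` are invented for this example.

```python
import math


def natural_width(line):
    """Columns used by a non-empty line set with single spaces."""
    return sum(len(word) for word in line) + len(line) - 1


def layout_cost(lines, width):
    """Sum of squared leftover space over every line except the last."""
    return sum((width - natural_width(line)) ** 2 for line in lines[:-1])


def greedy_breaks(words, width):
    """First-fit breaking, for comparison."""
    lines = []
    for word in words:
        if lines and natural_width(lines[-1]) + 1 + len(word) <= width:
            lines[-1].append(word)
        else:
            lines.append([word])
    return lines


def optimal_breaks(words, width):
    """Choose breaks that minimize layout_cost over the whole paragraph."""
    if any(len(word) > width for word in words):
        raise ValueError("a word is wider than the line")
    n = len(words)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost of laying out words[i:]
    end = [n] * (n + 1)          # end[i]: where the line starting at word i stops
    best[n] = 0
    for i in range(n - 1, -1, -1):
        used = -1
        for j in range(i + 1, n + 1):
            used += 1 + len(words[j - 1])
            if used > width:
                break
            cost = 0 if j == n else (width - used) ** 2  # last line is free
            if cost + best[j] < best[i]:
                best[i], end[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:end[i]])
        i = end[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6), layout_cost(greedy_breaks(words, 6), 6))
print(optimal_breaks(words, 6), layout_cost(optimal_breaks(words, 6), 6))
```

Under these assumptions the search is at most quadratic in the number of words, because each candidate line start only looks ahead as far as the line width allows.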

This leads to two major issues, namely, incremental reflow at the paragraph
level and the effects of triggering incremental reflow after initial layout
(e.g., by :hover, as mentioned above). How "jumpy" we'd look if we did
incremental reflow paragraph-by-paragraph rather than line-by-line is a question
the layout gurus will have to answer. The second issue is a little more tricky:
if CSS sets a larger font-size on :hover, the line will expand (although
technically applying styles that would cause reflow on :hover is optional per CSS),
and the lines below it in the block may be reflowed. Using the TeX algorithm, it
appears that the entire contents of the block would have to be reflowed, instead
of just the lines below.
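
To make the reflow-scope concern concrete, here is a rough Python sketch that compares how far back a layout changes when one more word arrives at the end of a block. It is a toy model, not Gecko code: one column per character, squared leftover space as the cost of every line but the last, and no hyphenation. `first_fit`, `optimal_lines` and `first_changed_line` are hypothetical helpers.

```python
from functools import lru_cache


def first_fit(words, width):
    """Line-by-line breaking: earlier lines never depend on later words."""
    lines, used = [], 0
    for word in words:
        if lines and used + 1 + len(word) <= width:
            lines[-1].append(word)
            used += 1 + len(word)
        else:
            lines.append([word])
            used = len(word)
    return lines


def optimal_lines(words, width):
    """Paragraph-level breaking that minimizes total squared leftover space."""
    words = tuple(words)
    n = len(words)

    @lru_cache(maxsize=None)
    def solve(i):
        # Returns (cost, break positions) for words[i:]; the recursion depth
        # grows with the word count, which is fine for a short demo only.
        if i == n:
            return 0, ()
        best, used = None, -1
        for j in range(i + 1, n + 1):
            used += 1 + len(words[j - 1])
            if used > width:
                break
            tail_cost, tail_breaks = solve(j)
            cost = (0 if j == n else (width - used) ** 2) + tail_cost
            if best is None or cost < best[0]:
                best = (cost, (j,) + tail_breaks)
        if best is None:
            raise ValueError("a word is wider than the line")
        return best

    breaks = solve(0)[1]
    starts = (0,) + breaks[:-1]
    return [list(words[a:b]) for a, b in zip(starts, breaks)]


def first_changed_line(old, new):
    """Index of the first line that differs between two layouts."""
    for index, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return index
    return min(len(old), len(new))


before, after = "aaa bb cc".split(), "aaa bb cc ddddd".split()
print(first_changed_line(first_fit(before, 6), first_fit(after, 6)))          # 2
print(first_changed_line(optimal_lines(before, 6), optimal_lines(after, 6)))  # 0
```

In this small case the line-by-line layout only gains a new last line, while the paragraph-level layout changes from the very first line, which is the kind of whole-block reflow described above.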

So using TeX algorithms (sans dictionary-based hyphenation) is not inherently
impractical, but the tradeoffs (block-based reflow, probable perf costs, some
additional reflow on :hover) would have to be carefully weighed against the
gains (improved readability and appearance of justified text).

As the original URL has rotted, I've replaced it with a link to Hàn Thế Thành's
thesis, which describes the Knuth hyphenation & justification algorithm as a
prelude to discussion of the Zapf techniques, which build on the H&J algorithm
to achieve a more uniform text density and other desirable typographic qualities.
Note comment 6 (about printing); :hover is a non-issue there, as are incremental
reflows and so forth.  And printing is really where this would come in handy.
(In reply to comment #1)
> Apart from the fact that hyphenation is very language-specific while Mozilla
> does not do (and cannot easily do) language detection on the page source? 

Yes you could, by using the lang attribute. Hyphenation is even stated as an application of this attribute, in the HTML4 specification: http://www.w3.org/TR/html4/struct/dirlang.html#h-8.1

For XHTML xml:lang should take precedence according to http://www.w3.org/TR/xhtml1/#C_7
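
A rough sketch of how the language could be resolved per element, so that a hyphenator would know which rules to apply. It assumes well-formed XHTML-style markup parsed with Python's standard `xml.etree.ElementTree`. `effective_languages` is a hypothetical helper, and the sketch ignores any document-level defaults such as HTTP headers.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map every element to the language it declares or inherits."""
    result = {}

    def walk(element, inherited):
        declared = element.get(XML_LANG)  # xml:lang takes precedence
        if declared is None:
            declared = element.get("lang")
        language = inherited if declared is None else declared
        result[element] = language
        for child in element:
            walk(child, language)

    walk(root, None)
    return result


doc = ET.fromstring(
    '<html lang="en"><body>'
    '<p>Hello</p>'
    '<p lang="de" xml:lang="fr">Bonjour <em>monde</em></p>'
    '</body></html>'
)
for element, language in effective_languages(doc).items():
    print(element.tag, language)
```

An element with neither attribute inherits from its nearest ancestor that has one, and an element with no declaration anywhere above it maps to `None`, in which case automatic hyphenation would presumably have to be skipped.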

(In reply to comment #13)
> but the current CSS 3 Text CR
> provides a "word-break-inside" property which must be explicitly set to
> "hyphenate" to invoke language-specific hyphenation from UAs.

I can't find a "word-break-inside" property, but instead a "hyphenate" property that can be set to "auto": http://www.w3.org/TR/css3-text/#hyphenate

Note that HTML4 also mentions explicit break points in
http://www.w3.org/TR/html4/struct/text.html#hyphenation
Since they affect justified layout too, I'd make this depend on bug #9101, if I had the right to change this setting.
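
As a sketch of how author-supplied break points could feed a line breaker, the following Python enumerates the places where a word may be split at its soft hyphens (U+00AD) and picks the longest first part that fits in the remaining room. One column per character is assumed, and `visible`, `break_options` and `split_to_fit` are hypothetical names, not part of any existing hyphenation library.

```python
SOFT_HYPHEN = "\u00ad"


def visible(text):
    """The text as rendered when no break is taken."""
    return text.replace(SOFT_HYPHEN, "")


def break_options(word):
    """Yield (head, tail) for each soft hyphen; head gains a visible hyphen."""
    parts = word.split(SOFT_HYPHEN)
    for k in range(1, len(parts)):
        head = "".join(parts[:k])
        tail = SOFT_HYPHEN.join(parts[k:])  # keep later break points usable
        if head and visible(tail):          # skip leading or trailing soft hyphens
            yield head + "-", tail


def split_to_fit(word, room):
    """Longest head that fits in `room` columns, with its tail, or None."""
    fitting = [option for option in break_options(word) if len(option[0]) <= room]
    return fitting[-1] if fitting else None


word = "hy\u00adphen\u00ada\u00adtion"
print(split_to_fit(word, 7))  # head is 'hyphen-'
print(split_to_fit(word, 2))  # None
```

The tail keeps its remaining soft hyphens so that it can be split again on a later line if needed.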

Would it be possible to use the OpenOffice Hyphenator, or one of its predecessors? http://lingucomponent.openoffice.org/hyphenator.html
Assignee: attinasi → nobody
QA Contact: chrispetersen → layout
Component: Layout → Layout: Text