Bug 9101 - (shy) Break lines at soft hyphens (&shy;) and display hyphens if line broken
Status: RESOLVED FIXED
Whiteboard: [line-breaking] [p-ie/mac][p-opera][p...
Keywords: helpwanted, html4, intl, testcase
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: P2 normal with 157 votes
Target Milestone: mozilla1.9alpha8
Assigned To: Robert O'Callahan (:roc) (email my personal email if necessary)
QA Contact: Hixie (not reading bugmail)
Mentors:
URL: http://www.robinlionheart.com/stds/ht...
Duplicates: 55191 64626 150401 163281 203181 215663 230045 231251 244263 250741 273939 281797 308206 351593 393691
Depends on: 255990 333659
Blocks: line-breaking 7455 robin's 88791 164421 385578
Reported: 1999-06-30 16:59 PDT by Robin Lionheart
Modified: 2014-04-26 03:22 PDT
CC List: 134 users
jwalden+bmo: in‑testsuite+

Attachments
simple testcase (683 bytes, text/html)
1999-07-16 05:41 PDT, David Baron :dbaron: ⌚️UTC-10
no flags
Soft Hyphen Test in a Simple Table (1.46 KB, text/html)
2001-06-25 11:18 PDT, Gus Richter
no flags
Ghost chars apears when marking text with shys. See http://magnus.de/shy/ (2.30 KB, text/html)
2003-10-30 04:47 PST, Jochen Magnus
no flags
gfx fix (1.72 KB, patch)
2007-06-28 04:07 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
pavlov: review+
textframe changes (12.32 KB, patch)
2007-06-28 04:19 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
smontagu: review+
reftests (2.76 KB, patch)
2007-06-28 04:22 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
no flags

Description Robin Lionheart 1999-06-30 16:59:17 PDT
[5.xP] Mozilla always renders the soft hyphen (&shy;) entity as a hard hyphen.
Soft hyphens should only be printed at the end of a line when they are breaking up
a word. Easy to fix even without adding hyphenation logic: ignoring all soft
hyphens would be correct behavior.
Comment 1 rickg 1999-07-02 11:11:59 PDT
Kipp -- lexomorphic transform bug.
Comment 2 kipp 1999-07-02 14:34:59 PDT
Marking as a feature request; currently &shy; is mapped into the appropriate
Unicode code. There are zero lines of code, after that, to handle how it should
render...
Comment 3 Robin Lionheart 1999-07-05 19:50:59 PDT
The present behavior of rendering a soft hyphen all the time does not comply
with the HTML 4.0 specification
(http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3).
Comment 4 David Baron :dbaron: ⌚️UTC-10 1999-07-16 05:41:59 PDT
Created attachment 906 [details]
simple testcase
Comment 5 Jamus Jegier 1999-07-18 13:09:59 PDT
I originally "claimed" this bug for the bugathon, but dbaron@fas.harvard.edu has
posted a testcase.  I changed the status whiteboard to indicate this.
Comment 6 kipp 1999-10-25 16:09:59 PDT
I've fixed the short term issue such that shy characters will no longer be
rendered. However, they don't work either. So I'm latering the bug for that
issue.

Note for future code archeologists: the code that hides the shy characters lives
in nsTextTransformer.cpp
Comment 7 Chris Petersen 1999-10-29 13:47:59 PDT
Marking as verified later
Comment 8 Chris Petersen 1999-10-29 13:50:59 PDT
Marking as verified later
Comment 9 Tobias Burnus 2000-09-11 06:32:49 PDT
If you look at this bug, also have a look at #31304
Comment 10 Mats Palmgren (:mats) 2000-11-21 11:44:53 PST
*** Bug 55191 has been marked as a duplicate of this bug. ***
Comment 11 David Hallowell 2001-01-08 11:11:29 PST
Reopening as an enhancement request.
Clearing milestone, updating summary as soft hyphens are not displayed anymore. The
original bug asked that soft hyphens either be displayed correctly or not be displayed
at all (either is correct behaviour). At the moment soft hyphens are not displayed; this
bug is about getting this feature working.

(http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.3.2)
Comment 12 David Hallowell 2001-01-08 11:13:04 PST
*** Bug 64626 has been marked as a duplicate of this bug. ***
Comment 13 Karl Ove Hufthammer 2001-02-06 08:40:39 PST
This (the original) bug is actually still valid. Soft hyphens are rendered as
hard hyphens in XUL (for an example, use the 'Links' panel in the sidebar on a
Web page that uses soft hyphens).
Comment 14 Gus Richter 2001-06-25 11:18:08 PDT
Created attachment 39944 [details]
Soft Hyphen Test in a Simple Table
Comment 15 Gus Richter 2001-06-26 21:38:49 PDT
I don't understand why Soft Hyphen is tagged as an enhancement.
HTML 4.01 9.3.3 clearly and firmly states:
"In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen." 
The soft hyphen is in the specification and any browser that professes to
support the specifications must support it.

Comment 16 Robin Lionheart 2001-08-03 06:38:47 PDT
We never break a line at a soft hyphen, and we never display soft hyphens, which
is the minimum necessary to follow the semantics:

  If a line is broken at a soft hyphen, a hyphen character must be displayed at
  the end of the first line. If a line is not broken at a soft hyphen, the user
  agent must not display a hyphen character. For operations such as searching and
  sorting, the soft hyphen should always be ignored.
  -- http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3

Documents would look much nicer if Mozilla were smart enough to break lines at
soft hyphens. Nevertheless, that's an enhancement, not a requirement.
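
As a rough illustration of the "minimum" reading of the quoted passage, here is a small Python sketch. It assumes nothing about Mozilla's actual code; the function names are made up for this example. It drops U+00AD when text is shown without a break at that point, and ignores it when searching and sorting.

```python
SOFT_HYPHEN = "\u00ad"  # the character that &shy; maps to


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed.

    This is what a user agent that never breaks at a soft hyphen
    would display, and also the form to use for search and sort.
    """
    return text.replace(SOFT_HYPHEN, "")


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores soft hyphens on both sides."""
    return strip_soft_hyphens(needle) in strip_soft_hyphens(haystack)


def sort_ignoring_soft_hyphens(words: list[str]) -> list[str]:
    """Sort words as if they contained no soft hyphens."""
    return sorted(words, key=strip_soft_hyphens)
```

With these helpers, a page containing "in&shy;for&shy;ma&shy;tion" would still match a search for "information".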
Comment 17 Frank Tang 2001-08-21 14:19:19 PDT
buster no longer works for Netscape. Reassigning to ftang.
Comment 18 Gus Richter 2001-08-26 22:09:32 PDT
   We never break a line at a soft hyphen, and we never display soft hyphens,
   which is the minimum necessary to follow the semantics:

That is exactly what is wrong with Mozilla's behaviour regarding soft hyphen and
you come to the wrong conclusion. The minimum requirement is;

1. "If a line is broken at a soft hyphen",
   (then) a hyphen character must be displayed at the end of the first line.
2. "If a line is not broken at a soft hyphen",
   (then) the user agent must not display a hyphen character.

See  http://bugzilla.mozilla.org/showattachment.cgi?attach_id=39944  with IE 5.x
to see how it should work. This test case could also be applied in a div for
example.

This in Bug#:9101;
Status Whiteboard: soft hyphens must be ignored or properly displayed
is not correct. It should read;
Status Whiteboard: if a line is broken at a soft hyphen, it must be displayed
and;
Resolve bug, changing resolution to  FIXED
should read;
Resolve bug, changing resolution to  LATER?

   Documents would look much nicer if Mozilla were smart enough to break lines
   at soft hyphens. Nevertheless, that's an enhancement, not a requirement.

Not so, unless Mozilla is one of those browsers that do not interpret soft
hyphens.
The Unicode Soft Hyphen is in the specs, IE 5.x supports it, Opera has a bug
filed and Mozilla must also support it.
http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3  clearly stipulates it
as a requirement and in no way is it only an enhancement.

It is only correct in behaviour, as is, for searching and sorting operations.
Comment 19 Frank Tang 2001-08-31 12:54:01 PDT
richterf@golden.net: I have no clue about your last comment; I cannot figure out
which part is your opinion and which part is a quote of the previous comment.

This bug is currently an "enhancement" since Mozilla already fulfills the minimum
requirement:
>1. "If a line is broken at a soft hyphen",
>   (then) a hyphen character must be displayed at the end of the first line.
>2. "If a line is not broken at a soft hyphen",
>   (then) the user agent must not display a hyphen character.

Mozilla does not do the "if" in 1, therefore there is no requirement to perform
the "then" part in 1.
Mozilla does do the "if" in 2 and also does the "then" part in 2.

There is no requirement that a user agent has to do the "if" part in 1. That is
not part of the minimum requirement.

Neither the XML nor the XUL specification mentions the soft hyphen, so the XUL
argument does not apply.
If we display it wrongly in XUL, please file a separate bug about XUL and assign
it to hyatt@netscape.com, thanks.
Comment 20 Frank Tang 2001-08-31 13:56:08 PDT
mark it as future. 
Comment 21 Robin Lionheart 2001-11-03 10:44:57 PST
Renaming from "Support correct display of..." to "Break lines at..." since the
meaning of this bug has shifted from minimum compliance to enhancement.
Comment 22 Kai Lahmann (is there, where MNG is) 2002-06-09 08:51:41 PDT
*** Bug 150401 has been marked as a duplicate of this bug. ***
Comment 23 Fredrik Wendt 2002-07-28 05:21:20 PDT
1. I just noticed that there's no hyphenation or breaking of lines within the
mail composer.

2. There's also a heavy bug (from the user's point of view), which appears when
altering text within the "insert HTML" window that contains &shy;s. The &shy;
entity isn't written out, and the editing of the text behaves very inconsistently.
I'm afraid that is the best description I can give you now; however, I figure this
last "bug" should be reported somewhere else (don't know where).
Comment 24 Christopher Aillon (sabbatical, not receiving bugmail) 2002-08-17 19:28:43 PDT
*** Bug 163281 has been marked as a duplicate of this bug. ***
Comment 25 Mike Cowperthwaite 2002-08-17 20:53:32 PDT
My submission #163281, which was a dup, incorrectly stated that &zwnj; should also provide an invisible opportunity for line break.  However, I have discovered that there should be an "opportunity for line break" before or after the &mdash;, and after the &ndash;.  Mozilla does not do either.  Perhaps that should be a separate bug?

See the article here:
http://www.cs.tut.fi/~jkorpela/dashes.html
Comment 26 rgpublic 2002-09-20 11:11:49 PDT
Why is this bug still marked "Future"?
This is also an important i18n issue. English usually has much shorter
words than many other languages in the world. No browser has automatic hyphenation,
which would be a lot more difficult to implement. But in Internet Explorer
the web designer can at least "help" the browser in the case of longer words.
So if, e.g., a German or Turkish web page is rendered in Mozilla, large gaps occur,
especially within table cells. This is quite visible on some pages, and
people will notice that IE displays the page correctly while Mozilla doesn't in
its 1.2a version.
Comment 27 rossi 2002-10-12 03:12:29 PDT
uhm... we support <wbr>, it can't be too hard to take that and add a hyphen...
Comment 28 Mike Cowperthwaite 2002-10-14 10:44:40 PDT
Following up on my comment in #25: in addition to break opportunities
around various dash characters, entities such as emsp, ensp, thinsp, etc.
should also provide break opportunities.  Mozilla (1.1) treats them as
non-breaking spaces, although the widths are drawn correctly.
Comment 29 Robin Lionheart 2002-10-14 18:31:48 PDT
By section 9.1 of the HTML 4.0 standard, only line breaks in the source and the space, tab, form feed, and zero-width space characters are considered "white space" characters.

Since thinsp, ensp, and emsp are not included in that definition, it is standard to render them as non-breaking spaces. It may even be intentional, considering that specific space widths would imply that the author is trying to be exacting about positioning.
Comment 30 Martin v. Löwis 2002-10-15 00:17:48 PDT
I don't think section 9.1 of the HTML spec (i.e. the definition of whitespace
in HTML) has any relevance with regard to breaking lines. The HTML spec even
points out that words should be laid out according to the
conventions of the language.

For breaking lines, I think the Unicode Line Breaking properties,

http://www.unicode.org/unicode/reports/tr14/

should be taken as normative. HTML overrides these only with regard to &#x2028;
and &#x2029; by explicitly specifying that they are not line breaks in HTML. 
The spec then elaborates that it imposes no requirements on behaviour of
other space characters. So an implementation that uses the thin space as a
breaking opportunity would still conform with HTML 4, while simultaneously also
conforming to UAX#14.
Comment 31 Mike Cowperthwaite 2002-10-15 08:19:04 PDT
Reading thru that Unicode specification, I notice this in the definition
of the "SP" category: "SPACE, but none of the other breaking spaces, is
used in determining an indirect break."  The other spaces are categorized
as "break after" and so provide a direct break opportunity.

The difference between direct and indirect breaks is particularly subtle
when dealing with these specific-width spaces.  As I read it:

Assume the text section in question consists of "prev&emsp;next".  
If "prev" + emsp extends beyond the margin, the break takes place before 
"prev", so that the width of the space is apparent on the next line. 
If, instead, "next" exceeds the margin, then the break occurs before "next",
so that the width of the space is apparent on the current line.
(For right-justified text breaking after the emsp, the emsp is 'visible'
at the margin.)

The common ASCII SPACE, however, causes an indirect break.  In this sense, it
acts inversely from the &shy; -- it has zero width if it is at the break, otherwise it is rendered 'visible' between the words.

Mozilla does not break at all on the special spaces.  Opera6 handles them as described above.  IE/Win breaks on them as described, but does not maintain space width at the right for justification purposes -- except, for some reason, &puncsp; provides a break opportunity on either side, not just after.
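
A deliberately simplified Python sketch of the "break after" idea discussed above. The class assignments follow my reading of UAX #14 for the characters mentioned in this thread; the table, the set of classes and the function name are assumptions for illustration only. Real UAX #14 line breaking applies pair rules between neighbouring classes, which this does not attempt.

```python
# Line breaking classes for a few characters discussed in this bug
# (a hand-picked subset, not the full UAX #14 data file).
LINE_BREAK_CLASS = {
    "\u0020": "SP",  # SPACE
    "\u00a0": "GL",  # NO-BREAK SPACE (glue, never a break)
    "\u00ad": "BA",  # SOFT HYPHEN
    "\u2002": "BA",  # EN SPACE
    "\u2003": "BA",  # EM SPACE
    "\u2008": "BA",  # PUNCTUATION SPACE
    "\u2009": "BA",  # THIN SPACE
    "\u2013": "BA",  # EN DASH
    "\u2014": "B2",  # EM DASH (break before or after)
    "\u200b": "ZW",  # ZERO WIDTH SPACE
}

# Simplification: treat these classes as "a break may follow this character".
BREAK_AFTER_CLASSES = {"SP", "BA", "B2", "ZW"}


def break_offsets(text: str) -> list[int]:
    """Return the string offsets at which a line break may be placed."""
    offsets = []
    # The last character is skipped: a break at the very end is not interesting.
    for i, ch in enumerate(text[:-1]):
        if LINE_BREAK_CLASS.get(ch) in BREAK_AFTER_CLASSES:
            offsets.append(i + 1)
    return offsets
```

For "prev&emsp;next" this yields a single opportunity, directly after the em space, so the width of the space stays on the first line.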
Comment 32 Mike Cowperthwaite 2002-10-15 08:21:04 PDT
Also in the cited Unicode spec, I found this for category "GL" (glue):
"The word joiner character [U+2060 = &#8288;] is the preferred choice for
an invisible character to keep other characters together that would
otherwise be split across the line at a direct break." Mozilla displays
this as a (non-breaking) glyph -- a question mark, with my current setup.
(Opera and IE both handle the character the same way.)
Comment 33 Stefan Moebius 2002-12-09 14:25:13 PST
Nice discussion on breaking and non-breaking entities and spaces, but somewhat
unrelated to this bug, isn't it?
Actually, the issue at hand (correct me if I'm wrong) could be described as follows:
(1) &shy; is a character where the current line may be broken
(2) the display of this character depends on what it is followed by (namely a
linebreak or anything else)

So basically, (2) makes me think of other combining characters (think Tamil or
the like). Now my questions are: How are these cases rendered? Is the rendering
engine able to ask for the next thing in the layout?

If I'm completely wrong, sorry for the spam. In that case: any chance this is
getting fixed in the foreseeable future?
Comment 34 Andrew Schultz 2002-12-15 20:39:05 PST
==> fonts/text
Comment 35 Will Budreau 2003-03-21 16:29:35 PST
Extensive discussion at
http://www.cs.tut.fi/~jkorpela/shy.html
Comment 36 Mike Cowperthwaite 2003-03-21 18:42:09 PST
I have to say, I am quite unimpressed by Prof. Korpela's assertion that the
SHY should always be rendered.  He admits that ISO-8859-1 is ambiguous, then
declares his interpretation, then goes on to insist that any subsequent spec
or clarification effort is obviously mistaken.  I consider him to be an
obstructionist.

I believe the relevant, and quite clear, specification is found in
  http://www.unicode.org/unicode/reports/tr14/
See the section on "breaking hyphens" and the particular discussion of the SHY
character, 00AD.
Comment 37 Matti Aarnio 2003-03-31 08:16:17 PST
Reading the Mozilla 1.3 code with regard to &shy; and <WBR> processing (both
are mentioned in comments over the years), I do think that to make things
render properly, there should be some magic to translate &shy; to
"<shy>", and let the latter do the actual work.  Perhaps similarly
to several other punctuation-like entities?

Understanding the code flow of about 3 million lines of CPP and H files
is .. quite challenging..

What little I have understood is that character entities get translated
into unicode (UCS2?) for internal storage, and then rendered from there
as strings.   Tags are separate items with separate presentation boxes,
working at much higher level, therefore turning some characters into
(internal) tags could aid the rendering problem ?
Comment 38 Manko 2003-04-18 00:08:58 PDT
Very important to switch from MSIE. Especially for languages with long words
such as Russian or German.

Thus, adding dependency with bug 164421 and suggesting to add "intl" keyword.
Comment 39 Jo Hermans 2003-04-24 05:48:12 PDT
*** Bug 203181 has been marked as a duplicate of this bug. ***
Comment 40 Rostislav Chebykin 2003-05-31 15:59:22 PDT
I don't like the way Mozilla treats soft hyphens.
First, when a text block is presented with text-align: justify; and a word
with &shy; occurs at the end of a line, the word often extends
beyond the right edge of the text. This looks very sloppy (for example, a page on
my home site - http://www.philigon.ru/prosa/stories/essay.html).
Second, Mozilla sometimes breaks the line at the &shy; (in cases such as
ac&shy;ces&shy;<span>si&shy;</span>bi&shy;li&shy;ty), but doesn't display the
hyphen itself, which looks sloppy, too.
IE and Opera don't have these bugs.
Comment 41 Claus Sørensen 2003-06-18 14:48:40 PDT
I find it very disturbing that the Mozilla developers see this bug as a minor
one (priority 3 and target: future).

Many European languages (like Danish) have long words; some words are
so long that they can't be handled by this textarea input box.

Here soft hyphens are essential to get readable websites.

And as others have already said, it is part of the W3C HTML specifications.

The best weapon in competing with closed source software is open standards, so
let us support them as best we can, and let support for extra stuff wait.
Comment 42 Hadrien Nilsson 2003-07-16 00:53:58 PDT
This bug was opened in 1999! :-o
Is it so difficult to code? IE has had it for ages. It's a shame IE handles soft
hyphens better than Gecko. At least Gecko does not break on hard hyphens :-/
Comment 43 David Baron :dbaron: ⌚️UTC-10 2003-07-16 10:18:49 PDT
Assigning to default component owner.
Comment 44 Markyze 2003-08-09 14:35:32 PDT
Too bad no target has been set yet for this bug. Hyphenation is an essential
part of publication in many languages in the world. The HTML-standard provides a
graceful solution to the problem. Why does it take so long for Mozilla to fix
this problem?
Comment 46 Simon Paquet [:sipaq] 2003-08-09 16:42:06 PDT
*** Bug 215663 has been marked as a duplicate of this bug. ***
Comment 47 Alexander Skwar 2003-08-10 09:23:10 PDT
Just curious - why is this bug still "NEW"? 
Comment 48 Mike Cowperthwaite 2003-08-19 10:27:36 PDT
Note that bug 95067 comment 68 introduces a patch which, it is claimed, 
"includes some changes for hyphenation of soft hyphen"; this is part of a larger 
bug on improving wrap capabilities generally.
Comment 49 Jesse Glick 2003-08-28 16:44:19 PDT
Should this bug be considered to cover other hyphen-like characters, such as
mdash, or strictly shy? E.g. Moz 1.4 fails to break at "&mdash;" boundaries;
even the mini HTML renderer in Java/Swing 1.4.2 will break after "&#8212;", I
found. Inadequate mdash handling gives a poor appearance to English text. Cf.:

http://unicode.org/reports/tr14/#B2

Bug #56652 seems more general in regards to line breaking properties of
characters. Do we need a tracker bug, or is it likely that all these problems
could be solved together?
Comment 50 Mike Cowperthwaite 2003-08-31 10:03:13 PDT
Re: Jesse Glick's comment 49:

Bug 206152 is "[meta] line breaking bugs" but is inexplicably blocking, rather 
than depending on, this and almost every other bug it's tracking.

&shy; is a special case because not only does it control breaking, the display 
of the glyph is variable depending on whether the break occurs or not.  Also, I 
think the correct handling of &shy; is more in demand than the correct breaking 
around &mdash; &ndash; and similar characters.
Comment 51 Andrey Novikov 2003-10-24 04:18:46 PDT
Guys, the browser is now EXCEPTIONAL, can you just postpone all other
enhancement activities and give consideration to this particular horrific issue.
Today is 2003! We already widely use XHTML and you can't deal with ancient HTML!

Please, give us a chance to switch our users from IE to Mozilla!
Comment 52 Greg K Nicholson [:gkn] 2003-10-24 14:47:44 PDT
re comment 50: surely breaking at soft hyphens can be handled similarly to
breaking at spaces:

If a line break is required at a space character, the space character is
replaced with a carriage-return.

If a line break is required at a soft hyphen, the soft hyphen should be replaced
with a hyphen and a carriage-return.

Is it really that simple, or am I missing something?
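
To make the rule proposed above concrete, here is a toy greedy wrapper in Python. It is only a sketch under strong assumptions: every character is taken to be one column wide, words are separated by plain whitespace, and the names `wrap` and `SOFT_HYPHEN` are invented for this example. A real text frame also has to measure glyph widths, including the width of the hyphen that only exists when the break is taken, which is part of why it is less simple in a layout engine.

```python
SOFT_HYPHEN = "\u00ad"


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling that may break words at soft hyphens.

    A soft hyphen where a break is taken becomes a visible "-" at the
    end of the line; every other soft hyphen is dropped.
    """
    lines = []
    current = ""
    for word in text.split():
        # Fragments of the word between its soft hyphens.
        pieces = word.split(SOFT_HYPHEN)
        while pieces:
            sep = " " if current else ""
            whole = "".join(pieces)
            if len(current) + len(sep) + len(whole) <= width:
                current += sep + whole  # the whole (rest of the) word fits
                break
            # Try the longest leading part, plus a hyphen, that still fits.
            for k in range(len(pieces) - 1, 0, -1):
                head = "".join(pieces[:k]) + "-"
                if len(current) + len(sep) + len(head) <= width:
                    lines.append(current + sep + head)
                    current = ""
                    pieces = pieces[k:]
                    break
            else:
                if current:
                    # Nothing fits on this line: start a new one and retry.
                    lines.append(current)
                    current = ""
                elif len(pieces) > 1:
                    # Even the first fragment overflows an empty line.
                    lines.append(pieces[0] + "-")
                    pieces = pieces[1:]
                else:
                    # Unbreakable fragment: let it overflow.
                    current = pieces[0]
                    break
    if current:
        lines.append(current)
    return lines
```

For example, wrapping "an ac&shy;ces&shy;si&shy;bi&shy;li&shy;ty test" to 10 columns gives the lines "an acces-", "sibility" and "test".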
Comment 53 Jochen Magnus 2003-10-30 04:47:57 PST
Created attachment 134477 [details]
Ghost chars apears when marking text with shys. See http://magnus.de/shy/

   Ghost chars appear while marking text with <shy>

If you mark text in http://magnus.de/shy/testcase.html, some "ghost chars"
will appear at the right side of the column. Sporadically the text rendering
is nearly destroyed after unselecting.

It happens when the text contains soft hyphenation signs, either in the form
<shy> or in the compact form of char #173, which looks like a '-'. The bug
occurs regardless of the kind of text justification.

   Unhappy with Mozilla
Therefore we are unhappy with Mozilla. It does not make proper use of soft
hyphenation, and it is impossible to mark such text correctly. Bug 9101 has been
open for years, while Internet Explorer has handled all this correctly for years!
Because we are an Internet Service Provider too, we are distributing Starterkit
CDs containing Mozilla ...
Comment 54 Jochen Magnus 2003-10-30 04:52:59 PST
Comment on attachment 134477 [details]
Ghost chars apears when marking text with shys. See http://magnus.de/shy/

Screenshots on http://magnus.de/shy/
Comment 55 Ralf Hagen 2003-11-23 04:24:25 PST
I would really welcome it if this bug would
-be set to a status other than "NEW" after so many years
-be given a severity of at least "normal" now
-receive a definite target milestone now

When this bug first appeared, it may have been an "enhancement" for the "Future".
But this bug has now been open since June 1999, and there is still no fix.

We can hope it gets fixed in the foreseeable future.
Comment 56 James Paige 2003-11-23 22:07:40 PST
Unfortunately this bug cannot be changed from NEW to ASSIGNED right now, nor can
a target milestone be set, because no developer is available to fix it. I am
adding the "helpwanted" keyword. Hopefully someone with the knowledge and skill
to implement this will volunteer someday soon.
Comment 57 Jason Bassford 2003-12-07 06:10:16 PST
I do think, however, that this is not an enhancement but a failure to follow
HTML guidelines.  As such, it should be a proper bug.

Of course, if things continue to remain unresolved here, it could all be
eventually "fixed", assuming that we end up supporting the following CSS3 proposal:

http://www.w3.org/TR/2003/CR-css3-text-20030514/#wrap-option-prop
Comment 58 Peter Asemann 2003-12-15 14:14:26 PST
Yes, this should be a "real" bug.
Microsoft minions are beginning to argue "IE can handle &shy;, but Mozilla doesn't
comply with the standards..."
This can't be. Assign that bug!
Comment 59 Frank Wein [:mcsmurf] 2004-01-04 10:32:18 PST
*** Bug 230045 has been marked as a duplicate of this bug. ***
Comment 60 Henrik Pauli 2004-01-04 11:04:25 PST
Re: #42 From Hadrien Nilsson  2003-07-16 00:53, where it was stated:

> Is it so difficult to code ? IE has it for ages. It's a shame IE handles soft
> hyphen better than Gecko. At least Gecko does not break on hard hyphens :-/

I have to argue with you about the "at least Gecko does not break on hard
hyphens" part.  Why is that good for anyone?  I encounter it again and again
that someone doesn't know how to make an anchor, and just leaves it to whatever
software to linkify an URL, and they paste 900+ pixels wide amazon.com or news
sites' URLs.  I, for one, would be much much happier if these broke on those
Hyphens.

Then again, this part of the issue seems to have been discussed in Bug 95067.
Comment 61 Olivier Miakinen 2004-01-04 13:39:08 PST
Re: #60 From Henrik Pauli  2004-01-04 11:04, where it was stated:

> I have to argue with you about the "at least Gecko does not break on
> hard hyphens" part.  Why is that good for anyone? [...] I, for one,
> would be much much happier if these broke on those Hyphens.

I suppose that the point was that MSIE breaks on hard hyphens, even when it
could simply push the whole word to the next line.

I agree with you that, when it is necessary to split a word, it is better to
split it after a hard hyphen than anywhere else.

But I also agree with Hadrien that it is a *bad* thing to split a word after a
hard hyphen when it is possible to write the whole word within the box boundaries.
Comment 62 Henrik Pauli 2004-01-04 15:00:58 PST
It seems usually good to leave hyphen-minuses alone when numbers are involved. 
However, in most cases people use hyphen-minuses instead of hyphens, nb-hyphens,
en and em dashes...  Just like how the *nix community likes to use grave accents
and apostrophes for quotation (UGH!).  In most cases people:
 1) don't know about Hyphen
 2) don't have easy access to it (yay for my custom keyboard layout with en and
em dash and horizontal ellipsis)
 3) just plain don't care.

But should that mean that words that naturally contain a hyphen (and I can tell
there are lots of such in Hungarian) should continue breaking sites' looks, just
because people can't bother using / can't use the proper characters?

Indeed, if it fits inside the box, it's probably safer (not 'better' as such,
looks ugly if it's text! though with numbers it could be understandable.) to
push the word to the next line.  Alas currently this is followed by stretching
the box (say, table cell) when that's possible, instead of breaking the stupidly
long something down where it'd be decent.

Question: -1234 should use the Minus Sign U+2212?
Comment 63 Christian :Biesinger (don't email me, ping me on IRC) 2004-01-17 07:45:55 PST
*** Bug 231251 has been marked as a duplicate of this bug. ***
Comment 64 Michael Jennings 2004-02-26 16:20:32 PST
in&shy;for&shy;ma&shy;tion: ie has been do&shy;ing soft hy&shy;phens for some time
e&shy;ven though de&shy;vel&shy;o&shy;per at&shy;ti&shy;tude there is
si&shy;mi&shy;lar to what I read here: we don't got&shy;ta do a&shy;ny&shy;thing
we don't wan&shy;na do.
Comment 66 David Edwards 2004-03-22 08:32:52 PST
I really hope this bug still isn't "NEW" when it hits its 5th birthday in a few
months time...

I came to this bug because I found that Mozilla wasn't breaking lines at hyphens
where Internet Explorer was. Personally, I'd like to see a resolution where
Mozilla could recognise some kind of magic hyphen (like &shy; for instance) that
would break lines and also be rendered (always). Unfortunately the w3c
specification doesn't seem to agree with me, as has been stated earlier, the
HTML4 spec asks that &shy; does not render when it is not breaking.

Do I need to make noise here or take it to the w3c mailing lists?
Comment 67 David Baron :dbaron: ⌚️UTC-10 2004-03-22 09:01:57 PST
Neither.  Bug 95067 is on file, and noise is not helpful.
Comment 68 Robin Lionheart 2004-03-22 16:07:09 PST
Adding [p-opera] to whiteboard. Opera 7.20+ for Windows supports soft hyphens.
Comment 69 Robin Lionheart 2004-03-24 09:57:36 PST
66> I really hope this bug still isn't "NEW" when it hits its 5th birthday 
66> in a few months time...

65 votes and 8 duplicates. I hope so too.

But it wouldn't have so many duplicates if NEW bugs weren't excluded
by the default search settings.
Comment 70 Hadrien Nilsson 2004-03-30 01:49:15 PST
Safari supports soft hyphens too now.

http://weblogs.mozillazine.org/hyatt/
Comment 71 Hungerburg 2004-04-15 07:43:26 PDT
david hyatt is right.

one might say: if it was so easy for him to implement, he surely did not take into
account that in some european languages words alter their spelling when hyphenated.

as a soft hyphen is placed at the author's will, and not algorithmically by the
browser, the author can decide whether she wants a linebreak to occur or she
wants correct spelling with no break.

there are rare places, esp. in cms, where hyphenation can save a layout from
falling apart. e.g. german language/writers favour long compounds of words ;)
Comment 72 Fredrik Wendt 2004-04-15 13:35:57 PDT
"e.g. german language/writers favour long compounds of words"

The situation is similar for the Nordic languages (Swedish, Norwegian, Danish), but
contrary to what the quote above may imply, we do NOT choose long compounds of words - this is
how the language is formed, there are no "shorter alternatives". Hence, Mozilla
really does awful things to many sites (mostly newspapers, e-zines and the like),
naturally making users prefer other browsers.

And this is truly strange - the main thing about the web is to read/present text
(as with news/articles) and although this bug was reported almost five years ago
- breaking the number one task of a web browser - it's still there. Status quo.
Zip progress (from user perspective).

Sorry for making "noise", but this prevents convincing "decision makers" that
Linux is a mature option for schools etc when "they can't even produce a
sensible browser".
Comment 73 Hixie (not reading bugmail) 2004-04-15 13:40:49 PDT
Everyone agrees this should be implemented and that it is important.
Now someone just needs to step up and do it.
Comment 74 Henrik Pauli 2004-04-15 14:06:58 PDT
(In reply to comment #71)
> one might say: if it was so easy for him to implement, he sure did not take into
> account that in some european languages words alter spelling, when hyphenated.

I'd like to add my comment here and nod a little.  Will go a bit off-topic, but
I doubt anyone will mind it, especially since this bug is so silent anyway.

In Hungarian -- and here's when I grumble at some linguists and some Unicode
people -- if you see a long sz, it's written as ssz.  However, when hyphenated,
it's sz-sz.  Now we see how that would be just *snap* this easy, had we an sz
code (and zs, and ty, and...) in Unicode and then if we see two sz characters
next to each other, they'd get nicely rendered by the UA to look like ssz.  If
you can have ligatures for Arabic, can build funky strings of Devanagari, and
Croatian and Dutch have DZ/Dz/dz or IJ/ij, it would be a great help to avoid
having to give the browsers biiiig dictionaries so that they recognise a long sz
letter or a long gy letter, and so on.

Hyphenation is a big problem, and even when one goes out of their way to put shy
characters all over their text, it still remains a problem, regardless of
support for it.

Hungarian too, as are Finnish and the Germanic languages, is full of words long
enough to cause problems.  "Automatic" hyphenation like in
(La)TeX would be great, but... I know this is not publishing ;)  We have to make
do with shy, and as such, it would be great if we got support for it, after all
these years.
Comment 75 Nicholas Avenell 2004-04-24 01:31:36 PDT
With all due respect, that's not the problem here. Auto-hyphenation would be a
Nice Thing, and could possibly get a bug all of its very own where you can
discuss the problems with it in every single language ever, and that may even
*depend* on Mozilla/Fx/etc. handling &shy; correctly. But auto-hyphenating is Not
This Bug, and the discussion makes it look even more complicated than it could
possibly be.
Comment 76 Robert O'Callahan (:roc) (email my personal email if necessary) 2004-05-05 13:03:11 PDT
Should be fairly fixable by hacking nsTextFrame line measurement code.
Comment 77 José Jeria 2004-05-21 04:22:58 PDT
*** Bug 244263 has been marked as a duplicate of this bug. ***
Comment 78 Kess Vargavind 2004-06-11 06:28:04 PDT
With regard to the page http://www.cs.tut.fi/~jkorpela/shy.html, which summarizes
the use of SHY quite nicely, I wonder if it's possible, in an HTML
document, to insert a SHY and customize it with CSS to behave differently for
different languages.

For example, that SHY in one case may do nothing and not be shown, while in
another it is shown as a regular hyphen but nothing happens, and in yet another it
is only displayed if used for breaking/hyphenating the word at the end of a
line, et cetera...

Is this possible? It would, as I see it, settle some of the arguments about its use
and behaviour. (Something similar could be done for other hyphens and different
kinds of spaces.)
Comment 79 Gus Richter 2004-06-11 22:00:31 PDT
Much has been discussed here and much has drifted off course.
In this case, you sadly don't understand the purpose of or how the soft hyphen
is used. There is only one purpose of the Soft hyphen:

The soft hyphen tells the user agent where a line break can occur. The hyphen
may only be displayed at such a line break. Period.

I suggest that you check out the "Soft Hyphen Test in a Simple Table" in the
Attachments above. Run it in IE or Opera (which support it) and compare it to
any Gecko (which do not support it). Compare the result and look at the source
code to see how it is used. Note the multiple entries of the soft hyphen in the
last table which indicate the only permissible line break points in the word. In
IE or Opera, the hyphen is displayed correctly _only_ at the permissible line
break point, which is the last soft hyphen entry prior to being overset
(including the width of the hyphen character itself) for the required linelength
and as dictated by the soft hyphen entries.
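
The selection rule described above can be sketched in Python. This is only an illustration, not code from any browser: the function name `break_at_soft_hyphen` is hypothetical, and every visible character (including the hyphen) is assumed to have the same width, where a real engine would measure glyphs.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_at_soft_hyphen(word, available, char_width=1):
    """Split `word` at the last soft hyphen whose first part still fits.

    Assumption: each visible character is `char_width` wide.
    Returns (first_line, remainder), or None if the word does not fit
    and no soft hyphen gives a first part that fits.
    """
    visible = word.replace(SHY, "")
    if len(visible) * char_width <= available:
        # The whole word fits: no break, so no hyphen is displayed.
        return visible, ""

    best = None
    width = 0
    for index, ch in enumerate(word):
        if ch == SHY:
            # The displayed hyphen itself must fit on the first line.
            if width + char_width <= available:
                best = index
            continue
        width += char_width

    if best is None:
        return None
    first_line = word[:best].replace(SHY, "") + "-"
    remainder = word[best + 1:]  # keeps its soft hyphens for later breaks
    return first_line, remainder
```

With the word "hy&shy;phen&shy;a&shy;tion" and room for 8 characters, the sketch picks the third soft hyphen, because "hyphena" plus the hyphen exactly fills the line; with room for 7 it falls back to the second.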

Opera supports it now. IE has supported it for close to a full decade. Mozilla
still does not support it.

--Gus
Comment 80 Maurício Collares Neto [:mauricioc] 2004-07-10 05:29:32 PDT
*** Bug 250741 has been marked as a duplicate of this bug. ***
Comment 81 Bill Mason 2004-07-10 08:47:21 PDT
*** Bug 250741 has been marked as a duplicate of this bug. ***
Comment 82 Luis Miguel Lagoa Baptista Ferro 2004-08-11 16:41:17 PDT
Why correct hyphenation (and &shy;) is important to me:

In Portuguese, we use soft and hard hyphens.

Basically, soft hyphens mark where a word can be broken in two: a "-" is added at
the end of the first part, and the second part is placed on the next line.

Hard hyphens also work as soft hyphens, but with a twist: in this case
the word is split in two at that point, ONE "-" is placed at the end of the
first part, and ANOTHER "-" is placed at the beginning of the next line, before
the second part of the word.

Examples:

the word "manda-se" would be broken as
"manda-" in one line... and...
"-se" in the next line.

(no idea what kind of hyphen is used in this case)

the same word with a soft hyphen:

"man&shy;da-se" could also be broken as:

"man-" in one line... and...
"da-se" in the next line.

So yes, proper hyphens are a must!
Comment 83 Will Rickards 2004-10-11 07:20:41 PDT
Peter-Paul Koch's browser compatibility test & results for: <wbr>,  &#8203;,
&shy;.  http://www.quirksmode.org/index.html?/oddsandends/wbr.html
Seems implementation of &shy; should be similar to &#8203; which is supported.
Comment 84 Robin Lionheart 2004-10-28 06:31:45 PDT
Per comment #70, adding [p-safari] to whiteboard too.
Comment 85 rgpublic 2004-10-28 11:37:06 PDT
Please please please fix this bug, finally. After writing Firefox 
Plug-In managers, NTLM authentication and what not - I just cannot 
imagine this bug is nearly as difficult. 
Is it perhaps because the Mozilla core team are all speaking English 
natively that they do not understand why this bug is important? 
This simple bug destroys any layout with margin columns as soon as there 
is one long word which is very often the case in languages other than 
English. Everybody at my office using Mozilla for a while notices this 
because it is so obvious if you read a language with long words 
(German, Turkish, Finnish - you name it). 
So, now get angry at me for spamming this bug. Most of the time I've kept  
my mouth shut, because I know that many wishlist items are really  
difficult to do and I've been patient for quite a while but here I just cannot 
believe why this is so difficult to do that we have to wait now 
for nearly 5 years for HTML 4 compatibility.
Comment 86 Daniel Kinzler 2004-10-28 12:10:49 PDT
A quick example why this bug needs to be fixed: consider the following word,
which is the traditional name for Bangkok (yes, really!):

Krungthepmahanakornamornratanakosinmahintarayutthayamahadilokphopnopparatrajathaniburiromudomrajaniwesmahasatharnamornphimarnavatarnsathitsakkattiyavisanukamprasit

this bugger breaks the layout quite frequently, especially if you have a low
resolution screen. That's a little annoying when you try viewing a page that
contains that word.

regards,
daniel
Comment 87 Bill Mason 2004-12-09 11:07:59 PST
*** Bug 273939 has been marked as a duplicate of this bug. ***
Comment 88 Florian Groß 2004-12-27 11:21:31 PST
Can we please finally fix this? It's not only for typography but also for web
sites that have narrow columns with lots of content and web sites where users can
screw up other users by using very long words. This bugtracker is an example,
but there are lots of forums which also have this issue. It should be relatively
simple to support (after all the web site already has the hyphenation points
inserted.) 
Comment 89 Yoz Grahame 2005-01-17 18:46:27 PST
If anyone cares, I've created a really nasty JS kludge to enable soft hyphen
support for Mozilla:
http://cheerleader.yoz.com/archives/001889.html

I don't have the C++ skills required to hack on the Mozilla source, but I'm
betting that the three hours it took me to code the kludge is not much less work
than it would take to fix Mozilla once and for all. I'm hoping that, by putting
this really awful implementation out there, a Mozilla coder will actually take
up the challenge before my solution becomes common.
Comment 90 Aleksey Nogin 2005-01-28 12:16:39 PST
(In reply to comment #16)
> We never break a line at a soft hyphen, and we never display soft hyphens, which
> is the minimum necessary to follow the semantics:
[...]
>   For operations such as searching and
>   sorting, the soft hyphen should always be ignored.
>   -- http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3

As far as I can tell, in Firefox 1.0 the search is quite broken for soft hyphens.
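
For illustration, the quoted requirement (ignore soft hyphens when searching) can be sketched in Python as follows. The function name `find_ignoring_shy` is hypothetical and this is not how Firefox's Find is implemented; it only shows the intended behaviour.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def find_ignoring_shy(text, needle):
    """Return the index in `text` where `needle` starts, or -1.

    Soft hyphens in both strings are ignored for matching, but the
    returned index refers to the original, unmodified `text`.
    """
    target = needle.replace(SHY, "")
    if not target:
        return 0
    # Remember where each visible character sits in the original text.
    positions = [i for i, ch in enumerate(text) if ch != SHY]
    stripped = "".join(text[i] for i in positions)
    hit = stripped.find(target)
    if hit == -1:
        return -1
    return positions[hit]
```

Searching "soft hy&shy;phen&shy;a&shy;tion test" for "hyphenation" then succeeds even though the stored text contains three soft hyphens inside the word.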
Comment 91 Aristotle Pagaltzis 2005-01-30 14:46:06 PST
Adding myself to CC:

Can someone please, *please* look into this?

Opera has finally gained soft hyphen support a while back, but the Gecko
browsers are still blissfully unaware.
Comment 92 Mano (::mano, needinfo? for any questions; not reading general bugmail) 2005-02-10 07:13:29 PST
*** Bug 281797 has been marked as a duplicate of this bug. ***
Comment 93 Thomas Forster 2005-02-20 17:13:35 PST
Ok, just a question: What would it _cost_ to have a mozilla developer fixing
this bug? I remember using Mozilla 0.8.? and thinking: They are doing good work,
they just need time. Now I have 1.8.a6. I really want this in 1.8!!!

Maybe we could have a "Mozilla soft hyphen development fund".
Comment 94 Stefan Moebius 2005-02-21 02:00:26 PST
(In reply to comment #93)
> Ok, just a question: What would it _cost_ to have a mozilla developer fixing
> this bug? I remember using Mozilla 0.8.? and thinking: They are doing good work,
> they just need time. Now I have 1.8.a6. I really want this in 1.8!!!
> 
> Maybe we could have a "Mozilla soft hyphen development fund".

Good question. And I'd even add a few bucks in. But I'd like to take this
discussion to the MozillaZine forums:
http://forums.mozillazine.org/viewtopic.php?p=1249741
Comment 95 Jungshik Shin 2005-02-21 02:50:28 PST
Fixing bug 255990 would fix this bug at least partially. For now, let this
depend on that bug.
Comment 96 Gabriel Sjoberg 2005-09-12 11:55:50 PDT
*** Bug 308206 has been marked as a duplicate of this bug. ***
Comment 97 Mark Fenbers 2005-09-13 06:49:25 PDT
(In reply to comment #96)
> *** Bug 308206 has been marked as a duplicate of this bug. ***

One would think that the amount of traffic on this bug, and the fact that so
many people have reported it (evidence: messages that say "marked as duplicate
of 9101") would make the Mozilla developers want to just fix it to get it out of
their hair.  Even comments from two years ago are saying, "C'mon, guys! Why has
this frustrating bug just sat here for so many years?"  That was in 2003.  Now
it's 75% of the way through 2005 and it appears no activity has been given to
this &shy; bug since 1999.  (Maybe the developers have gone to help clean up New
Orleans from the mess that Katrina made...)
Comment 98 Robert Wünsch 2005-09-13 09:29:47 PDT
Looks like the developers are too shy when they are facing this lady ...bug.

Please solve this bug!
Comment 99 Ben Basson 2005-09-13 09:33:47 PDT
Please read comment 73 before posting further advocacy.
Comment 100 rgpublic 2005-09-13 14:42:51 PDT
Sigh - sometimes Open Source requires _really_ much patience. 
Not everyone interested in a solution here is able to hack the Mozilla 
codebase unfortunately. So most of us will just have to wait. Like with 
many other bugs. That's OK considering the fact that we all get Firefox 
for free and it's become great software with or without this bug. 
Nevertheless sometimes the discrepancy between a seemingly simple bug 
and a time of 6(!) years means our patience is tested to the max.  
In the meantime we've seen it all: Mozilla, Phoenix, Firebird, Firefox, 
Whole new plugin architectures, new preference dialogs, etc. It's hard 
to believe that this major bug for international websites is still there. 
The longer the words in a language the worse it gets. This is destroying 
all layouts with narrow columns in those languages.  
So I hope that developers at least understand that we are a bit impatient to 
finally see at least some progress or a bit of information on this bug and 
why it isn't implemented.
Comment 101 ╧ЫжыеТ 2005-09-13 17:36:09 PDT
Six years!! Oh my god~~
Comment 102 Arpad Borsos [:Swatinem] 2005-09-13 22:17:24 PDT
Sorry for the spam, but I too think that it's simply not acceptable to leave bugs
open for that long. It's almost the same with inline-block, which is my personal
RFE #1. The request to implement inline-block is also 6 years old. It's been I
don't know how many years in CSS 2.1 now. And it's part of Acid2. If it had
been fixed years ago, Gecko wouldn't be the second-to-last rendering engine to
support Acid2. But at least there is hope for 1.9 ;)

The thing is, in open source you just gotta do things yourself. Believe me, I
would love to do these things myself. But I can't code C; it's as simple as that.
Comment 103 Matthijs Wensveen 2005-09-14 01:40:26 PDT
You know what? I'm checking out the source right now. I've never done any
mozilla hacking or even built my own build. Some xp in C and C++ hacking, but I
guess this  should be a good way to score some geek points :)
I'm not saying that I will succeed in fixing this bug, but at least I will try.

Any pointers as to where I should begin looking?
Comment 104 Vidar Haarr (not reading bugmail) 2005-09-14 01:49:25 PDT
<https://bugzilla.mozilla.org/page.cgi?id=etiquette.html>

mrw@wanadoo.nl:
Please note that I really have no idea, but I'd start looking in /parser,
specifically the nsHTMLEntit* files and maybe nsTextTransformer.[cpp|h] in layout/.
Comment 105 Jungshik Shin 2005-09-14 08:16:20 PDT
Ok. I should have fixed bug 255990 last month, which will help fix this bug at
least partially. I'll try to spend some more time with it.

Comment 106 Bjørn S Tennøe 2005-11-30 01:47:27 PST
Starting in Q3 2006, this bug is likely to corrupt layout on the following online stores:

    Norway:  http://www.elkjop.no/
    Finland: http://www.gigantti.fi/
    Denmark: http://www.elgiganten.dk/
    Iceland: http://www.elko.is/
    Norway:  http://www.lefdal.com/
    Poland:  http://www.electroworld.pl/
    Czech R: http://www.electroworld.cz/
    Hungary: http://www.electroworld.hu/
    Sweden:  http://www.pccity.se/

As noted earlier, several European languages differ from English in the way nouns are joined into compound words.

We are preparing a new version of several big-brand European online stores using the same technological foundation. For these stores, many of which are market leaders in their respective countries, we will use a layout where 3 products are shown side by side, with teaser text to the right of a teaser image. This demands that text columns are no more than 80 pixels wide, and this, again, demands soft hyphenation. Since IE, Safari and Opera support &shy; we are moving forward with this layout principle.

I will provide page view statistics when I get them, but as I say, these are large-brand physical stores, and as we release the new online stores platform their online presence will surge.

Regrettably, as a consultant I don't have the resources to initiate a bug fix.
Comment 107 Bjørn S Tennøe 2005-12-06 03:57:48 PST
From Q3 2006, this bug may affect some additional 1.000.000 to 1.500.000 unique users.

Background: I now have some preliminary page statistics for the sites listed in comment #106. Based on the November 2005 traffic generated on the up and running web sites, the combined total traffic in Q3 2006 will be between ten and fifteen million unique visitors per month. If 10% of these visitors use Gecko, we arrive at the numbers above.

Also, we are discussing CMS soft hyphenation auto-insertion technology with some major vendors. A possible standardization of this feature will add additional sites to the list above.
Comment 108 Neil Harris 2005-12-06 04:51:28 PST
The wordbreak code appears to be at http://lxr.mozilla.org/mozilla1.8/source/intl/lwbrk/src/
Comment 109 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-12-06 15:24:26 PST
We'll support hyphenation in the next major Gecko release, which will be early 2007.
Comment 110 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-12-06 15:47:56 PST
In the meantime if someone wants to get soft hyphens working in the 1.8 branch, which will be shipped earlier, feel free; I'll be happy to review a patch. The relevant code is all in layout/generic/nsTextFrame.cpp, you shouldn't need to mess with line breaking itself. Unfortunately that is nasty code which is why I want to wait until that code gets cleaned up, which is starting to happen on the trunk.
Comment 111 Neil Harris 2005-12-12 02:07:00 PST
Given that proper hyphenation is going to be implemented by a substantial rewrite in 2.0, it seems to me that the soft hyphen implementation for 1.5.x is likely to be a throwaway hack.

In that spirit, what about considering the rather brutal approach of turning all the &shy; in the source text at parse time into something like <wbr softhyphen="true">, and then leveraging the <wbr> code, which seems to do nearly the right thing already, and using the (internal representation only, nonstandard) attribute "softhyphen" as a flag to trigger the generation of the hyphen characters?

Perhaps a special style mechanism could even be used to handle the hyphens, eg turning &shy; into 

<span style="display: x-mozilla_at_end_of_line;">-</span><wbr><span style="display: x-mozilla_at_beginning_of_line;">-</span>

where the x-mozilla stuff defines special automagic CSS properties to display the hyphens only at the start or end of the line, respectively?
 
Comment 112 j.j. 2005-12-12 14:58:40 PST
> In that spirit, what about considering the rather brutal approach of turning
> all the &shy; in the source text at parse time into something like <wbr
> softhyphen="true">, and then leveraging the <wbr> code, which seems to do
> nearly the right thing already, and using the (internal representation only,
> nonstandard) attribute "softhyphen" as a flag to trigger the generation of the
> hyphen characters?

The most important thing here is to break the line if needed (to avoid destroying the layout).
To display the hyphen is less important for an intermediate patch.
Comment 113 j.j. 2005-12-12 15:03:19 PST
IMO it would be enough for now to handle every &shy; and &#173; the same way as <wbr>.
Is there anyone here who is able to do this?
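
As a rough illustration of the interim idea above (treat every soft hyphen like <wbr>), here is a minimal Python sketch of the source rewrite. It is not a proposed Gecko patch: the function name `shy_to_wbr` is hypothetical, it works on raw HTML text rather than in the parser, and it only allows a break at each position without ever displaying a hyphen there.

```python
import re

# Matches the named entity, the decimal character reference, and the
# literal U+00AD character. Other spellings (for example hexadecimal
# references) are deliberately left out of this sketch.
_SHY_PATTERN = re.compile("&shy;|&#173;|\u00ad")


def shy_to_wbr(html):
    """Replace each soft hyphen in `html` with a <wbr> break opportunity."""
    return _SHY_PATTERN.sub("<wbr>", html)
```

For example, "Donau&shy;dampf&#173;schiff" becomes "Donau<wbr>dampf<wbr>schiff".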
Comment 114 Neil Harris 2005-12-12 15:41:05 PST
Is there any sort of macro mechanism already provided in the parser code where we could perform this sort of manipulation?

An aside: in bug 6347, the suggestion is made that

<span style="font-size: 0px"> </span>

is a more standards-based alternative to <WBR>. But let's just go with <WBR>, if it does what is needed; a hack's a hack, so let's keep it a small hack so it can be easily chopped out later when the proper hyphenation code comes along.
Comment 115 David Baron :dbaron: ⌚️UTC-10 2005-12-12 15:41:40 PST
Breaking the line without adding the hyphen can result in loss of information; for example, if something means something different when it's one word than when it's two.

Breaking the line and adding the hyphen when doing so requires a good bit of care due to the complexities of text measurement, at least if you don't want the hyphen to overflow in some cases.

In any case, fixing bugs like this is not within the criteria for security and stability fixes that are taken for 1.5.* minor releases.
Comment 116 kaldari 2005-12-12 15:47:08 PST
1.5?? I was hoping we could shoot for 3.0 or maybe 2 if we're lucky ;)
Comment 117 Fredrik Wendt 2005-12-13 04:59:35 PST
> The most important thing here is to break the line if needed (to avoid
> destroying the layout). 
> To display the hyphen is less important for an intermediate patch.

I strongly disagree, since breaking composite words in two (or more), will in many cases totally change the meaning of what the text says. 

Example (with the Swedish language):
The two words "upprätt hållande" translates to "holding [something] pointing up" while "upprätthållande" translates to something like sustaining/maintaining [something] for some time. There's a huge difference between "life sustaining equipment" and "equipment that holds stuff pointing it upwards".
The example/translation is not 100 % accurate, but as of now I can't figure out a better way to demonstrate how bad it is to break the line without displaying a hyphen.
Comment 118 Egon Knapen 2005-12-15 13:38:37 PST
This problem has been spoken of for 6 years now, even safari supports it better, isn't it time to fix this?
Comment 119 Ivan Ičin 2005-12-15 14:44:52 PST
(In reply to comment #118)
> This problem has been spoken of for 6 years now, even safari supports it
> better, isn't it time to fix this?
> 

I am not a developer, but I hope that someone who reads comments can get this:

As noted in comment #109 and comment #110, Mozilla will fix this in 3.0, as it would be much more complicated to fix it on current code, and (my interpretation) pretty useless, as there will be less than a year between 2.0 and 3.0 releases.

IMHO, this sounds reasonable, and if someone feels this is not proper he may create the fix himself.
Comment 120 Eike 2005-12-16 11:22:47 PST
(In reply to comment #119)
> I am not a developer, but I hope that someone who reads comments can get this:
Nice tune. Maybe it's you, who ought to lean back and reflect a bit more after having read the roughly 100 messages ...

> As noted in comment #109 and comment #110, Mozilla will fix this in 3.0, as it
> would be much more complicated to fix it on current code, and (my
> interpretation) pretty useless, as there will be less than a year between 2.0
> and 3.0 releases.
How do you even dare say it's "pretty useless" to fix the bug right now? Haven't you read the messages of dozens of people trying to explain why thousands of people are affected by the bug? Of course everybody regularly looking at poorly mozilla-rendered (*non-english*) pages due to ignoring &shy; is getting more and more assured it's "pretty useless" to bring this bug to the attention of someone knowledgeable. As for me, I'm really getting upset and even angry about your ignorant attitude. You may know nothing about the fact that there are languages that heavily rely on the composition of words and thus "produce" partly very long words which really depend on a working &shy; implementation; thinking about it, you should know about this --- just read the thread! You must not pretend though, it's "pretty useless" to fix the bug just because you may not need a working implementation. If you know nothing about the needs of other people, don't tell them what's good for them; just shut up.

In addition I don't quite get your argument regarding the phrase "less than a year". Is it in the lines of "Well guys, you're waiting for this for about six years ... you shouldn't care less about another one"? To the contrary! This bug is so important to many people, it would be worth to fix it even if the whole firefox code-base changes in a week.

> IMHO, this sounds reasonable, and if someone feels this is not proper he may
> create the fix himself.
Your opinion is not humble, it's just solely based on ignorance. Now to the "fix himself" part. It's true, as the source is open, anybody could theoretically come up with a solution to the problem and publish it; we even got an offer from Robert O'Callahan to do a review, which is very kind. In practice however, the bug seems to be quite hard to smash. Just look at the various wild proposals where to look at and how to circumvent it, without getting to the root of it. Robert even said it "is nasty code" and I suppose he knows what he's talking about. Jungshik Shin still seems to have a hard time fixing a related bug (255990). In essence, this is hardcore and many trying to invest their limited time and "give back a bit to the community" more successfully do so in other projects. At least this is true for me, but I strongly believe it's true for others as well.

Unfortunately the capable main-developers seem to not care about the severity this bug has for many people. That's their right. They do not owe me and others any of their time. However, from an ethical standpoint they ought to be more honest. The mozilla-project (and possibly others as well) should have a prominently placed message on their homepage: "We do not really care about other languages. We are mostly an english-based project."
Comment 121 Ivan Ičin 2005-12-16 11:44:14 PST
(In reply to comment #120)
First, this is probably the best example of bug spam, but I have to answer at least once.

Second, I have voted for this bug, check it out.

Third, I was probably supposed to propose that this bug and some other important bugs (like random history loss) should be fixed on current Gecko, though it would mean a lot of hard work, and the fix would be rendered obsolete with Firefox 3.0. Unfortunately, it would probably mean that Firefox 2.0 would be released after Firefox 3.0.

Clearly, I know that many people would like to live a thousand years, and believe it or not, I don't care about their needs.
Comment 122 David Baron :dbaron: ⌚️UTC-10 2005-12-16 12:21:17 PST
We do care about non-English users.  Part of the reason that this bug is so hard to fix is that, in fact, we do care.  We've allowed people adding previous features in the text layout code (which was already pretty messy due to the myriad of CSS properties that it supports) that were important to non-English users (such as line-breaking rules appropriate for CJK languages or bidirectional text layout so that Hebrew and Arabic are usable) to make the text layout code progressively messier and messier.  Good code ownership would have meant that we wouldn't have accepted these features in their current state, because they make the code much more complicated and make adding new features harder.  The end result is that (1) the code doesn't really have an owner -- somebody who understands it and can say what the "right" way to fix this bug is and (2) there are multiple codepaths resulting from the previous fixes (especially the CJK linebreaking) that all need to be fixed.

The solution is not to add one more important feature on top of the mess.  The solution is to fix the mess so that this and other even more important features (such as the fact that many South Asian languages are basically unreadable with Mozilla) can be fixed more easily.

There are a bunch of problems with adding one more feature on top of the mess:
 * it makes it even harder still to add the next important feature
 * it has a high likelihood of causing regressions, perhaps even security bugs, if it involves modification of complicated code that nobody understands

This is why we prefer waiting for the code in question to be rewritten, although I'm not yet sure how much the linebreaking code will be improved in the cairo landing.  It is pretty clear that we do need a redesign of our linebreaking code, and it needs to consider features like this along with the requirements of CJK text, Thai text, shaping, combining characters, and other problems.
Comment 123 Robert O'Callahan (:roc) (email my personal email if necessary) 2005-12-16 14:24:49 PST
Stephen Blackheath has volunteered to rewrite nsTextFrame around the new gfxTextRun abstraction which has been designed for Thebes/cairo. He's hacked nsTextFrame before and I believe he'll pull this off. That will give us the platform to really fix this and other issues.
Comment 124 David Feuer 2005-12-18 16:43:46 PST
As well as automatic hyphenation (which would be great), TeX and LaTeX also offer non-broken hyphenation in languages like German, where shy and soft hyphens just aren't enough.  I think the real solution is an update to the HTML standard to support TeX-style manual hyphenation, and an update to the CSS standard to allow styles to control automatic hyphenation where supported.
Comment 125 Michael Schwenck 2006-01-29 08:02:59 PST
(In reply to comment #124)
> As well as automatic hyphenation (which would be great), TeX and LaTeX also
> offer non-broken hyphenation in languages like German, where shy and soft
> hyphens just aren't enough.  I think the real solution is an update to the HTML
> standard to support TeX-style manual hyphenation, and an update to the CSS
> standard to allow styles to control automatic hyphenation where supported.
> 
I think the community doesn't need academic pow-wows. I'm still hoping for a little work on supporting the "&shy;" entity first, as other browsers do, before it is time to discuss the big solutions.
Comment 126 Jungshik Shin 2006-03-29 06:37:51 PST
I wonder whether an 'interim half-baked fix' (I made for bug 255990 but not uploaded yet) would be acceptable for 1.8 branch (that will be used for FF 2.0). That would do the following:

1. On Windows and Mac OS X, at least rendering of SHY is supported as it should be.

2. On Linux and other platforms, SHY is discarded as now, but the line can be broken on SHY if necessary 
Comment 127 Jo Hermans 2006-09-06 15:19:20 PDT
*** Bug 351593 has been marked as a duplicate of this bug. ***
Comment 128 Muke Tever 2006-10-07 16:09:56 PDT
(In reply to comment #19)
> >1. "If a line is broken at a soft hyphen",
> >   (then) a hyphen character must be displayed at the end of the first line.
> >2. "If a line is not broken at a soft hyphen",
> >   (then) the user agent must not display a hyphen character.
> 
> mozilla does not do the "if" in 1 therefore there are no requirement to
> perform the "then" part in 1

Not true.  The rule does not say "if a line is broken _by_ a soft hyphen" (i.e. using hyphenation rules), it says "if a line is broken _at_ a soft hyphen": when Mozilla is asked to break at a soft hyphen (e.g. by following the &shy; with a <br>) then it ignores the "then" and does not display the hyphen as it should.  (Opera 9 displays a hyphen in this circumstance.)
Comment 129 Bjørn S Tennøe 2006-12-12 04:11:24 PST
(In reply to comment #106 & #107)

This is a cry of frustration.

Dear FireFox developers, one year ago I estimated that this bug was going to have a negative impact on > 1 MILLION users on sites that I have designed alone.

One year later, FireFox is enjoying great popularity and it's a demand that layout is not broken, no matter whose fault it is. So you know what? FireFox stops me from implementing the design I desire.

For those of you who still read paper newspapers, you know how text is neatly organized in narrow columns. This is done because it is easy to read. It is user-friendly. It is tried and tested. It is the right way.

However, for German, Hungarian, Polish, Norwegian, Swedish, Danish, Finnish and a bunch of other European languages, this is impossible on the net. 
Because of FireFox. 
BECAUSE FIREFOX HAS BEEN BROKEN FOR MORE THAN SEVEN (7) YEARS.

David Baron (#122), you say you care about non-English users. Prove it by moving the target milestone away from "Future".
Comment 130 David Naylor 2006-12-12 08:47:34 PST
Will Cairo make fixing this bug easier?
Comment 131 Ryan VanderMeulen [:RyanVM] 2006-12-12 08:52:51 PST
No, but bug 333659 will, hence why this bug is shown as dependent on it at the top (and as some of the comments buried amongst the spam have indicated).
Comment 132 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-12-12 14:22:15 PST
Let me make this clear.

This bug WILL NOT ever be fixed in Firefox 2.

This bug WILL BE fixed in Firefox 3, via the code going into bug 333659.

Please refrain from commenting further, at least until this bug and that bug are marked FIXED.
Comment 133 Michael Ventnor 2007-06-20 23:24:11 PDT
According to http://developer.mozilla.org/en/docs/Firefox_3_for_developers

under the CSS section this is now fixed. Confirmation from anyone?
Comment 134 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-06-21 01:25:21 PDT
I'm not marking this fixed yet because there are some serious bugs in the soft hyphen code. I'm going to submit a followup patch soon.
Comment 135 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-06-28 04:07:19 PDT
Created attachment 270160 [details] [diff] [review]
gfx fix

There's a small fix required in BreakAndMeasureText: *aUsedHyphenation should be false if all the text fit.
Comment 136 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-06-28 04:19:22 PDT
Created attachment 270164 [details] [diff] [review]
textframe changes

The textframe fixes required to make soft hyphens respectable:

-- HasCompressedLeadingWhitespace has a bug that I found with my soft hyphen tests. We were failing to advance aIterator, so the test for whether the current character is still skipped was bogus. Fix that and simplify HasCompressedLeadingWhitespace.

-- Make PropertyProvider::GetHyphenationBreaks refuse to allow a hyphenation break at the start of a line.

-- Issue #1 in Reflow: when we're doing "reflow the line again, forcing a break at a certain previously identified break opportunity", if the forced break opportunity is not a normal linebreak opportunity then it must be a hyphenation break opportunity, so set usedHyphenation to true to trigger hyphenation behaviour.

-- Issue #2 in Reflow: if there's a soft hyphen at the end of a text frame, we need to record it as a potential break opportunity.

-- Unrelated fix #1 in Reflow: if we broke this line at a forced break opportunity, we should not add trimmed whitespace to our metrics --- we're breaking so we might as well trim eagerly.

-- Unrelated fix #2 in Reflow: when determining whether a potential break opportunity is in the available width, use <= instead of < --- if it's exactly at the available width, it fits.

-- Unrelated fix #3 in Reflow: when recording a potential break opportunity due to trailing whitespace in the text frame, don't just assume that it fits in the available width --- check and pass in the right value.

This passes all reftests, including the soft-hyphen reftests I'm about to attach.
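
Two of the rules listed above (no hyphenation break at the start of a line, and "<=" rather than "<" when testing whether a break fits) can be illustrated with a small Python sketch. This is a loose model only, not the nsTextFrame code: the function names are hypothetical, "start of a line" is modelled as offset 0 of the string, and widths are plain numbers.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def hyphenation_breaks(line_text):
    """Offsets of soft hyphens that may be used as hyphenation breaks.

    A soft hyphen at offset 0 is refused: breaking there would leave
    a line containing nothing but the hyphen.
    """
    return [i for i, ch in enumerate(line_text) if ch == SHY and i > 0]


def break_fits(width_up_to_break, available_width):
    """A break that lands exactly at the available width still fits."""
    return width_up_to_break <= available_width
```

With "<" in place of "<=", a first part that exactly fills the column would be rejected and the line would break one opportunity too early.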
Comment 137 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-06-28 04:22:41 PDT
Created attachment 270165 [details] [diff] [review]
reftests

soft-hyphen reftests that check various combinations of soft hyphens and inline element boundaries, where things get tricky.
Comment 138 Simon Montagu :smontagu 2007-06-28 20:55:59 PDT
Comment on attachment 270164 [details] [diff] [review]
textframe changes

kewl!
Comment 139 Gábor Stefanik 2007-06-30 11:55:58 PDT
Shouldn't this be RESOLVED FIXED now? (Resolve it if you agree.)
Comment 140 Gábor Stefanik 2007-06-30 11:58:10 PDT
Never mind, they aren't checked in yet. Sorry for the spam.
Comment 141 Robert O'Callahan (:roc) (email my personal email if necessary) 2007-07-01 18:16:32 PDT
Checked in.

If you find soft-hyphen related bugs, please file them as new bugs. DO NOT reopen this bug.
Comment 142 Jesse Ruderman 2007-07-21 02:25:43 PDT
Comment 90 (Find should ignore soft hyphens) is bug 294615.
Comment 143 Jo Hermans 2007-08-25 11:01:47 PDT
*** Bug 393691 has been marked as a duplicate of this bug. ***
