[5.xP] Mozilla always renders the soft hyphen (&shy;) entity as a hard hyphen.
Soft hyphens should only be printed at the end of a line when they are breaking up
a word. Easy to fix even without adding hyphenation logic -- ignoring all soft
hyphens would be correct behavior.
Kipp -- lexomorphic transform bug.
Marking as a feature request; currently &shy; is mapped into the appropriate
Unicode code. There are zero lines of code, after that, to handle how it should
The present behavior of rendering a soft hyphen all the time does not comply
with the HTML 4.0 specification (http://www.w3.org/TR/REC-
Created attachment 906 [details]
I originally "claimed" this bug for the bugathon, but firstname.lastname@example.org has
posted a testcase. I changed the status whiteboard to indicate this.
I've fixed the short term issue such that shy characters will no longer be
rendered. However, they don't work either. So I'm latering the bug for that
Note for future code archeologists: the code that hides the shy characters lives
in the nsTextTransformer.cpp
Marking as verified later
If you look at this bug, have also a look at #31304
*** Bug 55191 has been marked as a duplicate of this bug. ***
Reopening as an enhancement request.
Clearing milestone, updating summary as soft hyphens are not displayed anymore. The
original bug asked that soft hyphens be either displayed correctly or not displayed at all
(either is correct behaviour). At the moment soft hyphens are not displayed; this bug is
about getting this feature working.
*** Bug 64626 has been marked as a duplicate of this bug. ***
This (the original) bug is actually still valid. Soft hyphens are rendered as
hard hyphens in XUL (for an example, use the 'Links' panel in the sidebar on a
Web page that uses soft hyphens).
Created attachment 39944 [details]
Soft Hyphen Test in a Simple Table
I don't understand why Soft Hyphen is tagged as an enhancement.
HTML 4.01 9.3.3 clearly and firmly states:
"In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen."
The soft hyphen is in the specification and any browser that professes to
support the specifications must support it.
We never break a line at a soft hyphen, and we never display soft hyphens, which
is the minimum necessary to follow the semantics:
If a line is broken at a soft hyphen, a hyphen character must be displayed at
the end of the first line. If a line is not broken at a soft hyphen, the user
agent must not display a hyphen character. For operations such as searching and
sorting, the soft hyphen should always be ignored.
Documents would look much nicer if Mozilla were smart enough to break lines at
soft hyphens. Nevertheless, that's an enhancement, not a requirement.
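The rule quoted above can be illustrated with a minimal Python sketch. It is not Mozilla code: `render_word` and its `break_after` parameter are hypothetical names, and the sketch assumes that layout has already decided whether, and at which soft hyphen, the line breaks.

```python
from typing import List, Optional

SHY = "\u00ad"  # U+00AD SOFT HYPHEN, what &shy; maps to

def render_word(word: str, break_after: Optional[int] = None) -> List[str]:
    """Return the visible line fragments for one word.

    break_after is the number of soft-hyphen-delimited parts kept on the
    first line, or None when the line is not broken inside the word.
    """
    parts = word.split(SHY)
    if break_after is None:
        # Not broken at a soft hyphen: no hyphen character is displayed.
        return ["".join(parts)]
    # Broken at a soft hyphen: a hyphen is displayed at the end of the first line.
    head = "".join(parts[:break_after]) + "-"
    tail = "".join(parts[break_after:])
    return [head, tail]

print(render_word("hy\u00adphen\u00ada\u00adtion"))
print(render_word("hy\u00adphen\u00ada\u00adtion", break_after=2))
```

Never passing `break_after` corresponds to the behaviour described in this comment: no break at a soft hyphen, and no hyphen displayed.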
buster no longer works for Netscape; reassigning to ftang.
> We never break a line at a soft hyphen, and we never display soft hyphens, which
> is the minimum necessary to follow the semantics:
That is exactly what is wrong with Mozilla's behaviour regarding soft hyphen and
come to the wrong conclusion. The minimum requirement is;
1. "If a line is broken at a soft hyphen",
(then) a hyphen character must be displayed at the end of the first line.
2. "If a line is not broken at a soft hyphen",
(then) the user agent must not display a hyphen character.
See http://bugzilla.mozilla.org/showattachment.cgi?attach_id=39944 with IE 5.x
see how it should work. This test case could also be applied in a div for
This in Bug#:9101;
Status Whiteboard: soft hyphens must be ignored or properly displayed
is not correct. It should read;
Status Whiteboard: if a line is broken at a soft hyphen, it must be displayed
Resolve bug, changing resolution to FIXED
Resolve bug, changing resolution to LATER?
> Documents would look much nicer if Mozilla were smart enough to break lines at
> soft hyphens. Nevertheless, that's an enhancement, not a requirement.
Not so, unless Mozilla is one of those browsers that do not interpret soft
The Unicode Soft Hyphen is in the specs, IE 5.x supports it, Opera has a bug
and Mozilla must also support it.
http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3 clearly stipulates it
a requirement and in no way is it only an enhancement.
It is only correct in behaviour, as is, for searching and sorting operations.
email@example.com: I have no clue about your last comment; I cannot figure out
which part is your opinion and which part is a quote of a previous comment.
This bug is currently an "enhancement" since Mozilla already fulfills the minimum
>1. "If a line is broken at a soft hyphen",
> (then) a hyphen character must be displayed at the end of the first line.
>2. "If a line is not broken at a soft hyphen",
> (then) the user agent must not display a hyphen character.
Mozilla does not do the "if" in 1, therefore there is no requirement to perform
the "then" part in 1.
Mozilla does do the "if" in 2 and also does the "then" part in 2.
There is no requirement that a user agent has to do the "if" part in 1; that is
not part of the minimum requirement.
Neither the XML nor the XUL specification mentions the soft hyphen, so the XUL
argument does not apply.
If we display it wrongly in XUL, please file a separate bug about XUL and assign to
mark it as future.
Renaming from "Support correct display of..." to "Break lines at..." since the
meaning of this bug has shifted from minimum compliance to enhancement.
*** Bug 150401 has been marked as a duplicate of this bug. ***
1. I just noticed that there's no hyphenation or breaking of lines within the
2. There's also a (user experienced) heavy bug, which appears when altering text
within the "insert HTML" window that contains &shy;s. The &shy; entity isn't
written out, and the editing of the text behaves very inconsistently. I'm afraid
that's the best description I can give you now; however, I figure this last "bug"
should be reported somewhere else (don't know where).
*** Bug 163281 has been marked as a duplicate of this bug. ***
My submission #163281, which was a dup, incorrectly stated that ‌ should also provide an invisible opportunity for line break. However, I have discovered that there should be an "opportunity for line break" before or after the —, and after the –. Mozilla does not do either. Perhaps that should be a separate bug?
See the article here:
Why is this bug still marked "Future"?
This is also an important i18n issue. English language usually has much shorter
words than many other languages in the world. No browser has automatic hyphenation
which would be a lot more difficult to implement. But in Internet Explorer
the webdesigner can at least "help" the browser in case of longer words.
So if, e.g., a German or Turkish web page is rendered in Mozilla, large gaps occur,
especially within table cells. This is quite visible on some pages and
people will notice that IE displays the page correctly while Mozilla doesn't in
its 1.2a version.
uhm... we support <wbr>, it can't be too hard to take that and add a hyphen...
Following up on my comment in #25: in addition to break opportunities
around various dash characters, entities such as emsp, ensp, thinsp, etc.
should also provide break opportunities. Mozilla (1.1) treats them as
non-breaking spaces, altho the widths are drawn correctly.
By section 9.1 of the HTML 4.0 standard, only line breaks in the source and the space, tab, form feed, and zero-width space characters are considered "white space" characters.
Since thinsp, ensp, and emsp are not included in that definition, it is standard to render them as non-breaking spaces. It may even be intentional, considering that specific space widths would imply that the author is trying to be exacting about positioning.
I don't think section 9.1 of the HTML spec (i.e. the definition of whitespace
in HTML) has any relevance with regard to breaking lines. The HTML spec even
points out that rendering of words should be laid out according to the
conventions of the language.
For breaking lines, I think the Unicode Line Breaking properties,
should be taken as normative. HTML overrides these only with regard to
by explicitly specifying that they are not line breaks in HTML.
The spec then elaborates that it imposes no requirements on behaviour of
other space characters. So an implementation that uses the thin space as a
breaking opportunity would still conform with HTML 4, while simultaneously also
conforming to UAX#14.
Reading thru that Unicode specification, I notice this in the definition
of the "SP" category: "SPACE, but none of the other breaking spaces, is
used in determining an indirect break." The other spaces are categorized
as "break after" and so provide a direct break opportunity.
The difference between direct and indirect breaks is particularly subtle
when dealing with these specific-width spaces. As I read it:
Assume the text section in question consists of "prev next".
If "prev" + emsp extends beyond the margin, the break takes place before
"prev", so that the width of the space is apparent on the next line.
If, instead, "next" exceeds the margin, then the break occurs before "next",
so that the width of the space is apparent on the current line.
(For right-justified text breaking after the emsp, the emsp is 'visible'
at the margin.)
The common ASCII SPACE, however, causes an indirect break. In this sense, it
acts inversely from the &shy; -- it has zero width if it is at the break, otherwise it is rendered 'visible' between the words.
Mozilla does not break at all on the special spaces. Opera6 handles them as described above. IE/Win breaks on them as described, but does not maintain space width at the right for justification purposes -- except, for some reason,   provides a break opportunity on either side, not just after.
Also in the cited Unicode spec, I found this for category "GL" (glue):
"The word joiner character [U+2060 = &#8288;] is the preferred choice for
an invisible character to keep other characters together that would
otherwise be split across the line at a direct break." Mozilla displays
this as a (non-breaking) glyph -- a question mark, with my current setup.
(Opera and IE both handle the character the same way.)
Nice discussion on breaking and non-breaking entities and spaces, but somewhat
unrelated to this bug, isn't it?
Actually, the issue at hand (correct me if I'm wrong) could be described as follows:
(1) &shy; is a character where the current line may be broken
(2) the display of this character depends on what it is followed by (namely a
linebreak or anything else)
So basically, (2) makes me think of other combining characters (think Tamil or
the like). Now my questions are: How are these cases rendered? Is the rendering
engine able to ask for the next thing in the layout?
If I'm completely wrong, sorry for the spam. In that case: any chance this is
getting fixed in the foreseeable future?
Extensive discussion at
I have to say, I am quite unimpressed by Prof. Korpela's assertion that the
SHY should always be rendered. He admits that ISO-8859-1 is ambiguous, then
declares his interpretation, then goes on to insist that any subsequent spec
or clarification effort is obviously mistaken. I consider him to be an
I believe the relevant, and quite clear, specification is found in
See the section on "breaking hyphens" and the particular discussion of the SHY
Reading Mozilla 1.3 code with regard to &shy; and <WBR> processing (both
are mentioned in comments over the years), I do think that to make things
render properly, there should be some magic to translate &shy; to
"<shy>", and let the latter do the actual work. Perhaps similarly
to several other punctuation-like entities?
Understanding the code flow of about 3 million lines of CPP and H files
is .. quite challenging..
What little I have understood is that character entities get translated
into unicode (UCS2?) for internal storage, and then rendered from there
as strings. Tags are separate items with separate presentation boxes,
working at much higher level, therefore turning some characters into
(internal) tags could aid the rendering problem ?
Very important to switch from MSIE. Especially for languages with long words
such as Russian or German.
Thus, adding dependency with bug 164421 and suggesting to add "intl" keyword.
*** Bug 203181 has been marked as a duplicate of this bug. ***
I don't like the way Mozilla treats soft hyphens.
First, when a text block is presented with text-align: justify; and a word
with &shy; occurs at the end of the line, this word often goes
beyond the right edge of the text. This looks very sloppy (for example, a page on
my home site - http://www.philigon.ru/prosa/stories/essay.html).
Second, sometimes Mozilla breaks the line at the &shy; (in cases such as
ac&shy;ces&shy;<span>si&shy;</span>bi&shy;li&shy;ty), but doesn't display the
hyphen itself, which looks sloppy, too.
IE and Opera don't have these bugs.
I find it very disturbing that the Mozilla developers see this bug as a minor
one (priority 3 and target: future).
Many European languages (like Danish) have long words - some words are
so long that they can't be handled by this textarea input box.
Here soft hyphens are essential to get readable websites.
And as others have already said, it is part of the W3C HTML specifications.
The best weapon in competing with closed source software is open standards - so
let us support them as best we can - and let extra stuff wait.
This bug was open in 1999 ! :-o
Is it so difficult to code ? IE has had it for ages. It's a shame IE handles soft
hyphens better than Gecko. At least Gecko does not break on hard hyphens :-/
Assigning to default component owner.
Too bad no target has been set yet for this bug. Hyphenation is an essential
part of publication in many languages in the world. The HTML-standard provides a
graceful solution to the problem. Why does it take so long for Mozilla to fix
*** Bug 215663 has been marked as a duplicate of this bug. ***
Just curious - why is this bug still "NEW"?
Note that bug 95067 comment 68 introduces a patch which, it is claimed,
"includes some changes for hyphenation of soft hyphen"; this is part of a larger
bug on improving wrap capabilities generally.
Should this bug be considered to cover other hyphen-like characters, such as
mdash, or strictly shy? E.g. Moz 1.4 fails to break at "&mdash;" boundaries;
even the mini HTML renderer in Java/Swing 1.4.2 will break after "&mdash;", I
found. Inadequate mdash handling gives a poor appearance to English text. Cf.:
Bug #56652 seems more general in regards to line breaking properties of
characters. Do we need a tracker bug, or is it likely that all these problems
could be solved together?
Re: Jesse Glick's comment 49:
Bug 206152 is "[meta] line breaking bugs" but is inexplicably blocking, rather
than depending on, this and almost every other bug it's tracking.
&shy; is a special case because not only does it control breaking, the display
of the glyph is variable depending on whether the break occurs or not. Also, I
think the correct handling of &shy; is more in demand than the correct breaking
around &mdash; &ndash; and similar characters.
Guys, the browser is now EXCEPTIONAL, can you just postpone all other
enhancement activities and give consideration to this particular horrific issue.
Today is 2003! We already widely use XHTML and you can't deal with ancient HTML!
Please, give us a chance to switch our users from IE to Mozilla!
re comment 50: surely breaking at soft hyphens can be handled similarly to
breaking at spaces:
If a line break is required at a space character, the space character is
replaced with a carriage-return.
If a line break is required at a soft hyphen, the soft hyphen should be replaced
with a hyphen and a carriage-return.
Is it really that simple, or am I missing something?
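The proposal above can be tried out with a small greedy line breaker. This is a Python sketch under simplifying assumptions (fixed-width characters, breaks only at ASCII spaces and soft hyphens); `wrap` is a hypothetical helper, not Gecko's layout code.

```python
from typing import List

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def wrap(text: str, width: int) -> List[str]:
    """Greedily wrap text to `width` columns.

    A break at a space drops the space; a break at a soft hyphen shows "-"
    at the end of the line. Soft hyphens that are not used are dropped.
    """
    lines: List[str] = []
    current = ""
    for word in text.split(" "):
        parts = word.split(SHY)
        while parts:
            prefix = current + " " if current else ""
            whole = "".join(parts)
            if len(prefix) + len(whole) <= width:
                current = prefix + whole
                parts = []
                continue
            # Try the longest head that still fits together with its hyphen.
            for k in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:k]) + "-"
                if len(prefix) + len(head) <= width:
                    lines.append(prefix + head)
                    current = ""
                    parts = parts[k:]
                    break
            else:
                if current:
                    # No usable soft hyphen: move the word to a fresh line.
                    lines.append(current)
                    current = ""
                else:
                    # Nothing fits even on an empty line: let it overflow.
                    current = whole
                    parts = []
    if current:
        lines.append(current)
    return lines

for line in wrap("a very hy\u00adphen\u00ada\u00adtion test", 12):
    print(line)
```

What the sketch leaves out is that the hyphen's own width has to be counted when testing whether a fragment fits, and that real layout measures glyphs rather than counting characters.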
Created attachment 134477 [details]
Ghost chars appear when marking text with shys. See http://magnus.de/shy/
Ghost chars appear while marking text with <shy>
If you mark text in http://magnus.de/shy/testcase.html, some "ghost chars"
will appear at the right side of the column. Sporadically the text rendering
will be nearly destroyed after unselecting.
It happens when the text contains soft hyphenation signs, either in the form
<shy> or in the compact form of char #173, which looks like a '-'. The bug
occurs regardless of the kind of text justification.
Unhappy with Mozilla
Therefore we are unhappy with Mozilla. It does not make proper use of soft
hyphenation and it is impossible to mark such text correctly. Bug 9101 has been
open for years, while Internet Explorer has handled all this correctly for years!
Because we are an Internet Service Provider too, we are distributing Starterkit
CDs containing Mozilla ...
Comment on attachment 134477 [details]
Ghost chars appear when marking text with shys. See http://magnus.de/shy/
Screenshots on http://magnus.de/shy/
I would really welcome it if this bug would
-be set to a status other than "NEW" after so many years
-be given a severity of at least "normal" now
-receive a definite target milestone now
When this bug first appeared, it may have been an "enhancement" for the "Future".
But this bug has now been open since June 1999 and there is still no fix.
We can hope it gets fixed now in the foreseeable future.
Unfortunately this bug cannot be changed from NEW to ASSIGNED right now, nor can
a target milestone be set, because no developer is available to fix it. I am
adding the "helpwanted" keyword. Hopefully someone with the knowledge and skill
to implement this will volunteer someday soon.
I do think, however, that this is not an enhancement but a failure to follow
HTML guidelines. As such, it should be a proper bug.
Of course, if things continue to remain unresolved here, it could all be
eventually "fixed", assuming that we end up supporting the following CSS3 proposal:
Yes, this should be a "real" bug.
Microsoft Minions begin arguing "IE can handle &shy;, but Mozilla doesn't comply
with the standards..."
This can't be. Assign that bug!
*** Bug 230045 has been marked as a duplicate of this bug. ***
Re: #42 From Hadrien Nilsson 2003-07-16 00:53, where it was stated:
> Is it so difficult to code ? IE has had it for ages. It's a shame IE handles soft
> hyphens better than Gecko. At least Gecko does not break on hard hyphens :-/
I have to argue with you about the "at least Gecko does not break on hard
hyphens" part. Why is that good for anyone? I encounter it again and again
that someone doesn't know how to make an anchor, and just leaves it to whatever
software to linkify an URL, and they paste 900+ pixels wide amazon.com or news
sites' URLs. I, for one, would be much much happier if these broke on those Hyphens.
Then again, this part of the issue seems to have been discussed in Bug 95067.
Re: #60 From Henrik Pauli 2004-01-04 11:04, where it was stated:
> I have to argue with you about the "at least Gecko does not break on
> hard hyphens" part. Why is that good for anyone? [...] I, for one,
> would be much much happier if these broke on those Hyphens.
I suppose that the point was that MSIE breaks on hard hyphens, even when it
could simply reject the whole word at next line.
I agree with you that, when it is necessary to split a word, it is better to
split it after a hard hyphen than anywhere else.
But I also agree with Hadrien that it is a *bad* thing to split a word after a
hard hyphen when it is possible to write the whole word within the box boundaries.
It seems usually good to leave hyphen-minuses alone when numbers are involved.
However, in most cases people use hyphen-minuses instead of hyphens, nb-hyphens,
en and em dashes... Just like how the *nix community likes to use grave accents
and apostrophes for quotation (UGH!). In most cases people:
1) don't know about Hyphen
2) don't have easy access to it (yay for my custom keyboard layout with en and
em dash and horizontal ellipsis)
3) just plain don't care.
But should that mean that words that naturally contain a hyphen (and I can tell
there are lots of such in Hungarian) should continue breaking sites' looks, just
because people can't bother using / can't use the proper characters?
Indeed, if it fits inside the box, it's probably safer (not 'better' as such,
looks ugly if it's text! though with numbers it could be understandable.) to
push the word to the next line. Alas currently this is followed by stretching
the box (say, table cell) when that's possible, instead of breaking the stupidly
long something down where it'd be decent.
Question: -1234 should use the Minus Sign U+2212?
*** Bug 231251 has been marked as a duplicate of this bug. ***
in­for­ma­tion: ie has been do­ing soft hy­phens for some time
e­ven though de­vel­o­per at­ti­tude there is
si­mi­lar to what I read here: we don't got­ta do a­ny­thing
we don't wan­na do.
I really hope this bug still isn't "NEW" when it hits its 5th birthday in a few
I came to this bug because I found that Mozilla wasn't breaking lines at hyphens
where Internet Explorer was. Personally, I'd like to see a resolution where
Mozilla could recognise some kind of magic hyphen (like &shy; for instance) that
would break lines and also be rendered (always). Unfortunately the W3C
specification doesn't seem to agree with me: as has been stated earlier, the
HTML4 spec asks that &shy; not render when it is not breaking.
Do I need to make noise here or take it to the w3c mailing lists?
Neither. Bug 95067 is on file, and noise is not helpful.
Adding [p-opera] to whiteboard. Opera 7.20+ for Windows supports soft hyphens.
66> I really hope this bug still isn't "NEW" when it hits its 5th birthday
66> in a few months time...
65 votes and 8 duplicates. I hope so too.
But it wouldn't have so many duplicates if NEW bugs weren't excluded
by the default search settings.
Safari supports soft hyphens too now.
David Hyatt is right.
one might say: if it was so easy for him to implement, he surely did not take into
account that in some european languages words alter their spelling when hyphenated.
as a soft hyphen is placed at the author's will, and not algorithmically by the
browser, the author can decide whether she wants a line break to occur or she
wants correct spelling with no break.
there are rare places, esp. in cms, where hyphenation can save a layout from
falling apart. e.g. german language/writers favour long compounds of words ;)
"e.g. german language/writers favour long compounds of words"
The situation is similar for nordic languages (swedish, norwegian, danish), but
as the quote above may imply, we do NOT choose long compounds of words - this is
how the language is formed, there are no "shorter alternatives". Hence, Mozilla
really does awful things to many sites (mostly newspapers, e-zines and the like),
naturally making users prefer other browsers.
And this is truly strange - the main thing about the web is to read/present text
(as with news/articles) and although this bug was reported almost five years ago
- breaking the number one task of a web browser - it's still there. Status quo.
Zip progress (from user perspective).
Sorry for making "noise", but this prevents convincing "decision makers" that
Linux is a mature option for schools etc when "they can't even produce a
Everyone agrees this should be implemented and that it is important.
Now someone just needs to step up and do it.
(In reply to comment #71)
> one might say: if it was so easy for him to implement, he sure did not take into
> account that in some european languages words alter spelling, when hyphenated.
I'd like to add my comment here and nod a little. Will go a bit off-topic, but
I doubt anyone will mind it, especially since this bug is so silent anyway.
In Hungarian -- and here's when I grumble at some linguists and some Unicode
people -- if you see a long sz, it's written as ssz. However, when hyphenated,
it's sz-sz. Now we see how that would be just *snap* this easy, had we an sz
code (and zs, and ty, and...) in Unicode and then if we see two sz characters
next to each other, they'd get nicely rendered by the UA to look like ssz. If
you can have ligatures for Arabic, can build funky strings of Devanagari, and
Croatian and Dutch have DZ/Dz/dz or IJ/ij, it would be a great help to about
having to give the browsers biiiig dictionaries so that they recognise a long sz
letter or a long gy letter, and so on.
Hyphenation is a big problem, and even when one goes out of their way to put shy
characters all over their text, it still remains a problem, regardless of
support for it.
Hungarian too, like Finnish and the Germanic languages, is full of words long
enough to cause problems. "Automatic" hyphenation like in
(La)TeX would be great, but... I know this is not publishing ;) We have to make
do with shy, and as such, it would be great if we got support for it, after all
With all due respect, that's not the problem here. Auto-hyphenation would be a
Nice Thing, and could possibly get a bug all of its very own where you can
discuss the problems with it in every single language ever, and that may even
*depend* on Mozilla/Fx/etc. handling &shy; correctly. But auto-hyphenating is Not
This Bug, and the discussion makes it look even more complicated than it could
Should be fairly fixable by hacking nsTextFrame line measurement code.
*** Bug 244263 has been marked as a duplicate of this bug. ***
With regard to the page http://www.cs.tut.fi/~jkorpela/shy.html, which summarizes
the use of SHY quite nicely, I wonder if it's possible, in an HTML
document, to insert a SHY and with CSS customize it to behave differently for
For example, that SHY in one case may do nothing and not being shown, while in
another is shown as a regular hyphen but nothing happens, and in yet another it
is only displayed if used for breaking/hyphenating the word at an end of the
line et cetera...
Is this possible? It would, as I see it, fix some of the argument of its use and
behaviour. (Similar could be done to other hyphens and different kinds of spaces.)
Much has been discussed here and much has drifted off course.
In this case, you sadly don't understand the purpose of or how the soft hyphen
is used. There is only one purpose of the Soft hyphen:
The soft hyphen tells the user agent where a line break can occur. The hyphen
may only be displayed at such a line break. Period.
I suggest that you check out the "Soft Hyphen Test in a Simple Table" in the
Attachments above. Run it in IE or Opera (which support it) and compare it to
any Gecko (which do not support it). Compare the result and look at the source
code to see how it is used. Note the multiple entries of the soft hyphen in the
last table which indicate the only permissible line break points in the word. In
IE or Opera, the hyphen is displayed correctly _only_ at the permissible line
break point, which is the last soft hyphen entry prior to being overset
(including the width of the hyphen character itself) for the required linelength
and as dictated by the soft hyphen entries.
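The selection rule described here (take the last soft hyphen before the word would overflow, counting the width of the hyphen itself) can be sketched in Python. `last_fitting_break` and its `measure` parameter are hypothetical names; `measure` stands in for whatever width function a renderer would use and defaults to a plain character count.

```python
from typing import Callable, Optional, Sequence

def last_fitting_break(
    parts: Sequence[str],
    available: float,
    measure: Callable[[str], float] = len,
) -> Optional[int]:
    """Pick where to break a word given as soft-hyphen-delimited parts.

    Returns how many parts stay on the first line, or None when not even
    the first part plus a hyphen fits into `available`.
    """
    best = None
    for k in range(1, len(parts)):
        # The visible hyphen is part of what must fit on the line.
        head = "".join(parts[:k]) + "-"
        if measure(head) <= available:
            best = k
    return best

# "accessibility" with soft hyphens as ac|ces|si|bi|li|ty
parts = ["ac", "ces", "si", "bi", "li", "ty"]
print(last_fitting_break(parts, available=8))
print(last_fitting_break(parts, available=2))
```

With 8 columns left, the break lands after "accessi" because "accessi-" is exactly 8 wide, while "accessibi-" would need 10.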
Opera supports it now. IE has supported it for close to a full decade. Mozilla
still does not support it.
*** Bug 250741 has been marked as a duplicate of this bug. ***
Why correct hyphenation (and &shy;) is important to me:
In Portuguese, we use soft and hard hyphens.
Basically, soft hyphens mark where a word can be broken in two (a "-" is added at
the end of the first part) and the second part is placed on the next line.
The hard hyphens also work as soft hyphens, but with a twist... in this case
the word is split in two at that point and ONE "-" is placed at the end of the
first part and ANOTHER "-" is placed at the beginning of the next line, before
the second part of the word.
the word "manda-se" would be broken as
"manda-" in one line... and...
"-se" in the next line.
(no idea what kind of hyphen is used in this case)
the same word with a soft hyphen:
"man&shy;da-se" would also be breakable as:
"man-" in one line... and...
"da-se" in the next line.
So yes, proper hyphens are a must!
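The two cases described in this comment can be written down as a small Python sketch. `break_portuguese` is a hypothetical helper that only restates the rule as given here (the hard hyphen is repeated at the start of the next line); it is not taken from any browser or spec.

```python
from typing import Tuple

SHY = "\u00ad"  # U+00AD SOFT HYPHEN, what &shy; maps to

def break_portuguese(word: str, at: int) -> Tuple[str, str]:
    """Split `word` at index `at`, which must point at a soft or hard hyphen."""
    ch = word[at]
    # Soft hyphens that are not used for this break are not displayed.
    head = word[:at].replace(SHY, "")
    tail = word[at + 1:].replace(SHY, "")
    if ch == SHY:
        # Soft hyphen: one "-" at the end of the first line only.
        return head + "-", tail
    if ch == "-":
        # Hard hyphen: "-" ends the first line and is repeated on the next.
        return head + "-", "-" + tail
    raise ValueError("not a hyphenation point")

print(break_portuguese("manda-se", 5))
print(break_portuguese("man\u00adda-se", 3))
```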
Peter-Paul Koch's browser compatibility test & results for: <wbr>, ​,
Seems implementation of ­ should be similar to ​ which is supported.
Per comment #70, adding [p-safari] to whiteboard too.
Please please please fix this bug, finally. After writing Firefox
Plug-In managers, NTLM authentication and what not - I just cannot
imagine this bug is nearly as difficult.
Is it perhaps because the Mozilla core team are all speaking English
natively that they do not understand why this bug is important?
This simple bug destroys any layout with margin columns as soon as there
is one long word which is very often the case in languages other than
English. Everybody at my office using Mozilla for a while notices this
because it is so obvious if you read a language with long words
(German, Turkish, Finnish - you name it).
So, now get angry at me for spamming this bug. Most of the time I've kept
my mouth shut, because I know that many wishlist items are really
difficult to do and I've been patient for quite a while but here I just cannot
believe why this is so difficult to do that we have to wait now
for nearly 5 years for HTML 4 compatibility.
A quick example why this bug needs to be fixed: consider the following word,
which is the traditional name for Bangkok (yes, really!):
this bugger breaks the layout quite frequently, especially if you have a low
resolution screen. That's a little annoying when you try viewing a page that
contains that word.
*** Bug 273939 has been marked as a duplicate of this bug. ***
Can we please finally fix this? It's not only for typography but also for web
sites that have narrow columns with lots of content and web sites where users can
screw up other users by using very long words. This bugtracker is an example,
but there are lots of forums which also have this issue. It should be relatively
simple to support (after all the web site already has the hyphenation points
If anyone cares, I've created a really nasty JS kludge to enable soft hyphen
support for Mozilla:
I don't have the C++ skills required to hack on the Mozilla source, but I'm
betting that the three hours it took me to code the kludge is not much less work
than it would take to fix Mozilla once and for all. I'm hoping that, by putting
this really awful implementation out there, a Mozilla coder will actually take
up the challenge before my solution becomes common.
(In reply to comment #16)
> We never break a line at a soft hyphen, and we never display soft hyphens, which
> is the minimum necessary to follow the semantics:
> For operations such as searching and
> sorting, the soft hyphen should always be ignored.
> -- http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3
As far as I can tell, in Firefox 1.0 the search is quite broken for soft hyphens.
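For the searching and sorting part of the quoted rule, one way to ignore soft hyphens is to match against a copy of the text with U+00AD removed and then map the hit back to the original string. This is a Python sketch with hypothetical helper names, not a description of how Firefox's find feature works.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def sort_key(text: str) -> str:
    """Key for sorting and comparing that ignores soft hyphens."""
    return text.replace(SHY, "")

def find_ignoring_shy(haystack: str, needle: str) -> int:
    """Index in the original haystack where needle matches, or -1.

    Soft hyphens in either string are ignored for matching.
    """
    # Remember the original index of every character that is kept.
    kept = [(i, ch) for i, ch in enumerate(haystack) if ch != SHY]
    stripped = "".join(ch for _, ch in kept)
    pos = stripped.find(sort_key(needle))
    if pos < 0 or not kept:
        return -1
    return kept[pos][0]

print(find_ignoring_shy("soft hy\u00adphen", "hyphen"))
print(sorted(["zeb\u00adra", "apple", "man\u00adgo"], key=sort_key))
```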
Adding myself to CC:
Can someone please, *please* look into this?
Opera has finally gained soft hyphen support a while back, but the Gecko
browsers are still blissfully unaware.
*** Bug 281797 has been marked as a duplicate of this bug. ***
Ok, just a question: What would it _cost_ to have a mozilla developer fixing
this bug? I remember using Mozilla 0.8.? and thinking: They are doing good work,
they just need time. Now I have 1.8.a6. I really want this in 1.8!!!
Maybe we could have a "Mozilla soft hyphen development fund".
(In reply to comment #93)
> Ok, just a question: What would it _cost_ to have a mozilla developer fixing
> this bug? I remember using Mozilla 0.8.? and thinking: They are doing good work,
> they just need time. Now I have 1.8.a6. I really want this in 1.8!!!
> Maybe we could have a "Mozilla soft hyphen development fund".
Good question. And I'd even add a few bucks in. But I'd like to take this
discussion to the MozillaZine forums:
Fixing bug 255990 would fix this bug at least partially. For now, let this
depend on that bug.
*** Bug 308206 has been marked as a duplicate of this bug. ***
(In reply to comment #96)
> *** Bug 308206 has been marked as a duplicate of this bug. ***
One would think that the amount of traffic on this bug, and the fact that so
many people have reported it (evidence: messages that say "marked as duplicate
of 9101") would make the Mozilla developers want to just fix it to get it out of
their hair. Even comments from two years ago are saying, "C'mon, guys! Why has
this frustrating bug just sat here for so many years?" That was in 2003. Now
it's 75% of the way through 2005 and it appears there has been no activity on
this &shy; bug since 1999. (Maybe the developers have gone to help clean up New
Orleans from the mess that Katrina made...)
Looks like the developers are too shy when they are facing this lady ...bug.
Please solve this bug!
Please read comment 73 before posting further advocacy.
Sigh - sometimes Open Source requires a _really_ large amount of patience.
Not everyone interested in a solution here is able to hack the Mozilla
codebase unfortunately. So most of us will just have to wait. Like with
many other bugs. That's OK considering the fact that we all get Firefox
for free and it's become great software with or without this bug.
Nevertheless sometimes the discrepancy between a seemingly simple bug
and a time of 6(!) years means our patience is tested to the max.
In the meantime we've seen it all: Mozilla, Phoenix, Firebird, Firefox,
Whole new plugin architectures, new preference dialogs, etc. It's hard
to believe that this major bug for international websites is still there.
The longer the words in a language the worse it gets. This is destroying
all layouts with narrow columns in those languages.
So I hope that developers at least understand that we are a bit impatient to
finally see at least some progress or a bit of information on this bug and
why it isn't implemented.
Six years!! Oh my god~~
Sorry for the spam, but I too think that it's simply not acceptable to leave bugs
open for that long. It's almost the same with inline-block, which is my personal
RFE #1. The request to implement inline-block is also 6 years old. It's been in
CSS 2.1 for I don't know how many years now. And it's part of Acid2. If it had
been fixed years ago, Gecko wouldn't be the second-to-last rendering engine to
support Acid2. But at least there is hope for 1.9 ;)
The thing is, in open source you just gotta do things yourself. Believe me, I
would love to do these things myself. But I can't code C; it's just as simple as that.
You know what? I'm checking out the source right now. I've never done any
mozilla hacking or even built my own build. Some xp in C and C++ hacking, but I
guess this should be a good way to score some geek points :)
I'm not saying that I will succeed in fixing this bug, but at least I will try.
Any pointers as to where I should begin looking?
Please note that I really have no idea, but I'd start looking in /parser,
specifically the nsHTMLEntit* files and maybe nsTextTransformer.[cpp|h] in layout/.
Ok. I should have fixed bug 255990 last month, which will help fix this bug at
least partially. I'll try to spend some more time with it.
Starting in Q3 2006, this bug is likely to corrupt layout on the following online stores:
Czech R: http://www.electroworld.cz/
As noted earlier, several European languages differ from English in the way nouns are joined into compound words.
We are preparing a new version of several big-brand European online stores using the same technological foundation. For these stores, many of which are market leaders in their respective countries, we will use a layout where 3 products are shown side by side, with teaser text to the right of a teaser image. This demands that text columns are no more than 80 pixels wide, and this, again, demands soft hyphenation. Since IE, Safari and Opera support &shy;, we are moving forward with this layout principle.
I will provide page view statistics when I get them, but as I say, these are large-brand physical stores, and as we release the new online stores platform their online presence will surge.
Regrettably, as a consultant I don't have the resources to initiate a bug fix.
From Q3 2006, this bug may affect some additional 1.000.000 to 1.500.000 unique users.
Background: I now have some preliminary page statistics for the sites listed in comment #106. Based on the November 2005 traffic generated on the up and running web sites, the combined total traffic in Q3 2006 will be between ten and fifteen million unique visitors per month. If 10% of these visitors use Gecko, we arrive at the numbers above.
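The estimate above is simple arithmetic; as a worked example in Python (the function name is hypothetical, and the 10% Gecko share is the commenter's own assumption, not a measured figure):

```python
def affected_users(monthly_visitors: int, gecko_share_percent: int) -> int:
    """Visitors per month expected to see the broken layout (integer math)."""
    return monthly_visitors * gecko_share_percent // 100


low = affected_users(10_000_000, 10)   # lower traffic estimate
high = affected_users(15_000_000, 10)  # upper traffic estimate
print(low, high)
```

This reproduces the range of 1.000.000 to 1.500.000 users quoted in the previous comment.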
Also, we are discussing CMS soft hyphenation auto-insertion technology with some major vendors. A possible standardization of this feature will add additional sites to the list above.
The wordbreak code appears to be at http://lxr.mozilla.org/mozilla1.8/source/intl/lwbrk/src/
We'll support hyphenation in the next major Gecko release, which will be early 2007.
In the meantime if someone wants to get soft hyphens working in the 1.8 branch, which will be shipped earlier, feel free; I'll be happy to review a patch. The relevant code is all in layout/generic/nsTextFrame.cpp, you shouldn't need to mess with line breaking itself. Unfortunately that is nasty code which is why I want to wait until that code gets cleaned up, which is starting to happen on the trunk.
Given that proper hyphenation is going to be implemented by a substantial rewrite in 2.0, it seems to me that the soft hyphen implementation for 1.5.x is likely to be a throwaway hack.
In that spirit, what about considering the rather brutal approach of turning all the &shy; in the source text at parse time into something like <wbr softhyphen="true">, and then leveraging the <wbr> code, which seems to do nearly the right thing already, and using the (internal representation only, nonstandard) attribute "softhyphen" as a flag to trigger the generation of the hyphen characters?
Perhaps a special style mechanism could even be used to handle the hyphens, e.g. turning &shy; into
<span style="display: x-mozilla_at_end_of_line;">-</span><wbr><span style="display: x-mozilla_at_beginning_of_line;">-</span>
where the x-mozilla stuff defines special automagic CSS properties to display the hyphens only at the start or end of the line, respectively?
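A rough Python sketch of the parse-time substitution proposed above, purely to show the idea. It works on the raw source string rather than inside Mozilla's parser, the function name is hypothetical, and the "softhyphen" attribute is the nonstandard internal flag suggested in the comment, not an existing HTML attribute.

```python
import re

# The named entity, the decimal and hex numeric references, and the raw
# U+00AD character. Other spellings (e.g. zero-padded references) are
# deliberately not covered in this sketch.
SHY_PATTERN = re.compile("&shy;|&#173;|&#[xX][aA][dD];|\u00ad")


def shy_to_wbr(html_source: str) -> str:
    """Replace every soft hyphen with a flagged <wbr> break opportunity."""
    return SHY_PATTERN.sub('<wbr softhyphen="true">', html_source)
```

A plain text substitution like this would also rewrite soft hyphens inside attribute values or scripts, which is one reason it could only ever be a throwaway hack.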
> In that spirit, what about considering the rather brutal approach of turning
> all the &shy; in the source text at parse time into something like <wbr
> softhyphen="true">, and then leveraging the <wbr> code, which seems to do
> nearly the right thing already, and using the (internal representation only,
> nonstandard) attribute "softhyphen" as a flag to trigger the generation of the
> hyphen characters?
The most important thing here is to break the line if needed (to avoid destroying the layout).
To display the hyphen is less important for an intermediate patch.
IMO it would be enough for now to handle every &shy; and &#173; the same way as <wbr>.
Is there anyone here who is able to do this?
Is there any sort of macro mechanism already provided in the parser code where we could perform this sort of manipulation?
An aside: in bug 6347, the suggestion is made that
<span style="font-size: 0px"> </span>
is a more standards-based alternative to <WBR>. But let's just go with <WBR>, if it does what is needed; a hack's a hack, so let's keep it a small hack so it can be easily chopped out later when the proper hyphenation code comes along.
Breaking the line without adding the hyphen can result in loss of information; for example, if something means something different when it's one word than when it's two.
Breaking the line and adding the hyphen when doing so requires a good bit of care due to the complexities of text measurement, at least if you don't want the hyphen to overflow in some cases.
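To make the measurement point concrete, here is a toy Python sketch. It assumes a monospaced model where width is simply the character count, and the function name is hypothetical; real text measurement in a layout engine is far more involved. The key detail is that room for the visible hyphen is reserved before a segment is accepted onto a line that may still have to break.

```python
from typing import List

SOFT_HYPHEN = "\u00ad"


def break_word(word: str, max_width: int, show_hyphen: bool = True) -> List[str]:
    """Greedily break one word at its soft hyphens."""
    segments = word.split(SOFT_HYPHEN)
    hyphen = "-" if show_hyphen else ""
    lines: List[str] = []
    current = segments[0]
    last_index = len(segments) - 1
    for index, segment in enumerate(segments[1:], start=1):
        # If more segments follow, this line may still end in a hyphen,
        # so keep room for it when measuring.
        reserve = len(hyphen) if index < last_index else 0
        if len(current) + len(segment) + reserve <= max_width:
            current += segment
        else:
            lines.append(current + hyphen)
            current = segment
    lines.append(current)
    return lines
```

With show_hyphen=False the word "hyphen\u00adation" breaks into "hyphen" and "ation", which reads as two separate words. The sketch still overflows when even the first segment plus its hyphen is wider than the line, which is the unavoidable case.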
In any case, fixing bugs like this is not within the criteria for security and stability fixes that are taken for 1.5.* minor releases.
1.5?? I was hoping we could shoot for 3.0 or maybe 2 if we're lucky ;)
> The most important thing here is to break the line if needed (to avoid
> destroying the layout).
> To display the hyphen is less important for an intermediate patch.
I strongly disagree, since breaking composite words in two (or more), will in many cases totally change the meaning of what the text says.
Example (with the Swedish language):
The two words "upprätt hållande" translates to "holding [something] pointing up" while "upprätthållande" translates to something like sustaining/maintaining [something] during some time. There's a huge difference between "life sustaining equipment" and "equipment that holds stuff pointing it upwards".
The example/translation is not 100 % accurate, but as of now I can't figure out a better way to demonstrate how bad it is to break the line without displaying a hyphen.
This problem has been spoken of for 6 years now, even Safari supports it better, isn't it time to fix this?
(In reply to comment #118)
> This problem has been spoken of for 6 years now, even Safari supports it
> better, isn't it time to fix this?
I am not a developer, but I hope that someone who reads comments can get this:
As noted in comment #109 and comment #110, Mozilla will fix this in 3.0, as it would be much more complicated to fix it on current code, and (my interpretation) pretty useless, as there will be less than a year between 2.0 and 3.0 releases.
IMHO, this sounds reasonable, and if someone feels this is not proper he may create the fix himself.
(In reply to comment #119)
> I am not a developer, but I hope that someone who reads comments can get this:
Nice tune. Maybe it's you, who ought to lean back and reflect a bit more after having read the roughly 100 messages ...
> As noted in comment #109 and comment #110, Mozilla will fix this in 3.0, as it
> would be much more complicated to fix it on current code, and (my
> interpretation) pretty useless, as there will be less than a year between 2.0
> and 3.0 releases.
How do you even dare to say it's "pretty useless" to fix the bug right now? Haven't you read the messages of dozens of people trying to explain why thousands of people are affected by the bug? Of course everybody who regularly looks at (*non-English*) pages that Mozilla renders poorly because it ignores &shy; is getting more and more convinced that it's "pretty useless" to bring this bug to the attention of someone knowledgeable. As for me, I'm really getting upset and even angry about your ignorant attitude. You may know nothing about the fact that there are languages that rely heavily on the composition of words and thus "produce" some very long words which really depend on a working &shy; implementation; thinking about it, you should know about this --- just read the thread! You must not pretend, though, that it's "pretty useless" to fix the bug just because you may not need a working implementation. If you know nothing about the needs of other people, don't tell them what's good for them; just shut up.
In addition I don't quite get your argument regarding the phrase "less than a year". Is it along the lines of "Well guys, you've been waiting for this for about six years ... you shouldn't care about another one"? On the contrary! This bug is so important to many people that it would be worth fixing even if the whole Firefox code base changes in a week.
> IMHO, this sounds reasonable, and if someone feels this is not proper he may
> create the fix himself.
Your opinion is not humble, it's just solely based on ignorance. Now to the "fix himself" part. It's true, as the source is open, anybody could theoretically come up with a solution to the problem and publish it; we even got an offer from Robert O'Callahan to do a review, which is very kind. In practice however, the bug seems to be quite hard to smash. Just look at the various wild proposals where to look at and how to circumvent it, without getting to the root of it. Robert even said it "is nasty code" and I suppose he knows what he's talking about. Jungshik Shin still seems to have a hard time fixing a related bug (255990). In essence, this is hardcore and many trying to invest their limited time and "give back a bit to the community" more successfully do so in other projects. At least this is true for me, but I strongly believe it's true for others as well.
Unfortunately the capable main developers seem not to care about the severity this bug has for many people. That's their right. They do not owe me and others any of their time. However, from an ethical standpoint they ought to be more honest. The Mozilla project (and possibly others as well) should have a prominently placed message on their homepage: "We do not really care about other languages. We are mostly an English-based project."
(In reply to comment #120)
First, this is probably the best example of bug spam, but I have to answer at least once.
Second, I have voted for this bug, check it out.
Third, I was probably supposed to propose that this bug and some other important bugs (like random history loss) should be fixed on current Gecko, though it would mean hard work, and the fix would be rendered obsolete with Firefox 3.0. Unfortunately, it would probably mean that Firefox 2.0 would be released after Firefox 3.0.
Clearly, I know that many people would like to live a thousand years, and believe it or not, I don't care about their needs.
We do care about non-English users. Part of the reason that this bug is so hard to fix is that, in fact, we do care. We've allowed people adding earlier features that were important to non-English users (such as line-breaking rules appropriate for CJK languages, or bidirectional text layout so that Hebrew and Arabic are usable) to make the text layout code, which was already pretty messy due to the myriad of CSS properties that it supports, progressively messier and messier. Good code ownership would have meant that we wouldn't have accepted these features in their current state, because they make the code much more complicated and make adding new features harder. The end result is that (1) the code doesn't really have an owner -- somebody who understands it and can say what the "right" way to fix this bug is -- and (2) there are multiple codepaths resulting from the previous fixes (especially the CJK linebreaking) that all need to be fixed.
The solution is not to add one more important feature on top of the mess. The solution is to fix the mess so that this and other even more important features (such as the fact that many South Asian languages are basically unreadable with Mozilla) can be fixed more easily.
There are a bunch of problems with adding one more feature on top of the mess:
* it makes it even harder still to add the next important feature
* it has a high likelihood of causing regressions, perhaps even security bugs, if it involves modification of complicated code that nobody understands
This is why we prefer waiting for the code in question to be rewritten, although I'm not yet sure how much the linebreaking code will be improved in the cairo landing. It is pretty clear that we do need a redesign of our linebreaking code, and it needs to consider features like this along with the requirements of CJK text, Thai text, shaping, combining characters, and other problems.
Stephen Blackheath has volunteered to rewrite nsTextFrame around the new gfxTextRun abstraction which has been designed for Thebes/cairo. He's hacked nsTextFrame before and I believe he'll pull this off. That will give us the platform to really fix this and other issues.
As well as automatic hyphenation (which would be great), TeX and LaTeX also offer non-broken hyphenation in languages like German, where shy and soft hyphens just aren't enough. I think the real solution is an update to the HTML standard to support TeX-style manual hyphenation, and an update to the CSS standard to allow styles to control automatic hyphenation where supported.
(In reply to comment #124)
> As well as automatic hyphenation (which would be great), TeX and LaTeX also
> offer non-broken hyphenation in languages like German, where shy and soft
> hyphens just aren't enough. I think the real solution is an update to the HTML
> standard to support TeX-style manual hyphenation, and an update to the CSS
> standard to allow styles to control automatic hyphenation where supported.
I think the community doesn't need academic pow-wows. I'm still hoping for a little work to support the "&shy;" entity first, as other browsers do, before it is time to discuss the big solutions.
I wonder whether an 'interim half-baked fix' (I made for bug 255990 but not uploaded yet) would be acceptable for 1.8 branch (that will be used for FF 2.0). That would do the following:
1. On Windows and Mac OS X, at least rendering of SHY is supported as it should be.
2. On Linux and other platforms, SHY is discarded as now, but the line can be broken on SHY if necessary
*** Bug 351593 has been marked as a duplicate of this bug. ***
(In reply to comment #19)
> >1. "If a line is broken at a soft hyphen",
> > (then) a hyphen character must be displayed at the end of the first line.
> >2. "If a line is not broken at a soft hyphen",
> > (then) the user agent must not display a hyphen character.
> mozilla does not do the "if" in 1 therefore there are no requirement to
> perform the "then" part in 1
Not true. The rule does not say "if a line is broken _by_ a soft hyphen" (i.e. using hyphenation rules), it says "if a line is broken _at_ a soft hyphen": when Mozilla is asked to break at a soft hyphen (e.g. by following the &shy; with a <br>) then it ignores the "then" and does not display the hyphen as it should. (Opera 9 displays a hyphen in this circumstance.)
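The two rules as read in this comment can be sketched in a few lines of Python. This is only an illustration: a newline stands in for a forced break such as <br>, the function names are hypothetical, and no automatic line breaking is attempted.

```python
from typing import List

SOFT_HYPHEN = "\u00ad"


def render_line(line: str) -> str:
    """Render one already-broken line according to the two rules."""
    broken_at_shy = line.endswith(SOFT_HYPHEN)
    visible = line.replace(SOFT_HYPHEN, "")  # rule 2: unbroken soft hyphens stay invisible
    return visible + "-" if broken_at_shy else visible  # rule 1: show a hyphen at the break


def render(text: str) -> List[str]:
    """Split on forced breaks and render each resulting line."""
    return [render_line(line) for line in text.split("\n")]
```

Under this reading, a soft hyphen immediately followed by a forced break yields a visible hyphen, which is the behaviour the comment attributes to Opera 9.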
(In reply to comment #106 & #107)
This is a cry of frustration.
Dear Firefox developers, one year ago I estimated that this bug was going to have a negative impact on > 1 MILLION users on sites that I have designed alone.
One year later, Firefox is enjoying great popularity and it's a requirement that layout is not broken, no matter whose fault it is. So you know what? Firefox stops me from implementing the design I desire.
For those of you who still read paper newspapers, you know how text is neatly organized in narrow columns. This is done because it is easy to read. It is user-friendly. It is tried and tested. It is the right way.
However, for German, Hungarian, Polish, Norwegian, Swedish, Danish, Finnish and a bunch of other European languages, this is impossible on the net.
Because of Firefox.
BECAUSE FIREFOX HAS BEEN BROKEN FOR MORE THAN SEVEN (7) YEARS.
David Baron (#122), you say you care about non-English users. Prove it by moving the target milestone away from "Future".
Will Cairo make fixing this bug easier?
No, but bug 333659 will, hence why this bug is shown as dependent on it at the top (and as some of the comments buried amongst the spam have indicated).
Let me make this clear.
This bug WILL NOT ever be fixed in Firefox 2.
This bug WILL BE fixed in Firefox 3, via the code going into bug 333659.
Please refrain from commenting further, at least until this bug and that bug are marked FIXED.
According to http://developer.mozilla.org/en/docs/Firefox_3_for_developers
under the CSS section this is now fixed. Confirmation from anyone?
I'm not marking this fixed yet because there are some serious bugs in the soft hyphen code. I'm going to submit a followup patch soon.
Created attachment 270160 [details] [diff] [review]
There's a small fix required in BreakAndMeasureText: *aUsedHyphenation should be false if all the text fit.
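A toy Python model of the condition described here. It is not the real BreakAndMeasureText: the function name and signature are hypothetical, width is counted in characters, and the input is a list of text pieces separated by hyphenation opportunities. The point it illustrates is that the "used hyphenation" result is only true when a break actually happened at such an opportunity.

```python
from typing import List, Tuple


def break_and_measure(segments: List[str], available_width: int) -> Tuple[str, bool]:
    """Return (text that fits, whether a hyphenation break was used)."""
    total = "".join(segments)
    if len(total) <= available_width:
        # All the text fit: no break was taken, so no hyphenation was used.
        return total, False
    fitted = ""
    for segment in segments[:-1]:
        # +1 keeps room for the hyphen that would be drawn at the break.
        if len(fitted) + len(segment) + 1 <= available_width:
            fitted += segment
        else:
            break
    return fitted, bool(fitted)
```

For example, break_and_measure(["soft", "hyphen"], 10) reports no hyphenation because the whole text fits, while a width of 6 breaks after "soft" and reports that hyphenation was used.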
Created attachment 270164 [details] [diff] [review]
The textframe fixes required to make soft hyphens respectable:
-- HasCompressedLeadingWhitespace has a bug that I found with my soft hyphen tests. We were failing to advance aIterator, so the test for whether the current character is still skipped was bogus. Fix that and simplify HasCompressedLeadingWhitespace.
-- Make PropertyProvider::GetHyphenationBreaks refuse to allow a hyphenation break at the start of a line.
-- Issue #1 in Reflow: when we're doing "reflow the line again, forcing a break at a certain previously identified break opportunity", if the forced break opportunity is not a normal linebreak opportunity then it must be a hyphenation break opportunity, so set usedHyphenation to true to trigger hyphenation behaviour.
-- Issue #2 in Reflow: if there's a soft hyphen at the end of a text frame, we need to record it as a potential break opportunity.
-- Unrelated fix #1 in Reflow: if we broke this line at a forced break opportunity, we should not add trimmed whitespace to our metrics --- we're breaking so we might as well trim eagerly.
-- Unrelated fix #2 in Reflow: when determining whether a potential break opportunity is in the available width, use <= instead of < --- if it's exactly at the available width, it fits.
-- Unrelated fix #3 in Reflow: when recording a potential break opportunity due to trailing whitespace in the text frame, don't just assume that it fits in the available width --- check and pass in the right value.
This passes all reftests, including the soft-hyphen reftests I'm about to attach.
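Two of the fixes listed above are easy to show in isolation. The following Python sketch uses hypothetical helper names and character offsets instead of Gecko's actual data structures: one helper refuses a hyphenation break at the start of a line, the other uses <= so that a break landing exactly on the available width still counts as fitting.

```python
from typing import List

SOFT_HYPHEN = "\u00ad"


def hyphenation_breaks(line_text: str) -> List[int]:
    """Offsets of soft hyphens where a hyphenation break is allowed.

    A soft hyphen at offset 0 is skipped, since breaking there would
    put nothing but a hyphen on the line.
    """
    return [i for i, ch in enumerate(line_text) if ch == SOFT_HYPHEN and i > 0]


def break_fits(width_up_to_break: int, available_width: int) -> bool:
    """A break exactly at the available width fits, hence <= rather than <."""
    return width_up_to_break <= available_width
```

The rationale given in the docstring for skipping offset 0 is an interpretation; the comment itself only states that such breaks are refused.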
Created attachment 270165 [details] [diff] [review]
soft-hyphen reftests that check various combinations of soft hyphens and inline element boundaries, where things get tricky.
Comment on attachment 270164 [details] [diff] [review]
Shouldn't this be RESOLVED FIXED now? (Resolve it if you agree.)
Never mind, they aren't checked in yet. Sorry for the spam.
If you find soft-hyphen related bugs, please file them as new bugs. DO NOT reopen this bug.
Comment 90 (Find should ignore soft hyphens) is bug 294615.
*** Bug 393691 has been marked as a duplicate of this bug. ***