Open Bug 479829 Opened 15 years ago Updated 4 months ago

A soft or auto hyphen within a possible ligature (e.g. 'f&shy;f') paints a split ligature glyph

Categories

(Core :: Layout: Text and Fonts, defect)

x86
All
defect

People

(Reporter: thomas.bsd, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.0.6) Gecko/2009020409 Iceweasel/3.0.6 (Debian-3.0.6-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.0.6) Gecko/2009020409 Iceweasel/3.0.6 (Debian-3.0.6-1)

Soft hyphens and ligatures mix badly at a line break.  See the screenshots.  There are two problems.  Let's take the 'ff' ligature for example:

1) The part of the first 'f' which is usually above the second 'f' is on the wrong line.

2) There is too much space after the hyphen.

Reproducible: Always

Steps to Reproduce:
Make "a_long_word_f&shy;f_blah_blah" wrap between the two 'f's.

To reproduce the bug, you need a font in which the two 'f's overlap, and not all fonts do.  Try 'Minion Pro' or 'Linux Libertine'.



Additionally, you may note the lack of space between "s’" and "échauffait" for Minion.  But this is another problem.
Severity: trivial → minor
Component: General → Layout: Text
Product: Firefox → Core
QA Contact: general → layout.fonts-and-text
This is a known issue and pretty hard to fix. We need to reshape text at line breaks in a few situations. Doing it in all situations would be a big performance hit. We could hack it to avoid making ligatures where soft hyphens exist, but that seems bad too.
Basically I think the next time we make a major redesign of inline layout we should start supporting reshaping at the end of lines. Until then, I don't want to deal with it.
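A toy model of what "reshaping at the end of lines" means. It assumes a made-up three-entry ligature table and a greedy longest-match shaper; real shaping is done by HarfBuzz from the font's GSUB lookups, and none of these names come from Gecko. The text is shaped once before line-breaking; only if the chosen break falls inside a ligature glyph is each side shaped again. Without that step the straddling glyph ends up on both lines, which is the split ligature in the screenshots:

```python
# Made-up ligature table for a toy font (real fonts keep this in GSUB 'liga').
LIGATURES = {"ffi": "\ufb03", "ff": "\ufb00", "fi": "\ufb01"}
MAX_LIG = max(len(seq) for seq in LIGATURES)


def shape(text):
    """Greedy longest-match shaping; returns (glyph, start, end) character spans."""
    glyphs, i = [], 0
    while i < len(text):
        for n in range(MAX_LIG, 1, -1):
            seq = text[i:i + n]
            if len(seq) == n and seq in LIGATURES:
                glyphs.append((LIGATURES[seq], i, i + n))
                i += n
                break
        else:  # no ligature starts here: emit the plain character
            glyphs.append((text[i], i, i + 1))
            i += 1
    return glyphs


def glyph_string(glyphs):
    return "".join(glyph for glyph, _, _ in glyphs)


def lay_out(text, break_at, reshape_at_break=True):
    """Split shaped text into two lines at character offset break_at."""
    shaped = shape(text)  # shaping happens before line-breaking
    inside_ligature = any(start < break_at < end for _, start, end in shaped)
    if inside_ligature and reshape_at_break:
        # The break splits a ligature: shape each line on its own.
        return glyph_string(shape(text[:break_at])), glyph_string(shape(text[break_at:]))
    # Keep the original glyphs; a glyph straddling the break lands on both
    # lines, mimicking the split-ligature painting reported here.
    first = [g for g in shaped if g[1] < break_at]
    second = [g for g in shaped if g[2] > break_at]
    return glyph_string(first), glyph_string(second)
```

For example, `lay_out("stuffing", 4)` gives "stuf" and "ﬁng", while with `reshape_at_break=False` the "ﬃ" glyph appears on both lines. The extra shaping pass only runs when the break actually lands inside a ligature, which is one way to avoid paying for a reshape at every line end.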
(In reply to comment #4)
> We could hack it to avoid making ligatures where soft hyphens
> exist, but that seems bad too.

It's far from ideal, but I think it would be an improvement over the current situation. Better to miss some ligatures than to draw like this.

(When we support automatic hyphenation this will become a much bigger issue, though, as it won't be limited to the relatively rare cases where the document contains explicit soft hyphens. We'll need a better solution then.)
One way that might be easy is to have nsTextFrameUtils turn a soft hyphen into a ZWNJ (if that's the right character) instead of discarding it.
Or into Word Joiner (U+2060), the modern replacement for ZWNBSP. We'd need to check that there aren't any ill effects if the character we pick isn't supported in the current font, though.

If we're using fallbacks for the word that includes the soft-hyphen, because the current font group and the preference fonts don't support the characters involved, then I suspect the ZWNJ or WJ could end up affecting the font chosen for the following characters, because fallback first tries the same font as was used for the previous character.

See http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/src/gfxFont.cpp#1137
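The ZWNJ / Word Joiner suggestion can be sketched with the same kind of toy greedy shaper. The ligature table is made up, and whether a real font shaped by HarfBuzz actually suppresses a ligature across U+200C or U+2060 depends on the font, so treat that as an assumption. Instead of discarding the soft hyphen before shaping, the sketch replaces it with the blocker and then drops the zero-width blocker from the painted result:

```python
SOFT_HYPHEN = "\u00ad"
ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER
WORD_JOINER = "\u2060"  # WORD JOINER

# Made-up ligature table for a toy font.
LIGATURES = {"ffi": "\ufb03", "ff": "\ufb00", "fi": "\ufb01"}


def shape(text):
    """Greedy longest-match ligature substitution; returns a list of glyphs."""
    glyphs, i = [], 0
    while i < len(text):
        for n in (3, 2):
            seq = text[i:i + n]
            if len(seq) == n and seq in LIGATURES:
                glyphs.append(LIGATURES[seq])
                i += n
                break
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


def shape_discarding(text):
    """Toy version of the current behaviour: the soft hyphen is dropped before shaping."""
    return "".join(shape(text.replace(SOFT_HYPHEN, "")))


def shape_with_blocker(text, blocker=ZWNJ):
    """Proposed variant: keep a zero-width blocker where the soft hyphen was."""
    glyphs = shape(text.replace(SOFT_HYPHEN, blocker))
    return "".join(g for g in glyphs if g != blocker)  # the blocker paints nothing
```

With "stuf&shy;fing", discarding the soft hyphen yields the "ﬃ" ligature; with the blocker, 'ff' and 'ffi' are no longer formed but 'fi' still is. That ligature is lost even when the line does not break at the soft hyphen, which is the trade-off objected to later in this bug.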
We could have the word cache explicitly handle Word Joiner as a word boundary, effectively replacing it with a space before shaping; that should prevent any unwanted side effects on font selection.
Actually, it doesn't prevent side-effects, it just means the particular side-effects may be different.

Scenario: text contains "A&shy;B". Current font group doesn't support either A or B, nor do the default fonts in Preferences. So we'll be searching for a fallback. The system offers potential candidate fonts X and Y, in that order. Font X supports B but not A; font Y supports both.

If the text were just "AB", we'd pick font Y for character A, and then because the fallback code likes to reuse the same font, we'll pick font Y for B as well. (Which is a good thing, usually.) However, if the text is "A&shy;B", and we replace &shy; with space before creating the glyph runs, then the fallback code will no longer favour font Y for the B, it will pick font X instead.

(At least that's how I think it would currently work.)
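A toy version of the fallback scenario above. It assumes, as the comment does, that fallback prefers the font used for the previous character and that a space ends that preference. Fonts X and Y and their coverage are the hypothetical ones from the scenario; this is not the actual gfxFont.cpp logic:

```python
# Hypothetical system fallback candidates, in the order the system offers them.
FALLBACK_CANDIDATES = [("X", frozenset("B")), ("Y", frozenset("AB"))]
COVERAGE = dict(FALLBACK_CANDIDATES)


def assign_fonts(text):
    """Return the fallback font chosen for each character (None for spaces)."""
    result = []
    previous = None
    for ch in text:
        if ch == " ":
            # Modelling assumption: a space ends the run, so the preference
            # for the previous character's font is lost.
            result.append(None)
            previous = None
            continue
        if previous is not None and ch in COVERAGE[previous]:
            font = previous  # reuse the font chosen for the preceding character
        else:
            font = next((name for name, cov in FALLBACK_CANDIDATES if ch in cov), None)
        result.append(font)
        previous = font
    return result
```

With "AB" both letters come from Y; once the soft hyphen has become a space ("A B"), B falls back to X, which is the side effect being described.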
FWIW -- a recent article on A List Apart (http://www.alistapart.com/articles/the-look-that-says-book/) talks about using JS libraries to automatically insert &shy; wherever appropriate.  The first example in that article,
http://readableweb.com/ala/booklook/booksampledesktop.htm, shows this problem too on Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100824 Firefox/3.6.9 using the font Constantia.  I can't edit the Platform or Status fields of the bug as appropriate, though.
With automatic hyphenation using ‘-moz-hyphens’ or the Hyphenator JS library, this crops up every now and then.  Is it still not feasible to fix the problem?  It’s not critical, obviously, but it would be nice.
While hyphenation is the most troubling example of this problem - because it's liable to occur within perfectly "normal" text, if a word such as "stuffing" gets hyphenated - it also affects text where inter-letter word breaks (without hyphens) are allowed by word-wrap:break-word or word-break:break-all.

Simple examples that exhibit the issue, assuming a default serif font with the "fi" ligature:

  data:text/html,<div style="width:0px; -moz-hyphens:auto" lang="en">stuffing

  data:text/html,<div style="width:0px; word-wrap:break-word">stuffing

  data:text/html;charset=utf-8,<div style="width:0px; word-wrap:break-word">لام السلام

Once bug 249159 lands, the latter two will also show the problem using word-break:break-all.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I suppose the last example is Arabic? Isn't that a lesser, or at least a different, issue? The standard says: "When shaping scripts such as Arabic are allowed to break within words due to ‘break-all’, the characters must still be shaped as if the word were not broken."

So in that case the only problem is that parts of the ligature end up in the wrong place, whereas in Latin script, for example, using a ligature at all is itself the problem. Am I correct?
I think it's essentially the same issue, although fixing the Arabic case will add a little extra complexity, as shaping the text properly on each side of the line-break requires knowledge of the context on the other side of the break.

The only reasonable interpretation, IMO, of that statement in the spec is that when a line is broken between adjacent Arabic letters, they should still take on their joining forms (i.e. the letter before the break will be an initial or medial form, and the one after it will be a medial or final form). I don't think it should be taken to mean that the break does not affect shaping *at all*. If a breakpoint is chosen that happens to fall within a ligature (noting that some complex Arabic-script fonts may ligate entire sequences of half a dozen or more letters), the appropriate action is to decompose the ligature (but maintain joining forms at the break - that's the key point the standard is addressing) and re-shape each part of the word, perhaps finding different ligature forms for the two sections.

(Arguably, there should perhaps be a rule somewhere prohibiting a break between lam and alef, because that particular ligature is accorded a special "mandatory" status in Arabic, while others are regarded as stylistic options. But that would be an exceptional case; the general rule would still be that a line-break within a ligature forces it to decompose into its components, but joining forms are used across the breakpoint.)

The trouble is that we do text-shaping before line-breaking (which is necessary, in order to measure the shaped text), but then we don't adjust shaping in the light of chosen breakpoints. This is the root of the problems in both the Latin and Arabic examples. Getting this right is tricky, and potentially expensive.
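A toy model of the interpretation above: joining forms are computed over the whole word, ignoring the break, while a ligature is only formed when all of its letters end up on the same line. The joining table is a hand-picked subset (real implementations use the Unicode joining-type data and the font's GSUB), the form names are borrowed from the OpenType feature tags, and lam-alef is used purely as the example ligature, even though a break there might arguably be prohibited:

```python
DUAL, RIGHT = "D", "R"  # dual-joining and right-joining types
JOINING = {
    "\u0627": RIGHT,  # alef
    "\u0644": DUAL,   # lam
    "\u0633": DUAL,   # seen
    "\u0645": DUAL,   # meem
}
LAM, ALEF = "\u0644", "\u0627"


def joining_forms(word):
    """Contextual form of each letter, computed on the unbroken word."""
    forms = []
    for i, ch in enumerate(word):
        jt = JOINING.get(ch)
        prev_jt = JOINING.get(word[i - 1]) if i > 0 else None
        next_jt = JOINING.get(word[i + 1]) if i + 1 < len(word) else None
        joins_prev = jt in (DUAL, RIGHT) and prev_jt == DUAL
        joins_next = jt == DUAL and next_jt in (DUAL, RIGHT)
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_next:
            forms.append("init")
        elif joins_prev:
            forms.append("fina")
        else:
            forms.append("isol")
    return forms


def shape_lines(word, break_pos=None):
    """Joining forms ignore the break; the lam-alef ligature does not."""
    forms = joining_forms(word)
    cut = len(word) if break_pos is None else break_pos
    lines = [[], []]
    i = 0
    while i < len(word):
        line = 0 if i < cut else 1
        same_line = i + 1 < cut or i >= cut  # letters i and i+1 on one line
        if word[i] == LAM and i + 1 < len(word) and word[i + 1] == ALEF and same_line:
            # The ligature joins on the right only if the lam did.
            lines[line].append(("lam-alef", "fina" if forms[i] == "medi" else "isol"))
            i += 2
        else:
            lines[line].append((word[i], forms[i]))
            i += 1
    return [segment for segment in lines if segment]
```

For السلام ("\u0627\u0644\u0633\u0644\u0627\u0645"), breaking between the second lam and the alef decomposes the ligature, but the lam keeps its medial form and the alef its final form, so joining is maintained across the break.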
FWIW, WebKit (and Blink) seem to get this right (when ligatures are enabled, of course).
(In reply to Jonathan Kew (:jfkthame) from comment #6)
> (In reply to comment #4)
> > We could hack it to avoid making ligatures where soft hyphens
> > exist, but that seems bad too.
> 
> It's far from ideal, but I think it would be an improvement over the current
> situation. Better to miss some ligatures than to draw like this.

I disagree. I've specifically chosen a font for my site that has subtle ligatures for the Latin script. This hyphenation problem occurs rarely, but turning off ligatures altogether when hyphenation is enabled would regress the rendering of many words on every page.

(Still, I'd prefer to have ligatures in general and still not have ligatures across hyphens, of course.)
Summary: A soft hyphen within a possible ligature (eg 'f&shy;f') causes a mess → A soft or auto hyphen within a possible ligature (eg 'f&shy;f') paints a split ligature glyph
Blocks: 988799
> We need to reshape text at line breaks in a few situations. 
> Doing it in all situations would be a big performance hit.

I'm not very familiar with how font shaping works, so apologies if these are naive ideas. When the text is initially shaped, is there any indication that a ligature substitution has been used?

Alternatively, would there be some way to access the liga data in the font, so that you could see which character combinations might need to be reshaped?

If either of those is possible, seems like that would give you the info to only do a reshape in those specific situations.
I started adding a flag for this to HarfBuzz, but never finished.  Let me dig it out and reply...  Here:

  https://gist.github.com/behdad/149ae8947c11afddc560

It doesn't merge right now.  Need to rebase it.

The main problem is actually the semantics.  My plan was to use a bit of the
info->mask to indicate something.  How to define that something is tricky.

To start, it's always considered unsafe to break within a HarfBuzz cluster.
So, that in fact is already enough information to fix this particular Firefox bug,
since the "ff" ligature will become one cluster.

It's more complicated for contextual lookups.  For those, some of the cluster
boundaries are also unsafe.  So, the bit we add will mean "it's safe to break
text at the start of this cluster".

The problem was I couldn't easily propagate that.  Reading the code again, I
think I know how to do it now.  Let me cook a branch.

Jonathan, would you have cycles to work on the Firefox side?
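A sketch of how cluster information could tell the layout code when a reshape is actually needed. The Glyph record is a hypothetical stand-in for per-glyph shaping output (glyph index plus cluster value), and its unsafe field models the proposed bit as its inverse ("unsafe to break at the start of this cluster"); the glyph IDs are made up. An offset that is not the start of any cluster lies inside something like an 'ffi' ligature, so both sides must be reshaped:

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    glyph_id: int         # made-up glyph index
    cluster: int          # offset of the first character in this glyph's cluster
    unsafe: bool = False  # hypothetical: breaking at this cluster's start changes shaping


def safe_break_offsets(glyphs, text_len):
    """Interior offsets where a line can break without reshaping (one LTR run)."""
    starts = {g.cluster for g in glyphs}
    unsafe = {g.cluster for g in glyphs if g.unsafe}
    return sorted(o for o in starts - unsafe if 0 < o < text_len)


def must_reshape(glyphs, text_len, offset):
    """True if breaking at `offset` requires shaping each side again."""
    return 0 < offset < text_len and offset not in safe_break_offsets(glyphs, text_len)


# "office" shaped as o + ffi-ligature + c + e: the ligature is one cluster at offset 1.
office = [Glyph(82, 0), Glyph(400, 1), Glyph(70, 4), Glyph(72, 5)]
```

Here a break at offset 2 ("of-fice") falls inside the ligature's cluster and needs a reshape, while offset 4 does not; marking the glyph at cluster 4 as unsafe, as a contextual lookup might require, removes that offset from the safe set too.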
Blocks: 1280786
I'm still seeing this with discretionary ligatures such as "ct": you don't want the ligature at a hyphenation point, but you also don't want to lose the stylish ligature when the word isn't hyphenated there.

Commenting because the increased use of web fonts and the addition of hyphenation support have made this more common.

I thought I had found a new bug (https://twitter.com/LordPachelbel/status/1113475005600366592) but now I've learned it was first reported 10 years ago. As Liam said above, this issue is going to happen more often now that web fonts with ligatures are widely used.

Here's the "af" ligature at a different font size where it isn't being split and hyphenated.

The web font is called Premiera Book.

The URL I saw this on is http://clagnut.com/blog/2395. In this case the source code doesn't have &shy; in the word:

<html lang="en">
...
<p>I presented some golden rules for typography on the web, and in the Q&amp;A afterwards
I was asked about the current state of automatic hyphenation on the web. This was an apt
question considering that German is well known for its long words – noun compounds feature
commonly (for example <dfn lang="de">Verbesserungsvorschlag</dfn> meaning <i>suggestion for
improvement</i>) – so hyphenation is extensively used in most written media.</p>

This means the split ligature is caused by automatic hyphenation, enabled by these styles:

font-size: 1.3125rem;
line-height: 1.428571429em;
font-family: "Premiera", "Cambria", "Roboto Slab", "Georgia", "Times New Roman", serif, ".PhoneFallback", "Arial Unicode MS";
font-feature-settings: normal;
font-variant-ligatures: common-ligatures;
font-kerning: normal;
font-variant-numeric: oldstyle-nums proportional-nums;

-webkit-hyphens: auto;
-webkit-hyphenate-limit-before: 4;
-webkit-hyphenate-limit-after: 3;
-webkit-hyphenate-limit-chars: 7 4 3;
-webkit-hyphenate-limit-lines: 2;	
-webkit-hyphenate-limit-zone: 8%;
-webkit-hyphenate-limit-last: always;

-moz-hyphens: auto;
-moz-hyphenate-limit-chars: 7 4 3;
-moz-hyphenate-limit-lines: 2;	
-moz-hyphenate-limit-zone: 8%;
-moz-hyphenate-limit-last: always;

-ms-hyphens: auto;
-ms-hyphenate-limit-chars: 7 4 3;
-ms-hyphenate-limit-lines: 2;	
-ms-hyphenate-limit-zone: 8%;
-ms-hyphenate-limit-last: always;

hyphens: auto;
hyphenate-limit-chars: 7 4 3;
hyphenate-limit-lines: 2;	
hyphenate-limit-zone: 8%;
hyphenate-limit-last: always;
OS: Linux → All

I just ran into this in Firefox v69 on Mac (at https://www.gwern.net/Culture-is-not-about-Esthetics#the-experimental-results). "often" was split into "of- <linebreak> ten", but there's a blob above the t. It looks like the ft ligature was rendered and then split in two. Chrome handles it correctly.

This affects not only soft hyphens but also automatic hyphenation in regular text; I've seen it on many websites. I'm surprised this can happen at all: a ligature should only be drawn when both characters are directly adjacent, not when custom letter spacing separates them, and certainly not across a line break. Are there any plans to fix this?

Severity: minor → S4

The severity field for this bug is relatively low, S4. However, the bug has 7 duplicates and 14 votes.
:jfkthame, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jfkthame)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(jfkthame)

I was just about to report a similar issue and saw this bug. I see the same problem even without any hyphens. Is this a different bug, or does the title just need updating here?

Flags: needinfo?(jfkthame)

Yes, the same issue can sometimes occur without any hyphens, if a line break is allowed within a character sequence that forms a ligature (e.g. as a result of overflow-wrap:anywhere).

So on macOS, for example,

data:text/html,<div style="font:40px times; overflow-wrap:anywhere; width: 1px">office

will show two halves of the "fi" ligature on separate lines, which looks a bit odd; the line-break should cause the ligature to be undone.

Hyphen breaks are the most common example where this occurs, but the bare line-wrap example is the same issue.

Flags: needinfo?(jfkthame)

I wanted to report a similar bug and found that it is already here. A reduced test case based on something I encountered in practice: https://codepen.io/kizu/pen/XWGXYyp (it probably only reproduces on macOS, because of the font I used for this specific example).

Given Safari and Chrome handle this properly, it would be nice to fix this for interoperability.
