Open Bug 479829 Opened 12 years ago Updated 6 months ago
A soft or auto hyphen within a possible ligature (eg 'f­f') paints a split ligature glyph
User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:220.127.116.11) Gecko/2009020409 Iceweasel/3.0.6 (Debian-3.0.6-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; fr; rv:18.104.22.168) Gecko/2009020409 Iceweasel/3.0.6 (Debian-3.0.6-1)

Soft hyphens and ligatures mix badly at a line break. See the screenshots. There are two problems. Take the 'ff' ligature as an example:
1) The part of the first 'f' that normally overhangs the second 'f' is painted on the wrong line.
2) There is too much space after the hyphen.

Reproducible: Always

Steps to Reproduce:
Wrap "a_long_word_f&shy;f_blah_blah" between the two 'f'.

To reproduce the bug, you need a font in which the two 'f' glyphs overlap, which is not always the case. 'Minion Pro' and 'Linux Libertine' both work.

Additionally, you may notice the lack of space between "s’" and "échauffait" in Minion, but that is a separate problem.
Component: General → Layout: Text
Product: Firefox → Core
QA Contact: general → layout.fonts-and-text
This is a known issue and pretty hard to fix. We need to reshape text at line breaks in a few situations. Doing it in all situations would be a big performance hit. We could hack it to avoid making ligatures where soft hyphens exist, but that seems bad too.
Basically I think the next time we make a major redesign of inline layout we should start supporting reshaping at the end of lines. Until then, I don't want to deal with it.
(In reply to comment #4)
> We could hack it to avoid making ligatures where soft hyphens
> exist, but that seems bad too.

It's far from ideal, but I think it would be an improvement over the current situation. Better to miss some ligatures than to draw like this.

(When we support automatic hyphenation this will become a much bigger issue, though, as it won't be limited to the relatively rare cases where the document contains explicit soft hyphens. We'll need a better solution then.)
One way that might be easy is to have nsTextFrameUtils turn a soft hyphen into a ZWNJ (if that's the right character) instead of discarding it.
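A minimal sketch of that substitution idea, in Python rather than Gecko's C++. The function name and return shape are hypothetical and are not nsTextFrameUtils code; it only illustrates keeping a zero-width character where the soft hyphen was, so that a shaper would see something between the two letters, while still remembering the hyphenation points.

```python
SOFT_HYPHEN = "\u00ad"
ZWNJ = "\u200c"  # zero-width non-joiner; the thread is unsure this is the right character


def replace_soft_hyphens(text, replacement=ZWNJ):
    """Swap each soft hyphen for `replacement` instead of discarding it.

    Returns the transformed text and the offsets (in the transformed text)
    of the replacement characters, i.e. the potential hyphenation points.
    Hypothetical helper for illustration only.
    """
    out = []
    break_offsets = []
    for ch in text:
        if ch == SOFT_HYPHEN:
            break_offsets.append(len(out))
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out), break_offsets
```

For example, `replace_soft_hyphens("stuf\u00adfing")` keeps one invisible character between the two 'f' and reports offset 4 as a possible break.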
Or it could be turned into Word Joiner (U+2060), the modern replacement for ZWNBSP. We'd need to check that there aren't any ill effects if the character we pick isn't supported in the current font, though.

If we're using fallbacks for the word that includes the soft hyphen, because the current font group and the preference fonts don't support the characters involved, then I suspect the ZWNJ or WJ could end up affecting the font chosen for the following characters, because fallback first tries the same font as was used for the previous character. See http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/src/gfxFont.cpp#1137
We could have the word cache explicitly handle Word Joiner as a word boundary, effectively replacing it with a space before shaping; that should prevent any unwanted side effects on font selection.
Actually, it doesn't prevent side-effects, it just means the particular side-effects may be different.

Scenario: text contains "A&shy;B". The current font group doesn't support either A or B, nor do the default fonts in Preferences. So we'll be searching for a fallback. The system offers potential candidate fonts X and Y, in that order. Font X supports B but not A; font Y supports both.

If the text were just "AB", we'd pick font Y for character A, and then because the fallback code likes to reuse the same font, we'll pick font Y for B as well. (Which is a good thing, usually.)

However, if the text is "A&shy;B", and we replace &shy; with space before creating the glyph runs, then the fallback code will no longer favour font Y for the B, it will pick font X instead. (At least that's how I think it would currently work.)
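The scenario can be played out with a toy model. This is a sketch in Python, not the gfxFont.cpp fallback code: the coverage table, the function names, and the assumption that the "previous font" memory is reset at each word boundary are all hypothetical and only mirror the behaviour described in this comment.

```python
# Hypothetical fallback candidates, in system order:
# font X supports only B, font Y supports both A and B.
FALLBACK_FONTS = [("X", {"B"}), ("Y", {"A", "B"})]


def pick_fonts(word, fonts=FALLBACK_FONTS):
    """Pick a fallback font per character, trying the previous font first."""
    coverage_by_name = dict(fonts)
    chosen = []
    prev = None
    for ch in word:
        font = None
        if prev is not None and ch in coverage_by_name[prev]:
            # Reuse the font that was picked for the previous character.
            font = prev
        else:
            # Otherwise search the candidates in order.
            for name, coverage in fonts:
                if ch in coverage:
                    font = name
                    break
        chosen.append(font)
        if font is not None:
            prev = font
    return chosen


def pick_fonts_per_word(text):
    """Shape each space-separated word on its own (assumed word-cache behaviour)."""
    return [pick_fonts(word) for word in text.split(" ")]
```

In this model `pick_fonts("AB")` keeps font Y for both letters, while `pick_fonts_per_word("A B")` picks Y for A and then X for B, which is the changed side effect described above.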
FWIW -- a recent article on AListApart (http://www.alistapart.com/articles/the-look-that-says-book/) talks about using JS libraries to automatically insert &shy; everywhere that's appropriate. The first example in that article, http://readableweb.com/ala/booklook/booksampledesktop.htm, shows this problem too on Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:22.214.171.124) Gecko/20100824 Firefox/3.6.9 using the font Constantia.

I can't edit the Platform or Status fields of the bug as appropriate, though.
With automatic hyphenation using ‘-moz-hyphens’ or the Hyphenator JS library, this crops up every now and then. Is it still not feasible to fix the problem? It’s not critical, obviously, but it would be nice.
While hyphenation is the most troubling example of this problem - because it's liable to occur within perfectly "normal" text, if a word such as "stuffing" gets hyphenated - it also affects text where inter-letter word breaks (without hyphens) are allowed by word-wrap:break-word or word-break:break-all.

Simple examples that exhibit the issue, assuming a default serif font with the "fi" ligature:

data:text/html,<div style="width:0px; -moz-hyphens:auto" lang="en">stuffing
data:text/html,<div style="width:0px; word-wrap:break-word">stuffing
data:text/html;charset=utf-8,<div style="width:0px; word-wrap:break-word">لام السلام

Once bug 249159 lands, the latter two will also show the problem using word-break:break-all.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I suppose the last example is Arabic? Isn't this a lesser, or at least a different, issue in that case, because the standard says: "When shaping scripts such as Arabic are allowed to break within words due to ‘break-all’, the characters must still be shaped as if the word were not broken." So the problem in this case is just that parts of the ligature are painted in the wrong place, while in Latin, for example, the fact that a ligature is used at all is the issue. Am I correct?
I think it's essentially the same issue, although fixing the Arabic case will add a little extra complexity, as shaping the text properly on each side of the line-break requires knowledge of the context on the other side of the break.

The only reasonable interpretation, IMO, of that statement in the spec is that when a line is broken between adjacent Arabic letters, they should still take on their joining forms (i.e. the letter before the break will be an initial or medial form, and the one after it will be a medial or final form). I don't think it should be taken to mean that the break does not affect shaping *at all*. If a breakpoint is chosen that happens to fall within a ligature (noting that some complex Arabic-script fonts may ligate entire sequences of half a dozen or more letters), the appropriate action is to decompose the ligature (but maintain joining forms at the break - that's the key point the standard is addressing) and re-shape each part of the word, perhaps finding different ligature forms for the two sections.

(Arguably, there should perhaps be a rule somewhere prohibiting a break between lam and alef, because that particular ligature is accorded a special "mandatory" status in Arabic, while others are regarded as stylistic options. But that would be an exceptional case; the general rule would still be that a line-break within a ligature forces it to decompose into its components, but joining forms are used across the breakpoint.)

The trouble is that we do text-shaping before line-breaking (which is necessary, in order to measure the shaped text), but then we don't adjust shaping in the light of chosen breakpoints. This is the root of the problems in both the Latin and Arabic examples. Getting this right is tricky, and potentially expensive.
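A rough sketch of "shape first, then re-shape only when the chosen break splits a ligature", using a toy Latin shaper in Python. The ligature table, glyph names, and function names are invented for illustration; this is not Gecko's shaping code, and it ignores the Arabic requirement that joining forms be preserved across the break.

```python
# Toy ligature table, longest sequences first so "ffi" wins over "ff" and "fi".
LIGATURES = ["ffi", "ff", "fi"]


def shape(text):
    """Return a list of (glyph_name, start, length) using greedy ligature substitution."""
    glyphs = []
    i = 0
    while i < len(text):
        for lig in LIGATURES:
            if text.startswith(lig, i):
                glyphs.append(("_".join(lig), i, len(lig)))
                i += len(lig)
                break
        else:
            # No ligature starts here: emit a single-character glyph.
            glyphs.append((text[i], i, 1))
            i += 1
    return glyphs


def break_splits_glyph(glyphs, offset):
    """True if the break offset falls strictly inside a multi-character glyph."""
    return any(start < offset < start + length for _, start, length in glyphs)


def shape_lines(text, offset):
    """Shape the whole text, then re-shape both sides only if the break requires it."""
    glyphs = shape(text)
    if not break_splits_glyph(glyphs, offset):
        before = [g for g in glyphs if g[1] < offset]
        after = [g for g in glyphs if g[1] >= offset]
        return before, after
    # The break decomposes a ligature: shape each side independently.
    before = shape(text[:offset])
    after = [(name, start + offset, length)
             for name, start, length in shape(text[offset:])]
    return before, after
```

With "stuffing" broken after "stuf", the toy "ffi" ligature is decomposed, and the second line picks up a different ligature ("fi") on its own, which matches the "perhaps finding different ligature forms for the two sections" case. The cost concern in this comment corresponds to how often the re-shaping branch has to run.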
FWIW, WebKit (and Blink) seem to get this right (when ligatures are enabled, of course).
(In reply to Jonathan Kew (:jfkthame) from comment #6)
> (In reply to comment #4)
> > We could hack it to avoid making ligatures where soft hyphens
> > exist, but that seems bad too.
>
> It's far from ideal, but I think it would be an improvement over the current
> situation. Better to miss some ligatures than to draw like this.

I disagree. I've specifically chosen a font for my site that has subtle ligatures for the Latin script. This hyphenation problem occurs rarely, but turning off ligatures altogether when hyphenation is enabled would regress the rendering of many words on every page. (Still, I'd prefer to have ligatures in general and still not have ligatures across hyphens, of course.)
Summary: A soft hyphen within a possible ligature (eg 'f­f') causes a mess → A soft or auto hyphen within a possible ligature (eg 'f­f') paints a split ligature glyph
> We need to reshape text at line breaks in a few situations.
> Doing it in all situations would be a big performance hit.

I'm a little ignorant as to how the font shaping works, sorry if these are incredibly dumb ideas.

When the font is initially shaped, would there be any indication that a ligature replacement has been used? Alternatively, would there be some way to access the liga data in the font, so that you could see which character combinations might need to be reshaped?

If either of those is possible, seems like that would give you the info to only do a reshape in those specific situations.
I started adding a flag for this to HarfBuzz, but never finished. Let me dig it out and reply... Here:

https://gist.github.com/behdad/149ae8947c11afddc560

It doesn't merge right now. Need to rebase it. Main problem actually is the semantics. My plan was to use a bit of the info->mask to indicate something. How to define that something is tricky.

To start, it's always considered unsafe to break within a HarfBuzz cluster. So, that in fact is already enough information to fix this particular Firefox bug, since the "ff" ligature will become one cluster.

It's more complicated for contextual lookups. For those, some of the cluster boundaries are also unsafe. So, the bit we add will mean "it's safe to break text at the start of this cluster". The problem was I couldn't easily propagate that. Reading the code again, I think I know how to do it now. Let me cook a branch.

Jonathan, would you have cycles to work on the Firefox side?
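To illustrate how a caller might use that information, here is a small Python sketch that works on plain lists rather than on HarfBuzz itself. The per-glyph cluster values stand in for what the shaper reports; the `unsafe_starts` set is a hypothetical stand-in for the proposed mask bit (inverted), and the function name is made up.

```python
def breaks_needing_reshape(clusters, candidate_breaks, unsafe_starts=frozenset()):
    """Return the candidate break offsets at which the text would need re-shaping.

    clusters:         cluster value of each glyph, i.e. the character index at
                      which that glyph's cluster starts.
    candidate_breaks: character offsets where the line breaker may break.
    unsafe_starts:    cluster starts that are still unsafe because of contextual
                      lookups (hypothetical; models the proposed flag).
    """
    cluster_starts = set(clusters)
    return [
        b for b in candidate_breaks
        # Unsafe if the break falls inside a cluster, or at a flagged cluster start.
        if b not in cluster_starts or b in unsafe_starts
    ]
```

For "stuffing" shaped with an "ffi" ligature, the glyph clusters would be 0, 1, 2, 3, 6, 7, so a break at offset 4 falls inside a cluster and is reported, while breaks at 2 or 6 are not unless a contextual lookup has flagged them. Only the reported breaks would need the expensive re-shaping pass.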
I'm still seeing this with discretionary hyphens such as "ct", where you want not to have a ligature if it's a hyphenation point, and also don't want to lose the stylish snazzy ligature if it's not hyphenated there. Commenting because the increased use of web fonts and the addition of hyphenation support has made this more common.