Closed Bug 801410 Opened 7 years ago Closed 7 years ago
avoid inserting dotted-circle for run-initial Unicode combining characters in "simple" scripts such as Latin
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/19.0 Firefox/19.0 Build ID: 20121014030627 Steps to reproduce: I viewed some unicode strikethrough text. Actual results: It displayed as dashed circles between the letters Expected results: It should have displayed a strikethrough text, like Firefox used to do.
Can you please attach a testcase or post the URL ?
Component: Untriaged → Layout: Text
Product: Firefox → Core
On OS X, I see some of the "strikethrough" on that page working, and some displayed "broken" as described in comment #0. The problem arises if the font being used for the real text doesn't support the character U+0336 (COMBINING LONG STROKE OVERLAY) that is being used to produce the "strikethrough" effect. In this case, Firefox will fall back to a different font for that character, and rendering a base character and combining overlay with different fonts won't work well. This probably appeared due to bug 789687, which IIRC was the harfbuzz update that introduced the dotted circle when rendering a "defective" combining character sequence (i.e. a combining character for which no valid base is available), as a font-change between the base and its combining mark will result in this. But in any case, the whole approach is fragile and font-dependent; many fonts don't have the "combining stroke overlay" character, and so there are no real guarantees as to how this will look. It's simply not a good idea unless you can ensure the text will always be displayed using fonts that actually support the combination.
For comparison (and to illustrate how unreliable this technique is), here's a screenshot from Safari on OS X - note how "ragged" the strikethrough looks.
So basically, the "gimmick" being used on that site is flawed, and the results are unlikely to look consistently good unless you can control the font being used to ensure it supports the characters in question, and renders them as desired. Which defeats the whole point. Bug 543200 would go some way to mitigate this, by trying to ensure that a base character and associated combining mark(s) are rendered using the same font rather than allowing font (fallback) changes within the cluster. But there'll always be a risk that mixing "unusual" Unicode characters into the text, without ensuring an appropriate font is selected, may result in "ransom-note" effects as font fallback kicks in.
Bug is not just limited to strikethroughs, It wreaks havoc on this wikipedia page on phonetics: https://en.wikipedia.org/wiki/Americanist_phonetic_notation
This sounds like an example that would concern me much more seriously. Could you post a screenshot illustrating the "havoc" you're seeing? What font(s) are being used on your system? The fontinfo addon can show you the fonts actually being used - which may not be the fonts specifically mentioned in CSS, if fallback is occurring - for a selected range of text.  https://addons.mozilla.org/en-US/firefox/addon/fontinfo/
The Fontinfo app goes like this on that page: Font face CSS Family Source DejaVu Sans DejaVu Sans Local Frutiger Frutiger Local Frutiger Frutiger Local Frutiger Frutiger Local Nimbus Sans L Bold Nimbus Sans L Local —————— Here is how the page appears while the default font in the FX Preferences is Frutiger: http://ubuntuone.com/2k6EZiao5KcPfT45hgi4tE (While I am sure being constantly compared to Chrom(ium) annoys you), this is how it looks in Chromium: http://ubuntuone.com/30tL3eLImU5XcH9EKG6mhZ Back to Firefox, under Open Sans: http://ubuntuone.com/7jEgl8UhIHPiFqzr8ndFZB Arial: http://ubuntuone.com/37KqTIlcGQuIF2wYxAThA7 Segoe: http://ubuntuone.com/79dr3l6wCbAVBv4mwyq1il Under DejaVu Sans, the problem is less severe, but still there: http://ubuntuone.com/7ha0xPofUGHg1CenZPhRXv ——————— It continues to display like that while Firefox is in Safe Mode, and after logging out of Wikipedia and clearing the cache.
OK - most of those fonts don't include the combining-mark characters, so fallback kicks in (you can confirm this by selecting one of the "problem" cells and choosing Show Fonts in Selection from the right-click context menu - you'll see that it is using a mixture of fonts). The diacritic is thus separated from its base, and the dotted circle appears. We should try to do something about that for examples like this. As DejaVu *does* support the combining marks, the result is far better. The problem with the uvular voiceless affricate (and a few other similar ones that I found) is actually a case of poor authoring. The source there has <td>qˣ<sup>̣</sup></td> (With non-ASCII chars converted to entities, to avoid potential difficulty seeing the characters.) Note how the dot-below diacritic has been placed in a <sup> element all by itself. The chances that this will actually render as intended are pretty small. In your Chromium screenshot, for example, the dot seems to have vanished; I'm not sure if it didn't render at all, or if it has overprinted the superscripted 'x'. Hardly a useful result. The author would have been better-advised to use something like <td>q<sup>x̣</sup></td> Even though <sup>x</sup> is unlikely to exactly match the Unicode superscripted x (U+02E3), this gives at least some chance that the dot-below diacritic will appear with its intended base. (While we're doing comparisons, what happened to the voiceless bilabial fricative in Chromium? At least Firefox's font fallback tries to produce *some* visual representation of the data, even though I grant that the result looks terrible when fallback kicks in on the diacritics.) Behdad: I'm thinking that in view of examples like this, at least until bug 543200 is addressed, perhaps we should refrain from dotted-circle insertion in Latin and similar "simple/generic" scripts, and only do it in cases of complex shaping (Indic, etc), where it is specifically desired in order to make encoding errors visible, and for display of combining-vowel characters in isolation. WDYT?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Unicode Combining Characters display incorrectly → avoid inserting dotted-circle for run-initial Unicode combining characters in "simple" scripts such as Latin
Just removing the call to hb_insert_dotted_circle from hb_ot_shape_internal should eliminate these dotted-circle insertions; the indic shaper will still insert them for ill-formed indic syllables, however. Does this seem a reasonable compromise for now, Behdad? Should we consider exposing the dotted-circle behavior as something the harfbuzz client can control?
Ah. I was under the Impression that Firefox chooses font per cluster. We hit this problem in gnome and that's exactly why I added the textual context stuff to master. See: https://bugzilla.redhat.com/show_bug.cgi?id=858736 If you don't have the textual context around to pass in, you can simply add a call before adding text to the buffer: hb_buffer_add_utf8 (buffer, " ", 1, 1, 0): This will add a space char as precontext and hence the dottedcircle insertion will be skipped.
(In reply to Behdad Esfahbod from comment #12) > Ah. I was under the Impression that Firefox chooses font per cluster. We > hit this problem in gnome and that's exactly why I added the textual context > stuff to master. See: https://bugzilla.redhat.com/show_bug.cgi?id=858736 > > If you don't have the textual context around to pass in, you can simply add > a call before adding text to the buffer: > > hb_buffer_add_utf8 (buffer, " ", 1, 1, 0): > > This will add a space char as precontext and hence the dottedcircle > insertion will be skipped. Seems like a good idea, but it doesn't quite work in practice. The trouble is that hb_buffer_add_utf always clears the pre-context if the current buffer length is zero, so the space we've so carefully primed it with gets thrown away when we subsequently add the real text of the word we want to shape. I think hb_buffer_add_utf should only clear the pre-context if there's actually some pre-context being provided; otherwise it should leave it untouched. Suggested patch attached.
With the preceding harfbuzz patch, we can then use Behdad's suggested technique of providing space as pre-context for shaping the word. It's possible that future harfbuzz API enhancements may change how we do this - e.g. if we add an options word with bits to control dotted-circle behavior, etc - but at least for the current version we should do this to avoid regressing pages such as the Wikipedia example.
Attachment #677103 - Flags: review?(jdaggett)
Comment on attachment 677100 [details] [diff] [review] [harfbuzz] don't clear pre-context from the buffer if no new context is being provided Review of attachment 677100 [details] [diff] [review]: ----------------------------------------------------------------- LGTM. Committed upstream after adding comments.
Attachment #677100 - Flags: review?(mozilla) → review+
jdaggett, review ping?
Updating to the latest harfbuzz code from upstream will resolve this problem, as the handling of dotted-circle insertion has been modified - it will no longer be added before an initial combining mark unless the Beginning-Of-Text flag has been set by the caller. So this will supersede the earlier patches here.
Updated to pick up recent fixes and improvements, esp. for Sinhala, Thai. (We'll likely release 0.9.7 at the end of the work-week, in which case I'll refresh again to take that version, but assume review can be carried forward as there will be only minor cleanup.)
Comment on attachment 682702 [details] [diff] [review] update harfbuzz to release 0.9.6 plus recent commits. Would be good to get this into FF19, as it includes the "real" fix the issue here (superseding the small patches above), as well as other issues such as bug 799869 and bug 810823.
Attachment #682702 - Flags: review?(joe)
Try build & reftest run: https://tbpl.mozilla.org/?tree=Try&rev=1230e88d9681
Comment on attachment 682702 [details] [diff] [review] update harfbuzz to release 0.9.6 plus recent commits. Joe, John: if either one of you has time to r+ this, it'd be great to get it landed before the train leaves....
Attachment #682702 - Flags: review?(jdaggett)
Comment on attachment 682702 [details] [diff] [review] update harfbuzz to release 0.9.6 plus recent commits. Review of attachment 682702 [details] [diff] [review]: ----------------------------------------------------------------- If you want a rubber stamp from me, you've got it. If you want a real review, you'll have to wait for John. :)
Attachment #682702 - Flags: review?(joe) → review+
I'll take either one, or both. :) https://hg.mozilla.org/mozilla-central/rev/be3a0b4edebd
Assignee: nobody → jfkthame
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Version: 19 Branch → Trunk
Comment on attachment 677103 [details] [diff] [review] prime the harfbuzz buffer with a space as pre-context to suppress run-initial dotted circle insertion No longer needed with hb 0.9.6, as the dotted circle won't be inserted unless the start-of-text flag has explicitly been set.
Attachment #677103 - Attachment is obsolete: true
Attachment #677100 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.