Closed Bug 1309934 Opened 4 years ago Closed 5 months ago

Apply NFC normalization in preference to falling back to a different font for combining marks [was: U+0303 COMBINING TILDE character has a broken rendering with "Source Sans Pro" font]

Component: Core :: Layout: Text and Fonts
Type: defect
Priority: P3
Version: 49 Branch
Status: VERIFIED FIXED
Target Milestone: mozilla76
Tracking Status: firefox76 verified
Reporter: jdpc557
Assignee: jfkthame
Blocks: 1 open bug
Keywords: testcase
Attachments: 1 file
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20160922113459

Steps to reproduce:

My name is "João Costa", and that is the value of "git config user.name" on my machine.

In my git config, the name is encoded as "4a 6f 61 cc 83 6f 20 43 6f 73 74 61 0a"
(Note that "ã" is encoded as two characters, "LATIN SMALL LETTER A" followed by "COMBINING TILDE", instead of the single character "LATIN SMALL LETTER A WITH TILDE")

If I open Travis or gitbucket, my name is rendered incorrectly. Other browsers render the text just fine.

Tested on Firefox 49.0.1 (MacOS X 10.11.6)


Actual results:

The name is rendered incorrectly, see: https://i.imgur.com/0yHav3g.png


Expected results:

The name should be rendered in the same way as with "LATIN SMALL LETTER A WITH TILDE", see: https://i.imgur.com/Dc7udpI.png
Can you provide an example HTML page or the URL?
The rendering depends on the font, and maybe the actual encoding on the page (may be converted on server side).
Flags: needinfo?(jdpc557)
Component: Untriaged → Layout: Text
Product: Firefox → Core
(In reply to Tooru Fujisawa [:arai] from comment #1)
> Can you provide an example HTML page or the URL?
> The rendering depends on the font, and maybe the actual encoding on the page
> (may be converted on server side).

The URL is https://travis-ci.org/ShiftForward/apso/builds/167409170

The font used is "Source Sans Pro"

I'm not sure if pasting the HTML will help, because I don't know if the UTF-8 encoding will change.
Thanks.
Confirmed the issue on Firefox Nightly 52.0a1 (2016-10-12) (64-bit) on OS X 10.11.6.
The text is not changed on the server side; it is the sequence U+0061 U+0303.
(Sorry, I meant normalization forms, not encoding.)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jdpc557)
Summary: U+0303 COMBINING TILDE character has a broken rendering → U+0303 COMBINING TILDE character has a broken rendering with "Source Sans Pro" font
The problem occurs because the site is using a webfont for Source Sans Pro that does not support the combining tilde character. Checking the CSS provided (from https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600), we see a bunch of resources for different Unicode ranges, but none of them include the combining diacritics in the U+03xx range (the tilde is U+0303).
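
As a rough illustration of the unicode-range check described above, the following Python sketch parses a CSS unicode-range value and tests whether a code point is covered. The LATIN_SUBSET string is an assumed example of a Latin subset descriptor, not a copy of the stylesheet served for this site, and the function names are made up for this sketch.

```python
def parse_unicode_range(descriptor):
    """Parse a CSS unicode-range value into a list of (start, end) pairs."""
    ranges = []
    for part in descriptor.split(","):
        part = part.strip()
        if not part:
            continue
        body = part[2:]  # drop the "U+" prefix
        if "-" in body:
            lo, hi = body.split("-")
            ranges.append((int(lo, 16), int(hi, 16)))
        elif "?" in body:
            # Wildcard form such as U+03??: "?" stands for any hex digit.
            ranges.append((int(body.replace("?", "0"), 16),
                           int(body.replace("?", "F"), 16)))
        else:
            cp = int(body, 16)
            ranges.append((cp, cp))
    return ranges


def covers(ranges, code_point):
    """Return True if code_point falls inside any of the parsed ranges."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


# Assumed example descriptor for a Latin subset (illustrative only).
LATIN_SUBSET = "U+0000-00FF, U+0131, U+0152-0153, U+02C6, U+02DA, U+02DC, U+2000-206F"
ranges = parse_unicode_range(LATIN_SUBSET)

print(covers(ranges, 0x00E3))  # precomposed a-with-tilde
print(covers(ranges, 0x0303))  # combining tilde
```

With a descriptor like this one, U+00E3 is covered and U+0303 is not, which matches the situation described in this comment.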

Therefore, while the base characters "Joao" are rendered with Source Sans Pro, as styled, the combining tilde falls back to a different font (what font you get may depend on your system/configuration). And positioning of diacritics will generally not work well across font boundaries.

It displays OK in Chrome, I expect, because they apply NFC normalization to the text prior to rendering, which replaces the sequence <a, combining tilde> with the single character U+00E3 'ã', which IS present in the Source Sans Pro font.

The simplest solution is to use the precomposed LATIN SMALL LETTER A WITH TILDE instead of the decomposed representation, as this is much more widely supported; relatively few fonts have good support for Unicode combining marks. (And in general, normalization form NFC is the recommended form for text on the web. See http://www.w3.org/International/questions/qa-html-css-normalization.)
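
For content authors, a minimal sketch of applying NFC before publishing text could look like the following. It uses Python's standard unicodedata module (is_normalized needs Python 3.8 or later); the helper name to_nfc is just for this example.

```python
import unicodedata


def to_nfc(text):
    """Return text in NFC, leaving already-normalized input untouched."""
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


decomposed = "Joa\u0303o Costa"   # <a, combining tilde>
composed = to_nfc(decomposed)      # single U+00E3 instead of the pair

print([f"U+{ord(c):04X}" for c in composed[:4]])
```

After normalization the string contains U+00E3 and no longer contains U+0303, so a font that lacks combining marks can still render it.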
Chrome shapes with HarfBuzz first and then does font fallback based on the shaping result (at least in the “complex path”; I am not sure whether they have switched Latin to it yet), so I think the composition is done by HarfBuzz here.

I think it might be worthwhile to do the same at some point.
Priority: -- → P3
Where the "current" font (i.e. the font chosen by the font-matching algorithm for the base character) doesn't support the combining mark(s) that follow(s), it would be better to try NFC normalization before falling back to a different font. That would fix the example here, and we get a steady trickle of reports of such cases in various languages and fonts.
Summary: U+0303 COMBINING TILDE character has a broken rendering with "Source Sans Pro" font → Apply NFC normalization in preference to falling back to a different font for combining marks [was: U+0303 COMBINING TILDE character has a broken rendering with "Source Sans Pro" font]
Duplicate of this bug: 1343737
Duplicate of this bug: 1395339
Duplicate of this bug: 1408732

This is still a problem with Firefox 66.0.5.

HTML source like this: za\u0301kon

is rendered as: zá kon

instead of the correct: zákon

Other browsers (Safari, Chrome, Opera) display the above HTML just fine.

Any plans to implement NFC normalization?

Blocks: 1551809

(In reply to Jonathan Kew (:jfkthame) from comment #6)
> we get a steady trickle of reports of such cases in various languages and
> fonts.

Looks like this is now the chosen solution. Should the following be duped to this?
Bug 1128330
Bug 1162921
Bug 1544488

Flags: needinfo?(jfkthame)

Several of the reported examples from "real" sites are no longer valid because the sites involved have since changed, so I have made a small CodePen example that illustrates the kind of issue we're looking at here: https://codepen.io/jfkthame/pen/VwLGoNY

So I think the more complete solution to this kind of issue would be bug 543200, which is more comprehensive than just using NFC. But it's also quite non-trivial and probably not getting resolved in the immediate future (it's been on file for 10 years now!).

As a near-term mitigation, though, I think we can do a simple patch here that goes part of the way, and will address the great majority of the cases that we actually see in the wild, without the complexity of a full bug 543200 implementation. The simple approach is just to check for a composable pair when we encounter a combining mark that is not supported by the current font, and if so (and the font supports the precomposed character), leave the font unchanged even though the mark itself is unsupported; internal normalization within harfbuzz will then handle the pair.
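
The check described above can be modelled roughly as follows. This is a toy Python sketch, not the Gecko patch: a font is represented as a plain set of supported code points, and keep_current_font is a hypothetical name chosen for this illustration.

```python
import unicodedata


def keep_current_font(base, mark, supported):
    """Decide whether to stay in the current font for a combining mark.

    base, mark: single-character strings (base letter, combining mark).
    supported:  set of code points the current font covers (toy model).
    """
    if ord(mark) in supported:
        return True  # the font has the mark, no fallback needed
    composed = unicodedata.normalize("NFC", base + mark)
    # A composable pair collapses to one precomposed character under NFC.
    # Keep the font only if that precomposed character is supported.
    return len(composed) == 1 and ord(composed) in supported


# Toy font: basic letters plus precomposed a-with-tilde, no combining marks.
font = {ord(c) for c in "Joao Costa"} | {0x00E3}

print(keep_current_font("a", "\u0303", font))  # pair composes to U+00E3
print(keep_current_font("o", "\u0303", font))  # U+00F5 not in the toy font
print(keep_current_font("q", "\u0303", font))  # no precomposed form exists
```

Only the first call keeps the current font; in the other two cases the mark would still go through the usual font fallback.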

AFAICS that would solve the issue for the bugs mentioned above, so yes, they could be duped to this.

Flags: needinfo?(jfkthame)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Duplicate of this bug: 1128330
Duplicate of this bug: 1162921
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6bd53239007f
Apply NFC normalization in preference to falling back to a different font for combining marks. r=lsalzman
Duplicate of this bug: 1544488
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Flags: qe-verify+

Reproduced the issue on macOS 10.15 with this test case https://codepen.io/jfkthame/pen/VwLGoNY, using an affected Nightly build from 2020-03-02.

The issue is verified as fixed on Beta 76.0b4, across platforms: Win 10 x64, macOS 10.15 and Ubuntu 18.04 x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+