Closed Bug 1836024 Opened 1 year ago Closed 1 year ago

Characters from other blocks masked for various orthographies

Status: RESOLVED FIXED

Milestone: 115 Branch

Tracking Flags: firefox115 (tracking: ---, status: fixed)

People

(Reporter: ishida, Assigned: jfkthame)

Details

Attachments

(1 file)

Bug 1836024 - Don't mask Devanagari DANDA characters from the font cmap when shaping support is absent, as they may be used by other scripts. r=#layout-reviewers (1 year ago, Jonathan Kew [:jfkthame], 48 bytes, text/x-phabricator-request)

Richard Ishida

Reporter

Description

1 year ago

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/112.0

Steps to reproduce:

@jfkthame asked me to raise this bug as a follow-on from https://bugzilla.mozilla.org/show_bug.cgi?id=1834316#c8

Various orthographies use characters from other blocks, and even though these characters have glyphs in a font that covers that orthography, the browser falls back to a system font for their display.

Actual results:

https://r12a.github.io/app-charuse/index.html?charlist=%E0%A5%A4%E0%A5%A5%20%E2%B9%81%E2%81%8F%D8%9F%EF%B4%BE%EF%B4%BF shows a list of examples of characters that are borrowed from another script and some of the languages they are used with (many of which use different scripts).

This is currently affecting a large number of users.

Expected results:

Please allow the browser to show this kind of character using the font selected by the user where that font has glyphs for the character.

Jonathan Kew [:jfkthame]

Assignee

Comment 1

1 year ago

The specific characters listed in https://r12a.github.io/app-charuse/index.html?charlist=%E0%A5%A4%E0%A5%A5%20%E2%B9%81%E2%81%8F%D8%9F%EF%B4%BE%EF%B4%BF are:

।  U+0964 DEVANAGARI DANDA
॥  U+0965 DEVANAGARI DOUBLE DANDA
⹁  U+2E41 REVERSED COMMA
⁏  U+204F REVERSED SEMICOLON
؟  U+061F ARABIC QUESTION MARK
﴾  U+FD3E ORNATE LEFT PARENTHESIS
﴿  U+FD3F ORNATE RIGHT PARENTHESIS

The ARABIC QUESTION MARK was addressed by bug 1834316. Of the others listed here, only the DEVANAGARI DANDA and DOUBLE DANDA would be an issue, as the others belong to "neutral" punctuation or presentation-forms blocks where we do not impose any expectations of shaping support.

(The Arabic-script counterparts of U+2E41 and U+204F, i.e. U+060C and U+061B, may also be a concern; I'd expect these to be potentially used in some other RTL languages. But in any case, those were addressed along with the question mark in bug 1834316, so should no longer be an issue.)

So in short, all we need to do here is deal with the Devanagari dandas.
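As a cross-check of the triage above, here is a minimal Python sketch that decodes the `charlist` parameter from the test URL and groups each character by Unicode block. The block ranges are the standard ones from the Unicode `Blocks.txt` data; the `SHAPING_CHECKED_BLOCKS` set and the function names are assumptions made for illustration and are not taken from Gecko's source.

```python
import unicodedata
from urllib.parse import unquote

# Block ranges as published in the Unicode Blocks.txt data file.
BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Devanagari": (0x0900, 0x097F),
    "General Punctuation": (0x2000, 0x206F),
    "Supplemental Punctuation": (0x2E00, 0x2E7F),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
}

# Assumption for illustration only: blocks whose characters might be hidden
# when a font lacks shaping tables for the corresponding script.
SHAPING_CHECKED_BLOCKS = {"Arabic", "Devanagari"}

# The charlist query value from the test page URL quoted in this bug.
CHARLIST = "%E0%A5%A4%E0%A5%A5%20%E2%B9%81%E2%81%8F%D8%9F%EF%B4%BE%EF%B4%BF"


def block_of(ch):
    """Return the name of the block containing ch, or None if not listed."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def classify(charlist):
    """Decode a percent-encoded list and describe each non-space character."""
    rows = []
    for ch in unquote(charlist):
        if ch.isspace():
            continue
        block = block_of(ch)
        at_risk = block in SHAPING_CHECKED_BLOCKS
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), block, at_risk))
    return rows


if __name__ == "__main__":
    for cp, name, block, at_risk in classify(CHARLIST):
        print(cp, name, f"[{block}]", "at risk" if at_risk else "neutral")
```

Under these assumptions, only the two dandas and the Arabic question mark fall in blocks tied to a shaping check, which matches the reasoning in this comment.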

Assignee: nobody → jfkthame

Severity: -- → S3

Status: UNCONFIRMED → NEW

Ever confirmed: true

Jonathan Kew [:jfkthame]

Assignee

Comment 2

1 year ago

Attached file: Bug 1836024 - Don't mask Devanagari DANDA characters from the font cmap when shaping support is absent, as they may be used by other scripts. r=#layout-reviewers

Richard Ishida

Reporter

Comment 3

1 year ago

Fwiw, the characters mentioned above were just a few examples of characters taken from another block that i came across in the handful of orthographies that i checked. There may be more.

Are all Arabic block characters now unproblematic? For example, others that are shared include the tatweel (U+0640) used in Adlam, Syriac, etc., and the Arabic full stop (U+06D4) used in Hanifi Rohingya, etc. There's also ALM (U+061C ARABIC LETTER MARK), but maybe that's irrelevant because it's invisible anyway.

What about other blocks, such as Buginese, which uses a character from the Javanese block?

Jonathan Kew [:jfkthame]

Assignee

Comment 4

1 year ago

(In reply to Richard Ishida from comment #3)

> Fwiw, the characters mentioned above were just a few examples of characters taken from another block that i came across in the handful of orthographies that i checked. There may be more.
>
> Are all Arabic block characters now unproblematic? For example, others that are shared include the tatweel U+0640 used in Adlam, Syriac, etc, the full stop used in Hanifi Rohingya, etc. There's also ALM, but maybe that's irrelevant because it's invisible anyway.

Tatweel may be an issue still; I wonder how script-run itemization handles that. Does shaping work correctly with tatweel in Adlam and Syriac, when using a webfont that aims to support it?

> What about other blocks, such as Buginese, which uses a character from the Javanese block?

That shouldn't be a problem, as we don't have any specific checks for Buginese or Javanese (or other more recently-added scripts handled by the Universal Shaping Engine).

In general, the underlying issue that was being addressed by masking out complex-script blocks if the relevant shaping tables were absent related to older fonts (like some legacy CJK fonts, if I recall correctly) that were filled with nominal-form glyphs for a bunch of the major scripts present in the earliest Unicode versions, but lacked any shaping support. I don't think that's generally a problem for more recently-encoded scripts.
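To make the masking idea concrete, here is a rough Python sketch of the behaviour described above. It is not Gecko's implementation: the function name `visible_cmap`, its parameters, and the representation of a cmap as a set of code points are all hypothetical. The only facts taken from this bug are that Devanagari characters are hidden when the font lacks shaping support and that the fix exempts U+0964 and U+0965.

```python
# Hypothetical sketch: names and data structures are illustrative, not Gecko's.

DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # U+0900..U+097F

# DEVANAGARI DANDA and DOUBLE DANDA, which other scripts also use.
SHARED_PUNCTUATION = frozenset({0x0964, 0x0965})


def visible_cmap(cmap, has_devanagari_shaping, exempt=SHARED_PUNCTUATION):
    """Return the code points the font is treated as supporting.

    cmap: set of code points that have glyphs in the font.
    has_devanagari_shaping: True if the font carries shaping tables
        for Devanagari (how that is detected is out of scope here).
    exempt: code points that are never masked, even without shaping.
    """
    if has_devanagari_shaping:
        return set(cmap)
    return {
        cp for cp in cmap
        if cp not in DEVANAGARI_BLOCK or cp in exempt
    }


if __name__ == "__main__":
    # A legacy-style font: nominal Devanagari glyphs, no shaping tables.
    legacy_font = {0x0041, 0x0915, 0x0964, 0x0965}
    before_fix = visible_cmap(legacy_font, False, exempt=frozenset())
    after_fix = visible_cmap(legacy_font, False)
    print(sorted(hex(cp) for cp in before_fix))
    print(sorted(hex(cp) for cp in after_fix))
```

With an empty exemption set (the behaviour before the fix, as this sketch models it) the dandas are hidden along with the letters; with the default exemptions they stay visible while U+0915 DEVANAGARI LETTER KA remains masked.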

Pulsebot

Comment 5

1 year ago

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d58641e1ccd1
Don't mask Devanagari DANDA characters from the font cmap when shaping support is absent, as they may be used by other scripts. r=emilio

Cristina Horotan [:chorotan]

Comment 6

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d58641e1ccd1

Status: NEW → RESOLVED

Closed: 1 year ago

status-firefox115: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 115 Branch

Richard Ishida

Reporter

Comment 7

1 year ago

> Does shaping work correctly with tatweel in Adlam and Syriac, when using a webfont that aims to support it?

You can test it at https://r12a.github.io/pickers/adlm/index.html?text=%F0%9E%A4%91%F0%9E%A4%B5%F0%9E%A5%85%F0%9E%A4%A4%D9%80%D9%80%D9%80%D9%80%F0%9E%A4%A2%F0%9E%A4%A4

Use the "Current font" pulldown to test with various webfonts and system fonts.

The tatweel is placed along the baseline when using a webfont (Noto Sans Adlam, or Noto Sans Adlam Unjoined), but when using the local Unjoined font in Firefox the tatweel is above the baseline. I imagine that the same is true of the local Noto Sans Adlam, but i just get garbage when i try to display Adlam text with that local font (even though i removed and reinstalled the font). That means i can't check whether the tatweel joins appropriately on either side.

The inspector tells me that the browser fell back to another font to display the tatweel.

In Chrome, all works fine.

I get the same results (including the garbage) when trying to use an arabic full stop in Hanifi Rohingya, see https://r12a.github.io/pickers/rohg/index.html?text=%F0%90%B4%80%F0%90%B4%9E%F0%90%B4%95%F0%90%B4%90%F0%90%B4%9D%F0%90%B4%A6%F0%90%B4%95%20%DB%94
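For anyone who wants to see which characters in these two test strings come from outside the main script block, here is a small Python sketch that decodes the `text` parameter of each URL. The helper name `borrowed_chars` is made up for this example; the block ranges used are Adlam (U+1E900..U+1E95F) and Hanifi Rohingya (U+10D00..U+10D3F).

```python
from urllib.parse import urlparse, parse_qs

ADLAM_URL = (
    "https://r12a.github.io/pickers/adlm/index.html?text="
    "%F0%9E%A4%91%F0%9E%A4%B5%F0%9E%A5%85%F0%9E%A4%A4"
    "%D9%80%D9%80%D9%80%D9%80%F0%9E%A4%A2%F0%9E%A4%A4"
)
ROHG_URL = (
    "https://r12a.github.io/pickers/rohg/index.html?text="
    "%F0%90%B4%80%F0%90%B4%9E%F0%90%B4%95%F0%90%B4%90"
    "%F0%90%B4%9D%F0%90%B4%A6%F0%90%B4%95%20%DB%94"
)


def borrowed_chars(url, block_start, block_end):
    """List non-space code points in the URL's text parameter that lie
    outside the inclusive range block_start..block_end."""
    text = parse_qs(urlparse(url).query)["text"][0]
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if not ch.isspace() and not (block_start <= ord(ch) <= block_end)
    ]


if __name__ == "__main__":
    print(borrowed_chars(ADLAM_URL, 0x1E900, 0x1E95F))
    print(borrowed_chars(ROHG_URL, 0x10D00, 0x10D3F))
```

The Adlam string contains four U+0640 ARABIC TATWEEL characters and the Hanifi Rohingya string ends with U+06D4 ARABIC FULL STOP; everything else in both strings belongs to the script's own block.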

Ardelean Oana, Desktop QA [:oardelean]

Updated

1 year ago

Flags: qe-verify+

Categories

(Core :: Layout: Text and Fonts, defect)