Closed Bug 1638478 Opened 5 years ago Closed 4 years ago

Arabic shaper not used for Arabic combining marks on whitespace


Status:

RESOLVED FIXED

Milestone:

mozilla78

Tracking Flags:

firefox78: tracking ---, status fixed

People

(Reporter: eusgf4u4pw, Assigned: jfkthame)

Details

Attachments

(5 files, 1 obsolete file)

Demonstration html file 5 years ago Bob Hallissy 1.37 KB, text/html
Rendering in Firefox 78.0a1 (2020-05-15) (64-bit) 5 years ago Bob Hallissy 33.43 KB, image/png
Rendering in Chrome Version 81.0.4044.138 (Official Build) (64-bit) 5 years ago Bob Hallissy 31.96 KB, image/png
Rendering in Firefox 78.0a1 (2020-05-15) (64-bit) 5 years ago Bob Hallissy 34.00 KB, image/png
Bug 1638478 - Try to resolve Script=Common runs to a specific script for shaping purposes based on the ScriptExtensions property. r=jrmuizel 4 years ago Jonathan Kew [:jfkthame] 47 bytes, text/x-phabricator-request
Bug 1638478 - Add WPT reftest for shaping Arabic diacritics stacked on NBSP. r=jrmuizel 4 years ago Jonathan Kew [:jfkthame] 47 bytes, text/x-phabricator-request

Bob Hallissy

Reporter

Description

•

5 years ago

Attached file Demonstration html file — Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36

Steps to reproduce:

display Unicode character sequences:
U+00A0 U+0654 U+0670 (nbspace hamzaAbove superscriptAlef)
and
U+00A0 U+0670 U+0654 (nbspace superscriptAlef hamzaAbove)
within both
a Latin string ("hi bye")
and
an Arabic string ("ب ي").

Actual results:

Observed rendering:
-- within the Latin string, the marks were rendered as if ordered U+00A0 U+0670 U+0654, which is incorrect (as if the hamza were attached to the alef)
-- within the Arabic string, the marks were rendered as if ordered U+00A0 U+0654 U+0670, which is correct (the alef is attached to the hamza)

Expected results:

The marks should always be displayed as if the order were U+00A0 U+0654 U+0670, whether the surrounding context is Arabic or Latin, as explained in Unicode Technical Report #53, Unicode Arabic Mark Rendering.

A combining character sequence of Arabic marks should always be rendered with the Arabic shaper in order to trigger the UTR53 Arabic Mark Transient Reordering Algorithm (AMTRA) within HarfBuzz. This should be the case whether or not the combining character sequence is on an Arabic letter base.

UTR53 should be applied to the Arabic mark sequence in both cases instead of only in the case of the Arabic context.
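To illustrate why both input orders are expected to render identically, here is a minimal Python sketch of the reordering idea. It is a simplified reading of the AMTRA steps, not HarfBuzz's implementation, and `MCM_SUBSET` is a deliberately partial, illustrative subset of the "modifier combining marks" that UTR53 defines; the function names are hypothetical.

```python
import unicodedata

# Illustrative subset of the UTR53 "modifier combining marks" (MCM).
# Assumption: the full list lives in the report and is not reproduced here.
MCM_SUBSET = {"\u0654", "\u0655"}  # HAMZA ABOVE, HAMZA BELOW


def _leading_mcm(run, ccc_class):
    """Return the MCM characters that start the group with this combining class."""
    group = [c for c in run if unicodedata.combining(c) == ccc_class]
    lead = []
    for c in group:
        if c not in MCM_SUBSET:
            break
        lead.append(c)
    return lead


def _reorder_run(run):
    """Reorder one run of combining marks that is already in NFD order."""
    lead220 = _leading_mcm(run, 220)
    lead230 = _leading_mcm(run, 230)
    shaddas = [c for c in run if unicodedata.combining(c) == 33]
    to_skip = {220: len(lead220), 230: len(lead230)}
    rest = []
    for c in run:
        cls = unicodedata.combining(c)
        if cls == 33:
            continue  # shadda was moved forward
        if to_skip.get(cls, 0) > 0:
            to_skip[cls] -= 1  # leading MCM was moved forward
            continue
        rest.append(c)
    # Resulting order: ccc=220 MCM, ccc=230 MCM, shadda, everything else.
    return lead220 + lead230 + shaddas + rest


def reorder_marks(text):
    """Normalize to NFD, then reorder each run of combining marks."""
    out, run = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) != 0:
            run.append(ch)
        else:
            out.extend(_reorder_run(run))
            run = []
            out.append(ch)
    out.extend(_reorder_run(run))
    return "".join(out)
```

Canonical ordering alone places superscript alef (combining class 35) before hamza above (combining class 230), so both inputs from the steps to reproduce normalize to the same string; the transient reordering then moves the hamza back in front.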

Bob Hallissy

Reporter

Comment 1

•

5 years ago

The Unicode Standard suggests:

Nonspacing combining marks used by the Unicode Standard may be exhibited in apparent isolation by applying them to U+00A0 NO-BREAK SPACE. This convention might be employed, for example, when talking about the combining mark itself

Bob Hallissy

Reporter

Comment 2

•

5 years ago

Attached image Rendering in Firefox 78.0a1 (2020-05-15) (64-bit) (obsolete) — Details

Bob Hallissy

Reporter

Comment 3

•

5 years ago

Attached image Rendering in Chrome Version 81.0.4044.138 (Official Build) (64-bit) — Details

Bob Hallissy

Reporter

Comment 4

•

5 years ago

Attached image Rendering in Firefox 78.0a1 (2020-05-15) (64-bit) — Details

Attachment #9149546 - Attachment is obsolete: true

Bob Hallissy

Reporter

Comment 5

•

5 years ago

I also tried various markup around the problematic sequence (nbsp + marks), including:

lang="ar"
lang="ar-Arab"
preceding the sequence with U+061C ARABIC LETTER MARK
wrapping the sequence in RLO/PDF or RLE/PDF pairs
wrapping the sequence in <span dir="rtl"></span>

But nothing seems to help.

Jonathan Kew [:jfkthame]

Assignee

Comment 6

•

4 years ago

The basic issue here is that the script itemizer doesn't identify the <NBSP, hamza-above, superscript-alef> sequence as an Arabic script run, because NBSP has Script=Common and the two diacritics have Script=Inherited. So the run just resolves to Script=Common, and we send it through the generic shaper.

The two Arabic diacritics do have ScriptExtensions=arab,syrc in Unicode. So I think we should try looking at the ScriptExtensions property when a run otherwise resolves to Common, and use the first "real" script found there. This will generally result in the right shaping, for cases like this where the marks depend on being processed by a specific shaper.
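The proposed fallback can be sketched roughly as follows. This is an illustration in Python, not the Gecko patch: the tables are a hand-written extract covering only the characters discussed here, `resolve_run_script` is a hypothetical name, and treating unknown characters as Common is a simplification.

```python
# Hand-written extract of Script / Script_Extensions values, for illustration
# only; a real itemizer would read these from the Unicode Character Database.
COMMON, INHERITED = "Zyyy", "Zinh"

SCRIPT = {
    "\u00A0": COMMON,     # NO-BREAK SPACE
    "\u0654": INHERITED,  # ARABIC HAMZA ABOVE
    "\u0670": INHERITED,  # ARABIC LETTER SUPERSCRIPT ALEF
    "\u0628": "Arab",     # ARABIC LETTER BEH
}

SCRIPT_EXTENSIONS = {
    "\u0654": ["Arab", "Syrc"],
    "\u0670": ["Arab", "Syrc"],
}


def resolve_run_script(run):
    """Pick a shaping script for a run of characters.

    First look for a character with a specific Script value. If the run
    would otherwise resolve to Common, fall back to the first "real"
    script listed in any character's Script_Extensions.
    """
    for ch in run:
        script = SCRIPT.get(ch, COMMON)  # assumption: unknown chars are Common
        if script not in (COMMON, INHERITED):
            return script
    for ch in run:
        for script in SCRIPT_EXTENSIONS.get(ch, []):
            if script not in (COMMON, INHERITED):
                return script
    return COMMON
```

With these tables, the <NBSP, hamza-above, superscript-alef> run resolves to `Arab` instead of `Zyyy`, so it would be routed to the Arabic shaper.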

Even then, it's unclear to me whether <NBSP, hamza-above, superscript-alef> should be expected to work in an LTR context (e.g. between Latin words), because of directionality concerns. None of these characters has strong RTL directionality: NBSP obviously does not, and the diacritics have bidi class NSM (Nonspacing Mark), which means they adopt the directionality of the base to which they're applied. As a result, a line containing

hi <NBSP, hamza-above, superscript-alef> bye

will simply resolve to LTR, and so I'm doubtful whether shaping of the <NBSP, hamza-above, superscript-alef> should be expected to work as if it were RTL. But wrapping this in <span dir=rtl> would fix that; the base direction will then be RTL and the sequence should render properly if sent through the Arabic shaper.
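The directionality point can be checked with Python's `unicodedata` module. This is only a rough check for strong RTL characters, not an implementation of the Unicode Bidirectional Algorithm, and `has_strong_rtl` is a hypothetical helper.

```python
import unicodedata

SEQUENCE = "\u00A0\u0654\u0670"  # NBSP, hamza above, superscript alef


def bidi_classes(text):
    """Return the Bidi_Class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def has_strong_rtl(text):
    """True if any character has a strong right-to-left class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
```

Neither the bare sequence nor the line `"hi " + SEQUENCE + " bye"` contains a strong RTL character, whereas a line containing an Arabic letter such as U+0628 does.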

So the testcase

data:text/html;charset=utf-8,<span dir=rtl style="font:36px arial">&nbsp;&%23x654;&%23x670;</span>

should render with the superscript-alef on top of the hamza (as it does in Chrome), but this currently fails in Firefox. Fixing the script itemizer to check ScriptExtensions will resolve this.
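For anyone adapting the testcase, the sketch below shows how the data: URL relates to the underlying markup. The `%23` is a percent-encoded `#`, needed because a literal `#` would start the URL fragment. The helper name `to_data_url` is hypothetical.

```python
import html

MARKUP = '<span dir=rtl style="font:36px arial">&nbsp;&#x654;&#x670;</span>'


def to_data_url(markup):
    """Wrap HTML markup in a data: URL, escaping '#' so it is not a fragment."""
    return "data:text/html;charset=utf-8," + markup.replace("#", "%23")


# The character references decode to NBSP, hamza above, superscript alef.
DECODED = html.unescape("&nbsp;&#x654;&#x670;")
```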

(It's interesting that Chrome renders this "correctly" even without the dir=rtl, but as described above, it's unclear to me whether that should really be expected. Possibly an issue for further investigation as a followup.)

Jonathan Kew [:jfkthame]

Assignee

Updated

•

4 years ago

Assignee: nobody → jfkthame

Severity: -- → S3

Status: UNCONFIRMED → NEW

Ever confirmed: true

Jonathan Kew [:jfkthame]

Assignee

Comment 7

•

4 years ago

Attached file Bug 1638478 - Try to resolve Script=Common runs to a specific script for shaping purposes based on the ScriptExtensions property. r=jrmuizel — Details

Jonathan Kew [:jfkthame]

Assignee

Comment 8

•

4 years ago

Attached file Bug 1638478 - Add WPT reftest for shaping Arabic diacritics stacked on NBSP. r=jrmuizel — Details

Depends on D75744

Pulsebot

Comment 9

•

4 years ago

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3ac062ec44ec
Try to resolve Script=Common runs to a specific script for shaping purposes based on the ScriptExtensions property. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/42c6f55994f3
Add WPT reftest for shaping Arabic diacritics stacked on NBSP. r=jrmuizel

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 10

•

4 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23692 for changes under testing/web-platform/tests

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 11

•

4 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Dorel Luca [:dluca]

Comment 12

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3ac062ec44ec
https://hg.mozilla.org/mozilla-central/rev/42c6f55994f3

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox78: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla78

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 13

•

4 years ago

Upstream PR merged by moz-wptsync-bot


Categories

(Core :: Layout: Text and Fonts, defect)
