diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG

RESOLVED FIXED in mozilla13

Status

()

Core
SVG
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Amir Aharoni, Assigned: jfkthame)

Tracking

(Blocks: 1 bug, {rtl})

Trunk
mozilla13
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(8 attachments, 4 obsolete attachments)

102.52 KB, application/x-font-ttf
Details
448 bytes, image/svg+xml
Details
381 bytes, text/html
Details
2.60 KB, image/png
Details
3.39 KB, image/png
Details
1.23 KB, text/html
Details
2.99 KB, text/html
Details
14.38 KB, patch
smontagu
: review+
Details | Diff | Splinter Review
(Reporter)

Description

6 years ago
Created attachment 592195 [details]
a font that should be installed when displaying the test files

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0a2) Gecko/20120108 Firefox/11.0a2
Build ID: 20120108053554

Steps to reproduce:

I tried to display an SVG file with a Hebrew word which has diacritics (nikud) in it.


Actual results:

The word was displayed incorrectly - the diacritics were not positioned properly.


Expected results:

The word should have been displayed correctly. The same word is displayed correctly if it's written in an HTML file. The same SVG file is displayed correctly in Chromium.

It is recommended to have the Taamey Frank CLM font installed on the computer when viewing the file. I'll attach a TTF file to make installation easier.
(Reporter)

Comment 1

6 years ago
Created attachment 592198 [details]
An SVG file for testing the font rendering problem
(Reporter)

Comment 2

6 years ago
Created attachment 592201 [details]
An HTML file with the same word that appears in the SVG file. Save both in one directory, open the HTML file and compare the rendering.
(Reporter)

Comment 3

6 years ago
Created attachment 592203 [details]
A PNG image that shows the correct rendering of the word. Created by opening the SVG file in Chromium.
(Reporter)

Updated

6 years ago
Component: Untriaged → SVG
Product: Firefox → Core

Updated

6 years ago
Attachment #592201 - Attachment mime type: text/plain → text/html
(Reporter)

Comment 4

6 years ago
Created attachment 592247 [details]
A PNG image that shows the bad rendering of the word. Created by opening the attached SVG file in Firefox.
(Reporter)

Comment 5

6 years ago
Maybe this will help, too: This issue is unique to Hebrew. It doesn't happen in other RTL scripts - Arabic and Thaana (Divehi).
(Reporter)

Comment 6

6 years ago
Initially i reported it on Ubuntu.

Now i tested it on Windows 7 and it is broken there, too.

In this regard this bug is different from Bug 662055, which manifests itself on Linux, Android and Windows XP, but not on Windows 7.

Updated

6 years ago
Keywords: rtl
OS: Linux → All
Hardware: x86_64 → All
This is a consequence of decisions made in bug 620446. The info is scattered in comments there, but I'll try to summarize it here.

We display bidirectional test in SVG by running the bidi algorithm on the text and reversing RTL runs so that the whole text is in visual LTR order.

The text reversing is done with a relatively simple-minded algorithm which has no special treatment for diacritics or shaping.

When text is rendered with harfbuzz, that is no problem, because the harfbuzz shaper knows the inherent directionality of the script, and handles the shaping of reversed text. (Internally, it reverses the text to ensure it is in the "natural" direction, then does shaping, then restores the original (overridden) direction.)

So everything works fine in RTL scripts for which we use harfbuzz, and also on platforms where harfbuzz is enabled for Hebrew by default, or with harfbuzz for Hebrew turned on in about:config (gfx.font_rendering.harfbuzz.scripts |= 4).

I've opened bug 722139 on turning harfbuzz on by default on all platforms.
Depends on: 722139
Summary: Hebrew diacritics are displayed incorrectly in SVG → Hebrew diacritics are displayed incorrectly in SVG with harfbuzz disabled
(Reporter)

Comment 8

6 years ago
Thank you, Simon.

There's a problem, though: Setting gfx.font_rendering.harfbuzz.scripts to 4 solves this bug, but makes the problem described in Bug 662055 worse on Linux (display on Windows XP remains the same).
Created attachment 597715 [details]
HTML testcase with reversed RTL text

Since the latest harfbuzz update in bug 695857, diacritic placement in reversed RTL text no longer works as expected in any script. Is that a deliberate change to normalization, reversing the behaviour described in bug 620446 comment 12? If so we need to add code to handle clusters in nsBidiPresUtils::WriteReverse
(Assignee)

Comment 10

6 years ago
Interesting... on OS X, the diacritics display incorrectly in the SVG example both with and without harfbuzz, though the problems are different. With HB enabled, it looks like they have approximately the correct relative positions, but are all shifted to the left as though they're positioned based on the following letter instead of the preceding one; whereas under Core Text, the relative positions are quite different from how they appear in HTML.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 11 Branch → Trunk
(Assignee)

Comment 11

6 years ago
(In reply to Simon Montagu from comment #9)
> Since the latest harfbuzz update in bug 695857, diacritic placement in
> reversed RTL text no longer works as expected in any script. Is that a
> deliberate change to normalization, reversing the behaviour described in bug
> 620446 comment 12? If so we need to add code to handle clusters in
> nsBidiPresUtils::WriteReverse

There was indeed a change in how clusters and reversed-direction text are handled within harfbuzz. I have a feeling it may turn out to be a mistake, but need to investigate things a bit further, and discuss with Behdad.
(Assignee)

Comment 12

6 years ago
Created attachment 599089 [details]
testcase using reversed Latin-script text with diacritics

OK...this is a pretty confusing area, but I think I've convinced myself that the harfbuzz change (from harfbuzz-ng commit 54d1a0d2b2c4ffe15494967122c6422ecb1fc80b, to be exact), and the resulting change in how reversed-direction HTML text behaves, is in fact correct.

The key issue to note is that the Unicode bidi algorithm[1] explicitly says that "Combining characters *always* attach to the *preceding base character in the memory representation*. Even after reordering for display and performing character shaping, the glyph representing a combining character will attach to the glyph representing its base character in memory."

This applies consistently, regardless of any directionality overrides that may be in effect. As a result, using direction overrides (whether the Unicode control chars, or a higher-level protocol) to "reverse" the direction of a text run that includes combining characters is *not* equivalent to reversing the string on a character-by-character basis. Sometimes this may seem a surprising result; other times it seems entirely correct (IMO, anyhow).

See the attached testcase, which behaves correctly (according to my understanding) only in a recent Nightly, since bug 695857 landed; results in Aurora and earlier will be different.

Line 1 shows some Latin text with a variety of diacritics.

Line 2 has the exact same text, written in the same order, but with an RTL direction override applied. The expected result is that the text remains "the same" (contains the same sequence of accented letters), it just reads from right to left.

Line 3 has the RTL override, and the text has been reversed on a character-by-character basis. This seems at first glance as though it ought to result in the same rendering as Line 1 (we've reversed the character string, and we've reversed the direction), and indeed prior to the harfbuzz update it did so. But that is actually *incorrect* as per the Unicode-defined behavior where the combining marks are supposed to attach to the *preceding base character in memory*.

Line 4 also has the RTL override, and the text has been reversed, but on a cluster-by-cluster basis, so that the diacritics still *follow* their intended base character *in the memory representation*. This, AIUI, is the correct way to represent direction-overridden text, and should display the same as the natural-direction text in Line 1.

Of course, direction-overridden accented Latin-script text is not a common use case, but the same fundamental issue - that bidi reversal of text needs to work on a per-cluster basis - applies to preparing Hebrew or Arabic text with accents for LTR rendering in SVG.

[1] http://www.unicode.org/reports/tr9/
(Assignee)

Updated

6 years ago
Summary: Hebrew diacritics are displayed incorrectly in SVG with harfbuzz disabled → diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG
(Assignee)

Comment 13

6 years ago
Created attachment 599271 [details] [diff] [review]
patch, support clusters in nsBidiPresUtils::WriteReverse

I believe this fixes the Hebrew-in-SVG problem, when (new) harfbuzz is enabled, by making nsBidiPresUtils::WriteReverse do its reversal by clusters rather than individual characters.

To implement this, I've factored out the clustering code from gfxShapedWord::SetupClusterBoundaries and moved it to a new ClusterIterator in gfxUnicodeProperties; then both gfxShapedWord and nsBidiPresUtils can make use of it.

(Behavior of the various non-harfbuzz platform shapers seems to be not entirely consistent. I think we should deal with those issues separately, if at all - in some cases the underlying shapers may simply be incapable of handling direction-overridden text properly.)
Attachment #599271 - Flags: review?(smontagu)
(Assignee)

Comment 14

6 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #13)
> Created attachment 599271 [details] [diff] [review]
> patch, support clusters in nsBidiPresUtils::WriteReverse

This failed to build on Windows tryserver (stdint? what's that?), so will need a minor tweak for that.
(Assignee)

Comment 15

6 years ago
Created attachment 599494 [details] [diff] [review]
patch v2, support clusters in nsBidiPresUtils::WriteReverse

This fixes the Win64 build failure; otherwise unchanged.

(I propose to move the Unicode functionality to intl/ over in bug 724826.)
Assignee: nobody → jfkthame
Attachment #599271 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #599271 - Flags: review?(smontagu)
Attachment #599494 - Flags: review?(smontagu)
Created attachment 599496 [details]
attachment 599089 [details] with Hebrew and Arabic added

This supersedes attachment 597715 [details]. Interestingly I get different results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters" is identical to the first line in all languages; on Mac "Reversed by characters is identical to the first line in Arabic only.
Attachment #597715 - Attachment is obsolete: true
(Assignee)

Comment 17

6 years ago
Created attachment 599497 [details]
HTML testcase with reversed RTL text, fixed version

As per comment #12, it turns out that attachment 597715 [details] was actually incorrect (it assumed reversal-by-characters, rather than by clusters). This is a fixed version that should render correctly given the proper Unicode-conformant handling of the diacritics in relation to directional overrides.
(Assignee)

Comment 18

6 years ago
(In reply to Simon Montagu from comment #16)
> Created attachment 599496 [details]
> attachment 599089 [details] with Hebrew and Arabic added
> 
> This supersedes attachment 597715 [details]. Interestingly I get different
> results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters"
> is identical to the first line in all languages; on Mac "Reversed by
> characters is identical to the first line in Arabic only.

This is because you're getting an AAT (rather than OpenType) Arabic font on the Mac, which means we go through the Core Text backend, which doesn't handle the diacritics properly in reversed-direction text.

We could try to work around that, if we think it's worth the effort, but IMO we should regard that as a separate followup bug.
(Assignee)

Comment 19

6 years ago
Comment on attachment 599497 [details]
HTML testcase with reversed RTL text, fixed version

Obsoleting this, as in the meantime Simon incorporated Arabic and Hebrew into attachment 599496 [details], so that provides a more comprehensive example.
Attachment #599497 - Attachment is obsolete: true
Attachment #592195 - Attachment mime type: application/octet-stream → application/x-font-ttf
(Assignee)

Comment 20

6 years ago
Created attachment 600815 [details] [diff] [review]
patch v3, support clusters in nsBidiPresUtils::WriteReverse

Functionally the same as the previous versions, but updated to account for the move of rendering-related Unicode character properties from gfx to intl.
Attachment #599494 - Attachment is obsolete: true
Attachment #599494 - Flags: review?(smontagu)
Attachment #600815 - Flags: review?(smontagu)
Blocks: 543200
Comment on attachment 600815 [details] [diff] [review]
patch v3, support clusters in nsBidiPresUtils::WriteReverse

Review of attachment 600815 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/public/nsUnicodeProperties.h
@@ +69,5 @@
> +
> +inline bool IsClusterExtender(PRUint32 aCh) {
> +    return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> +}
> +

What does this indirection give us?
(Assignee)

Comment 22

6 years ago
(In reply to Simon Montagu from comment #21)
> Comment on attachment 600815 [details] [diff] [review]
> patch v3, support clusters in nsBidiPresUtils::WriteReverse
> 
> Review of attachment 600815 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: intl/unicharutil/public/nsUnicodeProperties.h
> @@ +69,5 @@
> > +
> > +inline bool IsClusterExtender(PRUint32 aCh) {
> > +    return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> > +}
> > +
> 
> What does this indirection give us?

It provides clients with the convenience of calling the one-argument IsClusterExtender() if all they have is the character code, while allowing clients who need to retrieve the category themselves for other reasons to pass it in to the two-argument version and thus avoid a second lookup.
Attachment #600815 - Flags: review?(smontagu) → review+
(Assignee)

Comment 23

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/7658ef219e8b
Target Milestone: --- → mozilla13
https://hg.mozilla.org/mozilla-central/rev/7658ef219e8b
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
Depends on: 732330
Depends on: 732443

Updated

6 years ago
QA Contact: untriaged → general
You need to log in before you can comment on or make changes to this bug.