Last Comment Bug 721821 - diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG
: diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG
Status: RESOLVED FIXED
: rtl
Product: Core
Classification: Components
Component: SVG (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla13
Assigned To: Jonathan Kew (:jfkthame)
:
Mentors:
Depends on: 722139 732330 732443
Blocks: 543200
  Show dependency treegraph
 
Reported: 2012-01-27 11:45 PST by Amir Aharoni
Modified: 2012-03-26 09:10 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
a font that should be installed when displaying the test files (102.52 KB, application/x-font-ttf)
2012-01-27 11:45 PST, Amir Aharoni
no flags Details
An SVG file for testing the font rendering problem (448 bytes, image/svg+xml)
2012-01-27 11:46 PST, Amir Aharoni
no flags Details
An HTML file with the same word that appears in the SVG file. Save both in one directory, open the HTML file and compare the rendering. (381 bytes, text/html)
2012-01-27 11:51 PST, Amir Aharoni
no flags Details
A PNG image that shows the correct rendering of the word. Created by opening the SVG file in Chromium. (2.60 KB, image/png)
2012-01-27 11:53 PST, Amir Aharoni
no flags Details
A PNG image that shows the bad rendering of the word. Created by opening the attached SVG file in Firefox. (3.39 KB, image/png)
2012-01-27 13:22 PST, Amir Aharoni
no flags Details
HTML testcase with reversed RTL text (1017 bytes, text/html)
2012-02-16 01:18 PST, Simon Montagu :smontagu
no flags Details
testcase using reversed Latin-script text with diacritics (1.23 KB, text/html)
2012-02-21 03:47 PST, Jonathan Kew (:jfkthame)
no flags Details
patch, support clusters in nsBidiPresUtils::WriteReverse (15.51 KB, patch)
2012-02-21 11:09 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
patch v2, support clusters in nsBidiPresUtils::WriteReverse (16.11 KB, patch)
2012-02-21 23:19 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
attachment 599089 with Hebrew and Arabic added (2.99 KB, text/html)
2012-02-21 23:24 PST, Simon Montagu :smontagu
no flags Details
HTML testcase with reversed RTL text, fixed version (1017 bytes, text/html)
2012-02-21 23:26 PST, Jonathan Kew (:jfkthame)
no flags Details
patch v3, support clusters in nsBidiPresUtils::WriteReverse (14.38 KB, patch)
2012-02-26 14:22 PST, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review

Description Amir Aharoni 2012-01-27 11:45:11 PST
Created attachment 592195 [details]
a font that should be installed when displaying the test files

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0a2) Gecko/20120108 Firefox/11.0a2
Build ID: 20120108053554

Steps to reproduce:

I tried to display an SVG file with a Hebrew word which has diacritics (nikud) in it.


Actual results:

The word was displayed incorrectly - the diacritics were not positioned properly.


Expected results:

The word should have been displayed correctly. The same word is displayed correctly if it's written in an HTML file. The same SVG file is displayed correctly in Chromium.

It is recommended to have the Taamey Frank CLM font installed on the computer when viewing the file. I'll attach a TTF file to make installation easier.
Comment 1 Amir Aharoni 2012-01-27 11:46:37 PST
Created attachment 592198 [details]
An SVG file for testing the font rendering problem
Comment 2 Amir Aharoni 2012-01-27 11:51:32 PST
Created attachment 592201 [details]
An HTML file with the same word that appears in the SVG file. Save both in one directory, open the HTML file and compare the rendering.
Comment 3 Amir Aharoni 2012-01-27 11:53:42 PST
Created attachment 592203 [details]
A PNG image that shows the correct rendering of the word. Created by opening the SVG file in Chromium.
Comment 4 Amir Aharoni 2012-01-27 13:22:14 PST
Created attachment 592247 [details]
A PNG image that shows the bad rendering of the word. Created by opening the attached SVG file in Firefox.
Comment 5 Amir Aharoni 2012-01-27 13:28:40 PST
Maybe this will help, too: This issue is unique to Hebrew. It doesn't happen in other RTL scripts - Arabic and Thaana (Divehi).
Comment 6 Amir Aharoni 2012-01-27 13:54:57 PST
Initially i reported it on Ubuntu.

Now i tested it on Windows 7 and it is broken there, too.

In this regard this bug is different from Bug 662055, which manifests itself on Linux, Android and Windows XP, but not on Windows 7.
Comment 7 Simon Montagu :smontagu 2012-01-29 02:48:32 PST
This is a consequence of decisions made in bug 620446. The info is scattered in comments there, but I'll try to summarize it here.

We display bidirectional test in SVG by running the bidi algorithm on the text and reversing RTL runs so that the whole text is in visual LTR order.

The text reversing is done with a relatively simple-minded algorithm which has no special treatment for diacritics or shaping.

When text is rendered with harfbuzz, that is no problem, because the harfbuzz shaper knows the inherent directionality of the script, and handles the shaping of reversed text. (Internally, it reverses the text to ensure it is in the "natural" direction, then does shaping, then restores the original (overridden) direction.)

So everything works fine in RTL scripts for which we use harfbuzz, and also on platforms where harfbuzz is enabled for Hebrew by default, or with harfbuzz for Hebrew turned on in about:config (gfx.font_rendering.harfbuzz.scripts |= 4).

I've opened bug 722139 on turning harfbuzz on by default on all platforms.
Comment 8 Amir Aharoni 2012-01-29 03:42:21 PST
Thank you, Simon.

There's a problem, though: Setting gfx.font_rendering.harfbuzz.scripts to 4 solves this bug, but makes the problem described in Bug 662055 worse on Linux (display on Windows XP remains the same).
Comment 9 Simon Montagu :smontagu 2012-02-16 01:18:04 PST
Created attachment 597715 [details]
HTML testcase with reversed RTL text

Since the latest harfbuzz update in bug 695857, diacritic placement in reversed RTL text no longer works as expected in any script. Is that a deliberate change to normalization, reversing the behaviour described in bug 620446 comment 12? If so we need to add code to handle clusters in nsBidiPresUtils::WriteReverse
Comment 10 Jonathan Kew (:jfkthame) 2012-02-17 02:20:15 PST
Interesting... on OS X, the diacritics display incorrectly in the SVG example both with and without harfbuzz, though the problems are different. With HB enabled, it looks like they have approximately the correct relative positions, but are all shifted to the left as though they're positioned based on the following letter instead of the preceding one; whereas under Core Text, the relative positions are quite different from how they appear in HTML.
Comment 11 Jonathan Kew (:jfkthame) 2012-02-20 09:32:30 PST
(In reply to Simon Montagu from comment #9)
> Since the latest harfbuzz update in bug 695857, diacritic placement in
> reversed RTL text no longer works as expected in any script. Is that a
> deliberate change to normalization, reversing the behaviour described in bug
> 620446 comment 12? If so we need to add code to handle clusters in
> nsBidiPresUtils::WriteReverse

There was indeed a change in how clusters and reversed-direction text are handled within harfbuzz. I have a feeling it may turn out to be a mistake, but need to investigate things a bit further, and discuss with Behdad.
Comment 12 Jonathan Kew (:jfkthame) 2012-02-21 03:47:32 PST
Created attachment 599089 [details]
testcase using reversed Latin-script text with diacritics

OK...this is a pretty confusing area, but I think I've convinced myself that the harfbuzz change (from harfbuzz-ng commit 54d1a0d2b2c4ffe15494967122c6422ecb1fc80b, to be exact), and the resulting change in how reversed-direction HTML text behaves, is in fact correct.

The key issue to note is that the Unicode bidi algorithm[1] explicitly says that "Combining characters *always* attach to the *preceding base character in the memory representation*. Even after reordering for display and performing character shaping, the glyph representing a combining character will attach to the glyph representing its base character in memory."

This applies consistently, regardless of any directionality overrides that may be in effect. As a result, using direction overrides (whether the Unicode control chars, or a higher-level protocol) to "reverse" the direction of a text run that includes combining characters is *not* equivalent to reversing the string on a character-by-character basis. Sometimes this may seem a surprising result; other times it seems entirely correct (IMO, anyhow).

See the attached testcase, which behaves correctly (according to my understanding) only in a recent Nightly, since bug 695857 landed; results in Aurora and earlier will be different.

Line 1 shows some Latin text with a variety of diacritics.

Line 2 has the exact same text, written in the same order, but with an RTL direction override applied. The expected result is that the text remains "the same" (contains the same sequence of accented letters), it just reads from right to left.

Line 3 has the RTL override, and the text has been reversed on a character-by-character basis. This seems at first glance as though it ought to result in the same rendering as Line 1 (we've reversed the character string, and we've reversed the direction), and indeed prior to the harfbuzz update it did so. But that is actually *incorrect* as per the Unicode-defined behavior where the combining marks are supposed to attach to the *preceding base character in memory*.

Line 4 also has the RTL override, and the text has been reversed, but on a cluster-by-cluster basis, so that the diacritics still *follow* their intended base character *in the memory representation*. This, AIUI, is the correct way to represent direction-overridden text, and should display the same as the natural-direction text in Line 1.

Of course, direction-overridden accented Latin-script text is not a common use case, but the same fundamental issue - that bidi reversal of text needs to work on a per-cluster basis - applies to preparing Hebrew or Arabic text with accents for LTR rendering in SVG.

[1] http://www.unicode.org/reports/tr9/
Comment 13 Jonathan Kew (:jfkthame) 2012-02-21 11:09:34 PST
Created attachment 599271 [details] [diff] [review]
patch, support clusters in nsBidiPresUtils::WriteReverse

I believe this fixes the Hebrew-in-SVG problem, when (new) harfbuzz is enabled, by making nsBidiPresUtils::WriteReverse do its reversal by clusters rather than individual characters.

To implement this, I've factored out the clustering code from gfxShapedWord::SetupClusterBoundaries and moved it to a new ClusterIterator in gfxUnicodeProperties; then both gfxShapedWord and nsBidiPresUtils can make use of it.

(Behavior of the various non-harfbuzz platform shapers seems to be not entirely consistent. I think we should deal with those issues separately, if at all - in some cases the underlying shapers may simply be incapable of handling direction-overridden text properly.)
Comment 14 Jonathan Kew (:jfkthame) 2012-02-21 12:34:45 PST
(In reply to Jonathan Kew (:jfkthame) from comment #13)
> Created attachment 599271 [details] [diff] [review]
> patch, support clusters in nsBidiPresUtils::WriteReverse

This failed to build on Windows tryserver (stdint? what's that?), so will need a minor tweak for that.
Comment 15 Jonathan Kew (:jfkthame) 2012-02-21 23:19:29 PST
Created attachment 599494 [details] [diff] [review]
patch v2, support clusters in nsBidiPresUtils::WriteReverse

This fixes the Win64 build failure; otherwise unchanged.

(I propose to move the Unicode functionality to intl/ over in bug 724826.)
Comment 16 Simon Montagu :smontagu 2012-02-21 23:24:56 PST
Created attachment 599496 [details]
attachment 599089 [details] with Hebrew and Arabic added

This supersedes attachment 597715 [details]. Interestingly I get different results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters" is identical to the first line in all languages; on Mac "Reversed by characters is identical to the first line in Arabic only.
Comment 17 Jonathan Kew (:jfkthame) 2012-02-21 23:26:21 PST
Created attachment 599497 [details]
HTML testcase with reversed RTL text, fixed version

As per comment #12, it turns out that attachment 597715 [details] was actually incorrect (it assumed reversal-by-characters, rather than by clusters). This is a fixed version that should render correctly given the proper Unicode-conformant handling of the diacritics in relation to directional overrides.
Comment 18 Jonathan Kew (:jfkthame) 2012-02-21 23:51:13 PST
(In reply to Simon Montagu from comment #16)
> Created attachment 599496 [details]
> attachment 599089 [details] with Hebrew and Arabic added
> 
> This supersedes attachment 597715 [details]. Interestingly I get different
> results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters"
> is identical to the first line in all languages; on Mac "Reversed by
> characters is identical to the first line in Arabic only.

This is because you're getting an AAT (rather than OpenType) Arabic font on the Mac, which means we go through the Core Text backend, which doesn't handle the diacritics properly in reversed-direction text.

We could try to work around that, if we think it's worth the effort, but IMO we should regard that as a separate followup bug.
Comment 19 Jonathan Kew (:jfkthame) 2012-02-21 23:55:35 PST
Comment on attachment 599497 [details]
HTML testcase with reversed RTL text, fixed version

Obsoleting this, as in the meantime Simon incorporated Arabic and Hebrew into attachment 599496 [details], so that provides a more comprehensive example.
Comment 20 Jonathan Kew (:jfkthame) 2012-02-26 14:22:28 PST
Created attachment 600815 [details] [diff] [review]
patch v3, support clusters in nsBidiPresUtils::WriteReverse

Functionally the same as the previous versions, but updated to account for the move of rendering-related Unicode character properties from gfx to intl.
Comment 21 Simon Montagu :smontagu 2012-02-28 10:32:00 PST
Comment on attachment 600815 [details] [diff] [review]
patch v3, support clusters in nsBidiPresUtils::WriteReverse

Review of attachment 600815 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/public/nsUnicodeProperties.h
@@ +69,5 @@
> +
> +inline bool IsClusterExtender(PRUint32 aCh) {
> +    return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> +}
> +

What does this indirection give us?
Comment 22 Jonathan Kew (:jfkthame) 2012-02-28 10:48:15 PST
(In reply to Simon Montagu from comment #21)
> Comment on attachment 600815 [details] [diff] [review]
> patch v3, support clusters in nsBidiPresUtils::WriteReverse
> 
> Review of attachment 600815 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: intl/unicharutil/public/nsUnicodeProperties.h
> @@ +69,5 @@
> > +
> > +inline bool IsClusterExtender(PRUint32 aCh) {
> > +    return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> > +}
> > +
> 
> What does this indirection give us?

It provides clients with the convenience of calling the one-argument IsClusterExtender() if all they have is the character code, while allowing clients who need to retrieve the category themselves for other reasons to pass it in to the two-argument version and thus avoid a second lookup.
Comment 24 Matt Brubeck (:mbrubeck) 2012-02-29 11:17:39 PST
https://hg.mozilla.org/mozilla-central/rev/7658ef219e8b

Note You need to log in before you can comment on or make changes to this bug.