Closed
Bug 721821
Opened 14 years ago
Closed 13 years ago
diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG
Categories
(Core :: SVG, defect)
Core
SVG
Tracking
()
RESOLVED
FIXED
mozilla13
People
(Reporter: amir.aharoni, Assigned: jfkthame)
References
(Blocks 1 open bug)
Details
(Keywords: rtl)
Attachments
(8 files, 4 obsolete files)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0a2) Gecko/20120108 Firefox/11.0a2
Build ID: 20120108053554
Steps to reproduce:
I tried to display an SVG file with a Hebrew word which has diacritics (nikud) in it.
Actual results:
The word was displayed incorrectly - the diacritics were not positioned properly.
Expected results:
The word should have been displayed correctly. The same word is displayed correctly if it's written in an HTML file. The same SVG file is displayed correctly in Chromium.
It is recommended to have the Taamey Frank CLM font installed on the computer when viewing the file. I'll attach a TTF file to make installation easier.
Reporter | ||
Comment 1•14 years ago
|
||
Reporter | ||
Comment 2•14 years ago
|
||
Reporter | ||
Comment 3•14 years ago
|
||
Reporter | ||
Updated•14 years ago
|
Component: Untriaged → SVG
Product: Firefox → Core
Updated•14 years ago
|
Attachment #592201 -
Attachment mime type: text/plain → text/html
Reporter | ||
Comment 4•14 years ago
|
||
Reporter | ||
Comment 5•14 years ago
|
||
Maybe this will help, too: This issue is unique to Hebrew. It doesn't happen in other RTL scripts - Arabic and Thaana (Divehi).
Reporter | ||
Comment 6•14 years ago
|
||
Initially i reported it on Ubuntu.
Now i tested it on Windows 7 and it is broken there, too.
In this regard this bug is different from Bug 662055, which manifests itself on Linux, Android and Windows XP, but not on Windows 7.
![]() |
||
Updated•14 years ago
|
Comment 7•14 years ago
|
||
This is a consequence of decisions made in bug 620446. The info is scattered in comments there, but I'll try to summarize it here.
We display bidirectional test in SVG by running the bidi algorithm on the text and reversing RTL runs so that the whole text is in visual LTR order.
The text reversing is done with a relatively simple-minded algorithm which has no special treatment for diacritics or shaping.
When text is rendered with harfbuzz, that is no problem, because the harfbuzz shaper knows the inherent directionality of the script, and handles the shaping of reversed text. (Internally, it reverses the text to ensure it is in the "natural" direction, then does shaping, then restores the original (overridden) direction.)
So everything works fine in RTL scripts for which we use harfbuzz, and also on platforms where harfbuzz is enabled for Hebrew by default, or with harfbuzz for Hebrew turned on in about:config (gfx.font_rendering.harfbuzz.scripts |= 4).
I've opened bug 722139 on turning harfbuzz on by default on all platforms.
Depends on: 722139
Summary: Hebrew diacritics are displayed incorrectly in SVG → Hebrew diacritics are displayed incorrectly in SVG with harfbuzz disabled
Reporter | ||
Comment 8•14 years ago
|
||
Thank you, Simon.
There's a problem, though: Setting gfx.font_rendering.harfbuzz.scripts to 4 solves this bug, but makes the problem described in Bug 662055 worse on Linux (display on Windows XP remains the same).
Comment 9•13 years ago
|
||
Since the latest harfbuzz update in bug 695857, diacritic placement in reversed RTL text no longer works as expected in any script. Is that a deliberate change to normalization, reversing the behaviour described in bug 620446 comment 12? If so we need to add code to handle clusters in nsBidiPresUtils::WriteReverse
Assignee | ||
Comment 10•13 years ago
|
||
Interesting... on OS X, the diacritics display incorrectly in the SVG example both with and without harfbuzz, though the problems are different. With HB enabled, it looks like they have approximately the correct relative positions, but are all shifted to the left as though they're positioned based on the following letter instead of the preceding one; whereas under Core Text, the relative positions are quite different from how they appear in HTML.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 11 Branch → Trunk
Assignee | ||
Comment 11•13 years ago
|
||
(In reply to Simon Montagu from comment #9)
> Since the latest harfbuzz update in bug 695857, diacritic placement in
> reversed RTL text no longer works as expected in any script. Is that a
> deliberate change to normalization, reversing the behaviour described in bug
> 620446 comment 12? If so we need to add code to handle clusters in
> nsBidiPresUtils::WriteReverse
There was indeed a change in how clusters and reversed-direction text are handled within harfbuzz. I have a feeling it may turn out to be a mistake, but need to investigate things a bit further, and discuss with Behdad.
Assignee | ||
Comment 12•13 years ago
|
||
OK...this is a pretty confusing area, but I think I've convinced myself that the harfbuzz change (from harfbuzz-ng commit 54d1a0d2b2c4ffe15494967122c6422ecb1fc80b, to be exact), and the resulting change in how reversed-direction HTML text behaves, is in fact correct.
The key issue to note is that the Unicode bidi algorithm[1] explicitly says that "Combining characters *always* attach to the *preceding base character in the memory representation*. Even after reordering for display and performing character shaping, the glyph representing a combining character will attach to the glyph representing its base character in memory."
This applies consistently, regardless of any directionality overrides that may be in effect. As a result, using direction overrides (whether the Unicode control chars, or a higher-level protocol) to "reverse" the direction of a text run that includes combining characters is *not* equivalent to reversing the string on a character-by-character basis. Sometimes this may seem a surprising result; other times it seems entirely correct (IMO, anyhow).
See the attached testcase, which behaves correctly (according to my understanding) only in a recent Nightly, since bug 695857 landed; results in Aurora and earlier will be different.
Line 1 shows some Latin text with a variety of diacritics.
Line 2 has the exact same text, written in the same order, but with an RTL direction override applied. The expected result is that the text remains "the same" (contains the same sequence of accented letters), it just reads from right to left.
Line 3 has the RTL override, and the text has been reversed on a character-by-character basis. This seems at first glance as though it ought to result in the same rendering as Line 1 (we've reversed the character string, and we've reversed the direction), and indeed prior to the harfbuzz update it did so. But that is actually *incorrect* as per the Unicode-defined behavior where the combining marks are supposed to attach to the *preceding base character in memory*.
Line 4 also has the RTL override, and the text has been reversed, but on a cluster-by-cluster basis, so that the diacritics still *follow* their intended base character *in the memory representation*. This, AIUI, is the correct way to represent direction-overridden text, and should display the same as the natural-direction text in Line 1.
Of course, direction-overridden accented Latin-script text is not a common use case, but the same fundamental issue - that bidi reversal of text needs to work on a per-cluster basis - applies to preparing Hebrew or Arabic text with accents for LTR rendering in SVG.
[1] http://www.unicode.org/reports/tr9/
Assignee | ||
Updated•13 years ago
|
Summary: Hebrew diacritics are displayed incorrectly in SVG with harfbuzz disabled → diacritics in RTL text (e.g. Hebrew, Arabic) are displayed incorrectly in SVG
Assignee | ||
Comment 13•13 years ago
|
||
I believe this fixes the Hebrew-in-SVG problem, when (new) harfbuzz is enabled, by making nsBidiPresUtils::WriteReverse do its reversal by clusters rather than individual characters.
To implement this, I've factored out the clustering code from gfxShapedWord::SetupClusterBoundaries and moved it to a new ClusterIterator in gfxUnicodeProperties; then both gfxShapedWord and nsBidiPresUtils can make use of it.
(Behavior of the various non-harfbuzz platform shapers seems to be not entirely consistent. I think we should deal with those issues separately, if at all - in some cases the underlying shapers may simply be incapable of handling direction-overridden text properly.)
Attachment #599271 -
Flags: review?(smontagu)
Assignee | ||
Comment 14•13 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #13)
> Created attachment 599271 [details] [diff] [review]
> patch, support clusters in nsBidiPresUtils::WriteReverse
This failed to build on Windows tryserver (stdint? what's that?), so will need a minor tweak for that.
Assignee | ||
Comment 15•13 years ago
|
||
This fixes the Win64 build failure; otherwise unchanged.
(I propose to move the Unicode functionality to intl/ over in bug 724826.)
Assignee: nobody → jfkthame
Attachment #599271 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #599271 -
Flags: review?(smontagu)
Attachment #599494 -
Flags: review?(smontagu)
Comment 16•13 years ago
|
||
This supersedes attachment 597715 [details]. Interestingly I get different results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters" is identical to the first line in all languages; on Mac "Reversed by characters is identical to the first line in Arabic only.
Attachment #597715 -
Attachment is obsolete: true
Assignee | ||
Comment 17•13 years ago
|
||
As per comment #12, it turns out that attachment 597715 [details] was actually incorrect (it assumed reversal-by-characters, rather than by clusters). This is a fixed version that should render correctly given the proper Unicode-conformant handling of the diacritics in relation to directional overrides.
Assignee | ||
Comment 18•13 years ago
|
||
(In reply to Simon Montagu from comment #16)
> Created attachment 599496 [details]
> attachment 599089 [details] with Hebrew and Arabic added
>
> This supersedes attachment 597715 [details]. Interestingly I get different
> results in the Arabic case on Mac and Linux. On Linux "Reversed by clusters"
> is identical to the first line in all languages; on Mac "Reversed by
> characters is identical to the first line in Arabic only.
This is because you're getting an AAT (rather than OpenType) Arabic font on the Mac, which means we go through the Core Text backend, which doesn't handle the diacritics properly in reversed-direction text.
We could try to work around that, if we think it's worth the effort, but IMO we should regard that as a separate followup bug.
Assignee | ||
Comment 19•13 years ago
|
||
Comment on attachment 599497 [details]
HTML testcase with reversed RTL text, fixed version
Obsoleting this, as in the meantime Simon incorporated Arabic and Hebrew into attachment 599496 [details], so that provides a more comprehensive example.
Attachment #599497 -
Attachment is obsolete: true
Updated•13 years ago
|
Attachment #592195 -
Attachment mime type: application/octet-stream → application/x-font-ttf
Assignee | ||
Comment 20•13 years ago
|
||
Functionally the same as the previous versions, but updated to account for the move of rendering-related Unicode character properties from gfx to intl.
Attachment #599494 -
Attachment is obsolete: true
Attachment #599494 -
Flags: review?(smontagu)
Attachment #600815 -
Flags: review?(smontagu)
Comment 21•13 years ago
|
||
Comment on attachment 600815 [details] [diff] [review]
patch v3, support clusters in nsBidiPresUtils::WriteReverse
Review of attachment 600815 [details] [diff] [review]:
-----------------------------------------------------------------
::: intl/unicharutil/public/nsUnicodeProperties.h
@@ +69,5 @@
> +
> +inline bool IsClusterExtender(PRUint32 aCh) {
> + return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> +}
> +
What does this indirection give us?
Assignee | ||
Comment 22•13 years ago
|
||
(In reply to Simon Montagu from comment #21)
> Comment on attachment 600815 [details] [diff] [review]
> patch v3, support clusters in nsBidiPresUtils::WriteReverse
>
> Review of attachment 600815 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: intl/unicharutil/public/nsUnicodeProperties.h
> @@ +69,5 @@
> > +
> > +inline bool IsClusterExtender(PRUint32 aCh) {
> > + return IsClusterExtender(aCh, GetGeneralCategory(aCh));
> > +}
> > +
>
> What does this indirection give us?
It provides clients with the convenience of calling the one-argument IsClusterExtender() if all they have is the character code, while allowing clients who need to retrieve the category themselves for other reasons to pass it in to the two-argument version and thus avoid a second lookup.
Updated•13 years ago
|
Attachment #600815 -
Flags: review?(smontagu) → review+
Assignee | ||
Comment 23•13 years ago
|
||
Target Milestone: --- → mozilla13
Comment 24•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
QA Contact: untriaged → general
You need to log in
before you can comment on or make changes to this bug.
Description
•