Closed Bug 620446 Opened 14 years ago Closed 14 years ago

Provide support for RTL languages in simple SVG text elements

Categories

(Core :: SVG, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: smontagu, Assigned: smontagu)

References

Details

(Keywords: rtl)

Attachments

(4 files, 6 obsolete files)

Attached patch Patch (obsolete) — Splinter Review
We won't be fixing bug 311545 before Firefox 4, but we need to provide some basic support for RTL in SVG. It's hurting us that RTL languages are displayed backwards in prominent places in our own sites, for example the labels on the graph at http://input.mozilla.com/ar/ and http://input.mozilla.com/he/

The attached patch is not necessarily final, because I want to clear up some details that are unclear in the spec, especially concerning the direction and text-anchor properties.
blocking2.0: --- → ?
I'm not particularly experienced in C/C++, but 
+      do {
+        *(dest++)  = *(--src);
+      } while (--length);

looks a bit unsafe. The call to EnsureStringLength doesn't help if length overflows, and then the destination will be incorrect, which leads to writing in incorrect areas.
I don't think we should actually block on this, but hopefully we can take a patch.
blocking2.0: ? → -
What if there are clusters? Do they need to stay in the right order?
(In reply to comment #3)
> What if there are clusters? Do they need to stay in the right order?

My first reaction to this was "Wow, awesome catch", but when I tested clusters in RTL text they came out fine. Jonathan, does the text run creation code automagically order clusters correctly in RTL script with LTR order?
Attached patch Patch v.2 (obsolete) — Splinter Review
Attachment #498770 - Attachment is obsolete: true
Attachment #499813 - Flags: review?(roc)
Attached patch Reftests (obsolete) — Splinter Review
Comment on attachment 499813 [details] [diff] [review]
Patch v.2

Strange, the reftests fail on tryserver on Windows only
Attachment #499813 - Flags: review?(roc)
Hmmm....The opt reftest images show that the text has been completely scrambled (replaced with apparently-random characters, or maybe byte-swapped or something). The debug reftests crashed during string manipulations under nsBidiPresUtils::CopyLogicalToVisual(), so perhaps the scrambled text is also due to CopyLogicalToVisual() copying data from (or to) some random/incorrect location instead of the proper string buffer.
Attached patch Patch v.3 (obsolete) — Splinter Review
tryserver likes this version better, but I would be happier if I knew why.

The difference is that I am caching the pointer returned by toBegin.get() in the converter, rather than caching the iterator itself.

Debugging the previous version I saw that it was writing to the wrong location and corrupting the stack.
Attachment #499813 - Attachment is obsolete: true
Attachment #500011 - Flags: review?(roc)
nsBidiPresUtils::CopyLogicalToVisual should truncate aDest at first.

I would like to understand how this works with clusters. It seems to me that putting combining marks before the base character rather than after it would break things. Why doesn't it?
I was hoping that Jonathan would respond to comment 4 to explain why it doesn't break things. It doesn't break in reversed-order HTML either, e.g.
data:text/html,<p>&#x202D;&#x05DD;&#x05B9;&#x05D5;&#x05DC;&#x05B8;&#x05C1;&#x05E9;&#x202C;</p>

It *does* break with surrogate pairs, but I'm happy to leave that (and also mirrored characters) to the real SVG bidi fix.
(In reply to comment #11)
> I was hoping that Jonathan would respond to comment 4 to explain why it doesn't
> break things. It doesn't break in reversed-order HTML either, e.g.
> data:text/html,<p>&#x202D;&#x05DD;&#x05B9;&#x05D5;&#x05DC;&#x05B8;&#x05C1;&#x05E9;&#x202C;</p>

At least in the case of harfbuzz shaping, this is supposed to work correctly: the shaper knows the inherent directionality of the script, and handles the shaping of reversed (direction-overridden) text. (Internally, it reverses the text to ensure it is in the "natural" direction, then does shaping, then restores the original (overridden) direction.)

> It *does* break with surrogate pairs, but I'm happy to leave that (and also
> mirrored characters) to the real SVG bidi fix.

Hmm. I think it's ok to ignore mirroring for now, but I'd be inclined to fix the surrogate issue, as mangling surrogates will completely trash the text, and means that we'll be passing illegal UTF-16 to lower-level code.
(If you do want to add mirroring support, maybe you could re-use the utilities that the harfbuzz shaper relies on. There's a gfxUnicodeProperties::GetMirroredChar() method; I think you'd just have to add gfxUnicodeProperties.h to the EXPORTS in Makefile.in, and it should be usable.)
(In reply to comment #12)
> Hmm. I think it's ok to ignore mirroring for now, but I'd be inclined to fix
> the surrogate issue, as mangling surrogates will completely trash the text, and
> means that we'll be passing illegal UTF-16 to lower-level code.

OTOH, mirrored characters are far more likely to come up in real-world examples...

In fact, with the harfbuzz mirroring utility it's not too hard to fix both problems by using the WriteReverse method, currently ifdef'd out in nsBidi.cpp
Attached patch Patch v.4 (obsolete) — Splinter Review
... like so.
Attachment #500011 - Attachment is obsolete: true
Attachment #500396 - Flags: review?(roc)
Attachment #500011 - Flags: review?(roc)
+    LogicalToVisual converter(toBegin.get(), aBaseDirection, mBidiEngine);
+    copy_string(fromBegin, fromEnd, converter);

I don't think copy_string is what you want here. It seems to be leftover cruft from when strings weren't necessarily contiguous in memory. I think you should just have a function that takes a const PRUnichar* and a length and writes the results to another PRUnichar* (plus the aBaseDirection and aBidiEngine parameters). And it should return a boolean indicating success.

Would it be too much to ask to extract the part of doWriteReverse we're actually going to use and clean up the code? It's amazingly ugly :-)
This is a slightly tweaked version of http://inkscape.org/doc/examples/i18n.svg, with this patch and Jonathan's patch from bug 621918.

The changes are: 
 modifications to text-anchor/startOffset in the right-to-left examples (all other browsers that I tested also need this change)
 addition of diacritics to the Hebrew text
(In reply to comment #17)
> Created attachment 500398 [details]
> Screenshot of a version of http://inkscape.org/doc/examples/i18n.svg
> 
> This is a slightly tweaked version of
> http://inkscape.org/doc/examples/i18n.svg, with this patch and Jonathan's patch
> from bug 621918.
> 
> The changes are: 
>  modifications to text-anchor/startOffset in the right-to-left examples (all
> other browsers that I tested also need this change)
>  addition of diacritics to the Hebrew text

Looks MUCH better! :) Except that the period at the end of the Arabic and Hebrew is appearing at the right-hand end instead of the left, suggesting that the text is being treated as primarily LTR rather than RTL. Any chance we can get this fixed?
Comment on attachment 500396 [details] [diff] [review]
Patch v.4

>+#include "gfxtypes.h"

Needs to be "gfxTypes.h".
(In reply to comment #18)
> Looks MUCH better! :) Except that the period at the end of the Arabic and
> Hebrew is appearing at the right-hand end instead of the left, suggesting that
> the text is being treated as primarily LTR rather than RTL. Any chance we can
> get this fixed?

To fix this we would have either to make another change to the source and specify direction: rtl for the Hebrew and Arabic text; or change the code to make direction depend on the inherent directionality of the text rather than the inherited direction property (with initial value of ltr). This is a possible reading of the rather vague spec, and it's what Batik does, but I don't think it's the right thing to do.

(In reply to comment #19)
> Comment on attachment 500396 [details] [diff] [review]
> >+#include "gfxtypes.h"
> 
> Needs to be "gfxTypes.h".

D'oh! Thanks
Attached patch Patch v.5 (obsolete) — Splinter Review
Attachment #500396 - Attachment is obsolete: true
Attachment #500604 - Flags: review?(roc)
Attachment #500396 - Flags: review?(roc)
Comment on attachment 500604 [details] [diff] [review]
Patch v.5

+      // no need to use the converter -- just copy the string in reverse order
+      WriteReverse(fromBegin.get(), srcLength, toBegin.get());

I think that WriteReverse returns srcLength in all cases, which you're assuming here, but I'm not 100% sure. If it does, then it shouldn't have a return value; remove it.
Attachment #500604 - Flags: review?(roc) → review+
Blocks: 614535
Attached patch Patch v.6Splinter Review
Updated to comments; transferring r=roc and requesting approval2.0
Attachment #500604 - Attachment is obsolete: true
Attachment #500966 - Flags: review+
Attachment #500966 - Flags: approval2.0?
Comment on attachment 500966 [details] [diff] [review]
Patch v.6

I'm approving this on the grounds that what we have now is totally broken and this is very unlikely to affect anything else, plus Simon is on the hook and has no blockers currently assigned.
Attachment #500966 - Flags: approval2.0? → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/5a289c47df1a
http://hg.mozilla.org/mozilla-central/rev/e8c496238089
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Nice, Simon!  Looking at this test:

  http://dev.w3.org/SVG/profiles/1.1F2/test/harness/htmlObjectMiniApproved/text-intro-01-t.html

you can see that there is a combining character that doesn't combine with the aleph (which happens to be the only Hebrew letter I can recognise ;)).  Is this a known limitation of your patch (or perhaps a problem with the test)?
That seems to a problem with the Arial Unicode MS font. This testcase shows that the same thing happens in HTML.

More worrying is that the diacritic is misplaced in Ezra SIL and Arial (at least on Mac). It looks as if the harfbuzz shaping feature described in comment 12 doesn't work for these fonts. data:text/html,<p style="font-family: 'Ezra SIL'">&#x202D;&#x05DD;&#x05B9;&#x05D5;&#x05DC;&#x05B8;&#x05C1;&#x05E9;&#x202C;</p> also renders with a misplaced diacritic.
(In reply to comment #28)

> More worrying is that the diacritic is misplaced in Ezra SIL and Arial (at
> least on Mac). It looks as if the harfbuzz shaping feature described in comment
> 12 doesn't work for these fonts. data:text/html,<p style="font-family: 'Ezra
> SIL'">&#x202D;&#x05DD;&#x05B9;&#x05D5;&#x05DC;&#x05B8;&#x05C1;&#x05E9;&#x202C;</p>
> also renders with a misplaced diacritic.

We don't currently use harfbuzz by default for Hebrew; it is handled via the platform's shaping library (CoreText on Mac, Uniscribe or DirectWrite on Windows, Pango on Linux).

You can force all scripts to go through harfbuzz for testing purposes by setting gfx.font_rendering.harfbuzz.level to 3.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: