Closed Bug 955030 Opened 10 years ago Closed 10 years ago

RTL messages on the receiver side

Categories

(Chat Core :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: aleth)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

*** Original post on bio 1600 by Moses <r5r4yster AT gmail.com> at 2012-07-20 16:19:00 UTC ***

*** Due to BzAPI limitations, the initial description is in comment 1 ***
*** Original post on bio 1600 as attmnt 1750 by r5r4yster AT gmail.com at 2012-07-20 16:19:00 UTC ***

Messages in Hebrew arrive in LTR direction on the receiver side, which makes them rather difficult and annoying to read.
Languages such as Hebrew, Arabic, or Urdu should be displayed RTL. The problem is most noticeable when the message is combined with English words, and it renders Instantbird very uncomfortable to use.

I did as much as I could with the attachment added to show how the language shouldn't be displayed, and how it should. It's like right-handed people holding a hammer in their left hand and the nail in their right: just not enough accuracy.
*** Original post on bio 1600 at 2012-07-20 16:42:37 UTC ***

So if I understand correctly there is more than one issue here. Please correct me if this is wrong. It's quite possible I am completely missing an aspect of RTL text handling.

1) When a message starts with an RTL language character, assume it's an RTL message. (Or is there a better way to tell?)

2) Try to display such messages aligned on the right, rather than the left. Maybe the dir="rtl" HTML tag would achieve this?

3) RTL messages containing non-RTL characters should not be displayed as received, but rather reordered: RTL1 Latin RTL2 -> RTL2 Latin RTL1 ?
This is confusing to me - why shouldn't we display it the way the sender typed it?

Possibly useful: http://www.w3.org/TR/i18n-html-tech-bidi/
*** Original post on bio 1600 at 2012-07-20 16:45:16 UTC ***

Forgot this one https://developer.mozilla.org/en/CSS/direction
OS: Windows 7 → All
Hardware: x86 → All
Version: 1.1 → trunk
*** Original post on bio 1600 as attmnt 1751 by r5r4yster AT gmail.com at 2012-07-20 17:02:00 UTC ***

When an RTL character starts the message and is followed by an LTR character, the message should be displayed RTL; otherwise, it is very hard to read.

See attachments.
*** Original post on bio 1600 at 2012-07-20 17:14:21 UTC ***

Since MDN is unhelpful: http://docs.oasis-open.org/dita/v1.1/OS/archspec/diratt.html

and 

The DIR attribute specifies the directionality of text--left-to-right (DIR=ltr, the default) or right-to-left (DIR=rtl). Characters in Unicode are assigned a directionality, left-to-right or right-to-left, to allow the text to be rendered properly. For example, while English characters are presented left-to-right, Hebrew characters are presented right-to-left.

Unicode defines a bidirectional algorithm that must be applied whenever a document contains right-to-left characters. While this algorithm usually gives the proper presentation, some situations leave directionally neutral text and require the DIR attribute to specify the base directionality.

Text is often directionally neutral when there are multiple embeddings of content with a different directionality. For example, an English sentence that contains a Hebrew phrase that contains an English quotation would require the DIR attribute to define the directionality of the Hebrew phrase. The Hebrew phrase, including the English quotation, should be contained within a SPAN element with DIR=rtl.
*** Original post on bio 1600 at 2012-07-20 17:14:59 UTC ***

Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: RTL on the receiver side → RTL messages on the receiver side
*** Original post on bio 1600 at 2012-07-20 21:39:19 UTC ***

Adium method that tests a message for its directionality via unicode:
http://hg.adium.im/adium-1.4/file/f4c2dffc9723/Frameworks/FriBidi%20Framework/NSString-FBAdditions.m

Given something like this, adding a dir=rtl tag to the message when appropriate should fix (at least a large part of) this bug, as long as the selected message style doesn't use text-align in messages.

I don't think we need worry about LTR-RTL-LTR nesting as described in comment #4 at this stage, and it seems like one level of RTL-LTR nesting (as described in the bug report) would be handled automatically as soon as the dir flag on the message html is set.
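A minimal sketch of what "adding a dir flag to the message html" could look like, assuming the direction has already been decided elsewhere and the message is plain text. The names `escapeHtml` and `wrapWithDir` are hypothetical and are not taken from the Instantbird source.

```javascript
// Hypothetical helper: escape plain text before embedding it in markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap the message in a span carrying an explicit
// dir attribute. Any value other than "rtl" or "ltr" leaves the text
// unwrapped, so the default bidi handling applies.
function wrapWithDir(text, dir) {
  if (dir !== "rtl" && dir !== "ltr") {
    return escapeHtml(text);
  }
  return '<span dir="' + dir + '">' + escapeHtml(text) + "</span>";
}
```

A span only fixes the ordering of the text inside it. Right alignment would additionally need the attribute on a block-level container, and a message style that does not override text-align.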
*** Original post on bio 1600 at 2012-07-20 21:48:40 UTC ***

Just noting that the adium code mentioned in comment #6 is a wrapper for GNU Fribidi (http://fribidi.org/). There does not appear to be a JS version as far as I can tell.
*** Original post on bio 1600 at 2012-07-20 22:22:26 UTC ***

Some more digging: This code https://mxr.mozilla.org/mozilla-central/source/layout/base/nsBidiPresUtils.cpp#583 looks very similar - if one could use getStyleTextReset() and getStyleVisibility() that would probably be enough.

Looking at this, I suspect the following trick may work: Check the direction attribute on the getComputedStyle of the (first few characters of?) the message.

But I'm a bit confused as to why, in that case, pure-RTL messages aren't already displayed appropriately... or are they?
*** Original post on bio 1600 by Moses <r5r4yster AT gmail.com> at 2012-07-21 12:31:13 UTC ***

More digging, hopefully it will be useful...

Pidgin implementation of RTL: http://developer.pidgin.im/viewmtn/revision/info/eb1889b8b74257cf008b6848b18344354cb8d02e

Rejected patch by shlomil: http://developer.pidgin.im/ticket/78
*** Original post on bio 1600 at 2012-07-21 16:44:14 UTC ***

Could it be that we just need to change imContentSink.jsm to preserve <SPAN style="direction:rtl;"> and <SPAN dir="RTL"> markups?
*** Original post on bio 1600 at 2012-07-22 19:34:20 UTC ***

(In reply to comment #10)
> Could it be that we just need to change imContentSink.jsm to preserve <SPAN
> style="direction:rtl;"> and <SPAN dir="RTL"> markups?

No, in general the incoming message will not have such markup.
Blocks: 955033
*** Original post on bio 1600 at 2012-07-22 19:58:35 UTC ***

I was asking because of http://lxr.instantbird.org/instantbird/source/purple/libpurple/protocols/msn/msnutils.c#135 (lines 135-145)
*** Original post on bio 1600 at 2012-07-22 20:21:52 UTC ***

(In reply to comment #12)
> I was asking because of
> http://lxr.instantbird.org/instantbird/source/purple/libpurple/protocols/msn/msnutils.c#135
> (lines 135-145)

Hmm, interesting, so MSN messages sometimes come with explicit RTL info. 
I don't fully understand this yet, but I doubt that's the way we want to add the markup, especially the text-align.
*** Original post on bio 1600 at 2012-07-22 20:24:35 UTC ***

(In reply to comment #13)
> the markup, especially the text-align.
oops, that should read "especially the text-align seems wrong."
*** Original post on bio 1600 at 2012-07-24 23:13:03 UTC ***

After discussing this with smontagu, it seems a clean solution will require bugs 562169 and 548206, which are slated for Gecko 17 or 18.
Blocks: 955036
No longer blocks: 955036
Depends on: 955036
*** Original post on bio 1600 at 2012-09-28 16:47:02 UTC ***

There's an add-on at https://addons.mozilla.org/en-US/thunderbird/addon/chat-rtl/ for a similar RTL issue.

I looked at the code, and what it does is:
if (/[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/.test(aMsg))
  aMsg = "\u202B" + aMsg;

in the sendMsg method of imconversation.xml.
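For experimenting outside the add-on, the same check can be written as a standalone function. This is only a sketch: the regular expression and the U+202B prefix come from the snippet above, while the function name `addRleIfRtl` is made up for illustration.

```javascript
// Character ranges copied from the add-on snippet quoted above.
const RTL_CHARS = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
const RLE = "\u202B"; // U+202B RIGHT-TO-LEFT EMBEDDING

// Hypothetical standalone version; the add-on patches sendMsg instead.
// Prefixes the message with RLE if it contains any RTL character.
function addRleIfRtl(msg) {
  return RTL_CHARS.test(msg) ? RLE + msg : msg;
}
```

Because the test matches an RTL character anywhere in the string, an English sentence containing a single Hebrew word is prefixed as well.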
*** Original post on bio 1600 at 2012-09-29 17:15:28 UTC ***

(In reply to comment #16)
> in the sendMsg method of imconversation.xml.
This confuses me as this bug is mainly about incoming messages, so I'm not sure what problem exactly is being addressed. Is this a hack that works around the problem by adding something when messages are sent, so as long as you are talking to other people using the add-on you are OK?
*** Original post on bio 1600 at 2012-09-29 23:35:47 UTC ***

I think it's a special character that indicates that an RTL string embeds some LTR string and that the whole string should still be displayed RTL.

I must admit it confuses me too.
*** Original post on bio 1600 by Or Dagmi <or AT digmi.org> at 2012-09-30 05:54:25 UTC ***

Hello,

I'm the guy who wrote that add-on.

The problem I've noticed in Thunderbird 15 is that when I send a message, the RTL is messed up when I use GTalk.

There is no problem on incoming messages, only on outgoing.

What my add-on does is check whether there are any RTL symbols in the message (that's what the regex in the if condition does) and, if so, insert an RLE character (http://www.fileformat.info/info/unicode/char/202b/index.htm), which enforces RTL.

After I added that code, messages I send in Hebrew with some English words are received correctly.

The problem doesn't only affect the built-in chat in Thunderbird: even if the other person were using Gmail as a client, they would still receive the message messed up. Therefore, fixing the local display direction just won't do.

DiGMi.
*** Original post on bio 1600 at 2012-09-30 09:16:11 UTC ***

Hi DiGMi.

Your comment more or less confirms what I had understood, thanks!
What I'm wondering about the code change you made is: wouldn't this mess up the display when sending a message in English and embedding some words in Hebrew?
Is there any way to detect that the message is RTL as opposed to just containing some RTL symbols?
*** Original post on bio 1600 by Or Dagmi <or AT digmi.org> at 2012-09-30 09:29:04 UTC ***

There is no good way to determine whether a sentence is supposed to be RTL or LTR.
The common way to do so is to check whether the first letters are RTL. If so, handle the message as RTL.

There will still be some false positives, but you can't get into the writer's head and you can't handle them all automatically.

For example:
א - Aleph, the first letter in the Hebrew alphabet
is supposed to be LTR, because it's an English sentence.
But a message like
בDublin
means "at Dublin", a reply to "איפה אתה?" ("Where are you?") for example.
It is supposed to be RTL because it's a Hebrew sentence.

There is no way to determine programmatically which of the above the user meant. Because the first one is less common than the second one, I think it will be better to neglect the first.

So, you will probably want to change the regex so it will check whether the first letters are RTL rather than whether any RTL letters are present.
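The first-letter heuristic described above can be sketched as follows. This is an approximation, not the Unicode bidi algorithm: it reuses the character ranges from the add-on's regex as "RTL", treats every other letter as LTR, and the function name `firstStrongDirection` is hypothetical.

```javascript
// RTL ranges reused from the add-on's regex (an approximation).
const RTL_RANGE = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
// Any Unicode letter; letters outside RTL_RANGE are treated as LTR here.
const ANY_LETTER = /\p{L}/u;

// Hypothetical helper: returns "rtl" or "ltr" based on the first letter
// found, or null if the text contains no letters at all.
function firstStrongDirection(text) {
  for (const ch of text) {
    if (RTL_RANGE.test(ch)) return "rtl";
    if (ANY_LETTER.test(ch)) return "ltr";
  }
  return null;
}
```

With this sketch both example messages above are classified as RTL, so the "Aleph" sentence is exactly the kind of false positive being accepted.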
*** Original post on bio 1600 at 2012-09-30 09:45:28 UTC ***

That makes sense.

I was also wondering if we should have a preferred default (maybe a hidden preference the user can change) that would be localized. So that for example a Hebrew build of Instantbird would be more likely to think a message with mixed RTL and LTR characters is RTL, and an English build would be more inclined to think it's LTR.

(I don't think Instantbird is currently localized into any RTL locale though.)
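One possible reading of the localized default, sketched under assumptions: messages containing only one kind of letter keep their obvious direction, and mixed messages fall back to a per-locale default. The function `guessDirection` and its `localeDefault` parameter are hypothetical, and the RTL ranges are reused from the add-on's regex.

```javascript
// RTL ranges reused from the add-on's regex (an approximation).
const RTL_LETTER = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
// Any Unicode letter; letters outside RTL_LETTER are treated as LTR here.
const LETTER = /\p{L}/u;

// Hypothetical helper: localeDefault ("rtl" or "ltr") stands in for the
// hidden, localized preference suggested above.
function guessDirection(text, localeDefault) {
  let sawRtl = false;
  let sawLtr = false;
  for (const ch of text) {
    if (RTL_LETTER.test(ch)) sawRtl = true;
    else if (LETTER.test(ch)) sawLtr = true;
  }
  if (sawRtl && !sawLtr) return "rtl";
  if (sawLtr && !sawRtl) return "ltr";
  // Mixed scripts, or no letters at all: defer to the locale default.
  return localeDefault;
}
```

Under this sketch "בDublin" would be shown RTL in a build whose default is "rtl" and LTR in a build whose default is "ltr".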
No longer depends on: 955036
Depends on: 955037
*** Original post on bio 1600 at 2012-11-25 14:01:20 UTC ***

(In reply to comment #15)
> After discussing this with smontagu, it seems a clean solution will require
> bugs 562169 and 548206, which are slated for Gecko 17 or 18.

Bio 562169 is now fixed for Mozilla 17.
Bio 548206 is now fixed for Mozilla 14 (this seems wrong, it was just checked in a few days ago? My guess is Mozilla 20).

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=562169
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=548206
No longer depends on: 955037
Depends on: 955262
*** Original post on bio 1600 at 2013-07-03 09:23:21 UTC ***

Since our HTML is message style-dependent, it would be easiest to fix this via CSS rather than by finding the correct tags to append dir="auto" to. Unfortunately, when attempting to do this, I discovered a gecko bug which (following discussion with smontagu) has now been filed
https://bugzilla.mozilla.org/show_bug.cgi?id=596002

So this will be delayed again :(
*** Original post on bio 1600 at 2013-07-03 09:25:40 UTC ***

(In reply to comment #24)
> https://bugzilla.mozilla.org/show_bug.cgi?id=596002
Oops, the link should be to
https://bugzilla.mozilla.org/show_bug.cgi?id=889742
Depends on: 955471
*** Original post on bio 1600 at 2013-07-06 17:28:58 UTC ***

https://bugzilla.mozilla.org/show_bug.cgi?id=889742 is fixed for moz25.
Attached patch: Patch
*** Original post on bio 1600 as attmnt 3021 at 2013-11-08 16:27:00 UTC ***

Due to the work done by the bidi team, what's left to do on the IB side here is minimal.

This patch sets the directionality using a best guess derived from the Unicode bidi information of the text content. I believe it fixes the issues described in the original bug report. (The alignment for RTL is correctly set to the right for all default message styles apart from Simple, and for Simple at least the text is displayed correctly.)

As I don't speak an RTL language, I could only test this with known examples, for which it works. Edge cases are expected of course (as discussed earlier in this bug), but as incoming messages don't have RTL/LTR markup, it's the best we can do.

I suspect RTL support will still be... incomplete in IB as the message styles themselves aren't written with it in mind.

The changes to the Simple message style are required as otherwise the prepended colons/spaces get inserted in the wrong place for RTL messages, as they logically belong to the nick and not the message text.
Attachment #8354802 - Flags: review?(florian)
Assignee: nobody → aleth
Status: NEW → ASSIGNED
Comment on attachment 8354802 [details] [diff] [review]
Patch

*** Original change on bio 1600 attmnt 3021 at 2013-11-08 16:48:00 UTC ***

I won't pretend to fully understand this, but I'm afraid nobody in our team will, and it looks reasonable, so r=me.
Attachment #8354802 - Flags: review?(florian) → review+
Whiteboard: [checkin-needed]
*** Original post on bio 1600 at 2013-11-10 03:19:18 UTC ***

Hopefully we can get someone to test this. :)

http://hg.instantbird.org/instantbird/rev/4ec0345f49a6
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [checkin-needed]
Target Milestone: --- → 1.5