Closed Bug 236707 Opened 20 years ago Closed 20 years ago

ARABIC COMMA's Joining Class is wrong

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: zwnj, Assigned: zwnj)

References

()

Details

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

ARABIC COMMA <U+060C> is a Non-Joining character, but behaves like
Joining-Transparent.  In Arabic/Perisna texts, shaping of characters
after/before of ARABIC COMMA will be wrong.

Reproducible: Always
Steps to Reproduce:
1. Open <http://bamdad.org/~behnam/test/comma.fa.html>.

Actual Results:  
The 3rd ARABIC LETTER HEH is _middle_ and the 1st ARABIC LETTER WAW is a _final_.

Expected Results:  
In this text, the 3rd ARABIC LETTER HEH must be a _final_ and the 1st ARABIC
LETTER WAW must be a _isolated_.

As I looked at mozilla/source/content/shared/src/nsBidiUtils.cpp>, non of U+060x
and U+061x characters are in the shaping algorithm.  Also many characters added
to these codes in Unicode 4.
U+0600..U+0620 Added to Arabic Joining Class Table (gJoiningClass);
#define GetJoiningClass(c) Fixed (Optimized);
It works; I've tested.

I know that this joining implementation is not good enough and is going to
replaced with some other libs, but please check in this patch till replacing the
code.

This bug is very annoying in Persian (and perhaps Arabic) texts.

Behdad: Could you test it please?
Summary: ARABIC COMMA Joining Class in wronge → ARABIC COMMA's Joining Class is wrong
Thanks Behnam for the patch.  Unfortunately my laptop is not in a situation to
test anything.  Roozbeh, would you please?

Behnam, please attach a patch with diff -u format.
Attached patch previous patch with -u (obsolete) — Splinter Review
Attachment #143178 - Attachment is obsolete: true
Assignee: roland.mainz → mkaply
Status: UNCONFIRMED → NEW
Component: GFX: Xlib → Layout: BiDi Hebrew & Arabic
Ever confirmed: true
QA Contact: timeless → zach
I'd like to understand better the logic behind the bug, but the only computers I
have access to at the moment are on XP and don't go through this code path. What
happens if you use a regular ASCII comma (or any other codepoint outside the
U+06xx range) instead of an Arabic comma?
Everything is correct for ASCII characters between of Arabic letters.  So the
main problem was not here.

I think the problem was that IS_ARABIC_CHAR was defined as:

#define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF))

but GetJoiningClass was defined as:

#define GetJoiningClass(c)                   \
  (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
       (gJoiningClass[(c) - 0x0621]) :       \
      ((0x200D == (c)) ? eJC : eTr))


and as the comment said:

// All letters without an equivalen in the FB50 block are 'eNJ' here. This
// should be fixed after finding some better mechanism for handling Arabic.

it was for 0621~06FF, and works right for outside of 06xx, but works wrong for
0600~0620.
(In reply to comment #6)
> I think the problem was that IS_ARABIC_CHAR was defined as:
> 
> #define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF))
> 
> but GetJoiningClass was defined as:
> 
> #define GetJoiningClass(c)                   \
>   (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
>        (gJoiningClass[(c) - 0x0621]) :       \
>       ((0x200D == (c)) ? eJC : eTr))

Yes, that was my theory too. :-)

I think the patch would be cleaner if you change
   (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
to 
   (IS_ARABIC_CHAR(c))                        \
in GetJoiningClass and move the |((c) >> 8) == 0x06)| optimization to the
definition of |IS_ARABIC_CHAR|.
Table "gJoiningClass" is fixed like the previous one.

Using "IS_ARABIC_CHAR" for arabic character detection.

Remove the optimization, cause of human readability of header file.
Attachment #143181 - Attachment is obsolete: true
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

r=smontagu.
Attachment #144290 - Flags: review+
Sorry for the noise, but can this patch be applied please?
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

The patch still needs superreview before it can be applied: I'm requesting from
rbs.
Attachment #144290 - Flags: superreview?(rbs)
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

sr=rbs
Attachment #144290 - Flags: superreview?(rbs) → superreview+
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

a=mkaply
Attachment #144290 - Flags: approval1.7+
Assignee: mkaply → behnam
checked in for 1.7final

Checking in nsBidiUtils.cpp;
/cvsroot/mozilla/content/shared/src/nsBidiUtils.cpp,v  <--  nsBidiUtils.cpp
new revision: 3.11; previous revision: 3.10
done
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: