ARABIC COMMA's Joining Class is wrong

RESOLVED FIXED

Status

()

RESOLVED FIXED
15 years ago
10 years ago

People

(Reporter: zwnj, Assigned: zwnj)

Tracking

Trunk
x86
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

15 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

ARABIC COMMA <U+060C> is a Non-Joining character, but behaves like
Joining-Transparent.  In Arabic/Perisna texts, shaping of characters
after/before of ARABIC COMMA will be wrong.

Reproducible: Always
Steps to Reproduce:
1. Open <http://bamdad.org/~behnam/test/comma.fa.html>.

Actual Results:  
The 3rd ARABIC LETTER HEH is _middle_ and the 1st ARABIC LETTER WAW is a _final_.

Expected Results:  
In this text, the 3rd ARABIC LETTER HEH must be a _final_ and the 1st ARABIC
LETTER WAW must be a _isolated_.

As I looked at mozilla/source/content/shared/src/nsBidiUtils.cpp>, non of U+060x
and U+061x characters are in the shaping algorithm.  Also many characters added
to these codes in Unicode 4.
(Assignee)

Comment 1

15 years ago
Created attachment 143178 [details] [diff] [review]
Fix Arabbic Joining Class Table in nsBidiUtils.cpp

U+0600..U+0620 Added to Arabic Joining Class Table (gJoiningClass);
#define GetJoiningClass(c) Fixed (Optimized);
(Assignee)

Comment 2

15 years ago
It works; I've tested.

I know that this joining implementation is not good enough and is going to
replaced with some other libs, but please check in this patch till replacing the
code.

This bug is very annoying in Persian (and perhaps Arabic) texts.

Behdad: Could you test it please?
Summary: ARABIC COMMA Joining Class in wronge → ARABIC COMMA's Joining Class is wrong

Comment 3

15 years ago
Thanks Behnam for the patch.  Unfortunately my laptop is not in a situation to
test anything.  Roozbeh, would you please?

Behnam, please attach a patch with diff -u format.
(Assignee)

Comment 4

15 years ago
Created attachment 143181 [details] [diff] [review]
previous patch with -u
Attachment #143178 - Attachment is obsolete: true

Updated

15 years ago
Assignee: roland.mainz → mkaply
Status: UNCONFIRMED → NEW
Component: GFX: Xlib → Layout: BiDi Hebrew & Arabic
Ever confirmed: true
QA Contact: timeless → zach
I'd like to understand better the logic behind the bug, but the only computers I
have access to at the moment are on XP and don't go through this code path. What
happens if you use a regular ASCII comma (or any other codepoint outside the
U+06xx range) instead of an Arabic comma?
(Assignee)

Comment 6

15 years ago
Everything is correct for ASCII characters between of Arabic letters.  So the
main problem was not here.

I think the problem was that IS_ARABIC_CHAR was defined as:

#define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF))

but GetJoiningClass was defined as:

#define GetJoiningClass(c)                   \
  (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
       (gJoiningClass[(c) - 0x0621]) :       \
      ((0x200D == (c)) ? eJC : eTr))


and as the comment said:

// All letters without an equivalen in the FB50 block are 'eNJ' here. This
// should be fixed after finding some better mechanism for handling Arabic.

it was for 0621~06FF, and works right for outside of 06xx, but works wrong for
0600~0620.
(In reply to comment #6)
> I think the problem was that IS_ARABIC_CHAR was defined as:
> 
> #define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF))
> 
> but GetJoiningClass was defined as:
> 
> #define GetJoiningClass(c)                   \
>   (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
>        (gJoiningClass[(c) - 0x0621]) :       \
>       ((0x200D == (c)) ? eJC : eTr))

Yes, that was my theory too. :-)

I think the patch would be cleaner if you change
   (((0x0621 <= (c)) && ((c) <= 0x06FF)) ?    \
to 
   (IS_ARABIC_CHAR(c))                        \
in GetJoiningClass and move the |((c) >> 8) == 0x06)| optimization to the
definition of |IS_ARABIC_CHAR|.
(Assignee)

Comment 8

15 years ago
Created attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

Table "gJoiningClass" is fixed like the previous one.

Using "IS_ARABIC_CHAR" for arabic character detection.

Remove the optimization, cause of human readability of header file.
Attachment #143181 - Attachment is obsolete: true
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

r=smontagu.
Attachment #144290 - Flags: review+

Comment 10

15 years ago
Sorry for the noise, but can this patch be applied please?
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

The patch still needs superreview before it can be applied: I'm requesting from
rbs.
Attachment #144290 - Flags: superreview?(rbs)

Comment 12

15 years ago
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

sr=rbs
Attachment #144290 - Flags: superreview?(rbs) → superreview+

Comment 13

15 years ago
Comment on attachment 144290 [details] [diff] [review]
use IS_ARABIC_CHAR, fixed gJoiningClass

a=mkaply
Attachment #144290 - Flags: approval1.7+
Assignee: mkaply → behnam
checked in for 1.7final

Checking in nsBidiUtils.cpp;
/cvsroot/mozilla/content/shared/src/nsBidiUtils.cpp,v  <--  nsBidiUtils.cpp
new revision: 3.11; previous revision: 3.10
done
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED

Updated

10 years ago
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.