Closed Bug 236707 Opened 21 years ago Closed 21 years ago

ARABIC COMMA's Joining Class is wrong

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: zwnj, Assigned: zwnj)

References

()

Details

Attachments

(1 file, 2 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114 ARABIC COMMA <U+060C> is a Non-Joining character, but behaves like Joining-Transparent. In Arabic/Perisna texts, shaping of characters after/before of ARABIC COMMA will be wrong. Reproducible: Always Steps to Reproduce: 1. Open <http://bamdad.org/~behnam/test/comma.fa.html>. Actual Results: The 3rd ARABIC LETTER HEH is _middle_ and the 1st ARABIC LETTER WAW is a _final_. Expected Results: In this text, the 3rd ARABIC LETTER HEH must be a _final_ and the 1st ARABIC LETTER WAW must be a _isolated_. As I looked at mozilla/source/content/shared/src/nsBidiUtils.cpp>, non of U+060x and U+061x characters are in the shaping algorithm. Also many characters added to these codes in Unicode 4.
U+0600..U+0620 Added to Arabic Joining Class Table (gJoiningClass); #define GetJoiningClass(c) Fixed (Optimized);
It works; I've tested. I know that this joining implementation is not good enough and is going to replaced with some other libs, but please check in this patch till replacing the code. This bug is very annoying in Persian (and perhaps Arabic) texts. Behdad: Could you test it please?
Summary: ARABIC COMMA Joining Class in wronge → ARABIC COMMA's Joining Class is wrong
Thanks Behnam for the patch. Unfortunately my laptop is not in a situation to test anything. Roozbeh, would you please? Behnam, please attach a patch with diff -u format.
Attached patch previous patch with -u (obsolete) — Splinter Review
Attachment #143178 - Attachment is obsolete: true
Assignee: roland.mainz → mkaply
Status: UNCONFIRMED → NEW
Component: GFX: Xlib → Layout: BiDi Hebrew & Arabic
Ever confirmed: true
QA Contact: timeless → zach
I'd like to understand better the logic behind the bug, but the only computers I have access to at the moment are on XP and don't go through this code path. What happens if you use a regular ASCII comma (or any other codepoint outside the U+06xx range) instead of an Arabic comma?
Everything is correct for ASCII characters between of Arabic letters. So the main problem was not here. I think the problem was that IS_ARABIC_CHAR was defined as: #define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF)) but GetJoiningClass was defined as: #define GetJoiningClass(c) \ (((0x0621 <= (c)) && ((c) <= 0x06FF)) ? \ (gJoiningClass[(c) - 0x0621]) : \ ((0x200D == (c)) ? eJC : eTr)) and as the comment said: // All letters without an equivalen in the FB50 block are 'eNJ' here. This // should be fixed after finding some better mechanism for handling Arabic. it was for 0621~06FF, and works right for outside of 06xx, but works wrong for 0600~0620.
(In reply to comment #6) > I think the problem was that IS_ARABIC_CHAR was defined as: > > #define IS_ARABIC_CHAR(c) ((0x0600 <= (c)) && ((c)<= 0x06FF)) > > but GetJoiningClass was defined as: > > #define GetJoiningClass(c) \ > (((0x0621 <= (c)) && ((c) <= 0x06FF)) ? \ > (gJoiningClass[(c) - 0x0621]) : \ > ((0x200D == (c)) ? eJC : eTr)) Yes, that was my theory too. :-) I think the patch would be cleaner if you change (((0x0621 <= (c)) && ((c) <= 0x06FF)) ? \ to (IS_ARABIC_CHAR(c)) \ in GetJoiningClass and move the |((c) >> 8) == 0x06)| optimization to the definition of |IS_ARABIC_CHAR|.
Table "gJoiningClass" is fixed like the previous one. Using "IS_ARABIC_CHAR" for arabic character detection. Remove the optimization, cause of human readability of header file.
Attachment #143181 - Attachment is obsolete: true
Comment on attachment 144290 [details] [diff] [review] use IS_ARABIC_CHAR, fixed gJoiningClass r=smontagu.
Attachment #144290 - Flags: review+
Sorry for the noise, but can this patch be applied please?
Comment on attachment 144290 [details] [diff] [review] use IS_ARABIC_CHAR, fixed gJoiningClass The patch still needs superreview before it can be applied: I'm requesting from rbs.
Attachment #144290 - Flags: superreview?(rbs)
Comment on attachment 144290 [details] [diff] [review] use IS_ARABIC_CHAR, fixed gJoiningClass sr=rbs
Attachment #144290 - Flags: superreview?(rbs) → superreview+
Comment on attachment 144290 [details] [diff] [review] use IS_ARABIC_CHAR, fixed gJoiningClass a=mkaply
Attachment #144290 - Flags: approval1.7+
Assignee: mkaply → behnam
checked in for 1.7final Checking in nsBidiUtils.cpp; /cvsroot/mozilla/content/shared/src/nsBidiUtils.cpp,v <-- nsBidiUtils.cpp new revision: 3.11; previous revision: 3.10 done
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: