Closed
Bug 240943
Opened 20 years ago
Closed 20 years ago
Update bidi data files to Unicode 4.0.1
Categories
(Core :: Layout: Text and Fonts, defect)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
People
(Reporter: smontagu, Assigned: smontagu)
References
()
Details
(Keywords: fixed1.7)
Attachments
(3 files, 1 obsolete file)
77.73 KB,
patch
|
jshin1987
:
review+
blizzard
:
superreview+
mkaply
:
approval1.7+
|
Details | Diff | Splinter Review |
56.17 KB,
patch
|
smontagu
:
review+
rbs
:
superreview+
|
Details | Diff | Splinter Review |
7.08 KB,
patch
|
roc
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
Unicode 4.0.1 includes changes to some Bidi properties, and Unicode 4.0 already added several new right-to-left characters, including some beyond the BMP. We need at least to rerun layout/tools/genbidicattable.pl and layout/tools/gensymmtable.pl on the latest version of the Unicode Character Database, and the scripts themselves may need some modifications.
Comment 1•20 years ago
|
||
Asking for blocking1.7. I think making the next stable release compliant with Unicode 4.0.1 makes a lot of sense. Moreover, fixing this will also resolve Bug 73251 which currently represents the most frequent and obvious rendering bug with Hebrew pages. Prog.
Flags: blocking1.7?
Assignee | ||
Comment 2•20 years ago
|
||
I wouldn't like to take the risk of adding support for Bidi surrogate characters to 1.7 at this late stage, so here is a patch which only changes the character types within the BMP, generated from a subset of the UCDB.
Assignee | ||
Comment 3•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 blizzard, can you rs this change to the generated file bidicattable.h? No actual code changes are involved, only modifications and additions to the bidi character types.
Attachment #147221 -
Flags: superreview?(blizzard)
Comment 4•20 years ago
|
||
Given that smontagu doesn't want to take take the risk of adding Bidi surrogate characters, is there good reason to hold the release of 1.7 for this patch? (I'm not suggesting we wouldn't approve a reviewed patch, just questioning whether or not we should block the release for this change).
Comment 5•20 years ago
|
||
Asa, see comment #1 for a couple of good reasons to block 1.7 for this bug. Prog.
Assignee | ||
Comment 6•20 years ago
|
||
(In reply to comment #4) > Given that smontagu doesn't want to take take the risk of adding Bidi surrogate > characters, is there good reason to hold the release of 1.7 for this patch? Let me clarify two points: (1) this patch only includes the parts that I don't think are risky; (2) as far as I know, no Bidi surrogate characters are yet in use in the wild, although Unicode 4.0 defined a few.
Assignee | ||
Comment 7•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 I guess I was forgetting correct process and thinking that an "rs" request didn't need r= first. jshin, can you review?
Attachment #147221 -
Flags: review?(jshin)
Assignee | ||
Comment 8•20 years ago
|
||
As well as the necessary changes for non-BMP characters, this includes some tightening up of the code in GetBidiCat(), and changes to the license block to match the manual changes that were made to bidicattable.h. I'll cut out the #ifdef DEBUG_Simon part before checkin.
Assignee | ||
Comment 9•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review] Patch for trunk Jungshik, can you review this one too? > #define CHAR_IS_BIDI(c) ( (IS_HINDI_DIGIT(c) ) || (IS_HEBREW_CHAR(c) ) \ >- || (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) ) >+ || (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) \ >+ || (IS_CYPRIOT_CHAR(c) ) ) I don't like this part very much myself. There are a whole bunch of RTL scripts roadmapped for addition to Unicode, and I don't like the idea of adding another condition to this #define for each one. The way to go is probably to move bidicattable.h back into intl/unicharutils and call it directly, but let's do that separately.
Attachment #147705 -
Flags: review?(jshin)
Comment 10•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 r=jshin
Attachment #147221 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 11•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 transferring rs request. See comment 3 for justification of rs.
Attachment #147221 -
Flags: superreview?(blizzard) → superreview?(roc)
Updated•20 years ago
|
Attachment #147221 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 12•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 Requesting approval for 1.7. The behaviour of hyphen-minus between right-to-left characters and numbers has long been one of the most obvious difference between the Bidi behaviours of Mozilla and IE, and has led to many bug reports (see bug 73251 and its duplicates). In the past Mozilla was conforming to the Unicode standard better than IE, but meanwhile the standard has changed under us and we have become non-compliant. risk: minimal. The patch has no semantic code changes, only additions and modifications to the table of Bidi categories.
Attachment #147221 -
Flags: approval1.7?
Comment 13•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review] Patch for 1.7 a=mkaply
Attachment #147221 -
Flags: approval1.7? → approval1.7+
Comment 14•20 years ago
|
||
I'm removing my blocking1.7 request, this is already checked into the Branch. I hope the Trunk patch will soon follow (reminder: jshin). Prog.
Flags: blocking1.7?
Could you land the branch patch on the trunk until the trunk patch is ready so that the branch patch gets more testing?
Assignee | ||
Comment 16•20 years ago
|
||
(In reply to comment #15) > Could you land the branch patch on the trunk until the trunk patch is ready so > that the branch patch gets more testing? I did so, unfortunately regressing the recent manual change to the license block of bidicattable.h, but that will be corrected when I check in the final version.
Comment 17•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review] Patch for trunk r=jshin sorry for the delay. >Index: content/shared/src/nsTextFragment.cpp >@@ -344,17 +345,24 @@ > while (cp < end) { >- PRUnichar ch = *cp++; >- if (CHAR_IS_BIDI(ch) ) { >+ PRUnichar ch1 = *cp++; >+ PRUint32 utf32Char = ch1; >+ if (IS_HIGH_SURROGATE(ch1) && cp < end) { >+ PRUnichar ch2 = *cp++; >+ if (IS_LOW_SURROGATE(ch2)) { >+ utf32Char = SURROGATE_TO_UCS4(ch1, ch2); >+ } >+ } >+ if (CHAR_IS_BIDI(utf32Char) ) { > mState.mIsBidi = PR_TRUE; > break; an edge case: If a high surrogate code point is followed by a 'BIDI character' (in BMP), it wouldn't be detected. Therefore, it may be necessary to add 'else if (CHAR_IS_BIDI(ch2))' after 'if (IS_LOW_SURROGATE(ch2))'
Comment 18•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review] Patch for trunk sorry that i forgot to set the r flag.
Attachment #147705 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 19•20 years ago
|
||
Thanks for review.
> an edge case: If a high surrogate code point is followed by a 'BIDI character'
> (in BMP), it wouldn't be detected. Therefore, it may be necessary to add 'else
> if (CHAR_IS_BIDI(ch2))' after 'if (IS_LOW_SURROGATE(ch2))'
Good catch! And if we are talking edge cases, I suppose we also have to consider
the case of an isolated high surrogate followed by a well-formed surrogate pair.
Assignee | ||
Comment 20•20 years ago
|
||
I changed that to: + PRUnichar ch1 = *cp++; + PRUint32 utf32Char = ch1; + if (IS_HIGH_SURROGATE(ch1) && + cp < end && + IS_LOW_SURROGATE(*cp)) { + PRUnichar ch2 = *cp++; + utf32Char = SURROGATE_TO_UCS4(ch1, ch2); + } and removed the debugging code.
Assignee | ||
Updated•20 years ago
|
Attachment #147705 -
Attachment is obsolete: true
Assignee | ||
Comment 21•20 years ago
|
||
Comment on attachment 148662 [details] [diff] [review] Patch for trunk Transferring r and requesting sr
Attachment #148662 -
Flags: superreview?(rbs)
Attachment #148662 -
Flags: review+
Comment 22•20 years ago
|
||
Comment on attachment 148662 [details] [diff] [review] Patch for trunk sr=rbs
Attachment #148662 -
Flags: superreview?(rbs) → superreview+
Assignee | ||
Comment 23•20 years ago
|
||
Fix checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 24•20 years ago
|
||
This isn't working the way you want. On Linux gcc throws pairs of warnings like these: nsTextBoxFrame.cpp:510: warning: comparison is always false due to limited range of data type nsTextBoxFrame.cpp:510: warning: comparison is always true due to limited range of data type This happens because CHAR_IS_BIDI now contains IS_CYPRIOT_CHAR and IS_CYPRIOT_CHAR tests against values > 0xffff. Gcc knows that PRUnichar is only 16 bits so it always evaluates IS_CYPRIOT_CHAR as false. If you're expecting to handle U+1xxxx characters you'll have to do something else. I'm reopening this bug because the patch clearly doesn't work as designed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 25•20 years ago
|
||
I saw the warning, but why does it mean that the patch is not working as expected? I have suggested getting rid of CHAR_IS_BIDI anyway for performance reasons (bug 244036). If we don't go ahead with that we could at least split it into two to remove the warning, assuming that the caller will always know whether the character is BMP or above: #define UCS2_CHAR_IS_BIDI(c) ( (IS_HINDI_DIGIT(c) ) || (IS_HEBREW_CHAR(c) ) \ || (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) ) #define UTF32_CHAR_IS_BIDI(c) ( (IS_CYPRIOT_CHAR(c)) )
Comment 26•20 years ago
|
||
> I saw the warning, but why does it mean that the patch is not working as
> expected?
I was basing my statement on what's in the code. One would normally get the
impression that CHAR_IS_BIDI can detect Cypriot glyphs from reading the code.
One would have to know that this doesn't work for mozilla's PRUnichar. I think
that's a sure way to create subtle bugs as other people work on the code. While
gcc handles this at compile time other compilers might leave the tests in which
wouldn't help performance.
I like the idea of useing UCS2 in its name. That makes it more obvious what's
happening. I wonder about having both UCS2 and UTF32/UCS4 running around in
mozilla. Someone will eventually manage to assign one to the other and back
again which obviously doesn't work. Personally I would have gone with 32-bit
quantities form the beginning but I realize why that wasn't done.
Comment 27•20 years ago
|
||
(In reply to comment #9) > ... There are a whole bunch of RTL scripts > roadmapped for addition to Unicode, and I don't like the idea of adding another > condition to this #define for each one. Unicode has clearly roadmapped 0590-08FF and 00010800-00010FFF for RTL scripts, see http://www.unicode.org/roadmaps/ - although these blocks contain some combining marks which are formally bidi neutral and in principle usable with LTR scripts. Although this roadmap is not set in stone, it is reasonable at this stage to add just two conditions to cover these whole blocks. There are some other RTL characters, apparently FB1D-FDFF and FE70-FEFF. But the safest thing is to check with the Unicode bidi tables.
Assignee | ||
Comment 28•20 years ago
|
||
I think Peter Kirk's suggestion is excellent. It will make CHAR_IS_BIDI a coarser-grained and faster test, more appropriate for the places it is used on every character in a document, e.g. nsTextFragment::SetBidiFlag, leaving the more precise and inevitably slower test for the detailed processing of Bidi text in nsBidi::GetDirProps.
Assignee | ||
Comment 29•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #152143 -
Flags: superreview?(roc)
Attachment #152143 -
Flags: review?(roc)
Attachment #152143 -
Flags: superreview?(roc)
Attachment #152143 -
Flags: superreview+
Attachment #152143 -
Flags: review?(roc)
Attachment #152143 -
Flags: review+
Assignee | ||
Comment 30•20 years ago
|
||
Checked in.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Comment 31•19 years ago
|
||
*** Bug 291511 has been marked as a duplicate of this bug. ***
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in
before you can comment on or make changes to this bug.
Description
•